In some parts of the digital marketing world - Web Analytics in particular - A/B tests, multivariate tests, and other more advanced forms of scientific testing are common. Increasingly, media agencies are looking for ways to bring this kind of rigor to their media planning and evaluation process. The Cookie Targeting (CT) technology available to all advertisers using Microsoft as their Third Party Ad Server provides a way to conduct simple tests of media effectiveness. While this capability has been a part of the Atlas platform for a long time, the Institute has recently been advising more clients on how to conduct effective tests. I decided that a primer on some of the more common perils I discuss with clients might be helpful to anyone aspiring to test their digital media.
Know What You Are Testing and Why
Start by knowing what media strategy you are testing. Is it the value of sequencing creative vs. random delivery? Or are you testing the effectiveness of re-messaging? Or are you trying to find the optimal mix of video placements to buy for a campaign? Getting very precise on what you want to test and why will save many headaches and make getting stakeholder buy-in and the technical set-up much easier. Many different media planning strategies and tactics can be put to a test, if a clever enough test-design is used. Make sure you sanity-check a possible test design by talking the idea over with a team. Picking the right things to test to answer your question is the most difficult part of designing a good experiment but worth the cups of coffee spent brainstorming.
I also recommend thinking through the resulting data set and the likely outcomes: can you draw the conclusions you want to? Sometimes it's not clear your test design actually tests what you think it does. For example, advertiser X wants to test the effectiveness of geo-targeting creative at the city level. So, they decide that all cookies from San Francisco zip codes will be enrolled in the test group and receive special creative when they see campaign placements, while everyone else will receive the default vanilla creative. There is a problem with this design. It isn't testing geo-targeting because no one in San Francisco saw the default creative. You don't know if the response you got from the San Francisco residents was due to something special in the creative or if people who want to live in San Francisco are different from everyone else. What advertiser X needs to do is take every cookie from a San Francisco zip code and randomly assign it to either the test or control experience. You will get better at catching this kind of problem with experience. If you have co-workers who do testing for web analytics, they are a great resource to consult. Also, this is an area where my colleagues and I at the Advertising Institute often give advice.
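To make the corrected design concrete, here is a minimal sketch of random assignment within the San Francisco audience. It is not how CT assigns cookies internally; it assumes a hash-based split, and the names `assign_group`, `creative_for`, the experiment label, and the creative names are all hypothetical.

```python
import hashlib


def assign_group(cookie_id: str, experiment: str = "sf-geo-test",
                 test_share: float = 0.5) -> str:
    """Deterministically map a cookie to 'test' or 'control'.

    Hashing the cookie ID together with an experiment label gives a
    stable, effectively random bucket: the same cookie always lands in
    the same group for as long as the cookie exists.
    """
    digest = hashlib.sha256(f"{experiment}:{cookie_id}".encode("utf-8")).hexdigest()
    # First 8 hex digits -> integer in [0, 2**32), scaled to [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "test" if bucket < test_share else "control"


def creative_for(cookie_id: str, in_target_geo: bool) -> str:
    """Pick the creative for one impression (hypothetical creative names)."""
    if not in_target_geo:
        # Cookies outside San Francisco are not part of the experiment.
        return "default"
    # Inside San Francisco, only the randomly chosen test half sees the
    # geo-targeted creative; the control half sees the default.
    return "geo_creative" if assign_group(cookie_id) == "test" else "default"
```

Because both groups are drawn from San Francisco cookies, any difference in response can be attributed to the creative rather than to who lives in the city.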
Know What Success Is
The most common mistake when first starting to do testing is not having thought out a clear success metric. A success metric could be anything from landing page visits to registrations to dollars spent. Clients using CT usually pick an action tag (or group of actions) and use simple conversion rates as their test metric. Conversions make a good test metric because a user can only fall into one of two groups - converters or non-converters. This binary split helps cut down on the sample size you need, which is good for reasons I'll discuss shortly.
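As an illustration of using conversion rates as the test metric, the sketch below compares the two groups with a standard two-proportion z-test. The choice of test is my assumption rather than something CT prescribes, the function name is hypothetical, and the counts in the usage example are made up.

```python
from math import sqrt
from statistics import NormalDist


def compare_conversion_rates(test_conv: int, test_n: int,
                             ctrl_conv: int, ctrl_n: int) -> dict:
    """Two-sided two-proportion z-test on converter counts."""
    p_test = test_conv / test_n
    p_ctrl = ctrl_conv / ctrl_n
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (test_conv + ctrl_conv) / (test_n + ctrl_n)
    se = sqrt(pooled * (1 - pooled) * (1 / test_n + 1 / ctrl_n))
    if se == 0:
        raise ValueError("No variation in outcomes; the test is undefined.")
    z = (p_test - p_ctrl) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "test_rate": p_test,
        "control_rate": p_ctrl,
        "relative_lift": p_test / p_ctrl - 1 if p_ctrl > 0 else float("inf"),
        "z": z,
        "p_value": p_value,
    }


# Made-up example: 50,000 cookies per group, 600 vs. 500 converters.
result = compare_conversion_rates(600, 50_000, 500, 50_000)
print(result)
```

A small p-value suggests the difference in conversion rates is unlikely to be chance alone; it says nothing about whether the test design itself was sound.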
Some actions make better success metrics than others. For example, an action tag at the beginning of a registration or checkout process is often a better place to measure than the Thank You tag. The confirmation page may be the ultimate goal of the process, but the effect of off-site media buys diminishes rapidly once a user enters a registration or checkout process. Site design and content are largely responsible for how much fall-off occurs between the beginning and end of such a process. This brings us back to knowing what you are testing. Pick an action tag that minimizes the effects of site-design choices so that the effects of the media are what is being tested.
Time Matters
Short tests are preferable to longer tests, but they shouldn't be too short. As in the Goldilocks story, there is a window of time that is just right for tests. A test that runs for only a day or two is vulnerable to producing results that don't reflect the changes in audience that happen between weekdays and weekends. Also, one-time events such as holidays during a short test period can skew results. Finally, short tests won't give many consumers enough time to travel down the conversion funnel and may distort your results. One full week is an absolute minimum time for a test.
On the other hand, the digital world moves fast and the web technology behind it penalizes the slow. The CT technology is great at assigning individual cookies to either a 'test' or 'control' experience. But users delete cookies. When users who delete their cookies get new ones, they can end up switching from the 'test' group to the 'control' group or the reverse. Now your groups are contaminated and you'll see a shrinking of the effect size. Cookie deletion isn't a big problem for tests that last only a few weeks but becomes a major issue once you reach six to eight weeks out. You can look at historical site or campaign reach numbers to get a sense of whether your test will build up enough sample size within a reasonable amount of time. Aim for a test that collects enough sample within two weeks to one month.
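A rough way to check whether a test fits that window is to estimate the sample size each group needs and divide by the reach you expect per day. The sketch below uses the textbook normal-approximation formula for comparing two proportions; the function names, the 5% significance level, the 80% power, and the example rates and reach are all assumptions for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(base_rate: float, relative_lift: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Cookies needed in EACH group to detect the given relative lift."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def days_to_fill(n_per_group: int, daily_new_cookies: int,
                 test_share: float = 0.5) -> int:
    """Days until the smaller group reaches n_per_group, given daily reach."""
    smaller_share = min(test_share, 1 - test_share)
    return ceil(n_per_group / (daily_new_cookies * smaller_share))


# Made-up inputs: 1% baseline conversion rate, hoping to detect a 20%
# relative lift, with 10,000 new cookies reached per day split 50/50.
n = sample_size_per_group(base_rate=0.01, relative_lift=0.20)
days = days_to_fill(n, daily_new_cookies=10_000)
print(n, days)
```

If the estimated duration comes out under a week, run the full week anyway; if it comes out well beyond a month, consider a higher-volume action tag or a larger detectable lift, since cookie deletion will start to erode the result.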
Take Action
This post has avoided some of the technical details of testing. If you want to learn more about the statistics behind testing or need a quick refresher, check out this paper. I like to use this sample size calculator for simple A/B split tests. Once you're ready to run a media test, contact your Media Console account manager and engage us in helping you execute your vision.
Thanks
Andrew
Check out more posts from the Microsoft Advertising Institute.