A|B Testing Arbitrary Product Versions

Tozan
2 min read · Dec 28, 2020
The objective of experimentation is to find the global max in a version landscape, not just the local max between arbitrary points

In the last post, we discussed the common waste, or measurable opportunity cost, associated with traditional A|B testing. Today, we'll explore in more detail another common issue: most A|B tests compare two or more arbitrary versions of the product. Let's examine the problem to understand why this shortcoming is such a handicap for A|B tests and how Tozan has developed a solution.

Let's take a simple variable, like the background color of a website. Testing new colors to find a better outcome (e.g., clicks, transactions, or dollars spent) sounds straightforward — choose a few new colors and compare the KPIs between versions. While this approach will likely help identify the relative values of each color, at least within a given time period, it inevitably excludes many colors. There are roughly 16.8 million RGB shades available, so testing a few means leaving the vast majority untested.
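The arithmetic makes the gap concrete. With 8 bits per color channel, a quick sketch of how much of the space a typical test actually covers:

```python
# Number of distinct colors in 8-bit-per-channel RGB.
total_colors = 256 ** 3
print(total_colors)  # 16777216

# Fraction of the space covered by a test of, say, 3 colors.
tested = 3
coverage = tested / total_colors
print(f"{coverage:.8%}")  # well under a millionth of a percent
```

Even a test with dozens of variants samples a vanishingly small slice of the landscape.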

How does this contribute to inefficiency? If we test Green vs Red today and then set one of those two colors live universally across the site, it is only a matter of time before we decide to test Blue, Purple, Orange, or some other portfolio of colors. We’ll go through the exercise once again of setting up and executing tests, and we’ll be trapped in the cycle of identifying other untested versions that might be superior. It’s not uncommon to spend months if not years testing in search of a clear answer, and by the time enough options have been examined, the product and business might have evolved sufficiently to render the outcome irrelevant. A good many analyst teams spend their time on this type of circuitous testing.

The inherent problem is that the landscape being explored is large. By choosing a few items off a large menu, we are resigning ourselves to finding the best of the points we happened to pick — a local maximum. The goal, of course, is to find the global maximum, which means we'd need to measure a much larger set of options.
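To illustrate, here is a toy sketch. The conversion-rate "landscape" below is entirely invented for illustration (a 1-D slice of the color space with two peaks); the point is only that the best of a few arbitrarily sampled points can miss the global peak:

```python
import random

# Hypothetical conversion-rate landscape over hue (0-255), invented
# purely for illustration: a modest peak near 60, a taller one near 200.
def conversion_rate(hue):
    return (0.05
            + 0.02 * max(0.0, 1 - abs(hue - 60) / 40)
            + 0.04 * max(0.0, 1 - abs(hue - 200) / 30))

random.seed(1)

# Traditional A/B test: compare a few arbitrary points off the menu.
sampled = random.sample(range(256), 3)
best_sampled = max(sampled, key=conversion_rate)

# What exhaustive exploration of the same landscape would find.
best_overall = max(range(256), key=conversion_rate)

print(best_sampled, round(conversion_rate(best_sampled), 4))
print(best_overall, round(conversion_rate(best_overall), 4))  # hue 200, 0.09
```

The sampled "winner" is only the best of the points tested; unless one of them happens to land on the tall peak, the global optimum is never seen.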

We built Tozan to tackle this challenge. Instead of treating experiments as discrete time blocks with singular, clear conclusions, we conceive of testing as a continuous, Always-On optimization process. Tozan employs AI algorithms to rapidly adjust traffic between versions so that stronger-performing versions rise to the surface and weaker ones disappear. But even after the winner is obvious and consuming most or all of site traffic, the experiment lives on. Because Tozan is available as an Always-On connection to the product, the experiment framework becomes a permanent fixture that is added to over time. No product is ever complete — there are always superior ideas on the horizon, and Tozan enables fast and easy deployment of these new versions into production.


Tozan is a modern experimentation platform enabling companies to test and learn efficiently.