
A/B testing: the importance of the central limit theorem

Data analysis requires accurate, even respectful, treatment of data. Once an A/B test has finished and a statistically sufficient amount of data has been collected, it is time to determine a winner. Before the analysis, however, it is crucial to clean and prepare the A/B testing dataset. It is also extremely important to study the nature and characteristics of the data and to choose an appropriate statistical method for evaluating it.

The central limit theorem is fundamental to working with data and samples. Without understanding it, it is impossible to properly form and evaluate A/B testing samples, or to do sound data analysis in general. In this article, I will explain the practical benefits of this theorem and its importance in A/B testing.

The central limit theorem is a powerful tool in the analyst's toolkit.

article image

Key statements of the theorem

  • When we draw large random samples from any population with finite variance, the sample means tend to be normally distributed around the mean of the population we are sampling from, regardless of the shape of the population's distribution. Even if the population is exponentially distributed, the means of repeated random samples tend toward a normal distribution.
  • Most sample means will be close to the population mean. What exactly counts as "close enough" is determined by the standard error.
  • It is relatively unlikely that a sample mean will lie more than two standard errors from the population mean, and extremely unlikely that it will lie three or more standard errors away.
  • The less likely it is that an outcome occurred purely by chance, the more confident we can be that some other factor is at work.
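The statements above can be checked with a small simulation. The sketch below is only an illustration: the population mean of 50, the sample size of 200, and the number of samples are assumed values that do not come from the experiment in this article. It draws many samples from an exponential population and measures how often the sample mean falls within two and three standard errors of the population mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

POPULATION_MEAN = 50.0  # assumed mean of the exponential population
SAMPLE_SIZE = 200       # assumed number of observations per sample
N_SAMPLES = 2000        # assumed number of independent samples


def draw_sample_mean(n):
    """Mean of n draws from an exponential population with mean POPULATION_MEAN."""
    return statistics.fmean(
        random.expovariate(1 / POPULATION_MEAN) for _ in range(n)
    )


sample_means = [draw_sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

# For an exponential distribution the standard deviation equals the mean,
# so the theoretical standard error of the sample mean is mean / sqrt(n).
standard_error = POPULATION_MEAN / SAMPLE_SIZE ** 0.5


def share_within(k):
    """Share of sample means within k standard errors of the population mean."""
    hits = sum(abs(m - POPULATION_MEAN) <= k * standard_error for m in sample_means)
    return hits / len(sample_means)


mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
within_2 = share_within(2)
within_3 = share_within(3)

print(f"mean of sample means:   {mean_of_means:.2f}")
print(f"std of sample means:    {sd_of_means:.2f}")
print(f"theoretical std error:  {standard_error:.2f}")
print(f"share within 2 SE:      {within_2:.3f}")
print(f"share within 3 SE:      {within_3:.3f}")
```

Although the individual observations are heavily skewed, the sample means should cluster around the population mean, with a spread close to the theoretical standard error.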

Let's imagine we launched an experiment where the target metric is the average check. The null hypothesis is that there is no difference in the average check between the control and experiment groups. The alternative hypothesis is that a difference exists.

article image
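One common way to test such a pair of hypotheses, when samples are large enough for the central limit theorem to apply, is a two-sided z-test on the difference of means. The article does not say which test was actually used, so treat the following as a sketch under assumptions: the helper name `z_test_means`, the simulated data, and the group means of 50 and 55 are all hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical data: exponentially distributed checks for each group.
control = [random.expovariate(1 / 50) for _ in range(5000)]
experiment = [random.expovariate(1 / 55) for _ in range(5000)]


def z_test_means(a, b):
    """Two-sided z-test for the difference in means of two large samples.

    Relies on the central limit theorem: for large samples the difference
    of sample means is approximately normal, whatever the data distribution.
    """
    diff = statistics.fmean(b) - statistics.fmean(a)
    # Standard error of the difference, allowing unequal variances.
    se = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p_value


diff, z, p_value = z_test_means(control, experiment)
print(f"difference in average check: {diff:.2f}")
print(f"z statistic: {z:.2f}, p-value: {p_value:.4f}")
```

A small p-value means that a difference this large would be unlikely if the null hypothesis were true.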

As we know, a small sample gives an imprecise estimate of a statistic. According to the law of large numbers, the larger the sample, the closer the sample mean gets to the population mean. So to estimate the population mean more accurately, we need a large enough sample.

This can be seen in the chart below, which shows that as the sample size increases, the sample mean moves closer to the population mean:

article image
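The same effect can be reproduced numerically. This is a minimal sketch with simulated data: the exponential population with mean 50 and the chosen sample sizes are assumptions, not the data behind the chart.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

POPULATION_MEAN = 50.0  # assumed true mean of the population

# One long stream of observations; we look at the mean of its first n values.
observations = [random.expovariate(1 / POPULATION_MEAN) for _ in range(100_000)]

checkpoints = [10, 100, 1_000, 10_000, 100_000]
running_means = {n: statistics.fmean(observations[:n]) for n in checkpoints}

for n, mean in running_means.items():
    error = abs(mean - POPULATION_MEAN)
    print(f"n = {n:>7}: sample mean = {mean:7.2f}, distance from true mean = {error:.2f}")
```

The distance from the true mean does not have to shrink at every single step, but it should become small as the sample size grows.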

We can use the bootstrap to determine confidence intervals for our exponentially distributed average check data. As we can see, the mean of the bootstrap sample means is approximately equal to the mean of the original sample from which the resamples were drawn. The standard deviation of these means is much smaller than that of the raw data, because sample means cluster closely around the true population mean.

article image

In this case, the standard deviation of the means is the standard error, the quantity from which the confidence intervals were plotted earlier. With confidence intervals in hand, we can now assess the statistic. This is one of the main practical benefits of the central limit theorem.
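The bootstrap procedure described above can be sketched as follows. The data are simulated (an exponential sample of 500 checks with mean 50, both assumed values), `bootstrap_means` is a hypothetical helper name, and the percentile interval is one of several ways to build a bootstrap confidence interval.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical observed sample of checks.
sample = [random.expovariate(1 / 50) for _ in range(500)]


def bootstrap_means(data, n_resamples=2000):
    """Means of resamples drawn with replacement, each the size of the data."""
    n = len(data)
    return [statistics.fmean(random.choices(data, k=n)) for _ in range(n_resamples)]


boot_means = bootstrap_means(sample)

# The standard deviation of the bootstrap means estimates the standard error.
boot_se = statistics.stdev(boot_means)
analytic_se = statistics.stdev(sample) / len(sample) ** 0.5

# 95% percentile confidence interval for the mean.
sorted_means = sorted(boot_means)
lower = sorted_means[int(0.025 * len(sorted_means))]
upper = sorted_means[int(0.975 * len(sorted_means))]

print(f"sample mean:             {statistics.fmean(sample):.2f}")
print(f"mean of bootstrap means: {statistics.fmean(boot_means):.2f}")
print(f"bootstrap std error:     {boot_se:.2f}")
print(f"analytic std error:      {analytic_se:.2f}")
print(f"95% CI for the mean:     [{lower:.2f}, {upper:.2f}]")
```

The bootstrap standard error should land close to the analytic value of the sample standard deviation divided by the square root of the sample size.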

If the goal is a more accurate estimate of the mean, we need to minimize the variance of that estimate. The smaller the spread, the more accurate the mean. Making the standard deviation of the mean small enough requires a large enough sample.
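For independent observations, this relationship can be written down explicitly. Here σ is the standard deviation of the population and n is the sample size. Both symbols are introduced for illustration and do not appear elsewhere in the article.

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
\qquad\Longrightarrow\qquad
n = \left( \frac{\sigma}{\mathrm{SE}(\bar{x})} \right)^{2}
```

Because the standard error shrinks with the square root of n, halving it requires roughly four times as many observations.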

The bottom line

The example above assumes that we know something about the shape of the distribution. In reality, there are many cases where the observations do not follow any standard distribution. Most often this is caused by outliers. The ethical question is whether to adjust the data to the required distribution in order to get an adequate estimate, or to leave it as is. In A/B testing, this decision depends on the hypothesis, the sampling, and the metrics. In some cases it makes sense to remove outliers, while in others it is worth taking the "whales" into account.

The bootstrap, for example, is not perfect when dealing with outliers. See what happens if we add the values 100 and 1000 to the sample from which we are resampling:

article image
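The effect can be reproduced with a short sketch. The article does not give the original sample, so the base data here are hypothetical: 50 exponentially distributed values with mean 10. The helper `bootstrap_ci` is also a hypothetical name.

```python
import random
import statistics

random.seed(11)  # fixed seed so the sketch is reproducible


def bootstrap_ci(data, n_resamples=2000):
    """95% percentile bootstrap confidence interval for the mean."""
    n = len(data)
    means = sorted(
        statistics.fmean(random.choices(data, k=n)) for _ in range(n_resamples)
    )
    return means[int(0.025 * n_resamples)], means[int(0.975 * n_resamples)]


# Hypothetical clean sample, then the same sample with two outliers added.
clean = [random.expovariate(1 / 10) for _ in range(50)]
contaminated = clean + [100, 1000]

clean_ci = bootstrap_ci(clean)
dirty_ci = bootstrap_ci(contaminated)
clean_width = clean_ci[1] - clean_ci[0]
dirty_width = dirty_ci[1] - dirty_ci[0]

print(f"clean mean:        {statistics.fmean(clean):.2f}")
print(f"contaminated mean: {statistics.fmean(contaminated):.2f}")
print(f"clean 95% CI:        [{clean_ci[0]:.2f}, {clean_ci[1]:.2f}] (width {clean_width:.2f})")
print(f"contaminated 95% CI: [{dirty_ci[0]:.2f}, {dirty_ci[1]:.2f}] (width {dirty_width:.2f})")
```

Each resample may contain the value 1000 zero, one, or several times, so the bootstrap means jump between separate clusters and the confidence interval becomes much wider.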


To perform a robust data analysis, it is crucial to know the basics of statistics; otherwise the risk of getting misleading results is extremely high. Applying methods that assume a normal distribution to data that is not normally distributed can lead to the wrong decision.

What's next

In the next articles, we will review which metrics usually have normal and non-normal distributions, and which statistical tests should be applied in each case to test hypotheses.