Data analysis requires accurate and even respectful treatment of the data. When an A/B test has finished and a sufficient amount of data has been collected, it is time to determine a winner. Before the analysis, however, it is crucial to clean and prepare the A/B testing dataset. It is just as important to study the nature and characteristics of the data and to choose an appropriate method of statistical assessment.
The central limit theorem is fundamental to working with data and samples. Without an understanding of the central limit theorem, it is impossible to properly form and evaluate A/B testing samples, or to do data analysis in general. In this article, I will explain the practical benefits of this theorem and its importance in A/B testing.
The central limit theorem is a powerful tool in the analyst’s toolkit.
Let’s imagine we launched an experiment where the target metric is the average check. The null hypothesis is that there is no difference in the average check between the control and experiment groups; the alternative hypothesis is that the difference exists.
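A minimal sketch of this setup, assuming the average check is exponentially distributed and that the group sizes and the uplift are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical average-check data, exponentially distributed (arbitrary currency units).
# H0: mean(control) == mean(experiment); H1: the means differ.
control = rng.exponential(scale=1000, size=5000)
experiment = rng.exponential(scale=1030, size=5000)  # assumed ~3% uplift, illustration only

print(f"control mean: {control.mean():.2f}, experiment mean: {experiment.mean():.2f}")
```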
As we know, a small sample size results in an inaccurate estimate of a statistic. According to the law of large numbers, the larger the sample size, the closer the sample mean tends to the population mean. This means that to get a more accurate estimate of the population mean, we need a large enough sample.
This can be seen in the chart below, which shows that as the sample size increases, the sample mean moves closer to the population mean:
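A sketch of the kind of simulation behind such a chart, assuming an exponential “average check” distribution with a known true mean:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 1000  # mean of the exponential distribution we sample from

# As the sample size grows, the sample mean converges toward the true mean.
for n in (10, 100, 1_000, 10_000, 100_000):
    sample = rng.exponential(scale=true_mean, size=n)
    print(f"n={n:>7}: sample mean = {sample.mean():8.2f} (true mean = {true_mean})")
```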
We can use the bootstrap to determine confidence intervals for our exponentially distributed average check data. As we can see, the mean of the bootstrap sample means is approximately equal to the mean of the sample from which the statistics were drawn. The standard deviation has become smaller, because the bootstrap means cluster much more tightly around the true population mean than the individual observations do.
In this case, the standard deviation of the means is the standard error, from which confidence intervals are plotted. Using these confidence intervals, we can now assess the statistic. This is one of the main practical values of the central limit theorem.
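A minimal bootstrap sketch of this idea, assuming a hypothetical exponential average-check sample and a percentile-based interval:

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical exponentially distributed average-check sample.
sample = rng.exponential(scale=1000, size=2000)

# Bootstrap: resample with replacement many times and keep the mean of each resample.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(10_000)
])

print("sample mean:            ", round(sample.mean(), 2))
print("mean of bootstrap means:", round(boot_means.mean(), 2))  # ~ sample mean
print("std of bootstrap means: ", round(boot_means.std(), 2))   # ~ standard error of the mean
print("95% confidence interval:", np.percentile(boot_means, [2.5, 97.5]))
```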
If the goal is to obtain a more accurate estimate of the mean, it is necessary to minimize the variance of that estimate. The smaller the spread, the more accurate the mean. Since the standard error of the mean equals the standard deviation divided by the square root of the sample size, making it small enough requires a large enough sample.
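A quick illustration of this relationship, again assuming an exponential distribution (whose standard deviation equals its mean); the empirical spread of the sample mean tracks sigma divided by the square root of n:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 1000  # for an exponential distribution the standard deviation equals the mean

for n in (100, 1_000, 10_000):
    # Empirical spread of the sample mean: many samples of size n, one mean per sample.
    means = rng.exponential(scale=sigma, size=(1_000, n)).mean(axis=1)
    print(f"n={n:>6}: std of sample means = {means.std():7.2f}, "
          f"sigma / sqrt(n) = {sigma / np.sqrt(n):7.2f}")
```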
The example above assumes that we know something about the form of the distribution. In reality, there are many cases when observations are not described by any known distribution, and most often these cases are associated with outliers. The ethical question is whether to adjust the data to the required distribution form to get an adequate assessment, or to leave it as is. In A/B testing, this decision depends on the hypothesis, the sampling, and the metrics: in some cases it makes sense to get rid of outliers, while in others it is worth keeping the "whales".
The bootstrap, for example, is not perfect when dealing with outliers. See what happens if we add the values 100 and 1000 to the sample from which we are resampling:
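A sketch of that experiment, assuming a small sample on a unit scale so that 100 and 1000 act as genuine outliers:

```python
import numpy as np

rng = np.random.default_rng(3)

def bootstrap_ci(data, n_boot=10_000):
    """95% percentile bootstrap confidence interval for the mean."""
    means = np.array([
        rng.choice(data, size=data.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    return np.percentile(means, [2.5, 97.5])

# A small sample where 100 and 1000 are extreme outliers ("whales").
sample = rng.exponential(scale=1.0, size=100)
sample_with_outliers = np.append(sample, [100, 1000])

print("CI without outliers:", bootstrap_ci(sample))
print("CI with outliers:   ", bootstrap_ci(sample_with_outliers))
# With the outliers, the interval becomes much wider and unstable: it depends heavily
# on how many times the two extreme values happen to appear in each resample.
```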
In order to perform a robust data analysis, it is crucial to know the basics of statistics; otherwise, the risk of getting misleading results is extremely high. Using assessment methods that assume a normal distribution on a dataset that is not normally distributed will lead to a wrong decision.
In the next articles, we will review which metrics usually have normal and non-normal distributions and which statistical criteria should be applied in each case to test hypotheses.