**This is the fifth post in our ‘eat your greens’ series – a back to basics look at some of the core concepts of statistics and analytics that, in our experience, are frequently misunderstood or misapplied. In this post we’ll look in more depth at the concept of the standard error and confidence intervals.**

In earlier postings in this series, we’ve explored the relationship between the standard deviation and the normal distribution. We also looked at how an individual statistical value drawn from a sample may not equal the ‘true’ value for that variable in the population. This in turn allowed us to look at the concept of the Central Limit Theorem, which demonstrates how repeated *estimates* of a population parameter (such as a mean value) are often distributed normally even when the variable in question is not normally distributed in the population.

All of this has been leading up to addressing the question of how we can make *inferences* about values in a population (or parameters) using values drawn from samples (or statistics).

But to do this we need to introduce another statistical calculation: the **standard error**. As we’ve already seen, a standard deviation is a measure of *within* sample variation, whereas a standard error is an estimate of variation *between* samples. In other words, it estimates how much we can expect our mean to vary from one sample to another.

The standard error of a mean value is calculated by dividing the standard deviation by the square root of the number of observations in the sample itself. It is, in effect, a simple ratio of the variation within a sample to the square root of the sample size. The standard error is therefore larger when the standard deviation is large, when the sample size is small, or both. Conversely, it is smaller when the standard deviation is relatively small, when the sample size is relatively large, or both. Because the sample size enters through its square root, halving the standard error requires a sample four times as large. Larger standard error values indicate that the sample mean is *a less reliable* estimate of the population mean (the parameter) than smaller values.
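The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `standard_error` and the example figures (a standard deviation of 10 and sample sizes of 25, 100 and 400) are made up for this sketch and do not come from any real data.

```python
import math


def standard_error(std_dev: float, n: int) -> float:
    """Standard error of the mean: standard deviation / sqrt(sample size)."""
    if n <= 0:
        raise ValueError("sample size must be a positive integer")
    return std_dev / math.sqrt(n)


# Hypothetical figures: the same spread (10) measured on ever larger samples.
# Each fourfold increase in sample size halves the standard error.
for n in (25, 100, 400):
    print(n, standard_error(10.0, n))
# prints:
# 25 2.0
# 100 1.0
# 400 0.5
```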

Because the standard error is an indication of how reliable our mean is, it helps us to make an inference about what range of values the true population mean is likely to lie within. In doing so, this simple statistical estimate becomes a gateway to the world of *inferential* *statistics*. Technically speaking, the standard error is the standard deviation of a particular statistic’s sampling distribution. Because the sampling distribution for a mean value is usually normally distributed (even when the sample distribution is not), we can exploit the special properties of the normal distribution to calculate *confidence intervals*.

Just as our earlier blog showed that in normally distributed data, around 95% of the observations will fall within *two standard deviations of the mean* (or 1.96 standard deviations to be exact), we can expect 95% of *comparable samples* to yield mean values within *two standard errors* (again 1.96 to be exact) of the population mean. An interval of that width placed around our own sample mean is what we call a 95% confidence interval.

Let’s imagine we wished to estimate the average number of minutes the entire population of workers within the City of London took for lunch. It is of course impractical for us to calculate this exactly as we can’t record the data for everyone. But assume we had a sample of 196 people and found that the mean number of minutes was 45 with a standard deviation of 14. The standard error of the mean in this case is 14 divided by the square root of 196 (which is 14), giving exactly 1. The 95% confidence interval could therefore be calculated by subtracting 1.96 from 45 to give 43.04 (the lower bound of the interval) and adding 1.96 to 45 to give 46.96 (the upper bound).
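The lunch-break arithmetic can be checked with a short Python sketch. The function name `mean_confidence_interval` and its `z` parameter are just labels chosen for this illustration. The code assumes the normal approximation described earlier, with 1.96 as the multiplier for a 95% interval.

```python
import math


def mean_confidence_interval(mean, std_dev, n, z=1.96):
    """Return (lower, upper) bounds of a normal-approximation interval for a mean."""
    se = std_dev / math.sqrt(n)  # standard error of the mean
    margin = z * se              # half-width of the interval
    return mean - margin, mean + margin


# Figures from the lunch example: n = 196, mean = 45 minutes, sd = 14 minutes.
lower, upper = mean_confidence_interval(45, 14, 196)
print(round(lower, 2), round(upper, 2))  # prints: 43.04 46.96
```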

Of course, we cannot know the *actual* mean number of minutes the workers in the population spend at lunch, and it is a common misconception that a 95% confidence interval indicates there is a 95% probability that the true value lies within the calculated range. This is an incorrect interpretation, not least because the true population mean is a fixed but unknown value that is *either inside or outside* the interval’s range with 100% certainty. The more correct interpretation is that if we were to repeatedly re-sample the population in the same way and calculate an interval each time, then in the long run around 95% of those intervals would contain the population value.
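One way to make this long-run interpretation concrete is a small simulation. The sketch below invents a population whose true mean we *do* know, which is never the case in practice. It then draws many samples and counts how often the interval from each sample contains that true mean. The population values (normally distributed with mean 45 and standard deviation 14), the number of trials and the function name `coverage_rate` are all assumptions made purely for illustration.

```python
import math
import random
import statistics


def coverage_rate(true_mean, true_sd, n, trials, z=1.96, seed=0):
    """Share of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from the hypothetical population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        sample_mean = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        # Does this sample's interval capture the true mean?
        if sample_mean - z * se <= true_mean <= sample_mean + z * se:
            hits += 1
    return hits / trials


rate = coverage_rate(true_mean=45, true_sd=14, n=196, trials=2000)
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

Each individual interval either contains 45 or it does not. Only the proportion across many repetitions should settle close to 0.95.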

This same approach is used by government researchers when trying to calculate the COVID-19 ‘R’ value. The R value is an estimate of the average number of people a person infected with COVID-19 will pass the disease on to. At the time of writing, the R value for the whole of the UK was calculated to be between 0.9 and 1.1. If the government were to use a separate, comparable sample to re-estimate the R value, they might find the interval to range from 0.92 to 1.3, or from 0.88 to 1.08. Again, the true R value cannot be known, but in the long run, 95% of such intervals would contain the actual R value for the whole population. Bear in mind that this power to *infer* the magnitude of an unknown value is driven by a simple statistic: the standard error.