This post is the first in a short series that we’re calling ‘eat your greens’ – a back-to-basics look at some of the core concepts of statistics and analytics that, in our experience, are frequently misunderstood or misapplied. In this first post we’ll look in more depth at the concept of statistical significance.
The phrase ‘statistically significant’ occurs so frequently in data analysis reports, training courses and books about statistics that most professionals rarely give it a second thought. But as a technical expression, the analytical community has long regarded it as unhelpful at best and downright misleading at worst. This is despite the fact that many analysts (myself included) are guilty of using it regularly.
If you need a reminder as to what ‘statistically significant’ refers to, it’s how professional data analysts characterise the results of a statistical procedure that indicates the observed data would be unlikely if the ‘null hypothesis’ were true. To confused students learning statistics, it’s how you describe the results of a statistical test when the probability value is (typically) below 0.05. Among the wider data-literate public, it refers to a finding that was unlikely to be due to a chance occurrence.
To everyone else, it seems to indicate a ‘Eureka!’ moment in data analysis. You can hardly blame anyone for assuming that it implies some kind of revelation. After all, in common usage, the term ‘significant’ indicates something that is important or at least notable. Unfortunately, in statistics ‘significant’ often means anything but that.
For instance, it’s possible to imagine a situation where a researcher ‘discovers’ that right-handed people are more risk averse than left-handed people. Let’s assume this finding is based on an online survey of 20,000 adults visiting a gaming website. The analysis compared the median amount bet by right-handed and left-handed gamblers over a two-week period and found that for right-handed people, this was £15.50, whereas for left-handed people the amount was £15.65.
The researchers chose a statistical procedure to test the null hypothesis that the median amount bet by these two groups was actually the same in the wider population of all potential adult gamblers. The procedure returned a probability value of 0.04, leading the researchers to claim that right-handed gamblers were prone to making smaller bets than their left-handed counterparts and that the results were ‘statistically significant’.
What’s obvious, however, is that just because the difference in median bets might technically be deemed ‘statistically significant’, that doesn’t make it ‘practically significant’. The gaming company is hardly likely to redesign its entire website to be more attractive to left-handed people based on such a meagre difference between the two groups, even if the results are accurate (which seems questionable).
This is why many analysts pay attention to statistical concepts such as ‘effect size’ which focus on quantifying the relationship between the factors in a study rather than simply trying to detect if any differences are the result of random chance. The formal use of probability values associated with ‘tests of significance’ is to measure how compatible the results are with a null hypothesis.
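To make the idea of an effect size a little more concrete, here is a minimal sketch in Python. The post doesn’t name a particular measure, so the choice here is our own assumption: the raw difference in medians, plus the ‘probability of superiority’ (the chance that a randomly chosen member of one group has a higher value than a randomly chosen member of the other, with ties counted as half). The function names are purely illustrative.

```python
from statistics import median


def median_difference(group_a, group_b):
    """Raw effect: median of group_a minus median of group_b."""
    return median(group_a) - median(group_b)


def probability_of_superiority(group_a, group_b):
    """Chance that a random value from group_a exceeds a random value
    from group_b, counting ties as half a win.

    0.5 means neither group tends to be higher; values near 0 or 1
    mean one group is almost always higher than the other.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    # Compare every value in group_a with every value in group_b.
    # This is quadratic, so it is only suitable for modest sample sizes.
    for x in group_a:
        for y in group_b:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))
```

Unlike a probability value, neither number shrinks simply because the sample gets bigger. A result close to 0.5, or a median difference of a few pence, tells you the effect is small no matter how many people took the survey.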
Technically speaking, these values show the probability of observing a result at least as extreme as the one obtained if the null hypothesis were true. With that in mind, it’s hard to imagine a researcher proudly announcing that their research has shown that observing a difference as extreme as 15 pence (i.e. £15.50 vs £15.65) would only happen on 4% of occasions (probability of 0.04) if in reality there is no difference in the median amount bet by left-handed and right-handed people.
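That definition can be turned into code fairly directly. The sketch below uses a permutation test, which is our own choice for illustration: the post doesn’t say which procedure the imaginary researchers used. The idea is to pretend the null hypothesis is true by shuffling the handedness labels, then count how often the shuffled difference in medians is at least as large as the one actually observed. The function name and its defaults are hypothetical.

```python
import random
from statistics import median


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Estimate a two-sided probability value for a difference in medians.

    Returns the fraction of random relabellings whose absolute
    difference in medians is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(median(group_a) - median(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled values mimics a world in which group
        # membership has nothing to do with the amount bet.
        rng.shuffle(pooled)
        diff = abs(median(pooled[:n_a]) - median(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_perm
```

Calling `permutation_p_value(right_handed_bets, left_handed_bets)` with two lists of bet amounts returns a number between 0 and 1. Note what it answers: how often chance alone would produce a gap this big. It says nothing about whether a gap of that size matters to anyone.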
Whilst an observed discrepancy might not occur very often, we need to bear in mind that just because something appears to be rare doesn’t mean it’s valuable or even particularly ‘significant’.