How COVID-19 is changing the nature of advanced analytics
We’re over a year into the COVID-19 pandemic now, and it seems that everyone is an amateur statistician. Concepts like the R number, exponential growth and, most recently, probability and the assessment of risk used to be discussed only by epidemiologists and statisticians. Now they are part of mainstream discourse.
But as well as turning some pretty arcane statistical concepts into topics for mainstream discussion, the pandemic has significantly affected the field of advanced analytics in other ways. Predictive analytics is based on the core principle that past data can predict future behaviour. But the pandemic has disrupted so much that it’s very hard to know how far data from the last year can realistically serve as a guide to the future. Consumer behaviour has changed dramatically. Supply chains have been reshaped. It’s not yet clear which of these changes are aberrations that will be ‘corrected’ as we slowly return to something more like ‘normal’ and which are permanent shifts.
In normal times a supermarket could look at the sales of ice cream over the course of the year, build a model that predicts how ice cream consumption is impacted by temperature and be reasonably confident that if this model worked well in one year it would most probably work well in the next year. The pandemic has changed that confidence. For example, sales of flour have massively increased over the last 12 months. Is the shift to lockdown baking a temporary trend or a permanent shift in consumer behaviour? To what extent can data and trends from the last 12 months be used to make meaningful predictions about the next 12 months?
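To make the ice cream example concrete, here is a minimal sketch in Python. All of the figures and names below are made up for illustration (they are not real sales data). It fits a straight line to one 'normal' year and then checks how well that line holds up against a hypothetical disrupted year.

```python
# Hypothetical weekly data: average temperature (°C) and ice cream units sold.
temps = [10, 14, 18, 22, 26, 30]
normal_sales = [120, 160, 200, 240, 280, 320]
# Hypothetical sales at the same temperatures during a disrupted year.
disrupted_sales = [200, 230, 250, 260, 300, 310]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Fit the model on the 'normal' year only.
slope, intercept = fit_line(temps, normal_sales)


def predict(temp):
    """Predicted units sold at a given temperature."""
    return intercept + slope * temp


def mean_abs_error(x, y):
    """Average absolute gap between predicted and actual sales."""
    return sum(abs(predict(xi) - yi) for xi, yi in zip(x, y)) / len(x)


print("Error on normal year:   ", mean_abs_error(temps, normal_sales))
print("Error on disrupted year:", mean_abs_error(temps, disrupted_sales))
```

The model itself has not changed between the two checks. What has changed is the relationship it was built to describe, which is exactly the loss of confidence described above.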
The pandemic has had a significant impact on organisations that are involved in advanced analytics. Here are just some of the ways that predictive analytics has been affected.
Forecasting is harder so descriptive analytics is coming to the fore
Because the pandemic has affected so many areas of life, it’s very hard to use data from the last year to make meaningful predictions about what might happen in the years ahead. That doesn’t mean that there’s no place for analytics, though. The more straightforward univariate and bivariate descriptive analytics can still add significant value by giving the analyst a more detailed picture of what has actually happened.
Descriptive statistics (for example the mean, median and mode), visualisation of distributions, comparison of means between groups and techniques such as cross tabulations are the foundation stones of good modelling practice. In all the excitement around automated modelling, artificial intelligence and machine learning algorithms, these arguably less exciting but essential basics of analytical best practice can get overlooked. The pandemic has made these descriptive approaches more important than ever.
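As a rough illustration of these basics, the sketch below uses only Python's standard library. The basket values, the channel labels and the £40 cut-off for a 'large' basket are all invented for the example.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical basket values (in pounds) by shopping channel.
baskets = [
    ("online", 42.0), ("online", 55.0), ("online", 42.0), ("online", 61.0),
    ("in_store", 18.0), ("in_store", 25.0), ("in_store", 42.0), ("in_store", 31.0),
]

# Univariate summary of all basket values.
values = [value for _, value in baskets]
summary = {"mean": mean(values), "median": median(values), "mode": mode(values)}

# Comparison of means between groups.
group_means = {
    channel: mean(v for c, v in baskets if c == channel)
    for channel in ("online", "in_store")
}

# Cross tabulation: channel by basket size (assumed cut-off of 40 pounds).
crosstab = Counter(
    (channel, "large" if value >= 40 else "small") for channel, value in baskets
)

print(summary)
print(group_means)
for (channel, size), count in sorted(crosstab.items()):
    print(channel, size, count)
```

In practice an analyst would more likely reach for SPSS, R or a Python data library, but the questions being asked of the data are the same.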
Machine learning models may need to be paused
Machine learning algorithms that use past data to make predictions or decisions about the future may need to be paused or reconfigured, because data from the last year may not be that useful or relevant for predicting what comes next. Models always need rebuilding from time to time as the environment in which they operate changes, but COVID-19 has sped this process up significantly.
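One simple way to spot that a model may need reviewing is to compare recent data with the data the model was built on. The sketch below is only one possible check, not a standard method: the function names, the flour figures and the threshold of three standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def drift_score(baseline, recent):
    """How many baseline standard deviations the recent mean has moved."""
    return abs(mean(recent) - mean(baseline)) / stdev(baseline)


def should_review_model(baseline, recent, threshold=3.0):
    """Flag the model for review when recent data has moved too far.

    The threshold of 3.0 is an arbitrary illustrative choice.
    """
    return drift_score(baseline, recent) > threshold


# Hypothetical weekly flour sales (units).
pre_pandemic = [100, 104, 98, 102, 96, 100]
ordinary_weeks = [101, 99, 100, 102]
lockdown_weeks = [150, 160, 155, 165]

print("Ordinary weeks flagged:", should_review_model(pre_pandemic, ordinary_weeks))
print("Lockdown weeks flagged:", should_review_model(pre_pandemic, lockdown_weeks))
```

A check like this only says that something has changed. Deciding whether to pause, retrain or keep the model still depends on a judgement about whether the change is temporary or permanent.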
Predicting the future means predicting the pandemic
The pandemic has in effect added an extra layer of analysis to any predictive modelling. Because its impact has been so wide-ranging, it’s impossible to predict any aspect of the future without making an implicit prediction about the future course of the pandemic itself. We’re all amateur epidemiologists now.
External data has more value
When internal data sources aren’t as reliable as they once were, it can be useful for organisations to supplement their analytics with external third-party data sources. That might mean adding publicly available sources of pandemic-related data to their existing analytics practice. This can be tricky, because there are many different sources and the data is collected in a variety of ways.
The pandemic has also sped up the dawn of an era of even greater digital data capture and collection. Huge volumes of relevant data have been captured as a direct result of the pandemic. Think of all the additional public health data and the check-in data from track and trace systems, all enabled by now-ubiquitous mobile technology. As this additional data becomes available for analysis, it will further help analysts, using tools like SPSS, R and Python, to develop insight into the wide-ranging effects of COVID-19 and the pandemic. This of course raises important questions about the appropriate use and secure management of this data under data protection legislation such as the General Data Protection Regulation (GDPR), although that is a whole other topic beyond the scope of this blog post.