Over the past two years I’ve noticed a steady stream of articles in the mainstream press and business journals centred on the themes of a) the dangers of machine learning [1][2] or b) the limitations of machine learning [3][4]. Many of these articles refer to incidents where machine learning initiatives have echoed and exacerbated our own biases, prejudices and (frankly racist) behaviours [5]. Others have focused on their inability to provide the sorts of ‘informed, idiosyncratic’ recommendations that humans find effortless.
However, for those of us who work in the field of predictive analytics, where many of the algorithms at the heart of these stories are routinely used, ‘machine learning’ is nothing new. In fact, many of us are pretty bemused by the fact that the media has leapt on the phrase ‘machine learning’ to stand for everything from multivariate statistics, association modelling and rule induction to operational research, cognitive computing and artificial intelligence. Rather like the word ‘algorithm’, it’s being used to cover pretty much any situation where software generates predictions in the form of risk scores, recommendations, estimates or classifications.
In fact, machine learning as a term simply refers to a class of techniques, often used in prediction problems, that originated in computer science rather than in classical statistics – whose techniques mostly pre-date modern computing. As such, I suspect that ‘machine learning’ sounds somehow sexier and more ominous than ‘logistic regression’, even though many of the applications these articles describe may well be driven by much older techniques that most analysts would never describe as ‘machine learning’. I also wonder if the apparent allure of the term can be traced back to the 1991 movie Terminator 2: Judgment Day – where Arnie famously says “My CPU is a neural net processor – a learning computer”.
Look, most of these approaches address a fairly basic problem. Some outcomes are entirely random, and there’s nothing much you can do about them. Other outcomes are not entirely random, and an analytical procedure can tell you how likely outcome A is to occur versus outcome B. Humans are often lousy at spotting the subtle differences in patterns that correlate with different results, so sometimes an algorithm (including a machine learning algorithm) is a better approach. That’s pretty much it. In fact, we’ve had the technology to do this for many decades – all that’s changed in recent years is that we can suddenly apply a greater variety of these techniques to a lot more data, much faster.
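To make the point concrete, here is a minimal sketch of one of those decades-old techniques – logistic regression, fitted by gradient descent – estimating how likely outcome A is versus outcome B from a single predictor. The toy data and parameter values are invented purely for illustration; any real application would involve many more variables and far more data.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and intercept b so that sigmoid(w*x + b)
    estimates P(outcome A) for a given predictor value x."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)        # current predicted probability
            grad_w += (p - y) * x         # gradient of the log-loss w.r.t. w
            grad_b += (p - y)             # ...and w.r.t. b
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Invented toy data: larger x values correlate with outcome A (y = 1).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
p_low = sigmoid(w * 1.0 + b)    # estimated P(A) for a small predictor value
p_high = sigmoid(w * 4.0 + b)   # estimated P(A) for a large predictor value
```

Nothing here would surprise a statistician from the 1960s; what’s new is the scale and speed at which we can now run this sort of procedure.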
Let’s be clear. There’s nothing dangerous about algorithms generating predictions. It’s the decisions we make as a result of those predictions that we have to answer for. As regards the quality and context of the information we use to make decisions, biased or incomplete data may of course lead to inaccurate predictions and therefore the wrong recommendation. But humans are just as likely as an algorithm to make bad decisions based on bad data – especially when they consciously or otherwise pay too much attention to irrelevant factors like appearance or gender. Moreover, humans are prone to making bad decisions based on good data. With that in mind, is it possible that there are occasions when machine learning is more accurate, safer and fairer than a human decision?