Fear and loathing in machine learning

Over the past two years I’ve noticed a steady stream of articles in the mainstream press and business journals centred on the themes of a) the dangers of machine learning 1 2 or b) the limitations of machine learning 3 4. Many of these articles refer to incidents where machine learning initiatives have echoed and exacerbated our own biases, prejudices and (frankly racist) behaviours 5. Others have focused on the limitations of machine learning in providing the sorts of ‘informed, idiosyncratic’ recommendations that humans find effortless. However, for those of us who work in the field of predictive analytics where many of the […]

Take your asset management to the next level with predictive asset management

Why is effective asset management so important? The rapidly changing and volatile global economy is putting enormous pressure on organisations to control or, preferably, reduce their operating costs. This is resulting in a period of uncertainty with a range of new and complex factors having to be considered. For example: ageing infrastructure and assets, more demanding operating conditions and higher throughputs; stricter regulatory requirements and much higher penalties for not meeting them; the public’s increasing awareness of and care for the environment; increasing global demand for water and other natural resources; shifting economic and political power balances; investors’ attitudes to […]

What do we mean when we talk about data modelling? An overview of different types of models

The real world, whether the physical world (for example, machines) or the natural world (for example, human and animal behaviour), is very complex, with many factors, some unknown, determining its behaviour and responses to interventions. Even if every contributory factor to a phenomenon is known, it is unrealistic to expect that the unique contribution of each factor to the phenomenon can be isolated and quantified. Thus, mathematical models are simplified representations of reality, but to be useful they must give realistic results and reveal meaningful insights. In his 1976 paper ‘Science and Statistics’ in the Journal of the […]

Which data science tools should you learn?

I’ve blogged several times now about different aspects of data science. A conversation I’ve been having more and more frequently now is about what tools people should learn if they’re hoping to develop a career in data science. Obviously there are many different factors to be taken into account here. You’ll want to think about whether there’s a tool that’s the standard in your particular industry. You’ll also want to consider whether you want to specialize in a particular area of data science and build a reputation as an expert in a range of related tools, or whether you’d prefer […]

How alternative interfaces can help you get more out of R

Contemporary analytical platforms like SPSS and SAS represent some of the earliest and yet longest-lived examples of proprietary software in the industry. When we think of the tectonic shifts the technology landscape has witnessed in the last four decades, through the mainframe era, the rise of the PC, browser wars, the dotcom bubble, the smartphone revolution to the age of the cloud and big data, not to mention the number of once seemingly ubiquitous software tools that no longer dominate the marketplace, it’s incredible to think that the first versions of SPSS and SAS were developed as far back as […]

Why R can be hard to learn

Many of the analysts we speak to are being pushed over to R, primarily because it’s open source and therefore a free alternative to commercial data analytics packages for which the costs can sometimes run into tens of thousands of pounds (or more). However, even experienced analysts often find that getting to grips with R can be a difficult business. Many people view R as being notoriously difficult to learn. There are a number of reasons why this is the case. Lack of consistency: In some ways the open source nature of R is its biggest weakness as well as […]

Six questions to ask before you opt for open source software

It’s not uncommon for people to say to us that they don’t understand why they should pay for industry-standard analytics products like SPSS or SAS when there are strong open source alternatives, such as R, freely available. Indeed, the development of R has transformed the analytics marketplace in many ways. It’s tempting to make a comparison between R and commercial alternatives such as SPSS and SAS on price grounds alone. When you look at it that way it might seem as though there’s no contest. SPSS and SAS can both involve a significant investment whereas R is free. […]

What’s the difference between business intelligence and predictive analytics?

It’s not uncommon to talk to potential clients who consider themselves to already be very much data-driven in the way that they operate. However, it’s very rare to find a potential client that truly is exploiting the full potential of the data that they hold. That’s because companies often confuse business intelligence with predictive analytics, or think that once they’re using their data for business intelligence they’re doing all they can to get value from it. Neither of these things is true. Predictive analytics is not the same thing as business intelligence, and if you’re just using your data […]

How repeatable application templates will maximise the effectiveness of your first predictive analytics project

As we help our clients get up and running with the predictive analytics tools and skills they need, we see some trends emerging in the kinds of applications for which clients most commonly use predictive analytics. These are what we call ‘repeatable application templates’. In my previous post I outlined the four reasons why we believe predictive analytics is a low risk, high return way for many companies to achieve competitive advantage. I have recapped these below for reference: implementing predictive analytics is less expensive, quicker and lower risk than almost any other kind of technology-enabled project; you already have […]

Data science is everywhere, so why no data scientists to be seen?

Data science is everywhere at the moment. Nearly as everywhere as big data, but not quite. Books out there are making the concepts behind statistics and predictive analytics more and more accessible, not only to those in business making decisions every day but also to the average man or woman on the street. Try Super Crunchers by Ian Ayres, Moneyball (the book or the film, which has the advantage of featuring Brad Pitt and therefore making the business of statistics much sexier than it has been), Freakonomics or the newer Superfreakonomics, or pretty much anything by Malcolm Gladwell. All of these books have […]

The A-Z of analytics with IBM SPSS Modeler

A is for Automation: Why bother trying out loads of modelling techniques to see which one works best when Modeler can do that for you? Modeler can test many permutations of the same algorithm and multiple instances of different methods before selecting the best performers according to a pre-specified criterion. Oh, and it will also automatically prepare your data so you can get the best results from your analysis. B is for Boosting and Bagging: Boosting is a key technique in Modeler that can generate more accurate models. It works by building the same model multiple times but each time […]

What is a chi-squared test and when would you use it?

Take a look at the table below. It describes a relatively common situation in business analytics. Two offers have been made to a sample of 40,000 prospective readers of a magazine. As an experiment, half of the prospects have been offered a 25% discount for the first year and the other half have been offered an extended subscription of 15 months (rather than the normal 12 months). The table seems to indicate a slight increase in the response rate (a mere 0.4%) for those offered the extended subscription. The business analysts want to know how probable it is that this […]
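As a rough illustration of the kind of test this post goes on to describe, here is a minimal Python sketch of a Pearson chi-squared test on a 2x2 table. The response counts are invented for illustration (the original table is not reproduced in this excerpt); they were chosen only so that each offer goes to 20,000 prospects and the response rates differ by 0.4 percentage points. The function name `chi_squared_2x2` is likewise hypothetical, and no continuity correction is applied.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table of counts.

    No continuity (Yates) correction is applied in this sketch.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if offer and response were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom, so the upper-tail probability
    # of the chi-squared distribution reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts, NOT the figures from the original table.
# Rows: discount offer, extended subscription offer.
# Columns: responded, did not respond.
observed_counts = [
    [600, 19400],  # 3.0% response (assumed)
    [680, 19320],  # 3.4% response (assumed)
]

statistic, p_value = chi_squared_2x2(observed_counts)
print(f"chi-squared = {statistic:.3f}, p-value = {p_value:.4f}")
```

The p-value estimates how likely a difference at least this large would be if the two offers really performed the same. With different counts, the same 0.4 point gap could easily lead to a different conclusion, which is why the test depends on the sample size as well as the rates.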

What is correlation and why is it useful?

What is correlation? Correlation is a term that we employ in everyday speech to denote things that appear to have a mutual relationship. In the world of analytics, correlations are specific values that are calculated in order to quantify the relationships between variables. This kind of analysis is powerful because it allows us to measure the association between factors such as advertising spend and website hits, product sales and competitor pricing, Net Promoter Score and customer discount, ambient temperature and component part failure. Not only can we measure this relationship but we can also use one variable to predict the other. For example, […]
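To make the two ideas above concrete (measuring a relationship and using one variable to predict the other), here is a minimal Python sketch that computes a Pearson correlation coefficient and fits a straight line by least squares. The advertising spend and website hit figures are made up for illustration, and the function names `pearson_r` and `fit_line` are hypothetical; Pearson's r is only one of several correlation measures.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(variation_x * variation_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    slope = covariation / variation_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example data: weekly advertising spend and website hits.
ad_spend = [10, 20, 30, 40, 50]
website_hits = [120, 190, 310, 390, 500]

r = pearson_r(ad_spend, website_hits)
slope, intercept = fit_line(ad_spend, website_hits)

# Use the fitted line to predict hits for a spend level not in the data.
predicted_hits = intercept + slope * 60

print(f"r = {r:.3f}")
print(f"predicted hits at spend 60: {predicted_hits:.0f}")
```

A value of r close to 1 or -1 indicates a strong linear association, while a value near 0 indicates little or none. A strong correlation on its own does not show that one variable causes the other.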

Supernode scripting in SPSS Modeler

The terminal supernode

This post describes how to use Python scripts to create and modify Modeler supernodes, and control the execution of the nodes within the supernode. If you’re after a basic overview of Python scripting in Modeler then this post may be of interest, and I’ve also written about how to write standalone Python scripts in Modeler here. As streams get larger and more complex, many users take advantage of supernodes in order to keep the structure of the stream understandable and maintainable. For example, a stream may contain multiple nodes for computing a summary of recent transactions (e.g. number of transactions over the […]

Four reasons why getting started with predictive analytics is simpler than you think

We spend a great deal of our time at Smart Vision helping our clients to establish the use of predictive analytics in their business. For many organisations, getting started with predictive analytics can feel like a real departure from more traditional and familiar areas of activity. That said, organisations in almost every industry sector are becoming increasingly aware that to maintain a competitive edge it’s necessary to have detailed customer, product and operational insight; and that data analysis and modelling of organisational data is a required capability and key source of competitive advantage. In this post I want to talk about […]

Understanding what drives your net promoter score – how data science can help

The concept of the net promoter score was introduced to the world in Frederick Reichheld’s seminal Harvard Business Review article The One Number You Need to Grow in 2003. Reichheld’s research led him to believe that there was a deep and intrinsic link between profitable growth and customer loyalty. Two years of research revealed that, for most industries, the single best indicator of customer loyalty could be measured by asking people how likely they would be to recommend a particular company to a friend or relative. Reichheld and his colleagues at Bain and Company partnered with Satmetrix to develop a recommendation scale running […]

Data science projects – what skills do you need and where can you get them from?

Data science is on the rise. A couple of years back Harvard Business Review suggested that ‘data scientist’ is the sexiest job title of the twenty-first century, and the hype around data science shows no sign of abating. The term ‘data scientist’ itself was only coined in 2008, but since then the number of data science roles in organisations has grown exponentially, as the volume of data available for analysis also grows. But this presents a challenge for organisations – in such a new and fast-changing field how can they identify the skills they need, find appropriate people who have those skills, […]