Thinking of hiring a data analyst? What skills should they have?

Many of our clients regularly hire new analysts and we’re often involved in discussions about what the core skills are that they should be looking for. Similarly, I often talk to people looking to build a career in analytics who want to know what skills they need to develop. The most skilled analysts are in high demand because they blend together a range of skills that are rarely found in a single person. Here are the things that I think are really key. Domain knowledge about your industry It’s not enough just to have the technical skills. As we have […]

Do I need SPSS Statistics or Modeler? How to choose the right product for your needs

We often talk to people who are unsure whether they need SPSS Statistics or whether SPSS Modeler might be more suited to their needs. In fact, it’s not always a clear cut choice as to which tool is more appropriate as it depends on the context in which the technology might be used. With that in mind I thought it might be helpful to develop a little infographic to lay out the sorts of things that you should be thinking about when choosing between SPSS Modeler and SPSS Statistics. We can think of the choice as a sort of continuum, […]

Fear and loathing in machine learning

Over the past two years I’ve noticed a steady stream of articles in the mainstream press and business journals centred on the themes of a) the dangers of machine learning 1 2 or b) the limitations of machine learning 3 4. Many of these articles refer to incidents where machine learning initiatives have echoed and exacerbated our own biases, prejudices and (frankly racist) behaviours 5. Others have focused on their limitations in providing the sorts of ‘informed, idiosyncratic’ recommendations that humans find effortless. However, for those of us who work in the field of predictive analytics where many of the […]

What do we mean when we talk about data modelling? An overview of different types of models

The real world, whether it be the physical world, for example machines, or the natural world, for example human and animal behaviour, is very complex with many factors, some unknown, determining their behaviour and responses to interventions. Even if every contributory factor to a phenomenon is known, it is unrealistic to expect that the unique contribution of each factor to the phenomenon can be isolated and quantified. Thus, mathematical models are simplified representations of reality, but to be useful they must give realistic results and reveal meaningful insights. In his 1976 paper ‘Science and Statistics’ in the Journal of the […]

Which data science tools should you learn?

I’ve blogged several times now about different aspects of data science. A conversation I’ve been having more and more frequently now is about what tools people should learn if they’re hoping to develop a career in data science. Obviously there are many different factors to be taken into account here. You’ll want to think about whether there’s a tool that’s the standard in your particular industry. You’ll also want to consider whether you want to specialize in a particular area of data science and build a reputation as an expert in a range of related tools, or whether you’d prefer […]

How alternative interfaces can help you get more out of R

Contemporary analytical platforms like SPSS and SAS represent some of the earliest and yet longest-lived examples of proprietary software in the industry. When we think of the tectonic shifts the technology landscape has witnessed in the last four decades, through the mainframe era, the rise of the PC, browser wars, the dotcom bubble, the smartphone revolution to the age of the cloud and big data, not to mention the number of once seemingly ubiquitous software tools that no longer dominate the marketplace, it’s incredible to think that the first versions of SPSS and SAS were developed as far back as […]

Why R can be hard to learn

Many of the analysts we speak to are being pushed over to R, primarily because it’s open source and therefore a free alternative to commercial data analytics packages for which the costs can sometimes run into tens of thousands of pounds (or more). However, even experienced analysts often find that getting to grips with R can be a difficult business. Many people view R as being notoriously difficult to learn. There are a number of reasons why this is the case. Lack of consistency In some ways the open source nature of R is its biggest weakness as well as […]

Six questions to ask before you opt for open source software

It’s not uncommon for people to say to us that they don’t understand why they should pay for industry standard analytics products like SPSS or SAS when there are strong open source alternatives freely available such as R. Indeed the development of R has really transformed the analytics marketplace in many ways. It’s tempting to make a comparison between R and commercial alternatives such as SPSS and SAS on price grounds alone. When you look at it that way it might seem as though there’s no contest. SPSS and SAS can both involve a significant investment whereas R is free. […]

What’s the difference between business intelligence and predictive analytics?

It’s not uncommon to talk to potential clients who consider themselves to already be very much data-driven in the way that they operate. However, it’s very rare to find a potential client that truly is exploiting the full potential of the data that they hold. That’s because companies often confuse business intelligence with predictive analytics, or think that once they’re using their data for business intelligence they’re doing all they can to get value from it. Neither of these things is true. Predictive analytics is not the same thing as business intelligence, and if you’re just using your data […]

How repeatable application templates will maximise the effectiveness of your first predictive analytics project

As we help our clients get up and running with the predictive analytics tools and skills they need, we see some trends emerging in terms of the kind of applications for which clients tend to use predictive analytics most commonly. These are what we call ‘repeatable application templates’. In my previous post I outlined the four reasons why we believe predictive analytics is a low risk, high return way for many companies to achieve competitive advantage. I have re-capped these below for reference: Implementing predictive analytics is less expensive, quicker and lower risk than almost any other kind of technology-enabled project You already have […]

Data science is everywhere, so why no data scientists to be seen?

Data science is everywhere at the moment. Nearly as everywhere as big data, but not quite. Books out there are making the concepts behind statistics and predictive analytics more and more accessible not only to those in business making decisions every day but also to the average man or woman on the street. Try Super Crunchers by Ian Ayres, Moneyball (the book or the film which has the advantage of featuring Brad Pitt and therefore making the business of statistics much sexier than it has been), Freakonomics or the newer Superfreakonomics or pretty much anything by Malcolm Gladwell. All of these books have […]

The A-Z of analytics with IBM SPSS Modeler

A is for Automation  Why bother trying out loads of modelling techniques to see which one works best when Modeler can do that for you? Modeler can test many permutations of the same algorithm and multiple instances of different methods before selecting the best performers according to pre-specified criteria. Oh and it will also automatically prepare your data so you can get the best results from your analysis. B is for Boosting and Bagging Boosting is a key technique in Modeler that can generate more accurate models. It works by building the same model multiple times but each time […]

What is a chi-squared test and when would you use it?

Take a look at the table below. It describes a relatively common situation in business analytics. Two offers have been made to a sample of 40,000 prospective readers of a magazine. As an experiment, half of the prospects have been offered a 25% discount for the first year and the other half have been offered an extended subscription of 15 months (rather than the normal 12 months). The table seems to indicate a slight increase in the response rate (a mere 0.4%) for those offered the extended subscription. The business analysts want to know how probable it is that this […]
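
As a rough illustration of the kind of test this post introduces, here is a minimal Python sketch of a Pearson chi-squared test on a 2x2 table. The table from the post is not reproduced in this excerpt, so the counts below are hypothetical: they assume 20,000 prospects per offer and response rates of 2.0% and 2.4%, chosen only to match the 0.4% difference mentioned above. The function name `chi_squared_2x2` is also just an illustrative choice.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table of counts.

    No continuity correction is applied. With a 2x2 table there is
    1 degree of freedom, so the p-value is erfc(sqrt(statistic / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if offer and response were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# HYPOTHETICAL counts (not taken from the post's table):
# rows = offer, columns = [responded, did not respond]
observed = [
    [400, 19600],  # 25% discount offer, assumed 2.0% response
    [480, 19520],  # 15-month subscription offer, assumed 2.4% response
]

chi2, p = chi_squared_2x2(observed)
print(f"chi-squared = {chi2:.2f}, p-value = {p:.4f}")
```

A small p-value would suggest that a difference of this size is unlikely to arise by chance alone if the two offers really performed the same. With different underlying response rates, the same 0.4% gap could lead to a different conclusion.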

What is correlation and why is it useful?

What is correlation? Correlation is a term that we employ in everyday speech to denote things that appear to have a mutual relationship. In the world of analytics correlations are specific values that are calculated in order to quantify the relationships between variables. This kind of analysis is powerful because it allows us to measure the association between factors such as advertising spend and website hits, product sales and competitor pricing, Net Promoter Score and customer discount, ambient temperature and component part failure. Not only can we measure this relationship but we can also use one variable to predict the other. For example, […]
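
To make the idea concrete, here is a minimal Python sketch that computes a Pearson correlation coefficient and then fits a straight line so that one variable can be used to predict the other. The advertising spend and website hit figures are invented purely for illustration, and the helper names `pearson_r` and `fit_line` are hypothetical. Pearson's coefficient is only one of several correlation measures, and the excerpt does not say which one the post goes on to use.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# INVENTED example data: weekly advertising spend and website hits
ad_spend = [10, 20, 30, 40, 50]
website_hits = [120, 190, 310, 405, 480]

r = pearson_r(ad_spend, website_hits)
slope, intercept = fit_line(ad_spend, website_hits)

print(f"correlation r = {r:.3f}")
print(f"predicted hits = {intercept:.1f} + {slope:.2f} * spend")
```

A value of r close to 1 or -1 indicates a strong linear association, while a value near 0 indicates little linear association. A strong correlation does not by itself show that one variable causes the other.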

Supernode scripting in SPSS Modeler

This post describes how to use Python scripts to create and modify Modeler supernodes, and control the execution of the nodes within the supernode. If you’re after a basic overview of Python scripting in Modeler then this post may be of interest, and I’ve also written about how to write standalone Python scripts in Modeler here. As streams get larger and more complex, many users take advantage of supernodes in order to keep the structure of the stream understandable and maintainable. For example, a stream may contain multiple nodes for computing a summary of recent transactions (e.g. number of transactions over the […]

Four reasons why getting started with predictive analytics is simpler than you think

We spend a great deal of our time at Smart Vision helping our clients to establish the use of predictive analytics in their business. For many organisations, getting started with predictive analytics can feel like a real departure from more traditional and familiar areas of activity. That said, organisations in almost every industry sector are becoming increasingly aware that to maintain a competitive edge it’s necessary to have detailed customer, product and operational insight; and that data analysis and modelling of organisational data is a required capability and key source of competitive advantage. In this post I want to talk about […]

Understanding what drives your net promoter score – how data science can help

The concept of the net promoter score was introduced to the world in Frederick Reichheld’s seminal Harvard Business Review article The One Number You Need to Grow in 2003. Reichheld’s research led him to believe that there was a deep and intrinsic link between profitable growth and customer loyalty. Two years of research revealed that, for most industries, the single best indicator of customer loyalty could be measured by asking people how likely they would be to recommend a particular company to a friend or relative. Reichheld and his colleagues at Bain and Company partnered with Satmetrix to develop a recommendation scale running […]