Choosing a predictive analytics project

At Smart Vision we’re in a pretty strong position to talk authoritatively about the reality of predictive analytics. That’s because we’re a team of veteran practitioners with decades of experience, and we’ve all witnessed plenty of success stories as well as one or two ‘data science’ train wrecks. Moreover, like anyone else, we’re exposed to the seemingly constant torrent of stories about the latest developments in machine learning, data science or AI. But we’re often struck by the fact that there seems to be such a focus on emphasising the power of analytics or on explaining how machine learning […]

Statistics in court: the story of a dataset

Like a lot of consultants working in the analytics industry, I’ve built up an extensive portfolio of materials to illustrate different kinds of applications and approaches. Some of these consist of files and slide decks used to explain quite esoteric procedures such as TURF analysis or Partial Least Squares. However, there are certain materials that can be used to demonstrate such a wide range of statistical and predictive analytics techniques that I’ve found myself immediately reaching for them again and again over the years. One of these is the SPSS Statistics sample dataset ‘Employee data.sav’. Most statistical software programs come […]

6 secrets of building better models part one: bootstrap aggregation

Many analysts who are interested in building predictive models invest a lot of their time and effort in trying to understand how to best tune the parameters of the specific technique that they are using, whether that technique be logistic regression or a neural network, and they are doing this in order to achieve the best accuracy of the resultant model. In this series of videos we look at some often overlooked approaches that can be applied in the same way to a wide variety of algorithms and which may lead to better predictive accuracy. In all of our examples […]

6 secrets of building better models part two: boosting

Many analysts who are interested in building predictive models invest a lot of their time and effort in trying to understand how to best tune the parameters of the specific technique that they are using, whether that technique be logistic regression or a neural network, and they are doing this in order to achieve the best accuracy of the resultant model. In this series of videos we look at some often overlooked approaches that can be applied in the same way to a wide variety of algorithms and which may lead to better predictive accuracy. In all of our examples […]

6 secrets of building better models part three: feature engineering

Many analysts who are interested in building predictive models invest a lot of their time and effort in trying to understand how to best tune the parameters of the specific technique that they are using, whether that technique be logistic regression or a neural network, and they are doing this in order to achieve the best accuracy of the resultant model. In this series of videos we look at some often overlooked approaches that can be applied in the same way to a wide variety of algorithms and which may lead to better predictive accuracy. In all of our examples […]

6 secrets of building better models part four: ensemble modelling

Many analysts who are interested in building predictive models invest a lot of their time and effort in trying to understand how to best tune the parameters of the specific technique that they are using, whether that technique be logistic regression or a neural network, and they are doing this in order to achieve the best accuracy of the resultant model. In this series of videos we look at some often overlooked approaches that can be applied in the same way to a wide variety of algorithms and which may lead to better predictive accuracy. In all of our examples […]

6 secrets of building better models part five: meta models

Many analysts who are interested in building predictive models invest a lot of their time and effort in trying to understand how to best tune the parameters of the specific technique that they are using, whether that technique be logistic regression or a neural network, and they are doing this in order to achieve the best accuracy of the resultant model. In this series of videos we look at some often overlooked approaches that can be applied in the same way to a wide variety of algorithms and which may lead to better predictive accuracy. In all of our examples […]

6 secrets of building better models part six: split models

Many analysts who are interested in building predictive models invest a lot of their time and effort in trying to understand how to best tune the parameters of the specific technique that they are using, whether that technique be logistic regression or a neural network, and they are doing this in order to achieve the best accuracy of the resultant model. In this series of videos we look at some often overlooked approaches that can be applied in the same way to a wide variety of algorithms and which may lead to better predictive accuracy. In all of our examples […]

What’s new in IBM SPSS Statistics v26?

In April of this year, IBM released the latest version of SPSS Statistics. Version 26 introduces a number of additional analysis procedures as well as new command enhancements. If you’re an existing SPSS user and you’d like to upgrade to v26 there’s more information about how to do that here. If you’re interested in trying SPSS Statistics for the first time then do please get in touch – we’ll be happy to help. New analytical procedures: Quantile Regression. In standard ‘least squares’ regression the model predictions are based on a single regression line. This line can be used to estimate the […]

Regular Expressions for IBM SPSS Modeler: performance comparison

The Regular Expressions for IBM SPSS Modeler node pack provides four nodes that integrate the power and flexibility of regular expression pattern matching into SPSS Modeler. However, some of these capabilities can be supported using the extension nodes built into SPSS Modeler, which raises the question – why buy the Regular Expression nodes? One obvious answer is ease of use. The extension nodes built into SPSS Modeler require expertise in either the R or Python programming language since they are general “code” nodes. Although many data scientists may already have that expertise, most people use SPSS Modeler because of its […]

A first look at SPSS Modeler v18.2

In this video Jarlath Quinn takes a first look at SPSS Modeler v18.2 and demonstrates some of the new functionality that’s included within this release. IBM® SPSS® Modeler adds the following features in this release. New look and feel. A new modern interface theme is available via Tools > User Options > Display. For instructions on switching to the new theme. New data views. You can now right-click a data node and select View Data to examine and refine your data in new ways with advanced data visualizations. IBM Data Warehouse. Database modeling with IBM Netezza Analytics now supports IBM Data Warehouse. Gaussian Mixture node. A new Gaussian Mixture node is available on […]

Three questions to ask when reading articles about artificial intelligence

You may have noticed by now that there seem to be a couple of recurring themes in the plethora of articles and news programmes about artificial intelligence (AI). These themes can be summed up as a) “The dangers of AI” and b) “The limitations of AI”. Articles addressing the dangers of AI tend to focus on issues such as the threat of widespread job losses to AI, the possibility of inherent bias (such as racism and sexism), the lack of transparency in decisions made by AI systems and, as a result, the inability to plead your case with AI (“Computer […]

How to change the appearance of your output in SPSS Statistics

We’re often asked how you can change the appearance of the tables that SPSS generates as output. In this video Jarlath Quinn demonstrates two different ways to do this, either by choosing a different table look in the edit / options function, or by editing the table properties directly yourself.

How to merge files in SPSS Statistics

In this video Jarlath Quinn demonstrates how to merge data files within SPSS Statistics using each of the two main methods, either adding cases (combining files with the same fields but additional rows) or adding variables (combining files by joining variables to a target file using something like an ID field as a ‘keyed variable’).
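
For readers who think in code, the two merge methods can be sketched outside SPSS in a few lines of plain Python. This is only an illustration of the idea, not the SPSS procedure itself: the records, field names and the `salary_by_id` lookup are all hypothetical.

```python
# Hypothetical records standing in for two data files with the same fields.
wave1 = [{"id": 1, "age": 34}, {"id": 2, "age": 51}]
wave2 = [{"id": 3, "age": 27}]

# Add cases: append the rows of the second file to the first.
all_cases = wave1 + wave2

# Add variables: look up extra fields using the key variable (id).
# Hypothetical second file holding one salary per id; id 2 has no match.
salary_by_id = {1: 32000, 3: 41000}
merged = [{**row, "salary": salary_by_id.get(row["id"])} for row in all_cases]

for row in merged:
    print(row)
```

Cases with no match in the keyed file simply end up with a missing value (`None` here) for the new field.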

How to create grouped or banded variables in SPSS Statistics

SPSS users often want to create grouped or banded data from continuous fields, for example age groups or income bands. In this video Jarlath Quinn demonstrates how to use the visual binning procedure within SPSS Statistics to do this, including how to control the proportion of cases that fall into each band and how to automatically create value labels.
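
The logic of banding can also be sketched in plain Python. This is a minimal illustration under stated assumptions, not what the visual binning dialog does internally: the ages, cut points and labels are made up, and upper limits are treated as inclusive.

```python
from bisect import bisect_left
from statistics import quantiles

ages = [22, 29, 35, 41, 49, 58, 63]  # hypothetical continuous field


def band(value, cut_points, labels):
    """Return the label of the band containing value (upper limits inclusive)."""
    return labels[bisect_left(cut_points, value)]


labels = ["29 or under", "30 to 49", "50 or over"]
age_band = [band(a, [29, 49], labels) for a in ages]
print(age_band)

# Alternative: derive the cut points from the data so that each band
# holds a similar share of the cases (here, three bands).
tercile_cuts = quantiles(ages, n=3)
print(tercile_cuts)
```

Fixed cut points suit bands with a business meaning, while data-driven cut points are one way to control the proportion of cases in each band.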

How to recode your data in SPSS Statistics

Recoding your data means changing the values of a variable so that they represent something else. Within SPSS Statistics there is more than one type of recode that can be performed. In this video Jarlath Quinn demonstrates how to:
- Recode into the same variables, overwriting an existing variable
- Recode into different variables, creating a new variable in addition to your existing variables
- Automatically recode, a particular procedure designed to change string codes into numeric codes
- Visual binning, visualising a distribution in the form of a histogram and slicing it into ranged categories
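
As a rough sketch of two of these ideas in plain Python (not SPSS syntax): a recode into a different variable is a value mapping, and an automatic recode assigns consecutive numbers to string values. The variables and mappings below are hypothetical, and the alphabetical ordering of codes is an assumption of this sketch.

```python
# Hypothetical 5-point satisfaction ratings.
satisfaction = [1, 4, 5, 3, 2]

# Recode into a different variable: collapse five points into three groups.
to_group = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}
satisfaction_group = [to_group[v] for v in satisfaction]  # original list is kept

# Automatic recode: give each distinct string a numeric code, here 1..n
# in alphabetical order, and keep the mapping as value labels.
regions = ["North", "South", "East", "South"]
code_for = {name: i for i, name in enumerate(sorted(set(regions)), start=1)}
region_code = [code_for[r] for r in regions]

print(satisfaction_group)
print(region_code, code_for)
```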

How to check your data for normality in SPSS Statistics

When you’re deciding which tests to run on your data it’s important to understand whether your data is normally distributed or not, as a lot of standard parametric tests assume a normal distribution whereas non-parametric tests do not make that assumption. A normal distribution has a number of characteristics:
- It is symmetrical
- It is bell-shaped
- Its mean, median and mode all appear at the same place
Normal distributions can be divided up into the same proportions by the standard deviations, so 95% of the area under the curve lies within roughly plus […]
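
The point that standard deviations divide a normal curve into fixed proportions can be checked numerically. The snippet below is a small sketch using Python's standard library rather than SPSS; `share_within` is a helper name introduced here for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of the area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The same proportions hold for any mean and standard deviation, which is why they are a useful yardstick when judging whether real data looks normal.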

How to calculate with dates in SPSS Statistics

In this video Jarlath Quinn demonstrates how to work with date and time variables in SPSS using the SPSS date and time wizard. This enables you to:
- Calculate time units between two dates
- Add / subtract time units to or from dates
- Extract part of a date or a time, such as days of the week or months of the year
- Create date or time variables from variables holding part of dates or times
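
The four kinds of date calculation listed above can be sketched with Python's `datetime` module. This is an illustration of the arithmetic only, not of the wizard itself, and the dates are made up.

```python
from datetime import date, timedelta

start = date(2019, 1, 15)  # hypothetical start date
end = date(2019, 4, 1)     # hypothetical end date

# Calculate time units between two dates.
days_between = (end - start).days

# Add time units to a date.
follow_up = start + timedelta(days=30)

# Extract part of a date (weekday() counts Monday as 0).
month_of_start = start.month
weekday_of_start = start.weekday()

# Create a date from variables holding its parts.
year, month, day = 2019, 12, 25
built = date(year, month, day)

print(days_between, follow_up, month_of_start, weekday_of_start, built)
```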

How to select cases in SPSS Statistics

In this video Jarlath Quinn demonstrates how to use SPSS Statistics to define data filters in order to select particular cases for analysis. This can be done either to create a temporary selection or to create a permanent new file with only a subsection of cases included within it. The video demonstrates how to do this with string variables too, as well as how to combine conditions from multiple variables in your selection.
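
In code terms, selecting cases is a filter condition applied to each row. The sketch below uses plain Python with hypothetical records and field names; it contrasts a temporary filter flag with a permanent subset and combines a string condition with a numeric one.

```python
# Hypothetical cases; the field names are made up for this example.
cases = [
    {"id": 1, "gender": "f", "jobcat": "Manager", "salary": 57000},
    {"id": 2, "gender": "m", "jobcat": "Clerical", "salary": 40200},
    {"id": 3, "gender": "f", "jobcat": "Clerical", "salary": 21450},
    {"id": 4, "gender": "f", "jobcat": "Clerical", "salary": 36000},
]


def is_selected(case):
    """Combine a string condition with a numeric condition."""
    return case["gender"] == "f" and case["salary"] > 30000


# Temporary selection: keep every case and flag the ones that pass.
filter_flag = [is_selected(c) for c in cases]

# Permanent selection: build a new dataset holding only the selected cases.
subset = [c for c in cases if is_selected(c)]

print(filter_flag)
print([c["id"] for c in subset])
```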

How to reverse a scale in SPSS Statistics

In this video Jarlath Quinn demonstrates how to reverse the values of a rating scale (such as an agreement scale or a satisfaction scale) in SPSS Statistics, so that the highest value becomes the lowest value and vice versa. Jarlath shows two methods of doing this – one using the compute procedure and the other using the recode procedure.
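
Both approaches come down to simple arithmetic or a lookup. The plain-Python sketch below assumes a 1 to 5 scale and uses made-up ratings: the "compute" route subtracts each value from the sum of the lowest and highest scale points, while the "recode" route maps each value explicitly.

```python
ratings = [1, 2, 3, 4, 5, 5, 2]  # hypothetical responses on a 1-5 scale
LOW, HIGH = 1, 5

# Compute approach: new value = (lowest + highest) - old value.
reversed_by_compute = [(LOW + HIGH) - r for r in ratings]

# Recode approach: spell out the mapping for every scale point.
recode_map = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}
reversed_by_recode = [recode_map[r] for r in ratings]

print(reversed_by_compute)
print(reversed_by_compute == reversed_by_recode)
```

The compute route is shorter for long scales, while the recode route makes the mapping explicit.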

How to combine variables in SPSS Statistics

SPSS users often want to know how they can combine variables together. In this video Jarlath Quinn demonstrates how to use the compute procedure to calculate the mean of a number of variables to create one combined variable, and also how to use the count values procedure to count how many times a particular value occurs across a series of variables in order to create an overall count.
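
As a rough equivalent in plain Python, the two calculations are a row-wise mean and a row-wise count of a target value. The respondents, item names and the choice of 5 as the value to count are hypothetical, and missing values are not handled in this sketch.

```python
from statistics import mean

# Hypothetical survey rows with three rating items each.
respondents = [
    {"q1": 4, "q2": 5, "q3": 5},
    {"q1": 2, "q2": 3, "q3": 1},
]
items = ["q1", "q2", "q3"]

for row in respondents:
    values = [row[item] for item in items]
    row["mean_score"] = mean(values)      # one combined variable per case
    row["count_fives"] = values.count(5)  # how often the value 5 occurs

for row in respondents:
    print(row)
```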

TURF analysis with SPSS Statistics

In this video Jarlath Quinn introduces the popular TURF analysis technique and demonstrates how to apply it in IBM SPSS Statistics. TURF analysis is used in many industries to find the optimal sub-group of options from a wider portfolio in order to maximise their appeal to an audience or market. As such, TURF analysis is used to:
- Find the best assortment of SKUs that appeal to the largest group of customers
- Identify the best 3 publications to reach the largest share of market
- Discover the optimal assortment of services to entice the most new clients
Note, TURF analysis functionality is […]
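
The core idea, finding the combination of options that reaches the most people, can be sketched as a small exhaustive search in plain Python. This is a simplified illustration with made-up data and considers reach only, not frequency; it is not a description of how the SPSS procedure is implemented.

```python
from itertools import combinations

# Hypothetical survey: the set of options each respondent finds appealing.
respondents = [{"A"}, {"A", "B"}, {"A"}, {"C"}, {"C"}, {"B", "C"}]
options = sorted(set().union(*respondents))


def reach(combo):
    """Number of respondents who like at least one option in combo."""
    chosen = set(combo)
    return sum(1 for liked in respondents if liked & chosen)


# Try every pair of options and keep the one with the highest reach.
best = max(combinations(options, 2), key=reach)
print(best, reach(best))
```

Exhaustive search is fine for a handful of options, but the number of combinations grows quickly as the portfolio gets larger.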

Cluster Analysis with IBM SPSS Statistics

In this video Jarlath Quinn explains what cluster analysis is, how it is applied in the real world and how easy it is to create your own cluster analysis models in SPSS Statistics. The video includes:
- A demonstration of cluster analysis using sample data
- How to use the cluster viewer facility to interpret and make sense of the analysis results
- How to apply a cluster model to a data file and rename the groups to make them meaningful to non-experts
- How to use cluster analysis to illustrate how a customer base changes over time

Introduction to the filter node in SPSS Modeler

This is the third in a regular series of videos about SPSS Modeler, designed to help you better understand some of the functions that are available within the package. If you’re an experienced user or you have been on one of our training courses then you’ll probably already be familiar with most of these, but if you’re a new user, you’re self-taught, or you’re currently evaluating the software then there’s likely to be a number of things in these videos that you’ll find helpful. Sometimes you may have problems with your data: issues not related so much to the values of […]

How to work with variable sets in SPSS Statistics

This series of videos aims to help you get the best out of SPSS Statistics by using some tools and techniques that a lot of people don’t know about but that we know you’ll find useful. Working with variable sets in SPSS Statistics: in this video we explore variable sets – a procedure in SPSS that allows you to generate subsets of variables or fields for display within dialogue boxes and in the data editor itself. This is particularly useful if you are working with a very wide dataset with lots of variables making it very hard to find the […]

Introduction to the generate menu in SPSS Modeler

This is the second in a regular series of videos about SPSS Modeler, designed to help you better understand some of the functions that are available within the package. If you’re an experienced user or you have been on one of our training courses then you’ll probably already be familiar with most of these, but if you’re a new user, you’re self-taught, or you’re currently evaluating the software then there’s likely to be a number of things in these videos that you’ll find helpful. SPSS Modeler generate menu – the super-quick way to create new fields. If you watched the previous […]

Introduction to linear regression

In classical statistics, linear regression is regarded as the ‘gateway to predictive modelling’. For decades students have been taught about regression from theory to practice simply because it is still one of the most versatile and simple ways to understand and predict the effect of key factors on critical outcomes. For instance, using regression you can:
- Estimate and predict outcomes such as revenue from sales, machine repair costs, traffic volumes or asset failure rates
- Predict what effect a change in advertising spend or price will have on your sales
- Understand the relative strength of different factors that influence your sales – […]
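
To make the idea concrete, here is a minimal sketch of a one-predictor least squares fit in plain Python. The advertising and sales figures are invented so that the arithmetic is easy to follow, and real data would of course not fall exactly on a line.

```python
# Hypothetical data: advertising spend (x) and sales (y), in arbitrary units.
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

n = len(ad_spend)
mean_x = sum(ad_spend) / n
mean_y = sum(sales) / n

# Least squares estimates for the line: sales = intercept + slope * ad_spend
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ad_spend, sales))
sxx = sum((x - mean_x) ** 2 for x in ad_spend)
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict(x):
    """Predicted sales for a given advertising spend."""
    return intercept + slope * x


print(slope, intercept, predict(5))
```

The slope is the estimated change in sales for each extra unit of advertising spend, which is the kind of "what effect will a change have" question described above.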

Do I need SPSS Statistics or Modeler? How to choose the right product for your needs

We often talk to people who are unsure whether they need SPSS Statistics or whether SPSS Modeler might be more suited to their needs. In fact, it’s not always a clear-cut choice as to which tool is more appropriate as it depends on the context in which the technology might be used. With that in mind I thought it might be helpful to develop a little infographic to lay out the sorts of things that you should be thinking about when choosing between SPSS Modeler and SPSS Statistics. We can think of the choice as a sort of continuum, […]

How to change the defaults in SPSS Statistics

This series of videos aims to help you get the best out of SPSS Statistics by using some tools and techniques that a lot of people don’t know about but that we know you’ll find useful. Changing the default settings of SPSS Statistics: SPSS enables quite a high level of customisation so you can set up the software in a way that enables you to be a lot more productive. However, many people are unaware of just how powerful these customisation options are. In this video we explore the options edit menu and show you how you can:- Change the […]

Introduction to the data audit node in SPSS Modeler

This is the first in a regular series of videos about SPSS Modeler, designed to help you better understand some of the functions that are available within the package. If you’re an experienced user or you have been on one of our training courses then you’ll probably already be familiar with most of these, but if you’re a new user, you’re self-taught, or you’re currently evaluating the software then there’s likely to be a number of things in these videos that you’ll find helpful. SPSS Modeler data audit node – the Swiss army knife of data cleaning. The data […]