I’ve worked in predictive analytics for many years, and I’ve seen that a key factor in improving a project’s chances of success is a structured approach based on a data mining methodology such as CRISP-DM (a quick declaration of interest here: I was one of the team who originally developed the CRISP-DM methodology). First published in 2001, CRISP-DM remains one of the most widely used data mining/predictive analytics methodologies. I believe its longevity in a rapidly changing area stems from a number of characteristics:
- It encourages data miners to focus on business goals, so that project outputs provide tangible benefits to the organisation. Too often, analysts lose sight of the ultimate business purpose of their analysis, and the analysis becomes an end in itself rather than a means to an end. The CRISP-DM approach helps ensure that the business goals remain at the centre of the project throughout.
- CRISP-DM provides an iterative approach, including frequent opportunities to evaluate the progress of the project against its original objectives. This helps minimise the risk of reaching the end of the project only to find that the business objectives have not really been addressed. It also means that the project can be adapted in the light of new findings once it’s up and running, rather than remaining static.
- The CRISP-DM methodology is both technology- and problem-neutral. You can use any software you like for your analysis and apply it to any data mining problem. Whatever the nature of your data mining project, CRISP-DM will still provide a framework with enough structure to be useful.
This last characteristic has probably contributed most to the success of CRISP-DM over the years, although the balance between staying neutral and providing useful structure hasn’t always been easy to achieve. During the development of CRISP-DM, we ran a number of public workshops around the US and Europe to get feedback from potential users. Some people complained that CRISP-DM didn’t include prescriptive advice along the lines of “for problem A, use algorithm B”.
We originally intended to address this by providing CRISP-DM in two parts: first, a process reference that was stable and required infrequent updates, and second, a best practice guide that provided more of the specific analytics guidance that some people wanted. However, in the end we decided to focus on the process reference and leave others to build the best practices as new algorithms and techniques became available. Judging by the number of books and articles published over the years that take inspiration from CRISP-DM, we seem to have made the right decision.