Adding New Modules To Jython Scripting In IBM SPSS Modeler

IBM SPSS Modeler supports Python scripting using Jython, a Java implementation of the Python language. Modeler versions 16 and 17 use Jython 2.5.1, which includes a number of useful and popular modules. However, many other modules are available and customers often want to use their own, so a frequent question is how to include them. There are two approaches: (1) Copy the module to the “site-packages” folder under modeler-installation/lib/jython/Lib. This makes the module available to anybody who uses that Modeler installation, but updating the installation usually requires some level of administrative privileges. (2) Define a JYTHONPATH environment variable and […]
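Whichever approach is used, it can help to confirm which directories the interpreter actually searches. The following is a minimal sketch that relies only on the standard `sys.path` list; the helper name `describe_search_path` is hypothetical and not part of Modeler or Jython.

```python
import sys

def describe_search_path(path_entries=None):
    """Return (entry, is_site_packages) pairs for each module search directory.

    Hypothetical helper for illustration: it only inspects the path list,
    it does not change how Modeler or Jython resolves modules.
    """
    if path_entries is None:
        path_entries = sys.path
    result = []
    for entry in path_entries:
        # Normalise Windows separators and trailing slashes before comparing.
        normalized = str(entry).replace("\\", "/").rstrip("/")
        result.append((entry, normalized.endswith("site-packages")))
    return result

# List every directory the interpreter will search for modules.
for entry, is_site in describe_search_path():
    print("%s%s" % (entry, is_site and "  <- site-packages" or ""))
```

If a module still cannot be imported, the directory that contains it is probably missing from this list.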

Using the findAll() Function to search for nodes

Most SPSS Modeler scripts include code that locates an existing node, e.g.:

    stream = modeler.script.stream()
    typenode = stream.findByType("type", None)

However, some scripts need to search for all nodes – maybe by node type but also matching some other criteria. The Modeler scripting API documentation (PDF) mentions a findAll() function:

    d.findAll(filter, recursive): Collection
        filter (NodeFilter) : the node filter
        recursive (boolean) : if True then composite nodes should be recursively searched

    Returns a collection of all nodes accepted by the specified filter. If the recursive flag is True then any SuperNodes within this diagram are also searched.

Unfortunately, the NodeFilter definition is not specified. NodeFilter is a base class […]
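As a rough sketch of what a filter for findAll() might look like, the class below accepts nodes whose type name matches a given string. It makes several assumptions that the excerpt above does not confirm: that the base class is reachable as `modeler.api.NodeFilter`, that it expects a single `accept(node)` method returning a boolean, and that nodes expose `getTypeName()`. The class name `TypeNameFilter` is hypothetical. Outside Modeler the import fails, so the sketch falls back to a plain `object` base class.

```python
try:
    # Assumption: inside Modeler's Jython the base class lives here.
    import modeler.api
    NodeFilterBase = modeler.api.NodeFilter
except ImportError:
    # Stand-in so the sketch can be read and run outside Modeler.
    NodeFilterBase = object

class TypeNameFilter(NodeFilterBase):
    """Hypothetical filter that accepts nodes with a given type name."""

    def __init__(self, type_name):
        self.type_name = type_name

    def accept(self, node):
        # Assumption: nodes provide getTypeName(), e.g. "type" for a Type node.
        return node.getTypeName() == self.type_name

# Inside a Modeler stream script, usage might look like this (not run here):
#   stream = modeler.script.stream()
#   for node in stream.findAll(TypeNameFilter("type"), True):
#       print(node.getLabel())
```

Additional criteria, such as matching on a label or property value, would go inside `accept()` alongside the type name check.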

Using SPSS Modeler’s cache_compression setting to speed up your modelling

There are a number of configuration settings associated with IBM SPSS Modeler Server that control its behaviour. The default settings aim to ensure that stream execution will complete successfully even if the host machine is being used by a number of other applications, i.e. Modeler Server is trying to be a “good citizen”. However, if Modeler Server is the primary application on the host, then tweaking these settings can significantly reduce the execution times of some streams. One of these settings is cache_compression, which controls whether data that gets spilled to disk is compressed before being written to […]

Supernode scripting in SPSS Modeler

This post describes how to use Python scripts to create and modify Modeler supernodes, and control the execution of the nodes within the supernode. If you’re after a basic overview of Python scripting in Modeler then this post may be of interest, and I’ve also written about how to write standalone Python scripts in Modeler here. As streams get larger and more complex, many users take advantage of supernodes in order to keep the structure of the stream understandable and maintainable. For example, a stream may contain multiple nodes for computing a summary of recent transactions (e.g. number of transactions over the […]

Writing a standalone Python script for Modeler

In my last post I gave a brief overview of the new Python-based scripting available in Modeler 16. In this post, I will cover Modeler 16 scripting in a little more detail. This assumes some familiarity with Python, such as the Python module mechanism and exception handling. There are three types of script in Modeler: a stream script, the most common type, which controls execution within a single stream; a standalone or session script, which can manage multiple streams; and a supernode script, which manages the contents of the nodes within a supernode. You may recall […]

An overview of Python scripting in Modeler 16

Modeler scripts are used to automate the creation of streams, the construction and configuration of nodes, stream execution, and the management of execution results, such as saving models to file or a content repository. A major new feature in Modeler 16 is the introduction of Python as the default scripting language. Python replaces the original bespoke language Modeler has had for over 15 years. Although existing legacy scripts will continue to work in Modeler 16, moving to Python offers a number of advantages: It is a popular language, making it easier to find staff with relevant experience or training to improve existing skills. It […]

How the CRISP-DM method can help your data mining project succeed

I’ve worked in predictive analytics for many years and have seen that a key factor for increasing the prospects of a project being successful is using a structured approach based around a data mining methodology such as CRISP-DM (a quick declaration of interest here – I was one of the team who originally developed the CRISP-DM methodology). First published in 2001, CRISP-DM remains one of the most widely used data mining/predictive analytics methodologies. I believe its longevity in a rapidly changing area stems from a number of characteristics: It encourages data miners to focus on business goals, so as to […]