Enhancing the power of SPSS Modeler with regular expressions

This webinar shows how SPSS Modeler’s REGEX nodes make advanced text handling easier. Ideal for Modeler users looking to extend their text-processing skills.

Regular expressions have become the de facto way to undertake any kind of string/text handling. They underpin search engines, text analytics and word processors. The catch is that “REGEX” is effectively a programming language in itself, one that takes time to learn and typically involves a lot of trial and error.

Access this on-demand session to learn how IBM SPSS Modeler’s new REGEX nodes make text handling faster, simpler and more powerful. Regular expressions are central to search, text analytics and data preparation, yet they can be difficult to learn and often require extensive trial and error. The REGEX nodes bring these capabilities into Modeler’s visual data science environment, allowing you to apply sophisticated pattern-matching techniques without writing code.

You will gain a clear understanding of what regular expressions are, how they work and why they add so much value in analytical workflows. The session introduces the four REGEX nodes and shows how each can be used to extract information, split fields, replace patterns and clean complex text across multiple inputs. Practical demonstrations illustrate how to extract insight from log files, clean customer feedback, remove personally identifiable information and standardise formats such as postcodes and phone numbers.

Designed for current SPSS Modeler users, this on-demand session provides a practical, example-led introduction to enhancing your text-processing capabilities with REGEX.

In just one hour we will cover:

  • What is REGEX? An overview of what regular expressions are and how they work.
  • How is it used? How REGEX can add power to your analytics.
  • An overview of the four new REGEX nodes in SPSS Modeler
    • RX Groups: this node matches specific items in a string, which are then extracted into new output fields
    • RX Split: this node splits a string into separate components using a specified delimiter; the components are then added to new output fields
    • RX Replace: this node matches patterns within a string field and converts them to a different pattern; the result is added to a new output field
    • String Cleaner: this node provides common string cleaning operations (e.g. removing duplicate whitespace or non-printing characters) across multiple input fields in a way that is simple to use
  • Demonstration of some common applications of REGEX
    • Extracting key components from log files. Log files often contain a lot of extraneous text wrapped around data tokens that have analytical value. With the RX nodes we can identify the valuable data points like timestamps, interesting events, products, temperatures and other identifiers using pattern matching, and separate out these nuggets of value for analysis.
    • Cleaning/parsing complex customer feedback from emails and customer service notes. The text of interest in email threads and call centre notes – for example customer sentiment and opinions – is often repeated and mixed up among email headers and footers. We can clean/remove the irrelevant text, images, links etc. as a precursor to running more focused text analytics on the more relevant text.
    • Removing Personally Identifiable Information (PII). A use case that has become more pressing since GDPR! We can use the RX nodes to identify PII tokens like email addresses, registration plates, telephone numbers and bank accounts. Once identified, they can be redacted or replaced with non-PII IDs so that data can be shared in a compliant way.
    • Converting text postcodes and phone numbers to a standard format. We often need text such as postcodes and phone numbers in a standard format for analysis and matching. This may mean ensuring that spaces appear in the right place in postcodes or that phone numbers have international dialling codes and clearly delineated area codes. With REGEX we can define and apply that format.
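
To give a flavour of the pattern matching behind these use cases, here is a minimal sketch in Python's standard `re` module. It is not the Modeler nodes themselves, which are configured visually rather than in code; the sample log line, the function names and the patterns are all assumptions made up for illustration, and real data would likely need more forgiving patterns.

```python
import re

# Hypothetical log line, invented for this illustration.
LOG_LINE = "2024-03-01 12:30:05 WARN sensor=A17 temp=71.5C event=overheat"

# Extract: named capture groups pull tokens out into separate fields.
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"sensor=(?P<sensor>\w+) "
    r"temp=(?P<temp>\d+(?:\.\d+)?)C"
)


def extract_log_fields(line):
    """Return the captured fields as a dict, or an empty dict if no match."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else {}


# Split: break a string into components on a delimiter pattern.
def split_fields(text):
    """Split on ';', ',' or '|', ignoring whitespace around the delimiter."""
    return [part for part in re.split(r"\s*[;,|]\s*", text.strip()) if part]


# Replace: redact email addresses (a deliberately simple pattern).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_emails(text, token="[EMAIL]"):
    """Swap anything that looks like an email address for a placeholder."""
    return EMAIL_PATTERN.sub(token, text)


# Clean: collapse runs of whitespace into single spaces.
def clean_whitespace(text):
    return re.sub(r"\s+", " ", text).strip()


# Standardise: upper-case a UK-style postcode and put one space
# before the final three characters. Assumes the common
# outward/inward layout; special-case postcodes are not handled.
def standardise_postcode(raw):
    compact = re.sub(r"\s+", "", raw).upper()
    match = re.fullmatch(r"([A-Z]{1,2}\d[A-Z\d]?)(\d[A-Z]{2})", compact)
    return f"{match.group(1)} {match.group(2)}" if match else None
```

For example, `standardise_postcode("sw1a1aa")` gives `"SW1A 1AA"`, and `redact_emails("Contact jane.doe@example.com today")` gives `"Contact [EMAIL] today"`. Each function loosely mirrors one of the operations listed above: extracting, splitting, replacing and cleaning.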
