Enhancing the power of SPSS Modeler with Regular Expressions
March 28 @ 3:00 pm - 4:00 pm
Regular expressions have become the de facto way to undertake any kind of string/text handling. They underpin search engines, text analytics and word processors. The catch is that “REGEX” is effectively a programming language in itself, one that requires learning and, typically, a lot of trial and error.
The new RX nodes in IBM SPSS Modeler extend the Visual Data Science approach of SPSS Modeler itself to Regular Expression handling.
In this webinar we will demonstrate how you can use the power of REGEX to perform the most typical text handling tasks, without the pain of learning yet another programming language.
In just one hour we will cover:
- What is REGEX? An overview of what regular expressions are and how they work.
- How is it used? How REGEX can add power to your analytics.
- An overview of the four new REGEX nodes in SPSS Modeler
- RX Groups: this node matches specific items in a string, which are then extracted into new output fields
- RX Split: this node uses a specified delimiter to split a string into separate components, which are then added to new output fields
- RX Replace: this node matches patterns within a string field and replaces them with a different pattern; the result is added to a new output field
- String Cleaner: this node provides common string cleaning operations (e.g. removing duplicate whitespace or non-printing characters) across multiple input fields in a way that is simple to use
- Demonstration of some common applications of REGEX
- Extracting key components from log files. Log files often contain a lot of extraneous text wrapped around data tokens that have analytical value. With the RX nodes we can identify the valuable data points like timestamps, interesting events, products, temperatures and other identifiers using pattern matching, and separate out these nuggets of value for analysis.
- Cleaning/parsing complex customer feedback from emails and customer service notes. The text of interest in email threads and call centre notes – for example customer sentiment and opinions – is often repeated and mixed up among email headers and footers. We can clean/remove the irrelevant text, images, links etc. as a precursor to running more focused text analytics on the more relevant text.
- Removing Personally Identifiable Information (PII). A use case that has become more pressing since GDPR! We can use the RX nodes to identify PII tokens like email addresses, registration plates, telephone numbers and bank accounts. Once identified, they can be redacted or replaced with non-PII IDs so that data can be shared in a compliant way.
- Converting text postcodes and phone numbers to a standard format. We often need text such as postcodes and phone numbers in a standard format for analysis and matching. This may mean ensuring that spaces appear in the right place in postcodes or that phone numbers have international dialling codes and clearly delineated area codes. With REGEX we can define and apply that format.
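For readers who want a feel for what sits underneath these operations, here is a rough sketch in plain Python using the standard `re` module. It is not how the RX nodes are implemented or configured (the nodes are visual and need no code). The sample log line, the patterns, the UK postcode assumption and all function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical log line; the layout is invented for this example.
LOG_LINE = "2024-03-28 15:00:01 WARN sensor=T17 temp=81.5C msg=overheat"

# "Groups" idea: each named capture group becomes its own output field.
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+.*?temp=(?P<temp>\d+(?:\.\d+)?)C"
)

def extract_fields(line):
    """Return the captured fields as a dict, or an empty dict if no match."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else {}

# "Split" idea: break a string into components on a delimiter pattern
# (here: comma, semicolon or pipe, with optional surrounding spaces).
def split_fields(text, delimiter=r"\s*[;,|]\s*"):
    return re.split(delimiter, text)

# "Replace" idea: swap anything that looks like an email address for a token.
# This pattern is a simplification, not a full email validator.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def redact_emails(text, token="<EMAIL>"):
    return EMAIL.sub(token, text)

# "String cleaning" idea: drop non-printing control characters,
# then collapse runs of whitespace into a single space.
def clean_string(text):
    text = re.sub(r"[\x00-\x08\x0e-\x1f\x7f]", "", text)
    return re.sub(r"\s+", " ", text).strip()

# Standard-format idea, assuming UK-style postcodes: the inward part is
# always one digit plus two letters, so the space goes before it.
def normalise_postcode(raw):
    compact = re.sub(r"\s+", "", raw).upper()
    match = re.fullmatch(r"([A-Z]{1,2}\d[A-Z\d]?)(\d[A-Z]{2})", compact)
    return f"{match.group(1)} {match.group(2)}" if match else None

if __name__ == "__main__":
    print(extract_fields(LOG_LINE))
    print(split_fields("red; green ,blue|amber"))
    print(redact_emails("Contact jane.doe@example.co.uk for details"))
    print(clean_string("  hello\x00   world\t\n "))
    print(normalise_postcode("sw1a1aa"))
```

Each function maps loosely onto one of the ideas above: capturing groups, splitting on a delimiter, pattern replacement, string cleaning and reformatting to a standard layout. Real-world data usually needs more forgiving patterns than these.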
Who should attend?
This webinar is aimed at anyone who is currently using SPSS Modeler and would like to learn how our new REGEX nodes can enhance its analytical capabilities.