RX Replace Node

Overview
Settings
Advanced Settings
Examples
Scripting

Overview

Regular expressions are special text strings that describe particular character patterns. The RX Replace node uses a regular expression to find matching text within a string field and replaces each match with new text. The replacement pattern can reference capture groups (parenthesised parts) of the match pattern. The node creates a new field that contains the converted text.

The node uses the ICU Regular Expressions package. Full details of the supported syntax are given in the ICU Regular Expressions documentation.
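To illustrate how a replacement can reuse parts of the match, the following minimal sketch uses Python's standard `re` module rather than ICU. The two engines agree on the basic syntax used on this page, but Python writes a group reference in the replacement as `\1`, whereas ICU replacement text uses `$1`. The date value and format are made-up examples.

```python
import re

# Capture year, month and day as groups 1, 2 and 3.
pattern = r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
# Python references groups as \1, \2, \3 (ICU replacement text uses $1, $2, $3).
replacement = r"\3/\2/\1"

value = "2024-03-15"
converted = re.sub(pattern, replacement, value)
print(converted)  # 15/03/2024
```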

Settings

Match field

This is used to select the string field containing the text that should be matched by the Pattern.

Prefix match field to field names

This specifies how the new field name should be generated:

when checked (the default setting), the new field name is generated by appending the Replace field name value to the name of the Match field
when unchecked, the new field name is the Replace field name value
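The naming rule above can be sketched as a small function. `output_field_name` is a hypothetical helper, not part of the node, and it assumes the two names are joined with no separator; the field names are illustrative.

```python
def output_field_name(match_field, replace_field_name, prefix_match_field=True):
    """Hypothetical helper mirroring the 'Prefix match field to field names' rule."""
    if prefix_match_field:
        # Assumed: plain concatenation, no separator between the two names.
        return match_field + replace_field_name
    # Unchecked: the Replace field name value is used on its own.
    return replace_field_name

print(output_field_name("IPv4", "Masked"))           # IPv4Masked
print(output_field_name("IPv4", "MaskedIP", False))  # MaskedIP
```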

Pattern

This defines the regular expression that will be matched against the content of the Match field. Common regular expression components can be viewed and added by using the context menu in the Pattern text area.

Regular Expression Options…

These are described in Advanced Settings below.

Replace field name

This defines either the suffix that will be appended to the Match field name or the full name of the new field, depending on the setting of Prefix match field to field names.

Replace pattern

This defines the replacement text used to build the converted value in the output field. If the Pattern does not match anything within the Match field, the output field value will be the same as the Match field value. Common regular expression components can be viewed and added by using the context menu in the Replace pattern text area.

Replace mode

This defines how many matches within the Match field value are replaced:

Replace all: (the default setting) match and replace all occurrences of the Pattern
Replace first occurrence only: match and replace only the first occurrence of the Pattern
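Python's `re.sub` has a `count` argument that behaves like the two replace modes, and it also returns the input unchanged when nothing matches. This is an analogy in standard Python, not the node's own code; the sample values are illustrative.

```python
import re

pattern = r"[0-9]"
value = "a1b2c3"

# Replace all: count=0 (the default) replaces every occurrence.
all_replaced = re.sub(pattern, "#", value)
# Replace first occurrence only: count=1 stops after the first match.
first_only = re.sub(pattern, "#", value, count=1)
# No match: the value passes through unchanged.
unchanged = re.sub(pattern, "#", "abc")

print(all_replaced)  # a#b#c#
print(first_only)    # a#b2c3
print(unchanged)     # abc
```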

Advanced Settings

These settings control the general behaviour of the regular expression matcher. The default is for all settings to be unchecked. These can generally be left in their default state.

Case insensitive

When checked, regular expression matching will ignore character case.
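As a sketch of the effect, Python's `re.IGNORECASE` flag is the closest standard-library counterpart of this option; the sample text is illustrative and the node itself uses ICU, not Python.

```python
import re

value = "Error, ERROR, error"
# Without the flag only the lower-case spelling matches.
plain = re.sub("error", "X", value)
# re.IGNORECASE plays the role of the Case insensitive option.
ignored = re.sub("error", "X", value, flags=re.IGNORECASE)

print(plain)    # Error, ERROR, X
print(ignored)  # X, X, X
```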

Multiline

By default, ^ and $ match the start and end of the input text. When checked, ^ and $ will also match the start and end of each line within the input text.
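The difference can be seen with Python's `re.MULTILINE` flag, which is assumed here to be a fair stand-in for the node's Multiline option. Replacing the empty match at `^` with `> ` shows where `^` matches.

```python
import re

text = "alpha\nbeta\ngamma"
# Default: ^ matches only at the very start of the text.
default = re.sub(r"^", "> ", text)
# re.MULTILINE: ^ also matches just after each line terminator.
multiline = re.sub(r"^", "> ", text, flags=re.MULTILINE)

print(repr(default))    # '> alpha\nbeta\ngamma'
print(repr(multiline))  # '> alpha\n> beta\n> gamma'
```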

Match ‘.’ as line terminator

When checked, a . in a pattern will also match a line terminator in the input text. By default, it does not.
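A minimal sketch using Python's `re.DOTALL` flag, assumed here to behave like this option for the purpose of illustration: without it, a `.` cannot cross the line break, so the pattern fails to match.

```python
import re

text = "start\nend"
# Default: . does not match \n, so the pattern cannot span both lines.
default = re.sub(r"start.end", "X", text)
# re.DOTALL lets . match the line terminator as well.
dotall = re.sub(r"start.end", "X", text, flags=re.DOTALL)

print(repr(default))  # 'start\nend'
print(dotall)         # X
```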

Comments in patterns

When checked, white space and #comments are allowed within regular expression patterns.
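Comment mode is useful for documenting longer patterns. The sketch below uses Python's `re.VERBOSE`, which works in the same spirit (unescaped white space is ignored and `#` starts a comment); the input value is illustrative and the pattern is the 1-3 digit IPv4 pattern from the Examples section.

```python
import re

# With re.VERBOSE, unescaped white space is ignored and # starts a comment.
pattern = r"""
    ([0-9]{1,3}) \.   # first number
    ([0-9]{1,3}) \.   # second number
    ([0-9]{1,3}) \.   # third number
    ([0-9]{1,3})      # fourth number
"""
masked = re.sub(pattern, "_._._._", "host 10.1.2.3", flags=re.VERBOSE)
print(masked)  # host _._._._
```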

Use Unicode word boundaries

This controls the behaviour of \b in a pattern. When checked, word boundaries are determined by the definition of a word in Unicode Standard Annex #29 (UAX #29, Unicode Text Segmentation).

Examples

All examples assume that all other settings are left at their defaults.

Match and replace any non-numbers

This removes every non-numeric character from the input string.

Pattern: [^0-9]

Replace pattern: empty

Input	Output
`1234`	`1234`
`-1234`	`1234`
`1,234`	`1234`
`(555) 123 456`	`555123456`
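The table can be reproduced with Python's `re.sub` as a quick check of the pattern; this is a sketch outside Modeler, and for this simple pattern Python and ICU are expected to agree.

```python
import re

# Inputs and expected outputs from the table above.
cases = [
    ("1234", "1234"),
    ("-1234", "1234"),
    ("1,234", "1234"),
    ("(555) 123 456", "555123456"),
]

# An empty replacement deletes every character that is not a digit.
results = [re.sub(r"[^0-9]", "", value) for value, _ in cases]
print(results)  # ['1234', '1234', '1234', '555123456']
```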

Mask IPv4 addresses (basic)

This masks out numeric IPv4 (Internet Protocol) addresses and replaces each number with an underscore (_). Numeric IPv4 addresses have the form n.n.n.n where n is an integer in the range 0-255. For simplicity, the pattern only looks for the . delimiters and does not check how many digits appear between them.

Pattern: ([0-9]*)\.([0-9]*)\.([0-9]*)\.([0-9]*)

Replace pattern: _._._._

Input	Output	Notes
`127.0.0.1`	`_._._._`	Valid IPv4 (matches)
`127.1`	`127.1`	Invalid IPv4 (no match)
`127.0.0.1.12345`	`_._._._.12345`	Matches and replaces the first 4 sections
`12345.0.0.1`	`_._._._`	Matches even though `12345` is not a valid IPv4 value
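A Python sketch of this example, using `re.sub` in place of the node, reproduces the table. It also shows a side effect of the `*` quantifier: because it accepts zero digits, a string of three bare dots is masked too.

```python
import re

pattern = r"([0-9]*)\.([0-9]*)\.([0-9]*)\.([0-9]*)"

def mask(value):
    # Replace all (the default mode) corresponds to re.sub's default count=0.
    return re.sub(pattern, "_._._._", value)

inputs = ["127.0.0.1", "127.1", "127.0.0.1.12345", "12345.0.0.1"]
results = [mask(v) for v in inputs]
print(results)  # ['_._._._', '127.1', '_._._._.12345', '_._._._']

# Because * also accepts zero digits, bare dots match as well.
print(mask("..."))  # _._._._
```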

Mask IPv4 addresses (better)

This masks out numeric IPv4 addresses and replaces each number with an underscore (_). Unlike the previous example, this also requires each number to have 1-3 digits.

Pattern: ([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})

Replace pattern: _._._._

Input	Output	Notes
`127.0.0.1`	`_._._._`	Valid IPv4 (matches)
`127.1`	`127.1`	Invalid IPv4 (no match)
`127.0.0.1.12345`	`_._._._.12345`	Matches and replaces the first 4 sections
`12345.0.0.1`	`12_._._._`	Matches because the substring `345.0.0.1` satisfies the pattern
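The same check in Python, again as a sketch with `re` standing in for ICU, confirms the table and shows why `12` survives in the last row: the leftmost match begins at index 2.

```python
import re

pattern = r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})"

def mask(value):
    return re.sub(pattern, "_._._._", value)

inputs = ["127.0.0.1", "127.1", "127.0.0.1.12345", "12345.0.0.1"]
results = [mask(v) for v in inputs]
print(results)  # ['_._._._', '127.1', '_._._._.12345', '12_._._._']

# The leftmost match starts at index 2, so the leading "12" is kept.
m = re.search(pattern, "12345.0.0.1")
print(m.start(), m.group(0))  # 2 345.0.0.1
```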

Mask IPv4 addresses (better still)

This masks out numeric IPv4 addresses and replaces each number with an underscore (_). Like the previous example, this requires each number to have 1-3 digits. However, it also requires the first number to be at the start of the string (specified with ^) and the last number to be at the end of the string (specified with $). This assumes that the input field contains only an IPv4 address.

Pattern: ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$

Replace pattern: _._._._

Input	Output	Notes
`127.0.0.1`	`_._._._`	Valid IPv4 (matches)
`127.1`	`127.1`	Invalid IPv4 (no match)
`127.0.0.1.12345`	`127.0.0.1.12345`	Invalid IPv4 (no match)
`12345.0.0.1`	`12345.0.0.1`	Invalid IPv4 (no match)
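A Python sketch of the anchored pattern reproduces the table. Note that it still checks only the number of digits, not the 0-255 range, so a value such as `999.999.999.999` is masked as well.

```python
import re

# ^ and $ force the whole value to be the address.
pattern = r"^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$"

def mask(value):
    return re.sub(pattern, "_._._._", value)

inputs = ["127.0.0.1", "127.1", "127.0.0.1.12345", "12345.0.0.1"]
results = [mask(v) for v in inputs]
print(results)  # ['_._._._', '127.1', '127.0.0.1.12345', '12345.0.0.1']

# Digit counts are checked, but not the 0-255 range.
print(mask("999.999.999.999"))  # _._._._
```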

Scripting

Settings

Node type name: regexp_replace

Setting	Property	Type	Comment
Match field	`match_field`	Field	–
Prefix match field to field names	`prefix_match_field`	Boolean	–
Pattern	`pattern`	String	–
Replace field name	`replace_field_name`	String	–
Replace pattern	`replace_pattern`	String	–
Replace mode	`replace_mode`	`all` or `first`	–
Case insensitive	`opt_case_insensitive`	Boolean	–
Multiline	`opt_multiline`	Boolean	–
Match ‘.’ as line terminator	`opt_dotall`	Boolean	–
Comments in patterns	`opt_comments`	Boolean	–
Use Unicode word boundaries	`opt_uword_boundaries`	Boolean	–
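To try a pattern before scripting the node, the replace step can be approximated locally in standard Python 3. The function `rx_replace` below is hypothetical and only borrows the property names from the table; it uses Python's `re` instead of ICU, so replacement text follows Python's `\1` group syntax, and `opt_uword_boundaries` is left out because `re` has no UAX #29 word-boundary mode.

```python
import re

def rx_replace(value, pattern, replace_pattern, replace_mode="all",
               opt_case_insensitive=False, opt_multiline=False,
               opt_dotall=False, opt_comments=False):
    """Hypothetical local approximation of the node using Python re, not ICU.

    opt_uword_boundaries is not modelled: Python's re has no UAX #29 mode.
    """
    flags = 0
    if opt_case_insensitive:
        flags |= re.IGNORECASE
    if opt_multiline:
        flags |= re.MULTILINE
    if opt_dotall:
        flags |= re.DOTALL
    if opt_comments:
        flags |= re.VERBOSE
    if replace_mode not in ("all", "first"):
        raise ValueError("replace_mode must be 'all' or 'first'")
    # count=0 replaces every match; count=1 replaces only the first.
    count = 1 if replace_mode == "first" else 0
    return re.sub(pattern, replace_pattern, value, count=count, flags=flags)

print(rx_replace("a1b2", "[0-9]", "#"))                      # a#b#
print(rx_replace("a1b2", "[0-9]", "#", replace_mode="first"))  # a#b2
```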

Scripting Example

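```python
# Create an RX Replace node on the stream canvas at (512, 192)
node = modeler.script.stream().createAt("regexp_replace", u"RX Replace", 512, 192)
node.setPropertyValue("match_field", u"IPv4")
# Unchecked prefix: the new field is named MaskedIP, not IPv4MaskedIP
node.setPropertyValue("prefix_match_field", False)
# Backslashes are doubled so the pattern receives a literal \.
node.setPropertyValue("pattern", u"([0-9]*)\\.([0-9]*)\\.([0-9]*)\\.([0-9]*)")
node.setPropertyValue("replace_field_name", u"MaskedIP")
node.setPropertyValue("replace_pattern", u"_._._._")
node.setPropertyValue("replace_mode", u"first")
```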
