RX Replace Node


Overview
Settings
Advanced Settings
Examples
Scripting

Overview

Regular expressions are special text strings that describe character patterns. The RX Replace node matches a regular expression (the Pattern) against the text of a string field and replaces each match with text built from a replacement pattern. The replacement pattern can reference capture groups (parenthesized parts) of the match pattern. The node creates a new field that contains the converted text.

The node uses the ICU (International Components for Unicode) Regular Expressions package. See the ICU regular expression documentation for full details of the supported syntax.
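
As a rough illustration outside Modeler, the sketch below uses Python's built-in re module (not ICU) to mimic the core behaviour: find matches of a pattern and substitute replacement text that refers back to capture groups. One syntax difference matters. ICU replacement text refers to group n as $n, while Python's re.sub uses \g<n>. The function name rx_replace and the sample value are hypothetical.

```python
import re

def rx_replace(value: str, pattern: str, replacement: str) -> str:
    """Hypothetical stand-in for the node: replace every match of pattern."""
    # Python writes group references as \g<1>; ICU would write them as $1.
    return re.sub(pattern, replacement, value)

# Swap "Last, First" into "First Last" using two capture groups.
# In the node this would be Pattern: (\w+), (\w+)  Replace pattern: $2 $1
print(rx_replace("Smith, John", r"(\w+), (\w+)", r"\g<2> \g<1>"))  # John Smith
```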

Settings

Match field

This is used to select the string field containing the text that should be matched by the Pattern.

Prefix match field to field names

This specifies how the new field name should be generated:

  • when checked (the default setting), the new field name is the name of the Match field followed by the Replace field name value
  • when unchecked, the new field name is the Replace field name value
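
The naming rule can be sketched as a small function. It is hypothetical and assumes the two names are joined directly, with no separator added. Any separator, such as an underscore, would have to be part of the Replace field name value.

```python
def output_field_name(match_field: str, replace_field_name: str,
                      prefix_match_field: bool = True) -> str:
    """Hypothetical helper mirroring the naming rule described above."""
    if prefix_match_field:
        # Checked (default): Replace field name is appended to the Match field name.
        # Assumption: no separator is inserted between the two parts.
        return match_field + replace_field_name
    # Unchecked: Replace field name is used as the full field name.
    return replace_field_name

print(output_field_name("IPv4", "_masked"))          # IPv4_masked
print(output_field_name("IPv4", "MaskedIP", False))  # MaskedIP
```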

Pattern

This defines the regular expression that will be matched against the content of the Match field. Common regular expression components can be viewed and added by using the context menu in the Pattern text area.

Regular Expression Options...

These are described in Advanced Settings below.

Replace field name

This defines either the suffix that is appended to the Match field name or the full name of the new field, depending on the setting of Prefix match field to field names.

Replace pattern

This defines the replacement text used to create the converted text in the output field. It can reference capture groups in the Pattern. In ICU syntax, $1 refers to the text matched by the first parenthesized group, $2 to the second, and so on. If the Pattern does not match anything within the Match field, the output field contains the unchanged Match field value. Common regular expression components can be viewed and added by using the context menu in the Replace pattern text area.
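
The no-match behaviour has a close analogue in Python's re.sub, which is used here only as an illustration and not as the node's implementation. When nothing matches, re.sub returns the input text unchanged.

```python
import re

value = "no digits here"
# No match: re.sub returns the input text unchanged, as the node does.
unchanged = re.sub(r"[0-9]+", "#", value)
print(unchanged == value)  # True

# A match: each run of digits is replaced.
print(re.sub(r"[0-9]+", "#", "room 101"))  # room #
```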

Replace mode

This defines how many occurrences of the Pattern are replaced in the Match field value:

  • Replace all: (the default setting) match and replace all occurrences of the Pattern
  • Replace first occurrence only: match and replace only the first occurrence of the Pattern
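
The two modes correspond to the count argument of Python's re.sub. The sketch below assumes the mode strings "all" and "first", which are the replace_mode values listed under Scripting. The helper name is hypothetical.

```python
import re

def rx_replace(value: str, pattern: str, replacement: str, mode: str = "all") -> str:
    """Hypothetical helper: mode is "all" (default) or "first"."""
    if mode not in ("all", "first"):
        raise ValueError("mode must be 'all' or 'first'")
    # count=0 replaces every match; count=1 stops after the first match.
    count = 1 if mode == "first" else 0
    return re.sub(pattern, replacement, value, count=count)

print(rx_replace("a-b-c", "-", "+"))           # a+b+c
print(rx_replace("a-b-c", "-", "+", "first"))  # a+b-c
```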

Advanced Settings

These settings control the general behaviour of the regular expression matcher. All settings are unchecked by default and can generally be left that way.

Case insensitive

When checked, regular expression matching will ignore character case.
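
Python's re.IGNORECASE flag is a close analogue of this option and is used here for illustration. ICU and Python may differ for some Unicode case foldings. Pass the flag by keyword, because the fourth positional argument of re.sub is count, not flags.

```python
import re

text = "Cat CAT cat"
# Case-sensitive (default): only the lowercase "cat" matches.
print(re.sub("cat", "dog", text))                       # Cat CAT dog
# re.IGNORECASE: every casing matches.
print(re.sub("cat", "dog", text, flags=re.IGNORECASE))  # dog dog dog
```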

Multiline

By default, ^ and $ match the start and end of the input text. When checked, ^ and $ will also match the start and end of each line within the input text.
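
In Python, re.MULTILINE is the corresponding flag. The sketch below uses it as a stand-in for the ICU option and prefixes lines with "> " to show where ^ matches.

```python
import re

text = "alpha\nbeta"
# Default: ^ matches only at the very start of the input.
without_flag = re.sub(r"^", "> ", text)
# re.MULTILINE: ^ also matches at the start of every line.
with_flag = re.sub(r"^", "> ", text, flags=re.MULTILINE)
print(repr(without_flag))  # '> alpha\nbeta'
print(repr(with_flag))     # '> alpha\n> beta'
```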

Match '.' as line terminator

By default, a . in a pattern does not match a line terminator. When checked, . also matches line terminators in the input text.
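
Python's re.DOTALL flag behaves like this option and is shown here as an analogue, not as the node's engine.

```python
import re

text = "start\nend"
# Default: . does not match the newline, so nothing is replaced.
without_flag = re.sub(r"start.end", "X", text)
# re.DOTALL: . also matches line terminators.
with_flag = re.sub(r"start.end", "X", text, flags=re.DOTALL)
print(repr(without_flag))  # 'start\nend'
print(repr(with_flag))     # 'X'
```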

Comments in patterns

When checked, white space within a pattern is ignored, and text from # to the end of a line is treated as a comment. This allows long patterns to be laid out and annotated.
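
Python's re.VERBOSE flag works in a similar way. The phone-number pattern below is a hypothetical example showing an annotated pattern.

```python
import re

# With re.VERBOSE, white space in the pattern is ignored and # starts a comment.
pattern = r"""
    [0-9]{3}   # three-digit prefix
    -          # separator
    [0-9]{4}   # four-digit number
"""
result = re.sub(pattern, "###-####", "call 555-1234 now", flags=re.VERBOSE)
print(result)  # call ###-#### now
```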

Use Unicode word boundaries

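This controls the behaviour of \b in a pattern. When checked, word boundaries are determined by the definition of a word in Unicode Standard Annex #29 (UAX #29, Unicode Text Segmentation) rather than by the traditional regular expression definition.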

Examples

All examples assume that all other settings are left at their defaults.

Match and replace any non-numbers

This removes every non-digit character from the input string.

Pattern: [^0-9]

Replace pattern: (empty)

Input            Output
1234             1234
-1234            1234
1,234            1234
(555) 123 456    555123456
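
The table above can be reproduced with Python's re.sub as an illustrative stand-in for the node. The helper name is hypothetical.

```python
import re

def strip_non_digits(value: str) -> str:
    # Pattern [^0-9] with an empty replacement deletes every non-digit character.
    return re.sub(r"[^0-9]", "", value)

for s in ["1234", "-1234", "1,234", "(555) 123 456"]:
    print(f"{s!r:18} -> {strip_non_digits(s)!r}")
```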

Mask IPv4 addresses (basic)

This masks numeric IPv4 (Internet Protocol version 4) addresses by replacing each number with an underscore (_). Numeric IPv4 addresses have the form n.n.n.n, where each n is an integer in the range 0-255. For simplicity, the example matches only on the . delimiters and does not check how many digits each number has.

Pattern: ([0-9]*)\.([0-9]*)\.([0-9]*)\.([0-9]*)

Replace pattern: _._._._

Input              Output            Notes
127.0.0.1          _._._._           Valid IPv4 (matches)
127.1              127.1             Invalid IPv4 (no match)
127.0.0.1.12345    _._._._.12345     Matches and replaces the first 4 sections
12345.0.0.1        _._._._           Matches even though 12345 is not a valid IPv4 value
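
The rows above can be checked with Python's re, used as an approximation of ICU. For this simple pattern the two engines should agree. Replace all (the default mode) corresponds to re.sub with no count.

```python
import re

BASIC_IPV4 = r"([0-9]*)\.([0-9]*)\.([0-9]*)\.([0-9]*)"

def mask_basic(value: str) -> str:
    # Replace all (default mode): every match becomes _._._._
    return re.sub(BASIC_IPV4, "_._._._", value)

for s in ["127.0.0.1", "127.1", "127.0.0.1.12345", "12345.0.0.1"]:
    print(f"{s:16} -> {mask_basic(s)}")
```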

Mask IPv4 addresses (better)

This masks numeric IPv4 addresses by replacing each number with an underscore (_). Unlike the previous example, it also restricts each number to 1-3 digits.

Pattern: ([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})

Replace pattern: _._._._

Input              Output            Notes
127.0.0.1          _._._._           Valid IPv4 (matches)
127.1              127.1             Invalid IPv4 (no match)
127.0.0.1.12345    _._._._.12345     Matches and replaces the first 4 sections
12345.0.0.1        12_._._._         Matches because the substring 345.0.0.1 meets the pattern
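
Python's re can reproduce these rows as an approximation of ICU. In particular, the last row shows that an unanchored search finds the substring 345.0.0.1.

```python
import re

BETTER_IPV4 = r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})"

def mask_better(value: str) -> str:
    # Each number must be 1-3 digits, but the match may start mid-string.
    return re.sub(BETTER_IPV4, "_._._._", value)

for s in ["127.0.0.1", "127.1", "127.0.0.1.12345", "12345.0.0.1"]:
    print(f"{s:16} -> {mask_better(s)}")
```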

Mask IPv4 addresses (better still)

This masks numeric IPv4 addresses by replacing each number with an underscore (_). Like the previous example, it restricts each number to 1-3 digits. In addition, it requires the first number to be at the start of the string (specified with ^) and the last number to be at the end of the string (specified with $). This assumes that the input field contains only an IPv4 address.

Pattern: ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$

Replace pattern: _._._._

Input              Output             Notes
127.0.0.1          _._._._            Valid IPv4 (matches)
127.1              127.1              Invalid IPv4 (no match)
127.0.0.1.12345    127.0.0.1.12345    Invalid IPv4 (no match)
12345.0.0.1        12345.0.0.1        Invalid IPv4 (no match)
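
With the ^ and $ anchors, a match must span the whole value. The sketch below approximates the node with Python's re and reproduces the table. Both Python's $ and ICU's $ also match before a single trailing line terminator, so the test inputs here contain none.

```python
import re

ANCHORED_IPV4 = r"^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$"

def mask_anchored(value: str) -> str:
    # The whole value must be four 1-3 digit numbers separated by dots.
    return re.sub(ANCHORED_IPV4, "_._._._", value)

for s in ["127.0.0.1", "127.1", "127.0.0.1.12345", "12345.0.0.1"]:
    print(f"{s:16} -> {mask_anchored(s)}")
```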

Scripting

Settings

Node type name: regexp_replace

Setting                            Property              Type              Comment
Match field                        match_field           Field             -
Prefix match field to field names  prefix_match_field    Boolean           -
Pattern                            pattern               String            -
Replace field name                 replace_field_name    String            -
Replace pattern                    replace_pattern       String            -
Replace mode                       replace_mode          "all" or "first"  -
Case insensitive                   opt_case_insensitive  Boolean           -
Multiline                          opt_multiline         Boolean           -
Match '.' as line terminator       opt_dotall            Boolean           -
Comments in patterns               opt_comments          Boolean           -
Use Unicode word boundaries        opt_uword_boundaries  Boolean           -

Scripting Example

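# Create an RX Replace node in the current stream at canvas position (512, 192)
node = modeler.script.stream().createAt("regexp_replace", u"RX Replace", 512, 192)
# Match against the IPv4 field and use MaskedIP as the full new field name
node.setPropertyValue("match_field", u"IPv4")
node.setPropertyValue("prefix_match_field", False)
# Backslashes are doubled so that the pattern receives \. (a literal dot)
node.setPropertyValue("pattern", u"([0-9]*)\\.([0-9]*)\\.([0-9]*)\\.([0-9]*)")
node.setPropertyValue("replace_field_name", u"MaskedIP")
node.setPropertyValue("replace_pattern", u"_._._._")
# Replace only the first occurrence of the pattern in each value
node.setPropertyValue("replace_mode", u"first")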
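
Before putting a pattern into a script, it can help to preview its effect in plain Python. This sketch assumes Python's re is an adequate approximation of ICU for this simple pattern, and it mirrors replace_mode "first" with count=1. The helper name and the second sample value are hypothetical.

```python
import re

PATTERN = r"([0-9]*)\.([0-9]*)\.([0-9]*)\.([0-9]*)"

def preview_first(value: str) -> str:
    # replace_mode "first": only the first match in the value is replaced.
    return re.sub(PATTERN, "_._._._", value, count=1)

print(preview_first("127.0.0.1"))            # _._._._
print(preview_first("1.2.3.4 and 5.6.7.8"))  # _._._._ and 5.6.7.8
```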