RX Replace Node
Overview
Regular expressions are special text strings which are used to describe particular character patterns. The RX Replace node allows regular expressions to match those patterns within a string field and convert them to a different pattern. The replacement pattern can reference elements within the match pattern. The node creates a new field that contains the converted text.
The node uses the ICU Regular Expressions package. Full details can be found in the ICU regular expressions documentation.
Settings
Match field
This is used to select the string field containing the text that should be matched by the Pattern.
Prefix match field to field names
This specifies how the new field name should be generated:
- when checked (the default setting), the new field name is generated by joining the name of the Match field to the Replace field name value
- when unchecked, the new field name is the Replace field name value
Pattern
This defines the regular expression that will be matched against content of the Match field. Common regular expression components can be viewed and added by using the context menu in the Pattern text area.
Regular Expression Options…
These are described in Advanced Settings below.
Replace field name
This defines either the suffix which will be appended to the Match field or the full name of the new field, depending on the setting of Prefix match field to field names.
Replace pattern
This defines the replacement text that will be used to create the converted text in the output field. It can reference elements of the match, such as capture groups in the Pattern. If the Pattern does not match anything within the Match field, the output field will be the same as the Match field. Common replacement components can be viewed and added by using the context menu in the Replace pattern text area.
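As a rough illustration of a replacement that references parts of the match, here is a minimal sketch using Python's `re` module as a stand-in for ICU. The pattern, the sample text and the `swap_parts` helper are hypothetical, and the reference syntax differs: Python writes group references as `\1`, whereas ICU replacement text is assumed here to use the `$1` form.

```python
import re

# Python's re module stands in for ICU here; this is not the node's own engine.
# Two capture groups: the part before "@" and the host name before ".com".
PATTERN = r"(\w+)@(\w+)\.com"

# Python group references are \1 and \2 (ICU replacement text would use $1 and $2).
REPLACEMENT = r"\2:\1"

def swap_parts(text):
    """Rewrite every user@host.com in text as host:user."""
    return re.sub(PATTERN, REPLACEMENT, text)

print(swap_parts("alice@example.com"))
# Text without a match comes back unchanged, like the node's output field.
print(swap_parts("no address here"))
```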
Replace mode
This defines how many matches to perform on the Match field value:
- Replace all: (the default setting) match and replace all occurrences of the Pattern
- Replace first occurrence only: match and replace only the first occurrence of the Pattern
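The difference between the two modes can be sketched with Python's `re.sub`, whose `count` argument limits the number of replacements. This only approximates the node's behaviour: the `replace` helper is hypothetical, and its `mode` values simply borrow the `all` and `first` names from the scripting property.

```python
import re

def replace(text, pattern, replacement, mode="all"):
    """Replace every match (mode="all") or only the first one (mode="first")."""
    # In re.sub, count=0 means "no limit"; count=1 stops after the first match.
    count = 0 if mode == "all" else 1
    return re.sub(pattern, replacement, text, count=count)

print(replace("a-b-c", "-", "+"))                # every "-" is replaced
print(replace("a-b-c", "-", "+", mode="first"))  # only the first "-" is replaced
```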
Advanced Settings
These settings control the general behaviour of the regular expression matcher. The default is for all settings to be unchecked. These can generally be left in their default state.
Case insensitive
When checked, regular expression matching will ignore character case.
Multiline
By default, ^ and $ match the start and end of the input text. When checked, ^ and $ will also match the start and end of each line within the input text.
Match ‘.’ as line terminator
When checked, a . in a pattern will also match a line terminator in the input text. By default it will not.
Comments in patterns
When checked, white space and #comments are allowed within regular expression patterns.
Use Unicode word boundaries
This controls the behaviour of \b in a pattern. When checked, word boundaries are found according to the definition of a word in Unicode UAX #29.
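For readers who want to experiment with these options outside the node, the first four have close analogues among Python's `re` flags. The sketch below uses Python rather than ICU, so treat it as an approximation; the sample text is made up, and the Unicode word boundary option has no direct `re` equivalent and is not shown.

```python
import re

text = "Hello\nworld"  # two lines separated by a line terminator

# Case insensitive: "hello" only matches "Hello" when case is ignored.
case_default = re.findall("hello", text)
case_checked = re.findall("hello", text, re.IGNORECASE)

# Multiline: ^ matches at the start of the second line only with the flag.
multi_default = re.findall("^w", text)
multi_checked = re.findall("^w", text, re.MULTILINE)

# Dot matches line terminator: "." spans the newline only with the flag.
dot_default = re.search("o.w", text)
dot_checked = re.search("o.w", text, re.DOTALL)

# Comments in patterns: white space and "# ..." are ignored with the flag.
commented = re.findall(r"l+  # one or more l", text, re.VERBOSE)

print(case_default, case_checked)
print(multi_default, multi_checked)
print(dot_default is None, dot_checked.group(0) == "o\nw")
print(commented)
```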
Examples
All examples assume other settings are set to default.
Match and replace any non-numbers
This removes every non-digit character from the input string.
Pattern: [^0-9]
Replace pattern: empty
Input | Output |
---|---|
1234 | 1234 |
-1234 | 1234 |
1,234 | 1234 |
(555) 123 456 | 555123456 |
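The table above can be reproduced with a short sketch in Python's `re` module, used here as a stand-in for the node's ICU engine. The `strip_non_digits` helper is hypothetical; an empty replacement string plays the role of the empty Replace pattern.

```python
import re

def strip_non_digits(text):
    """Delete every character that is not an ASCII digit 0-9."""
    return re.sub(r"[^0-9]", "", text)

for value in ["1234", "-1234", "1,234", "(555) 123 456"]:
    print(value, "->", strip_non_digits(value))
```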
Mask IPv4 addresses (basic)
This masks out numeric IPv4 (Internet Protocol) addresses and replaces the numbers with underscore (_). Numeric IPv4 addresses have the form n.n.n.n where n is an integer in the range 0-255. For simplicity, the example matches against the . delimiter without checking the number of characters matched.
Pattern: ([0-9]*)\.([0-9]*)\.([0-9]*)\.([0-9]*)
Replace pattern: _._._._
Input | Output | Notes |
---|---|---|
127.0.0.1 | _._._._ | Valid IPv4 (matches) |
127.1 | 127.1 | Invalid IPv4 (no match) |
127.0.0.1.12345 | _._._._.12345 | Matches and replaces the first 4 sections |
12345.0.0.1 | _._._._ | Matches even though 12345 is not a valid IPv4 value |
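A minimal sketch of the basic mask, again using Python's `re` module in place of ICU; the `mask_basic` helper is a hypothetical name. It applies the same Pattern and Replace pattern to the four inputs from the table.

```python
import re

# Same pattern as the example: four runs of digits separated by literal dots.
BASIC = re.compile(r"([0-9]*)\.([0-9]*)\.([0-9]*)\.([0-9]*)")

def mask_basic(text):
    """Replace every match of the basic IPv4 pattern with _._._._"""
    return BASIC.sub("_._._._", text)

for value in ["127.0.0.1", "127.1", "127.0.0.1.12345", "12345.0.0.1"]:
    print(value, "->", mask_basic(value))
```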
Mask IPv4 addresses (better)
This masks out numeric IPv4 addresses and replaces the numbers with underscore (_). Unlike the previous example, this also matches the number of numeric characters (1-3 characters only).
Pattern: ([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})
Replace pattern: _._._._
Input | Output | Notes |
---|---|---|
127.0.0.1 | _._._._ | Valid IPv4 (matches) |
127.1 | 127.1 | Invalid IPv4 (no match) |
127.0.0.1.12345 | _._._._.12345 | Matches and replaces the first 4 sections |
12345.0.0.1 | 12_._._._ | Matches since 345.0.0.1 does meet the match pattern |
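The last row is easier to follow when the matched text is inspected directly. The sketch below uses Python's `re` module as a stand-in for ICU, with hypothetical helpers `mask_better` and `matched_text`: because the pattern is not anchored, the matcher skips the leading `12` and matches from `345` onwards.

```python
import re

# Each section is limited to 1-3 digits, but the pattern is not anchored.
BETTER = re.compile(r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})")

def mask_better(text):
    """Replace every match of the 1-3 digit IPv4 pattern with _._._._"""
    return BETTER.sub("_._._._", text)

def matched_text(text):
    """Return the first substring matched by the pattern, or None."""
    match = BETTER.search(text)
    return match.group(0) if match else None

print(matched_text("12345.0.0.1"))  # the part of the input that matches
print(mask_better("12345.0.0.1"))   # the leading "12" is left in place
```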
Mask IPv4 addresses (better still)
This masks out numeric IPv4 addresses and replaces the numbers with underscore (_). Like the previous example, this matches the number of numeric characters (1-3 characters only). However, it also requires that the first number is at the start of the string (specified with ^) and that the last number is at the end of the string (specified with $). This also assumes that the input field should only contain IPv4 addresses.
Pattern: ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$
Replace pattern: _._._._
Input | Output | Notes |
---|---|---|
127.0.0.1 | _._._._ | Valid IPv4 (matches) |
127.1 | 127.1 | Invalid IPv4 (no match) |
127.0.0.1.12345 | 127.0.0.1.12345 | Invalid IPv4 (no match) |
12345.0.0.1 | 12345.0.0.1 | Invalid IPv4 (no match) |
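The anchored version can be checked the same way. This is a sketch in Python's `re` module rather than ICU, and `mask_anchored` is a hypothetical helper; with ^ and $ in place, inputs that carry extra characters before or after the address are left untouched.

```python
import re

# ^ and $ force the four 1-3 digit sections to span the whole input.
ANCHORED = re.compile(r"^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$")

def mask_anchored(text):
    """Mask the input only when all of it has the n.n.n.n shape."""
    return ANCHORED.sub("_._._._", text)

for value in ["127.0.0.1", "127.1", "127.0.0.1.12345", "12345.0.0.1"]:
    print(value, "->", mask_anchored(value))
```

Note that this still accepts sections such as 999, since the pattern checks the digit count and not the 0-255 range.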
Scripting
Settings
Node type name: regexp_replace
Setting | Property | Type | Comment |
---|---|---|---|
Match field | match_field | Field | – |
Prefix match field to field names | prefix_match_field | Boolean | – |
Pattern | pattern | String | – |
Replace field name | replace_field_name | String | – |
Replace pattern | replace_pattern | String | – |
Replace mode | replace_mode | all or first | – |
Case insensitive | opt_case_insensitive | Boolean | – |
Multiline | opt_multiline | Boolean | – |
Match ‘.’ as line terminator | opt_dotall | Boolean | – |
Comments in patterns | opt_comments | Boolean | – |
Use Unicode word boundaries | opt_uword_boundaries | Boolean | – |
Scripting Example
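# Create an RX Replace node on the stream at position (512, 192)
node = modeler.script.stream().createAt("regexp_replace", u"RX Replace", 512, 192)
# Read the text to match from the IPv4 field
node.setPropertyValue("match_field", u"IPv4")
# Name the output field exactly "MaskedIP" (no Match field prefix)
node.setPropertyValue("prefix_match_field", False)
# Backslashes are doubled so that each \. reaches the node as a literal-dot escape
node.setPropertyValue("pattern", u"([0-9]*)\\.([0-9]*)\\.([0-9]*)\\.([0-9]*)")
node.setPropertyValue("replace_field_name", u"MaskedIP")
node.setPropertyValue("replace_pattern", u"_._._._")
# Replace only the first occurrence of the pattern
node.setPropertyValue("replace_mode", u"first")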