String Cleaner Node



The String Cleaner node provides common string cleaning operations in a simple-to-use form. Operations include removing non-standard characters, trimming leading and/or trailing whitespace, and changing capitalization. These operations can be applied to multiple input fields, and a new string output field is created for each input field.


The node settings are split into related sub-groups.


Clean fields

This is used to select which string fields should be cleaned.

Output suffix

New field names are generated by appending the output suffix to the name of each field selected in Clean fields. For example:

Clean fields: HomePhone, MobilePhone
Output suffix: _Cleaned
Output fields generated: HomePhone_Cleaned, MobilePhone_Cleaned
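As a minimal illustration, the naming rule amounts to appending the suffix to each selected field name. `output_field_names` is a hypothetical Python helper written for this page; it is not part of the node or of the Modeler scripting API.

```python
def output_field_names(clean_fields, output_suffix):
    """Return the new field name for each selected field (field name + suffix)."""
    return [name + output_suffix for name in clean_fields]


print(output_field_names(["HomePhone", "MobilePhone"], "_Cleaned"))
# ['HomePhone_Cleaned', 'MobilePhone_Cleaned']
```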


Leading and trailing spaces

This specifies how strings should be trimmed:

  • None: (the default setting) the string is not trimmed
  • Left: removes spaces at the start of the string
  • Right: removes spaces at the end of the string
  • Both: removes spaces at the start and end of the string
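The four trim modes can be sketched in plain Python. This is an approximation for illustration, not the node's implementation: `trim` is a hypothetical helper, and it assumes that only the space character is removed, because the descriptions above mention spaces only. The mode names match the values of the `trim_mode` scripting property.

```python
def trim(value, mode="none"):
    """Trim space characters according to the trim mode: none, left, right or both.

    Assumption: only ' ' is stripped; tabs and other whitespace are left alone.
    """
    if mode == "none":
        return value
    if mode == "left":
        return value.lstrip(" ")
    if mode == "right":
        return value.rstrip(" ")
    if mode == "both":
        return value.strip(" ")
    raise ValueError("unknown trim mode: %r" % mode)


for mode in ("none", "left", "right", "both"):
    print(mode, repr(trim("  abc  ", mode)))
# none '  abc  '
# left 'abc  '
# right '  abc'
# both 'abc'
```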

Replace tab with space

When checked, tab characters will be replaced with space characters.

Replace duplicate space or tab with space

When checked, 2 or more adjacent space or tab characters will be replaced with a single space character.
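A rough Python equivalent of these two options uses a regular expression for the duplicate-blank rule. `normalize_blanks` is hypothetical, and applying tab replacement before duplicate collapsing is an assumption about the order in which the node applies them. Under this reading, a single tab is left unchanged by the duplicate rule on its own, because that rule only affects runs of two or more characters.

```python
import re


def normalize_blanks(value, replace_tabs=False, replace_duplicate_blanks=False):
    # Replace each tab character with a single space.
    if replace_tabs:
        value = value.replace("\t", " ")
    # Collapse every run of two or more adjacent spaces/tabs into one space.
    if replace_duplicate_blanks:
        value = re.sub(r"[ \t]{2,}", " ", value)
    return value


print(repr(normalize_blanks("a\tb", replace_tabs=True)))               # 'a b'
print(repr(normalize_blanks("a \t  b", replace_duplicate_blanks=True)))  # 'a b'
print(repr(normalize_blanks("a\tb", replace_duplicate_blanks=True)))    # 'a\tb'
```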

Capitalize

This specifies how character case should be changed in the string:

  • Leave unchanged: (the default setting) character cases are not modified
  • ALL UPPER CASE: Any lower case characters are converted to the equivalent upper case characters
  • all lower case: Any upper case characters are converted to the equivalent lower case characters
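The case options map directly onto Python's string methods, as in the sketch below. `change_case` is a hypothetical helper whose mode names follow the `capitalize_mode` property values. Note that Python's `str.upper` and `str.lower` also convert non-English letters (for example é to É); whether the node does the same is not stated here.

```python
def change_case(value, mode="none"):
    """Apply the capitalize mode: none (unchanged), upper or lower."""
    if mode == "none":
        return value
    if mode == "upper":
        return value.upper()
    if mode == "lower":
        return value.lower()
    raise ValueError("unknown capitalize mode: %r" % mode)


print(change_case("Main St", "upper"))  # MAIN ST
print(change_case("Main St", "lower"))  # main st
```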

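Character categories

This section lists the character categories that can be checked (selected) or unchecked. Category handling determines whether characters in the selected categories are removed or kept.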

  • Upper case English characters: characters representing the letters A to Z
  • Lower case English characters: characters representing the letters a to z
  • Digits: characters representing the numbers 0 to 9
  • Punctuation: the punctuation characters !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
  • Blanks: space or tab characters
  • Spaces: space, tab, new line, vertical tab, form feed or carriage return characters
  • Non-printing characters: other characters that are not normally visible but can sometimes be included in strings
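To make the overlaps between categories concrete, the sketch below models each category as a set of characters. The `CATEGORIES` mapping and `categories_of` helper are hypothetical. Python's `string.punctuation` holds the 32 ASCII punctuation characters. Treating "non-printing" as the ASCII control characters other than whitespace is an assumption, since the exact definition is not documented.

```python
import string

WHITESPACE = set(" \t\n\v\f\r")

# Hypothetical mapping from each category to the set of characters it covers.
CATEGORIES = {
    "upper": set(string.ascii_uppercase),    # A to Z
    "lower": set(string.ascii_lowercase),    # a to z
    "digits": set(string.digits),            # 0 to 9
    "punctuation": set(string.punctuation),  # 32 ASCII punctuation characters
    "blanks": set(" \t"),                    # space or tab
    "spaces": WHITESPACE,                    # space, tab, \n, \v, \f, \r
    # Assumption: ASCII control characters (0-31 and 127) that are not whitespace.
    "non_printing": {chr(c) for c in range(32) if chr(c) not in WHITESPACE} | {chr(127)},
}


def categories_of(ch):
    """Return the sorted names of every category that contains the character ch."""
    return sorted(name for name, chars in CATEGORIES.items() if ch in chars)


print(categories_of("\t"))  # ['blanks', 'spaces']
print(categories_of("7"))   # ['digits']
```

A tab or space belongs to both Blanks and Spaces, so checking either category affects it.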

Category handling

This specifies how character categories should be handled:

  • Remove selected categories: (the default setting) characters belonging to the selected categories are removed from the string
  • Keep selected categories and remove others: characters belonging to the selected categories are kept and all other characters are removed
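The two handling modes can be sketched as a character filter. In this hypothetical `filter_categories` helper, `selected_chars` stands for the union of the characters in all checked categories, and the mode names follow the `categories_mode` property values. It illustrates the described behaviour rather than reproducing the node's code.

```python
import string


def filter_categories(value, selected_chars, mode="remove"):
    """Remove or keep the characters that belong to the selected categories.

    mode "remove" (default): drop characters found in selected_chars.
    mode "keep": drop every character NOT found in selected_chars.
    """
    if mode == "remove":
        return "".join(ch for ch in value if ch not in selected_chars)
    if mode == "keep":
        return "".join(ch for ch in value if ch in selected_chars)
    raise ValueError("unknown categories mode: %r" % mode)


punct = set(string.punctuation)  # only the Punctuation category checked
print(filter_categories("Hello, world!", punct, "remove"))  # Hello world
print(filter_categories("Hello, world!", punct, "keep"))    # ,!
```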

Examples

All examples assume other settings are set to default.

Clean phone numbers

This removes any non-digit characters from phone number strings.

Clean fields: MobilePhone
Output suffix: _Cleaned
Character categories: Digits
Category handling: Keep selected categories and remove others

Input (MobilePhone)    Output (MobilePhone_Cleaned)
+44 1234 56789         44123456789
(555) 567890           555567890
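Outside Modeler, the same transformation can be approximated in Python. `keep_digits` and `clean_records` are hypothetical helpers, and records are modelled as dictionaries. Keeping the original field alongside the new one is an assumption based on the statement that a new output field is created for each input field.

```python
import string

DIGITS = set(string.digits)


def keep_digits(value):
    """Keep only 0-9 (Digits checked, Category handling = keep)."""
    return "".join(ch for ch in value if ch in DIGITS)


def clean_records(records, clean_fields, output_suffix):
    """Return copies of the records with one new cleaned field per selected field."""
    cleaned = []
    for row in records:
        new_row = dict(row)  # assumption: the original fields are kept unchanged
        for field in clean_fields:
            new_row[field + output_suffix] = keep_digits(row[field])
        cleaned.append(new_row)
    return cleaned


rows = [{"MobilePhone": "+44 1234 56789"}, {"MobilePhone": "(555) 567890"}]
for row in clean_records(rows, ["MobilePhone"], "_Cleaned"):
    print(row["MobilePhone"], "->", row["MobilePhone_Cleaned"])
# +44 1234 56789 -> 44123456789
# (555) 567890 -> 555567890
```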



Node type name: regexp_cleaner

Setting | Property name | Values
Clean fields | clean_fields | String list
Output suffix | output_suffix | String
Trim | trim_mode | none, left, right or both
Replace tab with space | replace_tabs | Boolean
Replace duplicate space or tab with space | replace_duplicate_blanks | Boolean
Capitalize | capitalize_mode | none, upper or lower
Upper case English characters | find_upper_english_chars | Boolean
Lower case English characters | find_lower_english_chars | Boolean
Digits | find_digits | Boolean
Punctuation | find_punctuation | Boolean
Blanks | find_blanks | Boolean
Spaces | find_spaces | Boolean
Non-printing characters | find_non_printing_chars | Boolean
Category handling | categories_mode | remove or keep

Scripting Example

# Create a String Cleaner node in the current stream at position (512, 192)
stream = modeler.script.stream()
node = stream.createAt("regexp_cleaner", u"String Cleaner", 512, 192)
# Clean two fields; outputs are HomePhone_processed and MobilePhone_processed
node.setPropertyValue("clean_fields", [u"HomePhone", u"MobilePhone"])
node.setPropertyValue("output_suffix", u"_processed")
# Trim both ends, replace tabs and collapse repeated spaces/tabs
node.setPropertyValue("trim_mode", u"both")
node.setPropertyValue("replace_tabs", True)
node.setPropertyValue("replace_duplicate_blanks", True)
node.setPropertyValue("capitalize_mode", u"none")
# Select only the Digits category, then keep it and remove everything else
node.setPropertyValue("find_upper_english_chars", False)
node.setPropertyValue("find_lower_english_chars", False)
node.setPropertyValue("find_digits", True)
node.setPropertyValue("find_punctuation", False)
node.setPropertyValue("find_blanks", False)
node.setPropertyValue("find_spaces", False)
node.setPropertyValue("find_non_printing_chars", False)
node.setPropertyValue("categories_mode", u"keep")
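When many properties are set at once, it can help to validate the enumerated values before touching the node. The sketch below is a hypothetical helper, not part of the Modeler API: `apply_settings` checks `trim_mode`, `capitalize_mode` and `categories_mode` against the values listed in the properties table and only then calls `setPropertyValue`. `FakeNode` is a stand-in so the sketch runs outside Modeler; in a Modeler script you would pass the node created with `stream.createAt` instead. It avoids f-strings so it stays compatible with Python 2-style scripting environments.

```python
# Allowed values for the enumerated properties, copied from the properties table.
ENUM_VALUES = {
    "trim_mode": ("none", "left", "right", "both"),
    "capitalize_mode": ("none", "upper", "lower"),
    "categories_mode": ("remove", "keep"),
}


def apply_settings(node, settings):
    """Validate enumerated values first, then set every property on the node."""
    for name, value in settings.items():
        allowed = ENUM_VALUES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError("%s must be one of %s, got %r"
                             % (name, ", ".join(allowed), value))
    # Reached only if all values are valid, so a bad setting leaves the node untouched.
    for name, value in settings.items():
        node.setPropertyValue(name, value)


class FakeNode(object):
    """Stand-in for a Modeler node that records the properties it receives."""

    def __init__(self):
        self.props = {}

    def setPropertyValue(self, name, value):
        self.props[name] = value


node = FakeNode()
apply_settings(node, {"trim_mode": "both", "categories_mode": "keep", "find_digits": True})
print(node.props)
```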