Metadata node help page

Metadata Node

Overview
Mapping Settings
Local Types Settings
Examples
Built-in Types
Scripting
Trademarks

Overview

The Metadata node provides a convenient mechanism for defining metadata and associating that metadata with one or more fields. Unlike the Type node where each field has its own metadata definition, the Metadata node separates the metadata definition from the mapping of those definitions to fields. In addition, the metadata definitions can be obtained from another Metadata node which means a stream can include a "single source of truth" for all metadata. This makes the management of metadata much simpler than can be achieved using the standard Type node.

The Metadata node also includes a user-friendly interface (UI) which organises definitions of flags, ranges and sets in a clear fashion. It also allows multiple fields to be mapped to a metadata definition with a single click. The UI also makes it simpler to define sets with large numbers of discrete values by allowing rows copied from text files or Microsoft® Excel® to be pasted directly into the UI.

Finally, the Metadata node includes a number of built-in types that occur frequently in data:

  • 0/1 flag values (by default the Type node defines 0/1 values as an integer range)
  • integer sets representing days of the week, months of the year etc.
  • integer and real ranges for probabilities, percentages etc.

 

Mapping Settings

This tab defines where the metadata definitions are obtained from and how those definitions are mapped to fields.

Metadata mode

This defines where the node obtains its metadata definitions from:

  • Provider: (the default setting) use metadata defined in the Local Types Settings tab
  • Consumer: use metadata defined in the Metadata node specified by the Provider node setting

Provider node

When Metadata mode is set to Consumer, this specifies the ID of the Metadata node that provides the metadata.

Field types

This specifies which metadata definitions are mapped to which fields. Multiple fields can be selected from the Fields control and a single type from the Types control. Clicking the right arrow adds the selected fields and type to the Field types control. Mappings can be removed by selecting one or more rows in the Field types control and clicking the left arrow.

Local Types Settings

This tab defines the custom flag, set and range types that can be used by the Mapping Settings when Metadata mode is set to Provider. The tab has three subsections, one for each type of metadata: Ranges, Sets and Flags. Each subsection contains a table where each row represents a metadata definition.

Controls at the bottom of each table allow you to:

  • Create a new type definition from scratch
  • Clone/create a new type definition based on another definition (requires that a single row has been selected in the table)
  • Edit an existing type definition (requires that a single row has been selected in the table)
  • Delete type definitions (requires that one or more rows have been selected in the table)

Creating A New Type

All new types require:

  • a name
  • a unique ID
  • the type of value ("storage") - for sets and flags, this is either String, Integer or Real; for ranges, this is either Integer or Real

The remaining information depends on the metadata being created:

  • for ranges, this is the lower and upper bounds of the range
  • for sets, this is the list of valid values that make up the set in the data and (optionally) the value labels
  • for flags, this is the true and false values that appear in the data and (optionally) the true and false value labels

Cloning A Type

Sometimes it is simpler to start from an existing type when both types are similar to each other. Cloning creates a copy of an existing type so that the existing settings can be modified. Note that a new type ID must be specified.

Editing An Existing Type

This option allows an existing type to be edited, for example to modify the name or change the set members. Note that the unique ID cannot be modified.

Deleting Types

This option allows the selected types to be deleted.

Note that if a deleted definition is still mapped to one or more fields, those fields will be shown with an error symbol in the Field types setting. This will also happen in any Metadata nodes that consume this node's metadata. Fields mapped to non-existent types will have their metadata unchanged.
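The situation described above can be checked for in plain Python before a script deletes a definition. This is a minimal sketch, not the node's own implementation: the function name find_dangling_mappings and the sample field names are hypothetical, and the mappings are assumed to use the [field-name, type-id] pair format described in the Scripting section.

```python
def find_dangling_mappings(field_types, known_type_ids):
    """Return the [field-name, type-id] pairs whose type ID is not defined."""
    known = set(known_type_ids)
    return [pair for pair in field_types if pair[1] not in known]

# Hypothetical mappings: 'set_car_category' has been deleted from the provider.
field_types = [
    ["income", "range_household_income"],
    ["car", "set_car_category"],
    ["bought", "flag_purchased"],
]
known_type_ids = ["range_household_income", "flag_purchased"]

dangling = find_dangling_mappings(field_types, known_type_ids)
print(dangling)
```

Any pair returned by the function corresponds to a field that would be shown with an error symbol and whose metadata would be left unchanged.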

 

Examples

Defining A Household Income Range

  • Name: Change to Household Income. Note that as you type this, the ID will be updated
  • ID: range_household_income
  • Type: Change to Real
  • Lower: Change to 0.0
  • Upper: Change to 1000000.0 (i.e. one million)

Defining A Car Category Set

  • Name: Change to Car Category. Note that as you type this, the ID will be updated
  • ID: set_car_category
  • Type: Keep as String
  • Values: Enter the following values - one value per line, no commas: mpv, suv, hybrid, electric, other
  • Value labels: Enter the following values - one value per line, no commas: MPV, SUV, Hybrid, Electric, Other

Defining A Purchased Flag

  • Name: Change to Purchased. Note that as you type this, the ID will be updated
  • ID: flag_purchased
  • Type: Change to Integer
  • True value: Change to 1
  • True label: Change to Purchased
  • False value: Change to 0
  • False label: Change to Not purchased
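For comparison, the three examples above could be written as scripting definitions in the list formats described in the Scripting section. This is a sketch only: the variable names are hypothetical, and the now() helper mirrors the one shown under Note On Last modified.

```python
import time

def now():
    # Current time as a string, in the same format as the now() helper
    # in the Scripting section
    return str(int(round(time.time() * 1000)))

# Household Income range: bounds are strings even though the storage is real
income_range = ["range_household_income", "Household Income", now(),
                "real", "0.0", "1000000.0"]

# Car Category set: values and value labels are listed in the same order
car_values = ["mpv", "suv", "hybrid", "electric", "other"]
car_labels = ["MPV", "SUV", "Hybrid", "Electric", "Other"]
car_set = ["set_car_category", "Car Category", now(), "string",
           car_values, car_labels]

# Purchased flag: true value, true label, false value, false label
purchased_flag = ["flag_purchased", "Purchased", now(), "integer",
                  "1", "Purchased", "0", "Not purchased"]
```

Note that a set definition written this way still needs the structure conversion described under Set Definition before it can be assigned to the node.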

Built-in Types

A number of common type definitions are built in to the Metadata Provider node.

Type ID | Description | Storage | Measure
sys_int_range_0_10 | Integer range (0 - 10) | integer | range
sys_int_range_1_10 | Integer range (1 - 10) | integer | range
sys_int_range_1_12 | Integer range (1 - 12) | integer | range
sys_int_range_0_23 | Integer range (0 - 23) | integer | range
sys_int_range_0_59 | Integer range (0 - 59) | integer | range
sys_int_range_0_100 | Integer range (0 - 100) | integer | range
sys_int_range_1_100 | Integer range (1 - 100) | integer | range
sys_real_range_0_1 | Real range (0.0 - 1.0) | real | range
sys_real_range_0_10 | Real range (0.0 - 10.0) | real | range
sys_real_range_1_10 | Real range (1.0 - 10.0) | real | range
sys_real_range_0_100 | Real range (0.0 - 100.0) | real | range
sys_real_range_1_100 | Real range (1.0 - 100.0) | real | range
sys_set_hours_0_23 | 24 Hours: integer set (0 - 23) | integer | set
sys_set_hours_1_24 | 24 Hours: integer set (1 - 24) | integer | set
sys_set_time_0_59 | Minutes/seconds: integer set (0 - 59) | integer | set
sys_set_time_1_60 | Minutes/seconds: integer set (1 - 60) | integer | set
sys_set_days_0_6_mon | Days of week: integer set (0=Monday - 6=Sunday) | integer | set
sys_set_days_1_7_mon | Days of week: integer set (1=Monday - 7=Sunday) | integer | set
sys_set_days_0_6_sun | Days of week: integer set (0=Sunday - 6=Saturday) | integer | set
sys_set_days_1_7_sun | Days of week: integer set (1=Sunday - 7=Saturday) | integer | set
sys_set_month_day_1_31 | Day in month: integer set (1 - 31) | integer | set
sys_set_month_1_12 | Months: integer set (1=January - 12=December) | integer | set
sys_flag_1_0_tf | True/False: integer flag (1=True, 0=False) | integer | flag
sys_flag_1_0_yn | Yes/No: integer flag (1=Yes, 0=No) | integer | flag
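As an illustration, a script could build a Field types mapping that points fields at built-in type IDs. This sketch assumes built-in IDs are accepted in the field_types property in the same way as locally defined IDs, which this page does not state explicitly. The helper map_fields and all field names are hypothetical.

```python
def map_fields(field_names, type_id):
    """Pair each field name with the same type ID."""
    return [[name, type_id] for name in field_names]

# Hypothetical field names mapped to built-in type IDs from the table above
field_types = (
    map_fields(["order_hour", "delivery_hour"], "sys_set_hours_0_23")
    + map_fields(["churn_probability"], "sys_real_range_0_1")
    + map_fields(["is_member"], "sys_flag_1_0_yn")
)

# Inside an SPSS Modeler script the result would then be assigned with:
# node.setPropertyValue("field_types", field_types)
print(field_types)
```

Mapping several fields to one type in a single call mirrors selecting multiple fields and one type in the Mapping Settings tab.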

Scripting

Settings

Node type name: metadata_provider

Setting | Property | Type | Comment
Metadata mode | metadata_mode | provider or consumer | -
Provider node | metadata_provider | String | -
Field types | field_types | List of [field-name, type-id] | See below
Local Types: Ranges | fixed_ranges | List of Range Definition | See below
Local Types: Sets | fixed_sets | List of Set Definition | See below
Local Types: Flags | fixed_flags | List of Flag Definition | See below
- | rebuild_metadata | Boolean | Allows a script to force a rebuild of the node's output data model

Type Definitions

All type definitions contain the following values:

  • ID (string): a unique ID for this type
  • Name (string): a display name
  • Last modified (string): a time stamp
  • Value type (string): one of integer or real for ranges; one of string, integer or real for sets and flags

The remaining values depend on which type is being defined.

Note: the values in the type definitions are always defined as strings, regardless of what the defined value type is. This is needed because IBM SPSS Modeler requires a fixed type definition.

Range Definition

A range definition consists of the following fields:

  • ID (string): a unique ID for this type
  • Name (string): a display name
  • Last modified (string): a time stamp (see Note On Last modified below)
  • Value type (string): one of integer or real
  • Lower bound (string): the lower bound as a string
  • Upper bound (string): the upper bound as a string

For example:

# Assumes the function 'now()' has already been defined
rangeDef = ["my_range", "My range", now(), "real", "-100.0", "100.0"]
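Because bounds must be supplied as strings, a small helper can do the conversion and catch an invalid value type early. The following is a minimal sketch: make_range_def is a hypothetical name, and the check that the lower bound does not exceed the upper bound is an assumption rather than a documented rule.

```python
import time

def now():
    # Same format as the now() helper described under Note On Last modified
    return str(int(round(time.time() * 1000)))

def make_range_def(type_id, name, value_type, lower, upper):
    """Build a range definition list with the bounds converted to strings."""
    if value_type not in ("integer", "real"):
        raise ValueError("Ranges must be integer or real, got: " + str(value_type))
    # Assumption: a range whose lower bound exceeds its upper bound is a mistake
    if lower > upper:
        raise ValueError("Lower bound must not exceed upper bound")
    return [type_id, name, now(), value_type, str(lower), str(upper)]

range_def = make_range_def("my_range", "My range", "real", -100.0, 100.0)
print(range_def[3:])  # ['real', '-100.0', '100.0']
```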

Set Definition

A set definition consists of the following fields:

  • ID (string): a unique ID for this type
  • Name (string): a display name
  • Last modified (string): a time stamp (see Note On Last modified below)
  • Value type (string): one of string, integer or real
  • Values (list of string): the valid set values
  • Value labels (list of string): the set labels in the same order as the values

Note: there appears to be an issue in IBM SPSS Modeler scripting that prevents value lists within structures from being processed correctly. For example:

# Assumes the function 'now()' has already been defined
setDef = ["my_set", "My set", now(), "integer", ["0", "1", "2"], ["Low", "Medium", "High"]]

does not work as expected and results in a set type with no values or value labels. To work around this, convert the list definition into an IBM SPSS Modeler structured value, and ensure the list values are also converted to valid Java® values (i.e. a Java list containing Java strings).

The following Python script snippet defines functions to do these tasks:

import java.lang.String
import java.util.ArrayList

def createStructure(node, propertyName):
    # Create an empty structured value for the named property
    typeDef = node.getStructuredPropertyDefinition(propertyName)
    pf = modeler.script.session().getPropertyFactory()
    return pf.createDefaultStructuredValue(typeDef)

def fillStructure(structure, values):
    # Copy each value into the structure by position
    index = 0
    for value in values:
        structure = structure.changeAttributeValue(index, value)
        index += 1
    return structure

def toStringList(values):
    # Convert a Python list into a Java list of Java strings
    jlist = java.util.ArrayList()
    for value in values:
        jlist.add(java.lang.String(value))
    return jlist

# Convert the values and value labels to Java objects by calling
# "toStringList()" on them.
setDef = ["my_set", "My set", now(), "integer", toStringList(["0", "1", "2"]), toStringList(["Low", "Medium", "High"])]

# Now convert the definition into an SPSS Modeler structure.
# This requires access to a metadata node in order to get the
# structure definition e.g. if one already exists:
node = modeler.script.stream().findByType("metadata_provider", None)

# Need to specify which property to make sure the correct structure
# definition is created.
setTypeDef = createStructure(node, "fixed_sets")
setStructure = fillStructure(setTypeDef, setDef)
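Before the Java conversion, it can help to build and check the plain set definition in ordinary Python. The sketch below is an illustration under stated assumptions: make_set_def is a hypothetical helper, and it requires one label per value even though labels are optional in the UI. The two lists it returns would still need to be passed through toStringList() and the structure conversion shown above.

```python
import time

def now():
    # Same format as the now() helper described under Note On Last modified
    return str(int(round(time.time() * 1000)))

def make_set_def(type_id, name, value_type, values, labels):
    """Build a plain set definition with values and labels as lists of strings."""
    if value_type not in ("string", "integer", "real"):
        raise ValueError("Unsupported value type: " + str(value_type))
    # Assumption: this sketch requires exactly one label per value
    if len(values) != len(labels):
        raise ValueError("Values and value labels must have the same length")
    return [type_id, name, now(), value_type,
            [str(v) for v in values], [str(label) for label in labels]]

set_def = make_set_def("my_set", "My set", "integer",
                       [0, 1, 2], ["Low", "Medium", "High"])
print(set_def[4])  # ['0', '1', '2']
```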

 

Flag Definition

A flag definition consists of the following fields:

  • ID (string): a unique ID for this type
  • Name (string): a display name
  • Last modified (string): a time stamp (see Note On Last modified below)
  • Value type (string): one of string, integer or real
  • True value (string): the value representing true
  • True label (string): the label for the true value
  • False value (string): the value representing false
  • False label (string): the label for the false value

For example:

# Assumes the function 'now()' has already been defined
flagDef = ["my_flag", "My flag", now(), "string", "y", "Yes", "n", "No"]

Note On Last modified

The Last modified value is the number of milliseconds since Jan 1st 1970, represented as a string. It is stored as a string because IBM SPSS Modeler does not support a long value type, which would be required to represent the value accurately as a number.

The following Python script snippet defines a function called now() which generates a string in the correct format using the current time:

import time

def now():
    # time.time() returns seconds, so multiply by 1000 for milliseconds
    return str(int(round(time.time() * 1000)))

# Call now() when creating a type definition such as a flag:
flagDef = ["my_flag", "My flag", now(), "string", "y", "Yes", "n", "No"]
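To inspect a Last modified value, the string can be converted back into a date. This is a small sketch that assumes the value was generated as in now() above, i.e. time.time() multiplied by 1000, giving milliseconds. The function name last_modified_to_datetime is hypothetical.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

def last_modified_to_datetime(last_modified):
    """Convert a Last modified string (milliseconds since Jan 1st 1970) to a UTC datetime."""
    return EPOCH + datetime.timedelta(milliseconds=int(last_modified))

print(last_modified_to_datetime("86400000"))  # 1970-01-02 00:00:00
```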

 

Scripting Example

import time
import java.lang.String
import java.util.ArrayList

def now():
    return str(int(round(time.time() * 1000)))

def createStructure(node, propertyName):
    typeDef = node.getStructuredPropertyDefinition(propertyName)
    pf = modeler.script.session().getPropertyFactory()
    return pf.createDefaultStructuredValue(typeDef)

def fillStructure(structure, values):
    index = 0
    for value in values:
        structure = structure.changeAttributeValue(index, value)
        index += 1
    return structure

def toStringList(values):
    jlist = java.util.ArrayList()
    for value in values:
        jlist.add(java.lang.String(value))
    return jlist

main_node = modeler.script.stream().createAt("metadata_provider", u"Metadata Provider", 192, 192)

# Note that values are specified as strings even though the storage is real
rangeDef = ["temperature_range", "Temperature Range", now(), "real", "-20.0", "50.0"]
# Note that values are specified as strings even though the storage is integer
setDef = ["question_response", "User response", now(), "integer",
    toStringList(["0", "1", "2", "3", "4"]),
    toStringList(["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"])]
flagDef = ["yn_indicator", "Y/N indicator", now(), "string", "y", "Yes", "n", "No"]

# Remember to convert the basic set definition into a structure
setStructure = fillStructure(createStructure(main_node, "fixed_sets"), setDef)

main_node.setPropertyValue("fixed_flags", [flagDef])
main_node.setPropertyValue("fixed_sets", [setStructure])
main_node.setPropertyValue("fixed_ranges", [rangeDef])

# Finally map input fields to the types
main_node.setPropertyValue("field_types", [
    ["external_temp", "temperature_range"],
    ["internal_temp", "temperature_range"],
    ["question1", "question_response"],
    ["question2", "question_response"],
    ["question3", "question_response"],
    ["would_buy_now", "yn_indicator"],
    ["would_buy_later", "yn_indicator"]
])

# Now create a consumer that can re-use the metadata definitions from the main metadata node
consumer_node = modeler.script.stream().createAt("metadata_provider", u"Metadata Consumer", 192, 384)
consumer_node.setPropertyValue("metadata_provider", main_node.getID())
consumer_node.setPropertyValue("metadata_mode", "consumer")

# Need to map the fields at this node to the consumed types
consumer_node.setPropertyValue("field_types", [
    ["indoor_temp", "temperature_range"],
    ["outdoor_temp", "temperature_range"],
    ["q1", "question_response"],
    ["q2", "question_response"],
    ["q3", "question_response"]
])

Trademarks

IBM and SPSS are registered trademarks of International Business Machines Corp. Java is a registered trademark of Oracle Corp. and/or its affiliates.