ColumnProperties - DSS 6 | Data Source Solutions Documentation

ColumnProperties

Action ColumnProperties defines properties of a column. The column is matched by using either parameter Name or parameter DatatypeMatch. The action itself has no effect other than the effect of the other parameters used. It affects replication (on both the Capture and Integrate sides), Refresh, and Compare.


Parameters

This section describes the parameters available for action ColumnProperties.

The dialog offers two tabs for defining action parameters:

  • Regular: Allows you to define the required parameters using UI elements such as checkboxes and text fields.
  • Text: Allows you to define the required parameters by typing them in the text field. You can also copy and paste action definitions from Data Source Solutions DSS documentation, emails, or demo notes.

Name

Argument: col_name

Description: Match a column by name. This is the name of the column in the DSS_COLUMN repository table.

The col_name should not be the same as the substitution defined in CaptureExpression and IntegrateExpression. DSS will not populate values for the column if col_name and substitution (sql_expr) are the same. For example, when IntegrateExpression=dss_op is used then in that action definition Name=dss_op should not be used.


DatatypeMatch

Argument: datatypematch

Description: Match column by data type (instead of Name).

Value datatypematch can either be

  • a single data type name (such as number) or

  • a data type name with conditions, specified in the format: datatype[condition].

    The format for condition is attribute operator value, where:

    • attribute can be prec, scale, bytelen, charlen, encoding, or null
    • operator can be =, <>, !=, <, >, <=, or >=
    • value is an integer or a single-quoted string

    Multiple conditions can be supplied, separated by &&.

Examples

DatatypeMatch="number"
DatatypeMatch="number[prec>=19]"
DatatypeMatch="varchar[bytelen>200]"
DatatypeMatch="varchar[encoding='UTF-8' && null='true']"
DatatypeMatch="number[prec=0 && scale=0]" - matches Oracle numbers without any explicit precision or scale.

This parameter can be used to associate a ColumnProperties action with all columns that match the specified data type and the optional attribute conditions.
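
The matching rules above can be sketched in Python. This is only an illustration of the documented syntax (datatype[condition && condition]), not how DSS evaluates it; the function name matches and the dictionary layout used for a column are assumptions made for the example, as is the choice to treat a missing attribute or a type mismatch as "no match".

```python
import operator
import re

# Operators listed in the text; "<>" and "!=" are both treated as "not equal" (assumption).
OPS = {
    "=": operator.eq, "<>": operator.ne, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge,
}
COND_RE = re.compile(
    r"^\s*(prec|scale|bytelen|charlen|encoding|null)\s*(<=|>=|<>|!=|=|<|>)\s*(.+?)\s*$"
)
PATTERN_RE = re.compile(r"^\s*(\w+)\s*(?:\[(.*)\])?\s*$")


def parse_value(text):
    """A value is an integer or a single-quoted string."""
    if len(text) >= 2 and text[0] == "'" and text[-1] == "'":
        return text[1:-1]
    return int(text)


def matches(pattern, column):
    """Return True if the column (a dict, hypothetical layout) satisfies the pattern."""
    m = PATTERN_RE.match(pattern)
    if m is None:
        raise ValueError(f"cannot parse pattern {pattern!r}")
    datatype, conditions = m.group(1), m.group(2)
    if column.get("datatype") != datatype:
        return False
    if conditions is None:
        return True
    for cond in conditions.split("&&"):
        cm = COND_RE.match(cond)
        if cm is None:
            raise ValueError(f"cannot parse condition {cond!r}")
        attr, op, raw = cm.groups()
        expected = parse_value(raw)
        actual = column.get(attr)
        # Assumption: a missing attribute or a type mismatch never matches.
        if actual is None or type(actual) is not type(expected):
            return False
        if not OPS[op](actual, expected):
            return False
    return True
```

For instance, matches("number[prec>=19]", {"datatype": "number", "prec": 20}) is true, while the same pattern does not match a number column with prec 10.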


BaseName

Argument: col_name

Description: Defines the actual name of the column in the database location, as opposed to the column name that DSS has in the channel.

This parameter is needed if the base name of the column is different in the capture and integrate locations. In that case, the column name in the DSS channel should have the same name as the 'base name' in the capture database and parameter BaseName should be defined on the integrate side. An alternative is to define the BaseName parameter on the capture database and have the name for the column in the DSS channel the same as the base name in the integrate database.
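
The relationship between the channel name and the base name can be pictured as a simple rename applied at one location. The Python sketch below is illustrative only; the column names cust_id and customer_id and the helper to_base_names are hypothetical.

```python
def to_base_names(channel_row, base_names):
    """Rename channel column names to the base names used in one location."""
    return {base_names.get(col, col): value for col, value in channel_row.items()}


# Hypothetical example: the channel (and the capture table) call the column
# "cust_id", while the integrate table calls it "customer_id". This corresponds
# to defining BaseName on the integrate side only.
integrate_base_names = {"cust_id": "customer_id"}
row = {"cust_id": 42, "name": "Ada"}
renamed = to_base_names(row, integrate_base_names)
```

Columns without a BaseName entry keep the channel name, which mirrors the case where the base name and the channel name are identical.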

The concept of the base name in a location as opposed to the name in the DSS channel applies to both columns and tables, see parameter BaseName in action TableProperties.

This parameter can also be defined for file locations (to change the name of the column in the XML tag) or for Salesforce locations (to match the Salesforce API name).

This parameter cannot be used together with Extra and Absent.


Extra

Description: Column exists in the database but not in the DSS_COLUMN repository table. If a column has parameter Extra defined then its value is not captured and not read during Refresh or Compare. If the value is omitted then the appropriate default value is used (null, zero, empty string, etc.).

  • This parameter requires parameter Datatype or SoftDelete.

  • This parameter cannot be used on columns that are part of the replication key. Also, parameters Extra and Absent cannot both be defined on the same column in a given database, nor can either be combined on a column with parameter BaseName.

  • This parameter cannot be used together with parameters BaseName and Absent.


Absent

Description: Column does not exist in the database table. If no value is supplied with parameter CaptureExpression then an appropriate default value is used (null, zero, empty string, etc.). When replicating between two tables with a column that is in one table but is not in the other there are two options: either register the table in the DSS repository tables with all columns and add parameter Absent; or register the table without the extra column and add parameter Extra. The first option may be slightly faster because the column value is not sent over the network.

  • This parameter cannot be used on columns that are part of the replication key. Also, parameters Extra and Absent cannot both be defined on the same column in a given database, nor can either be combined on a column with parameter BaseName.

  • This parameter cannot be used together with parameters BaseName and Extra.


CaptureExpression

Argument: sql_expr

Description: SQL expression for column value when capturing changes or reading rows. This value may be a constant value or an SQL expression. This parameter can be used to 'map' data values between a source and a target table. An alternative way to map values is to define an SQL expression on the target side using parameter IntegrateExpression. Possible SQL expressions include null, 5, or 'hello'. For many databases (e.g., Oracle and SQL Server) a subselect can be supplied, for example SELECT descrip FROM lookup WHERE id = {id};.

The SQL expression sql_expr can contain the following substitutions:

  • {colname [spec]} is replaced/substituted with the value of current table's column colname. If the target column has a character-based data type or if parameter Datatype=character_data_type is defined, the default format is %[localtime] %Y-%m-%d %H:%M:%S. The default format can be overridden using the timestamp substitution format specifier spec. For more information, see Timestamp Substitution Format Specifier.

  • {colname %[allow_missings]}: if the value of column colname is missing, the %[allow_missings] specifier causes DSS to substitute a default value (0 or an empty string) instead of raising an error. %[allow_missings] must be the first specifier if there is more than one (e.g., {%[allow_missings] %[localtime] %H%M%S}).

  • {dss_cap_loc} is replaced with the name of the source location where the change occurred.

  • {dss_cap_tstamp [spec]} is replaced with the moment (time) that the change occurred in the source location. If the target column has a character-based data type or if parameter Datatype=character_data_type then the default format is %[localtime] %Y-%m-%d %H:%M:%S, but this can be overridden using the timestamp substitution format specifier spec. For more information, see Timestamp Substitution Format Specifier.

  • {dss_cap_user} is replaced with the name of the user who made the change.

  • {{dss_col_name [spec]}} is replaced with the value of the current column. If the target column has a character-based data type or if parameter Datatype=character_data_type is defined, the default format is %[localtime] %Y-%m-%d %H:%M:%S. The default format can be overridden using the timestamp substitution format specifier spec. For more information, see Timestamp Substitution Format Specifier.

  • {dss_slice_num} is replaced with the current slice number if slicing is defined with Count (option -S num) in dssrefresh or Compare.

  • {dss_slice_total} is replaced with the total number of slices if slicing is defined with Count (option -S num) in dssrefresh or Compare.

  • {dss_slice_value} is replaced with the current slice value if slicing is defined with Series (option -S val1[;val2]...) in dssrefresh or Compare.

  • {dss_var_xxx} is replaced with the value of 'context variable' xxx. The value of a context variable can be supplied using option -Vxxx=val in dssrefresh or Compare.

  • It is recommended to define parameter Context when using the substitutions {dss_slice_num}, {dss_slice_total}, {dss_slice_value}, or {dss_var_xxx} so that it can be easily disabled or enabled.

  • {dss_slice_num}, {dss_slice_total}, and {dss_slice_value} cannot be used if one of the old slicing substitutions {dss_var_slice_condition}, {dss_var_slice_num}, {dss_var_slice_total}, or {dss_var_slice_value} is defined in the channel/table involved in the compare/refresh.


For more information on how to substitute column values into SQL expressions, see the Substituting Column Values Into Expressions section below.


CaptureExpressionType

Argument: expr_type

Description: Type of mechanism used by Capture, Refresh, and Compare jobs to evaluate the value in parameter CaptureExpression.

Available options for expr_type are:

  • SQL_PER_CYCLE (default for database locations if the capture expression matches a pattern in file $DSS_HOME/etc/constsqlexpr.pat): The capture job only evaluates the expression once per replication cycle, so every row captured by that cycle will get the same value. It requires fewer database 'round-trips' than SQL_PER_ROW and SQL_WHERE_ROW. For Refresh and Compare jobs the expression is just included in the main SELECT statement, so no extra database round-trips are used and the database could assign each row a different value.

This type is not supported for file locations.

  • SQL_PER_ROW (default for database locations if the capture expression does not match a pattern in file $DSS_HOME/etc/constsqlexpr.pat): The capture job evaluates the expression for each change captured. This means every row captured by that cycle could get a different value but requires more database 'round-trips' than SQL_PER_CYCLE. For Refresh and Compare jobs the expression is just included in the main SELECT statement, so no extra database round-trips are used and the database could assign each row a different value.

This type is not supported for file locations.

  • SQL_WHERE_ROW: The capture job evaluates the expression for each change captured but with an extra WHERE clause containing the key value for the table on which the change occurred. This allows that expression to include expressions like {colx} which reference other columns of that table. Each row captured could get a different value but requires more database 'round-trips' than SQL_PER_CYCLE. For Refresh and Compare jobs the expression is just included in the main SELECT statement (without the extra WHERE clause), so no extra database round-trips are used and the database could assign each row a different value.

This type is not supported for file locations.
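
The difference in round-trips between the three options can be illustrated with a toy model of a capture cycle. This Python sketch covers only the capture-job behaviour described above (not Refresh or Compare), and the names capture_cycle and evaluate are hypothetical; evaluate stands in for one database round-trip.

```python
from itertools import count


def capture_cycle(changes, evaluate, expr_type):
    """Return (values, round_trips) for one simulated capture cycle.

    `evaluate(key)` stands in for one database round-trip; `key` is None
    unless the expression is evaluated with a WHERE clause on the row key.
    """
    values, round_trips = [], 0
    if expr_type == "SQL_PER_CYCLE":
        if changes:
            shared = evaluate(None)          # evaluated once for the whole cycle
            round_trips = 1
            values = [shared for _ in changes]
    elif expr_type == "SQL_PER_ROW":
        for _ in changes:
            values.append(evaluate(None))    # evaluated once per change
            round_trips += 1
    elif expr_type == "SQL_WHERE_ROW":
        for change in changes:
            values.append(evaluate(change["key"]))  # per change, restricted by key
            round_trips += 1
    else:
        raise ValueError(f"unknown expression type {expr_type!r}")
    return values, round_trips


# Demo: an "expression" whose value changes on every evaluation.
ticker = count(1)
changes = [{"key": 10}, {"key": 11}, {"key": 12}]
per_cycle = capture_cycle(changes, lambda key: next(ticker), "SQL_PER_CYCLE")
per_row = capture_cycle(changes, lambda key: next(ticker), "SQL_PER_ROW")
```

In the demo, the per-cycle run assigns one shared value to all three changes with a single evaluation, whereas the per-row run evaluates three times and can yield three different values.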


IntegrateExpression

Argument: sql_expr

Description: SQL expression for column value when integrating changes or loading data into a target table. DSS may evaluate the expression itself or use it as an SQL expression. This parameter can be used to 'map' values between a source and a target table. An alternative way to map values is to define an SQL expression on the source side using CaptureExpression. Possible expressions include null, 5, or 'hello'. For many databases (e.g., Oracle and SQL Server), a subselect can be supplied, for example SELECT descrip FROM lookup WHERE id = {id};.

When the integrate method is set to APPEND, an IntegrateExpression that must be delegated to the database for evaluation cannot be used.

The SQL expression sql_expr can contain the following substitutions:

This substitution is recommended when action ColumnProperties is defined with parameter TimeKey and the channel has a single source location.

This substitution only has a value during Integrate or if Select Moment - Specific (option -Mtime) is specified during Refresh.

  • It is recommended to define parameter Context when using the substitutions {dss_slice_num}, {dss_slice_total}, {dss_slice_value}, or {dss_var_xxx} so that it can be easily disabled or enabled.

  • {dss_slice_num}, {dss_slice_total}, and {dss_slice_value} cannot be used if one of the old slicing substitutions {dss_var_slice_condition}, {dss_var_slice_num}, {dss_var_slice_total}, or {dss_var_slice_value} is defined in the channel/table involved in the compare/refresh.


For more information on how to substitute column values into SQL expressions, see the Substituting Column Values Into Expressions section below.


ExpressionScope

Argument: expr_scope

Description: Scope for which operations (e.g., INSERT or DELETE) an integrate expression (parameter IntegrateExpression) should be used.

Available options for expr_scope are:

This parameter can be used only when action Integrate is defined with parameter Burst. It is ignored for database targets if parameter Burst is not defined, and for file targets (such as HDFS or S3). This burst restriction means that no scopes exist yet for 'update before' operations (such as UPDATE_BEFORE_KEY and UPDATE_BEFORE_NONKEY). Only Bulk Refresh obeys this parameter (it always uses scope INSERT); Row-wise Refresh ignores the expression scope. The value of the affected IntegrateExpression parameter can contain its regular substitutions, except for {dss_op}, which cannot be used.

Example 1: To add a column opcode to a target table (defined with parameter SoftDelete) containing values 'I', 'U', and 'D' (for insert, update, and delete respectively), define these actions:

Action            Parameters
ColumnProperties  Name=opcode IntegrateExpression="'I'" ExpressionScope=INSERT Datatype=varchar Length=1 Nullable Extra
ColumnProperties  Name=opcode IntegrateExpression="'U'" ExpressionScope=UPDATE Datatype=varchar Length=1 Nullable Extra
ColumnProperties  Name=opcode IntegrateExpression="'D'" ExpressionScope=DELETE Datatype=varchar Length=1 Nullable Extra
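
As a rough illustration of how the three scoped actions in Example 1 divide the work, the Python sketch below selects the expression whose scope equals the operation being integrated. It is a simplified model, not DSS behaviour; the ACTIONS list and the function expressions_for are hypothetical names.

```python
# Each entry mirrors one ColumnProperties line from Example 1:
# (column name, integrate expression, expression scope).
ACTIONS = [
    ("opcode", "'I'", "INSERT"),
    ("opcode", "'U'", "UPDATE"),
    ("opcode", "'D'", "DELETE"),
]


def expressions_for(operation, actions=ACTIONS):
    """Return {column: expression} for the actions whose scope equals the operation."""
    return {name: expr for name, expr, scope in actions if scope == operation}
```

With these definitions, an insert yields the expression 'I' for column opcode and a (soft) delete yields 'D'.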

Example 2: To add a column insdate (only filled when a row is inserted) and column upddate (filled on update and SoftDelete), define these actions:

Action            Parameters
ColumnProperties  Name=insdate IntegrateExpression=sysdate ExpressionScope=INSERT Datatype=timestamp Extra
ColumnProperties  Name=upddate IntegrateExpression=sysdate ExpressionScope=DELETE Datatype=timestamp Extra

CaptureFromRowId

Description: Capture values from the table's DBMS row-id (Oracle, HANA) or Relative Record Number (RRN in Db2 for i). Define on the capture location.

This parameter is supported only for certain location classes. For the list of supported location classes, see Log-based capture from hidden rowid/RRN column in Capabilities.

This parameter is not supported for Oracle's Index Organized Tables (IOT).


TrimDatatype

Argument: int

Description: Reduce the width of the data type when selecting or capturing changes. This parameter affects string data types (such as varchar, nvarchar, and clob) and binary data types (such as raw and blob). The value int is a limit in bytes; if this value is exceeded then the column's value is truncated (from the right) and a warning is written.
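
The truncation rule can be sketched as follows. This is a minimal Python illustration of "truncate from the right at int bytes", working on raw bytes only; the function name trim_value is hypothetical, and the returned flag merely stands in for the warning that DSS writes.

```python
def trim_value(value: bytes, limit: int):
    """Truncate `value` from the right to at most `limit` bytes.

    Returns (trimmed_value, was_truncated); the flag stands in for the warning.
    """
    if limit < 0:
        raise ValueError("limit must not be negative")
    if len(value) > limit:
        return value[:limit], True
    return value, False
```

A 12-byte value with a limit of 10 keeps its first 10 bytes and is flagged; a value within the limit is returned unchanged.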

For example, if action ColumnProperties is defined with parameters DatatypeMatch=clob, TrimDatatype=10, Datatype=varchar, and Length=30, all columns with data type clob are replicated into the target table as strings. Parameters Datatype and Length ensure that Refresh creates target tables with the smaller data type; the target length is set by parameter Length.

This parameter is supported only for certain location classes. For the list of supported location classes, see Reduce width of datatype when selecting or capturing changes in Capabilities.


Key

Description: Add column to table's replication key.


SurrogateKey

Description: Use column instead of the regular key during replication. Define on the capture and integrate locations.

Specify in combination with parameter CaptureFromRowId to capture from HANA or from Oracle tables to reduce supplemental logging requirements.

Integrating with SurrogateKey is impossible if the SurrogateKey column is captured from a CaptureFromRowId that is reusable (Oracle).

For Oracle, to use ROWID as a surrogate key, it must remain unchanged over time. Any change to ROWID can result in replication errors. Avoid using ROWID as a surrogate key if any operation might change it.

Oracle documentation states that ROWID is not a permanent identifier and may change under certain conditions, including, but not limited to:


DistributionKey

Description: Distribution key column. The distribution key is used for parallelizing changes within a table. It also controls the DISTRIBUTED BY clause for a CREATE TABLE in distributed databases such as Teradata, Redshift, and Greenplum.


SoftDelete

Description: Convert DELETE operations in the source into UPDATE in the target.

Defining this parameter avoids the actual deletion of rows in the target. Instead, an extra column is added to indicate whether a row was deleted in the source. The initial value in this column is 0, indicating the row is not deleted. The value of this column is updated to 1 when a row is deleted in the source.

In each integrate cycle the changes are coalesced and optimized away. For example, if an INSERT and DELETE operation performed on a row happens in the same integrate cycle, these changes will be coalesced and optimized. As a result, a soft deleted row (with value 1) will not be added to the target. If the same changes happen in two separate integrate cycles, in the first cycle a row will be inserted in the target and in the second cycle, the row will be marked as deleted (value 1) in the target.
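
The soft-delete and coalescing behaviour described above can be modelled with a small Python sketch. It is a simplified illustration under stated assumptions: the target is a dictionary keyed by replication key, the flag column is called is_deleted (a hypothetical name), and coalescing is reduced to keeping the last operation per key within a cycle.

```python
def integrate_cycle(target, changes, flag="is_deleted"):
    """Apply one integrate cycle of (op, key, row) changes with soft delete."""
    pending = {}
    for op, key, row in changes:
        inserted_this_cycle = key in pending and pending[key][0] == "INSERT"
        if op == "DELETE" and key not in target and inserted_this_cycle:
            # INSERT followed by DELETE in the same cycle is optimized away.
            del pending[key]
        else:
            pending[key] = (op, row)
    for key, (op, row) in pending.items():
        if op == "DELETE":
            if key in target:
                target[key][flag] = 1      # mark as deleted, keep the row
        else:
            target[key] = {**target.get(key, {}), **row, flag: 0}
    return target


# Same cycle: the row never reaches the target.
same_cycle = integrate_cycle({}, [("INSERT", 1, {"v": "a"}), ("DELETE", 1, None)])

# Two cycles: the row is inserted first, then marked as deleted.
two_cycles = integrate_cycle({}, [("INSERT", 1, {"v": "a"})])
integrate_cycle(two_cycles, [("DELETE", 1, None)])
```

In the first demo the target stays empty; in the second the row remains in the target with the flag set to 1.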


TimeKey

Description: Convert all changes (INSERT, UPDATE, DELETE in the source location) into INSERT in the target location.

Defining this parameter affects how all changes are delivered into the target table. This parameter is often used with parameter IntegrateExpression={dss_integ_seq}, which will populate a value.

DSS uses the concept of TimeKey to indicate storing history. TimeKey is defined with an extra column on the target for every table uniquely storing the sequence in which changes came into the channel. Action ColumnProperties with parameter IntegrateExpression={dss_integ_seq} uniquely defines the order in which the changes were applied in the source location.

When using this parameter, the integration process unconditionally appends every operation to the base table. This means that changes made during a Refresh might be captured and appended again, leading to apparent duplicate entries. Although this parameter adds an extra key column to prevent technical duplicates, you may still see two inserts for the same replication key due to Refresh overlaps.
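
A toy model makes the append-only behaviour and the apparent duplicates easier to see. In this Python sketch the history list stands in for the target table and a counter stands in for the sequence value; the column names op and integ_seq and the function integrate_timekey are assumptions for illustration only.

```python
from itertools import count


def integrate_timekey(history, changes, seq):
    """Append every change as a new row; nothing is updated or deleted in place."""
    for op, key, row in changes:
        history.append({"key": key, "op": op, "integ_seq": next(seq), **(row or {})})
    return history


seq = count(1)
history = []
# A Refresh loads key 7, then the same insert is captured and appended again.
integrate_timekey(history, [("INSERT", 7, {"v": "a"})], seq)
integrate_timekey(history, [("INSERT", 7, {"v": "a"}), ("DELETE", 7, None)], seq)
```

The history now holds three rows for key 7, two of them inserts. The sequence column keeps them technically unique, which matches the "apparent duplicate" situation described above.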

If DSS is configured to replicate only some columns in a table, and an UPDATE affects only the non-replicated columns, the TimeKey still reflects the change.

For Kafka and File locations, this parameter must be defined to replicate DELETE operations.

If the parameter Resilient in action Integrate is already defined in the channel, defining TimeKey will make Resilient ineffective.


IgnoreDuringCompare

Description: Ignore values in this column during Refresh and Compare. Also during integration, this parameter means that this column is overwritten by every update statement, rather than only when the captured update changed this column.

This parameter is ignored during row-wise compare/refresh if it is defined on a key column.


Datatype

Argument: data_type

Description: Data type in the database if this differs from the value defined in the DSS_COLUMN repository table.


Length

Argument: attr_val

Description: String length in the database if this differs from the value defined in the DSS_COLUMN repository table.

When used together with parameter Name or DatatypeMatch, keywords bytelen and charlen can be used and will be replaced by the respective values of the matched column. Additionally, basic arithmetic operators (+,-,*,/) can be used with bytelen and charlen. For example, if Length="bytelen/3" is defined, it will be replaced with the byte length of the matched column divided by 3.
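
The keyword substitution and arithmetic can be sketched in Python as below. This is an illustration, not the DSS evaluator: the function eval_attr is hypothetical, and integer (floor) division is an assumption, since the text does not say how a non-integer result is rounded.

```python
import ast
import operator

# Supported operators; "/" is mapped to floor division (an assumption).
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.floordiv,
}


def eval_attr(expr, column):
    """Evaluate e.g. "bytelen/3" using attribute values of the matched column."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.Name) and node.id in column:
            return column[node.id]           # keyword such as bytelen or charlen
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported element in {expr!r}")
    return ev(ast.parse(expr, mode="eval"))
```

For a matched column with a byte length of 90, eval_attr("bytelen/3", {"bytelen": 90}) gives 30.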

This parameter requires parameter Datatype.


Precision

Argument: attr_val

Description: Integer precision in the database if this differs from the value defined in the DSS_COLUMN repository table.

When used together with parameter Name or DatatypeMatch, keyword prec can be used and will be replaced by the respective value of the matched column. Additionally, basic arithmetic operators (+,-,*,/) can be used with prec. For example, if Precision="prec+5" is defined, it will be replaced with the precision of the matched column plus 5.

This parameter requires parameter Datatype.


Scale

Argument: attr_val

Description: Integer scale in the database if this differs from the value defined in the DSS_COLUMN repository table.

When used together with parameter Name or DatatypeMatch, keyword scale can be used and will be replaced by the respective value of the matched column. Additionally, basic arithmetic operators (+,-,*,/) can be used with scale. For example, if Scale="scale*2" is defined, it will be replaced with the scale of the matched column times 2.

This parameter requires parameter Datatype.


Nullable

Description: Nullability in the database if this differs from the value defined in the DSS_COLUMN repository table.

This parameter requires parameter Datatype.


Context

Argument: context

Description: Action ColumnProperties is effective/applied only if the context matches the context defined in Compare or Refresh. For more information about using Context, see our concept page Refresh or Compare context.

The value should be a context name, specified as a lowercase identifier. It can also have the form !context, which means that the action is effective unless the matching context is enabled for Compare or Refresh.

One or more contexts can be enabled for Compare and Refresh.

Defining an action that is only effective when a context is enabled can have different uses. For example, if action ColumnProperties is defined with parameters IgnoreDuringCompare, Context=qqq, then normally all data will be compared, but if context qqq is enabled (-Cqqq), then the values in one column will be ignored.
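
The context rule, including the negated form !context, reduces to a small membership test. The Python sketch below is illustrative; the function name action_is_effective is hypothetical, and treating an action without a Context parameter as always effective is an assumption.

```python
def action_is_effective(context, enabled_contexts):
    """Decide whether an action applies, given the contexts enabled for a job."""
    if context is None:
        return True                      # no Context parameter (assumed always effective)
    if context.startswith("!"):
        return context[1:] not in enabled_contexts
    return context in enabled_contexts
```

With Context=qqq the action applies only when qqq is enabled (for example with -Cqqq); with Context=!qqq it applies unless qqq is enabled.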


Columns Which Are Not Enrolled In Channel

Normally all columns in the location's table (the 'base table') are enrolled in the channel definition. But if there are extra columns in the base table (either in the capture or the integrate database) which are not mentioned in the table's column information of the channel, then these can be handled in two ways:


Substituting Column Values Into Expressions

DSS has different actions that allow column values to be used in SQL expressions, either to map column names or to apply SQL restrictions. Column values can be used in these expressions by enclosing the column name in braces. For example, the restriction "{price} > 1000" selects only rows where the value in column price is higher than 1000.
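
Brace substitution can be pictured as a template expansion over the current row. The Python sketch below only handles the plain {colname} form (no format specifiers and no {dss_...} substitutions) and renders values as SQL literals; the functions substitute and sql_literal are hypothetical and do not reflect how DSS binds values internally.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def sql_literal(value):
    """Render a Python value as an SQL literal (simplified)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def substitute(expr, row):
    """Replace each {colname} in expr with the value of that column in row."""
    def repl(match):
        name = match.group(1)
        if name not in row:
            raise KeyError(f"no value for column {name!r}")
        return sql_literal(row[name])
    return PLACEHOLDER.sub(repl, expr)
```

For a row with price 1500, substitute("{price} > 1000", {"price": 1500}) produces the text 1500 > 1000.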

In some cases, it may be unclear which column names should be used in the braces. Consider the following scenario:

Suppose you are replicating a source base table with three columns (A, B, C) to a target base table with just two columns named (E, F). These columns will be mapped together using action ColumnProperties defined with parameter CaptureExpression or IntegrateExpression.

Theoretically, mapping expressions could be put on both the source and target, in which case the columns enrolled in the channel could be different from both (e.g., F, G, H), but this is unlikely. But when an expression is being defined for this table, should the source column names be used for the brace substitution (e.g., {A} or {B})? Or should the target column names be used (e.g., {E} or {F})? The answer depends on which parameter is being used and on whether the SQL expression is being put on the source or the target side.

For parameters IntegrateExpression and IntegrateCondition (in action Restrict), the SQL expressions can only contain {} substitutions with the column names as they are enrolled in the channel definition (the "DSS Column names"), not the "base table's" column names (e.g., the list of column names in the target or source base table). So in the example above substitutions {A}, {B}, and {C} could be used if the table was enrolled with the columns of the source and with mappings on the target side, whereas substitutions {E} and {F} are available if the table was enrolled with the target columns and had mappings on the source.

But for parameters CaptureExpression, CaptureCondition (in action Restrict), and RefreshCondition (in action Restrict) the opposite applies: these expressions must use the "base table's" column names, not the "DSS column names". So in the example these parameters could use {A}, {B}, and {C} as substitutions in expressions on the source side, but substitutions {E} and {F} in expressions on the target.


Timestamp Substitution Format Specifier

{% partial file="dss6/action-reference/timestamp-format-specifier.template.md" /%}
