Azure Data Lake Storage as Target - DSS 6 | Data Source Solutions Documentation

Azure Data Lake Storage as Target

Data Source Solutions DSS supports integrating changes into an Azure Data Lake Storage (ADLS) location. This section describes the configuration requirements for integrating changes into an ADLS location using Integrate and Refresh.

Due to technical limitations, Azure Data Lake Storage is not supported in DSS releases 6.1.5/3 through 6.1.5/9.

Customize Integrate

Defining action Integrate is sufficient for integrating changes into an ADLS location. However, by default, files written into a target file location use DSS's own XML format, and the changes captured from multiple tables are integrated as files into a single directory. The integrated files are named using the integrate timestamp.

You may define other actions to customize the default integration behavior described above. The following are a few examples of actions that can be used to customize integration into the ADLS location:

Group Table Action Annotation
ADLS * FileFormat

This action may be defined to:

ADLS * Integrate

To segregate and name the files integrated into the target location, define parameter RenameExpression.

For example, if RenameExpression={dss_tbl_name}/{dss_integ_tstamp}.csv is defined, then for each table in the source, a separate folder (with the same name as the table) is created in the target location, and the files replicated for each table are saved into that table's folder. This also gives each file a unique name, because the file is named with the timestamp of the moment it was integrated into the target location.
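The effect of the RenameExpression example above can be sketched in a few lines of Python. This is only an illustration of the substitution, not how DSS evaluates the expression: the function name expand_rename_expression, the table names, and the timestamp layout are all assumptions made for the sketch.

```python
from datetime import datetime, timezone


def expand_rename_expression(expression: str, table_name: str, integ_time: datetime) -> str:
    """Substitute the two placeholders used in the RenameExpression example."""
    # The timestamp layout below is an assumption for illustration only;
    # the format DSS actually writes may differ.
    stamp = integ_time.strftime("%Y%m%d%H%M%S")
    return (
        expression
        .replace("{dss_tbl_name}", table_name)
        .replace("{dss_integ_tstamp}", stamp)
    )


expression = "{dss_tbl_name}/{dss_integ_tstamp}.csv"
when = datetime(2024, 1, 15, 10, 30, 0, tzinfo=timezone.utc)

# Hypothetical source tables: each one ends up in its own folder.
for table in ("orders", "customers"):
    print(expand_rename_expression(expression, table, when))
```

With these hypothetical inputs, the two files land in separate folders named orders and customers, which is the per-table segregation the parameter is meant to provide.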

ADLS * ColumnProperties

This action defines properties for a column being replicated. This action may be defined to:

  • integrate the delete operation. By default, for file-based target locations, DSS does not replicate delete operations performed at the source location. To integrate delete operations, an extra timekey column must be added in the target location. For this, action ColumnProperties may be defined with the following parameters:
    • Name: This parameter defines the name for the extra column in the target location.
    • Extra: This parameter defines that this is an extra column in the target location (a column which is not present in the source location).
    • IntegrateExpression: This parameter defines the expression to be used for generating the timekey value. For example, {dss_integ_seq} can be used here. This is a 36-byte string of hex characters that is unique and continuously increasing for a specific source location.
    • TimeKey: This parameter defines that this is a timekey column.
    • Datatype=varchar: This parameter defines the data type for the extra column.
    • Length=36: This parameter defines the data type length for the extra column.
  • add information about the source operation type (using dss_op) in the target location. This action definition is required for performing Compare if action ColumnProperties with parameter TimeKey is defined on a target file location. For this, action ColumnProperties may be defined with the following parameters:
    • Name: This parameter defines the name for the extra column in the target location.
    • Extra: This parameter defines that this is an extra column in the target location (a column which is not present in the source location).
    • IntegrateExpression={dss_op}: This parameter defines the expression to be used for generating the information about source operation type.
    • Datatype=integer: This parameter defines the data type for this extra column.
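
The two ColumnProperties definitions described above can be summarized as plain parameter sets. The Python sketch below only collects the parameters listed in this section and prints them on one line each for easier reading. The column names dss_integ_key and dss_op_val, the helper render_action, and the one-line rendering are assumptions made for illustration; they are not DSS syntax or a DSS API.

```python
def render_action(action: str, params: dict) -> str:
    """Join an action name and its parameters into one readable line.

    Parameters without a value (flags such as Extra or TimeKey) are stored
    as True and rendered by name only.
    """
    parts = [action]
    for key, value in params.items():
        parts.append(key if value is True else f"{key}={value}")
    return " ".join(parts)


# Extra timekey column, needed to integrate delete operations.
timekey_column = {
    "Name": "dss_integ_key",  # hypothetical column name
    "Extra": True,
    "IntegrateExpression": "{dss_integ_seq}",
    "TimeKey": True,
    "Datatype": "varchar",
    "Length": 36,
}

# Extra column holding the source operation type, needed for Compare
# when a TimeKey column is defined on a target file location.
op_column = {
    "Name": "dss_op_val",  # hypothetical column name
    "Extra": True,
    "IntegrateExpression": "{dss_op}",
    "Datatype": "integer",
}

for column in (timekey_column, op_column):
    print(render_action("ColumnProperties", column))
```

Keeping the two definitions side by side makes it easy to check that the TimeKey column is accompanied by the dss_op column whenever Compare is going to be used on the target.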

State Directory

{% partial file="dss6/requirements/source-and-target-requirements/state-directory.template.md" /%}

Intermediate Directory

{% partial file="dss6/requirements/source-and-target-requirements/intermediate-directory.template.md" /%}

Intermediate Directory is Local

{% partial file="dss6/requirements/source-and-target-requirements/intermediate-directory-local-files.template.md" /%}

Integrate Limitations

By default, for file-based target locations, DSS does not replicate the delete operation performed at the source location.