Google BigQuery as Target - DSS 6 | Data Source Solutions Documentation

Documentation: Google BigQuery as Target - DSS 6 | Data Source Solutions Documentation

Google BigQuery as Target

Data Source Solutions DSS supports integrating changes into Google BigQuery location. This section describes the configuration requirements for integrating changes using Integrate and Refresh into BigQuery location. For the list of supported BigQuery versions into which DSS can integrate changes, see Integrate changes into location in Capabilities.

For integrating changes using Integrate and Refresh into BigQuery location, you must use staging files. For more information about staging files, see our Staging for BigQuery section.

The preferred methods for writing data into BigQuery are using Burst Integrate and Bulk Refresh as they provide better performance.

  • Row-wise Refresh is not supported for replication to BigQuery.

  • Continuous Integrate is not recommended for replication to BigQuery. The default and highly efficient method is Burst Integrate, as BigQuery is optimized for batch processing. Applying changes one-by-one (Continuous) is significantly less efficient.

We strongly advise against using Continuous Integrate. Not only is it inefficient, but you may also encounter issues with replication. We recommend utilizing Burst Integrate for optimal performance and reliability when replicating to BigQuery.

Grants for Integrate and Refresh

This section lists the grants/permissions required for integrating changes into Google BigQuery.

  • The DSS database user must be granted the following three roles, which include the required permissions:
    • BigQuery Data Editor
      Required for granting the following permissions: bigquery.datasets.get, bigquery.routines.get, bigquery.routines.list, bigquery.tables.create, bigquery.tables.delete, bigquery.tables.get, bigquery.tables.getData, bigquery.tables.list, bigquery.tables.updateData

    • BigQuery Job User
      Required for granting the following permission: bigquery.jobs.create

    • Storage Admin
      Required for granting the following permissions: storage.buckets.get, storage.objects.create, storage.objects.delete, storage.objects.get, storage.objects.list

Multi-statement Transactions

By default, DSS applies changes in BigQuery using auto-commit. The limitation of using auto-commit is that the DSS cannot properly recover if Integrate exits during an integrate cycle, which can result in creating duplicates in the target. To overcome this limitation, you can instead use multi-statement transactions.

To enable multi-statement transactions, you must define the following environment variable:

ActionParameters
EnvironmentName=DSS_BIGQUERY_ENABLE_SESSIONS
Value=1

When multi-statement transactions are enabled, replication fails if the channel contains more than 99 tables since BigQuery does not support more than 100 tables in the same transaction.

Intermediate Directory

{% partial file="dss6/requirements/source-and-target-requirements/intermediate-directory.template.md" /%}

Intermediate Directory is Local

{% partial file="dss6/requirements/source-and-target-requirements/intermediate-directory-local.template.md" /%}

Known Issue

Timestamp values with fractional seconds in years before 1698 or after 2242 are reported as mismatches (Out-of-Sync) by DSS Compare. This happens due to a rounding error during data querying. For example, a timestamp value such as 1697-12-31 23:59:59.123 results in a mismatch.