Capture from PostgreSQL using Logical Replication - DSS 6 | Data Source Solutions Documentation

Documentation: Capture from PostgreSQL using Logical Replication - DSS 6 | Data Source Solutions Documentation

Capture from PostgreSQL Using Logical Replication

This section describes the configuration requirements for capturing changes from PostgreSQL, including managed PostgreSQL services, using logical replication (Capture_Method=SQL). In this capture method, Data Source Solutions DSS reads transaction log records using a special SQL function.

Since DSS 6.1.5/5, this capture method has been renamed as Logical Replication in the DSS UI. Previously, it was known as SQL Fetch from Replication Slot.

Replication Slots

The Logical Replication capture method uses PostgreSQL replication slots. The names for these slots have to be unique for an entire PostgreSQL cluster.

Default Replication Slot Naming

DSS uses the following naming convention for these replication slots: dss_hub_**channel_**location. For example: dss_myhub_mychn_myloc.

This should allow multi capture in most situations. This includes multiple DSS capture jobs and coexistence with other replication products.

Custom Replication Slot Configuration

**Since** v6.2.5/1

Custom replication slot names can be configured using the location property Replication_Slot_Name. If this property is used, the specified custom name will be applied to all channels associated with the location.

Limitations

Using a custom replication slot name may prevent the use of the same location in multiple channels. This is because the custom name applies globally to all channels associated with the location. PostgreSQL requires each replication slot to have a unique name within the cluster. If multiple channels attempt to use the same location, they would all try to use the same custom slot name, resulting in a conflict.

Managing Replication Slots

PostgreSQL will not remove transaction log files for which changes exist that have not been processed by a replication slot. For this reason, replication slots have to be removed when a channel is no longer needed. This can be done manually or by running Deactivate Replication (dssactivate with option <b>-d</b>).

  • To retrieve existing replication slots, execute the following:

    SELECT <em>slot_name</em> FROM pg_replication_slots;
    
  • To manually remove a specific replication slot, execute the following:

    SELECT pg_drop_replication_slot('<em>slot_name</em>');
    

    For example:

    SELECT pg_drop_replication_slot('dss_myhub_mychn_myloc');
    

Generic PostgreSQL Configuration

This section describes the configuration requirements for capturing changes from PostgreSQL (on-premise) using Logical Replication capture method (Capture_Method=SQL):

  1. PostgreSQL configuration file postgresql.conf should have the following settings:

    • wal_level = logical

      SHOW wal_level;
      
      ALTER SYSTEM SET wal_level = logical; -- server restart needed
      
    • max_replication_slots = number_of_slots

      SHOW max_replication_slots;
      
      ALTER SYSTEM SET max_replication_slots = <em>number_of_slots</em>; -- server restart needed
      

      number_of_slots should be set to at least the number of channels multiplied by the number of capture locations in this PostgreSQL installation.

  2. The DSS database User should either be superuser or have replication permission:

    ALTER USER <em>username</em> REPLICATION;
    
  3. The logical replication plugins test_decoding or pgoutput (supported since DSS version 6.2.5/0) should be used, and the DSS database user should have permission to use them.

    Since 6.2.5/0, DSS supports the pgoutput plugin, and it is the default plugin. For versions prior to 6.2.5/0, the default plugin is test_decoding.

    To use the pgoutput plugin, a PostgreSQL publication is required. Publications are available in PostgreSQL version 10 and later. Publication names must be unique within the entire PostgreSQL cluster.

    A publication represents a group of source data tables. DSS will manage the publication on source location to ensure accurate replication.

    DSS uses the following naming convention for publications: dss_hub_**channel_**location. For example: dss_myhub_mychn_myloc.

    To create a new publication in the source PostgreSQL database, while activating the replication, in the DSS UI, ensure to select the Publication option under Replication Component in Activate Replication dialog (or run dssactivate with option <b>-ou</b>).

When using the Logical Replication capture method:

  • PostgreSQL versions before 9.4.12 should be avoided due to a PostgreSQL bug (detected in 9.4.6) which affects this log read method.
  • Capture rewind in Activate Replication (dssactivate with option <b>-i</b>) is not supported.

Amazon RDS and Aurora PostgreSQL Configuration

DSS supports capturing changes from Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL using logical replication (Capture_Method=SQL).

To get the required settings and permissions the Parameter Group assigned to the Instance should have rds.logical_replication=1. Changing this needs to be followed by a restart of PostgreSQL.

Recapturing Using Session Names

**Since** v6.2.5/2

When configured with the pgoutput plugin, DSS allows recapturing changes in PostgreSQL using session names. For this, DSS uses PostgreSQL’s replication management functions (pg_replication_origin).

To use this functionality, the DSS database user must be a superuser or have permission to execute the pg_replication_origin functions. If the DSS database user is not a superuser, you must grant the following permissions:

GRANT EXECUTE ON FUNCTION pg_replication_origin_create(text) TO <em>username</em>;
GRANT EXECUTE ON FUNCTION pg_replication_origin_drop(text) TO <em>username</em>;
GRANT EXECUTE ON FUNCTION pg_replication_origin_session_setup(text) TO <em>username</em>;

GRANT EXECUTE ON FUNCTION pg_replication_origin_session_is_setup() TO <em>username</em>;   -- optional
GRANT EXECUTE ON FUNCTION pg_replication_origin_session_reset() TO <em>username</em>;         -- optional

For configuration examples using session names, see Managing Recapturing Using Session Names.