Location Connection for Google Cloud Storage - DSS 6 | Data Source Solutions Documentation

Location Connection for Google Cloud Storage

This section lists and describes the connection parameters required for creating a Google Cloud Storage (GCS) location in Data Source Solutions DSS. For information about the prerequisites, access privileges, and other configuration requirements for creating a GCS location, see Google Cloud Storage Requirements.

DSS uses the GCS S3-compatible API (through the cURL library) to connect to Google Cloud Storage and to read and write data there during Capture, Continuous Integrate, Bulk Refresh, and Compare (Direct File Compare).

If the managed secrets feature is enabled, the option USE TOKEN INSTEAD is displayed in the fields designated for entering secrets.

Field Description Equivalent Location Property
SECURE CONNECTION

Protocol for connecting DSS to the Google Cloud Storage server.

  • Yes (HTTPS) (default)
  • No (HTTP)
File_Scheme
BUCKET Name or IP address of the Google Cloud Storage bucket. GS_Bucket
AUTHENTICATION METHOD

Authentication method for connecting DSS to the Google Cloud Storage server.

Available options are:

  • Environment variable GOOGLE_APPLICATION_CREDENTIALS: Authentication (OAuth) using the credentials (service account key) fetched from the environment variable GOOGLE_APPLICATION_CREDENTIALS. For more information about setting this environment variable, refer to the Google Cloud documentation.
  • OAuth 2.0 credentials file: Authentication (OAuth) using the credentials supplied in the service account key file (CREDENTIALS FILE). For more information about creating the service account key file, refer to the Google Cloud documentation.
  • HMAC: Authentication using the credentials supplied as hash-based message authentication code (HMAC) keys, which comprise an Access key (ACCESS KEY) and a Secret (SECRET KEY). For more information about HMAC keys, refer to the Google Cloud documentation.
GCloud_Authentication_Method
CREDENTIALS FILE

Directory path for the service account key file (JSON) used in OAuth 2.0 protocol based authentication.

This field is enabled only when the AUTHENTICATION METHOD is set to OAuth 2.0 credentials file.
GCloud_OAuth_File
ACCESS KEY

The HMAC access ID of the service account.

This field is enabled only when the AUTHENTICATION METHOD is set to HMAC.
GS_HMAC_Access_Key_Id
SECRET KEY

The HMAC secret of the service account.

This field is enabled only when the AUTHENTICATION METHOD is set to HMAC.
GS_HMAC_Secret_Access_Key
DIRECTORY Directory path in the Google Cloud Storage BUCKET where the files are replicated to or captured from. File_Path
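
The conditional fields above (CREDENTIALS FILE for the OAuth 2.0 credentials file method, ACCESS KEY and SECRET KEY for HMAC) can be checked before a location is created. The following is a minimal sketch, not part of DSS: the property names come from the table, but the option strings stored in GCloud_Authentication_Method, the treatment of a missing method as the environment-variable option, and all sample values are assumptions made for illustration.

```python
# Property names are taken from the table above. The option strings below are
# hypothetical placeholders, not confirmed DSS values.
ENV_VAR = "ENV"            # hypothetical: GOOGLE_APPLICATION_CREDENTIALS method
OAUTH_FILE = "OAUTH_FILE"  # hypothetical: OAuth 2.0 credentials file method
HMAC = "HMAC"              # hypothetical: HMAC keys method


def missing_gcs_properties(props):
    """Return the names of properties that the chosen method needs but that are empty."""
    required = ["GS_Bucket"]
    # Assumption: an absent method is treated like the environment-variable option.
    method = props.get("GCloud_Authentication_Method", ENV_VAR)
    if method == OAUTH_FILE:
        required.append("GCloud_OAuth_File")
    elif method == HMAC:
        required += ["GS_HMAC_Access_Key_Id", "GS_HMAC_Secret_Access_Key"]
    return [name for name in required if not props.get(name)]


# Sample values are invented for the example.
example = {
    "GS_Bucket": "my-bucket",
    "GCloud_Authentication_Method": HMAC,
    "GS_HMAC_Access_Key_Id": "GOOG1EXAMPLE",
    "File_Path": "/replication/data",
}
print(missing_gcs_properties(example))  # ['GS_HMAC_Secret_Access_Key']
```

Keeping the check as a plain function over a dictionary makes it easy to run against whatever form the location definition is stored in.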

Advanced Settings

Field Description Equivalent Location Property
Proxy Show or hide the connection parameters for the proxy server used for connecting DSS to Google Cloud Storage.
PROXY PROTOCOL

Protocol for the proxy server host used for connecting DSS to Google Cloud Storage.

Available options are:

  • HTTP
File_Proxy_Scheme
PROXY HOST Host name of the proxy server used for connecting DSS to Google Cloud Storage. File_Proxy_Host
PROXY PORT Port number for the proxy server host used for connecting DSS to Google Cloud Storage. File_Proxy_Port
PROXY USER

Username for the proxy server host used for connecting DSS to Google Cloud Storage.

File_Proxy_User
PROXY PASSWORD Password for the PROXY USER. File_Proxy_Password
Hive External Tables Enable or disable the Hive ODBC connection configuration for Hive external tables created on top of Google Cloud Storage.
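
The proxy fields above together describe a single proxy endpoint. As an illustration only, the sketch below assembles them into the URL form that cURL-based clients commonly accept. How DSS passes these values internally is not stated in this document; the function name, the lowercase "http" scheme value, and the sample host and credentials are assumptions.

```python
from urllib.parse import quote


def proxy_url(props):
    """Build scheme://user:password@host:port from the File_Proxy_* properties."""
    scheme = props.get("File_Proxy_Scheme", "http")  # assumed spelling of the HTTP option
    host = props["File_Proxy_Host"]
    port = props.get("File_Proxy_Port")
    user = props.get("File_Proxy_User")
    password = props.get("File_Proxy_Password")

    auth = ""
    if user:
        # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
        auth = quote(user, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"

    netloc = host if port is None else f"{host}:{port}"
    return f"{scheme}://{auth}{netloc}"


# Sample values are invented for the example.
print(proxy_url({
    "File_Proxy_Host": "proxy.example.com",
    "File_Proxy_Port": 3128,
    "File_Proxy_User": "dss",
    "File_Proxy_Password": "p@ss",
}))  # http://dss:p%40ss@proxy.example.com:3128
```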

Configuration for Hive External Tables

This section lists and describes the connection parameters required for connecting to Hive External Tables created on top of Google Cloud Storage.

DSS allows you to create Hive External Tables on top of Google Cloud Storage; these tables are used only during Compare. You can enable or disable the Hive configuration for Google Cloud Storage on the location creation screen using the field Hive External Tables.

Field Description Equivalent Location Property
HIVE SERVER TYPE

Type of the Hive server.

Available options are:

  • Hive Server 1: DSS will connect to Hive Server 1 instance.
  • Hive Server 2 (default): DSS will connect to Hive Server 2 instance.
Hive_Server_Type
HOST(S) Hostname or IP address of the server on which the database is running. Database_Host
PORT Port number on which the Hive server is expecting connections. Database_Port
DATABASE Name of the database. Database_Name
SERVICE DISCOVERY MODE

Mode for connecting DSS to Hive Server 2.

This field is enabled only if HIVE SERVER TYPE is set to Hive Server 2.

Available options are:

  • None (default): DSS connects to Hive Server 2 without using the ZooKeeper service.
  • ZooKeeper: DSS discovers Hive Server 2 services using the ZooKeeper service.
Hive_Service_Discovery_Mode
ZOOKEEPER NAMESPACE

Namespace on ZooKeeper under which Hive Server 2 nodes are added.

This field is enabled only if SERVICE DISCOVERY MODE is set to ZooKeeper.
Hive_Zookeeper_Namespace
AUTHENTICATION METHOD

Authentication method for connecting DSS to Hive Server 2.

This field is enabled only if HIVE SERVER TYPE is set to Hive Server 2.

Available options are:

  • No Authentication (default)
  • User Name
  • User Name and Password
  • Kerberos
  • Windows Azure HDInsight Service
Hive_Authentication_Method
USER

Username for connecting DSS to the database (defined in Database_Name).

This field is enabled only when the AUTHENTICATION METHOD is set to User Name or User Name and Password.
Database_User
PASSWORD Password for the USER. Database_Password
SERVICE

Kerberos service principal name of the Hive server. This is the service name part of Kerberos principal of the Hive server. For example, if the principal is hive/example.host@EXAMPLE.REALM then "hive" should be specified here.

This field is enabled only if AUTHENTICATION METHOD is set to Kerberos.
Hive_Kerberos_Service
HOST

Fully Qualified Domain Name (FQDN) of the Hive server host. This is the host part of Kerberos principal of the Hive server. For example, if the principal is "hive/example.host@EXAMPLE.REALM" then "example.host" should be specified here.

The value for this field may be set to _HOST to use the Hive server hostname as the domain name for Kerberos authentication.

If SERVICE DISCOVERY MODE is set to None, then the driver uses the value specified in the Host connection attribute.
If SERVICE DISCOVERY MODE is set to ZooKeeper, then the driver uses the Hive Server 2 host name returned by ZooKeeper.

This field is enabled only if AUTHENTICATION METHOD is set to Kerberos.
Hive_Kerberos_Host
REALM

Realm of the Hive Server 2 host.

No value needs to be specified in this field if the realm of the Hive Server 2 host is defined as the default realm in the Kerberos configuration.

This field is enabled only if AUTHENTICATION METHOD is set to Kerberos.
Hive_Kerberos_Realm
THRIFT TRANSPORT

Transport protocol to use in the Thrift layer.

This field is enabled only if HIVE SERVER TYPE is set to Hive Server 2.

Available options are:

  • BINARY (This option can be used only if AUTHENTICATION METHOD is set to No Authentication or User Name and Password.)
  • SASL (This option can be used only if AUTHENTICATION METHOD is set to User Name, User Name and Password, or Kerberos.)
  • HTTP (This option can be used only if AUTHENTICATION METHOD is set to No Authentication, User Name and Password, Kerberos, or Windows Azure HDInsight Service.)

For information about determining which Thrift transport protocols your Hive server supports, refer to the HiveServer2 Overview and Setting Up HiveServer2 sections in the Hive documentation.

Hive_Thrift_Transport
HTTP PATH

The partial URL corresponding to the Hive server.

This field is required only if THRIFT TRANSPORT is set to HTTP.
Hive_HTTP_Path
Enable SSL Enable or disable one-way SSL. If enabled, DSS authenticates the Hive server by validating the SSL certificate shared by the Hive server.
TRUSTED CA CERTIFICATE

Directory path where the .pem file containing the server's public SSL certificate signed by a trusted CA is located.

Defining this property enables one-way SSL, which means that DSS authenticates the Hive server by validating the SSL certificate shared by the Hive server.

This property is also required for enabling two-way SSL.
Database_Public_Certificate
Two-way SSL Enable or disable two-way SSL. If enabled, DSS and the Hive server authenticate each other by validating each other's SSL certificate. This field is enabled only if Enable SSL is selected.
CLIENT PUBLIC CERTIFICATE

Directory path where the .pem file containing the client's SSL public certificate is located.

This field is enabled only if Two-way SSL is selected.
Database_Client_Public_Certificate
CLIENT PRIVATE KEY

Directory path where the .pem file containing the client's SSL private key is located.

This field is enabled only if Two-way SSL is selected.
Database_Client_Private_Key
CLIENT PRIVATE KEY PASSWORD

Password of the client's SSL private key specified in CLIENT PRIVATE KEY.

This field is enabled only if Two-way SSL is selected.
Database_Client_Private_Key_Password
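
Two rules from the table above lend themselves to a short worked example: how a Kerberos principal maps onto the SERVICE, HOST, and REALM fields, and which THRIFT TRANSPORT options are permitted for a given AUTHENTICATION METHOD. The sketch below is illustrative only. The function names are hypothetical, and the strings are the option labels as shown in this document, not confirmed values of the underlying location properties.

```python
def split_principal(principal):
    """Split 'service/host@REALM' into (service, host, realm)."""
    name, _, realm = principal.partition("@")
    service, _, host = name.partition("/")
    return service, host, realm


# Transport -> authentication methods, transcribed from the THRIFT TRANSPORT row.
ALLOWED_AUTH = {
    "BINARY": {"No Authentication", "User Name and Password"},
    "SASL": {"User Name", "User Name and Password", "Kerberos"},
    "HTTP": {
        "No Authentication",
        "User Name and Password",
        "Kerberos",
        "Windows Azure HDInsight Service",
    },
}


def transports_for(auth_method):
    """Return the Thrift transports that the table permits for an authentication method."""
    return sorted(t for t, methods in ALLOWED_AUTH.items() if auth_method in methods)


# The principal is the example used in the SERVICE and HOST descriptions.
print(split_principal("hive/example.host@EXAMPLE.REALM"))
# ('hive', 'example.host', 'EXAMPLE.REALM')
print(transports_for("Kerberos"))   # ['HTTP', 'SASL']
print(transports_for("User Name"))  # ['SASL']
```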

Hive Advanced Settings

Field Description Equivalent Location Property
LINUX / UNIX ODBC DRIVER MANAGER LIBRARY PATH

Directory path where the ODBC Driver Manager Library is installed. This field is applicable only for Linux/Unix operating system.

For a default installation, the ODBC Driver Manager Library is available at /usr/lib64 and does not need to be specified. However, if UnixODBC is installed in, for example, /opt/unixodbc, the value for this field would be /opt/unixodbc/lib.
ODBC_DM_Lib_Path
LINUX / UNIX ODBCSYSINI

Directory path where the odbc.ini and odbcinst.ini files are located. This field is applicable only for Linux/Unix operating system.

For a default installation, these files are available in the /etc directory and do not need to be specified using this field. However, if UnixODBC is installed in, for example, /opt/unixodbc, the value for this field would be /opt/unixodbc/etc.
ODBC_Sysini
ODBC Driver Name of the user-defined (installed) ODBC driver used for connecting DSS to the Hive server. ODBC_Driver
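
For a non-default UnixODBC installation, the two path fields above follow from the installation prefix. The sketch below mirrors the /opt/unixodbc example from the descriptions. The helper name is hypothetical, and the lib and etc subdirectories are assumed to match that example; an actual installation may be laid out differently.

```python
import posixpath


def unixodbc_paths(prefix):
    """Derive the two Linux/Unix ODBC path properties from an install prefix."""
    return {
        # Directory that holds the ODBC Driver Manager Library.
        "ODBC_DM_Lib_Path": posixpath.join(prefix, "lib"),
        # Directory that holds odbc.ini and odbcinst.ini.
        "ODBC_Sysini": posixpath.join(prefix, "etc"),
    }


print(unixodbc_paths("/opt/unixodbc"))
# {'ODBC_DM_Lib_Path': '/opt/unixodbc/lib', 'ODBC_Sysini': '/opt/unixodbc/etc'}
```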