Presto ingestion using ibm-lh utility through CLI
You can ingest data from an S3 or local location into IBM® watsonx.data using the ibm-lh utility. The following options are supported:
- Command line option
- Configuration file option
This topic describes the parameters that the ibm-lh utility supports for ingestion with the Presto engine. For detailed ingestion instructions, see:
- Ingesting data through command line - Presto ingestion mode
- Ingesting data through config file - Presto ingestion mode
The options and variables that are supported on the command line and in the configuration file are as follows:
- Command line option
Command line options and variables

| Parameter | Description | Declaration |
|---|---|---|
| create-if-not-exist | Create the target table if it does not exist. | --create-if-not-exist |
| dbpassword | Database password that is used for ingestion. This parameter is mandatory to run an ingestion job unless the default user is used. | --dbpassword <DBPASSWORD> |
| dbuser | Database username that is used for ingestion. This parameter is mandatory to run an ingestion job unless the default user is used. | --dbuser <DBUSER> |
| ingest-config | Configuration file for data migration. | --ingest-config <INGEST_CONFIGFILE> |
| ingestion-engine-endpoint | Endpoint of the ingestion engine, in the form hostname=<hostname>, port=<port>. This parameter is mandatory to run an ingestion job. | --ingestion-engine-endpoint <INGESTION_ENGINE_ENDPOINT> |
| log-directory | Location of the log files. See Log directory. | --ingest-config <ingest_config_file> --log-directory <directory_path> |
| schema | Schema file that includes CSV specifications, and more. See Schema file specifications. | --schema </path/to/schemaconfig/file> |
| source-data-files | Data files or folders for data migration. A file name that ends with / is considered a folder. Single or multiple files can be used. File names are case sensitive. This parameter is mandatory to run an ingestion job. Example: <file1_path>,<file2_path>,<folder1_path> | --source-data-files <SOURCE_DATA_FILE> |
| staging-location | Location where CSV files and, in some circumstances, Parquet files are staged. See Staging location. This parameter is mandatory to run an ingestion job. | --staging-location <STAGING_LOCATION> |
| staging-hive-catalog | Name of the Hive catalog that is configured in watsonx.data, if you are not using the default catalog for staging. Default catalog: hive_data. | --staging-hive-catalog <catalog_name> |
| staging-hive-schema | Schema name that is associated with the staging Hive catalog for ingestion. Use this parameter to create and pass in a custom schema name. Default schema: lhingest_staging_schema. If the schema is created as default, you do not need to specify this parameter. | --staging-hive-schema <schema_name> |
| system-config | Specifies system-related parameters. See System config. | --system-config <path/to/system/configfile> |
| target-table | Data migration target table, in the form <catalog>.<schema>.<table1>. This parameter is mandatory to run an ingestion job. Example: <iceberg.demo.customer1> | --target-table <TARGET_TABLES> |
| trust-store-path | Path of the truststore to access the ingestion engine. It is used to establish SSL connections. This parameter is mandatory to run an ingestion job. | --trust-store-path <TRUST_STORE_PATH> |
| trust-store-password | Password of the truststore to access the ingestion engine. It is used to establish SSL connections. This parameter is mandatory to run an ingestion job. | --trust-store-password <TRUST_STORE_PASSWORD> |
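As a rough illustration of how the command line options fit together, the following Python sketch assembles an option list from a dictionary and checks that the parameters described as mandatory are present. The helper name `build_ingest_args` and every sample value (bucket names, paths, credentials) are hypothetical. The sketch only produces the option tokens and makes no assumption about how the ibm-lh utility itself is invoked.

```python
import shlex

# Parameters that the command line table describes as mandatory.
MANDATORY = (
    "ingestion-engine-endpoint",
    "source-data-files",
    "staging-location",
    "target-table",
    "trust-store-path",
    "trust-store-password",
)


def build_ingest_args(options, default_user=False):
    """Return a flat list of option tokens, e.g. ['--dbuser', 'alice'].

    Hypothetical helper for illustration only; not part of ibm-lh.
    """
    required = list(MANDATORY)
    if not default_user:
        # dbuser and dbpassword are mandatory unless the default user is used.
        required += ["dbuser", "dbpassword"]
    missing = [name for name in required if not options.get(name)]
    if missing:
        raise ValueError("missing mandatory parameters: " + ", ".join(missing))

    args = []
    for name, value in options.items():
        if value is True:
            # A switch such as --create-if-not-exist takes no value.
            args.append("--" + name)
        elif value is not None and value is not False:
            args += ["--" + name, str(value)]
    return args


# All values below are made-up placeholders.
example = build_ingest_args({
    "source-data-files": "s3://example-bucket/data/customer.csv",
    "staging-location": "s3://example-bucket/staging/",
    "target-table": "iceberg.demo.customer1",
    "ingestion-engine-endpoint": "hostname=localhost, port=8080",
    "trust-store-path": "/tmp/truststore.jks",
    "trust-store-password": "changeit",
    "dbuser": "example_user",
    "dbpassword": "example_password",
    "create-if-not-exist": True,
})

# shlex.join quotes values that contain spaces, such as the endpoint.
print(shlex.join(example))
```

Building the options as a list rather than a single string keeps values that contain spaces, such as the endpoint, intact as one token.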
- Configuration file option
The configuration file contains one global ingest configuration section and one or more individual ingest configuration sections that define the ingestion job. Where the same parameter appears in both, the value in the individual ingest section overrides the value in the global ingest section.
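The override rule can be pictured with a small Python sketch. It is only a model of the stated precedence, not a description of how the ibm-lh utility is implemented. The function name `effective_settings` and the sample values are hypothetical.

```python
from collections import ChainMap

# Hypothetical contents of a global section and one individual section.
global_section = {
    "target-table": "iceberg.demo.customer1",
    "create-if-not-exist": "true",
}
individual_section = {
    "source-files": "s3://example-bucket/data/orders/",
    "target-table": "iceberg.demo.orders",
}


def effective_settings(individual, global_defaults):
    """Merge two sections so that individual values take precedence."""
    # ChainMap searches the first mapping, then falls back to the second.
    return dict(ChainMap(individual, global_defaults))


settings = effective_settings(individual_section, global_section)
print(settings["target-table"])         # iceberg.demo.orders
print(settings["create-if-not-exist"])  # true
```

Here `target-table` comes from the individual section because it is set in both places, while `create-if-not-exist` falls back to the global value.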
- Global ingest config section
Global ingest config options and variables

| Parameter | Description | Declaration |
|---|---|---|
| create-if-not-exist | Create the target table if it does not exist. | create-if-not-exist:<true/false> |
| ingestion-engine-endpoint | Connection parameters (endpoint) of the ingestion engine, in the form hostname=<hostname>, port=<port>. | ingestion-engine:hostname=<hostname>, port=<port> |
| target-table | Data migration target table, in the form <catalog>.<schema>.<table1>. Only one target table can be specified. | target-table:<table_name> |
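To make the two value formats concrete, the following Python sketch splits an endpoint value of the form hostname=<hostname>, port=<port> and a target table of the form <catalog>.<schema>.<table1> into their parts. The function names are hypothetical, and treating the port as an integer is an assumption. The ibm-lh utility may validate these values differently.

```python
def parse_endpoint(value):
    """Parse 'hostname=<hostname>, port=<port>' into (hostname, port)."""
    fields = {}
    for part in value.split(","):
        key, sep, val = part.partition("=")
        if not sep:
            raise ValueError("expected key=value, got: " + part.strip())
        fields[key.strip()] = val.strip()
    if "hostname" not in fields or "port" not in fields:
        raise ValueError("endpoint needs both hostname and port")
    # Assumption: the port is numeric.
    return fields["hostname"], int(fields["port"])


def parse_target_table(value):
    """Split '<catalog>.<schema>.<table>' into its three parts."""
    parts = value.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected <catalog>.<schema>.<table>, got: " + value)
    return tuple(parts)


print(parse_endpoint("hostname=localhost, port=8080"))  # ('localhost', 8080)
print(parse_target_table("iceberg.demo.customer1"))     # ('iceberg', 'demo', 'customer1')
```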
- Individual ingest config section
A configuration file can contain multiple individual ingest sections. Each individual ingest config section is ingested separately.
Individual ingest config options and variables

| Parameter | Description | Declaration |
|---|---|---|
| create-if-not-exist | Create the target table if it does not exist. | create-if-not-exist |
| dbpassword | Database password that is used for ingestion. This parameter is mandatory to run an ingestion job unless the default user is used. | dbpassword:<DBPASSWORD> |
| dbuser | Database username that is used for ingestion. This parameter is mandatory to run an ingestion job unless the default user is used. | dbuser:<DBUSER> |
| ingestion-engine-endpoint | Endpoint of the ingestion engine, in the form hostname=<hostname>, port=<port>. This parameter is mandatory to run an ingestion job. | ingestion-engine-endpoint:<INGESTION_ENGINE_ENDPOINT> |
| schema | Schema file that includes CSV specifications, and more. See Schema file specifications. | schema:/path/to/schemaconfig/file |
| source-files | Data files or folders for data migration. A file name that ends with / is considered a folder. This parameter is mandatory to run an ingestion job. | source-files:<SOURCE_DATA_FILE> |
| staging-location | Location where CSV files and, in some circumstances, Parquet files are staged. See Staging location. This parameter is mandatory to run an ingestion job. | staging-location:<STAGING_LOCATION> |
| staging-hive-catalog | Name of the Hive catalog that is configured in watsonx.data, if you are not using the default catalog for staging. Default catalog: hive_data. | --staging-hive-catalog <catalog_name> |
| staging-hive-schema | Schema name that is associated with the staging Hive catalog for ingestion. Use this parameter to create and pass in a custom schema name. Default schema: lhingest_staging_schema. If the schema is created as default, you do not need to specify this parameter. | --staging-hive-schema <schema_name> |
| system-config | Specifies system-related parameters. See System config. | --system-config <path/to/system/configfile> |
| target-catalog-uri | Target catalog URI. | target-catalog-uri:<TARGET_CATALOG_URI> |
| target-table | Data migration target table, in the form <catalog>.<schema>.<table1>. This parameter is mandatory to run an ingestion job. Example: <iceberg.demo.customer1> | target-table:<TARGET_TABLES> |
| target-table-storage | Target table file storage location. | target-table-storage:<TARGET_TABLE_STORAGE> |
| trust-store-path | Path of the truststore to access the ingestion engine. It is used to establish SSL connections. This parameter is mandatory to run an ingestion job. | trust-store-path:<TRUST_STORE_PATH> |
| trust-store-password | Password of the truststore to access the ingestion engine. It is used to establish SSL connections. This parameter is mandatory to run an ingestion job. | trust-store-password:<TRUST_STORE_PASSWORD> |
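The rule that a name ending with / is treated as a folder can be sketched in Python as follows. The sketch assumes that the source value is a comma-separated list, as in the example given for the source-data-files command line option. Whether source-files in a configuration file accepts the same list form is not confirmed here. The function name and the sample paths are hypothetical.

```python
def split_source_files(value):
    """Split a comma-separated source list into (files, folders)."""
    files, folders = [], []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty entries such as a trailing comma
        # A name that ends with / is considered a folder.
        if entry.endswith("/"):
            folders.append(entry)
        else:
            files.append(entry)
    return files, folders


# Made-up sample paths.
files, folders = split_source_files(
    "s3://example-bucket/data/a.csv,"
    "s3://example-bucket/data/b.parquet,"
    "s3://example-bucket/archive/"
)
print(files)    # ['s3://example-bucket/data/a.csv', 's3://example-bucket/data/b.parquet']
print(folders)  # ['s3://example-bucket/archive/']
```

Because file names are case sensitive, the sketch leaves each entry exactly as written apart from trimming surrounding whitespace.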
-