Presto ingestion using ibm-lh utility through CLI
You can ingest data from an S3 or local location into IBM® watsonx.data using the ibm-lh utility. The following options are supported:
- Command line option
- Configuration file option
This topic describes the parameters that the ibm-lh utility supports for ingestion with the Presto engine. For detailed ingestion instructions, see:
- Ingesting data through command line - Presto ingestion mode
- Ingesting data through config file - Presto ingestion mode
The options and variables that are supported on the command line and in the configuration file are as follows:
Command line option
Command line options and variables

| Parameter | Description | Declaration |
|---|---|---|
| create-if-not-exist | Create target table if it does not exist. | --create-if-not-exist |
| dbpassword | Database password that is used to do ingestion. This is a mandatory parameter to run an ingestion job unless the default user is used. | --dbpassword <DBPASSWORD> |
| dbuser | Database username that is used to do ingestion. This is a mandatory parameter to run an ingestion job unless the default user is used. | --dbuser <DBUSER> |
| ingest-config | Configuration file for data migration. | --ingest-config <INGEST_CONFIGFILE> |
| ingestion-engine-endpoint | Endpoint of ingestion engine. hostname=<hostname>, port=<port>. This is a mandatory parameter to run an ingestion job. | --ingestion-engine-endpoint <INGESTION_ENGINE_ENDPOINT> |
| log-directory | This option is used to specify the location of log files. See Log directory. | --ingest-config <ingest_config_file> --log-directory <directory_path> |
| schema | Schema file that includes CSV specifications, and more. See Schema file specifications. | --schema </path/to/schemaconfig/file> |
| source-data-files | Data files or folders for data migration. File name ending with / is considered a folder. Single or multiple files can be used. This is a mandatory parameter to run an ingestion job. File names are case sensitive. Example: <file1_path>,<file2_path>,<folder1_path> | --source-data-files <SOURCE_DATA_FILE> |
| staging-location | Location where CSV files and in some circumstances Parquet files are staged, see Staging location. This is a mandatory parameter to run an ingestion job. | --staging-location <STAGING_LOCATION> |
| staging-hive-catalog | The Hive catalog name configured in watsonx.data, if not using the default catalog for staging. Default catalog: hive_data. | --staging-hive-catalog <catalog_name> |
| staging-hive-schema | The schema name associated with the staging Hive catalog for ingestion. Create and pass in a custom schema name by using this parameter. Default schema: lhingest_staging_schema. If schema is created as default, you do not need to specify this parameter. | --staging-hive-schema <schema_name> |
| system-config | This parameter is used to specify system related parameters. See System config. | --system-config <path/to/system/configfile> |
| target-table | Data migration target table. <catalog>.<schema>.<table1>. This is a mandatory parameter to run an ingestion job. Example: <iceberg.demo.customer1> | --target-table <TARGET_TABLES> |
| trust-store-path | Path of the truststore to access the ingestion engine. This is used to establish SSL connections. This is a mandatory parameter to run an ingestion job. | --trust-store-path <TRUST_STORE_PATH> |
| trust-store-password | Password of truststore to access the ingestion engine. This is used to establish SSL connections. This is a mandatory parameter to run an ingestion job. | --trust-store-password <TRUST_STORE_PASSWORD> |
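To illustrate how the command line options combine, the following Python sketch assembles an argument list from the flags in the table. It is a minimal sketch under stated assumptions: the `ibm-lh data-copy` entry point is an assumption (this topic does not name the subcommand), and the bucket, table, endpoint, and truststore values are hypothetical placeholders. The sketch only builds and prints the command; it does not run it.

```python
import shlex

# Options that the table marks as mandatory for an ingestion job.
# (dbuser and dbpassword are also mandatory unless the default user is used.)
MANDATORY = (
    "source-data-files",
    "target-table",
    "ingestion-engine-endpoint",
    "staging-location",
    "trust-store-path",
    "trust-store-password",
)


def build_ingest_command(options, create_if_not_exist=False):
    """Return an argv list for a Presto-mode ingestion job."""
    missing = [name for name in MANDATORY if not options.get(name)]
    if missing:
        raise ValueError("missing mandatory options: " + ", ".join(missing))
    # Assumption: the entry point is `ibm-lh data-copy`; adjust to your installation.
    argv = ["ibm-lh", "data-copy"]
    for name, value in options.items():
        argv += ["--" + name, str(value)]
    if create_if_not_exist:
        argv.append("--create-if-not-exist")  # switch without a value
    return argv


# Hypothetical placeholder values, for illustration only.
example = build_ingest_command(
    {
        "source-data-files": "s3://my-bucket/data/file1.csv,s3://my-bucket/data/folder1/",
        "target-table": "iceberg.demo.customer1",
        "ingestion-engine-endpoint": "hostname=localhost,port=8443",
        "staging-location": "s3://my-bucket/staging/",
        "trust-store-path": "/tmp/truststore.jks",
        "trust-store-password": "changeit",
    },
    create_if_not_exist=True,
)
print(shlex.join(example))
```

Checking the mandatory options before building the command surfaces a missing parameter early, instead of at job submission.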
Configuration file option
The configuration file contains a global ingest configuration section and multiple individual ingest configuration sections that define the ingestion job. Settings in an individual ingest section override the corresponding settings in the global ingest section.
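The override rule can be pictured as a key-by-key merge. The following Python sketch is an illustration only: it models each section as a dictionary and assumes that an individual value replaces the global value for the same key. The function name and the sample values are hypothetical, and the sketch does not reproduce how the ibm-lh utility parses the file.

```python
def effective_config(global_section, individual_section):
    """Merge one individual ingest section over the global section."""
    merged = dict(global_section)      # start from the global settings
    merged.update(individual_section)  # individual values win on conflict
    return merged


# Hypothetical sample values, for illustration only.
global_cfg = {
    "create-if-not-exist": "true",
    "target-table": "iceberg.demo.customer1",
}
individual_cfg = {
    "source-files": "s3://my-bucket/data/file1.csv",
    "target-table": "iceberg.demo.customer2",
}

merged_cfg = effective_config(global_cfg, individual_cfg)
for key, value in merged_cfg.items():
    print(f"{key}:{value}")
```

In this sample, `target-table` comes from the individual section, while `create-if-not-exist` is inherited from the global section.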
Global ingest config section
Global ingest config options and variables

| Parameter | Description | Declaration |
|---|---|---|
| create-if-not-exist | Create target table if it does not exist. | create-if-not-exist:<true/false> |
| ingestion-engine-endpoint | Specifies connection parameters of the ingestion engine. Endpoint of ingestion engine. hostname=<hostname>, port=<port> | ingestion-engine:hostname=<hostname>, port=<port> |
| target-table | Data migration target table. Only one target table can be specified. <catalog>.<schema>.<table1> | target-table:<table_name> |
Individual ingest config section
A configuration file can contain multiple individual ingest sections. Each individual ingest config section is ingested separately.
Individual ingest config options and variables

| Parameter | Description | Declaration |
|---|---|---|
| create-if-not-exist | Create target table if it does not exist. | create-if-not-exist |
| dbpassword | Database password that is used to do ingestion. This is a mandatory parameter to run an ingestion job unless the default user is used. | dbpassword:<DBPASSWORD> |
| dbuser | Database username that is used to do ingestion. This is a mandatory parameter to run an ingestion job unless the default user is used. | dbuser:<DBUSER> |
| ingestion-engine-endpoint | Endpoint of ingestion engine. hostname=<hostname>, port=<port>. This is a mandatory parameter to run an ingestion job. | ingestion-engine-endpoint:<INGESTION_ENGINE_ENDPOINT> |
| schema | Schema file that includes CSV specifications, and more. See Schema file specifications. | schema:/path/to/schemaconfig/file |
| source-files | Data files or folders for data migration. File name ending with / is considered a folder. This is a mandatory parameter to run an ingestion job. | source-files:<SOURCE_DATA_FILE> |
| staging-location | Location where CSV files and in some circumstances Parquet files are staged, see Staging location. This is a mandatory parameter to run an ingestion job. | staging-location:<STAGING_LOCATION> |
| staging-hive-catalog | The Hive catalog name configured in watsonx.data, if not using the default catalog for staging. Default catalog: hive_data. | --staging-hive-catalog <catalog_name> |
| staging-hive-schema | The schema name associated with the staging Hive catalog for ingestion. Create and pass in a custom schema name by using this parameter. Default schema: lhingest_staging_schema. If schema is created as default, you do not need to specify this parameter. | --staging-hive-schema <schema_name> |
| system-config | This parameter is used to specify system related parameters. See System config. | --system-config <path/to/system/configfile> |
| target-catalog-uri | Target catalog URI. | target-catalog-uri:<TARGET_CATALOG_URI> |
| target-table | Data migration target table. <catalog>.<schema>.<table1>. This is a mandatory parameter to run an ingestion job. Example: <iceberg.demo.customer1> | target-table:<TARGET_TABLES> |
| target-table-storage | Target table file storage location. | target-table-storage:<TARGET_TABLE_STORAGE> |
| trust-store-path | Path of truststore to access ingestion engine. This is used to establish SSL connections. This is a mandatory parameter to run an ingestion job. | trust-store-path:<TRUST_STORE_PATH> |
| trust-store-password | Password of truststore to access the ingestion engine. This is used to establish SSL connections. This is a mandatory parameter to run an ingestion job. | trust-store-password:<TRUST_STORE_PASSWORD> |
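As a rough aid for preparing an individual ingest section, the following Python sketch checks a set of settings for the parameters that the table marks as mandatory and then renders them in the `key:value` form shown in the Declaration column. The helper names and sample values are hypothetical, and the sketch does not cover how sections are delimited or named in the file, which this topic does not specify.

```python
# Keys that the table marks as mandatory for an ingestion job.
MANDATORY_KEYS = (
    "ingestion-engine-endpoint",
    "source-files",
    "staging-location",
    "target-table",
    "trust-store-path",
    "trust-store-password",
)


def missing_keys(section, uses_default_user=True):
    """List the mandatory keys that are absent or empty in a section."""
    required = list(MANDATORY_KEYS)
    if not uses_default_user:
        # dbuser and dbpassword are mandatory unless the default user is used.
        required += ["dbuser", "dbpassword"]
    return [key for key in required if not section.get(key)]


def render_section(section):
    """Render settings as key:value lines, one per parameter."""
    return "\n".join(f"{key}:{value}" for key, value in section.items())


# Hypothetical sample values, for illustration only.
section_cfg = {
    "source-files": "s3://my-bucket/data/file1.csv",
    "target-table": "iceberg.demo.customer1",
    "staging-location": "s3://my-bucket/staging/",
}

print("missing:", missing_keys(section_cfg))
print(render_section(section_cfg))
```

Because individual sections override the global section, a key reported as missing here may still be supplied by the global ingest config section.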