Presto ingestion using ibm-lh utility through CLI

You can ingest data from an S3 or local location into IBM® watsonx.data using the ibm-lh utility. The following options are supported:

Command line option
Configuration file option

This topic describes the parameters that the ibm-lh utility supports for ingestion with the Presto engine. For detailed instructions to ingest, see:

The options and variables that are supported on the command line and in the configuration file are listed in the following sections.

Command line option

Command line options and variables
Parameter	Description	Declaration
create-if-not-exist	Create target table if it does not exist.	`--create-if-not-exist`
dbpassword	Database password that is used for ingestion. This is a mandatory parameter to run an ingestion job unless the default user is used.	`--dbpassword <DBPASSWORD>`
dbuser	Database username that is used for ingestion. This is a mandatory parameter to run an ingestion job unless the default user is used.	`--dbuser <DBUSER>`
ingest-config	Configuration file for data migration	`--ingest-config <INGEST_CONFIGFILE>`
ingestion-engine-endpoint	Endpoint of ingestion engine. hostname=`<hostname>`, port=`<port>`. This is a mandatory parameter to run an ingestion job.	`--ingestion-engine-endpoint <INGESTION_ENGINE_ENDPOINT>`
log-directory	This option is used to specify the location of log files. See Log directory.	`--ingest-config <ingest_config_file> --log-directory <directory_path>`
schema	Schema file that includes CSV specifications, and more. See Schema file specifications.	`--schema </path/to/schemaconfig/file>`
source-data-files	Data files or folders for data migration. A file name that ends with `/` is treated as a folder. Single or multiple files can be used. This is a mandatory parameter to run an ingestion job. File names are case sensitive. Example: `<file1_path>,<file2_path>,<folder1_path>`	`--source-data-files <SOURCE_DATA_FILE>`
staging-location	Location where CSV files and, in some circumstances, Parquet files are staged. See Staging location. This is a mandatory parameter to run an ingestion job.	`--staging-location <STAGING_LOCATION>`
staging-hive-catalog	The Hive catalog name that is configured in watsonx.data, if you are not using the default catalog for staging. Default catalog: hive_data.	`--staging-hive-catalog <catalog_name>`
staging-hive-schema	The schema name associated with the staging Hive catalog for ingestion. Create and pass in a custom schema name by using this parameter. Default schema: `lhingest_staging_schema`. If the schema is created with the default name, you do not need to specify this parameter.	`--staging-hive-schema <schema_name>`
system-config	This parameter is used to specify system related parameters. See System config.	`--system-config <path/to/system/configfile>`
target-table	Data migration target table, in the form `<catalog>.<schema>.<table1>`. This is a mandatory parameter to run an ingestion job. Example: `iceberg.demo.customer1`	`--target-table <TARGET_TABLES>`
trust-store-path	Path of the truststore to access the ingestion engine. This is used to establish SSL connections. This is a mandatory parameter to run an ingestion job.	`--trust-store-path <TRUST_STORE_PATH>`
trust-store-password	Password of truststore to access the ingestion engine. This is used to establish SSL connections. This is a mandatory parameter to run an ingestion job.	`--trust-store-password <TRUST_STORE_PASSWORD>`
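
The command line table can be turned into a small helper that assembles an argument list and checks the parameters marked as mandatory. This is a minimal sketch, not part of the ibm-lh utility: the `ibm-lh data-copy` base command is an assumption to verify against your installation, and every bucket, path, host, and password value is hypothetical.

```python
import shlex

# Flags that the table above marks as mandatory for every ingestion job.
# dbuser and dbpassword are left out because they are only required when
# the default user is not used.
MANDATORY = (
    "ingestion-engine-endpoint",
    "source-data-files",
    "staging-location",
    "target-table",
    "trust-store-path",
    "trust-store-password",
)


def build_ingest_args(params, base_command=("ibm-lh", "data-copy")):
    """Turn a dict of parameter names into an argument list.

    base_command is an assumption; adjust it to match your installation.
    A value of True emits a bare switch such as --create-if-not-exist.
    """
    missing = [name for name in MANDATORY if not params.get(name)]
    if missing:
        raise ValueError("missing mandatory parameters: " + ", ".join(missing))

    args = list(base_command)
    for name, value in params.items():
        if value is True:
            args.append("--" + name)
        elif value is not None and value is not False:
            args.extend(["--" + name, str(value)])
    return args


# Hypothetical values for illustration only.
example = {
    "source-data-files": "s3://example-bucket/data/file1.csv,s3://example-bucket/data/folder1/",
    "target-table": "iceberg.demo.customer1",
    "ingestion-engine-endpoint": "hostname=localhost,port=8443",
    "staging-location": "s3://example-bucket/staging/",
    "trust-store-path": "/path/to/truststore.jks",
    "trust-store-password": "changeit",
    "create-if-not-exist": True,
}

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(build_ingest_args(example)))
```

Building the list first and quoting it only for display keeps values that contain commas or `=` signs, such as the endpoint, intact as single arguments.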

Configuration file option

The configuration file contains a global ingest configuration section and multiple individual ingest configuration sections to run the ingestion job. The specifications of the individual ingestion sections override the specifications of the global ingestion section.
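
The override rule can be pictured as a dictionary merge in which the individual section is applied on top of the global one. The following is only an illustrative sketch of that rule, not the utility's actual implementation; the function name and all values are hypothetical.

```python
def effective_settings(global_section, individual_section):
    """Return the settings that one individual ingest section would use.

    Start from the global values, then let the individual section
    replace any key that it also defines.
    """
    merged = dict(global_section)
    merged.update(individual_section)
    return merged


# Hypothetical values for illustration only.
global_section = {
    "target-table": "iceberg.demo.customer1",
    "create-if-not-exist": "true",
}
individual_section = {
    "source-files": "s3://example-bucket/data/orders.csv",
    "target-table": "iceberg.demo.orders",
}

settings = effective_settings(global_section, individual_section)
print(settings["target-table"])         # iceberg.demo.orders (individual value wins)
print(settings["create-if-not-exist"])  # true (inherited from the global section)
```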

Global ingest config section

Global ingest config options and variables
Parameter	Description	Declaration
create-if-not-exist	Create the target table if it does not exist.	`create-if-not-exist:<true/false>`
ingestion-engine-endpoint	Connection parameters (endpoint) of the ingestion engine. hostname=`<hostname>`, port=`<port>`	`ingestion-engine:hostname=<hostname>, port=<port>`
target-table	Data migration target table. Only one target table can be specified. `<catalog>.<schema>.<table1>`	`target-table:<table_name>`

Individual ingest config section

A configuration file can contain multiple individual ingest config sections. Each individual ingest config section is ingested separately.

Individual ingest config options and variables
Parameter	Description	Declaration
create-if-not-exist	Create target table if it does not exist.	`create-if-not-exist`
dbpassword	Database password that is used for ingestion. This is a mandatory parameter to run an ingestion job unless the default user is used.	`dbpassword:<DBPASSWORD>`
dbuser	Database username that is used for ingestion. This is a mandatory parameter to run an ingestion job unless the default user is used.	`dbuser:<DBUSER>`
ingestion-engine-endpoint	Endpoint of ingestion engine. hostname=`<hostname>`, port=`<port>`. This is a mandatory parameter to run an ingestion job.	`ingestion-engine-endpoint:<INGESTION_ENGINE_ENDPOINT>`
schema	Schema file that includes CSV specifications, and more. See Schema file specifications.	`schema:/path/to/schemaconfig/file`
source-files	Data files or folders for data migration. A file name that ends with `/` is treated as a folder. This is a mandatory parameter to run an ingestion job.	`source-files:<SOURCE_DATA_FILE>`
staging-location	Location where CSV files and, in some circumstances, Parquet files are staged. See Staging location. This is a mandatory parameter to run an ingestion job.	`staging-location:<STAGING_LOCATION>`
staging-hive-catalog	The Hive catalog name that is configured in watsonx.data, if you are not using the default catalog for staging. Default catalog: hive_data.	`staging-hive-catalog:<catalog_name>`
staging-hive-schema	The schema name associated with the staging Hive catalog for ingestion. Create and pass in a custom schema name by using this parameter. Default schema: `lhingest_staging_schema`. If the schema is created with the default name, you do not need to specify this parameter.	`staging-hive-schema:<schema_name>`
system-config	This parameter is used to specify system related parameters. See System config.	`system-config:<path/to/system/configfile>`
target-catalog-uri	Target catalog URI.	`target-catalog-uri:<TARGET_CATALOG_URI>`
target-table	Data migration target table, in the form `<catalog>.<schema>.<table1>`. This is a mandatory parameter to run an ingestion job. Example: `iceberg.demo.customer1`	`target-table:<TARGET_TABLES>`
target-table-storage	Target table file storage location	`target-table-storage:<TARGET_TABLE_STORAGE>`
trust-store-path	Path of the truststore to access the ingestion engine. This is used to establish SSL connections. This is a mandatory parameter to run an ingestion job.	`trust-store-path:<TRUST_STORE_PATH>`
trust-store-password	Password of truststore to access the ingestion engine. This is used to establish SSL connections. This is a mandatory parameter to run an ingestion job.	`trust-store-password:<TRUST_STORE_PASSWORD>`
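
To show how the `key:value` declarations in the two tables might fit together in one file, the following sketch renders a global section and several individual sections as text. The bracketed section headers (`[global-ingest-config]`, `[ingest-config1]`, and so on) are assumptions for illustration and should be confirmed against a sample file for your ibm-lh release; all values are hypothetical.

```python
def render_ingest_config(global_section, individual_sections):
    """Render sections as key:value lines under bracketed section headers.

    The header names are assumptions; only the key:value line format
    comes from the declaration columns of the tables above.
    """
    lines = ["[global-ingest-config]"]
    lines += [f"{key}:{value}" for key, value in global_section.items()]
    for index, section in enumerate(individual_sections, start=1):
        lines.append("")  # blank line between sections for readability
        lines.append(f"[ingest-config{index}]")
        lines += [f"{key}:{value}" for key, value in section.items()]
    return "\n".join(lines) + "\n"


# Hypothetical values for illustration only.
config_text = render_ingest_config(
    {
        "target-table": "iceberg.demo.customer1",
        "ingestion-engine": "hostname=localhost, port=8443",
        "create-if-not-exist": "true",
    },
    [
        {
            "source-files": "s3://example-bucket/data/customers.csv",
            "staging-location": "s3://example-bucket/staging/",
            "trust-store-path": "/path/to/truststore.jks",
            "trust-store-password": "changeit",
        },
        {
            "source-files": "s3://example-bucket/data/orders/",
            "staging-location": "s3://example-bucket/staging/",
            "target-table": "iceberg.demo.orders",
            "trust-store-path": "/path/to/truststore.jks",
            "trust-store-password": "changeit",
        },
    ],
)
print(config_text)
```

In this sketch the second individual section sets its own `target-table`, so it would override the global value for that job, while the first section inherits it.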