Known issues (Limitations)

The following limitations and known issues apply to IBM® watsonx.data.

Db2 connector cannot access views created using external tools

The Db2 connector in watsonx.data currently supports access only to views created through the watsonx.data instance or DBeaver. Accessing views created with other tools, such as the Db2 Data Management Console (DMC), the Db2 command line, or any other third-party tools, is not supported.

Unsupported special characters in schema and table creation

The following special characters are not supported while creating schemas and tables:

Schemas (Hive and Iceberg): {, [, (, )

Tables (Hive): {, (, [, )

Tables (Iceberg): $, @, {, [, ), (

ALTER TABLE operation fails in Spark job submission

Spark jobs that create a schema and a table and then attempt an ALTER TABLE operation might encounter an authz.AccessControlException due to insufficient permissions.

This occurs because, even though the schema and table creation are successful, the job tries to execute the ALTER TABLE operation before the metastore data is updated with the newly created schema and table details.

Workaround: To prevent access-denied errors, add a delay between operations that create new schemas or tables within the same Python script, as shown in the sketch below.
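A minimal PySpark sketch of this workaround follows. The catalog, schema, and table names and the 30-second delay are placeholders, not values from the product documentation; adjust them for your environment.

import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("create-then-alter").getOrCreate()

# Pause after each DDL statement so that the metastore and access policies
# can register the newly created schema and table.
spark.sql("CREATE SCHEMA IF NOT EXISTS lakehouse.demo_schema")
time.sleep(30)

spark.sql("CREATE TABLE IF NOT EXISTS lakehouse.demo_schema.demo_table (id INT, name STRING)")
time.sleep(30)

# The ALTER TABLE operation now runs after the metastore data is updated.
spark.sql("ALTER TABLE lakehouse.demo_schema.demo_table ADD COLUMNS (city STRING)")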

Delayed UI update after successful ingestion jobs

After a successful ingestion job, the schema or table may not be immediately visible in the user interface (UI). This is due to the background execution of the ingestion process.

Workaround: Refresh your browser or refresh the catalogs or schemas from the Data manager page after the ingestion job status changes to Finished to make sure that the UI is updated with the newly created schema or table. Once the UI is refreshed, you can proceed to run another ingestion job on the same schema or tables that were just created.

Spark application submission fails when DAS (Data Access Service) is enabled

DAS does not currently support buckets or object storage that use HTTP endpoints.

Workaround: You can disable DAS or make sure that your buckets or object storage are configured with HTTPS endpoints.

Attempting to read Parquet v2 tables through Presto (C++) results in an error

When you attempt to read Parquet v2 tables that were created through the Data manager in watsonx.data by using Presto (C++), the following error is displayed:

Error in ZlibDecompressionStream::Next

Workaround: Presto (C++) currently does not support reading Parquet v2 tables. You must copy the data to a new table in Parquet v1 format so that it can be read by using Presto (C++).

  1. Set the session property to PARQUET_1_0:

    set session <catalog_name>.parquet_writer_version = 'PARQUET_1_0';
    
  2. Run the following command to copy the data to a new table:

    create table <catalog name>.<schema name>.<table name> as (select * from <originaltablename>);
    

Spark ingestion limitation for partitioned table column names

Spark ingestion currently does not support special characters such as quotation marks, backticks, and parentheses in partitioned table column names.

Attempting to query Query History and Monitoring Management (QHMM) related tables using Presto (C++) engines might encounter errors

When you attempt to query QHMM related tables by using Presto (C++) engines, you might encounter errors due to unsupported file formats. Presto (C++) supports only DWRF and Parquet v1 formats. You cannot use Presto (C++) to query data or tables in other formats.

Workaround: Switch to Presto (Java) engines to query QHMM related tables.

SELECT queries may fail on Hive tables

When a column is dropped from a Hive table, subsequent SELECT queries may fail with the following error:

The file used to create the table is not an ORC file. Based on the file type, specify the file type or use an ORC file. Supported file types are ORC, Parquet, Avro, RCFile, SequenceFile, JSON, and Text. Only if you are using an ORC file, it is not mandatory to specify the file type.

The column drop operation succeeds, but SELECT * queries fail due to a file format issue.

Workaround: Add the following properties to the catalog through the customization PATCH API. See Update presto engine.

hive.orc.use-column-names=true
hive.parquet.use-column-names=true

CreateIndex permission denied due to policy sync delay of collection creator

When attempting to create an index immediately after creating a collection in Milvus, users with the Viewer or User role might encounter a CreateIndex permission denied error. Apart from the CreateIndex operation, this error can occur for other Milvus operations within a collection, such as Insert and CreatePartition.

Workaround: Wait 5–10 seconds after creating the collection before creating the index, as shown in the sketch below.
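The following pymilvus sketch illustrates the delay. The connection values, collection name, field definitions, and the 10-second wait are placeholders only.

import time
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

# Placeholder connection details for your Milvus service.
connections.connect(alias="default", host="<milvus_host>", port="<port>",
                    user="<user>", password="<password>", secure=True)

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="vec", dtype=DataType.FLOAT_VECTOR, dim=128),
]
collection = Collection(name="demo_collection", schema=CollectionSchema(fields))

# Wait for the access policy to synchronize before the next operation.
time.sleep(10)

collection.create_index(
    field_name="vec",
    index_params={"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}},
)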

Server concurrency limit reached error in flight server

You might encounter a Server concurrency limit reached error when using the flight server to run queries. This occurs when the server experiences high memory usage due to a large number of concurrent requests.

Workaround: Pause the query or request and try again after a few minutes, for example by wrapping the request in a simple retry as sketched below.
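A generic Python retry sketch follows. The run_query callable is a hypothetical stand-in for whichever client call you use to submit the request, and the attempt count and wait time are arbitrary examples.

import time

def run_with_retry(run_query, sql, max_attempts=5, wait_seconds=120):
    # run_query is a hypothetical callable that submits the query and raises on failure.
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query(sql)
        except Exception as err:
            if "Server concurrency limit reached" in str(err) and attempt < max_attempts:
                time.sleep(wait_seconds)  # pause before retrying the request
            else:
                raise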

Incorrect recognition of Gregorian dates in Presto with Hive Parquet tables

Presto exhibits issues when processing historical dates prior to 0200-01-01, specifically when they are stored in Hive tables in Parquet format. This issue occurs due to the conversion between the Julian and Gregorian calendars; the Gregorian calendar was introduced on 1582-10-15, and dates before this cutoff are misinterpreted by Presto.

Incomplete information on column length in SHOW COLUMNS output

The SHOW COLUMNS query in Presto currently provides information about columns including name, data type, additional details (extra), and comments. This issue highlights that the existing functionality lacks details about the length of character-based data types (CHAR and VARCHAR). While some connectors return the actual length defined during table creation, others might provide a default value or no information at all.

To address this limitation, three new columns have been added to the SHOW COLUMNS output:

  • Scale: Applicable to DECIMAL data type, indicating the number of digits after the decimal point.

  • Precision: Applicable to numerical data types, specifying the total number of digits. (Default: 10)

  • Length: Intended for CHAR and VARCHAR data types, representing the maximum number of characters allowed.

Current Limitations:

  • The reported length in the Length column might not always reflect the actual size defined in the table schema due to connector limitations.

  • Connectors that do not provide length information display a default value or null, depending on the connector.

Calculation error for OPT_SORTHEAP in Query Optimizer

Due to a calculation error in the configuration setting of Query Optimizer for the value of OPT_SORTHEAP, the performance of Query Optimizer might be affected.

Workaround: To resolve the calculation error for OPT_SORTHEAP in Query Optimizer, complete the following steps to update the value of OPT_SORTHEAP from <initial_value> to <initial_value>/20.

  1. Set up the PROJECT_CPD_INSTANCE environment variable pointing to the namespace where watsonx.data is installed.

    export PROJECT_CPD_INSTANCE=<wxd_namespace>

  2. Edit the value of OPT_SORTHEAP to <initial_value>/20 by running the following command.

    oc edit db2uinstance lakehouse-oaas -n $PROJECT_CPD_INSTANCE

  3. Wait for some time for the STATE to change to Ready for lakehouse-oaas and run the following command.

    watch "oc get db2uinstance -n $PROJECT_CPD_INSTANCE"

Limitations - Presto (C++)

  • Presto (C++) engine currently does not support database catalogs.
  • Only file formats such as Parquet and DWRF are supported.
  • Hive connector is supported.
  • By default, Iceberg tables have read support only, with Parquet v1 format.
  • TPC-H/TPC-DS queries are supported.
  • The following SQL statements are supported:
    • SELECT all clauses
    • CTAS statements
  • DELETE FROM and CALL SQL statements are not supported.
  • START, COMMIT, and ROLLBACK transactions are not supported.
  • Data types CHAR, TIME, and TIME WITH TIMEZONE are not supported. These data types are subsumed by VARCHAR, TIMESTAMP, and TIMESTAMP WITH TIMEZONE.
    • IPADDRESS, IPPREFIX, UUID, KHYPERLOGLOG, P4HYPERLOGLOG, QDIGEST, and TDIGEST are not supported.
    • VARCHAR supports only a limited length. Varchar(n) with a maximum length bound is not supported.
    • TIME and TIME WITH TIMEZONE support is under community development.
    • TIMESTAMP columns in Parquet files cannot be read.
  • Scalar functions:
    • IPFunctions, QDigest, HyperLogLog, and Geospatial internationalization are not supported.
  • Aggregate functions:
    • QDigest, Classification metrics, and Differential entropy are not supported.
  • S3 and S3 compatible file systems (both read and write) are supported.

Presto (C++) fails to query an external partitioned table

When you query an external table with CHAR data type columns, the query fails to run. This issue occurs due to the limitation that Presto (C++) does not support CHAR data types.

Workaround: Change the CHAR data type column to VARCHAR data type.

Accessing Hive and Iceberg tables in the same glue metastore catalog

When you use the AWS Glue Data Catalog to manage a bucket or storage location that contains both Iceberg and Hive tables, attempting to access Iceberg tables from the Hive catalog gives a Not a Hive table error, and attempting to access Hive tables from the Iceberg catalog gives a Not an Iceberg table error.

MinIO bucket access through S3 proxy is unavailable

Currently, it is not possible to access buckets stored in MinIO object storage by using the S3 proxy functionality.

Presto SQL operations with Spark 3.3 and Iceberg timestamp data

When data containing timestamptz is ingested by using Spark, Presto queries on these tables fail with the following error: Iceberg column type timestamptz is not supported.

Workaround: To ensure interoperability between Spark and Presto for datasets containing timestamptz, you must use Spark 3.4 applications with the configuration spark.sql.timestampType set to TIMESTAMP_NTZ, as in the sketch below.
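For example, a Spark 3.4 application can set the property when building its session. The application name is a placeholder.

from pyspark.sql import SparkSession

# Set the timestamp type to TIMESTAMP_NTZ so that ingested timestamps are
# written without time zone information and remain readable from Presto.
spark = (
    SparkSession.builder
    .appName("ingest-timestamp-ntz")
    .config("spark.sql.timestampType", "TIMESTAMP_NTZ")
    .getOrCreate()
)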

Using ID as a column name in Cassandra CREATE TABLE

In Cassandra, you cannot create a table with a column named ID while using a Cassandra connector through Presto. This is because ID is a reserved keyword for the Cassandra driver used by Presto, which automatically generates a UUID for each row. Attempting to create a table with a column named ID results in an error message indicating a duplicate column declaration, as follows: Duplicate column id declaration for table tm_lakehouse_engine_ks.testtable12

Workaround: Avoid using ID as a column name when creating Cassandra tables through Presto.

User role with CreateCollection L3 policy fails to create collection in Milvus

Users with the User role might fail to create collections in Milvus with pymilvus when using the ORM Connection and MilvusClient Connection methods.

Workaround: Follow these instructions:

ORM Connection: The user requires both DescribeCollection and CreateCollection privileges granted in the L3 policy page. You must select all collections in a database while granting DescribeCollection privilege in the L3 policy through web console.

MilvusClient Connection: Only the CreateCollection privilege is necessary in the L3 policy page. However, the first attempt to create a collection fails.

  1. Run the create_collection function once.
  2. Rerun the create_collection function. This allows the policies to synchronize, and the collection creation succeeds, as shown in the sketch below.
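A minimal sketch of this retry with the MilvusClient connection follows. The endpoint, credentials, collection name, and dimension are placeholders.

import time
from pymilvus import MilvusClient

client = MilvusClient(uri="https://<milvus_host>:<port>", token="<user>:<password>")

try:
    # The first attempt may fail while the L3 policy is still synchronizing.
    client.create_collection(collection_name="demo_collection", dimension=128)
except Exception:
    time.sleep(10)  # allow the policies to synchronize
    client.create_collection(collection_name="demo_collection", dimension=128)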

Special characters and mixed case impacting data synchronization

When synchronizing data between buckets that contain tables or schemas with special characters or mixed case letters in their names, you might encounter the following unexpected behaviors:

  • Tables or schemas with certain special characters %, ,, {, ), (, @, $, [, : will have their data entirely skipped during synchronization.
  • Tables or schemas with mixed case or uppercase letters will be converted to lowercase before synchronization.

Workaround: Avoid using special characters and mixed case in table and schema names. Rename existing tables and schemas to use only the supported characters.

Missing data validation for Amazon S3 storage endpoints

Currently, the user interface (UI) does not perform data validation for endpoints associated with the Amazon S3 storage type.

Incorrect alias usage in WITH clause and USE catalog.schema

WITH clause: When referencing data within the WITH clause, use the exact alias name assigned during its definition. Using an incorrect alias triggers the following error message.

Schema must be specified when session schema is not set

USE catalog.schema usage along with WITH clause: When tables are specified using WITH and USE catalog.schema, queries with incorrect alias names will result in the following error.

Table does not exist

String literal interpretation in Presto (Java)

Presto (Java), by default, interprets string literals as VARCHAR, unlike many other database systems that treat them as CHAR.

In Presto (Java), string comparisons are performed on the actual characters present in the string, excluding trailing spaces. This can cause queries to return incorrect results when working with strings that may contain trailing spaces, as these spaces are not considered during comparison.

Table names with multiple dots

Presto (Java) does not support creating or querying table names that contain three or more consecutive dots. Attempts to reference such tables in queries may result in errors.

User is still visible in the Access control page of an engine after removing the user from IAM.

LDAP authentication is not supported for Teradata connector.

The watsonx.data Teradata connector does not currently support LDAP (Lightweight Directory Access Protocol) for user authentication.

502 error when accessing the Spark history UI

The Spark history UI page might return a 502 error when the history server is not yet operational.

Workaround: If you encounter the 502 error, reload the Spark history UI page after waiting 1-5 seconds. This should allow enough time for the server to become operational.

Cross catalog schema creation anomaly in Presto (Java).

An anomaly exists in schema creation for Hive and Iceberg catalogs managed by Presto (Java). When using a common Hive Metastore Service for multiple catalogs (for example, an Iceberg catalog and a Hive catalog, or two Iceberg or Hive catalogs), creating a schema in one catalog might create it in the wrong catalog. This occurs if the location specified during schema creation belongs to a different catalog than intended.

Workaround: You must always explicitly provide the correct storage path associated with the target catalog when using CREATE SCHEMA statements in Presto (Java). This ensures the schema is created in the desired location.

Presto (Java) queries with many columns and size exceeding default limit.

Presto (Java) queries involving multiple tables with a large number of columns (for example, 1000 columns per table or more) in the SELECT clause might encounter performance issues across all deployment environments.

The iterative optimizer times out when max_reorder_joins is set to 5 or higher (the default timeout is 3 minutes) and gives the following error:

The optimizer exhausted the time limit of 180000 ms

For queries exceeding the default max-task-update-size limit (16MB in Presto (Java)), you might observe a TaskUpdate size exceeding this limit error (the specific value of limit depends on the actual query).

Workaround:

  • You can improve query performance by temporarily disabling the reorder_joins rule using the following session property:

    set session reorder_joins = false;
    
  • Increase the max-task-update-size value in the config.properties file if the issue involves a TaskUpdate size exceeding the limit error and restart Presto (Java).

Example:

experimental.internal-communication.max-task-update-size=64MB

Limitation: Transactions not supported in unlogged Informix databases.

In watsonx.data, when attempting to execute queries with transactional implications on unlogged Informix databases, queries will fail. This is because unlogged Informix databases, by design, do not support transactions.

Limitation: Netezza Performance Server INSERT statement limitation.

Netezza Performance Server currently does not support inserting multiple rows directly into a table by using the VALUES clause. This functionality is limited to single-row insertions. Refer to the official Netezza Performance Server documentation for details on the INSERT statement.

The following example using VALUES for multiple rows is not supported:

INSERT INTO EMPLOYEE VALUES (3,'Roy',45,'IT','CityB'),(2,'Joe',45,'IT','CityC');

Workaround: Use a subquery with SELECT and UNION ALL to construct a temporary result set and insert it into the target table.

INSERT INTO EMPLOYEE SELECT * FROM (SELECT 4,'Steve',35,'FIN','CityC' UNION ALL SELECT 5,'Paul',37,'OP','CityA') AS temp;

Issue: Milvus unresponsive to queries.

Milvus may not respond to queries when attempting to load collections or partitions that exceed available memory capacity. This occurs because all search and query operations within Milvus are executed in memory, requiring the entire collection or partition to be loaded before querying.

Workaround:

  • Consider the memory limitations of your Milvus deployment and avoid loading excessively large collections or partitions.

  • If Milvus becomes unresponsive to queries, employ the appropriate Milvus API to unload or release some collections from memory. An example using the Python SDK: collection.release(). See the sketch after this list.
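The following pymilvus sketch shows the load and release pattern. The connection values, collection name, vector field, and search parameters are placeholders, and the collection is assumed to already exist and be indexed.

from pymilvus import connections, Collection

connections.connect(alias="default", host="<milvus_host>", port="<port>",
                    user="<user>", password="<password>", secure=True)

collection = Collection("demo_collection")
collection.load()  # search and query run in memory, so the collection must be loaded

results = collection.search(
    data=[[0.1] * 128],
    anns_field="vec",
    param={"metric_type": "L2"},
    limit=5,
)

collection.release()  # free memory once the collection is no longer needed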

Issue: Inaccurate row count after deletions in Milvus.

The collection.num_entities property might not reflect the actual number of rows in a Milvus collection after deletion operations. This property provides an estimate and may not account for deleted entities.

To get an accurate count of rows, execute a count(*) query on the collection. This provides an accurate count even after deletions.

Pymilvus syntax:

collection = pymilvus.Collection(...)
collection.query(expr='', output_fields=['count(*)'])

Limitations: Unsupported Db2 operations.

watsonx.data currently does not support the ALTER TABLE DROP COLUMN operation for Db2 column-organized tables.

By default, Db2 instances create tables in column-organized format.

watsonx.data does not support creating row-organized tables in Db2.

Limitations: Handling Null Values in Elasticsearch.

Elasticsearch connector requires explicit definition of index mappings for fields to handle null values when loading data.

Limitations: Loading Nested JSON with Elasticsearch.

Elasticsearch connector requires users to explicitly specify nested JSON structures as arrays of type ROW for proper loading and querying. To process such structures, use the UNNEST operation.

Limitation: Users can create a maximum of 3 instances of the Milvus service for a single instance of watsonx.data in IBM Cloud.

Issue: Unable to create views in Presto (Java).

Presto (Java) describes a view in a mapped database as a TABLE rather than a VIEW. This is apparent to JDBC programs connecting to the Presto (Java) engine.

Issue: Using special characters in schema, table, or column names.

It is recommended not to use special characters such as question mark (?), hyphen (-), or asterisk (*) in schema, table, and column names. Though these special characters are supported and schemas, tables, and columns can be created with them, using them might cause issues when running the INSERT command.

Issue: User is not removed from the catalog access control on revoking data access.

When you grant access to a user by adding them to the data control policies by using the Access Control screen, the user is successfully listed against the catalog. On revoking user access from the Access Control page, the user stays listed against the catalog and continues to have access.

Issue: Unable to view expected catalogs from Presto (Java).

Users with administrator privileges are unable to view the expected Hive and PostgreSQL catalogs from Presto (Java).

Issue: Console UI lists invalid users.

watsonx.data user (user1) invites a new user (user2) to the account by using the Manage access and users screen (Manage > Access (IAM) > Manage access and users) and grants access to a role (MetastoreAccess, Viewer, Operator, Editor, Administrator). User2 gets access to resources in the watsonx.data instance through user1's account. Additionally, user2 is granted data access at the resource level by adding to the data control policies by using the Access Control screen. When user1 removes user2 from the user1's account, user2 is still listed in the Access Control tab at resource level.

Issue: Unable to view created schema.

When a user with the User role and only Create access is added to an external database, they cannot see the schemas that they create. Though the user can create schemas, they cannot view them. The following is the system response:

presto:default> show schemas;
Schema
--------
(0 rows)

Workaround: Provide select privilege for the schema the user created.

Issue: Access denied when querying an external database.

When a user with the User role and only Create access is added to an external database, they cannot run a select query on the table that they created. Though the user can connect to the Presto (Java) engine and create tables and schemas, they cannot query the table. The system displays an Access Denied message.

Query 20230608_132213_00042_wpmk2 failed: Access Denied: Cannot select from columns [id] in table or view tab_appiduser_01

Workaround: Provide select privilege for the table the user created.

Issue: Schema created under different catalog.

Schemas are available across Iceberg and Hive catalogs. When a schema is created under Iceberg catalog, it is listed under Hive catalog and vice versa.

Issue: Presto (Java) does not support deletion of Iceberg tables.

Issue: DROP SCHEMA in Db2.

In Db2, a schema can be dropped only if it is empty. Initiating a DROP SCHEMA statement against a non-empty schema may result in Db2 SQL Error SQLCODE=-478, SQLSTATE=42893.

Issue: CREATE VIEW statement that is partially supported by Db2.

The Db2 connector partially supports the CREATE VIEW statement. The Presto (Java) supported SQL syntax does not include creating views with custom column names (different from the table column names).

Issue: CREATE VIEW statement that is partially supported by NPSaaS.

The NPSaaS connector partially supports the CREATE VIEW statement. The Presto (Java) supported SQL syntax does not include creating views with custom column names (different from the table column names).

Issue: Presto (Java) does not recognize the path as a directory.

When you create a new table with a Presto (Java) Hive connector that uses an S3 folder from an external location, Presto (Java) does not recognize the path as a directory and an error might occur.

For example, when creating a customer table in the target directory DBCERT/tbint in a bucket that is called dqmdbcertpq by using the IBM Cloud UX and Aspera S3 console, the following error is encountered: External location must be a directory.

CREATE TABLE "hive-beta"."dbcert"."tbint" (
RNUM int , CBINT bigint
) WITH (
format='PARQUET', external_location = 's3a://dqmdbcertpq/DBCERT/tbint'
);
Query 20230509_113537_00355_cn58z failed: External location must be a directory

Objects in a file system are stored as objects with their paths. The object and path must have associated metadata. If the path is not associated with the metadata, Presto (Java) fails to recognize the object and responds that the path is not a directory.

Issue: Assigning Grant or Revoke privilege.

Assigning Grant or Revoke privilege to a user through access policy does not work as expected in the following scenarios:

  1. User_A adds a bucket and a Hive catalog (for example, useracat02).

  2. User_A creates a schema and a table.

  3. User_B and User_C are assigned User roles to the catalog.

  4. User_A adds allow grant policy to User_B.

  5. User_B connects to the catalog and runs grant select to User_C.

    presto:default> grant select on useracat02.schema_test_01.tab_1 to "6ff74bf7-b71b-42f2-88d9-a98fdbaed304";
    
  6. When User_C connects to the catalog and runs a select command on the table, the command fails with an access denied message.

    presto:default> select * from useracat02.schema_test_01.tab_1;
    Query 20230612_073938_00132_hthnz failed: Access Denied: Cannot select from columns [name, id, salary, age] in table or view tab_1
    

Issue: Creating schema without a location.

When you create a schema without a location, it is not listed in the schema list of any catalog. For example, if you create a schema without specifying the location of the bucket, the schema is created in HMS and not in the bucket. When you try to create a new schema with the same name, it fails and responds that the schema already exists.

Workaround: Specify the location of the bucket when creating a schema.

Issue: Unique names for schema and bucket.

A schema and a bucket cannot be created with the same name. For example, if you create a schema that is named “sales” in one catalog, the same name cannot be used for another schema in another catalog. Similarly, if you register a bucket with the name “salesbucket”, another bucket with the same name cannot be registered, even if the bucket is located in a different object store.

Workaround: Use unique names when creating schemas and buckets.

Issue: Creating schema for target table.

You must create a schema for the target table if the schema does not exist.

Issue: Ingestion fails if CSV file contains bad record.

The ibm-lh tool does not support skipping a maximum number of bad records for CSV files if the mismatched fields exceed the table definition.

Issue: Creating schema location with path.

Use one of the following location options when creating a schema:

  • Location pointing to a bucket/subpath without a trailing /.
  • Location pointing to a bucket/subpath with a trailing / – Recommended for better structuring.

Though you can use a location that points only to a bucket, with or without a trailing /, it might lead to failure. Therefore, it is recommended to use a subpath.

Issue: Presto (Java) does not support AS OF with Iceberg tables.

Presto (Java) does not support the AS OF <time stamp> clause in a SELECT query.

Workaround: Invoke CALL iceberg_data_rollback_to_snapshot to move to the required timestamp.

If you use CALL iceberg_data_rollback_to_snapshot with a timestamp, you cannot call the stored procedure to move to a later timestamp. Use Spark SQL as an alternative, as in the sketch below.
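As a sketch of the Spark SQL alternative, an Iceberg table can be read at a point in time with a TIMESTAMP AS OF clause (Spark 3.3 or later with the Iceberg runtime configured). The catalog, schema, table name, and timestamp are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-time-travel").getOrCreate()

# Query the table as it existed at the given timestamp.
df = spark.sql(
    "SELECT * FROM lakehouse.demo_schema.demo_table "
    "TIMESTAMP AS OF '2024-01-15 10:00:00'"
)
df.show()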

Issue: Only the creator has DROP access on the table in Apache Hive (API).

Only the creator of a table can drop the table that is created in the Apache Hive catalog. Other users cannot drop the table even if they have explicit DROP access to the table; they get an Access Denied message.

Issue: User-provided certificates are not supported by watsonx.data.

Currently, user-provided certificates are not supported in watsonx.data when adding database connections, object store buckets, or when using ibm-lh utility.

Issue: No columns to parse from file error.

When you try to ingest a folder from AWS S3 by using the ibm-lh tool, the following error may be encountered if the folder contains empty files:

No columns to parse from file

Workaround: First list the files inside the folder by using the aws s3 ls command. If empty files are listed, copy all the non-empty files to another folder by using the aws s3 cp command and ingest from that folder.

Special characters in target table names can cause ingestion failures.

Ingestion fails if a target table name contains special characters in it when ingesting through the web console.

Workaround: You can ingest data by using ingestion through Spark CLI.

Limitation: Presto (Java) does not support VARBINARY datatype.

The current version of Presto (Java) does not support binary strings with length. Execution of an ALTER TABLE statement on a database results in the following error:

Unknown type 'varbinary(n)' for column 'testcolumn'

This is a limitation in Presto and not a limitation in watsonx.data.