IBM Cloud Docs
Enhancing statistics for synced Iceberg tables

Enhancing statistics for synced Iceberg tables

This topic gives the details to gather enhanced statistics for Iceberg tables that are synchronized with the IBM® watsonx.data metastore. It uses the EXT_METASTORE_STATS_SYNC stored procedure to collect and update statistical information within the metastore, which can improve the performance of query optimization and execution.

Before you begin

  1. Make sure that the manual synchronization procedure between the Query Optimizer and the watsonx.data metastore is completed successfully if needed. See Manually syncing Query Optimizer with Watsonx.data metastore.

    The instructions in this topic can now be executed using the enhanced feature Managing statistical updates from the Optimizer dashboard, which enables advanced query performance enhancements and optimization capabilities across multiple catalogs.

Procedure

  1. Run the following command to get enhanced statistics for an Iceberg table that is synced:

    ExecuteWxdQueryOptimizer 'CALL EXT_METASTORE_STATS_SYNC(
      '<CATALOG_NAME>',
      '<SCHEMA_NAME>',
      '<TABLE_NAME>',
      '<PRESTO_INTERNAL_HOST>',
      '<PRESTO_USER>',
      '<PRESTO_PWD>',
      'true'
    )';
    

    <CATALOG_NAME>: The name of catalog (case-sensitive).

    <SCHEMA_NAME>: The name of schema in uppercase.

    <TABLE_NAME>: The name of table in uppercase. It is recommended to gather statistics for each table individually.

    <PRESTO_INTERNAL_HOST>: The internal hostname of Presto engine of which the statistics is collected from. You can find the connection details of Presto engine by clicking on the engine in the Infrastructure manager page of watsonx.data.

    <PRESTO_USER>: The Presto username that is used to run the statistics collection. Username can be ibmlhapikey or ibmlhtoken. It is recommended to use ibmlhapikey.

    Starting with watsonx.data version 2.2.0, authentication using ibmlhapikey and ibmlhtoken as usernames is deprecated. These formats will be phased out in a future release. To ensure compatibility with upcoming versions, use the new format:ibmlhapikey_username and ibmlhtoken_username.

    <PRESTO_PWD>: The Presto password that is used to run the statistics collection. Password can be a base64 API key or token corresponding to the username.

    Verify the sync operation in a few minutes by following the procedure in Verifying table sync in watsonx.data.