Integrating Snowflake Open Catalog in watsonx.data

Snowflake Open Catalog is a unified governance solution for Apache Iceberg tables in Snowflake. By integrating Snowflake Open Catalog with IBM® watsonx.data, you can query remote Snowflake tables without copying data, enabling seamless data federation across your data landscape.

watsonx.data supports querying Snowflake Open Catalog tables through:

  • Spark engine - Query Iceberg tables using PySpark with case-insensitive object names
  • Presto engine - Query Iceberg tables through the Iceberg REST Catalog API with case-sensitive object names

This integration enables:

  • Zero-copy data federation across Snowflake and watsonx.data
  • Unified access to data stored in external locations (Google Cloud Storage)
  • Consistent governance and security policies across platforms

Architecture overview

The integration works through the following components:

  1. Snowflake Open Catalog - Centralized metadata and governance layer for Iceberg tables
  2. Iceberg REST Catalog API - Standard interface for accessing table metadata
  3. watsonx.data engines - Spark or Presto engines that execute queries
  4. External storage - Google Cloud Storage where data resides

Before you begin

Snowflake requirements:

Ensure you have the following:

  • Active Snowflake Open Catalog account
  • Access to a Snowflake Query Workspace
  • Authentication credentials:
    • Service connection with a valid client ID and client secret for authentication
  • The REST Catalog endpoint associated with your Snowflake Open Catalog account

Obtaining Snowflake credentials:

  1. Log in to your Snowflake workspace.
  2. Navigate to Admin > Security Integrations.
  3. Create or select a catalog integration with OAuth authentication.
  4. Note the Client ID and Client Secret for the service connection.
  5. Identify your catalog URI (format: https://<account>.snowflakecomputing.com/polaris/api/catalog).
  6. Note your Open Catalog name and database names.

Snowflake permissions setup:

Your Snowflake service principal must have the required permissions to access the Open Catalog and its tables:

  • USAGE privilege on the catalog integration
  • SELECT privilege on tables you want to query
  • USAGE privilege on schemas containing the tables

For detailed information on Snowflake permissions, see Snowflake Access Control in the Snowflake documentation.

watsonx.data requirements:

  • Provisioned Spark engine (version 3.5 or later) for querying Iceberg tables with case-insensitive object names
  • Provisioned Presto engine for querying Iceberg tables with case-sensitive object names
  • Network connectivity to Snowflake endpoints

Storage requirements:

  • Google Cloud Storage (GCS) bucket configured as external volume in Snowflake
  • GCS bucket located in the same region as the Snowflake Open Catalog account to ensure optimal performance and compatibility
  • Storage access credentials:
    • Google Cloud Storage: Service account JSON key file with appropriate permissions

Object name case sensitivity

Understanding case sensitivity is critical when working with Snowflake tables across different engines:

Spark engine:

  • Object names (schemas and tables) are treated as case-insensitive by default
  • Quoted identifiers are not required when accessing schemas and tables created in Snowflake
  • Example: SELECT * FROM database.schema.table works regardless of the original case

Presto engine:

  • Object names are treated as case-sensitive
  • Schemas and tables in Snowflake must be created using double-quoted identifiers to preserve the intended case
  • Since Presto recognizes object names in lowercase, it is recommended to define all schema and table names in lowercase within double quotes to ensure consistent and reliable access
  • Example: CREATE SCHEMA "myschema" and SELECT * FROM "myschema"."mytable"

Security considerations

Authentication:

  • OAuth: Recommended for production environments with service principal authentication using client ID and client secret
  • Ensure credentials are stored securely and rotated regularly

Data access:

  • All queries execute with the permissions of the authenticated service principal
  • Snowflake enforces row-level and column-level security policies
  • Storage credentials must have appropriate read permissions on external volumes

Next steps

Related information