Integrating Databricks Unity Catalog in watsonx.data

Databricks Unity Catalog is a unified governance solution for data and AI assets in Databricks. By integrating Unity Catalog with IBM® watsonx.data, you can query remote Databricks tables in place, without copying the data into watsonx.data.

watsonx.data supports querying Databricks Unity Catalog tables through:

Spark engine - Query both Delta Lake and Iceberg tables using PySpark
Presto engine - Query Iceberg tables and UniForm-enabled Delta tables through the Iceberg REST Catalog API

This integration enables:

Zero-copy data federation across Databricks and watsonx.data
Unified access to data stored in external locations (AWS S3 and Azure Data Lake Storage Gen2)
Consistent governance and security policies across platforms

Architecture overview

The integration works through the following components:

Databricks Unity Catalog - Centralized metadata and governance layer
Iceberg REST Catalog API - Standard interface for accessing table metadata
watsonx.data engines - Spark or Presto engines that execute queries
External storage - AWS S3 or Azure Data Lake Storage Gen2, where the table data resides
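
To make the relationship between these components concrete, the following minimal sketch builds the Spark properties that point an Iceberg catalog at a REST endpoint. The property names are the standard Apache Iceberg Spark catalog settings. The helper name `unity_iceberg_rest_conf`, the Spark catalog alias, and the endpoint path `/api/2.1/unity-catalog/iceberg-rest` are assumptions for illustration and are not taken from this document; confirm the endpoint against your Databricks workspace before you use it.

```python
def unity_iceberg_rest_conf(spark_catalog, workspace_url, uc_catalog, token):
    """Build Spark properties for an Iceberg REST catalog (hypothetical helper).

    spark_catalog: alias used in Spark SQL, for example "uc"
    workspace_url: Databricks workspace URL
    uc_catalog:    Unity Catalog catalog name, passed as the Iceberg warehouse
    token:         bearer token (for example, a Personal Access Token)
    """
    prefix = f"spark.sql.catalog.{spark_catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        # Assumed endpoint path; verify it for your workspace.
        f"{prefix}.uri": workspace_url.rstrip("/") + "/api/2.1/unity-catalog/iceberg-rest",
        f"{prefix}.warehouse": uc_catalog,
        f"{prefix}.token": token,
    }


conf = unity_iceberg_rest_conf(
    "uc", "https://<workspace-id>.cloud.databricks.com", "my_catalog", "<token>"
)
for key, value in conf.items():
    # Mask the token so that it never appears in logs.
    print(key, "=", "***" if key.endswith(".token") else value)
```

Each key and value pair could then be passed to `SparkSession.builder.config(key, value)`, after which tables would be addressable as `uc.<schema>.<table>` in Spark SQL. The Iceberg Spark runtime must also be available to the engine for these properties to take effect.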

Before you begin

Databricks requirements:

Ensure you have the following:

Active Databricks workspace with Unity Catalog enabled
Unity Catalog with tables (Delta Lake or Iceberg format)
Authentication credentials:
- OAuth credentials (Client ID and Client Secret) for service principal authentication, or
- Personal Access Token (PAT) for authentication
Unity Catalog REST endpoint

Obtaining Databricks credentials:

1. Log in to your Databricks workspace.
2. Navigate to Settings > Identity and access.
3. Next to Service principals, click Manage.
4. Click Add service principal and create a new OAuth application.
5. Note the Client ID and Client Secret.
6. Alternatively, generate a Personal Access Token:
   - Go to User Settings > Developer, then click Manage next to Access tokens.
   - Click Generate new token.
   - Ensure the token has the unity-catalog API scope.
7. Note your workspace URL (format: https://<workspace-id>.cloud.databricks.com).
8. Identify your catalog name, schema name, and table names.
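
The credentials collected in these steps are typically used in one of two ways: the Client ID and Client Secret are exchanged for a short-lived access token, or the Personal Access Token is sent directly as a bearer token. The following sketch builds, but does not send, a client-credentials token request by using only the Python standard library. The token endpoint path `/oidc/v1/token`, the `all-apis` scope, and the helper names are assumptions that are not stated in this document; check them against the Databricks OAuth documentation.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(workspace_url, client_id, client_secret):
    """Build (but do not send) an OAuth client-credentials token request."""
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}  # assumed scope
    ).encode("ascii")
    # Client ID and secret travel in an HTTP Basic Authorization header.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=workspace_url.rstrip("/") + "/oidc/v1/token",  # assumed endpoint path
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def bearer_header(token):
    """Header for calls that use an OAuth access token or a Personal Access Token."""
    return {"Authorization": f"Bearer {token}"}


request = build_token_request(
    "https://<workspace-id>.cloud.databricks.com", "<client-id>", "<client-secret>"
)
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` would return a JSON document; under the assumptions above, its `access_token` field is the value to pass to `bearer_header`.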

Databricks permissions setup:

Your Databricks service principal or user must have the required Unity Catalog privileges. Unity Catalog uses a hierarchical permission model where privileges granted at higher levels (catalog) automatically apply to lower levels (schemas and tables).

Understanding privilege inheritance:

Privileges are inherited downward through the object hierarchy:

Catalog-level privileges automatically grant access to all schemas and tables within that catalog
Schema-level privileges automatically grant access to all tables within that schema
You can grant privileges at any level depending on your security requirements

For detailed information on privilege inheritance, see Unity Catalog privilege inheritance in the Databricks documentation.
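
As an illustration of the inheritance rules, the following sketch models grants as simple tuples and checks whether a principal can read a table. It is a simplified teaching model and not how Unity Catalog is implemented. The function names and the tuple layout are hypothetical, and the model ignores groups, ownership, and row or column policies.

```python
def _granted(grants, principal, privilege, securables):
    """True if the privilege is granted to the principal on any listed securable."""
    return any((principal, privilege, name) in grants for name in securables)


def can_read(grants, principal, table):
    """Check read access to a table named 'catalog.schema.table'.

    A grant on a parent object (catalog or schema) counts for its children,
    which mirrors the downward inheritance described above.
    """
    catalog, schema, _ = table.split(".")
    schema_name = f"{catalog}.{schema}"
    return (
        _granted(grants, principal, "USE CATALOG", [catalog])
        and _granted(grants, principal, "USE SCHEMA", [catalog, schema_name])
        and _granted(grants, principal, "SELECT", [catalog, schema_name, table])
    )


# All three privileges granted once, at the catalog level.
broad = {
    ("sp-app", "USE CATALOG", "main"),
    ("sp-app", "USE SCHEMA", "main"),
    ("sp-app", "SELECT", "main"),
}
print(can_read(broad, "sp-app", "main.sales.orders"))
```

With the catalog-level grants in `broad`, the check succeeds for any table in the catalog. A SELECT grant on a single table succeeds only for that table, and only when the USE CATALOG and USE SCHEMA grants on its parents are also present.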

Simplified catalog-level grants (Recommended for testing):

Grant privileges at the catalog level to give the principal broad access to every schema and table in the catalog:

1. Log in to your Databricks workspace.
2. Navigate to Catalog in the left sidebar.
3. Select your catalog, then click the Permissions tab.
4. Click Grant and add your service principal or user.
5. Assign the following privileges at the catalog level:
   - USE CATALOG - Access to the catalog and all its schemas
   - USE SCHEMA - Access to all schemas within the catalog
   - SELECT - Read data from all tables in all schemas
   - EXTERNAL USE SCHEMA - Access all schemas with external storage locations (required if using external storage)

Granular multi-level grants (Recommended for production):

For fine-grained access control, grant privileges at specific levels:

Catalog level:

  1. Navigate to **Catalog** → **Select your catalog** → **Permissions** tab.
  2. Grant **USE CATALOG** to allow access to the catalog.

Schema level:

  1. Navigate to **Catalog** → **Select your catalog** → **Select a schema** → **Permissions** tab.
  2. Grant **USE SCHEMA** and **EXTERNAL USE SCHEMA** (if using external storage) for specific schemas.

Table level:

  1. Navigate to **Catalog** → **Select your catalog** → **Select a schema** → **Select a table** → **Permissions** tab.
  2. Grant **SELECT** on specific tables you want to query.

For detailed information on Unity Catalog permissions, see Unity Catalog privileges and securable objects in the Databricks documentation.
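
The same grants can also be expressed as SQL `GRANT` statements and run from a Databricks SQL editor or notebook. The following sketch generates the statements for both approaches described in this section. The helper name `build_grants` and the example object names are hypothetical, and granting EXTERNAL USE SCHEMA at the catalog level follows the description in this document; verify that your workspace accepts it before you rely on it.

```python
def build_grants(catalog, principal, tables_by_schema=None, external_storage=False):
    """Generate GRANT statements for a principal (hypothetical helper).

    With tables_by_schema=None, every privilege is granted on the catalog.
    Otherwise, pass a mapping such as {"sales": ["orders"]} to grant
    USE SCHEMA per schema and SELECT per table.
    """
    def grant(privilege, kind, name):
        return f"GRANT {privilege} ON {kind} {name} TO `{principal}`;"

    cat = f"`{catalog}`"
    statements = [grant("USE CATALOG", "CATALOG", cat)]

    if tables_by_schema is None:
        # Catalog-level grants: inherited by all schemas and tables.
        statements.append(grant("USE SCHEMA", "CATALOG", cat))
        statements.append(grant("SELECT", "CATALOG", cat))
        if external_storage:
            statements.append(grant("EXTERNAL USE SCHEMA", "CATALOG", cat))
        return statements

    # Granular grants: only the listed schemas and tables.
    for schema, tables in tables_by_schema.items():
        sch = f"{cat}.`{schema}`"
        statements.append(grant("USE SCHEMA", "SCHEMA", sch))
        if external_storage:
            statements.append(grant("EXTERNAL USE SCHEMA", "SCHEMA", sch))
        for table in tables:
            statements.append(grant("SELECT", "TABLE", f"{sch}.`{table}`"))
    return statements


for statement in build_grants("my_catalog", "my-service-principal", {"sales": ["orders"]}):
    print(statement)
```

Generating the statements from a mapping keeps the production grants reviewable: the mapping lists exactly which tables the principal can read.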

watsonx.data requirements:

Provisioned Spark engine (version 3.5 or later) for querying Delta Lake and Iceberg tables
Provisioned Presto engine for querying Iceberg tables and UniForm-enabled Delta tables
Network connectivity to Databricks workspace endpoints

Storage requirements:

AWS S3 or Azure Data Lake Storage Gen2 configured as an external location in Databricks
Storage access credentials:
- AWS S3: Access key and secret key, S3 region information
- Azure Data Lake Storage Gen2: Storage account name and access key
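
For a Spark job, storage credentials like these are commonly supplied as Hadoop file system properties. The following sketch maps the values listed above to the standard Hadoop S3A and ABFS property names, prefixed with `spark.hadoop.` so that Spark forwards them. The helper names are hypothetical, and watsonx.data might instead expect these values through its own storage registration, so treat this as an illustration of where each value goes.

```python
import os


def s3_storage_conf(access_key, secret_key, region):
    """Spark properties for reading s3a:// paths with static credentials."""
    return {
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.endpoint.region": region,
    }


def adls_storage_conf(account_name, access_key):
    """Spark properties for reading abfss:// paths with a storage account key."""
    host = f"{account_name}.dfs.core.windows.net"
    return {f"spark.hadoop.fs.azure.account.key.{host}": access_key}


# Read secrets from the environment rather than hard-coding them.
# The environment variable names are illustrative.
conf = s3_storage_conf(
    os.environ.get("S3_ACCESS_KEY", "<access-key>"),
    os.environ.get("S3_SECRET_KEY", "<secret-key>"),
    os.environ.get("S3_REGION", "<region>"),
)
print(sorted(conf))
```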

Security considerations

Authentication:

OAuth (Spark only): Recommended for production environments; authenticates as a service principal
Personal Access Token: Suitable for development and testing; ensure tokens are rotated regularly
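
The following sketch encodes this guidance as a small selection helper that prefers OAuth where the engine supports it. The mapping reflects only what is stated above: OAuth for Spark, and a Personal Access Token for either engine. The names `SUPPORTED_AUTH` and `choose_auth` are hypothetical.

```python
# Authentication methods per engine, in order of preference.
SUPPORTED_AUTH = {
    "spark": ("oauth", "pat"),
    "presto": ("pat",),
}


def choose_auth(engine, available):
    """Return the preferred method for the engine from the available credentials.

    engine:    "spark" or "presto" (case-insensitive)
    available: collection of methods you hold credentials for, e.g. {"oauth", "pat"}
    """
    for method in SUPPORTED_AUTH[engine.lower()]:
        if method in available:
            return method
    raise ValueError(f"No supported credentials available for engine {engine!r}")


print(choose_auth("spark", {"oauth", "pat"}))
print(choose_auth("presto", {"oauth", "pat"}))
```

Under this mapping, the first call selects OAuth and the second falls back to the Personal Access Token, because OAuth is listed for Spark only.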

Data access:

All queries execute with the permissions of the authenticated user or service principal
Unity Catalog enforces row-level and column-level security policies
Storage credentials must have appropriate read permissions on external locations
