Integrating Databricks Unity Catalog in watsonx.data
Databricks Unity Catalog is a unified governance solution for data and AI assets in Databricks. By integrating Unity Catalog with IBM® watsonx.data, you can query Databricks tables in place, without copying data, and combine them with other data sources in watsonx.data.
watsonx.data supports querying Databricks Unity Catalog tables through:
- Spark engine - Query both Delta Lake and Iceberg tables using PySpark
- Presto engine - Query Iceberg tables and UniForm-enabled Delta tables through the Iceberg REST Catalog API
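The engine rules above can be summarized as a small decision function. This is a hypothetical helper for illustration only: `TableFormat` and `supported_engines` are not part of any watsonx.data API. It assumes every table is either Delta Lake or Iceberg, and that you know whether UniForm is enabled on a given Delta table.

```python
from enum import Enum


class TableFormat(Enum):
    """Table formats a Unity Catalog table can use (hypothetical enum)."""
    DELTA = "delta"
    ICEBERG = "iceberg"


def supported_engines(table_format: TableFormat, uniform_enabled: bool = False) -> set:
    """Return the watsonx.data engines that can query a Unity Catalog table.

    Spark reads both Delta Lake and Iceberg tables. Presto reads tables only
    through the Iceberg REST Catalog API, so it needs either a native Iceberg
    table or a Delta table with UniForm enabled (UniForm publishes Iceberg
    metadata for the Delta table).
    """
    engines = {"spark"}
    if table_format is TableFormat.ICEBERG or uniform_enabled:
        engines.add("presto")
    return engines


# Example: a Delta table without UniForm is readable from Spark only.
print(sorted(supported_engines(TableFormat.DELTA)))  # ['spark']
```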
This integration enables:
- Zero-copy data federation across Databricks and watsonx.data
- Unified access to data stored in external locations (AWS S3 and Azure Data Lake Storage Gen2)
- Consistent governance and security policies across platforms
Architecture overview
The integration works through the following components:
- Databricks Unity Catalog - Centralized metadata and governance layer
- Iceberg REST Catalog API - Standard interface for accessing table metadata
- watsonx.data engines - Spark or Presto engines that execute queries
- External storage - AWS S3 and Azure Data Lake Storage Gen2 where data resides
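To show what the Iceberg REST Catalog API component does, the following sketch builds the two request URLs an engine typically issues before it reads a table: a configuration call and a table-load call. The `/v1/config` and `/v1/{prefix}/namespaces/{namespace}/tables/{table}` paths come from the Apache Iceberg REST catalog specification. The Databricks base path (`ICEBERG_REST_BASE_PATH`) and the example prefix are assumptions. Use the Unity Catalog REST endpoint shown for your workspace instead.

```python
from urllib.parse import quote, urlencode

# Assumption: base path of the Unity Catalog Iceberg REST endpoint.
# Verify it against the endpoint documented for your Databricks workspace.
ICEBERG_REST_BASE_PATH = "/api/2.1/unity-catalog/iceberg-rest"


def _base(workspace_url: str) -> str:
    """Join the workspace URL and the Iceberg REST base path."""
    return workspace_url.rstrip("/") + ICEBERG_REST_BASE_PATH


def config_url(workspace_url: str, catalog: str) -> str:
    """GET /v1/config: the catalog returns settings such as the URL prefix
    to use in later requests. The Unity Catalog catalog name is passed as
    the Iceberg 'warehouse' parameter."""
    return f"{_base(workspace_url)}/v1/config?{urlencode({'warehouse': catalog})}"


def load_table_url(workspace_url: str, prefix: str, schema: str, table: str) -> str:
    """GET /v1/{prefix}/namespaces/{namespace}/tables/{table}: the catalog
    returns table metadata (schema, snapshots, file locations). The engine
    then reads the data files directly from external storage."""
    namespace = quote(schema, safe="")
    return f"{_base(workspace_url)}/v1/{prefix}/namespaces/{namespace}/tables/{quote(table, safe='')}"
```

In practice the prefix comes from the config response. The value `catalogs/main` used in the test is only a placeholder.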
Before you begin
Databricks requirements:
Ensure you have the following:
- Active Databricks workspace with Unity Catalog enabled
- At least one Unity Catalog table in Delta Lake or Iceberg format
- Authentication credentials, either:
  - OAuth credentials (Client ID and Client Secret) for service principal authentication, or
  - Personal Access Token (PAT)
- Unity Catalog REST endpoint
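You need exactly one authentication method, so a connection helper can reject incomplete or ambiguous settings before it sends any request. The `DatabricksConnection` class below is a hypothetical sketch and is not part of watsonx.data. It only checks the inputs listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatabricksConnection:
    """Hypothetical holder for the Databricks connection details."""

    workspace_url: str
    rest_endpoint: str
    client_id: Optional[str] = None
    client_secret: Optional[str] = None
    personal_access_token: Optional[str] = None

    def __post_init__(self) -> None:
        has_oauth = bool(self.client_id and self.client_secret)
        has_pat = bool(self.personal_access_token)
        # Require exactly one method: OAuth client credentials or a PAT.
        if has_oauth == has_pat:
            raise ValueError(
                "Provide exactly one of: a Client ID and Client Secret, or a personal access token"
            )
        for name in ("workspace_url", "rest_endpoint"):
            if not getattr(self, name).startswith("https://"):
                raise ValueError(f"{name} must be an https:// URL")

    @property
    def auth_method(self) -> str:
        """'pat' or 'oauth', depending on which credential was supplied."""
        return "pat" if self.personal_access_token else "oauth"
```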
Obtaining Databricks credentials:
- Log in to your Databricks workspace.
- Navigate to Settings > Identity and access.
- Click Manage on Service principals.
- Click Add service principal and create a new OAuth application.
- Note the Client ID and Client Secret.
- Alternatively, generate a Personal Access Token:
  - Go to User Settings > Developer, then click Manage next to Access Tokens.
  - Click Generate new token.
  - Ensure the token has the unity-catalog API scope.
- Note your workspace URL (format: https://<workspace-id>.cloud.databricks.com).
- Identify your catalog name, schema name, and table names.
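With OAuth, the Client ID and Client Secret are exchanged for a short-lived access token through the OAuth client-credentials flow. The sketch below uses only the Python standard library. It assumes the Databricks workspace token endpoint `/oidc/v1/token` and the `all-apis` scope, so confirm both in the Databricks OAuth machine-to-machine documentation. `build_token_request` and `fetch_access_token` are hypothetical helper names. The resulting token, like a PAT, is sent as `Authorization: Bearer <token>`.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumptions: Databricks workspace-level OAuth token endpoint and scope.
TOKEN_PATH = "/oidc/v1/token"
SCOPE = "all-apis"


def build_token_request(workspace_url: str, client_id: str, client_secret: str) -> Request:
    """Build an OAuth client-credentials token request without sending it."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urlencode({"grant_type": "client_credentials", "scope": SCOPE}).encode()
    return Request(
        workspace_url.rstrip("/") + TOKEN_PATH,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def fetch_access_token(workspace_url: str, client_id: str, client_secret: str) -> str:
    """Send the request and return the access token from the JSON response."""
    request = build_token_request(workspace_url, client_id, client_secret)
    with urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]
```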
Databricks permissions setup:
Your Databricks service principal or user must have the required Unity Catalog privileges.
Understanding privilege inheritance:
Unity Catalog follows a hierarchical privilege model:
- Privileges granted on a catalog are inherited by all current and future schemas and tables in that catalog
- Privileges granted on a schema are inherited by all current and future tables in that schema
- You can grant privileges at any level depending on your security requirements
For detailed information on privilege inheritance, see Unity Catalog privilege inheritance in the Databricks documentation.
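The inheritance rules above fit in a few lines of code. A principal's effective privileges on an object are the union of the privileges granted on that object and on each of its parents. Reading a table requires USE CATALOG on its catalog, USE SCHEMA on its schema, and SELECT on the table. The `Grants` class is a hypothetical single-principal illustration, not the Unity Catalog implementation. It assumes object names contain no dots other than the `catalog.schema.table` separators.

```python
from dataclasses import dataclass, field


@dataclass
class Grants:
    """Hypothetical grant store for one principal, keyed by object name
    ('catalog', 'catalog.schema', or 'catalog.schema.table')."""

    by_object: dict = field(default_factory=dict)

    def grant(self, obj: str, *privileges: str) -> None:
        self.by_object.setdefault(obj, set()).update(privileges)

    def effective(self, obj: str) -> set:
        """Privileges granted on obj plus those inherited from its parents."""
        parts = obj.split(".")
        result = set()
        for depth in range(1, len(parts) + 1):
            result |= self.by_object.get(".".join(parts[:depth]), set())
        return result

    def can_read(self, catalog: str, schema: str, table: str) -> bool:
        """Reading a table needs USE CATALOG, USE SCHEMA, and SELECT."""
        return (
            "USE CATALOG" in self.effective(catalog)
            and "USE SCHEMA" in self.effective(f"{catalog}.{schema}")
            and "SELECT" in self.effective(f"{catalog}.{schema}.{table}")
        )


# Catalog-level grants reach every table in the catalog.
broad = Grants()
broad.grant("main", "USE CATALOG", "USE SCHEMA", "SELECT")
print(broad.can_read("main", "sales", "orders"))  # True

# Granular grants reach only the objects named.
narrow = Grants()
narrow.grant("main", "USE CATALOG")
narrow.grant("main.sales", "USE SCHEMA")
narrow.grant("main.sales.orders", "SELECT")
print(narrow.can_read("main", "sales", "customers"))  # False
```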
Simplified catalog-level grants (Recommended for testing):
Grant all privileges at the catalog level for broad access to all schemas and tables:
- Log in to your Databricks workspace.
- Navigate to Catalog in the left sidebar.
- Select your catalog, then click the Permissions tab.
- Click Grant and add your service principal or user.
- Assign the following privileges at the catalog level:
- USE CATALOG - Required to access the catalog; on its own it does not grant access to schemas or tables
- USE SCHEMA - Access to all schemas within the catalog
- SELECT - Read data from all tables in all schemas
- EXTERNAL USE SCHEMA - Allows external engines to access tables in all schemas of the catalog (required if using external storage)
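If you prefer SQL to the Catalog UI, you can issue the same catalog-level grants as Databricks SQL `GRANT` statements. The helper below only builds the statement text, and `catalog_level_grant` is a hypothetical name. The principal is a user's email address or, for a service principal, its application ID. Run the generated statement in a Databricks SQL editor or notebook as a user who is allowed to grant privileges on the catalog.

```python
# Privileges from the catalog-level list above, in the same order.
CATALOG_LEVEL_PRIVILEGES = ("USE CATALOG", "USE SCHEMA", "SELECT", "EXTERNAL USE SCHEMA")


def quote_identifier(name: str) -> str:
    """Backtick-quote a Databricks SQL identifier, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def catalog_level_grant(catalog: str, principal: str, external_storage: bool = True) -> str:
    """Build one GRANT statement that covers every schema and table in the catalog."""
    privileges = [
        p for p in CATALOG_LEVEL_PRIVILEGES
        if external_storage or p != "EXTERNAL USE SCHEMA"
    ]
    return (
        f"GRANT {', '.join(privileges)} ON CATALOG {quote_identifier(catalog)} "
        f"TO {quote_identifier(principal)};"
    )


print(catalog_level_grant("main", "data-reader@example.com"))
# GRANT USE CATALOG, USE SCHEMA, SELECT, EXTERNAL USE SCHEMA ON CATALOG `main` TO `data-reader@example.com`;
```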
Granular multi-level grants (Recommended for production):
For fine-grained access control, grant privileges at specific levels:
Catalog level:
1. Navigate to **Catalog** → **Select your catalog** → **Permissions** tab.
2. Grant **USE CATALOG** to allow access to the catalog.
Schema level:
1. Navigate to **Catalog** → **Select your catalog** → **Select a schema** → **Permissions** tab.
2. Grant **USE SCHEMA** and **EXTERNAL USE SCHEMA** (if using external storage) for specific schemas.
Table level:
1. Navigate to **Catalog** → **Select your catalog** → **Select a schema** → **Select a table** → **Permissions** tab.
2. Grant **SELECT** on specific tables you want to query.
For detailed information on Unity Catalog permissions, see Unity Catalog privileges and securable objects in the Databricks documentation.
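The granular approach produces one statement per object. The hypothetical helper `granular_grants` builds those statements for a set of schemas and tables. Like the catalog-level sketch, it only produces SQL text, and it assumes you run that text in Databricks with enough privileges to grant access.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote a Databricks SQL identifier, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def qualified(*parts: str) -> str:
    """Build a quoted multi-part name such as `main`.`sales`.`orders`."""
    return ".".join(quote_identifier(p) for p in parts)


def granular_grants(
    catalog: str, principal: str, tables_by_schema: dict, external_storage: bool = True
) -> list:
    """Return GRANT statements at catalog, schema, and table level.

    tables_by_schema maps each schema name to the list of tables to expose.
    """
    who = quote_identifier(principal)
    schema_privileges = "USE SCHEMA, EXTERNAL USE SCHEMA" if external_storage else "USE SCHEMA"
    statements = [f"GRANT USE CATALOG ON CATALOG {qualified(catalog)} TO {who};"]
    for schema, tables in tables_by_schema.items():
        statements.append(f"GRANT {schema_privileges} ON SCHEMA {qualified(catalog, schema)} TO {who};")
        for table in tables:
            statements.append(f"GRANT SELECT ON TABLE {qualified(catalog, schema, table)} TO {who};")
    return statements


for stmt in granular_grants("main", "data-reader@example.com", {"sales": ["orders"]}):
    print(stmt)
```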
watsonx.data requirements:
- Provisioned Spark engine (version 3.5 or later) for querying Delta Lake and Iceberg tables
- Provisioned Presto engine for querying Iceberg tables and UniForm-enabled Delta tables
- Network connectivity to Databricks workspace endpoints
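For Iceberg tables, an open-source Spark session can reach Unity Catalog through Apache Iceberg's REST catalog support. The sketch below builds the required Spark properties. The property names (`type`, `uri`, `warehouse`, `token`, `credential`, `oauth2-server-uri`, `scope`) are Apache Iceberg REST catalog settings. The catalog alias, the `all-apis` scope, and `iceberg_rest_spark_conf` are assumptions. The sketch also assumes the Iceberg Spark runtime is on the classpath. The watsonx.data Spark engine may take these values through its own engine or catalog configuration, so treat this as an illustration rather than the product's configuration interface. Delta Lake tables are not covered here.

```python
from typing import Dict, Optional


def iceberg_rest_spark_conf(
    alias: str,
    rest_uri: str,
    uc_catalog: str,
    token: Optional[str] = None,
    client_id: Optional[str] = None,
    client_secret: Optional[str] = None,
    oauth2_server_uri: Optional[str] = None,
) -> Dict[str, str]:
    """Spark properties that register a Unity Catalog catalog as an Iceberg
    REST catalog named `alias`. Pass a PAT as `token`, or pass OAuth client
    credentials together with the token endpoint."""
    prefix = f"spark.sql.catalog.{alias}"
    conf = {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": rest_uri,
        # Unity Catalog catalog name, passed as the Iceberg warehouse.
        f"{prefix}.warehouse": uc_catalog,
    }
    if token:
        conf[f"{prefix}.token"] = token
    elif client_id and client_secret and oauth2_server_uri:
        # Iceberg exchanges "id:secret" for an access token at oauth2-server-uri.
        conf[f"{prefix}.credential"] = f"{client_id}:{client_secret}"
        conf[f"{prefix}.oauth2-server-uri"] = oauth2_server_uri
        conf[f"{prefix}.scope"] = "all-apis"  # assumption: Databricks OAuth scope
    else:
        raise ValueError("Supply a token, or client_id, client_secret, and oauth2_server_uri")
    return conf
```

You can pass each key/value pair to `SparkSession.builder.config(key, value)`. Tables are then addressed as `<alias>.<schema>.<table>`, for example `spark.table("uc.sales.orders")`.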
Storage requirements:
- AWS S3 or Azure Data Lake Storage Gen2 configured as an external location in Databricks
- Storage access credentials:
  - AWS S3: Access key, secret key, and bucket region
  - Azure Data Lake Storage Gen2: Storage account name and access key
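When a Spark session reads the data files directly, the storage credentials above correspond to standard Hadoop file system properties: S3A properties for AWS S3, and ABFS account-key properties for Azure Data Lake Storage Gen2. The helper names are hypothetical, and watsonx.data may collect these values through its own storage configuration instead. The `fs.s3a.endpoint.region` setting assumes a Hadoop version that supports it.

```python
from typing import Dict


def s3_spark_conf(access_key: str, secret_key: str, region: str) -> Dict[str, str]:
    """Hadoop S3A settings for an AWS S3 external location."""
    return {
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.endpoint.region": region,
    }


def adls_gen2_spark_conf(account_name: str, account_key: str) -> Dict[str, str]:
    """Hadoop ABFS shared-key setting for an Azure Data Lake Storage Gen2 account."""
    return {
        f"spark.hadoop.fs.azure.account.key.{account_name}.dfs.core.windows.net": account_key,
    }
```

The `spark.hadoop.` prefix tells Spark to copy each setting into the underlying Hadoop configuration.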
Security considerations
Authentication:
- OAuth (Spark only): Service principal authentication; recommended for production environments
- Personal Access Token: Suitable for development and testing; ensure tokens are rotated regularly
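To make "rotated regularly" something you can check, record when each PAT was created and flag tokens that are older than your rotation policy allows. The 90-day interval and the `needs_rotation` helper are hypothetical examples, not a Databricks or watsonx.data default.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical policy: rotate personal access tokens every 90 days.
ROTATION_INTERVAL = timedelta(days=90)


def needs_rotation(
    created_at: datetime,
    now: Optional[datetime] = None,
    interval: timedelta = ROTATION_INTERVAL,
) -> bool:
    """Return True when a token created at `created_at` (timezone-aware)
    has reached the rotation interval."""
    now = now or datetime.now(timezone.utc)
    return now - created_at >= interval


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(needs_rotation(created, now=datetime(2024, 4, 1, tzinfo=timezone.utc)))  # True
```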
Data access:
- All queries execute with the permissions of the authenticated user or service principal
- Unity Catalog enforces row-level and column-level security policies
- Storage credentials must have appropriate read permissions on external locations