Installing and using dbt-watsonx-spark
This section covers the steps to install and use dbt-watsonx-spark
.
Before you begin
- Subscription to watsonx.data on IBM Cloud.
- Provision native Spark engine in watsonx.data.
- Install DBT core.
Procedure
Create a Spark query server
For the Spark engine to integrate with dbt tool and work as a query engine, you must create a Spark query server. See Create a Spark query server.
Retrieve the query server connection details
To configure the profile file in dbt tool, you must save the query server connection details. See Retrieve the query server connection details.
Set up profiles.yaml for dbt tool
- Go to the
profiles.yml
file that is located in .dbt of your home directory. - Paste the connection details by modifying the parameter values.
- Set up the
profiles.yml
file. For more information, see Configuration (setting up your profile).
Install the dbt tool and verify the connection
-
Run the following command on your system to install
dbt-watsonx-spark
.pip install dbt-watsonx-spark
-
Run the following command to verify the dbt version.
dbt --version
-
If you want to create a dbt project, provide a <project_name> and run the following command .
dbt init <project_name>
- The system prompts to select the database to be used. Select
watsonx_spark
. - Provide watsonx.data host, URI, and schema.
- The system prompts to select the database to be used. Select
-
To test the connection, run:
cd <project_name> dbt debug
-
Run the seeds by using the following command to create a table and insert the data.
cd <project_name> dbt run
-
In
<project_name>/models
, you have the models that perform the operations. By default, dbt sets the operations asview
. You can create the tables or views by one of the following methods:-
Specify inside the models (applicable for that model only)
{{ config(materialized='table/view') }}
If this statement is commented out using (--), dbt still uses the configuration. To disable it, remove it entirely or comment it in Jinja style (
{# … #}
). -
Specify in dbt_project.yml (applicable for all models)
models: <project_name>: <model_folders>: +materialized: table/view
For example:
models: demo: example: +materialized: table
Only select statements are supported within models.
The semicolon (;) character is restricted in models.
-
-
Run the models by using the following command to create the tables or views.
cd <project_name> dbt run
You can also specify the tests that you want:
models: - name: <model_name> description: "some description" columns: - name: <col_name> description: "some description" data_tests: - <test_name_1> - <test_name_2>
For example:
models: - name: my_first_dbt_model description: "A starter dbt model" columns: - name: id description: "The primary key for this table" data_tests: - unique - not_null
Connectors must support Create Table as Select (CTAS) for dbt runs to work.
-
To generate the documents about the actions performed, run:
cd <project_name> dbt docs generate dbt docs serve
By default, it runs on localhost:8080. To change the port, run:
dbt docs serve –-port <port_number>