IBM Cloud Docs
Installing and using dbt-watsonx-spark

Installing and using dbt-watsonx-spark

This section covers the steps to install and use dbt-watsonx-spark.

Before you begin

  • Subscription to watsonx.data on IBM Cloud.
  • Provision native Spark engine in watsonx.data.
  • Install DBT core.

Procedure

Create a Spark query server

For the Spark engine to integrate with dbt tool and work as a query engine, you must create a Spark query server. See Create a Spark query server.

Retrieve the query server connection details

To configure the profile file in dbt tool, you must save the query server connection details. See Retrieve the query server connection details.

Set up profiles.yaml for dbt tool

  1. Go to the profiles.yml file that is located in .dbt of your home directory.
  2. Paste the connection details by modifying the parameter values.
  3. Set up the profiles.yml file. For more information, see Configuration (setting up your profile).

Install the dbt tool and verify the connection

  1. Run the following command on your system to install dbt-watsonx-spark.

    pip install dbt-watsonx-spark
    
  2. Run the following command to verify the dbt version.

    dbt --version
    
  3. If you want to create a dbt project, provide a <project_name> and run the following command .

    dbt init <project_name>
    
    1. The system prompts to select the database to be used. Select watsonx_spark.
    2. Provide watsonx.data host, URI, and schema.
  4. To test the connection, run:

    cd <project_name>
    dbt debug
    
  5. Run the seeds by using the following command to create a table and insert the data.

    cd <project_name>
    dbt run
    
  6. In <project_name>/models, you have the models that perform the operations. By default, dbt sets the operations as view. You can create the tables or views by one of the following methods:

    • Specify inside the models (applicable for that model only)

      {{ config(materialized='table/view') }}
      

      If this statement is commented out using (--), dbt still uses the configuration. To disable it, remove it entirely or comment it in Jinja style ({# … #}).

    • Specify in dbt_project.yml (applicable for all models)

      models:
        <project_name>:
          <model_folders>:
            +materialized: table/view
      

      For example:

      models:
        demo:
          example:
            +materialized: table
      

      Only select statements are supported within models.

    The semicolon (;) character is restricted in models.

  7. Run the models by using the following command to create the tables or views.

    cd <project_name>
    dbt run
    

    You can also specify the tests that you want:

    models:
      - name: <model_name>
        description: "some description"
        columns:
          - name: <col_name>
            description: "some description"
            data_tests:
              - <test_name_1>
              - <test_name_2>
    

    For example:

    models:
      - name: my_first_dbt_model
        description: "A starter dbt model"
        columns:
          - name: id
            description: "The primary key for this table"
            data_tests:
              - unique
              - not_null
    

    Connectors must support Create Table as Select (CTAS) for dbt runs to work.

  8. To generate the documents about the actions performed, run:

    cd <project_name>
    dbt docs generate
    dbt docs serve
    

    By default, it runs on localhost:8080. To change the port, run:

    dbt docs serve –-port <port_number>