IBM Cloud Docs
Build an external webhook enrichment solution in Watson Discovery

In this tutorial, you use sample applications to build an external webhook enrichment solution for Watson Discovery.

Follow this tutorial only if you are using a managed deployment of Discovery on IBM Cloud.

The following image shows the external enrichment configuration flow.

Figure 1. External enrichment configuration flow

The following image shows the external enrichment process flow.

Figure 2. External enrichment process flow

For more information about the external enrichment APIs, see External enrichment API.

Learning objectives

By the end of this tutorial, you will know how to use the following sample applications:

  • Regex: For entity extraction, document classification, and sentence classification by using regular expressions
  • Granite: For entity extraction from email by using a watsonx.ai Granite model
  • Slate: For entity extraction with a watsonx.ai Slate model that is fine-tuned with labeled data exported from an entity extractor workspace in Watson Discovery

Duration

This tutorial takes approximately 2 hours to complete.

Prerequisites

  1. Before you begin, you must have a paid IBM Cloud account so that you can create a Watson Discovery instance on the Plus or Enterprise plan.

    You can complete this tutorial at no cost by using the 30-day free trial of the Plus plan. However, to create a Plus plan instance of the service, you must have a paid account (where you provide credit card details). For more information about creating a paid account, see Upgrading your account. To create a Plus plan Discovery service instance, go to the Discovery resource page in the IBM Cloud catalog and create a Plus plan service instance.

    If you decide to stop using the Plus plan and don't want to pay for it, delete the Plus plan service instance before the 30-day trial period ends.

  2. You need access to the Discovery doc-tutorial-downloads repository, which contains the sample applications and data.
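After you download or clone the repository, you can confirm that your local copy contains the three context directories that the deployment steps in this tutorial refer to. The following is a minimal sketch; the function name check_sample_dirs is hypothetical, and it assumes that you pass the path of your local copy of the repository.

```shell
# Hypothetical helper: confirm that a local copy of the doc-tutorial-downloads
# repository contains the three sample context directories used in this tutorial.
#   $1  path to your local copy of the repository
check_sample_dirs() {
  local repo_root="$1" sample dir missing=0
  if [ -z "$repo_root" ]; then
    echo "usage: check_sample_dirs <path-to-local-repo>" >&2
    return 2
  fi
  for sample in regex granite slate; do
    dir="$repo_root/discovery-data/webhook-enrichment-sample/$sample"
    if [ -d "$dir" ]; then
      echo "found:   $dir"
    else
      echo "missing: $dir" >&2
      missing=1
    fi
  done
  # Non-zero exit status if any directory is missing.
  return "$missing"
}
```

For example, check_sample_dirs ./doc-tutorial-downloads prints one line per directory and returns a non-zero status if any of them is missing.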

Additional prerequisites for the Granite application

  1. Set up an instance of Watson Machine Learning. For more information about the pricing plans, see Watson Machine Learning.

  2. Create an API key for IBM Cloud. For more information, see Understanding API keys.
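The Granite sample uses this API key to call the Watson Machine Learning API. Watson Machine Learning requests on IBM Cloud are authorized with a short-lived IAM access token, which is obtained by exchanging the API key at the IAM token endpoint, so requesting a token is a quick way to confirm that a new key works before you deploy anything. The following is a minimal sketch with curl; the function name iam_token_response is hypothetical, and it assumes that the key is passed as an argument or set in IBM_CLOUD_API_KEY, the same name that the Granite secret uses later in this tutorial.

```shell
# Hypothetical helper: exchange an IBM Cloud API key for an IAM access token.
# The key comes from the first argument or, if that is empty, from
# IBM_CLOUD_API_KEY. The raw JSON response is printed; the token is in its
# access_token field.
iam_token_response() {
  local api_key="${1:-$IBM_CLOUD_API_KEY}"
  if [ -z "$api_key" ]; then
    echo "error: pass an API key or set IBM_CLOUD_API_KEY" >&2
    return 1
  fi
  curl -s -X POST 'https://iam.cloud.ibm.com/identity/token' \
    --header 'Content-Type: application/x-www-form-urlencoded' \
    --header 'Accept: application/json' \
    --data-urlencode 'grant_type=urn:ibm:params:oauth:grant-type:apikey' \
    --data-urlencode "apikey=${api_key}"
}
```

For example, iam_token_response | jq -r .access_token prints the token if the key is valid.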

Additional prerequisites for the Slate application

  1. Set up an instance of Watson Machine Learning. For more information about the pricing plans, see Watson Machine Learning.

  2. Set up an instance of Cloud Pak for Data 4.7.x or later, and install Watson Studio and Watson Machine Learning.

  3. Create an API key for IBM Cloud. For more information, see Understanding API keys.

  4. Create an API key for IBM Cloud Pak for Data. For more information, see Getting Started with IBM Cloud Paks.

Regex - Entity extraction, document classification, and sentence classification by using regular expressions

In this sample, IBM Cloud Code Engine is the infrastructure environment for the webhook enrichment application. However, you can deploy the application in any other environment that Discovery can reach over the network.
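As an example of another environment, the sample can also be built from its Dockerfile and run with Docker, which is convenient for trying the application before you deploy it. The following is a minimal sketch, not part of the sample itself. It assumes that Docker is installed, that you run it from the root of your local copy of the doc-tutorial-downloads repository, that the application listens on port 8080 inside the container (check the sample's Dockerfile), and that a hypothetical file named webhook.env contains the same key-value pairs as the Code Engine secret described below, one KEY=VALUE pair per line. The function name run_sample_locally is also hypothetical.

```shell
# Hypothetical helper: build one webhook enrichment sample from its Dockerfile
# and run it locally. Run it from the root of your local copy of the
# doc-tutorial-downloads repository.
#   $1  sample name: regex, granite, or slate
#   $2  env file with KEY=VALUE lines, one per secret key (default: webhook.env)
# Assumes the application listens on port 8080 inside the container.
run_sample_locally() {
  local sample="$1" env_file="${2:-webhook.env}" image
  case "$sample" in
    regex|granite|slate) ;;
    *)
      echo "usage: run_sample_locally regex|granite|slate [env-file]" >&2
      return 2
      ;;
  esac
  image="webhook-enrichment-$sample"
  docker build -t "$image" "discovery-data/webhook-enrichment-sample/$sample" || return 1
  # Publish container port 8080 on the same host port.
  docker run --rm -p 8080:8080 --env-file "$env_file" "$image"
}
```

Keep in mind that Discovery on IBM Cloud sends webhook requests over the internet, so it cannot reach a container that listens only on your own machine. To use a locally running application with Discovery, you would need to expose it at a publicly reachable URL.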

  1. Deploy the webhook enrichment application to IBM Cloud Code Engine.

    1. Create a project in IBM Cloud Code Engine. For more information, see Create a project.

    2. Create a secret in the project. For more information, see Creating secrets.

      This secret must contain the following key-value pairs:

      • WD_API_URL: The API endpoint URL of your Discovery instance.
      • WD_API_KEY: The API key of your Discovery instance.
      • WEBHOOK_SECRET: A key that Discovery passes with each webhook request so that the application can authenticate the request. For example, purple_unicorn.
    3. Deploy the application from the sample repository source code. For more information, see Deploying your app from repository source code.

      In Create application, click Specify build details and enter these details.

      • For source, specify:

        • Code repo URL: The URL of the Discovery doc-tutorial-downloads repository
        • Code repo access: None
        • Branch name: master
        • Context directory: discovery-data/webhook-enrichment-sample/regex
      • Strategy: Dockerfile

      • Output: Enter your container image registry information.

      • Open Environment variables (optional) and add an environment variable with the following settings:

        • Define as: Reference to full secret
        • Secret: The name of the secret that you created in the project in the previous step

      You can set the Min number of instances to 1 so that at least one instance is always running to receive webhook requests.

    4. Ensure that the application status changes to Ready.

  2. Configure the Discovery webhook enrichment. For more information, see Configuring the webhook enrichment.

  3. Ingest documents into Discovery and view the results.

    1. Upload nhtsa.csv from the Discovery doc-tutorial-downloads repository to the collection.
    2. After document processing is complete, preview your query results to find the webhook enrichment results.
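If you prefer the command line to the console, the deployment steps above can be scripted with the Code Engine plug-in for the IBM Cloud CLI. The following is a sketch rather than a tested procedure. It assumes that you are logged in with ibmcloud login and have the Code Engine plug-in installed, and the project, secret, and application names are hypothetical. The repository URL is a parameter because this tutorial refers to the repository only by name. Verify the flags against ibmcloud ce application create --help for your plug-in version before you rely on them.

```shell
# Sketch: deploy the Regex sample to Code Engine from the command line.
#   $1  Git URL of the doc-tutorial-downloads repository
# Reads WD_API_URL, WD_API_KEY, and WEBHOOK_SECRET from the shell.
# The project, secret, and application names are hypothetical examples.
deploy_regex_sample() {
  local repo_url="$1"
  local project="webhook-enrichment"
  local secret="regex-webhook-secret"
  local app="regex-webhook"
  if [ -z "$repo_url" ] || [ -z "$WD_API_URL" ] || [ -z "$WD_API_KEY" ] \
     || [ -z "$WEBHOOK_SECRET" ]; then
    echo "usage: set WD_API_URL, WD_API_KEY, WEBHOOK_SECRET; pass the repo URL" >&2
    return 2
  fi
  # Step 1: create the project and make it the current target.
  ibmcloud ce project create --name "$project" || return 1
  ibmcloud ce project select --name "$project" || return 1
  # Step 2: create the secret with the key-value pairs the application reads.
  ibmcloud ce secret create --name "$secret" \
    --from-literal "WD_API_URL=$WD_API_URL" \
    --from-literal "WD_API_KEY=$WD_API_KEY" \
    --from-literal "WEBHOOK_SECRET=$WEBHOOK_SECRET" || return 1
  # Step 3: build from the master branch with the Dockerfile strategy, load the
  # full secret as environment variables, and keep at least one instance running.
  ibmcloud ce application create --name "$app" \
    --build-source "$repo_url" \
    --build-commit master \
    --build-context-dir discovery-data/webhook-enrichment-sample/regex \
    --build-strategy dockerfile \
    --env-from-secret "$secret" \
    --min-scale 1
}
```

For example, with WD_API_URL, WD_API_KEY, and WEBHOOK_SECRET set in your shell, run deploy_regex_sample followed by the repository URL, and then run ibmcloud ce application get --name regex-webhook to check the application status and find its URL.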

Granite - Entity extraction by using a watsonx.ai foundation model

In this sample, you extract entities from an email by using a watsonx.ai Granite model. IBM Cloud Code Engine is the infrastructure environment for the webhook enrichment application. However, you can deploy the application in any other environment that Discovery can reach over the network.

  1. Deploy the webhook enrichment application to IBM Cloud Code Engine.

    1. Create a project in IBM Cloud Code Engine. For more information, see Create a project.

    2. Create a secret in the project. For more information, see Creating secrets.

      This secret must contain the following key-value pairs:

      • WD_API_URL: The API endpoint URL of your Discovery instance.
      • WD_API_KEY: The API key of your Discovery instance.
      • WEBHOOK_SECRET: A key that Discovery passes with each webhook request so that the application can authenticate the request. For example, purple_unicorn.
      • IBM_CLOUD_API_KEY: Your IBM Cloud API key. The application uses it to access the Watson Machine Learning API.
      • WML_ENDPOINT_URL: The API endpoint URL of your Watson Machine Learning instance. For more information, see the Machine Learning documentation.
      • WML_INSTANCE_CRN: The CRN of your Watson Machine Learning instance. You can find your instance and its CRN by running the ibmcloud resources command.
    3. Deploy the application from the sample repository source code. For more information, see Deploying your app from repository source code.

      In Create application, click Specify build details and enter these details.

      • For source, specify:

        • Code repo URL: The URL of the Discovery doc-tutorial-downloads repository
        • Code repo access: None
        • Branch name: master
        • Context directory: discovery-data/webhook-enrichment-sample/granite
      • Strategy: Dockerfile

      • Output: Enter your container image registry information.

      • Open Environment variables (optional) and add an environment variable with the following settings:

        • Define as: Reference to full secret
        • Secret: The name of the secret that you created in the project in the previous step

      You can set the Min number of instances to 1 so that at least one instance is always running to receive webhook requests.

    4. Ensure that the application status changes to Ready.

  2. Configure the Discovery webhook enrichment. For more information, see Configuring the webhook enrichment.

  3. Ingest documents into Discovery and view the results.

    1. Upload email.txt from the Discovery doc-tutorial-downloads repository to the collection.
    2. After document processing is complete, preview your query results to find the webhook enrichment results.
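The Granite secret needs six key-value pairs, and a missing or empty value is easy to overlook in the console. The following sketch checks that each key listed above is set in your shell before it creates the secret with the Code Engine plug-in for the IBM Cloud CLI. It assumes that you are logged in and have selected your Code Engine project; the function name create_granite_secret and the secret name in the usage example are hypothetical.

```shell
# Hypothetical helper: check that every key the Granite sample expects is set,
# then create the Code Engine secret from those values.
#   $1  secret name (any name you like)
create_granite_secret() {
  local secret_name="$1" name value missing=""
  if [ -z "$secret_name" ]; then
    echo "usage: create_granite_secret <secret-name>" >&2
    return 2
  fi
  # Reuse the positional parameters to collect the --from-literal arguments.
  set --
  for name in WD_API_URL WD_API_KEY WEBHOOK_SECRET \
              IBM_CLOUD_API_KEY WML_ENDPOINT_URL WML_INSTANCE_CRN; do
    # Read the variable whose name is stored in $name.
    eval "value=\${$name}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    else
      set -- "$@" --from-literal "$name=$value"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing values for:$missing" >&2
    return 1
  fi
  ibmcloud ce secret create --name "$secret_name" "$@"
}
```

For example, after you set the six values in your shell, run create_granite_secret granite-webhook-secret. The function lists any missing keys instead of creating an incomplete secret.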

Slate - Entity extraction with a watsonx.ai Slate model fine-tuned on labeled data exported from a Watson Discovery entity extractor workspace

Slate models offer the best cost-performance trade-off for non-generative use cases. Fine-tuning a Slate model requires task-specific labeled data. You can prepare labeled data in Watson Discovery, fine-tune the Slate model in Watson Studio, and deploy the model in Watson Machine Learning. After you deploy the fine-tuned model, you can create a webhook enrichment in Watson Discovery that uses the model to enrich documents.

  1. Prepare labeled data in Watson Discovery.

    1. Create an entity extractor workspace and label data. For more information, see Define custom entities.

    2. Download labeled data from the entity extractor workspace. For more information, see Exporting labeled data for an entity extractor.

      For this tutorial, you can instead use the sample labeled data from the Discovery doc-tutorial-downloads repository in the subsequent steps.

  2. Fine-tune the Slate model in Watson Studio and deploy the model to Watson Machine Learning.

    1. Create a project in Watson Studio. For more information, see Creating a project.

    2. Create a deployment space in Watson Machine Learning. For more information, see Creating deployment spaces.

    3. Create an environment template in the project. For more information, see Creating environment templates. You can create the template with the following options:

      • Type: Default
      • Hardware configuration
        • Reserve vCPU: 2
        • Reserve RAM (GB): 8
      • Software version: Runtime 23.1 on Python 3.10
    4. Create a notebook in the project from the notebook file, and select the environment template as its runtime. The notebook file is in the Discovery doc-tutorial-downloads repository. For more information about creating a notebook, see Creating notebooks.

    5. Upload labeled data in the notebook. For more information, see Load data from local files.

    6. Fine-tune and deploy the Slate model by running the notebook cells in order, replacing the variables that are specific to your environment with your own values.

  3. Deploy the webhook enrichment application to IBM Cloud Code Engine.

    1. Create a project in IBM Cloud Code Engine. For more information, see Create a project.

    2. Create a secret in the project. For more information, see Creating secrets.

      This secret must contain the following key-value pairs:

      • WD_API_URL: The API endpoint URL of your Discovery instance.
      • WD_API_KEY: The API key of your Discovery instance.
      • WEBHOOK_SECRET: A key that Discovery passes with each webhook request so that the application can authenticate the request. For example, purple_unicorn.
      • SCORING_API_HOSTNAME: The API hostname of the Watson Machine Learning scoring deployment that serves your fine-tuned Slate model.
      • SCORING_DEPLOYMENT_ID: The ID of the Watson Machine Learning scoring deployment that serves your fine-tuned Slate model.
      • SCORING_API_TOKEN: The bearer token that the application uses to authorize its requests to the Watson Machine Learning scoring deployment that serves your fine-tuned Slate model. You can get a token by running the following command, replacing the placeholders with the hostname and API key of your Cloud Pak for Data instance:
      # Request a bearer token from Cloud Pak for Data. -k skips TLS certificate
      # verification; jq -r prints the token without the surrounding JSON quotes.
      # Replace admin with the username that owns the API key if it differs.
      SCORING_API_TOKEN=$(
        curl -k -X POST 'https://{hostname of your cp4d instance}/icp4d-api/v1/authorize' \
          --header "Content-Type: application/json" \
          -d '{"username":"admin","api_key":"{api key of your cp4d instance}"}' \
        | jq -r .token
      )
      
    3. Deploy the application from the sample repository source code. For more information, see Deploying your app from repository source code.

      In Create application, click Specify build details and enter these details.

      • For source, specify:

        • Code repo URL: The URL of the Discovery doc-tutorial-downloads repository
        • Code repo access: None
        • Branch name: master
        • Context directory: discovery-data/webhook-enrichment-sample/slate
      • Strategy: Dockerfile

      • Output: Enter your container image registry information.

      • Open Environment variables (optional) and add an environment variable with the following settings:

        • Define as: Reference to full secret
        • Secret: The name of the secret that you created in the project in the previous step

      You can set the Min number of instances to 1 so that at least one instance is always running to receive webhook requests.

    4. Ensure that the application status changes to Ready.

  4. Configure the Discovery webhook enrichment. For more information, see Configuring the webhook enrichment.

  5. Ingest documents into Discovery and view the results.

    1. Upload a page of the Annual report from the Discovery doc-tutorial-downloads repository to the collection.
    2. After document processing is complete, preview your query results to find the webhook enrichment results.
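Cloud Pak for Data bearer tokens expire, so if the Slate sample stops getting results from the scoring deployment after some time, an expired SCORING_API_TOKEN is a likely cause. The following sketch repeats the token request from the secret step and writes the new token into the existing Code Engine secret. It assumes curl, jq, and the IBM Cloud CLI with the Code Engine plug-in, and the function name refresh_scoring_token and the CP4D_HOSTNAME, CP4D_USERNAME, and CP4D_API_KEY variables are hypothetical. Depending on how the application reads its environment, you might also need to update or restart the application so that new instances pick up the changed secret.

```shell
# Hypothetical helper: request a fresh Cloud Pak for Data bearer token and
# store it as SCORING_API_TOKEN in an existing Code Engine secret.
#   $1  name of the Code Engine secret that the Slate application references
# Reads CP4D_HOSTNAME, CP4D_USERNAME, and CP4D_API_KEY (hypothetical names).
refresh_scoring_token() {
  local secret_name="$1" token
  if [ -z "$secret_name" ] || [ -z "$CP4D_HOSTNAME" ] \
     || [ -z "$CP4D_USERNAME" ] || [ -z "$CP4D_API_KEY" ]; then
    echo "usage: set CP4D_HOSTNAME, CP4D_USERNAME, CP4D_API_KEY; pass the secret name" >&2
    return 2
  fi
  # Same authorize request as in the secret step. -k skips TLS certificate
  # verification, so use it only if your instance has a self-signed certificate.
  token=$(curl -s -k -X POST "https://$CP4D_HOSTNAME/icp4d-api/v1/authorize" \
            --header 'Content-Type: application/json' \
            -d "{\"username\":\"$CP4D_USERNAME\",\"api_key\":\"$CP4D_API_KEY\"}" \
          | jq -r .token) || return 1
  if [ -z "$token" ] || [ "$token" = "null" ]; then
    echo "error: the authorize response did not contain a token" >&2
    return 1
  fi
  # Set the SCORING_API_TOKEN key in the existing secret to the new token.
  ibmcloud ce secret update --name "$secret_name" \
    --from-literal "SCORING_API_TOKEN=$token"
}
```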

Configuring the webhook enrichment

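  1. Create a project in Discovery.

  2. Create a webhook enrichment by using the Discovery API. In the following commands, replace {auth} with your credentials (for example, -u "apikey:{apikey}"), {url} with the API endpoint URL of your Discovery instance, {project_id} with the ID of your project, and {your_webhook_secret} with the WEBHOOK_SECRET value that you stored in your Code Engine secret.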

    curl -X POST {auth} \
    --header 'Content-Type: multipart/form-data' \
    --form 'enrichment={"name":"my-first-webhook-enrichment",
    "type":"webhook",
    "options":{"url":"https://{your_code_engine_app_domain}/webhook",
        "secret":"{your_webhook_secret}",
        "location_encoding":"utf-32"}}' \
    '{url}/v2/projects/{project_id}/enrichments?version=2023-03-31'
    
  3. Create a collection in the project and apply the webhook enrichment to the collection. Replace {enrichment_id} with the enrichment_id value in the response to the previous request.

    curl -X POST {auth} \
    --header 'Content-Type: application/json' \
    --data '{"name":"my-collection",
    "enrichments":[{"enrichment_id":"{enrichment_id}",
        "fields":["text"]}]}' \
    '{url}/v2/projects/{project_id}/collections?version=2023-03-31'
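The two requests above can be chained in one script so that the enrichment_id from the first response is passed to the second request. The following is a sketch that assumes a Discovery instance on IBM Cloud with API key authentication (curl -u "apikey:..."), jq for building and reading JSON, and the hypothetical variables WD_API_URL, WD_API_KEY, PROJECT_ID, WEBHOOK_URL (the full URL of your application's /webhook endpoint), and WEBHOOK_SECRET. The function name create_webhook_collection is hypothetical, and the enrichment and collection names are the same examples as above.

```shell
# Sketch: create the webhook enrichment, then a collection that applies it to
# the text field, passing the returned enrichment_id from one request to the next.
# Reads WD_API_URL, WD_API_KEY, PROJECT_ID, WEBHOOK_URL, and WEBHOOK_SECRET
# (hypothetical variable names) and prints the collection response.
create_webhook_collection() {
  local version="2023-03-31" base enrichment enrichment_id collection
  if [ -z "$WD_API_URL" ] || [ -z "$WD_API_KEY" ] || [ -z "$PROJECT_ID" ] \
     || [ -z "$WEBHOOK_URL" ] || [ -z "$WEBHOOK_SECRET" ]; then
    echo "error: set WD_API_URL, WD_API_KEY, PROJECT_ID, WEBHOOK_URL, WEBHOOK_SECRET" >&2
    return 2
  fi
  base="$WD_API_URL/v2/projects/$PROJECT_ID"

  # Build the enrichment definition with jq so that the values are JSON-escaped.
  enrichment=$(jq -n -c --arg url "$WEBHOOK_URL" --arg secret "$WEBHOOK_SECRET" \
    '{"name": "my-first-webhook-enrichment", "type": "webhook",
      "options": {"url": $url, "secret": $secret, "location_encoding": "utf-32"}}') \
    || return 1

  # Create the enrichment (curl sets the multipart Content-Type for --form)
  # and keep only the enrichment_id from the response.
  enrichment_id=$(curl -s -X POST -u "apikey:$WD_API_KEY" \
      --form "enrichment=$enrichment" \
      "$base/enrichments?version=$version" | jq -r .enrichment_id) || return 1
  if [ -z "$enrichment_id" ] || [ "$enrichment_id" = "null" ]; then
    echo "error: the response did not contain an enrichment_id" >&2
    return 1
  fi

  # Create a collection that applies the new enrichment to the text field.
  collection=$(jq -n -c --arg id "$enrichment_id" \
    '{"name": "my-collection", "enrichments": [{"enrichment_id": $id, "fields": ["text"]}]}') \
    || return 1
  curl -s -X POST -u "apikey:$WD_API_KEY" \
    --header 'Content-Type: application/json' \
    --data "$collection" \
    "$base/collections?version=$version"
}
```

Building the JSON with jq avoids the quoting problems of writing JSON inside shell strings, for example when the webhook secret contains a quotation mark.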