Use NeuralSeek to return polished answers from existing help content

In this tutorial, you will use the Watson Discovery, watsonx Assistant, and NeuralSeek services that are available from the IBM Cloud catalog to create a virtual assistant that can answer questions about Watson Discovery. The assistant will generate answers by using the existing Watson Discovery product documentation as its knowledge base.

This tutorial shows the steps for creating a managed deployment of Discovery. However, you can create a Discovery service instance that is either hosted by IBM Cloud or installed in IBM Cloud Pak for Data and connect it to a NeuralSeek service instance.

Learning objectives

By the time you finish the tutorial, you will understand how to:

Create a Document Retrieval project in Discovery.
Upload PDF documents to your project, and apply a user-trained Smart Document Understanding model to your PDFs.
Connect your Discovery project to a NeuralSeek service instance. NeuralSeek is an AI-powered answer generation engine.
Create an assistant in watsonx Assistant and apply a NeuralSeek integration to it.
Add an action to your watsonx Assistant that connects to NeuralSeek for answers.
Use your assistant to answer questions about Discovery.

NeuralSeek is a third-party product that is provided by a vendor outside of IBM and is subject to a separate agreement between you and the third party, if you accept their terms. IBM is not responsible for the product and makes no privacy, security, performance, support, or other commitments regarding the product.

Duration

This tutorial will take approximately 4 to 5 hours to complete.

Prerequisite

Before you begin, you must set up a paid account with IBM Cloud.

You can complete this tutorial at no cost by using a Plus plan, which offers a 30-day trial at no cost. However, to create a Plus plan instance of the service, you must have a paid account (where you provide credit card details). For more information about creating a paid account, see Upgrading your account.
Create a Plus plan Discovery service instance.

Go to the Discovery resource page in the IBM Cloud catalog and create a Plus plan service instance.

Specify Dallas as the location.

You will also provision other services as part of this tutorial. The services must be hosted in the same data location so that they can connect to one another. Because the NeuralSeek service is available only from Dallas, you will create all of the service instances in Dallas.

If you decide to stop using the Plus plan and don't want to pay for it, delete the Plus plan service instance before the 30-day trial period ends.

Get the product documentation

To use the Discovery product documentation as our knowledge base, we will download the product documentation as a PDF file.

From a web browser, go to the product documentation site.
```
https://cloud.ibm.com/docs/discovery-data
```
From the table of contents panel, click the overflow menu icon in the Product guide section, and then choose View as PDF.
Save the PDF file to your system by clicking the Save icon from the page header.
Use a PDF file editor to split the PDF document into two separate PDF files of similar size.

Splitting the PDF creates two smaller files that can be enriched faster in Discovery.
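
If you prefer to script the split instead of using a PDF editor, the following Python sketch shows one way to do it. It assumes that the third-party pypdf package is installed. The helper names `split_ranges` and `split_pdf` are invented for this illustration and are not part of the tutorial. The sketch splits by page count, which only approximates two files of similar size.

```python
from pathlib import Path


def split_ranges(page_count: int) -> tuple[range, range]:
    """Return two page ranges of similar length that cover every page once."""
    if page_count < 2:
        raise ValueError("need at least two pages to split")
    middle = (page_count + 1) // 2  # first part gets the extra page when odd
    return range(0, middle), range(middle, page_count)


def split_pdf(source: Path, part1: Path, part2: Path) -> None:
    """Write the two halves of source to part1 and part2 (requires pypdf)."""
    # Imported lazily so split_ranges stays usable without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(str(source))
    for pages, target in zip(split_ranges(len(reader.pages)), (part1, part2)):
        writer = PdfWriter()
        for index in pages:
            writer.add_page(reader.pages[index])
        with open(target, "wb") as handle:
            writer.write(handle)
```

For example, calling `split_pdf(Path("discovery-docs.pdf"), Path("part1.pdf"), Path("part2.pdf"))` would write the two files to upload later. The file names here are hypothetical.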

Create a Document Retrieval project

Now that you have the latest copy of the product documentation, add it to a Discovery project as your data source.

In Discovery, you will create a Document Retrieval project type. Documents that you add to a project of this type are automatically enriched in the following ways:

Entities, such as proper nouns, are identified and tagged.
Parts of speech are identified and tagged.

Later, when a natural language phrase is submitted as a search query, this tagged information is used to return an accurate response.

Open a new web browser page.
From the Discovery Plus plan service page in IBM Cloud, click Launch Discovery.
From the My Projects page, click New Project.
Name your project Discovery documentation, and then click the Document Retrieval tile.

Project type options
Click Next.

You'll configure the data source for the project in the next step.

Upload data to the project

Add the documentation PDFs to your Discovery project.

From the Select data source page, click the Upload data tile, and then click Next.

Creating a collection from uploaded data
Name the collection Discovery docs part 1, and then click Next.
Click Drag and drop files here or upload, and then browse to add the first PDF file that you created earlier.
Click Finish.

Your file is processed as it is added to the collection.
From the navigation panel, click Manage collections, and then click New collection.

Adding a second collection
Repeat the previous steps to add the second PDF file as a collection named Discovery docs part 2.

After the data is uploaded, it is processed and indexed by Discovery. While the data is being processed, let's create our virtual assistant.

Create an assistant

For this tutorial, you will create an assistant with a single action. First, you must create a watsonx Assistant service instance.

Both Lite and Trial plan watsonx Assistant service instances are available at no cost. You will create a Trial plan.

From a new web browser tab, return to the IBM Cloud catalog.

Keep the Discovery page open in a separate tab, so you can switch between the two applications.
From the watsonx Assistant resource page in the IBM Cloud catalog, create a Trial plan watsonx Assistant service instance in the Dallas location.
From the watsonx Assistant Trial plan service page in IBM Cloud, click Launch watsonx Assistant.

The watsonx Assistant product user interface is displayed where you can create your first assistant.
Add Discovery expert as the assistant name, and then click Next.
If you are asked to share information about you and your assistant, complete the required fields, and then click Next.

When you create an assistant, a web chat application is created for you automatically.
Click Create to create the assistant and the corresponding web chat app.

After a congratulatory message, the home page for your new assistant is displayed.

Assistant home page

Before we add anything to our new assistant, let's check on the status of our data.

Prepare your data for retrieval

To improve the retrievability of the information in your PDF files, you will split the PDF files into many smaller documents. To do so, you will first teach Discovery about the structure of your PDF files, so it understands how subsections are formatted and can split the document by subsection.

Return to the web browser tab where your Discovery project is displayed.

The Improve and customize page for the last PDF file that you uploaded is displayed.
From the Improvement tools panel, expand Define structure, and then click New fields.

Opening the tool for defining fields
Choose the Discovery docs part 1 collection.

The Identify fields tab is displayed, where you can choose the type of Smart Document Understanding model that you want to use.
Click User-trained models, and then click Submit.

Creating a user-trained model
Click Apply changes and reprocess.

After some processing occurs, a representation of the document is displayed in the Smart Document Understanding tool. The tool shows you a view of the original document along with a representation of the document, where the text is replaced by blocks. The blocks represent field types.

Initially, the blocks are labeled as text because all of the document content is considered to be standard text by default, and is indexed in the text field.

We want to label all first- and second-level headings as subtitles instead of text.
From the thumbnails view, click the thumbnail for the first full-text page from the document to open the first page with real content.

The Smart Document Understanding tool
To annotate the document, click the subtitle label from the Field labels list. Then, click each block in the representation of the PDF page that represents a heading to change its label from text to subtitle.

Applying the subtitle label
After every subtitle on the current page is labeled, click Submit page.

The next page of the PDF file is displayed.

Next page is displayed for labeling
Repeat this process until the tool is able to label the headings correctly for you in a consistent way when new pages are loaded into the tool. At that point, click Apply changes and reprocess.

Congratulations! You have successfully trained a Smart Document Understanding (SDU) model that can recognize subtitles in your documents. Let's apply the same model to the other PDF file that you added to the project.
From the SDU editor toolbar in the page header, click the overflow menu icon, and then choose Export model.
Save the .sdumodel file to your system in a location where you can access it again shortly.
From the navigation panel, click Manage collections, and then open the Discovery docs part 2 collection.
Open the Identify fields tab.
Click User-trained models, and then click Submit.
Click Apply changes and reprocess.
From the SDU editor toolbar, click the overflow menu icon, choose Import model, and then click Select model.
Browse to find the .sdumodel file that you downloaded earlier, and then click Open.
Click Apply changes and reprocess to apply the same SDU model to the second collection.

Discovery reprocesses the data in its index to identify subtitles in the documents. While the data is being reprocessed, let's create our answer generator.

Create a NeuralSeek service instance

You can use a search extension in watsonx Assistant to connect your assistant directly to Discovery and return passages straight from the data source. However, we will add the NeuralSeek service between watsonx Assistant and Discovery in this tutorial. NeuralSeek retrieves the passages from Discovery and then converts them into answers that sound more conversational.

From a new web browser tab, return to the IBM Cloud catalog.

Keep the pages to the other services open in separate tabs, so you can switch between the different service instances.
From the NeuralSeek resource page in the IBM Cloud catalog, create a Lite plan service instance.
On the Configure page, add details about your Discovery service instance and customize the connection.
- You can get the service URL and API key from the Discovery service instance details page in IBM Cloud.
- The project ID is available from the Discovery product user interface. To get it, open your project in Discovery, and then click Integrate and deploy from the navigation panel. Open the API Information page, and then copy the project ID.
- Set the document score range to 50%.
- Change the snippet character size to 400.
- Specify your company as the company display name.
- Change the minimum confidence percentage to 50.
Click Save.
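
To confirm that the service URL, API key, and project ID from this configuration work together, you can send a test query to Discovery yourself. The following Python sketch builds a Discovery v2 query request with the standard library only. It assumes basic authentication with the user name `apikey`, and the API version date `2023-03-31`. The function names are invented for this illustration. The passage length of 400 only mirrors the snippet size that is set in NeuralSeek; it does not reproduce the request that NeuralSeek sends.

```python
import base64
import json
import urllib.request


def build_query_request(service_url, api_key, project_id, question,
                        version="2023-03-31", passage_chars=400):
    """Build (but do not send) a Discovery v2 natural language query."""
    url = (f"{service_url.rstrip('/')}/v2/projects/{project_id}/query"
           f"?version={version}")
    body = {
        "natural_language_query": question,
        "count": 3,  # return only the top three documents
        "passages": {"enabled": True, "characters": passage_chars},
    }
    # Assumption: basic authentication with the literal user name "apikey".
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Pass your own service URL, API key, and project ID to `build_query_request`, and then pass the result to `run_query`. An authentication or not-found error at this point suggests that one of the three values was copied incorrectly.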

Split your PDF documents

Now that subtitles are indexed properly in Discovery, use them as the basis for splitting the PDF files into many smaller documents.

Return to the web browser tab where your Discovery project is displayed.
Open the Manage fields tab for the current collection.
In the Split document on each occurrence of field, choose subtitle, and then click Apply changes and reprocess.

Split a document
From the navigation panel, click Manage collections, and then open the other collection.
Go to the Manage fields page, and then choose subtitle in the Split document on each occurrence of field.
Click Apply changes and reprocess.

The collections start to be reprocessed. After reindexing is finished, instead of containing one document each, the collections will contain several hundred documents each.

The collections with more documents
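
Conceptually, splitting on the subtitle field starts a new document at every labeled heading. The following Python sketch illustrates the idea on a short list of labeled blocks. It is a simplified model, not how Discovery implements splitting, and the `Block` type and `split_on_label` function are invented for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    label: str  # field label from the SDU model, such as "subtitle" or "text"
    text: str


def split_on_label(blocks, label="subtitle"):
    """Group blocks into segments, starting a new segment at each label block."""
    segments = []
    for block in blocks:
        # Content before the first heading still needs a segment of its own.
        if block.label == label or not segments:
            segments.append([])
        segments[-1].append(block)
    return segments


page = [
    Block("text", "Preface"),
    Block("subtitle", "Project types"),
    Block("text", "Document Retrieval projects enrich documents."),
    Block("subtitle", "Data sources"),
    Block("text", "You can upload data or crawl a source."),
]
segments = split_on_label(page)
print([segment[0].text for segment in segments])
# prints ['Preface', 'Project types', 'Data sources']
```

Each resulting segment corresponds to one of the smaller documents in the collection, which is why the document count grows from one to several hundred after reprocessing.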

While the index is being rebuilt, let's get our assistant ready.

Add an extension to your assistant

Connect your assistant to your NeuralSeek service instance.

Reopen the NeuralSeek service from IBM Cloud. You can find the instance in the AI and Machine Learning section of your resource list.
Click the Integrate tab and follow the instructions to set up the NeuralSeek custom extension for your assistant. Return to this procedure when you're ready to create the action.

Set up the NeuralSeek instance integration
From the watsonx Assistant navigation panel, click Actions, and then click New action.
Choose Quick start from templates, and then scroll to find and click the NeuralSeek Starter kit.

Choose Quick start from template
Click Select this starter kit, and then click Add templates.
Click to open the NeuralSeek search action that you just added to the assistant.

Add the following user example queries to the first step in the action:

What Watson Discovery project types are available and what do they do?

What external data sources are supported by Watson Discovery?

Can I add a custom dictionary to Watson Discovery?

How do I use the Content Mining application?

When should I add query expansions to my project?

Which file types support Smart Document Understanding models?

Can I enable optical character recognition for all file types?

Does my data have to be written in English?

watsonx Assistant uses the sample questions to recognize the types of user questions it should route to this action.

Click to open Step 3 for editing.

In the And then section, click Edit extension.

Choose NeuralSeek, and then click Apply.

Set up the NeuralSeek extension
Click to open Step 6 for editing.

This step shows a link that users can click to get more information. We want this link to go directly to the product documentation on the IBM Cloud Docs site.

Change the hypertext reference in the anchor HTML element to contain the following URL:
```
<a href="https://cloud.ibm.com/docs/discovery-data?topic=discovery-data-about" target="_blank">
```
Save your changes, and then click the X to close the step.

Congratulations! You successfully created an action that recognizes questions about Discovery, and gets its answers from the connected NeuralSeek extension.

Configure the web chat for your assistant

To preview your assistant, you will use the built-in web chat as the chat user interface for interacting with the assistant.

From the navigation panel in watsonx Assistant, click Environments.

The draft environment is displayed. It shows that a web chat is connected to your assistant. You can also see that the web chat is connected to the NeuralSeek extension.

Environment diagram
Click the Web chat tile to edit the web chat.

We don't want to add multiple starter questions, so we're going to turn off the home screen for the web chat. Click the Home screen tab. Set the switcher to Off, and then click Save and exit.

Web chat home screen disabled

You're ready to preview your assistant!

Preview the assistant

To preview an assistant that connects to data that is stored in Discovery, you must preview the assistant from the Environments page. When you preview the web chat independently, the assistant is not able to retrieve data from Discovery; it needs the environment resources to be able to connect to Discovery.

From the Environments page, click Preview this environment.

A sample web page is displayed that includes a chat icon.
Click the chat icon to open the web chat window.

Web chat welcome message
Enter the following text question:
```
What project types are available?
```
The correct answer is returned and it includes a link to the product documentation.

Web chat returns search response
Submit a question that wasn't used as a query example when you created the action.
```
How do you define synonyms in Watson Discovery?
```
A detailed answer is returned.

Web chat returns a detailed answer
Optionally, ask the assistant other questions.

If the assistant doesn't know the answer, reword the question to include “in Watson Discovery” to make it clearer that you are asking about how something works in Discovery specifically.

Congratulations! You successfully created an assistant that can answer questions about Discovery by retrieving information from the product documentation by way of the NeuralSeek service.

Summary

In this tutorial, you created a Watson Discovery Document Retrieval project with uploaded PDF files that contain the Discovery product documentation. Separately, you created a watsonx Assistant virtual assistant with a single action that can recognize user questions about Discovery. You added a custom extension to your assistant that connects to a third-party service called NeuralSeek that gets the correct answer from Discovery and rewords the response. Finally, you tested your virtual assistant by asking a question and getting an accurate and well-written response.

Next steps

The assistant that you created is available from the draft environment. Next, you can publish your assistant to a production environment and deploy it. You can deploy the assistant in various ways. For more information, see Overview: Previewing and publishing.