Implementing dialog responses in a client application

A dialog node can respond to users with content that includes text, images, or interactive elements such as clickable options. If you are building your own client application, you must implement the correct display of all response types that are returned by your dialog. For more information about dialog responses, see Responses.

Response output format

By default, responses from a dialog node are specified in the output.generic object in the response JSON returned from the /message API. The generic object contains an array of up to 5 response elements that are intended for any channel. The following JSON example shows a response that includes text and an image:

{
  "output": {
    "generic": [
      {
        "response_type": "text",
        "text": "OK, here's a picture of a dog."
      },
      {
        "response_type": "image",
        "source": "http://example.com/dog.jpg"
      }
    ],
    "text" : ["OK, here's a picture of a dog."]
  },
  "user_id": "faf4a112-f09f-4a95-a0be-43c496e6ac9a"
}

As this example shows, the text response (OK, here's a picture of a dog.) is also returned in the output.text array. The array is included for compatibility with earlier applications that do not support the output.generic format.

It is the responsibility of your client application to handle all response types. In this case, your application would need to display the specified text and image to the user.
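
For example, a minimal dispatcher might loop over the output.generic array and branch on the response_type of each element. This is only a sketch: showText and showImage are hypothetical placeholders for your own rendering code, not part of the product API.

function handleResponse(response) {
  // Each element of output.generic is one response to render.
  for (const element of response.output.generic) {
    switch (element.response_type) {
      case 'text':
        showText(element.text); // hypothetical rendering helper
        break;
      case 'image':
        showImage(element.source, element.title, element.description); // hypothetical helper
        break;
      default:
        // Log response types this client does not yet support.
        console.warn('Unsupported response type: ' + element.response_type);
    }
  }
}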

Response types

Each element of a response has one of the supported response types. Each response type is specified by a different set of JSON properties, so the properties that are included for each response vary depending on the response type. For complete information about the response model of the /message API, see the API Reference.

This section describes the available response types and how they are represented in the /message API response JSON. If you are using the Watson SDK, you can use the interfaces that are provided for your language to access the same objects.

The examples in this section show the format of the JSON data returned from the /message API at run time, which is different from the JSON format that is used to define responses within a dialog node. You can use the format shown in the examples to update output.generic with webhooks. To update output.generic with the JSON editor, see Defining responses by using the JSON editor.

Text

The text response type is used for ordinary text responses from the dialog:

{
  "output": {
    "generic":[
      {
        "response_type": "text",
        "text": "OK, you want to fly to Boston next Monday."
      }
    ]
  },
  "user_id": "faf4a112-f09f-4a95-a0be-43c496e6ac9a"
}

For compatibility, the same text is also included in the output.text array in the dialog response.

Image

The image response type instructs the client application to display an image, optionally accompanied by a title and description:

{
  "output": {
    "generic":[
      {
        "response_type": "image",
        "source": "http://example.com/image.jpg",
        "title": "Image example",
        "description": "This is an example image"
      }
    ]
  },
  "user_id": "faf4a112-f09f-4a95-a0be-43c496e6ac9a"
}

Your application is responsible for retrieving the image that is specified by the source property and displaying it to the user. If the optional title and description are provided, your application can display them in whatever way is appropriate (for example, rendering the title after the image and the description as hover text).
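
In a browser-based client, for example, one possible approach is to build an img element from the response element. This is only a sketch; the container argument is a placeholder for whatever element hosts your chat output.

// Render an image response in the browser DOM.
function showImage(element, container) {
  const img = document.createElement('img');
  img.src = element.source;
  if (element.description) {
    img.title = element.description; // render the description as hover text
  }
  container.appendChild(img);
  if (element.title) {
    const caption = document.createElement('p');
    caption.textContent = element.title; // render the title after the image
    container.appendChild(caption);
  }
}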

Video

The video response type instructs the client application to display a video, optionally accompanied by a title, description, and alt_text for accessibility:

{
  "output": {
    "generic":[
      {
        "response_type": "video",
        "source": "http://example.com/video.mp4",
        "title": "Video example",
        "description": "This is an example video",
        "alt_text": "A video showing a great example",
        "channel_options": {
          "chat": {
            "dimensions": {
              "base_height": 180
            }
          }
        }
      }
    ]
  },
  "user_id": "faf4a112-f09f-4a95-a0be-43c496e6ac9a"
}

Your application is responsible for retrieving the video that is specified by the source property and displaying it to the user. If the optional title and description are provided, your application can display them in whatever way is appropriate.

The optional channel_options.chat.dimensions.base_height property specifies the height (in pixels) at which the video is rendered at a standard width of 360 pixels. Your app can use this value to maintain the proper aspect ratio if it renders the video at a nonstandard size.
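
For example, if your app renders the video at a width other than 360 pixels, it can scale the height proportionally. A minimal sketch, assuming the response element shown in the previous example:

// Scale the video height to preserve the aspect ratio implied by
// base_height (the height at the standard width of 360 pixels).
function scaledHeight(element, renderWidth) {
  const baseHeight = element.channel_options.chat.dimensions.base_height;
  return Math.round(baseHeight * (renderWidth / 360));
}

// With base_height 180, rendering at a width of 720 pixels gives a height of 360.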

Audio

The audio response type instructs the client application to play an audio file:

{
  "output": {
    "generic":[
      {
        "response_type": "audio",
        "source": "http://example.com/audio.mp3",
        "channel_options": {
          "voice_telephony": {
            "loop": true
          }
        }
      }
    ]
  },
  "user_id": "faf4a112-f09f-4a95-a0be-43c496e6ac9a"
}

Your application is responsible for playing the audio file.

The optional channel_options.voice_telephony.loop property takes a Boolean value that indicates whether the audio file is played as a continuous loop. This option is typically used for hold music that might need to continue for an indefinite time.
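
The loop option is defined for the voice telephony channel, but a browser-based client could honor the same flag by setting the loop attribute of an HTMLAudioElement. A sketch for illustration only:

// Play an audio response, honoring the loop option if it is present.
function playAudio(element) {
  const audio = new Audio(element.source);
  const telephony = element.channel_options && element.channel_options.voice_telephony;
  if (telephony && telephony.loop) {
    audio.loop = true; // repeat until stopped (for example, hold music)
  }
  audio.play();
}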

iframe

The iframe response type instructs the client application to display content in an embedded iframe element, optionally accompanied by a title:

{
  "output": {
    "generic":[
      {
        "response_type": "iframe",
        "source": "http://example.com/iframe.html",
        "title": "My IFrame"
      }
    ]
  },
  "user_id": "faf4a112-f09f-4a95-a0be-43c496e6ac9a"
}

Your application is responsible for displaying the iframe content. Content in an embedded iframe is useful for displaying third-party content, or for content from your own site that you do not want to reauthor using the user_defined response type.
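
In a browser-based client, for example, displaying the content can be as simple as creating an iframe element from the response. A sketch only; sizing and sandboxing policy are up to your application:

// Embed iframe response content in the page.
function showIframe(element, container) {
  const frame = document.createElement('iframe');
  frame.src = element.source;
  if (element.title) {
    frame.title = element.title; // exposed to assistive technologies
  }
  container.appendChild(frame);
}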

Pause

The pause response type instructs the application to wait for a specified interval before the next response:

{
  "output": {
    "generic":[
      {
        "response_type": "pause",
        "time": 500,
        "typing": false
      }
    ]
  },
  "user_id": "faf4a112-f09f-4a95-a0be-43c496e6ac9a"
}

This pause might be requested by the dialog to allow time for a request to complete, or to mimic the appearance of a human agent who might pause between responses. The pause can be of any duration up to 10 seconds.

A pause response is typically sent in combination with other responses. Your application pauses for the interval that is specified by the time property (in milliseconds) before the next response in the array. The optional typing property requests that the client application shows a "user is typing" indicator, if supported, to simulate a live agent.
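
For example, an asynchronous client might implement the pause with a timer before it renders the next element. A minimal sketch, assuming hypothetical showTypingIndicator and hideTypingIndicator helpers:

// Wait for the requested interval before processing the next response.
async function handlePause(element) {
  if (element.typing) {
    showTypingIndicator(); // hypothetical UI helper
  }
  await new Promise(resolve => setTimeout(resolve, element.time)); // time is in milliseconds
  if (element.typing) {
    hideTypingIndicator(); // hypothetical UI helper
  }
}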

Option

The option response type instructs the client application to display a user interface control that enables the user to select from a list of options, and then send input back to the assistant based on the selected option:

{
  "output": {
    "generic":[
      {
        "response_type": "option",
        "title": "Available options",
        "description": "Please select one of the following options:",
        "preference": "button",
        "options": [
          {
            "label": "Option 1",
            "value": {
              "input": {
                "text": "option 1"
              }
            }
          },
          {
            "label": "Option 2",
            "value": {
              "input": {
                "text": "option 2"
              }
            }
          }
        ]
      }
    ]
  },
  "user_id": "faf4a112-f09f-4a95-a0be-43c496e6ac9a"
}

Your app can display the specified options using any suitable user-interface control (for example, a set of buttons or a drop-down list). The optional preference property indicates the preferred type of control your app uses (button or dropdown), if supported. For the best user experience, a good practice is to present three or fewer options as buttons, and more than three options as a drop-down list.

For each option, the label property specifies the label text that appears for the option in the UI control. The value property specifies the input that is sent back to the assistant (using the /message API) when the user selects the corresponding option.
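
In a browser-based client, for example, each option might become a clickable button whose click handler sends the stored value.input back to the assistant. A sketch only; sendMessage stands in for your own /message API call, like the one in the example later in this topic:

// Render options as buttons; clicking one sends its value.input back.
function showOptions(element, container, sendMessage) {
  for (const option of element.options) {
    const button = document.createElement('button');
    button.textContent = option.label;
    button.onclick = () => sendMessage(option.value.input);
    container.appendChild(button);
  }
}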

For an example of implementing option responses in a simple client application, see Example: Implementing option responses.

Suggestion

The suggestion response type is used by the disambiguation feature to suggest possible matches to clarify what the user wants to do. A suggestion response includes an array of suggestions, each one corresponding to a possible matching dialog node:

{
  "output": {
    "generic": [
      {
        "response_type": "suggestion",
        "title": "Did you mean:",
        "suggestions": [
          {
            "label": "I'd like to order a drink.",
            "value": {
              "intents": [
                {
                  "intent": "order_drink",
                  "confidence": 0.7330395221710206
                }
              ],
              "entities": [],
              "input": {
                "suggestion_id": "576aba3c-85b9-411a-8032-28af2ba95b13",
                "text": "I want to place an order"
              }
            },
            "output": {
              "text": [
                "I'll get you a drink."
              ],
              "generic": [
                {
                  "response_type": "text",
                  "text": "I'll get you a drink."
                }
              ],
              "nodes_visited_details": [
                {
                  "dialog_node": "node_1_1547675028546",
                  "title": "order drink",
                  "user_label": "I'd like to order a drink.",
                  "conditions": "#order_drink"
                }
              ]
            },
            "source_dialog_node": "root"
          },
          {
            "label": "I need a drink refill.",
            "value": {
              "intents": [
                {
                  "intent": "refill_drink",
                  "confidence": 0.2529746770858765
                }
              ],
              "entities": [],
              "input": {
                "suggestion_id": "6583b547-53ff-4e7b-97c6-4d062270abcd",
                "text": "I need a drink refill"
              }
            },
            "output": {
              "text": [
                "I'll get you a refill."
              ],
              "generic": [
                {
                  "response_type": "text",
                  "text": "I'll get you a refill."
                }
              ],
              "nodes_visited_details": [
                {
                  "dialog_node": "node_2_1547675097178",
                  "title": "refill drink",
                  "user_label": "I need a drink refill.",
                  "conditions": "#refill_drink"
                }
              ]
            },
            "source_dialog_node": "root"
          }
        ]
      }
    ]
  },
  "user_id": "faf4a112-f09f-4a95-a0be-43c496e6ac9a"
}

The structure of a suggestion response is similar to the structure of an option response. As with options, each suggestion includes a label that can be displayed to the user and a value specifying the input that is sent back to the assistant if the user chooses the corresponding suggestion. To implement suggestion responses in your application, you can use the same approach that you would use for option responses.

For more information about the disambiguation feature, see Disambiguation.

User-defined

A user-defined response type can contain up to 5000 KB of data to support any type of response that you implement in your client. For example, you might define a user-defined response type to display a special color-coded card, or to format data in a table or graphic.

The user_defined property of the response is an object that can contain any valid JSON data:

{
  "output": {
    "generic":[
      {
        "response_type": "user_defined",
        "user_defined": {
          "field_1": "String value",
          "array_1": [
            1,
            2
          ],
          "object_1": {
            "property_1": "Another string value"
          }
        }
      }
    ]
  },
  "user_id": "faf4a112-f09f-4a95-a0be-43c496e6ac9a"
}

Your application can parse and display the data in any way you choose.
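
For example, if your dialog and your client agree on a convention for the payload, the client can branch on a field of its own choosing. A sketch only; the card_color field and renderColorCodedCard helper are invented for illustration, not part of the product:

// Handle a user_defined response by using an agreed-on convention.
function handleUserDefined(element) {
  const data = element.user_defined;
  if (data.card_color) {
    renderColorCodedCard(data); // hypothetical rendering helper
  } else {
    console.log(JSON.stringify(data, null, 2)); // fallback: show the raw JSON
  }
}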

Example: Implementing option responses

To show how a client application might handle option responses, which prompt the user to select from a list of choices, we can extend the client example that is used in Building a client application. The example is a simplified client app that uses standard input and output to handle three intents (sending a greeting, showing the current time, and exiting from the app):

Welcome to the example!
>> hello
Good day to you.
>> what time is it?
The current time is 12:40:42 PM.
>> goodbye
OK! See you later.

If you want to try the example code, set up the required workspace and obtain the API details that you need. For more information, see Building a client application.

Receiving an option response

The option response can be used when you want to present the user with a finite list of choices, rather than interpreting natural language input. The response can be used in any situation where you want to enable the user to quickly select from a set of unambiguous options.

In our simplified client app, we use this capability to select from a list of the actions the assistant supports (greetings, displaying the time, and exiting). In addition to the three intents previously shown (#hello, #time, and #goodbye), the example workspace supports a fourth intent: #menu, which is matched when the user asks to see a list of available actions.

When the workspace recognizes the #menu intent, the dialog responds with an option response:

{
  "output": {
    "generic": [
      {
        "title": "What do you want to do?",
        "options": [
          {
            "label": "Send greeting",
            "value": {
              "input": {
                "text": "hello"
              }
            }
          },
          {
            "label": "Display the local time",
            "value": {
              "input": {
                "text": "time"
              }
            }
          },
          {
            "label": "Exit",
            "value": {
              "input": {
                "text": "goodbye"
              }
            }
          }
        ],
        "response_type": "option"
      }
    ],
    "intents": [
      {
        "intent": "menu",
        "confidence": 0.6178638458251953
      }
    ],
    "entities": []
  },
  "user_id": "faf4a112-f09f-4a95-a0be-43c496e6ac9a"
}

The option response contains multiple options to be presented to the user. Each option includes two properties: label and value. The label is a user-facing string that identifies the option; the value specifies the corresponding message input that is sent back to the assistant if the user chooses the option.

Our client app needs to use the data in this response to build the output we show to the user, and to send the appropriate message to the assistant.

Listing available options

The first step in handling an option response is to display the options to the user, using the text specified by the label property of each option. You can display the options by using any technique that your application supports, typically a drop-down list or a set of clickable buttons. The optional preference property of an option response, if specified, indicates which type of control the application should use, if possible.

Our simplified example uses standard input and output, so we don't have access to a real UI. Instead, we present the options as a numbered list.

// Option example 1: lists options.

const prompt = require('prompt-sync')();
const AssistantV2 = require('ibm-watson/assistant/v2');
const { IamAuthenticator } = require('ibm-watson/auth');

// Set up Assistant service wrapper.
const service = new AssistantV2({
  version: '2019-02-28',
  authenticator: new IamAuthenticator({
    apikey: '{apikey}', // replace with API key
  })
});

const assistantId = '{assistant_id}'; // replace with assistant ID
let sessionId;

// Create session.
service
  .createSession({
    assistantId,
  })
  .then(res => {
    sessionId = res.result.session_id;
    sendMessage({
      messageType: 'text',
      text: '',  // start conversation with empty message
    });
  })
  .catch(err => {
    console.log(err); // something went wrong
  });

// Send message to assistant.
function sendMessage(messageInput) {
  service
    .message({
      assistantId,
      sessionId,
      input: messageInput,
    })
    .then(res => {
      processResponse(res.result);
    })
    .catch(err => {
      console.log(err); // something went wrong
    });
}

// Process the response.
function processResponse(response) {

  let endConversation = false;

  // Check for client actions requested by the assistant.
  if (response.output.actions) {
    if (response.output.actions[0].type === 'client'){
      if (response.output.actions[0].name === 'display_time') {
        // User asked what time it is, so we output the local system time.
        console.log('The current time is ' + new Date().toLocaleTimeString() + '.');
      } else if (response.output.actions[0].name === 'end_conversation') {
        // User said goodbye, so we're done.
        console.log(response.output.generic[0].text);
        endConversation = true;
      }
    }
  } else {
    // Display the output from assistant, if any. Supports only a single
    // response.
    if (response.output.generic) {
      if (response.output.generic.length > 0) {
        switch (response.output.generic[0].response_type) {
          case 'text':
            // It's a text response, so we just display it.
            console.log(response.output.generic[0].text);
            break;
          case 'option':
            // It's an option response, so we'll need to show the user
            // a list of choices.
            console.log(response.output.generic[0].title);
            const options = response.output.generic[0].options;
            // List the options by label.
            for (let i = 0; i < options.length; i++) {
              console.log((i+1).toString() + '. ' + options[i].label);
            }
            break;
        }
      }
    }
  }

  // If we're not done, prompt for the next round of input.
  if (!endConversation) {
    const newMessageFromUser = prompt('>> ');
    const newMessageInput = {
      messageType: 'text',
      text: newMessageFromUser,
    };
    sendMessage(newMessageInput);
  } else {
    // We're done, so we delete the session.
    service
      .deleteSession({
        assistantId,
        sessionId,
      })
      .then(res => {
        return;
      })
      .catch(err => {
        console.log(err); // something went wrong
      });
  }
}

Let's take a closer look at the code that outputs the response from the assistant. Now, instead of assuming a text response, the application supports both the text and option response types:

    // Display the output from assistant, if any. Supports only a single
    // response.
    if (response.output.generic) {
      if (response.output.generic.length > 0) {
        switch (response.output.generic[0].response_type) {
          case 'text':
            // It's a text response, so we just display it.
            console.log(response.output.generic[0].text);
            break;
          case 'option':
            // It's an option response, so we'll need to show the user
            // a list of choices.
            console.log(response.output.generic[0].title);
            const options = response.output.generic[0].options;
            // List the options by label.
            for (let i = 0; i < options.length; i++) {
              console.log((i+1).toString() + '. ' + options[i].label);
            }
            break;
        }
      }
    }

If response_type=text, we display the output, as before. But if response_type=option, we must do a bit more work. First, we display the value of the title property, which serves as lead-in text to introduce the list of options; then, we list the options, using the value of the label property to identify each one. (A real-world application would show these labels in a drop-down list or as the labels on clickable buttons.)

You can see the result by triggering the #menu intent:

Welcome to the example!
>> what are the available actions?
What do you want to do?
1. Send greeting
2. Display the local time
3. Exit
>> 2
Sorry, I have no idea what you're talking about.
>>

As you can see, the application is now correctly handling the option response by listing the available choices. However, we aren't yet translating the user's choice into meaningful input.

Selecting an option

In addition to the label, each option in the response also includes a value object, which contains the input data that is sent back to the assistant if the user chooses the corresponding option. The value.input object is equivalent to the input property of the /message API, which means that we can send this object back to the assistant as-is.

We set a new promptOption flag when the client receives an option response. When this flag is true, we know that we want to take the next round of input from value.input rather than accepting natural language text input from the user. Again, we don't have a real user interface, so we prompt the user to select a valid option from the list by number.

// Option example 2: sends back selected option value.

const prompt = require('prompt-sync')();
const AssistantV2 = require('ibm-watson/assistant/v2');
const { IamAuthenticator } = require('ibm-watson/auth');

// Set up Assistant service wrapper.
const service = new AssistantV2({
  version: '2019-02-28',
  authenticator: new IamAuthenticator({
    apikey: '{apikey}', // replace with API key
  })
});

const assistantId = '{assistant_id}'; // replace with assistant ID
let sessionId;

// Create session.
service
  .createSession({
    assistantId,
  })
  .then(res => {
    sessionId = res.result.session_id;
    sendMessage({
      messageType: 'text',
      text: '',  // start conversation with empty message
    });
  })
  .catch(err => {
    console.log(err); // something went wrong
  });

// Send message to assistant.
function sendMessage(messageInput) {
  service
    .message({
      assistantId,
      sessionId,
      input: messageInput,
    })
    .then(res => {
      processResponse(res.result);
    })
    .catch(err => {
      console.log(err); // something went wrong
    });
}

// Process the response.
function processResponse(response) {

  let endConversation = false;
  let promptOption = false;

  // Check for client actions requested by the assistant.
  if (response.output.actions) {
    if (response.output.actions[0].type === 'client'){
      if (response.output.actions[0].name === 'display_time') {
        // User asked what time it is, so we output the local system time.
        console.log('The current time is ' + new Date().toLocaleTimeString() + '.');
      } else if (response.output.actions[0].name === 'end_conversation') {
        // User said goodbye, so we're done.
        console.log(response.output.generic[0].text);
        endConversation = true;
      }
    }
  } else {
    // Display the output from assistant, if any. Supports only a single
    // response.
    if (response.output.generic) {
      if (response.output.generic.length > 0) {
        switch (response.output.generic[0].response_type) {
          case 'text':
            // It's a text response, so we just display it.
            console.log(response.output.generic[0].text);
            break;
          case 'option':
            // It's an option response, so we'll need to show the user
            // a list of choices.
            console.log(response.output.generic[0].title);
            const options = response.output.generic[0].options;
            // List the options by label.
            for (let i = 0; i < options.length; i++) {
              console.log((i+1).toString() + '. ' + options[i].label);
            }
            promptOption = true;
            break;
        }
      }
    }
  }

  // If we're not done, prompt for the next round of input.
  if (!endConversation) {
    let messageInput;
    if (promptOption) {
      // Prompt for a valid selection from the list of options.
      let choice;
      do {
        choice = prompt('? ');
        if (isNaN(choice)) {
          choice = 0;
        }
      } while (choice < 1 || choice > response.output.generic[0].options.length);
      const value = response.output.generic[0].options[choice-1].value;
      // Use message input from the selected option.
      messageInput = value.input;
    } else {
      // We're not showing options, so we just prompt for the next
      // round of input.
      const newText = prompt('>> ');
      messageInput = {
        messageType: 'text',
        text: newText,
      };
    }
    sendMessage(messageInput); 
  } else {
    // We're done, so we delete the session.
    service
      .deleteSession({
        assistantId,
        sessionId,
      })
      .then(res => {
        return;
      })
      .catch(err => {
        console.log(err); // something went wrong
      });
  }
}

All that we must do is use the value.input object from the selected response as the next round of message input, rather than building a new input object using text input. The assistant then responds exactly as if the user typed the input text directly.

Welcome to the example!
>> hi
Good day to you.
>> what are the choices?
What do you want to do?
1. Send greeting
2. Display the local time
3. Exit
? 2
The current time is 1:29:14 PM.
>> bye
OK! See you later.

We can now access all of the functions of the assistant either by making natural-language requests or by selecting from a menu of options.

The same approach is used for suggestion responses as well. If your plan supports the disambiguation feature, you can use similar logic to prompt users to select from a list when it isn't clear which of several possible options is correct. For more information about the disambiguation feature, see Disambiguation.
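
Because each suggestion carries the same label and value structure as an option, one possible approach is to normalize suggestions into your existing option-handling path. A sketch, assuming a hypothetical listChoices helper like the option-listing logic shown earlier:

// Treat a suggestion response like an option response by reusing the
// same label/value handling.
function handleSuggestions(element) {
  const choices = element.suggestions.map(suggestion => ({
    label: suggestion.label,
    value: suggestion.value,
  }));
  listChoices(element.title, choices); // hypothetical shared helper
}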