Use built-in Watson NLP to find common terms

Take advantage of award-winning Watson Natural Language Processing (NLP) capabilities by adding prebuilt enrichments to your documents.

With Watson NLP, you can identify and tag meaningful information in your collections so that you can better understand your content and make more informed decisions.

The following Watson NLP enrichments are available:

  • Entities: Recognizes proper nouns such as people, cities, and organizations that are mentioned in the content.
  • Keywords: Recognizes significant terms in your content.
  • Part of Speech: Identifies the parts of speech (nouns and verbs, for example) in the content.
  • Sentiment: Understands the overall sentiment of the content.

The following other pretrained enrichments are available with Discovery:

Watson NLP enrichments

For example, the following screen capture shows a transcript of the US Declaration of Independence that was added to a Discovery collection where the Entities and Keywords enrichments are enabled. The mentions that are recognized by the enrichments are highlighted in the document text.

Shows an excerpt of the US Declaration of Independence with several terms highlighted.
Figure 1. Excerpt of the US Declaration of Independence with highlighted terms

Some of the NLP enrichments are applied to projects automatically. You don't need to apply them yourself if your project type applies them by default.

Default enrichments per project type

Some prebuilt enrichments are applied automatically to collections in a project based on the project type. The following table shows the default enrichments that are applied to each project type.

This table has row and column headers. The row headers identify enrichments. The column headers identify project types. To see which enrichments are applied to a project type by default, find the column for the project type that you are interested in, and look for check marks in the enrichment rows.
Enrichment Document Retrieval Document Retrieval for Contracts Conversational Search Content Mining
Contracts checkmark icon
Entities checkmark icon checkmark icon
Keywords
Part of Speech checkmark icon
Sentiment of Document
Table Understanding checkmark icon

For more information about the following prebuilt enrichments, see the following topics:

For more information about how to create custom enrichments, see Adding domain-specific resources.

For more information about how to get the most from enrichments, read the Enriching your documents can make search more effective blog post.

For more information about how to apply enrichments by using the API, see Applying enrichments by using the API.

Add enrichments

To add an NLP enrichment, complete the following steps:

  1. Open your project and go to the Manage collections page.

  2. Click to open the collection that you want to enrich.

  3. Open the Enrichments tab.

  4. Scroll to find the NLP enrichment that you want to apply to your documents.

    Both built-in enrichments and custom enrichments are listed. Built-in enrichments have a type value of System.

  5. Choose one or more fields to apply the enrichment to.

    You can apply enrichments to the text and html fields, and to custom fields that were added from uploaded JSON or CSV files or from the Smart Document Understanding (SDU) tool.

  6. Click Apply changes and reprocess.

Enrichments that you enable are applied to the documents in random order. For information about how to remove an enrichment, see Managing enrichments.
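
The field selection in the preceding steps can also be expressed as data, which is useful when you plan to apply enrichments through the API instead of the product user interface. The following Python sketch only builds and prints a request body; it does not call any service. The body shape (an `enrichments` list whose entries have `enrichment_id` and `fields` keys), the helper function names, and the placeholder ID are assumptions made for illustration. Check the Applying enrichments by using the API topic for the actual request format and enrichment IDs.

```python
import json

# Hypothetical placeholder: replace with the ID of a real enrichment in your project.
ENTITIES_ENRICHMENT_ID = "<entities-enrichment-id>"


def build_enrichment_entry(enrichment_id, fields):
    """Pair one enrichment with the fields that it is applied to."""
    if not fields:
        raise ValueError("Choose at least one field to apply the enrichment to.")
    return {"enrichment_id": enrichment_id, "fields": list(fields)}


def build_collection_update(entries):
    """Wrap enrichment entries in an assumed collection-update request body."""
    return {"enrichments": list(entries)}


if __name__ == "__main__":
    body = build_collection_update([
        # Apply the enrichment to the text and html fields, as in step 5.
        build_enrichment_entry(ENTITIES_ENRICHMENT_ID, ["text", "html"]),
    ])
    print(json.dumps(body, indent=2))
```

Keeping the enrichment-to-field pairing in one small function makes it straightforward to reject an empty field list before any request is sent.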

Entities

Identifies entities. Entities are terms that typically represent proper nouns such as people, cities, and organizations that are mentioned in the data collection. Discovery can recognize entities that are part of an entity type system that is defined by the Watson Natural Language Processing (NLP) service.

If you want to be able to identify uncommon terms that are significant to your business, you can train your own model to recognize custom entities. For more information, see Entity extractor.

The entity type system that Discovery uses is called the NLU type system. The name reflects the fact that the type system is also used by the Watson Natural Language Understanding (NLU) service. However, Discovery uses the Watson NLP implementation of the type system directly, not the Watson NLU implementation. As a result, the two implementations can produce different results. To get a general idea of the types of entities that are recognized by the service, see Entities.

The following screen capture shows that the Entities enrichment recognizes the terms Systems of Government and King of Great Britain (among others) and tags them as entity mentions.

Shows the declaration with the terms Governments and King of Great Britain highlighted.
Figure 2. The recognized entities, Governments and King of Great Britain, are highlighted

From the JSON view of the document, you can see the underlying JSON structure of the entity mentions.

Shows the JSON view of the Systems of Government and King of Great Britain entities that are identified in the document
Figure 3. JSON representation of recognized entity mentions

If you want to search for a specific entity type, such as Organization, you can copy all of the JSON content into a text editor and search for Organization. To copy the content, click the Copy icon at the root of the JSON tree view.

Example

Input

"IBM is an American multinational technology company headquartered in Armonk."

Response

In the JSON output:

  • text = string. The entity text.
  • type = string. The entity type, such as Organization, Location, Person, or Number.
  • mentions = array. The entity mentions and their locations in the text.
  • model_name = string. For custom models, this field contains the user-provided model name. Otherwise, this field contains the default name of the model, such as watson_knowledge_studio, dictionary, character_pattern, or natural_language_understanding.

{
  "entities": [
    {
      "model_name": "natural_language_understanding",
      "mentions": [
        {
          "confidence": 0.8317045,
          "location": {
            "end": 3,
            "begin": 0
          },
          "text": "IBM"
        }
      ],
      "text": "IBM",
      "type": "Organization"
    },
    {
      "model_name": "natural_language_understanding",
      "mentions": [
        {
          "confidence": 0.6114863,
          "location": {
            "end": 75,
            "begin": 69
          },
          "text": "Armonk"
        }
      ],
      "text": "Armonk",
      "type": "Location"
    }
  ]
}
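
To work with this output programmatically instead of searching the copied JSON by hand, you can parse it and filter on the fields that are described above. The following Python sketch groups entities by type and slices the input sentence with each mention's location. The function names are made up for this example, and treating `end` as an exclusive offset is an inference from this one example, not a documented guarantee.

```python
import json
from collections import defaultdict

SOURCE_TEXT = "IBM is an American multinational technology company headquartered in Armonk."

# The Entities enrichment output from the example above.
ENRICHED = json.loads("""
{
  "entities": [
    {
      "model_name": "natural_language_understanding",
      "mentions": [
        {"confidence": 0.8317045, "location": {"end": 3, "begin": 0}, "text": "IBM"}
      ],
      "text": "IBM",
      "type": "Organization"
    },
    {
      "model_name": "natural_language_understanding",
      "mentions": [
        {"confidence": 0.6114863, "location": {"end": 75, "begin": 69}, "text": "Armonk"}
      ],
      "text": "Armonk",
      "type": "Location"
    }
  ]
}
""")


def entities_by_type(doc):
    """Group the text of each entity under its entity type."""
    grouped = defaultdict(list)
    for entity in doc.get("entities", []):
        grouped[entity["type"]].append(entity["text"])
    return dict(grouped)


def mention_spans(doc, source_text):
    """Return (mention text, text sliced from the source) for every mention."""
    spans = []
    for entity in doc.get("entities", []):
        for mention in entity.get("mentions", []):
            location = mention["location"]
            sliced = source_text[location["begin"]:location["end"]]
            spans.append((mention["text"], sliced))
    return spans


if __name__ == "__main__":
    print(entities_by_type(ENRICHED))
    for mention_text, sliced in mention_spans(ENRICHED, SOURCE_TEXT):
        print(mention_text, "->", sliced)
```

For this input, the slices match the mention text, which is a quick way to confirm that you are reading the offsets correctly before you rely on them for highlighting.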

Entity limits

The Entities enrichment can identify up to 50 entities, each with one or many mentions, per document.

Keywords

Returns important keywords in the content.

For example, the following screen capture shows highlighted terms from the US Declaration of Independence that are recognized by the Keywords enrichment.

Shows the keywords that are recognized in the document text
Figure 4. Terms recognized by the Keywords enrichment

From the JSON view of the document, you can see the underlying JSON structure of the Declaration keyword mention.

Shows the JSON view of keywords that are identified in the document
Figure 5. JSON representation of Keywords enrichment mentions

Example

Input

"Watson Discovery is an award-winning AI search technology."

Response

In the JSON output:

  • text = string. The keyword text.
  • mentions = array. The keyword mentions and their locations in the text.
  • relevance = number. The relevance score of the keyword.

{
  "keywords": [
    {
      "mentions": [
        {
          "location": {
            "end": 157,
            "begin": 141
          },
          "text": "Watson Discovery"
        }
      ],
      "text": "Watson Discovery",
      "relevance": 0.503613
    },
    {
      "mentions": [
        {
          "location": {
            "end": 177,
            "begin": 164
          },
          "text": "award-winning"
        }
      ],
      "text": "award-winning",
      "relevance": 0.728722
    },
    {
      "mentions": [
        {
          "location": {
            "end": 198,
            "begin": 181
          },
          "text": "search technology"
        }
      ],
      "text": "search technology",
      "relevance": 0.779356
    }
  ]
}
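
The keywords in this output are not listed in order of relevance, so a common first step is to sort them. The following Python sketch ranks the keywords from the example by their `relevance` value and optionally drops keywords below a threshold. The function name and the threshold parameter are made up for this example and are not part of Discovery.

```python
# The Keywords enrichment output from the example above, reduced to the
# fields that the ranking needs.
KEYWORDS = {
    "keywords": [
        {"text": "Watson Discovery", "relevance": 0.503613},
        {"text": "award-winning", "relevance": 0.728722},
        {"text": "search technology", "relevance": 0.779356},
    ]
}


def rank_keywords(doc, min_relevance=0.0):
    """Return (text, relevance) pairs sorted from most to least relevant."""
    ranked = sorted(
        doc.get("keywords", []),
        key=lambda keyword: keyword["relevance"],
        reverse=True,
    )
    return [
        (keyword["text"], keyword["relevance"])
        for keyword in ranked
        if keyword["relevance"] >= min_relevance
    ]


if __name__ == "__main__":
    for text, relevance in rank_keywords(KEYWORDS):
        print(f"{relevance:.6f}  {text}")
```

Calling `rank_keywords(KEYWORDS, min_relevance=0.7)` keeps only "search technology" and "award-winning" for this example.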

Keywords limits

The Keywords enrichment can identify up to 50 keywords, each with one or many mentions, per document.

Part of speech

Recognizes and tags parts of speech, including nouns, verbs, adjectives, adverbs, conjunctions, interjections, and numerals.