Use built-in Watson NLP to find common terms
Take advantage of award-winning Watson Natural Language Processing (NLP) capabilities by adding prebuilt enrichments to your documents.
With Watson NLP, you can identify and tag meaningful information in your collections so you can understand what it all means and make more informed decisions.
The following Watson NLP enrichments are available:
- Entities: Recognizes proper nouns such as people, cities, and organizations that are mentioned in the content.
- Keywords: Recognizes significant terms in your content.
- Part of Speech: Identifies the parts of speech (nouns and verbs, for example) in the content.
- Sentiment: Understands the overall sentiment of the content.
The following other pretrained enrichments are available with Discovery:
- Contracts
- Table Understanding
Watson NLP enrichments
For example, the following screen capture shows a transcript of the US Declaration of Independence that was added to a Discovery collection where the Entities and Keywords enrichments are enabled. The mentions that are recognized by the enrichments are highlighted in the document text.
Some of the NLP enrichments are applied to projects automatically. You don't need to apply them yourself if your project type enables them by default.
Default enrichments per project type
Some prebuilt enrichments are applied automatically to collections in a project based on the project type. The following table shows the default enrichments that are applied to each project type.
Enrichment | Document Retrieval | Document Retrieval for Contracts | Conversational Search | Content Mining |
---|---|---|---|---|
Contracts | ||||
Entities | ||||
Keywords | ||||
Part of Speech | ||||
Sentiment of Document | ||||
Table Understanding | ||||
For more information about the following prebuilt enrichments, see the following topics:
For more information about how to create custom enrichments, see Adding domain-specific resources.
For more information about how to get the most from enrichments, read the Enriching your documents can make search more effective blog post.
For more information about how to apply enrichments by using the API, see Applying enrichments by using the API.
Add enrichments
To add an NLP enrichment, complete the following steps:
- Open your project and go to the Manage collections page.
- Click to open the collection that you want to enrich.
- Open the Enrichments tab.
- Scroll to find the NLP enrichment that you want to apply to your documents.
  Both built-in enrichments and custom enrichments are listed. Built-in enrichments have a type value of System.
- Choose one or more fields to apply the enrichment to.
  You can apply enrichments to the text and html fields, and to custom fields that were added from uploaded JSON or CSV files or from the Smart Document Understanding (SDU) tool.
- Click Apply changes and reprocess.
Enrichments that you enable are applied to the documents in random order. For information about how to remove an enrichment, see Managing enrichments.
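The field selection in the preceding steps can also be pictured as data. The following Python sketch pairs one enrichment with the fields it should process and returns the result as a plain dictionary. The helper name build_enrichment_setting, the key names enrichment_id and fields, and the placeholder ID are assumptions made for illustration, not confirmed API details. See Applying enrichments by using the API for the actual request format.

```python
def build_enrichment_setting(enrichment_id, fields):
    """Pair one enrichment with the document fields it should process.

    The dictionary shape is an illustrative assumption, not a confirmed
    Discovery request format.
    """
    cleaned = [f.strip() for f in fields if f and f.strip()]
    if not cleaned:
        raise ValueError("Choose at least one field to enrich")
    # Drop duplicates while keeping the order in which fields were chosen.
    unique = list(dict.fromkeys(cleaned))
    return {"enrichment_id": enrichment_id, "fields": unique}


# Hypothetical placeholder ID; a real ID comes from your own project.
setting = build_enrichment_setting("<entities-enrichment-id>", ["text", "html", "text"])
print(setting)
```

Rejecting an empty field list mirrors the step that asks you to choose one or more fields before you apply changes.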
Entities
Identifies entities. Entities are terms that typically represent proper nouns such as people, cities, and organizations that are mentioned in the data collection. Discovery can recognize entities that are part of an entity type system that is defined by the Watson Natural Language Processing (NLP) service.
If you want to be able to identify uncommon terms that are significant to your business, you can train your own model to recognize custom entities. For more information, see Entity extractor.
The Watson NLP entity extractor service that is used by Discovery is called the NLU type system. The name originates from the fact that the type system is used by the Watson Natural Language Understanding (NLU) service in addition to the Watson Discovery service. However, it is the Watson NLP implementation of the type system that is used directly by Discovery, not the Watson NLU implementation. As a result, the two implementations can produce different results. To get a general idea of the types of entities that are recognized by the service, see Entities.
The following screen capture shows that the Entities enrichment recognizes the terms Systems of Government and King of Great Britain (among others) and tags them as entity mentions.
From the JSON view of the document, you can see the underlying JSON structure of the entity mentions.
If you want to search for the Organization entity type, for example, you can copy all of the JSON content into a text editor and search for Organization. Click the Copy icon from the root of the JSON tree view.
Example
Input
"IBM is an American multinational technology company headquartered in Armonk."
Response
In the JSON output:
- text = string. The entity text.
- type = string. The entity type, such as Organization, Location, Person, or Number.
- mentions = array. The entity mentions and their locations.
- model_name = string. For custom models, this field contains the user-provided model name. Otherwise, this field contains the default name of the model, such as watson_knowledge_studio, dictionary, character_pattern, or natural_language_understanding.
{
  "entities": [
    {
      "model_name": "natural_language_understanding",
      "mentions": [
        {
          "confidence": 0.8317045,
          "location": {
            "end": 3,
            "begin": 0
          },
          "text": "IBM"
        }
      ],
      "text": "IBM",
      "type": "Organization"
    },
    {
      "model_name": "natural_language_understanding",
      "mentions": [
        {
          "confidence": 0.6114863,
          "location": {
            "end": 75,
            "begin": 69
          },
          "text": "Armonk"
        }
      ],
      "text": "Armonk",
      "type": "Location"
    }
  ]
}
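As an alternative to searching the copied JSON in a text editor, you can filter the entities by type in a short script. The following Python sketch parses the example response, selects the entities of one type, and checks each mention against the input sentence. The helper names are hypothetical. The check assumes that begin and end are character offsets into the input and that end is exclusive, which is consistent with the sample values but is not stated in this topic.

```python
import json

SOURCE_TEXT = "IBM is an American multinational technology company headquartered in Armonk."

# The example response from this topic, condensed onto fewer lines.
RESPONSE = """
{"entities": [
  {"model_name": "natural_language_understanding",
   "mentions": [{"confidence": 0.8317045,
                 "location": {"end": 3, "begin": 0}, "text": "IBM"}],
   "text": "IBM", "type": "Organization"},
  {"model_name": "natural_language_understanding",
   "mentions": [{"confidence": 0.6114863,
                 "location": {"end": 75, "begin": 69}, "text": "Armonk"}],
   "text": "Armonk", "type": "Location"}
]}
"""


def entities_of_type(enriched, entity_type):
    """Return the entities whose type equals entity_type."""
    return [e for e in enriched.get("entities", []) if e.get("type") == entity_type]


def mention_matches_source(source_text, mention):
    """Check that the mention offsets slice the mention text out of the source."""
    loc = mention["location"]
    return source_text[loc["begin"]:loc["end"]] == mention["text"]


enriched = json.loads(RESPONSE)
organizations = entities_of_type(enriched, "Organization")
print([e["text"] for e in organizations])

all_match = all(
    mention_matches_source(SOURCE_TEXT, m)
    for e in enriched["entities"]
    for m in e["mentions"]
)
print(all_match)
```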
Keywords
Returns important keywords in the content.
For example, the following screen capture shows highlighted terms from the US Declaration of Independence that are recognized by the Keywords enrichment.
From the JSON view of the document, you can see the underlying JSON structure of the Declaration keyword mention.
Example
Input
"Watson Discovery is an award-winning AI search technology."
Response
In the JSON output:
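- text = The keyword text.
- mentions = The keyword mentions and their locations.
The location values in the sample that follows are offset by 141 characters from the start of the input sentence, presumably because the sentence was taken from a longer document.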
{
  "keywords": [
    {
      "mentions": [
        {
          "location": {
            "end": 157,
            "begin": 141
          },
          "text": "Watson Discovery"
        }
      ],
      "text": "Watson Discovery",
      "relevance": 0.503613
    },
    {
      "mentions": [
        {
          "location": {
            "end": 177,
            "begin": 164
          },
          "text": "award-winning"
        }
      ],
      "text": "award-winning",
      "relevance": 0.728722
    },
    {
      "mentions": [
        {
          "location": {
            "end": 198,
            "begin": 181
          },
          "text": "search technology"
        }
      ],
      "text": "search technology",
      "relevance": 0.779356
    }
  ]
}
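The keywords in the response are not necessarily listed in order of relevance. The following Python sketch ranks the keywords from the example by their relevance value. The helper name rank_keywords is hypothetical, and the sketch assumes that a higher relevance value indicates a more significant keyword, which this topic does not state explicitly.

```python
# Keyword text and relevance values copied from the example response.
KEYWORDS = [
    {"text": "Watson Discovery", "relevance": 0.503613},
    {"text": "award-winning", "relevance": 0.728722},
    {"text": "search technology", "relevance": 0.779356},
]


def rank_keywords(keywords, limit=None):
    """Return keyword texts sorted from highest to lowest relevance."""
    ranked = sorted(keywords, key=lambda k: k["relevance"], reverse=True)
    if limit is not None:
        ranked = ranked[:limit]
    return [k["text"] for k in ranked]


ranked_texts = rank_keywords(KEYWORDS)
print(ranked_texts)
```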
Keywords limits
The Keywords enrichment can identify up to 50 keywords, each with one or many mentions, per document.
Part of speech
Recognizes and tags parts of speech, including nouns, verbs, adjectives, adverbs, conjunctions, interjections, and numerals.