Building queries with the Discovery Query Language
In this tutorial, we will learn how to write a few different types of queries in the Discovery Query Language.
For more information about writing queries, see:
- Query concepts
- Query reference (includes the list of parameters, operators, and aggregations available in the Discovery Query Language)
These example queries are built using the Discovery tooling. If you'd like to use the API instead, add the query parameters to your API call. For more information and examples, see the Queries section of the API reference.
You can also write natural language queries (such as "IBM Watson partnerships") using the Discovery tooling. This tutorial focuses on the Discovery Query Language because your requirements might call for a structured query, and because filters and aggregations must be written in the Discovery Query Language.
Before you begin
Go to the Manage data screen, create a new collection named IBM Press Releases, and add these four documents to it: test-doc1.html, test-doc2.html, test-doc3.html, test-doc4.html
In some browsers, the links open in a new window instead of saving locally. If this occurs, select Save As in your browser's File menu to save a copy of the file.
Step 1: Quick tour of the Discovery data schema
Let's start out by getting to know the Discovery JSON. To understand how to build a query using the Discovery Query Language, it helps to be familiar with the JSON produced by Discovery after it enriches the documents in your collection.
- Launch the Discovery tooling. On the Manage data screen, choose the IBM Press Releases collection.
- Review the insights Watson discovered in your enriched documents.
  - Sentiment Analysis displays the percentage breakdown of documents tagged as positive, neutral, and negative discovered by the Sentiment Analysis enrichment.
  - Entity Extraction displays persons, places, and organizations discovered in your documents by the Entity Extraction enrichment.
  - Category Classification displays the hierarchical taxonomies discovered in your documents by the Category Classification enrichment.
  - Concept Tagging displays the concepts discovered in your documents by the Concept Tagging enrichment.
- To get familiar with the data schema of your documents, let's look at the View data schema screen. It displays the fields and values in your transformed documents two ways: by document (Document view), or by field (Collection view). Collection view displays all fields in your collection.
  Click the View data schema icon. In the Collection view, under `enriched_text`, you can examine the enrichments you applied to your collection. Click `categories`, `concepts`, `entities`, and `sentiment` to see how your collection was enriched with Watson insights.
If your query does not return the expected results, try swapping the field or value that your query is using for one that you can verify in the data schema.
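If you export a document's JSON, a field path can also be checked outside the tooling. The following is a minimal sketch, not part of any Discovery SDK: `values_at` is an illustrative helper, and `sample_doc` is a hypothetical, heavily trimmed stand-in for an enriched document (real documents contain many more fields).

```python
from typing import Any, List


def values_at(doc: Any, path: str) -> List[Any]:
    """Collect every value found at a dotted field path, descending into lists."""
    nodes = [doc]
    for key in path.split("."):
        next_nodes = []
        for node in nodes:
            # Arrays (such as the concepts list) are searched element by element.
            items = node if isinstance(node, list) else [node]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_nodes.append(item[key])
        nodes = next_nodes
    # Flatten a trailing list so callers always get plain values back.
    result: List[Any] = []
    for node in nodes:
        result.extend(node if isinstance(node, list) else [node])
    return result


# Hypothetical, trimmed example of an enriched document (an assumption for illustration).
sample_doc = {
    "text": "IBM announced a new cloud offering.",
    "enriched_text": {
        "sentiment": {"document": {"label": "positive"}},
        "concepts": [{"text": "Cloud computing"}, {"text": "IBM"}],
        "entities": [{"text": "IBM", "type": "Company"}],
    },
}

print(values_at(sample_doc, "enriched_text.concepts.text"))
print(values_at(sample_doc, "enriched_text.sentiment.document.label"))
print(values_at(sample_doc, "enriched_text.keywords.text"))  # path not present
```

An empty list for a path suggests that the field does not exist in that document, which is one reason a query on that field might return nothing.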
Step 2: Build a basic query
Let's start by writing a query that searches for the concept `Cloud computing` in your collection:
- Click the Build queries icon to open the query page. Select the collection that contains the IBM Press Releases and click Get started.
- On the Build queries screen, click Search for Documents, then Use the Discovery Query Language, then do one of the following:
  - Click the Field drop-down and choose `enriched_text.concepts.text`, for the Operator choose `contains`, then enter the Value of `Cloud computing`. The query `enriched_text.concepts.text:Cloud computing` is displayed under the Visual Query Builder.
  - Alternatively, click Edit in query language, then Use the Discovery Query Language. Enter `enriched_text.concepts.text:"Cloud computing"` into the Enter query here field.
- Click Run query. There is one match (`"matching_results": 1`). Copy the Query URL at the top of the Summary or JSON tab to use in your application.
Under More options, you have the option to turn on passage retrieval with the Include relevant passages radio button. Passages are short, relevant excerpts extracted from the full documents that your query returns. For more information, see Passages. Passage retrieval is not available for the Discovery News collection.
If you'd like to check out a few pre-built queries, click the Use a sample query button.
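If you use the API instead of the tooling, the same query string is sent as a query parameter. The sketch below only shows how the string might be URL-encoded with Python's standard library. It assumes the parameter is named `query`, and it does not reconstruct the full Query URL, which depends on your service instance and collection.

```python
from urllib.parse import parse_qs, urlencode

# The same DQL string entered in the "Enter query here" field.
dql = 'enriched_text.concepts.text:"Cloud computing"'

# Assumption: the API parameter that carries the DQL string is called `query`.
query_string = urlencode({"query": dql})
print(query_string)

# Decoding the string again recovers the original DQL unchanged.
print(parse_qs(query_string)["query"][0])
```

The colon and double quotes are percent-encoded and the space becomes `+`, so the query survives being placed in a URL.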
Step 3: Experiment with different queries
Try out these queries:
To return all documents that have a `positive` sentiment: Click Search for Documents, then Use the Discovery Query Language, then:
- Click the Field drop-down and choose `enriched_text.sentiment.document.label`, for the Operator choose `contains`, then enter the Value of `positive`.
The query `enriched_text.sentiment.document.label:positive` displays under the Visual Query Builder.
To return all documents in the `health and fitness` category: Click Search for Documents, then Use the Discovery Query Language, then:
- Click the Field drop-down and choose `enriched_text.categories.label`, for the Operator choose `is`, then enter the Value of `"health and fitness"`.
The query `enriched_text.categories.label::"health and fitness"` displays under the Visual Query Builder. The operator `::` specifies an exact match.
To return all documents that contain the entity `IBM`, but not the entity `Watson`: Click Search for Documents, then Use the Discovery Query Language, then:
- Click the Field drop-down and choose `enriched_text.entities.text`, for the Operator choose `contains`, then enter the Value of `IBM`. Click Add rule, then for the Field choose `enriched_text.entities.text`, for the Operator choose `does not contain`, then enter the Value of `Watson`.
The query `enriched_text.entities.text:IBM,enriched_text.entities.text:!Watson` displays under the Visual Query Builder. The operator `:!` specifies "does not contain".
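The three operators used in this step can be sketched as small string-building helpers. The function names are hypothetical and are not part of any Discovery SDK. Quoting only values that contain whitespace is a simplifying assumption, and values that themselves contain double quotes are not handled.

```python
def quote(value: str) -> str:
    """Wrap a value in double quotes when it contains whitespace (simplified rule)."""
    return f'"{value}"' if any(ch.isspace() for ch in value) else value


def contains(field: str, value: str) -> str:
    """Build a `:` (contains) rule."""
    return f"{field}:{quote(value)}"


def is_exactly(field: str, value: str) -> str:
    """Build a `::` (exact match) rule."""
    return f"{field}::{quote(value)}"


def does_not_contain(field: str, value: str) -> str:
    """Build a `:!` (does not contain) rule."""
    return f"{field}:!{quote(value)}"


def all_of(*rules: str) -> str:
    """Join rules with a comma, as in the IBM-but-not-Watson query."""
    return ",".join(rules)


print(contains("enriched_text.sentiment.document.label", "positive"))
print(is_exactly("enriched_text.categories.label", "health and fitness"))
print(all_of(
    contains("enriched_text.entities.text", "IBM"),
    does_not_contain("enriched_text.entities.text", "Watson"),
))
```

Each printed line matches one of the queries shown under the Visual Query Builder in this step.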
Step 4: Build a combined query
You can combine query parameters to build more targeted queries. Let's try using both the `filter` and `query` parameters to return documents about IBM acquisitions. The filter parameter narrows the results to only documents that mention `IBM`, and then the query parameter returns all results about `acquisitions`, in order of relevance.
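For API users, the combination can be sketched as a set of request parameters. The `filter` and `query` values below are the ones built in the steps that follow. The `return` parameter name (for limiting the returned fields) and the `build_params` helper are assumptions made for illustration, so check them against the API reference before relying on them.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_params(query: Optional[str] = None,
                 filter_: Optional[str] = None,
                 return_fields: Optional[str] = None) -> Dict[str, str]:
    """Collect the parameters that were set, skipping any left as None."""
    candidates = {
        "filter": filter_,        # narrows the set of documents first
        "query": query,           # ranks the remaining documents by relevance
        "return": return_fields,  # assumed name for the fields-to-return option
    }
    return {name: value for name, value in candidates.items() if value is not None}


params = build_params(
    filter_="enriched_text.entities.text:IBM",
    query='enriched_text.concepts.text:"world wide web"',
    return_fields="text",
)
print(params)
print(urlencode(params))
```

Keeping the filter and the query in separate parameters mirrors the two sections of the Build queries screen.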
- Click the Build queries icon to open the query page. Select the collection that contains the IBM Press Releases and click Get started.
- Under Filter which documents you query:
  - Click the Field drop-down and choose `enriched_text.entities.text`, for the Operator choose `contains`, then enter the Value of `IBM`.
    The query `enriched_text.entities.text:IBM` narrows the documents to only those that mention the entity `IBM`.
- Under Search for Documents, click Use the Discovery Query Language, then:
  - Click the Field drop-down and choose `enriched_text.concepts.text`, for the Operator choose `contains`, then enter the Value of `world wide web`.
    The query `enriched_text.concepts.text:"world wide web"` returns all documents that include the concept of `world wide web`, and those documents are ranked in order of relevance.
- Click More options, then Fields to return and choose Specify. Select `text`, which limits the response to the text of the relevant articles and excludes everything else.
- Click Run query. There is one matching document: `"matching_results": 1`
Step 5: Build an aggregation
Aggregations return a set of data values; for example, top keywords, overall sentiment of entities, and more.
Try building this aggregation. It returns the top 10 concepts in the IBM Press Releases collection.
- Click the Build queries icon to open the query page. Select the collection that contains the IBM Press Releases and click Get started.
- Under Include analysis of your results:
  - Click the Output drop-down and choose `Top values`, for the Field choose `enriched_text.concepts.text`, then enter the Count of `10`.
    Term returns the most common values for the `concepts` `text` field. Count specifies the number of results that you want returned. The query `term(enriched_text.concepts.text,count:10)` displays under the Visual Query Builder.
- Click More options, then enter `0` in the Number of documents to return field.
- Click Run query. The top 10 concepts are displayed in both the Summary and JSON tabs.
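The aggregation string from this step can be assembled programmatically. This is a sketch under stated assumptions: the `term` helper is hypothetical, and the parameter names `aggregation` and `count` (the latter standing in for Number of documents to return) are assumed rather than taken from this tutorial.

```python
def term(field: str, count: int) -> str:
    """Build a term aggregation string for the most common values of a field."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return f"term({field},count:{count})"


params = {
    # Same string the Visual Query Builder displays in this step.
    "aggregation": term("enriched_text.concepts.text", 10),
    # Return no documents, only the aggregation results (assumed parameter name).
    "count": 0,
}
print(params["aggregation"])
print(params)
```

Setting the document count to 0 keeps the response limited to the aggregation results, which is what the step above does in the tooling.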
Step 6: Build a query in Watson Discovery News
Discovery News is a public data set that is pre-enriched with cognitive insights. It is included with Discovery. See Watson Discovery News for more information about this collection.
You cannot adjust the configuration of the Discovery News collection, train it, or add documents to it.
The following example query returns the top 10 articles in IBM Watson™ Discovery News about the Pittsburgh Steelers that have a positive sentiment.
- Click the Build queries icon to open the query page. Select the Discovery News collection and click Get started. (To query the Spanish, German, Korean, French, or Japanese Discovery News collections, you must first click the icon, then choose the appropriate language from the drop-down.)
- Under Search for documents, click Use the Discovery Query Language, then:
  - Click the Field drop-down and choose `text`, for the Operator choose `contains`, then enter the Value of `Pittsburgh Steelers`. Click Add rule, then click the Field drop-down and choose `enriched_text.sentiment.document.label`, for the Operator choose `contains`, then enter the Value of `positive`.
    The query `text:"Pittsburgh Steelers",enriched_text.sentiment.document.label:"positive"` displays under the Visual Query Builder.
- Click More options, then enter `10` (this is the default) in the Number of documents to return field.
- Click Run query. The top 10 articles about the Pittsburgh Steelers with a positive sentiment are displayed.
The maximum number of results returned for a Watson Discovery News query is `50`.
News articles might be syndicated to several news outlets, and IBM Watson™ Discovery News picks up each of them, resulting in duplicate articles. This means that a query to IBM Watson™ Discovery News can potentially return several identical or nearly identical articles in query results. To turn on deduplication, under More options, choose Exclude duplicate results. To learn more about this beta capability, see Excluding duplicate documents from query results.
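The Discovery News settings from this step might be collected as follows. This is a hedged sketch: `news_params` is a hypothetical helper, the parameter names `count` and `deduplicate` are assumptions to verify against the API reference, and the limit of 50 is taken from the statement above.

```python
from typing import Dict, Union

MAX_NEWS_RESULTS = 50  # maximum stated in this tutorial for Discovery News queries


def news_params(query: str, count: int = 10,
                deduplicate: bool = False) -> Dict[str, Union[str, int]]:
    """Assemble parameters for a Discovery News query, enforcing the result limit."""
    if not 0 <= count <= MAX_NEWS_RESULTS:
        raise ValueError(f"count must be between 0 and {MAX_NEWS_RESULTS}")
    params: Dict[str, Union[str, int]] = {"query": query, "count": count}
    if deduplicate:
        # Assumed parameter name for the Exclude duplicate results option (beta).
        params["deduplicate"] = "true"
    return params


steelers = 'text:"Pittsburgh Steelers",enriched_text.sentiment.document.label:"positive"'
print(news_params(steelers))
print(news_params(steelers, deduplicate=True))
```

Validating the count before sending the request gives an early, local error rather than depending on how the service treats a value above the limit.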