IBM Cloud Docs
Building queries with the Discovery Query Language

In this tutorial, we will learn how to write a few different types of queries in the Discovery Query Language.

For more information about writing queries, see:

These example queries are built using the Discovery tooling. If you'd like to use the API instead, add the query parameters to your API call. For more information and examples, see the Queries section of the API reference.

You can also write natural language queries (such as "IBM Watson partnerships") using the Discovery tooling. This tutorial primarily focuses on how to write queries with Discovery Query Language because your requirements might necessitate a structured query, and filters and aggregations must be written in the Discovery Query Language.
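As a rough illustration of how the two kinds of query differ at the API level, the sketch below builds the query-string parameter for each one. It assumes that the API accepts a `natural_language_query` parameter for free text and a `query` parameter for the Discovery Query Language; check both names against the API reference. The helper `build_params` is hypothetical and exists only for this example.

```python
from urllib.parse import urlencode


def build_params(text: str, structured: bool) -> dict:
    """Return the query-string parameter for one search.

    Assumption: `query` carries Discovery Query Language and
    `natural_language_query` carries free text.
    """
    key = "query" if structured else "natural_language_query"
    return {key: text}


natural = build_params("IBM Watson partnerships", structured=False)
structured = build_params("enriched_text.entities.text:IBM", structured=True)

# urlencode percent-encodes reserved characters such as ':' and spaces.
print(urlencode(natural))
print(urlencode(structured))
```

Only one of the two parameters is set per call in this sketch, which mirrors the choice between a natural language query and a structured query in the tooling.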

Before you begin

Go to the Manage data screen, create a new collection named IBM Press Releases, and add these four documents to it: test-doc1.html, test-doc2.html, test-doc3.html, and test-doc4.html.

In some browsers, the links open in a new window instead of saving the file locally. If this occurs, select Save As in your browser's File menu to save a copy of the file.

Step 1: Quick tour of the Discovery data schema

Let's start out by getting to know the Discovery JSON. To understand how to build a query using the Discovery Query Language, it helps to be familiar with the JSON produced by Discovery after it enriches the documents in your collection.

  1. Launch the Discovery tooling. On the Manage data screen, choose the IBM Press Releases collection.

  2. Review the insights Watson discovered in your enriched documents.

    • Sentiment Analysis displays the percentage breakdown of documents tagged as positive, neutral, and negative discovered by the Sentiment Analysis enrichment.
    • Entity Extraction displays persons, places, and organizations discovered in your documents by the Entity Extraction enrichment.
    • Category Classification displays the hierarchical taxonomies discovered in your documents by the Category Classification enrichment.
    • Concept Tagging displays the concepts discovered in your documents by the Concept Tagging enrichment.
  3. To get familiar with the data schema of your documents, let's look at the View data schema screen. It displays the fields and values in your transformed documents in two ways: by document (Document view) or by field (Collection view). Collection view displays all fields in your collection.

    Click the View data schema icon. In the Collection view, under enriched_text, you can examine the enrichments you applied to your collection. Click on categories, concepts, entities, and sentiment to see how your collection was enriched with Watson insights.

If your query does not return the expected results, try swapping the field or value that your query is using for one that you can verify in the data schema.
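To make the dotted field names concrete, here is a minimal sketch of how a path such as enriched_text.concepts.text maps onto nested JSON. The sample document is hypothetical and heavily trimmed (real enriched output contains many more fields), and `values_at` is an illustrative helper, not part of Discovery.

```python
from typing import Any, List

# Hypothetical, trimmed enriched document used only for illustration.
sample_doc = {
    "text": "IBM announced new cloud services.",
    "enriched_text": {
        "sentiment": {"document": {"label": "positive"}},
        "concepts": [{"text": "Cloud computing"}, {"text": "IBM"}],
        "entities": [{"text": "IBM", "type": "Company"}],
    },
}


def values_at(node: Any, path: str) -> List[Any]:
    """Collect every value reachable by a dotted field path, fanning out over lists."""
    nodes = [node]
    for key in path.split("."):
        next_nodes = []
        for current in nodes:
            # A list holds several sub-objects; look inside each one.
            items = current if isinstance(current, list) else [current]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_nodes.append(item[key])
        nodes = next_nodes
    # Flatten a trailing list so callers always receive plain values.
    flat: List[Any] = []
    for value in nodes:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


print(values_at(sample_doc, "enriched_text.concepts.text"))
print(values_at(sample_doc, "enriched_text.sentiment.document.label"))
```

A path that does not exist in the document yields an empty list, which is a quick way to spot a mistyped field name before running a query.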

Step 2: Build a basic query

Let's start out by writing a query that searches for the concept Cloud computing in your collection:

  1. Click the Build queries icon to open the query page. Select the collection that contains the IBM Press Releases and click Get started.
  2. On the Build queries screen, click Search for Documents, then Use the Discovery Query Language, then:

    • Click the Field drop-down and choose enriched_text.concepts.text, for the Operator choose contains, then enter the Value of Cloud computing. The query enriched_text.concepts.text:"Cloud computing" is displayed under the Visual Query Builder.

    • Alternatively, click Edit in query language, then Use the Discovery Query Language. Enter enriched_text.concepts.text:"Cloud computing" into the Enter query here field.

  3. Click Run query. There is one match ("matching_results": 1). Copy the Query URL at the top of the Summary or JSON tab to use in your application.
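If you want to see roughly what the copied Query URL looks like, the sketch below assembles one by hand. The base URL, environment ID, and collection ID are placeholders, and the path layout and the `version` value are assumptions that you should confirm in the API reference; only the query expression comes from this step.

```python
from urllib.parse import urlencode

# Placeholders: substitute the values for your own service instance.
BASE_URL = "https://example.invalid/discovery/api"
ENVIRONMENT_ID = "my-environment-id"
COLLECTION_ID = "my-collection-id"


def query_url(dql: str, version: str = "2019-04-30") -> str:
    """Build a GET URL that runs one Discovery Query Language query.

    Assumption: the query endpoint lives under
    /v1/environments/{id}/collections/{id}/query and takes a `version` date.
    """
    path = (
        f"{BASE_URL}/v1/environments/{ENVIRONMENT_ID}"
        f"/collections/{COLLECTION_ID}/query"
    )
    return f"{path}?{urlencode({'version': version, 'query': dql})}"


url = query_url('enriched_text.concepts.text:"Cloud computing"')
print(url)
```

Note that the quotation marks, colon, and space in the expression are percent-encoded in the URL, so a hand-typed URL needs the same encoding.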

Under More options, you can turn on passage retrieval with the Include relevant passages radio button. Passages are short, relevant excerpts extracted from the full documents that your query returns. For more information, see Passages. Passage retrieval is not available for the Discovery News collection.

If you'd like to check out a few pre-built queries, click the Use a sample query button.

Step 3: Experiment with different queries

Try out these queries:

To return all documents that have a positive sentiment: Click Search for Documents, then Use the Discovery Query Language, then:

  • Click the Field drop-down and choose enriched_text.sentiment.document.label, for the Operator choose contains, then enter the Value of positive.

    The query enriched_text.sentiment.document.label:positive displays under the Visual Query Builder.

To return all documents in the health and fitness category: Click Search for Documents, then Use the Discovery Query Language, then:

  • Click the Field drop-down and choose enriched_text.categories.label, for the Operator choose is, then enter the Value of "health and fitness".

    The query enriched_text.categories.label::"health and fitness" displays under the Visual Query Builder. The operator :: specifies an exact match.

To return all documents that contain the entity IBM, but not the entity Watson: Click Search for Documents, then Use the Discovery Query Language, then:

  • Click the Field drop-down and choose enriched_text.entities.text, for the Operator choose contains, then enter the Value of IBM. Click Add rule, then for the Field choose enriched_text.entities.text, for the Operator choose does not contain, then enter the Value of Watson.

    The query enriched_text.entities.text:IBM,enriched_text.entities.text:!Watson displays under the Visual Query Builder. The operator :! specifies "does not contain".
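The three queries above follow one pattern: a field, an operator, and a value, with a comma joining rules that must all match. The sketch below mimics what the Visual Query Builder produces. The helper names `rule` and `all_of` are hypothetical, and the choice to quote any value that contains a space is an assumption made for this example.

```python
# Operator labels from the Visual Query Builder mapped to query-language symbols.
OPERATORS = {
    "contains": ":",
    "is": "::",
    "does not contain": ":!",
}


def rule(field: str, operator: str, value: str) -> str:
    """Render one field/operator/value rule as a query-language clause."""
    # Assumption: multi-word values are quoted so they are treated as a phrase.
    rendered = f'"{value}"' if " " in value else value
    return f"{field}{OPERATORS[operator]}{rendered}"


def all_of(*rules: str) -> str:
    """Join rules with a comma so that every rule must match."""
    return ",".join(rules)


print(rule("enriched_text.sentiment.document.label", "contains", "positive"))
print(rule("enriched_text.categories.label", "is", "health and fitness"))
print(
    all_of(
        rule("enriched_text.entities.text", "contains", "IBM"),
        rule("enriched_text.entities.text", "does not contain", "Watson"),
    )
)
```

Keeping the operator table in one place makes it easy to swap a rule from contains to is when a query returns broader results than expected.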

Step 4: Build a combined query

You can combine query parameters to build more targeted queries. Let's use both the filter and query parameters to return documents that mention IBM and cover the concept world wide web. The filter parameter first narrows the results to only the documents that mention IBM, and the query parameter then returns the matching documents from that subset, in order of relevance.

  1. Click the Build queries icon to open the query page. Select the collection that contains the IBM Press Releases and click Get started.

  2. Under Filter which documents you query:

    • Click the Field drop-down and choose enriched_text.entities.text, for the Operator choose contains, then enter the Value of IBM.

      The query enriched_text.entities.text:IBM narrows the documents to only those that mention the entity IBM.

  3. Under Search for Documents, click Use the Discovery Query Language, then:

    • Click the Field drop-down and choose enriched_text.concepts.text, for the Operator choose contains, then enter the Value of world wide web.

      The query enriched_text.concepts.text:"world wide web" returns all documents that include the concept of world wide web, and those documents are ranked in order of relevance.

  4. Click More options, then Fields to return and choose Specify. Select text, which limits the response to the text of the relevant articles and excludes everything else.

  5. Click Run query. There is one matching document: "matching_results": 1
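Through the API, the same combined query could be expressed as three parameters on one call. This is a sketch under the assumption that the parameters are named `filter`, `query`, and `return`; verify the names in the API reference before relying on them.

```python
from urllib.parse import parse_qs, urlencode

# Assumed parameter names; the expressions are the ones built in this step.
params = {
    "filter": "enriched_text.entities.text:IBM",
    "query": 'enriched_text.concepts.text:"world wide web"',
    "return": "text",
}

query_string = urlencode(params)
print(query_string)

# Round-trip check: decoding the string restores the original expressions.
decoded = {key: values[0] for key, values in parse_qs(query_string).items()}
print(decoded == params)
```

Keeping the filter and the query as separate parameters preserves the division of labor described above: the filter narrows the document set, and the query ranks what remains.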

Step 5: Build an aggregation

Aggregations return a set of data values; for example, top keywords, overall sentiment of entities, and more.

Try building this aggregation. It returns the top 10 concepts in the IBM Press Releases collection.

  1. Click the Build queries icon to open the query page. Select the collection that contains the IBM Press Releases and click Get started.

  2. Under Include analysis of your results:

    • Click the Output drop-down and choose Top values, for the Field choose enriched_text.concepts.text, then enter the Count of 10.

      Term returns the most common values for the concepts text field. Count specifies the number of results that you want returned. The query term(enriched_text.concepts.text,count:10) displays under the Visual Query Builder.

  3. Click More options, then enter 0 in the Number of documents to return field.

  4. Click Run query. The top 10 concepts are displayed in both the Summary and JSON tabs.
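The aggregation from this step can be sketched in code as follows. The parameter names `aggregation` and `count` are assumptions to check against the API reference, the helper `term_aggregation` is hypothetical, and the response fragment is invented purely to show how the top values might be read out; its shape and numbers are not taken from a real run.

```python
def term_aggregation(field: str, count: int) -> str:
    """Render a term aggregation that returns the top `count` values of `field`."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return f"term({field},count:{count})"


params = {
    "aggregation": term_aggregation("enriched_text.concepts.text", 10),
    "count": 0,  # return no documents, only the aggregation results
}
print(params["aggregation"])

# Invented response fragment (assumed shape) used only for illustration.
response = {
    "results": [],
    "aggregations": [
        {
            "type": "term",
            "field": "enriched_text.concepts.text",
            "results": [
                {"key": "IBM", "matching_results": 3},
                {"key": "Cloud computing", "matching_results": 1},
            ],
        }
    ],
}

# Pair each value with the number of documents that contain it.
top_values = [
    (bucket["key"], bucket["matching_results"])
    for bucket in response["aggregations"][0]["results"]
]
print(top_values)
```

Setting the document count to 0 keeps the response small when only the aggregated values are of interest.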

Step 6: Build a query in Watson Discovery News

Discovery News is a public data set that is pre-enriched with cognitive insights. It is included with Discovery. See Watson Discovery News for more information about this collection.

You cannot adjust the configuration of the Discovery News collection, train it, or add documents to it.

The following example query returns the top 10 articles in IBM Watson™ Discovery News about the Pittsburgh Steelers that have a positive sentiment.

  1. Click the Build queries icon to open the query page. Select the Discovery News collection and click Get started. (To query the Spanish, German, Korean, French, or Japanese Discovery News collections, you must first click the Manage Data icon, then choose the appropriate language from the drop-down.)

  2. Under Search for Documents, click Use the Discovery Query Language, then:

    • Click the Field drop-down and choose text, for the Operator choose contains, then enter the Value of Pittsburgh Steelers. Click Add rule, then click the Field drop-down and choose enriched_text.sentiment.document.label, for the Operator choose contains, then enter the Value of positive.

      The query text:"Pittsburgh Steelers",enriched_text.sentiment.document.label:"positive" displays under the Visual Query Builder.

  3. Click More options, then enter 10 (this is the default) in the Number of documents to return field.

  4. Click Run query. The top 10 articles about the Pittsburgh Steelers with a positive sentiment are displayed.

The maximum number of results returned for a Watson Discovery News query is 50.

News articles might be syndicated to several news outlets, and IBM Watson™ Discovery News picks up each of them, resulting in duplicate articles. This means that a query to IBM Watson™ Discovery News can potentially return several identical or nearly identical articles in query results. To turn on deduplication, under More options, choose Exclude duplicate results. To learn more about this beta capability, see Excluding duplicate documents from query results.
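A minimal sketch of the same Discovery News query as API parameters follows. It assumes parameters named `query`, `count`, and `deduplicate`; confirm them in the API reference. The helper `news_params` is hypothetical and simply enforces the 50-result limit stated above before a request is made.

```python
MAX_NEWS_RESULTS = 50  # limit stated above for Discovery News queries


def news_params(query: str, count: int = 10, deduplicate: bool = False) -> dict:
    """Build parameters for a Discovery News query, rejecting oversized counts."""
    if not 0 <= count <= MAX_NEWS_RESULTS:
        raise ValueError(f"count must be between 0 and {MAX_NEWS_RESULTS}")
    params = {"query": query, "count": count}
    if deduplicate:
        # Assumed parameter name for the Exclude duplicate results option.
        params["deduplicate"] = "true"
    return params


steelers = news_params(
    'text:"Pittsburgh Steelers",enriched_text.sentiment.document.label:"positive"',
    count=10,
    deduplicate=True,
)
print(steelers)
```

Validating the count locally avoids sending a request that asks for more results than a Discovery News query returns.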