Digging deeper into IBM Cloudant Dashboard

The IBM® Cloudant® for IBM Cloud® Dashboard lets new and experienced IBM Cloudant users add, edit, and delete documents. Users can also refine the indexing and querying options to best suit their application's use-cases.

Objectives

Set up some basic indexes using the Dashboard to see how each of IBM Cloudant's querying mechanisms works.

Before you begin

You need to create a service instance in IBM Cloudant before you start this tutorial. You can follow the instructions in the Getting started tutorial to create one.

Step 1. The data set

Create a database called books.

Create some sample data that represents a book in a library as shown in the following example:

{
  "_id": "BXP9G5ZQY9Q4EA13",
  "author": "Dickens",
  "title": "David Copperfield",
  "year": 1840,
  "pages": 723,
  "publisher": "Penguin",
  "url": "https://www.somurl.com/dc"
}

Continue to add some documents that match the pattern in the previous step by using the IBM Cloudant Dashboard.

The documents store simple key/value pairs that hold metadata about each book, such as its author and its publisher. In this example, we address the following three use-cases:
1. A query facility that allows a user to find a book by a known publisher and year.
2. A general-purpose search engine that allows a user to find books by a combination of one or more of the following descriptors: author, title, year, and publisher.
3. A report that details the number of books that are published by year.

Step 2. Querying books by publisher and year - IBM Cloudant Query

IBM Cloudant Query is a query language that allows small slices of a total database to be located. The following query finds up to 10 books that were published by Penguin after the year 2000:

{
  "selector": {
    "$and": [
      { "publisher": "Penguin" },
      { "year": { "$gt": 2000 } }
    ]
  },
  "limit": 10
}

The query contains a selector object, which uses operators and text fields to define the slice of data you need:

$and - both of the query clauses must be satisfied for a document to make it into the result set.
{ "publisher": "Penguin" } - the publisher must be "Penguin".
{ "year": { "$gt": 2000 } } - the year must be greater than 2000. $gt means "greater than".
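
To make the selector semantics concrete, the following is a minimal in-memory sketch that evaluates a small subset of the selector syntax (only $and, plain equality, and $gt) against a few documents. It is an illustration under stated assumptions, not how IBM Cloudant evaluates queries; the function name `matchesSelector` and the sample documents other than the Dickens entry are hypothetical.

```javascript
// Toy, in-memory evaluation of a small subset of selector syntax.
// Supports only: $and, plain equality, and $gt. Not Cloudant's implementation.
function matchesSelector(doc, selector) {
  return Object.entries(selector).every(([key, condition]) => {
    if (key === "$and") {
      // Every sub-selector in the array must match.
      return condition.every((sub) => matchesSelector(doc, sub));
    }
    if (condition !== null && typeof condition === "object") {
      // Operator object such as { "$gt": 2000 }.
      return Object.entries(condition).every(([op, operand]) => {
        if (op === "$gt") return doc[key] > operand;
        throw new Error("Unsupported operator in this sketch: " + op);
      });
    }
    // Plain value: the field must equal it exactly.
    return doc[key] === condition;
  });
}

// Hypothetical sample documents that follow the pattern from Step 1.
const sampleBooks = [
  { author: "Dickens", title: "David Copperfield", year: 1840, publisher: "Penguin" },
  { author: "Example A", title: "Modern Title", year: 2005, publisher: "Penguin" },
  { author: "Example B", title: "Other Title", year: 2010, publisher: "Vintage" },
];

const query = {
  selector: { $and: [{ publisher: "Penguin" }, { year: { $gt: 2000 } }] },
  limit: 10,
};

// Keep documents that satisfy the selector, then apply the limit.
const matches = sampleBooks
  .filter((doc) => matchesSelector(doc, query.selector))
  .slice(0, query.limit);

console.log(matches.map((doc) => doc.title));
```

Only the second sample document satisfies both clauses: the first fails the year test and the third fails the publisher test.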

You can try the query by choosing "Query" when you view the books database in the IBM Cloudant Dashboard. Paste in the query JSON and click Run Query.

Running a query

To try the query, do the following steps:

Go to IBM Cloudant Dashboard.
Open the service instance that you created in the prerequisite section.
Open the database that you created.
Go to the Query tab.
Paste the query JSON from the previous section into the Cloudant Query window.
Click Run Query. See the results in the following screen capture:

Window for running queries

IBM Cloudant matches the documents that meet your criteria and it seems to do it quickly, but there's a catch. IBM Cloudant isn't using an index to service this query, meaning that the database has to scan every document in the database to get your answer. This scan is fine for small data sets. But if you're running a production application where the data set is expanding all the time, you definitely don't want to rely on unindexed queries.

Creating an index

To speed up the query, we can tell IBM Cloudant to create an index on the publisher and year fields that the query uses.

From the IBM Cloudant Dashboard, select the books database.
Select the Design Documents tab.
Select New Indexes from the Design Documents menu.

Copy and paste the following index definition:

{
   "index": {
      "fields": [
         "publisher", "year"
      ]
   },
   "name": "publisher-year-index",
   "type": "json"
}

Click Create index to create the index. See an example in the following screen capture:

Window for creating indexes

The fields array contains a list of fields that we want IBM Cloudant to index.

If we repeat our query, it is faster and remains quick even as the database size reaches millions of documents.

Indexing instructs IBM Cloudant to create a secondary data structure that allows it to find the slice of data you need much faster than looking over every document in turn. IBM Cloudant Query is best for fixed queries based on the same fields in the same order.
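
The idea of a secondary data structure can be sketched with a toy example: keep [publisher, year] keys in sorted order, binary-search to the start of the matching range, and read only the entries in that range. This is a simplified illustration, not a description of IBM Cloudant's actual index storage; the helper names and document ids are hypothetical.

```javascript
// Toy secondary index: [publisher, year, docId] entries kept in sorted order.
// Illustrative only; Cloudant's real index structures are not shown here.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  return a[1] - b[1];
}

function buildIndex(docs) {
  return docs
    .map((doc) => [doc.publisher, doc.year, doc._id])
    .sort(compareKeys);
}

// Binary search for the first entry whose key is strictly greater than `key`.
function upperBound(index, key) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (compareKeys(index[mid], key) <= 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Find ids for a publisher with year > minYear by reading only the matching range.
function findByPublisherAfterYear(index, publisher, minYear, limit) {
  const ids = [];
  for (let i = upperBound(index, [publisher, minYear]); i < index.length; i++) {
    if (index[i][0] !== publisher || ids.length >= limit) break;
    ids.push(index[i][2]);
  }
  return ids;
}

// Hypothetical documents; the ids are made up for the example.
const libraryDocs = [
  { _id: "b1", publisher: "Penguin", year: 1840 },
  { _id: "b2", publisher: "Vintage", year: 2010 },
  { _id: "b3", publisher: "Penguin", year: 2005 },
  { _id: "b4", publisher: "Penguin", year: 2001 },
];

const publisherYearIndex = buildIndex(libraryDocs);
const foundIds = findByPublisherAfterYear(publisherYearIndex, "Penguin", 2000, 10);
console.log(foundIds);
```

Because the entries are ordered by publisher and then year, the lookup touches only the Penguin entries after 2000 instead of every document. The same ordering is why the field order in the index matters for which queries it can serve.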

For more information, see the IBM Cloudant Query documentation.

This index is useful for queries that involve both the publisher and the year, but if we introduce another field or make the query more complex (for example, by using the $or operator), then the index doesn't get used. We are back to a full database scan.

For a general-purpose search facility, we need IBM Cloudant Search, which is described in the next section.

Step 3. Creating a search engine - IBM Cloudant Search

IBM Cloudant Search is based on Apache Lucene and has its own query language that allows rich queries to be constructed. See the following example of a search:

publisher:Penguin AND (year:1972 OR year:1973) AND title:Crash

Unlike IBM Cloudant Query, you must specify the fields to index before you perform a query. IBM Cloudant Search indexes are defined by supplying IBM Cloudant with a JavaScript function that is called once for every document in the database - if the function calls index then data is added to the index.
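
The "called once for every document" behavior can be mimicked locally with a small harness. In this sketch the `index` function is a mock that is passed in as a second argument, which is an assumption made for testability; in IBM Cloudant, `index` is provided to your function by the service. The names `runIndexFunction` and `titleOnly` are hypothetical.

```javascript
// Local stand-in for an indexing run: collects what an index function adds.
// The real `index` function is provided by IBM Cloudant; this one is a mock.
function runIndexFunction(indexFn, docs) {
  const entries = [];
  for (const doc of docs) {
    // Expose a mock index(name, value) to the function for this document.
    const index = (name, value) => entries.push({ id: doc._id, name, value });
    indexFn(doc, index);
  }
  return entries;
}

// Hypothetical index function: indexes the title only when it is present.
function titleOnly(doc, index) {
  if (doc.title) {
    index("title", doc.title);
  }
}

const indexedEntries = runIndexFunction(titleOnly, [
  { _id: "b1", title: "David Copperfield" },
  { _id: "b2" }, // no title, so nothing is added for this document
]);

console.log(indexedEntries);
```

The harness shows that a document contributes to the index only when the function calls `index` for it.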

From the IBM Cloudant Dashboard, select the books database.
Select the Design Documents tab.
Select New Search Index from the menu.
Enter a design document name.
Enter an index name.

Paste the following code into the search index function:

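function (doc) {
  // Guard each call: indexing a field that is missing from a document fails.
  if (doc.author) { index("author", doc.author); }
  if (doc.publisher) { index("publisher", doc.publisher); }
  if (doc.title) { index("title", doc.title); }
  if (typeof doc.year === "number") { index("year", doc.year); }
}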

Choose the "Standard Analyzer".

New Search Index window

You can then build complex queries that involve one, some, or all of the indexed fields combined with AND and OR operators.
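
Outside the Dashboard, a search index is queried over HTTP at a path of the form /{db}/_design/{design document}/_search/{index} with the query in the q parameter. The following sketch only builds such a URL. The host, the design document name `library`, and the index name `bookSearch` are placeholders for whatever you entered in the previous steps, and authentication is left out.

```javascript
// Builds a search request URL. The base URL, design document name, and index
// name are placeholders; substitute the values from your own service instance.
function buildSearchUrl(baseUrl, db, designDoc, indexName, query, limit) {
  const path = [db, "_design", designDoc, "_search", indexName]
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  // Encode the Lucene query so spaces, colons, and parentheses are URL-safe.
  return `${baseUrl}/${path}?q=${encodeURIComponent(query)}&limit=${limit}`;
}

const searchUrl = buildSearchUrl(
  "https://account.example.com", // placeholder host, not a real endpoint
  "books",
  "library", // hypothetical design document name
  "bookSearch", // hypothetical index name
  "publisher:Penguin AND (year:1972 OR year:1973) AND title:Crash",
  10
);

console.log(searchUrl);
```

The resulting URL can be passed to any HTTP client together with your service credentials.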

IBM Cloudant Search is best if you have many search use-cases involving different combinations of fields.

For more information, see the following resources:

IBM Cloudant Search uses "analyzers" to pre-process text before indexing. Learn about Search Analyzers to ensure you get the results that you expect.
IBM Cloudant Search documentation

Step 4. Aggregating data - MapReduce

IBM Cloudant Query and IBM Cloudant Search cannot aggregate search results. In other words, you can't ask, "How many books were published in 1973?" IBM Cloudant's MapReduce feature allows secondary indexes to be created that can be used for selection or aggregation. Like IBM Cloudant Search indexes, MapReduce indexes are created by supplying a JavaScript function. Any call to the emit function adds a row to the index.

From the IBM Cloudant Dashboard, select the books database.
Select the Design Documents tab.
Select New View from the menu.
Keep New document in the drop-down field.
Enter a name in the Index name field. This name is the new view name.
Select _count from the Reduce (optional) drop-down menu. This way our results are counted.
Paste the following code into the Map function field:
```
function (doc) {
  emit(doc.year, null);
}
```
See an example of the window in the following screen capture:

New View window

The resulting MapReduce view allows documents to be found by year, because the year is the key of the index. If we also select the Reduce checkbox from the Options pull-down menu, the index aggregates the results, grouping by key (year):

Windows for running queries

The following example shows the result after the index aggregates the data:

Result set

MapReduce views are perfect for generating ordered views of your data, containing key/value pairs that you define. They can be used for selecting individual keys, range queries, or aggregation grouping by the key.
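
The three uses can be illustrated with a toy simulation of a view whose map function emits the year and whose reduce is a count. This is a simplified sketch: `emit` is passed in explicitly here as an assumption for local testing (in IBM Cloudant it is built in), and the helper names and sample documents are hypothetical.

```javascript
// Toy simulation of a MapReduce view with a count reduce.
// `emit` is passed in explicitly here; in IBM Cloudant it is a built-in.
function buildView(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  // View rows are kept ordered by key (numeric keys in this sketch).
  return rows.sort((a, b) => a.key - b.key);
}

// Grouped count: one output row per distinct key.
function countByKey(rows) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row.key, (counts.get(row.key) || 0) + 1);
  }
  return [...counts].map(([key, value]) => ({ key, value }));
}

const yearView = buildView(
  (doc, emit) => emit(doc.year, null),
  [
    { _id: "b1", year: 1973 },
    { _id: "b2", year: 1840 },
    { _id: "b3", year: 1973 },
  ]
);

// Selection by key range, then aggregation grouped by key.
const seventies = yearView.filter((row) => row.key >= 1970 && row.key <= 1979);
const booksPerYear = countByKey(yearView);

console.log(seventies.map((row) => row.id));
console.log(booksPerYear);
```

Here the range selection returns the two 1973 documents, and the grouped count reports one book for 1840 and two for 1973.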

For more information, see the following resources from IBM Cloudant documentation: