Using IBM Cloudant Query FAQ
IBM® Cloudant® for IBM Cloud® Query is an API for querying slices of data based on the values of a database's document attributes. It is a flexible API that must be used carefully to ensure that database performance can be maintained as the data size grows over time.
How do I use IBM Cloudant Query?
IBM Cloudant Query is accessed through the POST /{db}/_find
API endpoint where the JSON specification of the query is passed in the HTTP POST body. For example, this
query finds up to 10 documents where the firstname
is "Charles" and the surname
is "Dickens":
{
"selector": {
"firstname": "Charles",
"surname": "Dickens"
},
"limit": 10
}
For more information, see Selector Syntax.
How do I create an index to support an IBM Cloudant Query?
Without a suitable secondary index, IBM Cloudant Query scans each document in the database in turn until it has enough matches to satisfy the query. The larger the data set and the more documents it has to scan to find matching documents, the
slower the response time. For faster performance, an IBM Cloudant Query _find
must be backed by a suitable secondary index. A secondary index is a pre-calculated data structure that allows IBM Cloudant to quickly jump to the slice
of data it needs without scanning irrelevant documents. For the surname
fields, we call the POST /{db}/_index
endpoint to pass the JSON index definition
as the HTTP POST body:
{
"index": {
"fields": ["firstname", "surname"]
},
"ddoc": "jsonindexes",
"name": "byName",
"type": "json"
}
For more information, see Creating an Index.
Can I sort my search results with IBM Cloudant Query?
Yes! The _find
JSON syntax allows for a sort
parameter to be provided listing the attribute or attributes to sort by. In this case, we are sorting by date
:
{
"selector": {
"firstname": "Charles",
"surname": "Dickens"
},
"sort": ["date"],
"limit": 10
}
A suitable index must be present that contains the selector
fields and the sort
fields. Otherwise, IBM Cloudant refuses to execute the query. A suitable index definition for the previous query is shown next:
{
"index": {
"fields": [
"firstname",
"surname",
"date"
]
},
"ddoc": "jsonindexes",
"name": "byNameAndDate",
"type": "json"
}
Can I sort in reverse order?
Yes! IBM Cloudant Query supports sorting the result set in ascending or descending order, but not a combination of the two. For example, a query that sorts some fields in ascending order, and a query where descending is not allowed.
This query returns documents that match firstname
and surname
and sorts by surname
/firstname
/date
descending:
{
"selector": {
"firstname": "Charles",
"surname": "Dickens"
},
"sort": [
{ "firstname": "desc" },
{ "surname": "desc" },
{ "date": "desc" }
],
"limit": 10
}
A suitable index must be present that contains the selector
fields and the sort
fields. Otherwise, IBM Cloudant refuses to execute the query. A suitable index definition for the previous query is shown next:
{
"index": {
"fields": [
"firstname",
"surname",
"date"
]
},
"ddoc": "jsonindexes",
"name": "byNameAndDate",
"type": "json"
}
The previous index is suitable for both ascending and descending sort order.
How can I tell if an index is backing a query?
The POST /{db}/_explain
API endpoint when passed a JSON object that is usually sent to the POST /{db}/_find
endpoint, explains how such a query is handled and which indexes, if any, might be used.
If the index
object in the response indicates that "all_docs" is being used, a full database scan is required to service the query. We recommend that you use the _explain
mechanism to check each IBM Cloudant
query to ensure it is using an index before you deploy to production.
For example, a type=json
index on firstname
, surname
and date
is suitable for finding documents for:
- A known
firstname
,lastname
, anddate
. - A known
firstname
,lastname
, and a range ofdate
values (that use$lt
,$lte
,$gt
,$gte
operators). - A known
firstname
andlastname
sorted bydate
.
It can also be used to assist queries on firstname
, surname
, date
, and other attributes. In other words, it might be able to answer only part of the query but it can help reduce the number of documents
that are scanned to find the answer.
How can I ensure that my query is efficent?
Ideally, an IBM Cloudant Query execution would need to scan only one document for each document returned. If a query has to scan a million documents for each one returned, it is clearly not optimal, and is in need of a secondary index to help.
When you execute a query, passing execution_stats: true
as an extra parameter forces IBM Cloudant to enumerate the number of documents it scanned in performing the query, for example:
{
"selector": {
"firstname": "Charles",
"surname": "Dickens"
},
"sort": ["date"],
"limit": 10,
"execution_stats": true
}
The returned data now includes an extra JSON object:
{
...
"execution_stats": {
"total_keys_examined": 0,
"total_docs_examined": 1000000,
"total_quorum_docs_examined": 0,
"results_returned": 2,
"execution_time_ms": 4400.699
}
}
The ratio between total_docs_examined
and results_returned
is key here: a high value indicates that too many documents are being scanned per document that is returned.
For more information, see Blog post on Optimizing IBM Cloudant Queries.
Which IBM Cloudant Query operators defeat the use of an index?
Any of the combination operators other than $and
can make a query do a full database scan without the help of a secondary index. For example, if an
$or
operator is used, then no secondary index can be used to assist the query. If in doubt, use the POST /{db}/_explain
endpoint to check
that an index is used, and the execution_stats: true
parameter to measure the efficiency of each query.
For a type=json
index to be used to support a query, it must match the fields that are used in the selector and sort parameters. Comparison operators might be used on the last element to perform range queries.
For more information, see Explain plans.
Can I use IBM Cloudant Query with a Lucene index?
Yes! IBM Cloudant Query supports two types of indexes:
"type": "json"
- a fixed-order index powered by IBM Cloudant's MapReduce API. Good for fixed, boilerplate queries that match every term in the index."type": "text"
- a Lucene index powered by IBM Cloudant's Search API. Good for general-purpose queries across one, some, or all indexed fields.
A type=text
index is created with an index definition like this:
{
"index": {
"fields": [
{ "name": "firstname", "type": "string"},
{ "name": "surname", "type": "string"},
{ "name": "date", "type": "string"}
]
},
"ddoc": "textindexes",
"name": "byNameAndDate",
"type": "text"
}
Notice that the fields
array requires each attribute to be named and typed (unlike type=json
indexes).
The resultant index can be used by queries that contain one or more of the indexed fields:
{
"selector": {
"surname": "Dickens"
},
"sort": ["date"],
"limit": 10,
"execution_stats": true
}
The Lucene-backed indexes allow some extra flexibility over type=json
indexes. You can use one, some, or all of your indexed fields in any order, and the index supports the query. With type=json
indexes, the query must
match all of the indexed fields to be useful.
For more information, see Creating a type=text
index.
Can I create an IBM Cloudant Query index for a subset of a database?
Yes! If your use case requires that queries are performed within the confines of a subset of the database (for example, completed orders, or confirmed users, or in-stock products), then a partial index is best. A partial index allows an IBM Cloudant Query selector to be used at index-build time to winnow the data set that makes it into the index, keeping the index small and performant. Then, at query-time, a standard IBM Cloudant Query selector is used to further filter the partial index to get your result set.
A partial index is created by passing a partial_filter_selector
to the POST /{db}/_index
method. In this example, only hardback books that have a status of "published" or "reprint" make
it to the index. See the following example:
{
"index": {
"partial_filter_selector": {
"$or": [
{ "status": "published" },
{ "status": "reprint" }
],
"hardback": true
},
"fields": ["firstname","surname","date"]
},
"ddoc" : "partialindexes",
"name": "byNameAndDate"
"type" : "json"
}
At query time, the use_index
field must be supplied to tell IBM Cloudant that you want it to use the specified index:
{
"selector": {
"$or": [
{ "status": "published" },
{ "status": "reprint" }
],
"hardback": true,
"firstname": "Charles",
"surname": "Dickens"
},
"sort": ["date"],
"limit": 10,
"execution_stats": true,
"use_index": "partialindexes/byNameAndDate"
}
The partial_filter_selector
fields are repeated in the query-time selector.
For more information, see Creating a partial index.
Can I use IBM Cloudant Query on partitioned databases?
Yes! By default, an IBM Cloudant Query index is global (for example, not partitioned) but on a partitioned database, a partitioned index can be created by adding partitioned: true
when the index is created:
e To create a partitioned index on firstname
and surname
, the POST /<dbname>/_index
is used:
{
"index": {
"fields": ["firstname", "surname"]
},
"ddoc": "jsonindexes",
"name": "byName",
"type": "json",
"partitioned": true
}
Partitioned indexes can be used when you run only a partitioned query, for example, by using the GET /<dbname>/_partition/<partition key>/_find
API endpoint.
For more information, see Creating a partitioned index.
Can I use regular expressions in my queries?
Yes! IBM Cloudant Query has a $regex operator that allows regular expression terms within a query.
Proceed with caution: if a query contains only a $regex
operator, then a secondary index cannot help you - the query results in a document-by-document scan of the database.
A $regex
operator can be tagged onto the end of an already performant query. For example, if we already have a type=json
index on firstname
and surname
, we can use the following example:
{
"selector": {
"firstname": "Charles",
"surname": "Dickens",
"title": {
"$regex": "^Oliv"
}
},
"limit": 10
}
IBM Cloudant uses the index to find documents by firstname
and surname
and winnows the result set that uses the regular expression on the title
field.