IBM Cloudant Query performance guide
This guide walks you through the essential techniques for optimizing IBM Cloudant Query performance, starting with the most foundational optimizations and progressing to advanced techniques. By following this guide, you'll transform slow, expensive queries into fast, efficient ones that scale with your data.
The guide will cover:
- Concepts:
- Selectivity of indexes
- Foundation topics:
- Avoid large scans
- Field ordering in compound indexes
- Use operators that allow index use
- Avoiding missing documents in JSON indexes
- Monitoring performance
- Advanced topics:
- Partial indexes
- Covering indexes
- Index consolidation strategy
The guide finishes with a production readiness checklist.
Key concept: selectivity
Performant and scalable queries depend on matching your query needs with indexes that efficiently serve those queries. A key concept in building such indexes is selectivity — how well an index helps narrow down the result set.
A high selectivity index (or highly selective index) dramatically reduces the number of documents matched by a query. In contrast, a low selectivity index matches many documents and offers less performance benefit.
- For an orders table, examples of high selectivity indexes include:
order_id: each order has a unique ID, so queries match at most one document.username: each user has far fewer orders than the total number of orders.
- Examples of low selectivity indexes include:
status: many orders may share the same status.country: a broad field like this will likely match many orders.
Understanding selectivity helps you design indexes that make queries faster and more efficient.
Foundation: avoid large database scans
Impact: 1000x+ performance improvement
The single most devastating performance problem in IBM Cloudant Query is triggering a large database scan. There are two causes of this problem:
- The query uses a low selectivity index, which causes many documents to be scanned to generate a much smaller result set.
- The query cannot use a user-defined index, so falls back on the built-in
_all_docsindex, which results in scanning all the documents in the database.
How to identify large scans
Use the _explain endpoint to check which index your query will use:
POST /orders/_explain
{
"selector": {
"username": "bob@email.com",
"status": "cancelled",
"date": {"$gte": "2018-01-01", "$lt": "2019-01-01"}
}
}
If the response shows a specific index, evaluate whether the index is high or low selectivity. In this example, an index on the status field is likely a low selectivity index, so this query needs work:
{
"index": {
"ddoc": "_design/status-index",
"def": {
"fields": [
{
"status": "asc"
}
]
},
"name": "d479bdddf50865e520a0193704c8b93a3bd48f77",
"type": "json"
}
}
If the response shows "name": "_all_docs", your query will scan the entire database:
{
"index": {
"ddoc": null,
"name": "_all_docs",
"type": "special"
}
}
Solution: create appropriate indexes
For the query above, the optimal index is a compound (multi-field) JSON index covering the fields you're querying:
POST /orders/_index
{
"index": {
"fields": ["username", "status", "date"]
},
"name": "username-status-date-index",
"type": "json"
}
Performance impact: In a 100,000 document database, this single change can reduce query time from 20+ seconds to under 10 milliseconds.
Production safety: prevent accidental large scans
If you know a certain index will be highly selective for a query, combine use_index with allow_fallback: false to force use of that index:
{
"selector": {"username": "bob@email.com", "status": "cancelled"},
"use_index": ["my_design_doc", "username-index"],
"allow_fallback": false
}
This will return an error if my_design_doc/username-index is unusable for the query, rather than silently performing an expensive scan. The default for allow_fallback is true, meaning that IBM Cloudant
will use a different index to respond to the query, potentially resulting in a large scan.
If you don't supply use_index, then allow_fallback: false will still prevent a query using the _all_docs index:
{
"selector": {"username": "bob@email.com"},
"allow_fallback": false
}
This will return an error if no suitable user-defined index exists, avoiding an expensive scan.
See the Query API documentation for more details on these query parameters.
Consistently combining use_index with allow_fallback is recommended practice for critical queries.
Foundation: field ordering in compound indexes
Impact: 100x performance improvement
Once you have indexes, the most common mistake is incorrect field ordering in compound indexes. This can turn an efficient query into one that scans 40%+ of your database.
Critical rule 1: compound index selectivity
Order your equality fields (non-range fields) by selectivity - most selective first:
userfield: 50 docs per user (high selectivity) — put this first.statusfield: 10,000 docs per status (medium selectivity).
Optimal index ordering: ["username", "status", "date"] where date is the range field.
Critical rule 2: range fields must come last
When creating compound indexes, any field used with range operators ($gt, $gte, $lt, $lte) must appear last in the field list.
Wrong - causes partial database scans:
{
"index": {
"fields": ["date", "username", "status"] // Bad - range field first
}
}
Correct - enables efficient seeking:
{
"index": {
"fields": ["username", "status", "date"] // Good - range field last
}
}
Critical rule 3: sort fields must come last
Indexes are also used for sorting query results. When your query filters on fields like username and status, and sorts by another field like date, the index must include all three fields — with the sort
field last.
For example, to support the following query:
{
"selector": { "username": "bob@email.com", "status": "cancelled" },
"sort": [ { "date": "asc" } ]
}
You need an index on ["username", "status", "date"]. In this structure, the sort field (date) must come last. This ordering ensures IBM Cloudant can use the index both to filter and to
sort efficiently.
Why field ordering matters
IBM Cloudant indexes work like sorted tables. An index on ["username", "status", "date"] creates entries sorted first by user, then by status within each user, then by date within each user-status
combination:
["alice", "paid", "2018-01-15"]
["alice", "paid", "2018-02-20"]
["bob", "cancelled", "2018-01-10"]
["bob", "cancelled", "2018-01-15"]
["bob", "paid", "2018-01-12"]
When using an efficient index for a query like user: "bob", status: "cancelled", date: {$gte: "2018-01-01"}, IBM Cloudant can:
- Jump directly to Bob's records.
- Find the cancelled orders within Bob's records.
- Scan efficiently through the date range.
If date were first, IBM Cloudant would need to examine every document in the date range across all users and statuses.
Foundation: use operators that allow index use
Impact: 10-100x performance difference
It's essential to combine high-selectivity indexes with query selectors that allow use of those indexes. The operators used in the conditions of a selector can prevent index use.
When executing a query, only some operators allow IBM Cloudant to use an index to efficiently answer it. If you have a high-selectivity index on a field, but the operator used in that field's condition doesn't support index usage, then the query won't be able to use the index. As a result, it will be inefficient.
JSON and text indexes support use of different operators in conditions. If a JSON index doesn't support the operator you need, a text index might do.
The following table summarises the key properties of indexes, and the operators they support:
| Characteristic | JSON index | Text index |
|---|---|---|
| Underlying technology | CouchDB MapReduce View (B-tree) | Apache Lucene |
| Primary use-case | Latency-sensitive, predictable queries | Flexible, ad-hoc queries |
| Performance | Faster for exact matches and ranges | Slower but more versatile |
| Document requirements | Must contain ALL indexed fields | Flexible - missing fields OK |
| Strongly typed | No | Yes |
| Combination operators | ||
$and |
✔️ Supported | ✔️ Supported |
$or |
❌ Not supported | ✔️ Supported |
$nor |
❌ Not supported | ✔️ Supported |
$not |
❌ Not supported | ✔️ Supported |
| Condition operators | ||
$eq (equality) |
✔️ Optimal | ✔️ Supported |
$gt, $gte, $lt, $lte |
✔️ Optimal for ranges | ✔️ Supported |
$exists |
✔️ Supported | ✔️ Supported |
$beginsWith |
✔️ Supported | ✔️ Supported |
$ne (not equal) |
❌ Not supported | ✔️ Supported |
$in |
❌ Not supported | ✔️ Supported |
$size |
❌ Not supported | ✔️ Supported |
$type |
❌ Not supported | ✔️ Supported |
$text |
❌ Not supported | ✔️ Full-text search |
$regex |
❌ Not supported | ❌ Not supported |
| Field Handling | ||
| Sort order | Fixed by index field ordering | Flexible |
| Array fields | ❌ Not supported | ✔️ Full support with [] notation |
| Nested objects (dot notation) | ✔️ Supported | ✔️ Supported |
| Missing fields | Excludes documents | Includes documents |
| Query page size limits | ||
| Global query | ??? | 200 |
| Partition-scoped query | ??? | 200 |
Index type selection guidelines
Choose JSON indexes when:
- Query patterns are predictable and consistent.
- You need maximum performance for exact matches.
- Documents have consistent structure.
- You're primarily using
$and,$eq, and range operators.
Choose text indexes when:
- You need
$oroperations across multiple fields. - Documents have varying structures.
- You need full-text search capabilities.
- Query patterns are ad-hoc or user-driven.
- You're using
$ne,$exists, or complex operators.
Example decision matrix:
// Predictable e-commerce queries → JSON index
{
"selector": {
"username": "bob@email.com",
"status": "paid",
"date": {"$gte": "2018-01-01"}
}
}
// Flexible search interface → Text index
{
"selector": {
"$or": [
"status": {"$in": ["paid", "in-progress"]}
"category": "electronics"
]
}
}
Foundation: query performance monitoring
Impact: Ongoing optimization and problem detection
Efficient querying requires continuous monitoring. IBM Cloudant provides tools to measure and track query performance over time.
IBM Cloudant Query provides the execution_stats field in the response to help evaluate your queries. See Monitor for inefficient IBM IBM Cloudant queries for a detailed guide to the metrics IBM Cloudant provides for analysing query execution.
Warning: document structure requirements for JSON indexes
Impact: Queries may return incomplete results or fail unexpectedly
JSON indexes in IBM Cloudant have a crucial constraint that differs from SQL databases: a document must contain ALL indexed fields to be included in the index.
Problem
If you create an index on ["username", "status", "date"], any document missing even one of these fields is excluded from the index entirely. This means:
// This document WON'T be indexed
{
"_id": "order123",
"username": "bob@email.com",
"status": "paid"
// Missing "date" field
}
Solutions
Option 1: Ensure consistent document structure
// Always include all indexed fields
{
"_id": "order123",
"username": "bob@email.com",
"status": "paid",
"date": null // Use null for missing values
}
Option 2: Create multiple JSON indexes for different document structures
// Index for documents with all fields
{"fields": ["username", "status", "date"]}
// Index for documents without date
{"fields": ["username", "status"]}
Option 3: Use text indexes for flexible field requirements
{
"index": {
"fields": [
{"name": "username", "type": "string"},
{"name": "status", "type": "string"},
{"name": "date", "type": "string"}
]
},
"type": "text"
}
A text index will index documents missing the date (or other) fields.
Advanced: partial indexes
Impact: 50-90% reduction in index size and query latency
Partial indexes are a powerful optimization technique that pre-filters documents before indexing, creating smaller, faster indexes while reducing storage costs.
When to use partial indexes
Partial indexes are ideal when:
- You consistently query a subset of documents (e.g., only active records).
- Your database contains large amounts of irrelevant data for most queries.
- You want to reduce IBM Cloudant storage costs (pay-per-GB pricing).
Creating partial indexes
Use partial_filter_selector to specify which documents to include:
{
"index": {
"fields": ["date", "category"]
},
"partial_filter_selector": {
"status": "published"
},
"name": "published-articles-index"
}
This index only includes documents where status: "published", making it much smaller and faster for queries on published content.
The partial_filter_selector can include any valid selector, not just simple single field selectors as shown in the example.
Real-world example: blog system
Scenario: A blog database with 100,000 articles, where 80,000 are drafts and 20,000 are published.
Traditional index (covers all documents):
{"fields": ["category", "publishDate"]}
Partial index (only published articles):
{
"index": {"fields": ["category", "publishDate"]},
"partial_filter_selector": {"status": "published"}
}
Benefits:
- Index size: 80% smaller
- Query speed: 2-3x faster
- Storage cost: Significantly reduced
- Cache efficiency: More relevant data stays in memory
Ensuring query compatibility with partial indexes
To use the index, your query selector must be compatible with the partial filter.
This works:
// Compatible - query includes the partial filter condition
{
"selector": {
"status": "published", // Matches partial filter
"category": "technology"
}
}
This doesn't work:
// Incompatible - query doesn't match partial filter
{
"selector": {
"status": "draft", // Contradicts partial filter
"category": "technology"
}
}
Advanced: covering indexes
Impact: 2-5x performance improvement for read-heavy workloads
Covering indexes are a powerful optimization technique - they can answer queries entirely from the index without retrieving the original documents.
Understanding covering indexes
A covering index occurs when your index contains all the fields needed to satisfy both the query selector and the returned fields:
// Index covers these fields
{
"index": {
"fields": ["username", "status", "date", "amount"]
}
}
// Query only needs indexed fields
{
"selector": {"username": "bob@email.com", "status": "paid"},
"fields": ["username", "date", "amount"] // All fields in index
}
IBM Cloudant can return results directly from the index without reading the original documents, providing significant performance benefits.
Designing for coverage
Plan your indexes around common query patterns:
// Analyze your query patterns
const commonQueries = [
{selector: {username: "...", status: "..."}, fields: ["date", "amount"]},
{selector: {status: "...", date: {...}}, fields: ["username", "amount"]},
{selector: {username: "..."}, fields: ["status", "date", "total"]}
];
// Design covering index
{
"index": {
"fields": ["username", "status", "date", "amount", "total"]
}
}
The _explain endpoint will contain "covering": true when an index is found that can cover the query.
Advanced: index consolidation strategy
Instead of creating many narrow indexes, consider fewer broader compound indexes that can cover multiple query patterns. Using fewer reduces the time required to build indexes.
Instead of:
{"fields": ["username"]}
{"fields": ["status"]}
{"fields": ["date"]}
{"fields": ["username", "date"]}
Use:
{"fields": ["username", "status", "date", "amount"]}
This single index can efficiently serve queries filtering on:
usernameonlyusernameandstatususername,status, anddaterange- All the above combinations, with covering benefits when
amountis requested
Production deployment checklist
Before deploying queries to production, verify:
Index strategy review
- All production queries have appropriate indexes
- Compound indexes follow the range-field-last rule
- Field ordering reflects actual selectivity in your data
- No queries accidentally fall back to
_all_docs
Performance validation
- Test queries with production-scale data volumes
- Monitor efficiency ratios (aim for >50%)
- Validate execution times meet SLA requirements
- Consider partial indexes for cost optimization
Monitoring setup
- Enable
execution_statslogging - Set up alerts for poor efficiency ratios
- Monitor index usage patterns
- Plan for query performance regression testing
Error handling
- Use
allow_fallback: falseto prevent expensive accidents - Handle index selection errors gracefully
- Plan for index building time during deployments
Further reading
The IBM Cloudant blog has further in-depth articles on exploring and improving query performance:
- Explaining Explain describes how to interpret and use the output of the
_explainrequest. - Optimising IBM Cloudant Queries.
- Avoiding Query Fallback shows a technique to prevent critical queries falling back to full table scans.
Conclusion
Optimizing IBM Cloudant Query performance is a systematic process that starts with avoiding the most expensive mistakes (large database scans) and progresses through increasingly sophisticated techniques. By following this guide:
- Foundation: Create appropriate indexes and avoid large scans
- Structure: Order compound index fields correctly with ranges last
- Operators: Choose efficient query operators
- Compatibility: Understand document structure requirements
- Monitoring: Continuously measure and optimize performance
- Advanced: Leverage partial indexes and covering patterns
Small changes in index design can produce dramatic performance improvements. A properly optimized query examining 2 documents to return 2 results will always outperform an unoptimized query examining 50,000 documents for the same 2 results.
Remember: the difference between a well-optimized and poorly-optimized query isn't just performance - it's the difference between a scalable application and one that breaks under load.