IBM Cloud Docs
IBM Cloudant in practice

IBM Cloudant in practice

The IBM Cloudant in practice document is the third document in the best practices series. It covers the following topics:

  • How to avoid conflicts.
  • How deleting documents works.
  • What to watch out for with updates.
  • How to work in an eventually consistent environment.
  • How to set up replication.
  • How to use the bulk API.
  • Why you must not change Q, R, and N.
  • How rate limits work.
  • What logging tracks.
  • How to compress your HTTP traffic.

For more information, see Data modeling or Indexing and querying.

The content in this document was originally written by Stefan Kruger as a Best and worst practice blog post on 21 November 2019.

Avoid conflicts

IBM Cloudant is designed to treat conflicts as a natural state of data in a distributed system. This powerful feature helps an IBM Cloudant cluster always maintain high availability. However, the assumption is that conflicts are still reasonably rare: tracking conflicts in IBM Cloudant’s core carries a significant cost.

It is perfectly possible (but a bad idea!) to ignore conflicts. The database merrily carries on operating by choosing a random, but deterministic revision of conflicted documents. However, as the number of unresolved conflicts grows, the performance of the database goes down a black hole, especially when you replicate.

As a developer, it’s your responsibility to check for, and to resolve, conflicts, or even better, employ data models that make conflicts impossible.

If you routinely create conflicts, you must seriously consider model changes: even if you resolve your conflicts diligently, the conflict branches remain in the revision tree, with no easy way to tidy them up.
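As a sketch of what resolution can look like, the following hypothetical helper takes a document fetched with `?conflicts=true` (which adds a `_conflicts` array of losing revision IDs) and builds a `_bulk_docs` payload that deletes the losing revisions. The function name and the keep-the-winner policy are illustrative, not part of the Cloudant API:

```python
# Sketch: resolve a conflicted document by deleting its losing revisions.
# Assumes the document was fetched with GET /db/doc?conflicts=true, which
# adds a "_conflicts" array listing the losing revision IDs.

def build_resolution_payload(doc):
    """Return a _bulk_docs payload that turns every losing revision
    into a tombstone, keeping the winning revision (doc["_rev"]) as-is."""
    losers = doc.get("_conflicts", [])
    return {
        "docs": [
            {"_id": doc["_id"], "_rev": rev, "_deleted": True}
            for rev in losers
        ]
    }

# Example: a conflicted document as Cloudant might return it.
conflicted = {
    "_id": "order-1001",
    "_rev": "3-aaa",
    "status": "shipped",
    "_conflicts": ["3-bbb", "2-ccc"],
}
payload = build_resolution_payload(conflicted)
# POSTing this payload to /db/_bulk_docs deletes the two losing revisions.
```

A real resolver would usually merge data from the losing revisions into the winner before deleting them; the merge policy is application-specific.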

Deleting documents doesn't delete them

Deleting a document from an IBM Cloudant database doesn’t purge it. Deletion is implemented by writing a new revision of the document being deleted, with an added field _deleted: true. This special revision is called a tombstone. Tombstones still take up space and are also passed around by the replicator.
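For illustration, a tombstone is a revision of roughly the following shape (IDs shortened here):

```json
{
  "_id": "order-1001",
  "_rev": "2-8147...",
  "_deleted": true
}
```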

Models that rely on frequent deletions of documents are not suitable for IBM Cloudant. For more information, see IBM Cloudant tombstone docs.

Be careful with updates

It is, in the end, more expensive to mutate existing documents than to create new ones, because IBM Cloudant always needs to keep the document’s tree structure around, even when internal nodes in the tree are stripped of their payloads. If you create long revision trees, your replication performance suffers. Moreover, if your update frequency is higher than, say, once or twice every few seconds, you’re more likely to produce update conflicts.

Prefer models that are immutable.
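As an illustration of an immutable model (the field names and the account example are made up for this sketch), instead of updating one account document in place, you can write one new, never-modified document per change and derive the current state by reading the documents back:

```python
import uuid

def make_transaction_doc(account_id, amount):
    """Create a new, immutable event document instead of mutating state.

    Each change is a separate document with a fresh _id, so no document is
    ever updated and concurrent writers cannot produce update conflicts.
    """
    return {
        "_id": f"{account_id}:txn:{uuid.uuid4().hex}",
        "account_id": account_id,
        "amount": amount,
        "type": "transaction",
    }

def current_balance(docs):
    """Derive the current state by folding over the event documents."""
    return sum(d["amount"] for d in docs)

events = [
    make_transaction_doc("acct-1", 100),
    make_transaction_doc("acct-1", -30),
]
balance = current_balance(events)  # 70
```

The trade-off is that reads now aggregate over many documents, which is typically handled with a view; the data-volume consequences are discussed next.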

The previous two sections, Deleting documents doesn't delete them and Be careful with updates, provoke an obvious question: does the data set grow unbounded if my model is immutable? Given that deletes don’t completely purge the deleted data and that updates don’t overwrite in place, an immutable model makes little difference in terms of data volume growth. Managing data volume over time requires different techniques.

The only way to truly reclaim space is to delete databases, rather than documents. You can replicate only the winning revisions to a new database and then delete the old one to get rid of lingering deletes and conflicts. Or, if your use case allows, you can build it into your model to start new databases regularly (say, ‘annual data’) and archive off (or remove) outdated data.

Eventual consistency is a harsh taskmaster (also known as don’t read your writes)

Eventual consistency is a great idea on paper, and a key contributor to IBM Cloudant’s ability to scale out in practice. However, it’s fair to say that the mindset required to develop against an eventually consistent data store does not feel natural to most people.

You often get stung when you write tests similar to the following ones:

  1. Create a database.
  2. Populate the database with some test data.
  3. Query the database for some subset of this test data.
  4. Verify that the data that you got back is the data that you expected to get back.

Nothing wrong with that test, right? It works on every other database that you ever used.

Not on IBM Cloudant.

Or rather, it works 99 times out of 100.

The reason for this difference is a (mostly) small inconsistency window between writing data to the database and this data becoming available on all nodes of the cluster. As all nodes in a cluster are equal in stature, no guarantee exists that a write, and a subsequent read, are serviced by the same node. So, in some circumstances, the read might be hitting a node before the written data makes it to the node.

So why don’t you just put a short delay in your test between the write and the read? That delay makes the test less likely to fail, but the problem is still there.
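If a test really must read its own writes, polling with a timeout is at least more honest than a fixed sleep. The following is a minimal sketch, with the fetch function injected so the logic is testable; nothing in it is Cloudant-specific:

```python
import time

def wait_for_doc(fetch, doc_id, timeout=5.0, interval=0.1):
    """Poll fetch(doc_id) until it returns a document or the timeout expires.

    fetch is any callable that returns the document dict, or None when the
    read hits a node that the write hasn't reached yet.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        doc = fetch(doc_id)
        if doc is not None:
            return doc
        time.sleep(interval)
    raise TimeoutError(f"document {doc_id!r} not visible within {timeout}s")

# Simulate a read that succeeds only on the third attempt.
attempts = {"n": 0}
def flaky_fetch(doc_id):
    attempts["n"] += 1
    return {"_id": doc_id} if attempts["n"] >= 3 else None

doc = wait_for_doc(flaky_fetch, "test-doc", timeout=2.0, interval=0.01)
```

Note that this only makes the test robust; it does not make the database consistent, and production code must still tolerate stale reads.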

IBM Cloudant has no transactional guarantees. While document writes are atomic (you’re guaranteed that a document can either be read in its entirety, or not at all), no way exists to close the inconsistency window. It’s there by design.

A serious concern that every developer must consider is that you can’t safely assume that data you write is available to anyone else at a specific point in time. This state takes some getting used to if you come from a different kind of database tradition.

Testing tip: to avoid the inconsistency window in testing, test against a single-node instance of IBM Cloudant or CouchDB, running, say, in Docker. A single node removes the eventual consistency issue, but beware that you are then testing against an environment that behaves differently from what you target in production. Caveat emptor.

Replication isn't magic

“So let’s set up three clusters across the world, Dallas, London, Sydney, with bi-directional synchronization between them to provide real-time collaboration between our 100,000 clients.”

No. Just… No. IBM Cloudant is good at replication. It might seem like magic, but note that it makes no latency guarantees. In fact, the whole system is designed with eventual consistency in mind. Treating IBM Cloudant’s replication as a real-time messaging system does not end up in a happy place. For this use case, put a system in between that was designed for this purpose, such as Apache Kafka.

It’s difficult to put a number on replication throughput. The answer is always, “It depends.” Things that impact replication performance include, but are not limited to:

  1. Change frequency
  2. Document size
  3. Number of simultaneous replication jobs on the cluster as a whole
  4. Wide (conflicted) document trees
  5. Your reserved throughput capacity settings
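For reference, a persistent replication between two databases is set up by writing a document to the _replicator database, along the following lines (the URLs, credentials, and document _id are placeholders):

```json
{
  "_id": "dallas-to-london",
  "source": "https://USER:PASS@dallas.cloudant.com/mydb",
  "target": "https://USER:PASS@london.cloudant.com/mydb",
  "continuous": true
}
```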

Use the bulk API

IBM Cloudant has convenient API endpoints for bulk loading (and reading) many documents in a single request, which can be much more efficient than reading and writing documents one at a time. The write endpoint is shown in the following example:

${database}/_bulk_docs

Its main purpose is to be a central part in the replicator algorithm, but it’s available for your use, too, and it’s awesome.

PouchDB implements create, update, and delete even for single documents this way, which means fewer code paths.

The following example creates one new document, updates a second existing document, and deletes a third:

curl -XPOST 'https://ACCT.cloudant.com/DB/_bulk_docs' \
     -H "Content-Type: application/json" \
     -d '{"docs":[{"baz":"boo"},
         {"_id":"463bd...","foo":"bar"},
         {"_id":"ae52d...","_rev":"1-8147...","_deleted": true}]}'

You can also fetch many documents in a single request by issuing a POST to _all_docs. (A relatively new endpoint called _bulk_get also exists, but it is probably not what you want; it’s there for a specific internal purpose.)

To fetch a fixed set of documents by using _all_docs, POST a keys body, as shown in the following command:

curl -XPOST 'https://ACCT.cloudant.com/DB/_all_docs' \
     -H "Content-Type: application/json" \
     -d '{"keys":["ab234....","87addef...","76ccad..."]}'

IBM Cloudant (at the time of writing) imposes a maximum request size of 11 MB. _bulk_docs requests that exceed this size are rejected with a 413: Payload Too Large error.
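Because of that limit, a bulk loader needs to split its input into batches. The following sketch chunks documents by estimated serialized size; the helper name is illustrative, and the size accounting is approximate rather than byte-exact:

```python
import json

MAX_REQUEST_BYTES = 11 * 1024 * 1024  # request size limit described above

def chunk_docs(docs, max_bytes=MAX_REQUEST_BYTES):
    """Split docs into batches whose serialized _bulk_docs bodies
    stay (approximately) under max_bytes."""
    batches, current, current_size = [], [], 0
    overhead = len(b'{"docs":[]}')  # fixed envelope around the array
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8")) + 1  # +1 for the comma
        if current and overhead + current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(doc)
        current_size += size
    if current:
        batches.append(current)
    return batches

# Example: with an artificially tiny limit, three small documents
# split into two batches.
docs = [{"n": 1}, {"n": 2}, {"n": 3}]
batches = chunk_docs(docs, max_bytes=30)
```

Each batch then becomes the body of one POST to _bulk_docs.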

Don’t mess with Q, R, and N unless you really know what you are doing

IBM Cloudant’s quorum and sharding parameters, after you discover them, seem like tempting knobs to turn to change the behavior of the database.

“Surely I can get stronger consistency by setting the write quorum to the replica count?”

No! Recall that no way exists to close the inconsistency window in a cluster.

Don’t go there. The behavior can be much harder to understand especially during network partitions. If you’re using Cloudant-the-service, the default values are fine for most users.

Sometimes, tweaking the shard count for a database is essential to get the best possible performance. But if you can’t say why, you’re likely to make your situation worse.

IBM Cloudant is rate-limited: let this rate limit inform your code

Cloudant-the-service (unlike basic CouchDB) is sold on a “reserved throughput capacity” model. That means that you pay for the right to use up to a certain throughput, rather than for the throughput you actually use. This “right to use” model takes a while to sink in. One flaky comparison might be a cell phone contract where you pay for a set number of minutes whether you use them or not.

Although the cell phone contract comparison doesn’t capture the whole situation, the point is that no constraint exists on the total number of requests that you can make to IBM Cloudant in a month. The constraint is on how fast you make them.

It’s really a promise that you make to IBM Cloudant, not one that IBM Cloudant makes to you. You promise not to make more requests per second than you agreed to up front. A maximum speed limit, if you like. If you transgress, IBM Cloudant fails your requests with a status of 429: Too Many Requests. It’s your responsibility to look out for this case, and deal with it, which can be difficult when multiple app servers exist. How can they coordinate to ensure that they collectively stay under the requests-per-second limit?

IBM Cloudant’s official client libraries have some built-in provision for this use case that can be enabled, following a “back-off and retry” strategy.

This built-in provision is turned off by default to force you to think about it.

However, if you rely on this facility alone, you might eventually be disappointed. The back-off and retry strategy helps only in cases of temporary transgression, not a persistent butting up against your provisioned throughput capacity limits.

Your business logic must be able to handle this condition. Another way to look at it is that you get the allocation you pay for. If that allocation isn’t sufficient, the only solution is to pay for a higher allocation.
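The back-off and retry strategy itself is simple. The following is a minimal sketch, with the request function and the sleep function injected so that nothing depends on a live service; a real implementation would usually also add jitter:

```python
import time

def with_backoff(request, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Call request() and retry with exponential back-off on 429 responses.

    request is any callable that returns an object with a status_code
    attribute. Retries help with brief spikes only; a persistent 429
    means that you need more provisioned capacity.
    """
    for attempt in range(max_retries):
        response = request()
        if response.status_code != 429:
            return response
        sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return response  # still 429 after all retries; let the caller decide

# Simulate two 429 responses followed by a success, without really sleeping.
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code

responses = [FakeResponse(429), FakeResponse(429), FakeResponse(200)]
delays = []
result = with_backoff(lambda: responses.pop(0), sleep=delays.append)
```

Note how the sketch returns the final 429 rather than retrying forever, which is exactly the condition that your business logic must handle.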

Provisioned throughput capacity is split into three different buckets: Lookups, Writes, and Queries. A Lookup is a “primary key” read, fetching a document based on its _id. A Write is storing a document or attachment on disk, and a Query is looking up documents by using a secondary index (any API endpoint that has a _design or _find in it).

You get different allocations of each and the ratios between them are fixed. This fact can be used to optimize for cost. You get 20 Lookups for every one Query (per second). You might find that you’re mainly hitting the Query limit, but you have plenty of headroom in Lookups. It might be possible to reduce the reliance on Queries through some remodeling of the data or perhaps doing more work client-side.

The corollary here though is that you can’t assume that any third-party library or framework optimizes for cost ahead of convenience. Client-side frameworks that support multiple persistence layers by using plug-ins are unlikely to be aware of this situation, or might be incapable of making such tradeoffs.

Checking for third-party library or framework compatibility before you commit to a particular tool is a good idea.

It is also worth understanding that the rates aren’t directly equivalent to HTTP API endpoint calls. You must expect that, for example, a bulk update counts according to its constituent document writes.

Logging helps you see what’s going on

IBM Cloudant’s logs, which record each API call, what was requested, and how long the response took, can be spooled automatically to LogDNA for analysis and reporting for IBM Cloud-based services. This data is useful for keeping an eye on request volumes, performance, and whether your application is exceeding the provisioned capacity of your IBM Cloudant service.

You can set up the logging service at no cost to get started. Paid-for plans allow data to be parsed, retained, and archived to Object Storage. Slices and aggregations of your data can be built into visual dashboards to give you an at-a-glance view of your IBM Cloudant traffic.

Compress your HTTP traffic

IBM Cloudant compresses its JSON responses to you if you supply an HTTP header in the request that indicates that your code can handle compressed data:

Request:

> GET /cars/_all_docs?limit=5&include_docs=true HTTP/2
> Host: myhost.cloudant.com
> Accept: */*
> Accept-Encoding: deflate, gzip

Response:

< HTTP/2 200 
< content-type: application/json
< content-encoding: gzip

Compressed content occupies a fraction of the size of the uncompressed equivalent, which means that it takes less time to transport the data from IBM Cloudant’s servers to your application.

You might also choose to compress HTTP request bodies by using the Content-Encoding header. This practice helps reduce data transfer times when you write documents to IBM Cloudant.
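As a sketch of what request-body compression looks like in code, the following compresses a JSON body with gzip and sets the matching header; the URL in the comment is a placeholder, and only the gzip round trip itself is exercised here:

```python
import gzip
import json

def compress_body(doc):
    """Gzip a JSON request body; send it with a Content-Encoding: gzip header."""
    raw = json.dumps(doc).encode("utf-8")
    return gzip.compress(raw)

# Repetitive JSON, such as a _bulk_docs payload, compresses especially well.
doc = {"docs": [{"model": "example", "year": 1966}] * 50}
body = compress_body(doc)
headers = {
    "Content-Type": "application/json",
    "Content-Encoding": "gzip",
}
# e.g. requests.post(f"{url}/db/_bulk_docs", data=body, headers=headers)
```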