Using the IBM Cloudant changes feed FAQ
The primary use case of an IBM Cloudant database's changes feed is to power the replication of data from a source database to a target database. The IBM Cloudant replicator is built to handle the changes feed and runs the necessary checks to ensure that data is copied accurately to its destination.
IBM Cloudant has a raw changes feed API that can be used to consume a single database's changes, but it must be used with care.
The _changes API endpoint can be used in several ways and can output data in various formats. But here we focus on best practice and how to avoid some pitfalls when you develop against the _changes API.
How do I consume the changes feed?
Given a single database named orders, you can ask the database for a list of changes. In this case, ?limit=5 limits the result set to five changes:
GET /orders/_changes?limit=5
{
  "results": [
    {
      "seq": "1-g1AAAAB5eJzLYWBg",
      "id": "00002Sc12XI8HD0YIBJ92n9ozC0Z7TaO",
      "changes": [
        {
          "rev": "1-3ef45fdbb0a5245634dc31be69db35f7"
        }
      ]
    },
    ....
  ],
  "last_seq": "5-g1AAAAB5eJzLYWBg"
}
The API call returns a response with the following fields:
results - An array of changes.
last_seq - A token that can be supplied to the changes endpoint in a subsequent API call to get the next batch of changes.
See how to fetch the next batch of changes in the following example:
GET /orders/_changes?limit=5&since=5-g1AAAAB5eJzLYWBg
{
  "results": [ ...],
  "last_seq": "10-g1AAAACbeJzLY"
}
The since parameter is used to define where in the changes feed you want to start from:
since=0 - The beginning of the changes feed.
since=now - The end of the changes feed.
since=<a last seq token> - From a known place in the changes feed.
At face value, following the changes feed seems as simple as chaining _changes API calls together: you pass the last_seq from one changes feed response into the next request's since parameter. But some subtleties to the changes feed need further discussion.
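The chaining described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an official client: the function names `fetch_changes` and `follow_changes` and the `base_url` argument are illustrative, authentication is omitted, and the loop simply stops when a batch comes back empty.

```python
import json
import urllib.parse
import urllib.request


def fetch_changes(base_url, db, since="0", limit=5):
    """Fetch one batch from GET /{db}/_changes and return the parsed JSON.

    base_url is a hypothetical account URL; authentication is not handled here.
    """
    query = urllib.parse.urlencode({"since": since, "limit": limit})
    with urllib.request.urlopen(f"{base_url}/{db}/_changes?{query}") as resp:
        return json.load(resp)


def follow_changes(fetch, since="0", max_batches=None):
    """Yield changes batch by batch, feeding each last_seq into the next since.

    fetch is any callable that takes a since token and returns a parsed
    _changes response (a dict with "results" and "last_seq").
    """
    batches = 0
    while max_batches is None or batches < max_batches:
        batch = fetch(since)
        results = batch["results"]
        if not results:
            break  # caught up; a long-running consumer might wait and poll again
        for change in results:
            yield change
        since = batch["last_seq"]  # resume point for the next request
        batches += 1
```

A caller might wire the two together with something like `follow_changes(lambda s: fetch_changes(base_url, "orders", s))`. Passing the fetch function in as an argument keeps the chaining logic separate from the HTTP call, which also makes it easy to exercise without a network connection.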
Why does the changes feed deliver each change at least one time?
The IBM Cloudant Standard changes feed promises to return each document at least one time, which isn't the same as promising to return each document only one time. Put another way, it is possible for a consumer of the changes feed to see the same change again, or indeed a set of changes repeated.
A consumer of the changes feed must treat the changes idempotently. In practice, you must remember whether a change was already dealt with before you trigger an action from a change. A naive changes feed consumer might send a message to a smartphone on every change received. But a user might receive duplicate text messages if a change is not treated idempotently when replayed changes occur.
Usually these "rewinds" of the changes feed are short, replaying only a handful of changes. But in some cases, a request might see a response with thousands of changes replayed, potentially all of the changes from the beginning of time. The potential for rewinds makes the changes feed unsuitable for an application that expects queue-like behavior.
To reiterate, IBM Cloudant's changes feed promises to deliver a document at least one time in a changes feed, and gives no guarantees about repeated values across multiple requests.
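One way to treat changes idempotently is to remember which document revisions were already handled and to skip them when they are replayed. The following Python sketch illustrates the idea. The name `make_idempotent_handler` is illustrative, and the in-memory set is an assumption made for brevity: a real consumer would need to persist this record so that it survives restarts.

```python
def make_idempotent_handler(action, seen=None):
    """Wrap `action` so it runs at most once per (document id, revision) pair.

    `seen` is an in-memory set here (an assumption for this sketch); a
    production consumer would keep it in durable storage.
    """
    seen = set() if seen is None else seen

    def handle(change):
        ran = False
        for rev in change["changes"]:
            key = (change["id"], rev["rev"])
            if key in seen:
                continue  # replayed change, already dealt with
            action(change["id"], rev["rev"])
            seen.add(key)  # record only after the action succeeded
            ran = True
        return ran

    return handle
```

With this wrapper, a rewind of the feed causes the handler to return False for each replayed change instead of triggering the action a second time.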
Does the changes feed operate in "real time"?
The changes feed doesn't guarantee how quickly an incoming change appears to a client that consumes the changes feed. Applications must not be developed with the assumption that data inserts, updates, and deletes are immediately propagated to a changes reader.
Why don't all individual document changes appear in the changes feed?
If a document is updated several times in between changes feed calls, then the changes feed might reflect only the most recent of these changes. The client does not receive every change to every document.
The IBM Cloudant changes feed isn't a transaction log that contains every event that happened in time order.
Can I use a filtered changes feed for operational queries?
Filtering the changes feed and, by extension, running filtered replication have their uses:
- Copying data from source to target but ignoring deleted documents.
- Copying data but without index definitions (design documents).
This blog post describes how supplying a selector during replication makes easy work of these use cases.
The changes feed with an accompanying selector parameter is not the way to extract slices of data from the database on a routine basis. It must not be used as a means of running operational queries against a database. Filtered changes are slow because the filter is applied to every changed document in turn, without the help of an index. This process is much slower than creating a secondary index (such as a MapReduce view) and querying that view.
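Returning to the replication use cases listed earlier in this answer, the following Python sketch builds a replication document that carries a selector to skip deleted documents. It is illustrative only: the helper name `replication_doc` and both URLs are hypothetical, and the selector shown is one plausible way to express "ignore deletions", so check it against your own data before relying on it.

```python
import json


def replication_doc(source_url, target_url, selector):
    """Build a replication document that copies only documents matching `selector`."""
    return {"source": source_url, "target": target_url, "selector": selector}


# Match only documents that have no _deleted field, that is, skip deletions.
SKIP_DELETED = {"_deleted": {"$exists": False}}

doc = replication_doc(
    "https://source.example/orders",  # hypothetical source database URL
    "https://target.example/orders",  # hypothetical target database URL
    SKIP_DELETED,
)
print(json.dumps(doc, indent=2))
```

The resulting JSON is the kind of document that would be written to the _replicator database to start a filtered replication.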
Does a feed=continuous changes feed continue to run indefinitely?
No, IBM Cloudant does not guarantee connection duration for a continuous changes feed. It might be regularly disconnected by the server for any number of reasons, which include maintenance, security, or network errors. Code that uses the changes feed must be designed to use a recently saved sequence ID as a since value to make a new request to resume the changes feed after an error or disconnection.
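The resume-after-disconnect pattern can be sketched in Python as follows. Everything named here is an assumption for illustration: `open_feed` is a hypothetical callable that connects with a given since value and yields change objects (each with a `seq` field), while `load_seq` and `save_seq` stand in for whatever durable checkpoint storage the application uses.

```python
import time


def consume_with_resume(open_feed, handle, load_seq, save_seq,
                        max_attempts=5, backoff=1.0):
    """Read a feed that may drop at any time, resuming from the last saved sequence."""
    attempts = 0
    while attempts < max_attempts:
        since = load_seq()  # most recently checkpointed sequence ID
        try:
            for change in open_feed(since):
                handle(change)
                save_seq(change["seq"])  # checkpoint only after the change is handled
            return  # feed ended cleanly
        except (ConnectionError, TimeoutError):
            attempts += 1
            time.sleep(backoff * attempts)  # simple linear backoff before reconnecting
    raise RuntimeError("gave up after repeated disconnections")
```

Because the checkpoint is saved after each change is handled, a disconnection can cause the most recent change to be delivered again on reconnect, so `handle` still needs to be idempotent.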
Why doesn't the changes feed guarantee time-ordering?
If your use case can be summarized by the following statement, it cannot be achieved with the IBM Cloudant changes feed:
"Fetch me every document that has changed since a known date, in the order they were written."
The IBM Cloudant database does not record the time at which each document change was written. The changes feed makes no guarantees about the ordering of the changes in the feed: they are not guaranteed to be in the order in which they were sent to the database.
However, you can achieve this use case by storing the date of change in the document body:
{
  "_id": "2657",
  "type": "order",
  "customer": "bob@aol.com",
  "order_date": "2022-01-05T10:40:00",
  "status": "dispatched",
  "last_edit_date": "2022-01-14T19:17:20"
}
And you can create a MapReduce view with last_edit_date as the key:
function(doc) {
  emit(doc.last_edit_date, null)
}
This view can be queried to return any documents that were modified on or after a supplied date and time:
/orders/_design/query/_view/by_last_edit?startkey="2022-01-13T00:00:00"
This technique produces a time-ordered set of results with no repeated values in a performant and repeatable fashion. The consumer of this data does not need to manage the data idempotently, making for a simpler development process.
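A client-side sketch of this technique in Python might look like the following. The helper names `stamp_last_edit` and `view_path_since` are illustrative, the design document and view names are taken from the query path shown above, and the use of UTC timestamps is an assumption, since the example document does not state a time zone.

```python
import json
import urllib.parse
from datetime import datetime, timezone


def stamp_last_edit(doc, now=None):
    """Return a copy of `doc` with last_edit_date set to an ISO-8601 timestamp.

    UTC is assumed here so that every writer produces comparable strings.
    """
    now = now or datetime.now(timezone.utc)
    return {**doc, "last_edit_date": now.strftime("%Y-%m-%dT%H:%M:%S")}


def view_path_since(db, since_iso, ddoc="query", view="by_last_edit"):
    """Build the view query path for documents edited on or after `since_iso`."""
    # View keys are JSON values, so the string is JSON-encoded (quoted)
    # before it is URL-encoded.
    params = urllib.parse.urlencode({"startkey": json.dumps(since_iso)})
    return f"/{db}/_design/{ddoc}/_view/{view}?{params}"
```

The approach relies on fixed-width ISO-8601 strings sorting lexicographically in the same order as the times they represent, which holds only if every document is stamped in the same format and time zone.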
What is the IBM Cloudant changes feed good for?
The IBM Cloudant changes feed is good for the following tasks:
- Powering IBM Cloudant replication, optionally with a selector to filter some changes.
- Clients that consume the changes feed in batches, handle each change idempotently, do not depend on sort order, and expect to see some changes more than one time.
The IBM Cloudant changes feed is not a good substitute for the following components:
- A message queue. For more information, see IBM Messages for RabbitMQ for managing queues.
- A message broker. For more information, see IBM Event Streams for handling scalable, time-ordered streams of events.
- A real-time publish and subscribe system. For more information, see IBM Databases for Redis for handling publish and subscribe topics.
- A transaction log. Some databases store each change in a transaction log, but the distributed and eventually consistent nature of IBM Cloudant means that no definitive time-ordered transaction log exists.
- A querying mechanism. For more information, see MapReduce Views for creating views of your data that are ordered by a key of your choice.