IBM Cloudant backup and recovery
Although data is stored redundantly within an IBM Cloudant cluster, it's important to consider extra backup measures. For example, redundant data storage doesn't protect against mistakes when data is changed.
Review the IBM® Cloudant® for IBM Cloud® Disaster Recovery guide to understand where backup fits in with the other features that IBM Cloudant offers to support Disaster Recovery (DR) and High Availability (HA) requirements.
Introducing CouchBackup
IBM Cloudant provides a supported tool for snapshot backup and restore. The tool is called CouchBackup, and is open source. It's a Node.js library, and you can install it from NPM.
The CouchBackup package includes the library and two command-line tools:
- couchbackup, which dumps the JSON data from a database to a backup text file.
- couchrestore, which restores data from a backup text file to a database.
The CouchBackup tools have limitations, which are described in the Limitations section.
Backing up your IBM Cloudant data
You can do a simple backup by using the couchbackup tool. To back up the animaldb database to a text file called backup.txt, you might use a command similar to the following example:
couchbackup --url "$SERVICE_URL" --db animaldb > backup.txt
The NPM readme file details other options, including the ones in this list:
- Environment variables to set the names of the database and URL.
- Using a log file to record the progress of a backup.
- The ability to resume an interrupted backup. This option is only available with the log file for the interrupted backup.
- Sending the backup text file to a named output file, rather than redirecting the stdout output.
The CouchBackup tools have limitations, as described in the Limitations section.
Restoring your IBM Cloudant data
To restore your data, use the couchrestore tool to import the backup file into a new IBM Cloudant database. Then, ensure that you build all indexes before any application tries to use the restored data.
For example, to restore the data that was backed up in the earlier example:
couchrestore --url "https://myaccount.cloudant.com" --db newanimaldb < backup.txt
The NPM readme file provides details of other restore options.
Limitations
The CouchBackup tools have the following limitations:
- _security settings aren't backed up by the tools.
- Attachments aren't backed up by the tools.
- Backups aren't precise "point-in-time" snapshots. The reason is that the documents in the database are retrieved in batches, but other applications might be updating documents at the same time. Therefore, data in the database can change between the times when the first and last batches are read.
- Index definitions that are held in design documents are backed up, but the content of indexes isn't backed up. This limitation means that when data is restored, the indexes must be rebuilt. The rebuilding might take a considerable amount of time, depending on how much data is restored.
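The _security limitation can be worked around by saving that document separately. The following JavaScript sketch assumes Node.js 18 or later (for the built-in fetch) and an account that accepts basic authentication; accounts that use IAM would need a bearer token instead. The helper names securityRequest and saveSecurity are hypothetical and aren't part of CouchBackup.

```javascript
// Sketch: save a database's _security document alongside a CouchBackup backup.
// Assumptions: Node.js 18+ (global fetch) and basic-auth credentials.
const fs = require('fs/promises');

// Build the request for GET /{db}/_security. Pure function, no network access.
function securityRequest(baseUrl, db, username, password) {
  const root = baseUrl.replace(/\/+$/, ''); // drop any trailing slashes
  const url = `${root}/${encodeURIComponent(db)}/_security`;
  const token = Buffer.from(`${username}:${password}`).toString('base64');
  return {
    url,
    headers: { Authorization: `Basic ${token}`, Accept: 'application/json' }
  };
}

// Fetch the _security document and write it to outFile as formatted JSON.
async function saveSecurity(baseUrl, db, username, password, outFile) {
  const { url, headers } = securityRequest(baseUrl, db, username, password);
  const response = await fetch(url, { headers });
  if (!response.ok) {
    throw new Error(`Reading _security failed with HTTP ${response.status}`);
  }
  const body = await response.json();
  await fs.writeFile(outFile, JSON.stringify(body, null, 2));
}
```

For example, saveSecurity('https://myaccount.cloudant.com', 'animaldb', username, password, 'animaldb-security.json') might be run alongside each backup. Restoring the settings would then be a PUT of the saved JSON to the same _security endpoint of the new database.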
Using the tools
The NPM page details the basics of using the command-line tools for backup and restore of data. The following examples show how to put those details into practice by describing the use of the tools for specific tasks.
The CouchBackup package provides two ways of using its core functions.
- The command-line tools can be embedded into standard UNIX™ command pipelines. For many scenarios, a combination of cron and simple shell scripting of the couchbackup application is sufficient.
- A library usable from Node.js. The library allows more complicated backup processes to be created and deployed, such as determining dynamically which databases must be backed up.
Use either the command-line backup tool, or the library with application code, to enable backup from IBM Cloudant databases as part of more complicated situations. A useful scenario is scheduling backups by using cron, and automatically uploading data to Cloud Object Storage for long-term retention.
Command line scripting examples
You frequently need to meet the following two requirements:
- Saving disk space by 'zipping' the backup file as you create it.
- Creating a backup of a database automatically at regular intervals.
Compressing a backup file
The couchbackup tool can write a backup file to disk directly, or stream the backup to stdout. Streaming to stdout enables data to be transformed before it is written to disk. The following example uses this feature to compress data within the stream.
couchbackup --url "$SERVICE_URL" \
--db "animaldb" | gzip > backup.gz
In this example, the gzip tool accepts the backup data directly through its stdin, compresses the data, and emits it through stdout. The resulting compressed data stream is then redirected and written to a file called backup.gz.
If the database requires you to supply access credentials, use $SERVICE_URL with the form https://$USERNAME:$PASSWORD@$ACCOUNT, for example, https://myusername:mypassword@myhost.cloudant.com.
It's straightforward to extend the pipeline if you want to transform the data in other ways. For example, you might want to encrypt the data before it's written to disk. You might also want to write the data directly to an object store service by using its command-line tools.
Hourly or daily backups that use cron
The cron scheduling tool can be set up to take snapshots of data at regular intervals.
A useful starting point is to get couchbackup to write a single backup to a file, where the file name includes the current date and time, as shown in the following example:
couchbackup --url "https://$USERNAME:$PASSWORD@$ACCOUNT.cloudant.com" \
--db "animaldb" > animaldb-backup-`date -u "+%Y-%m-%dT%H:%M:%SZ"`.bak
After you check the command to ensure it works correctly, it can be entered into a 'cron job':
- Install the CouchBackup tools on the server that will run the backups.
- Create a folder to store the backups.
- Create a 'cron entry' that describes the frequency of the backup.
You can create a cron entry by using the crontab -e command. See your system documentation for specific details on the 'cron' options.
A cron entry that runs a daily backup looks similar to the following example:
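0 5 * * * couchbackup --url "https://$USERNAME:$PASSWORD@$ACCOUNT.cloudant.com" --db "animaldb" > /path/to/folder/animaldb-backup-`date -u "+\%Y-\%m-\%dT\%H:\%M:\%SZ"`.bak
This cron entry creates a daily backup at 05:00. Each % character in the date format is escaped as \% because cron otherwise treats an unescaped % as a line break. The $USERNAME, $PASSWORD, and $ACCOUNT variables must be defined in the crontab or replaced with literal values, because cron jobs typically don't inherit your interactive shell environment. You can modify the cron pattern to run hourly, daily, weekly, or monthly backups as needed.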
Using CouchBackup as a library
The couchbackup and couchrestore command-line tools are wrappers around a library that can be used in your own Node.js applications.
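As a minimal sketch of calling the library directly, the following function streams a backup through gzip into a file, mirroring the compression pipeline shown earlier. It assumes the backup(sourceUrl, targetStream, options, callback) form of the library call. The function name backupToGzipFile is hypothetical, and the couchbackup module is passed in as an argument, for example the result of require('@cloudant/couchbackup').

```javascript
// Sketch: back up a database to a gzip-compressed file by using the library.
const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream');

// couchbackup: the CouchBackup module, supplied by the caller (assumption:
// it exposes backup(sourceUrl, targetStream, options, callback)).
function backupToGzipFile(couchbackup, sourceUrl, outFile) {
  return new Promise((resolve, reject) => {
    const gzip = zlib.createGzip();

    // Compressed bytes flow from gzip into the file. The promise settles
    // only after the file has been fully written, or on the first error.
    pipeline(gzip, fs.createWriteStream(outFile), (err) => {
      if (err) {
        reject(err);
      } else {
        resolve();
      }
    });

    // The library writes the backup text into the gzip stream.
    couchbackup.backup(sourceUrl, gzip, { mode: 'full' }, (err) => {
      if (err) {
        gzip.destroy(err); // tears down the pipeline, which rejects the promise
        return;
      }
      gzip.end(); // no effect if the library has already ended the stream
    });
  });
}
```

Passing the module in as an argument keeps the sketch free of a hard dependency. In an application, you would more likely require the package once at the top of the file.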
The library is useful for more complicated scenarios, for example:
- Backing up several databases in one task. You might do this backup by identifying all the databases by using the _all_dbs call, then doing a backup of each database individually.
- Longer pipelines increase the risk of errors. By using the CouchBackup library, your application can detect and address any error at the earliest opportunity.
For more information, see the NPM page.
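The first scenario might be sketched as follows. The list of names is assumed to come from the _all_dbs call, which returns a JSON array of database names. Treating names that start with an underscore as system databases and skipping them is an assumption of this sketch. The helpers userDatabases, backupFileName, and backupAll are hypothetical names, and backupOne stands for whatever function performs a single backup, such as a wrapper around the library's backup call.

```javascript
// Sketch: back up every user database, one at a time.

// Keep only user databases. Names that start with "_" (for example
// _replicator or _users) are treated as system databases and skipped.
function userDatabases(allDbs) {
  return allDbs.filter((name) => !name.startsWith('_'));
}

// Build a file name that mirrors the date format of the cron example,
// for example animaldb-backup-2024-01-02T03:04:05Z.bak
function backupFileName(db, now = new Date()) {
  const stamp = now.toISOString().replace(/\.\d{3}Z$/, 'Z'); // drop milliseconds
  return `${db}-backup-${stamp}.bak`;
}

// Back up each database in turn. backupOne(db, fileName) is supplied by the
// caller and must return a Promise. A failure for one database doesn't stop
// the remaining backups; failures are collected and returned.
async function backupAll(allDbs, backupOne, now = new Date()) {
  const failures = [];
  for (const db of userDatabases(allDbs)) {
    try {
      await backupOne(db, backupFileName(db, now));
    } catch (err) {
      failures.push({ db, error: err });
    }
  }
  return failures;
}
```

Running the backups sequentially keeps the load on the account predictable. Returning the failures, rather than throwing on the first one, lets the application decide whether to retry or raise an alert.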
The following script sample shows how to combine the couchbackup library with use of IBM® Cloud Object Storage. This code illustrates how you might use the Cross Region S3 API to back up a database to an object store.
A prerequisite for the code is that you initialize the S3 client object for IBM Cloud Object Storage by following the instructions in IBM Cloud Object Storage - S3 API Intro.
const stream = require('stream');
const couchbackup = require('@cloudant/couchbackup');
// 's3-backup' is an arbitrary namespace for the debug logger.
const debug = require('debug')('s3-backup');

/*
  Backup directly from Cloudant to an S3 bucket via a stream.
  @param {string} sourceUrl - URL of the backup source database
  @param {object} s3Client - S3 client object
  @param {string} s3Bucket - Destination S3 bucket (must exist)
  @param {string} s3Key - Destination object's key (shouldn't exist)
  @param {boolean} shallow - Whether to use couchbackup's shallow mode
  @returns {Promise}
*/
function backupToS3(sourceUrl, s3Client, s3Bucket, s3Key, shallow) {
  return new Promise((resolve, reject) => {
    debug(`Setting up S3 upload to ${s3Bucket}/${s3Key}`);

    // A pass-through stream that has couchbackup's output
    // written to it and is then read by the S3 upload client.
    // It has a 10 MB internal buffer.
    const streamToUpload = new stream.PassThrough({ highWaterMark: 10485760 });

    // Set up S3 upload.
    const params = {
      Bucket: s3Bucket,
      Key: s3Key,
      Body: streamToUpload
    };
    s3Client.upload(params, function (err, data) {
      debug('S3 upload done');
      if (err) {
        debug(err);
        reject(new Error('S3 upload failed'));
        return;
      }
      debug('S3 upload succeeded');
      debug(data);
      resolve();
    }).on('httpUploadProgress', (progress) => {
      // progress is an object, so serialize it for logging.
      debug(`S3 upload progress: ${JSON.stringify(progress)}`);
    });

    debug(`Starting streaming data from ${sourceUrl}`);
    couchbackup.backup(
      sourceUrl,
      streamToUpload,
      // Pass the shallow flag through as couchbackup's mode option.
      { mode: shallow ? 'shallow' : 'full' },
      (err, obj) => {
        if (err) {
          debug(err);
          reject(new Error('CouchBackup failed with an error'));
          return;
        }
        debug(`Download from ${sourceUrl} complete.`);
        streamToUpload.end(); // must call end() to complete S3 upload.
        // resolve() is called by the S3 upload
      }
    );
  });
}
Other disaster recovery options
Return to the IBM Cloudant Disaster Recovery guide to find out about the other features IBM Cloudant offers for a full disaster recovery setup.