Understanding high availability for Cloud Databases

This document covers all IBM Cloud® Databases offerings: Databases for PostgreSQL, Databases for MongoDB, Databases for Redis, Databases for Elasticsearch, Databases for MySQL, Messages for RabbitMQ, Databases for EnterpriseDB, and Databases for etcd.

Regions

IBM Cloud® Databases instances are deployed in either a multi-zone region (MZR), such as Dallas, Frankfurt, London, Sydney, Tokyo, or Washington, or a single-campus multizone region, such as Chennai. Each instance is deployed in a highly available configuration: each database replicates its data onto one or more servers, which keeps the data highly available during normal operations.

In MZRs, database members are distributed across different data centers, or zones.
In single-campus multizone regions, database members are distributed across different hosts.

If a single zone (data center) fails in an MZR, or a hardware failure occurs in any region, your data remains accessible because it is replicated onto other fully functioning database servers. IBM Cloud® specialists address such issues in place.

For more information on how your specific database replicates data among each of its members, see your Cloud Databases documentation.

Backups

In addition to the high-availability configuration, for deployments in IBM Cloud® multi-zone regions, your data is snapshotted and backed up daily by the IBM Cloud® Databases platform and stored in cross-region Cloud Object Storage buckets.
For most IBM Cloud® single-campus multizone regions, your data is backed up locally, in Cloud Object Storage buckets within the same single-campus multizone region.

If a complete region failure occurs, the database servers in the region might not be accessible, but the backup data remains available. You can initiate a restore from these backups into an available region from the service management console. For more information, see the Cloud Databases backups documentation.

You are responsible for creating a new service instance to restore into when the IBM Cloud® Databases platform is restored. You are also responsible for testing the validity and restore time of your backups. For more information, see the Disaster recovery section in the Shared responsibilities for Cloud Databases page.

Application-level high availability

Applications that communicate over networks and cloud services are subject to transient connection failures. Design your applications to retry connections when errors are caused by a temporary loss of connectivity to your deployment or to IBM Cloud.

Because Cloud Databases is a managed service, regular updates and database maintenance occur as part of normal operations. Such maintenance can occasionally cause short intervals during which your database is unavailable.

Your applications must be designed to handle temporary interruptions to the database, implement error handling for failed database commands, and implement retry logic to recover from a temporary interruption.
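The retry logic described above can be sketched as follows. This is a minimal illustration, not an IBM-provided implementation: the function name retry_with_backoff, its parameters, and the default delays are assumptions chosen for the example. It retries a callable on transient errors, waiting with exponential backoff and random jitter between attempts. The built-in ConnectionError and TimeoutError stand in for whatever connection error types your database driver raises.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       retryable=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation() and retry on transient errors.

    All names and defaults here are illustrative assumptions.
    Pass your driver's own connection error types as `retryable`.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                # Out of attempts: surface the error to the caller.
                raise
            # Exponential backoff capped at max_delay, with full jitter
            # so that many clients do not retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

An application would wrap each database call, for example retry_with_backoff(lambda: run_query(conn, sql)), where run_query is a hypothetical function of your own. Only retry operations that are safe to repeat, because a command might have been applied before the connection dropped.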

Database unavailability or connection interruptions that last several minutes are not expected. If you experience periods longer than a minute with no connectivity, open a support ticket with details so that we can investigate.
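To gather details for a support ticket, it can help to know exactly when connectivity was lost and for how long. The following sketch assumes your application records periodic connection probes as (timestamp, ok) pairs; the function name find_outages and the probe format are hypothetical, not part of any IBM Cloud tooling. It reports the windows in which probes failed for longer than one minute.

```python
from datetime import timedelta


def find_outages(probes, threshold=timedelta(minutes=1)):
    """Return (start, end) windows of failed probes longer than threshold.

    probes: iterable of (timestamp, ok) pairs sorted by timestamp,
    where timestamp is a datetime and ok is True for a successful probe.
    A failure run that is still ongoing at the end of the data is not reported.
    """
    outages = []
    start = None  # timestamp of the first failed probe in the current run
    for ts, ok in probes:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            # The run ends at the first successful probe after the failures.
            if ts - start > threshold:
                outages.append((start, ts))
            start = None
    return outages
```

The measured window runs from the first failed probe to the next successful one, so its precision depends on how often you probe.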

If you have deployments in more than one region, you must provision IBM Cloud® Monitoring and enable platform metrics in each region. For more information, see IBM Cloud Monitoring integration.

SLAs

See How IBM Cloud ensures high availability and disaster recovery to learn more about the high availability and disaster recovery standards in IBM Cloud.
All IBM Cloud® Databases general availability (GA) offerings conform to the IBM Cloud® Service Level Agreement (SLA) terms.
For more information, see the Responsibilities for Cloud Databases page.