Understanding high availability and disaster recovery for Cloud Databases
This document covers all the IBM Cloud® Databases, which include Databases for Elasticsearch, Databases for EnterpriseDB, Databases for etcd, Databases for MongoDB, Databases for PostgreSQL, Databases for Redis, Messages for RabbitMQ, and Databases for MySQL.
IBM Cloud® Databases instances are deployed in either a multi-zone region (MZR) (for example, Dallas, Frankfurt, London, Sydney, Tokyo, and Washington), or a single-campus multizone region (for example, Chennai). Each instance is deployed in a highly available configuration; that is, data is replicated by each database onto one or more servers, making the data highly available during normal operations.
- In MZRs, database members are distributed across different data centers, or zones.
- In single-campus multizone regions, database members are distributed across different hosts.
If a single-zone failure in an MZR or a hardware failure in any region occurs, your data is still accessible because it is replicated onto other fully functioning database servers. IBM Cloud® specialists address such issues in place.
You can consult your Cloud Databases documentation for more details on how your specific database replicates data among each of its members.
In addition to the high-availability configuration, for deployments in IBM Cloud multizone regions (MZRs), the IBM Cloud® Databases platform snapshots and backs up your data daily and stores the backups in cross-region Cloud Object Storage buckets. For most IBM Cloud single-campus multizone regions, your data is backed up locally in single-campus multizone region Cloud Object Storage buckets.
If a complete region failure occurs, the database servers in the region might not be accessible, but the backup data remains available. You can initiate a restore from these backups into an available region from the service management console. Consult your Cloud Databases backups page for more details.
When the IBM Cloud® Databases platform is restored, it is your responsibility to create a new service instance to restore your data into. You are also responsible for testing the validity and restore time of your backups. For more information, see Disaster recovery in the Responsibilities for Cloud Databases page.
Application-level high availability
Applications that communicate over networks and cloud services are subject to transient connection failures. Design your applications to retry connections when errors are caused by a temporary loss of connectivity to your deployment or to IBM Cloud.
Because Cloud Databases is a managed service, regular updates and database maintenance occur as part of normal operations. Such maintenance can occasionally cause short intervals during which your database is unavailable.
Your applications must be designed to handle temporary interruptions to the database, implement error handling for failed database commands, and implement retry logic to recover from a temporary interruption.
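The retry guidance above can be sketched in Python. This is a minimal, driver-agnostic sketch, not an IBM-provided API: `TransientDatabaseError` is a hypothetical stand-in for whatever exceptions your database driver raises on a dropped connection, and `retry_with_backoff` is an illustrative helper. It retries only transient errors, waits a capped, exponentially growing, randomly jittered delay between attempts, and re-raises once the attempts are exhausted so that your own error handling can take over.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientDatabaseError(Exception):
    """Hypothetical error type standing in for your driver's transient errors
    (for example, a connection dropped during a maintenance failover)."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on TransientDatabaseError with capped
    exponential backoff plus full jitter. Other exceptions propagate at once."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientDatabaseError:
            if attempt == max_attempts:
                raise  # Give up: surface the error to the caller's handler.
            # The delay ceiling doubles each attempt (0.5, 1, 2, 4, ...) up to
            # max_delay; sleeping a random fraction of it spreads out reconnects.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")  # The loop always returns or raises.


# Usage: an operation that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky_query() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientDatabaseError("connection reset")
    return "ok"


print(retry_with_backoff(flaky_query, sleep=lambda s: None))  # prints "ok"
```

Jitter spreads reconnect attempts from many clients so that they do not all reach the deployment at the same instant after a maintenance event. Map `TransientDatabaseError` to your driver's actual connection-loss exceptions, and retry only operations that are safe to run more than once.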
Several minutes of database unavailability or connection interruptions are not expected. If you experience periods longer than a minute with no connectivity, open a support ticket with details so that we can investigate.
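To know when an interruption crosses the one-minute mark described above, you can record the results of a periodic connectivity check and measure the longest gap. The following Python sketch assumes a hypothetical probe that records `(timestamp_seconds, succeeded)` pairs; `longest_outage` and `needs_support_ticket` are illustrative names, not part of any IBM Cloud tool. Because an outage is measured from the first failed probe rather than from the unknown moment connectivity was lost, the result is a lower bound.

```python
from typing import Iterable, Optional, Tuple

# Threshold taken from the guidance above: gaps longer than a minute warrant a ticket.
SUPPORT_THRESHOLD_SECONDS = 60.0


def longest_outage(probes: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest stretch, in seconds, with no successful probe.

    `probes` is a time-ordered sequence of (timestamp_seconds, succeeded)
    results from a hypothetical periodic connectivity check. An outage starts
    at the first failed probe and ends at the next successful one; an outage
    still open at the last probe is measured up to that last probe.
    """
    longest = 0.0
    outage_start: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, ok in probes:
        last_ts = ts
        if not ok and outage_start is None:
            outage_start = ts  # First failure of a new outage.
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)  # Outage just ended.
            outage_start = None
    if outage_start is not None and last_ts is not None:
        longest = max(longest, last_ts - outage_start)  # Still ongoing.
    return longest


def needs_support_ticket(probes: Iterable[Tuple[float, bool]]) -> bool:
    """True if any measured outage exceeds the one-minute threshold."""
    return longest_outage(probes) > SUPPORT_THRESHOLD_SECONDS


# Usage: probes every 15 s; failures from t=30 to t=90, recovery at t=105.
samples = [(0.0, True), (15.0, True), (30.0, False), (45.0, False),
           (60.0, False), (75.0, False), (90.0, False), (105.0, True)]
print(longest_outage(samples))        # prints 75.0
print(needs_support_ticket(samples))  # prints True
```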
If you have deployments in more than one region, you must provision IBM Cloud® Monitoring and enable platform metrics in each region. For more information, see your Cloud Databases deployment's IBM Cloud Monitoring page.
SLAs
See How do I ensure zero downtime? to learn more about the high availability and disaster recovery standards in IBM Cloud.
All IBM Cloud® Databases general availability (GA) offerings conform to the IBM Cloud® Service Level Agreement (SLA) terms.
For more information, see the Responsibilities for Cloud Databases page.