IBM Cloud Docs
High availability and disaster recovery

IBM Watson® Knowledge Studio is highly available within multiple IBM Cloud locations (for example, Dallas and Washington, DC). However, recovering from potential disasters that affect an entire location requires planning and preparation.

You are responsible for understanding your configuration, customization, and usage of the service. You are also responsible for being ready to re-create an instance of the service in a new location and to restore your data in any location. For more information, see "How do I ensure zero downtime?"

High availability

Knowledge Studio supports high availability with no single point of failure. The service achieves high availability automatically and transparently by using the multi-zone region (MZR) feature provided by IBM Cloud.

IBM Cloud enables multiple zones that do not share a single point of failure within a single location. It also provides automatic load balancing across the zones within a region.

Disaster recovery

Disaster recovery can become an issue if an IBM Cloud location experiences a significant failure that includes the potential loss of data. Because MZR is not available across locations, you must wait for IBM to bring a location back online if it becomes unavailable. If underlying data services are compromised by the failure, you must also wait for IBM to restore those data services.

If a catastrophic failure occurs, IBM might not be able to recover data from database backups. In this case, you need to restore your data to return your service instance to its most recent state. You can restore the data to the same or to a different location.

Your disaster recovery plan must cover identifying, preserving, and being prepared to restore all data that is maintained on IBM Cloud. This stored data includes the training data for your models.

Re-creating models from saved data takes time. You can maintain parallel service configurations in multiple locations to help eliminate the turnaround time associated with disaster recovery.
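If you maintain parallel configurations, a client needs some way to decide which location to use. The following is a minimal sketch of that selection logic, not part of the service itself. The location identifiers and the `is_healthy` probe are hypothetical placeholders; supply whatever availability check suits your deployment.

```python
from typing import Callable, Iterable, Optional


def pick_available_location(
    locations: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first location whose health probe succeeds, else None.

    `locations` is ordered by preference. `is_healthy` is a caller-supplied
    (hypothetical) probe; a probe that raises is treated as unavailable.
    """
    for location in locations:
        try:
            if is_healthy(location):
                return location
        except Exception:
            # A failing probe means this location cannot be used right now.
            continue
    return None


if __name__ == "__main__":
    # Hypothetical status map standing in for real health probes.
    status = {"dallas": False, "washington-dc": True}
    chosen = pick_available_location(
        ["dallas", "washington-dc"], lambda loc: status[loc]
    )
    print(chosen)
```

Ordering the list by preference keeps the primary location in use whenever it is reachable and falls back to the parallel configuration only when the probe fails.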

Disaster recovery for models

For models, disaster recovery has three parts: understand which data can be backed up, restore or re-create the necessary artifacts, and then retrain and redeploy the models.

Backing up data

Some data can be backed up, and some must be re-created. To back up your data:

  1. Understand which data can be backed up
  2. Prepare for backup
  3. Download artifacts from the current instance
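After you download artifacts, it can help to record what you saved so that you can confirm the backup is intact before you rely on it. The following sketch assumes only that the downloaded files sit in a local directory. It does not call any Knowledge Studio API, and the manifest file name is a hypothetical choice.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List

MANIFEST_NAME = "manifest.json"  # hypothetical file name for the checksum record


def build_manifest(backup_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to backup_dir) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(backup_dir.rglob("*")):
        # Skip directories and any file named like the manifest itself.
        if path.is_file() and path.name != MANIFEST_NAME:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(backup_dir).as_posix()] = digest
    return manifest


def write_manifest(backup_dir: Path) -> Path:
    """Write the manifest into backup_dir and return its path."""
    manifest_path = backup_dir / MANIFEST_NAME
    manifest_path.write_text(json.dumps(build_manifest(backup_dir), indent=2))
    return manifest_path


def verify_manifest(backup_dir: Path) -> List[str]:
    """Return the recorded paths that are now missing or have changed."""
    recorded = json.loads((backup_dir / MANIFEST_NAME).read_text())
    current = build_manifest(backup_dir)
    return sorted(
        name for name, digest in recorded.items() if current.get(name) != digest
    )
```

Run `write_manifest` right after downloading and `verify_manifest` before restoring. An empty list from `verify_manifest` means that every recorded file is still present and unchanged.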

Restoring data, models, and tasks

To recover from a disaster:

  1. Re-create workspaces on the new instance
  2. Restore the workspace data
  3. Restore the models
  4. Restore any incomplete annotation tasks
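Because each restore step depends on the one before it, a recovery runbook should run the steps in order and stop at the first failure. The following is a minimal sketch of that ordering logic only. The step names mirror the list above, and the actions are hypothetical placeholders for whatever manual or scripted procedure you use.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], None]]


def run_restore(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order and stop at the first one that raises.

    Returns (names of completed steps, name of the failed step or None).
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception:
            # Later steps depend on this one, so do not continue.
            return completed, name
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    def placeholder() -> None:
        """Hypothetical stand-in for a real restore action."""

    steps: List[Step] = [
        ("re-create workspaces", placeholder),
        ("restore workspace data", placeholder),
        ("restore models", placeholder),
        ("restore incomplete annotation tasks", placeholder),
    ]
    done, failed = run_restore(steps)
    print(done, failed)
```

Returning the completed steps together with the failed one lets you resume from the point of failure instead of repeating earlier work.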