Understanding high availability and disaster recovery for VCF as a Service

End of Marketing: As of 31 October 2025, new deployments of VMware Solutions offerings are no longer available for new customers. Existing customers can still use and expand their active VMware® workloads on IBM Cloud®. For more information, see End of Marketing for VMware on IBM Cloud.

High availability (HA) is the ability of a service or workload to withstand unexpected failures and remain operational and accessible according to a predefined service level. Disaster recovery (DR) is the process of recovering the service instance to a working state after a rare, major incident or wide-scale failure, such as a physical disaster that affects an entire region, corruption of a database, or the loss of a service that contributes to a workload. In these cases, the impact exceeds what the high availability design can handle.

IBM Cloud for VMware Cloud Foundation as a Service is a highly available regional service that is designed for availability during a zonal outage. VMware Cloud Foundation as a Service is designed to meet the Service Level Objectives (SLO) with the Standard plan.

For more information about the available region and data center locations, see Service and infrastructure availability by location.

High availability

High availability features

VMware Cloud Foundation as a Service supports the following high availability features:

HA features for VMware Cloud Foundation as a Service
Feature	Description	Consideration
Compute regional HA	Ensures workload uptime by maintaining resources to run workloads across two zones. Workloads migrate to the healthy zone when there is a zonal failure.	Available in the Washington DC region for both networking and compute resources.
Network regional HA	Ensures that workloads maintain networking durability across zonal failures.	Deploy the HA edge on a stretched resource pool or consolidate your network across two resource pools in a multizone region.
Swap locations	Swap primary and secondary network locations for a highly available network edge.	The primary location is the preferred location for your workloads. When there is an outage at the primary location, the secondary location temporarily becomes the active location.

Disaster recovery

Disaster recovery features

VMware Cloud Foundation as a Service supports the following disaster recovery features:

DR features for VMware Cloud Foundation as a Service
Feature	Description	Consideration
VMware Cloud Director Availability	Replicate workloads from a source VMware Cloud Foundation as a Service environment to a second VMware Cloud Foundation as a Service environment. You can replicate source workloads to any VMware environment, including IBM Cloud, other cloud vendors, and on-premises.	Included by default in all multitenant virtual data centers (VDCs) and optionally included in your single-tenant Cloud Director site order.
Veeam® Backup	Achieve cyber-secure recovery points for your applications and data.	Service charges are incurred only if you choose to include the service in your order.

Planning for disaster recovery

The DR steps must be practiced regularly. As you build your plan, consider the following failure scenarios and resolutions.

DR scenarios for VMware Cloud Foundation as a Service
Failure	Resolution
Hardware failure (single point)	As a managed service, VMware Cloud Foundation as a Service has an operations team that manages the hardware and ensures enough hardware capacity to maintain VMware workload stability through hardware failures.
Zone failure	VMware Cloud Foundation as a Service management components that support each customer VMware environment are run cross-zone as the default configuration and are resilient if a single zone failure occurs, including the VMware Director Console. Customer workloads must use one of the HA solutions or VMware Cloud Director Availability DR for cross-zone resilience.
Data corruption	All management components that support customer VMware environments are backed up at intervals and can be restored by the IBM operations team if corrupted. It is recommended that customers use Veeam with VMware Cloud Foundation as a Service to create regular backups. With regular backups, customers can recover their data through self-service.
Regional failure	Neither VMware Cloud Foundation as a Service management components nor workloads are resilient across regions. If the full region fails, an approach to run workloads in an alternative region is required. A disaster recovery solution such as VMware Cloud Director Availability is required to maintain availability.

Your responsibilities for HA and DR

It is your responsibility to continuously test your plan for HA and DR.

Interruptions in network connectivity and short periods of unavailability of a service might occur. It is your responsibility to make sure that application source code includes client availability retry logic to maintain high availability of the application.
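
The client retry logic that is mentioned above can take many forms. The following Python example is a minimal sketch of one common approach, retrying with exponential backoff and jitter. The function name `call_with_retry`, its parameters, and the default values are assumptions chosen for illustration and are not part of any IBM or VMware API. Adjust the retryable exception types to match the client library that your application uses.

```python
import random
import time


def call_with_retry(operation, max_attempts=5, base_delay=0.5, max_delay=8.0,
                    retryable=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff.

    All names and defaults here are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff capped at max_delay, with full jitter so that
            # many clients do not retry in lockstep after a short outage.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

Wrap only idempotent operations this way, for example `call_with_retry(lambda: fetch_status())`, where `fetch_status` is a hypothetical function in your application. Retrying a non-idempotent request can apply the same change twice.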

For more information about responsibility ownership between you and IBM Cloud for VMware Cloud Foundation as a Service, see Shared responsibilities for using IBM Cloud products.

For more information about your responsibilities, see Understanding your responsibilities when using VMware Cloud Foundation as a Service.

Recovery time objective (RTO) and recovery point objective (RPO)

IBM Cloud has business continuity plans in place to provide for the recovery of services within hours if a disaster occurs. Business continuity is the capability of a business to withstand outages and to operate mission-critical services normally and without interruption in accordance with predefined service-level agreements. You are responsible for backing up the data of your deployed workloads and for the associated recovery of your content.

VMware Cloud Foundation as a Service provides mechanisms to protect your data and restore service functions. Business continuity plans are in place to achieve the targeted recovery point objective (RPO) and recovery time objective (RTO) for the service. In disaster recovery planning, the RPO is the point in time to which data is restored, measured as the time (seconds, minutes, or hours) between the recovered instance and the point of disaster. The RTO is the duration of time for a business process to be restored after a disaster. The following table outlines the targets for VMware Cloud Foundation as a Service.

RPO and RTO for VMware Cloud Foundation as a Service
Disaster recovery objective	Target value	Method
RPO	24 h	Use a backup provider such as Veeam Backup and Recovery to store periodic backups of your workload.
RPO	Minutes	Use a replication provider such as Veeam to replicate your workload to another location.
RTO	Minutes to hours	The recovery time objective depends on the storage medium that is used for your backups and on how long it takes for your workload to be ready from a cold start.
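
To make the RPO targets in the table concrete, the following Python example is a minimal sketch that compares the age of the most recent recovery point against an RPO target. The function names and the timestamps are hypothetical and used only for illustration. The code does not query Veeam or any other backup provider, so you must supply the timestamp of your last successful backup or replication yourself.

```python
from datetime import datetime, timedelta, timezone


def data_loss_window(last_recovery_point, failure_time):
    """Return how much data would be lost if a failure occurred at failure_time."""
    if last_recovery_point > failure_time:
        raise ValueError("recovery point is later than the failure time")
    return failure_time - last_recovery_point


def meets_rpo(last_recovery_point, failure_time, rpo):
    """Return True when the data-loss window is within the RPO target."""
    return data_loss_window(last_recovery_point, failure_time) <= rpo


# Illustrative timestamps (assumptions, not real backup data).
last_backup = datetime(2025, 1, 1, 2, 0, tzinfo=timezone.utc)
failure = datetime(2025, 1, 1, 20, 0, tzinfo=timezone.utc)

# 18 hours since the last backup is within a 24 h RPO target,
# but far outside an RPO target of 15 minutes.
within_daily_rpo = meets_rpo(last_backup, failure, timedelta(hours=24))
within_minutes_rpo = meets_rpo(last_backup, failure, timedelta(minutes=15))
```

A check like this can run on a schedule to raise an alert when the latest recovery point is older than the target that you planned for.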

Change management

Change management includes tasks such as upgrades, configuration changes, and deletion.

Consider creating a manual backup before you upgrade to a new version of VMware Cloud Foundation as a Service.

Grant users and processes the IAM roles and actions with the least privilege that is required for their work. For more information, see How can I prevent accidental deletion of services?.

How IBM supports disaster recovery planning

IBM takes specific recovery actions for VMware Cloud Foundation as a Service if a disaster occurs. These actions and disaster response are practiced and validated yearly.

How IBM recovers from zone failures

If a zone failure occurs, IBM resolves the zone outage. When the zone is restored, the global load balancer automatically resumes routing traffic to the restored instance without customer action.

How IBM recovers from regional failures

If regional data for the management components remains intact, the service instance is restored to its previous state with the same configuration.

If regional state for the management components is corrupted, the service is restored from the last internal backups that are stored in a cross-region IBM Cloud Object Storage bucket. This type of corruption can result in 24 hours of data loss.

After the management components are restored, the customer must restore any workloads that the customer deployed by using either the Veeam Backup and Recovery solution or VMware Cloud Director Availability.

If IBM can’t restore the service instance, you must restore the service as described in the Disaster recovery architecture.

How IBM maintains services

All upgrades follow IBM service best practices, including recovery plans and rollback processes. Regular maintenance might cause short interruptions, which are mitigated by client availability retry logic. Changes are rolled out sequentially, region by region, and zone by zone within a region. IBM reverts updates at the first sign of a defect.

Complex changes are enabled and disabled with feature flags to control exposure.

Changes that impact customer workloads are detailed in IBM Cloud notifications. For more information about planned maintenance, announcements, and release notes that impact this service, see Viewing cloud status.