Using mirroring in a disaster recovery example scenario
This end-to-end disaster recovery scenario demonstrates how to use mirroring to provide increased availability and keep applications working should a major incident affects a full IBM Cloud region.
Two clusters were provisioned in different regions and configured for mirroring (following the information in Enabling mirroring) by using A and B as cluster aliases. A producer
publishes records to a topic called accounting.invoices
and a consumer reads the messages from that topic in cluster A.
Failing over producers
Perform the following steps to fail over:
- Stop the producers that were pointing to cluster A.
- If cluster A and the link from A to cluster B is still operational, ensure that as much data as possible is mirrored by checking that the lag on those topics on cluster B is zero.
- Restart the producers to point to cluster B's endpoints.
- Disable any mirroring that is still enabled on topics from cluster A to cluster B. This can be done by using the User controls.
The producer is now switched to cluster B and sends messages to a new local topic with the same name as the original.
Failing over consumers
Perform the following steps to fail over:
- If cluster A and the link from A to cluster B is still operational, allow the consumers to read all the message data in the topics on cluster A, and commit their offsets at the end of the topic.
- Stop the consumers that were pointing to cluster A.
- Restart the consumers to point to cluster B's endpoints.
The consumer is now able to continue to consume the existing messages from the accounting.invoices.A
topic from cluster B while new messages come from accounting.invoices
.
If the application requires strict ordering, remote topics are to be fully consumed first before starting to consume from local topics. This way, messages are processed in the order that they were produced.
Resetting a mirroring environment
The Event Streams service instance owner is responsible for deciding what happens when cluster A is recovered.
In case cluster A is not recoverable, the Event Streams service instance owner is responsible for enabling mirroring between cluster B and a newly provisioned instance. To enable mirroring to a new instance, complete the following steps:
- Fail over the producers and consumers from A to B as described previously.
- Disable the current mirroring from cluster A to cluster B. For more information, see Disabling mirroring.
- After mirroring has been disabled, enable mirroring between cluster B (now the source) to the newly provisioned cluster (the target). For more information, see Enabling mirroring.
Alternatively, if cluster A has recovered, typically a user returns operations to cluster A. Complete the following steps to return primary operations to cluster A.
Before failing back, mirroring must be enabled in the opposite direction:
- Ensure that cluster A is fully operational.
- Disable the current mirroring from cluster A to cluster B. For more information, see Disabling mirroring.
- After mirroring has been disabled, enable mirroring between cluster B (now the source) and cluster A (now the target). For more information, see Enabling mirroring.
- The source cluster A now becomes the target cluster.
- The target cluster B becomes the new source cluster.
- Enable any topics to be mirrored from cluster B to cluster A. You can do this by using the User controls.
Next, make sure that data is being replicated into cluster A by examining the topics from cluster B appearing on cluster A. These topics have the suffix from the new source cluster, B.
Do not mirror back the original target topic on cluster B as that would cause an undesirable cyclic effect. As shown in the diagram, we mirror accounting.invoices from cluster B to cluster A, not accounting.invoices.A.
Failback
The decision to fail back is again owned by the Event Streams instance owner. The service instance owner must coordinate the failback of applications including their reconfiguration, redeployment, and restart if necessary.
Unlike the failover case, in this case there was no disaster on cluster B. Therefore, failback is a controlled operation and can be achieved with minimal data loss or reprocessing of data.
Finally, switch the mirroring back to the original configuration, which means that cluster A is again the source and cluster B resumes as the target.
- Ensure that cluster A and cluster B are fully operational.
- Disable the current mirroring from cluster B to cluster A. For more information, see Disabling mirroring.
- After mirroring has been disabled, enable mirroring between cluster A (now the source) and cluster B (now the target). For more information, see Enabling mirroring.