Configuring SAP HANA Active/Active (Read Enabled) System Replication in a RHEL HA Add-On Cluster
The following information describes the configuration of a Red Hat Enterprise Linux (RHEL) HA Add-On cluster for managing SAP HANA® Active/Active (read enabled) System Replication. The cluster uses virtual server instances in IBM® Power® Virtual Server as cluster nodes.
In an Active/Active (read enabled) configuration, SAP HANA system replication allows read access to the database content on the secondary system.
This information is intended for architects and specialists who are planning a high-availability deployment of SAP HANA on Power Virtual Server.
Before you begin
Review the general requirements, product documentation, support articles, and SAP notes listed in Implementing High Availability for SAP Applications on IBM Power Virtual Server References.
Prerequisites
- A Red Hat High Availability cluster is deployed on two virtual server instances in Power Virtual Server.
- Install and set up the RHEL HA Add-On cluster according to Implementing a RHEL HA Add-On Cluster on IBM Power Virtual Server.
- Configure and verify fencing as described in the preceding document.
- The virtual server instances must meet the hardware and resource requirements of the SAP HANA systems in scope. Follow the guidelines in the Planning your deployment document.
- The hostnames of the virtual server instances must meet the SAP HANA requirements (see the check after this list).
- SAP HANA is installed on both virtual server instances and SAP HANA System Replication is configured. The installation of SAP HANA and setup of SAP HANA System Replication is not specific to the Power Virtual Server environment, and you need to follow the standard procedures.
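As a convenience, the following sketch checks the short hostname against commonly cited SAP naming rules (lowercase letters, digits, and hyphens, starting with a letter, at most 13 characters). These rules are an assumption here; verify the exact requirements for your SAP HANA release in the SAP documentation and notes.
# Hypothetical pre-check of the short hostname against assumed SAP naming rules
HOST=$(hostname -s)
if [ ${#HOST} -le 13 ] && echo "$HOST" | grep -Eq '^[a-z][a-z0-9-]*$'; then
    echo "Hostname ${HOST} looks compliant"
else
    echo "Hostname ${HOST} violates the assumed naming rules"
fi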
Setting up the Active/Active (read enabled) scenario
The Active/Active (read enabled) system replication scenario is an extension of the setup that is described in Configuring SAP HANA Scale-Up System Replication in a RHEL HA Add-On Cluster.
Complete the setup for the production system System Replication cluster before you continue with the following steps.
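Before you extend the configuration, it is worth confirming that the base System Replication cluster is healthy. A minimal pre-check with the status command that is used throughout this topic:
pcs status --full
Verify that the fencing resource is started and that the SAPHana clone set reports one Master and one Slave instance before you continue.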
Changing the system replication operation mode to Active/Active (read enabled)
On the node that is running the secondary instance, complete the following steps to change the operation mode.
- Put the cluster in maintenance mode.
pcs property set maintenance-mode=true
- Stop the secondary SAP HANA instance.
sudo -i -u ${sid}adm -- HDB stop
- Change the system replication operation mode.
sudo -i -u ${sid}adm -- \
hdbnsutil -sr_changeOperationMode --mode=logreplay_readaccess
- Start the secondary SAP HANA instance.
sudo -i -u ${sid}adm -- HDB start
- Remove the cluster from maintenance mode.
pcs property set maintenance-mode=false
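To confirm that the change took effect, you can query the replication state on the secondary node. A quick check with hdbnsutil, which is also used in the recovery procedures later in this topic; on current SAP HANA 2.0 releases, the output includes an operation mode line that must now show logreplay_readaccess.
sudo -i -u ${sid}adm -- \
hdbnsutil -sr_state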
Configuring cluster resources for an Active/Active (read enabled) scenario
Use the following information to configure the additional cluster resources that are required for an Active/Active (read enabled) scenario.
Creating a secondary virtual IP address resource
Review the information in Reserving virtual IP addresses and reserve a virtual IP address for the secondary.
Use the reserved IP address to create a virtual IP address resource. This virtual IP address allows clients to connect to the secondary HANA instance for read-only queries.
On a cluster node, assign the reserved IP address to a VIP_SECONDARY environment variable and create the virtual IP address cluster resource by running the following commands.
export VIP_SECONDARY=<reserved IP address for SAP HANA secondary>
echo $VIP_SECONDARY
pcs resource create vip_s_${SID}_${INSTNO} IPaddr2 ip=$VIP_SECONDARY
Check the configured virtual IP address and the cluster status.
pcs resource config vip_s_${SID}_${INSTNO}
pcs status --full
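As an extra sanity check, you can confirm that the address is actually plumbed on an interface of the node where the resource started. A minimal sketch, assuming that the VIP_SECONDARY variable is still exported in your shell:
ip -brief address show | grep "$VIP_SECONDARY"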
Creating location constraints for the secondary virtual IP address
Create a cluster constraint to make sure that the secondary virtual IP address is placed on the cluster node that is running the secondary instance.
On a cluster node, run the following commands. Note that the node attribute names embed the SAP system ID in lowercase (${sid}), while the resource name uses the uppercase ${SID}.
pcs constraint location vip_s_${SID}_${INSTNO} rule \
score=INFINITY hana_${sid}_sync_state eq SOK \
and hana_${sid}_roles eq 4:S:master1:master:worker:master
pcs constraint location vip_s_${SID}_${INSTNO} rule \
score=2000 hana_${sid}_sync_state eq PRIM \
and hana_${sid}_roles eq 4:P:master1:master:worker:master
These location constraints establish the following behavior for the secondary virtual IP address resource:
- If both SAP HANA primary and SAP HANA secondary are available, and the SAP HANA system replication state is SOK, the secondary virtual IP address is assigned to the node where the SAP HANA secondary is active.
- If the SAP HANA secondary node is not available, or the SAP HANA system replication state is not SOK, the secondary virtual IP address is assigned to the node where the SAP HANA primary is active. When the SAP HANA secondary becomes available and the system replication state is SOK again, the secondary virtual IP address moves back to the node where the SAP HANA secondary is active.
- If the SAP HANA primary or the node where it is running becomes unavailable, the SAP HANA secondary takes over the primary role. The secondary virtual IP address remains on that node until the other node takes over the secondary role and the system replication state is SOK again.
This behavior maximizes the time that the secondary virtual IP resource is assigned to a node where a healthy SAP HANA instance is running.
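The rules evaluate the hana_${sid}_sync_state and hana_${sid}_roles node attributes that the SAPHana resource agent maintains. While you test the behavior, the following convenience sketch watches these attributes, assuming that the SID variable is exported (for example, SID=H4S):
sid=$(echo "$SID" | tr '[:upper:]' '[:lower:]')
watch -n 5 "pcs status --full | grep -e hana_${sid}_sync_state -e hana_${sid}_roles"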
The cluster configuration for the Active/Active (read enabled) scenario is complete.
Checking the cluster configuration
On a cluster node, run the following command to check the status of the cluster resources.
pcs status --full
Sample output:
# pcs status --full
Cluster name: H4S_cluster
Cluster Summary:
* Stack: corosync
* Current DC: cl-h4s-1 (1) (version 2.0.5-9.el8_4.5-ba59be7122) - partition with quorum
* Last updated: Mon Jul 31 11:46:11 2023
* Last change: Mon Jul 31 11:44:34 2023 by root via crm_attribute on cl-h4s-1
* 2 nodes configured
* 7 resource instances configured
Node List:
* Online: [ cl-h4s-1 (1) cl-h4s-2 (2) ]
Full List of Resources:
* res_fence_ibm_powervs (stonith:fence_ibm_powervs): Started cl-h4s-1
* vip_H4S_00_primary (ocf::heartbeat:IPaddr2): Started cl-h4s-1
* Clone Set: SAPHanaTopology_H4S_00-clone [SAPHanaTopology_H4S_00]:
* SAPHanaTopology_H4S_00 (ocf::heartbeat:SAPHanaTopology): Started cl-h4s-2
* SAPHanaTopology_H4S_00 (ocf::heartbeat:SAPHanaTopology): Started cl-h4s-1
* Clone Set: SAPHana_H4S_00-clone [SAPHana_H4S_00] (promotable):
* SAPHana_H4S_00 (ocf::heartbeat:SAPHana): Slave cl-h4s-2
* SAPHana_H4S_00 (ocf::heartbeat:SAPHana): Master cl-h4s-1
* vip_s_H4S_00 (ocf::heartbeat:IPaddr2): Started cl-h4s-2
Node Attributes:
* Node: cl-h4s-1 (1):
* hana_h4s_clone_state : PROMOTED
* hana_h4s_op_mode : logreplay_readaccess
* hana_h4s_remoteHost : cl-h4s-2
* hana_h4s_roles : 4:P:master1:master:worker:master
* hana_h4s_site : SiteA
* hana_h4s_srmode : syncmem
* hana_h4s_sync_state : PRIM
* hana_h4s_version : 2.00.070.00.1679989823
* hana_h4s_vhost : cl-h4s-1
* lpa_h4s_lpt : 1690796675
* master-SAPHana_H4S_00 : 150
* Node: cl-h4s-2 (2):
* hana_h4s_clone_state : DEMOTED
* hana_h4s_op_mode : logreplay_readaccess
* hana_h4s_remoteHost : cl-h4s-1
* hana_h4s_roles : 4:S:master1:master:worker:master
* hana_h4s_site : SiteB
* hana_h4s_srmode : syncmem
* hana_h4s_sync_state : SOK
* hana_h4s_version : 2.00.070.00.1679989823
* hana_h4s_vhost : cl-h4s-2
* lpa_h4s_lpt : 30
* master-SAPHana_H4S_00 : 100
Migration Summary:
Tickets:
PCSD Status:
cl-h4s-1: Online
cl-h4s-2: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
On a cluster node, run the following command to check the defined constraints.
pcs constraint --full
Sample output:
# pcs constraint --full
Location Constraints:
Resource: vip_s_H4S_00
Constraint: location-vip_s_H4S_00
Rule: boolean-op=and score=INFINITY (id:location-vip_s_H4S_00-rule)
Expression: hana_h4s_sync_state eq SOK (id:location-vip_s_H4S_00-rule-expr)
Expression: hana_h4s_roles eq 4:S:master1:master:worker:master (id:location-vip_s_H4S_00-rule-expr-1)
Constraint: location-vip_s_H4S_00-1
Rule: boolean-op=and score=2000 (id:location-vip_s_H4S_00-1-rule)
Expression: hana_h4s_sync_state eq PRIM (id:location-vip_s_H4S_00-1-rule-expr)
Expression: hana_h4s_roles eq 4:P:master1:master:worker:master (id:location-vip_s_H4S_00-1-rule-expr-1)
Ordering Constraints:
promote SAPHana_H4S_00-clone then start vip_H4S_00_primary (kind:Mandatory) (id:order-SAPHana_H4S_00-clone-vip_H4S_00_primary-mandatory)
start SAPHanaTopology_H4S_00-clone then start SAPHana_H4S_00-clone (kind:Mandatory) (non-symmetrical) (id:order-SAPHanaTopology_H4S_00-clone-SAPHana_H4S_00-clone-mandatory)
Colocation Constraints:
vip_H4S_00_primary with SAPHana_H4S_00-clone (score:2000) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-vip_H4S_00_primary-SAPHana_H4S_00-clone-2000)
Ticket Constraints:
Checking access to the read enabled secondary SAP HANA instance
You can use SAP HANA system replication Active/Active (read enabled) to connect to the secondary system for improved overall performance. Two connection methods are available to access the read enabled secondary HANA instance:
- Explicit read-only connection: the application opens an explicit connection to the secondary HANA instance.
- Hint-based statement routing: an application, for example SAP S/4HANA, opens a connection to the primary HANA instance. On this connection, SQL statements with system replication-specific hints are first prepared, and then executed. During execution, the SQL statements are automatically routed to the secondary system and processed there. For more information about hints, see the SAP HANA SQL and System Views Reference Guide.
Set the following two environment variables to the virtual IP addresses for the SAP HANA primary and secondary.
export VIP_PRIMARY=<virtual IP address of SAP HANA primary>
export VIP_SECONDARY=<virtual IP address of SAP HANA secondary>
The commands in the following two sections prompt for the password of the SAP HANA SYSTEM user. The command output shows the hostname and the IP addresses of the SAP HANA system that ran the SQL statement.
Checking access by using an explicit read-only connection
Verify the connection to the secondary instance by using an explicit read-only connection.
On a cluster node, run the following command.
sudo -i -u ${sid}adm -- \
hdbsql -n $VIP_SECONDARY -i $INSTNO -d SYSTEMDB -u SYSTEM \
"select * from m_host_information \
where key = 'net_hostnames' or key = 'net_ip_addresses'"
The sample output shows that the statement ran on the SAP HANA secondary.
HOST,KEY,VALUE
"cl-h4s-2","net_hostnames","cl-h4s-2"
"cl-h4s-2","net_ip_addresses","10.40.10.132,10.40.10.211"
2 rows selected (overall time 7518 usec; server time 291 usec)
Checking access by using hint-based statement routing
Verify the connection to the secondary instance by using hint-based statement routing.
- Run a connection test by using an explicit connection to the SAP HANA primary without an SQL hint.
  On a cluster node, run the following command.
  sudo -i -u ${sid}adm -- \
  hdbsql -n $VIP_PRIMARY -i $INSTNO -d SYSTEMDB -u SYSTEM \
  "select * from m_host_information \
  where key = 'net_hostnames' or key = 'net_ip_addresses'"
  The sample output shows that the statement ran on the SAP HANA primary.
  HOST,KEY,VALUE
  "cl-h4s-1","net_hostnames","cl-h4s-1"
  "cl-h4s-1","net_ip_addresses","10.40.10.162,10.40.10.201"
  2 rows selected (overall time 5239 usec; server time 361 usec)
- Run a connection test by using an explicit connection to the SAP HANA primary and the result_lag SQL hint.
  On a cluster node, run the following command.
  sudo -i -u ${sid}adm -- \
  hdbsql -n $VIP_PRIMARY -i $INSTNO -d SYSTEMDB -u SYSTEM \
  "select * from m_host_information \
  where key = 'net_hostnames' or key = 'net_ip_addresses' \
  with hint(result_lag('hana_sr'))"
  The sample output shows that the statement ran on the SAP HANA secondary.
  HOST,KEY,VALUE
  "cl-h4s-2","net_hostnames","cl-h4s-2"
  "cl-h4s-2","net_ip_addresses","10.40.10.132,10.40.10.211"
  2 rows selected (overall time 40.722 msec; server time 16.428 msec)
Enabling the automated registration of the secondary instance
You need to set the parameter AUTOMATED_REGISTER
according to your operational requirements. If you want to keep the ability to revert to the state of the previous primary SAP HANA instance, then AUTOMATED_REGISTER=false
avoids an automatic registration of the previous primary as a new secondary.
If you experience an issue with the data after a takeover that was triggered by the cluster, you can manually revert if AUTOMATED_REGISTER
is set to false
.
If AUTOMATED_REGISTER
is set to true
, the previous primary SAP HANA instance automatically registers as secondary, and cannot be activated on its previous history. The advantage of the setting AUTOMATED_REGISTER=true
is that high-availability automatically reestablishes after the failed node reappears in the cluster.
For now, it is recommended to keep AUTOMATED_REGISTER
on default value false
until the cluster is fully tested and that you verify that the failover scenarios work as expected.
The pcs resource update
command is used to modify resource attributes and pcs resource update SAPHana_${SID}_${INSTNO} AUTOMATED_REGISTER=true
sets the attribute to true
.
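To review the current setting before or after such an update, you can filter the resource attributes; a quick check that mirrors the verification step in Test 2 below.
pcs resource config SAPHana_${SID}_${INSTNO} | grep AUTOMATED_REGISTER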
Testing the SAP HANA System Replication cluster
It is important to thoroughly test the cluster configuration to make sure that the cluster is working correctly. The following information provides a few sample failover test scenarios, but is not a complete list of test scenarios.
The description of each test case includes the following information.
- Component that is being tested
- Description of the test
- Prerequisites and the cluster state before you start the failover test
- Test procedure
- Expected behavior and results
- Recovery procedure
Test1 - Testing failure of the primary database instance
Use the following information to test the failure of the primary database instance.
Test1 - Description
Simulate a crash of the primary SAP HANA database instance that is running on NODE1.
Test1 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for SAP HANA system replication.
- Both cluster nodes are active.
- The cluster is started on NODE1 and NODE2.
- Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=false.
- Check the SAP HANA System Replication status (see the status check after this list):
- Primary SAP HANA database is running on NODE1
- Secondary SAP HANA database is running on NODE2
- SAP HANA System Replication is activated and in sync
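One way to check the replication status is the systemReplicationStatus.py script that ships with SAP HANA; run it on the node that hosts the primary instance. The sketch assumes that the HDBSettings.sh wrapper is on the instance administrator's path, which is the case for a standard installation.
sudo -i -u ${sid}adm -- \
HDBSettings.sh systemReplicationStatus.py
The overall system replication status must report ACTIVE before you start the test.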
Test1 - Test procedure
Crash the SAP HANA primary by sending a SIGKILL signal as the user ${sid}adm.
On NODE1, run the following command.
sudo -i -u ${sid}adm -- HDB kill-9
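To observe the takeover while it happens, you can watch the cluster state from the other node; a convenience sketch:
watch -n 5 'pcs status --full'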
Test1 - Expected behavior
- SAP HANA primary instance on NODE1 crashes.
- The cluster detects the stopped primary SAP HANA database and marks the resource as failed.
- The cluster promotes the secondary SAP HANA database on NODE2 to take over as the new primary.
- The cluster releases the virtual IP address on NODE1, and acquires it on the new primary on NODE2.
- After the takeover, the secondary SAP HANA instance is unavailable and the secondary virtual IP address stays on NODE2.
- If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.
Test1 - Recovery procedure
Because the cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=false, the cluster doesn't restart the failed SAP HANA database and doesn't register it against the new primary. As a result, the status on the new primary (NODE2) shows the secondary in status 'CONNECTION TIMEOUT'.
To reregister the previous primary as a new secondary, use the following commands.
On NODE1, run the following command.
sudo -i -u ${sid}adm -- \
hdbnsutil -sr_register \
--name=${DC1} \
--remoteHost=${NODE2} \
--remoteInstance=00 \
--replicationMode=sync \
--operationMode=logreplay_readaccess \
--online
Verify the system replication status:
sudo -i -u ${sid}adm -- \
hdbnsutil -sr_state
On a cluster node, run the following command to refresh the cluster resource. This command starts the secondary instance.
pcs resource refresh SAPHana_${SID}_${INSTNO}
When the secondary reaches the synced state (SOK), the secondary virtual IP address moves to NODE1.
On a cluster node, run the following command to check the cluster status.
pcs status --full
Test2 - Testing failure of the node that is running the primary database
Use the following information to test the failure of the node that is running the primary database.
Test2 - Description
Simulate a crash of the node that is running the primary SAP HANA database.
Test2 - Preparation
Make sure that the cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
On NODE1, run the following command.
pcs resource update SAPHana_${SID}_${INSTNO} AUTOMATED_REGISTER=true
Verify the AUTOMATED_REGISTER
setting in the resource configuration.
pcs resource config SAPHana_${SID}_${INSTNO} | grep Attributes
Test2 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for SAP HANA system replication.
- Both nodes are active.
- Cluster is started on NODE1 and NODE2.
- Check SAP HANA System Replication status.
- Primary SAP HANA database is running on NODE2
- Secondary SAP HANA database is running on NODE1
- SAP HANA System Replication is activated and in sync
- Secondary virtual IP address is active on NODE1
Test2 - Test procedure
Crash NODE2, which is running the primary database, by sending a sysrq shutoff request.
On NODE2, run the following command.
sync; echo o > /proc/sysrq-trigger
Test2 - Expected behavior
- NODE2 shuts down.
- The cluster detects the failed node and sets its state to OFFLINE.
- The cluster promotes the secondary SAP HANA database on NODE1 to take over as the new primary.
- The cluster acquires the virtual IP address on the new primary on NODE1.
- After the takeover, the secondary SAP HANA instance is unavailable and the secondary virtual IP address stays on NODE1.
- If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.
Test2 - Recovery procedure
Log in to the IBM Cloud® Console and start the NODE2 instance. Wait until NODE2 is available again, then restart the cluster framework.
On NODE2, run the following command.
pcs cluster start
pcs status --full
Because the cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true, SAP HANA restarts when NODE2 rejoins the cluster, and the former primary reregisters as a secondary. When the secondary reaches the synced state (SOK), the secondary virtual IP address moves to NODE2.
Test3 - Testing the failure of the secondary database instance
Use the following information to test the failure of the secondary database instance.
Test3 - Description
Simulate a crash of the secondary SAP HANA database.
Test3 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for SAP HANA system replication.
- Both nodes are active.
- Cluster is started on NODE1 and NODE2.
- Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
- Check SAP HANA System Replication status:
- Primary SAP HANA database is running on NODE1
- Secondary SAP HANA database is running on NODE2
- SAP HANA System Replication is activated and in sync
- Secondary virtual IP address is active on NODE2
Test3 - Test procedure
Crash the SAP HANA secondary by sending a SIGKILL signal as the user ${sid}adm.
On NODE2, run the following command.
sudo -i -u ${sid}adm -- HDB kill-9
Test3 - Expected behavior
- SAP HANA secondary on NODE2 crashes.
- The cluster detects the stopped secondary SAP HANA database and marks the resource as failed.
- The cluster moves the secondary virtual IP address to NODE1.
- The cluster restarts the secondary SAP HANA database.
- The cluster detects that the system replication is in sync again.
- The cluster moves the secondary virtual IP address back to NODE2.
Test3 - Recovery procedure
Wait until the secondary SAP HANA instance starts and syncs again (SOK), then clean up the failed resource actions that are shown in pcs status.
On a cluster node, run the following commands.
pcs resource refresh SAPHana_${SID}_${INSTNO}
pcs status --full
Test4 - Testing the manual move of a SAPHana resource to another node
Use the following information to test the manual move of a SAPHana resource to another node.
Test4 - Description
Use cluster commands to move the primary instance to the other node for maintenance purposes.
Test4 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for SAP HANA system replication.
- Both nodes are active.
- Cluster is started on NODE1 and NODE2.
- Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
- Check SAP HANA System Replication status:
- Primary SAP HANA database is running on NODE1
- Secondary SAP HANA database is running on NODE2
- SAP HANA System Replication is activated and in sync
- Secondary virtual IP address is active on NODE2
Test4 - Test procedure
Move the SAP HANA primary to the other node by using the pcs resource move command.
On a cluster node, run the following command.
pcs resource move SAPHana_${SID}_${INSTNO}-clone
Sample output:
# pcs resource move SAPHana_H4S_00-clone
Warning: Creating location constraint 'cli-ban-SAPHana_H4S_00-clone-on-cl-hdb-1' with a score of -INFINITY for resource SAPHana_H4S_00-clone on cl-hdb-1.
This will prevent SAPHana_H4S_00-clone from running on cl-hdb-1 until the constraint is removed
This will be the case even if cl-hdb-1 is the last node in the cluster
Test4 - Expected behavior
- The cluster creates location constraints to move the resource.
- The cluster triggers a takeover to the secondary SAP HANA database.
- The secondary virtual IP address stays on NODE2.
- If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.
Test4 - Recovery procedure
The automatically created location constraints must be removed to allow automatic failover in the future.
Wait until the primary SAP HANA instance is active and remove the constraints.
On a cluster node, run the following commands.
pcs constraint
pcs resource clear SAPHana_${SID}_${INSTNO}-clone
pcs constraint
The cluster registers and starts the SAP HANA database as a new secondary instance. When the system replication status is in sync again (SOK), the cluster moves the secondary virtual IP address to NODE1.
On a cluster node, run the following command to check the cluster status.
pcs status --full