Configuring SAP HANA scale-up system replication in a Red Hat Enterprise Linux High Availability Add-On cluster
The following information describes the configuration of a Red Hat Enterprise Linux (RHEL) High Availability Add-On cluster for managing SAP HANA Scale-Up System Replication. The cluster uses virtual server instances in IBM® Power® Virtual Server as cluster nodes.
The instructions describe how to automate SAP HANA Scale-Up System Replication for a single database deployment in a performance-optimized scenario on a RHEL HA Add-on cluster.
This information is intended for architects and specialists who are planning a high-availability deployment of SAP HANA on Power Virtual Server.
Before you begin
Review the general requirements, product documentation, support articles, and SAP notes listed in Implementing high availability for SAP applications on IBM Power Virtual Server References.
Prerequisites
- A Red Hat High Availability cluster is deployed on two virtual server instances in Power Virtual Server.
  - Install and set up the RHEL HA Add-On cluster according to Implementing a Red Hat Enterprise Linux High Availability Add-On cluster.
  - Configure and verify fencing as described in the preceding document.
- The virtual server instances need to fulfill hardware and resource requirements for the SAP HANA systems in scope. Follow the guidelines in the Planning your deployment document.
- The hostnames of the virtual server instances must meet the SAP HANA requirement.
- SAP HANA is installed on both virtual server instances and SAP HANA System Replication is configured. The installation of SAP HANA and setup of HANA System Replication is not specific to the Power Virtual Server environment, and you need to follow the standard procedures.
- A valid RHEL for SAP Applications or RHEL for SAP Solutions subscription is required to enable the repositories that you need to install SAP HANA and the resource agents for HA configurations.
Configuring SAP HANA System Replication in a RHEL HA Add-On cluster on IBM Power Virtual Server
The instructions are based on the Red Hat product documentation and articles that are listed in Implementing high availability for SAP applications on IBM Power Virtual Server References.
Preparing environment variables
To simplify the setup, prepare the following environment variables for the root user on both nodes. Later operating system commands in this information use these variables.
On both nodes, set the general and node variables, and then set only the single zone or multizone region variables that match your deployment.
# General settings
export SID=<SID>            # SAP HANA System ID (uppercase)
export sid=<sid>            # SAP HANA System ID (lowercase)
export INSTNO=<INSTNO>      # SAP HANA instance number
# Cluster node 1
export NODE1=<HOSTNAME_1>   # Virtual server instance hostname
export DC1="Site1"          # HANA System Replication site name
# Cluster node 2
export NODE2=<HOSTNAME_2>   # Virtual server instance hostname
export DC2="Site2"          # HANA System Replication site name
# Single zone
export VIP=<IP address>     # SAP HANA System Replication cluster virtual IP address
# Multizone region
export CLOUD_REGION=<CLOUD_REGION>       # Multizone region name
export APIKEY="APIKEY or path to file"   # API Key of the IBM Cloud IAM ServiceID for the resource agent
export API_TYPE="private or public"      # Use private or public API endpoints
export IBMCLOUD_CRN_1=<IBMCLOUD_CRN_1>   # Workspace 1 CRN
export IBMCLOUD_CRN_2=<IBMCLOUD_CRN_2>   # Workspace 2 CRN
export POWERVSI_1=<POWERVSI_1>           # Virtual server instance 1 id
export POWERVSI_2=<POWERVSI_2>           # Virtual server instance 2 id
export SUBNET_NAME="vip-${sid}-net"      # Name which is used to define the subnet in IBM Cloud
export CIDR="CIDR of subnet"             # CIDR of the subnet containing the service IP address
export VIP="Service IP address"          # IP address in the subnet
export JUMBO="true or false"             # Enable Jumbo frames
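The SID and instance number follow fixed formats: an SAP system ID is three uppercase alphanumeric characters that start with a letter, and the instance number is two digits. The following is a minimal sketch of a pre-check. The helper name validate_hana_ids is hypothetical and is not part of SAP or RHEL tooling.

```shell
# Sketch only: validate_hana_ids is a hypothetical helper.
# $1 = SID (uppercase), $2 = instance number.
# Prints the lowercase SID on success, returns 1 on a format error.
validate_hana_ids() {
    case "$1" in
        [A-Z][A-Z0-9][A-Z0-9]) ;;
        *) echo "invalid SID: $1" >&2; return 1 ;;
    esac
    case "$2" in
        [0-9][0-9]) ;;
        *) echo "invalid instance number: $2" >&2; return 1 ;;
    esac
    # The lowercase form is the value that the sid variable is expected to hold.
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}
```

For example, `export sid=$(validate_hana_ids "$SID" "$INSTNO")` sets the lowercase variable only when both values pass the check.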
Setting extra environment variables for a single zone implementation
Review the information in Reserving virtual IP addresses and reserve a virtual IP address for the SAP HANA System Replication cluster. Set the VIP environment variable to the reserved IP address.
Setting extra environment variables for a multizone region implementation
Set the CLOUD_REGION, APIKEY, IBMCLOUD_CRN_?, and POWERVSI_? variables as described in the Collecting parameters for configuring a high availability cluster section. Set the API_TYPE variable to private to communicate with the IBM Cloud IAM and IBM Power Cloud API through private endpoints.
The SUBNET_NAME variable contains the name of the subnet. The CIDR variable represents the Classless Inter-Domain Routing (CIDR) notation for the subnet in the format <IPv4_address>/number. The VIP variable is the IP address of the virtual IP address resource and must belong to the CIDR of the subnet. Set the JUMBO variable to true if you want to enable the subnet for a large MTU size.
The following is an example of how to set the extra environment variables that are required for a multizone region implementation.
export CLOUD_REGION="eu-de"
export IBMCLOUD_CRN_1="crn:v1:bluemix:public:power-iaas:eu-de-2:a/a1b2c3d4e5f60123456789a1b2c3d4e5:a1b2c3d4-0123-4567-89ab-a1b2c3d4e5f6::"
export IBMCLOUD_CRN_2="crn:v1:bluemix:public:power-iaas:eu-de-1:a/a1b2c3d4e5f60123456789a1b2c3d4e5:e5f6a1b2-cdef-0123-4567-a1b2c3d4e5f6::"
export POWERVSI_1="a1b2c3d4-0123-890a-f012-0123456789ab"
export POWERVSI_2="e5f6a1b2-4567-bcde-3456-cdef01234567"
export APIKEY="@/root/.apikey.json"
export API_TYPE="private"
export SUBNET_NAME="vip-mha-net"
export CIDR="10.40.11.100/30"
export VIP="10.40.11.102"
export JUMBO="true"
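Because the VIP must belong to the CIDR of the subnet, it can help to check the two values before they are passed to the cluster. The following is a minimal sketch in plain shell arithmetic. The helper names ip_to_int and ip_in_cidr are hypothetical, and the sketch assumes well-formed IPv4 input without leading zeros.

```shell
# Sketch only: ip_to_int and ip_in_cidr are hypothetical helpers.
# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
    IFS=. read -r _a _b _c _d <<EOF
$1
EOF
    echo $(( ($_a << 24) + ($_b << 16) + ($_c << 8) + $_d ))
}

# Return 0 if address $1 lies inside the CIDR block $2 (for example 10.40.11.100/30).
ip_in_cidr() {
    prefix=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    # Compare the network part of both addresses.
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
}
```

With the example values, `ip_in_cidr "$VIP" "$CIDR"` succeeds because 10.40.11.102 lies in the range 10.40.11.100 to 10.40.11.103.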
Installing SAP HANA resource agents
Run the following command to install the RHEL HA Add-On resource agents for SAP HANA System Replication.
dnf install -y resource-agents-sap-hana
Starting the SAP HANA system
Start SAP HANA and verify that HANA System Replication is active. For more information, see 2.4. Checking SAP HANA System Replication state.
On both nodes, run the following commands.
sudo -i -u ${sid}adm -- HDB start
sudo -i -u ${sid}adm -- <<EOT
    hdbnsutil -sr_state
    HDBSettings.sh systemReplicationStatus.py
EOT
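The systemReplicationStatus.py script also reports the overall replication state through its exit code, which is convenient in scripts. The mapping in the following sketch reflects the values that are commonly documented for the script (10 to 15). Treat the mapping as an assumption and confirm it against your SAP HANA revision. The helper name sr_status_name is hypothetical.

```shell
# Sketch only: sr_status_name is a hypothetical helper.
# Translate the exit code of systemReplicationStatus.py into a readable name.
sr_status_name() {
    case "$1" in
        10) echo "NoHSR" ;;
        11) echo "Error" ;;
        12) echo "Unknown" ;;
        13) echo "Initializing" ;;
        14) echo "Syncing" ;;
        15) echo "Active" ;;
        *)  echo "unexpected return code $1" ;;
    esac
}

# Example use on the primary node, as root:
#   sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py >/dev/null
#   sr_status_name $?
```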
Enabling the SAP HANA srConnectionChanged() hook
Recent versions of SAP HANA provide hooks so SAP HANA can send out notifications for certain events. For more information, see Implementing a HA/DR Provider.
The srConnectionChanged() hook improves the ability of the cluster to detect a status change of HANA System Replication that requires an action from the cluster. The goal is to prevent data loss and corruption by preventing accidental takeovers.
Activating the srConnectionChanged() hook on all SAP HANA instances
- Stop the cluster.
  On NODE1, run the following command.
pcs cluster stop --all
- Install the hook script that is provided by the resource-agents-sap-hana package in the /hana/shared/myHooks directory for each HANA instance, and set the required ownership.
  On both nodes, run the following commands.
mkdir -p /hana/shared/myHooks
cp /usr/share/SAPHanaSR/srHook/SAPHanaSR.py /hana/shared/myHooks
chown -R ${sid}adm:sapsys /hana/shared/myHooks
- Update the global.ini file on each HANA node to enable the hook script.
  On both nodes, run the following command.
sudo -i -u ${sid}adm -- <<EOT
    python \$DIR_INSTANCE/exe/python_support/setParameter.py \
      -set SYSTEM/global.ini/ha_dr_provider_SAPHanaSR/provider=SAPHanaSR \
      -set SYSTEM/global.ini/ha_dr_provider_SAPHanaSR/path=/hana/shared/myHooks \
      -set SYSTEM/global.ini/ha_dr_provider_SAPHanaSR/execution_order=1 \
      -set SYSTEM/global.ini/trace/ha_dr_saphanasr=info
EOT
- Verify the changed file.
  On both nodes, run the following command.
cat /hana/shared/${SID}/global/hdb/custom/config/global.ini
- Create sudo settings for the SAP HANA OS user. The following sudo settings allow the ${sid}adm user script to update the node attributes when the srConnectionChanged() hook runs.
  On both nodes, run the following commands.
  Create a file with the required sudo aliases and user specifications.
cat >> /etc/sudoers.d/20-saphana << EOT
Cmnd_Alias DC1_SOK = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_${DC1} -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias DC1_SFAIL = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_${DC1} -v SFAIL -t crm_config -s SAPHanaSR
Cmnd_Alias DC2_SOK = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_${DC2} -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias DC2_SFAIL = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_${DC2} -v SFAIL -t crm_config -s SAPHanaSR
${sid}adm ALL=(ALL) NOPASSWD: DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL
Defaults!DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL !requiretty
EOT
  Adjust the permissions and check for syntax errors.
chown root:root /etc/sudoers.d/20-saphana
chmod 0440 /etc/sudoers.d/20-saphana
cat /etc/sudoers.d/20-saphana
visudo -c
  Correct any problems that the visudo -c command reports.
- Verify that the hook functions.
  - Restart both HANA instances and verify that the hook script works as expected.
  - Perform an action to trigger the hook, such as stopping a HANA instance.
  - Check whether the hook logged anything in the trace files.
  On both nodes, run the following commands.
  Stop the HANA instance.
sudo -i -u ${sid}adm -- HDB stop
  Start the HANA instance.
sudo -i -u ${sid}adm -- HDB start
  Check that the hook logged some messages to the trace files.
sudo -i -u ${sid}adm -- sh -c 'grep "ha_dr_SAPHanaSR.*crm_attribute" $DIR_INSTANCE/$VTHOSTNAME/trace/nameserver_* | cut -d" " -f2,3,5,17'
  After you verify that the hooks function, you can restart the HA cluster.
- Start the cluster.
  On NODE1, run the following commands.
  Start the cluster.
pcs cluster start --all
  Check the status of the cluster.
pcs status --full
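The sudo settings in the procedure above follow a regular pattern: one command alias per site and replication state. The following is a minimal sketch that prints the same content to standard output, so that it can be written to a scratch file and checked before it is installed. The helper name render_saphana_sudoers and the scratch file path are hypothetical.

```shell
# Sketch only: render_saphana_sudoers is a hypothetical helper.
# $1 = lowercase SID, $2 = site name of node 1, $3 = site name of node 2.
render_saphana_sudoers() {
    _sid=$1
    _n=1
    for _site in "$2" "$3"; do
        for _state in SOK SFAIL; do
            echo "Cmnd_Alias DC${_n}_${_state} = /usr/sbin/crm_attribute -n hana_${_sid}_site_srHook_${_site} -v ${_state} -t crm_config -s SAPHanaSR"
        done
        _n=$(( $_n + 1 ))
    done
    echo "${_sid}adm ALL=(ALL) NOPASSWD: DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL"
    echo "Defaults!DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL !requiretty"
}

# Example use, as root (the scratch path is an assumption):
#   render_saphana_sudoers "$sid" "$DC1" "$DC2" > /tmp/20-saphana.check
#   visudo -cf /tmp/20-saphana.check
```

Checking a scratch file with visudo -cf keeps a syntax error out of /etc/sudoers.d until the content is known to be valid.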
Configuring general cluster properties
To avoid resource failover during initial testing and post-production, set the following default values for the resource-stickiness and migration-threshold parameters.
Keep in mind that defaults don't apply to resources that override them with their own defined values.
On NODE1, run the following commands.
pcs resource defaults update resource-stickiness=1000
pcs resource defaults update migration-threshold=5000
Creating a cloned SAPHanaTopology resource
The SAPHanaTopology resource gathers the status and configuration of SAP HANA System Replication on each node. It also starts and monitors the local SAP HostAgent, which is required for starting, stopping, and monitoring SAP HANA instances.
On NODE1, run the following commands.
Create the SAPHanaTopology resource.
pcs resource create SAPHanaTopology_${SID}_${INSTNO} SAPHanaTopology \
    SID=${SID} InstanceNumber=${INSTNO} \
    op start timeout=600 \
    op stop timeout=300 \
    op monitor interval=10 timeout=600 \
    clone clone-max=2 clone-node-max=1 interleave=true
Check the configuration and the cluster status by running the following commands.
pcs resource config SAPHanaTopology_${SID}_${INSTNO}
pcs resource config SAPHanaTopology_${SID}_${INSTNO}-clone
pcs status --full
Creating a promotable SAPHana resource
The SAPHana resource manages two SAP HANA instances that are configured as HANA System Replication nodes.
On NODE1, create the SAPHana resource by running the following command.
pcs resource create SAPHana_${SID}_${INSTNO} SAPHana \
    SID=${SID} InstanceNumber=${INSTNO} \
    PREFER_SITE_TAKEOVER=true \
    DUPLICATE_PRIMARY_TIMEOUT=7200 \
    AUTOMATED_REGISTER=false \
    op start timeout=3600 \
    op stop timeout=3600 \
    op monitor interval=61 role="Unpromoted" timeout=700 \
    op monitor interval=59 role="Promoted" timeout=700 \
    op promote timeout=3600 \
    op demote timeout=3600 \
    promotable notify=true clone-max=2 clone-node-max=1 interleave=true
Check the configuration and the cluster status.
pcs resource config SAPHana_${SID}_${INSTNO}
pcs status --full
Creating a virtual IP address cluster resource
Depending on the scenario, proceed to one of the following sections:
- Creating a virtual IP address cluster resource in a single zone environment if the cluster nodes are running in a single Power Virtual Server workspace
- Creating a virtual IP address cluster resource in a multizone region environment if the cluster nodes are running in separate Power Virtual Server workspaces
Creating a virtual IP address cluster resource in a single zone environment
Use the reserved IP address to create a virtual IP address cluster resource. This virtual IP address is used to reach the SAP HANA System Replication primary instance.
Create the virtual IP address cluster resource with the following command.
pcs resource create vip_${SID}_${INSTNO} IPaddr2 ip=$VIP
Check the configured virtual IP address cluster resource and the cluster status.
pcs resource config vip_${SID}_${INSTNO}
pcs status --full
Proceed to the Creating cluster resource constraints section.
Creating a virtual IP address cluster resource in a multizone region environment
Verify that you have completed all the steps in the Preparing a multi-zone RHEL HA Add-On cluster for a virtual IP address resource section.
Run the pcs resource describe powervs-subnet command to get information about the resource agent parameters.
On NODE1, create a powervs-subnet cluster resource by running the following command.
pcs resource create vip_${SID}_${INSTNO} powervs-subnet \
    api_key=${APIKEY} \
    api_type=${API_TYPE} \
    cidr=${CIDR} \
    ip=${VIP} \
    crn_host_map="${NODE1}:${IBMCLOUD_CRN_1};${NODE2}:${IBMCLOUD_CRN_2}" \
    vsi_host_map="${NODE1}:${POWERVSI_1};${NODE2}:${POWERVSI_2}" \
    jumbo=${JUMBO} \
    region=${CLOUD_REGION} \
    subnet_name=${SUBNET_NAME} \
    op start timeout=720 \
    op stop timeout=300 \
    op monitor interval=60 timeout=30
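The command above depends on many environment variables, and an empty one produces a resource definition that is hard to debug later. The following is a minimal sketch of a guard that reports unset variables before the resource is created. The helper name require_vars is hypothetical.

```shell
# Sketch only: require_vars is a hypothetical helper.
# Return 1 and name each variable in the argument list that is unset or empty.
require_vars() {
    _missing=0
    for _name in "$@"; do
        eval "_value=\${$_name:-}"
        if [ -z "$_value" ]; then
            echo "variable $_name is not set" >&2
            _missing=1
        fi
    done
    return "$_missing"
}

# Example use before creating the resource:
#   require_vars SID INSTNO APIKEY API_TYPE CIDR VIP NODE1 NODE2 \
#       IBMCLOUD_CRN_1 IBMCLOUD_CRN_2 POWERVSI_1 POWERVSI_2 \
#       JUMBO CLOUD_REGION SUBNET_NAME || echo "fix the variables first"
```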
If you set API_TYPE to public, you must also specify a proxy parameter.
Ensure that both virtual server instances in the cluster have the status Active and the health status OK before running the pcs resource config command.
Check the configured virtual IP address resource and the cluster status.
pcs resource config vip_${SID}_${INSTNO}
Sample output:
# pcs resource config vip_MHA_00
Resource: vip_MHA_00 (class=ocf provider=heartbeat type=powervs-subnet)
  Attributes: vip_MHA_00-instance_attributes
    api_key=@/root/.apikey.json
    api_type=private
    cidr=10.40.11.100/30
    crn_host_map=cl-mha-1:crn:v1:bluemix:public:power-iaas:eu-de-2:**********************************:************************************::;cl-mha-2:crn:v1:bluemix:public:power-iaas:eu-
        de-1:**********************************:************************************::
    ip=10.40.11.102
    jumbo=true
    proxy=http://10.30.40.4:3128
    region=eu-de
    subnet_name=vip-mha-net
    vsi_host_map=cl-mha-1:************************************;cl-mha-2:************************************
  Operations:
    monitor: res_vip_MHA_00-monitor-interval-60
      interval=60
      timeout=60
    start: res_vip_MHA_00-start-interval-0s
      interval=0s
      timeout=720
    stop: res_vip_MHA_00-stop-interval-0s
      interval=0s
      timeout=300
pcs status --full
The following example is a sample output of an SAP HANA System Replication cluster in a multizone region setup.
# pcs status --full
Cluster name: SAP_MHA
Status of pacemakerd: 'Pacemaker is running' (last updated 2024-07-31 11:37:49 +02:00)
Cluster Summary:
  * Stack: corosync
  * Current DC: cl-mha-2 (2) (version 2.1.5-9.el9_2.4-a3f44794f94) - partition with quorum
  * Last updated: Wed Jul 31 11:37:50 2024
  * Last change:  Wed Jul 31 11:37:31 2024 by root via crm_attribute on cl-mha-1
  * 2 nodes configured
  * 7 resource instances configured
Node List:
  * Node cl-mha-1 (1): online, feature set 3.16.2
  * Node cl-mha-2 (2): online, feature set 3.16.2
Full List of Resources:
  * fence_node1	(stonith:fence_ibm_powervs):	 Started cl-mha-1
  * fence_node2	(stonith:fence_ibm_powervs):	 Started cl-mha-2
  * Clone Set: SAPHanaTopology_MHA_00-clone [SAPHanaTopology_MHA_00]:
    * SAPHanaTopology_MHA_00	(ocf:heartbeat:SAPHanaTopology):	 Started cl-mha-2
    * SAPHanaTopology_MHA_00	(ocf:heartbeat:SAPHanaTopology):	 Started cl-mha-1
  * Clone Set: SAPHana_MHA_00-clone [SAPHana_MHA_00] (promotable):
    * SAPHana_MHA_00	(ocf:heartbeat:SAPHana):	 Unpromoted cl-mha-2
    * SAPHana_MHA_00	(ocf:heartbeat:SAPHana):	 Promoted cl-mha-1
  * vip_MHA_00	(ocf:heartbeat:powervs-subnet):	 Started cl-mha-1
Node Attributes:
  * Node: cl-mha-1 (1):
    * hana_mha_clone_state            	: PROMOTED
    * hana_mha_op_mode                	: logreplay
    * hana_mha_remoteHost             	: cl-mha-2
    * hana_mha_roles                  	: 4:P:master1:master:worker:master
    * hana_mha_site                   	: SiteA
    * hana_mha_sra                    	: -
    * hana_mha_srah                   	: -
    * hana_mha_srmode                 	: syncmem
    * hana_mha_sync_state             	: PRIM
    * hana_mha_version                	: 2.00.075.00
    * hana_mha_vhost                  	: cl-mha-1
    * lpa_mha_lpt                     	: 1722418651
    * master-SAPHana_MHA_00           	: 150
  * Node: cl-mha-2 (2):
    * hana_mha_clone_state            	: DEMOTED
    * hana_mha_op_mode                	: logreplay
    * hana_mha_remoteHost             	: cl-mha-1
    * hana_mha_roles                  	: 4:S:master1:master:worker:master
    * hana_mha_site                   	: SiteB
    * hana_mha_sra                    	: -
    * hana_mha_srah                   	: -
    * hana_mha_srmode                 	: syncmem
    * hana_mha_sync_state             	: SOK
    * hana_mha_version                	: 2.00.075.00
    * hana_mha_vhost                  	: cl-mha-2
    * lpa_mha_lpt                     	: 30
    * master-SAPHana_MHA_00           	: 100
Migration Summary:
Tickets:
PCSD Status:
  cl-mha-1: Online
  cl-mha-2: Online
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
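In scripts, the node that currently runs the primary can be read from the resource list in this output. The following is a minimal sketch that assumes the line layout of the sample above, with the role names Promoted and Unpromoted. The helper name promoted_node is hypothetical.

```shell
# Sketch only: promoted_node is a hypothetical helper.
# Reads `pcs status --full` output from stdin and prints the node
# on which the SAPHana resource is in the Promoted role.
promoted_node() {
    awk '$3 == "(ocf:heartbeat:SAPHana):" && $4 == "Promoted" { print $5 }'
}

# Example use:
#   pcs status --full | promoted_node
```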
Proceed to the Creating cluster resource constraints section.
Creating cluster resource constraints
Make sure that SAPHanaTopology resources are started before you start the SAPHana resources.
The virtual IP address must be present on the node where the primary resource of "SAPHana" is running.
- Create a resource constraint to start "SAPHanaTopology" before "SAPHana". This constraint mandates the start order of these resources.
  On NODE1, use the following command to create the SAPHanaTopology order constraint:
pcs constraint order SAPHanaTopology_${SID}_${INSTNO}-clone \
    then SAPHana_${SID}_${INSTNO}-clone symmetrical=false
  Check the configuration.
pcs constraint
- Create a resource constraint to colocate the virtual IP address with the primary. This constraint colocates the virtual IP address resource with the SAPHana resource that was promoted as primary.
  On NODE1, run the following command to create the virtual IP address colocation constraint.
pcs constraint colocation add vip_${SID}_${INSTNO} \
    with Promoted SAPHana_${SID}_${INSTNO}-clone 2000
  Check the configuration and the cluster status.
pcs constraint
  Sample output:
# pcs constraint
Location Constraints:
Ordering Constraints:
  start SAPHanaTopology_HDB_00-clone then start SAPHana_HDB_00-clone (kind:Mandatory) (non-symmetrical)
Colocation Constraints:
  vip_HDB_00 with SAPHana_HDB_00-clone (score:2000) (rsc-role:Started) (with-rsc-role:Promoted)
Ticket Constraints:
  Verify the cluster status.
pcs status --full
  On the promoted cluster node, verify that the cluster service IP address is active.
ip addr show
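The check of the service IP address on the promoted node can also be scripted. The following is a minimal sketch that looks for the address in the output of the ip command. The helper name vip_is_active is hypothetical.

```shell
# Sketch only: vip_is_active is a hypothetical helper.
# $1 = IP address; reads `ip addr show` output from stdin.
# Returns 0 if the address is configured on any interface.
vip_is_active() {
    # -F treats the dots in the address as literal characters.
    grep -qF "inet $1/"
}

# Example use on the promoted node:
#   ip -o -4 addr show | vip_is_active "$VIP" && echo "VIP is active"
```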
Enabling the SAP HANA srServiceStateChanged() hook (optional)
SAP HANA has built-in functions to monitor its indexserver. In case of a problem, SAP HANA tries to recover automatically by stopping and restarting the process. To stop the process or clean up after a crash, the Linux kernel must release all memory that is allocated by the process. For large databases, this cleanup can take a long time. During this time, SAP HANA continues to operate and accept new client requests. As a result, SAP HANA system replication can become out of sync. If another error occurs in the SAP HANA instance before the restart and recovery of the indexserver is complete, data consistency is at risk.
The ChkSrv.py script for the srServiceStateChanged() hook reacts to such a situation and can stop the entire SAP HANA instance for faster recovery. If automated failover is enabled in the cluster, and the secondary node is in a healthy state, a takeover operation is started. Otherwise, recovery must continue locally, but is accelerated by the forced restart of the SAP HANA instance.
The SAP HANA srServiceStateChanged() hook is available with resource-agents-sap-hana version 0.162.3 and later.
The hook script analyzes the events in the instance, applies filters to the event details, and triggers actions based on the results. It distinguishes between an SAP HANA indexserver process that is stopped and restarted by SAP HANA and the process that is stopped during an instance shutdown.
Depending on the configuration of the action_on_lost parameter, the hook takes different actions:
- Ignore: This action only logs the events and decision information to a log file.
- Stop: This action triggers a graceful stop of the SAP HANA instance by using the sapcontrol command.
- Kill: This action triggers the HDB kill-<signal> command with a default signal 9. The signal can be configured.
Both the stop and the kill actions result in a stopped SAP HANA instance, but the kill action is slightly faster.
Activating the srServiceStateChanged() hook on all SAP HANA instances
The srServiceStateChanged() hook can be added while SAP HANA is running on both nodes.
- For each HANA instance, install the hook script that is provided by the resource-agents-sap-hana package in the /hana/shared/myHooks directory and set the required ownership.
  On both nodes, run the following commands.
cp /usr/share/SAPHanaSR/srHook/ChkSrv.py /hana/shared/myHooks
chown ${sid}adm:sapsys /hana/shared/myHooks/ChkSrv.py
- Update the global.ini file on each SAP HANA node to enable the hook script.
  On both nodes, run the following command.
sudo -i -u ${sid}adm -- <<EOT
    python \$DIR_INSTANCE/exe/python_support/setParameter.py \
      -set SYSTEM/global.ini/ha_dr_provider_ChkSrv/provider=ChkSrv \
      -set SYSTEM/global.ini/ha_dr_provider_ChkSrv/path=/hana/shared/myHooks \
      -set SYSTEM/global.ini/ha_dr_provider_ChkSrv/execution_order=2 \
      -set SYSTEM/global.ini/ha_dr_provider_ChkSrv/action_on_lost=stop \
      -set SYSTEM/global.ini/trace/ha_dr_chksrv=info
EOT
  The action_on_lost parameter in this example is set to stop. The default setting is ignore. You can optionally set the parameters stop_timeout (default: 20 seconds) and kill_signal (default: 9).
- Activate the ChkSrv.py hook.
  On both nodes, run the following command to reload the HA-DR providers.
sudo -i -u ${sid}adm -- hdbnsutil -reloadHADRProviders
- Check that the hook logged some messages to the trace files.
  On both nodes, run the following command.
sudo -i -u ${sid}adm -- sh -c 'grep "ha_dr_ChkSrv" $DIR_INSTANCE/$VTHOSTNAME/trace/nameserver_* | cut -d" " -f2,3,6-'
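To confirm which action_on_lost value is in effect, the parameter can be read back from the customized global.ini file. The following is a minimal sketch of a reader for the usual section and key = value layout of the file. The helper name ini_get is hypothetical, and the file path is the one that is used earlier in this information.

```shell
# Sketch only: ini_get is a hypothetical helper.
# $1 = section name, $2 = key; reads the ini file from stdin
# and prints the value of the key inside that section.
ini_get() {
    awk -v section="[$1]" -v key="$2" '
        /^\[/ { in_section = ($0 == section); next }
        in_section {
            split($0, kv, "=")
            name = kv[1]
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == key) {
                value = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                print value
            }
        }'
}

# Example use:
#   ini_get ha_dr_provider_ChkSrv action_on_lost \
#       < /hana/shared/${SID}/global/hdb/custom/config/global.ini
```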
Enabling automated registration of secondary instance
You need to set the parameter AUTOMATED_REGISTER according to your operational requirements. If you want to keep the ability to revert to the state of the previous primary SAP HANA instance, then AUTOMATED_REGISTER=false avoids an automatic registration of the previous primary as a new secondary.
If you experience an issue with the data after a takeover that was triggered by the cluster, you can manually revert if AUTOMATED_REGISTER is set to false.
If AUTOMATED_REGISTER is set to true, the previous primary SAP HANA instance automatically registers as secondary, and cannot be activated on its previous history. The advantage of AUTOMATED_REGISTER=true is that high-availability capability is automatically reestablished after the failed node reappears in the cluster.
Keep AUTOMATED_REGISTER at its default value of false until the cluster is fully tested and you have verified that the failover scenarios work as expected.
Use the pcs resource update command to modify resource attributes. For example, the following command sets AUTOMATED_REGISTER to true.
pcs resource update SAPHana_${SID}_${INSTNO} AUTOMATED_REGISTER=true
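Before a failover test, it is useful to confirm the current value of the attribute. The following is a minimal sketch that extracts the value from the resource configuration output, assuming that attributes are printed in the key=value form shown in the sample output earlier in this information. The helper name automated_register_value is hypothetical.

```shell
# Sketch only: automated_register_value is a hypothetical helper.
# Reads `pcs resource config` output from stdin and prints the
# current AUTOMATED_REGISTER value (empty if the attribute is absent).
automated_register_value() {
    sed -n 's/.*AUTOMATED_REGISTER=\([A-Za-z]*\).*/\1/p' | head -n 1
}

# Example use:
#   pcs resource config SAPHana_${SID}_${INSTNO} | automated_register_value
```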
Testing SAP HANA System Replication cluster
It is vital to thoroughly test the cluster configuration to make sure that the cluster is working correctly. The following information provides a few sample failover test scenarios, but is not a complete list of test scenarios.
The description of each test case includes the following information.
- Component that is being tested
- Description of the test
- Prerequisites and the cluster state before you start the failover test
- Test procedure
- Expected behavior and results
- Recovery procedure
Test 1 - Testing a failure of the primary database instance
Use the following information to test the failure of the primary database instance.
Test 1 - Description
Simulate a crash of the primary HANA database instance that is running on NODE1.
Test 1 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for HANA system replication.
- Both cluster nodes are active.
- Cluster is started on NODE1 and NODE2.
- Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=false.
- Check SAP HANA System Replication status:
  - Primary SAP HANA database is running on NODE1
  - Secondary SAP HANA database is running on NODE2
  - HANA System Replication is activated and in sync
Test 1 - Test procedure
Crash SAP HANA primary by sending a SIGKILL signal as the user ${sid}adm.
On NODE1, run the following command.
sudo -i -u ${sid}adm -- HDB kill-9
Test 1 - Expected behavior
- SAP HANA primary instance on NODE1 crashes.
- The cluster detects the stopped primary HANA database and marks the resource as failed.
- The cluster promotes the secondary HANA database on NODE2 to take over as the new primary.
- The cluster releases the virtual IP address on NODE1, and acquires it on the new primary on NODE2.
- If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.
Test 1 - Recovery procedure
Because the cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=false, the cluster doesn't restart the failed HANA database, and doesn't register it against the new primary. As a result, the status on the new primary (NODE2) shows the secondary in status 'CONNECTION TIMEOUT'.
To reregister the previous primary as a new secondary, use the following commands.
On NODE1, run the following command.
sudo -i -u ${sid}adm -- <<EOT
    hdbnsutil -sr_register \
      --name=${DC1} \
      --remoteHost=${NODE2} \
      --remoteInstance=${INSTNO} \
      --replicationMode=sync \
      --operationMode=logreplay \
      --online
EOT
Verify the system replication status:
sudo -i -u ${sid}adm -- <<EOT
    hdbnsutil -sr_state
    HDBSettings.sh systemReplicationStatus.py
EOT
After the manual registration and a resource refresh, the new secondary instance restarts and shows up in status synced (SOK).
On NODE1, run the following commands.
pcs resource refresh SAPHana_${SID}_${INSTNO}
pcs status --full
Test 2 - Testing a failure of the node that is running the primary database
Use the following information to test the failure of the node that is running the primary database.
Test 2 - Description
Simulate a crash of the node that is running the primary HANA database.
Test 2 - Preparation
Make sure that the cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
On NODE1, run the following command.
pcs resource update SAPHana_${SID}_${INSTNO} AUTOMATED_REGISTER=true
pcs resource config SAPHana_${SID}_${INSTNO}
Test 2 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for HANA system replication.
- Both nodes are active.
- Cluster is started on NODE1 and NODE2.
- Check SAP HANA System Replication status:
  - Primary SAP HANA database is running on NODE2
  - Secondary SAP HANA database is running on NODE1
  - HANA System Replication is activated and in sync
Test 2 - Test procedure
Crash the primary on NODE2 by sending a system crash request.
On NODE2, run the following command.
sync; echo c > /proc/sysrq-trigger
Test 2 - Expected behavior
- NODE2 shuts down.
- The cluster detects the failed node and sets its state to OFFLINE.
- The cluster promotes the secondary HANA database on NODE1 to take over as the new primary.
- The cluster acquires the virtual IP address on the new primary on NODE1.
- If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.
Test 2 - Recovery procedure
Log in to the IBM Cloud® Console and start the NODE2 instance. Wait until NODE2 is available again, then restart the cluster framework.
On NODE2, run the following command.
pcs cluster start
pcs status --full
As the cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true, SAP HANA restarts when NODE2 rejoins the cluster and the former primary reregisters as a secondary.
Test 3 - Testing a failure of the secondary database instance
Use the following information to test the failure of the secondary database instance.
Test 3 - Description
Simulate a crash of the secondary HANA database.
Test 3 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for HANA system replication.
- Both nodes are active.
- Cluster is started on NODE1 and NODE2.
- Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
- Check SAP HANA System Replication status:
  - Primary SAP HANA database is running on NODE1
  - Secondary SAP HANA database is running on NODE2
  - HANA System Replication is activated and in sync
Test 3 - Test procedure
Crash SAP HANA secondary by sending a SIGKILL signal as the user ${sid}adm.
On NODE2, run the following command.
sudo -i -u ${sid}adm -- HDB kill-9
Test 3 - Expected behavior
- SAP HANA secondary on NODE2 crashes.
- The cluster detects the stopped secondary HANA database and marks the resource as failed.
- The cluster restarts the secondary HANA database.
- The cluster detects that the system replication is in sync again.
Test 3 - Recovery procedure
Wait until the secondary HANA instance starts and syncs again (SOK), then clean up the failed resource actions that are shown in pcs status.
On NODE2, run the following commands.
pcs resource refresh SAPHana_${SID}_${INSTNO}
pcs status --full
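Waiting for the secondary to reach SOK can be automated with a polling loop. The following is a minimal sketch of a generic retry helper with a timeout. The names wait_until and sync_state_is_sok are hypothetical, and the grep pattern assumes the node attribute layout of the pcs status --full sample output shown earlier in this information.

```shell
# Sketch only: wait_until and sync_state_is_sok are hypothetical helpers.
# $1 = timeout in seconds, $2 = polling interval in seconds, rest = command.
# Returns 0 as soon as the command succeeds, 1 if the timeout is reached.
wait_until() {
    _timeout=$1
    _interval=$2
    shift 2
    _elapsed=0
    until "$@"; do
        if [ "$_elapsed" -ge "$_timeout" ]; then
            return 1
        fi
        sleep "$_interval"
        _elapsed=$(( $_elapsed + $_interval ))
    done
    return 0
}

# Succeeds when a node attribute hana_<sid>_sync_state reports SOK.
sync_state_is_sok() {
    pcs status --full | grep -q "hana_${sid}_sync_state.*: SOK"
}

# Example use: poll every 10 seconds for up to 10 minutes.
#   wait_until 600 10 sync_state_is_sok && pcs resource refresh SAPHana_${SID}_${INSTNO}
```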
Test 4 - Testing a manual move of a SAPHana resource to another node
Use the following information to test the manual move of a SAPHana resource to another node.
Test 4 - Description
Use cluster commands to move the primary instance to the other node for maintenance purposes.
Test 4 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for HANA system replication.
- Both nodes are active.
- Cluster is started on NODE1 and NODE2.
- Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
- Check SAP HANA System Replication status:
  - Primary SAP HANA database is running on NODE1
  - Secondary SAP HANA database is running on NODE2
  - HANA System Replication is activated and in sync
Test 4 - Test procedure
Move the SAP HANA primary to the other node by using the pcs resource move command.
On NODE1, run the following command.
pcs resource move SAPHana_${SID}_${INSTNO}-clone
Test 4 - Expected behavior
- The cluster creates location constraints to move the resource.
- The cluster triggers a takeover to the secondary HANA database.
- If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.
Test 4 - Recovery procedure
The automatically created location constraints must be removed to allow automatic failover in the future.
Wait until the primary HANA instance is active and remove the constraints.
The cluster registers and starts the HANA database as a new secondary instance.
On NODE1, run the following commands.
pcs constraint
pcs resource clear SAPHana_${SID}_${INSTNO}-clone
pcs constraint
pcs status --full