IBM Cloud Docs
Configuring SAP HANA scale-up system replication in a Red Hat Enterprise Linux High Availability Add-On cluster with the sap-hana-ha resource agent

The following information describes the configuration of a Red Hat Enterprise Linux (RHEL) High Availability Add-On cluster for managing SAP HANA Scale-Up System Replication. The cluster uses virtual server instances in IBM® Power® Virtual Server as cluster nodes.

The instructions describe how to automate SAP HANA scale-up system replication for a single database deployment in a performance-optimized scenario on a RHEL HA Add-on cluster.

This information is intended for architects and specialists who are planning a high-availability deployment of SAP HANA on Power Virtual Server.

Before you begin

Review the general requirements, product documentation, support articles, and SAP notes listed in Implementing high availability for SAP applications on IBM Power Virtual Server References.

Prerequisites

  • A Red Hat High Availability cluster is deployed on two virtual server instances in Power Virtual Server.
  • The virtual server instances must meet the hardware and resource requirements for the SAP HANA systems in scope. Follow the guidelines in Planning your deployment.
  • The hostnames of the virtual server instances must comply with SAP HANA naming requirements.
  • SAP HANA is installed on both virtual server instances, and SAP HANA system replication is configured. The SAP HANA installation and system replication setup are not specific to the Power Virtual Server environment. Follow the standard procedures.
  • A valid RHEL for SAP Applications or RHEL for SAP Solutions subscription is required to enable the repositories that you need for installing SAP HANA and the high availability resource agents.

Configuring SAP HANA system replication in a RHEL HA Add-On cluster on IBM Power Virtual Server

The instructions are based on the Red Hat product documentation and articles that are listed in Implementing high availability for SAP applications on IBM Power Virtual Server References.

Preparing environment variables

To simplify the setup, define the following environment variables for the root user on both nodes. These variables are used in later operating system commands throughout this procedure.

On both nodes, set the following environment variables.

# General settings
export SID=<SID>            # SAP HANA System ID (uppercase)
export sid=<sid>            # SAP HANA System ID (lowercase)
export INSTNO=<INSTNO>      # SAP HANA instance number

# Cluster node 1
export NODE1=<HOSTNAME_1>   # Virtual server instance hostname
export DC1="Site1"          # HANA system replication site name

# Cluster node 2
export NODE2=<HOSTNAME_2>   # Virtual server instance hostname
export DC2="Site2"          # HANA system replication site name

# Single zone
export VIP=<IP address>     # SAP HANA system replication cluster virtual IP address

# Multizone region
export CLOUD_REGION=<CLOUD_REGION>       # Multizone region name
export APIKEY="APIKEY or path to file"   # API Key of the IBM Cloud IAM ServiceID for the resource agent
export API_TYPE="private or public"      # Use private or public API endpoints
export IBMCLOUD_CRN_1=<IBMCLOUD_CRN_1>   # Workspace 1 CRN
export IBMCLOUD_CRN_2=<IBMCLOUD_CRN_2>   # Workspace 2 CRN
export POWERVSI_1=<POWERVSI_1>           # Virtual server instance 1 id
export POWERVSI_2=<POWERVSI_2>           # Virtual server instance 2 id
export SUBNET_NAME="vip-${sid}-net"      # Name which is used to define the subnet in IBM Cloud
export CIDR="CIDR of subnet"             # CIDR of the subnet containing the service IP address
export VIP="Service IP address"          # IP address in the subnet
export JUMBO="true or false"             # Enable Jumbo frames

Setting extra environment variables for a single zone implementation

Review Reserving virtual IP addresses and reserve a virtual IP address for the SAP HANA system replication cluster. Set the VIP environment variable to the reserved IP address.
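
The following export commands show example values for a single zone implementation. The SID, instance number, hostnames, and site names match the sample output later in this topic; the virtual IP address is a hypothetical reserved address.

export SID="MH1"
export sid="mh1"
export INSTNO="00"
export NODE1="cl-mh1-1"
export DC1="SiteA"
export NODE2="cl-mh1-2"
export DC2="SiteB"
export VIP="10.40.11.100"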

Setting extra environment variables for a multizone region implementation

Set the CLOUD_REGION, APIKEY, IBMCLOUD_CRN_?, POWERVSI_? variables as described in the Collecting parameters for configuring a high availability cluster section. Set API_TYPE to private to enable communication with the IBM Cloud IAM and IBM Power Cloud API through private endpoints.

Prepare the variables for the subnet:

  • SUBNET_NAME specifies the name of the subnet.
  • CIDR defines the subnet in Classless Inter-Domain Routing (CIDR) format: <IPv4_address>/number.
  • VIP is the virtual IP address and must fall within the defined CIDR.
  • Set JUMBO to true to enable support for large MTU sizes on the subnet.

The subnet SUBNET_NAME must not exist, and its CIDR range must not overlap with the IP ranges of any existing subnets in either workspace.

The following export commands show how to set the required environment variables for a multizone region implementation.

export CLOUD_REGION="eu-de"
export IBMCLOUD_CRN_1="crn:v1:bluemix:public:power-iaas:eu-de-2:a/a1b2c3d4e5f60123456789a1b2c3d4e5:a1b2c3d4-0123-4567-89ab-a1b2c3d4e5f6::"
export IBMCLOUD_CRN_2="crn:v1:bluemix:public:power-iaas:eu-de-1:a/a1b2c3d4e5f60123456789a1b2c3d4e5:e5f6a1b2-cdef-0123-4567-a1b2c3d4e5f6::"
export POWERVSI_1="a1b2c3d4-0123-890a-f012-0123456789ab"
export POWERVSI_2="e5f6a1b2-4567-bcde-3456-cdef01234567"
export APIKEY="@/root/.apikey.json"
export API_TYPE="private"
export SUBNET_NAME="vip-mh1-net"
export CIDR="10.40.41.100/30"
export VIP="10.40.41.102"
export JUMBO="true"

Installing SAP HANA resource agents

The sap-hana-ha package includes three resource agents.

  • SAPHanaTopology: Collects the status and configuration of SAP HANA system replication on each node. It also starts and monitors the local SAP HostAgent, which is required to start, stop, and monitor the SAP HANA instances.
  • SAPHanaFilesystem (optional): Checks access to an SAP HANA file system. If the SAP HANA primary node experiences a storage failure, this agent fences the node to enable faster failover.
  • SAPHanaController: Manages the SAP HANA instances for both single-host (scale-up) and multiple-host (scale-out) deployments. If the SAP HANA primary system fails, this agent triggers a takeover to the secondary system.

On both nodes, run the following command to install the sap-hana-ha resource agent package.

dnf install -y sap-hana-ha

The setup that is described here requires version 1.2.8-4 or later of the sap-hana-ha package.
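
To check the installed version, you can query the package on both nodes.

rpm -q sap-hana-ha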

Starting the SAP HANA system

Start SAP HANA and verify that system replication is active. For details, see 2.4. Checking SAP HANA System Replication state.

On both nodes, run the following commands to start SAP HANA.

sudo -i -u ${sid}adm -- HDB start
sudo -i -u ${sid}adm -- <<EOT
    hdbnsutil -sr_state
    HDBSettings.sh systemReplicationStatus.py
EOT

Configuring the systemd-based SAP HANA startup framework

With SAP HANA 2.0 SPS07 revision 70 and later, the SAP HANA startup framework is integrated with systemd by default on RHEL 9. Extra modifications are required so that the pacemaker systemd service and the SAP HANA instance systemd service are managed in the correct order.

On both nodes, run the following command to check whether SAP HANA instance systemd integration is enabled.

systemctl list-units --all SAP*

Sample output:

# systemctl list-units --all SAP*
UNIT              LOAD      ACTIVE   SUB     DESCRIPTION
SAPMH1_00.service loaded    active   running SAP Instance SAPMH1_00
SAP.slice         loaded    active   active  SAP Slice

If the unit SAP${SID}_${INSTNO}.service appears in the command output, complete the following steps on both nodes.

  1. Create the directory for the pacemaker service drop-in file.

    mkdir /etc/systemd/system/pacemaker.service.d
    
  2. Create the pacemaker service drop-in file.

    cat >> /etc/systemd/system/pacemaker.service.d/HA.conf << EOT
    [Unit]
    Description=Pacemaker needs the SAP HANA instance service
    Wants=SAP${SID}_${INSTNO}.service
    After=SAP${SID}_${INSTNO}.service
    EOT
    
  3. Reload the systemctl daemon.

    systemctl daemon-reload
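
    Optionally, confirm that systemd picks up the drop-in file by displaying the merged pacemaker unit definition.

    systemctl cat pacemaker.service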
    

Enabling the SAP HANA hook scripts

SAP HANA provides hooks that send notifications when specific events occur. For more information, see Implementing a HA/DR Provider.

The srConnectionChanged() hook helps to detect SAP HANA system replication status changes that require cluster intervention. Its purpose is to prevent data loss or corruption by avoiding accidental takeovers.

SAP HANA includes built-in functions to monitor its indexserver. If a problem occurs, SAP HANA attempts to recover automatically by stopping and restarting the process. To complete this recovery, the Linux kernel must release all memory that is allocated to the process. For large databases, this cleanup can take a significant amount of time. During this period, SAP HANA continues to operate and accept client requests, which can cause system replication to become out of sync. If another error occurs before the indexserver is fully restarted and recovered, data consistency can be compromised. The srServiceStateChanged() hook addresses this scenario by stopping the entire SAP HANA instance to enable faster and safer recovery.

Configuring the hook HanaSR for all SAP HANA instances

The sap-hana-ha package includes the HanaSR.py hook script, which is located in the /usr/share/sap-hana-ha directory. Configure this hook on all SAP HANA cluster nodes.

Enable and test the SAP HANA srConnectionChanged() hook before you continue with the cluster setup.

  1. Update the global.ini file on each SAP HANA node to enable the HanaSR hook script.

    On both nodes, run the following command.

    sudo -i -u ${sid}adm -- <<EOT
        python \$DIR_INSTANCE/exe/python_support/setParameter.py \
          -set SYSTEM/global.ini/ha_dr_provider_hanasr/provider=HanaSR \
          -set SYSTEM/global.ini/ha_dr_provider_hanasr/path=/usr/share/sap-hana-ha/ \
          -set SYSTEM/global.ini/ha_dr_provider_hanasr/execution_order=1 \
          -set SYSTEM/global.ini/trace/ha_dr_hanasr=info
    EOT
    
  2. Verify the updated global.ini file.

    On both nodes, run the following command.

    cat /hana/shared/${SID}/global/hdb/custom/config/global.ini
    
  3. Configure sudo settings for the SAP HANA operating system administrator sidadm.

    These settings allow the hook script srConnectionChanged() to update node attributes.

    Run the following commands on both nodes to create the required sudo configuration file.

    cat >> /etc/sudoers.d/20-saphana << EOT
    Cmnd_Alias DC1_SOK = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_${DC1} -v SOK -t crm_config -s SAPHanaSR
    Cmnd_Alias DC1_SFAIL = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_${DC1} -v SFAIL -t crm_config -s SAPHanaSR
    Cmnd_Alias DC2_SOK = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_${DC2} -v SOK -t crm_config -s SAPHanaSR
    Cmnd_Alias DC2_SFAIL = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_${DC2} -v SFAIL -t crm_config -s SAPHanaSR
    Cmnd_Alias FENCE_ME = /usr/bin/SAPHanaSR-hookHelper --sid=${SID} --case=fenceMe
    ${sid}adm ALL=(ALL) NOPASSWD: DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL, FENCE_ME
    Defaults!DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL, FENCE_ME !requiretty
    EOT
    

    The Cmnd_Alias FENCE_ME is only necessary when you configure the ChkSrv hook script with the parameter action_on_lost = fence.

    Set the correct ownership and permissions and use visudo to validate its syntax.

    chown root:root /etc/sudoers.d/20-saphana
    
    chmod 0440 /etc/sudoers.d/20-saphana
    
    cat /etc/sudoers.d/20-saphana
    
    visudo -c
    

Correct any issues reported by the visudo -c command.

Configuring the ChkSrv hook for all SAP HANA instances (optional)

The sap-hana-ha package includes the ChkSrv.py hook script, which is located in the /usr/share/sap-hana-ha directory. Configure this hook on all SAP HANA cluster nodes.

You can configure the hook srServiceStateChanged() to accelerate recovery when an indexserver process fails. This configuration is optional.

The ChkSrv.py script stops the entire SAP HANA instance for faster recovery. If automated failover is enabled in the cluster and the secondary node is in a healthy state, the system initiates a takeover operation. If failover is not possible, the script forces a restart of the SAP HANA instance on the local node.

The hook script analyzes instance events, filters the event details, and triggers actions based on the filtered results. It distinguishes between an SAP HANA indexserver process that is stopped and restarted by the system, and one that is stopped as part of an instance shutdown.

The actions that are triggered by the hook depend on the configuration of the action_on_lost parameter.

  • ignore: Logs the event details and the corresponding decision logic to a file.
  • stop: Gracefully stops the SAP HANA instance by using the sapcontrol command.
  • kill: Triggers the HDB kill-<signal> command with a default signal 9. The signal can be configured.
  • fence: Fences the cluster node when a failed indexserver process is detected. Grant sudo privileges to allow the sidadm user to run the /usr/bin/SAPHanaSR-hookHelper script.

Both the stop and the kill actions stop the SAP HANA instance, but the kill action is slightly faster.

  1. Update the global.ini file on each SAP HANA node to enable the hook script.

    On both nodes, run the following command.

    sudo -i -u ${sid}adm -- <<EOT
        python \$DIR_INSTANCE/exe/python_support/setParameter.py \
          -set SYSTEM/global.ini/ha_dr_provider_chksrv/provider=ChkSrv \
          -set SYSTEM/global.ini/ha_dr_provider_chksrv/path=/usr/share/sap-hana-ha/ \
          -set SYSTEM/global.ini/ha_dr_provider_chksrv/execution_order=2 \
          -set SYSTEM/global.ini/ha_dr_provider_chksrv/action_on_lost=stop \
          -set SYSTEM/global.ini/trace/ha_dr_chksrv=info
    EOT
    

    In this example, the action_on_lost parameter is set to stop; the default value is ignore. You can optionally configure the stop_timeout parameter (default: 20 seconds) and the kill_signal parameter (default: 9).
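
    The following command is a sketch of how you can set these optional parameters explicitly in the same provider section; the values shown are the defaults.

    sudo -i -u ${sid}adm -- <<EOT
        python \$DIR_INSTANCE/exe/python_support/setParameter.py \
          -set SYSTEM/global.ini/ha_dr_provider_chksrv/stop_timeout=20 \
          -set SYSTEM/global.ini/ha_dr_provider_chksrv/kill_signal=9
    EOT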

Activating the hooks

  1. Reload the hook configuration.

    On both nodes, run the following command to activate the hooks.

    sudo -i -u ${sid}adm -- hdbnsutil -reloadHADRProviders
    
  2. Verify that the hooks are active by checking the trace logs.

    On both nodes, run the following command.

    sudo -i -u ${sid}adm -- sh -c 'grep "ha_dr_provider" $DIR_INSTANCE/$VTHOSTNAME/trace/nameserver_* | cut -d" " -f2,3,6-'
    

    Sample output:

    2025-02-27 15:47:21.855038   HADRProviderManager.cpp(00087) : loading HA/DR Provider 'ChkSrv' from /usr/share/sap-hana-ha/
    2025-02-27 15:47:22.648229   HADRProviderManager.cpp(00087) : loading HA/DR Provider 'HanaSR' from /usr/share/sap-hana-ha/
    

Configuring general cluster properties

To prevent unintended resource failover during initial testing and post-production phases, configure the following default values for the resource-stickiness and migration-threshold parameters.

These default values do not apply to resources that define custom values for these parameters.

On NODE1, run the following commands.

pcs resource defaults update resource-stickiness=1000
pcs resource defaults update migration-threshold=5000
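
You can display the configured defaults to verify the values.

pcs resource defaults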

Creating a cloned SAPHanaTopology resource

On NODE1, run the following command to create the SAPHanaTopology resource.

pcs resource create SAPHanaTopology_${SID}_${INSTNO} SAPHanaTopology \
    SID=${SID} \
    InstanceNumber=${INSTNO} \
    op start timeout=600 \
    op stop timeout=300 \
    op monitor interval=10 timeout=600 \
    clone meta clone-max=2 clone-node-max=1 interleave=true \
    --future \
    --disabled

Verify the resource configuration and cluster status by running the following commands.

pcs resource config SAPHanaTopology_${SID}_${INSTNO}
pcs resource config SAPHanaTopology_${SID}_${INSTNO}-clone

Creating a cloned SAPHanaFilesystem resource

The SAPHanaFilesystem resource monitors the file systems that are used by the SAP HANA system.

On NODE1, run the following command to create the SAPHanaFilesystem resource.

pcs resource create SAPHanaFilesystem_${SID}_${INSTNO} SAPHanaFilesystem \
    SID=${SID} \
    InstanceNumber=${INSTNO} \
    op start interval=0 timeout=10 \
    op stop interval=0 timeout=20 \
    op monitor interval=120 timeout=120 \
    clone meta clone-max=2 clone-node-max=1 interleave=true \
    --future \
    --disabled

Verify the resource configuration and cluster status by running the following commands.

pcs resource config SAPHanaFilesystem_${SID}_${INSTNO}
pcs resource config SAPHanaFilesystem_${SID}_${INSTNO}-clone

Creating a promotable SAPHanaController resource

The SAPHanaController resource manages the two SAP HANA instances that are configured for system replication.

On NODE1, run the following command to create the SAPHanaController resource.

pcs resource create SAPHanaController_${SID}_${INSTNO} SAPHanaController \
    SID=${SID} \
    InstanceNumber=${INSTNO} \
    HANA_CALL_TIMEOUT=120 \
    PREFER_SITE_TAKEOVER=true \
    DUPLICATE_PRIMARY_TIMEOUT=7200 \
    AUTOMATED_REGISTER=false \
    op start timeout=3600 \
    op stop timeout=3600 \
    op monitor interval=59 role="Unpromoted" timeout=700 \
    op monitor interval=61 role="Promoted" timeout=700 \
    op promote timeout=900 \
    op demote timeout=320 \
    meta priority=100 \
    promotable meta clone-max=2 clone-node-max=1 interleave=true \
    --future \
    --disabled

Verify the resource configuration and cluster status.

pcs resource config SAPHanaController_${SID}_${INSTNO}
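
You can also check the overall cluster status at this point.

pcs status --full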

Creating a virtual IP address cluster resource

Choose the appropriate configuration path based on your deployment scenario:

Creating a virtual IP cluster resource in a single zone environment

Use the reserved IP address to create a virtual IP address cluster resource. This IP address enables access to the SAP HANA system replication primary instance.

On NODE1, create the virtual IP address cluster resource with the following command.

pcs resource create vip_${SID}_${INSTNO} IPaddr2 \
    ip=$VIP \
    --disabled

Verify the virtual IP address resource configuration and cluster status.

pcs resource config vip_${SID}_${INSTNO}
pcs status --full

Proceed to the Creating cluster resource constraints section.

Creating a virtual IP cluster resource in a multizone region environment

Ensure that you completed all steps in the Preparing a multi-zone RHEL HA Add-On cluster for a virtual IP address resource section.

Use the pcs resource describe powervs-subnet command to get information about the resource agent parameters.

On NODE1, create a powervs-subnet cluster resource by running the following command.

pcs resource create vip_${SID}_${INSTNO} powervs-subnet \
    api_key=${APIKEY} \
    api_type=${API_TYPE} \
    cidr=${CIDR} \
    ip=${VIP} \
    crn_host_map="${NODE1}:${IBMCLOUD_CRN_1};${NODE2}:${IBMCLOUD_CRN_2}" \
    vsi_host_map="${NODE1}:${POWERVSI_1};${NODE2}:${POWERVSI_2}" \
    jumbo=${JUMBO} \
    region=${CLOUD_REGION} \
    subnet_name=${SUBNET_NAME} \
    op start timeout=720 \
    op stop timeout=300 \
    op monitor interval=60 timeout=30 \
    --disabled

When API_TYPE is set to public, you must also specify the proxy parameter.
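
The following command is a sketch of how you might add the proxy parameter to the resource afterward, assuming a hypothetical HTTP proxy that is reachable at proxy.example.com on port 3128.

pcs resource update vip_${SID}_${INSTNO} proxy=http://proxy.example.com:3128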

Before you run the pcs resource config command, verify that both virtual server instances in the cluster have the status Active, and their health status is OK.

Verify the configured virtual IP resource and cluster status.

pcs resource config vip_${SID}_${INSTNO}

Sample output:

# pcs resource config vip_MH1_00
Resource: vip_MH1_00 (class=ocf provider=heartbeat type=powervs-subnet)
  Attributes: vip_MH1_00-instance_attributes
    api_key=@/root/.apikey.json
    api_type=private
    cidr=10.40.41.100/30
    crn_host_map=cl-mh1-1:crn:v1:bluemix:public:power-iaas:eu-de-2:a/**********************************:**********************************::;cl-mh1-2:crn:v1:bluemix:public:power-iaas:eu-de-1:a/**********************************:**********************************::
    ip=10.40.41.102
    jumbo=true
    region=eu-de
    subnet_name=vip-mh1-net
    vsi_host_map=cl-mh1-1:**********************************;cl-mh1-2:**********************************
  Operations:
    monitor: vip_MH1_00-monitor-interval-60
      interval=60 timeout=30
    start: vip_MH1_00-start-interval-0s
      interval=0s timeout=720
    stop: vip_MH1_00-stop-interval-0s
      interval=0s timeout=300

Creating cluster resource constraints

Ensure that the SAPHanaTopology resources start before the SAPHanaController resources.

The virtual IP address must be assigned to the node that hosts the primary SAPHanaController resource.

  1. Create a resource constraint to enforce the start order. This constraint ensures that SAPHanaTopology starts before SAPHanaController.

    On NODE1, use the following command to create the SAPHanaTopology order constraint.

    pcs constraint order SAPHanaTopology_${SID}_${INSTNO}-clone \
        then SAPHanaController_${SID}_${INSTNO}-clone symmetrical=false
    

    Verify that the constraint is applied correctly.

    pcs constraint
    
  2. Create a resource constraint to colocate the virtual IP address with the SAP HANA system replication primary instance. The constraint ensures that the virtual IP address is assigned to the node where the promoted SAPHanaController resource, the SAP HANA primary, is running.

    On NODE1, run the following command to create the virtual IP colocation constraint.

    pcs constraint colocation add vip_${SID}_${INSTNO} \
        with Promoted SAPHanaController_${SID}_${INSTNO}-clone 2000
    

    Verify that the colocation constraint is applied.

    pcs constraint
    

    Sample output:

    # pcs constraint
    Colocation Constraints:
      Started resource 'vip_MH1_00' with Promoted resource 'SAPHanaController_MH1_00-clone'
        score=2000
    Order Constraints:
      start resource 'SAPHanaTopology_MH1_00-clone' then start resource 'SAPHanaController_MH1_00-clone'
        symmetrical=0
    

Enabling the cluster resources

The cluster resources were initially created with the --disabled flag.

On NODE1, use the following commands to enable each cluster resource.

pcs resource enable SAPHanaTopology_${SID}_${INSTNO}-clone
pcs resource enable SAPHanaFilesystem_${SID}_${INSTNO}-clone
pcs resource enable SAPHanaController_${SID}_${INSTNO}-clone
pcs resource enable vip_${SID}_${INSTNO}

Verify that all resources are running.

pcs status --full

The following output is a sample for an SAP HANA system replication cluster that is deployed in a multizone region.

# pcs status --full
Cluster name: SAP_MH1
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: cl-mh1-1 (1) (version 2.1.7-5.2.el9_4-0f7f88312) - partition with quorum
  * Last updated: Thu Feb 27 17:43:57 2025 on cl-mh1-1
  * Last change:  Thu Feb 27 17:43:22 2025 by root via root on cl-mh1-1
  * 2 nodes configured
  * 8 resource instances configured

Node List:
  * Node cl-mh1-1 (1): online, feature set 3.19.0
  * Node cl-mh1-2 (2): online, feature set 3.19.0

Full List of Resources:
  * res_fence_sbd       (stonith:fence_sbd):     Started cl-mh1-1
  * Clone Set: SAPHanaTopology_MH1_00-clone [SAPHanaTopology_MH1_00]:
    * SAPHanaTopology_MH1_00    (ocf:heartbeat:SAPHanaTopology):         Started cl-mh1-2
    * SAPHanaTopology_MH1_00    (ocf:heartbeat:SAPHanaTopology):         Started cl-mh1-1
  * Clone Set: SAPHanaFilesystem_MH1_00-clone [SAPHanaFilesystem_MH1_00]:
    * SAPHanaFilesystem_MH1_00  (ocf:heartbeat:SAPHanaFilesystem):       Started cl-mh1-2
    * SAPHanaFilesystem_MH1_00  (ocf:heartbeat:SAPHanaFilesystem):       Started cl-mh1-1
  * Clone Set: SAPHanaController_MH1_00-clone [SAPHanaController_MH1_00] (promotable):
    * SAPHanaController_MH1_00  (ocf:heartbeat:SAPHanaController):       Unpromoted cl-mh1-2
    * SAPHanaController_MH1_00  (ocf:heartbeat:SAPHanaController):       Promoted cl-mh1-1
  * vip_MH1_00  (ocf:heartbeat:powervs-subnet):  Started cl-mh1-1

Node Attributes:
  * Node: cl-mh1-1 (1):
    * hana_mh1_clone_state              : PROMOTED
    * hana_mh1_roles                    : master1:master:worker:master
    * hana_mh1_site                     : SiteA
    * hana_mh1_srah                     : -
    * hana_mh1_version                  : 2.00.079.00
    * hana_mh1_vhost                    : cl-mh1-1
    * master-SAPHanaController_MH1_00   : 150
  * Node: cl-mh1-2 (2):
    * hana_mh1_clone_state              : DEMOTED
    * hana_mh1_roles                    : master1:master:worker:master
    * hana_mh1_site                     : SiteB
    * hana_mh1_srah                     : -
    * hana_mh1_version                  : 2.00.079.00
    * hana_mh1_vhost                    : cl-mh1-2
    * master-SAPHanaController_MH1_00   : 100

Migration Summary:

Tickets:

PCSD Status:
  cl-mh1-1: Online
  cl-mh1-2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

On the promoted cluster node, verify that the cluster service IP address is active.

ip addr show
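
To check for the virtual IP address specifically, you can filter the output, for example.

ip addr show | grep "${VIP}"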

Enabling automated registration of the secondary instance

Set the AUTOMATED_REGISTER parameter based on your operational requirements. To retain the ability to revert to the previous primary SAP HANA instance, set AUTOMATED_REGISTER=false. This setting prevents an automatic registration of the previous primary as a new secondary.

If a cluster-triggered takeover results in data issues and AUTOMATED_REGISTER is set to false, you can manually revert to the previous configuration.

When AUTOMATED_REGISTER=true, the previous primary SAP HANA instance automatically registers as a secondary. The instance loses its history and cannot be reactivated in its previous state. The benefit of this setting is that high availability is automatically restored when the failed node rejoins the cluster.

Use the default value false for AUTOMATED_REGISTER until the cluster is fully tested and the failover scenarios are verified.

To enable automated registration, update the resource attribute by running the following command.

pcs resource update SAPHanaController_${SID}_${INSTNO} AUTOMATED_REGISTER=true

Testing SAP HANA system replication cluster

Thoroughly test the cluster configuration to help ensure that all components function as expected. The following examples describe failover test scenarios, but the list is not comprehensive. Each test case includes the following details.

  • Component that is being tested
  • Test description
  • Prerequisites and cluster state before the failover is initiated
  • Test procedure
  • Expected behavior and results
  • Recovery procedure

Test 1 - Testing a failure of the primary database instance

Use the following procedure to test a failure of the primary SAP HANA database instance.

Test 1 - Description

Simulate a crash of the primary SAP HANA database instance that is running on NODE1.

Test 1 - Prerequisites

  • A functional two-node RHEL HA Add-On cluster configured for HANA system replication
  • The cluster is running on both nodes
  • The SAPHanaController_${SID}_${INSTNO} resource is configured with AUTOMATED_REGISTER=false
  • Verify the SAP HANA system replication status:
    • Primary SAP HANA database is running on NODE1
    • Secondary SAP HANA database is running on NODE2
    • System replication is activated and in sync
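
To confirm these prerequisites, you can run the status checks that are used earlier in this procedure, for example on NODE1.

pcs status --full
sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py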

Test 1 - Test procedure

Simulate a crash of the primary SAP HANA instance by sending a SIGKILL signal as the sidadm user.

On NODE1, run the following command.

sudo -i -u ${sid}adm -- HDB kill-9

Test 1 - Expected behavior

  • SAP HANA primary instance on NODE1 crashes.
  • The cluster detects the stopped primary HANA database and marks the resource as failed.
  • The cluster promotes the secondary HANA database on NODE2 to become the new primary.
  • The virtual IP address is released from NODE1 and reassigned to NODE2.
  • Applications such as SAP NetWeaver, if connected to a tenant database, automatically reconnect to the new primary.

Test 1 - Recovery procedure

Because the SAPHanaController_${SID}_${INSTNO} resource is configured with AUTOMATED_REGISTER=false, the cluster does not restart the failed SAP HANA instance and does not register it against the new primary. As a result, the new primary on NODE2 shows its secondary site in CONNECTION TIMEOUT status.

To manually reregister the previous primary as a new secondary, run the following command on NODE1.

sudo -i -u ${sid}adm -- <<EOT
    hdbnsutil -sr_register \
      --name=${DC1} \
      --remoteHost=${NODE2} \
      --remoteInstance=${INSTNO} \
      --replicationMode=sync \
      --operationMode=logreplay \
      --online
EOT

Verify the SAP HANA system replication status.

sudo -i -u ${sid}adm -- <<EOT
    hdbnsutil -sr_state
    HDBSettings.sh systemReplicationStatus.py
EOT

After manual registration and resource refresh, the new secondary instance restarts and appears in SOK (synced) status.

On NODE1, run the following commands.

pcs resource refresh SAPHanaController_${SID}_${INSTNO}
pcs status --full

Test 2 - Testing a failure of the node that is running the primary database

Use the following procedure to simulate and test a failure of the node that hosts the primary SAP HANA database instance.

Test 2 - Description

Simulate a crash of the node that hosts the primary SAP HANA database instance.

Test 2 - Preparation

Make sure that the SAPHanaController_${SID}_${INSTNO} resource is configured with AUTOMATED_REGISTER=true.

On NODE1, run the following commands.

pcs resource update SAPHanaController_${SID}_${INSTNO} AUTOMATED_REGISTER=true
pcs resource config SAPHanaController_${SID}_${INSTNO}

Test 2 - Prerequisites

  • A functional two-node RHEL HA Add-On cluster configured for HANA system replication
  • The cluster is running on both nodes
  • Verify the SAP HANA system replication status:
    • Primary SAP HANA database is running on NODE2
    • Secondary SAP HANA database is running on NODE1
    • System replication is activated and in sync

Test 2 - Test procedure

Simulate a node failure on NODE2 by triggering a system crash.

On NODE2, run the following command.

sync; echo c > /proc/sysrq-trigger

Test 2 - Expected behavior

  • NODE2 shuts down.
  • The cluster detects the failed node and marks its state as OFFLINE.
  • The cluster promotes the secondary SAP HANA instance on NODE1 to become the new primary.
  • The virtual IP address is reassigned to NODE1.
  • Applications such as SAP NetWeaver, if connected to a tenant database, automatically reconnect to the new primary.

Test 2 - Recovery procedure

Log in to the IBM Cloud® Console and start the NODE2 instance. After NODE2 becomes available, restart the cluster framework.

On NODE2, run the following commands.

pcs cluster start
pcs status --full

Because AUTOMATED_REGISTER=true is configured for the SAPHanaController_${SID}_${INSTNO} resource, SAP HANA automatically restarts when NODE2 rejoins the cluster. The former primary instance is reregistered as a secondary.

Test 3 - Testing a failure of the secondary database instance

Follow these steps to simulate a failure of the secondary database instance.

Test 3 - Description

This test simulates a crash of the secondary SAP HANA instance to validate that the cluster restarts the instance automatically.

Test 3 - Prerequisites

  • A functional two-node RHEL HA Add-On cluster configured for HANA system replication
  • The cluster is running on both nodes
  • The SAPHanaController_${SID}_${INSTNO} resource is configured with AUTOMATED_REGISTER=true
  • Verify the SAP HANA system replication status:
    • Primary SAP HANA database is running on NODE1
    • Secondary SAP HANA database is running on NODE2
    • HANA system replication is activated and in sync

Test 3 - Test procedure

Simulate a crash of the secondary SAP HANA instance by sending a SIGKILL signal as the sidadm user.

On NODE2, run the following command.

sudo -i -u ${sid}adm -- HDB kill-9

Test 3 - Expected behavior

  • The secondary SAP HANA instance on NODE2 crashes.
  • The cluster detects the stopped secondary HANA database and marks the resource as failed.
  • The cluster restarts the secondary HANA instance.
  • The cluster verifies that system replication is reestablished and in sync.

Test 3 - Recovery procedure

Wait until the secondary SAP HANA instance starts and system replication status returns to SOK. Then, clean up any failed resource actions as indicated by pcs status.
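
To watch the replication status, you can rerun the status check from earlier in this procedure, for example on the primary node NODE1.

sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py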

On NODE2, run the following commands.

pcs resource refresh SAPHanaController_${SID}_${INSTNO}
pcs status --full

Test 4 - Testing a manual move of a SAPHanaController resource to another node

Follow these steps to manually move a SAPHanaController resource to another node.

Test 4 - Description

Use cluster management commands to relocate the primary SAP HANA instance to another node for maintenance.

Test 4 - Prerequisites

  • A functional two-node RHEL HA Add-On cluster configured for SAP HANA system replication
  • The cluster is running on both nodes
  • The SAPHanaController_${SID}_${INSTNO} resource is configured with AUTOMATED_REGISTER=true
  • Verify the SAP HANA system replication status:
    • Primary SAP HANA instance is running on NODE1
    • Secondary SAP HANA instance is running on NODE2
    • HANA system replication is activated and in sync

Test 4 - Test procedure

Manually move the primary SAP HANA instance to another node by using the pcs resource move command.

On NODE1, run the following command.

pcs resource move SAPHanaController_${SID}_${INSTNO}-clone

Test 4 - Expected behavior

  • The cluster creates temporary location constraints to relocate the resource to the target node.
  • The cluster initiates a takeover, promoting the secondary SAP HANA instance to primary.
  • Applications such as SAP NetWeaver, if connected to a tenant database, automatically reconnect to the new primary instance.

Test 4 - Recovery procedure

As part of the resource move process, the cluster automatically registers and starts the original primary node as the new secondary instance. After the move completes successfully, the cluster removes any temporary location constraints.

Run the following commands on NODE1 to confirm that temporary constraints are removed and to verify the cluster status.

pcs constraint
pcs status --full