Configuring SAP HANA Scale-Up System Replication in a RHEL HA Add-On cluster

With SAP HANA on IBM Power Systems, you get a consistent platform for HANA-based and traditional applications, best-in-class performance, resilience for critical workloads, and the most flexible infrastructure.

Overview

The following information describes the configuration of a Red Hat Enterprise Linux (RHEL) 8 HA Add-On cluster for managing SAP HANA Scale-Up System Replication. The cluster uses virtual server instances in IBM Power Virtual Server as cluster nodes.

The instructions describe how to automate SAP HANA Scale-Up System Replication for a single database deployment in a performance-optimized scenario on a RHEL HA Add-On cluster.

This information is intended for architects and specialists who are planning a high-availability deployment of SAP HANA on Power Virtual Server.

Prerequisites

Configuring SAP HANA System Replication in a RHEL HA Add-On cluster on IBM Power Virtual Server

For the following steps, also check the documentation in the Red Hat article Automating SAP HANA Scale-Up System Replication by using the RHEL HA Add-On.

Preparing environment variables

To simplify the setup, prepare the following environment variables for the root user on both nodes. These environment variables are used by subsequent commands throughout the remainder of the examples.

On both nodes, run the following commands.

export SID=<SID>            # SAP HANA System ID (uppercase)
export sid=<sid>            # SAP HANA System ID (lowercase)
export INSTNO=<INSTNO>      # SAP HANA Instance Number

export DC1=<Site1>          # HANA System Replication Site Name 1
export DC2=<Site2>          # HANA System Replication Site Name 2

export NODE1=<Hostname 1>   # Hostname of virtual server instance 1
export NODE2=<Hostname 2>   # Hostname of virtual server instance 2
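
The following example shows how the variables might look for a hypothetical system with SAP HANA system ID MHA and instance number 00. All values are samples for illustration only; replace them with the values of your environment.

export SID=MHA              # SAP HANA System ID (uppercase)
export sid=mha              # SAP HANA System ID (lowercase)
export INSTNO=00            # SAP HANA Instance Number

export DC1=SiteA            # HANA System Replication Site Name 1
export DC2=SiteB            # HANA System Replication Site Name 2

export NODE1=hana-vsi-1     # Hostname of virtual server instance 1
export NODE2=hana-vsi-2     # Hostname of virtual server instance 2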

Installing SAP HANA resource agents

Run the following command to install the RHEL HA Add-On resource agents for SAP HANA System Replication.

dnf install -y resource-agents-sap-hana
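
Optionally, verify that the package is installed.

rpm -q resource-agents-sap-hana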

Starting the SAP HANA system

Start SAP HANA and verify that HANA System Replication is active. For more information, see section 2.4, Checking SAP HANA System Replication state, in the Red Hat documentation.

On both nodes, run the following commands.

sudo -i -u ${sid}adm -- HDB start
sudo -i -u ${sid}adm -- <<EOT
    hdbnsutil -sr_state
    HDBSettings.sh systemReplicationStatus.py
EOT
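
Optionally, you can also check the return code of systemReplicationStatus.py on the primary. A return code of 15 typically indicates that all services are replicated and in sync; the exact return-code semantics depend on your SAP HANA version.

sudo -i -u ${sid}adm -- sh -c 'HDBSettings.sh systemReplicationStatus.py; echo "Return code: $?"'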

Enabling the SAP HANA srConnectionChanged() hook

Recent versions of SAP HANA provide hooks so SAP HANA can send out notifications for certain events. For more information, see Implementing a HA/DR Provider.

The srConnectionChanged() hook improves the ability of the cluster to detect a status change of HANA System Replication that requires an action from the cluster. The goal is to prevent data loss and corruption by avoiding accidental takeovers.

Activating the srConnectionChanged() hook on all SAP HANA instances

  1. Stop the cluster.

    On NODE1, run the following command.

    pcs cluster stop --all
    
  2. Copy the hook script, which is provided by the resource-agents-sap-hana package, to the /hana/shared/myHooks directory for each HANA instance, and set the required ownership.

    On both nodes, run the following commands.

    mkdir -p /hana/shared/myHooks
    
    cp /usr/share/SAPHanaSR/srHook/SAPHanaSR.py /hana/shared/myHooks
    
    chown -R ${sid}adm:sapsys /hana/shared/myHooks
    
  3. Update the global.ini file on each HANA node to enable the hook script.

    On both nodes, run the following command.

    sudo -i -u ${sid}adm -- <<EOT
        python \$DIR_INSTANCE/exe/python_support/setParameter.py \
          -set SYSTEM/global.ini/ha_dr_provider_SAPHanaSR/provider=SAPHanaSR \
          -set SYSTEM/global.ini/ha_dr_provider_SAPHanaSR/path=/hana/shared/myHooks \
          -set SYSTEM/global.ini/ha_dr_provider_SAPHanaSR/execution_order=1 \
          -set SYSTEM/global.ini/trace/ha_dr_saphanasr=info
    EOT
    
  4. Verify the changed file.

    On both nodes, run the following command.

    cat /hana/shared/${SID}/global/hdb/custom/config/global.ini
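
    The updated global.ini file should now contain entries similar to the following example. The exact layout and any other existing entries can differ.

    [ha_dr_provider_SAPHanaSR]
    provider = SAPHanaSR
    path = /hana/shared/myHooks
    execution_order = 1

    [trace]
    ha_dr_saphanasr = info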
    
  5. Create sudo settings for the SAP HANA OS user.

    The following sudo settings are required so that the ${sid}adm user can update the cluster node attributes when the srConnectionChanged() hook script runs.

    On both nodes, run the following commands.

    Create a file with the required sudo aliases and user specifications.

    cat >> /etc/sudoers.d/20-saphana << EOT
    Cmnd_Alias DC1_SOK = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_${DC1} -v SOK -t crm_config -s SAPHanaSR
    Cmnd_Alias DC1_SFAIL = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_${DC1} -v SFAIL -t crm_config -s SAPHanaSR
    Cmnd_Alias DC2_SOK = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_${DC2} -v SOK -t crm_config -s SAPHanaSR
    Cmnd_Alias DC2_SFAIL = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_${DC2} -v SFAIL -t crm_config -s SAPHanaSR
    ${sid}adm ALL=(ALL) NOPASSWD: DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL
    Defaults!DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL !requiretty
    EOT
    

    Adjust the permissions and check for syntax errors.

    chown root:root /etc/sudoers.d/20-saphana
    
    chmod 0440 /etc/sudoers.d/20-saphana
    
    cat /etc/sudoers.d/20-saphana
    
    visudo -c
    

Any problems that are reported by the visudo -c command must be corrected.
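
Optionally, as root on both nodes, you can list the sudo privileges of the ${sid}adm user to confirm that the new rules are in effect.

sudo -l -U ${sid}adm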

  6. Verify that the hook functions.

    • Restart both HANA instances and verify that the hook script works as expected.
    • Perform an action to trigger the hook, such as stopping a HANA instance.
    • Check whether the hook logged anything in the trace files.

    On both nodes, run the following commands.

    Restart and then stop the HANA instance.

    sudo -i -u ${sid}adm -- HDB restart
    
    sudo -i -u ${sid}adm -- HDB stop
    

    Check messages in trace files.

    sudo -i -u ${sid}adm -- sh -c 'grep "ha_dr_SAPHanaSR.*crm_attribute" $DIR_INSTANCE/$VTHOSTNAME/trace/nameserver_* | cut -d" " -f2,3,5,17'
    

    Start the HANA instance.

    sudo -i -u ${sid}adm -- HDB start
    

    After you verify that the hooks function, you can restart the HA cluster.

  7. Start the cluster.

    On NODE1, run the following commands.

    Start the cluster.

    pcs cluster start --all
    

    Check the status of the cluster.

    pcs status --full
    

Configuring general cluster properties

To avoid failovers of the resources during initial testing and in production, set the following default values for the resource-stickiness and migration-threshold parameters.

Keep in mind that defaults don't apply to resources that override them with their own defined values.

On NODE1, run the following commands.

pcs resource defaults update resource-stickiness=1000
pcs resource defaults update migration-threshold=5000
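
You can display the currently configured resource defaults to confirm the values.

pcs resource defaults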

Creating a cloned SAPHanaTopology resource

The SAPHanaTopology resource gathers status and configuration of SAP HANA System Replication on each node. It also starts and monitors the local SAP HostAgent, which is required for starting, stopping, and monitoring SAP HANA instances.

On NODE1, run the following commands.

Create the SAPHanaTopology resource.

pcs resource create SAPHanaTopology_${SID}_${INSTNO} SAPHanaTopology \
    SID=${SID} InstanceNumber=${INSTNO} \
    op start timeout=600 \
    op stop timeout=300 \
    op monitor interval=10 timeout=600 \
    clone clone-max=2 clone-node-max=1 interleave=true

Check the configuration and the cluster status by running the following commands.

pcs resource config SAPHanaTopology_${SID}_${INSTNO}
pcs resource config SAPHanaTopology_${SID}_${INSTNO}-clone
pcs status --full

Creating master and slave SAPHana resources

The SAPHana resource manages two SAP HANA instances that are configured as HANA System Replication nodes.

On NODE1, create the SAPHana resource by running the following command.

pcs resource create SAPHana_${SID}_${INSTNO} SAPHana \
    SID=${SID} InstanceNumber=${INSTNO} \
    PREFER_SITE_TAKEOVER=true \
    DUPLICATE_PRIMARY_TIMEOUT=7200 \
    AUTOMATED_REGISTER=false \
    op start timeout=3600 \
    op stop timeout=3600 \
    op monitor interval=61 role="Slave" timeout=700 \
    op monitor interval=59 role="Master" timeout=700 \
    op promote timeout=3600 \
    op demote timeout=3600 \
    promotable notify=true clone-max=2 clone-node-max=1 interleave=true

Check the configuration and the cluster status.

pcs resource config SAPHana_${SID}_${INSTNO}
pcs status --full
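
In addition to pcs status, you can display the node attributes that the SAPHana resource agent maintains, such as hana_${sid}_roles and hana_${sid}_sync_state, by running crm_mon with the node attributes option. This is an optional check.

crm_mon -1 --show-node-attributes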

Creating a virtual IP address resource

Review the information in Reserving virtual IP addresses and reserve a virtual IP address for the SAP HANA System Replication cluster.

Use the reserved IP address to create a virtual IP address resource. This virtual IP address is used to reach the System Replication primary instance.

On NODE1, assign the reserved IP address to a VIP environment variable and create the virtual IP address cluster resource by running the following commands.

export VIP=<reserved IP address>
echo $VIP
pcs resource create vip_${SID}_${INSTNO} IPaddr2 ip=$VIP

Check the configured virtual IP address and the cluster status.

pcs resource config vip_${SID}_${INSTNO}
ip addr show
pcs status --full

Creating constraints

Make sure that SAPHanaTopology resources are started before you start the SAPHana resources.

The virtual IP address must be present on the node where the primary resource of "SAPHana" is running.

  1. Create a constraint to start "SAPHanaTopology" before "SAPHana". This constraint mandates the start order of these resources.

    On NODE1, use the following command to create the SAPHanaTopology order constraint:

    pcs constraint order SAPHanaTopology_${SID}_${INSTNO}-clone \
        then SAPHana_${SID}_${INSTNO}-clone symmetrical=false
    

    Check the configuration.

    pcs constraint
    
  2. Create a constraint to colocate the virtual IP address with the primary. This constraint colocates the virtual IP address resource with the SAPHana resource that was promoted as primary.

    On NODE1, run the following command to create the virtual IP address colocation constraint.

    pcs constraint colocation add vip_${SID}_${INSTNO} \
        with master SAPHana_${SID}_${INSTNO}-clone 2000
    

    Check the configuration and the cluster status.

    pcs constraint
    
    pcs status --full
    

Enabling automated registration of secondary instance

You need to set the parameter AUTOMATED_REGISTER according to your operational requirements. If you want to keep the ability to revert to the state of the previous primary SAP HANA instance, then AUTOMATED_REGISTER=false avoids an automatic registration of the previous primary as a new secondary.

If you experience an issue with the data after a takeover that was triggered by the cluster, you can manually revert if AUTOMATED_REGISTER is set to false.

If AUTOMATED_REGISTER is set to true, the previous primary SAP HANA instance automatically registers as secondary and can no longer be activated on its previous database history. The advantage of AUTOMATED_REGISTER=true is that high-availability capability is automatically reestablished after the failed node reappears in the cluster.

For now, it is recommended to keep AUTOMATED_REGISTER at its default value false until the cluster is fully tested and you have verified that the failover scenarios work as expected.

The pcs resource update command is used to modify resource attributes. For example, the following command sets AUTOMATED_REGISTER to true.

pcs resource update SAPHana_${SID}_${INSTNO} AUTOMATED_REGISTER=true
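
After you change the attribute, you can verify the new setting in the resource configuration.

pcs resource config SAPHana_${SID}_${INSTNO}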

Testing SAP HANA System Replication cluster

It is vital to thoroughly test the cluster configuration to make sure that the cluster is working correctly. The following information provides a few sample failover test scenarios, but is not a complete list of test scenarios.

The description of each test case includes the following information.

  • Component that is being tested
  • Description of the test
  • Prerequisites and the cluster state before you start the failover test
  • Test procedure
  • Expected behavior and results
  • Recovery procedure

Test1 - Testing failure of the primary database instance

Use the following information to test the failure of the primary database instance.

Test1 - Description

Simulate a crash of the primary HANA database instance that is running on NODE1.

Test1 - Prerequisites

  • A functional two-node RHEL HA Add-On cluster for HANA system replication.
  • Both cluster nodes are active.
  • The cluster is started on NODE1 and NODE2.
  • Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=false.
  • Check SAP HANA System Replication status:
    • Primary SAP HANA database is running on NODE1
    • Secondary SAP HANA database is running on NODE2
    • HANA System Replication is activated and in sync

Test1 - Test procedure

Crash SAP HANA primary by sending a SIGKILL signal as user ${sid}adm.

On NODE1, run the following command.

sudo -i -u ${sid}adm -- HDB kill-9
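
While the test runs, you can optionally watch the cluster status from either node to observe the takeover.

watch -n 5 pcs status --full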

Test1 - Expected behavior

  • SAP HANA primary instance on NODE1 crashes.
  • The cluster detects the stopped primary HANA database and marks the resource as failed.
  • The cluster promotes the secondary HANA database on NODE2 to take over as new primary.
  • The cluster releases the virtual IP address on NODE1, and acquires it on the new primary on NODE2.
  • If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.

Test1 - Recovery procedure

As cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=false, the cluster doesn't restart the failed HANA database and doesn't register it against the new primary. As a result, the status on the new primary (NODE2) also shows the secondary in status 'CONNECTION TIMEOUT'.

To reregister the previous primary as the new secondary, use the following commands.

On NODE1, run the following command.

sudo -i -u ${sid}adm -- <<EOT
    hdbnsutil -sr_register \
      --name=${DC1} \
      --remoteHost=${NODE2} \
      --remoteInstance=${INSTNO} \
      --replicationMode=sync \
      --operationMode=logreplay
EOT

Verify the system replication status:

sudo -i -u ${sid}adm -- <<EOT
    hdbnsutil -sr_state
    HDBSettings.sh systemReplicationStatus.py
EOT

After the manual registration and a resource refresh, the new secondary instance restarts and shows up in the synchronized state (SOK).

On NODE1, run the following command.

pcs resource refresh SAPHana_${SID}_${INSTNO}
pcs status --full

Test2 - Testing failure of the node that is running the primary database

Use the following information to test the failure of the node that is running the primary database.

Test2 - Description

Simulate a crash of the node that is running the primary HANA database.

Test2 - Preparation

Make sure that Cluster Resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.

On NODE1, run the following command.

pcs resource update SAPHana_${SID}_${INSTNO} AUTOMATED_REGISTER=true
pcs resource config SAPHana_${SID}_${INSTNO}

Test2 - Prerequisites

  • A functional two-node RHEL HA Add-On cluster for HANA system replication.
  • Both cluster nodes are active.
  • The cluster is started on NODE1 and NODE2.
  • Check SAP HANA System Replication status.
    • Primary SAP HANA database is running on NODE2
    • Secondary SAP HANA database is running on NODE1
    • HANA System Replication is activated and in sync

Test2 - Test procedure

Crash the node that is running the primary database (NODE2) by sending a shutoff system request.

On NODE2, run the following command.

sync; echo o > /proc/sysrq-trigger

Test2 - Expected behavior

  • NODE2 shuts down.
  • The cluster detects the failed node and sets its state to OFFLINE.
  • The cluster promotes the secondary HANA database on NODE1 to take over as new primary.
  • The cluster acquires the virtual IP address on the new primary on NODE1.
  • If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.

Test2 - Recovery procedure

Log in to the IBM Cloud® Console and start the NODE2 instance. Wait until NODE2 is available again, then restart the cluster framework.

On NODE2, run the following command.

pcs cluster start
pcs status --full

As cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true, SAP HANA restarts when NODE2 rejoins the cluster and the former primary reregisters as a secondary.
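
To confirm that the former primary is registered and in sync again, you can check the replication status on the new primary.

On NODE1, run the following command.

sudo -i -u ${sid}adm -- HDBSettings.sh systemReplicationStatus.py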

Test3 - Testing the failure of the secondary database instance

Use the following information to test the failure of the secondary database instance.

Test3 - Description

Simulate a crash of the secondary HANA database.

Test3 - Prerequisites

  • A functional two-node RHEL HA Add-On cluster for HANA system replication.
  • Both nodes are active.
  • The cluster is started on NODE1 and NODE2.
  • Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
  • Check SAP HANA System Replication status:
    • Primary SAP HANA database is running on NODE1
    • Secondary SAP HANA database is running on NODE2
    • HANA System Replication is activated and in sync

Test3 - Test procedure

Crash SAP HANA secondary by sending a SIGKILL signal as user ${sid}adm.

On NODE2, run the following command.

sudo -i -u ${sid}adm -- HDB kill-9

Test3 - Expected behavior

  • SAP HANA secondary on NODE2 crashes.
  • The cluster detects the stopped secondary HANA database and marks the resource as failed.
  • The cluster restarts the secondary HANA database.
  • The cluster detects that the system replication is in sync again.

Test3 - Recovery procedure

Wait until the secondary HANA instance starts and syncs again (SOK), then clean up the failed resource actions that are shown in pcs status.

On NODE2, run the following command.

pcs resource refresh SAPHana_${SID}_${INSTNO}
pcs status --full

Test4 - Testing the manual move of a SAPHana resource to another node

Use the following information to test the manual move of a SAPHana resource to another node.

Test4 - Description

Use cluster commands to move the primary instance to the other node for maintenance purposes.

Test4 - Prerequisites

  • A functional two-node RHEL HA Add-On cluster for HANA system replication.
  • Both nodes are active.
  • The cluster is started on NODE1 and NODE2.
  • Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
  • Check SAP HANA System Replication status:
    • Primary SAP HANA database is running on NODE1
    • Secondary SAP HANA database is running on NODE2
    • HANA System Replication is activated and in sync

Test4 - Test procedure

Move the SAP HANA primary to the other node by using the pcs resource move command.

On NODE1, run the following command.

pcs resource move SAPHana_${SID}_${INSTNO}-clone

Test4 - Expected behavior

  • The cluster creates location constraints to move the resource.
  • The cluster triggers a takeover to the secondary HANA database.
  • If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.

Test4 - Recovery procedure

The location constraints that were created automatically by the move must be removed to allow automatic failover in the future.

Wait until the primary HANA instance is active, then remove the constraints.

The cluster registers and starts the HANA database as a new secondary instance.

On NODE1, run the following command.

pcs constraint
pcs resource clear SAPHana_${SID}_${INSTNO}-clone
pcs constraint
pcs status --full