Configuring SAP HANA scale-up system replication in a SUSE Linux Enterprise High Availability Extension cluster
The following information describes the configuration of a SUSE Linux Enterprise Server (SLES) High Availability Extension (HAE) cluster for managing SAP HANA Scale-Up System Replication. The cluster uses virtual server instances in IBM® Power® Virtual Server as cluster nodes.
The instructions describe how to automate SAP HANA Scale-Up System Replication for a single database deployment in a performance-optimized scenario on a SLES HA Extension cluster.
This information is intended for architects and specialists who are planning a high-availability deployment of SAP HANA on Power Virtual Server.
Before you begin
Review the general requirements, product documentation, support articles, and SAP notes listed in Implementing high availability for SAP applications on IBM Power Virtual Server References.
Prerequisites
- A SUSE High Availability cluster is deployed on two virtual server instances in Power Virtual Server.
- Install and set up the SLES High Availability Extension cluster according to Implementing a SUSE Linux Enterprise Server high availability cluster.
- Configure and verify fencing as described in the preceding document.
- The virtual server instances need to fulfill hardware and resource requirements for the SAP HANA systems in scope. Follow the guidelines in Planning your deployment.
- The hostnames of the virtual server instances must meet the SAP HANA requirement.
- SAP HANA is installed on both virtual server instances and SAP HANA System Replication is configured. Installing SAP HANA and setting up HANA System Replication are not specific to the Power Virtual Server environment; follow the standard installation and setup procedures.
- A valid SUSE Linux Enterprise Server for SAP Applications license is required to enable the repositories that you need to install SAP HANA and the resource agents for HA configurations.
- See the Prerequisites chapter in the SUSE Linux Enterprise Server for SAP applications guide.
Configuring SAP HANA System Replication in a SLES HA Extension cluster on IBM Power Virtual Server
The instructions are based on the SUSE product documentation and articles that are listed in Implementing high availability for SAP applications on IBM Power Virtual Server References.
Preparing environment variables
To simplify the setup, prepare the following environment variables for root on both nodes. These environment variables are used with later operating system commands in this information.
On both nodes, set the following environment variables.
# General settings
export SID=<SID> # SAP HANA System ID (uppercase)
export sid=<sid> # SAP HANA System ID (lowercase)
export INSTNO=<INSTNO> # SAP HANA instance number
# Cluster node 1
export NODE1=<HOSTNAME_1> # Virtual server instance hostname
export DC1="Site1" # HANA System Replication site name
# Cluster node 2
export NODE2=<HOSTNAME_2> # Virtual server instance hostname
export DC2="Site2" # HANA System Replication site name
# Single zone
export VIP=<IP address> # SAP HANA System Replication cluster virtual IP address
Setting extra environment variables for implementing a single zone
Review the information in Reserving virtual IP addresses and reserve a virtual IP address for the SAP HANA System Replication cluster. Set the VIP environment variable to the reserved IP address.
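As a concrete illustration, the exports might look like the following sketch. Every value here is hypothetical and must be replaced with your own system ID, instance number, hostnames, site names, and reserved IP address.

```shell
# Hypothetical example values for illustration only.
export SID=HA1              # SAP HANA System ID (uppercase)
export sid=ha1              # SAP HANA System ID (lowercase)
export INSTNO=00            # SAP HANA instance number
export NODE1=hana-vsi-1     # hostname of cluster node 1
export DC1="SiteA"          # replication site name of node 1
export NODE2=hana-vsi-2     # hostname of cluster node 2
export DC2="SiteB"          # replication site name of node 2
export VIP=10.51.0.10       # reserved virtual IP address

# Sanity check: the lowercase SID must match the uppercase SID folded down.
if [ "${sid}" = "$(printf '%s' "${SID}" | tr '[:upper:]' '[:lower:]')" ]; then
  echo "SID variables consistent"
else
  echo "SID variables mismatch" >&2
fi
```

Because later commands interpolate these variables, a quick consistency check like the one above helps catch typos before any cluster resources are created.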
Installing SAP HANA resource agents
The SAPHana resource agent and the SAPHanaTopology resource agent are part of the SLES for SAP Applications distribution.
To install the resource agents, make sure that the package yast2-sap-ha is installed, as described in Setting up an SAP HANA cluster, and follow the steps to configure the HANA cluster by using yast2.
For scale-out scenarios, follow the Installing additional Software section of the SAP HANA System Replication Scale-Up - Performance-Optimized Scenario guide.
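If you want to confirm the packages from the command line first, a quick sketch follows. The package names are assumptions based on SLES for SAP Applications: SAPHanaSR provides the SAPHana and SAPHanaTopology agents, and yast2-sap-ha drives the guided setup.

```shell
# Check whether the HA resource agent packages are installed; prints one
# status line per package, or a note if rpm is not available.
check_ha_packages() {
  if ! command -v rpm >/dev/null 2>&1; then
    echo "rpm not available on this system"
    return 0
  fi
  for pkg in "$@"; do
    if rpm -q "$pkg" >/dev/null 2>&1; then
      echo "$pkg installed"
    else
      echo "$pkg missing"
    fi
  done
}

check_ha_packages SAPHanaSR yast2-sap-ha
```

Missing packages can be installed with zypper from the SLES for SAP Applications repositories that your license enables.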
Starting the SAP HANA system
Start SAP HANA and verify that HANA System Replication is active. For more information, see Checking System Replication Status Details.
On both nodes, run the following commands.
sudo -i -u ${sid}adm -- HDB start
sudo -i -u ${sid}adm -- <<EOT
hdbnsutil -sr_state
HDBSettings.sh systemReplicationStatus.py
EOT
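Besides its printed report, systemReplicationStatus.py signals the overall state through its exit code. The following sketch maps the return codes to readable states; the mapping below follows the convention documented for the script (verify it against your HANA release).

```shell
# Sketch: map the exit code of systemReplicationStatus.py to a readable state.
# Documented return-code convention: 15=active, 14=syncing, 13=initializing,
# 12=unknown, 11=error, 10=no system replication.
replication_state() {
  case "$1" in
    15) echo "ACTIVE" ;;
    14) echo "SYNCING" ;;
    13) echo "INITIALIZING" ;;
    12) echo "UNKNOWN" ;;
    11) echo "ERROR" ;;
    10) echo "NONE" ;;
     *) echo "UNEXPECTED RC $1" ;;
  esac
}

# Example with a literal return code; on a live system you would run the
# script as ${sid}adm first and pass in its exit code "$?".
replication_state 15
```

A state of ACTIVE on the primary indicates that replication to the secondary is established and in sync.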
Enabling the SAP HANA srConnectionChanged() hook
Recent versions of SAP HANA provide hooks so that SAP HANA can send out notifications for certain events. For more information, see Implementing a HA/DR Provider.
The srConnectionChanged() hook improves the ability of the cluster to detect a HANA System Replication status change that requires an action from the cluster. The goal is to prevent data loss and corruption by preventing accidental takeovers.
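For orientation, registering the hook adds an HA/DR provider section to the SAP HANA global.ini. The following is a typical sketch based on the SUSE setup guides; the provider path depends on the installed SAPHanaSR package version, and the trace entry is optional.

```ini
[ha_dr_provider_SAPHanaSR]
provider = SAPHanaSR
path = /usr/share/SAPHanaSR
execution_order = 1

[trace]
ha_dr_saphanasr = info
```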
Activating the srConnectionChanged() hook on all SAP HANA instances
- Stop the cluster.
On NODE1, run the following command.
crm cluster stop --all
Then, follow the steps that are described in Setting up SAP HANA HA/DR providers.
- Verify that the hook functions.
- Restart both HANA instances and verify that the hook script works as expected.
- Perform an action to trigger the hook, such as stopping a HANA instance.
- Check whether the hook logged anything in the trace files.
On both nodes, run the following commands.
Stop the HANA instance.
sudo -i -u ${sid}adm -- HDB stop
Start the HANA instance.
sudo -i -u ${sid}adm -- HDB start
Check that the hook logged messages to the trace files.
sudo -i -u ${sid}adm -- sh -c 'grep "ha_dr_SAPHanaSR.*crm_attribute" $DIR_INSTANCE/$VTHOSTNAME/trace/nameserver_* | cut -d" " -f2,3,5,17'
After you verify that the hooks function, you can restart the HA cluster.
- Start the cluster.
On NODE1, run the following commands.
Start the cluster.
crm cluster start --all
Check the status of the cluster.
crm status --full
Configuring general cluster properties
To avoid resource failover during initial testing and post-production, set the following default values for the resource-stickiness and migration-threshold parameters.
These steps are described in Configuring the cluster.
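In crm configure syntax, the resulting resource defaults might look like the following fragment. The values shown are the illustrative defaults used in the SUSE guides, not mandatory settings; tune them to your own failover policy.

```
rsc_defaults rsc-options: \
  resource-stickiness=1000 \
  migration-threshold=5000
```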
IBM Power10 systems provide an integrated hardware watchdog timer that is enabled by default. The Configuring the cluster description suggests softdog as a software watchdog timer fallback. Use the more reliable IBM Power10 hardware watchdog timer instead.
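To confirm that a watchdog device is exposed on your instance, a quick check follows; it assumes the device appears under the usual /dev/watchdog name.

```shell
# Check for a watchdog character device; the Power hardware watchdog is
# typically exposed as /dev/watchdog when enabled.
if [ -c /dev/watchdog ]; then
  echo "watchdog device present"
else
  echo "no watchdog device found"
fi
```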
Testing SAP HANA System Replication cluster
It is vital to thoroughly test the cluster configuration to make sure that the cluster is working correctly. The following information provides a few sample failover test scenarios. It's not a complete list of test scenarios.
For example, the description of each test case includes the following information.
- Component that is being tested
- Description of the test
- Prerequisites and the cluster state before you start the failover test
- Test procedure
- Expected behavior and results
- Recovery procedure
Test 1 - Testing a failure of the primary database instance
Use the following information to test the failure of the primary database instance.
Test 1 - Description
Simulate a crash of the primary HANA database instance that is running on NODE1.
Test 1 - Prerequisites
- A functional two-node SLES HA Extension cluster for HANA system replication.
- Both cluster nodes are active.
- Cluster is started on NODE1 and NODE2.
- Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=false.
- Check SAP HANA System Replication status:
- Primary SAP HANA database is running on NODE1
- Secondary SAP HANA database is running on NODE2
- HANA System Replication is activated and in sync
A variation of Test 1 is described in Test cases for semi-automation.
Test 1 - Test procedure
Use the following command to run Test 1.
Crash the SAP HANA primary by sending a SIGKILL signal as the user ${sid}adm.
On NODE1, run the following command.
sudo -i -u ${sid}adm -- HDB kill-9
Test 1 - Expected behavior
You can expect the following behavior from the test.
- SAP HANA primary instance on NODE1 crashes.
- The cluster detects the stopped primary HANA database and marks the resource as failed.
- The cluster promotes the secondary HANA database on NODE2 to take over as the new primary.
- The cluster releases the virtual IP address on NODE1, and acquires it on the new primary on NODE2.
- If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.
Test 1 - Recovery procedure
Because the cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=false, the cluster doesn't restart the failed HANA database and doesn't register it against the new primary. This means that the status on the new primary (NODE2) also shows the secondary in status 'CONNECTION TIMEOUT'.
To reregister the previous primary as a new secondary, use the following commands.
On NODE1, run the following command.
sudo -i -u ${sid}adm -- <<EOT
hdbnsutil -sr_register \
--name=${DC1} \
--remoteHost=${NODE2} \
--remoteInstance=${INSTNO} \
--replicationMode=sync \
--operationMode=logreplay \
--online
EOT
Verify the system replication status by using the following command.
sudo -i -u ${sid}adm -- <<EOT
hdbnsutil -sr_state
HDBSettings.sh systemReplicationStatus.py
EOT
After the manual register and resource refresh, the new secondary instance restarts and shows a synced (SOK) status.
On NODE1, run the following command.
crm resource refresh SAPHana_${SID}_${INSTNO}
Test 2 - Testing a failure of the node that is running the primary database
Use the following information to test the failure of the node that is running the primary database.
Test 2 - Description
Simulate a crash of the node that is running the primary HANA database.
Test 2 - Prerequisites
See the following prerequisites before you perform Test 2.
- You need a functional two-node SLES HA Extension cluster for HANA system replication.
- Make sure that both nodes are active.
- Confirm that the cluster is started on NODE1 and NODE2.
- Check SAP HANA System Replication status.
- Primary SAP HANA database is running on NODE2.
- Secondary SAP HANA database is running on NODE1.
- HANA System Replication is activated and in sync.
Test 2 - Preparation
Make sure that the cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
On NODE1, run the following commands.
crm resource update SAPHana_${SID}_${INSTNO} AUTOMATED_REGISTER=true
crm resource config SAPHana_${SID}_${INSTNO}
Test 2 - Test procedure
Crash the primary on NODE2 by triggering a kernel crash through the sysrq interface.
On NODE2, run the following command.
sync; echo c > /proc/sysrq-trigger
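Before running the crash trigger, it can be worth confirming that the kernel sysrq interface is enabled; a value of 0 disables all sysrq functions. The check below only reads the current setting.

```shell
# Read the current sysrq setting; 0 means all sysrq functions are disabled,
# so "echo c > /proc/sysrq-trigger" would have no effect.
val=$(cat /proc/sys/kernel/sysrq 2>/dev/null || echo "unavailable")
echo "kernel.sysrq = ${val}"
```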
Test 2 - Expected behavior
You can expect the following behavior from the test.
- NODE2 shuts down.
- The cluster detects the failed node and sets its state to OFFLINE.
- The cluster promotes the secondary HANA database on NODE1 to take over as the new primary.
- The cluster acquires the virtual IP address on NODE1 on the new primary.
- If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.
Test 2 - Recovery procedure
Use the following information to recover from Test 2.
- Log in to the IBM Cloud® Console and start the NODE2 instance.
- Wait until NODE2 is available again, then restart the cluster framework.
- On NODE2, run the following commands.
crm cluster start
crm status --full
As the cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true, SAP HANA restarts when NODE2 rejoins the cluster and the former primary reregisters as a secondary.
Test 3 - Testing a failure of the secondary database instance
Use the following information to test the failure of the secondary database instance.
Test 3 - Description
Simulate a crash of the secondary HANA database.
Test 3 - Prerequisites
See the following prerequisites before you perform Test 3.
- A functional two-node SLES HA Extension cluster for HANA system replication.
- Both nodes are active.
- Cluster is started on NODE1 and NODE2.
- Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
- Check SAP HANA System Replication status.
- Primary SAP HANA database is running on NODE1.
- Secondary SAP HANA database is running on NODE2.
- HANA System Replication is activated and in sync.
Test 3 - Test procedure
Crash the SAP HANA secondary by sending a SIGKILL signal as the user ${sid}adm.
On NODE2, run the following command.
sudo -i -u ${sid}adm -- HDB kill-9
Test 3 - Expected behavior
You can expect the following behavior from the test.
- SAP HANA secondary on NODE2 crashes.
- The cluster detects the stopped secondary HANA database and marks the resource as failed.
- The cluster restarts the secondary HANA database.
- The cluster detects that the system replication is in sync again.
Test 3 - Recovery procedure
Use the following information to recover from Test 3.
- Wait until the secondary HANA instance starts and syncs again (SOK), then clean up the failed resource actions as shown in crm status.
- On NODE2, run the following commands.
crm resource refresh SAPHana_${SID}_${INSTNO}
crm status --full
Test 4 - Testing a manual move of an SAP HANA resource to another node
Use the following information to test the manual move of an SAP HANA resource to another node.
Test 4 - Description
Use cluster commands to move the primary instance to the other node for maintenance purposes.
Test 4 - Prerequisites
See the following prerequisites before you perform Test 4.
- A functional two-node SLES HA Extension cluster for HANA system replication.
- Both nodes are active.
- Cluster is started on NODE1 and NODE2.
- Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
- Check SAP HANA System Replication status:
- Primary SAP HANA database is running on NODE1
- Secondary SAP HANA database is running on NODE2
- HANA System Replication is activated and in sync
Test 4 - Test procedure
Move the SAP HANA primary to the other node by using the crm resource move command.
On NODE1, run the following command.
crm resource move SAPHana_${SID}_${INSTNO}-clone
Test 4 - Expected behavior
You can expect the following behavior from the test.
- The cluster creates location constraints to move the resource.
- The cluster triggers a takeover to the secondary HANA database.
- If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.
Test 4 - Recovery procedure
Use the following information to recover from Test 4.
The automatically created location constraints must be removed to allow automatic failover in the future.
Wait until the primary HANA instance is active and remove the constraints.
The cluster registers and starts the HANA database as a new secondary instance.
On NODE1, run the following commands.
crm constraint
crm resource clear SAPHana_${SID}_${INSTNO}-clone
crm constraint
crm status --full