Configuring SAP HANA Cost-Optimized Scale-Up System Replication in a RHEL HA Add-On Cluster
The following information describes the configuration of a Red Hat Enterprise Linux (RHEL) 8 HA Add-On cluster for managing SAP HANA® Cost-Optimized Scale-Up System Replication. The cluster uses virtual server instances in IBM® Power® Virtual Server as cluster nodes.
In a cost-optimized configuration, a nonproduction SAP HANA system runs on the secondary node during normal operation. The hardware resources on the secondary node are shared between the nonproduction system and the SAP HANA System Replication secondary. The memory usage of the production System Replication secondary is reduced by disabling the preload of column table data.
If a failover occurs, the nonproduction instance is stopped automatically before the node takes over the production workload. The takeover time is longer compared to a performance-optimized configuration.
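For illustration, in such a setup the global.ini file of the production secondary on NODE2 contains entries similar to the following excerpt. The limit value is an example only, not a recommendation.
[memorymanager]
global_allocation_limit = 65536
[system_replication]
preload_column_tables = false
The later sections describe how to set these parameters and how the takeover hook resets them automatically after a takeover.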
This information is intended for architects and specialists who are planning a high-availability deployment of SAP HANA on Power Virtual Server.
Before you begin
Review the general requirements, product documentation, support articles, and SAP notes listed in Implementing High Availability for SAP Applications on IBM Power Virtual Server References.
Prerequisites
- A Red Hat High Availability cluster is deployed on two virtual server instances in Power Virtual Server.
- Install and set up the RHEL HA Add-On cluster according to Implementing a RHEL HA Add-On Cluster on IBM Power Virtual Server.
- Configure and verify fencing as described in the preceding document.
- The virtual server instances need to fulfill hardware and resource requirements for the SAP HANA systems in scope. Follow the guidelines in the Planning your deployment document.
- The hostnames of the virtual server instances must meet the SAP HANA hostname requirements.
- SAP HANA is installed on both virtual server instances and SAP HANA System Replication is configured. The installation of SAP HANA and setup of HANA System Replication is not specific to the Power Virtual Server environment, and you need to follow the standard procedures.
- A nonproduction SAP HANA system is installed on NODE2 with a different SID and instance number than the production system. The nonproduction system needs its own dedicated storage volumes and file systems. Restrict the global memory allocation limit for the nonproduction system to ensure sufficient memory for the HANA System Replication workload on the secondary. The limit is set with the global_allocation_limit parameter in the [memorymanager] section of the global.ini configuration file. A sample excerpt is shown after this list.
- Optional: a virtual IP address is reserved for the nonproduction system as described in Reserving virtual IP addresses.
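The following excerpt illustrates such a limit in the global.ini file of the nonproduction system. The value is an example only; size it according to your environment.
[memorymanager]
global_allocation_limit = 131072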
Setting up the cost-optimized scenario
The cost-optimized scenario is an extension of the setup that is described in Configuring SAP HANA Scale-Up System Replication in a RHEL HA Add-On Cluster. Complete the setup of the System Replication cluster for the production system before you continue with the following steps.
Preparing environment variables
To simplify the setup, prepare the following environment variables for user ID root on NODE2. These environment variables are used with later operating system commands in this information.
On NODE2, set the following environment variables.
# General settings
export SID_NP=<SID> # SAP HANA System ID of non-production system (uppercase)
export sid_np=<sid> # SAP HANA System ID of non-production system (lowercase)
export INSTNO_NP=<INSTNO> # SAP HANA Instance Number of non-production system
# Cluster nodes
export NODE1=<Hostname 1> # Hostname of virtual server instance 1 (production primary)
export NODE2=<Hostname 2> # Hostname of virtual server instance 2 (non-production, production secondary)
# Optional virtual IP address
export VIP_NP=<IP address> # Virtual IP address for the non-production system
Configuring the SAP HANA HA/DR provider hook
The SAP HANA nameserver provides a Python-based API that is called at important points during the HANA System Replication takeover process. These API calls are used to run customer-specific operations (Implementing a HA/DR Provider).
In the cost-optimized scenario, the SAP HANA HA/DR provider hook is used to automatically reconfigure the SAP HANA instance during the takeover event.
The following section shows a sample setup of a hook script for a cost-optimized SAP HANA System Replication environment. When you implement automation of the cost-optimized SAP HANA System Replication HA environment in the cluster, the takeover hook script must be thoroughly tested. Run the tests manually: shut down the nonproduction SAP HANA instance on the secondary node, perform a takeover, and verify that the hook script correctly reconfigures the primary HANA database.
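The exact test flow depends on your environment. As an outline only, and assuming the environment variables from this information and the referenced base setup, a manual test on NODE2 might use the following commands.
# Stop the nonproduction instance to free its memory (illustrative sequence)
sudo -i -u ${sid_np}adm -- HDB stop
# Trigger a takeover on the production secondary
sudo -i -u ${sid}adm -- hdbnsutil -sr_takeover
# Verify that the hook removed the memory limit and the preload restriction
sudo -i -u ${sid}adm -- hdbcons "mm globallimit" | grep limit
grep -E "global_allocation_limit|preload_column_tables" \
/hana/shared/${SID}/global/hdb/custom/config/global.ini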
Creating a database user in the SAP HANA production database
Use the following steps to create a database user in the SAP HANA production database.
- Create a database user in the SystemDB of the SAP HANA production system, or provide credentials of an existing user. The hook script uses this database user to connect to the production database and alter the configuration parameters.
Log in to the SystemDB of the primary instance by using the SAP HANA database interactive terminal hdbsql or SAP HANA Cockpit, and create a new user.
For example, connect to the database by using hdbsql in a terminal session.
sudo -i -u ${sid}adm -- hdbsql -i ${INSTNO} -d SYSTEMDB -u SYSTEM
Create a user.
CREATE USER HA_HOOK PASSWORD <Password> no force_first_password_change;
Grant the required privileges to the user.
Grant privilege INIFILE ADMIN to allow for changes of profile parameters.
GRANT INIFILE ADMIN TO HA_HOOK;
Verify the HA_HOOK user.
sudo -i -u ${sid}adm -- hdbsql -d SYSTEMDB -u SYSTEM select \* from users where user_name = \'HA_HOOK\';
- Add the user credentials to the secure user store hdbuserstore.
On both nodes, run the following command. Use the password that you set in the previous step.
sudo -i -u ${sid}adm -- hdbuserstore SET HA_HOOK_KEY localhost:3${INSTNO}13 HA_HOOK <Password>
Check the update to the hdbuserstore.
sudo -i -u ${sid}adm -- hdbuserstore list
On the primary instance, test the connection with the stored user key.
sudo -i -u ${sid}adm -- hdbsql -U HA_HOOK_KEY select \* from m_inifiles;
Creating the hook script
Python sample files for creating hook scripts are delivered as part of the SAP HANA installation. The samples are located in the directory $DIR_INSTANCE/exe/python_support/hdb_ha_dr.
The target directory /hana/shared/myHooks was already created for the SAPHanaSR.py hook. Create the HA/DR provider hook in /hana/shared/myHooks. The following hook script is based on the HADRdummy.py sample.
On NODE2, edit the file /hana/shared/myHooks/SAPHanaCostOptSR.py and add the following content.
"""
Sample for a HA/DR hook provider.
When using your own code in here, please copy this file to location on /hana/shared outside the HANA installation.
This file will be overwritten with each hdbupd call! To configure your own changed version of this file, please add
to your global.ini lines similar to this:
[ha_dr_provider_<className>]
provider = <className>
path = /hana/shared/haHook
execution_order = 1
For all hooks, 0 must be returned in case of success.
"""
from __future__ import absolute_import
from hdb_ha_dr.client import HADRBase, Helper
from hdbcli import dbapi
import os, time
class SAPHanaCostOptSR(HADRBase):
def __init__(self, *args, **kwargs):
# delegate construction to base class
super(SAPHanaCostOptSR, self).__init__(*args, **kwargs)
def about(self):
return {"provider_company" : "SAP",
"provider_name" : "SAPHanaCostOptSR", # provider name = class name
"provider_description" : "Handle reconfiguration event for cost-optimized system replication",
"provider_version" : "1.0"}
def postTakeover(self, rc, **kwargs):
"""Post takeover hook."""
# prepared SQL statements to remove memory allocation limit and pre-load of column tables
stmnt1 = "ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM') UNSET ('memorymanager','global_allocation_limit') WITH RECONFIGURE"
stmnt2 = "ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM') UNSET ('system_replication','preload_column_tables') WITH RECONFIGURE"
myPort = int('3' + os.environ.get('DIR_INSTANCE')[-2:] + '15')
myKey = self.config.get("userkey") if self.config.hasKey("userkey") else "HA_HOOK_KEY"
self.tracer.info("%s.postTakeover method called with rc=%s" % (self.__class__.__name__, rc))
self.tracer.info("%s.postTakeover method: userkey: %s, port: %s" % (self.__class__.__name__, myKey, myPort))
if rc in (0, 1):
# rc == 0: normal takeover succeeded
# rc == 1: waiting for force takeover
conn = dbapi.connect(userkey=myKey, address='localhost', port=myPort)
self.tracer.info("%s: Connect using userkey %s - %s" % (self.__class__.__name__, myKey, conn.isconnected()))
cursor = conn.cursor()
rc1 = cursor.execute(stmnt1)
self.tracer.info("%s: (%s) - %s" % (self.__class__.__name__, stmnt1, rc1))
rc2 = cursor.execute(stmnt2)
self.tracer.info("%s: (%s) - %s" % (self.__class__.__name__, stmnt2, rc2))
return 0
elif rc == 2:
# rc == 2: error, something went wrong
return 0
Activating the cost-optimized hook
Use the following steps to activate the cost-optimized hook.
- Stop the cluster.
On any cluster node, run the following command.
pcs cluster stop --all
- Set the file ownership of the hook script.
On NODE2, run the following command.
chown -R ${sid}adm:sapsys /hana/shared/myHooks
- Update the global.ini configuration file on NODE2 to enable the hook script.
On NODE2, run the following command to add the required parameters to the global.ini file.
sudo -i -u ${sid}adm -- <<EOT
python \$DIR_INSTANCE/exe/python_support/setParameter.py \
-set SYSTEM/global.ini/ha_dr_provider_SAPHanaCostOptSR/provider=SAPHanaCostOptSR \
-set SYSTEM/global.ini/ha_dr_provider_SAPHanaCostOptSR/path=/hana/shared/myHooks \
-set SYSTEM/global.ini/ha_dr_provider_SAPHanaCostOptSR/userkey=HA_HOOK_KEY \
-set SYSTEM/global.ini/ha_dr_provider_SAPHanaCostOptSR/execution_order=2 \
-set SYSTEM/global.ini/trace/ha_dr_saphanacostoptsr=info
EOT
- Check the content of the global.ini file.
cat /hana/shared/${SID}/global/hdb/custom/config/global.ini
- Verify that the hook functions.
- Restart the HANA instance on NODE2 and verify that the hook script works as expected.
- Trigger the hook with an SAP HANA takeover operation.
- Check whether the hook logged anything in the trace files.
sudo -i -u ${sid}adm -- \
sh -c 'grep SAPHanaCostOptSR $DIR_INSTANCE/$VTHOSTNAME/trace/nameserver_*.trc'
After you verify that the hook functions, you can restart the HA cluster.
- Start the HA cluster.
On any cluster node, run the following command.
pcs cluster start --all
Check the status of the cluster.
pcs status --full
Defining limits for SAP HANA resource usage on the secondary node
All SAP HANA systems that run on NODE2 share the available memory of the node. The memory configuration of the secondary system SAP HANA ${SID} must be limited to the amount that is required for system replication so that the nonproduction system can use the remaining memory.
The SAP documentation Secondary System Usage describes the different scenarios and provides parameter recommendations.
To restrict the memory consumption of the secondary system, the preload of column tables is disabled by setting the database configuration parameter preload_column_tables = false. This parameter is in the [system_replication] section of the instance configuration file of the SAP HANA production system on NODE2.
The global_allocation_limit parameter is set in the [memorymanager] section to limit memory allocation for both the SAP HANA production system and the nonproduction system that run on NODE2.
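The following values are only an illustration of how the memory of the node might be divided between the two systems. Determine the actual values from the SAP recommendations that are referenced above.
# Illustrative example: on a node with 512 GB of memory, you might cap the
# production secondary at 64 GB and the nonproduction system at 384 GB,
# leaving headroom for the operating system. Values are in MB.
export GLOBAL_ALLOCATION_LIMIT=65536             # production secondary
export NON_PROD_GLOBAL_ALLOCATION_LIMIT=393216   # nonproduction system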
On NODE2, define an environment variable with the wanted memory limit for the secondary HANA production instance.
export GLOBAL_ALLOCATION_LIMIT=<memory_size_in_mb_for_hana_secondary>
Then, run the following command to update the global.ini configuration file.
sudo -i -u ${sid}adm -- <<EOT
python \$DIR_INSTANCE/exe/python_support/setParameter.py \
-set SYSTEM/global.ini/system_replication/preload_column_tables=false \
-set SYSTEM/global.ini/memorymanager/global_allocation_limit=$GLOBAL_ALLOCATION_LIMIT
EOT
Verify the configuration file.
cat /hana/shared/${SID}/global/hdb/custom/config/global.ini
You cannot use hdbsql and ALTER SYSTEM ALTER CONFIGURATION statements on the secondary because no SQL connection is possible in this state. Activate the change by using the hdbnsutil -reconfig command.
sudo -i -u ${sid}adm -- hdbnsutil -reconfig
Update the global.ini configuration file of the nonproduction instance to allow for the memory usage of the production secondary.
On NODE2, define an environment variable with the wanted memory limit for the nonproduction HANA instance.
export NON_PROD_GLOBAL_ALLOCATION_LIMIT=<memory_size_in_mb_for_non_prod_hana>
Then, run the following command to update the global.ini configuration file.
sudo -i -u ${sid_np}adm -- <<EOT
python \$DIR_INSTANCE/exe/python_support/setParameter.py \
-set SYSTEM/global.ini/memorymanager/global_allocation_limit=$NON_PROD_GLOBAL_ALLOCATION_LIMIT \
-reconfigure
EOT
Verify the configuration file.
cat /hana/shared/${SID_NP}/global/hdb/custom/config/global.ini
Run the following command to check the current database memory limit.
sudo -i -u ${sid_np}adm -- hdbcons "mm globallimit" | grep limit
Configuring cluster resources for the nonproduction instance
Use the following information to configure cluster resources for the nonproduction instance.
Installing the SAPInstance resource agent
The resource-agents-sap package includes the SAPInstance cluster resource agent, which is used to manage the additional nonproduction SAP HANA instance.
On NODE2, run the following command to install the resource agent.
dnf install -y resource-agents-sap
If needed, use subscription-manager to enable the SAP NetWeaver repository.
subscription-manager repos --enable="rhel-8-for-ppc64le-sap-netweaver-e4s-rpms"
Creating the cluster resource for managing the nonproduction instance
On NODE2, run the following command.
pcs resource create SAPHana_np_${SID_NP}_HDB${INSTNO_NP} SAPInstance \
InstanceName="${SID_NP}_HDB${INSTNO_NP}_${NODE2}" \
MONITOR_SERVICES="hdbindexserver|hdbnameserver" \
START_PROFILE="/usr/sap/${SID_NP}/SYS/profile/${SID_NP}_HDB${INSTNO_NP}_${NODE2}" \
op start timeout=600 op stop timeout=600 op monitor interval=60 timeout=600 \
--group group_${sid_np}_non_prod
If you want to assign a virtual IP address to the nonproduction instance, add an IPaddr2 cluster resource.
pcs resource create vip_np IPaddr2 \
ip="${VIP_NP}" \
--group group_${sid_np}_non_prod
Create a cluster constraint to prevent the nonproduction system from starting on NODE1.
pcs constraint location add loc-${sid_np}-avoid-${NODE1} \
group_${sid_np}_non_prod ${NODE1} -INFINITY resource-discovery=never
If a takeover occurs and the production system assumes the primary role on NODE2, the nonproduction system stops and its memory resources are released. The following cluster constraints make sure that the primary production instance and the nonproduction instance never run on the same node, and that the nonproduction instance stops before the production instance is promoted.
pcs constraint colocation add group_${sid_np}_non_prod with master SAPHana_${SID}_${INSTNO}-clone score=-INFINITY
pcs constraint order stop group_${sid_np}_non_prod then promote SAPHana_${SID}_${INSTNO}-clone
The cluster configuration is complete.
Run the following command to check the status of the defined cluster resources.
pcs status --full
Sample output:
# pcs status --full
Cluster name: SAP_PRD
Cluster Summary:
* Stack: corosync
* Current DC: cl-prd-2 (2) (version 2.0.5-9.el8_4.5-ba59be7122) - partition with quorum
* Last updated: Fri Apr 28 16:38:00 2023
* Last change: Fri Apr 28 16:37:49 2023 by hacluster via crmd on cl-prd-1
* 2 nodes configured
* 8 resource instances configured
Node List:
* Online: [ cl-prd-1 (1) cl-prd-2 (2) ]
Full List of Resources:
* res_fence_ibm_powervs (stonith:fence_ibm_powervs): Started cl-prd-2
* Clone Set: SAPHanaTopology_PRD_00-clone [SAPHanaTopology_PRD_00]:
* SAPHanaTopology_PRD_00 (ocf::heartbeat:SAPHanaTopology): Started cl-prd-2
* SAPHanaTopology_PRD_00 (ocf::heartbeat:SAPHanaTopology): Started cl-prd-1
* Clone Set: SAPHana_PRD_00-clone [SAPHana_PRD_00] (promotable):
* SAPHana_PRD_00 (ocf::heartbeat:SAPHana): Slave cl-prd-2
* SAPHana_PRD_00 (ocf::heartbeat:SAPHana): Master cl-prd-1
* vip_PRD_00 (ocf::heartbeat:IPaddr2): Started cl-prd-1
* Resource Group: group_dev_non_prod:
* vip_np (ocf::heartbeat:IPaddr2): Started cl-prd-2
* SAPHana_np_DEV_HDB10 (ocf::heartbeat:SAPInstance): Started cl-prd-2
Node Attributes:
* Node: cl-prd-1 (1):
* hana_prd_clone_state : PROMOTED
* hana_prd_op_mode : logreplay
* hana_prd_remoteHost : cl-prd-2
* hana_prd_roles : 4:P:master1:master:worker:master
* hana_prd_site : SiteA
* hana_prd_srmode : syncmem
* hana_prd_sync_state : PRIM
* hana_prd_version : 2.00.070.00.1679989823
* hana_prd_vhost : cl-prd-1
* lpa_prd_lpt : 1682692638
* master-SAPHana_PRD_00 : 150
* Node: cl-prd-2 (2):
* hana_prd_clone_state : DEMOTED
* hana_prd_op_mode : logreplay
* hana_prd_remoteHost : cl-prd-1
* hana_prd_roles : 4:S:master1:master:worker:master
* hana_prd_site : SiteB
* hana_prd_srmode : syncmem
* hana_prd_sync_state : SOK
* hana_prd_version : 2.00.070.00.1679989823
* hana_prd_vhost : cl-prd-2
* lpa_prd_lpt : 30
* master-SAPHana_PRD_00 : 100
Migration Summary:
Tickets:
PCSD Status:
cl-prd-1: Online
cl-prd-2: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Run the following command to check the defined constraints.
pcs constraint --full
Sample output:
# pcs constraint --full
Location Constraints:
Resource: group_dev_non_prod
Disabled on:
Node: cl-prd-1 (score:-INFINITY) (resource-discovery=never) (id:loc-dev-avoid-cl-prd-1)
Ordering Constraints:
start SAPHanaTopology_PRD_00-clone then start SAPHana_PRD_00-clone (kind:Mandatory) (non-symmetrical) (id:order-SAPHanaTopology_PRD_00-clone-SAPHana_PRD_00-clone-mandatory)
stop group_dev_non_prod then promote SAPHana_PRD_00-clone (kind:Mandatory) (id:order-group_dev_non_prod-SAPHana_PRD_00-clone-mandatory)
Colocation Constraints:
vip_PRD_00 with SAPHana_PRD_00-clone (score:2000) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-vip_PRD_00-SAPHana_PRD_00-clone-2000)
group_dev_non_prod with SAPHana_PRD_00-clone (score:-INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-group_dev_non_prod-SAPHana_PRD_00-clone-INFINITY)
Ticket Constraints:
Enabling the automated registration of the secondary instance
You need to set the parameter AUTOMATED_REGISTER according to your operational requirements. If you want to keep the ability to revert to the state of the previous primary SAP HANA instance, then AUTOMATED_REGISTER=false avoids an automatic registration of the previous primary as a new secondary.
If you experience an issue with the data after a takeover that was triggered by the cluster, you can manually revert if AUTOMATED_REGISTER is set to false.
If AUTOMATED_REGISTER is set to true, the previous primary SAP HANA instance automatically registers as secondary, and cannot be activated on its previous history. The advantage of AUTOMATED_REGISTER=true is that high availability is automatically reestablished after the failed node reappears in the cluster.
For now, it is recommended to keep AUTOMATED_REGISTER at its default value false until the cluster is fully tested and you verify that the failover scenarios work as expected.
The pcs resource update command is used to modify resource attributes. For example, the following command sets AUTOMATED_REGISTER to true.
pcs resource update SAPHana_${SID}_${INSTNO} AUTOMATED_REGISTER=true
Testing the SAP HANA System Replication cluster
It is vital to thoroughly test the cluster configuration to make sure that the cluster is working correctly. The following information provides a few sample failover test scenarios, but is not a complete list of test scenarios.
The description of each test case includes the following information.
- Which component is being tested
- Description of the test
- Prerequisites and the initial state before you start the failover test
- Test procedure
- Expected behavior and results
- Recovery procedure
Test1 - Testing the failure of the primary database instance
Use the following information to test the failure of the primary database instance.
Test1 - Description
Simulate a crash of the primary HANA database instance that is running on NODE1.
Test1 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for HANA system replication.
- Both cluster nodes are active.
- Cluster is started on NODE1 and NODE2.
- Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=false.
- Check SAP HANA System Replication status:
  - The primary SAP HANA database is running on NODE1.
  - The secondary SAP HANA database is running on NODE2.
  - HANA System Replication is activated and in sync.
- The secondary SAP HANA database on NODE2 is running with a reduced memory configuration:
  - The global_allocation_limit is reduced.
  - Preload of column tables is disabled (preload_column_tables = false).
- A nonproduction SAP HANA system ${SID_NP} is running on NODE2.
Test1 - Test procedure
Crash SAP HANA primary by sending a SIGKILL signal as user ${sid}adm.
On NODE1, run the following command.
sudo -i -u ${sid}adm -- HDB kill-9
Test1 - Expected behavior
- SAP HANA primary instance on NODE1 crashes.
- The cluster detects the stopped primary HANA database and marks the resource as failed.
- The cluster promotes the secondary HANA database on NODE2 to take over as new primary.
- The cluster stops the nonproduction database ${SID_NP} on NODE2.
- During activation, the global_allocation_limit and preload_column_tables parameters are reset to default.
- The cluster releases the virtual IP address on NODE1, and acquires it on the new primary on NODE2.
- If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.
On NODE2, run the following commands to check that the global_allocation_limit and preload_column_tables are unset.
sudo -i -u ${sid}adm -- hdbcons "mm globallimit" | grep limit
grep -E "global_allocation_limit|preload_column_tables" \
/hana/shared/${SID}/global/hdb/custom/config/global.ini
Test1 - Recovery procedure
Because the cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=false, the cluster doesn't restart the failed HANA database and doesn't register it against the new primary. The status on the new primary (NODE2) shows the secondary in status 'CONNECTION TIMEOUT'.
On NODE1, run the following commands to register the previous primary as new secondary.
sudo -i -u ${sid}adm -- <<EOT
hdbnsutil -sr_register \
--name=${DC1} \
--remoteHost=${NODE2} \
--remoteInstance=${INSTNO} \
--replicationMode=sync \
--operationMode=logreplay \
--online
EOT
Verify the system replication status.
sudo -i -u ${sid}adm -- <<EOT
hdbnsutil -sr_state
HDBSettings.sh systemReplicationStatus.py
EOT
On NODE1, run the following command to start the cluster node.
pcs cluster start
The new secondary instance restarts and shows up in status synced (SOK).
pcs status --full
Configure cluster resource SAPHana_${SID}_${INSTNO} with AUTOMATED_REGISTER=true.
On NODE1, run the following command.
pcs resource update SAPHana_${SID}_${INSTNO} AUTOMATED_REGISTER=true
pcs resource config SAPHana_${SID}_${INSTNO}
Test2 - Testing the manual move of the SAPHana resource to another node
Use the following information to test the manual move of the SAPHana resource to another node.
Test2 - Description
Use cluster commands to move the primary instance back to the other node.
Test2 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for HANA system replication.
- Both cluster nodes are active.
- Cluster is started on NODE1 and NODE2.
- Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
- Check SAP HANA System Replication status:
  - The primary SAP HANA database is running on NODE2.
  - The secondary SAP HANA database is running on NODE1.
  - HANA System Replication is activated and in sync.
- The nonproduction SAP HANA system ${SID_NP} is stopped on NODE2.
Test2 - Test Preparation
Unmanage the cluster resource for the nonproduction SAP HANA system to prevent it from starting while the memory resources of the secondary are not restricted.
On NODE1, run the following command.
pcs resource unmanage group_${sid_np}_non_prod
Test2 - Test Procedure
On NODE1, run the following command to move the SAP HANA primary back to NODE1.
pcs resource move SAPHana_${SID}_${INSTNO}-clone
Test2 - Expected behavior
- The cluster creates a location constraint to move the resource.
- The cluster triggers a takeover to the secondary HANA database on NODE1.
- If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.
- The resource of the nonproduction SAP HANA system ${SID_NP} is in the unmanaged state and isn't started automatically.
Test2 - Recovery procedure
Several steps need to be followed to reestablish the complete HA scenario.
- Wait until the primary HANA instance is active. Then, reduce the memory footprint of the secondary.
On NODE2, run the following commands to reduce the memory.
export GLOBAL_ALLOCATION_LIMIT=<size_in_mb_for_hana_secondary>
sudo -i -u ${sid}adm -- <<EOT
python \$DIR_INSTANCE/exe/python_support/setParameter.py \
-set SYSTEM/global.ini/system_replication/preload_column_tables=false \
-set SYSTEM/global.ini/memorymanager/global_allocation_limit=$GLOBAL_ALLOCATION_LIMIT
EOT
- Remove the location constraint, which triggers the start of the secondary instance.
pcs resource clear SAPHana_${SID}_${INSTNO}-clone
Verify that the constraint is cleared.
pcs constraint
Check the cluster status.
pcs status --full
- On NODE2, run the following commands to check that the global_allocation_limit and preload_column_tables are set.
sudo -i -u ${sid}adm -- hdbcons "mm globallimit" | grep limit
grep -E "global_allocation_limit|preload_column_tables" \
/hana/shared/${SID}/global/hdb/custom/config/global.ini
- Reactivate the resource for the nonproduction SAP HANA system.
On NODE2, run the following command.
pcs resource manage group_${sid_np}_non_prod
The resource of the nonproduction SAP HANA system ${SID_NP} is managed and the nonproduction system starts on NODE2.
pcs status --full
Test3 - Testing the failure of the node that is running the primary database
Use the following information to test the failure of the node that is running the primary database.
Test3 - Description
Simulate a crash of the node that is running the primary HANA database.
Test3 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for HANA system replication.
- Both cluster nodes are active.
- Cluster is started on NODE1 and NODE2.
- Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
- Check SAP HANA System Replication status:
  - The primary SAP HANA database is running on NODE1.
  - The secondary SAP HANA database is running on NODE2.
  - HANA System Replication is activated and in sync.
- The secondary SAP HANA database on NODE2 is running with a reduced memory configuration:
  - The global_allocation_limit is reduced.
  - Preload of column tables is disabled (preload_column_tables = false).
- A nonproduction SAP HANA system ${SID_NP} is running on NODE2.
Test3 - Test procedure
Crash primary on NODE1 by sending a shutoff system request.
On NODE1, run the following command.
sync; echo o > /proc/sysrq-trigger
Test3 - Expected behavior
- NODE1 shuts down.
- The cluster detects the failed node and sets its state to OFFLINE.
- The cluster promotes the secondary HANA database on NODE2 to take over as new primary.
- The cluster stops the nonproduction database ${SID_NP} on NODE2.
- During activation, the global_allocation_limit and preload_column_tables parameters of SAP HANA ${SID} are reset.
- The cluster acquires the virtual IP address on the new primary on NODE2.
- If an application, such as SAP NetWeaver, is connected to a tenant database of SAP HANA, the application automatically reconnects to the new primary.
Test3 - Recovery procedure
Log in to the IBM Cloud® console and start NODE1. Wait until NODE1 is available again, then restart the cluster framework.
On NODE1, run the following command.
pcs cluster start
pcs status --full
Because the cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true, SAP HANA restarts when NODE1 joins the cluster and the former primary is registered as secondary.
Then, rerun the steps in Test2 - Testing the manual move of the SAPHana resource to another node to revert to the initial situation.
Test4 - Testing failure of the secondary database instance
Use the following information to test the failure of the secondary database instance.
Test4 - Description
Simulate a crash of the secondary HANA database.
Test4 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for HANA system replication.
- Both cluster nodes are active.
- Cluster is started on NODE1 and NODE2.
- Cluster resource SAPHana_${SID}_${INSTNO} is configured with AUTOMATED_REGISTER=true.
- Check SAP HANA System Replication status:
  - The primary SAP HANA database is running on NODE1.
  - The secondary SAP HANA database is running on NODE2.
  - HANA System Replication is activated and in sync.
Test4 - Test Procedure
Crash SAP HANA secondary by sending a SIGKILL signal as user ${sid}adm.
On NODE2, run the following command.
sudo -i -u ${sid}adm -- HDB kill-9
Test4 - Expected behavior
- SAP HANA secondary on NODE2 crashes.
- The cluster detects the stopped secondary HANA database and marks the resource as failed.
- The cluster restarts the secondary HANA database.
- The cluster detects that the system replication is in sync again.
Test4 - Recovery Procedure
Wait until the secondary HANA instance starts and is synchronized again (SOK), then clean up the failed resource actions as shown in pcs status.
On NODE2, run the following command.
pcs resource refresh SAPHana_${SID}_${INSTNO}
pcs status --full