Configuring High Availability for SAP S/4HANA (ASCS and ERS) in a RHEL HA Add-On Cluster
The following information describes the configuration of ABAP SAP Central Services (ASCS) and Enqueue Replication Service (ERS) with Red Hat Enterprise Linux (RHEL) in a RHEL HA Add-On cluster. The cluster uses virtual server instances in IBM® Power® Virtual Server as cluster nodes.
The focus of this example configuration is on the second generation of the Standalone Enqueue Server, or ENSA2.
Starting with the release of SAP S/4HANA 1809, ENSA2 is installed by default, and can be configured in a two-node or multi-node cluster. This example uses the ENSA2 setup for a two-node RHEL HA Add-On cluster. If the ASCS service fails in a two-node cluster, it restarts on the node where ERS is running. The lock entries for the SAP application are restored from the copy of the lock table in the ERS. When an administrator activates the failed cluster node, the ERS instance moves to the other node (anti-collocation) to protect the lock table copy.
It is recommended that you install the SAP database instance and other SAP application server instances on virtual server instances outside the two-node cluster for ASCS and ERS.
Before you begin
Review the general requirements, product documentation, support articles, and SAP notes listed in Implementing High Availability for SAP Applications on IBM Power Virtual Server References.
Prerequisites
- The virtual server instances must meet the hardware and resource requirements of the SAP instances installed on them. Follow the guidelines on instance types, storage, and memory sizing in the Planning your deployment document.
- This information describes a setup that uses shareable storage volumes accessible on both cluster nodes. Certain file systems are created on shareable storage volumes so that they can be mounted on both cluster nodes. This setup applies to both instance directories:
  - /usr/sap/<SID>/ASCS<Inst#> of the ASCS instance.
  - /usr/sap/<SID>/ERS<Inst#> of the ERS instance.
  Make sure that the storage volumes that were created for these file systems are attached to both virtual server instances. During SAP instance installation and RHEL HA Add-On cluster configuration, each instance directory must be mounted on its appropriate node. HA-LVM ensures that each of the two instance directories is mounted on only one node at a time.
  Different storage setups for the instance directories, such as NFS mounts, are possible. Storage setup steps for file storage or creation of cluster file system resources are not described in this document.
- The virtual hostnames for the ASCS and ERS instances must meet the requirements that are documented in Hostnames of SAP ABAP Platform servers. Make sure that the virtual IP addresses for the SAP instances are assigned to a network adapter and that they can communicate in the network.
- SAP application server instances require a common shared file system SAPMNT (/sapmnt/<SID>) with read and write access, and other shared file systems such as SAPTRANS (/usr/sap/trans). These file systems are typically provided by an external NFS server. The NFS server must be highly available and must not be installed on virtual servers that are part of the ENSA2 cluster.
  Configuring an Active-Passive NFS Server in a Red Hat High Availability Cluster describes the implementation of an active-passive NFS server in a RHEL HA Add-On cluster with Red Hat Enterprise Linux 8 by using virtual server instances in Power Virtual Server. The RHEL HA Add-On cluster for the active-passive NFS server must be deployed in a single Power Virtual Server workspace.
- Ensure that all SAP installation media are available.
Preparing nodes for SAP installation
The following information describes how to prepare the nodes for an SAP installation.
Preparing environment variables
To simplify the setup, prepare the following environment variables for user root
on both cluster nodes. These environment variables are used in subsequent commands in the remainder of the instructions.
On both nodes, create a file with the following environment variables. Next, update these variables according to your configuration.
export SID=<SID> # SAP System ID (uppercase)
export sid=<sid> # SAP System ID (lowercase)
# ASCS instance
export ASCS_INSTNO=<INSTNO> # ASCS instance number
export ASCS_VH=<virtual hostname> # ASCS virtual hostname
export ASCS_IP=<IP address> # ASCS virtual IP address
export ASCS_VG=<vg name> # ASCS volume group name
export ASCS_LV=<lv name> # ASCS logical volume name
# ERS instance
export ERS_INSTNO=<INSTNO> # ERS instance number
export ERS_VH=<virtual hostname> # ERS virtual hostname
export ERS_IP=<IP address> # ERS virtual IP address
export ERS_VG=<vg name> # ERS volume group name
export ERS_LV=<lv name> # ERS logical volume name
It is recommended to use meaningful names for the volume groups and logical volumes that designate their content. For example, include the SID and ascs or ers in the name. Don't use hyphens in the volume group or logical volume names.
- s01ascsvg and s01ascslv
- s01ersvg and s01erslv
You must source this file before you use the sample commands in the remainder of the instructions.
For example, if you created a file that is named sap_envs.sh
, run the following command on both nodes to set the environment variables.
source sap_envs.sh
Every time that you start a new terminal session, you must run the previous source command. As an alternative, you can place the environment variable file in the /etc/profile.d directory during the cluster configuration. In this way, the file is sourced automatically each time that you log in to the server.
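The following is a minimal sketch of that alternative. It assumes that the file was created as /root/sap_envs.sh (the path is an example); scripts in /etc/profile.d are sourced by login shells.
# example only: copy the environment file so that login shells source it automatically
cp /root/sap_envs.sh /etc/profile.d/sap_envs.sh
chmod 644 /etc/profile.d/sap_envs.sh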
Assigning virtual IP addresses
Review the information in Reserving virtual IP addresses.
Check whether the virtual IP address for the SAP instance is present. Otherwise, you need to identify the correct network adapter to assign the IP address.
On both nodes, check the list of currently active IP addresses.
ip -o -f inet address show | awk '/scope global/ {print $2, $4}'
Sample output of the previous command.
# ip -o -f inet address show | awk '/scope global/ {print $2, $4}'
env2 10.51.0.66/24
env3 10.52.0.41/24
env4 10.111.1.28/24
The device name of the network adapter appears in the first column. The second column lists the active IP addresses with the number of bits that are reserved for the netmask, separated by a slash.
If the virtual IP address for the SAP instance is not present, make sure that it isn't erroneously set on another virtual server instance.
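Before you ping the virtual hostnames, you can confirm that they resolve on both nodes. This is a quick sketch and assumes that the hostnames are maintained in DNS or in /etc/hosts.
# confirm that the virtual hostnames resolve to the expected virtual IP addresses
getent hosts ${ASCS_VH} ${ERS_VH}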
On NODE1, run the following command.
ping -c 2 ${ASCS_VH}
Sample output:
# ping -c 2 cl-sap-scs
PING cl-sap-scs (10.111.1.248) 56(84) bytes of data.
From cl-sap-1.tst.ibm.com (10.111.1.28) icmp_seq=1 Destination Host Unreachable
From cl-sap-1.tst.ibm.com (10.111.1.28) icmp_seq=2 Destination Host Unreachable
--- cl-sap-scs ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 2112ms
pipe 3
If the ping output shows Destination Host Unreachable, the IP address is available and you can assign it as an IP alias to the virtual server instance. Use the device name of the network adapter that matches the subnet of the IP address.
Example command on NODE1:
ip addr add ${ASCS_IP} dev env4
Example command on NODE2:
ip addr add ${ERS_IP} dev env4
According to your specific network configuration, the device name for the network adapter might be different.
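As a quick sketch to identify the adapter whose subnet contains the virtual IP address, you can query the routing table; verify the reported device against the earlier ip address listing.
# print the route for the virtual IP address; the "dev" field shows the matching adapter
ip route get ${ASCS_IP}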
The IP address is required for the SAP installation, and is set manually. Later, the virtual IP addresses are controlled by the Red Hat HA Cluster Add-on.
Preparing volume groups, logical volumes, and shared file systems
Shared storage is an important resource in an ENSA2 cluster. ASCS and ERS must be able to run on both nodes, and their runtime environment is stored in the shared storage volumes. All cluster nodes must be able to access the shared storage volumes, but at any point in time only one node has exclusive read and write access to a volume.
Preparing High Availability Logical Volume Manager settings
Edit the file /etc/lvm/lvm.conf
to include the system ID in the volume group.
On both nodes, edit the lvm.conf
file.
vi /etc/lvm/lvm.conf
Search for the system_id_source
parameter and change its value to uname
.
Sample setting for the system_id_source parameter in /etc/lvm/lvm.conf:
system_id_source = "uname"
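As an optional check (a sketch), you can display the system ID that LVM now derives from the node's uname; each node should report its own hostname.
# show the LVM system ID of the local node
lvm systemid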
Identifying World Wide Names of shared storage volumes
Determine the World Wide Name (WWN) for each storage volume that is part of one of the shared volume groups.
- Log in to IBM Cloud® and go to the Storage volumes view of Power Virtual Server.
- Select your workspace.
- Filter on the volume prefix in the Storage volumes list, and identify all the World Wide Names of the volumes that are in scope for the ASCS and ERS instances. The World Wide Name is a 32-digit hexadecimal number. Make sure that the attribute Shareable is On for those volumes.
- In the Virtual server instances view, go to both virtual server instances of the cluster. Verify that all volumes that are in scope for ASCS and ERS are attached to both virtual server instances.
When you attach a new storage volume to a virtual server instance, make sure that you rescan the SCSI bus to detect the new volume. Afterward, update the multipath configuration of the virtual server instance.
On the nodes with new storage volume attachments, run the following command.
rescan-scsi-bus.sh && sleep 10 && multipathd reconfigure
Log in to both cluster nodes, and add the WWN to the environment variables of user root
.
Use the pvs --all
command to determine the appropriate WWN values.
On NODE1, export the ASCS_PVID
environment variable.
export ASCS_PVID=3<WWN> # WWN of shared storage volume for ASCS
On NODE2, export the ERS_PVID
environment variable.
export ERS_PVID=3<WWN> # WWN of shared storage volume for ERS
Make sure that you set the environment variable by using the hexadecimal number with lowercase letters.
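If you copied the WWN in uppercase letters, a small sketch like the following converts it; <WWN> is a placeholder for the value from the IBM Cloud console.
# example only: prefix the WWN with 3 and convert it to lowercase
export ASCS_PVID=3$(echo "<WWN>" | tr '[:upper:]' '[:lower:]')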
Creating physical volumes
On NODE1, run the following command.
pvcreate /dev/mapper/${ASCS_PVID}
Sample output:
# pvcreate /dev/mapper/${ASCS_PVID}
Physical volume "/dev/mapper/360050768108103357000000000002ddc" successfully created.
On NODE2, run the following command.
pvcreate /dev/mapper/${ERS_PVID}
Sample output:
# pvcreate /dev/mapper/${ERS_PVID}
Physical volume "/dev/mapper/360050768108103357000000000002e31" successfully created.
Creating volume groups
Create the volume group for the ASCS.
On NODE1, run the following command.
vgcreate ${ASCS_VG} /dev/mapper/${ASCS_PVID}
Verify that the System ID is set.
vgs -o+systemid
Sample output:
# vgs -o+systemid
VG #PV #LV #SN Attr VSize VFree System ID
s01ascsvg 1 0 0 wz--n- <50.00g <50.00g cl-sap-1
Create the volume group for the ERS.
On NODE2, run the following command.
vgcreate ${ERS_VG} /dev/mapper/${ERS_PVID}
Verify that the System ID is set.
vgs -o+systemid
Sample output:
# vgs -o+systemid
VG #PV #LV #SN Attr VSize VFree System ID
s01ersvg 1 0 0 wz--n- <50.00g <50.00g cl-sap-2
Creating logical volumes and file systems
Create the logical volume for the ASCS and format it as an XFS file system.
On NODE1, run the following commands.
lvcreate -l 100%FREE -n ${ASCS_LV} ${ASCS_VG}
mkfs.xfs /dev/${ASCS_VG}/${ASCS_LV}
Create the logical volume for the ERS and format it as an XFS file system.
On NODE2, run the following commands.
lvcreate -l 100%FREE -n ${ERS_LV} ${ERS_VG}
mkfs.xfs /dev/${ERS_VG}/${ERS_LV}
Making sure that a volume group is not activated on multiple cluster nodes
Volume groups that are managed by the cluster must not activate automatically on startup.
For RHEL 8.5 and later, disable autoactivation when creating the volume group by specifying the --setautoactivation n
flag on the vgcreate command.
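For example, on RHEL 8.5 or later the volume groups could have been created with autoactivation disabled from the start. This is a sketch that uses the same variables and devices as the earlier vgcreate commands; if you use this flag, the auto_activation_volume_list change that follows is not needed for these volume groups.
# on NODE1
vgcreate --setautoactivation n ${ASCS_VG} /dev/mapper/${ASCS_PVID}
# on NODE2
vgcreate --setautoactivation n ${ERS_VG} /dev/mapper/${ERS_PVID}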
On both nodes, edit the /etc/lvm/lvm.conf
file and modify the auto_activation_volume_list
entry to limit autoactivation to specific volume groups.
vi /etc/lvm/lvm.conf
Locate the auto_activation_volume_list parameter and add all volume groups, except the volume groups that you defined for the ASCS and ERS instances, to this list.
See an example of how to set the auto_activation_volume_list
entry in /etc/lvm/lvm.conf
:
auto_activation_volume_list = [ "rhel_root" ]
Rebuild the initramfs boot image to make sure that the boot image does not activate a volume group that is controlled by the cluster.
On both nodes, run the following command.
dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)
Reboot both nodes.
Mounting the file systems for SAP installation
Activate the volume groups and mount the SAP instance file systems.
On NODE1 (ASCS), run the following commands.
vgchange -a y ${ASCS_VG}
mkdir -p /usr/sap/${SID}/ASCS${ASCS_INSTNO}
mount /dev/${ASCS_VG}/${ASCS_LV} /usr/sap/${SID}/ASCS${ASCS_INSTNO}
On NODE2 (ERS), run the following commands.
vgchange -a y ${ERS_VG}
mkdir -p /usr/sap/${SID}/ERS${ERS_INSTNO}
mount /dev/${ERS_VG}/${ERS_LV} /usr/sap/${SID}/ERS${ERS_INSTNO}
Mounting the required NFS file systems
On both nodes, make sure that the NFS file systems /sapmnt
and /usr/sap/trans
are mounted.
mount | grep nfs
Installing SAP instances
Use the SAP Software Provisioning Manager (SWPM) to install all instances.
- Install the SAP instances on the cluster nodes.
  - Install an ASCS instance on NODE1 by using the virtual hostname ${ASCS_VH} that is associated with the virtual IP address for ASCS:
    <swpm>/sapinst SAPINST_USE_HOSTNAME=${ASCS_VH}
  - Install an ERS instance on NODE2 by using the virtual hostname ${ERS_VH} that is associated with the virtual IP address for ERS:
    <swpm>/sapinst SAPINST_USE_HOSTNAME=${ERS_VH}
- Install the instances outside the cluster.
  - DB instance
  - PAS instance
  - AAS instances
Installing and setting up the RHEL HA Add-On cluster
Install and set up the RHEL HA Add-On cluster according to Implementing a RHEL HA Add-On Cluster on IBM Power Virtual Server.
Configure and test fencing as described in Creating the fencing device.
Preparing ASCS and ERS instances for the cluster integration
Use the following steps to prepare the SAP instances for the cluster integration.
Disabling the automatic start of the SAP instance agents for ASCS and ERS
You must disable the automatic start of the sapstartsrv
instance agents for both ASCS and ERS instances after a reboot.
Verifying the SAP instance agent integration type
Recent versions of the SAP instance agent sapstartsrv
provide native systemd
support on Linux. For more information, refer to the SAP notes that are listed in SAP Notes.
On both nodes, check the content of the /usr/sap/sapservices
file.
cat /usr/sap/sapservices
In the systemd
format, the lines start with systemctl
entries.
Example:
systemctl --no-ask-password start SAPS01_01 # sapstartsrv pf=/usr/sap/S01/SYS/profile/S01_ASCS01_cl-sap-scs
If the entries for ASCS and ERS are in systemd
format, continue with the steps in Disabling systemd services of the ASCS and the ERS SAP instance.
In the classic
format, the lines start with LD_LIBRARY_PATH
entries.
Example:
LD_LIBRARY_PATH=/usr/sap/S01/ASCS01/exe:$LD_LIBRARY_PATH;export LD_LIBRARY_PATH;/usr/sap/S01/ASCS01/exe/sapstartsrv pf=/usr/sap/S01/SYS/profile/S01_ASCS01_cl-sap-scs -D -u s01adm
If the entries for ASCS and ERS are in classic
format, then modify the /usr/sap/sapservices
file to prevent the automatic start of the sapstartsrv
instance agent for both ASCS and ERS instances after a reboot.
On both nodes, remove or comment out the sapstartsrv
entries for both ASCS and ERS in the SAP services file.
sed -i -e 's/^LD_LIBRARY_PATH=/#LD_LIBRARY_PATH=/' /usr/sap/sapservices
Example:
#LD_LIBRARY_PATH=/usr/sap/S01/ASCS01/exe:$LD_LIBRARY_PATH;export LD_LIBRARY_PATH;/usr/sap/S01/ASCS01/exe/sapstartsrv pf=/usr/sap/S01/SYS/profile/S01_ASCS01_cl-sap-scs -D -u s01adm
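As an optional check (a sketch), confirm that no active entries in the classic format remain in the SAP services file; the command prints nothing when all entries are commented out.
# list any remaining uncommented classic entries in /usr/sap/sapservices
grep '^LD_LIBRARY_PATH' /usr/sap/sapservices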
Now proceed to Creating mount points for the instance file systems on the takeover node.
Disabling systemd services of the ASCS and the ERS SAP instances
On both nodes, disable the instance agent for the ASCS.
systemctl disable --now SAP${SID}_${ASCS_INSTNO}.service
On both nodes, disable the instance agent for the ERS.
systemctl disable --now SAP${SID}_${ERS_INSTNO}.service
Disabling systemd restart of a crashed ASCS or ERS instance
Systemd
has its own mechanisms for restarting a crashed service. In a high availability setup, only the HA cluster is responsible for managing the SAP ASCS and ERS instances. Create systemd drop-in files
on both
cluster nodes to prevent systemd
from restarting a crashed SAP instance.
On both nodes, create the directories for the drop-in files.
mkdir /etc/systemd/system/SAP${SID}_${ASCS_INSTNO}.service.d
mkdir /etc/systemd/system/SAP${SID}_${ERS_INSTNO}.service.d
On both nodes, create the drop-in files for ASCS and ERS.
cat >> /etc/systemd/system/SAP${SID}_${ASCS_INSTNO}.service.d/HA.conf << EOT
[Service]
Restart=no
EOT
cat >> /etc/systemd/system/SAP${SID}_${ERS_INSTNO}.service.d/HA.conf << EOT
[Service]
Restart=no
EOT
Restart=no
must be in the [Service]
section, and the drop-in files must be available on all cluster nodes.
On both nodes, reload the systemd
unit files.
systemctl daemon-reload
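To confirm that the drop-in files are in effect, you can display the merged unit definitions (a sketch); run each command on the node where the corresponding instance is installed. The HA.conf drop-in with Restart=no should be listed at the end of the output.
# show the unit file together with its drop-ins
systemctl cat SAP${SID}_${ASCS_INSTNO}.service
systemctl cat SAP${SID}_${ERS_INSTNO}.service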
Creating mount points for the instance file systems on the takeover node
Create the mount points for the instance file systems and adjust their ownership.
On NODE1, run the following commands.
mkdir /usr/sap/${SID}/ERS${ERS_INSTNO}
chown ${sid}adm:sapsys /usr/sap/${SID}/ERS${ERS_INSTNO}
On NODE2, run the following commands.
mkdir /usr/sap/${SID}/ASCS${ASCS_INSTNO}
chown ${sid}adm:sapsys /usr/sap/${SID}/ASCS${ASCS_INSTNO}
Installing permanent SAP license keys
When the SAP ASCS instance is installed on a Power Virtual Server instance, the SAP license mechanism relies on the partition UUID. For more information, see SAP note 2879336 - Hardware key based on unique ID.
On both nodes, run the following command as user <sid>adm
to identify the HARDWARE KEY
of the node.
sudo -i -u ${sid}adm -- sh -c 'saplikey -get'
Sample output:
$ sudo -i -u ${sid}adm -- sh -c 'saplikey -get'
saplikey: HARDWARE KEY = H1428224519
Note the HARDWARE KEY
of each node.
You need both hardware keys to request two different SAP license keys. For more information about requesting SAP license keys for failover systems, see the SAP notes that are listed in Implementing High Availability for SAP Applications on IBM Power Virtual Server References.
Installing SAP resource agents
Install the required software packages. The resource-agents-sap package includes the SAPInstance cluster resource agent for managing the SAP instances.
Unless sap_cluster_connector
is configured for the SAP instance, the RHEL HA Add-On cluster considers any state change of the instance as an issue. If other SAP tools such as sapcontrol
are used to manage the instance,
then sap_cluster_connector
grants permission to control SAP instances that are running inside the cluster. If the SAP instances are managed by only cluster tools, the implementation of sap_cluster_connector
is not
necessary.
Install the packages for the resource agent and the SAP Cluster Connector library. For more information, see How to enable the SAP HA Interface for SAP ABAP application server instances managed by the RHEL HA Add-On.
On both nodes, run the following commands.
If needed, use subscription-manager
to enable the SAP NetWeaver repository. The RHEL for SAP Subscriptions and Repositories documentation describes how to enable the required repositories.
subscription-manager repos --enable="rhel-8-for-ppc64le-sap-netweaver-e4s-rpms"
Install the required packages.
dnf install -y resource-agents-sap sap-cluster-connector
Configuring SAP Cluster Connector
Add user ${sid}adm
to the haclient
group.
On both nodes, run the following command.
usermod -a -G haclient ${sid}adm
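A quick check (sketch) verifies the group membership; haclient should appear in the list of groups for the ${sid}adm user.
# verify that the SAP administration user is a member of the haclient group
id ${sid}adm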
Adapting the SAP instance profiles
Modify the start profiles of all SAP instances that are managed by SAP tools outside the cluster. Both ASCS and ERS instances can be controlled by the RHEL HA Add-On cluster and its resource agents. Adjust the SAP instance profiles to prevent an automatic restart of instance processes.
On NODE1, navigate to the SAP profile directory.
cd /sapmnt/${SID}/profile
Change all occurrences of Restart_Program
to Start_Program
in the instance profile of both ASCS and ERS.
sed -i -e 's/Restart_Program_\([0-9][0-9]\)/Start_Program_\1/' ${SID}_ASCS${ASCS_INSTNO}_${ASCS_VH}
sed -i -e 's/Restart_Program_\([0-9][0-9]\)/Start_Program_\1/' ${SID}_ERS${ERS_INSTNO}_${ERS_VH}
Add the following two lines at the end of both SAP instance profiles to configure sap_cluster_connector for the ASCS and ERS instances.
service/halib = $(DIR_EXECUTABLE)/saphascriptco.so
service/halib_cluster_connector = /usr/bin/sap_cluster_connector
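As a sketch, the following loop appends both lines to the ASCS and the ERS instance profile in one step; the quoted heredoc delimiter keeps $(DIR_EXECUTABLE) literal. Review both profiles afterward.
# append the SAP HA interface settings to both instance profiles
for profile in ${SID}_ASCS${ASCS_INSTNO}_${ASCS_VH} ${SID}_ERS${ERS_INSTNO}_${ERS_VH}
do
cat >> /sapmnt/${SID}/profile/${profile} << 'EOT'
service/halib = $(DIR_EXECUTABLE)/saphascriptco.so
service/halib_cluster_connector = /usr/bin/sap_cluster_connector
EOT
done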
Configuring ASCS and ERS cluster resources
Up to this point, the following are assumed:
- A RHEL HA Add-On cluster is running on both virtual server instances and fencing of the nodes was tested.
- The SAP System is running.
- SAP ASCS is installed and active on node 1 of the cluster.
- SAP ERS is installed and active on node 2 of the cluster.
- All steps in Prepare ASCS and ERS instances for the cluster integration are complete.
Configuring ASCS resource group
Create a resource for the virtual IP address of the ASCS.
On NODE1, run the following command.
pcs resource create ${sid}_vip_ascs${ASCS_INSTNO} IPaddr2 \
ip=${ASCS_IP} \
--group ${sid}_ascs${ASCS_INSTNO}_group
In this example of creating resources for an HA-LVM file system on a shared storage volume, you create resources for LVM-activate and for the instance file system of the ASCS.
pcs resource create ${sid}_fs_ascs${ASCS_INSTNO}_lvm LVM-activate \
vgname="${ASCS_VG}" \
vg_access_mode=system_id \
--group ${sid}_ascs${ASCS_INSTNO}_group
pcs resource create ${sid}_fs_ascs${ASCS_INSTNO} Filesystem \
device="/dev/mapper/${ASCS_VG}-${ASCS_LV}" \
directory=/usr/sap/${SID}/ASCS${ASCS_INSTNO} \
fstype=xfs \
--group ${sid}_ascs${ASCS_INSTNO}_group
In the alternative example where the instance file system of the ASCS is provided by an HA NFS server, only the file system resource is required. Make sure that you defined the environment variables ${NFS_VH} and ${NFS_OPTIONS} according to your NFS server, and that you created a directory ${SID}/ASCS under the NFS root directory during the SAP installation of the ASCS instance.
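A sketch of these additional environment variables follows; both values are placeholders and must match your NFS server and the mount options that you use for the instance file systems.
export NFS_VH=<virtual hostname>       # virtual hostname of the HA NFS server
export NFS_OPTIONS="sec=sys,vers=4.1"  # NFS mount options, example values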
pcs resource create ${sid}_fs_ascs${ASCS_INSTNO} Filesystem \
device="${NFS_VH}:${SID}/ASCS" \
directory=/usr/sap/${SID}/ASCS${ASCS_INSTNO} \
fstype=nfs \
options="${NFS_OPTIONS}" \
force_unmount=safe \
op start interval=0 timeout=60 \
op stop interval=0 timeout=120 \
--group ${sid}_ascs${ASCS_INSTNO}_group
Create a resource for managing the ASCS instance.
pcs resource create ${sid}_ascs${ASCS_INSTNO} SAPInstance \
InstanceName="${SID}_ASCS${ASCS_INSTNO}_${ASCS_VH}" \
START_PROFILE=/sapmnt/${SID}/profile/${SID}_ASCS${ASCS_INSTNO}_${ASCS_VH} \
AUTOMATIC_RECOVER=false \
meta resource-stickiness=5000 \
migration-threshold=1 failure-timeout=60 \
op monitor interval=20 on-fail=restart timeout=60 \
op start interval=0 timeout=600 \
op stop interval=0 timeout=600 \
--group ${sid}_ascs${ASCS_INSTNO}_group
The meta resource-stickiness=5000
option is used to balance the failover constraint with ERS so that the resource stays on the node where it started and doesn't migrate uncontrollably in the cluster.
Add a resource stickiness to the group to make sure that the ASCS remains on the node.
pcs resource meta ${sid}_ascs${ASCS_INSTNO}_group \
resource-stickiness=3000
Configuring the ERS resource group
Create a resource for the virtual IP address of the ERS.
On NODE1, run the following command.
pcs resource create ${sid}_vip_ers${ERS_INSTNO} IPaddr2 \
ip=${ERS_IP} \
--group ${sid}_ers${ERS_INSTNO}_group
In the example of creating resources for an HA-LVM file system on a shared storage volume, you create resources for LVM-activate and for the instance file system of the ERS.
pcs resource create ${sid}_fs_ers${ERS_INSTNO}_lvm LVM-activate \
vgname="${ERS_VG}" \
vg_access_mode=system_id \
--group ${sid}_ers${ERS_INSTNO}_group
pcs resource create ${sid}_fs_ers${ERS_INSTNO} Filesystem \
device="/dev/mapper/${ERS_VG}-${ERS_LV}" \
directory=/usr/sap/${SID}/ERS${ERS_INSTNO} \
fstype=xfs \
--group ${sid}_ers${ERS_INSTNO}_group
In the alternative example where the instance file system of the ERS is provided by an HA NFS server, only the file system resource is required. Make sure that you defined the environment variables ${NFS_VH} and ${NFS_OPTIONS} according to your NFS server, and that you created a directory ${SID}/ERS under the NFS root directory during the SAP installation of the ERS instance.
pcs resource create ${sid}_fs_ers${ERS_INSTNO} Filesystem \
device="${NFS_VH}:${SID}/ERS" \
directory=/usr/sap/${SID}/ERS${ERS_INSTNO} \
fstype=nfs \
options="${NFS_OPTIONS}" \
force_unmount=safe \
op start interval=0 timeout=60 \
op stop interval=0 timeout=120 \
--group ${sid}_ers${ERS_INSTNO}_group
Create a resource for managing the ERS instance.
pcs resource create ${sid}_ers${ERS_INSTNO} SAPInstance \
InstanceName="${SID}_ERS${ERS_INSTNO}_${ERS_VH}" \
START_PROFILE=/sapmnt/${SID}/profile/${SID}_ERS${ERS_INSTNO}_${ERS_VH} \
AUTOMATIC_RECOVER=false \
IS_ERS=true \
op monitor interval=20 on-fail=restart timeout=60 \
op start interval=0 timeout=600 \
op stop interval=0 timeout=600 \
--group ${sid}_ers${ERS_INSTNO}_group
Configuring cluster resource constraints
A colocation constraint prevents resource groups ${sid}_ascs${ASCS_INSTNO}_group and ${sid}_ers${ERS_INSTNO}_group from being active on the same node whenever possible. The colocation score of -5000 makes sure that they run on the same node if only a single node is available.
pcs constraint colocation add \
${sid}_ers${ERS_INSTNO}_group with ${sid}_ascs${ASCS_INSTNO}_group -5000
An order constraint makes sure that resource group ${sid}_ascs${ASCS_INSTNO}_group starts before resource group ${sid}_ers${ERS_INSTNO}_group is stopped.
pcs constraint order start \
${sid}_ascs${ASCS_INSTNO}_group then stop ${sid}_ers${ERS_INSTNO}_group \
symmetrical=false \
kind=Optional
The following two order constraints make sure that the file system SAPMNT mounts before resource groups ${sid}_ascs${ASCS_INSTNO}_group
and ${sid}_ers${ERS_INSTNO}_group
start.
pcs constraint order fs_sapmnt-clone then ${sid}_ascs${ASCS_INSTNO}_group
pcs constraint order fs_sapmnt-clone then ${sid}_ers${ERS_INSTNO}_group
The cluster setup is complete.
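As an optional verification (a sketch), you can ask the ASCS instance which HA solution it detects and let it run its configuration checks; the exact output depends on your environment.
sudo -i -u ${sid}adm -- sh -c "sapcontrol -nr ${ASCS_INSTNO} -function HAGetFailoverConfig"
sudo -i -u ${sid}adm -- sh -c "sapcontrol -nr ${ASCS_INSTNO} -function HACheckConfig"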
Testing an SAP ENSA2 cluster
It is vital to thoroughly test the cluster configuration to make sure that the cluster is working correctly. The following information provides a few sample failover test scenarios, but is not a complete list of test scenarios.
The description of each test case includes the following information.
- Component under test
- Description of the test
- Prerequisites and the initial state before failover test
- Test procedure
- Expected behavior and results
- Recovery procedure
Test 1 - Testing a failure of the ASCS instance
Test 1 - Description
Simulate a crash of the SAP ASCS instance that is running on NODE1.
Test 1 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for SAP ENSA2.
- Both cluster nodes are active.
- Cluster is started on NODE1 and NODE2.
- Resource group ${sid}_ascs${ASCS_INSTNO}_group is active on NODE1.
  - Resources ${sid}_vip_ascs${ASCS_INSTNO}, ${sid}_fs_ascs${ASCS_INSTNO}_lvm, ${sid}_fs_ascs${ASCS_INSTNO}, and ${sid}_ascs${ASCS_INSTNO} are Started on NODE1.
- Resource group ${sid}_ers${ERS_INSTNO}_group is active on NODE2.
  - Resources ${sid}_vip_ers${ERS_INSTNO}, ${sid}_fs_ers${ERS_INSTNO}_lvm, ${sid}_fs_ers${ERS_INSTNO}, and ${sid}_ers${ERS_INSTNO} are Started on NODE2.
- Check SAP instance processes:
  - ASCS instance is running on NODE1.
  - ERS instance is running on NODE2.
pcs status
Sample output:
# pcs status
Cluster name: SAP_ASCS
Cluster Summary:
* Stack: corosync
* Current DC: cl-sap-1 (version 2.0.5-9.el8_4.5-ba59be7122) - partition with quorum
* Last updated: Tue Feb 14 07:59:16 2023
* Last change: Tue Feb 14 05:02:22 2023 by hacluster via crmd on cl-sap-1
* 2 nodes configured
* 11 resource instances configured
Node List:
* Online: [ cl-sap-1 cl-sap-2 ]
Full List of Resources:
* res_fence_ibm_powervs (stonith:fence_ibm_powervs): Started cl-sap-2
* Resource Group: s01_ascs01_group:
* s01_vip_ascs01 (ocf::heartbeat:IPaddr2): Started cl-sap-1
* s01_fs_ascs01_lvm (ocf::heartbeat:LVM-activate): Started cl-sap-1
* s01_fs_ascs01 (ocf::heartbeat:Filesystem): Started cl-sap-1
* s01_ascs01 (ocf::heartbeat:SAPInstance): Started cl-sap-1
* Resource Group: s01_ers02_group:
* s01_vip_ers02 (ocf::heartbeat:IPaddr2): Started cl-sap-2
* s01_fs_ers02_lvm (ocf::heartbeat:LVM-activate): Started cl-sap-2
* s01_fs_ers02 (ocf::heartbeat:Filesystem): Started cl-sap-2
* s01_ers02 (ocf::heartbeat:SAPInstance): Started cl-sap-2
* Clone Set: fs_sapmnt-clone [fs_sapmnt]:
* Started: [ cl-sap-1 cl-sap-2 ]
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Test 1 - Test Procedure
To crash the SAP ASCS instance, send a SIGKILL signal to the enqueue server process as user ${sid}adm.
On NODE1, identify the PID of the enqueue server.
pgrep -af "(en|enq).sap"
Send a SIGKILL signal to the identified process.
Sample output:
# pgrep -af "(en|enq).sap"
30186 en.sapS01_ASCS01 pf=/usr/sap/S01/SYS/profile/S01_ASCS01_cl-sap-scs
# kill -9 30186
Test 1 - Expected behavior
- The SAP ASCS instance on NODE1 crashes.
- The cluster detects the crashed ASCS instance.
- The cluster stops the dependent resources on NODE1 (virtual IP address, file system /usr/sap/${SID}/ASCS${ASCS_INSTNO}, and the LVM resources), and acquires them on NODE2.
- The cluster starts the ASCS on NODE2.
- The cluster stops the ERS instance on NODE2.
- The cluster stops the dependent resources on NODE2 (virtual IP address, file system /usr/sap/${SID}/ERS${ERS_INSTNO}, and the LVM resources), and acquires them on NODE1.
- The cluster starts the ERS on NODE1.
After a few seconds, check the status with the following command.
pcs status
Sample output:
# pcs status
Cluster name: SAP_ASCS
Cluster Summary:
* Stack: corosync
* Current DC: cl-sap-1 (version 2.0.5-9.el8_4.5-ba59be7122) - partition with quorum
* Last updated: Tue Feb 14 08:10:18 2023
* Last change: Tue Feb 14 05:02:22 2023 by hacluster via crmd on cl-sap-1
* 2 nodes configured
* 11 resource instances configured
Node List:
* Online: [ cl-sap-1 cl-sap-2 ]
Full List of Resources:
* res_fence_ibm_powervs (stonith:fence_ibm_powervs): Started cl-sap-2
* Resource Group: s01_ascs01_group:
* s01_vip_ascs01 (ocf::heartbeat:IPaddr2): Started cl-sap-2
* s01_fs_ascs01_lvm (ocf::heartbeat:LVM-activate): Started cl-sap-2
* s01_fs_ascs01 (ocf::heartbeat:Filesystem): Started cl-sap-2
* s01_ascs01 (ocf::heartbeat:SAPInstance): Started cl-sap-2
* Resource Group: s01_ers02_group:
* s01_vip_ers02 (ocf::heartbeat:IPaddr2): Started cl-sap-1
* s01_fs_ers02_lvm (ocf::heartbeat:LVM-activate): Started cl-sap-1
* s01_fs_ers02 (ocf::heartbeat:Filesystem): Started cl-sap-1
* s01_ers02 (ocf::heartbeat:SAPInstance): Started cl-sap-1
* Clone Set: fs_sapmnt-clone [fs_sapmnt]:
* Started: [ cl-sap-1 cl-sap-2 ]
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Test 2 - Testing a failure of the node that is running the ASCS instance
Use the following information to test a failure of the node that is running the ASCS instance.
Test 2 - Description
Simulate a crash of the node where the ASCS instance is running.
Test 2 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for SAP ENSA2.
- Both cluster nodes are active.
- Cluster is started on NODE1 and NODE2.
- Resource group ${sid}_ascs${ASCS_INSTNO}_group is active on NODE2.
  - Resources ${sid}_vip_ascs${ASCS_INSTNO}, ${sid}_fs_ascs${ASCS_INSTNO}_lvm, ${sid}_fs_ascs${ASCS_INSTNO}, and ${sid}_ascs${ASCS_INSTNO} are Started on NODE2.
- Resource group ${sid}_ers${ERS_INSTNO}_group is active on NODE1.
  - Resources ${sid}_vip_ers${ERS_INSTNO}, ${sid}_fs_ers${ERS_INSTNO}_lvm, ${sid}_fs_ers${ERS_INSTNO}, and ${sid}_ers${ERS_INSTNO} are Started on NODE1.
- Check SAP instance processes:
  - ASCS instance is running on NODE2.
  - ERS instance is running on NODE1.
Test 2 - Test procedure
Crash NODE2 by sending a fast-restart
system request.
On NODE2, run the following command.
sync; echo b > /proc/sysrq-trigger
Test 2 - Expected behavior
- NODE2 restarts.
- The cluster detects the failed node and sets its state to offline (UNCLEAN).
- The cluster acquires the ASCS resources (virtual IP address, file system /usr/sap/${SID}/ASCS${ASCS_INSTNO}, and the LVM resources) on NODE1.
- The cluster starts the ASCS on NODE1.
- The cluster stops the ERS instance on NODE1.
- The cluster stops the dependent resources on NODE1 (virtual IP address, file system /usr/sap/${SID}/ERS${ERS_INSTNO}, and the LVM resources), and releases them.
After a while, check the status with the following command.
The second node is offline and both resource groups are running on the first node.
pcs status
Sample output:
# pcs status
Cluster name: SAP_ASCS
Cluster Summary:
* Stack: corosync
* Current DC: cl-sap-1 (version 2.0.5-9.el8_4.5-ba59be7122) - partition with quorum
* Last updated: Tue Feb 14 08:34:16 2023
* Last change: Tue Feb 14 08:34:04 2023 by hacluster via crmd on cl-sap-1
* 2 nodes configured
* 11 resource instances configured
Node List:
* Online: [ cl-sap-1 ]
* OFFLINE: [ cl-sap-2 ]
Full List of Resources:
* res_fence_ibm_powervs (stonith:fence_ibm_powervs): Started cl-sap-1
* Resource Group: s01_ascs01_group:
* s01_vip_ascs01 (ocf::heartbeat:IPaddr2): Started cl-sap-1
* s01_fs_ascs01_lvm (ocf::heartbeat:LVM-activate): Started cl-sap-1
* s01_fs_ascs01 (ocf::heartbeat:Filesystem): Started cl-sap-1
* s01_ascs01 (ocf::heartbeat:SAPInstance): Started cl-sap-1
* Resource Group: s01_ers02_group:
* s01_vip_ers02 (ocf::heartbeat:IPaddr2): Started cl-sap-1
* s01_fs_ers02_lvm (ocf::heartbeat:LVM-activate): Started cl-sap-1
* s01_fs_ers02 (ocf::heartbeat:Filesystem): Started cl-sap-1
* s01_ers02 (ocf::heartbeat:SAPInstance): Started cl-sap-1
* Clone Set: fs_sapmnt-clone [fs_sapmnt]:
* Started: [ cl-sap-1 ]
* Stopped: [ cl-sap-2 ]
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Test 2 - Recovery procedure
Wait until NODE2 restarts, then restart the cluster framework.
On NODE2, run the following command.
pcs cluster start
- The cluster starts on NODE2 and acquires the ERS resources (virtual IP address, file system /usr/sap/${SID}/ERS${ERS_INSTNO}, and the LVM resources) on NODE2.
- The cluster starts the ERS instance on NODE2.
Wait a moment and check the status with the following command. The ERS resource group moved to the second node.
pcs status
Sample output:
# pcs status
Cluster name: SAP_ASCS
Cluster Summary:
* Stack: corosync
* Current DC: cl-sap-1 (version 2.0.5-9.el8_4.5-ba59be7122) - partition with quorum
* Last updated: Tue Feb 14 08:41:23 2023
* Last change: Tue Feb 14 08:34:04 2023 by hacluster via crmd on cl-sap-1
* 2 nodes configured
* 11 resource instances configured
Node List:
* Online: [ cl-sap-1 cl-sap-2 ]
Full List of Resources:
* res_fence_ibm_powervs (stonith:fence_ibm_powervs): Started cl-sap-1
* Resource Group: s01_ascs01_group:
* s01_vip_ascs01 (ocf::heartbeat:IPaddr2): Started cl-sap-1
* s01_fs_ascs01_lvm (ocf::heartbeat:LVM-activate): Started cl-sap-1
* s01_fs_ascs01 (ocf::heartbeat:Filesystem): Started cl-sap-1
* s01_ascs01 (ocf::heartbeat:SAPInstance): Started cl-sap-1
* Resource Group: s01_ers02_group:
* s01_vip_ers02 (ocf::heartbeat:IPaddr2): Started cl-sap-2
* s01_fs_ers02_lvm (ocf::heartbeat:LVM-activate): Started cl-sap-2
* s01_fs_ers02 (ocf::heartbeat:Filesystem): Started cl-sap-2
* s01_ers02 (ocf::heartbeat:SAPInstance): Started cl-sap-2
* Clone Set: fs_sapmnt-clone [fs_sapmnt]:
* Started: [ cl-sap-1 cl-sap-2 ]
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Test 3 - Testing a failure of the ERS instance
Use the following information to test the failure of an ERS instance.
Test 3 - Description
Simulate a crash of the ERS instance.
Test 3 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for SAP ENSA2.
- Both cluster nodes are active.
- Cluster is started on NODE1 and NODE2.
- Resource group ${sid}_ascs${ASCS_INSTNO}_group is active on NODE1.
  - Resources ${sid}_vip_ascs${ASCS_INSTNO}, ${sid}_fs_ascs${ASCS_INSTNO}_lvm, ${sid}_fs_ascs${ASCS_INSTNO}, and ${sid}_ascs${ASCS_INSTNO} are Started on NODE1.
- Resource group ${sid}_ers${ERS_INSTNO}_group is active on NODE2.
  - Resources ${sid}_vip_ers${ERS_INSTNO}, ${sid}_fs_ers${ERS_INSTNO}_lvm, ${sid}_fs_ers${ERS_INSTNO}, and ${sid}_ers${ERS_INSTNO} are Started on NODE2.
- Check SAP instance processes:
  - ASCS instance is running on NODE1.
  - ERS instance is running on NODE2.
Test 3 - Test Procedure
Crash the SAP ERS instance by sending a SIGKILL signal.
On NODE2, identify the PID of the enqueue replication server.
pgrep -af "(er|enqr).sap"
Send a SIGKILL signal to the identified process.
Sample output:
# pgrep -af "(er|enqr).sap"
2527198 er.sapS01_ERS02 pf=/usr/sap/S01/ERS02/profile/S01_ERS02_cl-sap-ers NR=01
# kill -9 2527198
Test 3 - Expected behavior
- SAP Enqueue Replication Server on NODE2 crashes immediately.
- The cluster detects the stopped ERS and marks the resource as failed.
- The cluster restarts the ERS on NODE2.
Check the status with the following command.
pcs status
The ${sid}_ers${ERS_INSTNO}
ERS resource restarted on the second node. If you run the pcs status
command too soon, you might see the ERS resource briefly in status FAILED
.
Sample output:
# pcs status
Cluster name: SAP_ASCS
Cluster Summary:
* Stack: corosync
* Current DC: cl-sap-1 (version 2.0.5-9.el8_4.5-ba59be7122) - partition with quorum
* Last updated: Tue Feb 14 08:50:53 2023
* Last change: Tue Feb 14 08:50:50 2023 by hacluster via crmd on cl-sap-2
* 2 nodes configured
* 11 resource instances configured
Node List:
* Online: [ cl-sap-1 cl-sap-2 ]
Full List of Resources:
* res_fence_ibm_powervs (stonith:fence_ibm_powervs): Started cl-sap-1
* Resource Group: s01_ascs01_group:
* s01_vip_ascs01 (ocf::heartbeat:IPaddr2): Started cl-sap-1
* s01_fs_ascs01_lvm (ocf::heartbeat:LVM-activate): Started cl-sap-1
* s01_fs_ascs01 (ocf::heartbeat:Filesystem): Started cl-sap-1
* s01_ascs01 (ocf::heartbeat:SAPInstance): Started cl-sap-1
* Resource Group: s01_ers02_group:
* s01_vip_ers02 (ocf::heartbeat:IPaddr2): Started cl-sap-2
* s01_fs_ers02_lvm (ocf::heartbeat:LVM-activate): Started cl-sap-2
* s01_fs_ers02 (ocf::heartbeat:Filesystem): Started cl-sap-2
* s01_ers02 (ocf::heartbeat:SAPInstance): Started cl-sap-2
* Clone Set: fs_sapmnt-clone [fs_sapmnt]:
* Started: [ cl-sap-1 cl-sap-2 ]
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Test 3 - Recovery Procedure
On NODE2, run the following commands.
pcs resource refresh
pcs status --full
Test 4 - Testing a manual move of the ASCS instance
Use the following information to test a manual move of an ASCS instance.
Test 4 - Description
Use SAP Control commands to move the ASCS instance to the other node for maintenance purposes.
Test 4 - Prerequisites
- A functional two-node RHEL HA Add-On cluster for SAP ENSA2.
- The sap_cluster_connector is installed and configured.
- Both cluster nodes are active.
- Cluster is started on NODE1 and NODE2.
- Resource group ${sid}_ascs${ASCS_INSTNO}_group is active on NODE1.
  - Resources ${sid}_vip_ascs${ASCS_INSTNO}, ${sid}_fs_ascs${ASCS_INSTNO}_lvm, ${sid}_fs_ascs${ASCS_INSTNO}, and ${sid}_ascs${ASCS_INSTNO} are Started on NODE1.
- Resource group ${sid}_ers${ERS_INSTNO}_group is active on NODE2.
  - Resources ${sid}_vip_ers${ERS_INSTNO}, ${sid}_fs_ers${ERS_INSTNO}_lvm, ${sid}_fs_ers${ERS_INSTNO}, and ${sid}_ers${ERS_INSTNO} are Started on NODE2.
- Check SAP instance processes:
  - ASCS instance is running on NODE1.
  - ERS instance is running on NODE2.
Test 4 - Test Procedure
Log in to NODE1 and run sapcontrol
to move the ASCS instance to the other node.
sudo -i -u ${sid}adm -- sh -c "sapcontrol -nr ${ASCS_INSTNO} -function HAFailoverToNode"
Test 4 - Expected behavior
- sapcontrol interacts with the cluster through the sap_cluster_connector.
Check the status with the following command. Keep in mind that the ASCS resource group moved to the second node. If you run the pcs status
command too soon, you might see some resources stopping
and starting
.
pcs status
Sample output:
# pcs status
Cluster name: SAP_ASCS
Cluster Summary:
* Stack: corosync
* Current DC: cl-sap-1 (version 2.0.5-9.el8_4.5-ba59be7122) - partition with quorum
* Last updated: Tue Feb 14 09:03:19 2023
* Last change: Tue Feb 14 09:01:40 2023 by s01adm via crm_resource on cl-sap-1
* 2 nodes configured
* 11 resource instances configured
Node List:
* Online: [ cl-sap-1 cl-sap-2 ]
Full List of Resources:
* res_fence_ibm_powervs (stonith:fence_ibm_powervs): Started cl-sap-1
* Resource Group: s01_ascs01_group:
* s01_vip_ascs01 (ocf::heartbeat:IPaddr2): Started cl-sap-2
* s01_fs_ascs01_lvm (ocf::heartbeat:LVM-activate): Started cl-sap-2
* s01_fs_ascs01 (ocf::heartbeat:Filesystem): Started cl-sap-2
* s01_ascs01 (ocf::heartbeat:SAPInstance): Started cl-sap-2
* Resource Group: s01_ers02_group:
* s01_vip_ers02 (ocf::heartbeat:IPaddr2): Started cl-sap-1
* s01_fs_ers02_lvm (ocf::heartbeat:LVM-activate): Started cl-sap-1
* s01_fs_ers02 (ocf::heartbeat:Filesystem): Started cl-sap-1
* s01_ers02 (ocf::heartbeat:SAPInstance): Started cl-sap-1
* Clone Set: fs_sapmnt-clone [fs_sapmnt]:
* Started: [ cl-sap-1 cl-sap-2 ]
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Test 4 - Recovery Procedure
Wait until the ASCS instance is active on NODE2. After five minutes, the cluster removes the created location constraints automatically.
The following instructions show how to remove the constraints manually.
On NODE2, run the following command.
pcs constraint
Sample output:
# pcs constraint
Location Constraints:
Resource: s01_ascs01_group
Constraint: cli-ban-s01_ascs01_group-on-cl-sap-1
Rule: boolean-op=and score=-INFINITY
Expression: #uname eq string cl-sap-1
Expression: date lt 2023-02-08 09:33:50 -05:00
Ordering Constraints:
start s01_ascs01_group then stop s01_ers02_group (kind:Optional) (non-symmetrical)
start fs_sapmnt-clone then start s01_ascs01_group (kind:Mandatory)
start fs_sapmnt-clone then start s01_ers02_group (kind:Mandatory)
Colocation Constraints:
s01_ers02_group with s01_ascs01_group (score:-5000)
Ticket Constraints:
Remove the location constraint with the following command.
pcs resource clear ${sid}_ascs${ASCS_INSTNO}_group
The location constraints are removed. Verify with the following command.
pcs constraint
Sample output:
# pcs constraint
Location Constraints:
Ordering Constraints:
start s01_ascs01_group then stop s01_ers02_group (kind:Optional) (non-symmetrical)
start fs_sapmnt-clone then start s01_ascs01_group (kind:Mandatory)
start fs_sapmnt-clone then start s01_ers02_group (kind:Mandatory)
Colocation Constraints:
s01_ers02_group with s01_ascs01_group (score:-5000)
Ticket Constraints: