Setting up an IBM Spectrum LSF cluster
Deploy the HPC cluster with your choice of configuration properties.
Architecture overview and NFS file system setup
The HPC cluster consists of a login node, a storage node where the block storage volume is attached, one to three LSF management nodes, and a number of LSF worker nodes.
- The login node serves as a jump host and is the only node with a public IP address. The other nodes have only private IP addresses, and the only way to reach them is through the login node. You can log in to the primary LSF management host and do most operations from there. By default, `lsfadmin` is the only user ID created on the cluster. Passwordless SSH is configured between the LSF management host and the workers, so you can reach any worker node with the `lsfadmin` user ID from the LSF primary.
- A worker node can be a static resource. In that case, its lifecycle is managed by Schematics: you can request a number of static worker nodes, and they remain available in the LSF cluster until a Schematics destroy action is performed. The LSF resource connector creates extra workers when there is not enough capacity to run jobs and destroys them when demand decreases. The lifecycle of these dynamic workers is managed by the LSF resource connector. Wait until these dynamic resources are returned to the cloud before you destroy the entire VPC cluster through Schematics.
- IBM Cloud File Storage for VPC is used for file sharing. By default, there are two file share volumes of 100 GB each. To change this configuration, set the `custom_file_shares` deployment value.
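To confirm that the file shares are available, you can check the NFS mounts from the primary management host. This is an optional verification sketch; the exact mount points depend on your `custom_file_shares` configuration.

```
# Run on the primary LSF management host as lsfadmin
# List the NFS mounts that back the VPC file shares
mount | grep -i nfs

# Check the capacity and usage of the mounted file systems
df -h
```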
The HPC cluster solution provides a base custom image that includes the LSF installation. You can create your own custom image on top of the base image by using the VPC image service. For more information, see Create custom image. You can then specify, in Schematics, the custom image that you want to use for the LSF management nodes and worker nodes. The image that is used by the login node and the storage node is not currently configurable (CentOS 7 by default).
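If you build your own custom image, you can confirm that it is visible to your account before you set the `compute_image_name` or `management_image_name` deployment values. The following sketch assumes that the IBM Cloud CLI with the `vpc-infrastructure` plugin is installed and that you are logged in and targeting the right region; the image name is a placeholder.

```
# List VPC images and filter for your custom image (placeholder name)
ibmcloud is images | grep my-lsf-custom-image
```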
Create SSH key
Complete the following steps to create your SSH key:
- Generate an SSH key on your system by running the following command: `ssh-keygen -t rsa`
- Copy and save all the content from `.ssh/id_rsa.pub`.
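If you prefer a dedicated key pair for the cluster instead of your default key, the following optional sketch generates one and prints the public key; the file name and comment are placeholders.

```
# Generate a dedicated 4096-bit RSA key pair (file name and comment are placeholders)
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa_lsf -C "lsf-cluster-key"

# Print the public key so that you can copy it into the VPC SSH keys page
cat ~/.ssh/id_rsa_lsf.pub
```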
Add SSH key to the VPC infrastructure
- Log in to the IBM Cloud® console by using your unique credentials.
- From the dashboard, click Menu icon > VPC Infrastructure > SSH keys.
- Click Create.
- Enter the SSH key name (for example, `po-ibm-ssh-key`), select the default resource group, add tags, and select the region.
- Copy and paste the public key into the Public key field (the contents that you saved from `.ssh/id_rsa.pub`).
- Click Add SSH key.
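As an alternative to the console, you can upload the public key with the IBM Cloud CLI. This sketch assumes that the CLI with the `vpc-infrastructure` plugin is installed and that you are logged in and targeting the region and resource group where the cluster is deployed; the key name matches the earlier example.

```
# Upload the public key as a VPC SSH key named po-ibm-ssh-key
ibmcloud is key-create po-ibm-ssh-key @~/.ssh/id_rsa.pub

# Verify that the key is now listed
ibmcloud is keys
```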
Create API key
Complete the following steps to create your API key:
- In the IBM Cloud console, go to Manage > Access (IAM) > API keys.
- Click Create an IBM Cloud API key.
- Enter a name and description for your API key.
- Click Create.
- Click Show to display the API key, click Copy to copy and save it for later use, or click Download.
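You can also create the API key from the IBM Cloud CLI. This sketch assumes that you are already logged in; the key name, description, and output file are placeholders.

```
# Create an API key and save it to a local file (name, description, and file are placeholders)
ibmcloud iam api-key-create lsf-deploy-key -d "API key for the LSF cluster deployment" --file lsf-deploy-key.json

# The key value is inside the JSON file; store it securely
cat lsf-deploy-key.json
```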
Create and configure an HPC/LSF cluster from the IBM Cloud catalog
Complete the following steps to create and configure an HPC cluster from the IBM Cloud catalog:
- In the IBM Cloud catalog, search for HPC or Spectrum LSF, and then select IBM Spectrum LSF.
- In the Set the deployment values section, supply the required values: `api_key`, `ibm_customer_number`, `remote_allowed_ips`, `ssh_key_name`, and `zone`.
- After you accept the license agreement, you can keep the default values for the other parameters and click Install. With the default configuration, the HPC cluster is created within about 5 minutes.
Parameters for cluster deployment
See the following table for a list of parameters that you can configure for your HPC cluster:
| Parameter | Description |
|---|---|
| `cluster_prefix` | The prefix that is used to name the VPC resources that are provisioned to build the HPC cluster. Some resources must have names that are unique within the same cloud account and region, so make sure that the prefix is unique (for example, add your initials to the name: po-hpc-cluster). |
| `hyperthreading_enabled` | Set to true (default) to enable hyper-threading on the worker nodes of the cluster. Otherwise, hyper-threading is disabled. |
| `compute_image_name` | Name of the custom image that is used to create the virtual server instances in your IBM Cloud account for the IBM Spectrum LSF cluster worker nodes. If you created your own custom image, set this value to the name of your custom image. |
| `management_image_name` | Name of the custom image with LSF that is used for the management nodes. You can use the default LSF custom image that is provided by the solution. |
| `management_node_count` | Number of management nodes in the cluster (up to three). If you want the failover support that is provided by LSF, specify a value larger than one. In this case, when the primary management node is down, one of the candidate management nodes becomes the primary and your cluster remains functional without interruption. |
| `region` | The region where you want your cluster to be created. For a full list of regions, see Creating a VPC in a different region. |
| `resource_group` | The resource group name from your IBM Cloud account where the VPC resources are deployed. |
| `vpc_name` | Name of an existing VPC in which the cluster resources are provisioned. If no value is given, a new VPC is provisioned for the cluster. |
| `vpn_enabled` | Set to true to deploy a VPN gateway for VPC in the cluster. By default, the value is false. |
| `vpn_peer_address` | The peer public IP address to which the VPN is connected. |
| `vpn_peer_cidrs` | Comma-separated list of peer CIDRs (for example, 192.168.0.0/24) to which the VPN is connected. |
| `vpn_preshared_key` | The pre-shared key for the VPN. |
| `zone` | The zone within the selected region where the cluster is created. For a full list of zones within a region, see Get zones by using the CLI. |
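To find valid values for the `region` and `zone` parameters, you can list them with the IBM Cloud CLI. This sketch assumes that the CLI with the `vpc-infrastructure` plugin is installed; `us-south` is an example region.

```
# List all IBM Cloud regions
ibmcloud regions

# Target an example region, then list the zones in it
ibmcloud target -r us-south
ibmcloud is zones
```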
Parameters for auto scaling
You can set the following parameters for auto scaling:
- `worker_node_min_count`: The minimum number of worker nodes. These nodes are provisioned when the cluster is created and remain running regardless of the job demands in the cluster.
- `worker_node_max_count`: The maximum number of worker nodes in your HPC cluster, which limits how many machines can be added to the cluster. LSF auto scaling scales the cluster up to this number of nodes when your workloads need them, and scales back down to keep only `worker_node_min_count` workers when no jobs are in the queues.
Parameters for instance profiles
You can control the instance profile for each instance type through the `xxx_node_instance_type` parameters. The management nodes are where the main LSF daemons run; select a profile with more compute power if you plan to run jobs that use more than 100 nodes. The worker nodes are where the workloads run, so choose a profile according to the characteristics of your workloads. The storage node manages the NFS file system for your HPC cluster. The login node serves as a jump host, so you can pick the smallest profile. For more information, see Instance Profiles.
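To compare the available profiles before you choose one for each node type, you can list them with the IBM Cloud CLI (assuming the `vpc-infrastructure` plugin is installed and a region is targeted).

```
# List the instance profiles that are available in the targeted region
ibmcloud is instance-profiles
```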
Parameters for block volumes
You can configure the storage capacity and throughput by using the `volume_xxx` parameters. The value for the `volume_profile` parameter can be either `general-purpose` or `custom`. When `general-purpose` is used, IOPS is determined by the cloud infrastructure and the `volume_iops` parameter has no effect. If you want to customize the IOPS, set `volume_profile` to `custom` and set the IOPS through `volume_iops`, based on the capacity that is specified in `volume_capacity`. For more information, see Block storage profiles.
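To see which block storage profiles are available in your region, you can list them with the IBM Cloud CLI (assuming the `vpc-infrastructure` plugin is installed and a region is targeted).

```
# List the block storage volume profiles that are available in the targeted region
ibmcloud is volume-profiles
```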
Accessing the HPC cluster
To access your HPC cluster, complete the following steps:
- Go to Menu icon > Activity > Plan applied > View log.
- Copy the `ssh-command` value to access your cluster: `ssh -J root@ip-jumphost lsfadmin@ip-managementhost`
  - The `ip-jumphost` is public, while the `ip-managementhost` is not.
  - The `-J` flag connects to the jump host and establishes TCP forwarding to the ultimate destination (the management host).
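For convenience, you can store the jump-host hop in your SSH client configuration so that a single short command reaches the management host. This is an optional sketch; the host alias and the `ip-jumphost` and `ip-managementhost` addresses are placeholders that you replace with the values from the Schematics log.

```
# Append a host entry with a ProxyJump hop to your SSH client configuration
# (the alias and the IP addresses are placeholders)
cat >> ~/.ssh/config <<'EOF'
Host lsf-mgmt
    HostName ip-managementhost
    User lsfadmin
    ProxyJump root@ip-jumphost
EOF

# The management host is now reachable with a short command
ssh lsf-mgmt
```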
Auto scaling
The minimum number of worker nodes (`worker_node_min_count`) is the number of worker nodes that are provisioned when the cluster is created. The maximum number of worker nodes that can be added to the Spectrum LSF cluster is defined by `worker_node_max_count`, which limits how many machines the auto scaling configuration can add and helps you manage the cost that is associated with the Spectrum LSF cluster instance.
The following example uses `worker_node_min_count=2` and `worker_node_max_count=10`.
- To check the two worker nodes, run the following command: `bhosts -w`
  Example output:
- To try the auto scaling function, run a job that requires more than two nodes. For example, this job requires five nodes, each sleeping for 10 seconds: `bsub -n 5 -R "span[ptile=1]" sleep 10`
- The job is submitted.
- After a minute, check the nodes again by running the following command: `bhosts -w`
  You can see that your cluster now has five nodes:
- The extra nodes that were created by the auto scaling function are destroyed automatically after they are unused for 10 minutes.
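To watch the scale-up and scale-down as it happens, you can poll the LSF job and host status from the management host. This optional sketch uses standard LSF commands; the 30-second interval is arbitrary.

```
# Check the status of your submitted jobs
bjobs

# Poll the host list every 30 seconds to watch dynamic workers join and leave
watch -n 30 bhosts -w
```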
(Optional) Set up hybrid connectivity
If you want to set up a hybrid connectivity environment by using VPN, see the instructions in Installing a VPN to an HPC cluster.
If you want to use Direct Link, see the instructions in Installing Direct Link to an HPC cluster.
Using OpenLDAP with IBM Spectrum LSF
If you want to know more about OpenLDAP with IBM Spectrum LSF, see About OpenLDAP with IBM Spectrum LSF.
During deployment, you enable OpenLDAP with your IBM Spectrum LSF cluster by setting the `enable_ldap`, `ldap_basedns`, `ldap_server`, `ldap_admin_password`, `ldap_user_name`, and `ldap_user_password` deployment input values.
If you want to know more about integrating OpenLDAP with your IBM Spectrum LSF cluster, see Integrating OpenLDAP with your IBM Spectrum LSF cluster.
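After the deployment, you can check that the LDAP server answers queries from a cluster node. This is a generic OpenLDAP sketch rather than part of the solution itself; it assumes that the OpenLDAP client tools are installed on the node, and the server address, base DN, and user name are placeholders that correspond to your `ldap_server`, `ldap_basedns`, and `ldap_user_name` values.

```
# Query the LDAP server for the cluster user
# (server address, base DN, and user name are placeholders)
ldapsearch -x -H ldap://LDAP_SERVER_IP -b "dc=example,dc=com" "(uid=lsfuser)"
```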
Create DNS zones and DNS custom resolver
If you leave the `dns_instance_id` deployment input value as null, the deployment process creates a new DNS service instance in the respective DNS zone. Alternatively, provide an existing IBM Cloud® DNS Services instance ID as the `dns_instance_id` deployment input value.
If you leave the `dns_custom_resolver_id` deployment input value as null, the deployment process creates a new VPC and enables a new custom resolver for your cluster. Alternatively, to use DNS custom resolvers with an existing VPC, provide the resolver ID as the `dns_custom_resolver_id` deployment input value. For more information, see DNS custom resolvers for your IBM Spectrum LSF cluster.
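To look up an existing DNS Services instance ID for the `dns_instance_id` value, you can query your account resources with the IBM Cloud CLI. This is a sketch that assumes you are logged in; the instance name is a placeholder.

```
# List the DNS Services instances in the account
ibmcloud resource service-instances --service-name dns-svcs

# Show the details of one instance, including its GUID (name is a placeholder)
ibmcloud resource service-instance my-dns-instance
```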
Using IBM Key Protect instances to manage data encryption
To manage data encryption for your virtual server instances, use an IBM Key Protect instance with your IBM Spectrum LSF cluster. For more information about Key Protect and encryption keys, see IBM® Key Protect and encryption keys.
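To find an existing Key Protect instance and its encryption keys, you can use the IBM Cloud CLI. This sketch assumes that the `key-protect` plugin is installed and that you are logged in; the instance name and GUID are placeholders.

```
# Show the Key Protect instance details, including its GUID (name is a placeholder)
ibmcloud resource service-instance my-key-protect-instance

# List the encryption keys in that instance (replace GUID with the instance GUID)
ibmcloud kp keys --instance-id GUID
```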