Setting up an IBM Spectrum Symphony cluster
Architecture overview and IBM fileshare setup
The Spectrum Symphony cluster consists of a login node, 1 - 3 Symphony management nodes, and a number of Symphony worker nodes.
-
The login node serves as a jump host and is the only node that has a public IP address. The other nodes have only private IP addresses, and the only way to reach them is through the login node. You can log in to the primary Symphony management host and perform most operations from there. By default, lsfadmin is the only user ID created on the cluster. Passwordless SSH is configured between the Symphony management host and the workers, so you can reach any worker node from the primary Symphony management host with the lsfadmin user ID.
-
A worker node can be a static or a dynamic resource. The lifecycle of a static worker is managed by Schematics: you can request a number of static worker nodes, and these workers remain available in the Symphony cluster until a Schematics-destroy action is performed. Dynamic workers are created by the Symphony resource connector function when there is not enough capacity to run jobs, and they are destroyed when demand decreases. The lifecycle of these dynamic workers is managed by the Symphony resource connector. Wait until these dynamic resources are returned to the cloud before you destroy the entire VPC cluster through Schematics.
-
The IBM Cloud File Storage for VPC is used for file sharing. By default, there are three file share volumes; each is 100 GB. To change this configuration, set the custom_file_shares deployment value.
-
The Spectrum Symphony cluster solution provides a base custom image, which includes the Symphony installation. You can create your own custom image on top of the base image by using the image service on VPC. For more information, see Create custom image. You can then specify the custom image that you want to use in Schematics for Symphony management nodes and worker nodes. The image that is used by the login node and the storage node is not configurable at the moment (CentOS 7 by default).
Create SSH key
Complete the following steps to create your SSH key:
-
Generate an SSH key on your system by running the following command:
ssh-keygen -t rsa
-
Copy and save all the content from .ssh/id_rsa.pub.
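The key generation step above can also be run without interactive prompts. The following is a minimal sketch, not part of the original procedure: the helper name generate_key, the 4096-bit key size, and the empty passphrase are assumptions made for illustration. Use a passphrase if your security policy requires one.

```shell
# Sketch: create an RSA key pair without interactive prompts.
# generate_key is a hypothetical helper name used only in this example.
generate_key() {
    key_file="$1"
    mkdir -p "$(dirname "$key_file")"
    # Never overwrite an existing private key.
    if [ -f "$key_file" ]; then
        echo "Key already exists: $key_file" >&2
        return 0
    fi
    # -t rsa : key type, as in the step above
    # -b 4096: key size in bits (assumption; the original uses the default)
    # -N ""  : empty passphrase (assumption)
    # -f     : output file for the private key; the public key gets a .pub suffix
    # -q     : quiet mode
    ssh-keygen -t rsa -b 4096 -N "" -f "$key_file" -q
}

# Usage (uncomment to run; the file name is a placeholder):
# generate_key "$HOME/.ssh/id_rsa_symphony" && cat "$HOME/.ssh/id_rsa_symphony.pub"
```

The printed public key is the content that you paste into the Public key field in the next section.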
Add SSH key to the VPC infrastructure
- Log in to the IBM Cloud® console by using your unique credentials.
- From the dashboard, click the Menu icon > VPC Infrastructure > SSH keys.
- Click Create.
- Enter the SSH key name (for example, po-ibm-ssh-key), select the default resource group, add tags, and select the region.
- Copy and paste the public key into the Public key field (the contents that you saved from .ssh/id_rsa.pub).
- Click Add SSH key.
Create API key
Complete the following steps to create your API key:
- In the IBM Cloud console, go to Manage > Access (IAM) > API keys.
- Click Create an IBM Cloud API key.
- Enter a name and description for your API key.
- Click Create.
- Click Show to display the API key, click Copy to copy and save it for later, or click Download.
Create and configure an HPC/Symphony cluster from the IBM Cloud catalog
Complete the following steps to create and configure a Spectrum Symphony cluster from the IBM Cloud catalog:
-
In the IBM Cloud catalog, search for HPC or Spectrum Symphony, and then select IBM Spectrum Symphony.
-
In the Set the deployment values section, supply the required values: api_key, remote_allowed_ips, ssh_key_name, and zone.
-
After you accept the license agreement, you can use the default values for the other parameters and click Install. With the default configuration, the HPC cluster is created within 5 minutes.
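Before you start the installation, it can help to confirm that the four required deployment values are set. The following is a minimal sketch of such a pre-flight check. The function name check_required_values is hypothetical, and the check runs only in your local shell; it does not call IBM Cloud or Schematics.

```shell
# Sketch: verify that the required deployment values are non-empty.
# The shell variable names mirror the deployment values listed above;
# check_required_values is a hypothetical helper, not part of the offering.
check_required_values() {
    missing=""
    for name in api_key remote_allowed_ips ssh_key_name zone; do
        # Read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing required values:$missing" >&2
        return 1
    fi
    echo "All required values are set."
}
```

Set the four shell variables to the values that you plan to enter in the catalog form, then call check_required_values. A nonzero exit status means that at least one value is still empty.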
Parameters for cluster deployment
See the following table for a list of parameters that you can configure for your HPC cluster:
Parameter | Description |
---|---|
cluster_prefix | Prefix that is used to name the IBM Spectrum Symphony cluster and IBM Cloud resources that are provisioned to build the IBM Spectrum Symphony cluster instance. You cannot create more than one instance of the Symphony cluster with the same name. Make sure that the name is unique. |
custom_file_shares | Mount points and sizes in GB and IOPS range of file shares that can be used to customize shared file storage layout. Provide the details for up to 5 shares. Each file share size in GB supports different range of IOPS. For more information, see file share IOPS value. |
hyperthreading_enabled | Setting this to true will enable hyper-threading in the worker nodes of the cluster (default). Otherwise, hyper-threading will be disabled. |
image_name | Name of the custom image that you want to use to create virtual server instances in your IBM Cloud account to deploy the IBM Spectrum Symphony cluster. By default, the automation uses a base image with additional software packages mentioned here. If you would like to include your application-specific binary files, follow the instructions in Planning for custom images to create your own custom image and use that to build the IBM Spectrum Symphony cluster through this offering. |
management_node_count | Number of management nodes. This is the total number of primary, secondary and management nodes. There will be one Primary, one Secondary and the rest of the nodes will be management nodes. Enter a value in the range 1 - 10. |
region | IBM Cloud zone name within the selected region where the IBM Spectrum Symphony cluster should be deployed. Learn more. |
resource_group | Resource group name from your IBM Cloud account where the VPC resources should be deployed. Note: Do not modify the "Default" value if you would like to use the auto-scaling capability. For additional information on resource groups, see Managing resource groups. |
vpc_name | Name of an existing VPC in which the cluster resources will be deployed. If no value is given, then a new VPC will be provisioned for the cluster. Learn more. |
vpn_enabled | Set the value as true to deploy a VPN gateway for VPC in the cluster. |
vpn_peer_address | The peer public IP address to which the VPN will be connected. |
vpn_peer_cidrs | Comma separated list of peer CIDRs (e.g., 192.168.0.0/24) to which the VPN will be connected. |
vpn_preshared_key | The pre-shared key for the VPN. |
Parameters for auto scaling
You can set the following parameters for auto scaling:
-
worker_node_min_count: The minimum number of worker nodes that are provisioned at the time the cluster is created and remain running regardless of job demands in the cluster.
-
worker_node_max_count: The maximum number of worker nodes in your Spectrum Symphony cluster, which limits the number of machines that can be added to the HPC cluster. Symphony auto scaling scales the cluster up to this number of nodes when your workloads need them, and scales it back to only worker_node_min_count workers when no jobs are in the queues.
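The relationship between the two parameters can be checked before deployment. The following is a minimal sketch under the reading that worker_node_min_count workers always run and auto scaling can add workers up to worker_node_max_count. The function name validate_worker_counts and the sample values are hypothetical.

```shell
# Sketch: check that the minimum does not exceed the maximum and report
# how many dynamic workers auto scaling could add under that reading.
# validate_worker_counts is a hypothetical helper for illustration only.
validate_worker_counts() {
    min="$1"
    max="$2"
    # Both arguments must be non-negative integers.
    for n in "$min" "$max"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "Counts must be non-negative integers." >&2
                return 2
                ;;
        esac
    done
    if [ "$min" -gt "$max" ]; then
        echo "worker_node_min_count ($min) exceeds worker_node_max_count ($max)." >&2
        return 1
    fi
    echo "Up to $((max - min)) dynamic worker nodes can be added."
}

# Usage with sample values (assumptions, not defaults of the offering):
# validate_worker_counts 2 10
```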
Parameters for instance profiles
You can control the instance profile for each instance type through the xxx_node_instance_type parameters. The management nodes are where the main Symphony daemons run. Select profiles with more compute power for them if you plan to run jobs that use 100+ nodes. The worker nodes are where workload execution takes place, so choose their profile according to the characteristics of your workloads. The login instance serves as a jump host, so you can pick the smallest profile. For more information, see Instance Profiles.
Accessing the Spectrum Symphony cluster
To access your Spectrum Symphony cluster, complete the following steps:
-
Go to the Menu icon > Activity > Plan applied > View log.
-
Copy ssh-command to access your cluster.
ssh -J root@ip-jumphost root@ip-managementhost
-
The ip-jumphost is public, while the ip-managementhost is not.
-
-J flag: Connects to the jump host and establishes TCP forwarding to the ultimate destination (management host).
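If you connect often, the jump-host command can be assembled from the two addresses or expressed as an OpenSSH client configuration. The following is a minimal sketch: the function names and the host aliases symphony-jump and symphony-mgmt are hypothetical, and the addresses come from the ssh-command value in your own Schematics log. ProxyJump is the ssh_config equivalent of the -J flag.

```shell
# Sketch: build the jump-host SSH command from the two addresses.
# build_ssh_command and print_ssh_config are hypothetical helpers.
build_ssh_command() {
    jump_host="$1"        # public IP address of the login node
    management_host="$2"  # private IP address of the management host
    printf 'ssh -J root@%s root@%s\n' "$jump_host" "$management_host"
}

# Print an equivalent ~/.ssh/config fragment. The aliases symphony-jump
# and symphony-mgmt are placeholders chosen for this example.
print_ssh_config() {
    cat <<EOF
Host symphony-jump
    HostName $1
    User root

Host symphony-mgmt
    HostName $2
    User root
    ProxyJump symphony-jump
EOF
}

# Usage (replace the placeholders with the addresses from your log):
# build_ssh_command <ip-jumphost> <ip-managementhost>
# print_ssh_config <ip-jumphost> <ip-managementhost> >> ~/.ssh/config
```

With the configuration fragment in place, running ssh symphony-mgmt connects through the login node in the same way as the -J form.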
Auto scaling
The minimum number of worker nodes (worker_node_min_count) is the number of worker nodes that are provisioned at the time the cluster is created. The maximum number of worker nodes that should be added to the Spectrum Symphony cluster is defined by worker_node_max_count. This value limits the number of machines that can be added to the Spectrum Symphony cluster when the auto scaling configuration is used, so you can use it to manage the cost associated with the IBM Spectrum Symphony cluster instance.
The following example shows worker_node_min_count=2 and worker_node_max_count=2.
-
To check the two static worker nodes, run the following command:
egosh user logon -u Admin -x Admin
egosh resource list -ll
Example output: two worker nodes are listed.
-
To try the auto scaling function, run a job that requires more than two nodes. For example, the following command runs symping with -m 10:
symping -m 10
-
The job is submitted.
-
After a minute, check the nodes by running the following command:
egosh resource list -ll
You can see that one node is added to your cluster.
-
Nodes that are created by the auto scaling function are destroyed automatically after 1 hour of not being used.
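Instead of checking by hand after a minute, you can repeat a check at a fixed interval until it succeeds. The following is a generic polling sketch: poll_until is a hypothetical helper that retries any command that you pass to it, and the check itself (for example, a script that inspects the output of egosh resource list -ll) is left to you because the output format is not shown here.

```shell
# Sketch: retry a command until it succeeds or the attempts run out.
# poll_until is a hypothetical helper; it knows nothing about egosh.
# Arguments: <attempts> <interval-seconds> <command> [args...]
poll_until() {
    attempts="$1"
    interval="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Run the supplied command; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only if another attempt follows.
        if [ "$i" -le "$attempts" ]; then
            sleep "$interval"
        fi
    done
    return 1
}

# Usage (check_new_worker is a placeholder for your own check script):
# poll_until 10 60 ./check_new_worker
```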
Set up hybrid connectivity (Optional)
If you want to set up a hybrid connectivity environment by using VPN, see the instructions in Installing a VPN to an HPC cluster.
Create DNS zones and DNS custom resolver
Use the variable vpc_worker_dns_domain to specify the DNS domain name for the worker cluster. If this variable is left empty, a DNS zone will be created with the default domain name (for example, dnsscale.com). For the Scale Storage cluster, the IBM Cloud DNS Services domain variable to be used is vpc_scale_storage_dns_domain.
When spectrum_scale_enabled is set to true, the values of vpc_worker_dns_domain and vpc_scale_storage_dns_domain must not be the same.
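The constraint on the two domain values can be checked locally before you apply the plan. The following is a minimal sketch: validate_dns_domains is a hypothetical helper, and the case-insensitive comparison is an assumption based on DNS names not being case-sensitive.

```shell
# Sketch: reject identical worker and storage DNS domains when
# Spectrum Scale is enabled. validate_dns_domains is a hypothetical helper.
# Arguments: <spectrum_scale_enabled> <vpc_worker_dns_domain> <vpc_scale_storage_dns_domain>
validate_dns_domains() {
    scale_enabled="$1"
    # Lowercase both names so that the comparison ignores case (assumption).
    worker_domain=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    storage_domain=$(printf '%s' "$3" | tr '[:upper:]' '[:lower:]')
    if [ "$scale_enabled" = "true" ] && [ "$worker_domain" = "$storage_domain" ]; then
        echo "vpc_worker_dns_domain and vpc_scale_storage_dns_domain must differ." >&2
        return 1
    fi
    return 0
}

# Usage (the domain names are placeholders):
# validate_dns_domains true worker.example.com storage.example.com
```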