IBM Spectrum LSF

IBM® Spectrum LSF enables High-Performance Computing (HPC) clusters by using LSF as the HPC scheduling software. This solution employs a deployable architecture to provision and configure IBM Cloud resources. It supports public virtual machines or virtual machines on dedicated hosts for static compute nodes. However, management nodes and dynamic compute nodes are exclusively deployed by using public virtual machines.
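
The placement rule above can be sketched as a small lookup. This is only an illustration of the rule as stated in this section: the role names, hosting labels, and the `NodePlan` and `is_supported` helpers are hypothetical and do not correspond to any IBM Cloud API or to input variables of the deployable architecture.

```python
from dataclasses import dataclass

# Hypothetical labels for illustration only; not IBM Cloud API values.
PUBLIC_VM = "public_vm"
DEDICATED_HOST_VM = "dedicated_host_vm"

# Hosting options per node role, as described in the paragraph above.
ALLOWED_HOSTING = {
    "management": {PUBLIC_VM},
    "dynamic_compute": {PUBLIC_VM},
    "static_compute": {PUBLIC_VM, DEDICATED_HOST_VM},
}


@dataclass(frozen=True)
class NodePlan:
    role: str      # one of the keys in ALLOWED_HOSTING
    hosting: str   # PUBLIC_VM or DEDICATED_HOST_VM


def is_supported(plan: NodePlan) -> bool:
    """Return True if the role may be deployed on the requested hosting type."""
    return plan.hosting in ALLOWED_HOSTING.get(plan.role, set())
```

For example, `is_supported(NodePlan("static_compute", DEDICATED_HOST_VM))` is true, while the same hosting type for a `"management"` node is rejected.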

Architecture diagram

Figure: IBM Spectrum LSF architecture diagram

Design concepts

The architecture framework design covers design considerations and architecture decisions for the following aspects and domains:

  • Data: Data storage
  • Compute: Virtual servers
  • Storage: Primary storage
  • Networking: Isolation and domain name service
  • Security: Data security
  • Service management: Logging and automated deployment

Figure: Architecture design scope (design requirements)

Requirements

The following table outlines the requirements that are addressed in this architecture.

Aspect | Requirements
Data | Provide a location to store IBM Spectrum LSF configuration and data.
Compute | Provide properly isolated compute resources with adequate compute capacity for the applications.
Storage | Provide storage that meets the application and database performance requirements.
Networking | Deploy workloads in an isolated environment and enforce information flow policies.
Networking | Distribute incoming application requests across available compute resources.
Networking | Support failover of the application within the cluster in the event of a planned or unplanned node outage.
Networking | Provide private DNS resolution to support the use of hostnames instead of IP addresses.
Security | Ensure that all operator actions are run securely through a bastion host.
Security | Provide users with the ability to use encryption keys so that all data meets regulatory compliance requirements, for added security and user control.
Security | Protect secrets through their entire lifecycle and secure them by using access control measures.
Service Management | Monitor system and application health metrics and logs to detect issues that might impact the availability of the application.
Service Management | Monitor audit logs to track changes and detect potential security problems.
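
One common way to enforce an information flow policy of this kind is an allow-list of source CIDR ranges. The following is a minimal sketch of that check that uses only the Python standard library. The function name and the example addresses (taken from the reserved documentation ranges) are assumptions for illustration; in a real deployment the rule is evaluated by the VPC security groups, not by application code.

```python
import ipaddress


def is_source_allowed(source_ip: str, allowed_cidrs: list[str]) -> bool:
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(source_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed-version lists are handled without extra checks.
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Hypothetical allow-list: one operator network and one single host.
ALLOWED = ["203.0.113.0/24", "198.51.100.7/32"]
```

With this allow-list, `is_source_allowed("203.0.113.25", ALLOWED)` is true and `is_source_allowed("192.0.2.1", ALLOWED)` is false.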

Components

Aspect | Requirement | Architecture component | How the component is used
Data and Storage | Create file shares. | IBM Cloud File Storage for VPC or, optionally, IBM® Storage Scale | Creates file shares for configuring user file data sharing.
Compute | Provide infrastructure and administration access. | HPC VPC service | Provides a VPC service so that you can log in and submit an HPC job.
Compute | Create virtual server instances to support bastion. | Bastion node | Creates a VPC virtual server instance for the bastion, a special-purpose server that is used to manage access to a private network from an external network, typically the internet.
Compute | Create virtual server instances to support management. | Login node | Creates a VPC virtual server instance so that you can log in and submit HPC jobs.
Compute | Create virtual server instances that run LSF as a distributed batch HPC application for HPC workloads (jobs). | LSF management node | Creates a VPC virtual server instance that runs LSF as a distributed batch HPC application for HPC workloads.
Networking | Isolate bastion, login, and LSF management nodes; limit the number of connections to the bastion node; restrict management subnet access to the bastion and the user's host or CIDR. | Security group rules for each subnet | Alternatively, more CIDR ranges or ports can be added manually after deployment.
Networking | Enable a floating IP on the bastion node for user access. | Floating IP on the bastion node | Allows user access to the HPC VPC.
Networking | Enable a public gateway for the HPC management subnet. | Public gateway for management subnet | Allows outbound communication for the LSF management node for any internet access (for example, repositories and packages).
Networking | Provide a DNS service for the HPC compute nodes. | DNS service | Provides IP address and name resolution for the HPC compute nodes.
Networking | (Optional) Load VPN configuration to simplify VPN setup. | VPN | VPN configuration is the responsibility of the user.
Security | Create virtual server instances to support management. | Bastion node | Configures security group rules to allow access to IBM Cloud services.
Security | Create virtual server instances to support management. | Login node | Configures security group rules to allow access to IBM Cloud services.
Security | Create virtual server instances that run LSF as a distributed batch HPC application for HPC workloads (jobs). | LSF management node | Configures security group rules to allow access to IBM Cloud services.
Security | (Optional) Provide users with the ability to use encryption keys so that all data meets regulatory compliance requirements, for added security and user control. | IBM® Key Protect for IBM Cloud® | Provides the ability to use encryption keys so that all data meets regulatory compliance requirements, for added security and user control.
Security | (Optional) Protect secrets through their entire lifecycle and secure them by using access control measures. | IBM Cloud® Secrets Manager | Protects secrets through their entire lifecycle and secures them by using access control measures.
Service Management | Schedule and run distributed batch HPC applications. | IBM Spectrum LSF | An IBM Spectrum LSF cluster includes management nodes, which run the LSF internal management components and are highly available, and dynamic compute nodes, which are the computational hosts where LSF runs the HPC workload and which can be placed in a single zone. IBM Spectrum LSF software is industry-leading and enterprise-class. LSF provides a resource management framework that takes your job requirements, dynamically requests the best resources from the cloud to run the job, monitors its progress, and releases the resources after the workload completes.
Service Management | (Optional) Monitor system and application health metrics and logs to detect issues that might impact the availability of the application. | IBM Cloud Monitoring | Monitors system and application health to detect issues that might impact the availability of the application.
Service Management | (Optional) Monitor audit logs to track changes and detect potential security problems. | IBM Cloud® Activity Tracker Event Routing | Monitors audit logs to track changes and detect potential security problems.
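
The request, run, and release cycle that the components table describes for dynamic compute nodes can be illustrated with a toy model. This sketch is not how LSF is implemented and does not call any LSF or IBM Cloud API. `ToyCloud`, `JobRequest`, `dynamic_hosts`, and `run_job` are hypothetical names, and capacity is modeled as a simple core count.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class JobRequest:
    name: str
    cores: int


class ToyCloud:
    """Hypothetical stand-in for a cloud provider; tracks leased cores only."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.leased = 0

    def request(self, cores: int) -> None:
        if self.leased + cores > self.capacity:
            raise RuntimeError("not enough capacity")
        self.leased += cores

    def release(self, cores: int) -> None:
        self.leased -= cores


@contextmanager
def dynamic_hosts(cloud: ToyCloud, cores: int):
    """Lease cores for the duration of a job and always give them back."""
    cloud.request(cores)
    try:
        yield
    finally:
        cloud.release(cores)


def run_job(cloud: ToyCloud, job: JobRequest, work):
    """Request resources, run the workload, record progress, then release."""
    events = [f"request {job.cores} cores for {job.name}"]
    with dynamic_hosts(cloud, job.cores):
        events.append("running")
        result = work()
        events.append("finished")
    events.append("released")
    return result, events
```

The context manager guarantees that the leased cores are returned even if the workload raises an exception, which mirrors the idea of releasing resources once the workload ends.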