IBM Cloud Docs
Setting up multi-cluster and job forwarding using Spectrum LSF

This example shows how to set up multi-cluster job forwarding with Spectrum LSF. It covers a common scenario in which one cluster runs on-premises and another runs in the cloud.

This example assumes that the on-premises cluster, named "OnPremiseCluster", uses the subnet 192.168.0.0/24 and that its management host is 192.168.0.4 (on-premise-management). The cloud cluster, named "HPCCluster", uses the subnet 10.244.128.0/24, and its management host is 10.244.128.37 (icgen2host-10-244-128-37). Both clusters keep their configuration in /opt/ibm/lsf/conf; adjust the path if your cluster uses a different directory.

  1. Make sure that the host names of the LSF management hosts are resolvable from both clusters. The following is an example of the /etc/hosts file for the cloud cluster, with the on-premises management host added:

    ...
    10.244.128.61 icgen2host-10-244-128-61
    10.244.128.62 icgen2host-10-244-128-62
    10.244.128.63 icgen2host-10-244-128-63
    
    192.168.0.4 on-premise-management   # added
    

    For the on-premises /etc/hosts file, make sure to add the information about the management host in the cloud cluster:

    10.244.128.37 icgen2host-10-244-128-37 #added
    
  2. Both clusters need to recognize each other, so list both of them in the Cluster section of /opt/ibm/lsf/conf/lsf.shared. This configuration file should be identical in both clusters.

    ...
    Begin Cluster
    ClusterName        Servers                       # Keyword             # modified
    HPCCluster         (icgen2host-10-244-128-37)    # modified
    OnPremiseCluster   (on-premise-management)       # modified
    End Cluster
    ...
    
  3. The two clusters use different lsb.queues files. On the cloud cluster, append the following lines to /opt/ibm/lsf/conf/lsbatch/HPCCluster/configdir/lsb.queues to register a receive queue:

    ...
    Begin Queue
    QUEUE_NAME=recv_q
    RCVJOBS_FROM=OnPremiseCluster
    PRIORITY=30
    NICE=20
    RC_HOSTS=all
    End Queue
    

    On the on-premises cluster, define a send queue in /opt/ibm/lsf/conf/lsbatch/OnPremiseCluster/configdir/lsb.queues:

    ...
    Begin Queue
    QUEUE_NAME=send_q
    SNDJOBS_TO=recv_q@HPCCluster
    PRIORITY=30
    NICE=20
    End Queue
    
  4. Restart both clusters by running the following command:

    $ lsfrestart
    
  5. After you restart both clusters, you can forward jobs from the on-premises cluster to the cloud. On the on-premises cluster, submit the following test job:

    $ bsub -q send_q sh -c 'echo $HOSTNAME > /home/lsfadmin/shared/mc-test.txt'
    

    On the HPCCluster management host (10.244.128.37), you can see that the forwarded job arrived:

    $ bjobs -aw
    
    JOBID   USER      STAT   QUEUE    FROM_HOST                                    EXEC_HOST                  JOB_NAME                                                      SUBMIT_TIME
    304     lsfadmin  DONE   recv_q   on-premise-management@OnPremiseCluster:911   icgen2host-10-244-128-39   sh -c 'echo $HOSTNAME > /home/lsfadmin/shared/mc-test.txt'    Jun 17 02:27
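
Step 1 depends on each cluster being able to map the other cluster's management host name to an IP address. The following is a minimal sketch for checking that, not part of the original procedure. `lookup_host` is a hypothetical helper (not an LSF command) that reads a hosts-format file, ignores `#` comments, and prints the IP address listed for a host name. The file path is a parameter, so the check can also be run against a copy of the file.

```shell
#!/bin/sh
# Hypothetical helper (not an LSF command): print the IP address that a
# hosts-format file maps to a given host name, ignoring '#' comments.
# Returns non-zero if the name is not listed in the file.
lookup_host() {
    hosts_file=$1
    name=$2
    awk -v name="$name" '
        { sub(/#.*/, "") }                 # drop trailing comments
        NF >= 2 {
            # field 1 is the IP address, the remaining fields are host names
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; found = 1; exit }
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"
}
```

With the example values from this page, `lookup_host /etc/hosts on-premise-management` on the cloud cluster would be expected to print 192.168.0.4, and `lookup_host /etc/hosts icgen2host-10-244-128-37` on the on-premises cluster would be expected to print 10.244.128.37. The helper only inspects the file; it does not query DNS or any other name service.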
    

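Before restarting the clusters in step 4, it can be useful to confirm that the queue definitions from step 3 are in place. The sketch below makes some assumptions: `forwarding_queues` is a hypothetical helper (not an LSF command) that scans an lsb.queues file for `Begin Queue` ... `End Queue` blocks and prints each queue that sets `SNDJOBS_TO` or `RCVJOBS_FROM`. It handles simple `KEY=VALUE` lines only and does not follow backslash line continuations.

```shell
#!/bin/sh
# Hypothetical helper (not an LSF command): list the queues in an lsb.queues
# file that define SNDJOBS_TO or RCVJOBS_FROM.
# Output: one "<queue name> <parameter> <value>" line per matching queue.
forwarding_queues() {
    awk '
        { sub(/#.*/, "") }                               # drop comments
        $1 == "Begin" && $2 == "Queue" { inq = 1; qname = ""; fwd = ""; next }
        $1 == "End" && $2 == "Queue" {
            if (inq && fwd != "") print qname, fwd
            inq = 0; next
        }
        inq {
            eq = index($0, "=")                          # split at the first "="
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)             # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == "QUEUE_NAME") qname = val
            if (key == "SNDJOBS_TO" || key == "RCVJOBS_FROM") fwd = key " " val
        }
    ' "$1"
}
```

Run against the on-premises file from step 3, for example `forwarding_queues /opt/ibm/lsf/conf/lsbatch/OnPremiseCluster/configdir/lsb.queues`, the output should include a line for `send_q` with `SNDJOBS_TO recv_q@HPCCluster`. On the cloud cluster, the corresponding file should yield a line for `recv_q` with `RCVJOBS_FROM OnPremiseCluster`.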
Additional resources

For more information, check out the following IBM Spectrum LSF documentation: