IBM Spectrum LSF Suites

IBM Spectrum LSF Suites enables fast, high-throughput processing of batch workloads for mission-critical computation, analysis, and simulation applications. With IBM Spectrum LSF Suites, batch workloads in HPC environments are executed with intelligent scheduling. Across industries and disciplines, IBM Spectrum LSF Suites maximizes the utilization of existing hardware resources. In addition to traditional high-performance computing and high-throughput workloads, it also supports big data, cognitive, GPU machine learning, and containerized workloads.

Status

Name: IBM Spectrum LSF Suites
Latest version: Version 10.1 Fix Pack 6
Release notes:
https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_release_notes/lsf_relnotes_whatsnew10.1.0.6.html

The following topics summarize the new and changed behavior in LSF 10.1 Fix Pack 6.
Release date: June 2018

•GPU enhancements
・GPU autoconfiguration
GPU detection for LSF can now be enabled through automatic configuration. To enable automatic GPU configuration, configure LSF_GPU_AUTOCONFIG=Y in the lsf.conf file.
When enabled, the lsload -gpu, lsload -gpuload, and lshosts -gpu commands will show host-based or GPU-based resource metrics for monitoring.
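For example, a minimal sketch of the configuration and the monitoring commands (the output depends on the GPU hosts in your cluster):

# lsf.conf: let LSF detect and configure GPU resources automatically
LSF_GPU_AUTOCONFIG=Y

# After reconfiguration, monitor host-based and GPU-based metrics
lsload -gpu
lsload -gpuload
lshosts -gpu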
・Specify additional GPU resource requirements
LSF now allows you to request additional GPU resource requirements to allow you to further refine the GPU resources that are allocated to your jobs. The existing bsub -gpu command option, LSB_GPU_REQ parameter in the lsf.conf file, and the GPU_REQ parameter in the lsb.queues and lsb.applications files now have additional GPU options to make the following requests:
The gmodel option requests GPUs with a specific brand name, model number, or total GPU memory.
The gtile option specifies the number of GPUs to use per socket.
The gmem option reserves the specified amount of memory on each GPU that the job requires.
The nvlink option requests GPUs with NVLink connections.
You can also use these options in the bsub -R command option or RES_REQ parameter in the lsb.queues and lsb.applications files for complex GPU resource requirements, such as for compound or alternative resource requirements. Use the gtile option in the span[] string and the other options (gmodel, gmem, and nvlink) in the rusage[] string as constraints on the ngpus_physical resource.
To specify these new GPU options, specify LSB_GPU_NEW_SYNTAX=extend in the lsf.conf file.
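For example, a sketch of the extended syntax (the GPU model name, memory amount, and job commands are placeholders):

# lsf.conf: enable the extended GPU options
LSB_GPU_NEW_SYNTAX=extend

# bsub -gpu form: two GPUs of a specific model, connected with NVLink
bsub -gpu "num=2:gmodel=TeslaV100:nvlink=yes" ./gpu_app

# bsub -R form: gtile in the span[] string, gmodel and gmem as constraints
# on the ngpus_physical resource in the rusage[] string
bsub -R "rusage[ngpus_physical=4:gmodel=TeslaV100:gmem=16G] span[gtile=2]" ./gpu_app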
•Data collection
・IBM Spectrum Scale disk I/O accounting using Elasticsearch
LSF now uses IBM Spectrum LSF Explorer (LSF Explorer) to collect IBM Spectrum Scale disk I/O accounting data which, when combined with LSF job information, allows LSF to provide job-level IBM Spectrum Scale I/O statistics. To use this feature, LSF Explorer must be deployed in your LSF cluster, and LSF must be using IBM Spectrum Scale as the file system. To enable IBM Spectrum Scale disk I/O accounting, configure LSF_QUERY_ES_FUNCTIONS="gpfsio" (or LSF_QUERY_ES_FUNCTIONS="all") and LSF_QUERY_ES_SERVERS="ip:port" in the lsf.conf file.
Use the following commands to display IBM Spectrum Scale disk I/O accounting information:
bacct -l displays the total number of read/write bytes of all storage pools on IBM Spectrum Scale.
bjobs -l displays the accumulated job disk usage (I/O) data on IBM Spectrum Scale.
bjobs -o "gpfsio" displays the job-level disk usage (I/O) data on IBM Spectrum Scale.
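A minimal configuration sketch (the LSF Explorer address and the job ID are placeholders):

# lsf.conf: query IBM Spectrum Scale I/O accounting data through LSF Explorer
LSF_QUERY_ES_FUNCTIONS="gpfsio"
LSF_QUERY_ES_SERVERS="10.1.1.10:9200"

# Display the collected disk I/O data for job 1234
bacct -l 1234
bjobs -l 1234
bjobs -o "gpfsio" 1234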
•Resource Connector enhancements
・LSF resource connector auditing

With this release, LSF will log resource connector VM events along with usage information into a new file rc.audit.x (one log entry per line in JSON format). The purpose of the rc.audit.x log file is to provide evidence to support auditing and usage accounting as supplementary data to third party cloud provider logs. The information is readable by the end user as text and is hash protected for security.
LSF also provides a new command-line tool, rclogsvalidate, to validate the logs described above. If the audit file is tampered with, the tool identifies the line that was modified and is incorrect.
New parameters have been added to LSF in the lsf.conf configuration file:
-LSF_RC_AUDIT_LOG: If set to Y, enables the resource connector auditor to generate log files.
-RC_MAX_AUDIT_LOG_SIZE: An integer to determine the maximum size of the rc.audit.x log file, in MB.
-RC_MAX_AUDIT_LOG_KEEP_TIME: An integer that specifies the amount of time that the resource connector audit logs are kept, in months.
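For example, a sketch of the new lsf.conf settings (the size and retention values are illustrative):

# lsf.conf: enable resource connector auditing
LSF_RC_AUDIT_LOG=Y
# Rotate rc.audit.x when the file reaches 100 MB
RC_MAX_AUDIT_LOG_SIZE=100
# Keep rotated audit logs for 12 months
RC_MAX_AUDIT_LOG_KEEP_TIME=12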
・Resource connector template prioritizing
In 10.1 Fix Pack 6, the resource connector can prioritize templates.
The ability to set priorities is now provided in the Resource Connector template. LSF uses higher priority templates first (for example, less expensive templates should be assigned higher priorities).
By default, LSF sorts candidate template hosts by template name. However, an administrator might want to sort them by priority so that LSF favors one template over another. The "Priority" attribute has been added:

{
      "Name": "T2",
      "MaxNumber": "2",
      "Attributes":
      {
        "type": ["String", "X86_64"],
        "ncpus": ["Numeric", "1"],
        "mem": ["Numeric", "512"],
        "template": ["String", "T2"],
        "ostkhost": ["Boolean", "1"]
      },
      "Image": "LSF10.1.0.3_OSTK_SLAVE_VM",
      "Flavor": "t2.nano",
      "UserData": "template=T2",
      "Priority": "10"
    }

Note
The example above is for an OpenStack template. Other templates might not contain all attributes.
The default value of Priority is "0", which means the lowest priority. If template hosts have the same priority, LSF sorts them by template name.

・Support for a dedicated instance of AWS
One new parameter is added to the Resource Connector template to support a dedicated instance of AWS.
If you do not have a placement group in your AWS account, you must at least insert a placement group with a blank name inside quotation marks, because this is required to specify the tenancy. If you have a placement group, specify the placement group name inside the quotation marks. For example, "placementGroupName": "", or "placementGroupName": "hostgroupA",.
The values for tenancy can be "default", "dedicated", and "host". However, LSF currently supports only "default" and "dedicated".
The above applies to both on-demand and spot instances of AWS.
A full example of the template file is as follows:

{
    "templates": [
         {
            "templateId": "aws-vm-0",
            "maxNumber": 5,
            "attributes": {
                "type": ["String", "X86_64"],
                "ncores": ["Numeric", "1"],
                "ncpus": ["Numeric", "1"],
                "mem": ["Numeric", "512"],
                "awshost": ["Boolean", "1"],
                "zone": ["String", "us_west_2d"]               
            },
            "imageId": "ami-0db70175",
            "subnetId": "subnet-cc0248ba",
            "vmType": "c4.xlarge",
            "keyName": "martin",
            "securityGroupIds": ["sg-b35182ca"],
            "instanceTags": "Name=aws-vm-0",
            "ebsOptimized" : false,
            "placementGroupName": "",
            "tenancy": "dedicated",
            "userData": "zone=us_west_2d"        }
}

・HTTP proxy server capability for LSF Resource connector
This feature is useful for customers with strict security requirements. It allows for the use of an HTTP proxy server for endpoint access.
Note
For this release, this feature is enabled only for AWS.

This feature introduces the parameter "scriptOption" for the provider. For example:

{
    "providers":[
        {
            "name": "aws1",
            "type": "awsProv",
            "confPath": "resource_connector/aws",
            "scriptPath": "resource_connector/aws",
           "scriptOption": "-Dhttps.proxyHost=10.115.206.146 -Dhttps.proxyPort=8888"
        }
    ]
}

The value of scriptOption can be any string and is not verified by LSF.
LSF sets the environment variable SCRIPT_OPTIONS when launching the scripts. For AWS plugins, the information is passed to java through syntax like the following:
java $SCRIPT_OPTIONS -Daws-home-dir=$homeDir -jar $homeDir/lib/AwsTool.jar --getAvailableMachines $homeDir $inJson

・Create EBS-Optimized instances
Creating instances with EBS-Optimized enabled is introduced in this release to achieve better performance with cloud storage.

The EBS-Optimized attribute has been added to the Resource Connector template. The AWS provider plugin passes the information to AWS when creating the instance. Only high-end instance types support this attribute. The Resource Connector provider plugin will not check if the instance type is supported.
The "ebsOptimized" field in the Resource Connector template is a boolean value (either true or false). The default value is false. Specify the appropriate vmType that supports ebs_optimized (consult AWS documentation).

{
    "templates": [
        {
            "templateId": "Template-VM-1",
            "maxNumber": 4,
            "attributes": {
                "type": ["String", "X86_64"],
                "ncores": ["Numeric", "1"],
                "ncpus": ["Numeric", "1"],
                "mem": ["Numeric", "1024"],
                "awshost1": ["Boolean", "1"]
            },
            "imageId": "ami-40a8cb20",
           "vmType": "m4.large",
            "subnetId": "subnet-cc0248ba",
            "keyName": "martin",
            "securityGroupIds": ["sg-b35182ca"],
            "instanceTags" : "group=project1",
            "ebsOptimized" : true,
            "userData": "zone=us_west_2a"
        }
    ]
}

・Resource connector policy enhancements
Enhancements have been made for the administration of Resource Connector policies:
A cluster-wide parameter RC_MAX_REQUESTS has been introduced in the lsb.params file to control the maximum number of new instances that can be requested.
After accounting for usable hosts allocated in previous sessions, LSF generates the total demand requirement. An internal policy entry is created as shown below:

{
      "Name": "__RC_MAX_REQUESTS",
      "Consumer": 
       {
        "rcAccount": ["all"],
        "templateName": ["all"],
        "provider": ["all"] 
       },      
      "StepValue": "$val:0"   
    }

The LSB_RC_UPDATE_INTERVAL parameter controls how frequently LSF starts demand evaluation. Combined with the new parameter, it acts as a cluster-wide "step" that controls how fast the cluster grows.
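A configuration sketch (the values are illustrative, and LSB_RC_UPDATE_INTERVAL is assumed to be set in the lsf.conf file):

# lsb.params: allow at most 50 new instance requests per demand evaluation
RC_MAX_REQUESTS=50

# lsf.conf: start demand evaluation every 30 seconds
LSB_RC_UPDATE_INTERVAL=30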

•Resource management
・Running LSF jobs with IBM Cluster Systems Manager
LSF now allows you to run jobs with IBM Cluster Systems Manager (CSM).
The CSM integration allows you to run LSF jobs with CSM features.

・Direct data staging
LSF now allows you to run direct data staging jobs, which use a burst buffer (for example, the IBM CAST burst buffer) instead of the cache to stage in and stage out data for jobs.
Use the CSM integration to configure LSF to run burst buffer data staging jobs.

•Job scheduling and execution
・Plan-based scheduling and reservations
When enabled, LSF's plan-based scheduling makes allocation plans for jobs based on anticipated future cluster states. LSF reserves resources as needed in order to carry out its plan. This helps to avoid starvation of jobs with special resource requirements.

Plan-based scheduling and reservations address a number of issues with the older reservation features in LSF. For example:
It ensures that reserved resources can really be used by the reserving jobs.
It provides better job start-time prediction for reserving jobs, and thus better backfill decisions.
Plan-based scheduling aims to replace the legacy LSF reservation policies. When ALLOCATION_PLANNER is enabled in the lsb.params configuration file, parameters related to the old reservation features (that is, SLOT_RESERVE and RESOURCE_RESERVE in the lsb.queues file) are ignored with a warning.
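A sketch of enabling plan-based scheduling (assuming the parameter takes a Y/N value):

# lsb.params: turn on the allocation planner
Begin Parameters
ALLOCATION_PLANNER = Y
End Parameters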

・Automatically extend job run limits
You can now configure the LSF allocation planner to extend the run limit for jobs when the resources that are occupied by the job are not needed by other jobs in queues with the same or higher priority. The allocation planner looks at job plans to determine if there are any other jobs that require the current job's resources.

Enable extendable run limits for jobs submitted to a queue by specifying the EXTENDABLE_RUNLIMIT parameter in the lsb.queues file. Since the allocation planner decides whether to extend the run limit of jobs, you must also enable plan-based scheduling by enabling the ALLOCATION_PLANNER parameter in the lsb.params file.
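A sketch, assuming the keyword form of EXTENDABLE_RUNLIMIT (the queue name and the time values, in minutes, are placeholders):

# lsb.params: plan-based scheduling is required
Begin Parameters
ALLOCATION_PLANNER = Y
End Parameters

# lsb.queues: let the planner extend the run limit while no other job needs the resources
Begin Queue
QUEUE_NAME          = normal
RUNLIMIT            = 60
EXTENDABLE_RUNLIMIT = BASE[60] INCREMENT[30] GRACE[10] REQUEUE[N]
End Queue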

・Default epsub executable files
Similar to esub programs, LSF now allows you to define a default epsub program that runs even if you do not define mandatory epsub programs with the LSB_ESUB_METHOD parameter in the lsf.conf file. To define a default epsub program, create an executable file named epsub (with no application name in the file name) in the LSF_SERVERDIR directory.

After the job is submitted, LSF runs the default epsub executable file if it exists in the LSF_SERVERDIR directory, followed by any mandatory epsub executable files that are defined by LSB_ESUB_METHOD, followed by the epsub executable files that are specified by the -a option.
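For example, a minimal default epsub sketch (the log path is a placeholder, and the LSB_SUB_* variables shown are assumed to be available to epsub):

#!/bin/sh
# Hypothetical default post-submission handler, saved as $LSF_SERVERDIR/epsub
# and made executable; it records every submitted job to a placeholder log file.
echo "job $LSB_SUB_JOB_ID submitted to queue $LSB_SUB_JOB_QUEUE" >> /tmp/epsub.log
exit 0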

・Restrict users and user groups from forwarding jobs to remote clusters
You can now specify a list of users or user groups that can forward jobs to remote clusters when using the LSF multicluster capability. This allows you to prevent jobs from certain users or user groups from being forwarded to an execution cluster, and to set limits on the submission cluster.

These limits are defined at the queue level in LSF. For jobs that are intended to be forwarded to a remote cluster, users must submit these jobs to queues that have the SNDJOBS_TO parameter configured in the lsb.queues file. To restrict these queues to specific users or user groups, define the FWD_USERS parameter in the lsb.queues file for these queues.
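A queue-level sketch (the queue, remote queue and cluster, user, and group names are placeholders):

# lsb.queues: only user1 and members of ugroup1 may forward jobs from this queue
Begin Queue
QUEUE_NAME = fwd_q
SNDJOBS_TO = recv_q@cluster_east
FWD_USERS  = user1 ugroup1
End Queue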

・Advance reservations now support the "same" section in resource requirement strings
When using the brsvadd -R and brsvmod -R options to specify resource requirements for advance reservations, the same string now takes effect, in addition to the select string. Previous versions of LSF only allowed the select string to take effect.

This addition allows you to select hosts with the same resources for your advance reservation.
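For example, a sketch of a reservation whose hosts must all have the same model (the user name, slot count, and time window are placeholders):

# Reserve 8 slots from 18:00 to 20:00 for user1, on hosts that share the same model
brsvadd -n 8 -u user1 -b 18:00 -e 20:00 -R "select[type==any] same[model]"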

・Priority factors for absolute priority scheduling
You can now set additional priority factors for LSF to calculate the job priority for absolute priority scheduling (APS). These additional priority factors allow you to modify the priority for the application profile, submission user, or user group, which are all used as factors in the APS calculation. You can also view the APS and fairshare user priority values for pending jobs.

To set the priority factor for an application profile, define the PRIORITY parameter in the lsb.applications file. To set the priority factor for a user or user group, define the PRIORITY parameter in the User or UserGroup section of the lsb.users file.

The new bjobs -prio option displays the APS and fairshare user priority values for all pending jobs. In addition, the busers and bugroup commands display the APS priority factor for the specified users or user groups.
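A sketch of the new priority factors (names and values are placeholders, and the two-column layout of the lsb.users User section is an assumption):

# lsb.applications: priority factor for an application profile
Begin Application
NAME     = app1
PRIORITY = 10
End Application

# lsb.users: priority factor for a user
Begin User
USER_NAME  PRIORITY
user1      20
End User

# Show APS and fairshare user priority values for pending jobs
bjobs -prio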

・Job dispatch limits for users, user groups, and queues
You can now set limits on the maximum number of jobs that are dispatched in a scheduling cycle for users, user groups, and queues. This allows you to control, by user, user group, or queue, the number of jobs that are dispatched for execution. If the number of dispatched jobs reaches this limit, other pending jobs that belong to that user, user group, or queue and that might otherwise have been dispatched remain pending for this scheduling cycle.
To set or update the job dispatch limit, run the bconf command on the limit object (that is, run bconf action_type limit=limit_name) to define the JOBS_PER_SCHED_CYCLE parameter for the specific limit. You can only set job dispatch limits if the limit consumer types are USERS, PER_USER, QUEUES, or PER_QUEUE.

For example, bconf update limit=L1 "JOBS_PER_SCHED_CYCLE=10"

You can also define the job dispatch limit by defining the JOBS_PER_SCHED_CYCLE parameter in the Limit section of the lsb.resources file.
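A sketch of the equivalent Limit section (the limit name, users, and value are placeholders):

# lsb.resources: dispatch at most 10 jobs per scheduling cycle for the listed users
Begin Limit
NAME                 = L1
USERS                = user1 user2
JOBS_PER_SCHED_CYCLE = 10
End Limit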

•Command output formatting
・blimits -a option shows all resource limits
The new blimits -a command option shows all resource allocation limits, even if they are not being applied to running jobs. Normally, running the blimits command with no options displays only resource allocation limits that are being applied to running jobs.

・Use bread -w to show messages and attached data files in wide format
LSF allows you to read messages and attached data files from a job in wide format with the new bread -w command option. The wide format displays information without truncating fields. 

•Other changes to IBM Spectrum LSF
・Increased project name size
In previous versions of LSF, when submitting a job with a project name (by using the bsub -P option, the DEFAULT_PROJECT parameter in the lsb.params file, or by using the LSB_PROJECT_NAME or LSB_DEFAULTPROJECT environment variables), the maximum length of the project name was 59 characters. The maximum length of the project name is now increased to 511 characters.
This increase also applies to each project name that is specified in the PER_PROJECT and PROJECTS parameters in the lsb.resources file.
・Cluster-wide DNS host cache
LSF can generate a cluster-wide DNS host cache file ($LSF_ENVDIR/.hosts.dnscache) that is used by all daemons on each host in the cluster to reduce the number of times that LSF daemons directly call the DNS server when starting the LSF cluster. To enable the cluster-wide DNS host cache file, configure LSF_DNS_CACHE=Y in the lsf.conf file.
・Use #include for shared configuration file content
In previous versions of LSF, you could use the #INCLUDE directive to insert the contents of a specified file at the beginning of the lsf.shared or lsb.applications configuration files to share common configurations between clusters or hosts.
You can now use the #INCLUDE directive in any place in the following configuration files:
lsb.applications
lsb.hosts
lsb.queues
lsb.reasons
lsb.resources
lsb.users

You can use the #INCLUDE directive only at the beginning of the following file:
lsf.shared
For example, you can use #if ... #endif statements to specify a time-based configuration that uses different settings at different times. You can change the configuration for the entire system by modifying the common file that is specified in the #INCLUDE directive.
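For example, a sketch that combines an #INCLUDE directive with a time-based #if ... #endif block in lsb.queues (the path, queue name, and time window are placeholders):

# lsb.queues: pull in queue definitions shared across clusters
#INCLUDE "/shared/lsf/conf/common.queues"

Begin Queue
QUEUE_NAME = night
#if time(20:00-8:00)
PRIORITY   = 50
#else
PRIORITY   = 10
#endif
End Queue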

・Showing the pending reason for interactive jobs
The bsub -I command now displays the pending reason for interactive jobs, based on the setting of LSB_BJOBS_PENDREASON_LEVEL, if the job is pending.

・Changing job priorities and limits dynamically
With the introduction of two new parameters, LSF now supports changing job priorities and limits dynamically through an import file. This includes:
Calling the eadmin script at a configured interval, even when a job exception has not occurred, through the EADMIN_TRIGGER_INTERVAL parameter in the lsb.params file.
Allowing job submission during a policy update or cluster restart, through the PERSIST_LIVE_CONFIG parameter in the lsb.params file.
Enhancing the bconf command to override existing settings through the set action and to support the -pack option for reading multiple requests from a file.
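A configuration sketch (the interval value and file path are illustrative; the bconf -pack invocation shown is an assumption):

# lsb.params: run eadmin periodically even without a job exception,
# and keep accepting job submissions during live configuration updates
EADMIN_TRIGGER_INTERVAL = 10
PERSIST_LIVE_CONFIG = Y

# Apply multiple bconf requests read from a file
bconf -pack /path/to/bconf_requests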

・Specify a UDP port range for LSF daemons
You can now specify a range of UDP ports to be used by LSF daemons. Previously, LSF bound to a random port number between 1024 and 65535.
To specify a UDP port range, define the LSF_UDP_PORT_RANGE parameter in the lsf.conf file. Include at least 10 ports in this range; valid values are integers between 1024 and 65535.
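For example (the range is illustrative and contains more than the required 10 ports):

# lsf.conf: restrict LSF daemon UDP ports to 41000-41099
LSF_UDP_PORT_RANGE=41000-41099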

LSF Technical Documentation

Lava

Lava is an open-source job scheduler that can manage up to 512 nodes. Because its command set follows IBM Platform LSF, existing IBM Platform LSF users can adopt it without friction. However, it has limitations, such as the lack of job preemption.

 
