IBM Spectrum LSF Suites enables fast, high-throughput batch workload processing for mission-critical computation, analysis, and simulation applications. With IBM Spectrum LSF Suites, batch workloads in HPC environments can be run with intelligent scheduling. IBM Spectrum LSF Suites maximizes the utilization of existing hardware resources regardless of industry or field. In addition to traditional high-performance computing and high-throughput workloads, it also supports big data, cognitive, GPU machine learning, and containerized workloads.
The following topics summarize the new and changed behavior in LSF 10.1 Fix Pack 6.
Release date: June 2018
With this release, LSF logs resource connector VM events, along with usage information, in a new file, rc.audit.x (one log entry per line, in JSON format). The purpose of the rc.audit.x log file is to provide evidence to support auditing and usage accounting, as supplementary data to third-party cloud provider logs. The information is readable by the end user as text, and is hash-protected for security.
LSF also provides a new command-line tool, rclogsvalidate, to validate the logs described above. If the audit file is tampered with, the tool identifies the line that was modified.
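For example, a hypothetical invocation (the argument syntax here is an assumption; consult the rclogsvalidate reference for details):

rclogsvalidate rc.audit.0
# reports any line whose hash no longer matches the logged entry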
The following new parameters have been added to the lsf.conf configuration file:
-LSF_RC_AUDIT_LOG: If set to Y, enables the resource connector auditor to generate log files.
-RC_MAX_AUDIT_LOG_SIZE: An integer to determine the maximum size of the rc.audit.x log file, in MB.
-RC_MAX_AUDIT_LOG_KEEP_TIME: An integer that specifies the amount of time that the resource connector audit logs are kept, in months.
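A minimal lsf.conf sketch that enables the audit log (the size and retention values are illustrative assumptions):

# lsf.conf
LSF_RC_AUDIT_LOG=Y
# rotate rc.audit.x when it reaches 500 MB
RC_MAX_AUDIT_LOG_SIZE=500
# keep audit logs for 12 months
RC_MAX_AUDIT_LOG_KEEP_TIME=12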
・Resource connector template prioritization
In 10.1 Fix Pack 6, the resource connector can prioritize templates.
The ability to set priorities is now provided in the resource connector template. LSF uses higher priority templates first (for example, less expensive templates should be assigned higher priorities).
Previously, LSF sorted candidate template hosts by template name only. An administrator might instead want to sort them by priority, so that LSF favors one template over another. For this purpose, the "Priority" attribute has been added:
{
"Name": "T2",
"MaxNumber": "2",
"Attributes":
{
"type": ["String", "X86_64"],
"ncpus": ["Numeric", "1"],
"mem": ["Numeric", "512"],
"template": ["String", "T2"],
"ostkhost": ["Boolean", "1"]
},
"Image": "LSF10.1.0.3_OSTK_SLAVE_VM",
"Flavor": "t2.nano",
"UserData": "template=T2",
"Priority": "10"
}
Note
The example above is for an OpenStack template. Other templates might not contain all of these attributes.
The default value of Priority is "0", which is the lowest priority. If template hosts have the same priority, LSF sorts them by template name.
・Support for a dedicated instance of AWS
A new parameter has been added to the Resource Connector template to support dedicated AWS instances.
If you do not have a placement group in your AWS account, you must still include the placementGroupName field with a blank name inside the quotation marks, because this field is required in order to specify the tenancy. If you do have a placement group, specify its name inside the quotation marks. For example: "placementGroupName": "", or "placementGroupName": "hostgroupA",.
The possible values for tenancy are "default", "dedicated", and "host". However, LSF currently supports only "default" and "dedicated".
The above applies to both on-demand and spot instances of AWS.
A full example of the template file follows:
{
"templates": [
{
"templateId": "aws-vm-0",
"maxNumber": 5,
"attributes": {
"type": ["String", "X86_64"],
"ncores": ["Numeric", "1"],
"ncpus": ["Numeric", "1"],
"mem": ["Numeric", "512"],
"awshost": ["Boolean", "1"],
"zone": ["String", "us_west_2d"]
},
"imageId": "ami-0db70175",
"subnetId": "subnet-cc0248ba",
"vmType": "c4.xlarge",
"keyName": "martin",
"securityGroupIds": ["sg-b35182ca"],
"instanceTags": "Name=aws-vm-0",
"ebsOptimized" : false,
"placementGroupName": "",
"tenancy": "dedicated",
"userData": "zone=us_west_2d" }
}
・HTTP proxy server capability for LSF Resource connector
This feature is useful for customers with strict security requirements. It allows for the use of an HTTP proxy server for endpoint access.
Note
For this release, this feature is enabled only for AWS.
This feature introduces the "scriptOption" parameter for the provider. For example:
{
"providers":[
{
"name": "aws1",
"type": "awsProv",
"confPath": "resource_connector/aws",
"scriptPath": "resource_connector/aws",
"scriptOption": "-Dhttps.proxyHost=10.115.206.146 -Dhttps.proxyPort=8888"
}
]
}
The value of scriptOption can be any string and is not verified by LSF.
LSF sets the SCRIPT_OPTIONS environment variable when launching the scripts. For AWS plugins, the information is passed to Java through syntax like the following:
java $SCRIPT_OPTIONS -Daws-home-dir=$homeDir -jar $homeDir/lib/AwsTool.jar --getAvailableMachines $homeDir $inJson
・Create EBS-Optimized instances
Creating instances with EBS-Optimized enabled is introduced in this release, to achieve better cloud storage performance.
The EBS-Optimized attribute has been added to the Resource Connector template, and the AWS provider plugin passes the information to AWS when creating the instance. Only high-end instance types support this attribute; the Resource Connector provider plugin does not check whether the instance type supports it.
The "ebsOptimized" field in the Resource Connector template is a boolean value (either true or false); the default is false. Specify an appropriate vmType that supports EBS optimization (consult the AWS documentation).
{
"templates": [
{
"templateId": "Template-VM-1",
"maxNumber": 4,
"attributes": {
"type": ["String", "X86_64"],
"ncores": ["Numeric", "1"],
"ncpus": ["Numeric", "1"],
"mem": ["Numeric", "1024"],
"awshost1": ["Boolean", "1"]
},
"imageId": "ami-40a8cb20",
"vmType": "m4.large",
"subnetId": "subnet-cc0248ba",
"keyName": "martin",
"securityGroupIds": ["sg-b35182ca"],
"instanceTags" : "group=project1",
"ebsOptimized" : true,
"userData": "zone=us_west_2a"
}
]
}
・Resource connector policy enhancements
Enhancements have been made for administering Resource Connector policies:
A cluster-wide parameter, RC_MAX_REQUESTS, has been introduced in the lsb.params file to control the maximum number of new instances that can be requested.
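For example, a hypothetical lsb.params entry that caps each demand evaluation at 300 new instances:

Begin Parameters
RC_MAX_REQUESTS = 300
End Parameters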
After adding the usable hosts already allocated in previous sessions, LSF generates the total demand requirement. An internal policy entry is created as follows:
{
"Name": "__RC_MAX_REQUESTS",
"Consumer":
{
"rcAccount": ["all"],
"templateName": ["all"],
"provider": ["all"]
},
"StepValue": "$val:0"
}
The LSB_RC_UPDATE_INTERVAL parameter controls how frequently LSF starts demand evaluation. Combined with the new RC_MAX_REQUESTS parameter, it acts as a cluster-wide "step" that controls how quickly the cluster grows. For example, with RC_MAX_REQUESTS=100 and a 10-minute update interval, the cluster grows by at most 100 new instances every 10 minutes.
・Automatically extend job run limits
You can now configure the LSF allocation planner to extend the run limit for jobs when the resources that are occupied by the job are not needed by other jobs in queues with the same or higher priority. The allocation planner looks at job plans to determine if there are any other jobs that require the current job’s resources.
Enable extendable run limits for jobs submitted to a queue by specifying the EXTENDABLE_RUNLIMIT parameter in the lsb.queues file. Because the allocation planner decides whether to extend the run limit of jobs, you must also enable plan-based scheduling by enabling the ALLOCATION_PLANNER parameter in the lsb.params file.
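A minimal sketch of the two settings (the queue name and the BASE/INCREMENT/GRACE/REQUEUE values are illustrative assumptions; check the lsb.queues reference for the exact keyword syntax):

# lsb.params
Begin Parameters
ALLOCATION_PLANNER = Y
End Parameters

# lsb.queues
Begin Queue
QUEUE_NAME = normal
# start from a 60-minute limit; extend in 30-minute steps while no other plan needs the resources
EXTENDABLE_RUNLIMIT = BASE[60] INCREMENT[30] GRACE[10] REQUEUE[N]
End Queue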
・Default epsub executable files
Similar to esub programs, LSF now allows you to define a default epsub program that runs even if you do not define mandatory epsub programs with the LSB_ESUB_METHOD parameter in the lsf.conf file. To define a default epsub program, create an executable file named epsub (with no application name in the file name) in the LSF_SERVERDIR directory.
After the job is submitted, LSF runs the default epsub executable file if it exists in the LSF_SERVERDIR directory, followed by any mandatory epsub executable files that are defined by LSB_ESUB_METHOD, followed by the epsub executable files that are specified by the -a option.
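As an illustration, a default epsub might simply log each submission. This sketch assumes the LSB_SUB_JOB_ID environment variable that LSF sets for epsub programs (-1 if submission failed); the log path is arbitrary:

#!/bin/sh
# LSF_SERVERDIR/epsub - default post-submission handler (sketch)
if [ "$LSB_SUB_JOB_ID" != "-1" ]; then
    echo "$(date): job $LSB_SUB_JOB_ID accepted" >> /tmp/epsub_audit.log
fi
exit 0

The file must be named epsub, be executable, and reside in the LSF_SERVERDIR directory.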
・Restrict users and user groups from forwarding jobs to remote clusters
You can now specify a list of users or user groups that can forward jobs to remote clusters when using the LSF multicluster capability. This allows you to prevent jobs from certain users or user groups from being forwarded to an execution cluster, and to set limits on the submission cluster.
These limits are defined at the queue level in LSF. For jobs that are intended to be forwarded to a remote cluster, users must submit these jobs to queues that have the SNDJOBS_TO parameter configured in the lsb.queues file. To restrict these queues to specific users or user groups, define the FWD_USERS parameter in the lsb.queues file for these queues.
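A sketch of such a queue definition in lsb.queues (the queue, cluster, user, and group names are placeholders):

Begin Queue
QUEUE_NAME = fwd_q
# forward jobs to a queue on a remote execution cluster
SNDJOBS_TO = recv_q@cluster_east
# only these users and groups may submit jobs that get forwarded
FWD_USERS  = alice grp_hpc
End Queue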
・Advance reservations now support the "same" section in resource requirement strings
When using the brsvadd -R and brsvmod -R options to specify resource requirements for advance reservations, the same string now takes effect, in addition to the select string. Previous versions of LSF only allowed the select string to take effect.
This addition allows you to select hosts with the same resources for your advance reservation.
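For example, a hypothetical reservation that uses both strings (the user, size, and time window are placeholders):

# reserve 8 slots between 10:00 and 12:00, all on hosts of the same model
brsvadd -n 8 -u user1 -b 10:00 -e 12:00 -R "select[type==X86_64] same[model]"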
・Priority factors for absolute priority scheduling
You can now set additional priority factors for LSF to calculate the job priority for absolute priority scheduling (APS). These additional priority factors allow you to modify the priority for the application profile, submission user, or user group, which are all used as factors in the APS calculation. You can also view the APS and fairshare user priority values for pending jobs.
To set the priority factor for an application profile, define the PRIORITY parameter in the lsb.applications file. To set the priority factor for a user or user group, define the PRIORITY parameter in the User or UserGroup section of the lsb.users file.
The new bjobs -prio option displays the APS and fairshare user priority values for all pending jobs. In addition, the busers and bugroup commands display the APS priority factor for the specified users or user groups.
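A sketch of the new PRIORITY factors (the application, user names, and values are illustrative):

# lsb.applications
Begin Application
NAME     = app1
PRIORITY = 10
End Application

# lsb.users
Begin User
USER_NAME    PRIORITY
user1        20
user2        5
End User

Running bjobs -prio then shows the resulting APS and fairshare priority values for pending jobs.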
・Job dispatch limits for users, user groups, and queues
You can now set limits on the maximum number of jobs that are dispatched in a scheduling cycle for users, user groups, and queues. This allows you to control the number of jobs, by user, user group, or queue, that are dispatched for execution. If the number of dispatched jobs reaches this limit, other pending jobs that belong to that user, user group, or queue, and that might otherwise have been dispatched, remain pending for this scheduling cycle.
To set or update the job dispatch limit, run the bconf command on the limit object (that is, run bconf action_type limit=limit_name) to define the JOBS_PER_SCHED_CYCLE parameter for the specific limit. You can set job dispatch limits only if the limit consumer types are USERS, PER_USER, QUEUES, or PER_QUEUE.
For example: bconf update limit=L1 "JOBS_PER_SCHED_CYCLE=10"
You can also define the job dispatch limit by defining the JOBS_PER_SCHED_CYCLE parameter in the Limit section of the lsb.resources file.
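The equivalent Limit section in lsb.resources would look like the following sketch (the limit name and user are placeholders):

Begin Limit
NAME                 = L1
USERS                = user1
JOBS_PER_SCHED_CYCLE = 10
End Limit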
・Showing the pending reason for interactive jobs
The bsub -I command now displays the pending reason for interactive jobs, based on the setting of LSB_BJOBS_PENDREASON_LEVEL, if the job is pending.
・Changing job priorities and limits dynamically
Through the introduction of two new parameters, LSF now supports changing job priorities and limits dynamically through an import file. This includes:
-Calling the eadmin script at a configured interval, even when no job exception has occurred, through the EADMIN_TRIGGER_INTERVAL parameter in the lsb.params file.
-Allowing job submission during a policy update or cluster restart, through the PERSIST_LIVE_CONFIG parameter in the lsb.params file.
-Enhancing the bconf command to override existing settings through the set action, and to support the -pack option for reading multiple requests from a file.
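A sketch of the two lsb.params settings (the interval value is an arbitrary illustration):

Begin Parameters
EADMIN_TRIGGER_INTERVAL = 10
PERSIST_LIVE_CONFIG     = Y
End Parameters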
・Specify a UDP port range for LSF daemons
You can now specify a range of UDP ports to be used by LSF daemons. Previously, LSF bound to a random port number between 1024 and 65535.
To specify a UDP port range, define the LSF_UDP_PORT_RANGE parameter in the lsf.conf file. Include at least 10 ports in this range; the valid values are integers between 1024 and 65535.
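For example, the following hypothetical lsf.conf line restricts the daemons to a 10-port UDP window:

LSF_UDP_PORT_RANGE=50000-50009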
IBM Spectrum LSF is also available as a free Community Edition, which is subject to the following limitations (as of February 2025):
▸ Each node may have only one or two CPU sockets.
▸ The maximum number of usable nodes is 10.
Lava is an open source job scheduler that can manage up to 512 nodes. Because its command set follows IBM Platform LSF, existing IBM Platform LSF users can use it without difficulty. Note, however, that it has limitations, such as the lack of job preemption.