Resource Allocation & Job Submission
LoadLeveler
Basic Commands:
llsubmit script.ll - submit a job to the queue
llq - show the status of all jobs in the queue
llstatus - show available resources
llclass - show available job classes and their parameters
llcancel JOBID - cancel the job with the ID JOBID
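A typical session combines these commands. The sketch below wraps submission and a status check in one function. The name submit_and_list is a hypothetical helper, not part of LoadLeveler, and the function assumes that llq accepts -u to restrict the listing to one user.

```shell
#!/bin/bash
# submit_and_list is a hypothetical helper, not a LoadLeveler command.
# Usage: submit_and_list script.ll
submit_and_list() {
    local script=$1
    # Stop early on machines where LoadLeveler is not installed.
    if ! command -v llsubmit >/dev/null 2>&1; then
        echo "llsubmit not found; run this on a LoadLeveler login node" >&2
        return 127
    fi
    # Submit the job script, then list the jobs of the current user.
    llsubmit "$script" || return 1
    llq -u "$(id -un)"
}
```

Once the job ID appears in the llq listing, the job can be removed with llcancel JOBID.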
The job description and the required resources must be defined in a LoadLeveler job script (a plain text file).
Example job scripts are available in the directory /gpfs/home/info/examples.
Job Script Syntax:
The script consists of LoadLeveler keyword lines (lines starting with #@) and commands, which are interpreted by the shell.
The job's resources are specified at the beginning of the script.
Keep the keyword lines together: do not place lines without keywords between them.
The commands that run the job follow the keyword block. You will usually use mpiexec or poe to run your parallel program. You can also use shell commands inside your script. You can find your name and account number (account_no) with the showaccount command.
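The rule about keeping keyword lines together can be checked before submission. The following is a minimal sketch, not a LoadLeveler tool: check_keyword_block is a hypothetical helper that returns a non-zero status if any other line appears between two #@ lines. It assumes that a keyword line starts with #, optional blanks, then @.

```shell
#!/bin/bash
# check_keyword_block is a hypothetical helper, not a LoadLeveler command.
# Usage: check_keyword_block script.ll
# Returns 0 if all keyword lines form one uninterrupted block, 1 otherwise.
check_keyword_block() {
    awk '
        # Keyword line: a gap before it means the block was interrupted.
        /^#[ \t]*@/ { if (gap) bad = 1; seen = 1; next }
        # Any other line after the first keyword line opens a gap.
        seen        { gap = 1 }
        END         { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, check_keyword_block script.ll && llsubmit script.ll submits the script only if the check passes.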
Script Example (IBM PE):
#!/bin/bash
#@ job_type = parallel
#@ job_name = My_job
#@ account_no = name-number
#@ class = My_class
#@ error = job.err
#@ output = job.out
#@ network.MPI = sn_all,not_shared,US
#@ node = 2
#@ rset = RSET_MCM_AFFINITY
#@ mcm_affinity_options = mcm_mem_req mcm_distribute mcm_sni_none
#@ task_affinity = core(1)
#@ tasks_per_node = 32
#@ queue
mpiexec /path/to/your/app -flags
Script Example (MPICH):
#!/bin/bash
#@ job_type = parallel
#@ account_no = name-number
#@ class = My_class
#@ output = job.out
#@ error = job.err
#@ network.MPI = sn_all,not_shared,US
#@ node = 1
#@ tasks_per_node = 32
#@ queue
export LD_LIBRARY_PATH=/gpfs/home/utils/mpi/mpich2-1.5/lib:$LD_LIBRARY_PATH
export PATH=/gpfs/home/utils/mpi/mpich2-1.5/bin:$PATH
$(which mpiexec) ./soft.x
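When several runs differ only in class or node count, the keyword block can be generated instead of edited by hand. The sketch below is one possible way to do this. make_job_script is a hypothetical helper, it prints only keywords that appear in the examples above, and name-number remains a placeholder for your own account.

```shell
#!/bin/bash
# make_job_script is a hypothetical helper, not a LoadLeveler command.
# Usage: make_job_script CLASS NODES TASKS_PER_NODE COMMAND...
# Prints a job script to standard output.
make_job_script() {
    local class=$1 nodes=$2 tpn=$3
    shift 3
    # The keyword lines stay together; the command to run follows them.
    cat <<EOF
#!/bin/bash
#@ job_type = parallel
#@ account_no = name-number
#@ class = $class
#@ output = job.out
#@ error = job.err
#@ network.MPI = sn_all,not_shared,US
#@ node = $nodes
#@ tasks_per_node = $tpn
#@ queue
$*
EOF
}
```

For example, make_job_script short 2 32 mpiexec ./soft.x > job.ll writes a script that requests 2 nodes with 32 tasks each in the short class.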
The most important keywords are #@ total_tasks, which specifies the number of MPI processes, and #@ node, which specifies the number of nodes. The IBM PE example above sets #@ tasks_per_node = 32 on 2 nodes instead, so it runs 2 × 32 = 64 tasks in total. It is also important to choose the right job class. The following table shows the available classes.
class | max_node per job | maxjobs per user | max_total_tasks per user | max. walltime (HH:MM) | priority |
---|---|---|---|---|---|
short | 32 | -1 | -1 | 12:00 | 100 |
short_priority | 32 | -1 | -1 | 12:00 | 1100 |
medium | 16 | -1 | 1024 | 48:00 | 80 |
medium_priority | 16 | -1 | 1024 | 72:00 | 1000 |
testing | 1 | | | 1:00 | undefined* |
* This class runs on a single designated node and is used to test and tune applications.
A value of -1 means that no limit is set.
Complete information about the classes is available with the llclass -l command.
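The limits in the table can be checked against a planned request before a class is chosen. The following is a minimal sketch under stated assumptions: fits_class and CLASS_LIMITS are hypothetical names, the limits are copied by hand from the table above (so llclass -l remains the authoritative source), walltime is given in whole hours, and the testing class is left out because its row is incomplete. Since max_total_tasks is a per-user limit, passing this check is necessary but may not be sufficient if you already have other jobs running.

```shell
#!/bin/bash
# Limits copied from the class table, one class per line:
#   class:max_node:max_total_tasks:max_walltime_hours   (-1 means no limit)
CLASS_LIMITS="short:32:-1:12
short_priority:32:-1:12
medium:16:1024:48
medium_priority:16:1024:72"

# fits_class is a hypothetical helper, not a LoadLeveler command.
# Usage: fits_class CLASS NODES TASKS_PER_NODE HOURS
# Returns 0 if the request is within the table limits for CLASS, 1 otherwise.
fits_class() {
    local class=$1 nodes=$2 tpn=$3 hours=$4
    local total=$(( nodes * tpn ))   # total tasks = nodes x tasks per node
    local name max_node max_tasks max_hours
    while IFS=: read -r name max_node max_tasks max_hours; do
        [ "$name" = "$class" ] || continue
        [ "$nodes" -le "$max_node" ] || return 1
        [ "$hours" -le "$max_hours" ] || return 1
        if [ "$max_tasks" -ne -1 ] && [ "$total" -gt "$max_tasks" ]; then
            return 1
        fi
        return 0
    done <<EOF
$CLASS_LIMITS
EOF
    return 1   # class not found in the list
}
```

For example, fits_class short 2 32 12 succeeds for the 2-node, 64-task request from the IBM PE example, while fits_class medium 32 32 10 fails because medium allows at most 16 nodes per job.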