Quick start¶
Login Credentials¶
User registration and administration are handled by the registration portal at register.sivvp.sk, where you can create your account. Registration proceeds in the following steps:
A new user creates an account at register.sivvp.sk.
User confirms their email address.
The user's details are checked, the account is activated, and the user receives testing access to all of the computing resources of SIVVP. Further information is sent to the user's email.
Computing resources are accessible only through a secure connection (SSH) to one of the login nodes. After logging in, the user can compile programs and run jobs. Logging in requires a private SSH key.
Creating an SSH key pair in Linux/UNIX¶
Run the following command in a shell:
ssh-keygen -b 2048 -t rsa
This creates a key pair in the ~/.ssh directory (by default). Please choose a secure passphrase (at least 8 characters, including numbers and special symbols). You will upload your public key (~/.ssh/id_rsa.pub) during the registration process. Remember that ~/.ssh/id_rsa is your private key, which must be stored securely; never send or show it to anyone. If you suspect that your private key has been compromised, please report it immediately. You can always generate a new key pair and send it to us as a replacement.
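To display the public key that you will upload during registration, run:
cat ~/.ssh/id_rsa.pub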
Creating an SSH key pair in Windows¶
Windows 10 and newer have a built-in SSH client. You can use it from Command Prompt or PowerShell and then continue with the instructions from the Linux section. Since the Windows 10 April 2018 Update the built-in SSH client is enabled by default. If you have an older version of Windows 10, you can install the OpenSSH client or use third-party software (e.g. PuTTY, MobaXterm).
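For example, in PowerShell you can generate the key pair with the same command as on Linux; by default the keys are stored in C:\Users\<username>\.ssh (replace <username> with your Windows user name):
ssh-keygen -b 2048 -t rsa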
The following is the procedure using PuTTY (an open-source SSH client). Along with the main client, you will also need PuTTYgen, available here: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html

Select SSH-2 RSA as the key type and set “Number of bits in a generated key” to 2048.

Click “Generate” and move your mouse around in the gray field.

Do not forget to set a strong passphrase for your key. Upload your public key and keep your private key secure (see the information above).
Accessing Resources¶
IP addresses of the login nodes¶
Login node | IP address / domain | Port
---|---|---
Aurel 1 | 147.213.80.175 | 22
Aurel 2 | 147.213.80.176 | 22
Žilina | 147.213.242.7 | 22
Košice | login2.kelinux.nscc.sk | 5522
The Aurel supercomputer has two equivalent login nodes; if one is down, you can use the other.
Login¶
To log in, use an SSH client. On Linux/UNIX or Mac, type the following command, adding -p port when needed (the default port is 22):
ssh user@IP
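For example, to log in to the Košice cluster, which uses a non-default port (replace "user" with your login name):
ssh -p 5522 user@login2.kelinux.nscc.sk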
In Windows, open PuTTY, enter the IP address of the server as the host name and set the correct port number. Import your private key through the menu Connection -> SSH -> Auth.

File Transfer¶
You can use the SCP protocol to transfer files to the Aurel supercomputer and the other compute clusters.
File transfer from Linux/Mac¶
Example command to transfer a local file to the cluster:
scp /path/to/local/file login@IP:.
Example command to transfer a file from the cluster to a local machine:
scp login@IP:/path/to/remote/file .
Just replace the “IP” with the address of the target machine and “login” with your login name.
If the port is not the default (22), add -P port to the command.
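For example, to copy a local file to the Košice cluster, which uses port 5522:
scp -P 5522 /path/to/local/file login@login2.kelinux.nscc.sk:.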
File transfer through Windows¶
You can use any SCP client that supports authentication with SSH keys. In our examples we will use the freely available WinSCP.
Click on “New” to create a new connection.
Choose SCP as the protocol.
Fill in the IP address and port of a login node.
Fill in your login information (your login name and the passphrase of your SSH key).
Input the path to your private key.
Click “Login” (you can also “Save” your settings).


WinSCP provides a comfortable user interface for copying, transferring, and deleting files.
Resource Allocation & Job Submission¶
To run a job, computational resources for that particular job must be allocated via a scheduler: LoadLeveler on the Aurel supercomputer and the Žilina cluster, Slurm on the Košice cluster.
LoadLeveler¶
Basic Commands:
llsubmit script.ll - submit a job to the queue
llq - check the status of all jobs in the queue
llstatus - check available resources
llclass - show available job classes and their parameters
llcancel JOBID - cancel the job with the ID "JOBID"
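A typical session might look like this (the job ID shown is illustrative):
llsubmit job.ll          # submit the script job.ll
llq                      # check the queue
llcancel aurel1.12345    # cancel the job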
Job description and required resources must be defined in a special script (text file) for LoadLeveler.
You can find some job script examples in the directory /gpfs/home/info/examples.
Job Script Syntax:
The script consists of key expressions for LoadLeveler (lines starting with #@) and commands, which will be interpreted.
At the beginning of the script, you need to specify the job’s resources.
Lines with LoadLeveler keywords should not be split by lines that do not contain them.
Keywords are followed by commands for job execution. You will usually use ‘mpiexec’ or ‘poe’ to run your parallel program. You can also use shell commands inside your script. You can find out your name and account number (account_no) using the command “showaccount”.
Script Example (IBM PE):
#!/bin/bash
#@ job_type = parallel
#@ job_name = My_job
#@ account_no = name-number
#@ class = My_class
#@ error = job.err
#@ output = job.out
#@ network.MPI = sn_all,not_shared,US
#@ node = 2
#@ rset = RSET_MCM_AFFINITY
#@ mcm_affinity_options = mcm_mem_req mcm_distribute mcm_sni_none
#@ task_affinity = core(1)
#@ tasks_per_node = 32
#@ queue
mpiexec /path/to/your/app -flags
Script Example (MPICH):
#!/bin/bash
#@ job_type = parallel
#@ account_no = name-number
#@ class = My_class
#@ output = job.out
#@ error = job.err
#@ network.MPI = sn_all,not_shared,US
#@ node = 1
#@ tasks_per_node = 32
#@ queue
export LD_LIBRARY_PATH=/gpfs/home/utils/mpi/mpich2-1.5/lib:$LD_LIBRARY_PATH
export PATH=/gpfs/home/utils/mpi/mpich2-1.5/bin:$PATH
$(which mpiexec) ./soft.x
The most important keywords are #@ total_tasks, which specifies the total number of MPI processes, and #@ node, which specifies the number of nodes. Our examples instead set #@ tasks_per_node, so the first script runs a total of 2 × 32 = 64 tasks on 2 nodes. It is also important to choose the right job class. The following table shows the available classes.
class | max_node per job | maxjobs per user | max_total_tasks per user | max. walltime (HH:MM) | priority
---|---|---|---|---|---
short | 32 | -1 | -1 | 12:00 | 100
short_priority | 32 | -1 | -1 | 12:00 | 1100
medium | 16 | -1 | 1024 | 48:00 | 80
medium_priority | 16 | -1 | 1024 | 72:00 | 1000
testing | 1 | | | 1:00 | undefined*
* This class runs on a single designated node; it is used to test and tune applications.
You can find complete information about the classes using the llclass -l command.
Slurm¶
Overview¶
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
Architecture¶
As depicted in Figure 1, Slurm consists of a slurmd daemon running on each compute node and a central slurmctld daemon running on a management node (with optional fail-over twin). The slurmd daemons provide fault-tolerant hierarchical communications. The user commands include: sacct, sacctmgr, salloc, sattach, sbatch, sbcast, scancel, scontrol, scrontab, sdiag, sh5util, sinfo, sprio, squeue, sreport, srun, sshare, sstat, strigger and sview. All of the commands can run anywhere in the cluster.

Figure 1. Slurm components
The entities managed by these Slurm daemons, shown in Figure 2, include nodes, the compute resource in Slurm; partitions, which group nodes into logical (possibly overlapping) sets; jobs, or allocations of resources assigned to a user for a specified amount of time; and job steps, which are sets of (possibly parallel) tasks within a job. The partitions can be considered job queues, each of which has an assortment of constraints such as job size limit, job time limit, users permitted to use it, etc. Priority-ordered jobs are allocated nodes within a partition until the resources (nodes, processors, memory, etc.) within that partition are exhausted. Once a job is assigned a set of nodes, the user is able to initiate parallel work in the form of job steps in any configuration within the allocation. For instance, a single job step may be started that utilizes all nodes allocated to the job, or several job steps may independently use a portion of the allocation.

Figure 2. Slurm entities
Commands¶
Man pages exist for all Slurm daemons, commands, and API functions. The command option --help also provides a brief summary of options. Note that the command options are all case sensitive.
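For example, to see a summary of squeue options or its full manual page:
squeue --help
man squeue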
sacct
is used to report job or job step accounting information about active or completed jobs.
salloc
is used to allocate resources for a job in real time. Typically this is used to allocate resources and spawn a shell. The shell is then used to execute srun commands to launch parallel tasks.
sattach
is used to attach standard input, output, and error plus signal capabilities to a currently running job or job step. One can attach to and detach from jobs multiple times.
sbatch
is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
sbcast
is used to transfer a file from local disk to local disk on the nodes allocated to a job. This can be used to effectively use diskless compute nodes or provide improved performance relative to a shared file system.
scancel
is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
scontrol
is the administrative tool used to view and/or modify Slurm state. Note that many scontrol commands can only be executed as user root.
sinfo
reports the state of partitions and nodes managed by Slurm. It has a wide variety of filtering, sorting, and formatting options.
sprio
is used to display a detailed view of the components affecting a job’s priority.
squeue
reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
srun
is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared resources within the job's node allocation.
sshare
displays detailed information about fairshare usage on the cluster. Note that this is only viable when using the priority/multifactor plugin.
sstat
is used to get information about the resources utilized by a running job or job step.
strigger
is used to set, get or view event triggers. Event triggers include things such as nodes going down or jobs approaching their time limit.
sview
is a graphical user interface to get and update state information for jobs, partitions, and nodes managed by Slurm.
Examples¶
First we determine what partitions exist on the system, what nodes they include, and the general system state. This information is provided by the sinfo command. In the example below we find there are four partitions: urgent, devel, long and short. The * following the name short indicates this is the default partition for submitted jobs. We see that all partitions are in the UP state. The information about each partition may be split over more than one line so that nodes in different states can be identified. In this case, four nodes are draining and the node comp28 is down; the * following the states drain and down indicates that those nodes are not responding. Note the use of a concise expression for node name specification, with a common prefix comp and numeric ranges or specific numbers identified. This format allows very large clusters to be managed easily. The sinfo command has many options to let you view the information of interest in whatever format you prefer. See the man page for more information.
[user@login1 ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
urgent up infinite 4 drain* comp[02,49-50,56]
urgent up infinite 1 down* comp28
urgent up infinite 51 alloc comp[01,03-27,29-48,51-55]
devel up 12:00:00 4 drain* comp[02,49-50,56]
devel up 12:00:00 1 down* comp28
devel up 12:00:00 51 alloc comp[01,03-27,29-48,51-55]
long up 3-00:00:00 4 drain* comp[02,49-50,56]
long up 3-00:00:00 1 down* comp28
long up 3-00:00:00 51 alloc comp[01,03-27,29-48,51-55]
short* up 1-00:00:00 4 drain* comp[02,49-50,56]
short* up 1-00:00:00 1 down* comp28
short* up 1-00:00:00 51 alloc comp[01,03-27,29-48,51-55]
Next we determine what jobs exist on the system using the squeue command. The ST field is the job state: some jobs are running (R is an abbreviation for Running) while others are pending (PD is an abbreviation for Pending). The TIME field shows how long the jobs have run, using the format days-hours:minutes:seconds. The NODELIST(REASON) field indicates where a job is running or the reason it is still pending. Typical reasons for pending jobs are Resources (waiting for resources to become available) and Priority (queued behind a higher priority job). The squeue command has many options to let you view the information of interest in whatever format you prefer. See the man page for more information.
[user@login1 ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
849650 long 0.00-120 nats PD 0:00 1 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
849092 long Se_TBP_M pskorna PD 0:00 4 (Priority)
849094 long POLYMER_ pskorna PD 0:00 4 (Priority)
849069 long A-Hal_OH danimor PD 0:00 4 (Priority)
849070 long A-Hal_Si danimor PD 0:00 4 (Priority)
849071 long D-Hal_OH danimor PD 0:00 4 (Priority)
849072 long D-Hal_Si danimor PD 0:00 4 (Priority)
849074 long Hall danimor PD 0:00 4 (Priority)
849077 long A-Hal danimor PD 0:00 4 (Priority)
849079 long A-Hal_w danimor PD 0:00 4 (Priority)
849082 long D-Hal danimor PD 0:00 4 (Priority)
849083 long D-Hal_w danimor PD 0:00 4 (Priority)
848974 long ed_Hal_7 danimor R 2-07:38:15 2 comp[01,41]
849020 long ReCO3_PP hrobco R 1-09:32:11 1 comp54
849019 long ReCO3_Cl hrobco R 1-09:34:11 1 comp52
849017 long BBTZ-I_S hrobco R 1-09:39:58 1 comp48
849055 long ReCO3_PM hrobco R 5:17:12 1 comp51
849081 long 04-pyrid hrobco R 9:10:12 1 comp37
849053 long BBTZ-I_S hrobco R 20:50:07 1 comp55
849052 long BBTZ-I_o hrobco R 20:50:11 1 comp47
849093 long 05-pyrid hrobco R 4:56:00 1 comp35
848975 long ed_Hal_7 danimor R 1-21:53:45 2 comp[16-17]
848977 long ed-Hal_1 danimor R 1-21:52:16 2 comp[22-23]
848976 long ed_Hal_1 danimor R 1-21:53:16 2 comp[18-19]
849084 long Se_POLYM pskorna R 6:59:34 4 comp[26,39-40,42]
849078 long Se_POLYM pskorna R 7:49:05 4 comp[11,27,45-46]
849073 long POLYMER_ pskorna R 9:33:42 4 comp[07-10]
849075 long Se_Mt pskorna R 9:34:12 4 comp[31-34]
849002 long ReCO3_PP hrobco R 2-07:37:14 1 comp53
849068 long Hal_Silo danimor R 5:09:36 4 comp[03-04,29-30]
849063 long Hal_OH danimor R 9:34:12 4 comp[12-13,43-44]
849042 long vasp marekm R 23:46:08 4 comp[20-21,24-25]
849041 long vasp marekm R 23:47:11 4 comp[05-06,14-15]
849110_[166-500] short rr12_j1u farky PD 0:00 1 (Resources)
849111_[0-500] short rr12_j1u farky PD 0:00 1 (Priority)
849112_[0-500] short rr12_j1u farky PD 0:00 1 (Priority)
849113_[0-500] short rr12_j1u farky PD 0:00 1 (Priority)
849114_[0-500] short rr12_j1u farky PD 0:00 1 (Priority)
849115_[0-500] short rr12_j1u farky PD 0:00 1 (Priority)
849116_[0-500] short rr12_j1u farky PD 0:00 1 (Priority)
849226 short ml_irc_p monikag PD 0:00 1 (Priority)
849227 short ml_irc_p monikag PD 0:00 1 (Priority)
849110_165 short rr12_j1u farky R 0:35 1 comp36
849110_164 short rr12_j1u farky R 0:41 1 comp48
849110_163 short rr12_j1u farky R 0:45 1 comp48
849110_162 short rr12_j1u farky R 0:46 1 comp36
849110_161 short rr12_j1u farky R 0:50 1 comp36
849110_160 short rr12_j1u farky R 0:52 1 comp36
849110_159 short rr12_j1u farky R 0:55 1 comp55
849110_158 short rr12_j1u farky R 0:59 1 comp55
849110_157 short rr12_j1u farky R 1:01 1 comp36
849110_156 short rr12_j1u farky R 1:19 1 comp36
849110_155 short rr12_j1u farky R 1:32 1 comp38
849110_154 short rr12_j1u farky R 1:53 1 comp47
849110_153 short rr12_j1u farky R 1:55 1 comp47
849110_152 short rr12_j1u farky R 2:31 1 comp38
849110_151 short rr12_j1u farky R 2:51 1 comp53
849110_150 short rr12_j1u farky R 2:55 1 comp53
849110_149 short rr12_j1u farky R 2:56 1 comp53
849110_148 short rr12_j1u farky R 2:59 1 comp53
849110_147 short rr12_j1u farky R 6:23 1 comp38
849110_146 short rr12_j1u farky R 6:26 1 comp38
849110_145 short rr12_j1u farky R 6:32 1 comp38
849110_144 short rr12_j1u farky R 6:35 1 comp38
849110_143 short rr12_j1u farky R 7:20 1 comp35
849110_135 short rr12_j1u farky R 9:19 1 comp35
The scontrol command can be used to report more detailed information about nodes, partitions, jobs, job steps, and configuration. It can also be used by system administrators to make configuration changes. A couple of examples are shown below. See the man page for more information.
[user@login1 ~]$ scontrol show partition
PartitionName=urgent
AllowGroups=ALL AllowAccounts=urgent AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=comp[01-56]
PriorityJobFactor=1 PriorityTier=30 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=712 TotalNodes=56 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=devel
AllowGroups=ALL AllowAccounts=devel AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=12:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=comp[01-56]
PriorityJobFactor=1 PriorityTier=20 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=712 TotalNodes=56 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=long
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=4 MaxTime=3-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=comp[01-56]
PriorityJobFactor=1 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=REQUEUE
State=UP TotalCPUs=712 TotalNodes=56 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=short
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=6 MaxTime=1-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=comp[01-56]
PriorityJobFactor=1 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=REQUEUE
State=UP TotalCPUs=712 TotalNodes=56 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
[user@login1 ~]$ scontrol show node comp01
NodeName=comp01 Arch=x86_64 CoresPerSocket=6
CPUAlloc=12 CPUTot=12 CPULoad=12.03
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=comp01 NodeHostName=comp01 Version=20.02.6
OS=Linux 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020
RealMemory=31997 AllocMem=0 FreeMem=20025 Sockets=2 Boards=1
State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=10 Owner=N/A MCS_label=N/A
Partitions=urgent,devel,long,short
BootTime=2021-03-30T09:19:33 SlurmdStartTime=2021-03-30T09:20:29
CfgTRES=cpu=12,mem=31997M,billing=12
AllocTRES=cpu=12
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
[user@login1 ~]$ scontrol show job 849650
JobId=849650 JobName=0.00-1200-n
UserId=nats(1274) GroupId=uk-02(1007) MCS_label=N/A
Priority=35325 Nice=0 Account=nats QOS=normal
JobState=PENDING Reason=Nodes_required_for_job_are_DOWN,_DRAINED_or_reserved_for_jobs_in_higher_priority_partitions Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=3-00:00:00 TimeMin=N/A
SubmitTime=2021-05-24T19:22:17 EligibleTime=2021-05-24T19:22:17
AccrueTime=2021-05-24T19:22:17
StartTime=2021-05-26T10:27:00 EndTime=2021-05-29T10:27:00 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-05-24T20:19:43
Partition=long AllocNode:Sid=login1:10438
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=1-1 NumCPUs=2 NumTasks=2 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=2,node=1,billing=2
Socks/Node=* NtasksPerN:B:S:C=2:0:*:* CoreSpec=*
MinCPUsNode=2 MinMemoryNode=0 MinTmpDiskNode=0
Features=k20m DelayBoot=00:00:00
OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
Command=/lustre/home/nats/ML-NVT/needle/1200/lambda0.00/run
WorkDir=/lustre/home/nats/ML-NVT/needle/1200/lambda0.00
StdErr=/lustre/home/nats/ML-NVT/needle/1200/lambda0.00/job.out
StdIn=/dev/null
StdOut=/lustre/home/nats/ML-NVT/needle/1200/lambda0.00/job.out
Power=
MailUser=nats MailType=NONE
It is possible to create a resource allocation and launch the tasks for a job step in a single command line using the srun command. Depending upon the MPI implementation used, MPI jobs may also be launched in this manner. See the MPI section for more MPI-specific information. In this example we execute /bin/hostname on three nodes (-N3) and include task numbers in the output (-l). The default partition will be used. One task per node will be used by default. Note that the srun command has many options available to control what resources are allocated and how tasks are distributed across those resources.
[user@login1 ~]$ srun -N3 -l /bin/hostname
0: comp04
1: comp07
2: comp10
This variation on the previous example executes /bin/hostname in four tasks (-n4). One processor per task will be used by default (note that we don’t specify a node count).
[user@login1 ~]$ srun -n4 -l /bin/hostname
0: comp22
1: comp22
2: comp22
3: comp22
One common mode of operation is to submit a script for later execution. In this example the script name is my.script and we explicitly request the nodes comp09 and comp10 (-w "comp[09-10]"; note the use of a node range expression). We also explicitly state that the subsequent job steps will spawn four tasks each, which ensures that our allocation contains at least four processors (one processor per task to be launched). The output will appear in the file my.stdout ("-o my.stdout"). The script itself contains an embedded time limit for the job. Other options can be supplied as desired by using a "#SBATCH" prefix followed by the option at the beginning of the script (before any commands to be executed). Options supplied on the command line override any options specified within the script. Note that my.script contains the command /bin/hostname, which is executed on the first node in the allocation (where the script runs), plus two job steps initiated using the srun command and executed sequentially.
[user@login1 ~]$ cat my.script
#!/bin/sh
#SBATCH --time=1
/bin/hostname
srun -l /bin/hostname
srun -l /bin/pwd
[user@login1 ~]$ sbatch -n4 -w "comp[09-10]" -o my.stdout my.script
sbatch: Submitted batch job 849469
[user@login1 ~]$ cat my.stdout
comp09
0: comp09
1: comp09
2: comp10
3: comp10
0: /home/user
1: /home/user
2: /home/user
3: /home/user
The final mode of operation is to create a resource allocation and spawn job steps within that allocation. The salloc command is used to create a resource allocation and typically start a shell within that allocation. One or more job steps would typically be executed within that allocation using the srun command to launch the tasks (depending upon the type of MPI being used, the launch mechanism may differ; see the MPI section below). Finally, the shell created by salloc is terminated using the exit command. Slurm does not automatically migrate executable or data files to the nodes allocated to a job: the files must either exist on local disk or in some global file system (e.g. NFS or Lustre). We provide the tool sbcast to transfer files to local storage on allocated nodes using Slurm's hierarchical communications. In this example we use sbcast to transfer the executable program a.out to /tmp/joe.a.out on the local storage of the allocated nodes. After executing the program, we delete it from local storage.
[user@login1 ~]$ salloc -N2 bash
salloc: Granted job allocation 849471
$ sbcast a.out /tmp/joe.a.out
$ srun /tmp/joe.a.out
Result is 3.14159
$ srun rm /tmp/joe.a.out
$ exit
salloc: Relinquishing job allocation 849471
In this example, we submit a batch job, get its status, and cancel it.
[user@login1 ~]$ sbatch test
sbatch: Submitted batch job 849473
[user@login1 ~]$ squeue -u user
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
849473 batch test user R 00:00 1 comp09
[user@login1 ~]$ scancel 849473
[user@login1 ~]$ squeue -u user
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
Best Practices, Large Job Counts¶
Consider putting related work into a single Slurm job with multiple job steps both for performance reasons and ease of management. Each Slurm job can contain a multitude of job steps and the overhead in Slurm for managing job steps is much lower than that of individual jobs.
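A sketch of this pattern, with placeholder program names: one job whose work is divided into several job steps launched with srun:
#!/bin/bash
#SBATCH --ntasks=12
#SBATCH --time=30:00
# Each srun call starts one job step inside the single allocation;
# managing steps costs Slurm far less than managing separate jobs.
srun -n4 ./preprocess
srun -n12 ./simulate
srun -n1 ./collect_results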
Job arrays are an efficient mechanism of managing a collection of batch jobs with identical resource requirements. Most Slurm commands can manage job arrays either as individual elements (tasks) or as a single entity (e.g. delete an entire job array in a single command).
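For example, 501 identical tasks can be submitted as one array and cancelled with a single command (the script name and job ID are illustrative):
sbatch --array=0-500 my.script   # submit array tasks 0 through 500 as one job
scancel 849110                   # cancel the entire array at once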
Job Syntax¶
The script consists of key expressions for Slurm (lines starting with #SBATCH) and commands, which will be interpreted.
At the beginning of the script, you need to specify the job’s resources.
Lines with Slurm keywords should not be split by lines that do not contain them.
Keywords are followed by commands for job execution. You will usually use ‘srun’ or ‘mpirun’ to run your parallel program. You can also use shell commands inside your script.
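A minimal sketch of such a script using srun (the partition name is taken from the table below; the application path is a placeholder):
#!/bin/bash
#SBATCH --partition=short
#SBATCH --job-name=example
#SBATCH --ntasks=12
#SBATCH --time=10:00
srun /path/to/your/executable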
MPI¶
MPI use depends upon the type of MPI being used. There are three fundamentally different modes of operation used by the various MPI implementations.
Slurm directly launches the tasks and performs initialization of communications through the PMI2 or PMIx APIs. (Supported by most modern MPI implementations.)
Slurm creates a resource allocation for the job and then mpirun launches tasks using Slurm’s infrastructure (older versions of OpenMPI).
Slurm creates a resource allocation for the job and then mpirun launches tasks using some mechanism other than Slurm, such as SSH or RSH. These tasks are initiated outside of Slurm's monitoring or control. Slurm's epilog should be configured to purge these tasks when the job's allocation is relinquished. The use of pam_slurm_adopt is also strongly recommended.
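For the first mode, tasks are launched directly with srun; whether --mpi=pmix or --mpi=pmi2 is available depends on how Slurm was built on the cluster:
srun --mpi=pmix -n24 /path/to/your/executable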
Script example (OpenMPI):
#!/bin/bash
#SBATCH --partition=short
#SBATCH --job-name=example
#SBATCH --nodes=2
#SBATCH --output=out.txt
#SBATCH --error=err.txt
#SBATCH --ntasks=24
#SBATCH --time=10:00
#SBATCH --ntasks-per-node=12
mpirun /path/to/your/executable
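Assuming the script is saved as job.sh (the name is illustrative), submit it with sbatch and monitor it with squeue:
sbatch job.sh
squeue -u $USER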
The following table shows available partitions and their parameters:
partition | max_node per job | max. walltime (HH:MM) | priority
---|---|---|---
short | 6 | 24:00 | 10
long | 4 | 72:00 | 10
devel | unlimited | 12:00 | 20
urgent | unlimited | infinite | 30
User Support¶
If you encounter any problems or need support, you can create a ticket at https://register.sivvp.sk/en/support or email us at hpcsupport@savba.sk.