Jobs on the cluster are usually started by submitting a job script (batch script) to one of the partitions (queues). Slurm then reserves the requested resources and starts the application on the reserved nodes. A Slurm job script is a bash script; you can write it locally with your favorite plain-text editor or directly in the console on the cluster (using Vim, Emacs, nano, ...). A typical example script named job.sh is given below:


Example Job Script
#!/bin/bash

#SBATCH --nodes=1                # the number of nodes you want to reserve
#SBATCH --ntasks-per-node=1      # the number of tasks/processes per node
#SBATCH --cpus-per-task=36       # the number of CPUs per task
#SBATCH --partition=normal       # the partition to submit the job to
#SBATCH --time=24:00:00          # the maximum wall-clock time (time limit of your job)

#SBATCH --job-name=MyJob123      # the name of your job
#SBATCH --mail-type=ALL          # receive an email when your job starts, finishes normally or is aborted
#SBATCH --mail-user=your_account@uni-muenster.de # your mail address

# LOAD MODULES HERE IF REQUIRED
...
# START THE APPLICATION
...

The line #!/bin/bash tells the system to run the script with bash. Lines starting with #SBATCH are Slurm directives and configure the job. Everywhere else, the # sign starts a comment.
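Slurm only reads #SBATCH lines that appear before the first executable command of the script; a directive placed further down is treated as an ordinary comment. The following is a minimal sketch for checking which directives of a script would take effect. The function name list_directives is hypothetical, and the awk filter only approximates how Slurm parses the script.

```shell
#!/bin/bash

# list_directives (hypothetical helper): print the #SBATCH lines that appear
# before the first executable command of a job script.
list_directives() {
    awk '
        /^#SBATCH/               { print; next }  # directive: keep it
        /^#/ || /^[[:space:]]*$/ { next }         # other comment or blank line: skip
                                 { exit }         # first command: stop reading
    ' "$1"
}

# Example: list_directives job.sh
```

If a directive is missing from the output, move it above the first module or application command.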


You can submit your script to the batch system with the command: sbatch job.sh
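On success, sbatch prints a line of the form "Submitted batch job <id>". The sketch below extracts that ID so it can be passed on to squeue or scancel. The helper name parse_job_id is hypothetical; the Slurm calls are shown as comments only, so the snippet has no side effects.

```shell
#!/bin/bash

# parse_job_id (hypothetical helper): extract the numeric job ID from the
# "Submitted batch job <id>" line that sbatch prints on success.
# Returns a non-zero status if the line does not have that form.
parse_job_id() {
    local line=$1
    if [[ $line =~ ^Submitted\ batch\ job\ ([0-9]+) ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

# Intended use on the cluster:
#   jobid=$(parse_job_id "$(sbatch job.sh)")
#   squeue -j "$jobid"     # show the state of this job
#   scancel "$jobid"       # cancel the job if something went wrong
```

Alternatively, sbatch --parsable job.sh prints the job ID without the surrounding text.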

MPI parallel Jobs

Start an MPI job with 72 MPI ranks distributed over 2 nodes for 1 hour on the normal partition. Within Slurm, srun is preferred over mpirun for starting MPI jobs.

MPI Job Script
#!/bin/bash

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=36
#SBATCH --partition=normal
#SBATCH --time=01:00:00

#SBATCH --job-name=MyMPIJob123
#SBATCH --output=output.dat
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de

# load needed modules
module load intel

# Previously needed for Intel MPI (which we use here); not needed for OpenMPI
# export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

# run the application
srun /path/to/my/mpi/program

Note that srun here starts as many tasks on each node as you requested with --ntasks-per-node. It is essentially a substitute for mpirun. Make sure you know what you are doing when you use it!
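To double-check the layout before the application starts, a job script can compute the expected number of ranks from the environment variables Slurm sets inside a job. This is a minimal sketch: expected_ranks is a hypothetical helper, and it assumes the job was requested with --nodes and --ntasks-per-node as in the script above.

```shell
#!/bin/bash

# expected_ranks (hypothetical helper): number of MPI ranks expected for a job
# requested with --nodes and --ntasks-per-node.
# Falls back to 1 when a variable is unset, e.g. outside of a Slurm job.
expected_ranks() {
    local nodes=${SLURM_JOB_NUM_NODES:-1}
    local per_node=${SLURM_NTASKS_PER_NODE:-1}
    echo $(( nodes * per_node ))
}

echo "Expecting $(expected_ranks) MPI rank(s)"
```

Placed before the srun line, this writes the expected rank count to the job output, which helps when comparing it with what the application itself reports.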

OpenMP parallel Jobs

Start a job with one task running 36 OpenMP threads (one thread per CPU core) on a single node for 1 hour on the normal partition.

OpenMP Job Script
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --cpus-per-task=36
#SBATCH --partition=normal
#SBATCH --time=01:00:00

#SBATCH --job-name=MyOpenMPJob123
#SBATCH --output=output.dat
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de

# Bind each thread to one core
export OMP_PROC_BIND=TRUE
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# load needed modules
module load intel

# run the application
/path/to/my/openmp/program
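Slurm only sets SLURM_CPUS_PER_TASK when --cpus-per-task was requested. If the script is reused without that option, OMP_NUM_THREADS would be exported as an empty string. A minimal sketch of a safer assignment follows; the function name set_omp_threads is hypothetical.

```shell
#!/bin/bash

# set_omp_threads (hypothetical helper): export OMP_NUM_THREADS from the
# Slurm allocation, falling back to a single thread when
# SLURM_CPUS_PER_TASK is unset (no --cpus-per-task, or not inside a job).
set_omp_threads() {
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
}

set_omp_threads
echo "Running with $OMP_NUM_THREADS OpenMP thread(s)"
```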

Hybrid MPI/OpenMP Jobs

Start a job on 2 nodes, 9 MPI tasks per node, 4 OpenMP threads per task.

Hybrid MPI/OpenMP Job Script
#!/bin/bash

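#SBATCH --nodes=2
#SBATCH --ntasks-per-node=9
#SBATCH --cpus-per-task=4
#SBATCH --partition=normal
#SBATCH --time=01:00:00

#SBATCH --job-name=MyHybridJob123
#SBATCH --output=output.dat
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de

# One OpenMP thread per reserved CPU of each task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Slurm 22.05 and newer do not pass --cpus-per-task on to srun automatically
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK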

# load needed modules
module load intel

# run the application
srun /path/to/my/hybrid/program
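For hybrid jobs, the product of tasks per node and threads per task should not exceed the number of cores on a node; here 9 tasks with 4 threads each need 36 cores. The check can be sketched as follows. The helper name fits_on_node is hypothetical, and the value of 36 cores per node is an assumption taken from the single-node examples above.

```shell
#!/bin/bash

# fits_on_node (hypothetical helper): succeed if
# tasks_per_node * threads_per_task does not exceed cores_per_node.
fits_on_node() {
    local tasks_per_node=$1 threads_per_task=$2 cores_per_node=$3
    (( tasks_per_node * threads_per_task <= cores_per_node ))
}

# 36 cores per node is an assumption; adjust it to the partition you use.
if fits_on_node 9 4 36; then
    echo "Layout fits on one node"
else
    echo "Layout oversubscribes the node"
fi
```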

GPU Jobs in the gpu2080 Partition

You can use the following submit script for the gpu2080 partition (assuming that your code runs on a single GPU). The nodes have 32 CPU cores and 8 GPUs, so please do not use more than 4 cores per reserved GPU. The same applies to memory: each node has 240 GB of usable RAM, so do not use more than 30 GB per reserved GPU. Adjust the reserved time to your needs.

GPU jobs in gpu2080 partition
#!/bin/bash

#SBATCH --partition=gpu2080
#SBATCH --nodes=1
#SBATCH --mem=30G
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:1
#SBATCH --time=0-01:00:00
#SBATCH --job-name=MyCUDAJob
#SBATCH --output=output.dat
#SBATCH --error=error.dat 
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de

# load needed modules
module purge
ml palma/2023a
ml CUDA
ml foss
ml UCX-CUDA
ml CMake

# Use UCX to be compatible with NVIDIA (formerly Mellanox) InfiniBand adapters
export OMPI_MCA_pml=ucx

# run the application (a single process, so neither mpirun nor srun is needed)
./my_program
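The per-GPU limits stated above (4 cores and 30 GB per GPU) scale linearly with the number of GPUs reserved on a node. A minimal sketch that derives matching options is given below; gpu_request is a hypothetical helper, and it assumes a single task that uses all reserved cores.

```shell
#!/bin/bash

# gpu_request (hypothetical helper): print sbatch options that stay within
# the per-GPU limits of the gpu2080 nodes for the given number of GPUs.
gpu_request() {
    local gpus=$1
    local cores_per_gpu=4     # 32 cores / 8 GPUs per node
    local mem_per_gpu_gb=30   # 240 GB / 8 GPUs per node
    echo "--gres=gpu:${gpus} --cpus-per-task=$(( gpus * cores_per_gpu )) --mem=$(( gpus * mem_per_gpu_gb ))G"
}

gpu_request 2

# Intended use on the cluster:
#   sbatch $(gpu_request 2) job.sh
```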

Hybrid MPI/OpenMP/CUDA Jobs

Start a job on 2 nodes with 2 MPI tasks per node, 4 OpenMP threads per task, and 2 GPUs per node:

Hybrid MPI/OpenMP/CUDA Job Script
#!/bin/bash

#SBATCH --partition=gpu2080
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:2
#SBATCH --job-name=MyMPIOpenMPCUDAJob
#SBATCH --output=output.dat
#SBATCH --error=error.dat 
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de

export OMP_NUM_THREADS=4

# load needed modules
module purge
ml palma/2022a
ml CUDA/11.7.0
ml foss/2022a
ml UCX-CUDA/1.12.1-CUDA-11.7.0
ml CMake/3.23.1

# Use UCX to be compatible with NVIDIA (formerly Mellanox) InfiniBand adapters
export OMPI_MCA_pml=ucx

# run the application using mpirun in this case 
mpirun /path/to/my/hybrid/program

Interactive Jobs

You can also request resources from Slurm interactively; it will then give you an interactive shell on the allocated resources. On the login node, type the following into your shell:

Interactive Session
salloc --nodes 1 --cpus-per-task 36 -t 00:30:00 --partition express

This will give you a session with 36 CPUs for 30 minutes on the express partition. You will automatically be forwarded to a compute node.
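Slurm sets SLURM_JOB_ID in the shell of an allocation, so a quick check shows whether you are inside the interactive session or still on the login node. A minimal sketch, with in_slurm_job as a hypothetical helper name:

```shell
#!/bin/bash

# in_slurm_job (hypothetical helper): succeed if the current shell runs
# inside a Slurm allocation, i.e. SLURM_JOB_ID is set and non-empty.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
    echo "Inside Slurm job $SLURM_JOB_ID on $(hostname)"
else
    echo "Not inside a Slurm job (probably on a login node)"
fi
```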

Flexible Submission Script

If you want to change parameters without editing the script, you can pass them as command-line arguments to sbatch. They override the corresponding #SBATCH directives in the script:

Overriding Script Parameters
sbatch --cpus-per-task 16 submit_script.sh
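Anything that follows the script name on the sbatch command line is passed to the script itself as positional parameters, which allows the input to be changed in the same way. The sketch below assumes a script that takes an optional input file; the helper read_input and the file names input.dat and other.dat are hypothetical.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=36
#SBATCH --time=01:00:00

# read_input (hypothetical helper): use the first script argument as the
# input file, falling back to a default name when none is given.
read_input() {
    echo "${1:-input.dat}"
}

input=$(read_input "$@")
echo "Would process $input with ${SLURM_CPUS_PER_TASK:-1} CPU(s)"

# Example: sbatch --cpus-per-task 16 submit_script.sh other.dat
```

With the example call, the job runs with 16 CPUs per task instead of 36 and reads other.dat instead of the default file.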