Usually, jobs on the cluster are started by submitting a job or batch script to one of the partitions (queues). Slurm will then take care of reserving the correct amount of resources and starting the application on the reserved nodes. A job script, in the case of Slurm, is a bash script and can be written locally (using your favorite plain text editor) or directly in a console on the cluster (using Vim, Emacs, nano, ...). A typical example script named job.sh is given below:
#!/bin/bash

#SBATCH --nodes=1                  # the number of nodes you want to reserve
#SBATCH --ntasks-per-node=1        # the number of tasks/processes per node
#SBATCH --cpus-per-task=36         # the number of CPUs per task
#SBATCH --partition=normal         # on which partition to submit the job
#SBATCH --time=24:00:00            # the max wallclock time (time limit your job will run)
#SBATCH --job-name=MyJob123        # the name of your job
#SBATCH --mail-type=ALL            # receive an email when your job starts, finishes normally or is aborted
#SBATCH --mail-user=your_account@uni-muenster.de   # your mail address

# LOAD MODULES HERE IF REQUIRED
...

# START THE APPLICATION
...
The first line, #!/bin/bash, tells the system to run the script with bash. Lines starting with #SBATCH are Slurm directives and configure the job. Everywhere else the # sign introduces a comment.
You can submit your script to the batch system with the command:

sbatch job.sh
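On a typical Slurm installation, sbatch replies with the ID of the newly created job, which you can then use to monitor or cancel it. A minimal sketch of the usual follow-up commands (the job ID 1234567 is only a placeholder):

squeue -u $USER    # list your pending and running jobs
scancel 1234567    # cancel the job with ID 1234567 if necessary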
MPI parallel Jobs
Start an MPI job with 72 MPI ranks distributed across 2 nodes for 1 hour on the normal partition. Instead of mpirun, the preferred command to start MPI jobs within Slurm is srun.
#!/bin/bash

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=36
#SBATCH --partition=normal
#SBATCH --time=01:00:00
#SBATCH --job-name=MyMPIJob123
#SBATCH --output=output.dat
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de

# load needed modules
module load intel

# Previously needed for Intel MPI (as we do here) - not needed for OpenMPI
# export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

# run the application
srun /path/to/my/mpi/program
Note that srun starts as many tasks as you requested via --nodes and --ntasks-per-node (72 in this example). It is essentially a substitute for mpirun, so know what you are doing when you use it!
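If you are unsure how many tasks srun will launch, a simple sanity check is to let it run hostname instead of your application. This is only a sketch with a short time limit; the resource request mirrors the example above:

#!/bin/bash

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=36
#SBATCH --partition=normal
#SBATCH --time=00:05:00
#SBATCH --job-name=CheckTasks

# prints one line per task, showing how the 72 tasks are distributed over the nodes
srun hostname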
OpenMP parallel Jobs
Start a job with a single task on 36 CPUs (one OpenMP thread per CPU) for 1 hour on the normal partition.
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --cpus-per-task=36
#SBATCH --partition=normal
#SBATCH --time=01:00:00
#SBATCH --job-name=MyOpenMPJob123
#SBATCH --output=output.dat
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de

# Bind each thread to one core
export OMP_PROC_BIND=TRUE
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# load needed modules
module load intel

# run the application
/path/to/my/openmp/program
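If you want to verify that the thread count and binding are picked up as intended, the OpenMP runtime can print its settings at program start. OMP_DISPLAY_ENV is a standard OpenMP environment variable; adding it to the script above is optional:

# optionally print the OpenMP runtime settings (thread count, binding) at startup
export OMP_DISPLAY_ENV=TRUE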
Hybrid MPI/OpenMP Jobs
Start a job on 2 nodes, 9 MPI tasks per node, 4 OpenMP threads per task.
#!/bin/bash

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=9
#SBATCH --cpus-per-task=4          # reserve 4 CPUs per MPI task for the OpenMP threads
#SBATCH --partition=normal
#SBATCH --time=01:00:00
#SBATCH --job-name=MyHybridJob123
#SBATCH --output=output.dat
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de

# one OpenMP thread per reserved CPU
export OMP_NUM_THREADS=4

# load needed modules
module load intel

# run the application
srun /path/to/my/hybrid/program
Hybrid MPI/OpenMP/CUDA Jobs
Start a job on 2 nodes with 2 MPI tasks per node, 4 OpenMP threads per task and 2 GPUs per node:
#!/bin/bash

#SBATCH --partition=gpu2080
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=4          # reserve 4 CPUs per MPI task for the OpenMP threads
#SBATCH --gres=gpu:2               # 2 GPUs per node
#SBATCH --job-name=MyMPIOpenMPCUDAJob
#SBATCH --output=output.dat
#SBATCH --error=error.dat
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de

export OMP_NUM_THREADS=4

# load needed modules
module purge
ml palma/2022a
ml CUDA/11.7.0
ml foss/2022a
ml UCX-CUDA/1.12.1-CUDA-11.7.0
ml CMake/3.23.1

# Use UCX to be compatible with Nvidia (formerly Mellanox) Infiniband adapters
export OMPI_MCA_pml=ucx

# run the application using mpirun in this case
mpirun /path/to/my/hybrid/program
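Which GPUs end up visible inside the job can depend on the cluster configuration, so an optional sanity check before starting the application can be helpful; the following lines could be added to the script above:

echo "CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES"   # GPUs assigned to the job on the batch node
srun --ntasks-per-node=1 nvidia-smi                  # list the GPUs visible on each allocated node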
Interactive Jobs
You can also request resources from Slurm interactively: it will allocate the resources and give you an interactive shell. On the login node, type the following into your shell:
salloc --nodes 1 --cpus-per-task 36 -t 00:30:00 --partition express
This will give you a session with 36 CPUs for 30 minutes on the express partition. You will automatically be forwarded to a compute node.
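Inside the interactive session you can work as in a normal shell, for example load modules and start your program; the module and program names below are only placeholders:

module load intel
srun ./my_program    # start the program on the allocated resources (use srun for MPI programs)
exit                 # leave the session and release the allocation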
Flexible Submission Script
If you want to change parameters without actually editing your script, you can pass command line arguments to sbatch; they override the corresponding #SBATCH directives in the script:
sbatch --cpus-per-task 16 submit_script.sh
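Several options can be combined on the command line; the values below are only examples:

sbatch --cpus-per-task=16 --time=02:00:00 --partition=express submit_script.sh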