The batch or job scheduling system on PALMA-II is called SLURM. If you are used to PBS/Maui and want to switch to SLURM, this document might help you. The job scheduler is used to start and manage computations on the cluster and to distribute resources among all users according to their needs. Computation jobs (but also interactive sessions) can be submitted to different queues (called partitions in SLURM terminology), which serve different purposes; the available partitions are listed in the tables below.
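Jobs are submitted to a partition with sbatch. The following is a minimal sketch of such a batch script; the partition choice, resource values, job name, and program are placeholders, not site defaults:

```bash
#!/bin/bash
#SBATCH --partition=normal        # target partition, see the tables below
#SBATCH --nodes=1                 # number of nodes
#SBATCH --ntasks-per-node=36      # up to 36 CPU threads per node on the Skylake partitions
#SBATCH --time=02:00:00           # requested walltime, must stay below the partition limit
#SBATCH --job-name=example_job    # placeholder job name
#SBATCH --output=output.%j.log    # placeholder output file (%j expands to the job ID)

# Placeholder workload: start the actual computation here.
srun ./my_program
```

Such a script is submitted with `sbatch jobscript.sh` and waits in the queue until the requested resources become available.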

Partitions

Available for everyone:

| Name | Purpose | CPU Arch | # Nodes | # GPUs / node | Compute capability of GPU | max. CPUs (threads) / node | max. Mem / node | max. Walltime | BeeOND storage |
|---|---|---|---|---|---|---|---|---|---|
| normal | general computations | Skylake (Gold 6140) | 143 / 160 | -- | -- | 36 | 92 GB / 192 GB | 24 hours | 350 GB |
| long | general computations | Skylake (Gold 6140) |  | -- | -- | 36 | 92 GB / 192 GB | 7 days | 350 GB |
| express | short running (test) jobs, compilation | Skylake (Gold 6140) | 5 | -- | -- | 36 | 92 GB | 2 hours | 350 GB |
| bigsmp | SMP | Skylake (Gold 6140) | 3 | -- | -- | 72 | 1.5 TB | 7 days | 350 GB |
| largesmp | SMP | Skylake (Gold 6140) | 2 | -- | -- | 72 | 3 TB | 7 days | 350 GB |
| requeue* | This queue will use the free nodes from the group exclusive nodes listed below. | Skylake (Gold 6140) | 68 / 50 / 3 | -- | -- | 36 / 36 / 72 | 92 GB / 192 GB / 1.5 TB | 1 day | 350 GB |
| gpuv100 | Nvidia V100 GPUs | Skylake (Gold 6140) | 1 | 4 | 7.0 | 24 | 192 GB | 7 days | 930 GB |
| vis-gpu | Nvidia Titan XP | Skylake (Gold 6140) | 1 | 8 | 6.1 | 24 | 192 GB | 2 days | -- |
| vis | Visualization / GUIs | Skylake (Gold 6140) | 1 | -- | -- | 36 | 92 GB | 2 hours | -- |
| broadwell | Legacy Broadwell CPUs | Broadwell (E5-2683 v4) | 44 | -- | -- | 32 | 118 GB | 7 days | 168 GB |
| zen2-128C-496G | SMP | Zen2 (EPYC 7742) | 12 | -- | -- | 128 | 496 GB | 7 days | 1.8 TB |
| gpu2080 | GeForce RTX 2080 Ti | Zen3 (EPYC 7513) | 5 | 8 | 7.5 | 32 | 240 GB | 7 days | 930 GB |
| gpuexpress | GeForce RTX 2080 Ti | Zen3 (EPYC 7513) | 1 | 8 | 7.5 | 32 | 240 GB | 2 hours | 930 GB |
| gputitanrtx | Nvidia Titan RTX | Zen3 (EPYC 7343) | 1 | 4 | 7.5 | 32 | 240 GB | 7 days | 1.4 TB |
| gpu3090 | GeForce RTX 3090 | Zen3 (EPYC 7413) | 2 | 8 | 8.6 | 48 | 240 GB | 7 days | -- |
| gpua100 | Nvidia A100 | Zen3 (EPYC 7513) | 5 | 4 | 8.0 | 32 | 240 GB | 7 days | 930 GB |
| gpuhgx | Nvidia A100 SXM 80GB | Zen3 (EPYC 7343) | 2 | 8 | 8.0 | 64 | 990 GB | 7 days | 7 TB |
gpuexpress

You can allocate a maximum of one job with 2 GPUs, 8 CPU cores, and 60 GB of RAM on this node.
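Within these limits, a GPU job on gpuexpress could be requested as in the following sketch (the executable is a placeholder):

```bash
#!/bin/bash
#SBATCH --partition=gpuexpress
#SBATCH --gres=gpu:2           # at most 2 GPUs per job on this partition
#SBATCH --cpus-per-task=8      # at most 8 CPU cores
#SBATCH --mem=60G              # at most 60 GB of RAM
#SBATCH --time=01:00:00        # partition walltime limit is 2 hours

srun ./my_gpu_program          # placeholder executable
```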

requeue*

If your job is running on one of the requeue nodes while that node is requested by one of the exclusive group partitions, your job will be terminated and resubmitted, so use this partition with care!
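Because jobs on this partition can be terminated and resubmitted at any time, the job script should be able to cope with a restart. The following is only a rough sketch under the assumption that your program can resume from its own checkpoint file; checkpointing itself is not provided by the partition:

```bash
#!/bin/bash
#SBATCH --partition=requeue
#SBATCH --requeue                 # allow Slurm to put the job back into the queue
#SBATCH --time=24:00:00           # partition walltime limit is 1 day

# SLURM_RESTART_COUNT is set by Slurm once a job has been requeued at least once.
if [ "${SLURM_RESTART_COUNT:-0}" -gt 0 ]; then
    # Placeholder: resume from a checkpoint written by a previous run.
    srun ./my_program --resume checkpoint.dat
else
    srun ./my_program
fi
```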

Group exclusive:

| Name | # Nodes | max. CPUs (threads) / node | max. Mem / node | max. Walltime |
|---|---|---|---|---|
| p0fuchs | 9 | 36 | 92 GB | 7 days |
| p0kulesz | 6 / 3 | 36 | 92 GB / 192 GB | 7 days |
| p0kapp | 1 | 36 | 92 GB | 7 days |
| p0klasen | 1 / 1 | 36 | 92 GB / 192 GB | 7 days |
| hims | 25 / 1 | 36 | 92 GB / 192 GB | 7 days |
| d0ow | 1 | 36 | 92 GB | 7 days |
| q0heuer | 15 | 36 | 92 GB | 7 days |
| e0mi | 2 | 36 | 192 GB | 7 days |
| e0bm | 1 | 36 | 192 GB | 7 days |
| p0rohlfi | 7 / 8 | 36 | 92 GB / 192 GB | 7 days |
| SFB858 | 3 | 72 | 1.5 TB | 21 days |
The partitions listed above and their resources may change from time to time. To get up-to-date information about the partitions, use `scontrol show partition`. Further details can be shown with `sinfo -p <partitionname>` or simply `sinfo`.
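For example (the output depends on the current cluster configuration and is therefore not shown here):

```bash
# limits and current state of a single partition
scontrol show partition express

# node overview of one partition, or of all partitions
sinfo -p express
sinfo
```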