Running batch jobs
Contents
- The concept of virtual nodes
- Job sizes
- Job runtimes
- Accounts
- File systems and log files
- Temporary/scratch files
- Job reports
The concept of virtual nodes
Hummel-2 can be viewed as a cluster consisting of virtual nodes (not to be confused with virtual machines). Each virtual node has 8 CPU cores which share the same L3 data cache. The batch system always allocates full virtual nodes. Besides 8 CPU cores each virtual node has in the
- std partition: 32 GB RAM,
- big partition: 96 GB RAM,
- gpu partition: 144 GB RAM and 1 NVIDIA H100 80 GB GPU.
Unfortunately, the batch system does not know the concept of a virtual node. Hence, the unit virtual node cannot be used to request resources from the batch system.
In practice the number of virtual nodes a batch job gets allocated is determined by the requested number of CPU cores or GPUs, respectively:
std partition
- For single-node jobs the number of virtual nodes is determined by the number of CPU cores requested: number of virtual nodes = ceiling(--ntasks × --cpus-per-task / 8); see the example script after this list.
- Full nodes (192 cores) can be requested.
- Multi-node jobs must use full nodes throughout (must specify --exclusive).
big partition
- Only full nodes (192 cores) can be requested.
gpu partition
- The number of virtual nodes is given by the --gpus parameter.
All partitions
- A memory parameter (like --mem) must never be used.
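As an illustration, the following batch script is a minimal sketch that requests 64 cores in the std partition, which corresponds to 8 virtual nodes. The account name, runtime, log path, and program are placeholders and must be adapted.

```bash
#!/bin/bash
#SBATCH --partition=std
#SBATCH --account=MyGroup_std         # placeholder account name (see "Accounts" below)
#SBATCH --ntasks=16                   # 16 tasks ...
#SBATCH --cpus-per-task=4             # ... with 4 cores each
#SBATCH --time=02:00:00               # placeholder runtime
#SBATCH --output=/path/to/job.%j.log  # placeholder: absolute path on a writable file system
# 16 x 4 = 64 cores -> ceiling(64 / 8) = 8 virtual nodes; note: no --mem parameter

srun ./my_program                     # placeholder executable
```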
Job sizes
std and big partitions
Hummel-2 is designed to run parallel programs. The goal is that each job uses at least half of the CPU cores requested or half of the RAM (implicitly) requested.
Single-core tasks should be packed and executed in parallel in order to achieve that goal. This process is called trivial parallelization. On Hummel-2 this can easily be accomplished (a generic sketch of the packing idea follows this list).
- The majority of jobs will fit into a single node.
- Multi-node jobs must use full nodes.
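A minimal, generic sketch of trivial parallelization within one single-node batch job, using plain shell job control; the account name, log path, task count, and executable are placeholders:

```bash
#!/bin/bash
#SBATCH --partition=std
#SBATCH --account=MyGroup_std          # placeholder account name
#SBATCH --ntasks=16                    # pack 16 single-core tasks into one job
#SBATCH --cpus-per-task=1
#SBATCH --time=04:00:00                # placeholder runtime
#SBATCH --output=/path/to/pack.%j.log  # placeholder: absolute path

# Run 16 independent single-core tasks concurrently on the allocated cores
# of the single node, then wait until all of them have finished.
for i in $(seq 1 16); do
    ./single_core_task "$i" &          # placeholder executable and argument
done
wait
```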
gpu partition
- Single-GPU jobs are expected to be dominant.
- Multi-GPU jobs are possible. A multi-GPU job will be scheduled onto a single node.
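A corresponding sketch for the gpu partition, where the number of virtual nodes follows directly from --gpus; the account name, runtime, log path, and program are placeholders:

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --account=MyGroup_gpu         # placeholder account name
#SBATCH --gpus=2                      # 2 GPUs -> 2 virtual nodes (16 cores, 288 GB RAM)
#SBATCH --time=08:00:00               # placeholder runtime
#SBATCH --output=/path/to/gpu.%j.log  # placeholder: absolute path

srun ./my_gpu_program                 # placeholder executable
```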
Job runtimes
Runtimes should be hours, not minutes. In order to obtain reasonable runtimes, short tasks can be packed in the same way as small tasks (see the packing sketch above).
Very long runtimes (days) should be avoided by splitting jobs whenever possible. One way of splitting is a chain of dependent jobs, as sketched below.
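A minimal sketch of such a job chain, assuming the program can checkpoint and restart itself; part.sh is a placeholder job script:

```bash
# Submit three shorter jobs that run one after another; each part starts only
# if the previous one finished successfully and resumes from its checkpoint.
jobid=$(sbatch --parsable part.sh)
jobid=$(sbatch --parsable --dependency=afterok:$jobid part.sh)
sbatch --dependency=afterok:$jobid part.sh
```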
Accounts
In order to achieve a fair distribution of computing time, the fairshare algorithm is employed, see:
In the fair tree, account subtrees were introduced per partition. As a consequence, users who are allowed to use more than one partition must specify --account in addition to --partition. The account names are:
- WorkingGroupName_std
- WorkingGroupName_big
- WorkingGroupName_gpu
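For example, a job for the big partition would be submitted along these lines (WorkingGroupName is the placeholder used above, job.sh a placeholder script):

```bash
# Submit to the big partition with the matching per-partition account.
sbatch --partition=big --account=WorkingGroupName_big job.sh
```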
File systems and log files
In batch jobs
- the
/home
file system is readonly, - the
/usw
file system is readonly.
Also log files cannot be written there. If you submit jobs from $HOME or $USW, a log filename including an absolute path must be specified with --output=, i.e. the first character following --output= must be a slash (/).
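For example (the directory is a placeholder for a writable location, job.sh a placeholder script, and %j expands to the job ID):

```bash
# When submitting from $HOME or $USW, point --output at an absolute path
# on a writable file system.
sbatch --output=/path/to/writable/dir/myjob.%j.log job.sh
```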
Temporary/scratch files
All directories mentioned in this section will be automatically created by the batch system and automatically deleted at job end.
Each job can use directories
/tmp
and/dev/shm
. Both are virtual file systems that are both kept in memory, i.e. a job can run out-of-memory if to much data is written there.$RRZ_GLOBAL_TMPDIR
is a directory in the/beegfs
file system.
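A sketch of how a job script might use it: many programs honor the TMPDIR environment variable, and my_program is a placeholder.

```bash
# Redirect temporary files to the BeeGFS scratch directory instead of the
# in-memory /tmp; the directory is deleted automatically at job end.
export TMPDIR="$RRZ_GLOBAL_TMPDIR"
srun ./my_program    # placeholder executable
```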
Job reports
Users should check whether their batch jobs use resources efficiently. This can be accomplished with the RRZ tool rrz-batch-jobreport.
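Independently of the RRZ tool, Slurm's own accounting command sacct can give a first impression of a job's resource use (1234567 is a placeholder job ID):

```bash
# Compare elapsed wall time, CPU time actually consumed, and peak memory
# of a finished job with what was requested.
sacct -j 1234567 --format=JobID,Elapsed,TotalCPU,MaxRSS,State
```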