RRZ tools
rrz-batch-jobreport
This page is under construction!
The command rrz-batch-jobreport displays information for a batch job with a given job ID:

rrz-batch-jobreport jobID
Resources shown are: CPU, memory, disk, network and possibly GPU and GPU-memory.
rrz-batch-jobreport can be called for completed as well as for running batch jobs. At the end of every batch job a job report is written to the job’s stderr file automatically. Users should check at least the job-report summaries regularly.
Motivation
Motivations for checking resource usage are
- to improve resource utilisation (in particular of CPUs, GPUs and RAM),
- to find performance bottlenecks (in particular for I/O).
Limitations
Job reports must not be confused with performance reports. Job reports only check utilisation, not performance. The goal is to achieve good performance. Good utilisation is a necessary but not a sufficient condition for good performance: performance can be low even if CPU or GPU utilisation looks good. For completeness it should be mentioned that even performance is not the ultimate criterion. The ultimate criterion is the shortest possible execution time (and, today, also low energy consumption). This requires considering algorithms in the first place: a shorter execution time might be achieved with a better algorithm even though its performance (measured in operations per second) is lower.
It is important to know that parallel performance (good scaling) is not guaranteed if all CPU cores are busy. Therefore, explicit scaling tests are mandatory.
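Such a scaling test can be evaluated with a few lines of code. The sketch below uses hypothetical timings and the standard definitions of speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p:

```python
# Evaluate a strong-scaling test (the timings are hypothetical example values).
# Speedup S(p) = T(1)/T(p); parallel efficiency E(p) = S(p)/p.

timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}  # cores -> runtime in seconds

t1 = timings[1]
for cores, t in sorted(timings.items()):
    speedup = t1 / t
    efficiency = speedup / cores
    print(f"{cores:2d} cores: speedup {speedup:5.2f}, efficiency {efficiency:4.0%}")
```

An efficiency that drops well below 100% while all cores report full utilisation is exactly the situation described above: busy cores, but poor scaling.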
Another limitation concerns the reproducibility of time measurements. Time measurements are only reasonably reproducible if the hardware is used exclusively, which, in general, is not the case on a cluster where hardware is shared with other users. Nonetheless, parts of the hardware can be considered to be provided non-shared.
Shared and non-shared resources
rrz-batch-jobreport collects data from the operating system on a per-node basis (in contrast to per-job). For jobs that use(d) full nodes, all resources of the nodes used are non-shared, i.e. all data displayed by rrz-batch-jobreport is data of the corresponding job. For smaller jobs, which share(d) a node with other jobs, it is important to keep in mind that some numbers shown by rrz-batch-jobreport do not apply to the job but rather to the whole node.
Shared resources
These resources are always shared (unless the whole cluster is used exclusively):
- network
- disks of distributed file systems
Non-shared resources
Typically, a node of a cluster is a non-shared resource, i.e. the following resources can be non-shared:
- CPU and memory (RAM)
- GPU and GPU-memory
- local disks (not available on Hummel-2)
Hummel-2 is configured in such a way that CPU, GPU and memory are practically non-shared. Technically this is achieved by not sharing the largest data caches and by putting batch jobs into cgroups.
Taking advantage of job reports
Job reports can help to improve machine utilisation in the following ways:
CPU and GPU utilisation. CPUs and GPUs are the most expensive components of an HPC system. Therefore, one should strive for high CPU and GPU utilisation. There are two main reasons for under-utilisation:
- some CPUs/GPUs are not used at all (typically as a consequence of a bad job specification)
- CPUs/GPUs are waiting for disk I/O operations (reasons can be that disks are heavily used by other users or that the program itself has an I/O bottleneck)
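For illustration, the utilisation figure behind the first point can be computed from total CPU time, elapsed time and core count. The numbers below are hypothetical; the actual report derives its figures from per-node operating-system data:

```python
# Sketch of a CPU-utilisation calculation (hypothetical example values).

elapsed_s = 3600.0    # wall-clock time of the job in seconds
cpu_time_s = 14400.0  # total CPU time summed over all processes of the job
cores = 8             # physical CPU cores allocated to the job

# Fraction of the available core-seconds that was actually used.
utilisation = cpu_time_s / (elapsed_s * cores)
print(f"CPU utilisation: {utilisation:.0%}")
```

A value well below 100%, as in this example, suggests that cores were idle or waiting, e.g. for I/O.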
Memory high water mark. Peak memory usage determines whether a program fits into the memory of a compute node. If this value is known, smaller compute nodes can be used, if available. The value can also be used to estimate the maximal problem size that would fit into a given type of node.
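The estimate mentioned above can be sketched in a few lines. The numbers are hypothetical, and the key assumption, namely that peak memory grows roughly linearly with problem size, must be checked for the application at hand:

```python
# Hypothetical extrapolation from a measured memory high water mark.
# Assumption (to be verified for the application!): peak memory grows
# roughly linearly with the problem size.

peak_gib = 0.9            # measured high water mark from the job report
problem_size = 1_000_000  # elements in the measured run (hypothetical)
node_gib = 31.3           # memory available per node (example value)
safety = 0.9              # leave ~10% headroom for the OS and buffers

max_size = int(problem_size * node_gib * safety / peak_gib)
print(f"largest problem size that should fit: ~{max_size:,} elements")
```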
Summary
The summary gives a quick overview and hints at actions to take. It appears at the end of the report because that position makes it easy to find. Examples for a CPU job and a GPU job:
Summary:
  Elapsed time:       7% (0.2 out of 3.0 h timelimit)
  CPU:              100% (8.0 out of 8 physical CPU cores)
  Max. main memory:   3% (0.9 out of 31.3 GiB min. available per node)
Summary:
  Elapsed time:      35% (0.4 out of 1.0 h timelimit)
  GPU:               70% (0.7 out of 1 GPUs)
  CPU:               12% (1.0 out of 8 physical CPU cores)
  Max. GPU memory:    3% (2.4 out of 79.7 GiB per GPU)
  Max. main memory:   2% (2.9 out of 141.1 GiB min. available per node)
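The summary is regular enough that the percentages can be extracted programmatically, e.g. for bookkeeping across many jobs. A minimal sketch, assuming the field layout of the examples above (the regular expression is an illustration, not part of rrz-batch-jobreport):

```python
import re

# Minimal sketch: extract the utilisation percentages from a job-report
# summary. The input follows the example layout shown above; the regular
# expression is an illustration and not part of rrz-batch-jobreport.
summary = """Summary:
  Elapsed time:      35% (0.4 out of 1.0 h timelimit)
  GPU:               70% (0.7 out of 1 GPUs)
  CPU:               12% (1.0 out of 8 physical CPU cores)
  Max. GPU memory:    3% (2.4 out of 79.7 GiB per GPU)
  Max. main memory:   2% (2.9 out of 141.1 GiB min. available per node)
"""

# Match "<field name>: <number>%"; field names contain letters, dots, spaces.
percentages = {key.strip(): int(value)
               for key, value in re.findall(r"([A-Za-z. ]+?):\s*(\d+)%", summary)}
print(percentages)
```

From such a dictionary, e.g. all jobs with a CPU or GPU utilisation below a chosen threshold can be flagged automatically.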