Running independent tasks with jobber
The examples on this page show how to use the RRZ tool jobber in batch jobs. (The jobber page should be read first.) The main idea of jobber is to enable filling compute nodes with single-core tasks:
- --nodes should always be 1.
- --ntasks-per-node is the number of physical cores to be used (on Hummel-2 typically a multiple of 8).
- If more parallelism is needed, the same jobber batch job can be submitted several times, i.e. the number of nodes used for processing a task list is given by the number of batch jobs initially submitted (in contrast to running a single job on several nodes). This possibility is a prominent feature of jobber.
 
There are three examples:
- Executing all tasks in a single job: jobber-all-tasks.sh
- Executing a fixed number of tasks: jobber-n-tasks.sh
- Using jobber in a job chain: jobber-job-chain.sh
In order to try the examples a task list needs to be generated first. In all examples the task list is called task.list. For testing a list of sleep tasks can be created this way:
shell$ { for i in $(seq 1 100); do echo "sleep 10; echo '--task-$i--'"; done } > task.list
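Before initialising the list it can help to confirm that it contains what is expected. The following is a minimal sketch that builds the same list with the task count and sleep time as variables (the names n, secs and n_lines are illustrative, not part of jobber) and then inspects the result. It assumes, as the one-liner above suggests, that each line of the task list holds one task.

```shell
#!/bin/bash
# Same test list as above, with the number of tasks and the sleep
# time as variables (illustrative names, not part of jobber).
n=100        # number of tasks
secs=10      # duration of each sleep task in seconds

for i in $(seq 1 "$n"); do
    echo "sleep $secs; echo '--task-$i--'"
done > task.list

# Sanity checks: count the lines and show the first and last task.
n_lines=$(wc -l < task.list | tr -d ' ')
echo "lines in task.list: $n_lines"
head -n 1 task.list
tail -n 1 task.list
```

With the values shown, the first line of task.list is `sleep 10; echo '--task-1--'` and the file has 100 lines.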
Afterwards each example can be run in batch mode by entering these commands:
shell$ module load jobber
shell$ jobber task.list cleanup init
shell$ sbatch jobber-example.sh
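As noted at the top of this page, the same jobber batch job can be submitted several times so that several nodes work on one task list. The sketch below shows one way to do that in a loop. It assumes that the task list has already been initialised as shown above. The variables n_jobs and DRY_RUN are hypothetical helpers introduced for illustration only. By default the script prints the sbatch commands instead of running them, so it can be tried on a machine without Slurm.

```shell
#!/bin/bash
# Dry run by default: print the sbatch commands instead of submitting.
# Set DRY_RUN=0 on a system where sbatch is available to really submit.
n_jobs=3                         # assumed number of nodes to use
job_script=jobber-all-tasks.sh   # any of the example scripts

submitted=0
for i in $(seq 1 "$n_jobs"); do
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "sbatch $job_script"
    else
        sbatch "$job_script"
    fi
    submitted=$((submitted + 1))
done
echo "submitted: $submitted"
```

Each submitted job requests one node, so three submissions correspond to up to three nodes processing the list, depending on when the batch system starts the jobs.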
Executing all tasks in a single job
This example shows the simplest way of using jobber in a batch job: all tasks specified in the task list are executed in a single job.
/sw/batch/examples/jobber/jobber-all-tasks.sh

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=16
    #SBATCH --time=00:05:00
    #SBATCH --export=NONE

    source /sw/batch/init.sh

    module load jobber

    jobber -p $SLURM_NTASKS_PER_NODE task.list all

    exit
Executing a fixed number of tasks
In many cases it is impossible to execute all tasks from the task list in a single job because this would exceed the batch job's time limit. In such cases the total number of tasks to be executed can be specified. If execution times per task are similar and tasks are executed in parallel, it is more natural to specify the number of tasks per parallel slot, because the run time of the batch job is then approximately the number of tasks per slot times the execution time per task. In the example the variable n_tasks_per_parallel_slot contains the number of tasks per parallel slot.
/sw/batch/examples/jobber/jobber-n-tasks.sh

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=16
    #SBATCH --time=00:01:00
    #SBATCH --export=NONE

    source /sw/batch/init.sh

    module load jobber

    n_tasks_per_parallel_slot=2

    parallel_slots=$SLURM_NTASKS_PER_NODE
    n_tasks=$(($parallel_slots * $n_tasks_per_parallel_slot))

    jobber -p $parallel_slots task.list $n_tasks

    exit
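The run-time estimate described above (tasks per slot times execution time per task) can be worked through with plain shell arithmetic. The sketch below uses the values of the example (16 slots, 2 tasks per slot, 10 s sleep tasks). The variables task_time_s, time_budget_s and max_per_slot are assumptions added for illustration. The 45 s budget stands for a 60 s time limit minus an arbitrarily chosen safety margin.

```shell
#!/bin/bash
# Illustrative values; adjust to the real workload.
parallel_slots=16             # matches --ntasks-per-node=16
n_tasks_per_parallel_slot=2   # as in jobber-n-tasks.sh
task_time_s=10                # assumed run time of one task (sleep 10)

# Forward estimate: total tasks per job and approximate run time.
n_tasks=$((parallel_slots * n_tasks_per_parallel_slot))
est_runtime_s=$((n_tasks_per_parallel_slot * task_time_s))
echo "tasks per job: $n_tasks"
echo "estimated run time: ${est_runtime_s}s"

# Inverse: largest number of tasks per slot that fits a time budget.
time_budget_s=45              # assumed: 60 s limit minus a safety margin
max_per_slot=$((time_budget_s / task_time_s))
echo "max tasks per slot: $max_per_slot"
```

With these values one job processes 32 tasks in roughly 20 s, and up to 4 tasks per slot would fit into the assumed budget. The estimate only holds if task run times are similar, as stated above.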
Using jobber in a job chain
A batch job chain is a batch job that submits itself again before it ends, unless a stop condition is met. Effectively, submitting the first job on the command line starts a whole sequence of jobs, because subsequent jobs are started automatically. In conjunction with jobber this makes it possible to launch the execution of a long task list with a single submit command. In the example the -e/--endtime option is used to decide when to stop executing tasks (see also the jobber time-limit example). The end time is obtained from the batch system's squeue command.
Some care must be taken with job chains. In particular, an endless chain must be avoided:
- A job chain script should stop immediately if an error occurs (in order to not submit the next job, which is expected to run into the same problem again). This is achieved by setting the -eu flags of the shell. (set -x, which is set in addition, helps to trace back problems.)
- The stopping mechanism must be robust. The more action of jobber is provided for that purpose.
- If re-submissions should go out of control, remember that a chain which is implemented like jobber-job-chain.sh can be stopped by renaming the self-submitting script!
/sw/batch/examples/jobber/jobber-job-chain.sh

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=16
    #SBATCH --time=00:01:00
    #SBATCH --export=NONE

    source /sw/batch/init.sh

    module load jobber

    set -eux

    task_list=task.list
    this_file=jobber-job-chain.sh

    end_time=$(squeue -h -j $SLURM_JOB_ID -O EndTime)

    jobber -p $SLURM_NTASKS_PER_NODE -e "$end_time" "$task_list" all

    jobber "$task_list" more && sbatch "$this_file"

    exit
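The resubmission line of the chain script runs sbatch only if the preceding jobber call with the more action succeeds. The toy simulation below mimics that stop logic without jobber or Slurm. All names in it (remaining, per_job, has_more, resubmit, run_job) are hypothetical stand-ins. It also assumes that the more action returns a zero exit status while unprocessed tasks remain, which is inferred from the use of && in the script rather than taken from the jobber documentation.

```shell
#!/bin/bash
# Stand-ins (hypothetical): 'remaining' plays the role of the task list,
# 'has_more' the role of: jobber "$task_list" more
# 'resubmit' the role of: sbatch "$this_file"
remaining=5    # tasks still to be processed
per_job=2      # tasks one job manages before its end time
jobs_run=0

has_more() { [ "$remaining" -gt 0 ]; }
resubmit() { run_job; }

run_job() {
    jobs_run=$((jobs_run + 1))
    # Process at most per_job tasks in this job.
    done_now=$(( remaining < per_job ? remaining : per_job ))
    remaining=$((remaining - done_now))
    echo "job $jobs_run: processed $done_now, remaining $remaining"
    # Resubmit only while tasks are left; otherwise the chain ends here.
    has_more && resubmit
    return 0
}

run_job
echo "chain ended after $jobs_run jobs"
```

In this simulation five tasks at two per job give a chain of three jobs, after which has_more fails and no further job is started. In the real script the same short-circuit behaviour of && keeps sbatch from being called once the task list is exhausted.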