First test jobs
This page contains two batch job examples that can be used to get started with batch processing:
- First SLURM job, which can be run on any SLURM installation.
- First Hummel job, which has the characteristic properties of a batch job to be run on Hummel.
First SLURM job
This job is generic, i.e. it contains nothing that is specific to our cluster. It contains resource specifications (--ntasks and --time) and one command (echo).
first-slurm-job.sh:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=00:01:00

echo "Hello, world!"

exit
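On any SLURM installation this script can be handed over for execution with sbatch; a minimal session might look like this (the job ID in the reply is just an example):

shell$ sbatch first-slurm-job.sh
Submitted batch job 123455

Once the job has run, the greeting can be found in the log file slurm-123455.out (log files are explained below).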
First Hummel job
This job has all the characteristic properties of a batch job to be run on our cluster:
- First of all, it is a parallel job: it uses 2 nodes and all cores on each node. On Hummel all jobs must be parallel jobs. The smallest job size is 1 node, and at least 50% of the cores or 50% of the main memory shall be used.
- The recommended submit option --export=NONE is set.
- The recommended initialization (source /sw/batch/init.sh) is included.
- The srun command is used for demonstration purposes; see the sketch after this list. (OpenMP programs do not need srun. MPI programs shall be started with mpirun.)
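For illustration, here is a hedged sketch of the launch styles mentioned above, with made-up executable names (./any_app, ./omp_app and ./mpi_app are not part of this page):

# Hypothetical launch lines inside a batch script; the executables are assumptions.
srun ./any_app       # one instance per allocated task (as demonstrated below)
./omp_app            # OpenMP: started directly, the runtime spawns the threads
mpirun ./mpi_app     # MPI: started with mpirun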
first-hummel-job.sh:

#!/bin/bash
# Do not forget to select a proper partition if the default
# one is no fit for the job! You can do that either in the sbatch
# command line or here with the other settings.
#SBATCH --job-name=hello
#SBATCH --nodes=2
#SBATCH --tasks-per-node=16
#SBATCH --time=00:10:00
#SBATCH --export=NONE
# Never forget --export=NONE! Strange happenings ensue otherwise.

set -e # Good idea to stop operation on first error.

source /sw/batch/init.sh

# Load environment modules for your application here.

# Actual work starting here. You might need to call
# srun or mpirun depending on your type of application
# for proper parallel work.
# Example for a simple command (that might itself handle
# parallelisation).
echo "Hello World! I am $(hostname -s) greeting you!"
echo "Also, my current TMPDIR: $TMPDIR"

# Let's pretend our started processes are working on a
# predetermined parameter set, looking up their specific
# parameters using the set number and the process number
# inside the batch job.
export PARAMETER_SET=42

# Simplest way to run an identical command on all allocated
# cores on all allocated nodes. Use environment variables to
# tell apart the instances.
srun bash -c 'echo "process $SLURM_PROCID \
(out of $SLURM_NPROCS total) on $(hostname -s) \
parameter set $PARAMETER_SET"'

exit
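In the script above PARAMETER_SET only flavours the echo output. As a hedged sketch of an actual lookup, assuming a hypothetical file params-42.txt in the working directory with one parameter per line, each process could pick the line matching its process number:

# Hypothetical: line N+1 of params-42.txt belongs to process N.
srun bash -c 'PARAM=$(sed -n "$((SLURM_PROCID + 1))p" params-$PARAMETER_SET.txt); \
echo "process $SLURM_PROCID works on $PARAM"'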
Assume that the script resides in $HOME/first-hummel-job.sh. Because the /home filesystem is mounted read-only on the compute nodes, the job should be submitted from /work (otherwise no output could be written), for example:
shell$ mkdir $WORK/first_workdir
shell$ cd $WORK/first_workdir
shell$ sbatch $HOME/first-hummel-job.sh
Submitted batch job 123456
In the message from the sbatch command, 123456 is the job ID by which the job is identified in the job queue.
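While the job is waiting or running, its state can be checked by passing this job ID to squeue:

shell$ squeue -j 123456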
When a job is being executed it writes its output to a log file. The default name of a SLURM log file is slurm-<jobID>.out, and the default location is the directory from which the job was submitted. After completion the log file can be inspected:
shell$ cat slurm-123456.out
module: loaded site/slurm
module: loaded site/tmpdir
module: loaded site/hummel
module: loaded env/system-gcc
Hello World! I am node223 greeting you!
Also, my current TMPDIR: /scratch/rrztest.123456
process 8 (out of 32 total) on node223 parameter set 42
process 15 (out of 32 total) on node223 parameter set 42
process 4 (out of 32 total) on node223 parameter set 42
process 5 (out of 32 total) on node223 parameter set 42
process 9 (out of 32 total) on node223 parameter set 42
process 7 (out of 32 total) on node223 parameter set 42
process 3 (out of 32 total) on node223 parameter set 42
process 6 (out of 32 total) on node223 parameter set 42
process 11 (out of 32 total) on node223 parameter set 42
process 2 (out of 32 total) on node223 parameter set 42
process 13 (out of 32 total) on node223 parameter set 42
process 12 (out of 32 total) on node223 parameter set 42
process 1 (out of 32 total) on node223 parameter set 42
process 10 (out of 32 total) on node223 parameter set 42
process 0 (out of 32 total) on node223 parameter set 42
process 14 (out of 32 total) on node223 parameter set 42
process 28 (out of 32 total) on node224 parameter set 42
process 23 (out of 32 total) on node224 parameter set 42
process 26 (out of 32 total) on node224 parameter set 42
process 27 (out of 32 total) on node224 parameter set 42
process 30 (out of 32 total) on node224 parameter set 42
process 19 (out of 32 total) on node224 parameter set 42
process 18 (out of 32 total) on node224 parameter set 42
process 22 (out of 32 total) on node224 parameter set 42
process 25 (out of 32 total) on node224 parameter set 42
process 17 (out of 32 total) on node224 parameter set 42
process 29 (out of 32 total) on node224 parameter set 42
process 21 (out of 32 total) on node224 parameter set 42
process 24 (out of 32 total) on node224 parameter set 42
process 16 (out of 32 total) on node224 parameter set 42
process 31 (out of 32 total) on node224 parameter set 42
process 20 (out of 32 total) on node224 parameter set 42
Note that the output demonstrates that 32 processes were executed on 2 nodes, and that the output is not ordered, which is typical of parallel programs.
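If ordered output is needed for later processing, the process lines can be sorted after the fact, for example numerically by the process number in the second field:

shell$ grep '^process' slurm-123456.out | sort -n -k2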