Slurm
A simple program to control job execution. It lets us run, pause and stop jobs. Jobs are launched through simple bash scripts containing a few parameters used by Slurm. Slurm treats CPUs as consumable resources: when we allocate 1 CPU, it is locked and can't be allocated to another job.
More details and example files on Git
Running a job
Create an executable bash file
It is a regular bash file where we add SBATCH parameters through bash comments.
#!/bin/bash
#SBATCH --job-name=sbatch_stress8
#SBATCH --output=/dev/null # Change it to the path where you want the output to go
#SBATCH --ntasks=1 # Number of tasks to run; if ntasks=2, the process will be run 2 times

srun java -jar my-job.jar
# or any executable task
srun my_script.sh

Be careful: if your job needs more CPUs than you requested, it may not run (check the file given by --output).
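For a multi-threaded job, one way to avoid asking for too few CPUs is to request them explicitly with the sbatch option --cpus-per-task. The sketch below is a small generator for such a script. The function name make_sbatch_script, its arguments and the output file pattern are hypothetical choices for illustration, not something taken from the example files.

```shell
#!/bin/bash
# Hypothetical helper: print a batch script that requests a given number
# of CPUs for its single task.
# Usage: make_sbatch_script JOB_NAME CPUS COMMAND [ARGS...]
make_sbatch_script() {
    local name="$1" cpus="$2"
    shift 2
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=${name}
#SBATCH --output=${name}-%j.out
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=${cpus}

srun $*
EOF
}

# Example: print a script for a job that needs 4 CPUs.
make_sbatch_script stress4 4 java -jar my-job.jar
```

In the --output path, %j is replaced by the job id, so each run writes to its own file. Redirect the generated text to a file and submit that file with sbatch.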
Run the job
Then we run the job using the sbatch command.
sbatch my_sbatch_script.sh
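By default sbatch prints a line of the form "Submitted batch job <id>", and that id is what scancel and scontrol expect. Here is a minimal sketch for capturing it in a variable, assuming that default message format. The helper name extract_job_id is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read sbatch's output on stdin and print only the job id.
# Assumes the default message "Submitted batch job <id>".
extract_job_id() {
    awk '/^Submitted batch job / { print $4 }'
}

# Only submit when sbatch and the script are actually present on this machine.
if command -v sbatch >/dev/null 2>&1 && [ -f my_sbatch_script.sh ]; then
    jobid=$(sbatch my_sbatch_script.sh | extract_job_id)
    echo "Submitted job $jobid"
fi
```

sbatch also accepts --parsable, which prints the job id without the surrounding text (followed by a semicolon and the cluster name on multi-cluster setups), so the helper is only needed when the default output is kept.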
Display the job queue
R = running job, PD = pending job
squeue
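On a busy cluster the full queue can be long. squeue accepts -u to filter by user, -h to drop the header line and -o to choose the columns (%t is the compact state code such as R or PD). The sketch below summarises how many jobs are in each state. The helper name count_by_state is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read one state code per line (R, PD, ...) on stdin
# and print "STATE COUNT" lines, sorted by state code.
count_by_state() {
    awk 'NF { n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}

# Only query the queue when squeue is available on this machine.
if command -v squeue >/dev/null 2>&1; then
    squeue -h -u "$USER" -o "%t" | count_by_state
fi
```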
Stopping jobs
scancel $jobid
scancel -u $username
Suspend a job
sudo scontrol suspend $jobid
Resume a suspended job
sudo scontrol resume $jobid
Slurm global config
Display configuration
scontrol show config
Display and edit a node
scontrol show node node1
sudo scontrol update NodeName=node1 State=RESUME # (or IDLE for instance)
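Before changing a node's state it helps to check its current one. scontrol show node prints its fields as Key=Value pairs, one of which is State. The sketch below pulls out that value, assuming this Key=Value layout. The helper name node_state is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read "scontrol show node" output on stdin and print
# the value of the first field named exactly State (e.g. IDLE, DOWN).
node_state() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^State=/) {
                sub(/^State=/, "", $i)
                print $i
                exit
            }
        }
    }'
}

# Only query the node when scontrol is available on this machine.
if command -v scontrol >/dev/null 2>&1; then
    scontrol show node node1 | node_state
fi
```

Matching on whole fields that start with State= avoids picking up other keys that merely end in "State".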