# Running a task on the cluster

Jobs can be run on the cluster through the [srun](https://slurm.schedmd.com/srun.html) or [sbatch](https://slurm.schedmd.com/sbatch.html) commands. Both commands support common options; the main differences are that

* `srun` is blocking, prints to the standard output and can run arbitrary commands, including sbatch scripts. Please refer to the [srun manual](https://slurm.schedmd.com/srun.html) for a full description.

* `sbatch` schedules the job for later execution and only works with an executable sbatch script (see the example below). Please refer to the [sbatch manual](https://slurm.schedmd.com/sbatch.html) for a full description.
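For instance, here is a rough sketch of the same one-line job run both ways; the script name `hello.sh`, the job id and the output shown are only placeholders, not output captured on this cluster:

```
$ srun --partition seq --time 0:5:0 echo hello    # runs now, blocks, prints to your terminal
hello
$ cat hello.sh
#!/bin/bash
#SBATCH --partition=seq
#SBATCH --time=0:5:0
echo hello
$ sbatch hello.sh                                 # queued for later execution, output goes to slurm-<jobid>.out
Submitted batch job 12345
```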
Here, we will only mention the most common/important options.

## Running a command on a compute node
### Selecting a partition

Before you can run anything through SLURM, you need to select a partition. A partition identifies a group of nodes together with a job profile: the time limit, hardware and kind of jobs it is meant for.
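To see which partitions exist and what their limits are, you can ask SLURM directly; `sinfo` lists each partition with its availability, time limit and nodes:

```
$ sinfo -s                  # one summary line per partition
$ sinfo --partition seq     # detailed view of a single partition
```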
#### Licallo

All queues have a maximum time limit of 24 hours. Other limitations may apply in terms of the number of concurrently running jobs or resource usage.

Here are the currently available partitions on Licallo:
* **seq** dedicated to sequential jobs. All your sequential jobs, or arrays of sequential jobs, taking more than 4 hours should go there.

* **seq-short** if your sequential job takes less than 4 hours, it should go there (to be scheduled more easily).

* **short** short jobs, both sequential and parallel. Used mostly for debugging.

* **fdr** parallel jobs requiring small nodes can go there (see the example below). The nodes in this partition have two sockets/processors of 10 cores each and 64 GB of RAM.

* **x40** parallel jobs requiring fat nodes (typically hybrid jobs) should go in this partition. The nodes in this partition have 2 sockets/processors of 20 cores each and 192 GB of RAM.

* **1to** the big-memory partition. It contains only one node, with 1 TB of RAM and 4 sockets of 8 cores each.
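As a rough illustration of matching a request to a partition's hardware (the resource values and program names are placeholders, not recommendations), a 2-node MPI job fits the fdr partition and a large in-memory job fits the 1to partition:

```
$ srun --partition fdr --nodes 2 --ntasks-per-node 20 --time 2:0:0 ./my_mpi_program
$ srun --partition 1to --mem 500G --time 12:0:0 ./my_big_memory_program
```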
Any command can be dispatched on a compute node through `srun`:
```
$ hostname
pollux.cluster
$ srun --partition seq --time 0:1:0 hostname
p080.cluster
$ srun --partition x40 --time 0:1:0 hostname
x033.cluster
$
```
Note that you need to select a *partition*, as described above.
## sbatch script
... | ... | |