@@ -106,4 +106,38 @@ is the same as adding
at the top of the script[^1]

# Job monitoring
## Job status
To check the current status of your jobs, use the [squeue](https://slurm.schedmd.com/squeue.html) command:
```
$ sbatch --job-name=Tex ./sequential.slurm
Submitted batch job 15292925
$ squeue -u alainm
   JOBID PARTITION NAME   USER ST TIME NODES NODELIST(REASON)
15292925       seq  Tex alainm PD 0:00     1 (None)
$ squeue -u alainm
   JOBID PARTITION NAME   USER ST TIME NODES NODELIST(REASON)
15292925       seq  Tex alainm  R 0:02     1 p087
$ squeue -u alainm
   JOBID PARTITION NAME   USER ST TIME NODES NODELIST(REASON)
```
* the `-u alainm` option limits the output to the jobs of user `alainm`
* in the first `squeue` call, the job is **P**en**D**ing (`PD`)
* in the second call, the job has been **R**unning (`R`) for 2 seconds on host `p087`
* in the third call, the job has finished and is no longer listed

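
The three `squeue` calls above can be automated. The sketch below is one possible way to wait for a job to end, assuming that a job whose state is no longer pending, configuring, running, suspended or completing can be treated as finished. The function name `wait_for_job`, the 10-second default interval and the shortened list of "still active" state codes are illustrative choices, not part of Slurm. It relies on the `squeue` options `-h` (no header), `-j` (select a job) and `-o %t` (print only the short state code).

```shell
#!/bin/bash
# Hypothetical helper: block until job $1 is no longer active according to squeue.
# $2 is the polling interval in seconds (default 10, an arbitrary choice).
wait_for_job() {
    local jobid=$1 interval=${2:-10} state
    while true; do
        # -h: no header, -j: only this job, -o %t: short state code (PD, R, ...).
        # squeue fails once the job record has been purged; treat that as "gone".
        state=$(squeue -h -j "$jobid" -o %t 2>/dev/null) || state=""
        case "$state" in
            PD|CF|R|S|CG)            # simplified list of "still active" codes
                echo "job $jobid: $state"
                sleep "$interval"
                ;;
            *)                       # empty or any other state: stop waiting
                break
                ;;
        esac
    done
    echo "job $jobid is no longer in the queue"
}
```

A typical use would be `jobid=$(sbatch --parsable --job-name=Tex ./sequential.slurm)` followed by `wait_for_job "$jobid"`; with `--parsable`, `sbatch` prints only the job ID instead of the `Submitted batch job ...` message. Keep the interval generous so that polling does not load the scheduler.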
## Job trace

Use the `--output <file>` option to choose the file that receives the job's output[^2]. The default is `slurm-<jobid>.out`.

By default, error messages are written to the same file; use the `--error <file>` option to send them to a separate one.

The output is available in real time once the job starts running, so you can follow it with:
```
$ tail -f slurm-12345.out
...job output...
```
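
To keep the logs of different jobs apart, `--output` and `--error` accept filename patterns: `%j` is replaced by the job ID and `%x` by the job name. The sketch below shows one possible way to combine this with `tail`. The `logs/` directory, the file name layout and the helper names `log_name` and `submit_and_follow` are illustrative assumptions, not conventions of this guide.

```shell
#!/bin/bash
# Hypothetical helper: build locally the name that Slurm will produce
# for the pattern logs/%x-%j.<suffix> (%x = job name, %j = job ID).
log_name() {
    local name=$1 jobid=$2 suffix=$3
    printf 'logs/%s-%s.%s\n' "$name" "$jobid" "$suffix"
}

# Hypothetical helper: submit a script, then follow its standard output.
submit_and_follow() {
    local name=$1 script=$2 jobid out
    mkdir -p logs    # Slurm does not create a missing output directory
    # --parsable prints only the job ID (plus ";cluster" on multi-cluster setups)
    jobid=$(sbatch --parsable --job-name="$name" \
                   --output='logs/%x-%j.out' --error='logs/%x-%j.err' \
                   "$script") || return 1
    jobid=${jobid%%;*}               # drop a possible ";cluster" suffix
    out=$(log_name "$name" "$jobid" out)
    # -F keeps retrying until the file exists, i.e. until the job has started
    tail -F "$out"
}
```

With these definitions, `submit_and_follow Tex ./sequential.slurm` submits the job and prints its output as it is written. Stopping `tail` with Ctrl-C does not cancel the job.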


[^1]: if both are used, the command-line option takes precedence.
[^2]: this is preferable to shell redirection.

\ No newline at end of file