Aurora: simulation silently fails to start
Hi, I have recently run into a new issue.
To start: sbatch jobs with `--nodes=1 --ntasks-per-node=1 -c 1` run without issue (as expected, since ctest passes etc.).
However, with (for example) `--nodes=2 --ntasks-per-node=2`, I find that `-c 4` works fine, but `-c 10` with the same config file silently does nothing.
This is a run whose config file worked without issue with `-c 10` a week ago.
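For reference, this is the per-node CPU demand of the two multi-node configurations (a quick sketch; the helper name `cpus_per_node` is just for illustration, and whether 20 CPUs per node actually fits depends on the core count of an Aurora compute node, which I have not checked here):

```bash
#!/bin/bash
# CPUs Slurm must find free on EACH node for the job:
# ntasks-per-node * cpus-per-task (cpus_per_node is a hypothetical helper).
cpus_per_node() {
  local ntasks_per_node=$1 cpus_per_task=$2
  echo $(( ntasks_per_node * cpus_per_task ))
}

# Configurations from this post: 2 nodes, 2 tasks per node.
for c in 4 10; do
  per_node=$(cpus_per_node 2 "$c")
  echo "cpus-per-task=$c: $per_node CPUs per node, $(( 2 * per_node )) CPUs in total"
done
```

So `-c 4` asks for 8 CPUs on each node, while `-c 10` asks for 20 CPUs on each of the two nodes.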
By "doing nothing" I mean the following. I run `sbatch submit.sh`, where submit.sh is:
```bash
#!/bin/bash
#SBATCH -A snic2019-3-398
#SBATCH --output=OUT.%J
#
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:10:00
#
#SBATCH -c 10
#
#
# Set OMP_NUM_THREADS to the same value as -c,
# with a fallback in case it isn't set.
# SLURM_CPUS_PER_TASK is set to the value of -c, but only if -c is explicitly set.
if [ -n "$SLURM_CPUS_PER_TASK" ]; then
    omp_threads=$SLURM_CPUS_PER_TASK
else
    omp_threads=1
fi
export OMP_NUM_THREADS=$omp_threads
date >time_run.dat
echo $SLURM_NTASKS
echo $SLURM_CPUS_PER_TASK
srun --mpi=pmi2 -K1 /home/michiel/fargOCA/build/fargoInit out config.info
srun --mpi=pmi2 -K1 /home/michiel/fargOCA/build/fargOCA ./out -p
date >>time_run.dat
```
The job then hangs silently at the fargoInit step: fargoInit never actually executes, and there is no error message, nothing. The OUT.<jobid> file contains only the output of the two echo statements.
(The odd thing is that, as far as I can tell, the issue is not consistent: sometimes the same sbatch script works, and sometimes it silently does nothing.)
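One thing I could do in the meantime is make the hang visible instead of silent. A minimal sketch, assuming GNU coreutils `timeout` is available on the compute nodes (it exits with status 124 when the deadline is hit); the function name `run_with_deadline` and the 300 s deadline are hypothetical choices of mine:

```bash
#!/bin/bash
# Run a command with a deadline so that a silent hang becomes a visible error.
# run_with_deadline is a hypothetical helper; `timeout` is GNU coreutils and
# returns 124 when the command is killed because the deadline expired.
run_with_deadline() {
  local seconds=$1 status
  shift
  timeout "$seconds" "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "TIMEOUT after ${seconds}s: $*" >&2
  fi
  return "$status"
}

# Hypothetical usage inside submit.sh:
# run_with_deadline 300 srun --mpi=pmi2 -K1 /home/michiel/fargOCA/build/fargoInit out config.info
```

Since the problem is intermittent, adding `echo $SLURM_JOB_NODELIST` next to the existing echo lines would also record which nodes each hanging run landed on.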
I am completely lost. Any tips on how to proceed?