
Aurora: issue starting simulations

Hi, I have recently run into a new issue.

To start with: jobs submitted through the sbatch script with

--nodes=1 --ntasks-per-node=1 -c 1

run without issues (as expected, since ctest passes, etc.).

However, with (for example)

--nodes=2 --ntasks-per-node=2 

I find that -c 4 works fine, but -c 10 with the same config file silently does nothing.

This is a run with a config file that worked without issue with '-c 10' a week ago.

By "does nothing" I mean the following. I run `sbatch submit.sh`, where submit.sh is:

```bash
#SBATCH -A snic2019-3-398
#SBATCH --output=OUT.%J
#
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:10:00
#
#SBATCH -c 10
#
#

# Set OMP_NUM_THREADS to the same value as -c
# with a fallback in case it isn't set.
# SLURM_CPUS_PER_TASK is set to the value of -c, but only if -c is explicitly set
if [ -n "$SLURM_CPUS_PER_TASK" ]; then
  omp_threads=$SLURM_CPUS_PER_TASK
else
  omp_threads=1
fi
export OMP_NUM_THREADS=$omp_threads



date >time_run.dat
echo $SLURM_NTASKS
echo $SLURM_CPUS_PER_TASK
srun --mpi=pmi2 -K1 /home/michiel/fargOCA/build/fargoInit out config.info
srun --mpi=pmi2 -K1 /home/michiel/fargOCA/build/fargOCA ./out -p
date>>time_run.dat
```
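
For reference, the OMP_NUM_THREADS fallback in submit.sh (the if/else on SLURM_CPUS_PER_TASK) can be written more compactly with bash's `${VAR:-default}` expansion, which yields the variable's value when it is set and non-empty and the default otherwise, the same condition as the `[ -n ... ]` test. This is only a sketch of an equivalent form; the helper name `omp_threads_from_slurm` is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper mirroring the if/else in submit.sh:
# ${SLURM_CPUS_PER_TASK:-1} expands to $SLURM_CPUS_PER_TASK when it is set
# and non-empty (i.e. -c was given), and to 1 otherwise.
omp_threads_from_slurm() {
  echo "${SLURM_CPUS_PER_TASK:-1}"
}

# One OpenMP thread per CPU allocated to each task.
export OMP_NUM_THREADS="$(omp_threads_from_slurm)"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

This does not change the behaviour of the job; it only removes the temporary `omp_threads` variable.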

The job then hangs silently on fargoInit: it never actually executes, and there is no error message, just nothing. The OUT.%J output file contains only the output of the two echo statements.
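One way to make a silent hang like this visible in the output file is to wrap the step in GNU coreutils `timeout`, which exits with status 124 when its deadline expires. The sketch below assumes `timeout` is available on the compute nodes; the helper `run_with_deadline` and the 300 s limit are hypothetical choices for illustration, not part of the original setup.

```shell
#!/bin/bash
# Hypothetical helper: run a command under a deadline so that a hang turns
# into an explicit, logged failure. GNU `timeout` returns 124 on expiry.
run_with_deadline() {
  local seconds="$1"
  shift
  timeout "$seconds" "$@"
  local status=$?
  if [ "$status" -eq 124 ]; then
    # Goes to stderr, which sbatch writes to the --output file by default.
    echo "TIMEOUT after ${seconds}s: $*" >&2
  fi
  return "$status"
}

# Example use in submit.sh (300 s is an arbitrary example limit):
# run_with_deadline 300 srun --mpi=pmi2 -K1 /home/michiel/fargOCA/build/fargoInit out config.info
```

With this wrapper, a fargoInit step that never starts would leave a TIMEOUT line in OUT.%J rather than only the two echo lines.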

(The odd thing is that, as far as I can tell, the issue is not consistent: sometimes the same sbatch script works, and sometimes it silently does nothing.)

I am completely lost. Any tips on how to proceed?