SBATCH scripts

Report or discuss software problems and other woes

Moderators: arango, robertson

barack99
Posts: 17
Joined: Thu Aug 31, 2017 4:33 pm

SBATCH scripts

#1 Unread post by barack99 »

Hi all,

Could anyone give me an example of an SBATCH script, or teach me how to run ROMS/COAWST on a cluster using SBATCH? I see there is an example in the source code, but that is for PBS. I was using the script below to test my installation on our cluster, but it does not work.

Thanks for your help.

Regards
Barack

#!/bin/bash
#SBATCH -p physical
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --mem-per-cpu=16000
#SBATCH --time=0:30:00

cd /data/COAWST/Projects/Estuary_test2
mpirun -np 8 ./coawstM ocean_estuary_test2.in > cwstv3.out

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: SBATCH scripts

#2 Unread post by kate »

Really, this depends on how your system is set up. It's going to vary from one system to the next, so you have to ask your system people. Here's what I'm using:

Code:

#!/bin/bash
#SBATCH -t 144:00:00
#SBATCH --ntasks=192
#SBATCH --job-name=ARCTIC4
#SBATCH --tasks-per-node=24
#SBATCH -p t2standard
#SBATCH --account=akwaters
#SBATCH --output=ARCTIC4.%j
#SBATCH --no-requeue

cd $SLURM_SUBMIT_DIR
. /usr/share/Modules/init/bash
module purge
module load slurm
module load toolchain/pic-iompi/2016b
module load numlib/imkl/11.3.3.210-pic-iompi-2016b
module load toolchain/pic-intel/2016b
module load compiler/icc/2016.3.210-GCC-5.4.0-2.26
module load compiler/ifort/2016.3.210-GCC-5.4.0-2.26
module load openmpi/intel/1.10.4
module load data/netCDF-Fortran/4.4.4-pic-intel-2016b
module list

#
#  Prolog
#
echo " "
echo "++++ Chinook ++++ $PGM_NAME began:    `date`"
echo "++++ Chinook ++++ $PGM_NAME hostname: `hostname`"
echo "++++ Chinook ++++ $PGM_NAME uname -a: `uname -a`"
echo " "
TBEGIN=`echo "print time();" | perl`

mv arctic4_rst.nc arctic4_foo.nc
srun -l /bin/hostname | sort -n | awk '{print $2}' > ./nodes
mpirun -np $SLURM_NTASKS -machinefile ./nodes --mca mpi_paffinity_alone 1 ./oceanM ocean_arctic4.in

#
#  Epilog
#
TEND=`echo "print time();" | perl`
echo " "
echo "++++ Chinook ++++ $PGM_NAME pwd:      `pwd`"
echo "++++ Chinook ++++ $PGM_NAME ended:    `date`"
echo "++++ Chinook ++++ $PGM_NAME walltime: `expr $TEND - $TBEGIN` seconds"
That strange command to move the restart file is there because I'm having trouble modifying an existing file on restart; it's better to start fresh. This didn't use to be necessary - I think it's because I'm using HDF5 compression now.
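(If the restart file might not exist yet - say on the very first run - you can guard that rename so mv doesn't complain. Just a sketch, using the same file names as in the script above:)

Code:

# Only rename the previous restart file if it is actually there;
# ROMS then reads arctic4_foo.nc and writes a fresh arctic4_rst.nc.
if [ -f arctic4_rst.nc ]; then
    mv arctic4_rst.nc arctic4_foo.nc
fi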

The Prolog and Epilog are things I stole from a colleague.
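(The perl call in there is only used to get a Unix timestamp; date +%s does the same job if you'd rather not depend on perl. A sketch of the same bookkeeping:)

Code:

# Same walltime bookkeeping as the Prolog/Epilog above, using date instead of perl.
TBEGIN=$(date +%s)
# ... run the model here ...
TEND=$(date +%s)
echo "walltime: $((TEND - TBEGIN)) seconds"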

barack99
Posts: 17
Joined: Thu Aug 31, 2017 4:33 pm

Re: SBATCH scripts

#3 Unread post by barack99 »

Thanks Kate for sharing!

The SBATCH file I showed previously was given to me by our cluster admin. It works for me when I run the SWAN code alone. For example, when I run SWAN on 1 node with 8 cores, I just prepare a SLURM file like this:
#!/bin/bash
#SBATCH -p physical
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --mem-per-cpu=16000
#SBATCH --time=10:00:00
mpirun --np 8 /data/models/swan/swan.exe

So maybe with ROMS/COAWST, it is more complicated.

Looking at your SBATCH script, I am puzzled :shock:, but I really want to make it work for me, so would you mind explaining further? For example:

Why do you need the Prolog and Epilog? What do they mean?
Why do you move arctic4_rst.nc to arctic4_foo.nc? And where to?

And regarding the two lines below:
srun -l /bin/hostname | sort -n | awk '{print $2}' > ./nodes
mpirun -np $SLURM_NTASKS -machinefile ./nodes --mca mpi_paffinity_alone 1 ./oceanM ocean_arctic4.in

Why do you need -machinefile ./nodes --mca mpi_paffinity_alone 1? Is it compulsory?

Sorry for asking you for so much detail.

Thanks & Regards
Barack

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: SBATCH scripts

#4 Unread post by kate »

barack99 wrote:So maybe with ROMS/COAWST, it is more complicated.
It shouldn't be.
barack99 wrote:Why do you need the Prolog and Epilog? What do they mean?
They are both informational only, not required.
barack99 wrote:Why do you move arctic4_rst.nc to arctic4_foo.nc? And where to?
I'm moving arctic4_rst.nc to arctic4_foo.nc. I tell ROMS to read the latter and write to the former. With NetCDF-3, it could read from and write to the same file.
barack99 wrote:
srun -l /bin/hostname | sort -n | awk '{print $2}' > ./nodes
mpirun -np $SLURM_NTASKS -machinefile ./nodes --mca mpi_paffinity_alone 1 ./oceanM ocean_arctic4.in
The -machinefile comes from our system guys - I just do what they say. The mpi_paffinity_alone comes from a tip for making it run more efficiently.
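(Whether the machinefile is compulsory really depends on how the MPI library was built. On many SLURM clusters an Open MPI built with SLURM support reads the allocation by itself, and srun can launch the job directly, so the hostname/awk step isn't needed at all. A sketch of that, assuming such a build - check with your system people which case applies to you:)

Code:

# Sketch: launching without an explicit machinefile, assuming the MPI library
# was built with SLURM support and picks up the allocation on its own.
cd $SLURM_SUBMIT_DIR
srun ./oceanM ocean_arctic4.in
# or, with a SLURM-aware Open MPI:
# mpirun -np $SLURM_NTASKS ./oceanM ocean_arctic4.in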
barack99 wrote:Sorry for asking you for so much detail.
No worries, you have to learn somehow.

When you say it doesn't work for you, what happens?

barack99
Posts: 17
Joined: Thu Aug 31, 2017 4:33 pm

Re: SBATCH scripts

#5 Unread post by barack99 »

Oh yeah! It should not be more complicated, as you said! :D :lol:

Previously, I got an error message: "SIGSEGV: Segmentation fault". After adjusting the grid/mesh paths to the right directories, it works for me now.
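(In case it helps someone else: the segfault came from the model not finding the grid/input files, so a quick look at the paths inside the input file before submitting would have caught it. A sketch, using my project directory and input file name:)

Code:

# List the file names the ROMS input file points at (GRDNAME, ININAME, etc.)
# so you can check the paths exist before submitting the job.
cd /data/COAWST/Projects/Estuary_test2
grep -E '(GRD|INI|FRC|BRY)NAME' ocean_estuary_test2.in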

Thanks a lot, Kate, for the help and sharing, and have a great Sunday!

Regards
Barack
