Ocean Modeling Discussion

ROMS/TOMS


All times are UTC

PostPosted: Tue Sep 04, 2018 12:36 pm 

Joined: Mon Feb 22, 2016 10:22 pm
Posts: 4
Location: JHU/APL
I am trying to run ROMS compiled under Red Hat with intel compilers, mpi. I am running into the following error:

Resolution, Grid 01: 898x762x90, Parallel Nodes: 1, Tiling: 16x16

ROMS/TOMS: Wrong choice of grid 01 partition or number of parallel nodes.
NtileI * NtileJ must be equal to the number of parallel nodes.
Change -np value to mpirun or
change domain partition in input script.
Found Error: 06 Line: 153 Source: ROMS/Utility/inp_par.F
Found Error: 06 Line: 111 Source: ROMS/Drivers/nl_ocean.h

So this would be 256 processors (16 x 16).
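The consistency rule that the error message enforces can be sketched as a quick pre-flight check (a hedged sketch; NTILEI, NTILEJ, and NP are example values standing in for what the ROMS input script and the scheduler would actually supply):

```shell
#!/bin/sh
# Pre-flight check: ROMS aborts unless NtileI * NtileJ equals the MPI
# rank count, so verify that before submitting.  Example values only.
NTILEI=16    # NtileI from the ROMS input script
NTILEJ=16    # NtileJ from the ROMS input script
NP=256       # rank count passed to mpirun -np

if [ $((NTILEI * NTILEJ)) -ne "$NP" ]; then
    echo "ERROR: NtileI*NtileJ = $((NTILEI * NTILEJ)) but -np = $NP" >&2
    exit 1
fi
echo "Tiling OK: ${NTILEI}x${NTILEJ} tiles = $NP MPI ranks"
```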

The submit script actually IS allocating 256 processors (8 nodes, 32 processors per node), so NtileI * NtileJ DOES equal the number of processors allocated:

This job runs on the following processors (condensed; the original log lists each entry 32 times):
vn-064 x32  vn-063 x32  vn-062 x32  vn-061 x32  vn-060 x32  vn-059 x32  vn-058 x32  vn-057 x32
This job has allocated 32 processors per node.
This job has allocated 256 processors.
This job has allocated 8 nodes.


Has anyone else run into this issue? I am sure I am doing something silly on my end! Apologies if the answer is obvious. :)


PostPosted: Tue Sep 04, 2018 2:37 pm 

Joined: Wed Dec 31, 2003 6:16 pm
Posts: 795
Location: USGS, USA
What does your mpirun command line look like? Maybe your cluster also needs some Slurm info like
#SBATCH --ntasks=108 # Number of MPI ranks
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=36
or something?
-j


PostPosted: Tue Sep 04, 2018 2:50 pm 

Joined: Mon Feb 22, 2016 10:22 pm
Posts: 4
Location: JHU/APL
Here is what gets called:

#//# Number of nodes = 16, using 16 processors per node
#PBS -l nodes=16:ppn=16

Now, our admins currently ignore this: instead of giving me 16 nodes with 16 processors per node, they give me 8 nodes with 32 processors each.

Below is the relevant command. What I don't understand is whether I need to force the cluster to do the 16 x 16 layout or whether 8 x 32 would work; both give the correct number of processors. I should say that when I run from the command line with mpirun -np 1 (and set NtileI and NtileJ both equal to 1), it works. If I change one of the tile counts to 2 and then use -np 2, it does NOT work and gives the tiling error.

So I am more than willing to believe it is something in my OpenMPI configuration. But I use OpenMPI with many other codes (e.g. WRF) and it works fine.
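One way to separate an OpenMPI launch problem from ROMS itself is to launch a trivial command and count the processes it spawns (a hedged sketch; assumes mpirun is on the PATH):

```shell
#!/bin/sh
# If mpirun launches fewer ranks than requested, the tiling check in
# ROMS will fail for any NtileI x NtileJ > 1, and the problem is the
# MPI install/launch rather than ROMS.
NP=2
COUNT=$(mpirun -np "$NP" hostname 2>/dev/null | wc -l)
echo "requested $NP ranks, mpirun launched $COUNT"
```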

Thanks again, in advance, for any help on this! I will figure it out eventually...

MPI_EXECUTABLE="./oceanM ocean_ET2S.in"

echo "========================================================"
echo "======= BEGIN RUN ========="
echo "========================================================"
#
#
# FOR RUNNING OUTSIDE OF TORQUE
#
if [ -z "${PBS_JOBID}" ]; then
PBS_JOBID=$$
fi
#


# run the code
echo ""
echo "Running CODE"
# time mpirun ${MPI_EXECUTABLE} ${inputFile} > ${logFile} 2>&1
time mpirun ${MPI_EXECUTABLE} 2>&1
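Since the admins hand back a different node layout than the #PBS line requests, one option (a hedged sketch, not tested on this cluster) is to derive the rank count from whatever Torque actually allocated rather than hard-coding it:

```shell
#!/bin/sh
# Under Torque/PBS, $PBS_NODEFILE lists one line per allocated
# processor slot.  Counting its lines gives the true rank count,
# whatever nodes/ppn combination the admins actually granted.
NODEFILE="${PBS_NODEFILE:-/dev/null}"   # /dev/null fallback outside Torque
NP=$(wc -l < "$NODEFILE")
echo "Scheduler allocated $NP processor slots"
# time mpirun -np "$NP" -machinefile "$NODEFILE" ./oceanM ocean_ET2S.in
```

NtileI * NtileJ in the ROMS input script would still have to equal this NP.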


PostPosted: Wed Sep 05, 2018 1:18 pm 

Joined: Mon Feb 22, 2016 10:22 pm
Posts: 4
Location: JHU/APL
Quick Update:

After further testing, I am beginning to think there is an issue with my OpenMPI build. I will try recompiling OpenMPI and see if I can get this to work. I tested another application that I know works and got exactly the same error. Thanks for the initial reply; I will post again only if I later find this is a ROMS issue rather than a local compilation problem on my end.

Best regards


PostPosted: Wed Sep 05, 2018 4:23 pm 

Joined: Wed Jul 02, 2003 5:29 pm
Posts: 3673
Location: IMS/UAF, USA
Do your nodes actually have 16 cores or 32? Your admins might be spawning 32 tasks on 16-core nodes, which can sometimes give good performance (I've heard), but when ROMS checks the numbers, it can't figure out what to do.
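To answer the 16-vs-32 question empirically, a couple of quick checks (a hedged sketch; nproc is standard on Linux coreutils, and the mpirun line would be run inside an actual job):

```shell
#!/bin/sh
# How many logical cores does the current node expose?
CORES=$(nproc)
echo "This node exposes $CORES cores"
# Rank placement across nodes: each rank prints its hostname, and
# uniq -c then shows how many ranks landed on each node.
# mpirun -np 256 hostname | sort | uniq -c
```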


PostPosted: Wed Sep 05, 2018 4:40 pm 

Joined: Mon Feb 22, 2016 10:22 pm
Posts: 4
Location: JHU/APL
Hi Kate,

It turns out that my OpenMPI build got messed up somehow; I don't know how. After recompiling it, everything works fine. So all is good in ROMS world again! :)

