mpirun problem?

Discussion on computers, ROMS installation and compiling

Barbara
Posts: 25
Joined: Thu Sep 14, 2006 4:33 pm
Location: LTO, SCSIO

#1 Post by Barbara » Wed Apr 21, 2010 3:41 am

Hi all,

I recently moved my cases to a new machine, where I run

mpirun -np 4 ./oceanM ocean_prereal_grd2_wrt_8.in > out.log

and the following errors appear in the log file (every process reports node number zero):



Process Information:

Node # 0 (pid= 4314) is active.

Model Input Parameters: ROMS/TOMS version 3.0
Tuesday - April 20, 2010 - 9:23:38 PM
-----------------------------------------------------------------------------
Process Information:

Node # 0 (pid= 4315) is active.

Model Input Parameters: ROMS/TOMS version 3.0
Tuesday - April 20, 2010 - 9:23:38 PM
-----------------------------------------------------------------------------
Process Information:

Node # 0 (pid= 4316) is active.

Model Input Parameters: ROMS/TOMS version 3.0
Tuesday - April 20, 2010 - 9:23:38 PM
-----------------------------------------------------------------------------
Process Information:

Node # 0 (pid= 4317) is active.

Model Input Parameters: ROMS/TOMS version 3.0
Tuesday - April 20, 2010 - 9:23:38 PM
-----------------------------------------------------------------------------

Wind-Driven Upwelling/Downwelling over a Periodic Channel

Operating system : Linux
CPU/hardware : x86_64
Compiler system : gfortran
Compiler command : /opt/mpich2/gnu/bin/mpif90
Compiler flags : -frepack-arrays -O3 -ffast-math -ffree-form -ffree-line-length-none

Input Script : ocean_prereal_grd2_wrt_8.in

SVN Root URL : https://www.myroms.org/svn/src/trunk
SVN Revision :

Local Root : /home/zutt/source/roms148m63_obc1
Header Dir : /home/zutt/ttcase/PREreal_grd2_wrt_8/Forward
Header file : prereal_grd2_wrt_8.h
Analytical Dir: /home/zutt/source/roms148m63_obc1/ROMS/Functionals

Resolution, Grid 01: 0398x0198x030, Parallel Nodes: 1, Tiling: 002x002

ROMS/TOMS: Wrong choice of domain 01 partition or number of parallel threads.
NtileI * NtileJ must be equal to the number of parallel nodes.
Change -np value to mpirun or
change domain partition in input script.

Tile partition information for Grid 01: 0398x0198x0030 tiling: 002x002

tile     Istr     Iend     Jstr     Jend     Npts

   0        1      199        1       99   591030
   1      200      398        1       99   591030
   2        1      199      100      198   591030
   3      200      398      100      198   591030

Maximum halo size in XI and ETA directions:

HaloSizeI(1) = 630
HaloSizeJ(1) = 330
TileSide(1) = 204
TileSize(1) = 21216

.
.
.

The output in the out.log file is also not in the right sequence. It seems each node is writing to the log file without communicating with the others.


The new machine is:
Linux cluster.hpc.cc 2.6.18-92.1.13.el5 #1 SMP Wed Sep 24 19:32:05 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
I compiled ROMS there with gfortran, and it built without any errors.
NetCDF is the precompiled binary (binary-netcdf-3.6.3_nc3_gfortran_gfortran_g++.tar.gz) from Unidata.

The previous machine is:
Linux hqlx75.ust.hk 2.6.18-53.1.21.el5 #1 SMP Tue May 20 09:35:07 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
There I compiled ROMS with PGI.
Exactly the same case works fine on the previous machine.


I am wondering if there is something wrong with the mpirun command? Any comments and suggestions are appreciated!
Last edited by Barbara on Wed Apr 21, 2010 9:59 am, edited 1 time in total.

kate
Posts: 3761
Joined: Wed Jul 02, 2003 5:29 pm
Location: IMS/UAF, USA

#2 Post by kate » Wed Apr 21, 2010 7:14 am

It sure sounds like a problem with MPI. Each process runs its own copy of the exact same code and queries the system to find out which node it is, from 0 to 3. If they all think they are node zero, there will be trouble: each one thinks it is the master and acts accordingly. The master is also in charge of writing to stdout, so yes, they would all be writing.
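
One way to confirm this from the log itself is to count how many processes announce themselves as node 0. The sketch below is only an illustration: `count_rank_zero` is a hypothetical helper name, and it assumes the log uses the "Node # N (pid= ...) is active." lines quoted in this thread. If MPI is working, only one process should report node 0.

```shell
#!/bin/sh
# Hypothetical helper: count the processes that report themselves as node 0
# in a ROMS log. Assumes the "Node # N (pid= ...) is active." line format
# seen in the logs quoted in this thread.
count_rank_zero() {
    # $1: path to the log file (e.g. out.log)
    # "|| true" keeps the exit status zero when grep finds no match;
    # grep -c still prints 0 in that case.
    grep -c 'Node #[[:space:]]*0[[:space:]]*(pid=' "$1" || true
}

# Example: more than one "node 0" line suggests the processes are not
# seeing each other as a single MPI job.
if [ -f out.log ]; then
    n=$(count_rank_zero out.log)
    if [ "$n" -gt 1 ]; then
        echo "$n processes claim to be node 0"
    fi
fi
```

The pattern matches node 0 only, so lines for nodes 1, 2, 10 and so on are not counted.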

Barbara
Posts: 25
Joined: Thu Sep 14, 2006 4:33 pm
Location: LTO, SCSIO

#3 Post by Barbara » Wed Apr 21, 2010 9:58 am

Thanks for your reply. The problem was solved by adding a .mpd.conf file to my home directory and executing mpd &, then using the same mpirun command.

rgon
Posts: 12
Joined: Thu Jan 29, 2009 4:00 pm
Location: University of Plymouth

#4 Post by rgon » Tue May 25, 2010 9:31 am

I am trying to get MPI running on a cluster. I wonder if anyone is familiar with this problem or knows who could help me.

ROMS compiles and runs fine in serial mode, and it also compiles cleanly for parallel runs (oceanM) with either ifort or gfortran. But I get a partition error whose cause I can't find: all nodes are zero and the number of parallel nodes is always 1.

I've tried intel/mpich, gcc/mpich and gcc/mpich2, with no success. The same happens with the default ROMS sample cases, the idealised ones with idealised bathymetry and parameters, which can be compiled in serial or parallel.

out.log file:

Process Information:

 Process Information:

 Process Information:

 Process Information:

 Node #  0 (pid=    2687) is active.
 Node #  0 (pid=    2686) is active.
 Node #  0 (pid=    2688) is active.
 Node #  0 (pid=    2689) is active.

 Model Input Parameters:  ROMS/TOMS version 3.4  
                          Thursday - May 20, 2010 - 10:57:01 AM
 -----------------------------------------------------------------------------

 Model Input Parameters:  ROMS/TOMS version 3.4  
                          Thursday - May 20, 2010 - 10:57:01 AM
 Model Input Parameters:  ROMS/TOMS version 3.4  
                          Thursday - May 20, 2010 - 10:57:01 AM

 Model Input Parameters:  ROMS/TOMS version 3.4  
                          Thursday - May 20, 2010 - 10:57:01 AM
 -----------------------------------------------------------------------------

 ----------------------------------------------------------------------------- -----------------------------------------------------------------------------


 Lake Signell Sediment Test Case

 Operating system : Linux
 CPU/hardware     : x86_64
 Compiler system  : gfortran
 Compiler command : /cvos/shared/apps/mpich/ge/gcc/64/1.2.7/bin/mpif90
 Compiler flags   : -frepack-arrays -O3 -ffast-math -ffree-form -ffree-line-length-none

 Input Script  : /home/primare/pl/raulg/roms/Projects/lake_signell/ocean_lake_signell.in

 SVN Root URL  : https://www.myroms.org/svn/src/trunk
 SVN Revision  : 448M

 Local Root    : /home/primare/pl/raulg/roms/trunk
 Header Dir    : /home/primare/pl/raulg/roms/Projects/lake_signell
 Header file   : lake_signell.h
 Analytical Dir: /home/primare/pl/raulg/roms/Projects/lake_signell

 Resolution, Grid 01: 0100x0020x008,  Parallel Nodes:   1,  Tiling: 002x002

 ROMS/TOMS: Wrong choice of domain 01 partition or number of parallel threads.
            NtileI * NtileJ  must be equal to the number of parallel nodes.
            Change -np value to mpirun or
            change domain partition in input script.

...

 All percentages are with respect to total time =     ************

 ROMS/TOMS - Output NetCDF summary for Grid 01:

 ROMS/TOMS - Partition error ......... exit_flag:   6


 ERROR: Illegal domain partition.
I've tried putting the smpd.conf file (associated with mpich2) in my home directory, but it made no difference. It seems to be an environment error, perhaps a wrong initialization or missing variables.

Thanks for any reply.
RaulG
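
One thing worth ruling out with symptoms like these (an assumption here, not something confirmed in this thread) is that the `mpirun` found on `PATH` belongs to a different MPI installation than the `mpif90` wrapper named in the "Compiler command" line of the log. The sketch below compares the two directories. `same_mpi_bindir` is a hypothetical helper, and comparing directories assumes the launcher and the compiler wrapper are installed side by side.

```shell
#!/bin/sh
# Hypothetical helper: succeed if two executables live in the same directory.
same_mpi_bindir() {
    # $1: launcher path, $2: compiler wrapper path
    [ "$(dirname "$1")" = "$(dirname "$2")" ]
}

# The wrapper path is copied from the "Compiler command" line in the log above.
wrapper=/cvos/shared/apps/mpich/ge/gcc/64/1.2.7/bin/mpif90
launcher=$(command -v mpirun || true)

if [ -z "$launcher" ]; then
    echo "no mpirun found on PATH"
elif same_mpi_bindir "$launcher" "$wrapper"; then
    echo "mpirun and mpif90 come from the same directory"
else
    echo "mpirun is $launcher, but ROMS was built with $wrapper"
fi
```

If the two differ, calling the launcher by its full path from the wrapper's directory is a simple way to test whether the mismatch matters.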

Barbara
Posts: 25
Joined: Thu Sep 14, 2006 4:33 pm
Location: LTO, SCSIO

#5 Post by Barbara » Wed May 26, 2010 3:47 am

Barbara wrote: Thanks for your reply. The problem was solved by adding a .mpd.conf file to my home directory and executing mpd &, then using the same mpirun command.

Sorry for the misleading post. The problem seemed to be solved when I posted that reply, but it failed again later. I then gave up on mpich2 and used openmpi instead.

badriya
Posts: 1
Joined: Sun Feb 16, 2014 5:32 pm
Location: Public Authority for Civil Aviation

#6 Post by badriya » Wed Mar 19, 2014 10:13 am

Good day to all,
I'm new to ROMS.
I got the same error when I ran ROMS:
-----------------------------------------------------------------------------
Resolution, Grid 01: 0398x0498x001, Parallel Nodes: 1, Tiling: 004x004
-----------------------------------------------------------------------------

ROMS/TOMS: Wrong choice of domain 01 partition or number of parallel threads.
NtileI * NtileJ must be equal to the number of parallel nodes.
Change -np value to mpirun or
change domain partition in input script.

Elapsed CPU time (seconds):


ROMS/TOMS - Output NetCDF summary for Grid 01:

ROMS/TOMS - Partition error ......... exit_flag: 6

ERROR: Illegal domain partition.
-----------------------------------------------------------------------------
I used distributed memory.
export USE_MPI=on # distributed-memory parallelism
export USE_MPIF90=on # compile with mpif90 script
#export USE_OpenMP=on # shared-memory parallelism
#export which_MPI=mpich # compile with MPICH library
#export which_MPI=mpich2 # compile with MPICH2 library
export which_MPI=openmpi # compile with OpenMPI library

When I set NtileI=1 and NtileJ=1, the model runs smoothly with
$MPI -np 16 /home/badria/ROMS3.5/FORWARD/oceanM /home/badria/ROMS3.5/FORWARD/oman.in > /home/badria/ROMS3.5/oman_GONU.out

Has anyone solved this problem before?
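
The rule in the error message can be checked by hand. The sketch below is a minimal illustration, with `check_partition` as a hypothetical helper name. It shows that the 004x004 tiling is consistent with `-np 16`, and that the mismatch is with the "Parallel Nodes: 1" that ROMS itself reports. The 1x1 workaround passes the check for a single node, which suggests (though the thread does not confirm it) that each of the 16 processes was running the whole domain on its own.

```shell
#!/bin/sh
# Hypothetical helper: apply the rule from the ROMS error message,
# NtileI * NtileJ must equal the number of parallel nodes.
check_partition() {
    # $1: NtileI, $2: NtileJ, $3: number of parallel nodes
    if [ $(( $1 * $2 )) -eq "$3" ]; then
        echo "ok"
    else
        echo "mismatch: $1 x $2 = $(( $1 * $2 )) tiles, $3 nodes"
    fi
}

check_partition 4 4 16   # tiling vs. the -np value on the command line
check_partition 4 4 1    # tiling vs. the "Parallel Nodes: 1" ROMS reports
check_partition 1 1 1    # the NtileI=1, NtileJ=1 workaround
```

The first and third calls print `ok`; the second reports a mismatch, which points at the MPI setup rather than at the input script.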

Kosa
Posts: 18
Joined: Mon Jan 12, 2015 4:12 pm
Location: URI GSO

#7 Post by Kosa » Tue Jun 23, 2015 3:17 pm

Upon moving to a new machine I am experiencing this exact problem. I wish the other users who experienced this problem had reported back with their solution... :?

ymamoutos
Posts: 55
Joined: Fri Nov 19, 2010 2:33 pm
Location: University of Aegean

#8 Post by ymamoutos » Tue Dec 22, 2015 7:05 pm

Greetings,

I recently gained access to an HPC system and had
the same problem. The system has several MPI libraries (openmpi and intelmpi).
With openmpi 1.8.5 I had no luck with any of the available compilers (gnu, intel), but when
I started using openmpi 1.8.7 I had no problem at all.
I suggest changing your openmpi library and trying again.

Giannis
