problem with mpirun...


mashinde
Posts: 135
Joined: Mon Jun 22, 2009 3:46 pm
Location: Indian Institute of Tropical Meteorology, Pune, INDIA

problem with mpirun...

#1 Post by mashinde

I ran oceanM with mpirun -np 4 ./oceanM ocean_upwelling.in, which gives the following error:

******************************************************************++++++*******************************++++++++
$ mpirun -np 4 ./oceanM ROMS/External/ocean_upwelling.in
[ubuntu:15999] *** Process received signal ***
[ubuntu:15999] Signal: Segmentation fault (11)
[ubuntu:15999] Signal code: Address not mapped (1)
[ubuntu:15999] Failing at address: 0x60
[ubuntu:16000] *** Process received signal ***
[ubuntu:16000] Signal: Segmentation fault (11)
[ubuntu:16000] Signal code: Address not mapped (1)
[ubuntu:16000] Failing at address: 0x60
[ubuntu:16000] [ 0] /lib/libpthread.so.0 [0x7ff6b01810f0]
[ubuntu:16000] [ 1] /usr/lib/libhdf5-1.6.6.so.0(MPIR_ToPointer+0xb7) [0x7ff6b2001ff7]
[ubuntu:16000] [ 2] /usr/lib/libhdf5-1.6.6.so.0(PMPI_Comm_rank+0xe) [0x7ff6b200356e]
[ubuntu:16000] [ 3] /usr/local/lib/libmpi_f77.so.0(mpi_comm_rank_+0x26) [0x7ff6b18bac26]
[ubuntu:16000] [ 4] ./oceanM(MAIN__+0x49) [0x41e429]
[ubuntu:16000] [ 5] ./oceanM(main+0x2c) [0x5cf2ac]
[ubuntu:16000] [ 6] /lib/libc.so.6(__libc_start_main+0xe6) [0x7ff6afe1e466]
[ubuntu:16000] [ 7] ./oceanM [0x41e319]
[ubuntu:16000] *** End of error message ***
[ubuntu:16001] *** Process received signal ***
[ubuntu:16001] Signal: Segmentation fault (11)
[ubuntu:16001] Signal code: Address not mapped (1)
[ubuntu:16001] Failing at address: 0x60
[ubuntu:16001] [ 0] /lib/libpthread.so.0 [0x7ff201e420f0]
[ubuntu:16001] [ 1] /usr/lib/libhdf5-1.6.6.so.0(MPIR_ToPointer+0xb7) [0x7ff203cc2ff7]
[ubuntu:16001] [ 2] /usr/lib/libhdf5-1.6.6.so.0(PMPI_Comm_rank+0xe) [0x7ff203cc456e]
[ubuntu:16001] [ 3] /usr/local/lib/libmpi_f77.so.0(mpi_comm_rank_+0x26) [0x7ff20357bc26]
[ubuntu:16001] [ 4] ./oceanM(MAIN__+0x49) [0x41e429]
[ubuntu:16001] [ 5] ./oceanM(main+0x2c) [0x5cf2ac]
[ubuntu:16001] [ 6] /lib/libc.so.6(__libc_start_main+0xe6) [0x7ff201adf466]
[ubuntu:16001] [ 7] ./oceanM [0x41e319]
[ubuntu:16001] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 3 with PID 16001 on node ubuntu exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Please help me solve this problem.


Mahesh

kate
Posts: 4088
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: problem with mpirun...

#2 Post by kate

It looks like libhdf5 is unhappy. Did you ask ROMS for parallel I/O? Did you compile hdf5 and netcdf4 for parallel I/O? Have you run any parallel tests of the hdf and netcdf libraries? I assume they come with some tests. Does anything work in serial mode?
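
One way to start answering these questions is to list the MPI, HDF5, and NetCDF shared libraries the executable resolves at run time. In the backtrace above, PMPI_Comm_rank is resolved from /usr/lib/libhdf5-1.6.6.so.0 while mpi_comm_rank_ comes from /usr/local/lib/libmpi_f77.so.0, which looks like two different MPI installations being mixed. The sketch below is only a starting point: filter_parallel_libs is a hypothetical helper name, and ./oceanM is assumed to be in the current directory.

```shell
#!/bin/sh
# Sketch: show which MPI/HDF5/NetCDF shared libraries a binary picks up.
# filter_parallel_libs is a hypothetical helper, not part of ROMS.
filter_parallel_libs() {
    # Reads `ldd` output on stdin and keeps only libmpi*, libhdf5*, libnetcdf*.
    # `|| true` keeps the exit status at 0 when nothing matches.
    grep -E 'lib(mpi|hdf5|netcdf)' || true
}

# Assumed binary name from this thread; do nothing if it is not present.
if [ -x ./oceanM ]; then
    ldd ./oceanM | filter_parallel_libs
fi
```

If the listing shows MPI libraries from more than one prefix (for example /usr/lib and /usr/local/lib), that is worth sorting out before looking at ROMS itself.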

robertson
Site Admin
Posts: 219
Joined: Wed Feb 26, 2003 3:12 pm
Location: IMCS, Rutgers University

Re: problem with mpirun...

#3 Post by robertson

That HDF5 version (1.6.6) is much too old to be used with NetCDF-4. Perhaps you have a newer version in a different location?
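
A minimal sketch of the version check, assuming NetCDF-4 needs HDF5 1.8.0 or newer (worth confirming against the release notes of the NetCDF version you build). The version string 1.6.6 is taken from the library file name in the backtrace; version_at_least is a hypothetical helper and relies on `sort -V` from GNU coreutils.

```shell
#!/bin/sh
# Sketch: compare dotted version strings using GNU `sort -V`.
# version_at_least is a hypothetical helper name.
version_at_least() {
    # Succeeds (exit 0) when version $1 is >= version $2:
    # after a version sort, the smaller of the two comes first.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# 1.8.0 is an assumed minimum, not a value stated in this thread.
if version_at_least "1.6.6" "1.8.0"; then
    echo "HDF5 1.6.6: new enough"
else
    echo "HDF5 1.6.6: too old"
fi
```

As written, this prints "HDF5 1.6.6: too old".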

feroda

Re: problem with mpirun...

#4 Post by feroda

I may have run into the same problem.
On my Ubuntu laptop with 2 CPUs, I installed MPICH2.
oceanM compiled successfully, and I ran it as follows:
mpiexec -np 2 ./oceanM ocean_*.in
Things went well at first, but the process was killed at the end of this output:

NL ROMS/TOMS: started time-stepping: (Grid: 01 TimeSteps: 00000001 - 00036000)
GET_2DFLD - surface u-momentum stress, t = 15 00:00:00
(Rec=0001, Index=2, File: SCS-forc-aug-nowind.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = 0.00000000E+00 Max = 0.00000000E+00)
GET_2DFLD - surface v-momentum stress, t = 15 00:00:00
(Rec=0001, Index=2, File: SCS-forc-aug-nowind.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = 0.00000000E+00 Max = 0.00000000E+00)
GET_2DFLD - solar shortwave radiation flux, t = 15 00:00:00
(Rec=0001, Index=2, File: SCS-forc-aug-nowind.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = 3.40529103E-05 Max = 6.08471248E-05)
GET_NGFLD - free-surface eastern boundary condition, t = 15 00:00:00
(Rec=0001, Index=1, File: SCS-bry-SODA-aug.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = 3.07866417E-01 Max = 1.10178467E+00)
GET_NGFLD - free-surface southern boundary condition, t = 15 00:00:00
(Rec=0001, Index=1, File: SCS-bry-SODA-aug.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = 4.27124822E-01 Max = 6.95942208E-01)
GET_NGFLD - 2D u-momentum eastern boundary condition, t = 15 00:00:00
(Rec=0001, Index=1, File: SCS-bry-SODA-aug.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = -3.83395787E-01 Max = 3.03214096E-01)
GET_NGFLD - 2D v-momentum eastern boundary condition, t = 15 00:00:00
(Rec=0001, Index=1, File: SCS-bry-SODA-aug.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = -7.59140116E-02 Max = 1.21399133E-01)
GET_NGFLD - 2D u-momentum southern boundary condition, t = 15 00:00:00
(Rec=0001, Index=1, File: SCS-bry-SODA-aug.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = -2.05323426E-01 Max = 1.01100750E-01)
GET_NGFLD - 2D v-momentum southern boundary condition, t = 15 00:00:00
(Rec=0001, Index=1, File: SCS-bry-SODA-aug.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = -2.62352232E-01 Max = 1.39583157E-01)
GET_NGFLD - 3D u-momentum eastern boundary condition, t = 15 00:00:00
(Rec=0001, Index=1, File: SCS-bry-SODA-aug.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = -3.82833226E-01 Max = 6.29373687E-01)
GET_NGFLD - 3D v-momentum eastern boundary condition, t = 15 00:00:00
(Rec=0001, Index=1, File: SCS-bry-SODA-aug.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = -1.23403633E-01 Max = 1.23196424E-01)
GET_NGFLD - 3D u-momentum southern boundary condition, t = 15 00:00:00
(Rec=0001, Index=1, File: SCS-bry-SODA-aug.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = -4.76592953E-01 Max = 1.10937390E-01)
GET_NGFLD - 3D v-momentum southern boundary condition, t = 15 00:00:00
(Rec=0001, Index=1, File: SCS-bry-SODA-aug.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = -9.31719153E-01 Max = 2.86592785E-01)
GET_NGFLD - temperature eastern boundary condition, t = 15 00:00:00
(Rec=0001, Index=1, File: SCS-bry-SODA-aug.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = 1.19765575E+00 Max = 3.00410382E+01)
GET_NGFLD - salinity eastern boundary condition, t = 15 00:00:00
(Rec=0001, Index=1, File: SCS-bry-SODA-aug.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = 3.25857851E+01 Max = 3.50023552E+01)
GET_NGFLD - temperature southern boundary condition, t = 15 00:00:00
(Rec=0001, Index=1, File: SCS-bry-SODA-aug.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = 1.22471450E+00 Max = 3.15486310E+01)
GET_NGFLD - salinity southern boundary condition, t = 15 00:00:00
(Rec=0001, Index=1, File: SCS-bry-SODA-aug.nc)
(Tmin= 15.0000 Tmax= 345.0000)
(Min = 3.18748405E+01 Max = 3.48537465E+01)

STEP Day HH:MM:SS KINETIC_ENRG POTEN_ENRG TOTAL_ENRG NET_VOLUME

0 0 00:00:00 4.443919E-16 2.133980E+04 2.133980E+04 1.141677E+16
rank 1 in job 9 tiger-laptop_47103 caused collective abort of all ranks
exit status of rank 1: killed by signal 9


Then I tried serial mode:
./oceanS < ocean_*.in
This time the error appeared at the start of the run:

NLM: GET_STATE - Read state initial conditions, t = 0 00:00:00
(File: SCS-init-WOA-aug.nc, Rec=0001, Index=1)
- free-surface
(Min = 0.00000000E+00 Max = 0.00000000E+00)
- vertically integrated u-momentum component
(Min = 0.00000000E+00 Max = 0.00000000E+00)
- vertically integrated v-momentum component
(Min = 0.00000000E+00 Max = 0.00000000E+00)
Segmentation fault
tiger@tiger-laptop:~/work/SCS$


I suspect both failures are due to a memory limitation.
Does anyone have comments on that, or know how to free up and make full use of the memory on my laptop (2 GB)?

Thanks a lot!
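
Before concluding that the 2 GB of RAM is the limit, it may be worth checking the shell's stack limit. A segmentation fault early in a serial run is sometimes caused by a small stack limit rather than by running out of RAM, because Fortran compilers may place large automatic arrays on the stack. This is an assumption about the cause, not something the log above proves. The sketch below prints the current soft limit and tries to raise it for the current shell; describe_stack_limit is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: inspect and try to raise the stack limit before running the model.
# describe_stack_limit is a hypothetical helper name.
describe_stack_limit() {
    # $1 is the value printed by `ulimit -s`: a size in kB or "unlimited".
    case "$1" in
        unlimited) echo "stack: unlimited" ;;
        *)         echo "stack: limited to $1 kB" ;;
    esac
}

describe_stack_limit "$(ulimit -s)"

# Try to lift the soft limit for this shell and the programs it starts.
# This can be refused when the hard limit is lower.
ulimit -s unlimited 2>/dev/null || echo "could not raise the stack limit"
```

If the limit can be raised, rerun ./oceanS from the same shell so that it inherits the new setting.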

mashinde
Posts: 135
Joined: Mon Jun 22, 2009 3:46 pm
Location: Indian Institute of Tropical Meteorology, Pune, INDIA

Re: problem with mpirun...

#5 Post by mashinde

I installed new versions of HDF5 and NetCDF-4 and compiled them for parallel I/O. In serial mode it works fine, and some test examples with MPICH2 also run okay.

Now I am getting the following error:


$ mpirun -l -n 4 ./oceanM ROMS/External/ocean_upwelling.in
rank 3 in job 18 ubuntu_50363 caused collective abort of all ranks
exit status of rank 3: killed by signal 11
rank 2 in job 18 ubuntu_50363 caused collective abort of all ranks
exit status of rank 2: killed by signal 11
rank 0 in job 18 ubuntu_50363 caused collective abort of all ranks
exit status of rank 0: killed by signal 9


Please help.
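
When reading launcher output like this, it can help to separate ranks that crashed on their own (signal 11, SIGSEGV) from ranks that were killed (signal 9, SIGKILL), since a killed rank may simply have been terminated after another rank failed. That interpretation is an assumption, not something the log proves. Here is a small sketch that summarizes the "exit status of rank N: killed by signal S" lines; summarize_rank_signals and the run.log file name are hypothetical.

```shell
#!/bin/sh
# Sketch: summarize per-rank exit signals from MPICH2-style launcher output.
# summarize_rank_signals and run.log are hypothetical names.
summarize_rank_signals() {
    # Reads launcher output on stdin; extracts "<rank> <signal>" pairs,
    # then labels the two signal numbers seen in this thread (Linux numbering).
    sed -n 's/.*exit status of rank \([0-9][0-9]*\): killed by signal \([0-9][0-9]*\).*/\1 \2/p' |
    while read -r rank sig; do
        case "$sig" in
            9)  name="SIGKILL" ;;
            11) name="SIGSEGV" ;;
            *)  name="signal $sig" ;;
        esac
        echo "rank $rank: $name"
    done
}

# Assumed log file, captured with e.g. `mpirun ... > run.log 2>&1`.
if [ -f run.log ]; then
    summarize_rank_signals < run.log
fi
```

The ranks reported with SIGSEGV are the ones to investigate first.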
