cluster node died after model blow up (3.0 version)

schen
Posts: 29
Joined: Wed Feb 09, 2005 6:34 pm
Location: WHOI

#1 by schen

Hi all,

I recently ran into a strange problem after migrating to ROMS 3.0. My application works fine as a serial run but blows up as a parallel run (tiling 1 x 4). After the blow-up, one of the compute nodes died. This has happened repeatedly. Any suggestions would be greatly appreciated. Thanks!

Shih-Nan


My configuration:

Operating system : Suse Linux
CPU/hardware : x86_64
Compiler system : ifort
Compiler command : /usr/local/mvapich/bin/mpif90
Compiler flags : -ip -O3 -xW -free -free

Resolution, Grid 01: 0192x0101x020, Parallel Nodes: 4, Tiling: 001x004

USE_MPI ?= on
USE_MPIF90 ?= on

My CPP options:

#define UV_LOGDRAG
#define UV_ADV
#define UV_PSOURCE
#define DJ_GRADPS
#define TS_MPDATA
#define MIX_GEO_TS
#define TS_PSOURCE
#define NONLIN_EOS
#define SALINITY
#define MASKING
#define SOLVE3D
#define SPLINES

#define RADIATION_2D

#define TCLM_NUDGING /* Nudging of tracer climatology */
#define TCLIMATOLOGY /* Processing of tracer climatology */
#define SOUTH_TNUDGING
#define NORTH_TNUDGING
#define WEST_TNUDGING

#define EASTERN_WALL
#define NORTH_FSCHAPMAN
#define NORTH_M2FLATHER
#define NORTH_M3RADIATION
#define NORTH_TRADIATION
#define WEST_FSCHAPMAN
#define WEST_M2FLATHER
#define WEST_M3RADIATION
#define WEST_TRADIATION
#define SOUTH_FSCHAPMAN
#define SOUTH_M2FLATHER
#define SOUTH_M3RADIATION
#define SOUTH_TRADIATION
#undef SOUTH_FSGRADIENT
#undef SOUTH_M2GRADIENT

#define ANA_INITIAL
#define ANA_TCLIMA
#define ANA_PSOURCE
#define ANA_SMFLUX
#define ANA_SRFLUX
#define ANA_SSFLUX
#define ANA_STFLUX
#define ANA_BSFLUX
#define ANA_BTFLUX
#define ANA_FSOBC
#define ANA_M2OBC
#define ANA_TOBC
#define ANA_SEDIMENT
#define ANA_SPFLUX
#define ANA_BPFLUX

#define GLS_MIXING
#ifdef GLS_MIXING
# define N2S2_HORAVG
# define CANUTO_A
# undef KANTHA_CLAYSON
# undef CRAIG_BANNER
# undef CHARNOK
# undef ZOS_HSIG
# undef TKE_WAVEDISS
#endif

#define SEDIMENT
#ifdef SEDIMENT
# define SUSPLOAD
# undef BEDLOAD_SOULSBY
# undef BEDLOAD_MPM
# undef SED_DENS
# undef SED_MORPH
# undef SED_BIODIFF
#endif

m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

#2 by m.hadfield

Try re-making the model with USE_DEBUG=on. This will enable bounds checking, which might well reveal problems with the code when you run it. If not, you need to learn more about the nature of the crash: where and when it occurs. Run the model until just before the crash, save the model history fields, and examine them for clues.

By the way, what does the model print to stdout?
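
As a rough illustration of "examine the history fields for clues", here is a minimal Python sketch that scans successive time records for the first non-finite value and reports the largest finite magnitude in each record. It is only a sketch under assumptions: the `history` list is a hypothetical stand-in for one variable read from a history file (flattened to one list of values per time record), and the function names are made up for this example, not part of ROMS.

```python
import math

def first_bad_record(records):
    """Return (record_index, point_index) of the first NaN/Inf, or None."""
    for n, field in enumerate(records):
        for k, value in enumerate(field):
            if not math.isfinite(value):
                return n, k
    return None

def max_abs_per_record(records):
    """Largest finite magnitude in each record, to spot growth before the blow-up."""
    result = []
    for field in records:
        finite = [abs(v) for v in field if math.isfinite(v)]
        result.append(max(finite) if finite else None)
    return result

# Hypothetical data: 3 time records of 4 grid points each (not real model output).
history = [
    [0.1, 0.2, 0.3, 0.4],
    [0.1, 0.2, 5.0e3, 0.4],
    [0.1, float("nan"), float("inf"), 0.4],
]

print(first_bad_record(history))    # first record and point with a non-finite value
print(max_abs_per_record(history))  # growth of the field ahead of the blow-up
```

The point index of the first bad value, together with the record where magnitudes start to grow, gives a "where and when" for the crash that can then be compared between the serial and the tiled run.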

schen
Posts: 29
Joined: Wed Feb 09, 2005 6:34 pm
Location: WHOI

#3 by schen

I restarted the model and saved the history fields more frequently. It turns out that ANA_SEDIMENT is what causes the problem, but I have not figured out what I did wrong.
Thanks so much for the suggestion. I will try the USE_DEBUG option and post the result (hopefully soon).
