TLM: forrtl: severe(174): SIGSEGV, segmentation fault occur

Discussion about tangent linear and adjoint models, variational data assimilation, and other related issues.

Moderators: arango, robertson

wangzc
Posts: 28
Joined: Fri Dec 28, 2012 5:44 am
Location: National Marine Environmental Forecasting Center

#1 Post by wangzc

Hi everyone,
I am running IS4DVAR with ADJUST_WSTRESS and ADJUST_BOUNDARY activated, and the code compiles without errors. At Outer = 001, Inner = 000 the NLM runs fine. When the TLM runs, however, it stops with "forrtl: severe(174): SIGSEGV, segmentation fault occurred" after *_itl.nc, *_frc.nc and *_bry.nc have been read. Anyone know why?
My CPP options are as follows:
*****************************************
#define ANA_BSFLUX
#define ANA_BTFLUX
#define ANA_SSFLUX
#define ANA_STFLUX

#define UV_ADV
#define DJ_GRADPS
#define UV_COR
#define UV_QDRAG
#define UV_VIS2
#define MIX_S_UV
#define MIX_GEO_TS
#define TS_DIF2
#define TS_U3HADVECTION
#define TS_C4VADVECTION
#define SOLVE3D
#define SALINITY
#define CURVGRID
#define SPHERICAL
#define PROFILE
#define SPLINES
#define MASKING

#define GLS_MIXING
#ifdef GLS_MIXING
# define N2S2_HORAVG
# define KANTHA_CLAYSON
#endif

#if defined IS4DVAR
# define VCONVOLUTION
# define ADJUST_WSTRESS
# define ADJUST_BOUNDARY
# define IMPLICIT_VCONV
# define BALANCE_OPERATOR
# ifdef BALANCE_OPERATOR
#  define ZETA_ELLIPTIC
# endif
# define FORWARD_WRITE
# define FORWARD_READ
# define FORWARD_MIXING
# define OUT_DOUBLE
#endif
***************************************************
--zongchen

m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

#2 Post by m.hadfield

wangzc wrote: Anyone know why?
Not I. I suggest you build ROMS with the USE_DEBUG variable set to on (in build.bash) and try again. This *might* give you some useful extra information.
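
A minimal sketch of that suggestion, assuming the stock build.bash shipped with ROMS, where USE_DEBUG is one of the exported options (usually present but commented out). The EXE variable is hypothetical and not part of build.bash; it only illustrates that a debug build is expected to produce an executable named oceanG rather than oceanM.

```shell
# Sketch only: in build.bash, uncomment (or add) this option, then rebuild.
export USE_DEBUG=on            # use Fortran debugging flags

# Hypothetical helper (not part of build.bash): the executable name the
# build is expected to produce for this setting.
if [ "${USE_DEBUG:-}" = "on" ]; then
  EXE=oceanG                   # debug build
else
  EXE=oceanM                   # optimized MPI build (assumed)
fi
echo "expected executable: $EXE"
```

With debugging flags on, a crash should come with a traceback that names source files and line numbers instead of raw addresses only.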

mariafattorini
Posts: 52
Joined: Tue Mar 03, 2009 2:39 pm
Location: C.N.R. - LaMMA

#3 Post by mariafattorini

Hello everyone,
I tried to run the FTE double-gyre test as distributed, but with USE_DEBUG on, and got a segmentation fault:
...
119 4 23:00:00 -3.169893E-11 3.953715E-07 3.953398E-07 2.000000E+15
120 5 00:00:00 -3.120319E-11 4.231900E-07 4.231588E-07 2.000000E+15
TL_WRT_HIS - wrote tangent fields (Index=1,1) into time record = 0000002

PROPAGATOR - Grid: 01, Tangent Final Norm: 2.733433E+04

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image      PC                Routine            Line     Source
oceanG     0000000000489BAB  distribute_mod_mp  2674     distribute.f90
oceanG     000000000055782D  wrt_gst_           186      wrt_gst.f90
oceanG     000000000042874F  ocean_control_mod  301      ocean_control.f90
oceanG     0000000000425832  MAIN__             108      master.f90
oceanG     00000000004254BC  Unknown            Unknown  Unknown
libc.so.6  000000308841D994  Unknown            Unknown  Unknown
oceanG     00000000004253C9  Unknown            Unknown  Unknown
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 25611 on
node ekman exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

I have no idea what the problem could be.
Any suggestions?

Thank you so much,
maria

m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

#4 Post by m.hadfield

I think it's crashing at some code in distribute.f90 (is 2674 the line number??), which is called from some code in wrt_gst.f90 (line 186??).

I would attack this by inserting WRITE statements at the relevant locations to print out relevant variables, rebuilding and rerunning (repeat as necessary). I believe other people use software tools called "debuggers".
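
On the line-number question: the header row of the traceback labels its columns Image, PC, Routine, Line and Source, so the fourth field is the source line. As a small sketch, the frames can be turned into file:line pairs like this. The variable names are illustrative, and the frames are copied from the traceback posted above.

```shell
# Three frames copied from the posted traceback (fields separated by spaces).
traceback='oceanG 0000000000489BAB distribute_mod_mp 2674 distribute.f90
oceanG 000000000055782D wrt_gst_ 186 wrt_gst.f90
oceanG 00000000004254BC Unknown Unknown Unknown'

# Keep only frames whose 4th field is numeric, and print "source:line (routine)".
frames=$(printf '%s\n' "$traceback" |
  awk '$4 ~ /^[0-9]+$/ { print $5 ":" $4 " (" $3 ")" }')

printf '%s\n' "$frames"
```

The topmost frame printed is the innermost one, which is the first place to add WRITE statements or set a debugger breakpoint.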

mariafattorini
Posts: 52
Joined: Tue Mar 03, 2009 2:39 pm
Location: C.N.R. - LaMMA

#5 Post by mariafattorini

Hello again, everyone,

I have not yet found where the problem is. :roll:
I have looked at the lines of the files indicated by the traceback:

1. Line 2674 of distribute.f90:
      DO j=LB2,UB2
        DO i=LB1,UB1
          np=np+1
          WRITE(*,*) 'i =', i            ! MARIA
          WRITE(*,*) 'j =', j
          WRITE(*,*) 'A(i,j) =', A(i,j)
          Arecv(np)=A(i,j)
        END DO
      END DO
      mp_ncwrite2d=nf90_inq_varid(ncid, TRIM(ncvname), varid)

and afterwards at line 2737 of distribute.f90:
      CALL mpi_wait (request, status, MyError)
At this point the code is about to "write out data into NetCDF file [...] otherwise send data to master node".

2. Line 186 of wrt_gst.f90:
      status=mp_ncwrite2d(ng, model, GST(ng)%ncid, 'Bvec',            &
     &                    GST(ng)%name, vrecord,                      &
     &                    Nstr(ng), Nend(ng), 1, NCV, scale,          &
     &                    STORAGE(ng)%Bvec(Nstr(ng):,1))

This call is to "write out Lanczos/Arnoldi basis vectors".

3. Line 301 of ocean_control.f90: CALL wrt_gst (ng, iTLM)
This call is to "write out checkpoint data into GST restart NetCDF file".

4. Line 108 of master.f90: CALL ROMS_run (run_time)
This call is to "time-step ocean model over all nested grids" (only one in my case).

As suggested by m.hadfield, I printed the following variables to the screen:
- ng, model, GST(ng)%ncid, 'Bvec', GST(ng)%name, vrecord, Nstr(ng), Nend(ng), 1, NCV, scale, STORAGE(ng)%Bvec(Nstr(ng):,1) in the file wrt_gst.f90;
- A(i,j) in the file distribute.f90.

The values printed during the run do not look strange to me. After reading the initial conditions from itl.nc and fwd.nc and creating the checkpoint file (gst.nc), the model writes out the variables I asked for (ng, model, ..., A(i,j)). A(i,j) was printed for i=1:39096 and j=1:10.
After that, the segmentation fault occurs.
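
As a rough size check on those index ranges, here is a sketch that assumes 8-byte (double precision) values, which is not stated above. It counts the values packed into Arecv and prints the shell's stack limit for comparison. Whether the two are related depends on how the work buffers in distribute.f90 are allocated, so this is only a possible avenue to look at, not a diagnosis.

```shell
ni=39096            # i = 1:39096, as printed
nj=10               # j = 1:10, as printed
bytes_per_value=8   # assumption: double precision reals

nvalues=$((ni * nj))
nbytes=$((nvalues * bytes_per_value))
nkib=$((nbytes / 1024))

echo "values: $nvalues, bytes: $nbytes (~${nkib} KiB)"
echo "stack limit (KiB, or 'unlimited'): $(ulimit -s)"
```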

Does anyone have any idea about what the problem could be?

Thank you very much,
Maria Fattorini
