Segmentation Faults

Frequently Asked Questions about ROMS usage

kate
Posts: 4088
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

#1 by kate

Just had a segmentation fault that I can't figure out at all:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
oceanG             00000000035D46E5  Unknown               Unknown  Unknown
oceanG             00000000035D2307  Unknown               Unknown  Unknown
oceanG             000000000357EA64  Unknown               Unknown  Unknown
oceanG             000000000357E876  Unknown               Unknown  Unknown
oceanG             0000000003531296  Unknown               Unknown  Unknown
oceanG             0000000003534E90  Unknown               Unknown  Unknown
libpthread.so.0    00007F62B67E87E0  Unknown               Unknown  Unknown
oceanG             00000000035082A5  nf_fread3d_mod_mp         156  nf_fread3d.f90
oceanG             0000000002EA599B  get_state_                851  get_state.f90
oceanG             0000000000F71736  initial_                  213  initial.f90
oceanG             000000000040C8EF  ocean_control_mod         133  ocean_control.f90
oceanG             000000000040B8B6  MAIN__                     95  master.f90
oceanG             000000000040B68E  Unknown               Unknown  Unknown
libc.so.6          00007F62B53A6D1D  Unknown               Unknown  Unknown
oceanG             000000000040B569  Unknown               Unknown  Unknown
It is failing while reading "u", specifically the floating-point attributes of "u". This is a new initial file that I made the same way as the last one, which ROMS has read many times. The failure above was with ifort; the same run with gfortran doesn't fail at all, so I'm chalking it up to a compiler bug.

mathieu
Posts: 74
Joined: Fri Sep 17, 2004 2:22 pm
Location: Institut Rudjer Boskovic

#2 by mathieu

Hi Kate,
in my experience the compiler has never turned out to be wrong. A bug that shows up with one compiler but not with the other is still a bug in the code.
To detect the bug with gfortran you can use the compilation options -fcheck=all -fsanitize=address -fsanitize=undefined.
For the Intel Fortran compiler, the options are -check all -warn interfaces,nouncalled -gen-interfaces.

There are other options for detecting NaN in the computation.
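
As a rough illustration of the advice above, the two flag sets could be kept in a small shell helper and passed to the build through FFLAGS. This is only a sketch: debug_flags is a hypothetical helper, not part of ROMS, and -g is an addition not mentioned in the post, included so that backtraces can carry source line numbers.

```shell
#!/bin/sh
# Sketch only: debug_flags is a hypothetical helper, not a ROMS script.
# The checking flags are the ones quoted in the post above; -g is an
# addition made here so that backtraces can show source line numbers.
debug_flags() {
    case "$1" in
        gfortran)
            echo "-g -fcheck=all -fsanitize=address -fsanitize=undefined"
            ;;
        ifort)
            # -gen-interfaces is the spelling used in Intel's documentation.
            echo "-g -check all -warn interfaces,nouncalled -gen-interfaces"
            ;;
        *)
            echo "unknown compiler: $1" >&2
            return 1
            ;;
    esac
}

# Pick the flag set for the compiler in use and show what would be passed on.
FFLAGS="$(debug_flags gfortran)"
echo "FFLAGS = $FFLAGS"
```

How FFLAGS reaches the compiler depends on the build setup, so the last two lines are only a placeholder for that step.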

#3 by kate

Thanks - those "check all" flags are scary! Both compilers warn about creating temporary arrays when reading parameter files (read_phypar, read_stapar, etc).

Ifort still fails in nf_fread3d when calling netcdf_get_fatt.

gfortran now fails in wclock_on because it is a nonrecursive procedure being called recursively (from the mp_barrier in there).

Gfortran without all the checking still runs.

#4 by mathieu

The temporary arrays are created when you pass an array section such as A(1,:) to a subroutine. Since the values are not contiguous in memory, a new array has to be created, which of course slows things down. But it is no problem if it happens only while reading the input parameters.

It is of course a problem if wclock_on is called recursively. The solution is to declare it as a "RECURSIVE SUBROUTINE".

The fact that the error occurs in netcdf_get_fatt means that the crash happens inside the netcdf routine. So, two possibilities:

(A) The bug is in the netcdf routine itself (rather unlikely). Then one needs to compile netcdf itself with check all, which is hard work.

(B) The arguments are already corrupted when the routine is called. Print the input to the function netcdf_get_fatt to check. A long time ago I had random errors caused by pointers being overwritten by a previous function call. Such corruption can happen before the call to netcdf_get_fatt and cause the problem. Since compilers are free to organize memory as they want, this can explain why it works with gfortran but not with ifort.

#5 by kate

I'm happy to ignore warnings during initialization.

Thanks, would have gotten to adding the recursive modifier, but had to leave yesterday. The gfortran case is now running past that.

The netcdf_get_fatt thing happens in the debugger when stepping into netcdf_get_fatt from nf_fread3d.
I can see the values of all eight arguments to netcdf_get_fatt and they are all fine. netcdf_get_fatt is a ROMS routine, so I should be able to step into it but no, that's when the error occurs for ifort.

I've been around long enough to believe in compiler bugs, no question.

#6 by mathieu

Hernan, a remark on your point about "wrap-around integer". Actually, in Fortran (and C/C++), signed integer overflow is undefined behavior. See for example https://stackoverflow.com/questions/405 ... r-overflow
So, gfortran is right to stop at that.

mitya
Posts: 3
Joined: Wed Apr 17, 2013 12:47 pm
Location: NTNU, Trondheim

#7 by mitya

Hi Kate,
have you managed to run this application with ifort?
I am building METROMS on a recently deployed supercomputer (Betzy, in Norway) and I run into exactly the same error in the same place (reading the attributes of u).
The toolchain I use on the new supercomputer is the same as on the previous one.

Another question: I remember you also use METROMS. Have you tried to build it with gfortran as well?

Dmitry

#8 by kate

Oh gosh, that was two and a half years ago! I have no memory of it whatsoever.

As for metroms, I haven't played with that lately either. I might have to go back to it if I can't get this other monster (CESM) working on our supercomputer.

#9 by mitya

Heh, this reminds me of
https://xkcd.com/979/

well, on the bright side you haven't been bothered with this error since then!

Mitya

#10 by mitya

OK, back to the case.
In my case this error was due to the stack size limit on our new supercomputer,
so it was solved by setting the limit to unlimited:
ulimit -s unlimited
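
A slightly more defensive variant of the same fix, as a sketch: it prints the current limits and then raises the soft stack limit to whatever the hard limit allows. The assumption here is that an unprivileged shell may not be permitted to go to unlimited when the hard limit is finite, in which case the plain command above would be refused.

```shell
#!/bin/sh
# Sketch only: raise the soft stack limit as far as the hard limit allows.
echo "soft stack limit before: $(ulimit -S -s)"
echo "hard stack limit:        $(ulimit -H -s)"

# An unprivileged shell cannot exceed the hard limit, so request exactly
# that value (it may itself be "unlimited").
ulimit -S -s "$(ulimit -H -s)"

echo "soft stack limit after:  $(ulimit -S -s)"

# The limit is inherited by child processes, so oceanG has to be launched
# from this same shell or job script for the change to take effect.
```

On multi-node MPI runs the limit has to be in effect on every node, and how that is arranged depends on the scheduler and MPI launcher.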

Fatima
Posts: 14
Joined: Mon Aug 04, 2014 3:14 pm
Location: Oceanic and atmospheric science center

#11 by Fatima

Hi,
I am trying to run my model on Fedora 30, using gfortran without MPI. I get this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7f92562bfd51 in ???
#1 0x7f92562bef15 in ???
#2 0x7f92555fbf3f in ???
#3 0x7f92565b17f7 in ???
#4 0x319998a in __mod_netcdf_MOD_netcdf_create
at /home/obuntooo/roms/upwelling1/Build_romsG/mod_netcdf.f90:5908
#5 0x2010b19 in def_his_nf90
at /home/obuntooo/roms/upwelling1/Build_romsG/def_his.f90:121
#6 0x20955b0 in __def_his_mod_MOD_def_his
at /home/obuntooo/roms/upwelling1/Build_romsG/def_his.f90:57
#7 0x52a648 in output_
at /home/obuntooo/roms/upwelling1/Build_romsG/output.f90:141
#8 0x41561e in main3d_
at /home/obuntooo/roms/upwelling1/Build_romsG/main3d.f90:235
#9 0x408b39 in __roms_kernel_mod_MOD_roms_run
at /home/obuntooo/roms/upwelling1/Build_romsG/roms_kernel.f90:175
#10 0x40531b in myroms
at /home/obuntooo/roms/upwelling1/Build_romsG/master.f90:86
#11 0x405462 in main
at /home/obuntooo/roms/upwelling1/Build_romsG/master.f90:50
Segmentation fault (core dumped)
Please help me fix it. Thank you.

Joeailvyou
Posts: 22
Joined: Wed Jul 19, 2017 4:03 pm
Location: Zhejiang University

#12 by Joeailvyou

Dear Kate,
I noticed that you have tried ROMS coupled to CICE through METROMS. When I build and run ROMS standalone, everything goes well. But if I run mpirun oceanG after building ROMS coupled with CICE (i.e. METROMS, https://github.com/metno/metroms), it fails with forrtl: severe (174): SIGSEGV, segmentation fault occurred.
Could you please help me out? I have tried many things but still have not solved it. I have uploaded my output log file and in file here:
https://github.com/joeailvyou/METROMS-output-log
Even in debug mode (but coupled to CICE), the error is not detailed enough:
[code]forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
oceanG 00000000012153C1 Unknown Unknown Unknown
oceanG 0000000001213B17 Unknown Unknown Unknown
libmpi.so.12 00002AF06FB10762 Unknown Unknown Unknown
libmpi.so.12 00002AF06FB105B6 Unknown Unknown Unknown
libmpi.so.12 00002AF06FAFEE1C Unknown Unknown Unknown
libmpi.so.12 00002AF06FAE07C8 Unknown Unknown Unknown
libpthread.so.0 00002AF07015B5E0 Unknown Unknown Unknown
oceanG 0000000000FD44A8 Unknown Unknown Unknown
oceanG 0000000000FD156C Unknown Unknown Unknown
oceanG 0000000000FBC6FC Unknown Unknown Unknown
oceanG 0000000000F13600 Unknown Unknown Unknown
oceanG 0000000000408403 MAIN__ 106 master.f90
oceanG 0000000000406DCE Unknown Unknown Unknown
libc.so.6 00002AF07058DC05 Unknown Unknown Unknown
oceanG 0000000000406CD9 Unknown Unknown Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
oceanG 00000000012153C1 Unknown Unknown Unknown
oceanG 0000000001213B17 Unknown Unknown Unknown
libmpi.so.12 00002B8F2D10F762 Unknown Unknown Unknown
libmpi.so.12 00002B8F2D10F5B6 Unknown Unknown Unknown
libmpi.so.12 00002B8F2D0FDE1C Unknown Unknown Unknown
libmpi.so.12 00002B8F2D0DF7C8 Unknown Unknown Unknown
libpthread.so.0 00002B8F2D75A5E0 Unknown Unknown Unknown
oceanG 0000000000FD44A8 Unknown Unknown Unknown
oceanG 0000000000FD156C Unknown Unknown Unknown
oceanG 0000000000FBC6FC Unknown Unknown Unknown
oceanG 0000000000F13600 Unknown Unknown Unknown
oceanG 0000000000408403 MAIN__ 106 master.f90
oceanG 0000000000406DCE Unknown Unknown Unknown
libc.so.6 00002B8F2DB8CC05 Unknown Unknown Unknown
oceanG 0000000000406CD9 Unknown Unknown Unknown
real 0.49
user 5.06
sys 5.64
[/code]

#13 by kate

It has been years since I ran that. Did you recompile CICE in debug mode too?
