Caught error: Segmentation fault (signal 11)

Discussion about modeling ice with ROMS

Moderators: arango, robertson

papaya
Posts: 19
Joined: Mon Apr 02, 2012 4:58 pm
Location: Georgia Tech

Caught error: Segmentation fault (signal 11)

#1 Post by papaya » Mon Jul 08, 2013 5:02 pm

Hi all,

When I submit a job, it stops immediately and the log file gives:
Caught error: Segmentation fault (signal 11)

It seems there is something wrong with MPI?

But even when I turn off MPI, it gives an error too:
READ_PhyPar - Error while processing line:
pu9 60848864 207 954289 59062278 40855 0 220 0 0
cpu10 60307697 88 907827 59667940 23801 0 425 0 0
cpu11 60824558 123 871282 59186992 21580 0 346 0 0
cpu12 60532883 0 903659 59451675 20867 0 371 0 0
cpu13 61129422 0 862952 58896509 21886 0 156 0 0
cpu14 60

Is this caused by the compilation? Or are there some problems with the source code?

Thanks in advance!

Fan

kate
Posts: 3678
Joined: Wed Jul 02, 2003 5:29 pm
Location: IMS/UAF, USA

Re: Caught error: Segmentation fault (signal 11)

#2 Post by kate » Mon Jul 08, 2013 5:21 pm

Note that when turning off MPI, you need to "make clean" and rebuild the thing. It looks like you didn't turn off MPI there.

From my email to you:
This is without USE_DEBUG, isn't it? Could you try again with it? You might get more useful information about where in read_phypar the thing is failing. If you get a line number, check your read_phypar.f90 file to see what's on that line. In any case, read_phypar is the routine that reads your ocean.in file. So, what are the differences between my branch and the trunk for read_phypar and ocean.in in the region of interest (trouble)?
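One way to approach the comparison suggested above is to list which input keywords differ between the two ocean.in files. The sketch below is only an illustration: it assumes the usual `KEYWORD == value` (or `KEYWORD = value`) layout with `!` comments, ignores continuation lines, and the function names `parse_keywords` and `compare_inputs` are hypothetical helpers, not part of ROMS.

```python
import re

# Matches "KEYWORD = value" or "KEYWORD == value" at the start of a line.
# Keywords may carry an index in parentheses, e.g. "LBC(isFsur)".
_ASSIGN = re.compile(r"^\s*([A-Za-z_][\w()%,]*)\s*={1,2}\s*(.*)$")


def parse_keywords(text):
    """Return {keyword: raw value string} for a ROMS-style input file.

    Assumption: one assignment per line; continuation lines are ignored.
    """
    params = {}
    for line in text.splitlines():
        line = line.split("!", 1)[0].rstrip()  # drop "!" comments
        match = _ASSIGN.match(line)
        if match:
            params[match.group(1)] = match.group(2).strip()
    return params


def compare_inputs(text_a, text_b):
    """Return (only_in_a, only_in_b, changed) keyword lists, each sorted."""
    a, b = parse_keywords(text_a), parse_keywords(text_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, changed
```

Reading both files with `open(path).read()` and passing the text to `compare_inputs` gives a short list of keywords that one version expects and the other does not provide, which is where a reader such as read_phypar is most likely to stumble.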

papaya
Posts: 19
Joined: Mon Apr 02, 2012 4:58 pm
Location: Georgia Tech

Re: Caught error: Segmentation fault (signal 11)

#3 Post by papaya » Mon Jul 08, 2013 8:28 pm

Thanks very much for your reply.

I turned debug on, but it is still not working properly.

Log file with MPI:
Time is Mon Jul 8 16:03:16 EDT 2013
This jobs runs on the following processors:
iw-k30-24.pace.gatech.edu iw-k30-24.pace.gatech.edu iw-k30-24.pace.gatech.edu iw-k30-24.pace.gatech.edu iw-k30-24.pace.gatech.edu iw-k30-24.pace.gatech.edu iw-k30-24.pace.gatech.edu iw-k30-24.pace.gatech.edu iw-k30-24.pace.gatech.edu iw-k30-24.pace.gatech.edu iw-k30-24.pace.gatech.edu iw-k30-24.pace.gatech.edu iw-k30-24.pace.gatech.edu iw-k30-24.pace.gatech.edu iw-k30-24.pace.gatech.edu iw-k30-24.pace.gatech.edu
This job has allocated 16 nodes

Model Input Parameters: ROMS/TOMS version 3.6
Monday - July 8, 2013 - 4:03:17 PM
-----------------------------------------------------------------------------

=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)

Log file without MPI:
Please see the attached file. It contains plenty of unreadable symbols, including on the line for READ_PhyPar.

I also tried the previous main source code from roms.org; it runs for 6 steps and then blows up. So the problem may come from the compilation rather than from the server/cluster.

Fan
Attachments
log.ross-nud-zice.txt

kate
Posts: 3678
Joined: Wed Jul 02, 2003 5:29 pm
Location: IMS/UAF, USA

Re: Caught error: Segmentation fault (signal 11)

#4 Post by kate » Mon Jul 08, 2013 8:57 pm

That's a singularly unhelpful log file there. It's full of null characters, nothing useful.

I would recompile this with USE_DEBUG and without USE_MPI. You can then run it in a debugger such as gdb in serial mode. Perhaps then you can see what the problem is, or at least see which line of read_phypar is giving you trouble.
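For a log like the one described above, a quick check of how much of the file is null characters, and what readable text is left once they are removed, can save some scrolling. This is a minimal sketch, not a ROMS tool; `summarize_bytes` and `readable_lines` are hypothetical helper names.

```python
def summarize_bytes(data: bytes):
    """Count null bytes and other non-printable bytes in raw log content."""
    nulls = data.count(b"\x00")
    # Printable ASCII plus tab, newline and carriage return.
    printable = sum(1 for b in data if 32 <= b < 127 or b in (9, 10, 13))
    return {
        "total": len(data),
        "nulls": nulls,
        "other_unprintable": len(data) - printable - nulls,
    }


def readable_lines(data: bytes):
    """Drop null bytes and return the non-empty lines that remain."""
    text = data.replace(b"\x00", b"").decode("ascii", errors="replace")
    return [line for line in text.splitlines() if line.strip()]
```

To use it, read the log in binary mode, for example `data = open("log.ross-nud-zice.txt", "rb").read()`, then print `summarize_bytes(data)` and the first few entries of `readable_lines(data)`.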

papaya
Posts: 19
Joined: Mon Apr 02, 2012 4:58 pm
Location: Georgia Tech

Re: Caught error: Segmentation fault (signal 11)

#5 Post by papaya » Mon Jul 08, 2013 10:22 pm

Thanks very much!

The problem is now solved and the model is running smoothly.

It turns out the problem was caused by the wrong header file.

I was using the header file from the old source code, which is missing the ice boundary conditions.
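A mismatch like this can be spotted by comparing which options two header files define. The sketch below assumes the header is a C-preprocessor options file made of `#define` / `#undef` lines; it treats every `#ifdef` branch as active, and `defined_options` and `missing_options` are hypothetical helper names.

```python
import re

# Matches "#define NAME" or "#undef NAME", allowing spaces after the "#".
_DIRECTIVE = re.compile(r"^\s*#\s*(define|undef)\s+(\w+)")


def defined_options(header_text):
    """Return the set of option names left defined in a cpp header.

    Simplification: conditionals (#if/#ifdef) are not evaluated.
    """
    active = set()
    for line in header_text.splitlines():
        match = _DIRECTIVE.match(line)
        if not match:
            continue
        kind, name = match.groups()
        if kind == "define":
            active.add(name)
        else:
            active.discard(name)
    return active


def missing_options(old_header, new_header):
    """Options defined in new_header but not in old_header, sorted."""
    return sorted(defined_options(new_header) - defined_options(old_header))
```

Passing the text of the old header and of the header shipped with the newer source to `missing_options` lists the options the old file never sets.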

Again, Kate, thanks very much for your continuous patience and time!

Fan