Ocean Modeling Discussion

ROMS/TOMS

[ 14 posts ]

All times are UTC

PostPosted: Fri Sep 21, 2018 8:54 am 

Joined: Tue Nov 10, 2009 6:42 pm
Posts: 68
Location: Technical University of Cartagena,Murcia, Spain
Dear all,

I know that this error (the one I dread most) can be caused by many things, so I will try to give as much information as possible. I have been working with ROMS for a while. Now I am using ROMS/TOMS version 3.7, revision 921. After the latest updates, when I try to run the model I get:

--------------------------------------------------------------------------------
Model Input Parameters: ROMS/TOMS version 3.7
Wednesday - September 19, 2018 - 5:10:14 PM
--------------------------------------------------------------------------------
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
oceanM 00000000008544E5 Unknown Unknown Unknown
oceanM 0000000000852107 Unknown Unknown Unknown
oceanM 0000000000801784 Unknown Unknown Unknown
oceanM 0000000000801596 Unknown Unknown Unknown
oceanM 00000000007B40A6 Unknown Unknown Unknown
oceanM 00000000007B7CA0 Unknown Unknown Unknown
libpthread.so.0 00007F9460F46790 Unknown Unknown Unknown
oceanM 000000000086E313 Unknown Unknown Unknown
oceanM 00000000007FCE18 Unknown Unknown Unknown
oceanM 000000000041A869 Unknown Unknown Unknown
oceanM 00000000004249A2 Unknown Unknown Unknown
oceanM 0000000000412CEC Unknown Unknown Unknown
oceanM 000000000040BAD2 Unknown Unknown Unknown
oceanM 000000000040B59C Unknown Unknown Unknown
oceanM 000000000040B45E Unknown Unknown Unknown
libc.so.6 00007F946093DD5D Unknown Unknown Unknown
oceanM 000000000040B369 Unknown Unknown Unknown


I am able to run the model with an older ROMS revision (i.e., ROMS/TOMS version 3.7, SVN revision 836M). I have verified with "ncdump -k" that the NetCDF format in use is NetCDF-4: it returns netCDF-4 for all the input files.
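For other readers checking the same thing, the per-file test can be done in one pass (a sketch; assumes the NetCDF utilities are on your PATH and the input files sit in the current directory):

```shell
# Print the on-disk format of every NetCDF input file in one pass.
# "ncdump -k" reports the format kind: netCDF-4, classic,
# 64-bit offset, or netCDF-4 classic model.
for f in *.nc; do
  printf '%s: ' "$f"
  ncdump -k "$f"
done
```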

I am using ifort to compile the code. I am sorry, but I am not able to use another compiler because I am not the admin of the system. The build script is set up as follows:

setenv USE_MPI on # distributed-memory parallelism
# setenv USE_MPIF90 on # compile with mpif90 script
#setenv which_MPI mpich # compile with MPICH library
setenv which_MPI mpich2 # compile with MPICH2 library
## setenv which_MPI openmpi # compile with OpenMPI library

#setenv USE_OpenMP on # shared-memory parallelism

setenv FORT ifort
#setenv FORT gfortran
#setenv FORT pgi

#setenv USE_DEBUG on # use Fortran debugging flags
setenv USE_LARGE on # activate 64-bit compilation
setenv USE_NETCDF4 on # compile with NetCDF-4 library
setenv USE_PARALLEL_IO on # Parallel I/O with NetCDF-4/HDF5



I get the same result running it serially (oceanS). When I tried to activate DEBUG, I got this error:

ld: cannot find -ldl


So I am not able to know exactly which file is getting me into trouble.

I have been able to run the upwelling test case in parallel with MPI, so I am pretty sure that my problem is related to the NetCDF files that I am using.

I would really appreciate it if you could point out the next steps to follow.

Thanks a lot, :D

-Francisco



PostPosted: Fri Sep 21, 2018 4:11 pm 

Joined: Wed Jul 02, 2003 5:29 pm
Posts: 3633
Location: IMS/UAF, USA
I have a similar problem and am running with gfortran for one domain and with an old code for the other domain. Sorry I don’t have a third fix.


PostPosted: Sat Sep 22, 2018 3:44 am 

Joined: Tue Nov 10, 2009 6:42 pm
Posts: 68
Location: Technical University of Cartagena,Murcia, Spain
Thanks a lot, Kate. Let's see if someone can give us a clue about what's happening. Meanwhile, I will try to use the old code as you suggested.


PostPosted: Sat Sep 22, 2018 6:18 am 
Site Admin

Joined: Wed Feb 26, 2003 4:41 pm
Posts: 1078
Location: IMCS, Rutgers University
Nowadays, severe segmentation errors are usually associated with the stack size, which is used for allocating automatic arrays. They are allocated on the stack or the heap according to your choice of compiler options. I mentioned this in the last trac ticket.


PostPosted: Tue Sep 25, 2018 7:27 am 

Joined: Tue Nov 10, 2009 6:42 pm
Posts: 68
Location: Technical University of Cartagena,Murcia, Spain
Dear Arango,

Thanks a lot for your answer. I will talk with the administrator of our HPC system to try to configure the stack size properly.

Regards,

-Francisco


PostPosted: Tue Sep 25, 2018 1:48 pm 
Site Admin

Joined: Wed Feb 26, 2003 4:41 pm
Posts: 1078
Location: IMCS, Rutgers University
It is very simple, as I have mentioned several times before. You just need to edit your login script and add one of the lines below:

Code:
# .cshrc, .tcshrc, etc.:
limit stacksize unlimited

# or in .bashrc:
ulimit -s unlimited


I wrote lots of information in a previous :arrow: trac ticket.
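One way to confirm the new limit actually reached the shell that launches the model (bash syntax shown; csh users would use `limit stacksize` instead). Batch shells may not source the login files, so it is worth checking inside the job script too. This is a sketch, not part of the original advice:

```shell
# Show the soft stack limit that oceanM will inherit from this shell.
# After the login-script change and a fresh login, this should
# print "unlimited".
ulimit -s

# Raise the soft limit for the current shell only; this can fail if
# the administrator has capped the hard limit.
ulimit -S -s unlimited 2>/dev/null || echo "hard limit blocks unlimited"
ulimit -s
```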


PostPosted: Wed Sep 26, 2018 7:46 am 

Joined: Tue Nov 10, 2009 6:42 pm
Posts: 68
Location: Technical University of Cartagena,Murcia, Spain
Dear Arango,

I am sorry for not explaining it properly. I followed your advice from the ticket, doing:


# .cshrc, .tcshrc, etc.:
limit stacksize unlimited

# or in .bashrc:
ulimit -s unlimited


But I got the same error. The next step was to compile the model with the -heap-arrays option (I am using ifort), so I asked the administrator to do so. Although, as you point out in the ticket, it may slow down the computations, I hope it helps to detect where the problem is so it can be solved in a better way.
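For reference, the flag can also be set in the build configuration rather than asking the administrator each time. A sketch against the stock ifort makefile fragment (the variable name and placement in Compilers/Linux-ifort.mk may differ between revisions, so treat this as illustrative):

```
# Compilers/Linux-ifort.mk (sketch): allocate automatic arrays
# on the heap instead of the stack
FFLAGS += -heap-arrays
```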

Thanks a lot, I really appreciate your help.

-Francisco


PostPosted: Wed Sep 26, 2018 4:33 pm 

Joined: Tue Nov 10, 2009 6:42 pm
Posts: 68
Location: Technical University of Cartagena,Murcia, Spain
Today I have been able to run the model without error using the -heap-arrays option, but it hurts performance considerably. Below you will find a comparison:


OceanM Version 3.7 rev. 922 // 2 nested grids // 2 nodes, 16 CPUs/node // Total Elapsed CPU Time = 30138.171 sec
OceanM Version 3.7 rev. 836 // 2 nested grids // 2 nodes, 16 CPUs/node // Total Elapsed CPU Time = 9568.668 sec


I would like to keep ROMS updated, but the performance penalty is too high. The memory configuration used was:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256669
max locked memory (kbytes, -l) 4086160
max memory size (kbytes, -m) 65536000
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited


I will keep working to find the problem with my NetCDF file, which I was able to use in the older version but which gives me this error in the latest one. Any clues are really welcome.


PostPosted: Wed Sep 26, 2018 10:00 pm 
Site Admin

Joined: Wed Feb 26, 2003 4:41 pm
Posts: 1078
Location: IMCS, Rutgers University
Yes, your problem is the stack size per CPU, and it seems to be associated with the automatic arrays used in distributed-memory I/O operations. This is not a direct ROMS problem but a computer problem: there is not enough memory to handle the automatic arrays, allocated on either the stack or the heap, for the scattering/gathering of I/O.

I see that you are using 32 CPUs. How big are all your grids? You said that you have two nested grids.

I updated the code today to report memory requirements. See the :arrow: trac ticket for more information.


PostPosted: Thu Sep 27, 2018 12:36 pm 

Joined: Tue Nov 10, 2009 6:42 pm
Posts: 68
Location: Technical University of Cartagena,Murcia, Spain
Dear Arango,

I have updated the code to the latest revision. Now I am able to run the model without the -heap-arrays option, but it still takes more time than the older revisions.

OceanM Version 3.7 rev. 923 // 2 nested grids // 2 nodes, 16 CPUs/node // Total Elapsed CPU Time = 28580.836 sec
OceanM Version 3.7 rev. 922 // 2 nested grids // 2 nodes, 16 CPUs/node // Total Elapsed CPU Time = 30138.171 sec
OceanM Version 3.7 rev. 836 // 2 nested grids // 2 nodes, 16 CPUs/node // Total Elapsed CPU Time = 9568.668 sec

The memory report shows:

Code:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 Dynamic and Automatic memory (MB) usage for Grid 01:  332x332x10  tiling: 4x8

     tile          Dynamic        Automatic            USAGE      MPI-Buffers

        0            44.95            19.63            64.59             8.92
        1            45.33            19.63            64.97             8.92
        2            45.33            19.63            64.97             8.92
        3            45.14            19.63            64.78             8.92
        4            46.48            19.63            66.11             8.92
        5            46.90            19.63            66.53             8.92
        6            46.90            19.63            66.53             8.92
        7            46.69            19.63            66.32             8.92
        8            46.48            19.63            66.11             8.92
        9            46.90            19.63            66.53             8.92
       10            46.90            19.63            66.53             8.92
       11            46.69            19.63            66.32             8.92
       12            46.48            19.63            66.11             8.92
       13            46.90            19.63            66.53             8.92
       14            46.90            19.63            66.53             8.92
       15            46.69            19.63            66.32             8.92
       16            46.48            19.63            66.11             8.92
       17            46.90            19.63            66.53             8.92
       18            46.90            19.63            66.53             8.92
       19            46.69            19.63            66.32             8.92
       20            46.48            19.63            66.11             8.92
       21            46.90            19.63            66.53             8.92
       22            46.90            19.63            66.53             8.92
       23            46.69            19.63            66.32             8.92
       24            46.48            19.63            66.11             8.92
       25            46.90            19.63            66.53             8.92
       26            46.90            19.63            66.53             8.92
       27            46.69            19.63            66.32             8.92
       28            45.33            19.63            64.97             8.92
       29            45.72            19.63            65.36             8.92
       30            45.72            19.63            65.36             8.92
       31            45.53            19.63            65.16             8.92

      SUM          1484.82           628.28          2113.11           285.58

 Dynamic and Automatic memory (MB) usage for Grid 02:  222x189x10  tiling: 4x8

     tile          Dynamic        Automatic            USAGE      MPI-Buffers

        0            24.61             9.00            33.60             9.00
        1            24.61             9.00            33.60             9.00
        2            24.61             9.00            33.60             9.00
        3            24.48             9.00            33.48             9.00
        4            24.61             9.00            33.60             9.00
        5            24.61             9.00            33.60             9.00
        6            24.61             9.00            33.60             9.00
        7            24.48             9.00            33.48             9.00
        8            24.61             9.00            33.60             9.00
        9            24.61             9.00            33.60             9.00
       10            24.61             9.00            33.60             9.00
       11            24.48             9.00            33.48             9.00
       12            24.61             9.00            33.60             9.00
       13            24.61             9.00            33.60             9.00
       14            24.61             9.00            33.60             9.00
       15            24.48             9.00            33.48             9.00
       16            24.61             9.00            33.60             9.00
       17            24.61             9.00            33.60             9.00
       18            24.61             9.00            33.60             9.00
       19            24.48             9.00            33.48             9.00
       20            24.61             9.00            33.60             9.00
       21            24.61             9.00            33.60             9.00
       22            24.61             9.00            33.60             9.00
       23            24.48             9.00            33.48             9.00
       24            24.61             9.00            33.60             9.00
       25            24.61             9.00            33.60             9.00
       26            24.61             9.00            33.60             9.00
       27            24.48             9.00            33.48             9.00
       28            24.06             9.00            33.06             9.00
       29            24.06             9.00            33.06             9.00
       30            24.06             9.00            33.06             9.00
       31            23.94             9.00            32.94             9.00

      SUM           784.22           287.92          1072.14           287.92

    TOTAL          2269.04           916.20          3185.24           573.50


Reviewing old model outputs, I have realized that in the older version the -heap-arrays option was activated. Below you will find the compiler options used:

Code:
 Operating system : Linux

 CPU/hardware     : x86_64
 Compiler system  : ifort
 Compiler command : /opt/intel/parallel_studio_xe_2016_update2/impi/5.1.3.181/intel64/bin/mpiifort
 Compiler flags   : -heap-arrays -fp-model precise -ip -O3 -free -free -free

 SVN Root URL  : https://www.myroms.org/svn/src/trunk
 SVN Revision  : 836M
==============================================================

 Operating system : Linux
 CPU/hardware     : x86_64
 Compiler system  : ifort
 Compiler command : /opt/intel/parallel_studio_xe_2016_update2/impi/5.1.3.181/intel64/bin/mpiifort
 Compiler flags   : -fp-model precise -ip -O3
 MPI Communicator : 1140850688  PET size = 32

 SVN Root URL  : https://www.myroms.org/svn/src/trunk
 SVN Revision  : 923M


Quote:
I see that you are using 32 CPUs. How big are all your grids? You said that have two nested grids.


I used to run 1 grid with 3 refined grids. After reading some of your tickets explaining the importance of testing for the best core configuration to run the model, I decided to set up a test case with only 1 donor grid (Lm=332 and Mm=332) and 1 refined grid (Lm=222 and Mm=189), and to do some tests changing the number of cores used and the domain decomposition parameters. But then I started to get the segmentation fault error.

Thanks a lot for your help,

-Francisco


PostPosted: Thu Sep 27, 2018 1:53 pm 
Site Admin

Joined: Wed Feb 26, 2003 4:41 pm
Posts: 1078
Location: IMCS, Rutgers University
I think that you need to read the following :arrow: trac ticket and choose the MPI communication options that are most efficient in the computing environment where you are running. You should check the profiling information that ROMS reports to standard output to see which regions of the code are slower. If -heap-arrays is faster, then use it. However, in our experience the -heap-arrays option for ifort is less efficient.


PostPosted: Fri Sep 28, 2018 7:53 am 

Joined: Tue Nov 10, 2009 6:42 pm
Posts: 68
Location: Technical University of Cartagena,Murcia, Spain
Dear Arango,

I really appreciate your help. I was trying to test different configurations to speed up my runs, but then I got into trouble with the segmentation fault.

I will talk with the administrator and analyze the output to see which regions of the code are slower.

About the -heap-arrays option, it's quite strange. The recent revision (922) took three times longer than the older one (836), both using -heap-arrays. After the latest update (923) I am able to run the model without -heap-arrays, but I still get worse performance than with 836M (nearly three times slower).

Regards,

-Francisco


PostPosted: Fri Sep 28, 2018 7:50 pm 
Site Admin

Joined: Wed Feb 26, 2003 4:41 pm
Posts: 1078
Location: IMCS, Rutgers University
I am going to try one last time. Read carefully :arrow: trac ticket 747. In older versions of the code, we chose either the lower- or the higher-level MPI functions for exchanges. We no longer do that in the newer versions; you need to experiment and select whichever options are most efficient on your computer. The computer administrator cannot help you with that. You need to select the appropriate ROMS CPP options :idea: If you don't know what I am talking about, you need to learn a little about the distributed-memory paradigm.
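For anyone else landing here: the options in question are C-preprocessor switches set in the application header before building. A sketch follows; the option names are the ones I believe appear in recent ROMS distribute.F sources, but verify the exact names against ticket 747 for your revision:

```
/* In the application header (the *.h file named by ROMS_APPLICATION),
   pick one gather/scatter strategy per operation and benchmark each
   combination.  Names must be checked against your ROMS revision. */
#define COLLECT_ALLREDUCE   /* or COLLECT_ALLGATHER */
#define REDUCE_ALLREDUCE    /* or REDUCE_ALLGATHER  */
```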


PostPosted: Sat Sep 29, 2018 9:19 am 

Joined: Tue Nov 10, 2009 6:42 pm
Posts: 68
Location: Technical University of Cartagena,Murcia, Spain
Hi Arango,

I am sorry for bothering you. I was posting the results in the forum just in case they could help other users, and perhaps to get some feedback. I have started the performance tests with the different configurations explained in the ticket, and I am trying to learn a little about the distributed-memory paradigm. I hope to reach the same performance with the new revision as with the old one.

Thanks a lot,

-Francisco

