Ocean Modeling Discussion

ROMS/TOMS


All times are UTC

Posted: Wed Oct 17, 2018 5:18 am

Joined: Wed Oct 01, 2014 8:57 pm
Posts: 58
Location: Tokyo Institute of Technology
Dear ROMS users,

We've had this problem for around half a year and have tried to resolve it with the technical staff at our institution. The problem persists, so I thought I'd ask here in the forum.

We're running ROMS on a parallel computing cluster. When we use only 1 node (which in our case has 28 cores), the run completes successfully. However, whenever we try to use 2 or more nodes, the run fails near the start, and the failure seems to occur while reading the initial-condition NetCDF file. Here is how it appears in the log file:

Metrics information for Grid 01:
===============================

Minimum X-grid spacing, DXmin = 1.50000000E+00 km
Maximum X-grid spacing, DXmax = 1.50000000E+00 km
Minimum Y-grid spacing, DYmin = 1.50000000E+00 km
Maximum Y-grid spacing, DYmax = 1.50000000E+00 km
Minimum Z-grid spacing, DZmin = -1.33120450E+01 m
Maximum Z-grid spacing, DZmax = 2.34310913E+03 m

Minimum barotropic Courant Number = 2.66422670E-02
Maximum barotropic Courant Number = 7.21447999E-01
Maximum Coriolis Courant Number = 3.96367952E-03


Minimum horizontal diffusion coefficient = 1.25000000E+01 m2/s
Maximum horizontal diffusion coefficient = 1.25000000E+01 m2/s

Minimum horizontal viscosity coefficient = 1.25000000E+01 m2/s
Maximum horizontal viscosity coefficient = 1.00000000E+20 m2/s

NLM: GET_STATE - Reading state initial conditions, 2016-04-30 00:00:00.00
(Grid 01, t = 5964.0000, File: CRSE_MB1_ini_160501.nc, Rec=0001, Index=1)
- free-surface
(Min = -2.04140008E-01 Max = 1.37334052E+00)
- vertically integrated u-momentum component
(Min = -3.23873087E-01 Max = 7.55435041E-01)
- vertically integrated v-momentum component
(Min = -2.82343629E-01 Max = 6.29832212E-01)

And when the run fails, a file with a *.btr extension is generated and contains the following lines:

oceanM:56948 terminated with signal 11 at PC=0 SP=7fffffff74a8. Backtrace:
/usr/lib64/libinfinipath.so.4(+0x45a8)[0x2aaac28bd5a8]
/lib64/libpthread.so.0(+0x10b20)[0x2aaaac957b20]

If anyone has experienced a similar issue and solved it or might have some thoughts on how to go about doing so, I would greatly appreciate any help.

Thanks,
Lawrence


Posted: Thu Oct 18, 2018 3:55 am

Joined: Wed Dec 31, 2003 6:16 pm
Posts: 786
Location: USGS, USA
This looks like an architecture/library issue. This thread looks similar:
https://software.intel.com/en-us/forums ... pic/270080

-j
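
One way to see why the failure points at a library rather than at ROMS itself is to list the shared libraries named in the backtrace frames. The snippet below is only a minimal sketch: the function name libraries_in_backtrace and the regular expression are illustrative assumptions, with the frame format inferred solely from the two frames quoted in the first post, so other *.btr files may need a different pattern.

```python
import re

# Assumed frame format, inferred from the two frames quoted in the first post:
#   /path/to/lib.so(+0xOFFSET)[0xADDRESS]
FRAME_RE = re.compile(
    r"^(?P<lib>/\S+?)\((?P<sym>[^)]*)\)\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def libraries_in_backtrace(text):
    """Return the shared-library paths named in backtrace frames, in order."""
    libs = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            libs.append(match.group("lib"))
    return libs


# Contents of the *.btr file as quoted in the first post.
BTR_TEXT = """\
oceanM:56948 terminated with signal 11 at PC=0 SP=7fffffff74a8. Backtrace:
/usr/lib64/libinfinipath.so.4(+0x45a8)[0x2aaac28bd5a8]
/lib64/libpthread.so.0(+0x10b20)[0x2aaaac957b20]
"""

if __name__ == "__main__":
    for lib in libraries_in_backtrace(BTR_TEXT):
        print(lib)
```

For the quoted backtrace this lists libinfinipath.so.4 and libpthread.so.0 and no frame inside oceanM. That is consistent with the suggestion above that the segmentation fault (signal 11) comes from the interconnect/library layer used for multi-node runs, although two frames are not enough to prove it.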


Posted: Thu Oct 18, 2018 10:16 am

Joined: Wed Oct 01, 2014 8:57 pm
Posts: 58
Location: Tokyo Institute of Technology
Thank you for the link, Dr. Warner. I'll see if I can use this when I get a chance to consult our technical staff about the issue.

Lawrence

