How to tell ROMS to exit due to problematic numbers?

Report or discuss software problems and other woes

Moderators: arango, robertson

kearneyb10k
Posts: 14
Joined: Tue Oct 16, 2018 4:26 am
Location: University of Washington, JISAO

How to tell ROMS to exit due to problematic numbers?

#1 Post by kearneyb10k

In our current ROMS domain (which includes sea ice and biological dynamics), numerical instability issues (i.e. blowups) sometimes manifest themselves in ice- or biology-related portions of the code before hitting the usual checks for speed and energy (in diag.F). Specifically, we sometimes get negative numbers where they shouldn't be in the ice_frazil.F calculations, or NaNs where they shouldn't be in the biological source/sink calculations. These trouble spots typically appear in winter months with high mixing and dynamic ice conditions, and can be resolved by decreasing the time step of the simulation for a short period of time. So same underlying cause and solution as the typical blowup, just a slightly different symptom.

I would like to modify our code such that these types of situations trigger the same blowup messages and clean exit as in a more typical blowup situation. Thus far, I've added some simple code to check for these bad conditions and set the exit_flag variable to 1. When running, this successfully causes the calculations to stop when errant conditions are encountered. However, unlike in classic blowups, the running simulation does not gracefully clean up files, write error messages to standard output, etc.; it simply hangs, and I then need to manually cancel the process. It seems things are not properly proceeding through the close_io.F cleanup steps.
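For illustration only, here is a minimal sketch of the kind of check described above. It is written in Python for brevity rather than in ROMS's Fortran, and the names check_field, NO_ERROR and BLOWUP are hypothetical; only the idea of setting exit_flag to 1 on a NaN or an unexpected negative value comes from the post.

```python
import math

NO_ERROR = 0
BLOWUP = 1  # the post sets exit_flag to 1 when a bad value is found


def check_field(values, allow_negative=False):
    """Return BLOWUP if any value is NaN or (unless allowed) negative."""
    for v in values:
        if math.isnan(v):
            return BLOWUP
        if not allow_negative and v < 0.0:
            return BLOWUP
    return NO_ERROR


# Hypothetical sample values: the small negative entry trips the check.
exit_flag = check_field([0.2, 0.0, -1.0e-3])
```

A check like this only records the problem on the process that ran it; by itself it does not tell any other process to stop.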

I've pored over diag.F and close_io.F as well as the various uses of exit_flag throughout the code, but I can't seem to figure out what I'm missing here. Can anyone point me toward the bits of code I should modify in order to throw the blowup error message and gracefully exit the running simulation?

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: How to tell ROMS to exit due to problematic numbers?

#2 Post by kate

What code are you starting with, exactly? Over the years I have added some of these checks to diag.F, etc. Still need more...

kearneyb10k
Posts: 14
Joined: Tue Oct 16, 2018 4:26 am
Location: University of Washington, JISAO

Re: How to tell ROMS to exit due to problematic numbers?

#3 Post by kearneyb10k

Sorry for the delayed response...

I'm starting with Al Hermann's Bering 10K code, which derives from the NEP5 code... which derives from one of your branches with sea ice. The version is identified as svn $Id: License_ROMS.txt 895 2009-01-12 21:06:20Z kate $ in the License, though I believe it had incorporated more recent updates through manual editing before I started working with it. So definitely old and out of sync with the version offered currently on myroms.org, but I was hoping the general framework for introducing error flags was still similar enough to be applicable.

The code does include several of your custom exit_flag values in the ice code. But those, like the ones I attempted to add, seem to only partially work. They stop the simulation, but do not proceed to the step where files are closed, error messages are written to standard output, and CPUs are released; instead, everything just pauses until I manually kill the computation job.

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: How to tell ROMS to exit due to problematic numbers?

#4 Post by kate

Oh, boy. I fixed my codes, but while it was after Al's fork, it was long enough ago that I don't remember exactly what it took to get that working. Not only do you have to change the error code, you have to check for it (master process) and share it with the other processes. Then *everyone* has to die gracefully together.
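The "share it with the other processes" step can be sketched as follows. This is a toy Python simulation, not ROMS code: the list of per-rank flags and the function names are hypothetical, and taking the maximum over all ranks stands in for an MPI all-reduce with a MAX operation, which is one common way to do this but is not confirmed anywhere in this thread as what ROMS does. A plausible reason for the hang, offered as an assumption, is that ranks which never saw the flag keep waiting in communication for the rank that stopped.

```python
NO_ERROR = 0
BLOWUP = 1


def share_exit_flag(local_flags):
    """Stand-in for an all-reduce with MAX: every rank gets the largest flag."""
    worst = max(local_flags)
    return [worst for _ in local_flags]


def step_all_ranks(local_flags):
    """Return the action each rank takes once the flags have been shared."""
    shared = share_exit_flag(local_flags)
    return ["shutdown" if flag != NO_ERROR else "continue" for flag in shared]


# Rank 2 found a bad value; the other three ranks did not.
actions = step_all_ranks([NO_ERROR, NO_ERROR, BLOWUP, NO_ERROR])
```

Because every rank ends up with the same flag, they all take the same branch at the same point in the time step, so no rank is left waiting on one that has already bailed out.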
