Possible bug in writing out restart file?

lanerolle
Posts: 157
Joined: Mon Apr 28, 2003 5:12 pm
Location: NOAA

#1 Post by lanerolle » Wed Jun 08, 2011 11:47 pm

I am running a large ROMS application (622 x 667 x 50 points), and the model crashes when it attempts to write the restart file for the very first time. It does, however, write the history file successfully, and an ncdump shows that its contents are OK. I have tried the Fortran NetCDF library versions 4.0.1 and 3.6.3, and both give the same outcome. The screen output for the crash is:

STEP Day HH:MM:SS KINETIC_ENRG POTEN_ENRG TOTAL_ENRG NET_VOLUME

0 68 00:00:00 0.000000E+00 2.386324E+04 2.386324E+04 9.077648E+14
DEF_HIS - creating history file: ocean_his1.nc
WRT_HIS - wrote history fields (Index=1,1) into time record = 0000001

NETCDF_ENDDEF - unable to end definition mode for file:
ocean_rst1.nc
call from: def_rst.F

Elapsed CPU time (seconds):

A ncdump of the restart file gives:

>>ncdump: ocean_rst1.nc: Not a netCDF file

I am using ROMS/TOMS version 3.4 (svn version/update 526) together with the CPP option PERFECT_RESTART.

Could this be a bug in the restart file writing routines?

shchepet
Posts: 185
Joined: Fri Nov 14, 2003 4:57 pm

Re: Possible bug in writing out restart file?

#2 Post by shchepet » Thu Jun 09, 2011 2:17 am


[quote]NETCDF_ENDDEF - unable to end definition mode for file:
ocean_rst1.nc
call from: def_rst.F[/quote]
Now this is something a bit new: it looks like it went through several netCDF calls
creating all the dimensions and variables, and then could not finish the
definition. Do you have any other error messages prior to this? Are you
creating the restart file from scratch, or is there a file already present that
you are trying to modify?

[Note that, in my experience, if I run the model and allow a netCDF file created
with 32-bit offset to grow beyond 2GB, the outcome is not just termination of the
program because of a non-zero error status from a netCDF call, but a corrupt file
header, so that the file is completely unreadable and any subsequent attempt to do
anything with it (other than deleting it) causes an error.]
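
To put a rough number on the 2GB remark for the grid in the first post, here is a back-of-envelope sketch. It assumes 8-byte reals and ignores the file header, ghost points, 2-D fields, and any extra time-level dimension that PERFECT_RESTART may add; none of those details are stated in the thread, so treat the result as an order of magnitude only. It does not show that file size is the cause of this particular crash.

```python
# Rough size of one 3-D field on the 622 x 667 x 50 grid from the first post.
# Assumptions (not from the thread): 8-byte reals, no ghost points,
# no extra time-level dimension, header size ignored.
NX, NY, NZ = 622, 667, 50
BYTES_PER_VALUE = 8
LIMIT_2GIB = 2**31  # the 2GB mark mentioned above, taken here as 2 GiB


def field_bytes(nx, ny, nz, bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed to store one nx * ny * nz field."""
    return nx * ny * nz * bytes_per_value


def fields_below_limit(nx, ny, nz, limit=LIMIT_2GIB):
    """Number of whole 3-D fields that fit below `limit` bytes."""
    return limit // field_bytes(nx, ny, nz)


if __name__ == "__main__":
    size = field_bytes(NX, NY, NZ)
    print(f"one 3-D field: {size} bytes ({size / 2**20:.1f} MiB)")
    print(f"whole 3-D fields below 2 GiB: {fields_below_limit(NX, NY, NZ)}")
```

Under these assumptions a single 3-D field is about 158 MiB, so only a dozen such fields fit below 2 GiB.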

[quote]>>ncdump: ocean_rst1.nc: Not a netCDF file[/quote]
This may have one of two causes: (1) the header of the file "ocean_rst1.nc" was
not created properly, so that it is, indeed, not a netCDF file (note that for a
file to be recognised as a netCDF file by ncdump, it does not necessarily have to
contain any data: it is sufficient to create a dimension or put an attribute, and
then close it); or (2) less likely, the file was created with 64-bit offset using a
sufficiently modern netCDF library, but the ncdump program was compiled with an older
version (say, ncdump came from a previous installation placed in /usr/local/bin) which
does not support 64-bit offset. Type "which ncdump" to make sure.

1. From your output it looks like you have at least one record in the history file.
1a. Does ncdump recognize it as a netCDF file?
1b. Is it a 32- or 64-bit netCDF file?

2. When you compiled the netCDF library, did you also go through "make test" to
see whether it survives built-in tests coming with netCDF package?
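
Both the "Not a netCDF file" message and the 32- versus 64-bit question can be checked without ncdump by looking at the first few bytes of the file. A minimal sketch follows; the function name `netcdf_kind` is made up for illustration, and the check only looks at offset 0, which is where the signature normally sits.

```python
import sys

# Leading bytes that identify each netCDF on-disk format.
MAGIC = {
    b"CDF\x01": "netCDF classic (32-bit offset)",
    b"CDF\x02": "netCDF 64-bit offset",
    b"CDF\x05": "netCDF 64-bit data (CDF-5)",
}
# netCDF-4 files are HDF5 files and start with the HDF5 signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def netcdf_kind(path):
    """Classify a file by its first bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(HDF5_SIGNATURE):
        return "HDF5 (netCDF-4)"
    # Anything else, including an empty or truncated file, is not recognised.
    return MAGIC.get(head[:4], "not recognised as netCDF")


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(f"{name}: {netcdf_kind(name)}")
```

Saved under any script name and run with ocean_his1.nc and ocean_rst1.nc as arguments, it reports which format, if any, each header claims. A file that never left definition mode may well have no valid header at all.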

[quote]Could this be a bug in the restart file writing routines?[/quote]
If the history file is OK (hence netCDF is working) but the restart file is not, then
it is definitely a bug in the calling program, since subroutines def_his and def_rst
are similar to each other, and so are wrt_his and wrt_rst.

lanerolle
Posts: 157
Joined: Mon Apr 28, 2003 5:12 pm
Location: NOAA

Re: Possible bug in writing out restart file?

#3 Post by lanerolle » Thu Jun 09, 2011 10:52 am

[quote]Now it is something a bit new: it looks like it went through several netCDF calls
regarding creation of all dimensions and the variables, and then cannot finish its
definition. Do you have any other error messages prior to this? Do you attempt
to create the restart file from scratch, or there is a file already present and
you attempt to modify it?[/quote]

No, the only error messages are the ones I wrote in my post, those from def_rst.F. Yes, the error appears when the restart file is created for the very first time. However, the history file is created and written successfully, even for the very first time.

The ncdump I used on the history file (which showed a healthy NetCDF file) is the same one I used to dump the restart file. So the ncdump works correctly for the history file but not for the restart file!

The outcomes with NetCDF library versions 3.6.3 and 4.0.1 are identical. I have never seen this error before, and ROMS has written history and restart files without any problems in all of my previous, smaller applications.

kate
Posts: 3780
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Possible bug in writing out restart file?

#4 Post by kate » Thu Jun 09, 2011 4:28 pm

Are you familiar with the BENCHMARK problem? It's a setup in which you can tune the size simply by changing Lm and Mm in ocean_benchmark?.in. Right now we have three versions. Can you reproduce this problem by going to an even bigger version of BENCHMARK? If there is a problem with def_rst, it's likely to be in PERFECT_RESTART, since most of us don't exercise that code very often.

If you suspect a size issue with netcdf, I would build the latest myself and run it through the tests it comes with, then make very sure I was linking to that version and using its ncdump.
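
One way to make sure which ncdump is actually being used is to list every copy on the search path, since "which ncdump" shows only the first one. This is a minimal sketch; the helper name `find_all_on_path` is made up for illustration, and empty PATH entries are simply skipped.

```python
import os


def find_all_on_path(program, path=None):
    """Return every executable named `program` on PATH, in search order.

    Hypothetical helper: the first hit is the one the shell would run,
    and any later hits are older or alternative installations.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for hit in find_all_on_path("ncdump"):
        print(hit)
```

If more than one line is printed, the first is the ncdump the shell picks up, and it is worth checking that it belongs to the same netCDF installation that ROMS was linked against.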
