problem with perfect restart

susonic
Posts: 159
Joined: Tue Aug 21, 2007 5:44 pm
Location: Jeju National University

problem with perfect restart

#1 Post by susonic » Fri Apr 30, 2010 3:43 am

Hi all,

I found a possible bug with the perfect restart option in ROMS.
Without the perfect restart option, we can write records to the restart nc file
continuously (say, up to the 5th or 10th).

But with the perfect restart option defined, ROMS cannot write more than 3 time records to the rst nc file.
The error message is
'WRT_RST - error while writing variable Hsbl into restart Netcdf file for time record : 4'

I'm using svn 453.

Is there any solution for this error?

Regards,

-Peter

kate
Posts: 3678
Joined: Wed Jul 02, 2003 5:29 pm
Location: IMS/UAF, USA

Re: problem with perfect restart

#2 Post by kate » Sat May 01, 2010 3:52 am

It sounds like you're running into a file size limit. PERFECT_RESTART makes the restart files bigger because you need to save more of the state of the model. Are your files 2 GB in size when you get into trouble?

susonic
Posts: 159
Joined: Tue Aug 21, 2007 5:44 pm
Location: Jeju National University

Re: problem with perfect restart

#3 Post by susonic » Sat May 01, 2010 5:45 am

Hi, Kate. Thanks for the reply.

My rst file is only 504 MB in size when the run stops.

I just found out that the problem is related to the vertical mixing scheme.
When I ran the model with LMD and PERFECT_RESTART activated, it couldn't save more than 3 time records.
However, when I defined a different vertical mixing scheme (say, gen) together with PERFECT_RESTART, it went through.

Any suggestions?

Best,

-Peter

tony1230
Posts: 87
Joined: Wed Mar 31, 2010 3:29 pm
Location: SKLEC,ECNU,Shanghai,China

Re: problem with perfect restart

#4 Post by tony1230 » Sat May 01, 2010 9:31 am

Hi susonic and Kate,
I get the same result from my run. Without the PERFECT_RESTART option the run
goes smoothly, but with it turned on the run aborts within a few steps.
So far I have no idea what causes it. Since susonic has reported it as well, I take it
the problem is not mine alone.

susonic wrote:
The error message is
'WRT_RST - error while writing variable Hsbl into restart Netcdf file for time record : 4'

The only difference from susonic's case is that the variable in my error message is "wetdry_mask_rho". I thought
there might also be an error in my mask, but I have checked it over and over again and it looks normal.
So, how can I get rid of this error?

Thanks in advance; any suggestion is appreciated!

hugobastos
Posts: 15
Joined: Tue May 06, 2008 8:46 pm
Location: FURG

Re: problem with perfect restart

#5 Post by hugobastos » Fri Jun 10, 2011 4:53 pm

Hello,

I think I have found this bug.

It is all about LMD_SKPP or LMD_BKPP being defined together with PERFECT_RESTART.

If the user sets LMD_MIXING + (LMD_SKPP || LMD_BKPP) + PERFECT_RESTART + LcycleRST == F, an error occurs when writing the 4th time record of the rst file:

With only LMD_BKPP:

WRT_RST - error while writing variable: Hbbl
into restart NetCDF file for time record: 4

With only LMD_SKPP, or with both:

WRT_RST - error while writing variable: Hsbl
into restart NetCDF file for time record: 4

This can be easily reproduced by running the wc13 test case and replacing GLS_MIXING with LMD + SKPP or BKPP (and LcycleRST == F). Without SKPP and BKPP the model writes everything fine and gets past the 4th record. So the bug seems to be in the dimensions of these variables when defining/writing the restart file...

Looking inside def_rst.F:

Code: Select all

status=def_var(ng, iNLM, RST(ng)%ncid, RST(ng)%Vid(idHsbl),     &
     &                 NF_FRST, nvd3, t2dgrd, Aval, Vinfo, ncname)


status=def_var(ng, iNLM, RST(ng)%ncid, RST(ng)%Vid(idHbbl),     &
     &                 NF_FRST, nvd3, t2dgrd, Aval, Vinfo, ncname)

So I think these fields are expected to have the same shape as the free surface, in which case the input to def_var should be:

Code: Select all

status=def_var(ng, iNLM, RST(ng)%ncid, RST(ng)%Vid(idHsbl),     &
     &                 NF_FRST, nvd4, t2dgrd, Aval, Vinfo, ncname)

status=def_var(ng, iNLM, RST(ng)%ncid, RST(ng)%Vid(idHbbl),     &
     &                 NF_FRST, nvd4, t2dgrd, Aval, Vinfo, ncname)
The variable nvd4 specifies 4 dimensions (time, 3, xi, eta). But after recompiling and running the model again, the same error occurs, now with NaNs inside the variables (ocean_time++), maybe because of some shape mismatch around these variables.

So this behaviour made me doubt whether these variables are written correctly in the default code. In the default code, when writing 2 time records to the rst file, the first record writes all the variables correctly (zeta, u, v, temp, etc.), but Hsbl and Hbbl have only 3 dimensions (three, eta, xi), without the unlimited ocean_time dimension. So at the next restart write, you can guess what happens: all the other fields are written to the next index of the ocean_time dimension, while Hsbl and Hbbl are written to the second index of the "three" dimension (not ocean_time), and so on until record 4, when the code complains about writing beyond the size of the variable.
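A toy illustration of the mechanism described above, in plain Python rather than ROMS or NetCDF code. It assumes only that a variable whose leading dimension has a fixed length of 3 (like "three") rejects a write at index 4, while a variable whose leading dimension is unlimited (like ocean_time) keeps growing. The class and function names are made up for this sketch.

```python
# Toy model of the failure (pure Python, no NetCDF library involved).
# Assumption: the leading dimension is either fixed at length 3 ("three")
# or unlimited ("ocean_time"). All names here are hypothetical.


class ToyVariable:
    def __init__(self, name, leading_size=None):
        self.name = name
        self.leading_size = leading_size  # None means unlimited
        self.records = []

    def write(self, record):
        """Write the 1-based `record` along the leading dimension."""
        if self.leading_size is not None and record > self.leading_size:
            raise IndexError(
                f"{self.name}: record {record} is beyond the fixed "
                f"leading dimension of size {self.leading_size}"
            )
        self.records.append(record)


def first_failing_record(var, n_records):
    """Return the first record that cannot be written, or None."""
    for rec in range(1, n_records + 1):
        try:
            var.write(rec)
        except IndexError:
            return rec
    return None


zeta = ToyVariable("zeta")                  # leading dim: ocean_time
hsbl = ToyVariable("Hsbl", leading_size=3)  # leading dim: three

print(first_failing_record(zeta, 10))  # None
print(first_failing_record(hsbl, 10))  # 4
```

In this toy model the fixed-size variable fails at record 4, which matches the record number in the WRT_RST message.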

So the writing of these variables is wrong. The cause is probably the shape of these variables, or a missing redirect to a new shape that stores the 3 time levels (dimension "three"), etc. Maybe the wet_dry case is similar.

Actually, it seems that the variables Hsbl and Hbbl inside the restart file are not used to restart the model. If the user saves only 1 record in the restart file with these fields, removes them, and restarts the model from this file, no error is raised about the missing variables.

So I think that, for now, it is safe to remove the writing of these variables to the restart file in order to allow runs with LcycleRST == F.


Sorry about the long text ... :shock:

turuncu
Posts: 128
Joined: Tue Feb 01, 2005 8:21 pm
Location: Istanbul Technical University (ITU)

Re: problem with perfect restart

#6 Post by turuncu » Sat Apr 21, 2012 5:37 pm

Hi,

I have the same problem:

WRT_RST - error while writing variable: wetdry_mask_rho
into restart NetCDF file for time record: 4


In my configuration, the following CPP options are defined:

Code: Select all

....
#define PERFECT_RESTART
#define WET_DRY
....
I also set the restart file to be written once a day with the following options:

Code: Select all

   LcycleRST == F
        NRST == 288
Is there any solution for this problem? The model can only write three time records into the restart file, and it is not a file size issue (the size is around 180 MB). Maybe it could be solved by removing PERFECT_RESTART, but I want to use it to be consistent with my previous runs.
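As a quick sanity check on the settings above: if NRST == 288 really gives one restart record per day, the model time step must be 300 s. The time step is not stated in the post, so this is an inference, and the function name is only for illustration.

```python
SECONDS_PER_DAY = 86400


def implied_dt_seconds(nrst, restart_interval_days=1.0):
    """Time step (s) implied by writing a restart every `nrst` steps.

    Assumption: restarts are written exactly once per
    `restart_interval_days`, as described in the post.
    """
    return restart_interval_days * SECONDS_PER_DAY / nrst


print(implied_dt_seconds(288))  # 300.0
```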

Regards,

--ufuk

kate
Posts: 3678
Joined: Wed Jul 02, 2003 5:29 pm
Location: IMS/UAF, USA

Re: problem with perfect restart

#7 Post by kate » Mon Apr 23, 2012 5:59 pm

Things to try:

1. Set LcycleRST to T
2. Find out what's really going wrong. What is the netcdf error code when the failure happens? ROMS can be modified to print this out on error. In mod_netcdf.F, I added

Code: Select all

PRINT *, trim(nf90_strerror(status))
to each block inside:

Code: Select all

IF (status.ne.nf90_noerr) THEN

turuncu
Posts: 128
Joined: Tue Feb 01, 2005 8:21 pm
Location: Istanbul Technical University (ITU)

Re: problem with perfect restart

#8 Post by turuncu » Fri Apr 27, 2012 2:22 pm

Hi Kate,

I added the print statement inside the IF statement. The code complains about an "index", as follows:

Code: Select all

 NetCDF: Index exceeds dimension bound

 WRT_RST - error while writing variable: wetdry_mask_rho
           into restart NetCDF file for time record:    4
I think I found the problem. In the ROMS/Utility/def_rst.F file there is a definition like this:

Code: Select all

!
!  Set unlimited time record dimension to current value.
!
        IF (LcycleRST(ng)) THEN
          RST(ng)%Rindex=0
        ELSE
          RST(ng)%Rindex=rec_size
        END IF
So, if I set LcycleRST to true, it will create the file with an unlimited time dimension, but in my case I want to set it true. What do you think? BTW, this is the ice branch.

Regards,

--ufuk

kate
Posts: 3678
Joined: Wed Jul 02, 2003 5:29 pm
Location: IMS/UAF, USA

Re: problem with perfect restart

#9 Post by kate » Fri Apr 27, 2012 6:45 pm

Thanks for pointing out that this is the ice branch, but of course we want to know if the trunk also has the same issue.

The time dimension should be unlimited no matter how you set LcycleRST. I take it you meant that you want LcycleRST to be false?

I tried a simple 2-D problem with LcycleRST set to false and WET_DRY #defined. I got up to 9 restart records before I killed it. The only weirdness I experienced is that zeta has values inside the land mask. I don't know what to suggest.

arango
Site Admin
Posts: 1081
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

Re: problem with perfect restart

#10 Post by arango » Fri Apr 27, 2012 6:57 pm

I doubt that it is possible to have a perfect restart with wetting and drying with the current design. In wetting and drying, the free-surface and the land/sea mask are changing at every time-step! So we would need the land/sea masking for three consecutive time-steps averaged over all barotropic time-steps... We cannot reproduce the changes in land/sea masking for every barotropic time-step. We would be able to get a restart but it won't be a perfect restart. But I wonder if that really matters...

kate
Posts: 3678
Joined: Wed Jul 02, 2003 5:29 pm
Location: IMS/UAF, USA

Re: problem with perfect restart

#11 Post by kate » Fri Apr 27, 2012 10:22 pm

I tried again with PERFECT_RESTART and I can reproduce the problem. In this case, we have:

Code: Select all

        float wetdry_mask_rho(three, eta_rho, xi_rho) ;
                wetdry_mask_rho:long_name = "wet/dry mask on RHO-points" ;
                wetdry_mask_rho:flag_values = 0.f, 1.f ;
                wetdry_mask_rho:flag_meanings = "land water" ;
                wetdry_mask_rho:time = "ocean_time" ;
                wetdry_mask_rho:coordinates = "lon_rho lat_rho ocean_time" ;
                wetdry_mask_rho:field = "wetdry_mask_rho, scalar, series" ;
        double zeta(ocean_time, three, eta_rho, xi_rho) ;
                zeta:long_name = "free-surface" ;
                zeta:units = "meter" ;
                zeta:time = "ocean_time" ;
                zeta:coordinates = "lon_rho lat_rho ocean_time" ;
                zeta:field = "free-surface, scalar, series" ;
Note that the dimensions of the wetdry_masks have a "three" in them, so writing fails on the fourth record. This should be "ocean_time". The problem comes about in def_rst.F, in which t2dgrd is correct for zeta, but is used for both zeta and the mask.
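The same check can be run over any restart file header. Below is a small Python sketch that scans CDL text (as printed by `ncdump -h`) and flags variables that carry the "three" dimension but not "ocean_time". The regular expression, the function name, and the heuristic itself are assumptions made for illustration; the sample header is cut down from the listing above.

```python
import re

# Matches a CDL variable declaration such as
#   double zeta(ocean_time, three, eta_rho, xi_rho) ;
# Attribute lines (name:attr = ...) do not match.
DECL = re.compile(
    r"^\s*(?:float|double|int|short|byte|char)\s+(\w+)\(([^)]*)\)\s*;"
)


def vars_missing_dim(cdl_text, required_dim="ocean_time", marker_dim="three"):
    """List variables that use `marker_dim` but lack `required_dim`."""
    flagged = []
    for line in cdl_text.splitlines():
        match = DECL.match(line)
        if not match:
            continue
        name = match.group(1)
        dims = [d.strip() for d in match.group(2).split(",")]
        if marker_dim in dims and required_dim not in dims:
            flagged.append(name)
    return flagged


header = """
        float wetdry_mask_rho(three, eta_rho, xi_rho) ;
                wetdry_mask_rho:long_name = "wet/dry mask on RHO-points" ;
        double zeta(ocean_time, three, eta_rho, xi_rho) ;
                zeta:long_name = "free-surface" ;
"""

print(vars_missing_dim(header))  # ['wetdry_mask_rho']
```

Any variable flagged this way can hold at most three records along its leading dimension, so it is a candidate for the failure at record 4.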

arango
Site Admin
Posts: 1081
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

Re: problem with perfect restart

#12 Post by arango » Fri Apr 27, 2012 10:28 pm

OK, thank you for looking at it. I will add it to several corrections that I am currently making and will update the repository early next week.

Pysh
Posts: 30
Joined: Tue Nov 29, 2011 3:51 pm
Location: Hydrometcenter of Russia

Re: problem with perfect restart

#13 Post by Pysh » Sat Jul 14, 2012 12:01 pm

Hi, all

I know this isn't a new subject on the forum, but I can't work out whether it is currently possible to use PERFECT_RESTART correctly together with the WET_DRY option. In my case (changeset_624) the restart file contains WETDRY_MASK_RHO = 1 at all points for both records (I use LcycleRST == T), WETDRY_MASK_U = 1 and WETDRY_MASK_V = 1 for the first and second layers, and WETDRY_MASK_U = NOVALUE and WETDRY_MASK_V = NOVALUE for the third layer. However, at the time the restart file was written, WETDRY_MASK_RHO had points with 0 values. When I run the model from this restart file, at the initial time the temperature starts at zero at the points where WETDRY_MASK_RHO should be 0 (but is 1 in the restart file). If PERFECT_RESTART can be used with WET_DRY, how do I do it?

Thanks in advance

Boris

kate
Posts: 3678
Joined: Wed Jul 02, 2003 5:29 pm
Location: IMS/UAF, USA

Re: problem with perfect restart

#14 Post by kate » Fri Nov 02, 2012 4:51 pm

I recently posted a bug fix to Trac for LMD plus perfect_restart. It required getting the right indices for Hsbl and Hbbl in def_rst, also storing Ghats. Not to mention reading in all of the above and using the stored values instead of computing new ones on the first step.
