Potential bug in ROMS perfect restart

Bug reports, work arounds and fixes

lanerolle
Posts: 157
Joined: Mon Apr 28, 2003 5:12 pm
Location: NOAA

Potential bug in ROMS perfect restart

#1 Post by lanerolle » Wed Apr 16, 2014 10:42 pm

I am using ROMS/TOMS version 3.7 and I did two runs:

* Run 1: Ran ROMS for 207360 timesteps and saved the outputs, including the perfect restart file. Then continued the run forward with a perfect restart (all input parameters identical to the previous run) and the model blew up:

Code: Select all

 208957    12 02:13:05  5.590659E-02  7.639421E+02  7.639980E+02  6.495758E+12
         (540,0523,30)  8.763278E-02  2.608425E-01  3.309943E+02  5.883825E+00
 208958    12 02:13:10  5.590291E-02  7.639392E+02  7.639951E+02  6.495738E+12
         (540,0523,30)  8.483870E-02  2.578357E-01  4.108136E+02  5.776456E+00

 Blowing-up: Saving latest model state into  RESTART file

      WRT_RST   - wrote re-start fields (Index=1,1) into time record = 0000001

 Elapsed CPU time (seconds):

* Run 2: Ran ROMS for 220000 timesteps (past the blow-up in Run 1) as a single model run (no restarts) and it completed the full 220000 timesteps without blowing up!

So there appears to be a potential bug in the perfect restart mechanism, or something else is going on - e.g. my model run is very close to the numerical stability limit, the wetting/drying (I have a tidal range of ~10-12 m in this run) is inducing a numerical instability, etc.

Could you please advise on how to address this situation?

kate
Posts: 3780
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Potential bug in ROMS perfect restart

#2 Post by kate » Wed Apr 16, 2014 11:39 pm

I don't know that this is the only bug for WET_DRY and PERFECT_RESTART, but it is one: the tracer values for dry points are masked out in the restart file. Here's what I'm trying:

Code: Select all

diff --git a/ROMS/Utility/wrt_rst.F b/ROMS/Utility/wrt_rst.F
index 96c6867..db759cc 100644
--- a/ROMS/Utility/wrt_rst.F
+++ b/ROMS/Utility/wrt_rst.F
@@ -545,7 +545,7 @@
      &                     RST(ng)%Rindex, gtype,                       &
      &                     LBi, UBi, LBj, UBj, 1, N(ng), 1, 2, scale,   &
 #  ifdef MASKING
-     &                     GRID(ng) % rmask_io,                         &
+     &                     GRID(ng) % rmask,                            &
 #  endif
      &                     OCEAN(ng) % t(:,:,:,:,itrc))
 # else
@@ -553,7 +553,7 @@
      &                     RST(ng)%Rindex, gtype,                       &
      &                     LBi, UBi, LBj, UBj, 1, N(ng), scale,         &
 #  ifdef MASKING
-     &                     GRID(ng) % rmask_io,                         &
+     &                     GRID(ng) % rmask,                            &
 #  endif
      &                     OCEAN(ng) % t(:,:,:,NOUT,itrc))
 # endif
Correct me if I'm wrong, but I believe we want the dry points to have valid (non-zero) tracer values, but we don't want them to get timestepped. They are being timestepped in my domain - and sometimes going bad. I haven't fixed it yet...
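A toy illustration (plain Python, not ROMS code, with made-up numbers) of the distinction described above: dry cells should retain their valid tracer values but be excluded from the timestep update, rather than being stepped or zeroed.

```python
# Toy sketch: update only wet cells; dry cells keep valid values untouched.
# All names and numbers here are illustrative, not ROMS variables.
tracer   = [12.5, 13.0, 14.2]   # tracer values in three cells
tendency = [0.25, -0.5, 0.75]   # computed rate of change per cell
wet      = [1, 1, 0]            # mask: last cell is dry
dt       = 1.0

stepped = [t + dt * dT if m else t      # dry cell: keep value, skip update
           for t, dT, m in zip(tracer, tendency, wet)]
print(stepped)  # [12.75, 12.5, 14.2]
```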

Tomasz
Posts: 22
Joined: Tue Oct 07, 2008 11:27 am
Location: Marine Institute, Ireland

Re: Potential bug in ROMS perfect restart

#3 Post by Tomasz » Thu Apr 17, 2014 9:06 am

I had the same issue upon restarting (either PERFECT_RESTART or not). I am not using WET_DRY. In my case the solution was a bit unstable around the rivers. I am using LMD mixing, and in my case it was LMD_DDMIX and LMD_SHAPIRO that were causing issues. My full LMD set-up that currently works is this:

# define LMD_RIMIX
# define LMD_CONVEC
# define LMD_SKPP
# define LMD_NONLOCAL
# define LMD_BKPP

If I recall correctly, TS_MPDATA is also the only advection scheme that results in a stable solution around the rivers.

arango
Site Admin
Posts: 1116
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

Re: Potential bug in ROMS perfect restart

#4 Post by arango » Fri Apr 18, 2014 8:00 pm

The correction suggested by Kate above is incorrect. We ALWAYS need to use the rmask_io, umask_io, and vmask_io in the output NetCDF files :!: This is even more crucial when we have rivers and wetting and drying...

kate
Posts: 3780
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Potential bug in ROMS perfect restart

#5 Post by kate » Fri Apr 18, 2014 8:51 pm

The original question is why WET_DRY breaks PERFECT_RESTART. I know the tracers in dry cells are a perfect example of this. If you write to the restart file with rmask_io, the tracers in the restart file in the dry cells are set to the _FillValue in the file, then set to zero on restart. If you don't stop the simulation, tracers in dry cells continue to have some value based on their initial value.
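A toy sketch (plain Python, not ROMS or NetCDF code, with made-up numbers) of the failure mode described above: writing with rmask_io replaces dry-cell tracers with the file's _FillValue, which then becomes zero on restart, whereas a continuous run would have kept the original values.

```python
# Illustrative only: FILL_VALUE stands in for the netCDF _FillValue,
# and plain lists stand in for the tracer and mask arrays.
FILL_VALUE = 1.0e37

tracer   = [12.5, 13.0, 14.2, 15.1]   # last two cells are dry
rmask_io = [1, 1, 0, 0]

# Writing with rmask_io: dry cells are masked to _FillValue in the file.
written = [t if m else FILL_VALUE for t, m in zip(tracer, rmask_io)]

# On restart, the fill values end up as zeros in the model state...
restarted = [0.0 if w == FILL_VALUE else w for w in written]
print(restarted)   # [12.5, 13.0, 0.0, 0.0]

# ...whereas a continuous run keeps valid tracer values in dry cells.
print(tracer)      # [12.5, 13.0, 14.2, 15.1]
```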

arango
Site Admin
Posts: 1116
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

Re: Potential bug in ROMS perfect restart

#6 Post by arango » Fri Apr 18, 2014 10:22 pm

I don't know. I will need to reproduce the problem to check if this is the case.

AlexisEspinosa
Posts: 23
Joined: Fri May 24, 2013 3:05 am
Location: UWA

Re: Potential bug in ROMS perfect restart

#7 Post by AlexisEspinosa » Wed Sep 24, 2014 3:29 am

I'm facing a similar problem here, although I'm not using perfect restart but a normal restart (many times) from the *rst.nc files. I found that ROMS BLOWS UP WHEN RESTARTING MANY TIMES in a row!!! And it does not blow up when run continuously!

Due to a restricted allocation run-time at our supercomputing facility, I planned to run my ROMS simulations in small chunks, restarting the simulation many times in a cycle. That way, if my allocation time runs out, I can restart from my latest saved set of files. So I did a test of my submission script, and I found that ROMS does not like restarts very much. ROMS BLOWS UP WHEN RESTARTING MANY TIMES!!! Why is this happening???

As in the above comments of this thread, I'm using WET_DRY (needed because of the huge tidal range in my forcing), but restarting from the common RST files (not from perfect restart files). For this test, my small simulation chunks are 360 time steps each (covering ~1 hour, as my time step is ~10 s {DT=10.003168946726626d0}). I restart the simulation many times, trying to cover a 24-hour total simulation time (24 restarts).
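The chunked-restart cycle described above can be sketched roughly as follows. The parameter names (NRREC, ocean.in, oceanM) follow ROMS conventions, but this driver script itself is hypothetical, not the poster's actual submission script; the real model commands are left as comments.

```shell
# Hypothetical sketch of a chunked-restart cycle (illustrative only).
NCHUNKS=3            # the real cycle used 24 one-hour chunks
for CHUNK in $(seq 1 "$NCHUNKS"); do
  if [ "$CHUNK" -eq 1 ]; then
    NRREC=0          # cold start: read the initial conditions file
  else
    NRREC=-1         # warm start: read the latest record of the RST file
  fi
  echo "chunk $CHUNK: NRREC=$NRREC"
  # In the real cycle, edit the input file (NRREC, and ININAME pointing
  # at the restart file for CHUNK > 1) and run the model, e.g.:
  # sed "s/^ *NRREC ==.*/NRREC == $NRREC/" ocean.in.tmpl > ocean.in
  # mpirun ./oceanM ocean.in || break
done
```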

But doing this cycle of restarts, the simulation blows up. It runs fine for 8 restarts but blows up at the 9th. Here is the output close to the blow-up:

Code: Select all

 NL ROMS/TOMS: started time-stepping: (Grid: 01 TimeSteps: 00089281 - 00089640)


   STEP   Day HH:MM:SS  KINETIC_ENRG   POTEN_ENRG    TOTAL_ENRG    NET_VOLUME
          C => (i,j,k)       Cu            Cv            Cw         Max Speed

  89280   282 08:04:42  2.479003E-02  2.410173E+02  2.410421E+02  7.148928E+11
          (048,063,35)  6.320714E-03  5.322611E-02  0.000000E+00  3.661505E+00
  89281   282 08:04:52  2.552012E-02  2.446391E+02  2.446647E+02  7.044243E+11
          (104,096,40)  1.590393E-02  1.424100E-02  1.607916E+01  3.657581E+00
  89282   282 08:05:02  2.558147E-02  2.446433E+02  2.446689E+02  7.044514E+11
          (104,096,40)  1.456755E-02  1.269888E-02  1.670602E+01  3.403521E+00
  89283   282 08:05:12  2.568717E-02  2.446475E+02  2.446732E+02  7.044785E+11
          (104,096,40)  1.321542E-02  1.074954E-02  1.536749E+01  3.359425E+00
  89284   282 08:05:22  2.581073E-02  2.446517E+02  2.446775E+02  7.045057E+11
          (104,096,40)  1.249286E-02  9.867474E-03  1.357870E+01  3.363641E+00
  89285   282 08:05:32  2.593870E-02  2.446560E+02  2.446819E+02  7.045330E+11
          (104,094,40)  1.006534E-02  1.366681E-03  1.317981E+01  3.370461E+00
  89286   282 08:05:42  2.606966E-02  2.446602E+02  2.446862E+02  7.045604E+11
          (104,096,40)  1.365864E-02  1.061730E-02  1.246750E+01  3.380622E+00
  89287   282 08:05:52  2.620533E-02  2.446644E+02  2.446906E+02  7.045879E+11
          (104,096,40)  1.514322E-02  1.154090E-02  1.387893E+01  3.390705E+00
  89288   282 08:06:02  2.634384E-02  2.446687E+02  2.446950E+02  7.046154E+11
          (104,096,40)  1.615597E-02  1.085659E-02  1.507718E+01  3.400685E+00
  89289   282 08:06:12  2.648552E-02  2.446729E+02  2.446994E+02  7.046430E+11
          (104,096,40)  1.762328E-02  1.227998E-02  1.555833E+01  3.410534E+00
  89290   282 08:06:22  2.662899E-02  2.446772E+02  2.447038E+02  7.046707E+11
          (104,096,40)  1.879293E-02  1.243754E-02  1.690179E+01  3.419896E+00
  89291   282 08:06:32  2.677449E-02  2.446815E+02  2.447083E+02  7.046985E+11
          (104,096,40)  2.014827E-02  1.384759E-02  1.836597E+01  3.428964E+00
  89292   282 08:06:42  2.692095E-02  2.446858E+02  2.447127E+02  7.047264E+11
          (104,096,40)  2.098085E-02  1.419100E-02  1.982157E+01  3.437324E+00
  89293   282 08:06:52  2.706869E-02  2.446901E+02  2.447172E+02  7.047544E+11
          (104,096,40)  2.135433E-02  1.511572E-02  2.133009E+01  3.444007E+00
  89294   282 08:07:02  2.721721E-02  2.446944E+02  2.447216E+02  7.047824E+11
          (104,096,40)  2.097109E-02  1.159407E-02  2.162402E+01  5.839296E+00
  89295   282 08:07:12  2.740157E-02  2.446988E+02  2.447262E+02  7.048105E+11
          (104,096,40)  1.986952E-02  5.070485E-02  1.939949E+01  4.035932E+01

Blowing-up: Saving latest model state into  RESTART file

      WRT_RST   - wrote re-start fields (Index=2,2) into time record = 0000001

 Elapsed CPU time (seconds):

 Node   #  0 CPU:      10.527
 Node   #  3 CPU:      10.581
 Node   #  1 CPU:      10.588
 Node   #  2 CPU:      10.586
 Node   #  5 CPU:      10.585
 Node   #  4 CPU:      10.586

 ROMS/TOMS - Output NetCDF summary for Grid 01:
             number of time records written in RESTART file = 00000001

 Analytical header files used:

     ROMS/Functionals/ana_btflux.h
     /scratch/partner658/espinosa/ROMS/670_my_own_testing/02_SS_JustM2_MassiveParticles_HydroTest1/Functionals/ana_fsobc.h
     /scratch/partner658/espinosa/ROMS/670_my_own_testing/02_SS_JustM2_MassiveParticles_HydroTest1/Functionals/ana_m2obc.h
     ROMS/Functionals/ana_nudgcoef.h
     ROMS/Functionals/ana_smflux.h
     ROMS/Functionals/ana_stflux.h
     /scratch/partner658/espinosa/ROMS/670_my_own_testing/02_SS_JustM2_MassiveParticles_HydroTest1/Functionals/ana_tobc.h

 ROMS/TOMS: DONE... Tuesday - September 23, 2014 - 12:38:47 AM

But when running the simulation continuously, it runs without any problem!!
As you can see from the output at the same time steps:


Code: Select all

...
...
...
...
...
  89280   282 08:04:42  2.670702E-02  2.446185E+02  2.446452E+02  7.042765E+11
          (104,096,40)  1.498553E-02  1.352495E-02  1.397800E+01  3.641511E+00
      WRT_HIS   - wrote history  fields (Index=1,1) into time record = 0000006
      WRT_RST   - wrote re-start fields (Index=1,1) into time record = 0000002
  89281   282 08:04:52  2.685490E-02  2.446227E+02  2.446496E+02  7.043038E+11
          (104,096,40)  1.547758E-02  1.397703E-02  1.492934E+01  3.637908E+00
  89282   282 08:05:02  2.700380E-02  2.446269E+02  2.446539E+02  7.043312E+11
          (104,096,40)  1.575886E-02  1.421823E-02  1.567068E+01  3.634307E+00
  89283   282 08:05:12  2.715340E-02  2.446311E+02  2.446583E+02  7.043587E+11
          (104,096,40)  1.588478E-02  1.431352E-02  1.618525E+01  3.630708E+00
  89284   282 08:05:22  2.730364E-02  2.446353E+02  2.446626E+02  7.043863E+11
          (104,096,40)  1.590744E-02  1.431219E-02  1.654407E+01  3.627112E+00
  89285   282 08:05:32  2.745461E-02  2.446396E+02  2.446670E+02  7.044140E+11
          (104,096,40)  1.590422E-02  1.426623E-02  1.679340E+01  3.623517E+00
  89286   282 08:05:42  2.760602E-02  2.446438E+02  2.446714E+02  7.044417E+11
          (104,096,40)  1.587656E-02  1.417374E-02  1.701210E+01  3.619922E+00
  89287   282 08:05:52  2.775840E-02  2.446481E+02  2.446758E+02  7.044695E+11
          (104,096,40)  1.583423E-02  1.405226E-02  1.717540E+01  3.616327E+00
  89288   282 08:06:02  2.791151E-02  2.446523E+02  2.446802E+02  7.044975E+11
          (104,096,40)  1.578613E-02  1.395136E-02  1.730568E+01  3.612730E+00
  89289   282 08:06:12  2.806538E-02  2.446566E+02  2.446847E+02  7.045255E+11
          (104,096,40)  1.575341E-02  1.388557E-02  1.743891E+01  3.609130E+00
  89290   282 08:06:22  2.822002E-02  2.446609E+02  2.446891E+02  7.045535E+11
          (104,096,40)  1.575442E-02  1.384692E-02  1.760374E+01  3.605526E+00
  89291   282 08:06:32  2.837534E-02  2.446652E+02  2.446936E+02  7.045817E+11
          (104,096,40)  1.572365E-02  1.381274E-02  1.782701E+01  3.601916E+00
  89292   282 08:06:42  2.853143E-02  2.446695E+02  2.446980E+02  7.046099E+11
          (104,096,40)  1.567685E-02  1.378948E-02  1.802741E+01  3.598298E+00
  89293   282 08:06:52  2.868806E-02  2.446738E+02  2.447025E+02  7.046382E+11
          (104,096,40)  1.561686E-02  1.376412E-02  1.823463E+01  3.594670E+00
  89294   282 08:07:02  2.884530E-02  2.446781E+02  2.447070E+02  7.046667E+11
          (104,096,40)  1.557221E-02  1.374078E-02  1.843543E+01  3.591031E+00
  89295   282 08:07:12  2.900351E-02  2.446825E+02  2.447115E+02  7.046951E+11
          (104,096,40)  1.553922E-02  1.369591E-02  1.866165E+01  3.587378E+00
  89296   282 08:07:22  2.916243E-02  2.446868E+02  2.447160E+02  7.047237E+11
          (104,096,40)  1.547416E-02  1.362648E-02  1.889349E+01  3.583709E+00
  89297   282 08:07:32  2.932207E-02  2.446912E+02  2.447205E+02  7.047524E+11
          (104,096,40)  1.540072E-02  1.355317E-02  1.907962E+01  3.580023E+00
  89298   282 08:07:42  2.948203E-02  2.446955E+02  2.447250E+02  7.047811E+11
          (104,096,40)  1.532771E-02  1.347997E-02  1.925973E+01  3.576317E+00
  89299   282 08:07:52  2.964292E-02  2.446999E+02  2.447296E+02  7.048099E+11
          (104,096,40)  1.218503E-02  1.016348E-02  1.937232E+01  3.572591E+00
  89300   282 08:08:02  2.980456E-02  2.447043E+02  2.447341E+02  7.048388E+11
          (104,096,40)  1.277930E-02  1.083514E-02  1.319938E+01  3.568841E+00
...
...
...
...
...
...
  95380   283 01:01:42  5.400885E-02  2.588579E+02  2.589119E+02  7.892415E+11
          (048,064,01)  0.000000E+00  3.799400E-02  5.501194E-01  5.228957E+00
  95381   283 01:01:52  5.378631E-02  2.588627E+02  2.589165E+02  7.892663E+11
          (048,064,01)  0.000000E+00  3.798532E-02  5.499430E-01  5.227709E+00
  95382   283 01:02:02  5.356422E-02  2.588676E+02  2.589212E+02  7.892909E+11
          (048,064,01)  0.000000E+00  3.797675E-02  5.497681E-01  5.226459E+00
  95383   283 01:02:12  5.334257E-02  2.588724E+02  2.589258E+02  7.893155E+11
          (048,064,01)  0.000000E+00  3.796829E-02  5.495952E-01  5.225211E+00
  95384   283 01:02:22  5.312137E-02  2.588773E+02  2.589304E+02  7.893399E+11
          (048,064,01)  0.000000E+00  3.795997E-02  5.494243E-01  5.223966E+00
  95385   283 01:02:32  5.290061E-02  2.588821E+02  2.589350E+02  7.893643E+11
          (048,064,01)  0.000000E+00  3.795180E-02  5.492557E-01  5.222726E+00
  95386   283 01:02:42  5.268031E-02  2.588869E+02  2.589395E+02  7.893886E+11
          (048,064,01)  0.000000E+00  3.794378E-02  5.490897E-01  5.221494E+00
  95387   283 01:02:52  5.246047E-02  2.588916E+02  2.589441E+02  7.894128E+11
          (048,064,01)  0.000000E+00  3.793593E-02  5.489263E-01  5.220271E+00
  95388   283 01:03:02  5.224107E-02  2.588964E+02  2.589486E+02  7.894368E+11
          (048,064,01)  0.000000E+00  3.792825E-02  5.487656E-01  5.219059E+00
  95389   283 01:03:12  5.202214E-02  2.589011E+02  2.589531E+02  7.894608E+11
          (048,064,01)  0.000000E+00  3.792073E-02  5.486076E-01  5.217860E+00
  95390   283 01:03:22  5.180366E-02  2.589058E+02  2.589576E+02  7.894847E+11
          (048,064,01)  0.000000E+00  3.791339E-02  5.484524E-01  5.216674E+00
  95391   283 01:03:32  5.158564E-02  2.589105E+02  2.589621E+02  7.895086E+11
          (048,064,01)  0.000000E+00  3.790622E-02  5.482999E-01  5.215504E+00
  95392   283 01:03:42  5.136808E-02  2.589152E+02  2.589666E+02  7.895323E+11
          (048,064,01)  0.000000E+00  3.789921E-02  5.481500E-01  5.214350E+00
  95393   283 01:03:52  5.115099E-02  2.589199E+02  2.589710E+02  7.895559E+11
          (048,064,01)  0.000000E+00  3.789236E-02  5.480026E-01  5.213212E+00
  95394   283 01:04:02  5.093436E-02  2.589245E+02  2.589755E+02  7.895794E+11
          (048,064,01)  0.000000E+00  3.788566E-02  5.478576E-01  5.212092E+00
  95395   283 01:04:12  5.071820E-02  2.589292E+02  2.589799E+02  7.896029E+11
          (048,064,01)  0.000000E+00  3.787911E-02  5.477149E-01  5.210988E+00
  95396   283 01:04:22  5.050250E-02  2.589338E+02  2.589843E+02  7.896262E+11
          (048,064,01)  0.000000E+00  3.787269E-02  5.475742E-01  5.209902E+00
  95397   283 01:04:32  5.028728E-02  2.589384E+02  2.589887E+02  7.896495E+11
          (048,064,01)  0.000000E+00  3.786639E-02  5.474355E-01  5.208833E+00
  95398   283 01:04:42  5.007253E-02  2.589430E+02  2.589930E+02  7.896726E+11
          (048,064,01)  0.000000E+00  3.786021E-02  5.472985E-01  5.207779E+00
  95399   283 01:04:52  4.985825E-02  2.589475E+02  2.589974E+02  7.896957E+11
          (048,064,01)  0.000000E+00  3.785412E-02  5.471630E-01  5.206741E+00
  95400   283 01:05:02  4.964444E-02  2.589521E+02  2.590017E+02  7.897187E+11
          (048,064,01)  0.000000E+00  3.784813E-02  5.470288E-01  5.205717E+00
      WRT_HIS   - wrote history  fields (Index=1,1) into time record = 0000006
      WRT_RST   - wrote re-start fields (Index=1,1) into time record = 0000001

 Elapsed CPU time (seconds):

 Node   #  0 CPU:   11946.184
 Node   # 17 CPU:   12230.091
 Node   # 18 CPU:   12222.168
 Node   # 19 CPU:   12216.437
 Node   #  5 CPU:   12192.117
 Node   #  6 CPU:   12188.503
 Node   #  7 CPU:   12174.581
 Node   #  9 CPU:   12181.620
 Node   # 20 CPU:   12213.519
 Node   #  4 CPU:   12182.725
 Node   #  1 CPU:   12176.599
 Node   #  2 CPU:   12177.346
 Node   #  3 CPU:   12180.974
 Node   # 11 CPU:   12194.746
 Node   #  8 CPU:   12181.300
 Node   # 10 CPU:   12189.401
 Node   # 21 CPU:   12212.763
 Node   # 22 CPU:   12217.321
 Node   # 23 CPU:   12225.852
 Node   # 12 CPU:   12227.724
 Node   # 13 CPU:   12224.131
 Node   # 14 CPU:   12212.782
 Node   # 15 CPU:   12217.012
 Node   # 16 CPU:   12220.612
 Node   # 43 CPU:   12221.638
 Node   # 44 CPU:   12216.134
 Node   # 45 CPU:   12217.062
 Node   # 46 CPU:   12220.611
 Node   # 36 CPU:   12223.850
 Node   # 47 CPU:   12233.105
 Node   # 37 CPU:   12218.730
 Node   # 38 CPU:   12222.902
 Node   # 39 CPU:   12221.564
 Node   # 40 CPU:   12215.394
 Node   # 41 CPU:   12228.349
 Node   # 42 CPU:   12227.992
 Node   # 27 CPU:   12218.353
 Node   # 28 CPU:   12222.798
 Node   # 29 CPU:   12226.374
 Node   # 30 CPU:   12230.767
 Node   # 31 CPU:   12223.091
 Node   # 32 CPU:   12218.498
 Node   # 33 CPU:   12218.167
 Node   # 34 CPU:   12214.387
 Node   # 35 CPU:   12228.949
 Node   # 24 CPU:   12223.831
 Node   # 25 CPU:   12216.456
 Node   # 26 CPU:   12212.922

 ROMS/TOMS - Output NetCDF summary for Grid 01:
             number of time records written in HISTORY file = 00000006
             number of time records written in RESTART file = 00000002

 Analytical header files used:

     ROMS/Functionals/ana_btflux.h
     /scratch/partner658/espinosa/ROMS/670_my_own_testing/02_SS_JustM2_MassiveParticles_HydroTest1/Functionals/ana_fsobc.h
     /scratch/partner658/espinosa/ROMS/670_my_own_testing/02_SS_JustM2_MassiveParticles_HydroTest1/Functionals/ana_initial.h
     /scratch/partner658/espinosa/ROMS/670_my_own_testing/02_SS_JustM2_MassiveParticles_HydroTest1/Functionals/ana_m2obc.h
     ROMS/Functionals/ana_nudgcoef.h
     ROMS/Functionals/ana_smflux.h
     ROMS/Functionals/ana_stflux.h
     /scratch/partner658/espinosa/ROMS/670_my_own_testing/02_SS_JustM2_MassiveParticles_HydroTest1/Functionals/ana_tobc.h

 ROMS/TOMS: DONE... Tuesday - September 23, 2014 - 12:30:34 AM

Any clue why these restart problems are happening? I can share my simulation files if you want to reproduce the problem on your side. This is happening for version 3.6 (build 670) and version 3.7 (build 737) as well.

Many thanks,
Alexis Espinosa

kate
Posts: 3780
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Potential bug in ROMS perfect restart

#8 Post by kate » Wed Sep 24, 2014 5:25 am

I heard that Hernan was working on a fix for this. Once fixed, you might consider turning on PERFECT_RESTART so that you actually get a perfect restart. As for what's going wrong, I wrote about it here.

By the way, I fixed it in my branch, but only for MPI for the forward model. Hernan is searching for a more global fix.

AlexisEspinosa
Posts: 23
Joined: Fri May 24, 2013 3:05 am
Location: UWA

Re: Potential bug in ROMS perfect restart

#9 Post by AlexisEspinosa » Fri Jul 03, 2015 3:44 am

Do you know if this problem has been fixed?
I'm having problems with a restart using WET_DRY and PERFECT_RESTART.
And I want to know if the problem commented here is causing this anomaly.

Potential temperatures are going down to ZERO values in the very first time step after restarting. This is happening close to the already "dried" cells.
CFL values are all below 1, although the vertical one is 0.79.

Many thanks,
Alexis Espinosa

kate
Posts: 3780
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Potential bug in ROMS perfect restart

#10 Post by kate » Fri Jul 03, 2015 6:00 pm

A fix was applied: https://www.myroms.org/projects/src/ticket/648

If it is still broken, it would be good to know.

AlexisEspinosa
Posts: 23
Joined: Fri May 24, 2013 3:05 am
Location: UWA

Re: Potential bug in ROMS perfect restart

#11 Post by AlexisEspinosa » Tue Jul 14, 2015 6:40 am

Thanks Kate.

I was using an older version of ROMS. I upgraded to version 766 and my problems disappeared. So far, the fix seems to be working correctly.

Thanks.
Alexis

nbruneau
Posts: 4
Joined: Thu Jul 23, 2015 6:34 pm
Location: Imperial College London

Re: Potential bug in ROMS perfect restart

#12 Post by nbruneau » Tue Aug 04, 2015 2:38 pm

Hi all,

I'm new to ROMS and I might be using a wrong set-up, but I cannot get a perfect restart run working. The run restarts well and produces realistic results, but the results differ from my single run.

After doing different tests, it appears to come from my river forcing.

For example, if I turn off the following,

LuvSrc == F
LwSrc == F
LtracerSrc == F F

my perfect restart is "PERFECT". But if they are set as

LuvSrc == T
LwSrc == F
LtracerSrc == T T

my perfect restart differs from the single run.

I also tried an old version of ROMS (UV_PSOURCE / TS_PSOURCE), as well as different regions / boundary conditions, but I observe the same behaviour.

So far I have only changed NRREC and ININAME in my input file. Do I have to change other parameters? I also tried DSTART, but without success.

Many thanks for your help.
Nico

kate
Posts: 3780
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Potential bug in ROMS perfect restart

#13 Post by kate » Tue Aug 04, 2015 7:19 pm

How important is this to you? The last time I was debugging PERFECT_RESTART was in a domain with rivers, and I don't recall them being a problem. The best way to debug it (for me) is to run duelling debuggers, one starting at the beginning, the other starting from a restart file written just a few steps into the run (it must be an even number of steps). Did you run your tests with an even number of steps for the restart interval?

nbruneau
Posts: 4
Joined: Thu Jul 23, 2015 6:34 pm
Location: Imperial College London

Re: Potential bug in ROMS perfect restart

#14 Post by nbruneau » Wed Aug 05, 2015 4:17 pm

Hi Kate,
Thanks for your fast answer.

In one of my cases, after a while I could definitely see the differences by eye in the fields, but I was also playing with the DSTART variable at the time...
Do I need to update DSTART for a PERFECT_RESTART, or does ROMS read it from the restart file? I had the feeling that changing it does not lead to the same results.

I tried both odd and even numbers for the NRST variable, but I got the same results.

I also realized that if I output a variable at the restart time step, one of the time steps is missing from the output and all output indices are shifted by 1.
Example: if I have an initial output at 5 min, 10 min and 15 min, and a restart at 10 min, my restart run output (with LDEFOUT=T) only includes 5 min (Index 1), 15 min (Index 2), and 15 min (as it was written by the initial run and has not been overwritten).

I tried to dig a bit into the code but didn't figure out my issue so far...
Without the source/sink tracer activated:
- temp, salt, u,v, ubar, vbar, zeta, omega are the same for each time step
- however, Akt, Aks, Akv and gls are different

With the source/sink tracer activated: all variables differ...
- Akt, Aks, Akv and gls are different since the first time step of the restart run
- temp, salt match at (Restart + DT) but differ for the following time steps
- u, v, ubar, vbar, omega match at (Restart + DT) and (Restart + 2*DT) but differ after
- zeta matches for the first 4 time steps and differs after.

Sorry if I'm not very clear. I will keep investigating.
Thanks for the help again.
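The field-by-field comparison described above can be sketched as follows (plain Python lists standing in for the NetCDF arrays; all names and numbers are illustrative, not the poster's actual data).

```python
# Toy sketch: find which fields of a restarted run diverge from the
# continuous run, using the largest pointwise difference per field.
def max_abs_diff(a, b):
    """Largest pointwise difference between two flattened fields."""
    return max(abs(x - y) for x, y in zip(a, b))

# Made-up sample values: temp agrees, AKt differs slightly.
continuous = {"temp": [10.0, 10.5, 11.0], "AKt": [1.0e-4, 2.0e-4, 3.0e-4]}
restarted  = {"temp": [10.0, 10.5, 11.0], "AKt": [1.0e-4, 2.1e-4, 3.0e-4]}

tol = 1.0e-12
diverged = {name: max_abs_diff(continuous[name], restarted[name])
            for name in continuous
            if max_abs_diff(continuous[name], restarted[name]) > tol}
print(sorted(diverged))  # ['AKt']
```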

kate
Posts: 3780
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Potential bug in ROMS perfect restart

#15 Post by kate » Wed Aug 05, 2015 5:11 pm

You don't change DSTART on restart.

If I want all my history records to be in one file, I do set LDEFOUT=F on restart.

It sounds like PERFECT_RESTART is broken for GLS. I would start by fixing that. Does it differ first at the surface or bottom or just everywhere?

nbruneau
Posts: 4
Joined: Thu Jul 23, 2015 6:34 pm
Location: Imperial College London

Re: Potential bug in ROMS perfect restart

#16 Post by nbruneau » Wed Aug 05, 2015 8:53 pm

I'm a bit puzzled to get the same salinity/temperature/velocities (for the case with no source/sink activated) but different AK*...

nbruneau
Posts: 4
Joined: Thu Jul 23, 2015 6:34 pm
Location: Imperial College London

Re: Potential bug in ROMS perfect restart

#17 Post by nbruneau » Thu Aug 06, 2015 5:03 pm

A few more details:

- Using LMD_MIXING (with LMD_CONVEC, DDMIX, NONLOCAL, RIMIX, SKPP) seems perfect (according to my tests), with and without river forcing.

- Using MY25_MIXING is perfect without rivers, but there are small differences with them. I tried activating/deactivating the KANTHA_CLAYSON and N2S2_HORAVG flags: same kind of results. With rivers, the differences are between 10^-6 and 10^-9, so I guess that's reasonable.

- Using GLS_MIXING, I get the same as for MY25_MIXING, but the differences are more like 10^-2, which starts to be non-negligible.

I tried to turn off the Qsrc in the code, as well as other terms linked to the sink/source terms, but didn't find a clear pattern. It seems removing the "Apply momentum transport point sources" part in Nonlinear/step3d_uv.F solves the issue for MY25 but not for GLS. That would mean the z_w variable is not exactly the same... but that doesn't make sense, as it works without rivers. I'm not sure my issue comes from here, as I'm not too familiar with all of the ROMS code.
