Ocean Modeling Discussion

ROMS/TOMS

Search for:
It is currently Fri Sep 22, 2017 4:34 am




Post new topic Reply to topic  [ 15 posts ] 

All times are UTC

Author Message
PostPosted: Tue Jan 24, 2012 5:09 am 
Offline
User avatar

Joined: Tue Jul 01, 2003 4:12 am
Posts: 476
Location: NIWA
I have a simulation with the non-linear, forward model driven by the M2 tide only. Accordingly I have made the simulation length a multiple of the M2 period, which is 44714.1 s. The simulation ends by writing a record to the restart file. Parameters are:
Code:
      NTIMES =  23040
          DT == 7.7628819D0
     NDTFAST == 1
        NRST == 23040

This worked OK a year ago, but with the current version of ROMS the simulation ends at 23039 time steps and therefore fails to write a restart record. I strongly suspect that the problem is with the code controlling the time step loop in maind2d.F (it's the same in main3d.F):
Code:
      my_StepTime=0.0_r8
      STEP_LOOP : DO WHILE (my_StepTime.le.RunInterval)
        my_StepTime=my_StepTime+MAXVAL(dt)
        ...
      END DO STEP_LOOP

Combining tests for exact equality with floating-point arithmetic is generally a bad idea. How about a more tolerant test, like the following?
Code:
      my_StepTime=0.0_r8
      STEP_LOOP : DO WHILE (my_StepTime.le.RunInterval+0.5_r8*MAXVAL(dt))
        my_StepTime=my_StepTime+MAXVAL(dt)
        ...
      END DO STEP_LOOP

It fixed the problem in my case.


Top
 Profile  
Reply with quote  
PostPosted: Wed Jan 25, 2012 9:27 pm 
Offline
Site Admin
User avatar

Joined: Wed Feb 26, 2003 4:41 pm
Posts: 1011
Location: IMCS, Rutgers University
Indeed, the misery of floating-point round-off in computers... I made the suggested change to the stepping loop so it is more robust. See following trac :arrow: ticket.

Thank you for bringing this to my attention...


Top
 Profile  
Reply with quote  
PostPosted: Thu Jan 26, 2012 12:26 am 
Offline

Joined: Fri Nov 14, 2003 4:57 pm
Posts: 170
Location: UCLA, USA
Hernan and Mark,

Are you serious about your idea of counting REAL-type numbers, like it is proposed in the latest fix,
Code:
      my_StepTime=0.0_r8
      MaxDT=MAXVAL(dt)

      STEP_LOOP : DO WHILE (my_StepTime.le.(RunInterval+0.5_r8*MaxDT))

        my_StepTime=my_StepTime+MaxDT

        ...

      END DO STEP_LOOP

?

It is a bad idea in any case because you are accumulating roundoff errors no matter how you fix it.

...Interestingly enough, recently a person in our group run into trouble by doing something like that
in matlab, basically because he kept adding a small "dt" to a large number.

In addition, recently I myself had a lot of fun investigating an MPI deadlock, which turned out
to be caused by logical error due to roundoff of REAL-type number -- just comparing using .gt.
and .lt. operations -- I absolutely never use .eq. or .ge. or .le. on computed real numbers, but
still... a small roundoff error flipped logical switch differently on different MPI nodes resulting
in loss of synchronization, MPI deadlock, and a lot of fun to find the cause.

Counters should always be of an integer type.

The above code should be rewritten as
Code:
      StartTime=0.0_r8   ! <-- not necessarily 0.0_r8, may be just time from the initial condition
      MaxDT=MAXVAL(dt)
      ntimes=(RunInterval+0.5*MaxDT)/MaxDT     !<-- 0.5*MaxDT is added to make roundoff upward.
      MaxDT=RunInterval/dble(ntimes)           !<-- this adjusts "dt" a little bit to make sure
                                               !    that at the end of time stepping my_StepTime
      STEP_LOOP : DO itime=1,ntimes            !    will be exactly  StartTime+RunInterval
        my_StepTime=StartTime + dble(itime)*MaxDT

        ...
      END DO STEP_LOOP


Top
 Profile  
Reply with quote  
PostPosted: Thu Jan 26, 2012 7:32 pm 
Offline
Site Admin
User avatar

Joined: Wed Feb 26, 2003 4:41 pm
Posts: 1011
Location: IMCS, Rutgers University
Thank you for the comment. I will be revisiting this soon when I add the nesting calls to main2d and main3d. I will take your suggestion into account. You are right about the issue of dealing with large numbers :!:

Yes, the round-off in Matlab is very delicate. I struggled with this about a month ago. Just try this simple statement and you will be surprised about the result:
Code:
>> format long eng
>> double(single(1.0d+12))

ans = 999.999995904000e+009

>> eps(1.0d+12)

ans = 122.070312500000e-006

In the script to read NetCDF data that we distribute, matlab/utility/nc_read.m, I have to put the following logic to process _FillValue and missing_value attributes:
Code:
if (got.FillValue || got.missing_value),
  if (iscellstr(f) || ischar(f)),
    ind=find(f == spval);
    f(ind)=spval;
  else
    ind=find(abs(f-spval) < 4*eps(double(spval)));
  end
end

Notice that to find the special values correctly, I need to have abs(f-spval) < 4*eps(double(spval)) to get the correct solution that accounts for Matlab round-off. They have made a lot of decisions to accelerate computations that compromise precision. There is nothing like IEEE floating-point arithmetic in Matlab. Here f is the data read from the NetCDF file and spval is the attribute value in its native file precision.

In Fortran, we need a function with similar capabilities as eps to do this comparisons according to the computer round-off.

I was completely surprised that all the Matlab interfaces out there and scripts to read NetCDF data are not taken into account the round-off inherent to Matlab. This gives surprising results sometimes. See the following :arrow: ticket for details about processing NetCDF data.


Top
 Profile  
Reply with quote  
PostPosted: Thu Jan 26, 2012 7:57 pm 
Offline
User avatar

Joined: Tue Jul 01, 2003 4:12 am
Posts: 476
Location: NIWA
shchepet wrote:
Counters should always be of an integer type.


Quite right. The quick fix should do for now. If it doesn't (I have some runs with a much larger number of time steps) I'll send Hernan some code for all the different drivers.


Top
 Profile  
Reply with quote  
PostPosted: Fri Jan 27, 2012 8:51 pm 
Offline

Joined: Fri Nov 14, 2003 4:57 pm
Posts: 170
Location: UCLA, USA
As of right now, ROMS expects to read the 3D-mode time step "dt" and mode-splitting ratio
"ndtfast" from the input script.

A potentially attractive alternative would be to specify "dt" and fast-time step "dtfast", and let
the model figure out the mode-splitting ratio within the code (along with a slight adjustment to
"dtfast", so ndtfast*dtfast=dt made by the model via a simple algorithm described above).

This way it makes it a bit simple to handle: users often change "dt" experimentally because it
is hard to predict a priori for a given configuration (its limit depends on stratification, topography,
tides, if any). In contrast, "dtfast" is very predictable -- basically it depends on the maximum
depth and grid resolution. When user changes "dt", he should also change "ndtfast" to keep
"dtfast" about the same. Specifying it in the input script would be a bit more straightforward.


Top
 Profile  
Reply with quote  
PostPosted: Sun Jan 29, 2012 8:34 pm 
Offline
User avatar

Joined: Tue Jul 01, 2003 4:12 am
Posts: 476
Location: NIWA
shchepet wrote:
A potentially attractive alternative would be to specify "dt" and fast-time step "dtfast", and let the model figure out the mode-splitting ratio within the code (along with a slight adjustment to "dtfast", so ndtfast*dtfast=dt made by the model via a simple algorithm described above).


+1 to that.

I was explaining to a new user the other day that dtfast is limited by the barotropic CFL criterion, but you don't specify that, you specify dt and ndtfast. We both thought that was a bit odd.


Top
 Profile  
Reply with quote  
PostPosted: Mon Jan 30, 2012 10:42 am 
Offline

Joined: Tue Oct 12, 2010 9:12 pm
Posts: 1
Location: ACRI-ST
Very interesting topic !

shchepet wrote:
Code:
      StartTime=0.0_r8   ! <-- not necessarily 0.0_r8, may be just time from the initial condition
      MaxDT=MAXVAL(dt)
      ntimes=(RunInterval+0.5*MaxDT)/MaxDT     !<-- 0.5*MaxDT is added to make roundoff upward.
      MaxDT=RunInterval/dble(ntimes)           !<-- this adjusts "dt" a little bit to make sure
                                               !    that at the end of time stepping my_StepTime
      STEP_LOOP : DO itime=1,ntimes            !    will be exactly  StartTime+RunInterval
        my_StepTime=StartTime + dble(itime)*MaxDT

        ...
      END DO STEP_LOOP

I was just wondering how you would generalize your formulation if you don't know a priori the number of time steps, for instance in the case of a varying time step.
Eventually, it could be a subject of interest for automatic time step monitoring during simulation in future ROMS release.
Thank you for your insight !
Best Regards


Top
 Profile  
Reply with quote  
PostPosted: Mon Jan 30, 2012 4:32 pm 
Offline
Site Admin
User avatar

Joined: Wed Feb 26, 2003 4:41 pm
Posts: 1011
Location: IMCS, Rutgers University
Well, the rational for dt and ndtfast was to force the user to think and have the baroclinic time-steps (dt) divisible by the number of barotropic time steps (dtfast). That is, dt = ndtfast * dtfast. We wanted to have a full number. This has been in ROMS since the beginning.


Top
 Profile  
Reply with quote  
PostPosted: Tue Mar 06, 2012 6:45 pm 
Offline
User avatar

Joined: Wed Jul 02, 2003 5:29 pm
Posts: 3219
Location: IMS/UAF, USA
I've been wondering about the new main loop code since you brought it to our attention. It seems that if different patches don't have the same timestep, they will all run for the number of steps needed by the biggest timestep. So I whipped up an upwelling input file for two domains. I had to make this change to get it to go:
Code:
diff --git a/ROMS/Utility/inp_par.F b/ROMS/Utility/inp_par.F
index e4c3f92..50221da 100644
--- a/ROMS/Utility/inp_par.F
+++ b/ROMS/Utility/inp_par.F
@@ -1777,7 +1777,7 @@
 !  symbol in the input "line" is used to advance the counters.
 !
       IF (.not.load) THEN
-        IF (ifile.le.nFfiles(igrid)) THEN
+        IF (ifile.lt.nFfiles(igrid)) THEN
           ifile=ifile+MIN(1,Icont)
         ELSE
           ifile=1

Indeed, it ran both domains for the smaller number of timesteps then quit:
Code:
   1439     4 23:55:00  2.181924E-02  6.585713E+02  6.585931E+02  3.884376E+11
            (01,01,11)  1.751682E-01  2.616835E-03  2.911273E-02  6.295591E-01
   1439     3 07:56:40  1.230684E-02  6.585685E+02  6.585808E+02  1.894817E+11
            (01,01,08)  8.060289E-02  4.904456E-03  9.185044E-02  5.322489E-01
   1440     5 00:00:00  2.184275E-02  6.585713E+02  6.585931E+02  3.884376E+11
            (01,01,11)  1.752055E-01  2.609796E-03  2.904356E-02  6.296896E-01
   1440     3 08:00:00  1.232267E-02  6.585685E+02  6.585808E+02  1.894817E+11
            (01,01,08)  8.062519E-02  4.899209E-03  9.192351E-02  5.324433E-01
      WRT_HIS   - wrote history  fields (Index=1,1) into time record = 0000021
      WRT_AVG   - wrote averaged fields into time record =             0000020
      WRT_DIAGS - wrote diagnostics fields into time record =          0000020
      WRT_RST   - wrote re-start fields (Index=1,1) into time record = 0000001

 Elapsed CPU time (seconds):

 Thread #  0 CPU:     587.431

I had asked for:
Code:
      NTIMES == 1440 2160
          DT == 300.0d0 200.0d0


Top
 Profile  
Reply with quote  
PostPosted: Tue Mar 06, 2012 7:55 pm 
Offline
Site Admin
User avatar

Joined: Wed Feb 26, 2003 4:41 pm
Posts: 1011
Location: IMCS, Rutgers University
I already changed this logic. It will be available in the next ROMS release. I am almost done coding nesting Phase III. This stuff is quite complex. Once that I am done coding, the testing will start. The plan is to release this as ROMS 3.7 soon, so everybody may start planning for what it is coming in your way. I am very happy with the design. The full tested version will be released as ROMS 4.0. I really hope that you guys help me testing these capabilities.

If you are interested, I gave a talk at NOAA last week about this. It is quite technical and it is going to require some training. However, you can now check out the full capabilities of ROMS nesting and start thinking how it can be used in your applications (old and new). The nesting presentation can be :arrow: downloaded from ROMS website. Some parts have ActiveX controls that only work on Windows. You may deactivate them, but you will not be able to roll the attached code windows.

Now, main3d.F looks like:

Code:
      ...

      NEST_LAYER : DO nl=1,NestLayers
!
!  In composite and mosaic grids, it is assumed that the donor and
!  receiver grids have the same time-step size. This is done to avoid
!  the time interpolation between donor and receiver grids. Only
!  spatial interpolations are possible in the current nesting design.
!
!  In grid refinement, it is assumed that the donor and receiver grids
!  are an interger factor of the grid size and time-step size.
!
!  Determine number of time steps to compute in each nested grid layer
!  based on the specified time interval (seconds), RunInterval. Notice
!  that RunInterval is set in the calling driver. Its value may span
!  the full period of the simulation, or a multi-model coupling
!  interval, or just a single step for the main grid.
!
        Nsteps=0
        DO ig=1,GridsInLayer(nl)
          ng=GridNumber(ig,nl)
          Nsteps=MAX(Nsteps, INT((RunInterval+0.5_r8*dt(ng))/dt(ng))+1)
        END DO
!
!  Time-step governing equations for Nsteps.
!
        STEP_LOOP : DO istep=1,Nsteps
!
!  Set time indices and time clock.
!
          DO ig=1,GridsInLayer(nl)
            ng=GridNumber(ig,nl)
            iic(ng)=iic(ng)+1
            nstp(ng)=1+MOD(iic(ng)-ntstart(ng),2)
            nnew(ng)=3-nstp(ng)
            nrhs(ng)=nstp(ng)
            time(ng)=time(ng)+dt(ng)
            tdays(ng)=time(ng)*sec2day
            CALL time_string (time(ng), time_code(ng))
          END DO
          ...

Notice that we now have an integer operation which is not subject to floating-point round off. I don't understand the point that Kate is bringing out. Are you running in a nested grid set-up? How? There are still missing parts of code in the version that you are using to do this correctly.


Top
 Profile  
Reply with quote  
PostPosted: Tue Mar 06, 2012 8:08 pm 
Offline
User avatar

Joined: Wed Jul 02, 2003 5:29 pm
Posts: 3219
Location: IMS/UAF, USA
I'm not running nested, just two independent copies of UPWELLING - but using different timesteps.

The patch I included is for the forcing files to transition from one grid to the next in read_phypar.


Top
 Profile  
Reply with quote  
PostPosted: Tue Mar 06, 2012 8:28 pm 
Offline

Joined: Fri Nov 14, 2003 4:57 pm
Posts: 170
Location: UCLA, USA
Hernan,

by adding a small increment dt to time as follows from your code,
Code:
...
time(ng)=time(ng)+dt(ng)
...

(see close to the end) you are back to the original problem: you accumulate roundoff errors in counting time.


What you should do instead is to change the above into something of the sort of
Code:
...
time(ng)=time_init(ng) + dt(ng)*(iic(ng)-ntstart(ng))
...

where time_init(ng) is the time corresponding to the initial condition.


Top
 Profile  
Reply with quote  
PostPosted: Tue Mar 06, 2012 10:03 pm 
Offline
Site Admin
User avatar

Joined: Wed Feb 26, 2003 4:41 pm
Posts: 1011
Location: IMCS, Rutgers University
OK, Sasha. I see what you mean. Let me see what I can do. I have so much testing on my way. I am stuck now on how to do the MPI communications efficiently in the fine2coarse coupling (two-way). It is trivial in serial and shared-memory computations because we have access to the finer grid global array. This gets complicated in higher refinement ratios 1:5 and 1:7 because the averaging of data in the fine2coarse step may be in different tiles. The 1:3 is trivial. Right now, I am only allowing refinement ratios of 1:3, 1:5, 1:7. I don't think that it is advisable to use higher rations.

I am trying an elegant way to do this. It is kind of a tricky MPI reduction operation. I welcome any suggestions, if you have any. Anything that I tried so far involves collecting and communicating the global arrays for each state variable that need to be averaged at the interior of the finer grid and send back to the coarse grid. Not trivial at all. This is the last thing that I need to figure out...


Top
 Profile  
Reply with quote  
PostPosted: Thu Mar 08, 2012 7:58 pm 
Offline

Joined: Wed Dec 31, 2003 6:16 pm
Posts: 664
Location: USGS, USA
Just to bring everyone up to speed - last year I have developed a fully functional 2-way mass conserving grid refinement method for ROMS as part of my COAWST modeling system. This is available to anyone who asks, and I have made this available to Rutgers. I had also struggled with many of these same issues, such as the float problem, and identified a method that uses integer counts to keep track of the grid stepping. For the exchanges that are listed as "This is the last thing that I need to figure out..." What I coded is for the parent master to do an mp_gather and mpi_bcast of the child data to all parent tiles. The parent tiles have a flag that identifies if it contains any child ranges. If so, then it does an average of the child data and replaces the parent values.

For now, i am comfortable with this. I am looking into making this more efficient potentially with my knowledge of the mct sparse matrix multiplication tools or some sort. Not sure. I think there are other issues that are more important.

The timing of these calls in main3d loop is critical. It will violate the constancy preserving aspects and mass conservation if not done correctly. My time stepping computes the mass flux from the parent, adjusts the 2D barotropic fields at the grid connectivity regions to ensure 100% mass (volume exchange) from parent to child, and performs a 3D vertically distributed tracer correction to the parent to ensure tracer conservation properties.

When you reach the next hurdle, be sure to ask. I already have this working.

-john


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 15 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 3 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB® Forum Software © phpBB Group