Possible bug in get_2dfldr.F and get_3dfldr.F

choboter · Wed Jun 24, 2009 10:24 pm

This bug showed up while running the DOUBLE_GYRE test case on Linux with g95, and only when one of the compiler optimizations -O1, -O2, or -O3 was used. With the -g option and no optimization, ROMS ran perfectly (albeit slowly). It's debatable whether this should be called a ROMS bug or a g95 bug, but there's a quick fix that makes ROMS more robust when using a temperamental g95 compiler.

Symptoms
While running the double_gyre case, the nonlinear and tangent linear parts ran fine. When the adjoint part of the code started, ROMS printed a dozen or so errors of the form:

Code:

 SET_2DFLDR - current model time exceeds ending value for variable: zeta
              TDAYS     =          1.0000
              Data Tmin =          0.0000  Data Tmax =          1.0000
              Data Tstr =          1.0000  Data Tend =          1.0000
              TINTRP1   =          1.0000  TINTRP2   =          1.0000
              FAC1      =          0.0000  FAC2      =          0.0000

with similar errors for SET_3DFLDR. ROMS then stopped the job, reporting an input error and exit_flag: 2.

The problem
In get_2dfldr.F and get_3dfldr.F, there is code that looks like

Code:

IF (Tval.lt.Tend) THEN
   Trec=Trec+1
END IF

In older revisions of ROMS it looked slightly different: "IF (Tval(1).lt.Tend)...". (Full disclosure: I'm using version 240, where this line has the "Tval(1)" form.) This is in the part of the code that creates a local, monotonically decreasing time variable so that interpolation between snapshots is trivial when cycling forcing fields. At the moment things broke down, Tval and Tend were both nominally equal to 1.0. The problem is that Tval and Tend are floating-point reals, and with compiler optimizations on they did not agree all the way to machine precision: Tval was less than Tend by some tiny epsilon. The condition was therefore evaluated as true and Trec was incremented when it should not have been.
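A minimal standalone sketch of that situation, not taken from ROMS: it does not reproduce the g95 optimization behavior, it only shows that two values can print identically at the four decimals used in the error message and still satisfy a strict .lt. test. The program name and the local definition of r8 as a 64-bit real kind are assumptions for illustration.

```fortran
program near_equal
  implicit none
  ! Assumption: r8 is a 64-bit real kind, defined locally for this sketch.
  integer, parameter :: r8 = selected_real_kind(12, 300)
  real(r8) :: Tend, Tval

  Tend = 1.0_r8
  ! Make Tval smaller than Tend by one machine epsilon (about 2.2E-16).
  Tval = Tend - epsilon(Tend)

  ! At four decimals both values are displayed as 1.0000 ...
  write (*, '(a,f15.4,a,f15.4)') ' Tval =', Tval, '  Tend =', Tend

  ! ... yet the strict comparison is still true.
  write (*, '(a,l2)') ' Tval.lt.Tend :', Tval .lt. Tend
end program near_equal
```

Any difference this small is invisible in the diagnostic output, which is why the printed values of TDAYS, Tmax, and Tend all look identical.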

One possible fix
I've edited each offending IF statement to read

Code:

IF (Tval.lt.(Tend-eps)) THEN

where eps was added as a parameter at the beginning of the subroutine. Defining eps = 1.0E-10_r8 seemed to make everything work fine.
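As a rough standalone illustration of the tolerance test (not the actual ROMS subroutine), the sketch below applies the modified condition to a Tval that sits one machine epsilon below Tend. The program name, the starting value of Trec, and the local r8 kind are assumptions; eps uses the value quoted above.

```fortran
program eps_compare
  implicit none
  ! Assumption: r8 is a 64-bit real kind, defined locally for this sketch.
  integer, parameter :: r8 = selected_real_kind(12, 300)
  ! Tolerance value from the post; the time variables are in days.
  real(r8), parameter :: eps = 1.0E-10_r8
  real(r8) :: Tend, Tval
  integer :: Trec

  Tend = 1.0_r8
  Tval = Tend - epsilon(Tend)   ! "equal" to Tend up to rounding noise
  Trec = 1                      ! hypothetical starting record

  ! Only advance the record when Tval is below Tend by more than eps.
  IF (Tval.lt.(Tend-eps)) THEN
    Trec = Trec + 1
  END IF

  write (*, '(a,i2)') ' Trec =', Trec
end program eps_compare
```

Here the difference between Tval and Tend is far smaller than eps, so the condition is false and Trec is left unchanged. A fixed absolute eps like this only makes sense while the time values stay of order one or a few thousand days; a tolerance scaled by the magnitude of Tend would be one alternative.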

In ROMS/Utility/get_*fld*.F (6 files), there are similar IF statements, where real variables are compared. It might be worth rephrasing all of them in a more robust form, to prevent picky compilers from causing problems.

Thanks,
Paul

arango · Thu Jun 25, 2009 1:18 am

This does not make sense to me. It seems like a compiler bug. Anyway, why are you changing get_2dfldr.F and get_3dfldr.F? These routines are extremely complicated and only used in the adjoint model. The time logic is backwards in time.

It took me years to get this logic correct. They have always worked in the double gyre adjoint applications.

Since you are using adjoint-based applications, I highly recommend that you use the latest version of the code. In addition, the g95 compiler is no longer being developed. Use gfortran instead.

xinyou_lin · Thu Feb 25, 2010 11:30 am

For adjoint-based applications, I made a correction to get_ngfldr.F, get_2dfldr.F, and get_3dfldr.F in version 424, after which it works well.
The correction is as follows:

IF (tdays(ng).lt.Tmax) THEN
  Tmono=Tend
ELSE
! Tmono=Tend+(tdays(ng)-MOD(tdays(ng),Clength))          ! original code
  Tmono=Tend+Clength+(tdays(ng)-MOD(tdays(ng),Clength))  ! revised by xinyou Lin
END IF

m.hadfield · Fri Feb 26, 2010 1:19 am

I disagree, Hernan. Testing for equality between floating-point numbers is always dangerous. That's why using floating-point numbers for DO loop counters is unwise (or illegal), and for the same reason you can't assume (for example) that if you add 1.0 to a floating-point number 10000 times, you will get the same result as adding 10000.0 once. There are places where ROMS effectively assumes that you will, and this is dangerous. xinyou_lin has found one. Sometimes compilers let you get away with dangerous stuff some of the time, but sometimes they don't. Best to modify the code to be more robust.
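A small standalone sketch of the accumulation point, as a variant of the example above: it uses a step of 0.1 rather than 1.0 (an assumption of this sketch, chosen because 0.1 is not exactly representable in binary) and compares the repeated sum with a single multiplication. The names dt, nsteps, and tol, and their values, are illustrative and not from ROMS.

```fortran
program accumulate
  implicit none
  ! Assumption: r8 is a 64-bit real kind, defined locally for this sketch.
  integer, parameter :: r8 = selected_real_kind(12, 300)
  real(r8), parameter :: dt  = 0.1_r8      ! hypothetical step size
  real(r8), parameter :: tol = 1.0E-6_r8   ! illustrative tolerance
  integer, parameter :: nsteps = 10000
  real(r8) :: t_sum, t_mul
  integer :: i

  ! Accumulate the step nsteps times; each addition is rounded.
  t_sum = 0.0_r8
  do i = 1, nsteps
    t_sum = t_sum + dt
  end do

  ! Compute the same quantity with one rounded multiplication.
  t_mul = real(nsteps, r8) * dt

  write (*, '(a,es24.16)') ' repeated add     :', t_sum
  write (*, '(a,es24.16)') ' one multiply     :', t_mul
  write (*, '(a,l2)')      ' exactly equal    :', t_sum .eq. t_mul
  write (*, '(a,l2)')      ' equal within tol :', abs(t_sum - t_mul) .le. tol
end program accumulate
```

With 64-bit IEEE arithmetic the two results typically differ in the trailing digits, so the exact test fails while the tolerance test passes. The exact digits can depend on the compiler and optimization flags, which is the robustness concern raised in this thread.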
