Possible bug in get_2dfldr.F and get_3dfldr.F

Bug reports, work arounds and fixes

Moderators: arango, robertson

choboter
Posts: 4
Joined: Wed Dec 22, 2004 5:50 pm
Location: California Polytechnic State University

#1 Post by choboter

This bug showed up while running the DOUBLE_GYRE test case on Linux with g95, and only when one of the compiler optimizations -O1, -O2, or -O3 was used. With the -g option on and no optimizations, ROMS ran perfectly (albeit slowly). It's debatable whether this should be called a ROMS bug or a g95 bug, but there's a quick fix to make ROMS more robust when using a temperamental g95 compiler.

Symptoms
While running the double_gyre case, the nonlinear and tangent linear parts ran fine. When the adjoint part of the code started, ROMS printed a dozen or so errors of the form:

Code:

 SET_2DFLDR - current model time exceeds ending value for variable: zeta
              TDAYS     =          1.0000
              Data Tmin =          0.0000  Data Tmax =          1.0000
              Data Tstr =          1.0000  Data Tend =          1.0000
              TINTRP1   =          1.0000  TINTRP2   =          1.0000
              FAC1      =          0.0000  FAC2      =          0.0000
Similar errors appeared for SET_3DFLDR, and then the job stopped, citing an input error and exit_flag: 2.

The problem
In get_2dfldr.F and get_3dfldr.F, there is code that looks like

Code:

IF (Tval.lt.Tend) THEN
   Trec=Trec+1
END IF
In older revisions of ROMS, it looked slightly different: "IF (Tval(1).lt.Tend)..." (Full disclosure: I'm using version 240, where this line has the form "Tval(1)".) This is in the part of the code that creates a local, monotonically decreasing time variable so that the interpolation between snapshots is trivial when cycling forcing fields. At the moment things broke down, Tval and Tend were each nominally equal to 1.0. The problem is that Tval and Tend are floating-point reals, and with compiler optimizations on they did not agree all the way to machine precision: Tval was less than Tend by some epsilon. The condition therefore evaluated as true and Trec was incremented when it should not have been.
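
The failure mode can be reproduced outside ROMS. The sketch below is in Python rather than Fortran and is only an illustration: the helpers `accumulate_time` and `next_record` are hypothetical, and the rounding error here comes from repeated addition, which is not necessarily what the g95 optimizer did. It shows a time value that prints as 1.0000 yet compares as less than 1.0, so a strict less-than test increments the record counter.

```python
def accumulate_time(dt, nsteps):
    """Advance a clock by adding dt nsteps times (hypothetical helper)."""
    t = 0.0
    for _ in range(nsteps):
        t += dt
    return t


def next_record(trec, tval, tend):
    """Mimic the strict test from the post: IF (Tval.lt.Tend) Trec=Trec+1."""
    if tval < tend:
        trec += 1
    return trec


tend = 1.0
tval = accumulate_time(0.1, 10)    # nominally 1.0, but 0.1 is not exact in binary

print(f"{tval:.4f}")               # four decimals, like the log above: gap is invisible
print(repr(tval))                  # full precision reveals the difference
print(next_record(1, tval, tend))  # counter is incremented although the times "match"
```

At four decimal places the two times look identical, which is consistent with the log output shown under Symptoms.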

One possible fix
I've edited each offending IF statement to read

Code:

IF (Tval.lt.(Tend-eps)) THEN
where eps was added as a parameter at the beginning of the subroutine. Defining eps = 1.0E-10_r8 seemed to make everything work fine.
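
The effect of this guard can be sketched in Python. This is an illustration under stated assumptions, not the ROMS code: `definitely_less` is a hypothetical helper, and the tolerance simply mirrors the 1.0E-10 value quoted above. A fixed absolute tolerance suits times of order one day, as in this test case; because the spacing between adjacent floating-point numbers grows with magnitude, a tolerance scaled to the operands may be a better fit for much larger time values.

```python
EPS = 1.0e-10  # mirrors eps = 1.0E-10_r8 from the post


def definitely_less(a, b, eps=EPS):
    """Return True only if a is below b by more than eps."""
    return a < b - eps


tend = 1.0
tval = 1.0 - 2.0**-53   # the largest double strictly below 1.0

print(tval < tend)                  # strict test: True, the record would advance
print(definitely_less(tval, tend))  # guarded test: False, the record stays put
print(definitely_less(0.5, tend))   # a genuinely earlier time still passes
```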

In ROMS/Utility/get_*fld*.F (6 files), there are similar IF statements, where real variables are compared. It might be worth rephrasing all of them in a more robust form, to prevent picky compilers from causing problems.

Thanks,
Paul

arango
Site Admin
Posts: 1350
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

Re: Possible bug in get_2dfldr.F and get_3dfldr.F

#2 Post by arango

This does not make sense to me. It seems like a compiler bug. Anyway, why are you changing get_2dfldr.F and get_3dfldr.F? These routines are extremely complicated and only used in the adjoint model. The time logic is backwards! It took me years to get this logic correct. They have always worked in the double gyre adjoint applications.

Since you are using adjoint-based applications, I highly recommend that you use the latest version of the code. In addition, the g95 compiler is no longer being developed. Use gfortran instead.

xinyou_lin
Posts: 2
Joined: Tue Sep 22, 2009 3:54 pm
Location: Xiamen University

Re: Possible bug in get_2dfldr.F and get_3dfldr.F

#3 Post by xinyou_lin

For an adjoint-based application, I made a correction to get_ngfldr.F, get_2dfldr.F, and get_3dfldr.F in version 424, after which it works well.
The correction is as follows:

IF (tdays(ng).lt.Tmax) THEN
  Tmono=Tend
ELSE
! Tmono=Tend+(tdays(ng)-MOD(tdays(ng),Clength))          !original code
  Tmono=Tend+Clength+(tdays(ng)-MOD(tdays(ng),Clength))  !revised by xinyou Lin
END IF

m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Possible bug in get_2dfldr.F and get_3dfldr.F

#4 Post by m.hadfield

I disagree, Hernan. Testing for equality between floating-point numbers is always dangerous. That's why using floating-point numbers for DO loop counters is unwise (or illegal), and for the same reason you can't assume (for example) that adding 1.0 to a floating-point number 10000 times gives the same result as adding 10000.0 once. There are places where ROMS effectively assumes that it does, and this is dangerous. xinyou_lin has found one. Sometimes compilers let you get away with dangerous stuff, but sometimes they don't. It is best to modify the code to be more robust.
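
The repeated-addition point can be checked directly. The Python sketch below is an illustration only: the starting value of 1.0e16 is an assumption, chosen deliberately large so that the effect is visible in double precision, and it is not a value taken from ROMS. At that magnitude adjacent doubles are 2.0 apart, so each addition of 1.0 rounds back to the starting value, while a single addition of 10000.0 is exact.

```python
base = 1.0e16              # spacing between adjacent doubles here is 2.0

stepwise = base
for _ in range(10000):
    stepwise += 1.0        # each sum rounds back to the previous value

at_once = base + 10000.0   # a single addition, exactly representable

print(stepwise == base)    # True: 10000 additions changed nothing
print(at_once - stepwise)  # 10000.0: the two routes disagree
```

In single precision the same stall sets in at much smaller magnitudes, so the hazard is not limited to extreme values.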