Opened 8 years ago

Closed 8 years ago

#512 closed bug (Fixed)

Corrected shared-memory bug in step3d_uv.F

Reported by: arango
Owned by: arango
Priority: major
Milestone: Release ROMS/TOMS 3.5
Component: Nonlinear
Version: 3.5
Keywords:
Cc:

Description

I have been looking, on and off for several months, for a shared-memory parallel bug in North-South periodic (NS_PERIODIC) applications. The bug appears when TS_MPDATA is activated: I don't get identical solutions with different tile partitions. I have rewritten and analyzed mpdata_adiff.F several times, but the problem persists, so I came to the conclusion that it was somewhere else. This has been a very difficult bug to track in the debugger.

Currently, I am removing all the CPP options associated with lateral boundary conditions, which allows me to look at the code in great detail. I found the problem in step3d_uv.F. We need to have the following conditionals around line 1122:

# if !defined EW_PERIODIC && !defined COMPOSED_GRID
        IF (DOMAIN(ng)%Western_Edge(tile)) THEN
          ...
        END IF
        IF (DOMAIN(ng)%Eastern_Edge(tile)) THEN
          ...
        END IF
# endif
# if !defined NS_PERIODIC && !defined COMPOSED_GRID
        IF (DOMAIN(ng)%Southern_Edge(tile)) THEN
          ...
        END IF
        IF (DOMAIN(ng)%Northern_Edge(tile)) THEN
          ...
        END IF
# endif

instead of

# if !defined EW_PERIODIC && !defined COMPOSED_GRID
        IF (Istr.eq.1) THEN
          ...
        END IF
        IF (Iend.eq.Lm(ng)) THEN
          ...
        END IF
# endif
# if !defined NS_PERIODIC && !defined COMPOSED_GRID
        IF (j.eq.0) THEN
          ...
        END IF
        IF (j.eq.Mm(ng)+1) THEN
          ...
        END IF
# endif

when replacing the incorrect vertical mean with the more accurate barotropic component only at the boundary points. I introduced this bug a long time ago when removing redundant operations that are illegal in the adjoint algorithms. This bug only affected NS_PERIODIC applications. Recall that the I- and J-ranges are different in periodic applications. Well, shared-memory code is very delicate.
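
To illustrate the pattern of keying boundary work off a per-tile edge flag, here is a minimal, self-contained sketch. It is not ROMS code: it is written in free-form Fortran, and the program, the subroutines update_all_tiles and update_tile, the even 1-D tile split, and the western_edge/eastern_edge logicals (standing in for DOMAIN(ng)%Western_Edge(tile) and DOMAIN(ng)%Eastern_Edge(tile)) are all hypothetical. It only shows a boundary update guarded by edge flags, plus a check that one tile and three tiles give the same array.

```fortran
! Minimal sketch, NOT ROMS code. All names below are hypothetical.
program edge_flag_sketch
  implicit none
  integer, parameter :: Lm = 12        ! interior points in the I-direction
  real :: a1(0:Lm+1), a2(0:Lm+1)

  call update_all_tiles(1, a1)         ! single tile
  call update_all_tiles(3, a2)         ! three tiles

  ! The partition should not change the answer.
  if (all(a1 == a2)) then
    print *, 'identical solutions for 1 and 3 tiles'
  else
    print *, 'solutions differ between partitions'
  end if

contains

  subroutine update_all_tiles(ntiles, a)
    integer, intent(in)  :: ntiles
    real,    intent(out) :: a(0:Lm+1)
    integer :: tile, Istr, Iend, chunk
    logical :: western_edge, eastern_edge

    a = 0.0
    chunk = Lm/ntiles                  ! assumes ntiles divides Lm evenly
    do tile = 0, ntiles-1
      Istr = 1 + tile*chunk
      Iend = Istr + chunk - 1
      ! Edge flags come from the tile's position in the partition,
      ! not from a test on a loop index.
      western_edge = (tile == 0)
      eastern_edge = (tile == ntiles-1)
      call update_tile(Istr, Iend, western_edge, eastern_edge, a)
    end do
  end subroutine update_all_tiles

  subroutine update_tile(Istr, Iend, western_edge, eastern_edge, a)
    integer, intent(in)    :: Istr, Iend
    logical, intent(in)    :: western_edge, eastern_edge
    real,    intent(inout) :: a(0:Lm+1)
    integer :: i

    do i = Istr, Iend
      a(i) = real(i)                   ! stand-in for the interior update
    end do
    ! Boundary points are touched only by the tiles that own that edge.
    if (western_edge) a(Istr-1) = -1.0
    if (eastern_edge) a(Iend+1) = -1.0
  end subroutine update_tile

end program edge_flag_sketch
```

Comparing results across tile counts, as in the main program, is the same kind of check described above: a correct shared-memory code should give identical solutions regardless of the partition.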

After the above fix, I can get identical solutions with TS_MPDATA in shared-memory applications. However, there is still a bug in serial with partitions when both NS_PERIODIC and TS_MPDATA are activated together. In the past, I thought that both problems were related... Well, I will continue looking. This type of bug takes a lot of time to fix. Sometimes we are lucky and find them pretty fast. I think that this one is related to the 3 ghost points required for TS_MPDATA or, more probably, to an illegal call inside a parallel region. The latter will be even nastier because it implies moving the call to mpdata_adiff.F to a different parallel region. A nasty proposition. This is the kind of solution that was required in the past for serial with partitions bugs...

I also corrected a small bug in ana_smflux.h that appears when LMD_TEST is activated. Many thanks to Chris Edwards for reporting this bug.

Change History (1)

comment:1 Changed 8 years ago by arango

  • Resolution set to Fixed
  • Status changed from new to closed