IMPORTANT: Parallel Global Reduction and Volume Conservation

Archive of important messages sent via the ROMS mailing list

Moderators: arango, robertson


IMPORTANT: Parallel Global Reduction and Volume Conservation

#1 Post by arango » Thu Mar 05, 2015 1:28 am

WARNING: You cannot reply or post comments in this forum thread. If you have any comments, use the following post instead.

In ROMS, we have the option to impose volume conservation in applications with open boundaries, provided that tidal forcing is not enabled:

Code: Select all

! Set lateral open boundary edge volume conservation switch for
! nonlinear model and adjoint-based algorithms. Usually activated
! with radiation boundary conditions to enforce global mass
! conservation, except if tidal forcing is enabled. [1:Ngrids].

   VolCons(west)  ==  T                            ! western  boundary
   VolCons(east)  ==  T                            ! eastern  boundary
   VolCons(south) ==  T                            ! southern boundary
   VolCons(north) ==  F                            ! northern boundary

ad_VolCons(west)  ==  T                            ! western  boundary
ad_VolCons(east)  ==  T                            ! eastern  boundary
ad_VolCons(south) ==  T                            ! southern boundary
ad_VolCons(north) ==  F                            ! northern boundary
The routines in obc_volcons.F are used to impose volume conservation. The flux integral along the open boundary (bc_flux) and its area (bc_area) are computed in routine obc_flux_tile. Both values are used to compute the barotropic velocity correction (the component normal to the open boundary), ubar_xs = bc_flux / bc_area, so the volume integral along the open boundary is conserved. Both are scalar values since they represent the cumulative sum along the open boundary.
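
Conceptually, the computation looks like the sketch below for a western open boundary. This is a simplified illustration of the idea using generic ROMS-style variable names (zeta, h, on_u, ubar, Istr, Jstr, Jend, kinp, kout), without masking or special cases; it is not the literal obc_volcons.F code:

Code: Select all

      my_flux=0.0_r8
      my_area=0.0_r8
      DO j=Jstr,Jend                              ! western boundary
        cff=0.5_r8*(zeta(Istr-1,j,kinp)+h(Istr-1,j)+                    &
     &              zeta(Istr  ,j,kinp)+h(Istr  ,j))*on_u(Istr,j)
        my_area=my_area+cff                       ! boundary face area
        my_flux=my_flux+cff*ubar(Istr,j,kinp)     ! normal volume flux
      END DO
!
!  Once the global sums bc_area and bc_flux are available, the mean
!  correction is subtracted from the normal velocity component so the
!  net volume flux across the open boundaries vanishes.
!
      ubar_xs=bc_flux/bc_area
      DO j=Jstr,Jend
        ubar(Istr,j,kout)=ubar(Istr,j,kout)-ubar_xs
      END DO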

The computation of bc_flux and bc_area is simple, but it is subject to round-off in parallel and serial applications with tile partitions. Both values are usually large numbers, especially in large domains with deep bathymetry, which makes the round-off problem worse.

In parallel applications, we need special code to carry out the global reduction operation (summation):

Code: Select all

      IF (ANY(VolCons(:,ng))) THEN
#ifdef DISTRIBUTE
        NSUB=1                           ! distributed-memory
#else
        IF (DOMAIN(ng)%SouthWest_Corner(tile).and.                      &
     &      DOMAIN(ng)%NorthEast_Corner(tile)) THEN
          NSUB=1                         ! non-tiled application
        ELSE
          NSUB=NtileX(ng)*NtileE(ng)     ! tiled application
        END IF
#endif
        IF (tile_count.eq.0) THEN        ! first tile: reset global sums
          bc_flux=0.0_r8
          bc_area=0.0_r8
        END IF
        bc_area=bc_area+my_area          ! accumulate this tile's sums
        bc_flux=bc_flux+my_flux
        tile_count=tile_count+1
        IF (tile_count.eq.NSUB) THEN     ! last tile: global reduction
          tile_count=0
          CALL mp_reduce (ng, iNLM, 2, buffer, op_handle)
        END IF
      END IF
Well, here is where the round-off problem comes from, and why we get different solutions for different parallel partitions. Therefore, we have a replicability problem. In general, collective parallel floating-point operations can produce different results on different tile (processor) configurations. For example, configuring your application on 2x5 tiles accumulates round-off errors differently than on 4x8.

Notice that on any computer, floating-point addition is not associative: (A + B + C + D) will not necessarily give the same result as (A + D + C + B). Therefore, when we do floating-point parallel reduce operations, the order of the reduction between chunks of numbers matters. This order is not deterministic in routines like mpi_allreduce, which is called by the ROMS routine mp_reduce. As a matter of fact, there are three different ways in mp_reduce to compute this reduction operation: using mpi_allreduce, mpi_allgather, or mpi_isend/mpi_irecv. All three methods are subject to round-off problems.
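
To see why, consider a small stand-alone example (not part of ROMS). With one large term and several small terms, as in an area or flux sum over deep and shallow water, the answer depends on the order in which the numbers are combined:

Code: Select all

      PROGRAM order_matters
!
!  Minimal sketch: the same four numbers summed in two different orders
!  give two different answers in double precision.  Different tile
!  partitions reorder a parallel reduction in exactly this way.
!
      IMPLICIT NONE
      INTEGER, PARAMETER :: r8 = SELECTED_REAL_KIND(12,300)
      REAL(r8) :: A, B, C, D, sum1, sum2

      A= 1.0E+16_r8                     ! one large contribution
      B= 1.0_r8                         ! small contributions
      C=-1.0E+16_r8
      D= 1.0_r8

      sum1=((A+C)+B)+D                  ! large terms cancel first: 2.0
      sum2=((A+B)+C)+D                  ! B is lost to round-off:   1.0

      PRINT *, 'sum1 =', sum1, '  sum2 =', sum2
      END PROGRAM order_matters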

What To Do:
  • Currently, there is not much that we can do. Sasha suggested carrying out the summation by pairs to reduce the round-off error, so that only numbers of comparable magnitude from the chunks of parallel data are added together (see the sketch after this list).
  • If you are forcing with frequent data from larger-scale models, you can turn off volume conservation and see what happens. In particular, you need frequent free-surface (zeta) data. You may get into trouble when running for long (multi-year) periods.
  • Impose tidal forcing at the open boundaries, in which case volume conservation must be avoided anyway.
  • Notice that we can no longer compare NetCDF files byte by byte to check for parallel bugs when volume conservation is activated:

    Code: Select all

    % diff r1/ r2/
    Binary files r1/ and r2/ differ
    For such byte-by-byte comparisons to be meaningful, you need to activate the CPP options DEBUGGING, OUT_DOUBLE, and POSITIVE_ZERO.
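
Regarding the summation-by-pairs suggestion above, the idea can be sketched with a hypothetical helper function (this routine does not exist in ROMS; it only illustrates the technique):

Code: Select all

      RECURSIVE FUNCTION pairwise_sum (x) RESULT (s)
!
!  Minimal sketch of summation by pairs: split the chunk of values in
!  halves and sum each half recursively, so that mostly numbers of
!  comparable magnitude are added together.  This grows round-off much
!  more slowly than a straight running sum.
!
      IMPLICIT NONE
      INTEGER, PARAMETER :: r8 = SELECTED_REAL_KIND(12,300)
      REAL(r8), INTENT(in) :: x(:)
      REAL(r8) :: s
      INTEGER :: n, m

      n=SIZE(x)
      IF (n.le.4) THEN
        s=SUM(x)                        ! small chunk: plain summation
      ELSE
        m=n/2                           ! split and add the two halves
        s=pairwise_sum(x(1:m))+pairwise_sum(x(m+1:n))
      END IF
      END FUNCTION pairwise_sum

Note that this reduces the size of the round-off error; it does not, by itself, make the result independent of the tile partition.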
