WARNING: You cannot reply or post comments in this forum thread. If you have any comments, use the following post instead.

In ROMS, we have the option to impose volume conservation in applications with open boundaries when tidal forcing is not enabled:

**Code:**

! Set lateral open boundary edge volume conservation switch for
! nonlinear model and adjoint-based algorithms. Usually activated
! with radiation boundary conditions to enforce global mass
! conservation, except if tidal forcing is enabled. [1:Ngrids].

   VolCons(west) == T                          ! western boundary
   VolCons(east) == T                          ! eastern boundary
  VolCons(south) == T                          ! southern boundary
  VolCons(north) == F                          ! northern boundary

ad_VolCons(west) == T                          ! western boundary
ad_VolCons(east) == T                          ! eastern boundary
ad_VolCons(south) == T                         ! southern boundary
ad_VolCons(north) == F                         ! northern boundary

The routines in obc_volcons.F are used to impose volume conservation. The flux integral along the open boundary (bc_flux) and its area (bc_area) are computed in routine obc_flux_tile. Both values are used to compute the barotropic velocity correction (normal component to the open boundary), ubar_xs = bc_flux / bc_area, so that the volume integral along the open boundary is conserved. Both are scalar values, since each represents the accumulated sum along the open boundary.
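A minimal Python sketch of the correction (not ROMS code; the per-cell velocities and face areas below are hypothetical values invented for illustration). Subtracting the scalar ubar_xs from every boundary-normal velocity drives the net volume flux across the open boundary to zero:

```python
# Illustrative sketch of the volume-conservation correction.
# Hypothetical per-cell values along an open boundary:
ubar = [0.12, -0.05, 0.30, 0.08]       # normal velocity at each edge cell (m/s)
area = [1.0e6, 2.0e6, 1.5e6, 2.5e6]    # face area of each edge cell (m^2)

# Accumulated flux integral and boundary area (both scalars):
bc_flux = sum(u * a for u, a in zip(ubar, area))   # net volume flux (m^3/s)
bc_area = sum(area)                                # total boundary area (m^2)

# Barotropic velocity correction, normal to the open boundary:
ubar_xs = bc_flux / bc_area

# Applying the correction zeroes the net flux (up to round-off):
net_flux = sum((u - ubar_xs) * a for u, a in zip(ubar, area))
print(abs(net_flux))
```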

The computations of bc_flux and bc_area are simple, but they are subject to round-off in parallel and serial applications with tile partitions. Both values are usually large numbers, especially in big domains with deep bathymetry, which makes the round-off problem worse.

In parallel applications, we need special code to carry out the global reduction operation (summation):

**Code:**

      IF (ANY(VolCons(:,ng))) THEN
#ifdef DISTRIBUTE
        NSUB=1                           ! distributed-memory
#else
        IF (DOMAIN(ng)%SouthWest_Corner(tile).and.                      &
     &      DOMAIN(ng)%NorthEast_Corner(tile)) THEN
          NSUB=1                         ! non-tiled application
        ELSE
          NSUB=NtileX(ng)*NtileE(ng)     ! tiled application
        END IF
#endif
!$OMP CRITICAL (OBC_VOLUME)
        IF (tile_count.eq.0) THEN
          bc_flux=0.0_r8
          bc_area=0.0_r8
        END IF
        bc_area=bc_area+my_area
        bc_flux=bc_flux+my_flux
        tile_count=tile_count+1
        IF (tile_count.eq.NSUB) THEN
          tile_count=0
#ifdef DISTRIBUTE
          buffer(1)=bc_area
          buffer(2)=bc_flux
          op_handle(1)='SUM'
          op_handle(2)='SUM'
          CALL mp_reduce (ng, iNLM, 2, buffer, op_handle)
          bc_area=buffer(1)
          bc_flux=buffer(2)
#endif
          ubar_xs=bc_flux/bc_area
        END IF
!$OMP END CRITICAL (OBC_VOLUME)
      END IF

Well, here is where the round-off problem comes from, and we get different solutions for different parallel partitions. Therefore, we have a replicability problem. In general, collective parallel floating-point operations can produce different results on different tile (processor) configurations. For example, configuring your application on 2x5 tiles accumulates round-off errors differently than on 4x8 tiles.
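The effect can be reproduced in a few lines of Python (not ROMS code; the values and the `tiled_sum` helper are hypothetical). Each tile computes a partial sum and the partial sums are then combined, mimicking a two-level reduction like mp_reduce; changing the number of tiles changes where the intermediate roundings happen:

```python
# Illustrative sketch: the same values summed under two different
# "tile" partitions generally disagree in the last bits.
import random

random.seed(1)
# Hypothetical large-magnitude per-cell contributions, as in a big
# domain with deep bathymetry:
values = [random.uniform(-1.0e9, 1.0e9) for _ in range(1000)]

def tiled_sum(vals, ntiles):
    """Partial sum per tile, then a sum over tiles (two-level reduction)."""
    chunk = (len(vals) + ntiles - 1) // ntiles
    partials = [sum(vals[i:i + chunk]) for i in range(0, len(vals), chunk)]
    return sum(partials)

s_2x5 = tiled_sum(values, 10)   # e.g. a 2x5 partition
s_4x8 = tiled_sum(values, 32)   # e.g. a 4x8 partition

# Both results agree to machine precision, but bit-for-bit
# replicability between partitions is usually lost:
print(s_2x5 - s_4x8)
```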

Notice that on any computer, (A + B + C + D) may not give the same result as (A + D + C + B). Therefore, when we do floating-point parallel reduce operations, the order of the reduction between chunks of numbers matters. This order is not deterministic in routines like mpi_allreduce, which is called by the ROMS routine mp_reduce. As a matter of fact, there are three different ways in mp_reduce to compute this reduction operation: using either mpi_allreduce, mpi_allgather, or mpi_isend/mpi_irecv. All three methods are subject to round-off problems.
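The non-associativity is easy to demonstrate with a worked example in IEEE 754 double precision (a generic illustration, independent of ROMS or MPI):

```python
# Floating-point addition is not associative: the order in which a
# reduction combines numbers changes the result.
a, b, c = 1.0e16, 1.0, -1.0e16

left  = (a + b) + c   # 1.0e16 + 1.0 rounds back to 1.0e16, then cancels
right = (a + c) + b   # exact cancellation first, then adds 1.0
print(left, right)    # 0.0 1.0
```

Here 1.0 is smaller than the spacing between adjacent doubles near 1.0e16, so it is absorbed in the first ordering but survives in the second.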

What To Do: