In ROMS, we have the option to impose volume conservation in applications with open boundaries, provided tidal forcing is not enabled:
Code: Select all
! Set lateral open boundary edge volume conservation switch for
! nonlinear model and adjoint-based algorithms. Usually activated
! with radiation boundary conditions to enforce global mass
! conservation, except if tidal forcing is enabled. [1:Ngrids].

   VolCons(west) == T                          ! western boundary
   VolCons(east) == T                          ! eastern boundary
   VolCons(south) == T                         ! southern boundary
   VolCons(north) == F                         ! northern boundary

ad_VolCons(west) == T                          ! western boundary
ad_VolCons(east) == T                          ! eastern boundary
ad_VolCons(south) == T                         ! southern boundary
ad_VolCons(north) == F                         ! northern boundary
The computations of bc_flux and bc_area are simple, but they are subject to round-off in parallel and serial applications with tile partitions. Both values are usually large numbers, especially in big domains with deep bathymetry, which makes the round-off problem worse.
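To see why large magnitudes make round-off worse, consider a minimal Python sketch (the magnitudes below are hypothetical, chosen only for illustration, not taken from an actual ROMS grid): a small per-tile contribution can be lost entirely when it is accumulated into a very large running total.

```python
# Hypothetical magnitudes: a huge accumulated total (like bc_area in a
# big, deep domain) and a small contribution from a single tile.
big_total = 1.0e16     # large running sum
small_term = 1.0       # small per-tile contribution

# Adding the small term and then removing the large one should recover
# the small term, but in IEEE double precision it is lost entirely:
lost = (big_total + small_term) - big_total
print(lost)   # 0.0: the contribution was absorbed by round-off
```

The spacing between representable doubles near 1.0e16 is 2.0, so any contribution smaller than that simply disappears from the sum.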
In parallel applications, we need special code to carry out the global reduction operation (summation):
Code: Select all
      IF (ANY(VolCons(:,ng))) THEN
#ifdef DISTRIBUTE
        NSUB=1                           ! distributed-memory
#else
        IF (DOMAIN(ng)%SouthWest_Corner(tile).and.                      &
     &      DOMAIN(ng)%NorthEast_Corner(tile)) THEN
          NSUB=1                         ! non-tiled application
        ELSE
          NSUB=NtileX(ng)*NtileE(ng)     ! tiled application
        END IF
#endif
!$OMP CRITICAL (OBC_VOLUME)
        IF (tile_count.eq.0) THEN
          bc_flux=0.0_r8
          bc_area=0.0_r8
        END IF
        bc_area=bc_area+my_area
        bc_flux=bc_flux+my_flux
        tile_count=tile_count+1
        IF (tile_count.eq.NSUB) THEN
          tile_count=0
#ifdef DISTRIBUTE
          buffer(1)=bc_area
          buffer(2)=bc_flux
          op_handle(1)='SUM'
          op_handle(2)='SUM'
          CALL mp_reduce (ng, iNLM, 2, buffer, op_handle)
          bc_area=buffer(1)
          bc_flux=buffer(2)
#endif
          ubar_xs=bc_flux/bc_area
        END IF
!$OMP END CRITICAL (OBC_VOLUME)
      END IF
Notice that on any computer, (A + B + C + D) will not necessarily give the same result as (A + D + C + B). Therefore, when we do floating-point parallel reduction operations, the order in which the chunks of numbers are combined matters. This order is not deterministic in routines like mpi_allreduce, which is called by the ROMS routine mp_reduce. As a matter of fact, there are three different ways in mp_reduce to compute this reduction operation: using mpi_allreduce, mpi_allgather, or mpi_isend/mpi_irecv. All three methods are subject to round-off problems.
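The non-associativity is easy to reproduce. A short Python sketch (the four values are chosen only for illustration) sums the same numbers in two different orders:

```python
a, b, c, d = 1.0e16, 3.0, 4.0, -1.0e16

# Same four numbers, two different summation orders:
left = ((a + b) + c) + d    # small terms hit the big value first
right = ((a + d) + c) + b   # big values cancel first, small terms survive

print(left, right)          # 8.0 7.0 in IEEE double precision
```

The first order absorbs and rounds the small terms against 1.0e16 before the cancellation, while the second cancels the big values first, so the two sums disagree, just as two MPI ranks combining partial sums in different orders would.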
What To Do:
- Currently, there is not much that we can do. Sasha suggested carrying out the summation by pairs to reduce the round-off error, so that only numbers of comparable magnitude from the chunks of parallel data are added together.
- If you are forcing the boundaries with frequent data from a larger-scale model, you can turn off volume conservation to see what happens. In particular, you need frequent free-surface (zeta) data. You may get into trouble when running for long (multi-year) periods.
- Impose tidal forcing at the open boundary, since volume conservation must be avoided when tidal forcing is enabled.
- Notice that we can no longer compare NetCDF files byte by byte to check for parallel bugs when volume conservation is activated:
To compute binary differences, you need to activate the CPP options DEBUGGING, OUT_DOUBLE, and POSITIVE_ZERO.
Code: Select all
% diff r1/ocean_avg.nc r2/ocean_avg.nc
Binary files r1/ocean_avg.nc and r2/ocean_avg.nc differ
% diff r1/ocean_his.nc r2/ocean_his.nc
Binary files r1/ocean_his.nc and r2/ocean_his.nc differ
% diff r1/ocean_rst.nc r2/ocean_rst.nc
Binary files r1/ocean_rst.nc and r2/ocean_rst.nc differ
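The summation-by-pairs idea mentioned above can be sketched as a generic pairwise (cascade) summation. This is only an illustration in Python, not actual ROMS code, and the function name is my own:

```python
def pairwise_sum(values):
    """Sum a list by recursively halving it, so additions mostly
    combine partial sums of comparable magnitude.  This reduces the
    worst-case accumulated round-off relative to a naive
    left-to-right loop, whose errors grow with the number of terms."""
    n = len(values)
    if n == 1:
        return values[0]
    mid = n // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])

# Naive left-to-right accumulation vs pairwise summation:
data = [0.1] * 1024
naive = 0.0
for v in data:
    naive += v          # accumulates a rounding error at each step
print(naive, pairwise_sum(data))
```

Note that pairwise summation only reduces the round-off error; it does not make the parallel reduction bitwise reproducible, since the tile-to-rank assignment still fixes which numbers land in which chunk.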