Opened 13 years ago
Last modified 13 years ago
#552 closed upgrade
IMPORTANT: OpenMP shared-memory directives Revisited — at Initial Version
Reported by: | arango | Owned by: | arango
---|---|---|---
Priority: | major | Milestone: | Release ROMS/TOMS 3.6
Component: | Nonlinear | Version: | 3.6
Keywords: | | Cc: |
Description
This update includes a full revision of the ROMS shared-memory pragma directives using the OpenMP standard. This is a very important and delicate update that requires expertise. Luckily, I doubt that it will affect your customized code.
All the parallel loops of ROMS are modified to simpler directives. For example, the old strategy:
!$OMP PARALLEL DO PRIVATE(thread,subs,tile) SHARED(numthreads)
      DO thread=0,numthreads-1
        subs=NtileX(ng)*NtileE(ng)/numthreads
        DO tile=subs*thread,subs*(thread+1)-1,+1
          ...
        END DO
      END DO
!$OMP END PARALLEL DO
is replaced with:
      DO tile=first_tile(ng),last_tile(ng),+1
        ...
      END DO
!$OMP BARRIER
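To make the loop bounds of the old strategy concrete, here is a minimal Python sketch of the arithmetic only. It is not ROMS code: the function name `old_tile_ranges` and the example tile counts are hypothetical, and `//` stands in for Fortran integer division.

```python
def old_tile_ranges(ntile_x, ntile_e, numthreads):
    """Inclusive (first, last) tile range per thread in the old loop."""
    # subs = NtileX(ng)*NtileE(ng)/numthreads (integer division)
    subs = (ntile_x * ntile_e) // numthreads
    # Each thread loops over tile = subs*thread .. subs*(thread+1)-1
    return [(subs * thread, subs * (thread + 1) - 1)
            for thread in range(numthreads)]


# Hypothetical example: a 2x4 tile partition shared by 4 threads.
for thread, (first, last) in enumerate(old_tile_ranges(2, 4, 4)):
    print(f"thread {thread}: tiles {first}..{last}")
```

In the new strategy the same kind of range is computed once per thread and stored in first_tile(ng) and last_tile(ng), so each tile loop no longer needs the outer thread loop.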
In shared-memory, the parallel threads are now spawned in higher-level calling routines. For example, we now have:
!$OMP PARALLEL
      CALL main3d (RunInterval)
#endif
!$OMP END PARALLEL
This directive is less restrictive and allows MASTER, BARRIER, and other useful OpenMP pragmas inside the parallel region. If you are interested, please see the following discussion in the Forum.
This change cleans up the code and facilitates the parallelization of tricky algorithms for nesting, MPDATA, random number generation, point-sources, etc., using the shared-memory paradigm.
WARNINGS:
- The values of NtileX(ng) and NtileE(ng) are no longer equal to one in distributed-memory (MPI). They have the same values as those specified in standard input for NtileI(ng) and NtileJ(ng). Notice that in the critical regions for global reduction operations we now use the following code instead:
#ifdef DISTRIBUTE
      NSUB=1                             ! distributed-memory
#else
      IF (DOMAIN(ng)%SouthWest_Corner(tile).and.                        &
     &    DOMAIN(ng)%NorthEast_Corner(tile)) THEN
        NSUB=1                           ! non-tiled application
      ELSE
        NSUB=NtileX(ng)*NtileE(ng)       ! tiled application
      END IF
#endif
That is, we make a special exception for distributed-memory. This change is necessary in your customized versions of ana_grid.h and ana_psource.h.
- Notice that a few important ROMS variables in mod_scalars.F and mod_stepping.F use the THREADPRIVATE directive in shared-memory, so each parallel thread has a private copy of these variables and parallel collisions are avoided.
- Two new variables (first_tile(ng) and last_tile(ng)) are introduced to specify the tile range in each parallel region:
      integer, allocatable :: first_tile(:)
      integer, allocatable :: last_tile(:)
!$OMP THREADPRIVATE (first_tile, last_tile)
These variables are specified during the initialization of ROMS kernel using:
!$OMP PARALLEL
#if defined _OPENMP
      MyThread=my_threadnum()
#elif defined DISTRIBUTE
      MyThread=MyRank
#else
      MyThread=0
#endif
      DO ng=1,Ngrids
        chunk_size=(NtileX(ng)*NtileE(ng)+numthreads-1)/numthreads
        first_tile(ng)=MyThread*chunk_size
        last_tile (ng)=first_tile(ng)+chunk_size-1
      END DO
!$OMP END PARALLEL
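The tile-range arithmetic in the initialization above can be checked with a small Python sketch. It only mirrors the integer formulas for chunk_size, first_tile, and last_tile. The function names `tile_range` and `all_tile_ranges` and the example tile counts are hypothetical, and nothing here reproduces the OpenMP or MPI behavior.

```python
def tile_range(my_thread, ntile_x, ntile_e, numthreads):
    """Inclusive (first_tile, last_tile) for one thread, per the formulas above."""
    # chunk_size is the tile count divided by numthreads, rounded up.
    chunk_size = (ntile_x * ntile_e + numthreads - 1) // numthreads
    first_tile = my_thread * chunk_size
    last_tile = first_tile + chunk_size - 1
    return first_tile, last_tile


def all_tile_ranges(ntile_x, ntile_e, numthreads):
    """Ranges for every thread number 0 .. numthreads-1."""
    return [tile_range(t, ntile_x, ntile_e, numthreads)
            for t in range(numthreads)]


# Hypothetical example: 2x4 = 8 tiles on 4 threads, so 2 tiles per thread.
print(all_tile_ranges(2, 4, 4))

# Hypothetical example: 2x3 = 6 tiles on 4 threads. chunk_size rounds up to 2,
# so the last thread's range (6..7) lies past the highest tile index (5).
print(all_tile_ranges(2, 3, 4))
```

When the number of tiles is a multiple of numthreads, these ranges match the per-thread ranges of the old loop. Taken in isolation, the formula can assign tile numbers beyond the partition when it is not a multiple. The ticket does not say how ROMS constrains the tile and thread counts, so that second case is only an observation about the arithmetic.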
Many thanks to Sasha Shchepetkin for suggesting this strategy. Also many thanks to Mark Hadfield for his persistence and testing.