#552 closed upgrade (Done)
IMPORTANT: OpenMP shared-memory directives Revisited — at Version 3
Reported by: | arango | Owned by: | arango |
---|---|---|---|
Priority: | major | Milestone: | Release ROMS/TOMS 3.6 |
Component: | Nonlinear | Version: | 3.6 |
Keywords: | Cc: |
Description
This update includes a full revision of the ROMS shared-memory pragma directives using the OpenMP standard. This is a very important and delicate update that requires expertise. Luckily, I doubt that it will affect your customized code.
All the parallel loops of ROMS are modified to simpler directives. For example, the old strategy:
!$OMP PARALLEL DO PRIVATE(thread,subs,tile) SHARED(numthreads)
      DO thread=0,numthreads-1
        subs=NtileX(ng)*NtileE(ng)/numthreads
        DO tile=subs*thread,subs*(thread+1)-1,+1
          ...
        END DO
      END DO
!$OMP END PARALLEL DO
is replaced with:
      DO tile=first_tile(ng),last_tile(ng),+1
        ...
      END DO
!$OMP BARRIER
In shared-memory, the parallel threads are spawned in the higher-level calling routines (drivers). For example, we now have:
!$OMP PARALLEL
      CALL main3d (RunInterval)
!$OMP END PARALLEL
This directive is less restrictive and allows MASTER, BARRIER, and other useful OpenMP pragmas inside the parallel regions. If you are interested, please see the following discussion in the Forum.
This change cleans up the code and facilitates the parallelization of tricky algorithms for nesting, MPDATA, random number generation, point-sources, etc., using the shared-memory paradigm.
WARNINGS:
- The values of NtileX(ng) and NtileE(ng) are no longer equal to one in distributed-memory (MPI). They have the same values as the ones specified in standard input: NtileI(ng) and NtileJ(ng). Notice that in the critical regions for global reduction operations, we now use the following code instead:
#ifdef DISTRIBUTE
      NSUB=1                             ! distributed-memory
#else
      IF (DOMAIN(ng)%SouthWest_Corner(tile).and.                        &
     &    DOMAIN(ng)%NorthEast_Corner(tile)) THEN
        NSUB=1                           ! non-tiled application
      ELSE
        NSUB=NtileX(ng)*NtileE(ng)       ! tiled application
      END IF
#endif
That is, we make a special exception for distributed-memory cases. This change is necessary in your customized versions of ana_grid.h and ana_psource.h.
- Notice that we no longer use TILE (uppercase) as the argument to the kernel routines. We use tile (lowercase) instead. The uppercase form mattered in previous versions of the distributed-memory code, where TILE was replaced with MyRank during C-preprocessing. Be careful with this one...
- Notice that a few important variables of ROMS in mod_scalars.F and mod_stepping.F use the THREADPRIVATE directive in shared-memory, so each parallel thread has a private copy of such variables to avoid parallel collisions.
- Two new variables (first_tile(ng) and last_tile(ng)) are introduced to specify the tile range in each parallel region:
      integer, allocatable :: first_tile(:)
      integer, allocatable :: last_tile(:)
!$OMP THREADPRIVATE (first_tile, last_tile)
These variables are specified during the initialization of ROMS kernel using:
!$OMP PARALLEL
#if defined _OPENMP
      MyThread=my_threadnum()
#elif defined DISTRIBUTE
      MyThread=MyRank
#else
      MyThread=0
#endif
      DO ng=1,Ngrids
        chunk_size=(NtileX(ng)*NtileE(ng)+numthreads-1)/numthreads
        first_tile(ng)=MyThread*chunk_size
        last_tile (ng)=first_tile(ng)+chunk_size-1
      END DO
!$OMP END PARALLEL
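The tile-range arithmetic can be traced by hand with a short sketch. The following is only an illustration in Python, not ROMS code: the function names old_tile_ranges and new_tile_ranges are hypothetical, and the sketch assumes nothing beyond the two formulas quoted in this ticket (the old in-loop subs formula and the new chunk_size formula), using integer division as Fortran does for positive integers.

```python
def old_tile_ranges(ntile_x, ntile_e, numthreads):
    """(first, last) tile per thread from the old in-loop formula.

    Hypothetical helper: mirrors subs=NtileX*NtileE/numthreads and
    DO tile=subs*thread,subs*(thread+1)-1.
    """
    subs = (ntile_x * ntile_e) // numthreads  # Fortran integer division
    return [(subs * thread, subs * (thread + 1) - 1)
            for thread in range(numthreads)]


def new_tile_ranges(ntile_x, ntile_e, numthreads):
    """(first_tile, last_tile) per thread from the new chunk_size formula.

    Hypothetical helper: mirrors
    chunk_size=(NtileX*NtileE+numthreads-1)/numthreads,
    first_tile=MyThread*chunk_size, last_tile=first_tile+chunk_size-1.
    """
    chunk_size = (ntile_x * ntile_e + numthreads - 1) // numthreads
    return [(thread * chunk_size, thread * chunk_size + chunk_size - 1)
            for thread in range(numthreads)]


if __name__ == "__main__":
    # Example values chosen for illustration: 2x4 tiles on 4 threads.
    for thread, (first, last) in enumerate(new_tile_ranges(2, 4, 4)):
        print(f"thread {thread}: tiles {first}..{last}")
```

With 2x4 tiles and four threads, both formulas assign tiles 0-1, 2-3, 4-5, and 6-7 to threads 0 through 3. The two formulas agree whenever the total number of tiles is a multiple of the number of threads; the sketch makes no claim about how ROMS treats other combinations.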
Many thanks to Sasha Shchepetkin for suggesting this strategy. Also many thanks to Mark Hadfield for his persistence and testing.
Change History (3)
comment:1 by , 13 years ago
Resolution: | → Done |
---|---|
Status: | new → closed |
comment:2 by , 13 years ago
Description: | modified (diff) |
---|
comment:3 by , 13 years ago
Description: | modified (diff) |
---|