Incorrect nesting of OpenMP directives: FLOAT_VWALK

Bug reports, work arounds and fixes

Moderators: arango, robertson

Post Reply
Message
Author
User avatar
m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Incorrect nesting of OpenMP directives: FLOAT_VWALK

#1 Post by m.hadfield » Tue Apr 17, 2012 5:27 am

Did you know that a search for "1587-114 Incorrect nesting of OpenMP directives" on Google returns only one match, namely the following thread on the ROMS Forum

viewtopic.php?t=1714

Shortly there should be a second one!

I have run into the same error message, this time in connection with the code activated by FLOAT_VWALK. It occurs on an IBM PowerPC 575 running AIX and the xlf compiler. The message is printed on the first time step and the model then stops. On a Linux machine with gfortran with the same code, the message is not printed and the model run goes to completion.

I think the error is related to the fact that the step_floats is called from main2d inside an $OMP PARALLEL block:

Code: Select all

# ifdef FLOATS
!
!-----------------------------------------------------------------------
!  Compute Lagrangian drifters trajectories.
!-----------------------------------------------------------------------
!
        DO ng=1,Ngrids
          IF (Lfloats(Ng)) THEN
!$OMP PARALLEL DO PRIVATE(thread,chunk_size,Lstr,Lend)                  &
!$OMP&            SHARED(numthreads,Nfloats)
            DO thread=0,numthreads-1
              chunk_size=(Nfloats(ng)+numthreads-1)/numthreads
              Lstr=1+thread*chunk_size
              Lend=MIN(Nfloats(ng),Lstr+chunk_size-1)
              CALL step_floats (ng, Lstr, Lend)
            END DO
!$OMP END PARALLEL DO
!
!  Shift floats time indices.
!
            nfp1(ng)=MOD(nfp1(ng)+1,NFT+1)
            nf  (ng)=MOD(nf  (ng)+1,NFT+1)
            nfm1(ng)=MOD(nfm1(ng)+1,NFT+1)
            nfm2(ng)=MOD(nfm2(ng)+1,NFT+1)
            nfm3(ng)=MOD(nfm3(ng)+1,NFT+1)
          END IF
        END DO
# endif
and step_floats (which has no $OMP directives) then calls vwalk_floats, in which the random number generator is called inside an $OMP SINGLE block:

Code: Select all

!$OMP SINGLE
        CALL gasdev (rwalk)
!$OMP END SINGLE
I tried (optimistically) commenting out the $OMP SINGLE and END SINGLE directives, but then I get

Code: Select all

"gasdev.f90", line 111: 1525-108 Error encountered while attempting to allocate a data object.  The program will stop.
[Edit: is this related to the fact that the gasdev subroutines have local variables with the save attribute?]

I'd like to sort this out as I have a large floats run that should (I think) run faster under OpenMP than under MPI, but presently doesn't run at all.

User avatar
m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Incorrect nesting of OpenMP directives: FLOAT_VWALK

#2 Post by m.hadfield » Wed Apr 18, 2012 4:41 am

Incidentally, I also get the same error message when I run the RIVERPLUME1 test case in OpenMP mode on the IBM. (I checked this out because the problem reported in the previous thread was related to ana_psource.) As before, the simulation runs to completion with Linux/gfortran/OpenMP.

User avatar
arango
Site Admin
Posts: 1114
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University
Contact:

Re: Incorrect nesting of OpenMP directives: FLOAT_VWALK

#3 Post by arango » Wed Apr 18, 2012 5:46 am

I am very busy now and my brain is overloaded to think about this problem in detail. However, I put it on my to-do list of things to check. I remember testing this carefully with the floats and point-sources. The point-sources via ANA_PSOURCE is extremely difficult in serial with partitions and shared-memory (OpenMP). I have mentioned this so many times in this forum and I still see so many users not following the advice to use a NetCDF file for point-sources.

Of course, the RIVERPLUME1 is one of ROMS test cases so you are not at fault here. What you are reporting here made me suspicious about the implementation of the OpenMP standard in some compilers. I recall testing this with the PGI and IFORT compilers some time ago. I don't recall testing with other compilers. It seems to work with gfortran, as you report. I don't have access to IBM computers.

Now, the random number sequences are very difficult in parallel. We need to get the same solution regardless of the partition. All the solutions must be identical when we compare serial, serial with partitions, shared-memory (OpenMP), and distributed-memory (MPI) solutions.

When I revisited the vertical random walk for floats, the following strategy gave identical solutions:

Code: Select all

!
!  If predictor step, generate random number sequence.
!
      IF (Predictor) THEN
# ifdef DISTRIBUTE
        IF (Master) THEN
          CALL gasdev (rwalk)
        END IF
        CALL mp_bcastf (ng, iNLM, rwalk)
# elif defined _OPENMP
!$OMP SINGLE
        CALL gasdev (rwalk)
!$OMP END SINGLE
# else
        IF (Lstr.eq.1) THEN
          CALL gasdev (rwalk)
        END IF
# endif
      END IF
I tried many things. Apparently, now this is not working with the IBM compiler that you are using and the version of OpenMP in it. As I understand it, the SINGLE directive construct is only executed by a single thread. The first thread to encounter the SINGLE construct will execute it. This is what we want in shared-memory. This is all executed inside a parallel region (main2d/main3d).

I think that the IBM is treating the SINGLE clause differently. Maybe it is doing a private execution on the clause and not making the random numbers data available to the other threads. I wonder if needs a OpenMP copy directive for this. I have never use them. The others threads are getting very wrong values, if any. I think that this explains why your simulation blows-up or terminate execution. You can put a print statement to see what is the values of rwalk on all threads. Just use a 2x1 partition for simplicity.

I think that we need to look at the IBM documentation for the implementation of OpenMP. Maybe there is a special environmental variable that needs to be activated.

This is what I :arrow: found:
Rules:

It is illegal to branch into or out of a block that is enclosed within the SINGLE construct.

The SINGLE construct must be encountered by all threads in a team or by none of the threads in a team. The first thread to encounter the SINGLE construct will execute it. All work-sharing constructs and BARRIER directives that are encountered must be encountered in the same order by all threads in the team.

If you specify NOWAIT on the END SINGLE directive, the threads that are not executing the SINGLE construct will proceed to the instructions following the SINGLE construct. If you do not specify the NOWAIT clause, each thread will wait at the END SINGLE directive until the thread executing the construct reaches the END SINGLE directive. You may not specify NOWAIT and COPYPRIVATE as part of the same END SINGLE directive.

There is no implied BARRIER at the start of the SINGLE construct. If you do not specify the NOWAIT clause, the BARRIER directive is implied at the END SINGLE directive.

You cannot nest SECTIONS, DO and SINGLE directives inside one another if they bind to the same PARALLEL directive.

SINGLE directives are not permitted within the dynamic extent of CRITICAL, MASTER, ORDERED, or TASK directives. BARRIER and MASTER directives are not permitted within the dynamic extent of SINGLE directives.

If you have specified a variable as PRIVATE, FIRSTPRIVATE, LASTPRIVATE or REDUCTION in the PARALLEL construct which encloses your SINGLE construct, you cannot specify the same variable in the PRIVATE or FIRSTPRIVATE clause of the SINGLE construct.

The SINGLE
directive binds to the closest dynamically enclosing PARALLEL directive, if one exists.
Perhaps, you can help us here 8) since I don't have access to an IBM computer now and I will not be able to reproduce your problem.

User avatar
m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Incorrect nesting of OpenMP directives: FLOAT_VWALK

#4 Post by m.hadfield » Wed Apr 18, 2012 8:17 pm

Thanks, Hernan. I'll look into it.

User avatar
m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Incorrect nesting of OpenMP directives: FLOAT_VWALK

#5 Post by m.hadfield » Fri Apr 20, 2012 4:36 am

So far, my conclusions for the floats vertical-walk code are the same as the one drews reported for the river source:
  • There are no obvious violations of the OpenMP rules
  • The simulation runs and appears to give correct results if you remove the PARALLEL directives in main3d
Perhaps it's an AIX compiler bug? Perhaps having the SINGLE directives in a different module from the PARALLEL directive is confusing the compiler's analysis?

User avatar
arango
Site Admin
Posts: 1114
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University
Contact:

Re: Incorrect nesting of OpenMP directives: FLOAT_VWALK

#6 Post by arango » Fri Apr 20, 2012 4:56 am

Yes, it is possible that there is a bug in the AIX compiler. The SINGLE directive has to be inside of a parallel region. That's what the rules above specify. If you remove the parallel directives in main3d, all the floats are computed by the master thread. Then, I assume that you get better efficiency with MPI.

Usually the IBM computers have different version of the AIX compiler. Perhaps, you can try an older version.

We would have to come-up with a more robust strategy that the SINGLE directive. I tried different strategies before without success. I was not able to get the CRITICAL region directive to work because this is not a global reduction operation. I know that the only strategy that works is for the random sequences to be computed serially by a single threat. I did try to have the random number sequences computed in main3d by the master before calling the floats stepping routine. I don't recall what were the issues with this strategy.

User avatar
m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Incorrect nesting of OpenMP directives: FLOAT_VWALK

#7 Post by m.hadfield » Thu Apr 26, 2012 3:40 am

I have done a little more investigation and now think that
  • The use of the OpenMP SINGLE directive in ROMS is incorrect.
  • The xlf compiler on AIX is correct in recognising this.
  • The PGI and Gfortran compilers are incorrect in failing to do so.
Consider the following OpenMP test program:

Code: Select all

module openmp
  
  contains
  
  subroutine main
  
    implicit none
  
    integer, parameter :: n=10
    real, dimension(n) :: a,b,c
    integer :: i
  
    do i=1,n
      a(i) = i*1.0
      b(i) = a(i)
    end do
        
    !$OMP PARALLEL DO SHARED(a,b,c) PRIVATE(i)
    do i=1,n
      c(i) = a(i)+b(i)
      call hello(i)
    end do
    !$OMP END PARALLEL DO
  
  end subroutine main
  
  subroutine hello (i)
  
    implicit none
  
    integer, intent(in) :: i
  
    write (unit=*,fmt='("Hello from iteration",I3)') i
  
  end subroutine hello
  
end module openmp

program run
  use openmp
  call main
end program run
When run with OMP_NUM_THREADS set to a reasonable value (I've tried 2 and 4) this produces the following output (though not always in the same order):

Code: Select all

Hello from iteration  1
Hello from iteration  2
Hello from iteration  3
Hello from iteration  4
Hello from iteration  5
Hello from iteration  6
Hello from iteration  7
Hello from iteration  8
Hello from iteration  9
Hello from iteration 10
Now, inside the hello subroutine, I enclose the write statement inside a SINGLE block. (This is intended to reproduce the essentials of the ROMS float vertical-walk code.) On AIX with xlf I get the familiar run-time error

Code: Select all

1587-114 Incorrect nesting of OpenMP directives.
1587-114 Incorrect nesting of OpenMP directives.
On Linux with the gfortran or pgf90 I get (with OMP_NUM_THREADS=2)

Code: Select all

Hello from iteration  6
Hello from iteration  7
Hello from iteration  8
Hello from iteration  9
Hello from iteration 10
Now, instead of having the OMP SINGLE block inside the hello subroutine, I put it in the main program, enclosing the call to the hello subroutine. (This should make no difference to he order of execution.) There is now a compile-time error in all cases. With xlf it is

Code: Select all

"openmp.f90", line 21.11: 1516-151 (S) The SINGLE directive is not permitted in the dynamic extent of MASTER, CRITICAL, SINGLE, work-sharing DO, PARALLEL DO, WORKSHARE, PARALLEL WORKSHARE, work-sharing SECTIONS and PARALLEL SECTIONS directives without an associated PARALLEL directive.
With pgf90 it is

Code: Select all

PGF90-S-0155-Illegal context for SINGLE (openmp.f90: 21)
With Gfortran it is

Code: Select all

openmp.f90:21:0: warning: work-sharing region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region
So what is wrong with this program? Recall that the PARALLEL DO directive is just a compound of PARALLEL and DO directives. When we add the SINGLE directive to the test program what we have is

Code: Select all

!$OMP PARALLEL
!$OMP DO
!$OMP SINGLE
!$OMP END SINGLE
!$OMP END DO
!$OMP END PARALLEL
and this violates the rule that
You cannot nest SECTIONS, DO and SINGLE directives inside one another if they bind to the same PARALLEL directive.
The following on the other hand would be valid

Code: Select all

!$OMP PARALLEL
!$OMP SINGLE
!$OMP END SINGLE
!$OMP DO
!$OMP END DO
!$OMP END PARALLEL
All compilers recognise that the former is invalid when the OpenMP directives are in the same subroutine. Only xlf recognises it (at run time) when they are in different subroutines.

Note also that, although pgf90 and gfortran may execute the code inside the SINGLE block, and do so on a single thread, the code is generally executed a number of times. That number depends on OMP_NUM_THREADS: this dependence is clearly incorrect.

User avatar
arango
Site Admin
Posts: 1114
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University
Contact:

Re: Incorrect nesting of OpenMP directives: FLOAT_VWALK

#8 Post by arango » Thu Apr 26, 2012 8:46 pm

Okey, let's try the following code instead in vwalk_floats.F:

Code: Select all

      IF (Predictor) THEN
# ifdef DISTRIBUTE
        IF (Master) THEN
          CALL gasdev (rwalk)
        END IF
        CALL mp_bcastf (ng, iNLM, rwalk)
# elif defined _OPENMP
!$OMP CRITICAL (VWALK_RANDOM)
        IF (tile_count.eq.0) THEN
          CALL gasdev (rwalk)
        END IF
        tile_count=tile_count+1
        IF (tile_count.eq.NtileX(ng)*NtileE(ng)) THEN
          tile_count=0
        END IF
!$OMP END CRITICAL (VWALK_RANDOM)
# else
        IF (Lstr.eq.1) THEN
          CALL gasdev (rwalk)
        END IF
# endif
      END IF
This gives me identical solutions for serial with partitions, shared-memory, and distributed-memory using ifort, gfortran, and pgi compilers.

I still don't know what it is the problem with ana_psourse.h. This routine doesn't have illegal constructs as far as I know. Parallelism here is extremely difficult. I have warn everybody to not use this routine for parallel applications. It is much easier to put the point-source data in a NetCDF file. I still see many user not following this recommendation and still posting in this forum about problems about this. Of course, we ignore such postings.

Please, let me know if this works for you so I can update the repository.

User avatar
m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Incorrect nesting of OpenMP directives: FLOAT_VWALK

#9 Post by m.hadfield » Thu Apr 26, 2012 9:37 pm

Yes, that works: it runs to completion with AIX/xlf+OpenMP, and gives identical results to a serial run. Thank you!

User avatar
m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Incorrect nesting of OpenMP directives: FLOAT_VWALK

#10 Post by m.hadfield » Fri Apr 27, 2012 7:33 am

Hi Hernan

The problem with ana_psource isn't a big issue for me and I know you are busy on other things, but I just can't let these things go unsolved!

I think the issue is with the BARRIER directives inside ana_psource.h. According to IBM, the rules are
Rules

A BARRIER directive binds to the closest dynamically enclosing PARALLEL directive, if one exists.

A BARRIER directive cannot appear within the dynamic extent of the CRITICAL, DO (work-sharing), MASTER, ORDERED, PARALLEL DO , PARALLEL SECTIONS, PARALLEL WORKSHARE, SECTIONS, SINGLE, TASK, and WORKSHARE directives.

All threads in the team must encounter the BARRIER directive if any thread encounters it.

All BARRIER directives and work-sharing constructs must be encountered in the same order by all threads in the team.

In addition to synchronizing the threads in a team, the BARRIER directive implies the FLUSH directive without the variable_name_list.
In ROMS, ana_psource contains BARRIER directives and is called from get_idata inside a PARALLEL DO block:

Code: Select all

!$OMP PARALLEL DO PRIVATE(thread,subs,tile) SHARED(numthreads)
        DO thread=0,numthreads-1
          subs=NtileX(ng)*NtileE(ng)/numthreads
          DO tile=subs*thread,subs*(thread+1)-1,+1
            CALL ana_psource (ng, TILE, iNLM)
          END DO
        END DO
!$OMP END PARALLEL DO
This violates the rule in the 2nd paragraph above. Again, I think xlf is correct in rejecting it and ifort/gfortran/pgf90 are incorrect in allowing it.

User avatar
shchepet
Posts: 185
Joined: Fri Nov 14, 2003 4:57 pm

Re: Incorrect nesting of OpenMP directives: FLOAT_VWALK

#11 Post by shchepet » Sat Apr 28, 2012 3:57 am

Watching this conversation between Mark and Hernan, I realized today that it has been approximately
10 years (no kidding!) since I eliminated all C$OMP PARALLEL DO directives from my code. Back then
I was motivated by making the code more explicit (and, frankly by beauty as well), but it appears that
it helped to avoid problems later (although initially it brought some problems).

May be you should do the same in Rutgers ROMS as well.

Obviously, the old semantics with the double-nested loops,

Code: Select all

!$OMP PARALLEL DO PRIVATE(thread,chunk_size,Lstr,Lend)                  &
!$OMP&            SHARED(numthreads,Nfloats)
         do thread=... 
           do tile=...
 
is the product of evolution from SGI C$DOACROSS directive -- back then it was the only way to
create parallel threads. Open MP is actually more in line with older Cray directives (also known as
"workshop" directives) which has two levels: C$DIR PARALLEL (outer) and C$DIR DO (inner).
Interestingly enough, that BARRIER was acceptable inside SGI C$DOACROSS (which is most similar to
C$OMP PARALLEL DO), but is no longer acceptable in Open MP standard. Same applies to SINGLE ...
END SINGLE.


As of today, here fundamentally only one C$OMP PARALLEL ... C$OMP END PARALLEL region in my
code (there is another one inside mpi_setup routine executed by MPI_rank=0 -- this needed only to
find out how many parallel threads are to use on rank=0 in order to broadcast to to all other MPI nodes).


The elementary structure is

Code: Select all

....
      do tile=my_fist_tile,my_last_tile,+1
        call set_HUV (tile)
      enddo
C$OMP BARRIER
      do tile=my_last_tile,my_first_tile,-1
        call omega (tile)
        call rho_eos (tile)
      enddo
C$OMP BARRIER
.....
.....
encapsulated into an C$OMP PARALLEL ... C$OMP END PARALLEL region on a subroutine-level above.
The encapsulation completely eliminates the need for SHARED and PRIVATE lists: the code relies on
default rules -- what is in common blocks and Fortran 90 modules is shared by default (unless explicitly
specified as threadprivate); all internal variables inside subroutines are private.


my_tile_range = [my_last_tile,my_first_tile] is determined from the total number of tiles
and the thread number of particular thread. It the actual code the zig-zag order of tile
processing is inserted by "mpi.F" instead of manually, so you literally see loops tile

Code: Select all

     do tile=my_tile_range
so "mpc" (which post-processes code after CPP) reverses the successive loops. This is done to
essentially make zig-zag "always straight", regardless whether some pieces of the code, including
some of the tile loops may be taken out by CPP.


The entire "main.F" looks like this:

Code: Select all

#include "cppdefs.h"

!!    program main               ! Open MP version of ROMS driver
      implicit none              ! with single parallel region using
      integer ierr               ! explicit barrier synchronization.
#include "param.h"
#ifdef MPI
      real*8 tstart, tend
C$    integer level,req_lev
#  include "mpif.h"
c**   call system('uname -nmr')
      ierr=1
C$    req_lev=MPI_THREAD_MULTIPLE
C$    call MPI_Init_thread (req_lev, level, ierr)
C$    mpi_master_only write(*,*) 'MPI thread support levels =',
C$   &                                          req_lev,level
C$    ierr=0
      if (ierr.eq.1) call MPI_Init (ierr)

      call mpi_setup (ierr)
      tstart=MPI_Wtime()
      if (ierr.eq.0) then
#endif
        call init_scalars (ierr)        ! Initialize global scalars,
        if (ierr.eq.0) then             ! model tunable paparameters,
C$        call omp_set_dynamic(.false.)
C$OMP PARALLEL                          ! fast-time averaging weights
          call roms_thread              ! for barotropic mode, and
C$OMP END PARALLEL                      ! launch the model in OpenMP
        endif                           ! parallel regime.
#ifdef MPI
      endif
      call MPI_Barrier(ocean_grid_comm, ierr)
      tend=MPI_Wtime()
      mpi_master_only write(*,*) 'MPI_run_time =', tend-tstart
      call MPI_Finalize (ierr)
#endif
      stop
      end


      subroutine roms_thread
      implicit none
#include "param.h"
#include "scalars.h"
                                         ! Note: Because there is
      call start_timers ()               ! a possibility of I/O error
      call roms_init                     ! occurring on some MPI nodes,
      if (may_day_flag.ne.0) goto 99     ! but not simultaneously on
      do iic=ntstart,ntstart+ntimes      ! all, exiting is deferred
        call roms_step                   ! until "may_day_flag" is
        if (may_day_flag.ne.0 .and.      ! summarized among the nodes
     &       diag_sync_flag) goto 99     ! and broadcasted by "diag"
      enddo                              ! (c.f. the control logic
  99  call stop_timers()                 ! here and in "diag").

C$OMP BARRIER
C$OMP MASTER
        call closecdf
C$OMP END MASTER
      return
      end

      subroutine roms_init
      implicit none
      integer trd, tile, my_first, my_last, range
C$    integer omp_get_thread_num, omp_get_num_threads
#include "param.h"
#include "scalars.h"

# include "ncvars.h"
#ifdef FLOATS
! grid.h is needed so that lonr and latr are readily available
# include "grid.h"
# include "floats.h"
# include "ncvars_floats.h"
#endif

      numthreads=1
C$    numthreads=omp_get_num_threads()
      trd=0
C$    trd=omp_get_thread_num()
      proc(2)=trd

      if (mod(NSUB_X*NSUB_E,numthreads).ne.0) then
C$OMP MASTER
        mpi_master_only write(*,'(/3(1x,A,I3),A/)')
     &    '### ERROR: wrong choice of numthreads =', numthreads,
     &         'while NSUB_X =', NSUB_X, 'NSUB_E =', NSUB_E,'.'
        may_day_flag=8
C$OMP END MASTER
C$OMP BARRIER
        goto 99 !-->  EXIT
      endif

      iic=0                     ! WARNING: This code is written
      kstp=1                    ! under assumption that the scalars
      knew=1                    ! on the left -- numthreads, iic,
#ifdef SOLVE3D
      iif=1                     ! kstp, knew, iif, nstp, nnew --
      nstp=1                    ! belong to a  THREADPRIVATE common
      nrhs=1                    ! block, so there no false sharing
      nnew=1                    ! here.
      nnew=1
#endif
      synchro_flag=.true.
      diag_sync_flag=.false.
      priv_count=0

      range=(NSUB_X*NSUB_E+numthreads-1)/numthreads
      my_first=trd*range
      my_last=min(my_first + range-1, NSUB_X*NSUB_E-1)
#define my_tile_range my_first,my_last

      do tile=my_tile_range         ! Initialize (FIRST-TOUCH) model
        call init_arrays (tile)     ! global arrays (most of them are
      enddo                         ! just set to to zero).
C$OMP BARRIER

c--#define CR
CR      write(*,*) '-11' MYID


#ifdef ANA_GRID
      do tile=my_tile_range          ! Set horizontal curvilinear
        call ana_grid (tile)         ! grid and model bathymetry
      enddo                          ! (either analyticaly or read
C$OMP BARRIER                        ! from grid netCDF file).
#else
C$OMP MASTER
      call get_grid
C$OMP END MASTER
C$OMP BARRIER
      if (may_day_flag.ne.0) goto 99 !-->  EXIT
#endif
      do tile=my_tile_range          ! Compute various metric terms
        call setup_grid1 (tile)      ! and their combinations.
      enddo
C$OMP BARRIER
CR      write(*,*) '-10' MYID
      do tile=my_tile_range
        call setup_grid2 (tile)
      enddo
C$OMP BARRIER
CR      write(*,*) '-9' MYID

#ifdef SOLVE3D
C$OMP MASTER                         ! Setup vertical grid-stretching
      call set_scoord                ! functions for S-coordinate
C$OMP END MASTER                     ! system
C$OMP BARRIER
      if (may_day_flag.ne.0) goto 99
#endif
CR      write(*,*) ' -8' MYID

#if (defined UV_VIS2 && defined VIS_GRID) ||\
    (defined TS_DIF2 && defined DIF_GRID)
      do tile=my_tile_range          ! Rescale horizontal mixing
        call visc_rescale (tile)     ! coefficients according to
      enddo                          ! local grid size.
C$OMP BARRIER
CR      write(*,*) ' -7' MYID
#endif

#ifdef SOLVE3D
      do tile=my_tile_range          ! Create three-dimensional
        call set_depth (tile)        ! S-coordinate system, which
#ifdef LMD_KPP
        call swr_frac (tile)         ! may be needed by ana_init.
#endif
      enddo
C$OMP BARRIER                        ! Here it is assumed that free
      do tile=my_tile_range          ! surface zeta is at rest state,
        call grid_stiffness (tile)   ! 'zeta=0). Also find and report
      enddo                          ! extremal values of topographic
C$OMP BARRIER                        ! slope parameters "rx0", "rx1".
CR      write(*,*) ' -6' MYID
#endif

                                     ! Set initial conditions for
#ifdef ANA_INITIAL
      do tile=my_tile_range          ! model prognostic variables,
        call ana_init (tile)         ! either analytically or read
      enddo                          ! from netCDF file. Note that
C$OMP BARRIER
      if (nrrec.gt.0) then           ! because ana_init may also
#endif
#ifdef EXACT_RESTART
C$OMP MASTER                         ! setup environmental variables
        call get_init (nrrec-1,2)    ! (e.g. analytical boundary
C$OMP END MASTER                     ! forcing), call it first, even
C$OMP BARRIER                        ! in the case of restart run.
# ifdef SOLVE3D
        do tile=my_tile_range
          call set_depth (tile)      !<-- needed to initialize Hz_bak
        enddo
C$OMP BARRIER
# endif
#endif
C$OMP MASTER
        call get_init (nrrec, 1)
C$OMP END MASTER
#ifdef ANA_INITIAL
      endif    !<-- nrrec.gt.0
#endif
C$OMP BARRIER
      if (may_day_flag.ne.0) goto 99      !--> ERROR
CR      write(*,*) ' -5' MYID
                                  ! Set initial model clock: at this
      time=start_time             ! moment "start_time" (global scalar)
      tdays=time*sec2day          ! is set by get_init or analytically
                                  ! copy it into threadprivate "time"
#ifdef SOLVE3D
      do tile=my_tile_range       ! Re-compute three-dimensional S-
        call set_depth (tile)     ! coordinate system: at this moment
      enddo                       ! free surface has non-zero status
C$OMP BARRIER

CR      write(*,*)  ' -4' MYID
      do tile=my_tile_range
        call set_HUV (tile)
      enddo
C$OMP BARRIER
CR      write(*,*)  ' -3' MYID

      do tile=my_tile_range
        call omega (tile)
        call rho_eos (tile)
      enddo
C$OMP BARRIER
CR      write(*,*)  ' -2' MYID
#endif

! Set up climatological environment: Set nudging coefficient for
!==== == ============== ============ sea-surface hight and tracer
! climatology; create analytical tracer and sea-surface hight
! climatology fields (if applicable); set bottom sediment grain
! size [m] and density [kg/m^3] used for bottom boundary layer
! formulation;

#if defined SPONGE || defined TCLIMATOLOGY \
  || (defined SG_BBL96 && defined ANA_BSEDIM)\
  || (defined TCLIMATOLOGY && defined ANA_TCLIMA)\
  || defined ANA_SSH

      do tile=my_tile_range
# if defined SPONGE || defined TCLIMATOLOGY
        call set_nudgcof (tile)
# endif
# if defined TCLIMATOLOGY && defined ANA_TCLIMA && defined SOLVE3D
        call ana_tclima (tile)
# endif
# ifdef ANA_SSH
        call ana_ssh (tile)
# endif
# if defined SG_BBL96 && defined ANA_BSEDIM
        call ana_bsedim (tile)
# endif
      enddo
C$OMP BARRIER
#endif
CR      write(*,*) ' -1' MYID

! Read initial input data for forcing fields; tracer and sea surface
! climatology; bottom sediment grain size and density (if applicable)
! from input netCDF files. Recall that CPP-logic here is mutually
! exclussive with respect to cals ana_tclima, ana_ssh, and ana_bsedim
! just above.

C$OMP MASTER
#ifdef ANA_GRID
        call wrt_ana_grid
#endif
        if (ldefhis .and. wrthis(indxTime)) call wrt_his
#ifdef FLOATS
! Initialization for Lagrangian floats
!-------------------------------------------------------
      nrecflt=0    ! initialization done here and not in
      ncidflt=-1   ! init_scalars since it must be done only
                   ! once (whether child levels exist or not)
      spval=1.E15  ! spval is the nodata flag for float variables

      deltac2p=2.3 ! distance from the boundary at which a float
                   ! is transferred from child to parent
      deltap2c=2.5 ! same for transfer from parent to child

      call init_arrays_floats
      call init_floats
# ifdef SPHERICAL
      call interp_r2d_type_ini (lonr(START_2D_ARRAY), iflon)

      call interp_r2d_type_ini (latr(START_2D_ARRAY), iflat)
# else
      call interp_r2d_type_ini (  xr(START_2D_ARRAY), iflon)
      call interp_r2d_type_ini (  yr(START_2D_ARRAY), iflat)
# endif
# ifdef SOLVE3D
      call fill_ini ! fills in trackaux for ixgrd,iygrd,izgrd
                    ! and ifld (either izgrd or ifld is meaningful)
# endif
      if (ldefflt) call wrt_floats
#endif /* FLOATS */
C$OMP END MASTER
C$OMP BARRIER
CR      write(*,*) '  0' MYID
      if (may_day_flag.ne.0) goto 99     !-->  EXIT

C$OMP MASTER
        mpi_master_only write(*,'(/1x,A/)')
     &     'main :: initialization complete, started time-steping.'
C$OMP END MASTER
  99  return
      end


!      *****    *********    ******   *******    *********
!    ***   ***  *  ***  *   *   ***   ***   ***  *  ***  *
!    ***           ***     **   ***   ***   ***     ***
!      *****       ***    ***   ***   ***   **      ***
!          ***     ***    *********   ******        ***
!    ***   ***     ***    ***   ***   ***  **       ***
!      *****       ***    ***   ***   ***   ***     ***


      subroutine roms_step
      implicit none
      integer trd, tile, my_first, my_last, range
#include "param.h"
#include "scalars.h"
#include "ncvars.h"
#ifdef FLOATS
# include "ncvars_floats.h"
# include "floats.h"
      integer chunk_size_flt, Lstr,Lend, flt_str
      common /floats_step/ flt_str
#endif
#ifdef GRID_LEVEL
      integer iter
#endif

      trd=proc(2)
      range=(NSUB_X*NSUB_E+numthreads-1)/numthreads
      my_first=trd*range
      my_last=min(my_first + range-1, NSUB_X*NSUB_E-1)

! Increment time-step index and set model clock. Note that "time" set
! below corresponds to step "n" (denoted here as "nstp"), while counter
! "iic" corresponds to "n+1", so normally, assuming that time is
! counted from zero, the following relation holds: time=dt*(iic-1).
!  Also note that the output history/restart/averages routines write
! time and all the fields at step "n" (not n+1), while the first
! element of structure "time_index" written into the files is actually
! iic-1, hence normally time=time_index*dt there.  Same rule applies
! to the diagnostic routine "diag" which prints time and time step
! (actually iic-1) on the screen.

      time=start_time+dt*float(iic-ntstart) !<-- corresp. to "nstp"
      tdays=time*sec2day
#ifdef SOLVE3D
      nstp=1+mod(iic-ntstart,2)
      nrhs=nstp
      nnew=3
#endif

#ifdef FLOATS
      nfp1=mod(nfp1+1,NFT+1)          ! rotate time
      nf  =mod(nf  +1,NFT+1)          ! indices for
      nfm1=mod(nfm1+1,NFT+1)          ! floats
      nfm2=mod(nfm2+1,NFT+1)
      nfm3=mod(nfm3+1,NFT+1)
C$OMP MASTER
      flt_str=0
C$OMP END MASTER
#endif
                                      ! Read forcing and climatology
      if (synchro_flag) then          ! data. Note: this operation may
        synchro_flag=.false.          ! raise "may_day_flag" in the
C$OMP MASTER                          ! case of IO errors, however the
        call get_forces               ! error condition may occur on
C$OMP END MASTER                      ! some nodes, but not on all at
C$OMP BARRIER                         ! the same time, so the exit is
      endif                           ! deferred until after broadcast
                                      ! of "may_day_flag" inside diag.
#ifdef SOLVE3D
      do tile=my_tile_range                ! Interpolate forcing date
        call set_forces (tile)             ! to model time and compute
# if defined SSH_TIDES || defined UV_TIDES
        call set_tides  (tile)             ! surface fluxes.
# endif
        call    rho_eos (tile)
        call    set_HUV (tile)
        call       diag (tile)
#ifdef BIOLOGY
        call   bio_diag (tile)
#endif
      enddo
C$OMP BARRIER

      do tile=my_tile_range
        call omega (tile)
# if defined ANA_VMIX
        call ana_vmix (tile)
# elif defined LMD_MIXING
        call lmd_vmix (tile)
c        call lmd_kmix (tile)
# elif defined BVF_MIXING
        call bvf_mix (tile)
# endif
      enddo
C$OMP BARRIER

      do tile=my_tile_range
        call     prsgrd (tile)
        call      rhs3d (tile)
        call pre_step3d (tile)
# ifdef PRED_COUPLED_MODE
#  ifdef UV_VIS2
        call     visc3d (tile)
#  endif
# endif
# ifdef AVERAGES
        call    set_avg (tile)
# endif
      enddo
C$OMP BARRIER

# ifdef CORR_COUPLED_MODE
      do tile=my_tile_range
        call set_HUV1 (tile)
      enddo
C$OMP BARRIER

      nrhs=3
      nnew=3-nstp   !!! WARNING

      do tile=my_tile_range
        call omega (tile)
        call rho_eos (tile)
      enddo
C$OMP BARRIER

      do tile=my_tile_range
        call     prsgrd (tile)
        call      rhs3d (tile)
        call step3d_uv1 (tile)
#  ifdef UV_VIS2
        call     visc3d (tile)
#  endif
      enddo
C$OMP BARRIER
# endif
#endif  /* SOLVE3D */

! Output block: write restart/history files.
!======= ====== ===== =============== ======

      if ( iic.gt.ntstart .and. ( mod(iic-ntstart,nrst).eq.0
#ifdef EXACT_RESTART
     &                        .or. mod(iic-ntstart+1,nrst).eq.0
#endif
     &   .or. (mod(iic-ntstart,nwrt).eq.0 .and. wrthis(indxTime))
#ifdef AVERAGES
     &   .or. (mod(iic-ntsavg,navg).eq.0  .and. wrtavg(indxTime))
#endif
#ifdef STATIONS
     &   .or. (mod(iic-ntstart,nsta).eq.0 .and. nstation.gt.0)
#endif
#ifdef FLOATS
     &   .or. (mod(iic-ntstart,nflt).eq.0 .and. nfloats.gt.0)
#endif
     &                                                  )) then
C$OMP MASTER
        if (mod(iic-ntstart,nrst).eq.0
#ifdef EXACT_RESTART
     &                      .or. mod(iic-ntstart+1,nrst).eq.0
#endif
     &                                ) nrecrst=nrecrst+1
        if (mod(iic-ntstart,nwrt).eq.0) nrechis=nrechis+1
#ifdef AVERAGES
        if (mod(iic-ntsavg,navg) .eq.0) nrecavg=nrecavg+1
#endif
#ifdef STATIONS
        if (mod(iic-ntstart,nsta).eq.0) nrecstn=nrecstn+1
#endif
#ifdef FLOATS
        if (mod(iic-ntstart,nflt).eq.0) nrecflt=nrecflt+1
#endif
        if (mod(iic-ntstart,nrst).eq.0
#ifdef EXACT_RESTART
     &                      .or. mod(iic-ntstart+1,nrst).eq.0
#endif
     &                                 ) call wrt_rst
        if (mod(iic-ntstart,nwrt).eq.0 .and. wrthis(indxTime)) then
          call wrt_his
c          if (iic.gt.60) nwrt=1 !<-- useful for debugging
        endif

#ifdef AVERAGES
        if (mod(iic-ntsavg,navg) .eq.0 .and. wrtavg(indxTime))
     &      call wrt_avg
#endif
#ifdef STATIONS
        if (mod(iic-ntstart,nsta).eq.0 .and. nstation.gt.0)
     &      call wrt_statn
#endif
#ifdef FLOATS
        if (mod(iic-ntstart,nflt).eq.0 .and. nfloats.gt.0)
     &      call wrt_floats
        diagfloats=.false.
#endif
C$OMP END MASTER
C$OMP BARRIER
        if (iic-ntstart .gt. ntimes) goto 99   !-->  DONE
      endif

#ifdef FLOATS
! flag for diagnostic computation (for writing at next time step)
      if (mod(iic-ntstart,nflt).eq.0) then
        diagfloats=.true.
      endif
#endif

! Solve the 2D equations for the barotropic mode.
!------ --- -- --------- --- --- ---------- -----
#ifdef SOLVE3D
      do iif=1,nfast
#endif
#define FORW_BAK
#ifdef FORW_BAK
        kstp=knew                     ! This might look a bit silly,
        knew=kstp+1                   ! since both branches of this
        if (knew.gt.4) knew=1         ! if statement are identical.
        if (mod(knew,2).eq.0) then    ! Nevertheless, it makes sense,
          do tile=my_tile_range       ! since mpc will reverse one of
# ifndef SOLVE3D
            call set_forces (tile)    ! these loops to make zig-zag
# endif
            call     step2d (tile)    ! tile-processing sequence.
          enddo
C$OMP BARRIER
        else
          do tile=my_tile_range
# ifndef SOLVE3D
            call set_forces (tile)
# endif
            call     step2d (tile)
          enddo
C$OMP BARRIER
        endif
#else
        kstp=knew
        knew=3
        do tile=my_tile_range
# ifndef SOLVE3D
          call set_forces (tile)
# endif
          call     step2d (tile)
        enddo
C$OMP BARRIER
        knew=3-kstp
        do tile=my_tile_range
           call step2d (tile)
        enddo
C$OMP BARRIER
#endif
#ifdef SOLVE3D
      enddo    ! <-- iif

# ifdef PRED_COUPLED_MODE
      do tile=my_tile_range            ! This code segment is for
        call set_HUV1 (tile)           ! predictor-coupled version.
      enddo
C$OMP BARRIER

      nrhs=3
      nnew=3-nstp

      do tile=my_tile_range
        call omega (tile)
        call rho_eos (tile)
      enddo
C$OMP BARRIER
      do tile=my_tile_range
        call     prsgrd (tile)
        call      rhs3d (tile)
        call step3d_uv1 (tile)
      enddo
C$OMP BARRIER
# endif

      do tile=my_tile_range            ! Continue solution of the
        call step3d_uv2 (tile)         ! three-dimensional equations:
      enddo                            ! finalize time step for momenta
C$OMP BARRIER                          ! and tracers
      do tile=my_tile_range
        call omega (tile)
        call step3d_t (tile)
# if defined TS_DIF2 || defined TS_DIF4
        call t3dmix (tile)
# endif
      enddo
C$OMP BARRIER
#endif /* SOLVE3D */

#ifdef FLOATS
      chunk_size_flt=32
      do while (flt_str.lt.nfloats)
C$OMP CRITICAL
        Lstr=flt_str+1
        flt_str=Lstr+chunk_size_flt-1
C$OMP END CRITICAL
        Lend=min(Lstr+chunk_size_flt-1,nfloats)
        call step_floats (Lstr,Lend)
      enddo
c**    call step_floats (1,nfloats)    ! serial version for debugging
#endif
  99  return
      end
Notes:

1. In the code above all time variables, time indices are THREADPRIVATE, that is every thread has its own
clock and advances it, therefore "clalars.h" contains among other things,

Code: Select all

      real*4 cpu_time(4)
      real WallClock, time, tdays
      integer proc(2), numthreads, iic, kstp, knew
#ifdef SOLVE3D
     &                           , iif, nstp, nnew, nrhs
#endif
#ifdef FLOATS
     &                     , nfp1, nf,  nfm1, nfm2, nfm3
#endif
     &                           , priv_count(16)
      logical synchro_flag, diag_sync_flag
      common /priv_scalars/  WallClock, cpu_time,   proc,
     &         time, tdays, numthreads, iic,  kstp, knew
#ifdef SOLVE3D
     &                           , iif, nstp, nnew, nrhs
#endif
#ifdef FLOATS
     &                      ,nfp1, nf,  nfm1, nfm2, nfm3
#endif
     &       , priv_count, synchro_flag, diag_sync_flag
C$OMP THREADPRIVATE(/priv_scalars/)
2. As said above, "mpi_setup.F" contains among other things

Code: Select all

#include "cppdefs.h"
#ifdef MPI

C$    subroutine master_num_threads (nthrds)
C$    implicit none
C$    integer nthrds, trd, omp_get_num_threads, omp_get_thread_num
C$    trd=omp_get_thread_num()
C$    if (trd.eq.0) nthrds=omp_get_num_threads()
C$    return
C$    end




      subroutine mpi_setup (ierr)
      implicit none
      integer ierr, nsize, i_W, i_E, j_S, j_N, off_XI, off_ETA
# include "param.h"
# include "hidden_mpi_vars.h"
# include "ncvars.h"
# include "mpif.h"
C$    integer nthrds

      ocean_grid_comm=MPI_COMM_WORLD
      call MPI_Comm_size (ocean_grid_comm, nsize,  ierr)
      call MPI_Comm_rank (ocean_grid_comm, mynode, ierr)

#ifdef VERBOSE
      mpi_master_only call system('echo Starting mpi_setup at `date`')
      write(*,*) 'Starting mpi_setup, node ', mynode,' out of', nsize
#endif
      exc_call_count=0
                                         ! Determine number of threads
C$    if (mynode.eq.0) then              ! on MPI node rank=0, then
C$OMP PARALLEL SHARED(nthrds)            ! broadcast it, so all others
C$      call master_num_threads (nthrds) ! can set the same number of
C$OMP END PARALLEL                       ! threads as the rank=0 node.
C$    endif
C$    call MPI_Bcast(nthrds, 1, MPI_INTEGER, 0, ocean_grid_comm, ierr)
C$    if (mynode.gt.0) then
C$      call omp_set_num_threads (nthrds)
C$    endif

....
      return
      end
#else
      subroutine MPI_Setup_empty
      end
#endif    /* MPI */

User avatar
arango
Site Admin
Posts: 1114
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University
Contact:

Re: Incorrect nesting of OpenMP directives: FLOAT_VWALK

#12 Post by arango » Sat Apr 28, 2012 4:44 am

Thanks Sasha. Indeed, I do remember seeing that structure in your code several years ago. I assume that changes to the structure are not that difficult because they are localized to couple of routines since ROMS has coarse grained parallelization. I would have to study and understand completely what are you doing.

I guess that nowadays all compiler support modern versions of the OpenMP. I have to admit that I haven't follow the evolution of the OpenMP standard that closely and compilers have different interpretations of it.

Post Reply