Ocean Modeling Discussion

ROMS/TOMS

All times are UTC

Posted: Thu Jul 24, 2014 7:47 pm
Hi,
We have been working on optimizing the performance of the ROMS benchmark application.
The work primarily targeted the Intel Xeon and Intel Xeon Phi architectures, but some of the optimizations should benefit all architectures.

Here is one.

Application and Workload used
Application: benchmark built from ROMS 3.7 svn version 709
Workload: https://www.myroms.org/svn/src/tags/rom ... chmark3.in.

In the profile of the benchmark application on this workload, the function that takes the longest time is lmd_skpp_tile. Most loops in ROMS iterate fastest down a column (over the first array index, which is contiguous in Fortran's column-major layout), and that is cache efficient. However, one loop in lmd_skpp_tile has k (depth) as its innermost index. This is the loop that computes the critical function (FC) for the bulk Richardson number (BRN).
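
To make the cache argument concrete, here is a small free-form Fortran sketch that computes how far apart consecutive accesses land in memory for an array indexed (i,j,k), as z_w is. The tile sizes ni and nj and the helper function offset are hypothetical, chosen only for illustration; real ROMS arrays carry ghost points and different bounds.

```fortran
program stride_demo
  implicit none
  ! Hypothetical tile sizes, not taken from the benchmark input.
  integer, parameter :: ni = 64, nj = 64

  ! Distance (in array elements) between consecutive accesses
  ! when the innermost loop runs over i versus over k.
  print '(a,i0)', 'stride when i advances: ', offset(2, 1, 0) - offset(1, 1, 0)
  print '(a,i0)', 'stride when k advances: ', offset(1, 1, 1) - offset(1, 1, 0)

contains

  ! Linear element offset of a(i,j,k) for an array declared a(ni,nj,0:nk)
  ! in Fortran's column-major storage order.
  pure integer function offset(i, j, k)
    integer, intent(in) :: i, j, k
    offset = (i - 1) + ni*(j - 1) + ni*nj*k
  end function offset

end program stride_demo
```

With these sizes, each step in k jumps ni*nj = 4096 elements, while each step in i moves to the adjacent element. That difference is why an innermost k loop uses the cache poorly.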

The loop has to run over k from N(ng) down to 1 because, for each (i,j), it needs to find the minimum depth at which the BRN reaches its critical value. It can be made more cache efficient by replacing the scalar depth with an array indexed by i (depth_vec in the modified code below) and interchanging the i and k loops, so that i becomes the innermost index.

Here is the original code:
Code:
!
!  Compute turbulent velocity scales for momentum (wm) and tracers (ws).
!  Then, compute critical function (FC) for bulk Richardson number.
!
        DO i=Istr,Iend
          FC(i,N(ng))=0.0_r8
          DO k=N(ng),1,-1
            depth=z_w(i,j,N(ng))-z_w(i,j,k-1)
            IF (Bflux(i,j,k-1).lt.0.0_r8) THEN
              sigma=MIN(sl_dpth(i,j),depth)
            ELSE
              sigma=depth
            END IF
            ...
          END DO
        END DO


Here is the modified code:
Code:
!
!  Compute turbulent velocity scales for momentum (wm) and tracers (ws).
!  Then, compute critical function (FC) for bulk Richardson number.
!
        DO k = 1, N(ng)
          DO i = Istr, Iend
            depth_vec(i, k) = z_w(i, j, N(ng))-z_w(i, j, k-1)
          END DO
        END DO
        DO k = N(ng),1,-1
          DO i = Istr, Iend
            sigma = depth_vec(i, k)
            IF ((Bflux(i, j, k-1) .LT. 0.0_r8) .AND.                    &   
     &          (sl_dpth(i, j) .LT. depth_vec(i, k))) THEN
              sigma = sl_dpth(i, j)
            END IF
            ...
          END DO
        END DO           
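
As a sanity check on the interchange, the following free-form Fortran sketch runs both loop orders on synthetic data and compares the resulting sigma values. It is not ROMS code: the j index is dropped, the array sizes and input values are made up, and the names sigma_loops, sigma_k_inner, sigma_i_inner and check_interchange are hypothetical. Only the sigma selection shown above is reproduced; the elided part of the loop body is left out.

```fortran
module sigma_loops
  implicit none
  integer, parameter :: r8 = selected_real_kind(12, 300)
contains

  ! Original ordering: i outer, k inner, scalar depth.
  subroutine sigma_k_inner(z_w, Bflux, sl_dpth, sigma)
    real(r8), intent(in)  :: z_w(:, 0:), Bflux(:, 0:), sl_dpth(:)
    real(r8), intent(out) :: sigma(:, :)
    integer  :: i, k, n
    real(r8) :: depth
    n = size(sigma, 2)
    do i = 1, size(sigma, 1)
      do k = n, 1, -1
        depth = z_w(i, n) - z_w(i, k-1)
        if (Bflux(i, k-1) < 0.0_r8) then
          sigma(i, k) = min(sl_dpth(i), depth)
        else
          sigma(i, k) = depth
        end if
      end do
    end do
  end subroutine sigma_k_inner

  ! Interchanged ordering: k outer, i inner, depth stored in an array.
  subroutine sigma_i_inner(z_w, Bflux, sl_dpth, sigma)
    real(r8), intent(in)  :: z_w(:, 0:), Bflux(:, 0:), sl_dpth(:)
    real(r8), intent(out) :: sigma(:, :)
    real(r8) :: depth_vec(size(sigma, 1), size(sigma, 2))
    integer  :: i, k, n
    n = size(sigma, 2)
    do k = 1, n
      do i = 1, size(sigma, 1)
        depth_vec(i, k) = z_w(i, n) - z_w(i, k-1)
      end do
    end do
    do k = n, 1, -1
      do i = 1, size(sigma, 1)
        sigma(i, k) = depth_vec(i, k)
        if (Bflux(i, k-1) < 0.0_r8 .and. sl_dpth(i) < depth_vec(i, k)) then
          sigma(i, k) = sl_dpth(i)
        end if
      end do
    end do
  end subroutine sigma_i_inner

end module sigma_loops

program check_interchange
  use sigma_loops
  implicit none
  integer, parameter :: ni = 8, n = 5      ! made-up sizes
  real(r8) :: z_w(ni, 0:n), Bflux(ni, 0:n), sl_dpth(ni)
  real(r8) :: s1(ni, n), s2(ni, n)
  integer  :: i, k

  ! Synthetic inputs: z_w rises to 0 at the surface (k = n),
  ! Bflux alternates in sign so both branches are exercised.
  do k = 0, n
    do i = 1, ni
      z_w(i, k)   = -real(n - k, r8) * (10.0_r8 + real(i, r8))
      Bflux(i, k) = merge(-1.0_r8, 1.0_r8, mod(i + k, 2) == 0)
    end do
  end do
  sl_dpth = 15.0_r8

  call sigma_k_inner(z_w, Bflux, sl_dpth, s1)
  call sigma_i_inner(z_w, Bflux, sl_dpth, s2)
  print '(a,l1)', 'loop orders agree: ', all(s1 == s2)
end program check_interchange
```

Both routines perform the same floating-point operations per element, so the comparison can be exact. The interchanged version trades extra storage for depth_vec against unit-stride access in the inner loop.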


Further modifications, made specifically to aid vectorization in the loop above and in other loops in the benchmark, gave a performance boost of close to 2.5x over the baseline.

Please write to one of us (emails given below), and we can share more details about the changes.

-indraneil.m.gokhale@intel.com
-gopal.bhaskaran@tcs.com

