# Ocean Modeling Discussion

ROMS/TOMS

 All times are UTC
Post subject: Speeding up the BRN loop 2x times
Posted: Thu Jul 24, 2014 7:47 pm
Hi,
We have been working on optimizing the performance of the ROMS benchmark application.
The work primarily targeted the Intel Xeon and Intel Xeon Phi architectures, but some of the optimizations should benefit all architectures.

Here is one.

Application: benchmark built from ROMS 3.7 svn version 709

Profiling the benchmark application on this workload shows that the function taking the longest time is lmd_skpp_tile. Though most of the loops in ROMS iterate fastest down a column (the first array index), which is cache efficient in Fortran, there is one loop in lmd_skpp_tile where the innermost loop iterates over k (depth). This is the loop that computes the critical function (FC) for the bulk Richardson number (BRN).
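To make the cache argument concrete, here is a minimal sketch (not ROMS code; the array `a` and its sizes are hypothetical) showing that Fortran stores arrays with the first index contiguous in memory. For an array such as z_w(i,j,k), an inner loop over k therefore jumps a whole (i,j) plane between consecutive accesses, while an inner loop over i walks adjacent elements.

```fortran
program column_major_demo
  implicit none
  integer :: a(2,3), n

  ! reshape fills the array in storage (array element) order,
  ! so the values 1..6 land in consecutive memory locations.
  a = reshape([(n, n = 1, 6)], [2, 3])

  ! The 2nd stored element is a(2,1) and the 3rd is a(1,2):
  ! the first index varies fastest in memory.
  print *, a(2,1), a(1,2)

  if (a(2,1) /= 2 .or. a(1,2) /= 3) error stop 'unexpected storage order'
end program column_major_demo
```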

This loop has to run over k from N(ng) down to 1 because, for each (i,j), it needs to find the minimum depth at which the BRN reaches its critical value. It can be made more cache efficient by storing the depth in an array (depth_vec in the modified code) instead of the scalar depth, and by interchanging the i and k loops.

Here is the original code:
Code:
      !
      !  Compute turbulent velocity scales for momentum (wm) and tracers (ws).
      !  Then, compute critical function (FC) for bulk Richardson number.
      !
      DO i=Istr,Iend
        FC(i,N(ng))=0.0_r8
        DO k=N(ng),1,-1
          depth=z_w(i,j,N(ng))-z_w(i,j,k-1)
          IF (Bflux(i,j,k-1).lt.0.0_r8) THEN
            sigma=MIN(sl_dpth(i,j),depth)
          ELSE
            sigma=depth
          END IF
          ...
        END DO
      END DO

Here is the modified code:
Code:
      !
      !  Compute turbulent velocity scales for momentum (wm) and tracers (ws).
      !  Then, compute critical function (FC) for bulk Richardson number.
      !
      !  Precompute the depth for every (i,k) with i as the inner loop.
      DO k = 1, N(ng)
        DO i = Istr, Iend
          depth_vec(i, k) = z_w(i, j, N(ng))-z_w(i, j, k-1)
        END DO
      END DO
      !  Interchanged loops: k outermost (still N(ng) down to 1), i innermost.
      DO k = N(ng),1,-1
        DO i = Istr, Iend
          sigma = depth_vec(i, k)
          IF ((Bflux(i, j, k-1) .LT. 0.0_r8) .AND.                    &
     &        (sl_dpth(i, j) .LT. depth_vec(i, k))) THEN
            sigma = sl_dpth(i, j)
          END IF
          ...
        END DO
      END DO
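As a sanity check on the interchange, here is a minimal standalone sketch (not the ROMS routine) that runs both loop orderings on synthetic data and confirms they produce the same sigma values. Everything in it is an assumption made for illustration: the j index is dropped, N stands in for N(ng), the tile sizes and input values are made up, and the sigma_a and sigma_b arrays exist only to compare results.

```fortran
program brn_loop_interchange
  implicit none
  integer, parameter :: r8 = selected_real_kind(12, 300)
  integer, parameter :: Istr = 1, Iend = 8, N = 5   ! hypothetical sizes
  real(r8) :: z_w(Istr:Iend, 0:N)      ! j index dropped for brevity
  real(r8) :: Bflux(Istr:Iend, 0:N)
  real(r8) :: sl_dpth(Istr:Iend)
  real(r8) :: depth_vec(Istr:Iend, N)
  real(r8) :: sigma_a(Istr:Iend, N), sigma_b(Istr:Iend, N)
  real(r8) :: depth, sigma
  integer :: i, k

  ! Synthetic inputs: z_w is 0 at the surface (k = N) and negative below;
  ! Bflux alternates in sign so both branches are exercised.
  do k = 0, N
    do i = Istr, Iend
      z_w(i, k) = -real(N - k, r8) * real(10 + i, r8)
      Bflux(i, k) = merge(-1.0_r8, 1.0_r8, mod(i + k, 2) == 0)
    end do
  end do
  do i = Istr, Iend
    sl_dpth(i) = 2.5_r8 * real(10 + i, r8)
  end do

  ! Original ordering: i outer, k inner, scalar depth.
  do i = Istr, Iend
    do k = N, 1, -1
      depth = z_w(i, N) - z_w(i, k-1)
      if (Bflux(i, k-1) < 0.0_r8) then
        sigma = min(sl_dpth(i), depth)
      else
        sigma = depth
      end if
      sigma_a(i, k) = sigma
    end do
  end do

  ! Interchanged ordering: depth precomputed, k outer, i inner.
  do k = 1, N
    do i = Istr, Iend
      depth_vec(i, k) = z_w(i, N) - z_w(i, k-1)
    end do
  end do
  do k = N, 1, -1
    do i = Istr, Iend
      sigma = depth_vec(i, k)
      if ((Bflux(i, k-1) < 0.0_r8) .and. (sl_dpth(i) < depth_vec(i, k))) then
        sigma = sl_dpth(i)
      end if
      sigma_b(i, k) = sigma
    end do
  end do

  if (any(sigma_a /= sigma_b)) error stop 'loop orderings disagree'
  print *, 'both loop orderings give identical sigma'
end program brn_loop_interchange
```

The interchange keeps k running from N down to 1 for every i, so any per-column dependence on the previous k level is preserved; only the order in which different columns are visited changes.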

Further modifications, made specifically to aid vectorization in the loop above and in other loops in the benchmark, gave a performance boost close to 2.5x compared to the baseline.

Please write to one of us (emails given below), and we can share more details about the changes.

-indraneil.m.gokhale@intel.com
