ROMS code modernization

Suggest improvements/optimizations to the ROMS code.

Moderators: arango, robertson

gnayar

ROMS code modernization

#1 Post by gnayar » Fri Jun 19, 2015 5:30 pm

Hi,
We have been working on several optimizations to the ROMS code base, primarily for performance. The changes made to use some of the newer MPI-3 interfaces, such as neighborhood collectives, have also made the code much cleaner.

The changes, in no particular order, are:
  1. Alignment and padding: These make efficient use of the vector units. The changes are mostly in the functions that were hotspots in the benchmark application: step2d, lmd_skpp, step3d_uv, rhs3d and pre_step3d.
  2. Loop transformations: These improve cache performance.
  3. Cartesian topology: A Cartesian communicator is used for neighbor exchanges in mp_exchanges.
  4. Derived data types: MPI derived data types avoid explicit packing/unpacking in the mp_exchange code, making it simpler and more efficient.
  5. Neighborhood collectives: MPI-3 neighborhood collectives replace the four sends and recvs in mp_exchange with one neighborhood all-to-all call, again making the code simpler and more efficient.
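To illustrate item 1 for readers who have not seen the technique, here is a minimal sketch in C. It is not taken from the ROMS changes (ROMS itself is Fortran). The 64-byte alignment and the names padded_row_len, alloc_padded_field and row_is_aligned are assumptions made up for this example. The idea is to round each row of a 2-D field up to a whole number of aligned chunks, so that every row starts on a boundary the vector units can load from efficiently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment: 64 bytes (one cache line, or one 512-bit vector). */
#define ALIGN_BYTES 64

/* Round a row length, counted in doubles, up to a whole number of
   ALIGN_BYTES-sized chunks so that every row starts on a boundary. */
static size_t padded_row_len(size_t nx)
{
    const size_t per_chunk = ALIGN_BYTES / sizeof(double);
    assert(ALIGN_BYTES % sizeof(double) == 0);
    return (nx + per_chunk - 1) / per_chunk * per_chunk;
}

/* Allocate an ny-by-nx field with padded rows. The padded row length
   (leading dimension) is returned through ld. Release with free(). */
static double *alloc_padded_field(size_t nx, size_t ny, size_t *ld)
{
    *ld = padded_row_len(nx);
    /* The byte count is a multiple of ALIGN_BYTES, as aligned_alloc requires. */
    return aligned_alloc(ALIGN_BYTES, *ld * ny * sizeof(double));
}

/* Return 1 if row j of the field starts on an ALIGN_BYTES boundary. */
static int row_is_aligned(const double *field, size_t ld, size_t j)
{
    return ((uintptr_t)(field + j * ld) % ALIGN_BYTES) == 0;
}
```

With this layout, element (j, i) lives at field[j * ld + i], and loops over i still run to nx, not ld. The padding costs a little memory per row in exchange for aligned row starts.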
I would like to know the best way to make these changes widely available to the community.

-gopal

jivica
Posts: 122
Joined: Mon May 05, 2003 2:41 pm
Location: The University of Western Australia, Perth, Australia

Re: ROMS code modernization

#2 Post by jivica » Mon Jun 22, 2015 4:38 am

Should be easy if you are using git.
What speedup did you get compared to the "standard" version?
I am interested in running a benchmark of my application on my cluster (using Intel compilers + Open MPI 1.8.6).

Cheers,
Ivica
