Performance Analysis and Optimization of the Regional Ocean Model System (ROMS)

Yue Zuo, Xingfu Wu, and Valerie Taylor
Department of Computer Science, Texas A&M University


In this poster, we focus on optimizing the communication time of the ROMS code so that it can execute efficiently in a grid environment, the TeraGrid (www.teragrid.org). Our basic strategy for improving communication efficiency is to combine multiple communications into one and to overlap communication with computation as much as possible. The communication kernel, MP_Exchange, is rewritten, and several new communication modules are added. To demonstrate the advantages of this change to the communication kernel, we focus on the “step2d” function, which dominates the execution time of the ROMS code. Experiments are conducted on TeraGrid resources at the NCSA, UC, and CALTECH sites with different numbers of processors and problem sizes. ROMS is configured and built using the BENCHMARK configuration. The overall execution time of the 2D engine improves by up to 50%, depending on the network latency and the problem size.
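The effect of the two strategies above can be illustrated with a simple cost model. The sketch below is not the ROMS implementation: it assumes a latency-plus-bandwidth estimate for each message, and the function names, field sizes, latency, and bandwidth values are all hypothetical and are not measurements from the experiments.

```python
# Illustrative cost model only; all names and numbers are hypothetical.

def message_time(nbytes, latency, bandwidth):
    """Estimated time (s) for one message: fixed latency plus transfer time."""
    return latency + nbytes / bandwidth


def separate_exchanges(field_bytes, latency, bandwidth):
    """Each field is exchanged in its own message, paying the latency each time."""
    return sum(message_time(n, latency, bandwidth) for n in field_bytes)


def combined_exchange(field_bytes, latency, bandwidth):
    """All fields are packed into one buffer, paying the latency only once."""
    return message_time(sum(field_bytes), latency, bandwidth)


def step_time(comm, interior_comp, boundary_comp, overlap):
    """Time for one step.

    With overlap, interior points (which need no remote data) are computed
    while the exchange is in flight; boundary points wait for the exchange.
    """
    if overlap:
        return max(comm, interior_comp) + boundary_comp
    return comm + interior_comp + boundary_comp


if __name__ == "__main__":
    fields = [8192, 8192, 8192]      # hypothetical halo sizes in bytes
    latency, bandwidth = 0.03, 1e8   # hypothetical cross-site link (s, bytes/s)
    print("separate:", separate_exchanges(fields, latency, bandwidth))
    print("combined:", combined_exchange(fields, latency, bandwidth))
    comm = combined_exchange(fields, latency, bandwidth)
    print("no overlap:", step_time(comm, 0.02, 0.005, overlap=False))
    print("overlap:   ", step_time(comm, 0.02, 0.005, overlap=True))
```

Under this model, combining k messages saves (k - 1) latencies, which is why the benefit would be expected to grow with the network latency between sites.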

Figure 1 shows the performance of step2d on NCSA and CALTECH for a problem size of 1024x128; Figure 2 shows the performance of step2d on NCSA and UC. The processors are divided equally between the two sites. For example, 8 processors on NCSA and CALTECH means 4 processors on NCSA and 4 processors on CALTECH. In the legend of each figure, Comm represents the communication time and Comp represents the computation time. The communication latency between NCSA and CALTECH is much higher than that between NCSA and UC. Figure 1 indicates that the communication time increases significantly as the number of processors increases, and that it becomes much larger than the computation time. The optimized step2d yields a performance improvement of up to 50%. Figure 2 shows a good improvement in communication performance on NCSA and UC.
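The processor split and the improvement figure quoted above can be made explicit with a short sketch. The helper names are hypothetical, and the timings in the usage lines are placeholders chosen only to illustrate the arithmetic; they are not values read from Figure 1 or Figure 2.

```python
# Illustrative helpers only; names and sample timings are hypothetical.

def split_evenly(total_procs, sites):
    """Divide the processors equally among the sites, as in the experiments."""
    if total_procs % len(sites) != 0:
        raise ValueError("processor count must divide evenly among sites")
    per_site = total_procs // len(sites)
    return {site: per_site for site in sites}


def percent_improvement(original_time, optimized_time):
    """Relative reduction in execution time, expressed as a percentage."""
    if original_time <= 0:
        raise ValueError("original_time must be positive")
    return 100.0 * (original_time - optimized_time) / original_time


if __name__ == "__main__":
    print(split_evenly(8, ["NCSA", "CALTECH"]))
    # Placeholder timings (Comm + Comp) before and after optimization.
    original = 1.5 + 0.5
    optimized = 0.5 + 0.5
    print(percent_improvement(original, optimized))
```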

Figure 1. Performance of step2d on NCSA and CALTECH


Figure 2. Performance of step2d on NCSA and UC


Future work will apply the new communication kernel to the whole ROMS code and explore efficient data partitioning and load balancing for ROMS in the TeraGrid distributed grid environment.