Blowups depending on model decomposition with MPI

Report or discuss software problems and other woes

Moderators: arango, robertson

mcobas

Blowups depending on model decomposition with MPI

#1 Post by mcobas »

Hello,

I have always run the ROMS model in parallel using OpenMP, but recently I changed to MPI. To my surprise, depending on the number of processors and the model decomposition, I get blowups, and/or different results. With OpenMP, I had been using a 4x4 scheme. If I use the same scheme with MPI, I get a blowup. 5x4 seems to work ok, but if I use more processors, I get blowups again. The only scheme that worked with more than 5x4 processors is 1x35.

When I get blowups, it doesn't matter whether I change the timestep or the VISC2/VISC4/TNU2/TNU4 parameters; it keeps blowing up.

With the different configurations there is normally no difference in the mean kinetic energy (up to the point where it blows up), as you can see in the attached figure (the blue and green lines overlap). But with some configurations using 40 processors (both 5x8 and 1x40) the mean kinetic energy increases, owing to a strong current near the southern boundary that you can see in the plot. The 1x35 configuration worked; 1x40 didn't.

The only parameters that changed between configurations were NtileI and NtileJ; everything else (boundary conditions, forcings, timestep, theta_b, theta_s, grid, ...) stayed the same. I am using ROMS version 3.3. The only things I did to use MPI were to set "USE_MPI ?= on" and unset "USE_OpenMP ?=" in the makefile, and to set the compiler "FC := mpiifort" in "Compilers/Linux-ifort.mk". With this setup ROMS compiled and ran with the 5x4 scheme, giving the same results as with OpenMP.
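
For reference, this is roughly how I switch builds (the variable names are the ones in my ROMS 3.3 makefile and Compilers/Linux-ifort.mk; a clean rebuild avoids mixing OpenMP and MPI objects):

    # in the top-level makefile:
    #   USE_MPI    ?= on
    #   USE_OpenMP ?=
    # in Compilers/Linux-ifort.mk:
    #   FC := mpiifort
    make clean
    make -j 4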

Am I missing something? Am I doing anything wrong? Has anybody had issues like this? Any comment would be a great help.

Many thanks,

Marcos Cobas Garcia

Centro de Supercomputacion de Galicia
Attachments
Mean kinetic energy for several domain decompositions.
1x35 decomposition, no strong current near the southern boundary.
1x40 decomposition, note the strong current near the southern boundary.

kate
Posts: 4088
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Blowups depending on model decomposition with MPI

#2 Post by kate »

Rather than running it until it blows up, I would run for, say, 5-10 steps, saving a history record every timestep. Do this for a good case and a bad case, then do ncdiff on the two outputs. The correct answer is all zeroes, but clearly there is a bug somewhere.
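
In shell terms, something like this (the file names are just placeholders; it assumes the NCO tools are installed and that NHIS in ocean.in is set to 1 so every step gets written):

    # run the same short job with two decompositions, then difference the history files
    ncdiff ocean_his_good.nc ocean_his_bad.nc his_diff.nc
    ncdump -v temp his_diff.nc | less    # should be all zeroes if the two runs agree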

What are your values of Lm and Mm? It's possible that we've only really tested the case where Lm/NtileI and Mm/NtileJ are integers.

mcobas

Re: Blowups depending on model decomposition with MPI

#3 Post by mcobas »

Hello Kate and everybody,

Thanks for your fast reply, Kate.

So far I had compared the solutions visually, and they seemed identical to me. But today I took the two configurations with 35 processors (5x7 and 1x35) and compared the temperature at every layer. The difference is significant in the top layers, as you can see in the attached figures, from the very beginning. The differences in the bottom layers are much smaller until about 2 days before the blowup (the output is saved every 12 hours, approximately).

The values are Lm = 258 and Mm = 508. I picked some bad numbers: 508 is only divisible by 2, 4, 127 and 254; if I had chosen 500, or even 507, I would have more options. 258 is only divisible by 2, 3, 6 and much bigger numbers. I will try a 6x4 decomposition, which gives integer values of Lm/NtileI and Mm/NtileJ, as you suggest.
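
A quick way to check which decompositions tile evenly is plain shell arithmetic:

    # list decompositions where Lm/NtileI and Mm/NtileJ are both integers
    Lm=258; Mm=508
    for ni in 1 2 3 4 5 6; do
      for nj in 4 5 7 8 35 40; do
        if [ $((Lm % ni)) -eq 0 ] && [ $((Mm % nj)) -eq 0 ]; then
          echo "NtileI=$ni NtileJ=$nj -> tile size $((Lm / ni)) x $((Mm / nj))"
        fi
      done
    done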

Thanks a lot,

Marcos Cobas Garcia

Centro de Supercomputacion de Galicia
Attachments
TemperatureDifferences10TopLayers.png
TemperatureDifferences20BottomLayers.png

jcwarner
Posts: 1171
Joined: Wed Dec 31, 2003 6:16 pm
Location: USGS, USA

Re: Blowups depending on model decomposition with MPI

#4 Post by jcwarner »

One approach is to #undef all the extra stuff like uv_visc2 and ts_mixing and tides, etc. Run the model for just a few time steps and save every time step. Then slowly turn these things back on and compare solutions. One note of caution: running the model with the mixing coefficients set to 0 may not be the same as #undef-ing that mixing and rebuilding. One would hope so, but I suggest that you undef these things and slowly add them back in.
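
A rough sketch of the idea (UV_VIS2 and TS_DIF2 below are the usual ROMS CPP names for the harmonic mixing terms; treat them as examples and check your own application header):

    # 1. comment out the mixing/tides options in your application's .h file, e.g.
    #      /* #define UV_VIS2 */
    #      /* #define TS_DIF2 */
    # 2. rebuild from scratch so the change actually takes effect
    make clean && make -j 4
    # 3. rerun the short test, ncdiff the history files as before,
    #    and then re-enable the options one at a time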

mcobas

Re: Blowups depending on model decomposition with MPI

#5 Post by mcobas »

Hello,

Thanks for the replies.

The run with the 6x4 domain decomposition didn't work either. Besides, I noticed there are differences between the OpenMP 4x4 and MPI 1x35 runs: the maximum difference in temperature is over 2.5 degrees. The differences in temperature, salinity, zeta, u and v concentrate mainly near the shelf break and have a similar pattern.

I will try what jcwarner suggests and keep posting.

Thanks a lot,

Marcos Cobas

Centro de Supercomputacion de Galicia
Attachments
Salinity differences between OpenMP 4x4 and MPI 1x35.
Temperature differences between OpenMP 4x4 and MPI 1x35.

arango
Site Admin
Posts: 1347
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

Re: Blowups depending on model decomposition with MPI

#6 Post by arango »

I don't know what the problem is here. It looks like a partition problem and a parallel bug. However, your choice of partitions is not optimal, and I don't think this application is large enough to warrant that many processors; you will be penalized by excessive MPI communications. There is always an optimal number of partitions per application. I have mentioned this many times in this forum. The fact that you have so many processors available does not mean that you need to use all of them to run your application. I bet your application will be more efficient with a smaller number of processors.

My experience with distributed-memory is that I get the same behavior with MPICH1, MPICH2, and OpenMPI. In some cases you can get identical solutions, depending on the compiler, compiler flags, and MPI libraries. We always use MPI libraries compiled with the same version of the compiler as ROMS. Notice that in the distributed-memory exchanges, ROMS uses the lower-level MPI communication routines. Nothing fancy.

As far as I know, the ROMS code, as distributed, is free of distributed-memory (MPI) parallel bugs. If a parallel bug is found in an application, it is usually associated with the user's customization of the code. The MPI paradigm is the easiest one; shared-memory and serial with partitions are more difficult in ROMS. I sometimes forget this when coding, since I am so used to MPI nowadays. All the adjoint-based algorithms only work in MPI.

Now, OpenMP is a protocol for shared-memory. I just fixed a parallel bug today for the biharmonic stress tensor in shared-memory and serial with partitions. See the :arrow: ticket. You need to update. This will be a problem for you if you are using shared-memory (OpenMP).

I always recommend that users use the build.sh or build.bash script instead of modifying the makefile.

:idea: When configuring a large application, it is always a good idea to set your grid dimensions as multiples of 2 or 3. I actually use powers of 2, which are the best. This allows a lot of choices for tile partitioning and tile balancing for efficiency. You cannot select the number of grid points capriciously in the parallel computing world.
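
For example, 256 and 512 tile evenly for many partitions, whereas 258 and 508 give you very few choices:

    for n in 2 4 8 16 32; do
      echo "256/$n = $((256 / n)),  512/$n = $((512 / n))"
    done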

pedrodanielcosta
Posts: 1
Joined: Sun Dec 07, 2008 7:57 pm
Location: MeteoGalicia

Re: Blowups depending on model decomposition with MPI

#7 Post by pedrodanielcosta »

Hi Marcos

We now have exactly the same problem; our ROMS version is 511.


Did you find a solution?
Attachments
FIG1.png
FIG2.png
FIG3.png
