ROMS on Itanium with MPI and ifort

General scientific issues regarding ROMS

Moderators: arango, robertson

Paul_Budgell
Posts: 18
Joined: Wed Apr 23, 2003 1:34 pm
Location: IMR, Bergen, Norway

#1 Post by Paul_Budgell » Fri Aug 26, 2005 9:21 am

I have been having horrific problems trying to run ROMS 2.1 with MPI on an HP Itanium cluster and ifort 8.1. No matter what I try, I cannot run on more than one 4-processor node. Does anyone else in the ROMS community have experience with ROMS and MPI+Itanium+ifort?

The same model configuration works extremely well on an IBM Opteron cluster with both ifort 8.1 and PGI Fortran90, although there were a few hoops to jump through with ifort.

arango
Site Admin
Posts: 1131
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

#2 Post by arango » Fri Aug 26, 2005 5:42 pm

I am not aware of such problems with the Itanium. However, we did discover an odd behavior in the distributed-memory layer of ROMS when the tile partition is nonuniform across the communicator group. This generates MPI access violations because the internal communication buffers have different sizes. This has never happened to me because all my applications have a grid size that is a power of 2 in both directions. I am very picky about this: I want a balanced load in all my parallel applications.

I don't know if this is your problem, but many users do not pay much attention to the grid size. This problem is also present in the released version of ROMS 2.2; however, a correction is available. See the release notes:

viewtopic.php?t=198

Now, the internal buffers are dimensioned to the maximum size of all tiles. See parameter TileSize in mod_param.F.

I think that you need to update mod_param.F, inp_par.F and distribute.F to fix this potential problem for uneven tile partitions.
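The buffer-sizing idea above can be illustrated with a short sketch. This is not ROMS code: it assumes a simple block decomposition for illustration (the actual tile bounds are computed in inp_par.F / mod_param.F), and shows how the largest tile determines a buffer size that is safe for every tile, in the spirit of the TileSize parameter:

```python
def max_tile_size(Lm, Mm, N, NtileI, NtileJ):
    """Number of points in the largest tile of a NtileI x NtileJ partition.

    Hypothetical block split for illustration only; the real ROMS
    decomposition is computed in inp_par.F / mod_param.F.
    """
    def blocks(n, ntiles):
        # Each tile gets `base` points; the first `extra` tiles get one more.
        base, extra = divmod(n, ntiles)
        return [base + (1 if t < extra else 0) for t in range(ntiles)]

    Imax = max(blocks(Lm, NtileI))   # widest tile in the I-direction
    Jmax = max(blocks(Mm, NtileJ))   # tallest tile in the J-direction
    return Imax * Jmax * N           # points in the largest 3D tile

# A 41-point direction split two ways gives uneven tiles (21 and 20),
# so buffers sized per-tile would differ across the communicator group;
# sizing all buffers to the maximum tile avoids the mismatch.
print(max_tile_size(41, 80, 16, 2, 4))   # 21 * 20 * 16 = 6720
```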

Good luck

Paul_Budgell
Posts: 18
Joined: Wed Apr 23, 2003 1:34 pm
Location: IMR, Bergen, Norway

#3 Post by Paul_Budgell » Tue Aug 30, 2005 3:11 pm

Thanks Hernan. I have now tried the corrected (Aug. 11) version of ROMS 2.2 with the UPWELLING case and get the same problem. With Lm=41, Mm=80, N=16 (in mod_param.F) and NtileI=2, NtileJ=4 (in External/ocean_upw.in), everything works fine. But with Lm=252, Mm=296, N=30 (same as lanerolle of April 28) and the same NtileI, NtileJ, ROMS crashes with a segmentation fault after:

INITIAL: Configurating and initializing forward nonlinear model ...

I'm using mpiexec with mpich in a PBS queueing system. The same configuration works fine under OpenMP with 4 processors on a single node, and in serial mode, in both cases on the PBS batch queueing system.

The Linux kernel is 2.4.21-20 and the compiler is ifort 8.1. Previous experience with an HP Itanium suggests that such an old kernel combined with a relatively new version of ifort may lead to problems, but I'm just guessing.

I've done all the obvious things like setting the stack size to unlimited and increasing the number of tiles (and processors, of course), but nothing seems to help.

The same MPI configuration works fine on an IBM Opteron cluster, an IBM Regatta, and an SGI Origin 3800.

Any suggestions?
:cry:
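For what it's worth, the tile arithmetic for the two configurations above can be checked with a small sketch. This assumes a simple block split (the first tiles absorb any remainder); ROMS's actual bounds come from inp_par.F, so treat it as illustrative only:

```python
def tile_widths(npoints, ntiles):
    # Simple block split: the first `extra` tiles get one extra point.
    # Hypothetical scheme for illustration, not the real ROMS decomposition.
    base, extra = divmod(npoints, ntiles)
    return [base + (1 if t < extra else 0) for t in range(ntiles)]

# Small UPWELLING grid: Lm=41 does not divide by NtileI=2,
# so the I-direction tiles are uneven.
print(tile_widths(41, 2))    # [21, 20]
print(tile_widths(80, 4))    # [20, 20, 20, 20]

# Larger grid: both directions divide evenly, so the tiles are uniform.
print(tile_widths(252, 2))   # [126, 126]
print(tile_widths(296, 4))   # [74, 74, 74, 74]
```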

arango
Site Admin
Posts: 1131
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

#4 Post by arango » Tue Aug 30, 2005 4:22 pm

If the problem is with ifort and the kernel, there is not much that we can do. Perhaps you can try other compilers to see if that is the case. You may try an older ifc compiler. An alternative is to install a working version of g95; ROMS works well with g95. I have a version that works very well on my laptop. We also had a version of g95 that worked on our cluster, but we made the mistake of updating it and now it is broken. We have not been able to find one that works yet. The GNU software changes daily.

Good luck :wink:
