Sorry if this is "obvious" stuff, but I've trawled the forums and am having a hard time putting the pieces together. If someone can answer a few things more or less clearly, it would help me tremendously.
Basic context: I'm a sysadmin setting up a cluster to run ROMS 2.2 (possibly 3.0 at a later time). The cluster is two-way dual-core Opteron nodes with a gigabit-Ethernet interconnect, running the Rocks (RedHat/CentOS) platform, and I'm using the PGI compilers to build (Intel/ifort seemed too hellish for MPI after my first attempts).
I've got a "successful" (ie, it works but performance stinks) build of Roms thus,
-PGI compiler suite (latest and greatest version)
-MPICH for MPI, default config/install - compiled with PGI "by hand"
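For reference, the MPICH build went roughly along the lines below (a sketch from memory; exact configure options differ between MPICH versions, and the install prefix is just an example):

    # sketch: building MPICH with the PGI compilers (option names vary by MPICH version)
    export CC=pgcc
    export F77=pgf77
    export F90=pgf90
    ./configure --prefix=/opt/mpich-pgi
    make
    make install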
"performance stinks" means that as we add more CPUs the overall runtime *increases*. Thus, a 4-cpu job (single node) takes 30minutes with a test data set; then the same data set on 8-CPU run takes approx 60 minutes, and then 16 CPU it takes about 80-90minutes. In all cases they run as straight MPI-only job though, launched in identical manner. (Brief review of output suggests that "Halo exchange" is punishing us with the MPI scale-up? and also that 2d analysis phase in particular is suffering .. ? but alas I'm not really familiar with this, being a "sysadmin-type", not a "modeller-type")
I'm curious,
* What is the "recommended MPI for best performance" with roms 2.2? Roms 3.0? (one posting I've seen suggests that LAM is much better than MPICH ; now it seems LAM is replaced by "OpenMPI" though? and I see no mention of it anywhere.. and also I gather PGI may have available an integrated "tuned" MPI of some kind too? (not just a MPICH rebuild?) )
* Is there any option (now? in the future?) for "hybrid" builds, i.e., OpenMP for SMP operation within a single SMP cluster node, but with the MPI job spanning multiple nodes in the cluster? (See the launch sketch just after this list for what I mean.) I've seen on-and-off discussion of this topic in the forum but haven't exactly seen a clear consensus.
* If anyone feels so inclined, pointers or specific build hints for a recommended, known-working MPI/PGI build setup would certainly be **VERY** welcome as well. For that matter, comments on the possible benefits of migrating successfully from ROMS 2.2 to 3.x would also not be unwelcome.
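On the hybrid question above, what I have in mind is roughly the following (purely illustrative; I don't know whether ROMS supports running this way, and the mpirun options differ between MPI implementations):

    # hypothetical hybrid launch: 4 nodes, 1 MPI rank per node, 4 OpenMP threads per rank
    export OMP_NUM_THREADS=4
    mpirun -np 4 -machinefile nodes.txt ./oceanM ocean.in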

(I've tried, for example, to build my ROMS not just with MPICH, but also with LAM and OpenMPI. The OpenMPI build attempts have simply failed so far with odd link/library issues(?). My attempt to build with LAM has been semi-successful, in that I believe I now have a compiled binary that launches, but I'm not certain it actually works; more testing is needed on calling/launching it properly, and it's a slight hassle since the cluster uses "SSH, no passwords needed" rather than RSH, which is LAM's default. Ugh.)
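In case it helps anyone diagnose (or correct) what I'm doing: the OpenMPI attempt was configured roughly as below (a sketch; the prefix is just an example), and for the LAM/ssh issue I gather LAM can be pointed at ssh via the LAMRSH environment variable, which I still need to test properly:

    # OpenMPI with the PGI compilers (sketch of what I tried)
    ./configure CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 --prefix=/opt/openmpi-pgi
    make all install

    # LAM over ssh instead of rsh (LAMRSH usage is my reading of the LAM docs)
    export LAMRSH="ssh -x"
    lamboot -v hostfile
    mpirun -np 8 ./oceanM ocean.in
    lamhalt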
If anyone actually recommends it, I can also do builds using ifort, not just PGI, but I gather that ifort is a bit messy for building MPI ROMS(?)
And, of course, I will summarize and post back to this thread any findings/progress I have on this topic, in case it is of use or interest to others.
Many thanks,
--Tim Chipman