Intel Core 2 Duo processor
- 
				DTokarev
Intel Core 2 Duo processor
Hi all,
I have an Intel Core 2 Duo 3.6 GHz processor (two cores) running Fedora Core 5 with the g95 compiler.
However, ROMS uses only one core.
How can I use both?
I would appreciate any help.
			
			
									
									
ROMS on Dual Core
MPI is useless in this type of configuration, predominantly because of the slow,
shared memory bus; see
http://www.atmos.ucla.edu/~alex/ROMS/po ... isited.pdf
On the other hand, OpenMP works remarkably well: I get perfect scaling and
excellent overall performance. My Core Duo 1.83 GHz laptop yields about 80% of
the performance of a dual-Opteron machine (with single-core CPUs).
Your problem comes from the GCC-based g95 compiler, which does not support OpenMP.
Get the latest Intel compiler and enable the SSE3 instruction set (the latest Intel processors,
including the Pentium 4 Prescott core, Pentium D and Mobile Core Duo, all support it,
but the earlier Northwoods and AMD Opterons do not).
OMP_FLAG = -fpp2 -openmp
CFT = ifort $(OMP_FLAG) -pc80 -tpp7 -axP -xP -msse3 -align dcommon -auto -stack_temp
FFLAGS = -O3 -IPF_fma -ip
You will be impressed.
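Once the executable is built with OpenMP enabled, the thread count still has to be chosen at run time. The lines below are a minimal sketch of one way to do that, assuming the build honours the standard OMP_NUM_THREADS environment variable; some ROMS versions also fix the thread count at compile time (for example through NPP in param.h), so the two may need to agree. The executable and input file names in the last comment are placeholders, not names taken from this thread.

```shell
#!/bin/sh
# Count the online cores; getconf is available on Linux and macOS.
# Fall back to 1 if the query is not supported.
ncores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# OMP_NUM_THREADS is the standard OpenMP variable for the thread count.
OMP_NUM_THREADS=$ncores
export OMP_NUM_THREADS
echo "Using $OMP_NUM_THREADS OpenMP thread(s)"

# Hypothetical run line; executable and input file names are placeholders:
# ./roms < roms.in
```

On a dual-core machine this should report two threads; watching a tool such as top during the run is a quick way to confirm that both cores are busy.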
			
			
									
									
Re: ROMS on Dual Core
Sasha,
shchepet wrote:
OpenMP works remarkably well and I can get perfect scaling,
and overall excellent performance: my CoreDuo 1.83GHz laptop yields about 80%
performance of a dual-Opteron machine (with single cores).
Just curious -- are you running Windows or Linux on this laptop? And would you expect OS-dependent performance differences for ROMS runs? I know that the Intel Fortran compiler for Linux used to be free for academic use, but the Intel Fortran compiler for Windows was not. Is this still the case?
Thanks,
-Rich
Regarding MPI on the Mac, I got this note from Enrique:
Rob,
I can't seem to be able to log on to the forum.
Anyway, I have ROMS compiled on my new Mac Quad PRO using
mpich2 and g95.
e.
____________________________
Enrique Curchitser
Institute of Marine and Coastal Sci.
71 Dudley Rd
New Brunswick, NJ 08901
email: enrique@marine.rutgers.edu
Tel.: +1 (732) 932-7889
So some people *can* do it. I have only tried briefly on my laptop, but since I don't really do production runs on that, I am pleased just to get the thing going.
BTW, the new version of gfortran for the Mac available at http://hpc.sourceforge.net does compile ROMS all the way to roms-3.0 just fine. Previous versions had a compiler bug that would not allow roms-3.0 to compile right.
reply to Rich Signell
I am running both Windows XP (whatever came with the Dell laptop) and Linux:
Mandriva 2007 with the 2.6.17-8mdv kernel. This kernel is SMP-capable: with the
2007 release, Mandrake (now Mandriva) made its "normal" kernels SMP,
thus abolishing the distinction between SMP and non-SMP kernels.
When the machine boots, I have to choose the operating system.
Under Linux everything is straightforward: I use the Intel compiler, version 9.1.034.
Some time ago I tried to make the Linux version of the Intel compiler work under Cygwin
on Windows, but failed. It turns out that Cygwin is mostly an emulator
and does not contain any equivalent of glibc (as a normal Linux system does), and the
Linux version of the Intel compiler relies on glibc. I do not know the current state of
Cygwin: after gaining sufficient experience with Linux (between 50 and 100
installs, including laptops with weird wireless chips and exotic video cards), I lost interest
in it.
I presume that performance is system-dependent, since I easily observe
up to a 20% performance difference when changing kernel versions within the same
Linux distribution.
			
			
									
									
Re: ROMS on Dual Core
shchepet wrote:
MPI is useless in this type of configuration, predominantly because of slow and shared
memory bus...
Unfortunately, the tangent linear and adjoint models are not OpenMP compatible; hence my requirement for MPI (and I sure don't want to run in serial!)...
- 
				lefevre
cluster made with dual-core ...
Hi,
Concerning the dual-core issue: for 2 years we have been running ROMS/AGRIF and WRF on a cluster of dual-Opteron 246 (single-core) nodes with Tyan K8W motherboards, and we are planning to upgrade our CPUs to the Opteron 285 (dual-core).
But to get the best performance on this cluster, and as Sasha shows, we must compile ROMS in hybrid mode: OpenMP + MPI.
I would like to hear any success stories about running ROMS in hybrid mode.
Many thanks.
Jerome Lefevre & Patrick Marchesiello
IRD, New Caledonia
Cluster spec:
Motherboard: Tyan S2882, dual Opteron 246
Nodes: 10, with 25 GB total RAM
Switch: 3Com gigabit
OS: Fedora Core 3 + OSCAR 4.2
Compiler: Intel 9.1
MPI: LAM, MPICH, Open MPI, Intel MPI 3.0
Job scheduling: Torque 2.1 + Maui
			
			
									
									
- 
				RubenDiez-Lazaro
We are currently attempting to run ROMS on a cluster (set up with OSCAR) of 11 nodes, each with two x86_64 dual-core CPUs (4 virtual processors per node).
ROMS_Agrif compiles successfully with the gfortran compiler (using the mpif90 wrapper) with OpenMP and MPI (Open MPI flavor) support, but there are some problems at run time that we are still investigating (it may be a scheduler problem)...
We have a modified jobcomp script to compile it...
Please contact us if anybody wants more info...
Best regards
			
			
									
									
MPI on Intel MacBook Pro
We recently managed to get ROMS 2.2 and the BioToys case study running on an Intel MacBook Pro, also using mpich2 and g95. Here's how to configure it.
ROMS 2.2 / BioToys setup on Intel MacBook Pro
· Install the XCode utilities from the system DVDs.
· Install Fink 0.8.1-Intel from fink.sourceforge.net
· G95 is not available as a binary package, so we had to set Fink to use rsync and then run: fink install g95
· apt-get install netcdf
· Follow the instructions at http://www.myroms.org/users/forum/viewt ... hlight=g95
· This runs ROMS in single-thread mode. To run in parallel on both processors, follow the remaining steps.
· Download the MPICH2 source code.
· Configure, build and install it with:
· CC=gcc CXX=g++ FC=g95 F90=g95 ./configure; make; make install
· Follow the README instructions to get mpd running.
· In the top-level ROMS makefile, set MPI to on, and set mpif90 as your compiler.
· In the Compilers directory, make a copy of Linux-mpif90.mk and call it Darwin-mpif90.mk
· Edit Darwin-mpif90.mk and change the NETCDF_* variables to point to /sw.
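The last two steps (copying Linux-mpif90.mk and repointing the NetCDF variables) could be scripted roughly as follows. This is only a sketch: the variable names NETCDF_INCDIR and NETCDF_LIBDIR and the include/lib subdirectories under /sw are assumptions, so check them against your own copy of the rules file before relying on it. The helper name make_darwin_mk is made up for this example.

```shell
#!/bin/sh
# make_darwin_mk SRC DST PREFIX
# Copy the Linux mpif90 rules file to DST, pointing the NetCDF include and
# library variables at PREFIX. NETCDF_INCDIR and NETCDF_LIBDIR are assumed
# names; adjust the patterns if your file uses different ones.
make_darwin_mk() {
    src=$1; dst=$2; prefix=$3
    # Keep everything up to the assignment operator (=, ?= or :=) and
    # replace only the value that follows it.
    sed -e "s|^\([[:space:]]*NETCDF_INCDIR[[:space:]]*[?:]*=\).*|\1 $prefix/include|" \
        -e "s|^\([[:space:]]*NETCDF_LIBDIR[[:space:]]*[?:]*=\).*|\1 $prefix/lib|" \
        "$src" > "$dst"
}
```

From the top-level ROMS directory, a call might look like `make_darwin_mk Compilers/Linux-mpif90.mk Compilers/Darwin-mpif90.mk /sw`. Writing to a new file rather than editing in place avoids the differences between the GNU and BSD forms of `sed -i`.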
			
			
									
									
Reply to Lefevre
We have several Tyan Thunder K8W (S2885) boards in our lab.
Although frequently advertised as "DUAL CORE READY", this board MAY or MAY NOT
accept dual-core CPUs, depending on its revision number. So before you decide to upgrade
the CPUs, get your flashlight and make sure that you can read the revision number
04MOAb
in the lower left corner of the board; see
http://www.tyan.com.tw/support/assets/i ... _revID.jpg
and
http://www.tyan.com.tw/support/html/cpu ... teron.html
Also make sure to upgrade your BIOS to the latest available version. This is necessary too.
			
			
									
									
- 
				lefevre
Hi,
I have 5 nodes with revision 04MOAb and 5 others with revision MOA_A. As a first step, I would like to upgrade the 04MOAb boxes.
In the past I found some posts about MOA_A, and it is not clear whether this motherboard accepts dual-core CPUs: some people say yes, but the manufacturer says no. Later, I will look for other reports about MOA_A and dual core (HPC, RessourceCluster, LAM and OSCAR forums...).
Like RubenDiez, we can compile ROMS/AGRIF in hybrid mode, but it fails at run time. Perhaps it is the scheduling, I don't know. And yes, I would appreciate more info about your issue, here in this forum.
Sasha, do you have dual-core CPUs in your K8W boxes or not? Have you tried (or do you plan) to run ROMS or WRF in hybrid mode on your cluster?
On these AMD boxes, I see some strange behaviour with LAM and MPICH regarding CPU binding: the processes migrate from cpu1 to cpu2 in a ping-pong manner. With Open MPI or Intel MPI 3.0 this trouble disappears and performance is better. What about you?
Many thanks. Best regards,
Jerome
			
			
									
									
- 
				RubenDiez-Lazaro
Regarding lefevre's comment "[...] i appreciate to get more info about your issue, in this forum [...]", here are our latest experiences with ROMS (roms_agrif flavor) in hybrid mode (OpenMP AND MPI).
As we said, we compiled the code in hybrid mode, but there are problems at run time, mainly these two:
a) At run time, ROMS hangs (never ends): it produces no output and writes no files (except the header part of the history file).
b) At run time, ROMS causes a memory violation.
We ran many tests: changing the compiler and the MPI implementation, disabling optimization flags, and compiling with MPI only, with OpenMP only, with both, and in serial...
In serial and OpenMP-only modes the code runs OK, but in MPI-only and hybrid modes we always get one of the errors listed...
We e-mailed Pierrick Penven and Patrick Marchesiello from the roms_agrif project, and they told us that "roms_agrif is not yet ready for hybrid (openmp and mpi) parallelisation" and "that the hybrid mode is not operational yet"....
Surprised by this, we decided to focus our efforts on the MPI-only mode...
Finally, we found that:
a') the type a) error was caused by certain values of the parameters NSUB_X, NSUB_E and NPP (in the param.h file)... With NSUB_X=1; NSUB_E=1; NPP=1 the program runs OK. Any other values make the program hang.
b') the type b) error was caused by running the program in hybrid mode, but sometimes error a') shows up first...
The next questions we must answer (perhaps someone can help) are:
1) Do the Rutgers or UCLA versions of ROMS run in hybrid mode?
2) What exactly do the NSUB_X, NSUB_E and NPP parameters mean? NP_XI and NP_ETA describe the subdomains in parallel (DISTRIBUTED memory, using MPI) mode, but the code comments say that NSUB_X and NSUB_E are parameters related to SHARED memory (and we are in MPI-only mode....).
Best regards
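For the MPI-only case discussed above, a quick consistency check before launching can rule out one common source of trouble. The sketch below assumes that the number of MPI processes requested at launch has to equal NP_XI*NP_ETA as set in param.h; that relation is an assumption here and should be confirmed in your own source tree. The function name check_decomp and the grid-size arguments are invented for the illustration.

```shell
#!/bin/sh
# check_decomp NP_XI NP_ETA NPROCS NX NY
# Prints the tile count and an upper bound on the tile size, or fails when
# NP_XI*NP_ETA differs from the number of MPI processes requested.
check_decomp() {
    np_xi=$1; np_eta=$2; nprocs=$3; nx=$4; ny=$5
    tiles=$((np_xi * np_eta))
    if [ "$tiles" -ne "$nprocs" ]; then
        echo "mismatch: NP_XI*NP_ETA=$tiles but $nprocs MPI processes requested" >&2
        return 1
    fi
    # Ceiling division: tiles at the domain edge may be smaller than this.
    echo "tiles=$tiles tile_size=$(( (nx + np_xi - 1) / np_xi ))x$(( (ny + np_eta - 1) / np_eta ))"
}
```

For example, `check_decomp 2 2 4 100 60` reports four tiles of at most 50x30 points, while `check_decomp 2 2 3 100 60` fails because three processes cannot cover a 2x2 decomposition.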
			
			
									
									
- 
				lefevre
Re: Intel Core 2 Duo processor
The same guy, three years later...
About MPI + OpenMP:
Like RubenDiez-Lazaro, I have dropped the OpenMP+MPI idea to focus on MPI on our cluster of ten nodes, each 2-way with dual-core CPUs. But I did some tests when I was young, and I remember that my issues when trying Open MPI + OpenMP were very similar to those described by RubenDiez-Lazaro. In addition, I remember that the big issue was MPI_THREAD_MULTIPLE support in MPI-2, which was not implemented, or only partially, in Open MPI at that time (2007).
Since then, maybe someone has a clue about MPI+OpenMP with roms-agrif or Rutgers ROMS?
About heavy multicore:
After buying a new compute node, I am now experimenting with a multicore server: a 4-way Supermicro with four 8-core Opteron 5134 CPUs (32 cores in total). I'm doing some checks with ifort 11 + Open MPI 1.3.3, but I get poor performance with 32 MPI tasks running on the 32-core node compared with 32 MPI tasks equally distributed across 8 compute nodes with 4 cores each. This may be explained by cache coherency and bus oversharing. Using shared memory with OpenMP + ifort 11, I got poor performance too. So I'm looking for some clues about running ROMS on multicore hardware. According to http://developer.amd.com/documentation/ ... pen64.aspx, the Open64 compiler from AMD seems to improve OpenMP-enabled code on Opteron multicore hardware. I will try that.
About Tyan Thunder K8W (S2885) and the compatibility issue with dual-core (Sasha):
In the end, like the 04MOAb boxes, my old MOA_A boxes are happy with the Opteron 285, so don't throw away your old motherboards: mine have been running smoothly for 5 years, after 2 successive upgrades, from Opteron 246 to 285 and from Ethernet to InfiniBand.
Many thanks
Jerôme Lefevre
Noumea, New Caledonia
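A hybrid code normally asks for a thread level when it calls MPI_Init_thread, so it can help to see how the installed MPI library was built before debugging the model itself. The sketch below assumes Open MPI, whose ompi_info tool reports the library's configuration; the exact wording of the thread line differs between versions, so the script only searches for it and makes no claim about its format. The function name mpi_thread_support is made up for this example.

```shell
#!/bin/sh
# Report how the local Open MPI build was configured for threads.
# ompi_info ships with Open MPI; its output format varies by version,
# so we only look for a line mentioning thread support.
mpi_thread_support() {
    if command -v ompi_info >/dev/null 2>&1; then
        ompi_info | grep -i 'thread support' || echo "no thread-support line found"
    else
        echo "ompi_info not found (is Open MPI installed and on PATH?)"
    fi
}

mpi_thread_support
```

If the reported level is lower than what the hybrid build requests, run-time hangs or crashes in MPI calls made from threaded regions would not be surprising.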
			
			
									
									
						

