Problems running ROMS on an x86-64 system

Discussion on computers, ROMS installation and compiling

Moderators: arango, robertson

wcheng
Posts: 3
Joined: Tue Oct 10, 2006 7:36 pm
Location: UW/JISAO/PMEL

#1 Post by wcheng

Dear all,

I am porting the ROMS code (a version close to the official 3.0, but slightly older)
to an x86-64 system. Compiling went fine. When I ran the executable (using
mpirun -np 4 a.out, for example), it quit shortly after starting. Here is the
end of my log file:

INITIAL: Configurating and initializing forward nonlinear model ...
(everything up to here is standard ROMS output)
rank 1 in job 1 w162_41073 caused collective abort of all ranks
exit status of rank 1: killed by signal 9

The job runs fine if I use only one processor (this may not be very useful information, but
I mention it anyway).

The same code also works fine on a 32-bit Linux cluster with multiple processors.

Are there any special compiler options that need to be turned on for the
code to work on a 64-bit machine, or something else?

I'd appreciate any suggestions. Thanks for your time!

Wei

kate
Posts: 4088
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

#2 Post by kate

You don't say which computer, which compiler, which flags you are using, which submit script. Is this on midnight? If so, my submit script is in:
/wrkdir/kate/GOA_Tides/Model/runmpi

Oh, one thing to watch: Since I run tcsh and use ksh for scripting, I have to set my PrgEnv modules in both my .login and my .profile. The interactive session uses one and the scripts use the other.

wcheng
Posts: 3
Joined: Tue Oct 10, 2006 7:36 pm
Location: UW/JISAO/PMEL

#3 Post by wcheng

Hi Kate,

Thanks for your quick reply! This is about running the ROMS 3.0 beta version on
NOAA FSL's wjet machine, since they are retiring ijet.

The makefile has the following setup:

MPI := on
MPIF90 := mpif90
LARGE := on
FORT ?= mpif90

Supposedly, 'LARGE' is turned on to activate 64-bit compilation. But
./Compiler/Linux-mpif90.mk (I assume this is the relevant makefile for the
mpif90 compiler) does not contain any test for whether LARGE is defined.
Is this perhaps why I am having the problem?
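
One way to check whether any compiler makefile tests LARGE at all is to search every one of them, not just the one assumed to be relevant. A minimal sketch, assuming a POSIX shell and that the compiler makefiles are *.mk files in a single directory; find_make_var is a hypothetical helper name, not part of ROMS:

```shell
#!/bin/sh
# find_make_var DIR [VAR]
# Print "file:line:text" for every line in DIR/*.mk that mentions VAR
# (LARGE by default). Returns non-zero when nothing matches.
# The match is a plain substring match on purpose, so a prefixed or
# otherwise renamed variant of the variable also shows up.
find_make_var() {
    dir=$1
    var=${2:-LARGE}
    # /dev/null is listed as an extra file so that grep always prints
    # the file name, even when DIR holds only one .mk file.
    grep -n -- "$var" "$dir"/*.mk /dev/null
}

# Usage (the directory name is whatever your source tree uses):
#   find_make_var ./Compiler LARGE
```

If the search prints nothing, setting LARGE in the top-level makefile has no effect on the compiler flags, and any 64-bit behaviour comes from the compiler's defaults.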

Wei

kate
Posts: 4088
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

#4 Post by kate

We discussed by phone:

MPIF90 := mpif90
LARGE := on
FORT ?= mpif90

should be:

MPIF90 := on
LARGE :=
FORT ?= <real>

where <real> is the name of the actual compiler. The computer probably has 64-bit compiling on by default. mpif90 is the name of a wrapper script for mpich, not the name of the actual compiler. My guess is that it's PGI, since I had similar trouble with PGI on midnight, with the run dying on the very first MPI broadcast. The guys have since changed something on midnight so that PGI dies elsewhere in the run. I'm afraid I haven't investigated further, since I have two other compilers to torture.
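
Rather than guessing which compiler sits behind mpif90, the wrapper can usually be asked directly. A minimal sketch, assuming a POSIX shell; MPICH-derived wrappers (including MVAPICH) accept -show and Open MPI wrappers accept -showme, and since the MPI family on a given machine is not known in advance, both are tried. show_wrapped_compiler is a hypothetical helper name:

```shell
#!/bin/sh
# show_wrapped_compiler [WRAPPER]
# Print the command line that an MPI compiler wrapper would run, without
# compiling anything. WRAPPER defaults to mpif90.
show_wrapped_compiler() {
    wrapper=${1:-mpif90}
    # Fail early with a clear message if the wrapper is not on the PATH.
    if ! command -v "$wrapper" >/dev/null 2>&1; then
        echo "$wrapper: not found in PATH" >&2
        return 1
    fi
    # Try the MPICH-style option first, then the Open MPI-style option.
    "$wrapper" -show 2>/dev/null || "$wrapper" -showme 2>/dev/null
}

# Usage:
#   show_wrapped_compiler mpif90
```

The first word of the output is the real compiler, which is the name FORT should be set to.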

wcheng
Posts: 3
Joined: Tue Oct 10, 2006 7:36 pm
Location: UW/JISAO/PMEL

#5 Post by wcheng

Hi Kate,

BTW, here is the output from 'module list' on the FSL machine:

[wcheng@wfe1 Compilers]$ module list
Currently Loaded Modulefiles:
1) sge
2) ifort/9.1.036
3) icc/9.1.042
4) idb/9.1.042
5) mkl/8.1.1
6) netcdf/3.6.1_intel-9.1
7) ncarg/4.4.1_intel-9.1
8) intel/9.1
9) sms/2.9.0_ofed-1.2.5.1_mvapich2-1.0
10) totalview/8.0.0-2
11) wjet
12) mvapich/2-1.0.2p1_ofed-1.2.5.1-intel91

I think item 12) is about the compiler. Do you know what it means?

According to its web page, the computer uses the Intel EM64T architecture.

I sent questions to FSL consultants based on your suggestion. Haven't heard back from them yet.

Thanks again!
Wei

kate
Posts: 4088
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

#6 Post by kate

My "module list" shows:

  1) voltairempi-S-1.pathcc   3) PrgEnv.path-3.1
  2) pathscale-3.1            4) totalview-8.4.0-0

Get them to install the new totalview - the 2-D graphics are fixed. Well, if they're broken on that system.

The voltaire thing has to do with the MPI communications. That's probably what your #12 is as well. Each compiler on our system has its own voltaire module, Voltaire being the switch manufacturer.

howard

Different problem running ROMS on an x86-64 system

#7 Post by howard

I am currently setting up ROMS 2.2 on a 64-bit Linux machine.

I was previously running ROMS 2.2 on a PC using Cygwin.

I am able to get ROMS 2.2 to compile using g95 with the NetCDF 3.6.2 distribution. However, when I run it, I get the following errors:

...
  WEST_FSCHAPMAN     Western edge, free-surface, Chapman condition.
  WEST_M2FLATHER     Western edge, 2D momentum, Flather condition.
  WEST_M3RADIATION   Western edge, 3D momentum, radiation condition.
  WEST_TRADIATION    Western edge, tracers, radiation condition.
  ZCLIMATOLOGY       Processing sea surface height climatology data.

 INITIAL: Configurating and initializing forward nonlinear model ...


 OPENCDF - error while reading dimension ID:  11   in input NetCDF file: grid_mbay_dm1_2km_edsm2.nc

 ROMS/TOMS - IO error ................ exit_flag:   4


 Elapsed CPU time (seconds):

 Thread #  0 CPU:       0.110
 Total:                 0.110

 Nonlinear model elapsed time profile:

                                              Total:         0.000    0.0000

 ROMS/TOMS - number of time records written in history  file, Grid 01, 00000000
             number of time records written in restart  file, Grid 01, 00000001

 ERROR: OPENCDF - Can not open NetCDF file.
 REASON: NetCDF: Invalid dimension ID or name

I am not sure what the problem is. I assume it might have something to do with the NetCDF distribution I downloaded. I am using a grid and model setup that I have used on my Windows machine under Cygwin. The grid file is fine, and I can open it in Matlab on this machine.

I downloaded netcdf-3.6.2.tar.gz (binary distribution of netcdf-3.6.2 on linux_2.6-x86_64)
from http://www.unidata.ucar.edu/downloads/n ... /index.jsp

Do I need to recompile the NetCDF libraries? Or is there another reason the model can't open my NetCDF files correctly?
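
A quick way to see which dimension the ID in the error message refers to is to number the dimensions that ncdump reports for the grid file. A minimal sketch, assuming a POSIX shell with awk, that ncdump from the NetCDF installation is on the PATH, and that the ID in the message is the 1-based dimension ID used by NetCDF's Fortran interface (ncdump lists dimensions in ID order). cdl_dims is a hypothetical helper name:

```shell
#!/bin/sh
# cdl_dims
# Read a CDL header (the output of "ncdump -h") on standard input and
# print its dimensions numbered from 1, in the order they are listed.
cdl_dims() {
    awk '
        # The dimensions section starts at the "dimensions:" line.
        /^dimensions:/ { in_dims = 1; next }
        # Any other line that starts in column one ends the section.
        /^[^ \t]/      { in_dims = 0 }
        # Inside the section, number each non-empty line.
        in_dims && NF  { n++; sub(/^[ \t]+/, ""); print n ": " $0 }
    '
}

# Usage:
#   ncdump -h grid_mbay_dm1_2km_edsm2.nc | cdl_dims
```

If the list has fewer than 11 entries, the model is asking for a dimension that the file does not contain, which points at how the file is being read rather than at the file itself.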

Thanks
Susan

arango
Site Admin
Posts: 1351
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

#8 Post by arango

Why are you running ROMS 2.2? I cannot foresee a reason to run a very old version (May 2005) of the code. One of the attractions of ROMS is that it is not stagnant. It is continuously evolving and improving. A lot has changed in three years. Many bugs have been corrected since then.

We get messages like this quite frequently. Perhaps, we need to suppress access to older versions of the code and only give access to the latest version of the code. Old problems appear again and it is confusing for others and frustrating for us.

howard

#9 Post by howard

My intention is to use the latest version of ROMS (3.1) on this new machine. But my experience is with PCs and ROMS 2.2. So, I was trying to do an incremental switchover: first get something I know how to run up and running on my 64-bit Linux machine, and then move to the new version of ROMS, learning as I go.

If all this can be avoided by using a later version of ROMS, then that is great. That is what I need to know.

But there are also issues with collaborations with other people that we need to consider. There are people we work with who use modified forms of the ice shelf in ROMS 2.2, and we need to be able to implement them in ROMS 3.1 before we discontinue use of 2.2. So, we can't totally drop 2.2 yet, but we are trying to.

Susan
