Problem with model start

General scientific issues regarding ROMS

Moderators: arango, robertson

susonic
Posts: 160
Joined: Tue Aug 21, 2007 5:44 pm
Location: Jeju National University

Problem with model start

#1 Post by susonic » Wed Jul 20, 2011 1:02 pm

Dear ROMS users,

I encounter this problem only in one specific case.

I ran the same domain with three different grid dimensions on a new PC (quad core, 16 GB RAM, SUSE 11.2, NetCDF 3.6.3, PGI Fortran, mpich2-1.2.1, ROMS svn 564).

case 1 dimension 88*96*20
case 2 dimension 177*194*20
case 3 dimension 292*332*20

The problem occurs only in case 2. ROMS fails to start and shows a memory error message.
At first, I thought the odd grid dimensions in case 2 were the problem.
But all three cases run on my other PC cluster without any problem.

I tried running the model without writing output (no rst, his, or avg files).
If I don't write anything to the rst, his, and avg files, the model runs.
Therefore, the cause may be related to writing the NetCDF files.

I tried serial, OpenMP, and mpirun, but none of them worked. The error messages are below.

In a serial and OpenMP run:

STEP Day HH:MM:SS KINETIC_ENRG POTEN_ENRG TOTAL_ENRG NET_VOLUME
C => (i,j,k) Cu Cv Cw Max Speed

0 0 00:00:00 2.777112E-02 1.172568E+04 1.172571E+04 5.212383E+14
(153,001,16) 6.828001E-02 5.502242E-03 0.000000E+00 1.977955E+00
DEF_HIS - creating history file: output/his_ecsy12_td_2011_kkl_v2_tpx6_dye_novol_a4_wt1_0001.nc
0: ALLOCATE: 18446744073709551615 bytes requested; not enough memory

In mpirun:

0 0 00:00:00 2.777112E-02 1.172568E+04 1.172571E+04 5.212383E+14
(153,001,16) 6.828001E-02 5.502242E-03 0.000000E+00 1.977955E+00
DEF_HIS - creating history file: output/his_ecsy12_td_2011_kkl_v2_tpx6_dye_novol_a4_wt1_0001.nc
rank 0 in job 3 pang4_49117 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

Any suggestion or idea would be great.

-JH

kate
Posts: 3780
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Problem with model start

#2 Post by kate » Wed Jul 20, 2011 8:03 pm

Have you tried compiling with array bounds checking turned on? I'd run in a debugger to see exactly where things go bad.
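
The byte count in the ALLOCATE message is itself a hint that a size went wrong rather than that the machine is short of RAM: 18446744073709551615 is 2^64 - 1, which is what -1 looks like when read as an unsigned 64-bit integer. That suggests, but does not prove, that a negative or uninitialized size reached the allocator. A minimal Python check of the arithmetic (the helper name as_signed64 is made up for this illustration):

```python
# Byte count copied from the ALLOCATE message in the log.
requested = 18446744073709551615

# 2**64 - 1 is the largest value an unsigned 64-bit integer can hold.
print(requested == 2**64 - 1)


def as_signed64(n: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed two's-complement integer."""
    return n - 2**64 if n >= 2**63 else n


# The same bit pattern read as a signed integer.
print(as_signed64(requested))
```

A size like this is a good reason to rebuild with bounds checking before looking at memory limits.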

Re: Problem with model start

#3 Post by susonic » Thu Jul 21, 2011 1:18 am

Hi Kate,

Thank you for your reply. With the debugger, I got the message below:

Model Input Parameters: ROMS/TOMS version 3.5
Thursday - July 21, 2011 - 10:13:41 AM
-----------------------------------------------------------------------------
0: Subscript out of range for array s%files (inp_par.f90: 3063)
subscript=1, lower bound=1, upper bound=0, dimension=1

Re: Problem with model start

#4 Post by kate » Thu Jul 21, 2011 5:20 pm

OK, so what exactly is at line 3063 in your version of inp_par.f90? Remember, that's the output of the cpp operation and will be different depending on your options. Is it in routine load_s1d or load_s2d? Who called it? Why is load set to .FALSE.? What line of input is it responding to?

Re: Problem with model start

#5 Post by susonic » Fri Jul 22, 2011 1:26 pm

Thank you for your reply, Kate.

inp_par.f90 has a "Load I/O information" section:

3007 !-----------------------------------------------------------------------
3008 ! Load I/O information into structure.
3009 !-----------------------------------------------------------------------

3044 ! Initialize and load fields into structure.
3045 !
3046       k=0
3047       DO ng=1,Ngrids
3048         DO i=1,idim
3049           S(i,ng)%Nfiles=Nfiles(i,ng)          ! number of multi-files
3050           S(i,ng)%Fcount=1                     ! multi-file counter
3051           S(i,ng)%Rindex=0                     ! time index
3052           S(i,ng)%ncid=-1                      ! closed NetCDF state
3053           S(i,ng)%Vid=-1                       ! NetCDF variables IDs
3054           S(i,ng)%Tid=-1                       ! NetCDF tracers IDs
3055           DO j=1,Nfiles(i,ng)
3056             k=k+1
3057             S(i,ng)%files(j)=TRIM(Fname(k))    ! load multi-files
3058             S(i,ng)%Nrec(j)=0                  ! record counter
3059             S(i,ng)%time_min(j)=0.0_r8         ! starting time
3060             S(i,ng)%time_max(j)=0.0_r8         ! ending time
3061           END DO
3062           S(i,ng)%label=TRIM(label)            ! structure label
3063           S(i,ng)%name=TRIM(S(i,ng)%files(1))  ! load first file
3064           lstr=LEN_TRIM(S(i,ng)%name)
3065           S(i,ng)%base=S(i,ng)%name(1:lstr-3)  ! do not include ".nc"
3066         END DO

The error occurred at line 3063, where ROMS is trying to load the first file.
But it failed to load it for some reason. What do you recommend I do next?
Sorry, I'm so naive about this type of error.

-JH

Re: Problem with model start

#6 Post by kate » Fri Jul 22, 2011 4:59 pm

You are in routine load_s2d, which is only called for the forcing files. You need to set a breakpoint at the top of this routine and watch how it's running, watch the value of the load variable. Above that chunk of code you show, there's an allocate on a bunch of fields in the S structure if load is .TRUE..
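
One way to read the bounds-check message (lower bound=1, upper bound=0) is that files holds no elements for that entry, so the inner loop at line 3055 runs zero times and line 3063 then reads files(1) anyway. The sketch below is only a Python analogue of that control flow, not ROMS code. The function load_entry and the file name frc_a.nc are made up for illustration, and the allocation step is an assumption about what the allocate does.

```python
def load_entry(nfiles, fnames):
    """Python analogue of the quoted loop body for a single S(i,ng) entry."""
    # Assumed equivalent of allocating S(i,ng)%files with Nfiles(i,ng) elements.
    files = [""] * nfiles
    # DO j=1,Nfiles(i,ng): executes zero times when nfiles == 0.
    for j in range(nfiles):
        files[j] = fnames[j].strip()
    # S(i,ng)%files(1) is read unconditionally, even if nothing was loaded.
    return files[0]


print(load_entry(1, ["frc_a.nc "]))

try:
    load_entry(0, [])
except IndexError as err:
    # Python's counterpart of "Subscript out of range".
    print("out of range:", err)
```

Watching Nfiles(i,ng) and the load flag at a breakpoint should show whether this is what happens in your run.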

Maybe you need to double check what you have in the input file for the forcing filenames.
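
A quick way to do that check outside the model is to count the names listed against the declared number. The sketch below assumes the input file uses the NFFILES and FRCNAME keywords with "==" assignment, "!" comments, and a trailing backslash for continuation, as in the standard ROMS ocean.in; it does not handle any other continuation or multi-file syntax your version may accept. The helper names and the sample file names are hypothetical.

```python
import re


def strip_comment(line):
    """Drop everything after a '!' comment marker."""
    return line.split("!", 1)[0].rstrip()


def read_keyword(text, key):
    """Return the value of 'key == value', joining backslash-continued lines."""
    lines = [strip_comment(raw) for raw in text.splitlines()]
    pattern = r"\s*%s\s*==?\s*(.*)$" % re.escape(key)
    for idx, line in enumerate(lines):
        match = re.match(pattern, line)
        if not match:
            continue
        value = match.group(1).strip()
        while value.endswith("\\") and idx + 1 < len(lines):
            idx += 1
            value = value[:-1].strip() + " " + lines[idx].strip()
        return value.rstrip("\\").strip()
    return None


def check_forcing(text):
    """Return (declared count, list of forcing file names)."""
    count = read_keyword(text, "NFFILES")
    names = read_keyword(text, "FRCNAME")
    if count is None or names is None:
        raise ValueError("NFFILES or FRCNAME not found")
    return int(count), names.split()


# Hypothetical input fragment; '\\' in this Python string is one backslash.
sample = """
     NFFILES == 3                 ! declared number of forcing files
     FRCNAME == frc_wind.nc \\
                frc_heat.nc       ! only two names actually listed
"""

declared, names = check_forcing(sample)
if declared != len(names):
    print("mismatch:", declared, "declared,", len(names), "listed")
```

For a real run, read the text from your own input file instead of the sample string; checking that each listed file exists on disk would be a natural extension.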

Re: Problem with model start

#7 Post by susonic » Fri Jul 22, 2011 5:17 pm

Thank you for your reply, Kate.

Yes, you are exactly right. The problem occurred because the number of forcing files I specified did not match what actually exists. My mistake.
Now I've learned more about how to track down a solution from an error message.

I appreciate your guidance, Kate. Thank you very much.

Regards,

-JH
