Ocean Modeling Discussion



All times are UTC

Posted: Mon Jun 27, 2005 8:36 pm

Joined: Mon Apr 28, 2003 5:12 pm
Posts: 157
Location: NOAA
Further to my earlier email, I have found that when compiling and running ROMS/TOMS 2.1 with ifort there appears to be an array size limitation. For example, the UPWELLING model problem runs successfully with the default array sizes of Lm=41, Mm=80, N=16 (in mod_param.F). If, however, I increase these parameters to Lm=252, Mm=296, N=30 (the size of one of my ROMS/TOMS applications), I get a "Segmentation fault" error at run time. It occurs in the pre_step3d_tile routine, specifically at the statement:

real(r8), dimension(PRIVATE_2D_SCRATCH_ARRAY,0:N(ng)) :: swdk

I did not get such an error with ifc, with which I have successfully compiled and run my application with Lm=252, Mm=296, N=30.

Therefore, there appears to be a problem when using ifort. I compile and run on a Dell machine using Red Hat Linux (Fedora 2.0) and the ifort 8.1 compiler. I am sure that my machine can handle ROMS/TOMS applications this large, if not larger, without any problems.

Does anybody know why this happens and a possible solution? I wonder whether others are able to compile (with ifort) and run the ROMS/TOMS UPWELLING model problem with Lm=252, Mm=296, N=30? Please let me know what you think.

Thank you.

 Post subject: Solutions to the problem
Posted: Mon Oct 24, 2005 2:20 pm

Joined: Fri Sep 17, 2004 2:22 pm
Posts: 74
Location: Institut Rudjer Boskovic
Dear Lanerolle,

I encountered exactly the same problem and managed to reduce the bug to
the following minimal program:

      PROGRAM stack_test
      CALL gls_corstep_tile(1044574)
      END PROGRAM stack_test

      SUBROUTINE gls_corstep_tile(Jend)
      integer, intent(in) :: Jend
      integer, parameter :: r8 = selected_real_kind(12,300)
      ! Automatic array: its size is only known at run time
      real(r8), dimension(Jend) :: var1
      Print *, 'The program has finished'
      END SUBROUTINE gls_corstep_tile

The bug happens with ifort 9.0 and ifort 8.1 but not with ifc 7.1.
It does not happen if 1044574 is replaced by a smaller
value, and it does not happen if Jend is replaced by its
value 1044574 in the declaration of the variable var1.

When I asked on the Intel Fortran compiler forums, I was told
that the problem is the stack size. The proposed
solution consists of replacing the line
real(r8), dimension(Jend) :: var1

with the following pair of lines
real(r8), allocatable, dimension(:) :: var1
allocate (var1(Jend))

Another solution, which avoids the corresponding
huge code rewrite, consists of putting this line in one's .zshrc:
unlimit stacksize
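
To see why this particular array size is the tipping point, one can compare its memory footprint with the shell's stack limit. The sketch below is only an illustration: array_kib is a hypothetical helper name, and it assumes that selected_real_kind(12,300) maps to an 8-byte real, which is usual but not guaranteed.

```shell
# Size in KiB (rounded up) of an array with $1 elements of $2 bytes each.
# array_kib is a hypothetical helper, not part of ROMS or any compiler.
array_kib() {
    echo $(( ($1 * $2 + 1023) / 1024 ))
}

# The test array: 1044574 elements, assumed to be 8 bytes each.
need=$(array_kib 1044574 8)

# Soft stack limit of the current shell, in KiB, or the word "unlimited".
limit=$(ulimit -S -s)

echo "array needs ${need} KiB, soft stack limit is ${limit}"
```

Under that assumption the array alone needs 8161 KiB. With a stack limit of 8192 KiB, a common Linux default, that leaves only about 31 KiB for everything else on the stack, which would be consistent with the crash appearing at this size and not at smaller ones.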

Posted: Mon Oct 31, 2005 7:07 pm

Joined: Tue Jul 01, 2003 4:12 am
Posts: 515
Location: NIWA
As Mathieu has pointed out, Intel Fortran is a heavy user of stack space. Short of re-coding ROMS (which I advise against), there are two things you can do (and I suggest doing both):
    Increase the stack space limit imposed by the shell or OS. Details of how to do this differ between systems, but under the bash shell on Linux the command is "ulimit -S -s 65536". This raises the stack size limit from the default of 8192 KiB (8 MiB) to 65536 KiB (64 MiB). The "-S" switch means that this sets the soft limit, which can be changed again (up to the hard limit) in subsequent calls to ulimit. Run this command before starting ROMS; once you're happy with it, put it in a startup script.

    Increase the values of the ROMS input variables NtileI and NtileJ. Automatic arrays (the ones declared inside subroutines and allocated memory automatically at run time) tend to have dimensions proportional to the tile size, so reducing the tile size reduces demands on stack space. It also tends to speed ROMS up. On Intel CPUs ROMS tends to run fastest when NtileJ > NtileI, i.e. when the tiles are wide.
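
To get a feel for how much the tiling matters, here is a rough estimate of the stack needed by a single 3-D scratch array such as swdk. Treat it as a sketch under stated assumptions: tile_scratch_kib is a hypothetical helper, reals are taken to be 8 bytes, and the halo of 3 points on each side of the tile is an assumed value for PRIVATE_2D_SCRATCH_ARRAY that should be checked against your own ROMS headers.

```shell
# Estimate the size in KiB (rounded up) of one 3-D scratch array for a tile.
# Arguments: Lm Mm N NtileI NtileJ
# Assumptions: 8-byte reals, a halo of 3 points on each side of the tile
# (assumed value, check your ROMS version), vertical index 0:N.
tile_scratch_kib() {
    halo=3
    ni=$(( ($1 + $4 - 1) / $4 + 2 * halo ))   # tile extent in I plus halo
    nj=$(( ($2 + $5 - 1) / $5 + 2 * halo ))   # tile extent in J plus halo
    nk=$(( $3 + 1 ))                          # levels 0..N
    echo $(( (ni * nj * nk * 8 + 1023) / 1024 ))
}

tile_scratch_kib 252 296 30 1 1   # whole domain as a single tile
tile_scratch_kib 252 296 30 2 4   # NtileI=2, NtileJ=4
```

Under those assumptions the single-tile case comes to 18871 KiB for one array, well above the 8192 KiB default, while the 2 x 4 tiling needs 2558 KiB. A routine with several such automatic arrays needs a multiple of this, so compare the total against the output of "ulimit -S -s" rather than a single array.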

Posted: Mon Nov 07, 2005 6:39 pm

Joined: Fri Nov 14, 2003 4:57 pm
Posts: 185
Dear Mark, and everybody,

The problem, as already pointed out, is related to the stack size limit and may
be fixed in many cases if one can increase the stack size or remove the limit altogether.
This, however, is not as innocent as it might sound.

In older days (ROMS 1.9 and earlier) scratch arrays were pre-allocated in
[THREADPRIVATE] common blocks and were passed as arguments into physical
routines. This eliminates the use of automatic arrays completely, and excludes situations
like this. It also saves some time because allocation and deallocation actually take some
time and, depending on the compiler and operating system, may cause noticeable performance
degradation.

In 2.x codes this mechanism was abandoned for reasons unknown.

Starting with version 8.0, Intel compilers use a different mechanism for handling
allocation of automatic arrays, resulting in better performance but at the same time
introducing limitations. This cannot be illustrated using 2.x codes without a significant
rewrite, but can be shown easily using 1.9 codes: the scratch arrays are passed as
arguments; however, this is mainly for optimization purposes and is not necessary from
a mathematical point of view. You can comment out these arguments in almost
all cases and the model produces exactly the same result, but the arrays are now
automatic. If you use the 7.1 Intel compiler on a 2.4.x Linux kernel, you may observe the
code running 30% slower than when the arrays were passed as arguments.
If you switch to the 8.1 compiler, the performance degradation is significantly smaller,
to the point of not being noticeable at all.
