ice code hanging / freezing ...

Discussion about modeling ice with ROMS

Moderators: arango, robertson

turuncu
Posts: 128
Joined: Tue Feb 01, 2005 8:21 pm
Location: Istanbul Technical University (ITU)

#1 Post by turuncu » Tue May 01, 2012 11:22 am

Hi,

I am running the ice code, but after a certain time (3 months of simulation) the model stops advancing and freezes. It waits without any error message. I put some print statements into the code to find the place that causes the problem. In main3d, the code hangs just after calling the ice_frazil subroutine. So I put additional print statements into ice_frazil and found that the code is waiting in the following part,

Code:

# ifdef DISTRIBUTE
      write(*,*) 'part 3 - ', ng, tile
      CALL mp_exchange2d (ng, tile, iNLM, 1,                            &
     &                    LBi, UBi, LBj, UBj,                           &
     &                    NghostPoints, EWperiodic(ng), NSperiodic(ng), &
     &                    wfr)
# endif
!
!  Apply periodic boundary conditions.
!


I think there is a problem in this call, but I could not find the solution. After restarting the model from the last restart point (just before the hang), the code runs without any problem, and after another 3 months of simulation it hangs again.

This is very suspicious because it always hangs at the same interval (after 90-92 days of simulation). I thought it could be related to the buffer size, and to test this I ran the model with the following MPI options (by the way, I am using Open MPI 1.5.3 compiled with Intel compiler 12.0.4),

--mca btl_tcp_sndbuf 524288 --mca btl_tcp_rcvbuf 524288

but it did not work. I also tried some other MPI options, like

--mca btl openib,self,sm --mca mpi_leave_pinned 1

but without any success. Maybe the process is waiting for a message from other processes that are not going to send one. I am not sure. So I just want to know: have you ever seen this kind of problem in the ice code before? Is there anything wrong in the ice_frazil subroutine?
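
A complementary way to see where each MPI rank is blocked, without adding print statements, is to dump a stack trace from every running process. The sketch below only prints the gdb commands rather than running them. It assumes gdb is available on the compute node, that the executable is named oceanM, and that you are allowed to attach to your own processes. The helper names backtrace_cmd and dump_all are hypothetical.

```shell
#!/bin/sh
# Sketch only: print gdb commands that would dump stack traces of hung ranks.
# Assumptions: gdb is installed and attaching to your own processes is allowed.

# Build the gdb command line for one process ID (hypothetical helper).
# -p attaches to the PID, -batch exits after the commands have run,
# and "thread apply all bt" prints a backtrace for every thread.
backtrace_cmd() {
    printf 'gdb -p %s -batch -ex "thread apply all bt"\n' "$1"
}

# Print one gdb command per PID given as an argument (hypothetical helper).
dump_all() {
    for pid in "$@"; do
        backtrace_cmd "$pid"
    done
}
```

On a node where the model is hung, something like dump_all $(pgrep oceanM) | sh > backtraces.txt would collect the traces. If some ranks sit in one MPI call while others sit in a different one, that points to a communication mismatch rather than a problem in the physics.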

PS: I am using a snapshot of the ice code dated 20-03-2012.

Best regards,

--ufuk

kate
Posts: 3678
Joined: Wed Jul 02, 2003 5:29 pm
Location: IMS/UAF, USA

#2 Post by kate » Tue May 01, 2012 9:04 pm

I have not ever seen that. However, I recently changed the code to take the communications out of ice_frazil by calling ice_frazil from step3d_t instead (before the communications there). Just today I got the update pushed out to svn, so see if that behaves differently.

#3 Post by turuncu » Wed May 02, 2012 7:26 am

Thanks Kate. Is it in the git repository or somewhere else? Anyway, I will try the newer version and let you know.

regards,

--ufuk

#4 Post by kate » Wed May 02, 2012 4:35 pm

My code is available on a branch svn site at myroms.org and also at github. I try to update them both together.

wilkin
Posts: 503
Joined: Mon Apr 28, 2003 5:44 pm
Location: Rutgers University

#5 Post by wilkin » Fri May 04, 2012 2:45 am

The ice code stops and freezes! :shock:

But isn't it supposed to do that? :-)
John Wilkin: DMCS Rutgers University
71 Dudley Rd, New Brunswick, NJ 08901-8521, USA. ph: 609-630-0559 jwilkin@rutgers.edu

muenchow
Posts: 4
Joined: Wed Sep 24, 2008 8:49 pm
Location: University of Delaware

#6 Post by muenchow » Sat May 05, 2012 2:24 am

wilkin wrote: The ice code stops and freezes! :shock:

But isn't it supposed to do that? :-)

Wanna come into the Arctic this August, John, and see for yourself at sea how the ocean freezes while the earth, ocean, and the ice on it keep spinning? Let me know ASAP, as security clearances are necessary 8)

rduran
Posts: 139
Joined: Fri Jan 08, 2010 7:22 pm
Location: Theiss Research

#7 Post by rduran » Sat May 05, 2012 4:27 am

That's what happens in realistic simulations of ice!

#8 Post by turuncu » Sat May 05, 2012 7:53 am

Hi,

I think I found the problem. It is related to MPI itself. After the model hung, I attached TotalView to one of the processes to see where it was stuck. The model always hangs in the mp_distributeXXX calls. So it is obvious that the problem is triggered by some limitation of MPI. As mentioned, I am using Open MPI (1.5.3) compiled with the Intel compiler (12.0.4). I then tried running the model with the following options,

Code:

mpirun --mca btl_openib_eager_limit 65000 --mca btl_sm_eager_limit 1000000 ./oceanM cas.in > roms.txt

and now it is not hanging. Tuning the btl_openib_eager_limit and btl_sm_eager_limit parameters works in this case. I just wonder whether you have had any similar experience with ROMS. Anyway, thanks to everybody.
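
The same run can be sketched as a small wrapper that keeps the two eager limits in variables, which makes it easier to try other values. This is only an illustration: the variable names and the roms_cmd helper are hypothetical, the limits are the ones quoted above, and suitable values will depend on the interconnect and the Open MPI build. With Open MPI 1.5, the values currently in effect can be listed with ompi_info --param btl openib and ompi_info --param btl sm.

```shell
#!/bin/sh
# Sketch only: build the mpirun command line with tunable eager limits.
# Values are in bytes and are taken from the post above; they are not
# general recommendations.
OPENIB_EAGER_LIMIT=65000
SM_EAGER_LIMIT=1000000

# Print the mpirun command for a given executable and input file
# (hypothetical helper; $1 = executable, $2 = ROMS input file).
roms_cmd() {
    printf 'mpirun --mca btl_openib_eager_limit %s --mca btl_sm_eager_limit %s %s %s\n' \
        "$OPENIB_EAGER_LIMIT" "$SM_EAGER_LIMIT" "$1" "$2"
}

# Dry run: show the command instead of executing it.
roms_cmd ./oceanM cas.in
```

Printing the command first makes it easy to check the options before launching. To actually start the model, the output can be piped to a shell, for example roms_cmd ./oceanM cas.in | sh > roms.txt.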

Regards,

--ufuk
