Ocean Modeling Discussion

ROMS/TOMS





Post new topic Reply to topic  [ 8 posts ] 

All times are UTC

Author Message
PostPosted: Tue May 01, 2012 11:22 am 

Joined: Tue Feb 01, 2005 8:21 pm
Posts: 126
Location: Istanbul Technical University (ITU)
Hi,

I am running the ice code, but after a certain time (3 months of simulation) the model stops running and freezes. It just waits, without any error messages. I put some print statements into the code to find the place that causes the problem. In main3d the code hangs just after calling the ice_frazil subroutine. So I put additional print statements into the ice_frazil subroutine and found that the code waits in the following part:

Code:
# ifdef DISTRIBUTE
      write(*,*) 'part 3 - ', ng, tile
      CALL mp_exchange2d (ng, tile, iNLM, 1,                            &
     &                    LBi, UBi, LBj, UBj,                           &
     &                    NghostPoints, EWperiodic(ng), NSperiodic(ng), &
     &                    wfr)
# endif
!
!  Apply periodic boundary conditions.
!


I think there is a problem in this call, but I could not find the solution. After restarting the model from the last point (just before the hang), the code works without any problem, and after 3 months of simulation it hangs again.

This is very suspicious because it always hangs at the same interval (after 90-92 days of simulation). I thought it could be related to the buffer size, and to test that I ran the model with the following MPI options (by the way, I am using Open MPI 1.5.3 compiled with the Intel compiler 12.0.4):

--mca btl_tcp_sndbuf 524288 --mca btl_tcp_rcvbuf 524288

but it did not work. I also tried some other MPI options, such as

--mca btl openib,self,sm --mca mpi_leave_pinned 1

but without success. Maybe the process is waiting for a message from other processes that are not supposed to send one. I am not sure. So I just want to know: have you ever seen this kind of problem in the ice code before? Is there anything wrong in the ice_frazil subroutine?
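For readers chasing a similar hang: instead of adding print statements, one option is to ask a debugger where each rank is blocked. The sketch below assumes gdb is installed on the compute node (the thread itself later uses TotalView, not gdb) and that the hung executable is named oceanM as in this thread. The function name dump_stacks and the DRY_RUN switch are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: dump the stack of every thread in each PID given.
# With DRY_RUN=1 (the default here) it only prints the gdb commands, so the
# sketch can be tried on a machine without gdb or a hung job.
dump_stacks() {
    for pid in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "gdb -p $pid -batch -ex 'thread apply all bt'"
        else
            # Attach, print backtraces of all threads, detach; one file per PID.
            gdb -p "$pid" -batch -ex 'thread apply all bt' > "stack.$pid.txt" 2>&1
        fi
    done
}
```

On a node where the model has hung, something like `DRY_RUN=0 dump_stacks $(pgrep -x oceanM)` would write one stack file per local rank. If every rank sits inside the same exchange call, that points at the MPI layer rather than at the ice physics.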

PS: I am using a snapshot of the ice code dated 20-03-2012.

Best regards,

--ufuk


PostPosted: Tue May 01, 2012 9:04 pm 

Joined: Wed Jul 02, 2003 5:29 pm
Posts: 3487
Location: IMS/UAF, USA
I have not ever seen that. However, I recently changed the code to take the communications out of ice_frazil by calling ice_frazil from step3d_t instead (before the communications there). Just today I got the update pushed out to svn, so see if that behaves differently.


PostPosted: Wed May 02, 2012 7:26 am 

Joined: Tue Feb 01, 2005 8:21 pm
Posts: 126
Location: Istanbul Technical University (ITU)
Thanks Kate. Is it in the git repository or somewhere else? Anyway, I will try the newer version and let you know.

regards,

--ufuk


PostPosted: Wed May 02, 2012 4:35 pm 

Joined: Wed Jul 02, 2003 5:29 pm
Posts: 3487
Location: IMS/UAF, USA
My code is available on a branch svn site at myroms.org and also at github. I try to update them both together.


PostPosted: Fri May 04, 2012 2:45 am 

Joined: Mon Apr 28, 2003 5:44 pm
Posts: 449
Location: Rutgers University
The ice code stops and freezes! :shock:

But isn't it supposed to do that? :-)

_________________
John Wilkin: IMCS Rutgers University
71 Dudley Rd, New Brunswick, NJ 08901-8521, USA. ph: 609-630-0559 jwilkin@rutgers.edu


PostPosted: Sat May 05, 2012 2:24 am 

Joined: Wed Sep 24, 2008 8:49 pm
Posts: 4
Location: University of Delaware
wilkin wrote:
The ice code stops and freezes! :shock:

But isn't it supposed to do that? :-)

Wanna come to the Arctic this August, John, and see for yourself at sea how the ocean freezes while the earth, the ocean, and the ice on it keep spinning? Let me know ASAP, as security clearances are necessary 8)


PostPosted: Sat May 05, 2012 4:27 am 

Joined: Fri Jan 08, 2010 7:22 pm
Posts: 135
Location: Theiss Research
that's what happens in realistic simulations of ice!


PostPosted: Sat May 05, 2012 7:53 am 

Joined: Tue Feb 01, 2005 8:21 pm
Posts: 126
Location: Istanbul Technical University (ITU)
Hi,

I think I found the problem. It is related to MPI itself. After the model hung, I attached TotalView to one of the processes to see what was going on. The model always hangs in the mp_distributeXXX calls, so the problem appears to be triggered by some limitation of MPI. As mentioned, I am using Open MPI (1.5.3) compiled with the Intel compiler (12.0.4). I then tried running the model with the following options:

Code:
mpirun --mca btl_openib_eager_limit 65000 --mca btl_sm_eager_limit 1000000 ./oceanM cas.in > roms.txt


and now it does not hang. Tuning the btl_openib_eager_limit and btl_sm_eager_limit parameters works in this case. I just wonder whether anyone has had a similar experience with ROMS. Anyway, thanks to everybody.
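For anyone who wants to experiment with other limits, the command above can be wrapped so the two values are easy to change. This is only a sketch: the function name roms_mpirun_cmd and the two variable names are made up for illustration, and the defaults are simply the values quoted in this post, not recommended settings.

```shell
#!/bin/sh
# Illustrative variable names; defaults are the values reported to work above.
OPENIB_EAGER_LIMIT=${OPENIB_EAGER_LIMIT:-65000}
SM_EAGER_LIMIT=${SM_EAGER_LIMIT:-1000000}

# Print (not run) the mpirun command line for a given executable and input file.
# $1 = executable, $2 = ROMS input file
roms_mpirun_cmd() {
    echo "mpirun --mca btl_openib_eager_limit $OPENIB_EAGER_LIMIT" \
         "--mca btl_sm_eager_limit $SM_EAGER_LIMIT $1 $2"
}

# To see what the installed Open MPI uses by default, ompi_info can list the
# BTL parameters (on the 1.5 series: ompi_info --param btl openib).
```

Calling `roms_mpirun_cmd ./oceanM cas.in` prints the same command line as in the post. Since it only prints, the result can be checked before launching, for example with `$(roms_mpirun_cmd ./oceanM cas.in) > roms.txt`.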

Regards,

--ufuk

