I am running the ice code, but after a certain time (about 3 months of simulation) the model stops progressing and freezes. It waits without any error message. I put some print statements into the code to find the place that causes the problem. In main3d, the code hangs just after calling the ice_frazil subroutine. So I put additional print statements into the ice_frazil subroutine, and I found that the code is waiting in the following part:
# ifdef DISTRIBUTE
      write(*,*) 'part 3 - ', ng, tile
      CALL mp_exchange2d (ng, tile, iNLM, 1,                            &
     &                    LBi, UBi, LBj, UBj,                           &
     &                    NghostPoints, EWperiodic(ng), NSperiodic(ng), &
     &                    wfr)
# endif
!
!  Apply periodic boundary conditions.
!
I think there is a problem in this call, but I could not find a solution. After restarting the model from the last point (just before the hang), the code runs without any problem, and then it hangs again after another 3 months of simulation.
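One caveat about locating a hang with print statements: standard output may be buffered, so the last line that appears is not necessarily the last line that was executed. A minimal sketch of a diagnostic write followed by an explicit flush is given below. It is a standalone program, not part of the ice code; the values of ng and tile are placeholders, and the comment marks where the mp_exchange2d call would sit.

```fortran
program flush_demo
  ! Standalone illustration only; not taken from the ice code.
  use iso_fortran_env, only : output_unit
  implicit none
  integer :: ng, tile

  ng   = 1      ! placeholder grid number
  tile = 0      ! placeholder tile (MPI rank) number

  ! Print and flush before the suspected call, so the message is
  ! written out even if the process blocks immediately afterwards.
  write (output_unit,*) 'part 3 - before exchange ', ng, tile
  flush (output_unit)

  ! ... the call to mp_exchange2d would be here in ice_frazil ...

  ! Print and flush again after the call; a tile that never reports
  ! this line is one that is still waiting inside the exchange.
  write (output_unit,*) 'part 3 - after exchange  ', ng, tile
  flush (output_unit)
end program flush_demo
```

Comparing which tiles print the "before" line but not the "after" line would show whether every process reaches the exchange or only some of them.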
This is very suspicious because it always hangs at the same interval (after 90-92 days of simulation). I thought it could be related to the buffer size, and to test this I changed the TCP buffer sizes by running the model with the following MPI options (by the way, I am using OpenMPI 1.5.3 compiled with the Intel compiler 12.0.4):
--mca btl_tcp_sndbuf 524288 --mca btl_tcp_rcvbuf 524288
but it did not help. I also tried some other MPI options, such as:
--mca btl openib,self,sm --mca mpi_leave_pinned 1
but without any success. Maybe a process is waiting for a message from others that are not supposed to send one. I am not sure. So I just want to know: have you ever seen this kind of problem before in the ice code? Is there anything wrong in the ice_frazil subroutine?
PS: I am using a snapshot of the ice code dated 20-03-2012.