Problems on run

General scientific issues regarding ROMS

Moderators: arango, robertson

ryxcxmy
Posts: 3
Joined: Wed Nov 26, 2008 4:45 pm
Location: SKLEC

Problems on run

#1 post by ryxcxmy

Dear all,
Something goes wrong when I run ROMS. The parallel settings are NtileI==1;NtileJ==1. Here is the error message:
/** rank 0 in job 1 c0202_35389 caused collective abort of all ranks.
exit status of rank 0: killed by signal 9 **/
I don't know what this means. Can someone help me? Any information would be helpful.
Thank you
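Editor's note: the "killed by signal 9" line is the MPI launcher reporting that rank 0 received POSIX signal 9, SIGKILL. SIGKILL cannot be caught or handled by the program, so ROMS itself has no chance to print an error; it is sent from outside, typically by the operating system (on Linux, a common source is the out-of-memory killer) or by a batch scheduler enforcing a limit. Which of these applies here is not known from this post. A minimal sketch for translating a reported signal number into its name, using only Python's standard `signal` module (the helper name `describe_exit` is hypothetical):

```python
import signal


def describe_exit(signum: int) -> str:
    """Map a signal number reported by an MPI launcher to its POSIX name."""
    try:
        # signal.Signals is an IntEnum of the signals defined on this platform.
        return signal.Signals(signum).name
    except ValueError:
        # Number not defined as a signal on this platform.
        return f"unknown signal {signum}"


# "exit status of rank 0: killed by signal 9"
print(describe_exit(9))  # SIGKILL on Linux and other POSIX systems
```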

kate
Posts: 4088
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Problems on run

#2 post by kate

Did you compile it for parallel processing? Did you submit it as a parallel job with mpirun? How many tasks? At what stage of the initialization did it die?

ryxcxmy
Posts: 3
Joined: Wed Nov 26, 2008 4:45 pm
Location: SKLEC

Re: Problems on run

#3 post by ryxcxmy

Dear Kate,
Thank you for your reply.
Yes, I compiled it for parallel processing and ran it with mpirun. At first the tiling was 2*8, but it did not run. Then I set it to 1*1, and the same problem occurred. The build (make) completes without errors. The initialization finishes, and then the run dies before doing any computation. The messages are as follows:
rank 0 in job 1 c0202_35389 caused collective abort of all ranks.
exit status of rank 0: killed by signal 9

I don't know what is wrong or where the problem is.
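Editor's note: in a distributed-memory (MPI) build, ROMS generally expects the number of MPI processes to equal NtileI*NtileJ, so a 2*8 tiling pairs with `mpirun -np 16` and a 1*1 tiling with `mpirun -np 1`. A mismatch would more likely produce a ROMS error message than a silent SIGKILL, but it is cheap to rule out. The sketch below is a hypothetical pre-launch check, not part of ROMS; `check_tiling` and its arguments are illustrative names only.

```python
def check_tiling(ntile_i: int, ntile_j: int, n_procs: int) -> str:
    """Compare the ROMS tile partition with the number of MPI processes.

    ntile_i, ntile_j: the NtileI and NtileJ values from the ROMS input file.
    n_procs: the process count passed to mpirun (-np).
    """
    tiles = ntile_i * ntile_j
    if tiles != n_procs:
        return f"mismatch: NtileI*NtileJ = {tiles} but mpirun -np {n_procs}"
    return f"ok: {tiles} tiles on {n_procs} processes"


print(check_tiling(2, 8, 16))  # ok: 16 tiles on 16 processes
print(check_tiling(1, 1, 16))  # mismatch: NtileI*NtileJ = 1 but mpirun -np 16
```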

kate
Posts: 4088
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Problems on run

#4 post by kate

What happens if you recompile in serial mode? Does that work?
In the above, does ROMS print out anything at all in the initialization?

ryxcxmy
Posts: 3
Joined: Wed Nov 26, 2008 4:45 pm
Location: SKLEC

Re: Problems on run

#5 post by ryxcxmy

Dear Kate,
Thank you.
In serial mode, ROMS runs. So why does it not run in MPI mode?

gouillon
Posts: 9
Joined: Tue Sep 12, 2006 2:45 pm
Location: SHOM - Toulouse

Re: Problems on run

#6 post by gouillon

I got this error message:
** rank 0 in job 1 c0202_35389 caused collective abort of all ranks
because the job ran out of memory. I increased the number of processors and it ran, but I'm not sure this message is specific to that problem. It may be worth trying, though. Could you also post the full log of your run?
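Editor's note: a memory explanation fits the SIGKILL symptom, because when a process exceeds available memory the Linux out-of-memory killer or a batch scheduler typically ends it with signal 9. Splitting the grid over more tiles shrinks each process's share of the gridded arrays. The rough per-tile estimate below is a sketch under stated assumptions: the grid size, the number of full-depth fields held per tile (`n_3d_fields`), double precision, and a halo width of 2 are all illustrative, and real ROMS memory use also includes 2D, global, and I/O buffers that this ignores.

```python
import math

BYTES_PER_REAL = 8  # assumption: double-precision (real*8) fields


def bytes_per_tile(lm, mm, n, ntile_i, ntile_j, n_3d_fields, halo=2):
    """Very rough memory estimate for the 3D arrays on one MPI tile.

    lm, mm: interior grid points in the I and J directions.
    n: number of vertical levels.
    n_3d_fields: hypothetical count of full-depth arrays per tile; the real
        number depends on the ROMS build options and is not known here.
    halo: assumed ghost-point width added on each side of the tile.
    """
    nx = math.ceil(lm / ntile_i) + 2 * halo
    ny = math.ceil(mm / ntile_j) + 2 * halo
    return nx * ny * n * n_3d_fields * BYTES_PER_REAL


# Hypothetical grid: compare one tile against a 2*8 partition (in GiB).
for ti, tj in [(1, 1), (2, 8)]:
    gib = bytes_per_tile(500, 400, 30, ti, tj, n_3d_fields=100) / 2**30
    print(f"{ti}*{tj}: {gib:.2f} GiB per process")
```

Adding processes on the same node does not reduce the total memory that node must supply, so this mainly helps when the extra processes land on additional nodes.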
