ROMS runtime error

General scientific issues regarding ROMS


shrikantmp
Posts: 27
Joined: Mon Jan 27, 2014 9:50 pm
Location: Indian Institute of Science

ROMS runtime error

#1 by shrikantmp

Hi ROMS users

I have been trying to run a ROMS application, but after I submit my (parallel) job to the cluster, it exits from the queue after running for 1 second. I checked the log file and it displays the following message:

=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 15
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)

I checked with my cluster admin and there is nothing wrong with the job submit script. Please help.

kate
Posts: 4088
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: ROMS runtime error

#2 by kate

What does the ROMS output look like? Did you get any?

Re: ROMS runtime error

#3 by shrikantmp

I didn't get any output. The log file is supposed to show something like this:
--------------------------------------------------------------------------------
Model Input Parameters: ROMS/TOMS version 3.7
Wednesday - November 8, 2017 - 2:50:07 PM
--------------------------------------------------------------------------------


Instead, all I get is this message:
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 15
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)

The cluster admin says the job submission script is submitting the job successfully; the problem is with running the model.

Re: ROMS runtime error

#4 by kate

ROMS is normally quite verbose about what went wrong with it. It starts writing to stdout very early in the job. You're saying it gets killed even before that, with no ROMS output at all. Very odd.
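
For what it's worth, signal 15 is SIGTERM on Linux, which is normally sent to a process by something else (a process manager, a scheduler, or a user) rather than raised by a crash inside the program. The Python sketch below is not ROMS-specific: it starts a stand-in child process that only sleeps, terminates it, and shows how that kind of termination is reported. The function names are made up for illustration.

```python
import signal
import subprocess
import sys


def describe_signal(signum):
    """Return the symbolic name of a signal number, e.g. 15 -> 'SIGTERM'."""
    return signal.Signals(signum).name


def run_and_terminate():
    """Start a stand-in child process, send it SIGTERM, return its return code."""
    # Hypothetical stand-in for the model executable: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    child.terminate()  # on POSIX this sends SIGTERM (signal 15)
    child.wait()
    # On POSIX, a negative return code -N means "killed by signal N".
    return child.returncode


if __name__ == "__main__":
    print(describe_signal(15))
    print(run_and_terminate())
```

If the same pattern holds on the cluster, the question becomes what sent the signal, which is usually answered by the scheduler or MPI launcher logs rather than by the model.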

Re: ROMS runtime error

#5 by shrikantmp

Yes Kate. That's exactly what's happening. Please help.

Re: ROMS runtime error

#6 by kate

Did it use to run? Are you sure you have the right syntax on the ROMS execute line in your script? Can we see that?

Re: ROMS runtime error

#7 by shrikantmp

Please find my job submit script in the attachment. It used to run, albeit with a slightly different execute line.
Attachment: submit_new.sh (486 Bytes)

Re: ROMS runtime error

#8 by kate

Try taking out the blank line #2. I don't know if you're allowed to have a blank line there. Some queueing systems end the batch commands on the first blank line.
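
A quick way to check a script for this is sketched below in Python. It assumes the directives use the `#PBS` prefix (the cluster appears to run Torque) and that the queueing system really does stop reading directives at the first blank line, which may not be true everywhere. The example script text and the function name are hypothetical; this is not the attached submit_new.sh.

```python
def directives_after_blank(script_text, prefix="#PBS"):
    """Return (line_number, text) for directive lines that follow a blank line."""
    seen_blank = False
    late = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            # Remember that a blank line has been passed.
            seen_blank = True
        elif seen_blank and stripped.startswith(prefix):
            # A directive after a blank line may be ignored by some systems.
            late.append((number, stripped))
    return late


# Hypothetical submit script with a blank line 2; resource values are made up.
EXAMPLE = """#!/bin/bash

#PBS -N UPWELLING
#PBS -l nodes=1:ppn=8
cd $PBS_O_WORKDIR
"""

if __name__ == "__main__":
    for number, text in directives_after_blank(EXAMPLE):
        print(f"line {number}: {text}")
```

Any line it reports is a directive that could be silently skipped under that assumption, so moving the directives directly under the shebang line is the safe layout.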

Re: ROMS runtime error

#9 by shrikantmp

I removed the blank lines, but the log is still the same. I ran tracejob on the job ID; the output is as follows:

[casparga@tyrone-cluster upwelling]$ tracejob 78891
/var/spool/torque/server_priv/accounting/20171111: Permission denied

Job: 78891.tyrone-cluster

11/11/2017 12:29:47 S enqueuing into batch, state 1 hop 1
11/11/2017 12:29:47 S dequeuing from batch, state QUEUED
11/11/2017 12:29:47 S enqueuing into idqueue, state 1 hop 1
11/11/2017 12:29:47 S Job Queued at request of casparga@tyrone-cluster, owner = casparga@tyrone-cluster, job name = UPWELLING, queue = idqueue
11/11/2017 12:29:47 S Job Modified at request of Scheduler@tyrone-cluster
11/11/2017 12:29:47 S Exit_status=2 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:00
11/11/2017 12:29:47 L Job Run
11/11/2017 12:29:47 S Job Run at request of Scheduler@tyrone-cluster
11/11/2017 12:29:47 S Not sending email: User does not want mail of this type.
11/11/2017 12:29:47 S Not sending email: User does not want mail of this type.
11/11/2017 12:29:47 S dequeuing from idqueue, state COMPLETE
11/11/2017 12:29:47 M scan_for_terminated: job 78891.tyrone-cluster task 1 terminated, sid=7807
11/11/2017 12:29:47 M job was terminated
11/11/2017 12:29:47 M obit sent to server
11/11/2017 12:29:47 M removed job script
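
The server line in the trace above carries the job's exit status and resource usage as key=value pairs. A small Python sketch, assuming that format, pulls them out so they are easier to read; the helper name is hypothetical and the input is the line copied from the trace.

```python
import re


def parse_job_fields(line):
    """Extract key=value pairs such as Exit_status=2 from a tracejob line."""
    return dict(re.findall(r"([\w.]+)=(\S+)", line))


# The accounting line from the trace above, split only for readability.
LINE = (
    "11/11/2017 12:29:47 S Exit_status=2 resources_used.cput=00:00:00 "
    "resources_used.mem=0kb resources_used.vmem=0kb "
    "resources_used.walltime=00:00:00"
)

if __name__ == "__main__":
    fields = parse_job_fields(LINE)
    print(fields["Exit_status"], fields["resources_used.walltime"])
```

Read this way, the job reports Exit_status=2 with zero CPU time, memory, and walltime.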

Re: ROMS runtime error

#10 by kate

/var/spool/torque/server_priv/accounting/20171111: Permission denied
Have you talked to your supercomputer people? I don't think this has anything to do with ROMS. Maybe you should let the queueing system send you email, in case that is more verbose.

Re: ROMS runtime error

#11 by shrikantmp

Sorry for the late reply. I told my cluster admin that there is no problem with the model. After resolving an issue with passwordless login, I tried to run the model and got the following output in the log file:

[mpiexec@tyrone-node16] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:184): assert (!closed) failed
[mpiexec@tyrone-node16] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:74): unable to send SIGUSR1 downstream
[mpiexec@tyrone-node16] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@tyrone-node16] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
[mpiexec@tyrone-node16] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion


Is there a problem with the compilation of the model?

Re: ROMS runtime error

#12 by kate

You can search the web for answers to things like this. Here's one match which might be useful.
