ROMS NL run MPI error

pheidary
Posts: 6
Joined: Thu May 09, 2024 1:06 am
Location: NOAA-NOS


Hello,
I have a problem that I hope someone can help me resolve.
I am encountering an MPI-related error while running the nonlinear ROMS model on a Cray system. Interestingly, this issue does not occur when I run ROMS 4D-Var or ROMS-split on the same system with similar configurations. The problem arises during the initialization phase of the nonlinear simulation. Below are the details of the error message and my current settings:


Job started at:  
    Tue 24 Dec 2024 12:04:13 PM EST  
MPICH ERROR [Rank 15] [job id 207418914.0] [Tue Dec 24 12:04:17 2024] [c6n0145] - Abort(939550479) (rank 15 in comm 0): Fatal error in MPIDI_Cray_shared_mem_coll_bcast: Other MPI error, error stack:  
MPIDI_Cray_shared_mem_coll_bcast(500): message sizes do not match across processes in the collective routine: I am using 4 but a peer process on my node is using 12  
aborting job:
Fatal error in MPIDI_Cray_shared_mem_coll_bcast: Other MPI error, error stack:
MPIDI_Cray_shared_mem_coll_bcast(500): message sizes do not match across processes in the collective routine: I am using 4 but a peer process on my node is using 12
MPICH ERROR [Rank 13] [job id 207457413.0] [Mon Dec 30 14:03:34 2024] [c6n0150] - Abort(939550479) (rank 13 in comm 0): Fatal error in MPIDI_Cray_shared_mem_coll_bcast: Other MPI error, error stack:
MPIDI_Cray_shared_mem_coll_bcast(500): message sizes do not match across processes in the collective routine: I am using 4 but a peer process on my node is using 12
...
The strange thing is that ROMS 4D-Var and ROMS-split run correctly on the same system without any MPI-related issues, with the only difference being the use of a different header file for compilation.
Any guidance or suggestions would be greatly appreciated. Please let me know if more information is needed to troubleshoot the issue. I have attached my output and error files here.
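
To make the error lines above easier to triage, here is a minimal sketch in Python that summarizes which ranks aborted and which size pairs were reported. It assumes the MPICH log format exactly as shown in this post. The names `summarize`, `HEADER`, `MISMATCH`, and `SAMPLE` are hypothetical helpers for illustration and are not part of ROMS or Cray MPICH.

```python
import re
from collections import Counter

# Header line naming the aborting rank, job id, and host (format assumed from the log above).
HEADER = re.compile(
    r"MPICH ERROR \[Rank (\d+)\] \[job id ([\d.]+)\] \[[^\]]*\] \[([^\]]+)\] - Abort"
)
# Detail line reporting the two mismatched message sizes.
MISMATCH = re.compile(
    r"I am using (\d+) but a peer process on my node is using (\d+)"
)

def summarize(log_text):
    """Return (sorted unique (rank, job_id, host) tuples, Counter of (mine, peer) sizes)."""
    ranks = sorted(
        {(int(rank), job, host) for rank, job, host in HEADER.findall(log_text)}
    )
    sizes = Counter((int(mine), int(peer)) for mine, peer in MISMATCH.findall(log_text))
    return ranks, sizes

# Two header lines and their detail lines, copied from the error output above.
SAMPLE = """\
MPICH ERROR [Rank 15] [job id 207418914.0] [Tue Dec 24 12:04:17 2024] [c6n0145] - Abort(939550479) (rank 15 in comm 0): Fatal error in MPIDI_Cray_shared_mem_coll_bcast: Other MPI error, error stack:
MPIDI_Cray_shared_mem_coll_bcast(500): message sizes do not match across processes in the collective routine: I am using 4 but a peer process on my node is using 12
MPICH ERROR [Rank 13] [job id 207457413.0] [Mon Dec 30 14:03:34 2024] [c6n0150] - Abort(939550479) (rank 13 in comm 0): Fatal error in MPIDI_Cray_shared_mem_coll_bcast: Other MPI error, error stack:
MPIDI_Cray_shared_mem_coll_bcast(500): message sizes do not match across processes in the collective routine: I am using 4 but a peer process on my node is using 12
"""

if __name__ == "__main__":
    ranks, sizes = summarize(SAMPLE)
    for rank, job, host in ranks:
        print(f"rank {rank} on {host} (job {job}) aborted")
    for (mine, peer), count in sizes.items():
        print(f"{count} report(s): local size {mine} vs peer size {peer}")
```

A 4 versus 12 mismatch would be consistent with ranks on the same node entering a broadcast with different element counts, but that reading is an assumption; the log alone does not identify which call is involved.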

Best,
Parisa
Attachments
wcofs_.txt
*.out
(134.85 KiB)
log_.txt
err.out
(176.86 KiB)
