Weird experience with ROMS3.72

pengjia · #1 · Fri Oct 09, 2009 3:00 pm

Hi all,

I set up a Linux box with an Intel i7, Fedora 11 (64-bit), ifort 11, and MPICH2. I could compile and run the test case smoothly. Happy! But something weird happens: whenever I edit anything in the "*.in" file (like NHIS or NINFO, nothing important) and rerun the previously successful case, it blows up and NaN values show up within several steps. If I then run it again without changing anything, sometimes it works, and sometimes it needs a third try.

We also have a 16-node cluster with 32 AMD CPUs, Red Hat Enterprise 3, and ifort 9. I ran the same case there (same ROMS version, same CPP options...), but got different results. Taking salinity as an example, the discrepancy at some points can be as high as 3 psu.

What do you think causes this? The Linux box configuration? ROMS itself? The different Intel Fortran versions? Or do different platforms and operating systems really give different results?

Sorry that I'm totally confused and the above may also confuse you.

Thank you.

Peng
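
To put a number on a cross-platform discrepancy like the 3 psu mentioned above, one option is to compare the same field from both runs point by point. The following is only a minimal sketch: the function name `max_abs_difference` is hypothetical, the salinity values are made-up placeholders rather than output from any real run, and it assumes the fields have already been read into flat Python lists.

```python
import math


def max_abs_difference(field_a, field_b):
    """Return (largest |a - b|, index of that point, number of NaN pairs skipped)."""
    if len(field_a) != len(field_b):
        raise ValueError("fields must have the same length")
    worst, worst_index, nan_count = 0.0, None, 0
    for i, (a, b) in enumerate(zip(field_a, field_b)):
        # A blown-up run contains NaN; count those points instead of comparing them.
        if math.isnan(a) or math.isnan(b):
            nan_count += 1
            continue
        diff = abs(a - b)
        if diff > worst:
            worst, worst_index = diff, i
    return worst, worst_index, nan_count


# Made-up placeholder salinity values (psu), not taken from any real run.
salt_workstation = [35.0, 34.5, 33.0, float("nan")]
salt_cluster = [35.0, 34.0, 30.0, 32.0]

worst, index, nans = max_abs_difference(salt_workstation, salt_cluster)
print(f"max |diff| = {worst} psu at index {index}; {nans} NaN pair(s) skipped")
```

Reporting the index as well as the size of the difference helps show whether the disagreement is confined to a few points (for example near a boundary) or spread over the whole domain.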

linzhenhua · Sat Oct 10, 2009 1:17 am

Below is part of a message from Martin Schmidt to the MOM4 mailing list. I guess it may help:

The modern Intel architecture allows for several models of how floating-point operations are defined. The default is a "sloppy" mode where speed gain is preferred over reproducibility. Accurate results are obtained with "-O -fp-model strict". Otherwise results may differ even between repeated runs with the same binary.
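
One reason the floating-point model matters is that floating-point addition is not associative, so an optimization that changes the order of operations can change the result. The sketch below illustrates only that general effect in Python; it does not reproduce what ifort does, and the function name `sum_left_to_right` and the input values are chosen purely for illustration.

```python
import math


def sum_left_to_right(values):
    """Add the values strictly in the order given, as a naive loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same four numbers, summed in two different orders.
original_order = [1.0e16, 1.0, -1.0e16, 1.0]
reordered = [1.0e16, -1.0e16, 1.0, 1.0]

forward = sum_left_to_right(original_order)  # the first 1.0 is lost to rounding
shuffled = sum_left_to_right(reordered)      # the large terms cancel first
exact = math.fsum(original_order)            # correctly rounded reference sum

print(forward, shuffled, exact)
```

In a long model integration, differences of this kind start at the level of the last bit and can then grow, which is consistent with two builds of the same code drifting apart.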

pengjia · #3 · Sat Oct 10, 2009 11:08 pm

That makes a lot of sense. But it still seems weird to me that, in order to get a given case running, I have to try two or even three times. Just now I submitted one case three times, and after blowing up twice it is running smoothly.

I don't want to do this sort of thing for every case.

You know, I wasted several days trying to figure out what was making my model blow up. I checked the boundary, the forcing, the initial... And in the end, it turns out to have something to do with the machine itself.

kate · #4 · Mon Oct 12, 2009 7:01 pm

Can you get consistent results using the strict compile flag?
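
A strict way to test for consistent results is to compare two repeated runs bit for bit rather than within a tolerance. The following is a rough sketch under stated assumptions: the helper names `bit_pattern` and `count_bitwise_mismatches` are hypothetical, the values are placeholders, and the output fields are assumed to be already loaded as lists of 64-bit floats.

```python
import struct


def bit_pattern(value):
    """Return the raw 8 bytes of a float64, so that NaN compares equal to an identical NaN."""
    return struct.pack("<d", value)


def count_bitwise_mismatches(run_a, run_b):
    """Count the points whose binary representation differs between two runs."""
    if len(run_a) != len(run_b):
        raise ValueError("runs must have the same number of points")
    return sum(1 for a, b in zip(run_a, run_b) if bit_pattern(a) != bit_pattern(b))


# Placeholder values only; 0.1 + 0.2 differs from 0.3 in the last bit.
nan = float("nan")
run_1 = [35.0, 0.1 + 0.2, nan]
run_2 = [35.0, 0.3, nan]

mismatches = count_bitwise_mismatches(run_1, run_2)
print(f"{mismatches} point(s) differ bit for bit")
```

Comparing byte patterns instead of using `==` avoids flagging a NaN that is identical in both runs, since NaN never compares equal to itself.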

arango · #5 · Mon Oct 12, 2009 7:53 pm

I get identical results with the -fp-model precise flag. This is the flag that is distributed in configuration files.
