different result obtained with different versions of pgi

Discussion on computers, ROMS installation and compiling

Moderators: arango, robertson

zhang
Posts: 16
Joined: Fri Mar 11, 2005 9:17 pm
Location: Woods Hole Oceanographic Institution

different result obtained with different versions of pgi

#1 Post by zhang » Tue Jul 03, 2007 6:07 pm

We recently updated our PGI compiler, and I am comparing the results from models compiled with two versions of PGI, 6.1 and 7.0. I ran the two nonlinear models for 120 time steps, forced by the exact same atmospheric forcing and initial conditions. For both of them, Flather and Chapman are used for the 2D OBC, and radiation is used for 3D momentum and tracers. GLS (k-kl) is used for the vertical mixing. To my surprise, the results from the two models are quite different.

Later, I switched to serial debugging mode, which uses no optimization in the compilation and no parallelization in the computation. But the difference between the two models is roughly the same. For temperature, the difference reaches 0.8 degrees on the open boundaries and about 0.1 degrees in the interior. The difference appears everywhere and looks random, with no coherent structure at all.

I also checked the growth of the error. At the first time step, there is no difference between the two models. At about the tenth step, the difference starts to show up and gradually increases. I also ran the two models for a 60-day simulation; the difference in temperature reaches 1.5 degrees in the area close to the open boundaries, and a coherent structure also emerges.

I am wondering what could cause this problem: a bug in the old compiler, a new one just introduced, or the nonlinearity of the system blowing up the roundoff error? I am simulating the New Jersey coastal area, which is pretty shallow; the deepest place in my model domain is about 80 m. Realistic atmospheric forcing and river discharge are used to force the model.

Has anybody else had a similar experience?

kurapov
Posts: 26
Joined: Fri Sep 05, 2003 4:49 pm
Location: COAS/OSU

#2 Post by kurapov » Mon Jul 09, 2007 9:34 pm

The underlying continuous equations (without dissipation) with any type of local boundary conditions used in ROMS constitute an ill-posed problem (ref: Oliger and Sundstrom, SIAM J. Appl. Math., 1978). For us that may mean that ROMS with open boundary conditions (and low enough dissipation, as in a coastal case) will produce solutions that are highly sensitive to small changes in the inputs.

It may be instructive to check different compilers for a test case with all closed boundaries. Will the temperature differences be smaller for the case with closed boundaries?

Including a sponge layer improves the conditioning of the discrete problem (although it may introduce non-physical effects). With the sponge, the solution may again be less sensitive to the choice of compiler.

This is of course just one possible reason; some other explanation may turn out to be relevant in your case.
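To make the sponge-layer point concrete, here is a toy sketch in Python (not ROMS code; the scheme, the parameters, and the `run`/`sponge` names are all invented for illustration): a 1-D upwind advection step with Rayleigh damping ramping up next to the open boundary. A roundoff-level perturbation of the initial condition survives essentially unchanged at the boundary point without the sponge, but is damped away by it.

```python
import math

def run(u, nsteps=500, sponge=None, c=1.0, dt=0.01, dx=0.1):
    """First-order upwind steps for u_t + c u_x = 0; optional Rayleigh
    damping ('sponge') ramps up toward the open (right) boundary."""
    for _ in range(nsteps):
        un = u[:]                      # u[0] stays fixed (inflow value)
        for i in range(1, len(u)):
            un[i] = u[i] - c * dt / dx * (u[i] - u[i - 1])
        if sponge is not None:
            un = [v * (1.0 - dt * s) for v, s in zip(un, sponge)]
        u = un
    return u

n = 200
u0 = [math.exp(-((20.0 * i / (n - 1)) - 5.0) ** 2) for i in range(n)]

# damping coefficient: zero in the interior, ramping up to 5.0 over the
# last 30 points next to the open boundary
sponge = [0.0] * (n - 30) + [5.0 * j / 29 for j in range(30)]

eps = 1e-12  # roundoff-level perturbation of the initial condition
for sp in (None, sponge):
    a = run(list(u0), sponge=sp)
    b = run([v + eps for v in u0], sponge=sp)
    print("with sponge" if sp else "no sponge ",
          "difference at the boundary point:", abs(a[-1] - b[-1]))
```

The run with the sponge ends with a far smaller boundary-point difference than the run without it, which is the improved-conditioning effect in miniature.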

rsignell
Posts: 123
Joined: Fri Apr 25, 2003 9:22 pm
Location: USGS

#3 Post by rsignell » Tue Jul 10, 2007 11:49 am

Could it be that machine-precision perturbations of your initial conditions would also lead to the kinds of differences you are seeing?

Ten years ago we were evaluating different computers for running POM, and we were surprised to discover that the same executable running on the SGI machine we were testing produced significantly different results (e.g., a 5 km variation in river plume location after several weeks of simulation)! But then we realized that the fast compiler options we were using did not guarantee IEEE arithmetic, and when we forced the compiler to use IEEE, we got identical results. We concluded that the model truly was sensitive to "noise" (the butterfly effect), and that although it would sometimes be desirable to obtain the exact same result, the family of solutions represented by non-IEEE math were all equally valid.
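The two ingredients Rich describes, reordered (non-IEEE) arithmetic changing the last bits and the flow amplifying them, are easy to demonstrate outside any ocean model. A small Python sketch (purely illustrative; the logistic map stands in for the model's nonlinear dynamics):

```python
# Floating-point addition is not associative, so a compiler that reorders
# operations ("fast" non-IEEE math) can legitimately change the last bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)   # False in IEEE double precision

# A chaotic iteration then amplifies such last-bit noise to order one
# (the butterfly effect).
a, b, max_diff = 0.3, 0.3 + 1e-15, 0.0
for _ in range(100):
    a = 3.9 * a * (1.0 - a)
    b = 3.9 * b * (1.0 - b)
    max_diff = max(max_diff, abs(a - b))
print(max_diff)        # grows from 1e-15 to order 0.1-1
```

The same mechanism would turn a reordered summation somewhere in the model into the kind of visible, structureless differences described above.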

nperlin
Posts: 10
Joined: Sat Jan 10, 2004 12:55 am
Location: University of Miami/ Rosenstiel School of Marine and Atmospheric Science

PGI Fortran compilers difference

#4 Post by nperlin » Tue Jul 17, 2007 4:56 pm

Hi Zhang,
I cannot answer definitively about your particular problem and BC formulation, but one thing I know about the PGI compilers is that there was a bug in the 6.x versions (I had 6.0) that was fixed in PGI 7.0. The bug was related to the array bounds (the upper and lower indices) of assumed-shape arrays. Below is a Fortran 90 test program that checks the array bounds, and further down are the "correct answers" you should get when running it. In my case, the PGI 6.0 Fortran compiler did not get them right, while PGI 7.0 did.
I ran into this difference in compiler behavior while working with other software (ESMF), where the problem turned out to be critical. The ESMF user support group helped me track it down and provided the test program below (they should get the credit for that...).
I can't tell whether this affects the model behavior in your case; I simply mean to demonstrate an example of clearly erroneous behaviour in an earlier PGI compiler version.
Natalie.

Below is the program "assumedShapeTest.f90":
******************************************

program assumedShapeTest

implicit none
integer:: i
integer:: a_stat1(10)              ! default lower bound 1
integer:: a_stat2(-5:4)            ! explicit lower bound -5
integer:: a_stat3(0:9)             ! explicit lower bound 0
integer, allocatable:: a_dyn(:)


print *, "a_stat1:"
a_stat1 = (/(i,i=-1,-10,-1)/)
call assumedShapeSub(a_stat1)

print *, "a_stat2:"
a_stat2 = (/(i,i=-1,-10,-1)/)
call assumedShapeSub(a_stat2)

print *, "a_stat3:"
a_stat3 = (/(i,i=-1,-10,-1)/)
call assumedShapeSub(a_stat3)

print *, "a_dyn case 1:"
allocate(a_dyn(10))
a_dyn = (/(i,i=-1,-10,-1)/)
call assumedShapeSub(a_dyn)
deallocate(a_dyn)

print *, "a_dyn case 2:"
allocate(a_dyn(-5:4))
a_dyn = (/(i,i=-1,-10,-1)/)
call assumedShapeSub(a_dyn)
deallocate(a_dyn)

print *, "a_dyn case 3:"
allocate(a_dyn(0:9))
a_dyn = (/(i,i=-1,-10,-1)/)
call assumedShapeSub(a_dyn)
deallocate(a_dyn)

contains

subroutine assumedShapeSub(a)
! For an assumed-shape dummy argument, lbound/ubound must come out
! as 1:10 here, whatever the actual argument's declared bounds are.
integer, target:: a(:)
integer, pointer:: array(:)
array=>a
print *, "(lbound:ubound) is: (", lbound(array),":",ubound(array),")"
call assumedShapeExternalSub(array(1))
end subroutine

end program

subroutine assumedShapeExternalSub(element)
implicit none
integer:: element
print *, "First element: ", element
end subroutine

********************
The correct answers you should receive (using PGI 7.0) are:
a_stat1:
(lbound:ubound) is: ( 1 : 10 )
First element: -1
a_stat2:
(lbound:ubound) is: ( 1 : 10 )
First element: -1
a_stat3:
(lbound:ubound) is: ( 1 : 10 )
First element: -1
a_dyn case 1:
(lbound:ubound) is: ( 1 : 10 )
First element: -1
a_dyn case 2:
(lbound:ubound) is: ( 1 : 10 )
First element: -1
a_dyn case 3:
(lbound:ubound) is: ( 1 : 10 )
First element: -1

zhang
Posts: 16
Joined: Fri Mar 11, 2005 9:17 pm
Location: Woods Hole Oceanographic Institution

#5 Post by zhang » Tue Jul 17, 2007 8:28 pm

Hey Natalie,

The PGI 6.1 I have gives the same correct answers as PGI 7.0. I also tried PGI 6.0, which does give wrong array bounds, as you said. So the bug had already been corrected in my PGI 6.1.

For the problem I described at the beginning of this thread, I was using PGI 6.1 and 7.0, so that is not the reason. But it is good to know this. Thanks.

Gordon
Last edited by zhang on Tue Jul 17, 2007 9:01 pm, edited 2 times in total.

zhang
Posts: 16
Joined: Fri Mar 11, 2005 9:17 pm
Location: Woods Hole Oceanographic Institution

#6 Post by zhang » Tue Jul 17, 2007 9:00 pm

I tried the IEEE arithmetic standard (the '-Kieee' option for PGI), and the model results from the different versions of PGI (6.1 and 7.0) became identical, exactly as Rich said. But the results from different compilers (pgi, ifort, and g95) still differ by the same amount as before, even when all of them are forced to use the IEEE standard.

I also tested the model speed with and without '-Kieee' for my application, with full optimization and parallelization. On average, the speed is the same, although the PGI manual says '-Kieee' could slow the computation down. So, I guess we should always use the IEEE arithmetic standard.

After this, I went back to compiling without the IEEE standard, but changed the 3D boundary condition from RADIATION to GRADIENT. Then the big difference (0.5 degrees for temperature after 120 time steps) on the open boundaries disappears, but the small difference (0.01 degrees for temperature) in the interior is still there. So, the radiation OBC is less deterministic and makes the situation worse.
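For readers comparing the two conditions, here is a schematic Python sketch of the boundary-point updates (a simplified explicit Orlanski-type radiation form; the function names, the persistence fallback, and the clipping are my choices for the sketch, not ROMS code). The gradient condition is a plain copy of the interior neighbour, while the radiation condition divides by an interior difference to estimate the outgoing phase speed, and that estimate is where roundoff-level noise can enter the boundary value.

```python
def gradient_bc(u_new):
    """Zero-gradient BC: the boundary point copies its interior neighbour."""
    return u_new[-2]

def radiation_bc(u_new, u_old):
    """Sketch of a simplified explicit Orlanski-type radiation BC:
    estimate the outgoing Courant number from interior points, then
    advect the old boundary value outward with it."""
    denom = u_old[-2] - u_old[-3]
    if denom == 0.0:
        mu = 0.0  # no interior signal: fall back to persistence
    else:
        # Courant number of the estimated outgoing wave, clipped to [0, 1]
        mu = min(max(-(u_new[-2] - u_old[-2]) / denom, 0.0), 1.0)
    return u_old[-1] - mu * (u_old[-1] - u_old[-2])

u_old = [0.0, 0.2, 0.5, 0.9]
u_new = [0.0, 0.1, 0.4, 0.0]   # interior already updated this step
print(gradient_bc(u_new))      # 0.4: deterministic copy
print(radiation_bc(u_new, u_old))  # about 0.767, via the estimated speed
```

Because the gradient update involves no division, two compilers will tend to agree on it bit for bit, whereas the radiation estimate depends on small interior differences that vary with the arithmetic, consistent with the experiment above.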
