ROMS/TOMS Developers

Algorithms Update Web Log

kate - January 25, 2008 @ 0:15
Change is hard- Comments (0)

We completed a Northeast Pacific simulation almost two years ago now. There are things we like about it and things we’d like to improve on for a new simulation. There have been some changes to ROMS, changes to our bathymetry, boundary conditions, options chosen. So it comes time to give it all a whirl and see if the code still runs.

I have some initial troubles, but find that with a sufficiently short timestep, the code runs. There’s a bit of annoyance in that I ran a 3 km simulation with tides with a timestep of 240 s and I’m needing a timestep of 200 s for this 10 km grid. At least it runs, though it’s going to cost more than we had expected.

Then I hear from Ken Coyle about some wacky temperatures causing the biology to blow up. He told me it was happening at an i,j location at which I just happened to have a station. I find that he is indeed right, the temperature there gets temporarily quite bizarre:
temperature timeseries showing bad value
I plot up the whole vertical column of temperatures and find that they all get bad, as do the salinities:
vertical column of temperatures
After plotting every imaginable field, I find that I’m growing a negative amount of ice at that spot. The negative value is coming from a negative factor of Hz at that time and place. So, plotting total column water depth I see:
water depths at those stations
Very interesting – the water depth goes negative for several timesteps, but the model keeps on going. I didn’t know that was possible.

It’s time to try the shiny new WET_DRY option. My first attempts at it caused the model to blow up much earlier in the run (less than a day as opposed to running ten days). The time and place of that trouble was elsewhere in the domain where the total water depth wasn’t even close to zero. Further investigation showed that to be a compiler bug (goody, goody) to be avoided with less optimization. I can’t even report it to the vendor without a great deal more work pinning it down. Pathscale with -Ofast is trouble, using all the suboptions to -Ofast except -ipa is OK. Sunstudio with -O3 is also OK. The temperatures are now quite well-behaved too:
fixed temperatures

I tell Ken to update and to turn on WET_DRY, figuring that would solve all his problems. No such luck! His biology code is reporting wacky temperatures just prior to blowing up:

 Temperature:  37.5827072746595690 152 388 42 1 8.34259259259124519
6787409 36896.59491  1.577662E-03  2.255593E+04  2.255593E+04  5.268561E+16   0
 Temperature:  51.0785453469517208 152 388 39 1 8.34490740740875481
 Temperature:  39.2654092639337264 152 389 39 1 8.34490740740875481
6787410 36896.59722  1.584459E-03  2.255596E+04  2.255596E+04  5.268571E+16   0
 Temperature:  61.5939893767976798 152 388 6 1 8.34722222221898846
 Temperature:  73.0509220012720846 152 388 42 1 8.34722222221898846
 Temperature:  57.1291962882013564 152 389 42 1 8.34722222221898846
6787411 36896.59954  1.591257E-03  2.255598E+04  2.255599E+04  5.268581E+16   0
 Temperature:  79.0037572721587935 152 388 7 1 8.34953703703649808
 Temperature:  36.0570528159368351 152 388 8 1 8.34953703703649808
 Temperature:  132.177242769374715 152 388 39 1 8.34953703703649808
 Temperature:  99.5619134891402524 152 389 39 1 8.34953703703649808
6787412 36896.60185  1.598029E-03  2.255601E+04  2.255601E+04  5.268592E+16   0
 Temperature:  145.794858427701342 152 388 11 1 8.35185185184673173
 Temperature:  75.7987454267087344 152 388 12 1 8.35185185184673173
 Temperature:  234.923241638649216 152 388 31 1 8.35185185184673173
 Temperature:  245.081721186121740 152 388 42 1 8.35185185184673173
 Temperature:  208.911218725014379 152 389 42 1 8.35185185184673173
6787413 36896.60417  1.604735E-03  2.255604E+04  2.255604E+04  5.268602E+16   0
 Temperature:  104.828633142693036 152 388 25 1 8.35416666666424135
 Temperature:  70.9285478559007743 152 388 26 1 8.35416666666424135
 Temperature:  195.296389285105079 152 388 39 1 8.35416666666424135
 Temperature:  689.578533632940207 152 389 39 1 8.35416666666424135
6787414 36896.60648  1.611350E-03  2.255607E+04  2.255607E+04  5.268613E+16   0
 Temperature:  40.8787945360774572 152 388 9 1 8.35648148148175096
 Temperature:  45.6441714731643415 152 388 42 1 8.35648148148175096
 Temperature:  261.585913331404129 152 389 41 1 8.35648148148175096
 Temperature:  4469.44170242107884 152 389 42 1 8.35648148148175096
6787415 36896.60880  1.617861E-03  2.255610E+04  2.255610E+04  5.268624E+16   0

I asked Ken to turn on the stations for this run too and his stations temperatures never get outside the range of 3 to 7 degrees (this was a warmer initial condition file than I’d been using, but that isn’t relevant):
Ken's temperatures
The depth too completely stays in bounds though it gets shallow at the time of the trouble:
Ken's depths

So what is going on here? The temperatures are multiplied by Hz during the pre_step3d phase. The step2d code updates Hz with the new estimate based on a new water depth. The biology code extracts a temperature by dividing tracer t by Hz. If the new Hz is sufficiently different from the old Hz, the resulting value is not a sensible temperature. The biology needs a temperature for computing growth rates and values of 200 C were not considered to be inside the valid range.

What to do? One option would be to check the wet/dry mask and not compute biology when things are dry. However, it seems to me that the temperatures get a bit off several steps before we get to the minimum depth. Do we make the minimum depth bigger? Or do we save the old Hz and use that to compute a temperature for the biological terms?

Update:

I think we know what’s going on. The t(nnew) gets updated in pre_step3d to have the contribution from the vertical diffusion added on. In shallow water, this term can dominate the t*Hz term, meaning that t(nnew)/Hz is not a valid temperature. The same can be true for any of the other tracers as well. Looking in the debugger at a time/place where the water is shallow, we see that t(nrhs) for the upper points is:

(152,388,37,1,1)           4.590317...
(152,388,38,1,1)           4.59349
(152,388,39,1,1)           4.59674
(152,388,40,1,1)           4.59805
(152,388,41,1,1)           4.58889
(152,388,42,1,1)           4.57199

The water is within the range 4.4 to 4.6 from top top bottom, cooling at the surface due to atmospheric heat loss. The values of Hz for these points range from 0.0234 to 0.0237. The values for t(nnew) are:

(152,388,37,2,1)           0.1221898...
(152,388,38,2,1)           0.121764
(152,388,39,2,1)           0.121402
(152,388,40,2,1)          -0.247068
(152,388,41,2,1)           0.100149
(152,388,42,2,1)           0.4911977

You can do the math to see that t(nnew)/Hz will not result in values close to the target 4.58. The deep values where the vertical diffusion is small are pretty close to t*Hz_old, where Hz_old has values between 0.0257 and 0.0260. Hz is changing, but not enough to explain all that’s going near the surface. We need to be using t(nrhs) to get valid temperatures and other tracer values.