ROMS/TOMS Developers

Algorithms Update Web Log

kate - January 24, 2013 @ 18:09
Ice model debugging fun

Yes, debugging can be fun. It’s a job where you know there’s an answer to be found in some finite amount of time. It’s not always as quick as you expect, though.

Our Arctic domain has been run for years and doesn’t look too horrible, but there’s a bit too much ice in the summers. I’ve tried a couple of different albedo schemes, then found one more to try. It’s by Ebert and Curry and is the only one to include the effects of melt ponds. Why hadn’t I ever tried it before? A quick test with PGI Fortran showed me why – the thing blew up in a few timesteps. Maybe that had happened before and I found it easier to just go back to another albedo option.

This time, I recompiled for the debugger and fired up TotalView. The thing ran this time, no problem. My favorite trick in this situation is to try another compiler. I’ve also got gfortran, so I gave that a whirl. That one ran fine too – but then it blew up three steps after a restart. How strange to run for hundreds of days, then fail right after a restart. Both this and the earlier PGI failure reported trouble from the ice_frazil routine, so I looked to the temperature field.

TotalView and gfortran don’t play nice, so it’s back to PGI and this time it failed in the same way as gfortran, in the debugger even. I focused on the i,j point reported by the ice_frazil error message and I discovered that in the very first timestep, the surface salinity value became NaN, then the entire water column of salinity was set to zero. I’m not quite sure how the NaN got lost along the way… Those zero salinities with sub-zero temperatures generated a bunch of ice at just the one point and the water column then became unstable.

The NaN at the surface led me to check the surface tracer forcing, which indeed was NaN for salt. On restart, the ice concentration contained a very tiny non-zero value which introduced trouble to the computation of the ice albedo there. I have an ice_limit routine which keeps ice fields well-behaved, but I wasn’t running it from within get_state. I looked into calling it there, but ice_limit works on the MPI tile while get_state does not. I managed to add a call to ice_limit from the ini_fields routine instead. It’s running again – in both PGI and gfortran. The problem this time was due to a single precision restart file, but I already knew I wanted a call to ice_limit on startup.
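To illustrate why a tiny leftover concentration can cause trouble, here is a minimal Python sketch, not the ROMS code. The function names `limit_ice` and `mean_thickness`, the threshold `min_conc`, and the division used as the "dangerous" computation are all assumptions made for this example. The real ice_limit routine and the albedo scheme may do something quite different.

```python
def limit_ice(aice, hice, min_conc=1.0e-6):
    """Clamp the ice state and zero out near-empty cells.

    aice: ice concentration (fraction of the cell area, 0..1)
    hice: ice volume per unit cell area (m)
    min_conc: made-up threshold below which the cell is treated as ice-free
    """
    aice = min(max(aice, 0.0), 1.0)
    hice = max(hice, 0.0)
    if aice < min_conc or hice == 0.0:
        return 0.0, 0.0
    return aice, hice


def mean_thickness(aice, hice):
    """Thickness of the ice-covered part of the cell (hypothetical stand-in
    for any computation that divides by concentration)."""
    return hice / aice if aice > 0.0 else 0.0


# A restart value that should have been zero but came back as a tiny residue.
aice_restart, hice_restart = 1.0e-30, 0.02

raw = mean_thickness(aice_restart, hice_restart)                 # absurdly large
clean = mean_thickness(*limit_ice(aice_restart, hice_restart))   # 0.0

print(raw, clean)
```

Running the limiter once on startup, before anything divides by the concentration, is what keeps the second number sane.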

While I was poking around in there, I also fixed it so that trouble in ice_frazil really does cause the job to end. I had added a check on exit_flag, but was then broadcasting the value from process 0, so a bad value on any other process never reached the rest of the nodes – unless process 0 was the one in trouble. It needed a “MAX” reduction so that any process in trouble made that trouble known to the head node.
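The difference between the two approaches can be sketched without MPI at all. The Python below just models each process's exit_flag as one entry in a list. The helper names and the flag values (0 for fine, anything larger for trouble) are assumptions for illustration. In a real MPI code the second helper corresponds to a reduction with the MPI_MAX operation, for example MPI_Allreduce.

```python
def bcast_from_root(flags, root=0):
    """Model of a broadcast: every process ends up with the root's value."""
    return [flags[root]] * len(flags)


def allreduce_max(flags):
    """Model of a MAX all-reduce: every process ends up with the largest value."""
    worst = max(flags)
    return [worst] * len(flags)


# Hypothetical flags on four processes: only process 2 hit a problem.
flags = [0, 0, 3, 0]

after_bcast = bcast_from_root(flags)   # the error on process 2 is overwritten
after_max = allreduce_max(flags)       # every process now sees the error

print(after_bcast)
print(after_max)
```

With the broadcast, the error survives only when the root itself is the process in trouble, which matches the behavior described above.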

Anyway, I learned something, which is always fun.

1 Comment »

  1. Interesting… I have been arguing with the TotalView developers for almost two years now. TotalView does not work well with ifort and gfortran. Debugging with ifort is cumbersome because ROMS has so many executable statements with continuation lines, and the assembly code has multiple instructions for those statements. I recall once needing more than ten separate instructions (clicks) to step through one statement. The workaround for this is so nasty that we would waste a lot of time debugging. TotalView says that it is ifort’s fault and vice versa. I think that it is TotalView’s fault. Now, gfortran and TotalView together are a mess. We cannot see any of the module variables and their associated pointers. I think that this is gfortran’s fault because it does not build the correct assembly code for debugging. In my experience, TotalView only works with the PGI compiler. However, it does not work on the Mac computers. I have been waiting for them to fix that problem for more than a year now. I think that it is very simple to fix. They tell me that they don’t have a PGI Fortran license to run and debug on their Macs… Can you believe it? TotalView only works for me with ifort on my iMac under version 10.6 (Snow Leopard) of the operating system. It does not work under 10.8 (Mountain Lion). Rats…

    As you can see, I don’t have a version of TotalView that works on my new iMac computers. I am frustrated. Nowadays, I am looking for another debugger that works on the iMacs with ifort, gfortran, and pgf90. I have a lot of stuff to debug and this is slowing down several of the new ROMS algorithms. I need a vacation badly…

    I find that gfortran has better error reporting, so you can find problems easily. It is nice that it gives us the routine and line number. I love that. And this one is a free Fortran compiler… I cannot understand why ifort and pgf90 don’t give us better information about these types of problems.

    It is always a good idea to manage the exit_flag correctly in MPI. I have invested a lot of time cleaning up that logic. Of course, I haven’t put my hands on the ice model routines.

    Comment by arango - January 25, 2013 @ 2:22 am
