ROMS/TOMS Developers

kate - April 20, 2015 @ 19:01 Some thoughts from a Python conference- Comments (0)

Last week I was lucky enough to attend the Software Engineering Assembly 2015 at NCAR in Boulder, Colorado. This year it was all about what’s going on in Python in Scientific Computing. The agenda and links to the slides are here. I encourage you to check them out.

The first talk was on Jupyter, a browser-based means of running not just Python, but many, many computer language. I heard about IPython notebooks just a few months ago through one of the Coursera courses on Python and haven’t yet gotten a chance to play with them much. Enrique is using them for teaching his class – you can run things interactively, you can have equations, you can output it all to pdf. Anyway, the Jupyter talk is from the maker of IPython who has now set his sights on bigger things, like including Julia, R, bash, Fortran, etc. in the languages you can access from these notebooks (someone even got Matlab working). The conference ended with a tutorial on Jupyter (Python only), a good thing since we all wanted exactly that after the first talk.

There were many, many interesting talks about all kinds of things I wasn’t aware of. One that struck a chord with many was the talk on “technical debt” by a guy who does a lot of software development in the field, under pressure. He showed pictures of places he’s been, from research ships to remote tropical islands. His point was that you can get code running, but you won’t have spent the time you might want to on making it clean and robust, writing the test suites, etc. So the debt is in what you owe that software before you count on it in the field for the next trip. It’s OK to have some technical debt, but you need to realize that you owe debt – and there’s also the explaining to your boss that there’s unfinished business.

The talk on out-of-core computations was awesome, with a visual of code running on different parts of a huge array a few pieces at a time (in parallel). It got me thinking about running the boundary condition interpolater in parallel so I talked to the guy afterwards and he said my problem should be easy. He wanted to know what other sorts of problems we (ocean modelers) are trying to solve – he works for Anaconda, a company supporting scientific Python (and maybe other languages… Julia?).

Package managers

Speaking of Anaconda, many there were using it as their Python package manager. It gives you a suite of tools guaranteed to work together, as in being consistent versions of things. So to get the next Python package, you type:

conda install xxx

Otherwise, the Pythonic way is now:

pip install xxx

You can also:

pip install xxx –user

to put pip packages in your user space on computers in which you can’t write to /usr/local.

I just got the OS updated on my Mac last month and I somehow stumbled on:

brew install xxx

but brew doesn’t have all the Python tools, so I have used pip on a few, and installed one or two from sources. One (very) young man there says he installs his Python and all the packages from source, but I’ve been there, done that, I’m ready to move on to packages.

The talk on creating your own pip packages did not cover packages with compiled C or Fortran. That’s more challenging, with things having moved from Eggs to Wheels – and you need to make precompiled binaries for all the systems or else your users need to have the compilers themselves (which I figure ROMS users do anyway).

Python 2 vs 3

I also asked the Anaconda guy about the state of Python 3. He says the tools are all pretty much ready, no reason not to move to it. If you need Unicode, you need Python 3. On the other hand, the NCAR folks with the PyNIO and PyNGL libraries still only support Python 2. Someone said to code in the subset of the language which works with both, putting () on your print statements. There’s a __future__ module, not sure what it does.

Wrapping Models

Last fall someone said they’d like to have a ROMS version with Python wrappers around the computational guts and to call the guts from Python. I’d never thought if doing that before, but Johnny Lin gave a talk on doing it for a simpler atmospheric model. He described the steps and the reason he wanted to do it (parameter exploration), but the part I loved best was when he admitted that it took so long he ran out of time before he actually did the parameter exploration.

The kgen talk was on another way of wrapping Fortran, this time one subroutine at a time. It’s a way to isolate it for testing purposes.

Parallel I/O

One tutorial was on parallel I/O through MPI-I/O. We got temporary accounts on stampede, a huge system in Texas. I’m afraid that while I really, really want to be doing parallel I/O, I don’t want to be messing with it at the MPI-I/O level. Anyway, they explained why having 1000 cores each write to their own file is a very bad idea, especially on a Lustre filesystem. You do want to be writing to one file in parallel, but you probably don’t want all 1000 cores to be writing either, so they have a T3PIO library for tuning your I/O, starting with a default of one writer per 16-core node. The French Nemo models uses XIOS for parallel I/O and the NCAR CESM has a PIO library too. Getting them to work with T3PIO should be easy.

Wrap-up

I can’t say enough about what a great workshop this was. I feel that attending has given me some “technical debt” in that I now need to spend some time using one or two things I learned about there. Thanks to the Coursera courses I’ve finally learned enough Python to be able to use it for things and I’ve begun making a parser for the ROMS ocean.in file, turning it into a Python dictionary and being able to write out a file made up of info from three different files for a nested grid problem. It still needs work and probably inspired this comic from afar.

One more thing…

I’d heard of JSON before, but recently discovered that it’s a sort of file format. It’s a subset of YAML, simpler than XML. I heard some mention of “ugly XML” at the meeting, like people have moved on to something they like better.

kate - November 15, 2014 @ 20:16 Read the output file- Comments (0)

Hey, have you ever had some bizarre behavior where it runs fine when compiled for the debugger but blows up when optimized? It turns out the answer was in the output if I’d only looked:

 uice                     1   Gradient     Gradient     Gradient     Gradient
 vice                     1   Gradient     Gradient     Gradient     Gradient
 aice                     1   Closed       Closed       Closed       Closed
 hice                     1   Closed       Closed       Closed       Closed
 tisrf                    1   Closed       Closed       Closed       Closed
 snow_thick               1   Closed       Closed       Closed       Closed
 apond                    1
 hpond                    1   ^XÃªg^D/+^@^@^XÃªg  ^D/+^@^@

I was switching back and forth between old and new codes in the source directory, using git branches. But I didn’t always update the ocean.in accordingly in the run directory… apond and hpond replace the old sfwat variable in my new code. Providing the LBC option for sfwat wasn’t helping.

kate - July 10, 2014 @ 18:21 More debugging fun- Comments (0)

This time the question came up about perfect restarts. The option PERFECT_RESTART promises that a run from beginning to end all in one go should match one made with a restart in the middle. With this option, more fields need to be written to the restart field, but were we saving enough? I had done this exercise before for the sea ice model, but not with WET_DRY as well.

The way to test for perfect restarts is to run for some number of timesteps X, save a restart file, run for one more step and save a history file. Then in another directory, copy the first restart file and use it to start at timestep X and run for one step, saving a history file. Now compare the history files with ncdiff and ncview.

What value of X to use? I didn’t try X=1 because ROMS doesn’t get into its default timestep until timestep 3. I therefore started with X=3 and found lots of differences between the two history files.

Time to go to a debugger – actually, duelling debuggers. One debugger starts from time 0, the other from step X. To do this, you don’t want to try a large X because then you’d be running all those steps in the debugger for the first debugger window. I found that for X=3, the u and v fields matched perfectly at step X, but the value of nrhs differed. nrhs is used in the computation of W, the vertical velocity. With the vertical velocities mismatched, everything evolved differently.

Rather than fix this, I decided to try again with an even value for X, say X=2. After all, I usually save after an even number of timesteps and I just wanted to see if something else was off. Indeed, the differences were more localized, but there still were differences, especially along the shallow parts where WET_DRY would be involved.

After much digging around, I found two things going on. The first is that the wet-dry masks are computed during initialization and also in calls to wetdry.F from step2d. One needs to make sure that on restart, one either needs to have saved all those masks or to recompute them consistently.

The other source of trouble comes from how ROMS writes out fields to the NetCDF files. Most fields are written with masked areas getting the _FillValue. For WET_DRY, many are being saved with rmask_io, which masks out the dry cells. This can cause a loss of information of the state of the dry cells. For instance, tracer values in the dry cells are timestepped in ROMS, including advection terms if flow happens to be entering a dry cell. If these dry cell tracers are read from masked out values in the restart file, they will be set to zero.

In the end, I didn’t have to save any more fields, but simply to update the mask initialization and to change the masks used in the saving of restart fields. I also found that I had a lingering ice restart bug, also fixed by the writing of restart fields. You might find it to be interesting, so I’ll try to describe it:

The ice is timestepped before the call to output because it contributes to the computation of surface fluxes.
For PERFECT_RESTART, I wrote a routine to overwrite surface fluxes from the saved ones in the restart file, skipping the call to seaice on the first restart step.
I found a mismatch at one lone point next to a land mask (not near the WET_DRY cells though).
The mismatch stemmed from the computation of the vertical mixing (GLS_MIXING) near the surface.
Near-surface mixing gets a contribution from the surface stresses.
The stresses are averaged from velocity points to rho points.
On restart, the stresses were set to zero for the velocity points at the land-sea mask boundary.
Turning off the masking while writing the stresses fixed this restart issue.

One more thing: ROMS has some funny features with the writing of history files. If you run for X steps, asking for a history save at step X+1, you won’t get any history file at all. Then on restart, it will complain that it can’t find one and fail.

If you run for X+1 steps, asking for a history record at step X+1, you will get one record for the initial conditions, one for time X+1. For my runs, I ask for one file per record.

If you restart from step X, giving it a history file to build on, it will save to the first file. I then need to compare file 0002 from one run with file 0001 from the other run.

kate - January 24, 2013 @ 18:09 Ice model debugging fun- Comments (1)

Yes, debugging can be fun. It’s a job where you know there’s an answer to be found in some finite amount of time. It’s not always as quick as you expect, though.

Our Arctic domain has been run for years and doesn’t look too horrible, but there’s a bit too much ice in the summers. I’ve tried a couple of different albedo schemes, then found one more to try. It’s by Ebert and Curry and is the only one to include the effects of melt ponds. Why hadn’t I ever tried it before? A quick test with PGI Fortran showed me why – the thing blew up in a few timesteps. Maybe that had happened before and I found it easier to just go back to another albedo option.

This time, I recompiled for the debugger and fired up totalview. The thing ran this time, no problem. My favorite trick in this situation is to try another compiler. I’ve also got gfortran, so I gave that a whirl. That one ran fine too – but then it blew up three steps after a restart. How strange to run for hundreds of days, then fail right after a restart. Both this and the earlier PGI failure reported trouble from the ice_frazil routine, so I looked to the temperature field.

Totalview and gfortran don’t play nice, so it’s back to PGI and this time it failed in the same way as gfortran, in the debugger even. I focused on the i,j point reported by the ice_frazil error message and I discovered that in the very first timestep, the surface salinity value became nan, then the entire water column of salinity was set to zero. I’m not quite sure how that loss of nan happened… Those zero salinities with sub-zero temperatures generated a bunch of ice at just the one point and the water column then became unstable.

The nan at the surface led me to check the surface tracer forcing, which indeed was nan for salt. On restart, the ice concentration contained a very tiny non-zero value which introduced trouble to the computation of the ice albedo there. I have an ice_limit routine which keeps ice fields well-behaved, but I wasn’t running it from within get_state. I looked into calling it there, but ice_limit works on the MPI tile while get_state does not. I managed to add a call to ice_limit from the ini_fields routine instead. It’s running again – in both PGI and gfortran. The problem this time was due to a single precision restart file, but I already knew I wanted a call to ice_limit on startup.

While I was poking around in there, I also fixed it so that trouble in ice_frazil really does cause the job to end. I had added a check on exit_flag, but was then broadcasting the value from process 0, not broadcasting the bad value to the rest of the nodes – unless process 0 was the one in trouble. It needed a “MAX” reduction so that any process in trouble made that trouble known to the head node.

Anyway, I learned something, which is always fun.

kate - January 19, 2011 @ 14:39 Hong Kong training- Comments (0)

I have to write up these trips for my work, so why not post here too?

Part 1 written Tuesday Jan 4:

Dale Haidvogel invited me to help him teach a ten-day ROMS training
in Hong Kong. He had said he would go early with his family, so Rob
and I bought tickets to arrive early. Dale’s family then changed
their minds and they spent New Year’s in Toronto. We arrived late
afternoon on Dec 30 for a class starting Jan 5. We were met by a
taxi driver with a card holding my name – it was so nice. Gan and
two of his students met us at the apartment where they are putting
us up.

The next morning one of the students, LinLin, came and gave us a
campus tour. HKUST has most everything in one enormous building,
with eating establishments on the lower levels, class rooms on the
main level, and offices above. HKUST is only twenty years old, but
has become one of the best universities in Hong Kong, with the top
business school in all of Asia. LinLin took us to meet Gan, then
we went to visit the computer lab where my part of the class will
be held. The two computer guys who run all the computers in the
Math department were there, explaining that the PCs running windows
would have a virtual Linux, then they could connect to Gan’s cluster
for doing the actual work. Seemed dicey to me, plus Gan hadn’t sent
them the list of software I’d asked for (with all the Python modules),
so that was worrisome as well. We agreed to meet with them again
on Tuesday, the day before the class started. Gan then took us out
for a big dim sum lunch.

That afternoon Rob and I went to Hang Hau to buy Octopus cards and
to look for an extra electric plug converter. The Octopus cards can
be loaded with money, then debitted by the buses and the metro, so
you don’t have to carry tons of change around. My last Seattle trip
was filled with scrambling for bus money.

We spent the evenings and some early mornings working on talks while
touring around during the daylight hours. We got out to Tsuen Wan
on Saturday to see a Taoist temple, then to Stanley Market to see
what was there (I wanted a Chinese chop and we were told that was
the place to go). On Sunday we went out by the airport to see the
largest Buddha in the world at the Po Lin monastery . You can get
there by aerial tramway, which happened to have a two hour wait –
we’d have taken a bus if we’d known! Gan told us there were many
tourists from the mainland over for the holidays and said it should
be quiet now until Chinese New Years.

Dale arrived on Monday the 3rd and we met him, Gan, and Olivia for
lunch. We talked over some of the plans and they told us that 40% of
those signed up are university students, maybe 40% already have some
ROMS experience and want more of the theory (Dale’s part of the
class) and some have absolutely no necessary skills other than
perhaps English, in spite of being told they should know math,
computers, oceanography before coming.

On Tuesday we met with the computer center guys again and were told
that Cook had found a way to get the PCs to run Linux, running one
image from a server. What a relief! Rob then handed over a DVD of
/usr/local on which he had installed everything we need. It’s going
to make everything so much simpler for the students. They can do
everything on the quad-core desktop systems rather than having to
log into Gan’s cluster.

Olivia is coordinating with a copy center and they want our talks
two days in advance for making copies for everyone. Dale and I are
both working hard to prepare things on time. Hernan is expecting me
to put all the talks on the ROMS Wiki, but I told him it has to
wait until the talks are all done. Classes start tomorrow!

Part 2: Fairbanks two weeks later:

Wow, that was one intense class! My talks are now on the wiki:
https://www.myroms.org/wiki/index.php/HK_2011

The first day wasn’t bad, with my part being an intro to Linux and
a tour of online ROMS resources. We had a nice group lunch with
everyone saying their name and where they are from, but I caught
practically none of it. There was a group from Korea, then a person
from Canada and one from Virginia, but I think the rest were Chinese.
Gan had opened up 35 spaces for people, reserving 5 for his group
since the lab has 40 workstations.

The second day we got into the thick of it. Dale had asked me to
make sure they’d be able to run five different test problems. Three
are in the main trunk of the ROMS code, two are not. One of the
latter contains a C language Bessel function for initial conditions
so I hacked the Makefile to link to that thing – but only for that
case. How to do that? Well, with git we have the technology. Download
ROMS with git-svn to a master branch, then create two other branches,
one for the circle problem, one for the user’s own changes. I wrote
down the commands to set that all up and asked the users to go
through them. I went way too fast and we were dealing with weird
Bessel problems for days afterwards.

In the evenings, Rob and I were working on the python tools for
ROMS, aka pyroms. They are still very rough around the edges, but
contain a surprising amount of useful capabilities. They do however
have a frightful list of dependencies which I never quite got working
on my Mac, so Rob set me up with a VirtualBox containing Linux with
the whole works. Fred at Rutgers sent me tips and codes and even a
patch to get me going. My pyroms talks were a bit scattered and
last minute, but by golly, we signed up for the Sunday outing and
were determined to make the most of it. Twenty-three of us got a
quick tour of the Hong Kong highlights, from a reunification park,
to the Peak, to the south side of the island (lunch at Aberdeen,
Repulse Bay, and Stanley), then back to Kowloon for the Hollywood
Parade of Stars.

On Tuesday afternoon I went through setting up a grid in pyroms,
then copying an existing case in ROMS, then modifying that to work
with the new grid. I changed the open boundary from the east side to
the south side and showed them several ways to screw up and get no
flow at all. They then repeated the getting no flow part and
expected me to fix them up. Hence the Thursday slide about becoming
a detective.

On Wednesday afternoon I gave the talks about MPI and fish, but they
really wanted to see the setting up of a 3-D problem, so that’s what
we did on Thursday afternoon. We added tides from Oregon State, with
caveats about the 18.6 year components not being included, then we
added initial conditions from SODA, finally we added atmospheric
forcing which Rob had prepared for me. I showed them recompiling
umpteen times and mentioned that not everyone has the personality
for it. We finished with ROMS seg faulting on reading in one of
those files. So very true to life!

On Friday the morning was a Q&A session with Dale, Gan and I taking
turns answering the questions. Lots and lots of good questions. That
afternoon we were all ready to call it a day, except Rob volunteered
to lead a class on getting VirtualBox going on their laptops with
all the pyroms goodies – it took all afternoon!

The students were very involved, staying in the lab until we had to
throw them out at 5:30 every evening, even on the Saturday. There
were always many questions, though about half the students were
totally silent the whole time. I was asked to slow down, even when
I thought I was using my slow voice for foreigners – it wasn’t slow
enough, especially the hands-on parts. The Koreans invited us to
come visit them – perhaps in the fall. One is a former student of
Dale’s, now a professor (BJ), and two are researchers at KORDI, one
wanting a coupled ROMS-WRF model. It was good to see BJ and he did a
great job of asking the questions he thought his people would want
to hear the answers to. Gan told us half a dozen times some students
are looking for a quick model to learn, a black box where you hit a
button to make it go. Those students were disappointed, but I think
it was overall a positive experience. Gan was very impressed with
what Rob could contribute to the lab portion, answering many
questions, including those about setting stuff up on their home
computers.

Gan told us the idea for this training came about when he visits China
and gets bombarded with ROMS questions of all sorts, to the point where
he can’t get to the coffee during breaks. Now he can tell them all they
should have come to the class! If we do it again, it should be easier with
so much material already prepared. Gan says he started a competition between
his students for who can master git first.

P.S. I finally have a better understanding of build.bash, how and why
one would use it. I do feel that it’s unfortunate that it is not
a thing of mean, lean, clean beauty.