Last week I was lucky enough to attend the Software Engineering Assembly 2015 at NCAR in Boulder, Colorado. This year it was all about what’s going on in Python in Scientific Computing. The agenda and links to the slides are here. I encourage you to check them out.
The first talk was on Jupyter, a browser-based means of running not just Python, but many, many computer languages. I heard about IPython notebooks just a few months ago through one of the Coursera courses on Python and haven’t yet gotten a chance to play with them much. Enrique is using them for teaching his class – you can run things interactively, you can have equations, you can output it all to pdf. Anyway, the Jupyter talk was from the maker of IPython, who has now set his sights on bigger things, like including Julia, R, bash, Fortran, etc. in the languages you can access from these notebooks (someone even got Matlab working). The conference ended with a tutorial on Jupyter (Python only), a good thing since we all wanted exactly that after the first talk.
There were many, many interesting talks about all kinds of things I wasn’t aware of. One that struck a chord with many was the talk on “technical debt” by a guy who does a lot of software development in the field, under pressure. He showed pictures of places he’s been, from research ships to remote tropical islands. His point was that you can get code running, but you won’t have spent the time you might want to on making it clean and robust, writing the test suites, etc. So the debt is in what you owe that software before you count on it in the field for the next trip. It’s OK to have some technical debt, but you need to realize that you owe debt – and there’s also the explaining to your boss that there’s unfinished business.
The talk on out-of-core computations was awesome, with a visual of code running on different parts of a huge array a few pieces at a time (in parallel). It got me thinking about running the boundary condition interpolator in parallel, so I talked to the guy afterwards and he said my problem should be easy. He wanted to know what other sorts of problems we (ocean modelers) are trying to solve – he works for Anaconda, a company supporting scientific Python (and maybe other languages… Julia?).
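I believe the library behind that visual was dask, which the Anaconda folks work on, though don’t quote me on that. In any case, here is a minimal sketch of what that chunked, out-of-core style looks like in dask (the array sizes are made up):

import dask.array as da

# The big array is split into blocks; operations build a task graph and
# only run, block by block and in parallel where possible, at compute().
x = da.random.random((100000, 100000), chunks=(5000, 5000))
anomaly = x - x.mean(axis=0)       # nothing computed yet
result = anomaly.std(axis=0)
print(result[:5].compute())        # compute() walks the graph a chunk at a time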
Package managers
Speaking of Anaconda, many there were using it as their Python package manager. It gives you a suite of tools guaranteed to work together, as in being consistent versions of things. So to get the next Python package, you type:
conda install xxx
Otherwise, the Pythonic way is now:
pip install xxx
You can also:
pip install xxx --user
to put pip packages in your user space on computers where you can’t write to /usr/local.
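If you forget where those per-user packages end up, Python itself can tell you (this is just the standard library, nothing extra to install):

import site
print(site.getusersitepackages())   # the directory "pip install --user" writes to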
I just got the OS updated on my Mac last month and I somehow stumbled on:
brew install xxx
but brew doesn’t have all the Python tools, so I have used pip on a few, and installed one or two from sources. One (very) young man there says he installs his Python and all the packages from source, but I’ve been there, done that, I’m ready to move on to packages.
The talk on creating your own pip packages did not cover packages with compiled C or Fortran. That’s more challenging, with things having moved from Eggs to Wheels – and you need to make precompiled binaries for all the systems or else your users need to have the compilers themselves (which I figure ROMS users do anyway).
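For the record, the usual route (at least as of now) to a pip-installable package with Fortran inside is numpy.distutils. A minimal sketch of the setup.py, with hypothetical package and file names:

from numpy.distutils.core import setup, Extension

# Hypothetical package "mypkg" with one Fortran extension module built
# from src/interp.f90; numpy.distutils drives the Fortran compiler.
ext = Extension(name='mypkg.interp', sources=['src/interp.f90'])

setup(name='mypkg',
      version='0.1',
      packages=['mypkg'],
      ext_modules=[ext])

# "python setup.py bdist_wheel" then produces a platform-specific Wheel,
# which is why you end up building binaries per system (or shipping source
# and trusting your users to have the compilers).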
Python 2 vs 3
I also asked the Anaconda guy about the state of Python 3. He says the tools are all pretty much ready, no reason not to move to it. If you need Unicode, you need Python 3. On the other hand, the NCAR folks with the PyNIO and PyNGL libraries still only support Python 2. Someone said to code in the subset of the language which works with both, putting () on your print statements. There’s also a __future__ module that is supposed to help with this, though I haven’t yet looked into what it does.
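For what it’s worth, here is what that 2-and-3 subset looks like in practice – the __future__ imports make Python 2 behave like Python 3 for printing and division:

from __future__ import print_function, division

print("this line runs identically under Python 2.7 and Python 3")
print(1 / 2)               # 0.5 in both, thanks to the division import
print("pi is", 3.14159)    # print is now a function, so the () are required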
Wrapping Models
Last fall someone said they’d like to have a ROMS version with Python wrappers around the computational guts, so you could call the guts from Python. I’d never thought of doing that before, but Johnny Lin gave a talk on doing it for a simpler atmospheric model. He described the steps and the reason he wanted to do it (parameter exploration), but the part I loved best was when he admitted that it took so long he ran out of time before he actually did the parameter exploration.
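I won’t swear this is exactly what he used, but the standard first tool for this kind of wrapping is f2py, which ships with NumPy. A bare-bones sketch, with a made-up subroutine:

# Sketch only: wrapping a made-up Fortran subroutine with f2py.
# Suppose step.f90 contains:
#
#   subroutine step(t, n, dt)
#     integer, intent(in) :: n
#     real(8), intent(inout) :: t(n)
#     real(8), intent(in) :: dt
#     t = t + dt
#   end subroutine step
#
# Build the extension module once, from the shell:
#   f2py -c step.f90 -m step
#
# and then the Fortran "guts" can be driven from Python:
import numpy as np
import step                # the module f2py just built

t = np.zeros(10)
step.step(t, 0.5)          # the array length is filled in for you
print(t)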
The kgen talk was on another way of wrapping Fortran, this time extracting one subroutine at a time so it can be tested in isolation.
Parallel I/O
One tutorial was on parallel I/O through MPI-I/O. We got temporary accounts on Stampede, a huge system in Texas. I’m afraid that while I really, really want to be doing parallel I/O, I don’t want to be messing with it at the MPI-I/O level. Anyway, they explained why having 1000 cores each write to their own file is a very bad idea, especially on a Lustre filesystem. You do want to be writing to one file in parallel, but you probably don’t want all 1000 cores to be writing either, so they have a T3PIO library for tuning your I/O, starting with a default of one writer per 16-core node. The French NEMO model uses XIOS for parallel I/O and the NCAR CESM has a PIO library too. Getting them to work with T3PIO should be easy.
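Just so this doesn’t stay completely abstract: here is roughly what writing one shared file at the MPI-I/O level looks like from Python with mpi4py (a sketch – the file name and sizes are made up):

from mpi4py import MPI
import numpy as np

# Each rank writes its own slice of one shared binary file, at an offset
# computed from its rank; Write_at_all is a collective MPI-I/O call.
comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n = 1000                                   # values per rank (made up)
data = np.full(n, rank, dtype='f8')

fh = MPI.File.Open(comm, 'shared.dat',
                   MPI.MODE_WRONLY | MPI.MODE_CREATE)
fh.Write_at_all(rank * n * data.itemsize, data)
fh.Close()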
Wrap-up
I can’t say enough about what a great workshop this was. I feel that attending has given me some “technical debt” in that I now need to spend some time using one or two things I learned about there. Thanks to the Coursera courses I’ve finally learned enough Python to be able to use it for things, and I’ve begun writing a parser for the ROMS ocean.in file, turning it into a Python dictionary and then writing out a file made up of info from three different files for a nested grid problem. It still needs work and probably inspired this comic from afar.
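The guts of that parser are nothing fancy – roughly this sort of thing, though the sketch below glosses over wrinkles in the real ocean.in format such as continuation lines and the single versus double equals signs:

# Sketch: read "KEYWORD == value" style lines into a dictionary,
# dropping "!" comments along the way.
def parse_ocean_in(path):
    params = {}
    with open(path) as f:
        for line in f:
            line = line.split('!', 1)[0].strip()
            if '=' in line:
                key, value = line.split('=', 1)
                params[key.strip()] = value.lstrip('=').strip()
    return params

# Dictionaries from several such files can then be merged and written
# back out for the nested-grid case.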
One more thing…
I’d heard of JSON before, but recently discovered that it’s a sort of file format. It’s a subset of YAML, simpler than XML. I heard some mention of “ugly XML” at the meeting, like people have moved on to something they like better.
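Python reads and writes it out of the box with the standard-library json module; the values here are just made up:

import json

config = {'grid': 'NEP5', 'dt': 30.0, 'nested': True}   # made-up values
text = json.dumps(config, indent=2)      # dictionary -> JSON text
print(text)
print(json.loads(text) == config)        # and back again: True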