Building ROMS grid topography from scattered, sparse data

Discussion about analysis, visualization, and collaboration tools and techniques

Moderators: arango, robertson

Post Reply
User avatar
Posts: 185
Joined: Fri Nov 14, 2003 4:57 pm

Building ROMS grid topography from scattered, sparse data

#1 Post by shchepet » Tue Nov 20, 2018 3:59 pm

Recently I was asked to help set up ROMS grid for a lake which has its bathymetry data available only as
a set of point measurements from a boat: the depth was measured by echo sounder accompanied by GPS,
so the data comes as a file of the following structure, three numbers on each line

Code: Select all

        latitude longitude depth
        ... ... .....
along the trajectory of the boat, which is traversed the lake, literally like this,
Raw data: coast line and location of measured data points for bathymetry.
Raw data: coast line and location of measured data points for bathymetry.
where the dots indicate points where the depth was measured: the placement is sparse and irregular.
They are connected by thin line which is simply sequence of the points in the file - presumably along the
trajectory, but some zig-zags and crossing are hard to explain.

Anyhow, interpolation from randomly placed data sounds like griddata, and does not look promising, given
the sparseness of data. Coastline for this lake is available from ESRI (a shape file *.shp in its
proprietary format), and is actually very good.

The coastline is needed because there are too few measurements near the coast, and the topography data
should be augmented by the coastline where the depth is presumed to be zero (or set to a minimum value).

Obviously, the first try is to use Matlab griddata -- the results were not very promising, as expected. Another
thing I bumped into, is that an innocent-looking

Code: Select all

 hraw=griddata(zlon,zlat,z, rlon,rlat); 
[where zlon,zlat,z are geographical longitude,latitude, and depth of measured points expressed in
degrees,degrees, and meters; rlon,rlat longitude,latitude, of the target ROMS grid, also in degrees]
produce very erroneous result: it just connects wrong data points when doing triangulation.
Left: griddata applied directly to lon-lat coordinates (expressed in degrees) resulting in failure connect the nearest points, producing narrow triangles, and failing to achieve propery of Delone triangulation. Right - griddata in conformally projected coordinates.
Left: griddata applied directly to lon-lat coordinates (expressed in degrees) resulting in failure connect the nearest points, producing narrow triangles, and failing to achieve propery of Delone triangulation. Right - griddata in conformally projected coordinates.
The reason for this is because griddata interprets arrays passed to it through the arguments as Cartesian
coordinates, so it computes and compares distances simply as dt=sqrt[(x1-x2)^2+(y1-y2)^2]. Therefore,
because at 60 degrees North one degree in north-south direction means twice as much distance measured in
meters than one degree east-west, but griddata algorithm has no way to know about it.

I checked ROMS-related Matlab scripts available to me, and found that in most cases when griddata is used,
its arguments are geographical lon-lat coordinates. Obviously, Boris Delone would not be happy.

An intuitive way to correct the situation is to conformally transform the horizontal coordinates (both ROMS
grid and data points) into flat Cartesian coordinates: in the simplest case of very small domain just to
multiply longitude by cosine of the median latitude; a more diligent way is to use Lambert conformal
conical projection with its two standard latitudes optimally chosen for the particular modeling domain
(minimum distortion). For larger grids grids, and if the resolution is perfectly isotropic (pm=pn
everywhere) a good way is to transfer data points into coordinates of ROMS grid indices -- this is what I
ended up doing, however this option requires grid-index search algorithm which I do have (see later) and it
is independent of Matlab and griddata (seem that in Matlab world there only few options, and griddata is
the most often used for this purpose as well, even if the grids are regular. ...Any way, something needs
to be done.

The next thing to be careful about is that before even start messing with griddata, the point-wise data
must be preprocessed to filter out points which are too close to each other, and potentially contain contradictory
data: either because of GPS errors or echo sounder errors, when the depth is measured again
in nearly the same location (perhaps a later time, or on different/return pass by the boat), the measured
value does not repeat itself exactly, but griddata tries to fit all the points available resulting in almost
vertical walls in the interpolated field it produces. So the augmented and preprocessed data looks like
Preprocessed data: points which are too close to each other are averaged.
Preprocessed data: points which are too close to each other are averaged.
where all the data points which were closer than a user-specified minumum distance were averaged and merged
into a single point.

In practice it is a bit tricky because of the inherent data dependency: if three or more points come too
close to each other, the location of the averaged point may change the status of logical condition of whether
or not some of the source points should be averaged -- in may end up at some distance further away from its
neighbor. To resolve this, the procedure begins with a very small threshold -- only a fraction of user-specified
value, and gradually increasing it until reaching the user-specified value. This way, only two-point averaging
takes place at any stage (I cannot prove this mathematically, but experimentally this seems to be the case.

Also note that coastline points were averages as well. This is, strictly speaking, not necessary, but it
make it a little bit easier on griddata: depending on the option for selecting algorithm, it may take
griddata some time to compute these fields, or even fail to do so, complaining that "matrix is too big".

Matlab griddata supports four options "linear", "cubic", "natural", and "v4". Obviously nearest neighbor
does not count for our purposes, linear is kind of obvious, and is too crude; As for the remaining three,
there is very little information of what are the underlying algorithms.

Trying all of them results in:

Linear option:
griddata(....,'linear'); Left: geographical coordinates; right in conformal. (these plots corresponds to triangulations shown above.
griddata(....,'linear'); Left: geographical coordinates; right in conformal. (these plots corresponds to triangulations shown above.
Field on the right is point-wise Laplacian computed from the field on the left. The scaling does not
matter, but the scale in exactly the same on all plots presented here. Computing point-wise Laplacian
is useful to expose discontinuities of first and second derivatives, and this way expose quality of
interpolation. Also it is possible to track down both resolution/placement of the initial source data,
as well as the algorithm itself (in unknown).

In the case of "linear" (see above) all what you see in triangulation itself, with color intensity of
line proportional to the discontinuity of first derivative - a kind of bent between to flat planes.

Here, for cubic case what you see on the right is a typical signature of griddata: it almost like
"linear", except that facets inside triangles are not perfectly flat.

On this suite Matlab claims C2-continuity for
the "cubic" option for griddata. It is clearly not the case.

The called "natural" neighbor,
not clear what exactly the algorithm is used here, but is clear that it is not suitable for our purposes,
as neither of the above. The original trajectory of the boat is clearly visible in the interpolated field,
and the jaggedness of the green contours on the south-western lope is clearly artificial.

Finally, "v4" option
This one, indeed, looks more promissing. Laplacian field identifies locations of the data points, but
Laplacian field itself is continuous.

On its help suite Matlab identifies "v4" as
Biharmonic spline interpolation (MATLAB(R) 4 griddata method) supporting 2-D interpolation
only. Unlike the other methods, this interpolation is not based on a triangulation.

Most of all I like the (R) aspect of it....

However, words biharmonic spline interpolation immediately brings association with David Sandwell

Sandwell, DT., 1987: Biharmonic Spline Interpolation of Geos-3 and Seasat Altimeter Data. Geophysical
Research Letters. 14:139-142,

also available here:,

So I decided to build my own Matlab-free procedure.

To explain it briefly:

Similarly to one-dimensional cubic spline, which satisfy the variational principle of being the function
going through all data points, and having the minimum possible integral of square of the second derivative,
the two-dimensional interpolation is constructed as a 2D function f=f(x,y), such that it passes through all
the data points, and has minimum possible integral over the entire 2D domain of square of its Laplacian.
In its turn, variational derivative of such integral with respect to the finction is bi-laplacian
(Laplacian of Laplacian) of the function itself. Then, we demand that the bi-laplacian is equal to zero
everywhere, except data points. There is exist a fundamental function,


which gas the property that its derivatives for up to the order 3 (inclucive) are continuous (including
the continuity in x=0,y=0), and its bi-laplacian is zero everywhere, except in (x=0,y=0), where it is delta
function. Then the function to be interpolated is constructed as

f(x,y) = sum w_j*G(x-x_j,y-y_j)

where (x_j,y_j) are the data points, and summation takes place over j.

Weights w_j are determined from the condition that f(x,y) passes through all data points,
that is f(x_k,y_k)=f_k, or

f_k = sum w_j*G(x_k-x_j, y_k-y_j)

or all data points k=1,..,N, and the latest summation is over index j, and it skips k=j [actually,
mathematically speaking it does not matter, because G(x,y) --> 0 as (x,y) --> (0,0), however, computing
log(0) on a digital computer is not a good idea].

So it boils down to solving a full-matrix linear system of matrix size NxN, where N is the number of data
points, hence N^3 operations (no way to cheat); then the interpolation itself, which is about computing
and adding up these N Green finctions at every point of ROMS grid, hence xi_rho*eta_rho*N operations
more. It is brute-force approach, but for a reasonable number of points it should work.

The good news is that the matrix of the system is symmetric. The bad news is that it is expected to have
some of its element be very small, others very large, and not well-condition as the result. Thus the
algorithm for solving it should be prepared to handle it.

The sequence of operations to prepare bottom "hraw":

All commands below come from "tools.tar" attached to the bottom of this post. All of them are
compile-once--use-forever command-line operators.

To start, you are supposed to have two files, "coastline.dat" and "measured_bath.dat" which ASCII files
containing coastline points, and scattered data for topography, whitten in format

Code: Select all

 latitude longitude [depth].
two and three numbers per line. Note that latitude comes first, as in navigation tradition.

And you are supposed to have ROMS-style grid file which contains variables "lon_rho" and "lat_rho"
expressed in degrees.

Optionally, if you have an ESRI data for your lake or sea, you may extract the coastline

Code: Select all

       read_esri_data lake.shp coastline.dat
which takes an ESRI-style binary shape file and creates an easy-to-read by Matlab or whatever file
"coastline.dat" - and ASCII file containing Lat,Lon coordinates of points on shoreline (two numbers
per line, N lines for N points on the shoreline). This operation is merely a file-format conversion.
This step is optional in sense that all ot does not matter how file "coastline.dat" was obtained,
and what is the original source of data. The contour should be closed. Optionally the outcome can
be visualized

Optionally you may plot it (black-and-white plots above were produced by this program)

Code: Select all

       plot_coast_traj coastline.dat
which is compile-once-use-forever NCAR-graphics-based tool. It takes one or two arguments and produces
PDF file always called "file.pdf" (in addition to NCAR graphics library, it relies on Linux image format
conversion tools ps2epsi and epstopdf, which usually come with Linux distribution). This is purely for
visualization/checking purposes, does not affect any of the data files produced here.

Code: Select all

       plot_coast_traj coastline.dat measured_bath.dat
which superimposes available data points onto coastline plot. See "file.pdf".

Produce merged coast-bathymetry file

Code: Select all

    merge_coast_topo measured_bath.dat coastline.dat 150 merged_bath_coast.dat
After this operation the last file "merged_bath_coast.dat" contains bathymetry measured along the
trajectory of the boat (same as in measured_bath.dat) appended by the coastline where the bathymetry
is specified as zero. This is needed to make sure that interpolation algorithm (e.g., griddata_v4, or
else) has values at the coastline to lock on rather than extrapolate from inside (and end up with
meaningless values).

In addition to that this operation also perform averaging of measured points which are too close to
each other, hence potentially contain contradictory data causing interpolation routine produce steep
gradients. The third argument is minimum distance allowed specified in meters.

Once again, the outcome of this procedure can be visualized by command

Code: Select all

       plot_coast_traj coastline.dat merged_bath_coast.dat
and looking at PDF file "file.pdf".

The next step is to compute raw bathymetry "hraw",

Code: Select all

       biharm merged_bath_coast.dat
which is an alternative to running Matlab script "make_bathymetry.m" where griddata is set to option
v4 resulting in Sandwell (1987) biharmonic spline interpolation from sparse irregularly placed data
points. Either way, it takes the merged data file "merged_bath_coast.dat" and ROMS-style grid file
and produces raw topography "hraw".

After this moment your "hraw" is ready. Inspect it with ncview. Also useful is to compute Laplacian
and bi-laplacian

Code: Select all

       hmask hraw
       laplace hraw
       laplace lap_hraw
and inspect them with ncview as well. The first one gives a measure of smoothness of the interpolated field.
The second proves that hraw=griddata(x,y,z, xi,yj, 'v4') is, indeed bi-harmonic spline interpolation as
described in paper of David Sandwell, 1987: bi-laplacian is equal to zero except in data points, as delta

Building land/sea mask "mask_rho" and smooth topography "h"

Overall the rest is standard for ROMS procedures.

Note while building "hraw" does not depend on having "mask_rho" is advance, that smoothing topography
to get "h" needs "mask_rho" to proceed.

Code: Select all

     coastline2mask coastline.dat
it reads "coastline.dat" file made in the previous step and generates land mask placing it into
a separate file called "".

The next step is to transfer and post-process land/sea mask:

Code: Select all

       single_connect 500 200
where all the commands executed above are from my pre-/post-processing FORTRAN toolbox, and all of
them are self-explanatory if executed without arguments. This is, in fact, the standard ROMS building
procedure for land/sea mask: all what is new here is that source for shoreline comes from a user-supplies
file rather than GSHHS dataset.

Above numbers 500 200 meaning (i=500,j=200) point an arbitrary point inside the designated water body
(if the initial land mask is such that there are more than one water body (for example, multiple lakes
inside land mass), then only the one containing the point (i,j) will be left open; all others blocked.
In principle, land/sea mask can be edited manually and/or semi-manually (i.e. iterating manual editing,
single, and mask_narrow_bays).

After this step land mask stored as netCDF variable "mask_rho" is complete.

Produce version of bottom topography to be used by the model:

Code: Select all

     lsmooth 1.0 25.5 0.15
where 1.0 and 25.5 are minimum and maximum depth in meters; 0.15 is maximum allowed topographic r-factor.

This is it. Good luck to everybody who cares and finds this useful.

P.S.: The main engine for this procedure is biharm.F. Its computationally-critical parts are parallelized.
It is guaranteed to be one or two orders of magnitude faster than Matlab griddata(...,'v4'), and can handle
bigger matrices.
(1.54 MiB) Downloaded 111 times

User avatar
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Building ROMS grid topography from scattered, sparse dat

#2 Post by m.hadfield » Wed Nov 21, 2018 10:07 pm

Thank you very much for that Sasha.

For a while now I have handed my bathymetry interpolation duties to the GMT surface utility:

I'll have a good look at what you've done and compare them.

Posts: 139
Joined: Fri Jan 08, 2010 7:22 pm
Location: Theiss Research

Re: Building ROMS grid topography from scattered, sparse dat

#3 Post by rduran » Sat Nov 24, 2018 2:41 am

Here are some Matlab tools that are based on ideas that seem to be along the same lines. It should be relatively straightforward to test them so it's probably worth the time. I was happy with gridfit when I used it back in the day.

Gridfit offers "diffusion", "gradient" and "springs" options to fit a surface to data points. ... ng-gridfit

This one I have not tested myself, but is supposed to improve on gridfit: ... rizedata3d

I'm sure many of us would be interested to read about any tests on interpolating/fitting scattered data!

Post Reply