Recently I was asked to help set up ROMS grid for a lake which has its bathymetry data available only as

a set of point measurements from a boat: the depth was measured by echo sounder accompanied by GPS,

so the data comes as a file of the following structure, three numbers on each line

**Code:**

latitude longitude depth

... ... .....

along the trajectory of the boat, which is traversed the lake, literally like this,

Attachment:

**File comment:** Raw data: coast line and location of measured data points for bathymetry.
raw_bath_coast.png [ 62.28 KiB | Viewed 773 times ]
where the dots indicate points where the depth was measured: the placement is sparse and irregular.

They are connected by thin line which is simply sequence of the points in the file - presumably along the

trajectory, but some zig-zags and crossing are hard to explain.

Anyhow, interpolation from randomly placed data sounds like griddata, and does not look promising, given

the sparseness of data. Coastline for this lake is available from ESRI (a shape file *.shp in its

proprietary format), and is actually very good.

The coastline is needed because there are too few measurements near the coast, and the topography data

should be augmented by the coastline where the depth is presumed to be zero (or set to a minimum value).

Obviously, the first try is to use Matlab griddata -- the results were not very promising, as expected. Another

thing I bumped into, is that an innocent-looking

**Code:**

hraw=griddata(zlon,zlat,z, rlon,rlat);

[where zlon,zlat,z are geographical longitude,latitude, and depth of measured points expressed in

degrees,degrees, and meters; rlon,rlat longitude,latitude, of the target ROMS grid, also in degrees]

produce very erroneous result: it just connects wrong data points when doing triangulation.

Attachment:

**File comment:** Left: griddata applied directly to lon-lat coordinates (expressed in degrees) resulting in failure connect the nearest points, producing narrow triangles, and failing to achieve propery of Delone triangulation. Right - griddata in conformally projected coordinates.
page1.png [ 733.86 KiB | Viewed 773 times ]
The reason for this is because griddata interprets arrays passed to it through the arguments as Cartesian

coordinates, so it computes and compares distances simply as dt=sqrt[(x1-x2)^2+(y1-y2)^2]. Therefore,

because at 60 degrees North one degree in north-south direction means twice as much distance measured in

meters than one degree east-west, but griddata algorithm has no way to know about it.

I checked ROMS-related Matlab scripts available to me, and found that in most cases when griddata is used,

its arguments are geographical lon-lat coordinates. Obviously, Boris Delone would not be happy.

An intuitive way to correct the situation is to conformally transform the horizontal coordinates (both ROMS

grid and data points) into flat Cartesian coordinates: in the simplest case of very small domain just to

multiply longitude by cosine of the median latitude; a more diligent way is to use Lambert conformal

conical projection with its two standard latitudes optimally chosen for the particular modeling domain

(minimum distortion). For larger grids grids, and if the resolution is perfectly isotropic (pm=pn

everywhere) a good way is to transfer data points into coordinates of ROMS grid indices -- this is what I

ended up doing, however this option requires grid-index search algorithm which I do have (see later) and it

is independent of Matlab and griddata (seem that in Matlab world there only few options, and griddata is

the most often used for this purpose as well, even if the grids are regular. ...Any way, something needs

to be done.

The next thing to be careful about is that before even start messing with griddata, the point-wise data

must be preprocessed to filter out points which are too close to each other, and potentially contain contradictory

data: either because of GPS errors or echo sounder errors, when the depth is measured again

in nearly the same location (perhaps a later time, or on different/return pass by the boat), the measured

value does not repeat itself exactly, but griddata tries to fit all the points available resulting in almost

vertical walls in the interpolated field it produces. So the augmented and preprocessed data looks like

Attachment:

**File comment:** Preprocessed data: points which are too close to each other are averaged.
processed_bath_coast.png [ 67.32 KiB | Viewed 773 times ]
where all the data points which were closer than a user-specified minumum distance were averaged and merged

into a single point.

In practice it is a bit tricky because of the inherent data dependency: if three or more points come too

close to each other, the location of the averaged point may change the status of logical condition of whether

or not some of the source points should be averaged -- in may end up at some distance further away from its

neighbor. To resolve this, the procedure begins with a very small threshold -- only a fraction of user-specified

value, and gradually increasing it until reaching the user-specified value. This way, only two-point averaging

takes place at any stage (I cannot prove this mathematically, but experimentally this seems to be the case.

Also note that coastline points were averages as well. This is, strictly speaking, not necessary, but it

make it a little bit easier on griddata: depending on the option for selecting algorithm, it may take

griddata some time to compute these fields, or even fail to do so, complaining that "matrix is too big".

Matlab griddata supports four options "linear", "cubic", "natural", and "v4". Obviously nearest neighbor

does not count for our purposes, linear is kind of obvious, and is too crude; As for the remaining three,

there is very little information of what are the underlying algorithms.

Trying all of them results in: Linear option:Attachment:

**File comment:** griddata(....,'linear'); Left: geographical coordinates; right in conformal. (these plots corresponds to triangulations shown above.
page2.png [ 886.78 KiB | Viewed 773 times ]
Cubic:Attachment:

**File comment:** griddata(....,'cubic')
cubic.png [ 1.11 MiB | Viewed 773 times ]
Field on the right is point-wise Laplacian computed from the field on the left. The scaling does not

matter, but the scale in exactly the same on all plots presented here. Computing point-wise Laplacian

is useful to expose discontinuities of first and second derivatives, and this way expose quality of

interpolation. Also it is possible to track down both resolution/placement of the initial source data,

as well as the algorithm itself (in unknown).

In the case of "linear" (see above) all what you see in triangulation itself, with color intensity of

line proportional to the discontinuity of first derivative - a kind of bent between to flat planes.

Here, for cubic case what you see on the right is a typical signature of griddata: it almost like

"linear", except that facets inside triangles are not perfectly flat.

On this suite

https://www.mathworks.com/help/matlab/ref/griddata.html Matlab claims C2-continuity for

the "cubic" option for griddata. It is clearly not the case.

The called

"natural" neighbor,Attachment:

**File comment:** griddata(....,'natural')
natural.png [ 808.32 KiB | Viewed 773 times ]
not clear what exactly the algorithm is used here, but is clear that it is not suitable for our purposes,

as neither of the above. The original trajectory of the boat is clearly visible in the interpolated field,

and the jaggedness of the green contours on the south-western lope is clearly artificial.

Finally, "v4" option

Attachment:

**File comment:** griddata(......,'v4')
v4.png [ 700.23 KiB | Viewed 773 times ]
This one, indeed, looks more promissing. Laplacian field identifies locations of the data points, but

Laplacian field itself is continuous.

On its help suite

https://www.mathworks.com/help/matlab/ref/griddata.html Matlab identifies "v4" as

Biharmonic spline interpolation (MATLAB(R) 4 griddata method) supporting 2-D interpolation

only. Unlike the other methods, this interpolation is not based on a triangulation. Most of all I like the

(R) aspect of it....

However, words

biharmonic spline interpolation immediately brings association with David Sandwell

paper,

Sandwell, DT., 1987: Biharmonic Spline Interpolation of Geos-3 and Seasat Altimeter Data. Geophysical

Research Letters. 14:139-142,

https://doi.org/10.1029/GL014i002p00139 also available here:

https://topex.ucsd.edu/sandwell/publications/21.pdf,

So I decided to build my own Matlab-free procedure. To explain it briefly:

Similarly to one-dimensional cubic spline, which satisfy the variational principle of being the function

going through all data points, and having the minimum possible integral of square of the second derivative,

the two-dimensional interpolation is constructed as a 2D function f=f(x,y), such that it passes through all

the data points, and has minimum possible integral over the entire 2D domain of square of its Laplacian.

In its turn, variational derivative of such integral with respect to the finction is bi-laplacian

(Laplacian of Laplacian) of the function itself. Then, we demand that the bi-laplacian is equal to zero

everywhere, except data points. There is exist a fundamental function,

G(x,y)=(x^2+y^2)*[log(sqrt(x^2+y^2))-1]

which gas the property that its derivatives for up to the order 3 (inclucive) are continuous (including

the continuity in x=0,y=0), and its bi-laplacian is zero everywhere, except in (x=0,y=0), where it is delta

function. Then the function to be interpolated is constructed as

f(x,y) = sum w_j*G(x-x_j,y-y_j)

where (x_j,y_j) are the data points, and summation takes place over j.

Weights w_j are determined from the condition that f(x,y) passes through all data points,

that is f(x_k,y_k)=f_k, or

f_k = sum w_j*G(x_k-x_j, y_k-y_j)

or all data points k=1,..,N, and the latest summation is over index j, and it skips k=j [actually,

mathematically speaking it does not matter, because G(x,y) --> 0 as (x,y) --> (0,0), however, computing

log(0) on a digital computer is not a good idea].

So it boils down to solving a full-matrix linear system of matrix size NxN, where N is the number of data

points, hence N^3 operations (no way to cheat); then the interpolation itself, which is about computing

and adding up these N Green finctions at every point of ROMS grid, hence xi_rho*eta_rho*N operations

more. It is brute-force approach, but for a reasonable number of points it should work.

The good news is that the matrix of the system is symmetric. The bad news is that it is expected to have

some of its element be very small, others very large, and not well-condition as the result. Thus the

algorithm for solving it should be prepared to handle it.

The sequence of operations to prepare bottom "hraw": All commands below come from "tools.tar" attached to the bottom of this post. All of them are

compile-once--use-forever command-line operators.

To start, you are supposed to have two files, "coastline.dat" and "measured_bath.dat" which ASCII files

containing coastline points, and scattered data for topography, whitten in format

**Code:**

latitude longitude [depth].

two and three numbers per line. Note that latitude comes first, as in navigation tradition.

And you are supposed to have ROMS-style grid file which contains variables "lon_rho" and "lat_rho"

expressed in degrees.

Optionally, if you have an ESRI data for your lake or sea, you may extract the coastline

**Code:**

read_esri_data lake.shp coastline.dat

which takes an ESRI-style binary shape file and creates an easy-to-read by Matlab or whatever file

"coastline.dat" - and ASCII file containing Lat,Lon coordinates of points on shoreline (two numbers

per line, N lines for N points on the shoreline). This operation is merely a file-format conversion.

This step is optional in sense that all ot does not matter how file "coastline.dat" was obtained,

and what is the original source of data. The contour should be closed. Optionally the outcome can

be visualized

Optionally you may plot it (black-and-white plots above were produced by this program)

**Code:**

plot_coast_traj coastline.dat

which is compile-once-use-forever NCAR-graphics-based tool. It takes one or two arguments and produces

PDF file always called "file.pdf" (in addition to NCAR graphics library, it relies on Linux image format

conversion tools ps2epsi and epstopdf, which usually come with Linux distribution). This is purely for

visualization/checking purposes, does not affect any of the data files produced here.

**Code:**

plot_coast_traj coastline.dat measured_bath.dat

which superimposes available data points onto coastline plot. See "file.pdf".

Produce merged coast-bathymetry file

**Code:**

merge_coast_topo measured_bath.dat coastline.dat 150 merged_bath_coast.dat

After this operation the last file "merged_bath_coast.dat" contains bathymetry measured along the

trajectory of the boat (same as in measured_bath.dat) appended by the coastline where the bathymetry

is specified as zero. This is needed to make sure that interpolation algorithm (e.g., griddata_v4, or

else) has values at the coastline to lock on rather than extrapolate from inside (and end up with

meaningless values).

In addition to that this operation also perform averaging of measured points which are too close to

each other, hence potentially contain contradictory data causing interpolation routine produce steep

gradients. The third argument is minimum distance allowed specified in meters.

Once again, the outcome of this procedure can be visualized by command

**Code:**

plot_coast_traj coastline.dat merged_bath_coast.dat

and looking at PDF file "file.pdf".

The next step is to compute raw bathymetry "hraw",

**Code:**

biharm merged_bath_coast.dat roms_grid.nc

which is an alternative to running Matlab script "make_bathymetry.m" where griddata is set to option

v4 resulting in Sandwell (1987) biharmonic spline interpolation from sparse irregularly placed data

points. Either way, it takes the merged data file "merged_bath_coast.dat" and ROMS-style grid file

and produces raw topography "hraw".

After this moment your "hraw" is ready. Inspect it with ncview. Also useful is to compute Laplacian

and bi-laplacian

**Code:**

hmask roms_grid.nc hraw

laplace hmask.nc hraw

laplace lap_hraw.nc lap_hraw

and inspect them with ncview as well. The first one gives a measure of smoothness of the interpolated field.

The second proves that hraw=griddata(x,y,z, xi,yj, 'v4') is, indeed bi-harmonic spline interpolation as

described in paper of David Sandwell, 1987: bi-laplacian is equal to zero except in data points, as delta

functions.

Building land/sea mask "mask_rho" and smooth topography "h" Overall the rest is standard for ROMS procedures.

Note while building "hraw" does not depend on having "mask_rho" is advance, that smoothing topography

to get "h" needs "mask_rho" to proceed.

**Code:**

coastline2mask coastline.dat roms_grid.nc

it reads "coastline.dat" file made in the previous step and generates land mask placing it into

a separate file called "mask.nc".

The next step is to transfer and post-process land/sea mask:

**Code:**

mv mask.nc roms_nask.nc

copymask roms_nask.nc roms_grid.nc

single_connect 500 200 roms_grid.nc

mask_narrow_bays roms_grid.nc

commit_mask roms_grid.nc

where all the commands executed above are from my pre-/post-processing FORTRAN toolbox, and all of

them are self-explanatory if executed without arguments. This is, in fact, the standard ROMS building

procedure for land/sea mask: all what is new here is that source for shoreline comes from a user-supplies

file rather than GSHHS dataset.

Above numbers 500 200 meaning (i=500,j=200) point an arbitrary point inside the designated water body

(if the initial land mask is such that there are more than one water body (for example, multiple lakes

inside land mass), then only the one containing the point (i,j) will be left open; all others blocked.

In principle, land/sea mask can be edited manually and/or semi-manually (i.e. iterating manual editing,

single, and mask_narrow_bays).

After this step land mask stored as netCDF variable "mask_rho" is complete.

Produce version of bottom topography to be used by the model:

**Code:**

lsmooth 1.0 25.5 0.15 roms_grid.nc

where 1.0 and 25.5 are minimum and maximum depth in meters; 0.15 is maximum allowed topographic r-factor.

This is it. Good luck to everybody who cares and finds this useful.

P.S.: The main engine for this procedure is biharm.F. Its computationally-critical parts are parallelized.

It is guaranteed to be one or two orders of magnitude faster than Matlab griddata(...,'v4'), and can handle

bigger matrices.