"missing_value" for masked regions in NetCDF output files?

rsignell
Posts: 123
Joined: Fri Apr 25, 2003 9:22 pm
Location: USGS

"missing_value" for masked regions in NetCDF output files?

#1 Post by rsignell » Wed Oct 22, 2008 2:00 pm

ROMS folk,

Although ROMS 3.0 NetCDF output files are "CF-compliant", there is currently no CF standard for specifying masking. As a result, CF-aware visualization tools plot whatever values are in the masked regions, which can be ugly at best, misleading at worst.

Since CF visualization and access clients already handle missing values, could we simply write -99999.0 (or some other special value) into the values of "temp", "salt", and other dependent variables in the NetCDF output files in regions that are masked and add the "missing_value" attribute to these variables?

Would this cause any problems for ROMS? If so, I guess this would mean that the values of variables in masked regions DO matter! If not, we should add this to the ROMS wish list, as then the existing clients could correctly show masked regions without implementing new standards and features.

Thanks,
Rich
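
A minimal sketch of the proposal in Python, assuming a plain nested-list field and a land/sea mask where 1 is water and 0 is land. The names `apply_fill`, `temp`, `mask_rho` and `temp_attrs` are illustrative only and are not ROMS code; the special value is the one suggested above.

```python
FILL_VALUE = -99999.0  # special value suggested in the post; any out-of-range value would do


def apply_fill(field, mask, fill_value=FILL_VALUE):
    """Return a copy of a 2D field with land points (mask == 0) set to fill_value."""
    return [
        [value if m == 1 else fill_value for value, m in zip(row, mask_row)]
        for row, mask_row in zip(field, mask)
    ]


# Hypothetical 2x3 surface temperature field and its land/sea mask (1 = water, 0 = land).
temp = [[12.5, 13.0, 0.0],
        [12.8, 0.0, 0.0]]
mask_rho = [[1, 1, 0],
            [1, 0, 0]]

# Only the copy destined for the output file is changed; the model array is untouched.
temp_out = apply_fill(temp, mask_rho)

# Attribute that a CF client would read alongside the data.
temp_attrs = {"missing_value": FILL_VALUE}
print(temp_out)
```

Because `apply_fill` returns a copy, the in-memory field keeps whatever the model had on land, which is the separation between model state and file contents that the question is about.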

kate
Posts: 3808
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: "missing_value" for masked regions in NetCDF output files?

#2 Post by kate » Wed Oct 22, 2008 6:36 pm

Good suggestion! The only caveat I can think of is that ROMS should do an extra multiply by mask on reading in fields. We don't want that -999 in the land salinity!
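
The read-side caveat can be sketched in Python as follows. This assumes a finite special value and a 0/1 mask; `read_with_mask` and the sample arrays are hypothetical and only illustrate the "multiply by mask on reading" step.

```python
def read_with_mask(stored, mask):
    """Multiply a field read from file by the land/sea mask so land points become zero."""
    return [
        [value * m for value, m in zip(row, mask_row)]
        for row, mask_row in zip(stored, mask)
    ]


# Hypothetical salinity as stored in a file, with the special value on land.
salt_in_file = [[35.1, -99999.0],
                [34.9, 35.0]]
mask_rho = [[1, 0],
            [1, 1]]

salt = read_with_mask(salt_in_file, mask_rho)
# The land point is -99999.0 * 0, which is -0.0 and compares equal to 0.0.
print(salt)
```

Multiplication only works while the special value is finite; a NaN or infinite value would survive the multiply, so a selection on the mask would be needed instead.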

Speaking of masks, the ROMS mask has a _FillValue of 1, which causes trouble for tools expecting that to be the special value. I've had to work around it once or twice.

arango
Site Admin
Posts: 1131
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

Re: "missing_value" for masked regions in NetCDF output files?

#3 Post by arango » Wed Oct 22, 2008 6:52 pm

I am moving away from the missing_value attribute and using the _FillValue attribute instead. Notice that the missing_value attribute is not treated in any special way by the library or conforming generic applications.
Now, I think that overwriting the masked areas with a special value is kind of dangerous. What about adding a new attribute, say mask, which points to the array used for masking:

Code:

        float temp(ocean_time, s_rho, eta_rho, xi_rho) ;
                temp:long_name = "potential temperature" ;
                temp:units = "Celsius" ;
                temp:time = "ocean_time" ;
                temp:coordinates = "lon_rho lat_rho s_rho ocean_time" ;
                temp:mask = "mask_rho" ;
so any application just needs to multiply by the specified masking array. We will need some logic here for 3D variables, since the mask is a 2D array in the ROMS case. I wonder if this should be inside or part of the coordinates attribute. Perhaps you can bring this issue to the standards committee.
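
The extra logic for 3D variables could look like the Python sketch below. It assumes the proposed (non-standard) `mask` attribute and uses a plain dictionary as a stand-in for a NetCDF file; `apply_mask_attribute` and the sample data are hypothetical.

```python
def apply_mask_attribute(name, variables):
    """Multiply a 3D (s, eta, xi) variable by the 2D mask named in its 'mask' attribute."""
    var = variables[name]
    mask2d = variables[var["attrs"]["mask"]]["data"]
    # The same 2D (eta, xi) mask is reused for every vertical level.
    return [
        [[v * m for v, m in zip(row, mask_row)] for row, mask_row in zip(level, mask2d)]
        for level in var["data"]
    ]


# Hypothetical in-memory stand-in for a file: 2 s-levels on a 2x2 horizontal grid.
variables = {
    "mask_rho": {"attrs": {}, "data": [[1, 0], [1, 1]]},
    "temp": {
        "attrs": {"mask": "mask_rho"},
        "data": [[[10.0, 11.0], [12.0, 13.0]],
                 [[14.0, 15.0], [16.0, 17.0]]],
    },
}

masked_temp = apply_mask_attribute("temp", variables)
print(masked_temp)
```

Every client would have to implement this lookup itself, which is the cost of a new attribute compared with a fill value that existing tools already understand.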

arango
Site Admin
Posts: 1131
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

Re: "missing_value" for masked regions in NetCDF output files?

#4 Post by arango » Wed Oct 22, 2008 7:07 pm

kate wrote:Speaking of masks, the ROMS mask has a _FillValue of 1, which causes trouble for tools expecting that to be the special value. I've had to work around it once or twice.
There is no _FillValue attribute in the definition of the land/sea mask arrays in ROMS.

Code:

        double mask_rho(eta_rho, xi_rho) ;
                mask_rho:long_name = "mask on RHO-points" ;
                mask_rho:option_0 = "land" ;
                mask_rho:option_1 = "water" ;
                mask_rho:coordinates = "lon_rho lat_rho" ;
We need to be careful when defining a variable with the _FillValue attribute, since it needs to be of the same type as the defined variable. This is problematic on some Cray computers that do not support 4-byte reals when writing single-precision variables. Mark and I went through this exercise recently.
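
One practical consequence of the type requirement can be shown with a small Python sketch. It assumes a special value of 1.0E+35 and uses the standard `struct` module to round a double to a 4-byte real; `as_float32` is a hypothetical helper, not part of any NetCDF API.

```python
import struct


def as_float32(x):
    """Round a Python float (a double) to the nearest 4-byte real and return it as a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


SPVAL = 1.0e35                      # special value expressed in double precision
stored = as_float32(SPVAL)          # what a single-precision variable would actually hold

# A naive equality test against the double-precision constant misses the masked points.
naive_match = (stored == SPVAL)
# Comparing in the variable's own type finds them.
typed_match = (stored == as_float32(SPVAL))
print(naive_match, typed_match)
```

This is only an illustration of why data and attribute should share a type; it does not reproduce the library's own type check.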

rsignell
Posts: 123
Joined: Fri Apr 25, 2003 9:22 pm
Location: USGS

Re: "missing_value" for masked regions in NetCDF output files?

#5 Post by rsignell » Wed Oct 22, 2008 9:08 pm

arango wrote:I am moving away from the missing_value attribute and using the _FillValue attribute instead. Notice that the missing_value attribute is not treated in any special way by the library or conforming generic applications.

What about adding a new attribute, say mask, which points to the array used for masking:

Code:

        float temp(ocean_time, s_rho, eta_rho, xi_rho) ;
                temp:long_name = "potential temperature" ;
                temp:units = "Celsius" ;
                temp:time = "ocean_time" ;
                temp:coordinates = "lon_rho lat_rho s_rho ocean_time" ;
                temp:mask = "mask_rho" ;
so any application just needs to multiply by the specified masking array. We will need some logic here for 3D variables, since the mask is a 2D array in the ROMS case. I wonder if this should be inside or part of the coordinates attribute. Perhaps you can bring this issue to the standards committee.
Hernan, I agree that if we were to propose a "masking" convention to the CF standards committee, it would look exactly as you have indicated, and indeed, it's very simple. But then, of course, the existing clients (like NCVIEW, the Matlab/NetCDF tools, etc.) would need to implement this standard. Since these tools already handle "missing_value" and "_FillValue" (preferred, as you mention), I was just wondering why we couldn't simply write the "_FillValue" instead, so that we wouldn't have to add a masking standard.
arango wrote: Now, I think that overwriting the masked areas with a special value is kind of dangerous.
I'm very curious why writing a special value in masked regions would be dangerous. Don't the values in the masked regions always get zeroed out by the mask in ROMS? If the values in the masked regions affect the solution, that would indeed seem dangerous!

-Rich

hetland
Posts: 79
Joined: Thu Jul 03, 2003 3:39 pm
Location: TAMU,USA

Re: "missing_value" for masked regions in NetCDF output files?

#6 Post by hetland » Thu Oct 23, 2008 2:14 am

I can't think of a single good reason not to use _FillValue. I've wondered for years why ROMS does not do this (but have been too lazy to write the wrapper). Other models, like GETM, do this, and suffer no ill effects. And it makes quick looks in ncview much nicer. Note, the values in the model would not need to be changed, just the values written to the file. In the model, the salinity on land could still be zero.

As for reading in values in other programs, most netcdf readers worth their salt obey _FillValue attributes. If not, using -999 makes it pretty clear pretty fast that that is not a real salinity or temperature....

arango
Site Admin
Posts: 1131
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

Re: "missing_value" for masked regions in NetCDF output files?

#7 Post by arango » Thu Oct 23, 2008 3:10 am

rsignell wrote:I'm very curious why writing a special value in masked regions would be dangerous. Don't the values in the masked regions always get zeroed out by the mask in ROMS? If the values in the masked regions affect the solution, that would indeed seem dangerous!
It is just what Kate mentioned about applying the mask during reading to make sure that the fill value is removed. We need to be careful with objectively analysed data in case the masking needs to be revisited. We need to be very aware of such values during restart, adjoint-based extensive IO, and interpolation weights for coupling, nesting, and so on.

I assume that implementing this will not be hard. A few changes are needed in nf_fread*d.F and nf_fwrite*d.F, which already pass the mask array.

There is a lot of discussion about the use of the _FillValue versus missing_value attributes; a quick web search turns up plenty of it, including proposals to deprecate missing_value. As I mentioned above, I personally prefer the _FillValue attribute. The issue is that we need to define this attribute with the same type as the declared variable. Otherwise, we will get something like:
ERROR: Abnormal termination: NetCDF OUTPUT.
REASON: NetCDF: Not a valid data type or _FillValue type mismatch
In NetCDF the fill value is set by the parameter NF90_FILL_REAL which has a value of 9.9692099683868690E+36. This is an awful number which is very difficult to remember. Of course, there are ways to change this value or select any other special value.

Mark Hadfield and I talked about this recently. There is no need to set the _FillValue for a variable explicitly if we use 9.9692099683868690E+36 in masked places, for instance. However, I prefer to have this value defined explicitly for completeness. We currently use 1.0E+35 in the floats. I believe that a well-coded application should inquire about the value of this attribute and process the data accordingly. That's what we do in basic Matlab scripts to process NetCDF files.
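
The "inquire about the attribute and process accordingly" pattern might be sketched in Python like this. The attribute dictionary stands in for whatever a real NetCDF reader returns, and `mask_fill`, `zeta` and `zeta_attrs` are hypothetical names used only for illustration.

```python
import math

NC_FILL_FLOAT = 9.9692099683868690e+36  # netCDF default fill value for 4-byte reals


def mask_fill(values, attrs, default_fill=NC_FILL_FLOAT):
    """Replace fill values with NaN, using _FillValue if present, else the library default."""
    fill = attrs.get("_FillValue", default_fill)
    return [math.nan if v == fill else v for v in values]


# Hypothetical free-surface values with two masked points and an explicit attribute.
zeta = [0.12, 1.0e35, -0.05, 1.0e35]
zeta_attrs = {"_FillValue": 1.0e35}

cleaned = mask_fill(zeta, zeta_attrs)
print(cleaned)
```

Falling back to the library default when the attribute is absent mirrors the point that an explicit `_FillValue` is optional but safer for downstream tools.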

m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: "missing_value" for masked regions in NetCDF output files?

#8 Post by m.hadfield » Fri Oct 24, 2008 12:58 am

As Hernan says, he and I had a discussion a week ago about what fill value to set (or whether to use the netCDF library's default) and how to communicate this info to downstream applications. This was relating to float trajectories. See

https://www.myroms.org/projects/src/ticket/217

and revision 243. However the tricky bit was related to a bug in the netCDF library on the Cray T3E, namely that it can't write 4-byte real values into attributes. This is a bug on a specific, little-used platform, probably fixable with a bit of effort, and I wouldn't like to see it be an impediment to making ROMS behave sensibly.

I argued that the sensible fill value to use is the netCDF default, which as Hernan says is 9.9692099683868690E+36. Yes, this is an awkward-looking number (in decimal notation anyway), but netCDF-aware applications should never have to deal with it in that form: they should all know it as NF90_FILL_REAL. In a CDL file it is represented with an underscore.
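
The number looks less awkward in binary. The short Python check below, using only the standard `struct` module, shows that the default fill value is an exactly representable 4-byte real with a simple bit pattern; the variable names are illustrative.

```python
import struct

NF90_FILL_REAL = 9.9692099683868690e+36

# Big-endian IEEE 754 single-precision encoding of the fill value.
bits = struct.pack(">f", NF90_FILL_REAL).hex()
print(bits)  # 7cf00000

# Sign 0, biased exponent 249, mantissa 1.875, i.e. 1.875 * 2**122 = 15 * 2**119.
is_power_form = (NF90_FILL_REAL == 15.0 * 2.0 ** 119)
print(is_power_form)  # True
```

Since the value survives the round trip through single precision unchanged, the same constant can be compared against float and double data without rounding surprises.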

I also argued that, if you use this fill value, then you don't need to set a _FillValue attribute for each variable. However on this issue I have come around to Hernan's point of view: if you're going to use the fill value to represent missing data, then it's wise always to set a _FillValue attribute for the variable in question, as some downstream applications may expect it. (I believe this is true of the NCO utilities.)

Anyway, these are technicalities. I concur with Rob & others that fill values are the way to go.

rsignell
Posts: 123
Joined: Fri Apr 25, 2003 9:22 pm
Location: USGS

Re: "missing_value" for masked regions in NetCDF output files?

#9 Post by rsignell » Fri Oct 24, 2008 2:15 am

Hernan,

I totally agree with you that we should use "_FillValue" instead of "missing_value" and that we should specify this explicitly. For floats and doubles, I guess the value of 1.0e35 is fine. But for the rest, I suggest we go with the default Unidata values, but also specify them explicitly as attributes in the NetCDF files:

Code:

     parameter (nf_fill_byte = -127)
     parameter (nf_fill_int1 = nf_fill_byte)
     parameter (nf_fill_char = 0)
     parameter (nf_fill_short = -32767)
     parameter (nf_fill_int2 = nf_fill_short)
     parameter (nf_fill_int = -2147483647)
     parameter (nf_fill_float = 9.9692099683868690e+36)
     parameter (nf_fill_real = nf_fill_float)
     parameter (nf_fill_double = 9.9692099683868690d+36)
     parameter (nf_fill_ubyte = 255)
     parameter (nf_fill_ushort = 65535)
Finally, as you say, we need to make sure that the "_FillValue" attribute value is of the same type as the variable it's associated with.

arango
Site Admin
Posts: 1131
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

Re: "missing_value" for masked regions in NetCDF output files?

#10 Post by arango » Fri Oct 24, 2008 2:45 am

I implemented this request. See trac ticket 222. The change is generic and uses the parameter spval = 1.0E+35, which is declared in mod_scalars.F. The default NetCDF value nf_fill_double = 9.9692099683868690e+36 is not fully written in the CDL file produced by ncdump. This kind of annoys me. Anyway, this declaration is very easy to change to any other value.

m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: "missing_value" for masked regions in NetCDF output files?

#11 Post by m.hadfield » Fri Oct 24, 2008 3:12 am

I'd like to request one change: set spval to 1.0E37. Unlike the present value, 1.0E35, this is greater than NF90_FILL_REAL, so, even if a variable does not have an explicit _FillValue attribute, a well-behaved netCDF application will recognise spval as being outside the valid range, according to the conventions described here:

http://www.unidata.ucar.edu/software/ne ... onventions

Specifically:
If neither valid_min, valid_max nor valid_range is defined then generic applications should define a valid range as follows. If the data type is byte and _FillValue is not explicitly defined, then the valid range should include all possible values. Otherwise, the valid range should exclude the _FillValue (whether defined explicitly or by default) as follows. If the _FillValue is positive then it defines a valid maximum, otherwise it defines a valid minimum. For integer types, there should be a difference of 1 between the _FillValue and this valid minimum or maximum. For floating point types, the difference should be twice the minimum possible (1 in the least significant bit) to allow for rounding error.
PS: I believe the value of 1.E35 comes from NCAR Graphics.
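
The difference between the two candidate values under this convention can be sketched in Python. This is a simplified reading of the quoted rule: it assumes no valid_min, valid_max or valid_range attribute is present and ignores the small rounding margin; `is_valid` is a hypothetical helper.

```python
NF90_FILL_REAL = 9.9692099683868690e+36


def is_valid(value, fill_value=NF90_FILL_REAL):
    """Default validity test: a positive fill value is a valid maximum, otherwise a valid minimum."""
    if fill_value > 0:
        return value < fill_value
    return value > fill_value


# With no explicit _FillValue, 1.0E35 falls inside the default valid range and looks like data.
old_spval_looks_valid = is_valid(1.0e35)
# 1.0E37 lies above the default fill value, so it is rejected even without an attribute.
new_spval_looks_valid = is_valid(1.0e37)
print(old_spval_looks_valid, new_spval_looks_valid)  # True False
```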

arango
Site Admin
Posts: 1131
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

Re: "missing_value" for masked regions in NetCDF output files?

#12 Post by arango » Fri Oct 24, 2008 3:28 am

Yes, good idea. Done. I will change the ROMS plotting package tomorrow, since it uses the NCAR library.

rsignell
Posts: 123
Joined: Fri Apr 25, 2003 9:22 pm
Location: USGS

Re: "missing_value" for masked regions in NetCDF output files?

#13 Post by rsignell » Fri Oct 24, 2008 3:46 pm

Hernan,

I did an SVN update to grab the latest ROMS (r247), ran RIVERPLUME2, and then fired up NCVIEW on ocean_his.nc.
Yes!!!! The land is masked:
[Image: NCVIEW view of ocean_his.nc with the land masked]

I tried it with "nc_varget" in Matlab, and it works great there too.
No more multiplying by the mask every time you want to make a plot in Matlab!
Fabulous!

Thanks,
-Rich

arango
Site Admin
Posts: 1131
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

Re: "missing_value" for masked regions in NetCDF output files?

#14 Post by arango » Fri Oct 24, 2008 6:03 pm

Great. I did screw up the grid variables (f, h, lon, lat, pm, pn, etc). I knew about this but I forgot to put in the conditional to avoid overwriting such variables. I just needed to process variables with tindex>0. Please update again. See trac ticket 223.

jcwarner
Posts: 885
Joined: Wed Dec 31, 2003 6:16 pm
Location: USGS, USA

Re: "missing_value" for masked regions in NetCDF output files?

#15 Post by jcwarner » Tue Oct 28, 2008 10:06 am

Rich-

A lot of my matlab scripts read in variables such as:
nc=netcdf('ocean_his.nc');
zeta=nc{'zeta'}(:);

When I do a pcolor, I now get large values (because of the new fill values).
Is there a way to have those fill values set to something else when I read
the variable in, or do I now always have to get the mask and multiply by
it?

rsignell
Posts: 123
Joined: Fri Apr 25, 2003 9:22 pm
Location: USGS

Re: "missing_value" for masked regions in NetCDF output files?

#16 Post by rsignell » Tue Oct 28, 2008 1:06 pm

John,

Most Matlab/NetCDF routines default to handling missing values via
"_FillValue" and scaling via "add_offset"/"scale_factor", but the
NetCDF Toolbox does not. You have to turn them on. So in my Matlab
startup.m, I have these lines:

Code:

% Netcdf Toolbox global options (turn autoscale & autonan on)
global nctbx_options
nctbx_options.theAutoNaN=1;
nctbx_options.theAutoscale=1;
-Rich

mathieu
Posts: 74
Joined: Fri Sep 17, 2004 2:22 pm
Location: Institut Rudjer Boskovic

Re: "missing_value" for masked regions in NetCDF output files?

#17 Post by mathieu » Tue Nov 04, 2008 10:50 am

Having consistent fill values is good, but for us the real problem with the mask is the excessive use of disk space. The READ_WATER and WRITE_WATER options of ROMS allow writing only the sea points and are certainly nice.
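
The idea of storing only the sea points can be sketched in Python as below. This is not the READ_WATER/WRITE_WATER implementation in ROMS; `pack_water`, `unpack_water` and the sample data are hypothetical and only show why the packed form is smaller and how the full grid is recovered from the mask.

```python
def pack_water(field, mask):
    """Flatten a 2D field, keeping only the points where mask == 1 (water)."""
    return [v for row, mask_row in zip(field, mask) for v, m in zip(row, mask_row) if m == 1]


def unpack_water(packed, mask, fill_value=1.0e37):
    """Rebuild the full 2D field from packed water points, putting fill_value on land."""
    it = iter(packed)
    return [[next(it) if m == 1 else fill_value for m in mask_row] for mask_row in mask]


# Hypothetical 2x3 field where half of the grid is land.
zeta = [[0.10, 0.20, 0.0],
        [0.30, 0.0, 0.0]]
mask_rho = [[1, 1, 0],
            [1, 0, 0]]

packed = pack_water(zeta, mask_rho)          # 3 values instead of 6
restored = unpack_water(packed, mask_rho)    # full grid with fill values on land
print(packed, restored)
```

The packed array has no grid shape of its own, so a reader needs the mask to interpret it, which is the kind of dependency that generic clients do not handle without an agreed convention.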

However, Hernan indicated to me that the resulting files are not CF-compliant. What would be a longer-term solution?
