problematic behavior of super_obs.m

Bug reports, work arounds and fixes

Moderators: arango, robertson

Post Reply
Message
Author
jpm
Posts: 21
Joined: Thu Aug 27, 2009 4:37 pm
Location: UCSC

problematic behavior of super_obs.m

#1 Unread post by jpm »

The super_obs matlab function (matlab/4dvar/super_obs.m in the repository) is meant to merge observations of the same type together, if they are located very closely in both time and space. However, it may merge observations in a somewhat arbitrary fashion and does not guarantee the removal of two observations at nearly identical locations even if these are very close to the center of the same rho-point.

The problem is that binning is determined by the minimum observation grid coordinate:

Code: Select all

%  Compute the index in each dimension of the grid cell in which the
%  observation is located.

    Xbin = 1.0 + floor((V.Xgrid - Xmin) ./ dx);
    Ybin = 1.0 + floor((V.Ygrid - Ymin) ./ dy);
    Zbin = 1.0 + floor((V.Zgrid - Zmin) ./ dz);
In the code above, Xmin = min(V.Xgrid), Ymin = min(V.Ygrid), ... are computed for each survey individually (dx=dy=dz=1.0 have no importance). That is, the minimum x-, y- and z-coordinates in each survey determine the bin index and thereafter the merging of observations. This can lead to unexpected results, as the following example demonstrates.

example:

Here, we have two surveys, 3 observations each. Within each survey, all observations have the same y- and z-coordinate and the same timestamp, the observations only differ in their x-coordinate. In the first survey, the x-coordinates are 4.0, 13.0-0.000001, 13.0+0.000001, in the second survey they are 4.5, 13.0-0.000001, 13.0+0.000001 (identical, except for the first observation). So, within each survey, the second and third observation are very close to each other and very close to the center of a rho-point. After running super_obs, the two close observations are not merged together in the first survey but in the second they are. It is the x-coordinate of the first observation that determines Xmin and hence the way observations are merged.

Here is the example in action, the code to run it is given below (note that matlab prints "13.0-0.000001" as "13.0000").

Code: Select all

>> test_super_obs 
before super-obing:

Sinp = 

            time: [1 1 1 2 2 2]
           Xgrid: [4 13.0000 13.0000 4.5000 13.0000 13.0000]
           Ygrid: [10 10 10 10 10 10]
           Zgrid: [42 42 42 42 42 42]
     survey_time: [1 2]
         Nsurvey: 2
            Nobs: [3 3]
           value: [10 10 10 10 10 10]
            type: [7 7 7 7 7 7]
           depth: [42 42 42 42 42 42]
           error: [1 1 1 1 1 1]
          ncfile: 'example'
    grid_Lm_Mm_N: [1 2 3]
        variance: [1 1 1 1 1 1 1]
       spherical: 0

after super-obing:

Sout = 

          ncfile: 'example'
         Nsurvey: 2
          Nstate: 7
          Ndatum: 5
       spherical: 0
            Nobs: [3 2]
     survey_time: [1 2]
        variance: [1 1 1 1 1 1 1]
            type: [7 7 7 7 7]
            time: [1 1 1 2 2]
           Xgrid: [4 13.0000 13.0000 4.5000 13]
           Ygrid: [10 10 10 10 10]
           Zgrid: [42 42 42 42 42]
           depth: [42 42 42 42 42]
           error: [1 1 1 1 1]
           value: [10 10 10 10 10]
             std: [0 0 0 0 0]
    grid_Lm_Mm_N: [1 2 3]
Not only does this behavior lead to binning that differs from survey to survey, it can also lead to observations not being merged together if they are very close to the center of any given rho-point. This binning procedure is most problematic if the observations are (almost) equally spaced (in grid coordinates), as the edges of the bins are placed close to the observations and not between them.

Binning will always bin certain points together and others not. However, it is probably more beneficial to align the bins with the rho-points instead of having them be determined by observation locations. An alternative binning procedure would hence be (the dx=dy=dz=1.0 were removed here but they can easily be added again):

Code: Select all

%  Compute the index in each dimension of the grid cell in which the
%  observation is located.

    Xbin = 1.0 + round(V.Xgrid);
    Ybin = 1.0 + round(V.Ygrid);
    Zbin = 1.0 + round(V.Zgrid);
code to run example:

Code: Select all

function test_super_obs()

% add path to location of repository
addpath('matlab/4dvar')

Sinp.time =  [1,   1,             1              2,   2,             2];
Sinp.Xgrid = [4.0, 13.0-0.000001, 13.0+0.000001, 4.5, 13.0-0.000001, 13.0+0.000001];
Sinp.Ygrid = 10.0*ones(size(Sinp.Xgrid));
Sinp.Zgrid = 42.0*ones(size(Sinp.Xgrid));

% initialize survey-related variables appropriately
Sinp.survey_time = unique(Sinp.time);
Sinp.Nsurvey = numel(Sinp.survey_time);
Sinp.Nobs = zeros(1,Sinp.Nsurvey);
for isurvey = 1:Sinp.Nsurvey
    Sinp.Nobs(isurvey) = sum(Sinp.time==Sinp.survey_time(isurvey));
end

% the rest of the fields are necessary for super_obs to run but their values do not influence the super-obing
Sinp.value = 10.0*ones(size(Sinp.Xgrid));
Sinp.type  = 7*ones(size(Sinp.Xgrid));
Sinp.depth = Sinp.Zgrid;
Sinp.error = ones(size(Sinp.Xgrid));
Sinp.ncfile = 'example';
Sinp.grid_Lm_Mm_N = [1,2,3];
Sinp.variance = ones(1,7);
Sinp.spherical = 0;

% print values and run super_obs
fprintf('before super-obing:\n')
Sinp
Sout = super_obs(Sinp);
fprintf('after super-obing:\n')
Sout
Jann Paul Mattern, Ocean Sciences Department, UCSC

jpm
Posts: 21
Joined: Thu Aug 27, 2009 4:37 pm
Location: UCSC

Re: problematic behavior of super_obs.m

#2 Unread post by jpm »

For the sake of completeness, if one makes changes to the super_obs routine in the way suggested in the post above, one extra change is required ass well.

In summary, the original code

Code: Select all

%  Compute the index in each dimension of the grid cell in which the
%  observation is located.

    Xbin = 1.0 + floor((V.Xgrid - Xmin) ./ dx);
    Ybin = 1.0 + floor((V.Ygrid - Ymin) ./ dy);
    Zbin = 1.0 + floor((V.Zgrid - Zmin) ./ dz);

%  Similarly, compute the maximum averaging grid size.

    Xsize = 1.0 + floor((Xmax - Xmin) ./ dx);
    Ysize = 1.0 + floor((Ymax - Ymin) ./ dy);
    Zsize = 1.0 + floor((Zmax - Zmin) ./ dz);
would need to be changed to (once again, ignoring dx=dy=dz=1):

Code: Select all

%  Compute the index in each dimension of the grid cell in which the
%  observation is located.

    Xbin = 1.0 + round(V.Xgrid);
    Ybin = 1.0 + round(V.Ygrid);
    Zbin = 1.0 + round(V.Zgrid);

%  Similarly, compute the maximum averaging grid size.

    Xsize = max(Xbin);
    Ysize = max(Ybin);
    Zsize = max(Zbin);
While the post above shows an idealized example, we encountered the super_obs behavior recently when super-obing an observation file with almost equally spaced data. Presumably due to interpolation being used, the x- and y- grid coordinates of the observations were not quite in the center of the rho-points. The offsets from the centers were small and seemingly random (in [-10^-12,10^12]) but enough to trigger undesired merging of some of the observations (again, seemingly random, causing a removal of ~18% of the observations).
Jann Paul Mattern, Ocean Sciences Department, UCSC

Post Reply