# Difference between revisions of "Analysis-Forecast Cycle Observation Impacts"

Analysis-Forecast Cycle Observation Impacts
1. Introduction
2. Incremental 4D Variational Data Assimilation
3. Analysis-Forecast Cycle Observation Impacts

The procedure for computing the observation impacts during the forecast cycle is a little more involved than that for the analysis cycle. However, a separate ROMS driver exists for this. To help illustrate the procedure involved, consider the typical analysis-forecast cycle shown schematically below. Figure: A schematic showing the typical configuration of an analysis-forecast cycle. Analysis cycle spans the interval $[t_{0}^{j},t_{0}^{j}+\tau ]$, and is associated with 4D-Var analysis ${\mathbf {x} }_{a}^{j}$ and the forecast ${\mathbf {x} }_{f}^{j}$. At the start of each cycle there are three available circulation estimates: two forecasts initialized from the previous two adjacent analysis cycles, and the analysis for the current time. These are illustrated at time $t_{0}^{j+2}+\tau$ at the start of cycle $j+3$.

In the figure above, each analysis cycle is assumed to be of length and analysis cycle spans the interval $[t_{0}^{j},t_{0}^{j}+\tau ]$. The circulation estimate at time $t_{0}^{j}+\tau$ (i.e. the end of analysis cycle ) is denoted as ${\mathbf {x} }_{a}^{j}$ and is the initial condition for the forecast spanning the next analysis interval $[t_{0}^{j+1},t_{0}^{j+1}+\tau ]$. In the figure it is assumed, for convenience only, that the forecast duration is an integer multiple of , but this does not have to be the case, and the code is set up to handle analysis and forecast cycles that are different lengths. The figure shows the analyses and forecasts that result from three adjacent analysis cycles, namely cycles , $j+1$ and cycle $j+2$. The analysis ${\mathbf {x} }_{a}^{j}$ at the end of cycle is used as the initial condition for the forecast ${\mathbf {x} }_{f}^{j}$ of duration $2\tau$ that terminates at time $t_{0}^{j+2}+\tau$, the end of analysis cycle $j+2$. Similarly, the analysis ${\mathbf {x} }_{a}^{j+1}$ at the end of cycle $j+1$ is used as the initial condition for the forecast ${\mathbf {x} }_{f}^{j+1}$ of duration and also terminates at time $t_{0}^{j+2}+\tau$, the end of analysis cycle $j+2$. After sufficient time has elapsed, a new analysis ${\mathbf {x} }_{a}^{j+2}$ will be computed at this time. Since ${\mathbf {x} }_{a}^{j+2}$ represents our best estimate of the ocean circulation at time $t_{0}^{j+2}+\tau$ it can be used to quantify the veracity of the forecasts ${\mathbf {x} }_{f}^{j}$ and ${\mathbf {x} }_{f}^{j+1}$. For this reason, ${\mathbf {x} }_{a}^{j+2}$ is usually referred to as the “verifying analysis.” However, as discussed shortly, other sources of information can be used to verify the forecasts, such as new or independent observations.

It should be clear from the figure that the forecast ${\mathbf {x} }_{f}^{j+1}$ benefits from the observations assimilated into the model during analysis cycle $j+1$ (i.e. during the interval $[t_{0}^{j+1},t_{0}^{j+1}+\tau ]$). Therefore, providing that ${\mathbf {x} }_{f}^{j}$ and ${\mathbf {x} }_{f}^{j+1}$ are subject to identical surface forcing and open boundary conditions during the interval $[t_{0}^{j+2},t_{0}^{j+2}+\tau ]$, any differences in forecast error must be associated with the observations assimilated into the model during the interval $[t_{0}^{j+1},t_{0}^{j+1}+\tau ]$.

## Forecast Error Metrics

As in the case of the analysis cycle observation impacts described above, the impact of the observations during the forecast cycle is computed for a specific metric, in this case a metric of the forecast error. The methodology will be described first for a standard generic quadratic forecast error metric given by:

 $e=({\mathbf {x} }_{f}-{\mathbf {x} }_{t})^{T}{\mathbf {C} }({\mathbf {x} }_{f}-{\mathbf {x} }_{t})$ (1)

where ${\mathbf {x} }_{f}$ denotes the forecast state-vector, ${\mathbf {x} }_{t}$ denotes the true state-vector, and ${\mathbf {C} }$ is a weight matrix. For example, if ${\mathbf {C} }$ is a diagonal matrix with elements equal to 1 corresponding to all surface temperature grid points, and zero elsewhere, then would represent the sum of the squared errors in SST. Forecast error metrics of the form (1) are very common in numerical weather prediction and oceanography, so (1) is a good starting point.

In the figure there are two forecasts of interest: ${\mathbf {x} }_{f}^{j}$ initialized at the end of analysis cycle at time $t_{0}^{j}+\tau$, and ${\mathbf {x} }_{f}^{j+1}$ initialized at the end of analysis cycle $j+1$ at time $t_{0}^{j+1}+\tau$. At time $t_{0}^{j+2}+\tau$ the error in forecast ${\mathbf {x} }_{f}^{j}$ is given by $e_{b}=({\mathbf {x} }_{f}^{j}-{\mathbf {x} }_{t})^{T}{\mathbf {C} }({\mathbf {x} }_{f}^{j}-{\mathbf {x} }_{t})$, while the error in $x_{f}^{j+1}$ is given by $e_{a}=({\mathbf {x} }_{f}^{j+1}-{\mathbf {x} }_{t})^{T}{\mathbf {C} }({\mathbf {x} }_{f}^{j+1}-{\mathbf {x} }_{t})$. As noted above, if ${\mathbf {x} }_{f}^{j}$ and ${\mathbf {x} }_{f}^{j+1}$ are subject to identical surface forcing and open boundary conditions during the interval $[t_{0}^{j+2},t_{0}^{j+2}+\tau ]$, then the difference in forecast error $\delta e=e_{a}-e_{b}$ is due solely to the difference in the forecast initial conditions due to the observations assimilated during analysis cycle $j+1$ spanning the interval $[t_{0}^{j+1},t_{0}^{j+1}+\tau ]$. (The more realistic case where the two forecasts are subject to different surface forcing fields is addressed below in the step-by-step procedure notes). Specifically, if $\delta e<0$ the observations assimilated during cycle $j+1$ lead to an improvement in the forecast skill (i.e. $e_{a}), while if $\delta e>0$ the observations assimilated during cycle $j+1$ have degraded the forecast (i.e. $e_{a}>e_{b}$). While this convention may seem counter-intuitive, it is the convention used in the numerical weather prediction literature, so it seems prudent to adopt it here.

### Using a verifying analysis as a surrogate for the true ocean circulation

In practice, the true state ${\mathbf {x} }_{t}$ will never be known, so the forecast error (1) is usually computed relative to the verifying analysis ${\mathbf {x} }_{a}^{j+2}$, in which case:

 $e=({\mathbf {x} }_{f}-{\mathbf {x} }_{a})^{T}{\mathbf {C} }({\mathbf {x} }_{f}-{\mathbf {x} }_{a})$ (2)

where ${\mathbf {x} }_{a}$ denotes the verifying analysis at the appropriate forecast time.

The 3rd-order approximation for $\delta e$ is given by:

 $\delta e_{3}={\mathbf {d} }^{T}{\mathbf {K} }^{T}{\mathbf {M} }_{b}^{T}[{\mathbf {M} }_{j}^{T}{\mathbf {C} }({\mathbf {x} }_{f}^{j}-{\mathbf {x} }_{a}^{j+2})+{\mathbf {M} }_{j+1}^{T}{\mathbf {C} }({\mathbf {x} }_{f}^{j+1}-{\mathbf {x} }_{a}^{j+2})]$ (3)

where ${\mathbf {M} }_{j}^{T}$ represents the adjoint model run backwards over the forecast interval $[t_{0}^{j+2},t_{0}^{j+2}+\tau ]$ and linearized about the forecast solution ${\mathbf {x} }_{f}^{j}$; ${\mathbf {M} }_{b}^{T}$ is the adjoint model run backwards over the 4D-Var analysis interval $[t_{0}^{j+1},t_{0}^{j+1}+\tau ]$ and linearized about 4D-Var background ${\mathbf {x} }_{b}$; and ${\mathbf {M} }_{j+1}^{T}$ denotes the adjoint model linearized about the forecast solution ${\mathbf {x} }_{f}^{j+1}$. Thus, the 3rd-order impact given by (3) requires two integrations of the adjoint model: one forced by ${\mathbf {C} }({\mathbf {x} }_{f}^{j}-{\mathbf {x} }_{a}^{j+2})$ and another forced by ${\mathbf {C} }({\mathbf {x} }_{f}^{j+1}-{\mathbf {x} }_{a}^{j+2})$ and linearized about different forecast solutions.

### Using the independent observations as a surrogate for the true ocean circulation

As discussed above, it is typical in operational numerical weather prediction to use the verifying analysis at the forecast time as a surrogate for the true state of the system, as in equation (2). These days operational weather prediction models generally yield very high-quality analyses, so the assumption that ${\mathbf {x} }_{a}$ is a reasonable approximation for ${\mathbf {x} }_{t}$ is probably reasonable. In oceanography, however, this is a more questionable assumption, so when possible, it may be more prudent to verify a forecast against independent observations, or observations that have not yet been assimilated into the model. In this case, equation (2) would be reformulated as:

 $e=({\mathbf {y} }_{f}-{\mathbf {y} })^{T}{\mathbf {C} }({\mathbf {y} }_{f}-{\mathbf {y} })$ (4)

where ${\mathbf {y} }_{f}$ is model forecast of the vector of verifying observations ${\mathbf {y} }$. In this case, the 3rd-order approximation for $\delta e$ becomes:

 $\delta e_{3}={\mathbf {d} }^{T}{\mathbf {K} }^{T}{\mathbf {M} }_{b}^{T}[{\mathbf {G} }_{j}^{T}{\mathbf {C} }({\mathbf {y} }_{f}^{j}-{\mathbf {y} }^{j+2})+{\mathbf {G} }_{j+1}^{T}{\mathbf {C} }({\mathbf {y} }_{f}^{j+1}-{\mathbf {y} }^{j+2})]$ (5)

where ${\mathbf {G} }_{j}^{T}$ and ${\mathbf {G} }_{j+1}^{T}$ denote the adjoint model forced at the observation points and linearized about ${\mathbf {x} }_{f}^{j}$ and ${\mathbf {x} }_{f}^{j+1}$ respectively.

## Practicalities

1. The observation impact driver for the forecast cycles is activated using the RBL4DVAR_FCT_SENSITIVITY cpp option.
2. The default option is 3rd-order approximations of the squared forecast error difference $\delta e$. 1st- and 2nd-order cases can also be run by changing the parameter ImpOrd in the routine obs_sen_w4dpsas_forecast.h but this is recommended only for testing purposes and once you feel confident about how things work.
3. The length of the analysis cycle is assigned in roms.in using the parameter NTIMES_ANA.
4. The length of the forecast cycle is assigned in roms.in using the parameter NTIMES_FCT.
5. The default configuration of the driver is to use a verifying analysis to compute the forecast errors as in equation (1). Two adjoint model forcing input files are required: one that corresponds to ${\mathbf {C} }({\mathbf {x} }_{f}^{j}-{\mathbf {x} }_{a}^{j+2})$ in equation (3) and one that corresponds to ${\mathbf {C} }({\mathbf {x} }_{f}^{j+1}-{\mathbf {x} }_{a}^{j+2})$. These are referred to respectively as FCTnameB and FCTnameA in roms.in.
6. For forecast error metrics in observation space as in equation (4), it is necessary to also define the OBS_SPACE cpp option. In this case the input forcing files for the adjoint model have the same structure as an observation file. In the code they are referred to as FOInameA and FOInameB in roms.in corresponding to ${\mathbf {C} }({\mathbf {y} }_{f}^{j+1}-{\mathbf {y} }^{j+2})$ and ${\mathbf {C} }({\mathbf {y} }_{f}^{j}-{\mathbf {y} }^{j+2})$ respectively in equation (5). The actual observations assimilated during the analysis cycle prior to the forecast cycle are referred to as obs_C.nc in the driver.
7. The options OBS_IMPACT_SPLIT and IMPACT_INNER can also be used.

## Step-by-Step Procedure

1. Run 4D-Var for the analysis cycle and save the MODname.nc and FWDname.nc files.
2. Run the forecast initialized from the 4D-Var analysis at the end of the analysis cycle and write the surface fluxes and wind stress to the HISname.nc file. Also ensure that you define FORWARD_WRITE and VERIFICATION, and save the MODname.nc and HISname.nc files.
3. Rerun step 2 without BULK_FLUXES and instead use the saved surface fluxes and wind stress in the history file from step 2 to force the model. This represents the forecast ${\mathbf {x} }_{f}^{j+1}$ in equation (3) (i.e. the red curve in the figure). This step is necessary because the forecast run using the background in step 4 below to generate ${\mathbf {x} }_{f}^{j}$ must also be subject to the same surface fluxes, wind stress, and open boundary conditions. Any difference in the forecast errors in ${\mathbf {x} }_{f}^{j+1}$ and those from the forecast in step 2 will be due to the differences in the surface boundary conditions. As in step 2, ensure that you define FORWARD_WRITE and VERIFICATION, and save the MODname.nc and HISname.nc files.
4. Run a forecast without BULK_FLUXES initialized from the 4D-Var background solution at the end of the 4D-Var window using the saved surface fluxes and wind stress in the history file from step 2 to force the model. This represents the forecast ${\mathbf {x} }_{f}^{j}$ in equation (3) (i.e. the portion of the blue curve in the figure spanning the interval $[t_{0}^{j+1}+\tau ,t_{0}^{j+2}+\tau ]$). Recall that in order for equation (3) to hold, ${\mathbf {x} }_{f}^{j+1}$ and ${\mathbf {x} }_{f}^{j}$ must be subject to the same surface and open boundary conditions. Steps 2, 3 and 4 ensure this. As in step 2, ensure that you define FORWARD_WRITE and VERIFICATION, and save the MODname.nc and HISname.nc files.
5. When the verification time arrives, compute the verifying 4D-Var analysis, ${\mathbf {x} }_{a}^{j+2}$ in equation (3).
6. Using the FWDname.nc and MODname.nc files from steps 3 and 4, prepare the adjoint forcing file FCTnameA corresponding to ${\mathbf {C} }({\mathbf {x} }_{f}^{j+1}-{\mathbf {x} }_{a}^{j+2})$ in equation (3) and FCTnameB corresponding to ${\mathbf {C} }({\mathbf {x} }_{f}^{j}-{\mathbf {x} }_{a}^{j+2})$ if using the verifying analysis as a surrogate for the true ocean state. Similarly, if using independent or unassimilated observations as a surrogate for the true ocean state, prepare the adjoint forcing files in observation space FOInameA and FOInameB corresponding to ${\mathbf {C} }({\mathbf {y} }_{f}^{j+1}-{\mathbf {y} }^{j+2})$ and ${\mathbf {C} }({\mathbf {y} }_{f}^{j}-{\mathbf {y} }^{j+2})$ respectively in equation (5).
7. Run the forecast observation impact driver.