possible bug in cgradient.F (gradient norm diagnostic)

Bug reports, work arounds and fixes

stef
Posts: 184
Joined: Tue Mar 13, 2007 6:38 pm
Location: Independent researcher

possible bug in cgradient.F (gradient norm diagnostic)

#1 Unread post by stef »

I know I4DVAR is deprecated, but I'm still trying to work out what the algorithm does, because it's the simplest version of 4D-Var.

Could it be that there is a bug in cgradient.F, specifically in the computation of the reduction of the gradient norm?

In cgradient.F, the tridiagonal system of Eq. 22 of [1] is solved for the z vector. For innLoop=1, the back substitution into cg_Tmatrix(1,3) and the call to new_gradient() are skipped. Shouldn't this part be executed for innLoop=1 as well? In other words, the 'END IF' for the following conditional

Code: Select all

      IF (innLoop.gt.1) THEN
        DO i=2,innLoop
          cg_gamma(i,outLoop)=cg_beta(i,outLoop)/zbeta
          zbeta=cg_delta(i,outLoop)-                                    &
     &          cg_beta(i,outLoop)*cg_gamma(i,outLoop)
          cg_zu(i,outLoop)=(-cg_QG(i,outLoop)-                          &
     &                      cg_beta(i,outLoop)*cg_zu(i-1,outLoop))/zbeta
        END DO
is currently placed after the call to new_gradient, but shouldn't it go right after the segment above? See the diff below. That way, cg_Tmatrix(1,3) is set to z_1 of Eq. 22 of [1], and g_2 is computed in new_gradient. I think that, according to the equations,

Code: Select all

g_2 = A s_2 - b = A(s_1 + V z) - b = g_1 + A V z
    = g_1 + V T z + gamma_1 q_2 z
is approximately equal to
g_1 - V cg_QG + gamma_1 q_2 z
The first two terms cancel: V cg_QG = q_1 q_1^T g_1 = norm(g_1) q_1 = g_1,
and z = z_1 \approx -norm(g_1)/delta_1.
Note that gamma_1/delta_1 = -sqrt(beta_1) = -norm(g_2)/norm(g_1).

g_1 is the initial gradient, computed at innLoop=0
g_2 is the second gradient
A is the Hessian
T is the tridiagonal matrix of the Lanczos relation
V is the matrix of the Lanczos relation, consisting of the orthonormal column vectors q_i
q_1 is the first column of V
s_1 is the initial increment, i.e. zero
s_2 is the next increment
cg_QG = V^T g_1, so that Eq. 22 of [1] is solved as T z = -cg_QG
z is the solution of Eq. 22 of [1]
delta_1 is the first diagonal element of T
gamma_1 is the first upper diagonal element of T

The "gamma" here is called "beta" in the code, which is somewhat confusing.

The output of the I4DVAR tutorial currently is:

Code: Select all

  ....
  (001,000): Initial gradient norm, Gnorm  =  1.5065655E+03
  ...
 (001,001): Reduction in the gradient norm,  Greduc =  0.0000000E+00
 (001,001): Lanczos algorithm coefficient,    delta =  6.6899689E+02
Tested with:
* ROMS revision b7a47408, slightly patched to use the old LAPACK calls because of compiler problems and to fix the bug in obs_k2z
* roms_test revision e1af6099
* an outdated data revision, but I guess this doesn't change the control flow

It says the gradient reduction is zero after innLoop=1, but shouldn't g_2 already have been computed at that point, with a smaller norm than the initial g_1? Moving the 'END IF' up, above the back substitution of the tridiagonal solver, gives

Code: Select all

 ...
 (001,000): Initial gradient norm, Gnorm  =  1.5065655E+03
 ...
 (001,001): Reduction in the gradient norm,  Greduc =  8.6059669E-01
 (001,001): Lanczos algorithm coefficient,    delta =  6.6899689E+02
and for the rest:

Code: Select all

[drop@ocean I4DVAR]$ cat  output.log |grep Reduction
 (001,001): Reduction in the gradient norm,  Greduc =  8.6059669E-01
 (001,002): Reduction in the gradient norm,  Greduc =  4.0482401E-01
 (001,003): Reduction in the gradient norm,  Greduc =  3.3909970E-01
 (001,004): Reduction in the gradient norm,  Greduc =  2.4363785E-01
 (001,005): Reduction in the gradient norm,  Greduc =  2.0550356E-01
 (001,006): Reduction in the gradient norm,  Greduc =  1.6836103E-01
 (001,007): Reduction in the gradient norm,  Greduc =  1.7527041E-01
 (001,008): Reduction in the gradient norm,  Greduc =  1.2164608E-01
Is this a bug, and if so, does it only affect the diagnostics? The increment is still computed correctly via Eqs. 21-22 of [1], right, because cg_zu is currently set correctly for innLoop=1? Does it have any other impact?
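
For reference, this is how I read Eqs. 21-22 as they are used above (my own paraphrase with the sign convention implied by the code, not a quote from [1]):

Code: Select all

% Eq. 21: the increment is built from the Lanczos vectors.
% Eq. 22: z_k solves the tridiagonal system assembled from delta_i, gamma_i.
\begin{align*}
  s_{k+1} &= s_1 + V_k z_k,\\
  T_k z_k &= -V_k^{\mathrm{T}} g_1 = -\mathrm{cg\_QG}.
\end{align*}
% For innLoop=1 this reduces to s_2 = s_1 + z_1 q_1 with
% z_1 = -\|g_1\| / \delta_1, which the unpatched code already stores in
% cg_zu, so only the g_2 / Greduc diagnostic should be affected.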

What concerns me, though, is that I get completely different values for the initial gradient norm (in both of the above) compared to a revision from 2022. Also, the TLcost_function output makes no sense in my own experiment: the first value is computed differently from the subsequent ones and is vastly different in magnitude. Maybe that's a problem with my setup and not a bug, but I think the issue described here is unrelated to those other problems...

[1] Fisher, M., 1998: Minimization algorithms for variational data assimilation. In Recent Developments in Numerical Methods for Atmospheric Modelling, pp. 364-385, ECMWF.

diff:

Code: Select all

--- a/ROMS/Utility/cgradient.F
+++ b/ROMS/Utility/cgradient.F
@@ -992,6 +992,7 @@
           cg_zu(i,outLoop)=(-cg_QG(i,outLoop)-                          &
      &                      cg_beta(i,outLoop)*cg_zu(i-1,outLoop))/zbeta
         END DO
+      END IF
 !
 !  Back substitution.
 !
@@ -1054,7 +1055,6 @@
      &                     ad_ubar, ad_vbar,                            &
 # endif
      &                     ad_zeta)
-      END IF
 !
 !  Compute the new cost function.
 !

stef
Posts: 184
Joined: Tue Mar 13, 2007 6:38 pm
Location: Independent researcher

Re: possible bug in cgradient.F (gradient norm diagnostic)

#2 Unread post by stef »

Hmm, the patch I suggested is incorrect; it leads to this error:

Code: Select all

 <<<< Conjugate Gradient Algorithm >>>>

At line 496 of file cgradient.f90
Fortran runtime error: Index '0' of dimension 1 of array 'cg_zu' below lower bound of 1
Would this one work instead?

Code: Select all

diff --git a/ROMS/Utility/cgradient.F b/ROMS/Utility/cgradient.F
... 
--- a/ROMS/Utility/cgradient.F
+++ b/ROMS/Utility/cgradient.F
@@ -982,7 +982,6 @@
       IF (innLoop.gt.0) THEN
         zbeta=cg_delta(1,outLoop)
         cg_zu(1,outLoop)=-cg_QG(1,outLoop)/zbeta
-      END IF
 !
       IF (innLoop.gt.1) THEN
         DO i=2,innLoop
@@ -992,6 +991,7 @@
           cg_zu(i,outLoop)=(-cg_QG(i,outLoop)-                          &
      &                      cg_beta(i,outLoop)*cg_zu(i-1,outLoop))/zbeta
         END DO
+      END IF
 !
 !  Back substitution.
 !
For innLoop=0 we don't have to call new_gradient, since g(0) is obtained from the adjoint model. So with this patch new_gradient() is called for innLoop>0, as opposed to innLoop>1 as in the original code.
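
To make the intended control flow explicit, here is a standalone sketch (again not ROMS code; the numerical bodies are replaced by print statements, and the last print stands for everything between the END DO and the original END IF in cgradient.F) of the nesting that results from the second patch:

Code: Select all

program control_flow_after_patch
  implicit none
  integer :: innLoop, i
  do innLoop = 0, 2
    print *, 'innLoop =', innLoop
    if (innLoop.gt.0) then
      print *, '  solve the 1x1 system: cg_zu(1) = -cg_QG(1)/zbeta'
      if (innLoop.gt.1) then
        do i = 2, innLoop
          print *, '  recurrence for cg_zu(', i, ')'
        end do
      end if
      print *, '  back substitution, CALL new_gradient, Greduc diagnostic'
    end if
  end do
end program control_flow_after_patch
So nothing runs for innLoop=0, the 1x1 solve plus the back substitution and new_gradient run for innLoop=1, and the i=2,...,innLoop recurrence is only entered for innLoop>1.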
