Experiment RBL4DVAR/RPCG weak-constraint configuration


lfelipem · #1 Unread post by **lfelipem** » Thu Mar 26, 2026 1:33 pm

Dear ROMS developers,

I am facing a persistent problem while running a ROMS 4D-Var experiment, specifically using the RBL4DVAR/RPCG weak-constraint configuration, and I would appreciate any guidance to help identify the source of the issue.

The model compiles successfully, and the run starts normally without immediate fatal errors. The nonlinear model integration completes, and the system enters the minimization stage (the inner loops). The problem occurs precisely there: the experiment proceeds until what appears to be the last inner loop and then stalls, neither finishing nor proceeding to the next expected step of the algorithm.

In my case, the observed behavior is the following: the model executes the background integration, enters the RPCG_LANCZOS procedure, prints the iterative diagnostic information normally, and reaches the end of the internal iterations. The log shows diagnostics such as:

number of observations (Ndatum);

residual estimate (eps);

gradient norm reductions in y-space and v-space;

cost-function terms (Jf, Jdata, Jmod, Jopt, Jb, Jobs, Jact);

Lanczos coefficients (cg_delta, cg_beta, zwork).

However, after this stage, the model does not continue. It does not stop with an explicit blow-up message, it does not proceed to the subsequent analysis stage, and it does not complete the assimilation cycle. In other words, the code seems to become stalled or stuck after the completion of the inner loops.

Some relevant details about the configuration are:

I am using an application with atmospheric bulk fluxes and adjustments in:

ADJUST_STFLUX

ADJUST_WSTRESS

ADJUST_BOUNDARY

Initially there was a conflict with the deprecated option NL_BULK_FLUXES, but this has already been corrected according to the current recommendation, replacing it with:

PRIOR_BULK_FLUXES

FORWARD_FLUXES

The CPP configuration currently follows the modern recommended 4D-Var framework, including:

RBL4DVAR

RPCG

VCONVOLUTION

IMPLICIT_VCONV

FORWARD_WRITE

FORWARD_READ

FORWARD_MIXING

OUT_DOUBLE

ADJUST_STFLUX

ADJUST_WSTRESS

ADJUST_BOUNDARY

PRIOR_BULK_FLUXES

FORWARD_FLUXES
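The flag list above can be checked mechanically. The following is a minimal sketch, assuming the application header uses plain `#define NAME` lines. The function names are hypothetical, and the only rule encoded is the one stated in this post (NL_BULK_FLUXES replaced by PRIOR_BULK_FLUXES and FORWARD_FLUXES). It does not evaluate `#undef` or conditional blocks.

```python
import re

# Deprecation rule taken from this post only; extend it at your own risk.
REPLACED_BY = {"NL_BULK_FLUXES": ("PRIOR_BULK_FLUXES", "FORWARD_FLUXES")}

# Matches '#define NAME' at the start of a line (optional blanks around '#').
DEFINE_RE = re.compile(r"^[ \t]*#[ \t]*define[ \t]+([A-Za-z_][A-Za-z0-9_]*)",
                       re.MULTILINE)


def active_flags(header_text):
    """Return the set of names that appear in '#define NAME' lines."""
    return set(DEFINE_RE.findall(header_text))


def check_flags(header_text):
    """Return warnings about the deprecated flag or a partial replacement."""
    flags = active_flags(header_text)
    warnings = []
    for old, new in REPLACED_BY.items():
        if old in flags:
            warnings.append(f"{old} is deprecated; use {' + '.join(new)}")
        missing = [name for name in new if name not in flags]
        # Warn only when some, but not all, replacement flags are defined.
        if missing and len(missing) < len(new):
            warnings.append(f"replacement for {old} is incomplete; "
                            f"missing {', '.join(missing)}")
    return warnings


if __name__ == "__main__":
    sample = "#define RBL4DVAR\n#define RPCG\n#define NL_BULK_FLUXES\n"
    for message in check_flags(sample):
        print(message)
```

Running it on the real application .h file only confirms which names are defined; it says nothing about whether the combination is valid for a given ROMS version.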

I have also reviewed modules such as ad_initial, rbl4dvar_mod, roms_kernel_mod, rpcg_lanczos_mod, frc_adjust_mod, and mod_forces, looking for inconsistencies between the active flags and the arrays allocated for TLM/ADM/NLM.

In previous tests, when some flags were removed, the behavior changed, but the main problem remained: either the model failed very early at the first inner loop, or it reached the end of the inner loops and then remained unable to proceed.

The log from the problematic stage shows output similar to this at the end of the iteration:

Residual estimate, eps = ...

Reduction in gradient norm ...

Actual total penalty function, Jact = ...

listing of Lanczos vectors up to the final index

but after that, the expected progression does not occur.

My main question is the following:

Which part of the RBL4DVAR/RPCG workflow should be investigated when the model apparently completes the inner loops, but does not advance to the next stage?

More specifically, I would like to know whether this type of behavior is usually associated with:

inconsistency between ADJUST_* options and the forcing arrays in mod_forces;

a problem in the transition between increment() and analysis() inside rbl4dvar_mod;

logical errors involving Jb0, cg_pxsave, cg_pxsum, or the augmented vectors in rpcg_lanczos_mod;

incompatibilities involving bulk-flux flags in more recent ROMS versions;

a silent NetCDF I/O issue during the transition among DAV/ADM/ITL/TLF files;

inconsistencies in the dimensions or content of the surface forcing, flux-adjustment, or boundary-adjustment files used by the weak-constraint setup.

If useful, I can share:

my full application .h file;

the roms.in file;

the relevant log excerpts showing where execution stops;

the modified source modules;

and the exact list of CPP flags being used.

Best regards,

lfelipem · #2 Unread post by **lfelipem** » Thu Mar 26, 2026 1:35 pm

This is the point where the model gets stuck and doesn't proceed, even though it continues to use the processors.

RPCG_LANCZOS - Conjugate Gradient Information:

Ndatum = 1563994

(001,019): Residual estimate, eps = 8.6973728E+01
(001,019): Reduction in gradient norm, Greduc y-space = 0.0000000E+00
(001,019): Reduction in gradient norm, Greduc v-space = 5.6774306E-01
(001,019): First guess initial data misfit, Jf = 1.0266124E+07
(001,019): State estimate data misfit, Jdata = 0.0000000E+00
(001,019): Model penalty function, Jmod = 0.0000000E+00
(001,019): Optimal penalty function, Jopt = 3.7338393E+04

(001,019): Actual Model penalty function, Jb = 1.8807435E+03
(001,019): Actual data penalty function, Jobs = 7.1158079E+04

(001,019): Actual total penalty function, Jact = 7.3038823E+04

(001,019): Lanczos vectors - cg_delta, cg_beta, zwork:

001 8.44998828E+02 0.00000000E+00 1.16533924E+01
002 5.85498938E+02 4.64814826E+02 -1.42917638E+01
003 7.86999544E+02 2.45139925E+02 1.20386059E+01
004 4.62586697E+02 3.66425043E+02 -1.62949984E+01
005 3.27034138E+02 2.06238282E+02 1.51601475E+01
006 4.28472556E+02 1.51972629E+02 -1.05100064E+01
007 3.71686485E+02 2.04851113E+02 1.07361966E+01
008 2.71985089E+02 1.56145134E+02 -1.17679790E+01
009 2.67276617E+02 1.37866999E+02 1.10563802E+01
010 2.51525453E+02 1.31525635E+02 -1.01325946E+01
011 2.59976860E+02 1.22676895E+02 8.92106060E+00
012 3.03612382E+02 1.47112342E+02 -7.31572945E+00
013 2.29982950E+02 1.30569286E+02 6.95989049E+00
014 2.08376476E+02 9.29522140E+01 -6.94385372E+00
015 2.05326385E+02 1.34066366E+02 5.96718298E+00
016 1.41487835E+02 6.31554289E+01 -4.65966075E+00
017 1.30079092E+02 6.87248814E+01 4.10950600E+00
018 1.96345994E+02 9.10537558E+01 -2.35384223E+00
019 1.17195650E+02 6.61854984E+01 1.32931744E+00
020 1.53326803E+02 6.54273581E+01 0.00000000E+00
021 0.00000000E+00
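For comparing runs, the diagnostic lines in a log like the one above can be pulled into a dictionary. This is a minimal sketch, assuming the prefix `(001,019)` means (outer loop, inner loop) and that each diagnostic sits on one line ending in `= value`. The function name is hypothetical.

```python
import re

# '(outer,inner): label = value', e.g. '(001,019): Residual estimate, eps = 8.69E+01'
LINE_RE = re.compile(r"^\((\d+),(\d+)\):\s*(.*?)\s*=\s*([-+0-9.Ee]+)\s*$")


def parse_diagnostics(log_text):
    """Map each diagnostic label to (outer, inner, value).

    Lines without '= value' (such as the Lanczos table header) are skipped.
    If a label repeats, the last occurrence wins.
    """
    found = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            outer, inner, label, value = match.groups()
            found[label] = (int(outer), int(inner), float(value))
    return found


if __name__ == "__main__":
    sample = """\
(001,019): Actual Model penalty function, Jb = 1.8807435E+03
(001,019): Actual data penalty function, Jobs = 7.1158079E+04
(001,019): Actual total penalty function, Jact = 7.3038823E+04
(001,019): Lanczos vectors - cg_delta, cg_beta, zwork:
"""
    diag = parse_diagnostics(sample)
    jb = diag["Actual Model penalty function, Jb"][2]
    jobs = diag["Actual data penalty function, Jobs"][2]
    jact = diag["Actual total penalty function, Jact"][2]
    print(jb + jobs, jact)
```

With the numbers posted here, Jb + Jobs agrees with Jact to the printed precision, so the last inner loop's diagnostics appear to have been written in full before the stall.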

annsp · #3 Unread post by **annsp** » Fri Mar 27, 2026 2:00 pm

Which compiler are you using?

I observed the exact same behavior yesterday with my RBL4DVAR configuration when testing the gfortran compiler.
The same assimilation cycle with the exact same configuration and inputs runs without issues when compiled with ifort.
We've had issues in the past with gfortran + 4D-Var. If my memory serves me right it was due to a bug within the compiler itself. That's years ago though, so it might not be the same issue this time around.

lfelipem · #4 Unread post by **lfelipem** » Fri Mar 27, 2026 6:10 pm

Dear Ann,

Thank you very much for your response. I admit this problem is keeping me up at night. I’ve been at it every day for three months, testing everything. I’ve already debugged all the Fortran code to try to find the source of the problem. I’ve already checked that my files are correct: the dates, everything.

I wanted to ask you: was the solution to switch to ifort, or to change the version of gfortran?

Best regards.

arango · #5 Unread post by **arango** » Fri Mar 27, 2026 6:20 pm

I rarely use gfortran; it is a compiler that I dislike very much as a developer. The symptoms you are experiencing may be due to running out of computer memory (RAM, cache, virtual memory): when RAM is limited, the system starts swapping, which slows computations and can cause hangs.

lfelipem · #6 Unread post by **lfelipem** » Fri Mar 27, 2026 6:56 pm

Good afternoon, Arango, thank you for your quick reply.

I use gfortran with ROMS without any problems. But I’m going to test ifort and see if it can solve the problem. I’m setting up my machine for that right now.

As for memory, I’ve been monitoring it with htop, and it doesn’t even come close to exceeding 128 GB. The process simply doesn’t move forward (it’s been stuck at the same point for days) and continues to use the processors as if it were processing something, but it really isn’t making any progress.

It could be the compiler, as our colleague mentioned. I’ll test it and let you know here.

arango · #7 Unread post by **arango** » Fri Mar 27, 2026 7:43 pm

Nope, htop doesn't help here. The master thread (MPI rank=0) solves the RPCG Lanczos algorithm because it is the dual formulation of 4D-Var, and we are working in the space spanned by the observations. If you have millions of observations (Mobs), the vectors in RPCG can get very large, in particular arrays like TLmodVal_S(Mobs, Ninner, Nouter). Solving RPCG Lanczos in parallel is a nightmare. Thus, we need MPI collective calls to broadcast several variables from the minimization to the 4D-Var algorithm that runs in parallel. If you are running hundreds of processes, it can become a bottleneck or hang because you have run out of virtual memory.
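A back-of-the-envelope size for an array shaped like TLmodVal_S(Mobs, Ninner, Nouter) can be sketched as follows. The assumptions are loud ones: 8-byte reals, Mobs equal to the Ndatum printed in the log earlier in this thread, and Ninner=20 and Nouter=1 guessed from the same log rather than taken from the actual roms.in. The function name is hypothetical.

```python
def array_bytes(mobs, ninner, nouter, bytes_per_value=8):
    """Bytes needed for one dense (mobs, ninner, nouter) array."""
    return mobs * ninner * nouter * bytes_per_value


if __name__ == "__main__":
    # Ndatum from the posted log; Ninner and Nouter are guesses, not roms.in values.
    size = array_bytes(mobs=1563994, ninner=20, nouter=1)
    print(size, "bytes")                 # 250239040 bytes
    print(round(size / 2**30, 3), "GiB")
```

This counts a single array on a single process. The minimization holds several observation-space vectors, and the estimate says nothing about the cost of the collective broadcasts, so it is a lower bound on one ingredient rather than a memory budget.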

It will be very hard to convince me about gfortran. It is one of the worst compilers that I have encountered in my career as a numerical modeler. It does not have complete information in its object files (*.o) that we can use with fancy parallel debuggers like TotalView, which have made the development of ROMS complex algorithms possible. After all, gfortran is a free compiler, and you get what you pay for. It is also a nightmare with modern Fortran for object-oriented programming, since not all its features are supported.

lfelipem · #8 Unread post by **lfelipem** » Fri Mar 27, 2026 8:24 pm

Hi Hernan,
Thank you very much for the clarification. I completely understand your point.
I was not trying to challenge your view regarding gfortran. What I meant is simply that I had been using gfortran successfully for runs without 4D-Var, where things were working normally for my application. I understand that with 4D-Var, especially in the RPCG dual formulation and with a very large number of observations, I am now dealing with a very different situation, both in terms of memory requirements and debugging complexity.
Your explanation about the master thread handling the RPCG Lanczos minimization in observation space, and the need for MPI collective broadcasts of large arrays, makes perfect sense. I can see how this can become a serious bottleneck, or even lead to a hang, when the number of observations and MPI processes are both large.
Following your recommendation, I will first try to rebuild and run the case with ifort, and then, if needed, I will move on to TotalView.
I should admit that I have never used TotalView before, so it may be a challenge at first. Still, since you recommended it, I will give it a try.
I will come back soon with news on whether the model advances further under this new setup.
Thank you again for your guidance.
Best regards,

arango · #9 Unread post by **arango** » Fri Mar 27, 2026 8:52 pm

Okay, good luck with TotalView. It is not free, and we pay for its license.

One of the secrets of numerical modeling is to start simple. Thus, one of the things I do with data assimilation in ROMS or ROMS-JEDI is to assimilate a single Temperature and Salinity observation at the center of the domain, say, at a depth of 100 m (Z=-100). Then 4D-Var should need only Nouter=1 and Ninner=2 for convergence. You can plot all the increments for the state vector (zeta, u, v, T, S). You can also plot cross-sections to examine your background error hypothesis and correlation scales. Once that works, it should work for all the observations that you have for a particular data assimilation cycle.

For example, check the following link:

https://github.com/myroms/roms-jedi/wik ... tion-Tests

If this works in your application, you will owe me big time for telling you my tricks.

annsp · #10 Unread post by **annsp** » Sat Mar 28, 2026 7:13 am

Hi Felipe,

These debugging issues sound all too familiar...
For me, the solution for the 4D-Var applications has been to steer clear of gfortran and use ifort instead.
Similar to your experience, our applications without 4D-Var run fine when compiled with gfortran.

I hope you're able to find a solution and get your application up and running!

lfelipem · #11 Unread post by **lfelipem** » Sun Mar 29, 2026 12:04 pm

Thank you, my friends,

Things are pretty hectic here, but I’m going to try to switch to ifort since I can get an academic license as a university professor.

I’ll reply to this thread as soon as I get a chance.

Best regards

jivica · #12 Unread post by **jivica** » Mon Mar 30, 2026 8:14 am

I was using ifort before and, to be honest, was not happy with it; they should stick to the hardware.

Using Cray Fortran, on the other hand, is a pain in the a**: it is way too pedantic. The free gfortran works for me, even on the latest Cray.

One of the things you have to be careful about is the optimisation flags, and that is especially true for the ARPACK & PARPACK libs.
Try to reduce from "-O3" to "-O" only for the libs I mentioned and see how it goes.

Cheers,
Ivica

lfelipem · #13 Unread post by **lfelipem** » Wed Apr 08, 2026 11:32 am

Dear all,

Thank you very much for your suggestions and for the time you devoted to helping me with this issue.

I would especially like to thank Ann for the very precise and decisive recommendation to install and use an Intel compiler instead of gfortran. In my experience, gfortran works well for clean hindcast applications, but it seems to become much more limited when dealing with more demanding configurations, such as 4D-Var data assimilation or biogeochemical models.

I am also very grateful to Arango for the time he took to help me. It is truly remarkable that, even with such a busy schedule, he still responds to the ROMS forums with such attention and willingness to help. This kind of support is deeply appreciated by users like me, and it makes a real difference.

Thank you all once again.

Best regards,
Luís Felipe Ferreira de Mendonça

rtoste · #14 Unread post by **rtoste** » Fri Apr 10, 2026 2:17 am

Hi, Felipe. I ran into the same issue with gfortran and the newest 4dvar codes (freezing at the last inner loop). I solved it by removing the -ffast-math flag rather than switching compilers.
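To confirm whether a build still passes the flag, the compiler makefile fragment can be scanned for it. This is a minimal sketch: the function name is hypothetical, and the sample makefile text is made up for illustration, not copied from the ROMS distribution.

```python
def find_flag(makefile_text, flag="-ffast-math"):
    """Return (line_number, line) pairs where `flag` appears outside a comment."""
    hits = []
    for number, line in enumerate(makefile_text.splitlines(), start=1):
        code = line.split("#", 1)[0]      # drop trailing make comments
        if flag in code.split():          # whole-token match only
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Illustrative text only; point this at your own compiler .mk file instead.
    sample = (
        "FFLAGS := -frepack-arrays\n"
        "FFLAGS += -O3\n"
        "FFLAGS += -ffast-math\n"
        "# FFLAGS += -ffast-math  (commented out)\n"
    )
    for number, line in find_flag(sample):
        print(f"line {number}: {line}")
```

The same function can look for `-O3` when trying the lower optimisation level suggested earlier in the thread for the ARPACK and PARPACK libraries.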

lfelipem · #15 Unread post by **lfelipem** » Fri Apr 10, 2026 6:09 pm

Thanks, Raquel.

I admit it was a lot of work to switch my two servers to the Intel system, but honestly, it was the best choice.

Thanks—that’s a valuable tip.

All the best
