I am running an online nesting configuration and seeing a tremendous slow-down when moving from offline to online nesting. Here are the timings:
Run                       # of time steps   # of nodes            Total run time
                                            (16 cores per node)
Parent grid               1000              1                     ~ 37 min
                                            4                     ~ 13 min
Child grid                1000              1                     ~ 26 min
                                            4                     ~ 8 min
Online nesting (parent    100               1                     ~ 53 min
and child grids)                            4                     ~ 51 min
The standalone parent and child runs each speed up by roughly a factor of 3 when going from 1 node to 4 nodes (16 processes on 1 node versus 64 on 4 nodes). The online nesting configuration, however, shows almost no difference between a single node and several nodes. Additionally, the parent and child runs for 1000 time steps sum to ~ 63 min on a single node, whereas the online run for both grids, scaled up from 100 to 1000 time steps, would take almost 9 hours (~ 530 min) on a single node. In other words, online nesting takes about 8 times longer than running the parent and child in sequence. It appears that the parallelization of online nesting, particularly the fine2coarse and coarse2fine steps, may have a bottleneck that significantly slows down the computation.
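For anyone who wants to check the arithmetic, here is a small Python sketch of how these estimates can be reproduced. The timings are copied from the table above; the dictionary layout and the helper names (`per_step`, `speedup`) are only illustrative and have nothing to do with ROMS itself. It also assumes that run time scales linearly with the number of time steps, which ignores start-up and I/O costs.

```python
# Timings from the table above: name -> (time steps, {nodes: minutes}).
# Assumption: run time scales linearly with the number of time steps.
runs = {
    "parent": (1000, {1: 37.0, 4: 13.0}),
    "child": (1000, {1: 26.0, 4: 8.0}),
    "nested": (100, {1: 53.0, 4: 51.0}),
}


def per_step(name, nodes):
    """Minutes per time step for a given run and node count."""
    steps, minutes = runs[name]
    return minutes[nodes] / steps


def speedup(name):
    """Speedup going from 1 node (16 cores) to 4 nodes (64 cores)."""
    _, minutes = runs[name]
    return minutes[1] / minutes[4]


# Parent and child run one after the other, 1000 steps, single node.
sequential_1000 = (per_step("parent", 1) + per_step("child", 1)) * 1000

# Online nesting extrapolated from 100 to 1000 steps, single node.
nested_1000 = per_step("nested", 1) * 1000

slowdown = nested_1000 / sequential_1000

for name in runs:
    s = speedup(name)
    print(f"{name}: speedup {s:.2f}x on 4 nodes, parallel efficiency {s / 4:.0%}")

print(f"sequential parent + child, 1000 steps: {sequential_1000:.0f} min")
print(f"online nesting, 1000 steps (extrapolated): {nested_1000:.0f} min")
print(f"slowdown factor: {slowdown:.1f}")
```

With the numbers above, the standalone runs come out at roughly 3x speedup on 4 nodes, while the nested run barely improves, and the extrapolated nested run is about 8.4 times slower than the parent and child in sequence.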
We don’t think this is a memory issue as the nested configuration on 1 node only takes up about 60% of the total node memory.
Is anyone else finding similar results? I may have made a mistake in my model configuration. Any guidance would be greatly appreciated.
Thank you ROMS developers for continuing to improve the online nesting capabilities!