Parent: 638 x 438 x 20

Child: 198 x 249 x 20

However, the nested configuration carries a very large computational overhead compared with the standalone parent, and it also has an MPI/tiling problem. For comparison, we tested the Rutgers release of ROMS, COAWST and ROMS_AGRIF. The run times are reported below:

ROMS:

| Tiling | Parent standalone | Nested 1-way | Nested 2-way |
|---|---|---|---|
| 18 x 24 = 432 | ~5 min/day | ~1 hr 20 min/day | ~2 hr 10 min/day |
| 16 x 20 = 320 | ~5 min/day | ~1 hr 20 min/day | ~2 hr 08 min/day |
| 10 x 16 = 160 | ~11 min/day | ~1 hr 02 min/day | ~1 hr 32 min/day |
| 8 x 12 = 96 | ~15 min/day | ~1 hr 06 min/day | ~1 hr 32 min/day |
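To put numbers on both issues, here is a short Python sketch that takes the ROMS timings above and computes the nesting overhead (nested time divided by standalone time) and the strong-scaling speedup and parallel efficiency relative to the 96-tile run. It uses only the figures reported in this post.

```python
# ROMS timings from this post, in minutes of run time per model day,
# ordered by increasing tile count.
TILES = [96, 160, 320, 432]
MINUTES_PER_DAY = {
    "parent standalone": [15, 11, 5, 5],
    "nested 1-way": [66, 62, 80, 80],
    "nested 2-way": [92, 92, 128, 130],
}


def scaling_table(tiles, minutes):
    """Return (tiles, speedup, efficiency) relative to the smallest tile count."""
    base_tiles, base_minutes = tiles[0], minutes[0]
    rows = []
    for n, t in zip(tiles, minutes):
        speedup = base_minutes / t               # > 1 means faster than the base run
        efficiency = speedup / (n / base_tiles)  # 1.0 would be ideal strong scaling
        rows.append((n, speedup, efficiency))
    return rows


for label, minutes in MINUTES_PER_DAY.items():
    print(label)
    for n, s, e in scaling_table(TILES, minutes):
        print(f"  {n:4d} tiles: speedup {s:5.2f}, efficiency {e:5.2f}")

# Nesting overhead: how many times slower each nested run is than the parent alone.
parent = MINUTES_PER_DAY["parent standalone"]
overhead = {
    label: [nested / base for nested, base in zip(MINUTES_PER_DAY[label], parent)]
    for label in ("nested 1-way", "nested 2-way")
}
for label, ratios in overhead.items():
    print(f"{label} / parent:", [round(r, 1) for r in ratios])
```

Both nested runs have a speedup below 1 at 320 and 432 tiles (they get slower as tiles are added), while the standalone parent is still about 3 times faster at 432 tiles than at 96. The overhead grows from about 4.4x (1-way) at 96 tiles to 16x (1-way) and 26x (2-way) at 432 tiles.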

COAWST:

| Tiling | Parent standalone | Nested 1-way | Nested 2-way |
|---|---|---|---|
| 20 x 24 = 480 | ~4 min/day | ~2 hr 48 min/day | ~2 hr 48 min/day |

ROMS_AGRIF:

Parent standalone: I do not have the exact figure, but it was very similar to ROMS and COAWST.

| Tiling | Nested 1-way | Nested 2-way |
|---|---|---|
| 24 x 20 = 480 | ~18 min/day | ~23 min/day |
| 20 x 20 = 400 | ~18 min/day | n/a |
| 10 x 12 = 120 | ~18 min/day | n/a |
| 8 x 8 = 64 | ~28 min/day | n/a |

I would greatly appreciate any comments on two issues arising from the above:

1) Why is the computational cost of nesting so large in ROMS and COAWST, while it remains reasonable in ROMS_AGRIF?

2) What could be wrong with the MPI/tiling in the nested configuration, given that it runs faster on fewer nodes?
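One thing that may be worth checking for question 2 (a hypothesis, not a diagnosis): if each nested grid in an MPI run is partitioned over all processes, as I understand ROMS requires, then the small child grid ends up in very small tiles at high tile counts, and halo exchange rather than computation dominates. This Python sketch estimates the largest tile for each grid and the ratio of ghost points to interior points. Assumptions: the first tiling factor splits the first grid dimension, and the halo is 2 points wide (`HALO` is a hypothetical parameter here; set it to the ghost width of the actual build).

```python
import math

# Horizontal grid sizes from this post (first dimension, second dimension).
GRIDS = {"parent": (638, 438), "child": (198, 249)}
# ROMS tilings from this post; assumed order is (tiles along dim 1, tiles along dim 2).
TILINGS = [(8, 12), (10, 16), (16, 20), (18, 24)]
HALO = 2  # hypothetical ghost-point width; adjust to the actual build


def tile_stats(nx, ny, ti, tj, halo=HALO):
    """Size of the largest tile and its ratio of halo points to interior points."""
    tx, ty = math.ceil(nx / ti), math.ceil(ny / tj)
    interior = tx * ty
    with_halo = (tx + 2 * halo) * (ty + 2 * halo)
    return tx, ty, (with_halo - interior) / interior


for name, (nx, ny) in GRIDS.items():
    for ti, tj in TILINGS:
        tx, ty, ratio = tile_stats(nx, ny, ti, tj)
        print(f"{name:6s} {ti:2d}x{tj:2d}: tile {tx:3d}x{ty:3d}, halo/interior {ratio:4.2f}")
```

Under these assumptions, the child grid at 18 x 24 tiles has only 11 x 11 interior points per tile and about 0.86 halo points per interior point, versus about 0.35 for the parent at the same tiling. This ratio is only a rough proxy for communication cost, but it points in the same direction as the observed slowdown with more tiles.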

I am also pasting the timing profile from the COAWST run. It shows that the model spends a disproportionate amount of time reading and processing input data. This is not the initialization, since the time stepping starts after the usual delay:

Nonlinear model elapsed time profile:

Initialization ................................... 7414.111 ( 0.4980 %)

OI data assimilation ............................. 0.084 ( 0.0000 %)

Reading of input data ............................ 1129394.223 **(75.8601 %)**

Processing of input data ......................... 1136092.318 **(76.3100 %)**

Computation of vertical boundary conditions ...... 1500.082 ( 0.1008 %)

Computation of global information integrals ...... 229.418 ( 0.0154 %)

Writing of output data ........................... 1986.356 ( 0.1334 %)

Model 2D kernel .................................. 10940.908 ( 0.7349 %)

2D/3D coupling, vertical metrics ................. 12390.746 ( 0.8323 %)

Omega vertical velocity .......................... 1538.468 ( 0.1033 %)

Equation of state for seawater ................... 12300.072 ( 0.8262 %)

KPP vertical mixing parameterization ............. 6061.499 ( 0.4071 %)

3D equations right-side terms .................... 226.410 ( 0.0152 %)

3D equations predictor step ...................... 887.124 ( 0.0596 %)

Pressure gradient ................................ 182.883 ( 0.0123 %)

Harmonic mixing of tracers, geopotentials ........ 266.205 ( 0.0179 %)

Harmonic stress tensor, S-surfaces ............... 164.894 ( 0.0111 %)

Corrector time-step for 3D momentum .............. 1336.560 ( 0.0898 %)

Corrector time-step for tracers .................. 2696.120 ( 0.1811 %)

Total: 2325608.482 156.2084
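The percentages in this profile add up to 156.2084 rather than 100. Assuming each percentage is taken against one common reference time, that reference can be back-computed from any row; this short Python check does so for a few rows and compares reading plus processing with it.

```python
# A few rows copied from the COAWST profile above: (category, time, percent).
ROWS = [
    ("Initialization", 7414.111, 0.4980),
    ("Reading of input data", 1129394.223, 75.8601),
    ("Processing of input data", 1136092.318, 76.3100),
    ("Model 2D kernel", 10940.908, 0.7349),
    ("Equation of state for seawater", 12300.072, 0.8262),
    ("Total", 2325608.482, 156.2084),
]


def implied_reference(time, percent):
    """Reference time for which `time` would be `percent` % of it."""
    return time / (percent / 100.0)


refs = {name: implied_reference(t, p) for name, t, p in ROWS}
for name, ref in refs.items():
    print(f"{name:32s} implies reference {ref:12.0f}")

# Assumption: the reference is the same for every row, so use the Total row's value.
reference = refs["Total"]
input_io = 1129394.223 + 1136092.318
print(f"reading + processing = {input_io:.0f} ({input_io / reference:.2f} x reference)")
```

Every row implies a reference of about 1.4888e6 (in the profile's own time units), while reading and processing together come to about 2.27e6, roughly 1.52 times that reference. If the assumption holds, the two categories overlap (for example, reading being timed inside processing), so together they account for at least about 76% of the run rather than 152%.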

