I bought an SGI server to run ROMS for a modeling project. The server has four Opteron processors with 16 cores each (2 GHz) and 256 GB of RAM.

I have compiled the upwelling application and run it serially on one core. This is the result:

```
Elapsed CPU time (seconds):
Thread # 0 CPU: 460.741
Total: 460.741
Nonlinear model elapsed time profile:
Allocation and array initialization .............. 0.104 ( 0.0226 %)
Ocean state initialization ....................... 0.012 ( 0.0026 %)
Reading of input data ............................ 0.004 ( 0.0009 %)
Processing of input data ......................... 0.084 ( 0.0182 %)
Processing of output time averaged data .......... 81.489 (17.6865 %)
Computation of vertical boundary conditions ...... 0.068 ( 0.0148 %)
Computation of global information integrals ...... 2.432 ( 0.5279 %)
Writing of output data ........................... 3.752 ( 0.8144 %)
Model 2D kernel .................................. 256.432 (55.6565 %)
2D/3D coupling, vertical metrics ................. 2.352 ( 0.5105 %)
Omega vertical velocity .......................... 1.892 ( 0.4107 %)
Equation of state for seawater ................... 1.884 ( 0.4089 %)
3D equations right-side terms .................... 11.097 ( 2.4084 %)
3D equations predictor step ...................... 22.493 ( 4.8820 %)
Pressure gradient ................................ 8.405 ( 1.8241 %)
Harmonic mixing of tracers, S-surfaces ........... 3.552 ( 0.7710 %)
Harmonic stress tensor, S-surfaces ............... 8.345 ( 1.8111 %)
Corrector time-step for 3D momentum .............. 28.246 ( 6.1305 %)
Corrector time-step for tracers .................. 23.653 ( 5.1338 %)
Total: 456.297 99.0354
All percentages are with respect to total time = 460.741
```

For comparison, this is the same upwelling run on an Intel machine:

```
Elapsed CPU time (seconds):
Thread # 0 CPU: 130.859
Total: 130.859
Nonlinear model elapsed time profile:
Allocation and array initialization .............. 0.047 ( 0.0355 %)
Ocean state initialization ....................... 0.006 ( 0.0048 %)
Reading of input data ............................ 0.003 ( 0.0020 %)
Processing of input data ......................... 0.040 ( 0.0302 %)
Processing of output time averaged data .......... 7.020 ( 5.3642 %)
Computation of vertical boundary conditions ...... 0.042 ( 0.0318 %)
Computation of global information integrals ...... 1.452 ( 1.1096 %)
Writing of output data ........................... 0.561 ( 0.4288 %)
Model 2D kernel .................................. 68.150 (52.0792 %)
2D/3D coupling, vertical metrics ................. 1.103 ( 0.8427 %)
Omega vertical velocity .......................... 1.032 ( 0.7890 %)
Equation of state for seawater ................... 0.642 ( 0.4907 %)
3D equations right-side terms .................... 5.884 ( 4.4967 %)
3D equations predictor step ...................... 13.001 ( 9.9347 %)
Pressure gradient ................................ 3.383 ( 2.5854 %)
Harmonic mixing of tracers, S-surfaces ........... 1.418 ( 1.0839 %)
Harmonic stress tensor, S-surfaces ............... 2.264 ( 1.7298 %)
Corrector time-step for 3D momentum .............. 14.086 (10.7640 %)
Corrector time-step for tracers .................. 8.481 ( 6.4809 %)
Total: 128.614 98.2840
All percentages are with respect to total time = 130.859
```

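However, the difference in calculation times is very large: 460.7 s versus 130.9 s of CPU time, so the SGI server is about 3.5 times slower. I cannot believe that the Opteron processors are so inferior to the Intel ones, especially in the "Model 2D kernel" part (256.4 s versus 68.2 s).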
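To see whether the slowdown is uniform or concentrated in a few sections, one can compare the two profiles section by section. The following is a minimal Python sketch (not part of ROMS) that assumes the profile lines look exactly like the output quoted above; it copies only a few sections for brevity, and the names `parse_profile` and `slowdowns` are hypothetical helpers made up for this illustration.

```python
import re

# Excerpts of the two ROMS profiles quoted above (Opteron run first,
# Intel run second). Only a few sections are copied, for brevity.
OPTERON = """
Processing of output time averaged data .......... 81.489 (17.6865 %)
Model 2D kernel .................................. 256.432 (55.6565 %)
3D equations predictor step ...................... 22.493 ( 4.8820 %)
Corrector time-step for 3D momentum .............. 28.246 ( 6.1305 %)
Corrector time-step for tracers .................. 23.653 ( 5.1338 %)
"""

INTEL = """
Processing of output time averaged data .......... 7.020 ( 5.3642 %)
Model 2D kernel .................................. 68.150 (52.0792 %)
3D equations predictor step ...................... 13.001 ( 9.9347 %)
Corrector time-step for 3D momentum .............. 14.086 (10.7640 %)
Corrector time-step for tracers .................. 8.481 ( 6.4809 %)
"""

# Assumed line layout: section name, a run of dots, seconds, "( pct %)".
LINE = re.compile(
    r"^(?P<name>.+?)\s*\.{3,}\s*(?P<sec>\d+\.\d+)\s*\(\s*[\d.]+\s*%\)"
)


def parse_profile(text):
    """Return {section name: seconds} for every profile line in text."""
    out = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            out[m.group("name")] = float(m.group("sec"))
    return out


def slowdowns(slow, fast):
    """Return slow/fast time ratio for each section present in both."""
    a, b = parse_profile(slow), parse_profile(fast)
    return {k: a[k] / b[k] for k in a if k in b and b[k] > 0}


if __name__ == "__main__":
    # Print sections from the largest to the smallest slowdown.
    ratios = slowdowns(OPTERON, INTEL)
    for name, r in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(f"{name:45s} {r:5.2f}x")
    print(f"{'Total CPU time':45s} {460.741 / 130.859:5.2f}x")
```

With these numbers the ratio is far from uniform: roughly 11.6x for "Processing of output time averaged data", 3.8x for "Model 2D kernel", and only about 1.7x for the "3D equations predictor step", against 3.5x overall.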

A friend of mine raised the possibility that the server may be using the GPU (graphics processing unit) for some of the computations. I doubt it! Is there anything in the ROMS code that uses this kind of resource?

From your experience, do you see any other possible explanation for such a difference?

The HPC architecture of SGI servers is becoming very popular. I have been struggling with this ROMS benchmark problem, and perhaps you can bring a different insight to the issue.