Opened 7 years ago
Closed 7 years ago
#783 closed upgrade (Done)
VERY IMPORTANT: ROMS Dynamic, Automatic, and Static Memory Requirements
Reported by: | arango | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | Release ROMS/TOMS 3.7 |
Component: | Nonlinear | Version: | 3.7 |
Keywords: | Cc: |
Description
Currently, ROMS uses primarily dynamic and automatic memory, which is allocated at run time. Only a small amount of static memory is allocated at compile time.
The dynamic memory is the memory associated with the ocean state arrays. It is allocated at run time and persists until ROMS terminates execution.
The automatic arrays appear in subroutines and functions for temporary local computations. They are created on entry to the subroutine and disappear on exit. Automatic (non-static) arrays are allocated on either the heap or the stack. With the ifort compiler, the option -heap-arrays directs the compiler to put automatic arrays on the heap instead of the stack, although this may slow down the computations. If stack memory is used, the application needs enough of it to avoid obscure segmentation faults during execution. On Linux operating systems, the stack size limit can be removed by setting:
    ulimit -s unlimited              (in your .bashrc)
    limit stacksize unlimited        (in your .cshrc, .tcshrc)
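To confirm which stack limit a process actually inherits, something like the following can be run from the same shell that launches ROMS. This is only a sketch using Python's standard resource module (Unix only); the helper name stack_limit_description is made up for illustration and is not part of ROMS.

```python
import resource

def stack_limit_description():
    """Return the (soft, hard) stack limits of this process as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def fmt(value):
        # RLIM_INFINITY is how the OS reports "unlimited"
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value / 1.0e6:.2f} MB"  # decimal megabytes, as in the ROMS report

    return fmt(soft), fmt(hard)

soft, hard = stack_limit_description()
print(f"stack soft limit: {soft}, hard limit: {hard}")
```

If the soft limit is not "unlimited", the shell setting above has not taken effect for that session.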
The static arrays are allocated at compilation time, and the memory reserved can be neither increased nor decreased. Only a few static arrays are used in ROMS, and they are mainly needed for I/O processing in the mod_netcdf routines.
In serial and shared-memory (OpenMP) applications, the dynamic memory associated with the ocean state is for full, global variables. In contrast, in distributed-memory (MPI) applications, the dynamic memory related to the ocean state is for the smaller tiled arrays with global indices. Recall that the tiling is only done in the horizontal I- and J-dimensions and not in the vertical dimension.
Almost all the ocean state arrays are dereferenced pointers and are allocated after processing the ROMS standard input parameters. Recall that an array represents a contiguous linear sequence of memory. The pointer indicates the beginning of the state variable in the memory block.
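As a rough illustration of why tiled arrays are smaller, the sketch below compares the size of one double-precision 3D array held globally with the size of one MPI tile. It is not how ROMS computes its report: the function names are hypothetical, the halo width of 2 ghost points is an assumed value, and the grid dimensions are treated as plain array extents.

```python
R8_BYTES = 8  # one r8 element = 64 bits = 8 bytes

def array_megabytes(ni, nj, nk, bytes_per_element=R8_BYTES):
    """Size of one ni x nj x nk array in decimal megabytes (1 MB = 1E+6 bytes)."""
    return ni * nj * nk * bytes_per_element / 1.0e6

def tile_megabytes(ni, nj, nk, ntile_i, ntile_j, halo=2):
    """Size of one tile's copy of the array.

    Only I and J are partitioned; the vertical extent nk is kept whole.
    `halo` ghost points per horizontal side is an ASSUMED value.
    """
    tile_ni = -(-ni // ntile_i) + 2 * halo  # ceiling division plus ghost points
    tile_nj = -(-nj // ntile_j) + 2 * halo
    return array_megabytes(tile_ni, tile_nj, nk)

# Example extents 240x104x40 with a 2x2 tiling (illustrative values)
global_mb = array_megabytes(240, 104, 40)
tile_mb = tile_megabytes(240, 104, 40, 2, 2)
print(f"global array: {global_mb:.4f} MB, one tile: {tile_mb:.4f} MB")
```

The tile is somewhat larger than a quarter of the global array because of the ghost points, which is one reason the per-tile figures in a real report do not divide evenly.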
ROMS has been updated to compute an estimate of the dynamic and automatic memory needed to run an application. The automatic memory is difficult to estimate since it is volatile. The maximum automatic memory is computed by looking at step2d.F, step3d_t.F, and the I/O routines. Check mod_arrays.F to see how it is done. Also, information is provided in ROMS/memory.txt.
We can use the memory requirements to optimize partitions in parallel computers by examining the memory available for each Persistent Execution Thread (PET) or CPU. We need to make sure that the memory required by each distributed-memory tile fits on the PET to accelerate computations and optimize the computer resources.
The memory requirements are written to the standard output file after the report of activated CPP options. For example, for a nested application with three grids on four distributed-memory PETs, we get:
    Process Information:

    Node #  0 (pid=    7420) is active.
    Node #  3 (pid=    7423) is active.
    Node #  1 (pid=    7421) is active.
    Node #  2 (pid=    7422) is active.

    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    Dynamic and Automatic memory (MB) usage for Grid 01:  240x104x40  tiling: 2x2

        tile          Dynamic        Automatic            USAGE

           0           145.33            16.94           162.27
           1           146.47            16.94           163.42
           2           147.88            16.94           164.83
           3           149.05            16.94           165.99

         SUM           588.73            67.78           656.51

    Dynamic and Automatic memory (MB) usage for Grid 02:  204x216x40  tiling: 2x2

        tile          Dynamic        Automatic            USAGE

           0           217.32            29.46           246.78
           1           215.32            29.46           244.78
           2           215.43            29.46           244.89
           3           213.45            29.46           242.91

         SUM           861.52           117.84           979.36

    Dynamic and Automatic memory (MB) usage for Grid 03:  276x252x40  tiling: 2x2

        tile          Dynamic        Automatic            USAGE

           0           334.33            46.32           380.66
           1           332.02            46.32           378.34
           2           331.81            46.32           378.13
           3           329.51            46.32           375.83

         SUM          1327.67           185.29          1512.95

       TOTAL          2777.92           370.90          3148.82

    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Notice that the information is provided in decimal megabytes (MB) for each PET (tile). The USAGE column is the sum of the dynamic and automatic memory requirements for each PET (tile). The report is repeated for each nested grid. The TOTAL row provides the memory requirements for all three nested grids. Its value is a slight underestimate, but it gives a guideline for the amount of memory to request in supercomputer batch jobs. Rounding up generously, this application needs around 3.5 GB.
For a shared-memory 2x2 partition of the UPWELLING test case with the BIO_FENNEL ecosystem model, we get:
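The per-PET requirement across all grids can be tallied from the USAGE columns reported above. The sketch below does that and checks the result against an available-memory figure. It assumes that PET p holds tile p of every grid, as the report layout suggests; the function names, the 1000 MB available per PET, and the 10% safety margin are illustrative assumptions, not ROMS values.

```python
# USAGE column (MB) per tile, copied from the three-grid report above
usage_mb = {
    "Grid 01": [162.27, 163.42, 164.83, 165.99],
    "Grid 02": [246.78, 244.78, 244.89, 242.91],
    "Grid 03": [380.66, 378.34, 378.13, 375.83],
}

def per_pet_totals(usage):
    """Sum the usage of tile p over all grids, for each PET p."""
    n_pets = len(next(iter(usage.values())))
    return [sum(tiles[p] for tiles in usage.values()) for p in range(n_pets)]

def fits(totals_mb, available_mb, margin=0.10):
    """True for each PET whose requirement plus a safety margin fits.

    `margin` is an ASSUMED allowance for the report being an underestimate.
    """
    return [t * (1.0 + margin) <= available_mb for t in totals_mb]

totals = per_pet_totals(usage_mb)
ok = fits(totals, available_mb=1000.0)  # hypothetical 1000 MB per PET
for pet, (t, good) in enumerate(zip(totals, ok)):
    print(f"PET {pet}: {t:8.2f} MB  fits={good}")
print(f"all PETs: {sum(totals):8.2f} MB")
```

The sum over PETs agrees with the reported TOTAL of 3148.82 MB to within rounding of the individual entries.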
    Process Information:

    Thread #  3 (pid=   70227) is active.
    Thread #  0 (pid=   70227) is active.
    Thread #  1 (pid=   70227) is active.
    Thread #  2 (pid=   70227) is active.

    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    Dynamic and Automatic memory (MB) usage for Grid 01:  41x80x16  tiling: 2x2

        tile          Dynamic        Automatic            USAGE

           0           216.11             2.93           219.04
           1             0.00             2.81             2.81
           2             0.00             2.93             2.93
           3             0.00             2.81             2.81

       TOTAL           216.11            11.48           227.59

    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Since it is a shared-memory application, the dynamic memory requirements are reported only for PET (tile) zero.
Identical values are obtained in a serial 2x2 partition:
    Process Information:

    Thread #  0 (pid=   36227) is active.

    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    Dynamic and Automatic memory (MB) usage for Grid 01:  41x80x16  tiling: 2x2

        tile          Dynamic        Automatic            USAGE

           0           216.11             2.93           219.04
           1             0.00             2.81             2.81
           2             0.00             2.93             2.93
           3             0.00             2.81             2.81

       TOTAL           216.11            11.48           227.59

    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
and similar values in a serial 1x1 partition:
    Process Information:

    Thread #  0 (pid=   18037) is active.

    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    Dynamic and Automatic memory (MB) usage for Grid 01:  41x80x16  tiling: 1x1

        tile          Dynamic        Automatic            USAGE

           0           216.11             9.75           225.86

       TOTAL           216.11             9.75           225.86

    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
WARNING:
The memory requirement values are reported using the International System of Units (SI) definition of the megabyte (MB):
    In ROMS:

      one r4 array element = 32 bits = 4 bytes  (single-precision)
      one r8 array element = 64 bits = 8 bytes  (double-precision)

    In the metric decimal system (SI):

      1 byte                 8 bits
      1 kilobyte (kB)        1E+3  bytes   (1000)
      1 megabyte (MB)        1E+6  bytes   (1000^2)
      1 gigabyte (GB)        1E+9  bytes   (1000^3)
      1 terabyte (TB)        1E+12 bytes   (1000^4)
      1 petabyte (PB)        1E+15 bytes   (1000^5)

    In the binary system (deprecated):

      1 kibibyte (KiB)       1024 bytes                (2^10)
      1 mebibyte (MiB)       1,048,576 bytes           (2^20, 1024^2)
      1 gibibyte (GiB)       1,073,741,824 bytes       (2^30, 1024^3)
      1 tebibyte (TiB)       1,099,511,627,776 bytes   (2^40, 1024^4)
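Because many tools and batch schedulers report memory in binary units, it can help to convert the decimal megabytes in the ROMS report before comparing. A minimal sketch, with hypothetical helper names, applied to the three-grid TOTAL of 3148.82 MB from the example above:

```python
MB = 1000 ** 2   # decimal megabyte, the unit used in the ROMS report
MIB = 1024 ** 2  # mebibyte
GIB = 1024 ** 3  # gibibyte

def mb_to_mib(megabytes):
    """Convert decimal megabytes to mebibytes."""
    return megabytes * MB / MIB

def mb_to_gib(megabytes):
    """Convert decimal megabytes to gibibytes."""
    return megabytes * MB / GIB

total_mb = 3148.82  # TOTAL row of the three-grid example
total_mib = mb_to_mib(total_mb)
total_gib = mb_to_gib(total_mb)
print(f"{total_mb:.2f} MB = {total_mib:.2f} MiB = {total_gib:.3f} GiB")
```

The same number of bytes reads about 5% smaller in MiB and about 7% smaller in GiB, so a limit quoted in binary units is larger than the same figure in decimal units.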