Opened 7 years ago
Closed 7 years ago
#789 closed upgrade (Done)
VERY IMPORTANT: ROMS single precision numerical kernel
Reported by: | arango | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | Release ROMS/TOMS 3.7 |
Component: | Nonlinear | Version: | 3.7 |
Keywords: | Cc: |
Description (last modified by )
ROMS uses double-precision arithmetic in its numerical kernel by default. I have been asked several times what is needed to run in single precision to accelerate the computations. I always answered that it is straightforward since ROMS defines the numerical precision with KIND type parameters in mod_kinds.F globally. That is, of course, very far from reality because we need to look at the accuracy in various computations like:
- time managing variables,
- bounded time interpolation between snapshots,
- vertical stretching of terrain-following coordinates,
- weighting coefficients for time averaging of barotropic fields,
- area and volume conservation variables,
- time variables representation in output NetCDF files, and
- processing of ROMS unique namelist like parameters from input scripts
Therefore, implementing a single-precision numerical kernel was not as trivial as I thought since we need to keep/compute a few variables in double precision with lots of files being modified.
What Is New:
- A new C-preprocessing SINGLE_PRECISION is introduced to convert the numerical kernel to single precision. In globaldef.h, we now have:
/* ** Turn ON/OFF double precision arithmetic in numerical kernel (default) ** and floating-point type variables and associated intrinsic functions. */ #ifdef SINGLE_PRECISION # ifdef OUT_DOUBLE # undef OUT_DOUBLE # endif # ifndef RST_SINGLE # define RST_SINGLE # endif #else # define DOUBLE_PRECISION #endif
Notice that if OUT_DOUBLE and RST_SINGLE are activated, they are turned off for consistency in the numerical kernel.
- In mod_kinds.F, we now have
integer, parameter :: i1b= SELECTED_INT_KIND(1) ! 8-bit integer, parameter :: i2b= SELECTED_INT_KIND(2) ! 8-bit integer, parameter :: i4b= SELECTED_INT_KIND(4) ! 16-bit integer, parameter :: i8b= SELECTED_INT_KIND(8) ! 32-bit integer, parameter :: c8 = SELECTED_REAL_KIND(6,30) ! 32-bit integer, parameter :: dp = SELECTED_REAL_KIND(12,300) ! 64-bit integer, parameter :: r4 = SELECTED_REAL_KIND(6,30) ! 32-bit # ifdef SINGLE_PRECISION integer, parameter :: r8 = SELECTED_REAL_KIND(6,30) ! 32-bit # else integer, parameter :: r8 = SELECTED_REAL_KIND(12,300) ! 64-bit # endif
Notice that the r8 parameter is defined for 32-bit (4-bytes) floating-point variables in single precision computations. A new parameter, dp, is introduced to keep few variable in double precision. For example, in mod_scalars.F we have:real(dp), allocatable :: tdays(:) ! days real(dp), allocatable :: time(:) ! seconds real(dp), allocatable :: dt(:) ! seconds real(dp), allocatable :: dtfast(:) ! seconds real(dp), allocatable :: TimeEnd(:) ! seconds real(dp), allocatable :: AVGtime(:) ! seconds real(dp), allocatable :: DIAtime(:) ! seconds real(dp), allocatable :: IMPtime(:) ! seconds real(dp), allocatable :: ObsTime(:) ! seconds real(dp), allocatable :: FrcTime(:) ! seconds real(dp) :: dstart = 0.0_dp ! days real(dp) :: io_time = 0.0_dp ! seconds real(dp) :: run_time = 0.0_dp ! seconds real(dp) :: tide_start = 0.0_dp ! days
- A new module, inp_decode.F, is introduced to process and decode ROMS KeyWord parameters from input script files. It has the following module procedures:
INTERFACE load_i MODULE PROCEDURE load_1d_i ! 1D integer array MODULE PROCEDURE load_2d_i ! 2D integer array MODULE PROCEDURE load_3d_i ! 3D integer array END INTERFACE load_i INTERFACE load_l MODULE PROCEDURE load_1d_l ! 1D logical array MODULE PROCEDURE load_2d_l ! 2D logical array MODULE PROCEDURE load_3d_l ! 3D logical array END INTERFACE load_l INTERFACE load_r #ifdef SINGLE_PRECISION MODULE PROCEDURE load_1d_dp ! 1D real(dp) array MODULE PROCEDURE load_2d_dp ! 2D real(dp) array MODULE PROCEDURE load_3d_dp ! 3D real(dp) array #endif MODULE PROCEDURE load_1d_r8 ! 1D real(r8) array MODULE PROCEDURE load_2d_r8 ! 2D real(r8) array MODULE PROCEDURE load_3d_r8 ! 3D real(r8) array END INTERFACE load_r
to process 1D, 2D, and 3D variables since we are using the strong typing properties of modules. Before we didn't use modules because all the arrays arguments were reshaped to 1D arrays (F77 style). This part is very tricky but more robust since we need to consider single and double precision variables that are written to input scripts with 8-bytes double representation. I considered polymorphic objects (available in Fortran 2003, 2008, 2018), but I rejected it because of backward compatibility with older Fortran compilers.
WARNING: The input script processing routines (read_asspar.F, read_biopar.F, read_phypar.F, and read_stapar.F) were modified, and the functions load_i, load_l, and load_r have a different number of arguments depending if we are processing 1D, 2D, or 3D variables. For Example for load_r we can have:
CASE ('DT') Npts=load_r(Nval, Rval, Ngrids, dt) CASE ('TNU2') Npts=load_r(Nval, Rval, NAT+NPT, Ngrids, Rtracer) CASE ('HdecayB(isTvar)') Npts=load_r(Nval, Rval, 4, MT, Ngrids, Rboundary)for processing dt(:), Rtracer(:,:,:), and Rboundary(:,:,:) respectively.
NOTICE: on principle single-precision computations run twice as fast as double-precision computations. However, since some double precision computations are carried out for accuracy in SINGLE_PRECISION solutions the gain is less than 50 percent. For example, in a 3-nested grid application on 4 CPUs, I get
Elapsed CPU time (seconds): Node # 0 CPU: 4923.242 Node # 1 CPU: 4953.730 Node # 2 CPU: 4953.504 Node # 3 CPU: 4953.539 Total: 19784.015 for double-precision computations Elapsed CPU time (seconds): Node # 0 CPU: 2628.964 Node # 1 CPU: 2683.894 Node # 3 CPU: 2683.867 Node # 2 CPU: 2683.818 Total: 10680.543 for single-precision computations
That is, the single-precision is 46 percent faster than the double-precision simulation.
In 4D-Var simulations, this number get much smaller because the computations are dominated by I/O.
Change History (1)
comment:1 by , 7 years ago
Description: | modified (diff) |
---|---|
Resolution: | → Done |
Status: | new → closed |