YAML Parser
Revision as of 15:48, 24 April 2022
Starting with svn revision 1902, released on March 1, 2022, the ROMS metadata is managed with a YAML file, and the plain-text file varinfo.dat is deprecated. YAML files are simple, easy to follow, elegant, portable, and expandable. ROMS can now process YAML files with its own parser module, yaml_parser.F, so there is no need for third-party YAML parsers.
The ROMS YAML parser source code can be found in ROMS/Utility. It is written in Fortran 2003 and includes a CLASS of type yaml_tree for parsing input YAML files.
Introduction
Although several YAML parsers for Fortran exist, a simpler and more straightforward parser with substantial capabilities was coded. It is a hybrid between standard and Object-Oriented Programming (OOP) principles but does not need recursion, polymorphism, or containers (which would require another library).
The only constraint in the parser is that the YAML file is read twice, for simplicity and to avoid containers; the container here is just a Fortran vector. The first read determines the indentation policy (the number of blanks) and the length of the collection vector, list(:), of pair objects (CLASS yaml_pair). The first read is quick. Overall, the parser is very fast and works in parallel: every PET keeps its own copy of the dictionary, which avoids the overhead of collective MPI calls.
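The two-pass idea can be illustrated with a short Python sketch. It is not the ROMS Fortran code: the function name `two_pass_read` and the `(level, key, value)` tuple layout are hypothetical, and a plain Python list stands in for the list(:) vector. The first pass counts the entries and infers the indentation step; the second pass fills a vector of exactly that length.

```python
def two_pass_read(lines):
    """Illustrative two-pass scan of YAML-like lines; not the ROMS implementation."""
    # Pass 1: count entries and infer the indentation step (number of blanks).
    count = 0
    indent_step = 0
    for line in lines:
        text = line.split("#", 1)[0].rstrip()   # naive comment removal
        if not text.strip():
            continue                            # blank or comment-only line
        count += 1
        indent = len(text) - len(text.lstrip(" "))
        if indent > 0 and (indent_step == 0 or indent < indent_step):
            indent_step = indent                # smallest nonzero indentation

    # Pass 2: fill a preallocated, fixed-length vector of pairs.
    pairs = [None] * count
    i = 0
    for line in lines:
        text = line.split("#", 1)[0].rstrip()
        if not text.strip():
            continue
        indent = len(text) - len(text.lstrip(" "))
        key, _, value = text.strip().partition(":")
        level = indent // indent_step if indent_step else 0
        pairs[i] = (level, key.strip(), value.strip())
        i += 1
    return indent_step, pairs
```

Because the length is known before the second pass, no growable container is needed, which is the trade-off described above: one extra, cheap read in exchange for a plain fixed-size vector.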
Capabilities
Currently, it supports the following options:
- Single-line or multiple-line comments start with a hash #. A comment after a key/value pair is also allowed. All comments are skipped during processing.
- It allows unlimited nesting of structures (lists, mappings, hierarchies). Whitespace indentation denotes structure.
- It does not restrict the indentation width. However, some schema validators recommend or impose two-space indentation.
- A colon follows a key to denote a mapping value like:
    ocean_model: ROMS
- It supports Anchors and Aliases.
    ATM_component: &ATM WRF

    metadata:
      - standard_name:       surface_eastward_wind
        long_name:           surface eastward wind
        short_name:          Uwind
        data_variables:      [uwind, time]
        source_units:        m s-1
        destination_units:   m s-1
        source_grid:         cell_center
        destination_grid:    cell_center
        add_offset:          0.0d0
        scale:               1.0d0
        debug_write:         false
        connected_to:        *ATM                    # u10
        regrid_method:       bilinear
        extrapolate_method:  none
- It supports block lists: members are denoted by a leading hyphen-and-space, which is considered part of the indentation.
- It supports flow sequences: a vector list with values enclosed in square brackets and separated by a comma-and-space, like keyword: [val1, ..., valN].
- Keyword values are processed and stored as strings but converted to logical, integer, floating-point, or derived-type values when appropriate during extraction. If particular derived-type values are needed, the caller can process such a structure outside the parser.
- It removes unwanted control characters like tabs and separators (ASCII character codes 0-31).
- It is restricted to the English uppercase and lowercase alphabet but can be expanded to other characters (see the yaml_ValueType routine).
- Multiple or continuation lines are supported. So, for example, we can have:
    state variables: [sea_surface_height_anomaly,
                      barotropic_sea_water_x_velocity,
                      barotropic_sea_water_y_velocity,
                      sea_water_x_velocity,
                      sea_water_y_velocity,
                      sea_water_potential_temperature,
                      sea_water_practical_salinity]
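
Several of the capabilities listed above (comment skipping, control-character removal, continuation lines, flow sequences, and string-to-type conversion) can be sketched in a few lines of Python. This is an illustration only, not a transcription of yaml_parser.F: the helper names `clean_line`, `join_continuations`, `convert_value`, and `parse_pair` are hypothetical, and the comment handling is deliberately naive (it does not protect a `#` inside quotes).

```python
import re

def clean_line(line):
    """Drop control characters (ASCII 0-31) and any trailing comment."""
    line = "".join(ch for ch in line if ord(ch) > 31)
    return line.split("#", 1)[0].rstrip()   # naive: ignores quoted '#'

def join_continuations(lines):
    """Merge lines until every '[' opened by a flow sequence is closed."""
    merged, buffer, depth = [], "", 0
    for raw in lines:
        text = clean_line(raw)
        if not text.strip():
            continue
        buffer = text if not buffer else buffer + " " + text.strip()
        depth += text.count("[") - text.count("]")
        if depth <= 0:
            merged.append(buffer)
            buffer, depth = "", 0
    if buffer:
        merged.append(buffer)   # unterminated sequence: keep what was read
    return merged

def convert_value(text):
    """Convert a stored string to bool, int, or float when it looks like one."""
    low = text.lower()
    if low in ("true", "false"):
        return low == "true"
    if re.fullmatch(r"[+-]?\d+", text):
        return int(text)
    # Accept Fortran-style double-precision exponents such as 1.0d0.
    if re.fullmatch(r"[+-]?(\d+\.?\d*|\.\d+)([ed][+-]?\d+)?", low):
        return float(low.replace("d", "e"))
    return text

def parse_pair(line):
    """Split 'key: value'; a bracketed value becomes a list of converted items."""
    key, _, value = line.partition(":")
    value = value.strip()
    if value.startswith("[") and value.endswith("]"):
        items = [v.strip() for v in value[1:-1].split(",")]
        return key.strip(), [convert_value(v) for v in items if v]
    return key.strip(), convert_value(value)
```

The sketch stores everything as strings first and converts only on request, mirroring the store-then-convert behavior described in the list; values such as `m s-1` that match no numeric or logical pattern stay as strings.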

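Anchor and alias handling can also be pictured with a small Python sketch. It assumes, as in standard YAML, that an alias such as `*ATM` is replaced by the value anchored with `&ATM`; whether the ROMS parser substitutes at read time or at extraction is not stated here. The function `resolve_aliases` and its list-of-tuples input are hypothetical.

```python
def resolve_aliases(pairs):
    """Replace '*name' values with the value anchored by '&name'.

    `pairs` is a list of (key, value) string tuples in file order.
    """
    anchors = {}
    resolved = []
    for key, value in pairs:
        if value.startswith("&"):
            # '&ATM WRF' -> anchor name 'ATM', anchored value 'WRF'
            name, _, value = value[1:].partition(" ")
            value = value.strip()
            anchors[name] = value
        elif value.startswith("*"):
            name = value[1:]
            if name not in anchors:
                raise KeyError(f"alias '*{name}' has no matching anchor")
            value = anchors[name]
        resolved.append((key, value))
    return resolved
```

With the metadata example from the list, the pair `connected_to: *ATM` would resolve to `WRF`, the value anchored on `ATM_component`.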