===== Multidimensional Data Guidance Overview =====

The SAEON Open Data Platform (ODP) allows for the storage, discovery and download of multidimensional scientific data. Network Common Data Format (netCDF) is a file format designed for storing multidimensional data in the form of arrays, and is widely used in the atmospheric and oceanographic communities. The netCDF library has been developed by the [[https://www.unidata.ucar.edu/software/netcdf/|Unidata program]] and is freely available.

NetCDF files are self-describing, so it is essential that certain conventions are followed when creating them. SAEON follows the requirements set out by the [[https://cfconventions.org/|Climate and Forecast]] (CF) and [[https://wiki.esipfed.org/Category:Attribute_Conventions_Dataset_Discovery|Attribute Conventions Dataset Discovery]] (ACDD) conventions for netCDF metadata and data. NetCDF standardisation also promotes interoperability and allows us to add tools such as THREDDS data services to the data. The main components of a netCDF file that should be standardised are dimensions, variables and attributes.

==== NetCDF File Structure ====

There are several types of discrete sampling in atmospheric and oceanographic research, such as point, time series, trajectory or profile, and each of these feature types has a defined netCDF file structure. The US National Oceanographic Data Center has developed [[https://www.nodc.noaa.gov/data/formats/netcdf/v1.1/|feature type templates and examples]], which are widely accepted for netCDF standardisation. Most importantly, each feature type template highlights the relationships set out for netCDF dimensions and variables.

=== Global Attributes ===

This section contains metadata that pertains to the overall netCDF file. The following table contains global attributes derived from CF and ACDD conventions required by the SAEON ODP for multidimensional data in netCDF format.
For additional resources, refer directly to the official documentation of each convention.

Note: Time information in the global attributes should be formatted as a string in ISO 8601 standard format, “YYYY-MM-DDThh:mm:ssZ” (i.e. year – month – day “T” hour : minute : second “Z”), where “Z” indicates the UTC time zone (zero offset).

__Table 1. List of core global attributes for SAEON netCDF files__

//All attributes are MANDATORY to include in your netCDF unless stated: CONDITIONAL = include these fields if the information exists, or RECOMMENDED = it is recommended to include these fields but not mandatory.//

| **Attribute** | **Example** | **Description** |
| title | “CTD data from the ASCA transect station 5” | A succinct description of what is in the dataset. |
| institution | “Department of Forestry, Fisheries and the Environment” | Specifies where the original data was produced. |
| source | “CTD Measurement” | The method of production of the original data. |
| history | “Thu Aug 4 14:19:04 2018: ncatted -h -a long_name,longitude,c,c,longitude” | Provides an audit trail for modifications to the original netCDF file. |
| references | “DOI: 10.15493/SAEON.EGAGASINI.10000004” | Published or web-based references that describe the data or the methods used to produce it. URIs (such as a URL or DOI) are recommended for papers or other references. |
| comment | “Stations 13 and 17 of the ASCA transect were abandoned due to large swells and bad weather.” | Miscellaneous information about the data or the methods used to produce it. |
| Conventions | “CF-1.7” | Name of the format convention used by the dataset. |
| summary | “The Agulhas System Climate Array (ASCA) is a mooring array designed to provide long-term observations of the Agulhas Current volume, heat and salt transport and its variability. The ASCA shelf and tall moorings extend 200 km offshore along descending TOPEX/Jason satellite ground track # 96, through the core of the Agulhas Current, with CPIES measurements extending the array to 300 km offshore of Port Elizabeth. The CTD data here was collected on voyage AGU037 in July 2018.” | An abstract about the data contained within the file. |
| keywords | “CTD, ASCA, Temperature, Salinity, South Africa, Agulhas Current” | A comma-separated list of keywords related to the dataset. |
| license | “Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)” | Describes the restrictions on data access and distribution. |
| license_url | “[[https://creativecommons.org/licenses/by-sa/4.0/legalcode]]” | The URL of the full license text. |
| date_created | “2018-07-05T08:35:00Z” | The date on which the file was created. Use ISO 8601 standard for date and time. |
| creator_name | “John Smith” | The name of the person principally responsible for creating this data. |
| creator_email | “jsmith@email.com” | The email address of the person principally responsible for creating this data. |
| project (CONDITIONAL) | “Agulhas System Climate Array” | The name of the project(s) principally responsible for originating this data. |
| publisher_name (CONDITIONAL) | “South African Environmental Observation Network” | The name of the person responsible for publishing the data file or product to users, with its current metadata and format. |
| publisher_email (CONDITIONAL) | “curation@saeon.ac.za” | The email address of the person responsible for publishing the data file or product to users, with its current metadata and format. |
| geospatial_lat_min | “-34.678” | Describes the southernmost latitude covered by the dataset. A value between -90 and 90 decimal degrees North. |
| geospatial_lat_max | “-34.678” | Describes the northernmost latitude covered by the dataset. A value between -90 and 90 decimal degrees North. |
| geospatial_lat_units | “degrees_north” | Units used for geospatial_lat_min/max. Default is “degrees_north”. |
| geospatial_lon_min | “26.012” | Describes the westernmost longitude covered by the dataset. A value between -180 and 180 degrees East. |
| geospatial_lon_max | “26.012” | Describes the easternmost longitude covered by the dataset. A value between -180 and 180 degrees East. |
| geospatial_lon_units | “degrees_east” | Units used for geospatial_lon_min/max. Default is “degrees_east”. |
| geospatial_vertical_min | “10” | Describes the minimum depth covered by the dataset. |
| geospatial_vertical_max | “1256” | Describes the maximum depth covered by the dataset. |
| geospatial_vertical_units | “metres” | Units used for geospatial_vertical_min/max. |
| geospatial_vertical_positive | “down” | The direction of increasing vertical coordinate values corresponding to a reference point. Either “up” or “down”. |
| time_coverage_start | “2018-07-05T08:35:00Z” | Start time of the data in this dataset. Use ISO 8601 standard for date and time. |
| time_coverage_end | “2018-07-05T11:55:10Z” | End time of the data in this dataset. Use ISO 8601 standard for date and time. |
| date_modified (CONDITIONAL) | “2018-08-12T10:30:00Z” | The date on which this file was last modified. Use ISO 8601 standard for date and time. |
| date_metadata_modified (CONDITIONAL) | “2018-08-12T10:30:00Z” | The date on which this metadata was last modified. Use ISO 8601 standard for date and time. |
| instrument | “SBE9/11 plus CTD” | Name of the contributing instrument or sensor used to create this dataset. |
| lineage (RECOMMENDED) | “The CTD raw data was converted to scientific units using SBEDataProcessing software (ver 7.26.7.114). The following modules (SBEDataProcessing) were then run on the converted data using the recommended default values: Align CTD, Filter, Loop Edit and Cell Thermal Mass. The data was then split into the up cast and the down cast. The downcast data was then bin averaged.” | Information about how the data has been produced, processed and modified. |

=== Dimensions ===

NetCDF dimensions define the shape, grid or coordinate system of a variable. A dimension has both a name and a length. A dimension length is an arbitrary positive integer, except that at most one dimension may be UNLIMITED, meaning that variables can grow along it. Once the data feature type has been identified, the US National Oceanographic Data Center [[https://www.nodc.noaa.gov/data/formats/netcdf/v1.1/|netCDF templates]] can be used as a guide for which dimensions to define in your file.

Dimensions can be used to represent physical dimensions, for example, “time” (T), “latitude” (Y), “longitude” (X), or “depth” (Z). A dimension can also be used to index other quantities, for example, station number or model number. It is recommended that dimensions appear in the relative order T, then Z, then Y, then X, where applicable.

When naming dimensions and variables, keep the netCDF //coordinate variable// in mind. A coordinate variable is a variable that has exactly the same name as a dimension, and is most commonly used for coordinates such as lat, lon, depth and/or time. Coordinate variables do not necessarily have special meaning to the netCDF library, but visualisation software treats them in a special way.

=== Variables ===

Variables in a netCDF file contain the parameters measured by an instrument. Variables in netCDF files can be one of six types (char, byte, short, int, float or double). Each variable has a data type, a name and a shape, which is defined by its specified dimensions. Our guidance does not standardise variable names. For example, a suitable variable name for temperature data could be Temperature, Temp or TEMP, so long as the variable name sufficiently describes the data that it contains. Keep the naming of netCDF //coordinate variables// in mind when choosing variable names.
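
The dimension and coordinate variable guidance can be illustrated with a small Python sketch that uses only the standard library. It models a file layout as plain dictionaries, so it does not read or write netCDF files. The example layout, the AXIS_OF mapping and all function names are hypothetical, chosen for illustration only, and are not part of the netCDF library or of any SAEON tool.

```python
# Hypothetical file layout: dimension names mapped to lengths (None stands
# for the UNLIMITED dimension) and variable names mapped to the dimensions
# that define their shape.
dimensions = {"time": None, "depth": 50, "lat": 1, "lon": 1}
variables = {
    "time": ("time",),
    "depth": ("depth",),
    "lat": ("lat",),
    "lon": ("lon",),
    "TEMP": ("time", "depth", "lat", "lon"),
}

# Axis assumed for each dimension name (an assumption of this sketch).
AXIS_OF = {"time": "T", "depth": "Z", "lat": "Y", "lon": "X"}
RECOMMENDED_ORDER = "TZYX"


def coordinate_variables(dimensions, variables):
    """Return the variables that share their name with a dimension."""
    return sorted(name for name in variables if name in dimensions)


def follows_recommended_order(dims):
    """Check that dimensions appear in the relative order T, Z, Y, X.

    Dimensions without a known axis (e.g. a station index) are ignored.
    """
    ranks = [RECOMMENDED_ORDER.index(AXIS_OF[d]) for d in dims if d in AXIS_OF]
    return ranks == sorted(ranks)


def unlimited_dimensions(dimensions):
    """Return the names of dimensions declared as UNLIMITED (length None)."""
    return [name for name, length in dimensions.items() if length is None]


print(coordinate_variables(dimensions, variables))
print(follows_recommended_order(variables["TEMP"]))
print(unlimited_dimensions(dimensions))
```

In this layout, time, depth, lat and lon are coordinate variables because each shares its name with a dimension, while TEMP is an ordinary data variable whose shape is defined by all four dimensions.
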
Each variable has a corresponding set of attributes which provide information about the data for the end user. The following tables contain variable attributes derived from CF and ACDD conventions required by the SAEON ODP for multidimensional data in netCDF format.

Note: The //long_name// attribute is defined to contain a long descriptive name of the variable which may, for example, be used for labelling plots. The //standard_name// attribute is the name used to identify the physical quantity and must be taken from the [[https://cfconventions.org/standard-names.html|CF standard name table]]. Parameters with no suitable standard_name should be described using the long_name attribute only.

=== Variable Attributes ===

== Time coordinates ==

Time data in a netCDF file is represented as a number giving the interval elapsed since some reference time. Time variables must include a clear units attribute, as there is no default value. The units attribute for a time variable is a string value in the format recommended by UDUNITS. Commonly these strings include “//days//”, “//hours//”, “//minutes//” or “//seconds//” //since// a specific date, time and time zone (e.g. “seconds since 1970-01-01 00:00:00 UTC”). Use the double data type.

__Table 2. List of time variable attributes for SAEON netCDF files__

//All attributes are MANDATORY to include in your netCDF unless stated: CONDITIONAL = include these fields if the information exists, or RECOMMENDED = it is recommended to include these fields but not mandatory.//

| **Attribute** | **Example** | **Description** |
| standard_name | “time” | A description of the variable’s content from the CF standard name table. |
| long_name | “time” | A descriptive name that indicates a variable’s content. |
| units | “seconds since 1970-01-01 00:00:00 UTC” | Use units that follow the CF conventions and are recognised by UDUNITS. Please contact SAEON curation if help is required. |
| calendar (CONDITIONAL) | “gregorian” | Calendar used for encoding time axes. See CF documentation for clarification. Default is “gregorian”. |
| valid_min (RECOMMENDED) | “1530779700” | Smallest valid value of a variable. Should be of the same type as the variable type. |
| valid_max (RECOMMENDED) | “1530791710” | Largest valid value of a variable. Should be of the same type as the variable type. |
| axis | “T” | Identifies the time coordinate. Set value of “T”. |
| comment (RECOMMENDED) | “....” | Miscellaneous information about the data that cannot be described in any of the other available attributes. |

== Horizontal coordinates ==

SAEON ODP uses the WGS84 coordinate reference system in decimal degrees when describing latitude and longitude.

__Table 3. List of longitude and latitude variable attributes for SAEON netCDF files__

//All attributes are MANDATORY to include in your netCDF unless stated: CONDITIONAL = include these fields if the information exists, or RECOMMENDED = it is recommended to include these fields but not mandatory.//

| **Attribute** | **Example** | **Description** |
| standard_name | “latitude” or “longitude” | A description of the variable’s content from the CF standard name table. |
| long_name | “latitude” or “longitude” | A descriptive name that indicates a variable’s content. |
| units | “degrees_north” (LATITUDE) or “degrees_east” (LONGITUDE) | Use units that follow the CF conventions and are recognised by UDUNITS, as listed in the CF standard name table. |
| valid_min (RECOMMENDED) | “-34.678” (LATITUDE) or “26.012” (LONGITUDE) | Smallest valid value of a variable. Should be of the same type as the variable type. |
| valid_max (RECOMMENDED) | “-34.678” (LATITUDE) or “26.012” (LONGITUDE) | Largest valid value of a variable. Should be of the same type as the variable type. |
| axis | “Y” (LATITUDE) or “X” (LONGITUDE) | Identifies the horizontal coordinate. Set values of “X” or “Y”. |
| comment (RECOMMENDED) | “....” | Miscellaneous information about the data that cannot be described in any of the other available attributes. |

== Vertical coordinates ==

Depth or height variables should be measured in SI units, most commonly “metres”. Pressure measurements should not be labelled as a depth or height variable, but as a separate parameter.

__Table 4. List of depth variable attributes for SAEON netCDF files__

//All attributes are MANDATORY to include in your netCDF unless stated: CONDITIONAL = include these fields if the information exists, or RECOMMENDED = it is recommended to include these fields but not mandatory.//

| **Attribute** | **Example** | **Description** |
| standard_name | “depth” | A description of the variable’s content from the CF standard name table. |
| long_name | “depth” | A descriptive name that indicates a variable’s content. |
| units | “m” | Use units that follow the CF conventions and are recognised by UDUNITS, as listed in the CF standard name table. |
| positive | “down” | The direction of increasing vertical coordinate values corresponding to a reference point. Either “up” or “down”. |
| valid_min (RECOMMENDED) | “10” | Smallest valid value of a variable. Should be of the same type as the variable type. |
| valid_max (RECOMMENDED) | “1256” | Largest valid value of a variable. Should be of the same type as the variable type. |
| axis | “Z” | Identifies the vertical coordinate. Set value of “Z”. |
| comment (RECOMMENDED) | “....” | Miscellaneous information about the data that cannot be described in any of the other available attributes. |

== Geophysical parameters ==

These variables contain the data collected by an instrument or sensor.

Note: Where a netCDF file contains two sensors measuring the same variable, it is important that these are easily differentiated. Community best practice is to suffix one of the variable names with “_2” (e.g. “Temp” and “Temp_2”).
For guidance on variable naming conventions, follow community guidance (e.g. [[https://github.com/aodn/imos-toolbox/blob/master/IMOS/imosParameters.txt|IMOS naming toolbox]]) or contact the [[curation@saeon.nrf.ac.za|SAEON Data Curators]].

__Table 5. List of geophysical parameter variable attributes for SAEON netCDF files__

//All attributes are MANDATORY to include in your netCDF unless stated: CONDITIONAL = include these fields if the information exists, or RECOMMENDED = it is recommended to include these fields but not mandatory.//

| **Attribute** | **Example** | **Description** |
| standard_name (CONDITIONAL) | “sea_water_temperature” | A description of the variable’s content from the CF standard name table. |
| long_name | “Temperature” | A descriptive name that indicates a variable’s content. |
| units | “degree_C” | Use units that follow the CF conventions and are recognised by UDUNITS, as listed in the CF standard name table. |
| scale_factor (CONDITIONAL) | “0.01” | Include if the data uses a scale_factor other than 1. Should be of the same type as the variable type. |
| add_offset (CONDITIONAL) | “25.0” | Include if the data uses an add_offset other than 0. Should be of the same type as the variable type. |
| _FillValue (CONDITIONAL) | “-9999” | A special value that indicates undefined or missing data. Should be of the same type as the variable type. |
| valid_min (RECOMMENDED) | “2.565” | Smallest valid value of a variable. Should be of the same type as the variable type. |
| valid_max (RECOMMENDED) | “23.198” | Largest valid value of a variable. Should be of the same type as the variable type. |
| coordinates (RECOMMENDED) | “T Y X Z” | A space-separated list of all the coordinates corresponding to the variable. |
| comment (RECOMMENDED) | “....” | Miscellaneous information about the data that cannot be described in any of the other available attributes. |

== Quality control flags (CONDITIONAL) ==

If applicable to the platform, quality control flags can be included as self-describing variables that record an assessment identifying possible errors in the data.

__Table 6. List of quality control flag variable attributes for SAEON netCDF files__

//All attributes are MANDATORY to include in your netCDF unless stated: CONDITIONAL = include these fields if the information exists, or RECOMMENDED = it is recommended to include these fields but not mandatory.//

| **Attribute** | **Example** | **Description** |
| standard_name | “....” | A description of the variable’s content from the CF standard name table. |
| long_name | “quality flag for sea_water_temperature” | A descriptive name that indicates a variable’s content. |
| _FillValue (CONDITIONAL) | “-99b” | A special value that indicates undefined or missing quality control flags in the data. |
| flag_values | 0b, 1b, 2b, 3b, 4b, 5b, 6b, 7b | List of flag values used in the data. |
| flag_meanings | “No_QC_performed Good_data Probably_good_data Bad_data_that_are_correctable Bad_data Value_changed Missing_value” | The meaning of each flag, in the same order as flag_values. |
| comment (RECOMMENDED) | “....” | Miscellaneous information about the data that cannot be described in any of the other available attributes. |

=== NetCDF manipulation tools ===

  * [[https://downloads.unidata.ucar.edu/netcdf/|NetCDF library software]]: //ncdump// can convert a netCDF binary file to CDL text.
  * [[https://www.giss.nasa.gov/tools/panoply/|Panoply]] and [[http://cirrus.ucsd.edu/~pierce/software/ncview/index.html|Ncview]] are useful visualisation tools.
  * [[http://nco.sourceforge.net/|NCO]] is a command-line toolkit for manipulating netCDF files: //ncatted// and //ncrename// are particularly useful for editing metadata on the fly.
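
Before editing time metadata with any of these tools, it can help to check that the two time representations used in this guidance agree: the ISO 8601 strings in the global attributes and the numeric values of the time variable. The following Python sketch (standard library only) converts between them, assuming the units “seconds since 1970-01-01 00:00:00 UTC” from Table 2. The function names are hypothetical and are not part of any netCDF library.

```python
from datetime import datetime, timedelta, timezone

# Reference time matching the units string used in Table 2.
TIME_UNITS = "seconds since 1970-01-01 00:00:00 UTC"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_iso8601_utc(text):
    """Parse a 'YYYY-MM-DDThh:mm:ssZ' string into a UTC-aware datetime."""
    return datetime.strptime(text, ISO_FORMAT).replace(tzinfo=timezone.utc)


def to_iso8601_utc(moment):
    """Format an aware datetime as 'YYYY-MM-DDThh:mm:ssZ' in UTC."""
    return moment.astimezone(timezone.utc).strftime(ISO_FORMAT)


def encode_time(moment):
    """Seconds elapsed since the reference time, as a float (double)."""
    return (moment - EPOCH).total_seconds()


def decode_time(seconds):
    """Convert seconds since the reference time back to a datetime."""
    return EPOCH + timedelta(seconds=seconds)


start = parse_iso8601_utc("2018-07-05T08:35:00Z")
end = parse_iso8601_utc("2018-07-05T11:55:10Z")
print(encode_time(start), encode_time(end))
print(to_iso8601_utc(decode_time(encode_time(start))))
```

The two example timestamps are the time_coverage_start and time_coverage_end values from Table 1, and they encode to the valid_min and valid_max examples given in Table 2.
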