saeon_preservation_policy


The purpose of this document is to describe the data preservation framework governing the operations of the SAEON Open Data Platform (ODP). This policy covers all data and metadata archived and published in the ODP with the exception of community portals hosted on behalf of stakeholders external to SAEON that have different preservation requirements.

The principles of this policy are informed by:

  • Trusted Digital Repository Standards and Frameworks
    • ISO 16363
    • Trustworthy Repositories Audit & Certification (TRAC)
  • Reference Model for an Open Archival Information System (OAIS)
  • FAIR Data Principles

This policy may be revised if the framework governing the SAEON ODP changes.

Mission and Organisational Mandate

SAEON is a sustained, coordinated, responsive and comprehensive in situ Earth observation network that delivers long-term reliable data for scientific research and informs decision-making for a knowledge society and improved quality of life.

SAEON received a portfolio of funding from the Department of Science and Innovation (DSI) to preserve and provide access to earth and environmental observation data for South Africa, and so archives any publicly funded or open data captured in this domain.

uLwazi is the node within SAEON responsible for acquiring, enhancing, storing, maintaining and disseminating the data. It is made up of four teams focused on IT infrastructure management, systems development, data curation and data science. The uLwazi node also runs various data-related projects for government stakeholders, and so continuously sources datasets that can supplement decision support and policy making in areas relevant to these projects.

Data Infrastructure

The ODP is SAEON’s overall research data infrastructure that includes a number of data and metadata collections and portals that are customised for particular stakeholder communities.

The datasets hosted by SAEON include spatial data, multidimensional data, time series data and general digital objects and media data for the earth and environmental observation domain.

Appraisal and Selection of Data

On verification of a SAEON Data Policy compliant data submission, a Submission Information Package (SIP) consisting of data and metadata submitted by a data provider is created by the data curation team and uploaded to the SAEON ODP file repository.

Once the SIP has been created, Quality Assurance (QA) and selection of an appropriate data store are executed. Available data store options depend on the data formats. These are:

  • Geospatial Databases and Servers - for vector and raster spatial datasets.
  • THREDDS/OPeNDAP - for multidimensional datasets (NetCDF).
  • Relational Database Management Systems - for time series observations.
  • File System - for text files, images, video, audio and any other digital object or unstructured data.
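The selection rule in the list above can be sketched as a simple lookup. This is only an illustration: the data-type keys and the `select_data_store` function are hypothetical names, not part of the ODP, and the fallback to the file system reflects the "any other digital object" wording of the last option.

```python
from enum import Enum


class DataStore(Enum):
    """The four data store options named in this policy."""
    GEOSPATIAL = "Geospatial databases and servers"
    THREDDS = "THREDDS/OPeNDAP"
    RDBMS = "Relational database management system"
    FILE_SYSTEM = "File system"


# Hypothetical data-type labels; the policy names the categories, not these keys.
STORE_BY_DATA_TYPE = {
    "vector": DataStore.GEOSPATIAL,
    "raster": DataStore.GEOSPATIAL,
    "multidimensional": DataStore.THREDDS,
    "time_series": DataStore.RDBMS,
}


def select_data_store(data_type: str) -> DataStore:
    """Return the data store for a data type, defaulting to the file system."""
    return STORE_BY_DATA_TYPE.get(data_type.lower(), DataStore.FILE_SYSTEM)
```

For example, `select_data_store("raster")` returns `DataStore.GEOSPATIAL`, while an unlisted type such as `"audio"` falls through to `DataStore.FILE_SYSTEM`.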

At this stage the SIP may be reassigned to a data curator with the relevant domain expertise. The purpose of this QA step is to check whether any information is needed from the data provider before the dataset is published. If no further data management actions are required, an Archival Information Package (AIP) is generated and passed on to another data curator for Quality Control (QC) and publication.

AIP generation is initiated by uploading the data in the SIP to the appropriate data store. If the data are not in a suitable format for long-term preservation, additional curation steps such as format migration, further updates to metadata and additional quality assurance are added to the workflow. See Table 1 below for preferred preservation file formats.
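As a rough illustration of how the workflow described above branches, the sketch below builds an ordered list of curation steps for a submission. The `Submission` class, its fields and the `plan_curation` function are hypothetical, and the exact ordering of the optional steps relative to AIP generation is an assumption rather than something this policy prescribes.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical summary of a submission after initial verification."""
    name: str
    in_preservation_format: bool
    needs_provider_input: bool = False


def plan_curation(sub: Submission) -> list[str]:
    """Return the ordered curation steps for one submission (illustrative only)."""
    steps = ["create SIP", "quality assurance", "select data store"]
    if sub.needs_provider_input:
        # QA found gaps that only the data provider can fill.
        steps.append("request information from data provider")
    if not sub.in_preservation_format:
        # Extra steps the policy adds when the format is unsuitable.
        steps += ["format migration", "update metadata", "additional quality assurance"]
    steps += ["generate AIP", "quality control", "publish"]
    return steps
```

A submission that is already in a preferred format and needs nothing from the provider passes straight from data store selection to AIP generation, QC and publication.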

Table 1: Recommended file formats for long-term data preservation

Data Type           Recommended File Formats
Documents           Plain text (.txt); PDF (.pdf)
Tabular data        Comma separated values (.csv)
Geospatial data     Shapefile (.shp, including .shx and .dbf); GeoTIFF (.tiff and .tfw)
Multidimensional    NetCDF (.nc)
Time series data    Relational database (SQL); Comma separated values (.csv); Plain text (.txt)
Images              TIFF (.tiff)
Audio               WAV (.wav)
Video               QuickTime (.mov); MPEG-4 (.mp4)
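
One way a curator might check a submitted file against Table 1 is a lookup on the file extension. This is a minimal sketch: the category keys and the `needs_migration` function are hypothetical, and an extension check alone cannot confirm that a file's contents are valid for its format.

```python
from pathlib import Path

# Extensions taken from Table 1. Relational database (SQL) storage for
# time series has no file extension and is therefore not represented here.
PREFERRED_EXTENSIONS = {
    "documents": {".txt", ".pdf"},
    "tabular": {".csv"},
    "geospatial": {".shp", ".shx", ".dbf", ".tiff", ".tfw"},
    "multidimensional": {".nc"},
    "time_series": {".csv", ".txt"},
    "images": {".tiff"},
    "audio": {".wav"},
    "video": {".mov", ".mp4"},
}


def needs_migration(filename: str, data_type: str) -> bool:
    """Return True if the file's extension is not a preferred one for its data type."""
    suffix = Path(filename).suffix.lower()
    return suffix not in PREFERRED_EXTENSIONS[data_type]
```

For instance, `needs_migration("survey.xlsx", "tabular")` returns `True`, flagging the file for format migration, while `needs_migration("rainfall.csv", "tabular")` returns `False`.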
  • Last modified: 2022/07/11 09:22
  • by lindsay