Overview of SILO processes

SILO post-processes raw observational data to construct datasets that are “ready to use”. Raw data records are unsuitable for most applications because observational data may:

  • Be missing values
  • Contain erroneous values
  • Not be available at or near the location of interest.

SILO provides datasets which are:

  • Spatially complete: Gridded rasters covering Australia and some nearby islands.
  • Temporally complete: Time series datasets at point locations. Point datasets are offered at two types of location:
    1. Grid points: Daily (or monthly) time series consisting entirely of interpolated data. These records are obtained by extracting data from a selected grid cell over a time series of gridded rasters.
    2. Station locations: Daily (or monthly) time series consisting of observed data when available, and interpolated data when observed data are missing.
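
As an illustration of how a grid-point record can be read out of a stack of rasters, the following is a minimal sketch only. The function names, the nearest-cell lookup and the synthetic values are assumptions made for this example; they are not SILO's code or data.

```python
def nearest_index(coords, target):
    """Return the index of the coordinate closest to target."""
    return min(range(len(coords)), key=lambda k: abs(coords[k] - target))


def extract_grid_point_series(rasters, lats, lons, lat, lon):
    """Read one cell from every raster to build a time series.

    rasters: list of 2-D grids (rows follow lats, columns follow lons),
             one grid per day, in date order.
    lats, lons: cell-centre coordinates of the grid rows and columns.
    """
    i = nearest_index(lats, lat)
    j = nearest_index(lons, lon)
    return [grid[i][j] for grid in rasters]


# Synthetic data: three daily rasters on a 2 x 3 grid (values are made up).
lats = [-27.0, -27.5]
lons = [152.0, 152.5, 153.0]
rasters = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[1.5, 2.5, 3.5], [4.5, 5.5, 6.5]],
    [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0]],
]
series = extract_grid_point_series(rasters, lats, lons, lat=-27.4, lon=152.6)
print(series)
```

Because every value comes from a raster, a series built this way contains interpolated data only, which is the distinction between grid-point and station datasets drawn above.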

The various products are constructed as follows:

  1. Raw observational data are collated from records obtained from the Bureau of Meteorology and other providers.
  2. All available observations for a single variable and single day (or month, in the case of monthly rainfall) are assembled and spatially interpolated to create a gridded raster. This procedure is repeated for all climate variables, and all days (or months) over the period of interest (typically 1 January 1889 – present).
  3. Time series (point) datasets at station locations are constructed as follows. For each variable at a given station:
    1. All valid observations are assembled to create a partial time series.
    2. Missing values in the partial time series are “patched” with interpolated estimates.
    3. Any remaining missing values are patched with long-term averages.
      This occurs when interpolated estimates are unavailable or cannot be used:
      1. Gridded rasters are not constructed for mean sea level pressure for the years 1889–1956, nor for Class A pan evaporation for the years 1889–1969, as there are insufficient observational data to construct reliable surfaces.
      2. Neither interpolated nor observed data are available for maximum temperature and evaporation on the previous day (SILO provides data from 1 January 1889 up to yesterday).
      3. The interpolated data are flagged as being potentially erroneous.
    4. The dataset is augmented with source codes that indicate how each value was obtained, for example observed, interpolated or long-term average.
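
The patching order in step 3 (observed first, then interpolated, then long-term average) can be sketched as follows. This is a simplified illustration under stated assumptions: the function name, the use of None for missing values and the text source labels are all hypothetical, and SILO's actual source codes and data structures are not reproduced here.

```python
def patch_series(observed, interpolated, long_term_average):
    """Fill gaps in an observed series and record the source of each value.

    All three inputs are lists of equal length, aligned by date.
    None marks a missing (or rejected) value in observed and interpolated.
    Returns (values, sources).
    """
    values, sources = [], []
    for obs, interp, average in zip(observed, interpolated, long_term_average):
        if obs is not None:
            # A valid observation always takes priority.
            values.append(obs)
            sources.append("observed")
        elif interp is not None:
            # Otherwise patch with the interpolated estimate.
            values.append(interp)
            sources.append("interpolated")
        else:
            # Last resort: fall back to the long-term average.
            values.append(average)
            sources.append("long_term_average")
    return values, sources


# Made-up daily maximum temperatures (degrees C) for four days.
observed = [21.3, None, None, 19.8]
interpolated = [21.0, 20.4, None, 20.1]
long_term_average = [20.0, 20.2, 20.5, 20.7]

values, sources = patch_series(observed, interpolated, long_term_average)
for value, source in zip(values, sources):
    print(value, source)
```

Keeping the source label beside every value lets a user filter a station record, for instance to analyse observed values only.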

SILO’s data products are dynamic. While the datasets are initially created using the procedure defined above, they continually evolve:

  • Selected datasets may be updated if SILO modifies the scientific methods used in their preparation.
  • All datasets are periodically updated, typically every one to two years. The purpose of this update is to incorporate changes made by our data suppliers. For example, additional historic records may have become available, erroneous observations may have been identified and removed, or the coordinates of some observing stations may have been updated.
  • Recent datasets are updated every night with the latest observations. Users should note that recent data can change significantly because of delays in obtaining data from some sites: while some stations report in real time, it may be many months or even years before data are received from all stations. The nightly update typically affects the most recent 3–12 months of each dataset.

SILO’s gridded datasets are constructed by spatially interpolating the observational data. Ordinary kriging is used to interpolate daily and monthly rainfall, while a thin plate smoothing spline is used to interpolate minimum and maximum temperatures, Class A pan evaporation, mean sea level pressure, solar radiation and vapour pressure. An anomaly method is used to interpolate minimum and maximum temperatures, solar radiation and vapour pressure for all years up to and including 1956. Once the gridded rasters for these “primary” variables have been constructed, SILO then constructs gridded rasters for “derived” variables, such as relative humidity, vapour pressure deficit and various estimates of evapotranspiration. The derived variables are calculated on a pixel-by-pixel basis: each derived variable is calculated at each pixel using the corresponding pixel values taken from gridded rasters of the relevant input variables.
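
To illustrate what a pixel-by-pixel calculation looks like, the sketch below derives a relative humidity raster from vapour pressure and maximum temperature rasters. It is an assumption-laden example rather than SILO's method: the Tetens-type saturation vapour pressure formula (the form given in FAO-56), the cap at 100% and all function names are choices made for this illustration.

```python
import math


def saturation_vapour_pressure_hpa(temp_c):
    """Tetens-type approximation of saturation vapour pressure in hPa.

    Assumed formula (FAO-56 form, converted from kPa to hPa).
    """
    return 6.108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def relative_humidity_grid(vp_grid, tmax_grid):
    """Compute relative humidity (%) cell by cell.

    vp_grid: vapour pressure raster (hPa); tmax_grid: temperature raster
    (degrees C). Both grids must have the same shape. Each output cell
    depends only on the matching input cells.
    """
    return [
        [
            # Capping at 100% is an assumption made for this sketch.
            min(100.0, 100.0 * vp / saturation_vapour_pressure_hpa(t))
            for vp, t in zip(vp_row, t_row)
        ]
        for vp_row, t_row in zip(vp_grid, tmax_grid)
    ]


# Made-up 2 x 2 rasters.
vp_grid = [[20.0, 15.0], [25.0, 10.0]]
tmax_grid = [[30.0, 28.0], [32.0, 18.0]]
rh_grid = relative_humidity_grid(vp_grid, tmax_grid)
print(rh_grid)
```

Because no cell depends on its neighbours, the same calculation can be applied to every raster in the series without any further spatial interpolation.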

Once all the datasets have been assembled after the nightly update, they are packed, converted and stored in formats and locations ready for clients to access:

  • The point data are stored internally in a format tailored for high speed access. When a client requests a particular dataset (using either the web interface or API), the relevant data are assembled and returned immediately. SILO also provides a mechanism that enables clients to efficiently mirror point datasets at station locations.
  • The spatial data are converted into NetCDF and GeoTIFF formats and copied to AWS Public Data. Clients can download the gridded data directly from the cloud platform, or via our web interface.

Readers interested in the technical details should consult the metadata and various technical publications. If you require further assistance, please contact us.

Last updated: 7 July 2021