How to Mirror SILO Datasets

You can maintain your own local copy of SILO datasets using the methods described below.

If you wish to mirror SILO datasets, please read the usage information about data mirroring on our Frequently asked questions page.


Station datasets

SILO provides an incremental update facility that enables clients to efficiently mirror our point datasets at station locations (“patched point datasets”). The system consists of:

  • a base dataset (updated when SILO undergoes a major update)
  • a monthly update (contains all changes since the base dataset was constructed)
  • a daily update (contains all changes since the monthly update was constructed)

To mirror SILO's patched point datasets:

  1. Download the incremental update files from AWS Public Data using the Amazon Web Services Command Line Interface (CLI):
    1. Install the AWS CLI
    2. Use the CLI sync command to mirror the data.

      For example, to mirror the files into your local target folder:
      aws s3 sync s3://silo-open-data/Official/PPD_mirror target --exact-timestamps

      Notes:
      • the first time you run the sync command it will download the entire dataset
      • you need to re-run the sync command every time you wish to update your local copy (sync will only download files that have changed)
      • the --exact-timestamps option is required; otherwise sync will not download files that have been updated but still have the same file size.
  2. Reconstruct the patched point datasets
    The patched datasets can be reconstructed from the incremental updates in any way the user chooses. For example, you may wish to reconstruct datasets containing only maximum temperature for stations in Victoria and discard all other stations and variables.

    SILO provides a software package which demonstrates one method for reconstructing the patched datasets. The package contains instructions on how to install and operate the software. Please note SILO provides this software in good faith and is not responsible for its use or misuse.
    Download software package
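The sync step can be wrapped in a small script and scheduled (for example with cron) so the mirror is refreshed without manual intervention. The following is a minimal sketch, assuming the AWS CLI is installed and on the PATH; the function names, the default folder `silo_ppd_mirror` and the `DRY_RUN` switch are hypothetical conveniences for this example, not part of SILO's instructions.

```shell
#!/bin/sh
# Minimal sketch: refresh a local mirror of SILO's patched point dataset
# update files with the AWS CLI. The default folder name and the DRY_RUN
# switch are hypothetical conveniences for this example.

# Run a command, or only print it when DRY_RUN=1.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: mirror_ppd [TARGET_DIR]
mirror_ppd() {
    target="${1:-silo_ppd_mirror}"
    # --exact-timestamps ensures updated files of unchanged size are fetched.
    run_or_print aws s3 sync "s3://silo-open-data/Official/PPD_mirror" "$target" \
        --exact-timestamps
}
```

For example, `DRY_RUN=1 mirror_ppd /data/silo/ppd` prints the command, while `mirror_ppd /data/silo/ppd` runs it. If no AWS credentials are configured on the machine, the CLI's global `--no-sign-request` option allows anonymous access to public buckets.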

Note: SILO does not provide a facility for mirroring point datasets at grid cell locations because doing so would overload the system (there are approximately 290,000 grid cell locations). If you require time series at grid cell locations, please download our gridded datasets and extract the relevant data.


Gridded datasets

To mirror SILO's gridded datasets, you can either:

Use the Amazon Web Services Command Line Interface (CLI):
  1. Install the AWS CLI
  2. Use the CLI sync command to mirror the data.

    For example, to mirror the monthly rainfall rasters into your local target folder:
    aws s3 sync s3://silo-open-data/Official/annual/monthly_rain target --exact-timestamps

Notes:

  • the first time you run the sync command it will download the entire dataset
  • you need to re-run the sync command every time you wish to update your local copy (sync will only download files that have changed)
  • the --exact-timestamps option is required; otherwise sync will not download files that have been updated but still have the same file size.
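Several variables can be mirrored in one pass by looping over their folder names. The sketch below assumes the AWS CLI is installed; only `monthly_rain` appears on this page, so any other folder name you pass (for example `daily_rain`) is an assumption to check against the bucket listing. The optional `YEAR` setting uses the CLI's standard `--exclude`/`--include` filters to restrict a sync to one year's files, and `DRY_RUN` is a hypothetical switch that prints the commands instead of running them.

```shell
#!/bin/sh
# Minimal sketch: mirror several SILO gridded variables with the AWS CLI.
# Folder names other than monthly_rain, YEAR and DRY_RUN are assumptions
# made for this example.

# Run a command, or only print it when DRY_RUN=1.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: mirror_grids TARGET_DIR VARIABLE...
# Each variable is synced into its own subfolder of TARGET_DIR.
mirror_grids() {
    target="$1"
    shift
    for var in "$@"; do
        if [ -n "${YEAR:-}" ]; then
            # Exclude everything, then re-include that year's files
            # (later filters take precedence in aws s3 sync).
            run_or_print aws s3 sync "s3://silo-open-data/Official/annual/$var" \
                "$target/$var" --exact-timestamps --exclude "*" --include "$YEAR.*"
        else
            run_or_print aws s3 sync "s3://silo-open-data/Official/annual/$var" \
                "$target/$var" --exact-timestamps
        fi
    done
}
```

For example, `mirror_grids /data/silo monthly_rain` mirrors the monthly rainfall rasters into `/data/silo/monthly_rain`, and setting `YEAR=2021` first limits the transfer to that year's file.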
or

Manually download new and/or updated rasters:
  1. Download the entire set of rasters for the variable(s) that you wish to mirror.

    A list of files available for download can be obtained via URL:

    https://s3-ap-southeast-2.amazonaws.com/silo-open-data/Official/annual/index.html
    Individual files can be downloaded using the methods described on our gridded data page. For example, the monthly rainfall rasters for 1989 can be downloaded using curl as follows (the -O option saves the file under its remote name, 1989.monthly_rain.nc, instead of writing it to the terminal):
    curl -O 'https://s3-ap-southeast-2.amazonaws.com/silo-open-data/Official/annual/monthly_rain/1989.monthly_rain.nc'
    Note: this step only needs to be done once.
  2. Each time you wish to update your local copy:

    Use the file listing:

    https://s3-ap-southeast-2.amazonaws.com/silo-open-data/Official/annual/index.html
    to identify any new or updated files (see the file creation date), and then manually download the relevant file(s).
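Both manual steps can be scripted with curl. The sketch below assumes every variable follows the `<variable>/<year>.<variable>.nc` naming used in the monthly rainfall example; the function names are hypothetical. For step 2 it relies on curl's `-R` (stamp the local file with the server's modification time) and `-z FILE` (only download if the server copy is newer than FILE) options, so unchanged files are skipped instead of being identified by hand from the listing.

```shell
#!/bin/sh
# Minimal sketch of the manual approach using curl. Assumes all variables
# follow the <variable>/<year>.<variable>.nc pattern shown for monthly_rain.
BASE_URL="https://s3-ap-southeast-2.amazonaws.com/silo-open-data/Official/annual"

# Usage: silo_url VARIABLE YEAR  (prints the URL of one annual raster)
silo_url() {
    echo "$BASE_URL/$1/$2.$1.nc"
}

# Usage: fetch_if_newer VARIABLE YEAR [DIR]
# Downloads the file if it is missing locally, or re-downloads it only when
# the server copy is newer than the local one.
fetch_if_newer() {
    var="$1"; year="$2"; dir="${3:-.}"
    file="$dir/$year.$var.nc"
    url=$(silo_url "$var" "$year")
    if [ -f "$file" ]; then
        # -z FILE: conditional request based on the local file's timestamp.
        curl -f -R -z "$file" -o "$file" "$url"
    else
        # -R: give the new local file the server's modification time.
        curl -f -R -o "$file" "$url"
    fi
}

# Usage: fetch_range VARIABLE FIRST_YEAR LAST_YEAR [DIR]
# Step 1 (once) and step 2 (each update) both reduce to this loop.
fetch_range() {
    var="$1"; first="$2"; last="$3"; dir="${4:-.}"
    year="$first"
    while [ "$year" -le "$last" ]; do
        fetch_if_newer "$var" "$year" "$dir" || return 1
        year=$((year + 1))
    done
}
```

For example, `fetch_range monthly_rain 1989 1990 /data/silo` fetches the 1989 and 1990 monthly rainfall rasters on the first run and, on later runs, downloads only those whose server copy has changed.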

Please note that SILO data are constantly evolving, so you will need to decide how often to update your local copy of the data. SILO data typically change due to:

  • Nightly updates: each night SILO ingests new data which have been collected recently. This typically only impacts the most recent datasets (rainfall datasets for the preceding 12 months and other variables for the preceding 3-6 months).
  • Bulk updates: SILO periodically regenerates the entire dataset to incorporate new features or to take advantage of data improvements. This typically impacts the entire time period spanned by the affected variable(s).

You may also wish to consider your network bandwidth and transfer costs when deciding how often to update your local copy. The rasters are packed into annual files of around 410 MB each for daily variables and around 14 MB each for monthly rainfall.
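The file sizes above allow a rough estimate of transfer volumes before choosing an update schedule. The sketch below is illustrative only: the year range used in the example (1889 to 2020) is an assumption, and the nightly figure assumes a routine nightly update touches only the annual files covering the affected window, whereas a bulk update may require re-downloading every file for the affected variable(s).

```shell
#!/bin/sh
# Rough transfer-size estimates (in MB) from the approximate per-file sizes
# quoted above: about 410 MB per annual file for a daily variable and about
# 14 MB for monthly rainfall. These are estimates, not measured values.

# Usage: estimate_mb FIRST_YEAR LAST_YEAR MB_PER_FILE
# Initial mirror of one variable: one annual file per year, inclusive.
estimate_mb() {
    echo $(( ($2 - $1 + 1) * $3 ))
}

# Usage: nightly_refresh_mb MB_PER_FILE
# Upper bound for a routine nightly refresh of one variable: a window of
# 12 months or less spans at most two calendar years, so at most two
# annual files change.
nightly_refresh_mb() {
    echo $(( 2 * $1 ))
}
```

For example, `estimate_mb 1889 2020 410` gives 54120 (roughly 54 GB) for one daily variable, `estimate_mb 1889 2020 14` gives 1848 for monthly rainfall, and `nightly_refresh_mb 410` gives 820.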

Last updated: 7 July 2021