Protocol modules controlled vocabularies pipeline

TERN Surveillance is developing a new plot-based survey protocol for DAWE as part of the RLP program. The protocol uses many controlled lists of terms sourced from NVIS and the Australian Soil and Land Survey Field Handbook. These lists are stored as lookup tables in a PostgreSQL database.

To maximise data interoperability, the lookup tables used by the new survey protocol will be transformed into SKOS controlled vocabularies. Following Linked Data best practices, each controlled vocabulary will have a persistent, machine-resolvable IRI and basic Dublin Core metadata.
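As an illustration of what "an IRI plus basic Dublin Core metadata" could look like for one vocabulary, here is a minimal sketch that emits N-Triples using only the Python standard library. The base IRI, the naming rule that turns a lookup table name into an IRI, and the choice of dcterms:title and dcterms:created are assumptions made for this example; they are not taken from the project.

```python
import json
from datetime import date

# Hypothetical base IRI; the real persistent namespace is not stated here.
BASE_IRI = "https://example.org/vocab/"

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DCTERMS = "http://purl.org/dc/terms/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def scheme_iri(lookup_table: str) -> str:
    """Mint a scheme IRI from a lookup table name (the naming rule is an assumption)."""
    return BASE_IRI + lookup_table.strip().lower().replace("_", "-")


def scheme_triples(lookup_table: str, title: str, created: date) -> list[str]:
    """Describe one vocabulary as a skos:ConceptScheme, as N-Triples lines."""
    subject = f"<{scheme_iri(lookup_table)}>"
    # json.dumps yields a double-quoted, escaped string usable as an N-Triples literal.
    title_literal = json.dumps(title, ensure_ascii=False) + "@en"
    created_literal = f'"{created.isoformat()}"^^<{XSD_DATE}>'
    return [
        f"{subject} <{RDF_TYPE}> <{SKOS}ConceptScheme> .",
        f"{subject} <{DCTERMS}title> {title_literal} .",
        f"{subject} <{DCTERMS}created> {created_literal} .",
    ]
```

In practice an RDF library would normally handle serialisation; plain strings are used here only to keep the sketch dependency-free.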

This document explains the data pipeline between TERN Surveillance and TERN Data Services and Analytics (TDSA).

This work is part of the deliverables in Section 2 - Deliverables for phase 4 of the DAWE Data Standards Project.

Work scope

TERN Surveillance has informed TDSA that the deliverable has been broken into 19 separate modules, which will be worked on one at a time.

Milestone 1

Milestone 1 will focus on completing the Floristics module.

| Task | Description | Status |
| --- | --- | --- |
| Growth form controlled vocabulary | Set up a scheduled pipeline in Airflow to pull the growth form list from the TERN Surveillance REST API and transform it into a SKOS controlled vocabulary. API endpoint to process: https://dev.core-api.paratoo.tern.org.au/documentation#/Lut-veg-growth-form/get_lut_veg_growth_forms | ongoing |
| APNI species list - full-text search | Scope shared with work in SHaRED (data submission tool): make available a full-text search of flora species names from APNI, backed by Elasticsearch. | ongoing |

Milestone 2

Milestone 2 will focus on completing the Site Description module.

| Task | Description | Status |
| --- | --- | --- |

Implementation

A preliminary read-only development version of the web API has been provided by TERN Surveillance to access the PostgreSQL database containing the lookup tables.

Flow

TERN Surveillance provides a REST API for accessing the PostgreSQL database. On a schedule, TDSA uses Airflow to pull the lookup tables from TERN Surveillance and transform the values into SKOS controlled vocabularies. The transformed values are validated before being ingested into GraphDB and made available in the AusPlots vocabularies viewer.
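The flow described here (pull, transform, validate, ingest) can be sketched as plain Python functions. This is not the actual Airflow DAG: the function names are hypothetical, the response is assumed to be a JSON array of row objects, and the transform, validate and load steps are passed in as callables so the GraphDB and SKOS details stay out of the sketch. In Airflow, each step would typically become its own task.

```python
import json
from typing import Callable
from urllib.request import urlopen

# Endpoint named in this document; the JSON-array response shape is an assumption.
LANDFORM_URL = "https://dev.core-api.paratoo.tern.org.au/lut-landform-elements"


def fetch_lookup(url: str, timeout: float = 30.0) -> list[dict]:
    """Pull one lookup table from the REST API and decode it as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def run_pipeline(
    fetch: Callable[[], list[dict]],
    transform: Callable[[dict], list[str]],
    validate: Callable[[list[str]], bool],
    load: Callable[[list[str]], None],
) -> int:
    """Fetch rows, transform each into RDF statements, validate, then load.

    Nothing is loaded when validation fails, so a bad pull cannot
    replace a vocabulary that was published earlier.
    """
    statements = [s for row in fetch() for s in transform(row)]
    if not validate(statements):
        raise ValueError("validation failed; nothing was loaded")
    load(statements)
    return len(statements)
```

A scheduled run could then call `run_pipeline(lambda: fetch_lookup(LANDFORM_URL), ...)` with project-specific transform, validate and load functions.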

Controlled vocabulary shapes

Concept

The concept shape is generic; the API field examples are based on the landform elements endpoint: https://dev.core-api.paratoo.tern.org.au/lut-landform-elements

| Name | API field | Property | Required |
| --- | --- | --- | --- |
| preferred label | landform_element | skos:prefLabel | yes |
| code | code | skos:notation | yes |
| definition | description | skos:definition | yes |
| created | created_at | dcterms:created | yes |
| modified | updated_at | dcterms:modified | yes |
| identifier | id | dcterms:identifier | yes |
| source | | dcterms:source | yes |
| alternate label | abbreviation | skos:altLabel | optional |
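The concept shape in this section can be applied to a single API record roughly as follows. This is a sketch under assumptions: the concept IRI is supplied by the caller, the source value is passed in separately because no API field is listed for it, and every value is written as a plain literal (no datatypes or language tags), which the real pipeline may well handle differently.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DCTERMS = "http://purl.org/dc/terms/"

# API field -> (property IRI, required?), following the concept shape.
FIELD_MAP = {
    "landform_element": (SKOS + "prefLabel", True),
    "code": (SKOS + "notation", True),
    "description": (SKOS + "definition", True),
    "created_at": (DCTERMS + "created", True),
    "updated_at": (DCTERMS + "modified", True),
    "id": (DCTERMS + "identifier", True),
    "abbreviation": (SKOS + "altLabel", False),
}


def literal(value) -> str:
    """Render a value as a quoted, escaped plain literal."""
    return json.dumps(str(value), ensure_ascii=False)


def concept_triples(record: dict, concept_iri: str, source: str) -> list[str]:
    """Map one API record to N-Triples lines, rejecting incomplete records."""
    missing = [
        field
        for field, (_, required) in FIELD_MAP.items()
        if required and record.get(field) in (None, "")
    ]
    if not source:
        missing.append("source")
    if missing:
        raise ValueError(f"missing required values: {', '.join(missing)}")

    subject = f"<{concept_iri}>"
    triples = [f"{subject} <{RDF_TYPE}> <{SKOS}Concept> ."]
    for field, (prop, _) in FIELD_MAP.items():
        value = record.get(field)
        if value in (None, ""):
            continue  # only optional fields can be empty at this point
        triples.append(f"{subject} <{prop}> {literal(value)} .")
    triples.append(f"{subject} <{DCTERMS}source> {literal(source)} .")
    return triples
```

Raising on a missing required field, rather than silently skipping it, keeps incomplete concepts out of the validation and ingestion steps.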

Schemes

The new protocol’s controlled lists will be created as a standalone set of controlled vocabularies, separate from TDSA’s global set. This ensures that they function as intended within the scope of DAWE’s Data Standards Project. If interoperability is required between datasets that use the new protocol and datasets that use controlled vocabularies from TDSA’s global set, a linkset (mapping) may be used to assert the semantic relationships.
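A linkset of this kind is usually a set of SKOS mapping statements between concepts in the two vocabulary sets. The sketch below shows one way to generate such statements; the function name is hypothetical, and the use of the standard SKOS mapping properties is an assumption about how the linkset would be expressed.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# The five mapping properties defined by SKOS.
MAPPING_PROPERTIES = {
    "exactMatch",
    "closeMatch",
    "broadMatch",
    "narrowMatch",
    "relatedMatch",
}


def mapping_triple(protocol_concept: str, relation: str, global_concept: str) -> str:
    """Return one N-Triples line linking a protocol concept to a global-set concept."""
    if relation not in MAPPING_PROPERTIES:
        raise ValueError(f"not a SKOS mapping property: {relation}")
    return f"<{protocol_concept}> <{SKOS}{relation}> <{global_concept}> ."
```

Keeping these statements in a separate linkset leaves both vocabulary sets unchanged while still recording how their concepts relate.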

 
