Protocol modules controlled vocabularies pipeline
TERN Surveillance is developing a new plot-based survey protocol for DAWE as part of the RLP program. The protocol uses many controlled lists of terms sourced from NVIS and the Australian Soil and Land Survey Field Handbook. These lists are stored as lookup tables in a PostgreSQL database.
To maximise data interoperability, the lookup tables used by the new survey protocol will be transformed into SKOS controlled vocabularies. Following Linked Data best practices, each controlled vocabulary will have a persistent, machine-resolvable IRI and basic Dublin Core metadata.
This document explains the data pipeline between TERN Surveillance and TERN Data Services and Analytics (TDSA).
This work is part of the deliverables in Section 2 - Deliverables for phase 4 of the DAWE Data Standards Project.
Work scope
TERN Surveillance has informed TDSA that the deliverable has been broken into 19 separate modules, which will be worked on one at a time.
Milestone 1
Milestone 1 will focus on completing the Floristics module.
Task | Description | Status |
---|---|---|
Growth form controlled vocabulary | Set up a scheduled pipeline in Airflow to pull from the TERN Surveillance REST API the growth form list and transform it into a SKOS controlled vocabulary. API endpoint to process: https://dev.core-api.paratoo.tern.org.au/documentation#/Lut-veg-growth-form/get_lut_veg_growth_forms | ongoing |
APNI species list - full-text search | Shared scope with work in SHaRED (data submission tool), making available a full-text search of flora species names from APNI, backed by Elasticsearch. | ongoing |
Milestone 2
Milestone 2 will focus on completing the Site Description module.
Task | Description | Status |
---|---|---|
Implementation
TERN Surveillance has provided a preliminary, read-only development version of its web API, which gives access to the PostgreSQL database containing the lookup tables.
Flow
TERN Surveillance provides a REST API for accessing the PostgreSQL database. On a schedule, TDSA uses Airflow to pull the lookup tables from TERN Surveillance and transform the values into SKOS controlled vocabularies. The transformed values are validated before being ingested into GraphDB, after which they are available in the AusPlots vocabularies viewer.
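The flow above can be sketched as four steps chained together, with validation gating ingestion. This is a minimal illustration only, not the actual pipeline: the function names, the example IRI, and the sample records are all hypothetical, and the stand-in functions replace the real REST API call and GraphDB load so that the sketch runs offline. In the real pipeline, each step would presumably be an Airflow task.

```python
from typing import Callable, Iterable

Record = dict
Triple = tuple[str, str, str]


def run_pipeline(
    fetch: Callable[[], Iterable[Record]],
    transform: Callable[[Record], Iterable[Triple]],
    validate: Callable[[list[Triple]], list[str]],
    ingest: Callable[[list[Triple]], None],
) -> int:
    """Pull, transform, validate, then ingest. Returns the number of triples ingested."""
    triples = [t for record in fetch() for t in transform(record)]
    errors = validate(triples)
    if errors:
        # Nothing is ingested when validation fails.
        raise ValueError("validation failed: " + "; ".join(errors))
    ingest(triples)
    return len(triples)


# Hypothetical stand-ins (made-up data) so the sketch runs without network access.
def fake_fetch() -> list[Record]:
    return [{"id": 1, "label": "Tree"}, {"id": 2, "label": "Shrub"}]


def to_triples(record: Record) -> list[Triple]:
    subject = f"https://example.org/growth-form/{record['id']}"  # illustrative IRI
    return [
        (subject, "rdf:type", "skos:Concept"),
        (subject, "skos:prefLabel", record["label"]),
    ]


def check_pref_labels(triples: list[Triple]) -> list[str]:
    """Return one error message per subject that lacks a skos:prefLabel."""
    subjects = {s for s, _, _ in triples}
    labelled = {s for s, p, _ in triples if p == "skos:prefLabel"}
    return [f"{s} has no skos:prefLabel" for s in sorted(subjects - labelled)]


store: list[Triple] = []  # stands in for the triple store
count = run_pipeline(fake_fetch, to_triples, check_pref_labels, store.extend)
```

Passing the steps in as callables keeps the orchestration independent of any particular HTTP client or triple store, which also makes each step easy to test in isolation.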
Controlled vocabulary shapes
Concept
This is a generic concept shape. The API field examples are based on the landform elements endpoint (https://dev.core-api.paratoo.tern.org.au/lut-landform-elements).
Name | API field | Property | Required |
---|---|---|---|
preferred label | landform_element | skos:prefLabel | yes |
code | code | skos:notation | yes |
definition | description | skos:definition | yes |
created | created_at | dcterms:created | yes |
modified | updated_at | dcterms:modified | yes |
identifier | id | dcterms:identifier | yes |
source | | dcterms:source | yes |
alternate label | abbreviation | skos:altLabel | optional |
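As a rough illustration of how the shape above could be applied, the following sketch maps one API record to a list of (subject, property, value) triples and rejects records that lack a required field. The field-to-property mapping comes from the table. The base IRI, the function name, and the choice to pass the source in as a parameter (the table lists no API field for it) are assumptions made for the example.

```python
# (API field, property, required) as listed in the concept shape table.
FIELD_MAP = [
    ("landform_element", "skos:prefLabel", True),
    ("code", "skos:notation", True),
    ("description", "skos:definition", True),
    ("created_at", "dcterms:created", True),
    ("updated_at", "dcterms:modified", True),
    ("id", "dcterms:identifier", True),
    ("abbreviation", "skos:altLabel", False),
]

# Hypothetical base IRI, for illustration only.
BASE_IRI = "https://example.org/vocab/landform-elements/"


def record_to_triples(record: dict, source: str) -> list[tuple[str, str, str]]:
    """Map one API record to triples, enforcing the required fields."""
    missing = [
        field
        for field, _, required in FIELD_MAP
        if required and record.get(field) in (None, "")
    ]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")

    subject = f"{BASE_IRI}{record['id']}"
    triples = [
        (subject, "rdf:type", "skos:Concept"),
        # Assumption: the source is supplied by the pipeline, not by the API record.
        (subject, "dcterms:source", source),
    ]
    for field, prop, _ in FIELD_MAP:
        value = record.get(field)
        if value not in (None, ""):
            triples.append((subject, prop, str(value)))
    return triples
```

Optional fields such as `abbreviation` are simply skipped when absent, so a record without an alternate label still produces a valid concept.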
Schemes
The new protocol’s controlled lists will be created as a standalone set of controlled vocabularies, separate from TDSA’s global set. This is to ensure that they function as intended within the scope of DAWE’s Data Standards Project. If interoperability is required between datasets using the new protocol and controlled vocabularies from TDSA’s global set, then a linkset (mapping) may be used to assert the semantic relationships.
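To make the linkset idea concrete, the sketch below serialises a handful of mappings as Turtle using the standard SKOS mapping properties. It is an illustration under assumptions: the function name and all IRIs are hypothetical, and the document does not specify which mapping properties or serialisation a real linkset would use.

```python
# The five SKOS mapping properties defined by the SKOS Reference.
MAPPING_PROPERTIES = {
    "skos:exactMatch",
    "skos:closeMatch",
    "skos:broadMatch",
    "skos:narrowMatch",
    "skos:relatedMatch",
}


def linkset_to_turtle(mappings: list[tuple[str, str, str]]) -> str:
    """Serialise (protocol concept IRI, mapping property, global concept IRI) as Turtle."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for subject, prop, obj in mappings:
        if prop not in MAPPING_PROPERTIES:
            raise ValueError(f"not a SKOS mapping property: {prop}")
        lines.append(f"<{subject}> {prop} <{obj}> .")
    return "\n".join(lines)


# Hypothetical IRIs, for illustration only.
example = linkset_to_turtle(
    [
        (
            "https://example.org/protocol/growth-form/tree",
            "skos:exactMatch",
            "https://example.org/global/growth-form/tree",
        )
    ]
)
```

Keeping the mappings in a separate linkset leaves both vocabulary sets unchanged, which matches the intent of maintaining the protocol vocabularies as a standalone set.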
We at TERN acknowledge the Traditional Owners and Custodians throughout Australia, New Zealand and all nations. We honour their profound connections to land, water, biodiversity and culture and pay our respects to their Elders past, present and emerging.
TERN is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy, NCRIS.