Taxa Table
Source: CORVEG.TAXA
Example row:
LINNAEAN_ID | TAXON | FAMILY | GENUS | SPECIES | INFRA_RANK | INFRA | BC_NUMBER |
---|---|---|---|---|---|---|---|
4413 | Gynura drymophila var. drymophila | Asteraceae | Gynura | drymophila | var. | drymophila | 3525 |
Interpretation in SKOS
Create a skos:ConceptScheme for taxa skos:Concept in both CORVEG and the Australian Plant Census (APC).
Link the mapped concepts in CORVEG and the APC with skos:closeMatch.
Background
The Australian Plant Census (APC) is considered the authoritative source of data on vascular plants and mosses in Australia. It is provided by Biodiversity, which also delivers the Australian Plant Name Index (APNI) as a dataset that ties in with the APC.
https://biodiversity.org.au/nsl/services/export/index - APC and APNI for vascular plants
https://moss.biodiversity.org.au/nsl/services/export/index - APC and APNI for mosses
Outcome
The terms in APC and CORVEG have been modelled using SKOS and the Darwin Core terms, and a relationship has been created between the two skos:ConceptScheme using skos:closeMatch.
The result of the ETL process for Taxa is three new controlled vocabularies:
the set of terms from the CORVEG Taxa table
the set of terms name-matched from the CORVEG Taxa table to the APC terms
a set of taxon rank terms used in the APC and APNI
Further Information
Biodiversity has documented the mappings for APNI and APC terms at https://www.anbg.gov.au/ibis25/display/NSL/Data+Extracts+-+Darwin+Core. The mappings have been implemented in taxa.py for TERN's use, since Biodiversity's Linked Data APIs are currently down (as of March 2019).
The infra_rank column of CORVEG’s Taxa table stores the abbreviated form of a rank, whereas APNI uses the non-abbreviated form for its taxon rank value. A controlled vocabulary was therefore created to marry the two up. For example, CORVEG’s value var. and APNI's value Varietas denote the same term.
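The lookup that this vocabulary enables can be sketched in a few lines of Python. This is an illustration only, not the code in taxa.py: the dictionary holds a handful of the abbreviation and label pairs from the taxon rank vocabulary, and the function name expand_infra_rank is hypothetical.

```python
# A few skos:altLabel -> skos:prefLabel pairs copied from the taxon rank
# vocabulary in this document. The full vocabulary has more entries.
RANK_BY_ABBREVIATION = {
    "var.": "Varietas",
    "subsp.": "Subspecies",
    "f.": "Forma",
    "subvar.": "Subvarietas",
    "nothovar.": "Nothovarietas",
    "sp.": "Species",
}


def expand_infra_rank(infra_rank):
    """Return the APNI-style rank name for a CORVEG infra_rank value.

    Returns None when the value is empty or not in the table, so the
    caller can decide how to handle unknown ranks. (Hypothetical helper.)
    """
    if not infra_rank:
        return None
    return RANK_BY_ABBREVIATION.get(infra_rank.strip().lower())


print(expand_infra_rank("var."))
```

Returning None for unknown values, rather than raising, leaves it to the caller to decide whether an unmapped rank should be skipped or reported.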
RDF Example of the Taxon Rank Vocabulary
@prefix biod: <http://linked.data.gov.au/def/biodiversity/> .
@prefix bioreg: <http://linked.data.gov.au/def/bioregion/> .
@prefix corveg: <http://linked.data.gov.au/dataset/corveg/> .
@prefix corveg-cv: <http://linked.data.gov.au/def/corveg-cv/> .
@prefix corveg-def: <http://linked.data.gov.au/def/corveg/> .
@prefix corveg-dist: <http://linked.data.gov.au/dataset/corveg/disturbance/> .
@prefix corveg-geol: <http://linked.data.gov.au/dataset/corveg/geology/> .
@prefix corveg-location: <http://linked.data.gov.au/dataset/corveg/location/> .
@prefix corveg-rfstruct: <http://linked.data.gov.au/dataset/corveg/rf-structure/> .
@prefix corveg-site: <http://linked.data.gov.au/dataset/corveg/site/> .
@prefix corveg-site-strata: <http://linked.data.gov.au/dataset/corveg/site-strata/> .
@prefix corveg-site-tax-strata: <http://linked.data.gov.au/dataset/corveg/site-tax-strata/> .
@prefix corveg-situation: <http://linked.data.gov.au/dataset/corveg/situation/> .
@prefix corveg-soil: <http://linked.data.gov.au/dataset/corveg/soil/> .
@prefix corveg-soil-text: <http://registry.it.csiro.au/sandbox/soil-data-ie/def/voc/texture/> .
@prefix corveg-struct: <http://linked.data.gov.au/dataset/corveg/structure/> .
@prefix corveg-taxa: <http://linked.data.gov.au/dataset/corveg/taxa/> .
@prefix corveg-v-comm: <http://linked.data.gov.au/dataset/corveg/vege-community/> .
@prefix data: <http://linked.data.gov.au/def/datatype/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix dwct: <http://rs.tdwg.org/dwc/terms/> .
@prefix epsg-crs: <http://www.opengis.net/def/crs/epsg/0/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix geosparql: <http://www.opengis.net/ont/geosparql#> .
@prefix locn: <http://www.w3.org/ns/locn#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix plot: <http://linked.data.gov.au/def/plot/> .
@prefix plot-x: <http://linked.data.gov.au/def/plot/x/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sdo: <http://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix ssn-ext: <http://www.w3.org/ns/ssn/ext/> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix unit: <http://qudt.org/vocab/unit/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
biod:rank-classis a skos:Concept ;
skos:altLabel "cl." ;
skos:inScheme biod:taxon-rank ;
skos:notation 30 ;
skos:prefLabel "Classis" .
biod:rank-division a skos:Concept ;
skos:altLabel "div." ;
skos:inScheme biod:taxon-rank ;
skos:notation 20 ;
skos:prefLabel "Division" .
biod:rank-familia a skos:Concept ;
skos:altLabel "fam." ;
skos:inScheme biod:taxon-rank ;
skos:notation 80 ;
skos:prefLabel "Familia" .
biod:rank-forma a skos:Concept ;
dwct:specificEpithet "clypearia" ;
skos:altLabel "f." ;
skos:inScheme biod:taxon-rank ;
skos:notation 230 ;
skos:prefLabel "Forma" .
biod:rank-genus a skos:Concept ;
skos:altLabel "gen." ;
skos:inScheme biod:taxon-rank ;
skos:notation 120 ;
skos:prefLabel "Genus" .
biod:rank-infragenus a skos:Concept ;
skos:altLabel "infragenus" ;
skos:inScheme biod:taxon-rank ;
skos:notation 500 ;
skos:prefLabel "infragenus" .
biod:rank-infraspecies a skos:Concept ;
dwct:specificEpithet "calcitrapa" ;
skos:altLabel "infrasp." ;
skos:inScheme biod:taxon-rank ;
skos:notation 500 ;
skos:prefLabel "infraspecies" .
biod:rank-morphological-var a skos:Concept ;
dwct:specificEpithet "tasmanica" ;
skos:altLabel "morph." ;
skos:inScheme biod:taxon-rank ;
skos:notation 260 ;
skos:prefLabel "morphological var." .
biod:rank-n-a a skos:Concept ;
skos:altLabel "n/a" ;
skos:inScheme biod:taxon-rank ;
skos:notation 500 ;
skos:prefLabel "n/a" .
biod:rank-nothovarietas a skos:Concept ;
dwct:specificEpithet "bontei" ;
skos:altLabel "nothovar." ;
skos:inScheme biod:taxon-rank ;
skos:notation 210 ;
skos:prefLabel "Nothovarietas" .
biod:rank-ordo a skos:Concept ;
skos:altLabel "ordo" ;
skos:inScheme biod:taxon-rank ;
skos:notation 60 ;
skos:prefLabel "Ordo" .
biod:rank-regio a skos:Concept ;
skos:altLabel "regio" ;
skos:inScheme biod:taxon-rank ;
skos:notation 8 ;
skos:prefLabel "Regio" .
biod:rank-regnum a skos:Concept ;
skos:altLabel "reg." ;
skos:inScheme biod:taxon-rank ;
skos:notation 10 ;
skos:prefLabel "Regnum" .
biod:rank-sectio a skos:Concept ;
skos:altLabel "sect." ;
skos:inScheme biod:taxon-rank ;
skos:notation 140 ;
skos:prefLabel "Sectio" .
biod:rank-series a skos:Concept ;
skos:altLabel "ser." ;
skos:inScheme biod:taxon-rank ;
skos:notation 160 ;
skos:prefLabel "Series" .
biod:rank-species a skos:Concept ;
dwct:specificEpithet "aspera" ;
skos:altLabel "sp." ;
skos:inScheme biod:taxon-rank ;
skos:notation 190 ;
skos:prefLabel "Species" .
biod:rank-subclassis a skos:Concept ;
skos:altLabel "subcl." ;
skos:inScheme biod:taxon-rank ;
skos:notation 40 ;
skos:prefLabel "Subclassis" .
biod:rank-subfamilia a skos:Concept ;
skos:altLabel "subfam." ;
skos:inScheme biod:taxon-rank ;
skos:notation 90 ;
skos:prefLabel "Subfamilia" .
biod:rank-subforma a skos:Concept ;
dwct:specificEpithet "baccifera" ;
skos:altLabel "subf." ;
skos:inScheme biod:taxon-rank ;
skos:notation 240 ;
skos:prefLabel "Subforma" .
biod:rank-subgenus a skos:Concept ;
skos:altLabel "subg." ;
skos:inScheme biod:taxon-rank ;
skos:notation 130 ;
skos:prefLabel "Subgenus" .
biod:rank-subordo a skos:Concept ;
skos:altLabel "subordo" ;
skos:inScheme biod:taxon-rank ;
skos:notation 70 ;
skos:prefLabel "Subordo" .
biod:rank-subsectio a skos:Concept ;
skos:altLabel "subsect." ;
skos:inScheme biod:taxon-rank ;
skos:notation 150 ;
skos:prefLabel "Subsectio" .
biod:rank-subseries a skos:Concept ;
skos:altLabel "subser." ;
skos:inScheme biod:taxon-rank ;
skos:notation 170 ;
skos:prefLabel "Subseries" .
biod:rank-subspecies a skos:Concept ;
dwct:specificEpithet "manihot" ;
skos:altLabel "subsp." ;
skos:inScheme biod:taxon-rank ;
skos:notation 200 ;
skos:prefLabel "Subspecies" .
biod:rank-subtribus a skos:Concept ;
skos:altLabel "subtrib." ;
skos:inScheme biod:taxon-rank ;
skos:notation 110 ;
skos:prefLabel "Subtribus" .
biod:rank-subvarietas a skos:Concept ;
dwct:specificEpithet "longifolia" ;
skos:altLabel "subvar." ;
skos:inScheme biod:taxon-rank ;
skos:notation 220 ;
skos:prefLabel "Subvarietas" .
biod:rank-superordo a skos:Concept ;
skos:altLabel "superordo" ;
skos:inScheme biod:taxon-rank ;
skos:notation 50 ;
skos:prefLabel "Superordo" .
biod:rank-superspecies a skos:Concept ;
skos:altLabel "supersp." ;
skos:inScheme biod:taxon-rank ;
skos:notation 180 ;
skos:prefLabel "Superspecies" .
biod:rank-tribus a skos:Concept ;
skos:altLabel "trib." ;
skos:inScheme biod:taxon-rank ;
skos:notation 100 ;
skos:prefLabel "Tribus" .
biod:rank-unranked a skos:Concept ;
dwct:specificEpithet "matrella" ;
skos:altLabel "unranked" ;
skos:inScheme biod:taxon-rank ;
skos:notation 500 ;
skos:prefLabel "unranked" .
biod:rank-varietas a skos:Concept ;
dwct:specificEpithet "rupestris" ;
skos:altLabel "var." ;
skos:inScheme biod:taxon-rank ;
skos:notation 210 ;
skos:prefLabel "Varietas" .
biod:taxon-rank a skos:ConceptScheme ;
dct:description "A vocabulary of taxonomic rank terms in the Australian Plant Name Index (APNI)." ;
dct:isPartOf <http://linked.data.gov.au/dataset/apni> ;
skos:hasTopConcept biod:rank-classis,
biod:rank-division,
biod:rank-familia,
biod:rank-forma,
biod:rank-genus,
biod:rank-infragenus,
biod:rank-infraspecies,
biod:rank-morphological-var,
biod:rank-n-a,
biod:rank-nothovarietas,
biod:rank-ordo,
biod:rank-regio,
biod:rank-regnum,
biod:rank-sectio,
biod:rank-series,
biod:rank-species,
biod:rank-subclassis,
biod:rank-subfamilia,
biod:rank-subforma,
biod:rank-subgenus,
biod:rank-subordo,
biod:rank-subsectio,
biod:rank-subseries,
biod:rank-subspecies,
biod:rank-subtribus,
biod:rank-subvarietas,
biod:rank-superordo,
biod:rank-superspecies,
biod:rank-tribus,
biod:rank-unranked,
biod:rank-varietas ;
skos:prefLabel "Taxon Rank Concepts" .
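A vocabulary like the one above could be generated from the distinct rank strings in the APNI export. The following Python is a minimal sketch under stated assumptions, not the code in taxa.py: the function names, the sample input and the slug rule are hypothetical, and only skos:prefLabel is emitted because the source of the abbreviations is not described here.

```python
import re


def clean_rank(raw):
    """Strip whitespace and redundant square brackets, e.g. '[n/a]' -> 'n/a'."""
    return raw.strip().strip("[]").strip()


def slugify(label):
    """Lowercase the label and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def rank_concepts_to_turtle(raw_ranks):
    """Emit one skos:Concept per unique, cleaned rank string."""
    unique = sorted({clean_rank(r) for r in raw_ranks if r and r.strip()})
    blocks = []
    for label in unique:
        blocks.append(
            f"biod:rank-{slugify(label)} a skos:Concept ;\n"
            f"    skos:inScheme biod:taxon-rank ;\n"
            f'    skos:prefLabel "{label}" .'
        )
    return "\n\n".join(blocks)


# Hypothetical sample of raw rank values, including a duplicate.
sample = ["Varietas", "[unranked]", "[n/a]", "Varietas"]
print(rank_concepts_to_turtle(sample))
```

The slug rule was chosen so that n/a becomes rank-n-a and morphological var. becomes rank-morphological-var, which agrees with the identifiers shown above.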
RDF Example
An example of the output RDF in Turtle for a single taxon, Gynura drymophila var. drymophila, mapped to APC's Gynura drymophila (F.Muell.) F.G.Davies var. drymophila. The skos:closeMatch mapping is made by matching the CORVEG taxon name to the canonicalName in APC.
@prefix biod: <http://linked.data.gov.au/def/biodiversity/> .
@prefix bioreg: <http://linked.data.gov.au/def/bioregion/> .
@prefix corveg: <http://linked.data.gov.au/dataset/corveg/> .
@prefix corveg-cv: <http://linked.data.gov.au/def/corveg-cv/> .
@prefix corveg-def: <http://linked.data.gov.au/def/corveg/> .
@prefix corveg-dist: <http://linked.data.gov.au/dataset/corveg/disturbance/> .
@prefix corveg-geol: <http://linked.data.gov.au/dataset/corveg/geology/> .
@prefix corveg-location: <http://linked.data.gov.au/dataset/corveg/location/> .
@prefix corveg-rfstruct: <http://linked.data.gov.au/dataset/corveg/rf-structure/> .
@prefix corveg-site: <http://linked.data.gov.au/dataset/corveg/site/> .
@prefix corveg-site-strata: <http://linked.data.gov.au/dataset/corveg/site-strata/> .
@prefix corveg-site-tax-strata: <http://linked.data.gov.au/dataset/corveg/site-tax-strata/> .
@prefix corveg-situation: <http://linked.data.gov.au/dataset/corveg/situation/> .
@prefix corveg-soil: <http://linked.data.gov.au/dataset/corveg/soil/> .
@prefix corveg-soil-text: <http://registry.it.csiro.au/sandbox/soil-data-ie/def/voc/texture/> .
@prefix corveg-struct: <http://linked.data.gov.au/dataset/corveg/structure/> .
@prefix corveg-taxa: <http://linked.data.gov.au/dataset/corveg/taxa/> .
@prefix corveg-v-comm: <http://linked.data.gov.au/dataset/corveg/vege-community/> .
@prefix data: <http://linked.data.gov.au/def/datatype/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix dwct: <http://rs.tdwg.org/dwc/terms/> .
@prefix epsg-crs: <http://www.opengis.net/def/crs/epsg/0/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix geosparql: <http://www.opengis.net/ont/geosparql#> .
@prefix locn: <http://www.w3.org/ns/locn#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix plot: <http://linked.data.gov.au/def/plot/> .
@prefix plot-x: <http://linked.data.gov.au/def/plot/x/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sdo: <http://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix ssn-ext: <http://www.w3.org/ns/ssn/ext/> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix unit: <http://qudt.org/vocab/unit/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
biod:apc-taxa a skos:ConceptScheme ;
dct:description "A vocabulary of taxon terms from the Australian Plant Census (APC) dataset. This is a combined collection of Biodiversity's Vascular Plants and AusMoss APC taxonomic data." ;
dct:isPartOf <http://linked.data.gov.au/dataset/apc> ;
skos:hasTopConcept biod:gynura-drymophila-var-drymophila ;
skos:prefLabel "Australian Plant Census Taxa" .
corveg-cv:corveg-taxa a skos:ConceptScheme ;
dct:description "A vocabulary of taxon terms from the CORVEG dataset." ;
dct:isPartOf corveg: ;
skos:hasTopConcept corveg-cv:gynura-drymophila-var-drymophila ;
skos:prefLabel "Corveg Taxa" .
corveg-cv:gynura-drymophila-var-drymophila a dwct:Taxon,
skos:Concept ;
dwct:family "Asteraceae" ;
dwct:genus "Gynura" ;
dwct:species "drymophila" ;
dwct:specificEpithet "drymophila" ;
skos:closeMatch biod:gynura-drymophila-var-drymophila ;
skos:inScheme corveg-cv:corveg-taxa ;
skos:notation 4413 ;
skos:prefLabel "Gynura drymophila var. drymophila" .
biod:gynura-drymophila-var-drymophila a dwct:Taxon,
skos:Concept ;
dct:created "1997-07-16 13:01:13+10"^^xsd:dateTime ;
dct:modified "1997-07-16 13:01:13+10"^^xsd:dateTime ;
dwct:acceptedNameUsage "Gynura drymophila (F.Muell.) F.G.Davies var. drymophila" ;
dwct:acceptedNameUsageID <https://id.biodiversity.org.au/tree/51305933/51257233> ;
dwct:class "Equisetopsida" ;
dwct:family "Asteraceae" ;
dwct:higherClassification "Plantae|Charophyta|Equisetopsida|Magnoliidae|Asteranae|Asterales|Asteraceae|Gynura|drymophila|drymophila" ;
dwct:kingdom "Plantae" ;
dwct:nameAccordingTo "CHAH (2011), Australian Plant Census" ;
dwct:nameAccordingToID <https://id.biodiversity.org.au/reference/apni/49840> ;
dwct:nomenclaturalCode "ICN" ;
dwct:parentNameUsageID <https://id.biodiversity.org.au/tree/51305933/51245515> ;
dwct:scientificName "Gynura drymophila (F.Muell.) F.G.Davies var. drymophila" ;
dwct:scientificNameAuthorship "" ;
dwct:scientificNameID <https://id.biodiversity.org.au/name/apni/104369> ;
dwct:taxonConceptID <https://id.biodiversity.org.au/instance/apni/741250> ;
dwct:taxonID <https://id.biodiversity.org.au/tree/51305933/51257233> ;
dwct:taxonRank "Varietas" ;
dwct:taxonomicStatus "accepted" ;
skos:altLabel "Gynura drymophila var. drymophila" ;
skos:inScheme biod:apc-taxa ;
skos:prefLabel "Gynura drymophila (F.Muell.) F.G.Davies var. drymophila" .
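The skos:closeMatch statement in the example above links two identifiers that are both derived from a taxon name. A minimal Python sketch of that derivation follows. It assumes the identifiers are formed by lowercasing the name and replacing other characters with hyphens, which agrees with the example but is not confirmed to be what taxa.py does; both function names are hypothetical.

```python
import re


def slugify(name):
    """Turn a taxon name into the local part of an identifier (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def close_match_triple(corveg_taxon, apc_canonical_name):
    """Build the Turtle statement linking a CORVEG concept to its APC concept."""
    subject = f"corveg-cv:{slugify(corveg_taxon)}"
    obj = f"biod:{slugify(apc_canonical_name)}"
    return f"{subject} skos:closeMatch {obj} ."


print(
    close_match_triple(
        "Gynura drymophila var. drymophila",
        "Gynura drymophila var. drymophila",
    )
)
```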
Algorithm
1. Create the taxon rank controlled vocabulary (the taxa controlled vocabulary depends on it)
   - Load the APNI CSV
   - Gather the unique taxon ranks
   - Clean up some of the string literals (redundant square brackets)
   - Map to SKOS RDF
2. Create the taxa controlled vocabulary, forming two skos:ConceptScheme, one with concepts from CORVEG, the other from APC
   - Create file-based indexes for the APC CSV using the Whoosh library
   - Load the indexes
     - Note: once indexes have been created, new CSVs won't be indexed unless the force_taxa_index attribute in the YAML file for the Taxa table is true.
     - The indexes make searching for taxon terms fast; without them, matching would take O(n²) time using two nested loops.
   - For each taxon term in CORVEG, create it as a skos:Concept and match its infra_rank to the skos:altLabel of the taxon rank controlled vocabulary
     - Map it to terms in SKOS, DCT, and Darwin Core Terms
     - Do a skos:closeMatch to the same term in APC, name-matching on its canonicalName (with the slight fuzzy-matching capabilities of the Whoosh library)
     - Ensure that the taxonRank column in APC is also matched to the taxon rank controlled vocabulary, name-matching on the skos:prefLabel
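The index-then-match step can be sketched with the Python standard library. This is a stand-in, not the Whoosh-based code in taxa.py: a dictionary plays the role of the index and difflib supplies a rough fuzzy fallback. The function names, the cutoff value and the sample rows are assumptions; only the canonicalName column comes from the description above.

```python
import difflib


def build_index(apc_rows):
    """Build the lookup once: normalised canonicalName -> APC row."""
    return {row["canonicalName"].strip().lower(): row for row in apc_rows}


def match_taxon(name, index, cutoff=0.9):
    """Return the APC row matching a CORVEG taxon name, or None."""
    key = name.strip().lower()
    if key in index:  # exact hit: a single dictionary lookup
        return index[key]
    # Fuzzy fallback for small spelling differences. Unlike a real
    # full-text index, difflib scans every key, so this path is slow.
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None


def match_all(corveg_names, index):
    """Split CORVEG names into matched pairs and names not found."""
    matched, not_found = {}, []
    for name in corveg_names:
        row = match_taxon(name, index)
        if row is None:
            not_found.append(name)
        else:
            matched[name] = row["canonicalName"]
    return matched, not_found


# Hypothetical sample rows standing in for the APC CSV.
apc_rows = [
    {"canonicalName": "Gynura drymophila var. drymophila"},
    {"canonicalName": "Acacia aneura"},
]
index = build_index(apc_rows)
matched, not_found = match_all(
    ["Gynura drymophila var. drymophila", "Unknown taxon"], index
)
print(matched)
print(not_found)
```

Building the lookup once and probing it per CORVEG term avoids comparing every CORVEG name with every APC name. The not_found list corresponds to the kind of output that would be written to a file of unmatched terms.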
Comments
Many of the column values in APC and APNI can and should be turned into controlled vocabularies. Currently only the taxon rank values in APNI have been (for interoperability between CORVEG taxon rank values and APNI taxon rank values). Creating controlled vocabularies for the other columns would be a good future enhancement; for now, their values are plain string literals.
Currently 2,225 out of 103,108 CORVEG taxon terms are not mapped to external sources (like APC). It may be possible to map these 2,225 terms to other Australian government agencies' datasets in the future, but for now having most of the taxon terms mapped to APC is good enough. One way to find out which sources hold data on these terms is to search for the missing terms on the Atlas of Living Australia (ALA) and check where their source information comes from. taxa_terms_not_found.txt lists the taxon terms that were not name-matched to external sources.
The Python library Whoosh was used to create binary (pickle) file-based indexes so that full text search queries were made possible on the large APC and APNI CSV files.
ALA's sources are not updated often enough (some are two years out of date), and ALA mixes and matches data between different sources, which is controversial. We are therefore not using ALA as an external source for extra taxonomy information.
Visual Diagram of a Single Taxon
A visual diagram showcasing the above RDF Turtle snippet.