Taxa Table

Source: CORVEG.TAXA

Example row:

LINNAEAN_ID | TAXON | FAMILY | GENUS | SPECIES | INFRA_RANK | INFRA | BC_NUMBER
4413 | Gynura drymophila var. drymophila | Asteraceae | Gynura | drymophila | var. | drymophila | 3525

Interpretation in SKOS

  1. Create a skos:ConceptScheme of taxon skos:Concepts for each of CORVEG and the Australian Plant Census (APC).

  2. Link the mapped concepts in CORVEG and the APC with skos:closeMatch.

Background

The Australian Plant Census (APC), provided by Biodiversity, is considered the authoritative source of data on vascular plants and mosses in Australia. Biodiversity also delivers the Australian Plant Name Index (APNI) as a companion dataset to the APC.

Outcome

The terms in APC and CORVEG have been modelled using SKOS and Darwin Core terms, and the two skos:ConceptSchemes have been linked using skos:closeMatch.

The result of the ETL process for Taxa is three new controlled vocabularies:

  • the set of terms from the CORVEG Taxa table

  • the set of terms name-matched from the CORVEG Taxa table to the APC terms

  • a set of taxon rank terms used in the APC and APNI

Further Information

The mappings for APNI and APC terms have been documented by Biodiversity at https://www.anbg.gov.au/ibis25/display/NSL/Data+Extracts+-+Darwin+Core. The mappings have been created in taxa.py for TERN's use because Biodiversity's Linked Data APIs are currently down (as of March 2019).

Because the infra_rank column of CORVEG's Taxa table stores the abbreviated form of the rank, a controlled vocabulary had to be created to align its values with APNI, which uses the non-abbreviated form for its taxon rank value. For example, CORVEG's value is var. while APNI's value is Varietas, yet both denote the same term.
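As a rough illustration of that alignment, the lookup from an abbreviated CORVEG infra_rank value to the full APNI rank name could look like the Python sketch below. The function names and the dictionary are hypothetical and are not taken from taxa.py; the label pairs are copied from the taxon rank vocabulary shown on this page, and the way the concept name is derived from the label is an assumption that only holds for single-word ranks such as these.

```python
# Abbreviation (skos:altLabel) -> full rank name (skos:prefLabel).
# Only a handful of entries are shown; the pairs are copied from the
# taxon rank vocabulary on this page. The names below are hypothetical.
RANK_BY_ABBREVIATION = {
    "var.": "Varietas",
    "subsp.": "Subspecies",
    "f.": "Forma",
    "subvar.": "Subvarietas",
    "nothovar.": "Nothovarietas",
}


def resolve_infra_rank(infra_rank):
    """Return the APNI rank name for a CORVEG infra_rank value, or None."""
    if infra_rank is None:
        return None
    return RANK_BY_ABBREVIATION.get(infra_rank.strip())


def rank_concept_name(rank_name):
    """Build the prefixed concept name (assumed pattern: biod:rank-<lowercase label>)."""
    return "biod:rank-" + rank_name.lower()


if __name__ == "__main__":
    rank = resolve_infra_rank("var.")
    print(rank, rank_concept_name(rank))
```

Values that are missing from the dictionary return None, so a caller can decide whether to skip the rank or record it for review.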

RDF Example of the Taxon Rank Vocabulary

@prefix biod: <http://linked.data.gov.au/def/biodiversity/> .
@prefix bioreg: <http://linked.data.gov.au/def/bioregion/> .
@prefix corveg: <http://linked.data.gov.au/dataset/corveg/> .
@prefix corveg-cv: <http://linked.data.gov.au/def/corveg-cv/> .
@prefix corveg-def: <http://linked.data.gov.au/def/corveg/> .
@prefix corveg-dist: <http://linked.data.gov.au/dataset/corveg/disturbance/> .
@prefix corveg-geol: <http://linked.data.gov.au/dataset/corveg/geology/> .
@prefix corveg-location: <http://linked.data.gov.au/dataset/corveg/location/> .
@prefix corveg-rfstruct: <http://linked.data.gov.au/dataset/corveg/rf-structure/> .
@prefix corveg-site: <http://linked.data.gov.au/dataset/corveg/site/> .
@prefix corveg-site-strata: <http://linked.data.gov.au/dataset/corveg/site-strata/> .
@prefix corveg-site-tax-strata: <http://linked.data.gov.au/dataset/corveg/site-tax-strata/> .
@prefix corveg-situation: <http://linked.data.gov.au/dataset/corveg/situation/> .
@prefix corveg-soil: <http://linked.data.gov.au/dataset/corveg/soil/> .
@prefix corveg-soil-text: <http://registry.it.csiro.au/sandbox/soil-data-ie/def/voc/texture/> .
@prefix corveg-struct: <http://linked.data.gov.au/dataset/corveg/structure/> .
@prefix corveg-taxa: <http://linked.data.gov.au/dataset/corveg/taxa/> .
@prefix corveg-v-comm: <http://linked.data.gov.au/dataset/corveg/vege-community/> .
@prefix data: <http://linked.data.gov.au/def/datatype/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix dwct: <http://rs.tdwg.org/dwc/terms/> .
@prefix epsg-crs: <http://www.opengis.net/def/crs/epsg/0/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix geosparql: <http://www.opengis.net/ont/geosparql#> .
@prefix locn: <http://www.w3.org/ns/locn#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix plot: <http://linked.data.gov.au/def/plot/> .
@prefix plot-x: <http://linked.data.gov.au/def/plot/x/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sdo: <http://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix ssn-ext: <http://www.w3.org/ns/ssn/ext/> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix unit: <http://qudt.org/vocab/unit/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

biod:rank-classis a skos:Concept ;
    skos:altLabel "cl." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 30 ;
    skos:prefLabel "Classis" .

biod:rank-division a skos:Concept ;
    skos:altLabel "div." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 20 ;
    skos:prefLabel "Division" .

biod:rank-familia a skos:Concept ;
    skos:altLabel "fam." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 80 ;
    skos:prefLabel "Familia" .

biod:rank-forma a skos:Concept ;
    dwct:specificEpithet "clypearia" ;
    skos:altLabel "f." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 230 ;
    skos:prefLabel "Forma" .

biod:rank-genus a skos:Concept ;
    skos:altLabel "gen." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 120 ;
    skos:prefLabel "Genus" .

biod:rank-infragenus a skos:Concept ;
    skos:altLabel "infragenus" ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 500 ;
    skos:prefLabel "infragenus" .

biod:rank-infraspecies a skos:Concept ;
    dwct:specificEpithet "calcitrapa" ;
    skos:altLabel "infrasp." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 500 ;
    skos:prefLabel "infraspecies" .

biod:rank-morphological-var a skos:Concept ;
    dwct:specificEpithet "tasmanica" ;
    skos:altLabel "morph." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 260 ;
    skos:prefLabel "morphological var." .

biod:rank-n-a a skos:Concept ;
    skos:altLabel "n/a" ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 500 ;
    skos:prefLabel "n/a" .

biod:rank-nothovarietas a skos:Concept ;
    dwct:specificEpithet "bontei" ;
    skos:altLabel "nothovar." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 210 ;
    skos:prefLabel "Nothovarietas" .

biod:rank-ordo a skos:Concept ;
    skos:altLabel "ordo" ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 60 ;
    skos:prefLabel "Ordo" .

biod:rank-regio a skos:Concept ;
    skos:altLabel "regio" ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 8 ;
    skos:prefLabel "Regio" .

biod:rank-regnum a skos:Concept ;
    skos:altLabel "reg." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 10 ;
    skos:prefLabel "Regnum" .

biod:rank-sectio a skos:Concept ;
    skos:altLabel "sect." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 140 ;
    skos:prefLabel "Sectio" .

biod:rank-series a skos:Concept ;
    skos:altLabel "ser." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 160 ;
    skos:prefLabel "Series" .

biod:rank-species a skos:Concept ;
    dwct:specificEpithet "aspera" ;
    skos:altLabel "sp." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 190 ;
    skos:prefLabel "Species" .

biod:rank-subclassis a skos:Concept ;
    skos:altLabel "subcl." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 40 ;
    skos:prefLabel "Subclassis" .

biod:rank-subfamilia a skos:Concept ;
    skos:altLabel "subfam." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 90 ;
    skos:prefLabel "Subfamilia" .

biod:rank-subforma a skos:Concept ;
    dwct:specificEpithet "baccifera" ;
    skos:altLabel "subf." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 240 ;
    skos:prefLabel "Subforma" .

biod:rank-subgenus a skos:Concept ;
    skos:altLabel "subg." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 130 ;
    skos:prefLabel "Subgenus" .

biod:rank-subordo a skos:Concept ;
    skos:altLabel "subordo" ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 70 ;
    skos:prefLabel "Subordo" .

biod:rank-subsectio a skos:Concept ;
    skos:altLabel "subsect." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 150 ;
    skos:prefLabel "Subsectio" .

biod:rank-subseries a skos:Concept ;
    skos:altLabel "subser." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 170 ;
    skos:prefLabel "Subseries" .

biod:rank-subspecies a skos:Concept ;
    dwct:specificEpithet "manihot" ;
    skos:altLabel "subsp." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 200 ;
    skos:prefLabel "Subspecies" .

biod:rank-subtribus a skos:Concept ;
    skos:altLabel "subtrib." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 110 ;
    skos:prefLabel "Subtribus" .

biod:rank-subvarietas a skos:Concept ;
    dwct:specificEpithet "longifolia" ;
    skos:altLabel "subvar." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 220 ;
    skos:prefLabel "Subvarietas" .

biod:rank-superordo a skos:Concept ;
    skos:altLabel "superordo" ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 50 ;
    skos:prefLabel "Superordo" .

biod:rank-superspecies a skos:Concept ;
    skos:altLabel "supersp." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 180 ;
    skos:prefLabel "Superspecies" .

biod:rank-tribus a skos:Concept ;
    skos:altLabel "trib." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 100 ;
    skos:prefLabel "Tribus" .

biod:rank-unranked a skos:Concept ;
    dwct:specificEpithet "matrella" ;
    skos:altLabel "unranked" ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 500 ;
    skos:prefLabel "unranked" .

biod:rank-varietas a skos:Concept ;
    dwct:specificEpithet "rupestris" ;
    skos:altLabel "var." ;
    skos:inScheme biod:taxon-rank ;
    skos:notation 210 ;
    skos:prefLabel "Varietas" .

biod:taxon-rank a skos:ConceptScheme ;
    dct:description "A vocabulary of taxonomic rank terms in the Australian Plant Name Index (APNI)." ;
    dct:isPartOf <http://linked.data.gov.au/dataset/apni> ;
    skos:hasTopConcept biod:rank-classis,
        biod:rank-division,
        biod:rank-familia,
        biod:rank-forma,
        biod:rank-genus,
        biod:rank-infragenus,
        biod:rank-infraspecies,
        biod:rank-morphological-var,
        biod:rank-n-a,
        biod:rank-nothovarietas,
        biod:rank-ordo,
        biod:rank-regio,
        biod:rank-regnum,
        biod:rank-sectio,
        biod:rank-series,
        biod:rank-species,
        biod:rank-subclassis,
        biod:rank-subfamilia,
        biod:rank-subforma,
        biod:rank-subgenus,
        biod:rank-subordo,
        biod:rank-subsectio,
        biod:rank-subseries,
        biod:rank-subspecies,
        biod:rank-subtribus,
        biod:rank-subvarietas,
        biod:rank-superordo,
        biod:rank-superspecies,
        biod:rank-tribus,
        biod:rank-unranked,
        biod:rank-varietas ;
    skos:prefLabel "Taxon Rank Concepts" .

RDF Example

An example of what the output RDF in Turtle looks like for a single taxon, Gynura drymophila var. drymophila, mapped to APC's Gynura drymophila (F.Muell.) F.G.Davies var. drymophila. The skos:closeMatch mapping is made by matching CORVEG's taxon name to the canonicalName in APC.

@prefix biod: <http://linked.data.gov.au/def/biodiversity/> .
@prefix bioreg: <http://linked.data.gov.au/def/bioregion/> .
@prefix corveg: <http://linked.data.gov.au/dataset/corveg/> .
@prefix corveg-cv: <http://linked.data.gov.au/def/corveg-cv/> .
@prefix corveg-def: <http://linked.data.gov.au/def/corveg/> .
@prefix corveg-dist: <http://linked.data.gov.au/dataset/corveg/disturbance/> .
@prefix corveg-geol: <http://linked.data.gov.au/dataset/corveg/geology/> .
@prefix corveg-location: <http://linked.data.gov.au/dataset/corveg/location/> .
@prefix corveg-rfstruct: <http://linked.data.gov.au/dataset/corveg/rf-structure/> .
@prefix corveg-site: <http://linked.data.gov.au/dataset/corveg/site/> .
@prefix corveg-site-strata: <http://linked.data.gov.au/dataset/corveg/site-strata/> .
@prefix corveg-site-tax-strata: <http://linked.data.gov.au/dataset/corveg/site-tax-strata/> .
@prefix corveg-situation: <http://linked.data.gov.au/dataset/corveg/situation/> .
@prefix corveg-soil: <http://linked.data.gov.au/dataset/corveg/soil/> .
@prefix corveg-soil-text: <http://registry.it.csiro.au/sandbox/soil-data-ie/def/voc/texture/> .
@prefix corveg-struct: <http://linked.data.gov.au/dataset/corveg/structure/> .
@prefix corveg-taxa: <http://linked.data.gov.au/dataset/corveg/taxa/> .
@prefix corveg-v-comm: <http://linked.data.gov.au/dataset/corveg/vege-community/> .
@prefix data: <http://linked.data.gov.au/def/datatype/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix dwct: <http://rs.tdwg.org/dwc/terms/> .
@prefix epsg-crs: <http://www.opengis.net/def/crs/epsg/0/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix geosparql: <http://www.opengis.net/ont/geosparql#> .
@prefix locn: <http://www.w3.org/ns/locn#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix plot: <http://linked.data.gov.au/def/plot/> .
@prefix plot-x: <http://linked.data.gov.au/def/plot/x/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sdo: <http://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix ssn-ext: <http://www.w3.org/ns/ssn/ext/> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix unit: <http://qudt.org/vocab/unit/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

biod:apc-taxa a skos:ConceptScheme ;
    dct:description "A vocabulary of taxon terms from the Australian Plant Census (APC) dataset. This is a combined collection of Biodiversity's Vascular Plants and AusMoss APC taxonomic data." ;
    dct:isPartOf <http://linked.data.gov.au/dataset/apc> ;
    skos:hasTopConcept biod:gynura-drymophila-var-drymophila ;
    skos:prefLabel "Australian Plant Census Taxa" .

corveg-cv:corveg-taxa a skos:ConceptScheme ;
    dct:description "A vocabulary of taxon terms from the CORVEG dataset." ;
    dct:isPartOf corveg: ;
    skos:hasTopConcept corveg-cv:gynura-drymophila-var-drymophila ;
    skos:prefLabel "Corveg Taxa" .

corveg-cv:gynura-drymophila-var-drymophila a dwct:Taxon, skos:Concept ;
    dwct:family "Asteraceae" ;
    dwct:genus "Gynura" ;
    dwct:species "drymophila" ;
    dwct:specificEpithet "drymophila" ;
    skos:closeMatch biod:gynura-drymophila-var-drymophila ;
    skos:inScheme corveg-cv:corveg-taxa ;
    skos:notation 4413 ;
    skos:prefLabel "Gynura drymophila var. drymophila" .

biod:gynura-drymophila-var-drymophila a dwct:Taxon, skos:Concept ;
    dct:created "1997-07-16 13:01:13+10"^^xsd:dateTime ;
    dct:modified "1997-07-16 13:01:13+10"^^xsd:dateTime ;
    dwct:acceptedNameUsage "Gynura drymophila (F.Muell.) F.G.Davies var. drymophila" ;
    dwct:acceptedNameUsageID <https://id.biodiversity.org.au/tree/51305933/51257233> ;
    dwct:class "Equisetopsida" ;
    dwct:family "Asteraceae" ;
    dwct:higherClassification "Plantae|Charophyta|Equisetopsida|Magnoliidae|Asteranae|Asterales|Asteraceae|Gynura|drymophila|drymophila" ;
    dwct:kingdom "Plantae" ;
    dwct:nameAccordingTo "CHAH (2011), Australian Plant Census" ;
    dwct:nameAccordingToID <https://id.biodiversity.org.au/reference/apni/49840> ;
    dwct:nomenclaturalCode "ICN" ;
    dwct:parentNameUsageID <https://id.biodiversity.org.au/tree/51305933/51245515> ;
    dwct:scientificName "Gynura drymophila (F.Muell.) F.G.Davies var. drymophila" ;
    dwct:scientificNameAuthorship "" ;
    dwct:scientificNameID <https://id.biodiversity.org.au/name/apni/104369> ;
    dwct:taxonConceptID <https://id.biodiversity.org.au/instance/apni/741250> ;
    dwct:taxonID <https://id.biodiversity.org.au/tree/51305933/51257233> ;
    dwct:taxonRank "Varietas" ;
    dwct:taxonomicStatus "accepted" ;
    skos:altLabel "Gynura drymophila var. drymophila" ;
    skos:inScheme biod:apc-taxa ;
    skos:prefLabel "Gynura drymophila (F.Muell.) F.G.Davies var. drymophila" .

Algorithm

  • Create the taxon rank controlled vocabulary first (the taxa controlled vocabulary depends on it)

    • Load APNI CSV

    • Gather the unique taxon ranks

    • Clean up some of the string literals (redundant square brackets)

    • Map to SKOS RDF

  • Create the taxa controlled vocabulary, forming two skos:ConceptSchemes: one of concepts from CORVEG, the other of concepts from APC

    • Create file-based indexes for the APC CSV using the Whoosh library

    • Load the indexes

      • Note: once the indexes have been created, new CSVs won't be indexed unless the force_taxa_index attribute in the YAML file for the Taxa table is true.

      • The indexes make searching for taxon terms much faster; without them, matching would need two nested loops and take O(n²) time.

    • For each taxon term in CORVEG, create a skos:Concept and match its infra_rank to the skos:altLabel of a concept in the taxon rank controlled vocabulary

      • Map it to terms in SKOS, DCT, and Darwin Core Terms

    • Link it with skos:closeMatch to the same term in APC, name-matching on canonicalName (using the slight fuzzy-matching capabilities of the Whoosh library)

    • Ensure that the taxonRank column in APC is also matched to the taxon rank controlled vocabulary, name-matching on the skos:prefLabel
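The matching steps above can be sketched in plain Python. This is a simplified stand-in and not the code in taxa.py: a dictionary replaces the Whoosh index, so only exact (case-insensitive) name matches are found and no fuzzy matching is done. All function names are hypothetical, the bracketed rank value in the usage example is made up, and deriving the concept name by slugifying the taxon name is an assumption based on the RDF example on this page.

```python
import re


def slugify(name):
    """Lower-case a name and join its alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def clean_rank(raw):
    """Strip redundant square brackets from a taxon rank value."""
    return raw.strip().strip("[]").strip()


def build_index(apc_rows):
    """Index APC rows by normalised canonicalName (built once, O(m))."""
    return {row["canonicalName"].strip().lower(): row for row in apc_rows}


def match_taxa(corveg_rows, apc_index):
    """Return (triples, not_found) for the CORVEG taxon terms.

    Each term costs one dictionary lookup, so the whole pass is O(n)
    instead of the O(n * m) of two nested loops.
    """
    triples, not_found = [], []
    for row in corveg_rows:
        taxon = row["TAXON"].strip()
        subject = "corveg-cv:" + slugify(taxon)
        triples.append((subject, "skos:prefLabel", taxon))
        apc_row = apc_index.get(taxon.lower())
        if apc_row is None:
            # Kept so the unmatched terms can be written out for review.
            not_found.append(taxon)
            continue
        target = "biod:" + slugify(apc_row["canonicalName"])
        triples.append((subject, "skos:closeMatch", target))
    return triples, not_found


if __name__ == "__main__":
    apc = [{"canonicalName": "Gynura drymophila var. drymophila"}]
    corveg = [{"TAXON": "Gynura drymophila var. drymophila"}]
    for triple in match_taxa(corveg, build_index(apc))[0]:
        print(triple)
    print(clean_rank("[unranked]"))  # made-up bracketed value
```

A dictionary keeps the sketch dependency-free; a full-text index such as Whoosh additionally tolerates small spelling differences between the two sources.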

Comments

Many of the column values in APC and APNI can and should be turned into controlled vocabularies. Currently only the taxon rank values in APNI have been turned into a controlled vocabulary (for interoperability between CORVEG taxon rank values and APNI taxon rank values). Creating controlled vocabularies for the other columns would be a good future enhancement; for now, the values are plain string literals.

Currently 2,225 out of 103,108 CORVEG taxon terms (about 2.2%) are not mapped to external sources such as APC. It may be possible to map these 2,225 terms to other Australian government agencies' datasets in the future, but having most of the taxon terms mapped to APC is good enough for now. One way to find out which sources hold data on these terms is to search for the missing terms on the Atlas of Living Australia (ALA) and check where their source information comes from. taxa_terms_not_found.txt is a list of the taxon terms that were not name-matched to external sources.

The Python library Whoosh was used to create binary (pickle) file-based indexes so that full text search queries were made possible on the large APC and APNI CSV files.

We are not using ALA as an external source for extra taxonomy information, for two reasons: its sources are not updated often enough (some are two years out of date), and it mixes and matches data between different sources, which is controversial.

Visual Diagram of a Single Taxon

A visual diagram showcasing the above RDF Turtle snippet.

Action Items