Ecoplots ES document structure



Information model introduction

Ecoplots uses an underlying common information model based on the SOSA ontology. All data is mapped to this information model, and feature types and observed properties are drawn from controlled vocabularies. In summary, an observer visits a site to make observations related to features of interest.

High-level information model

 

A plot-based site observation (the main object in our model) has the following fields:

  • Observation ID: unique identifier across all observations in the system. Not used as a filter.

  • Dataset: the observation belongs to exactly one dataset. Used as a facet filter.

  • Site ID: the ecological site within which the observation was taken. Used as a facet filter.

  • Site Visit ID: the specific visit to the site during which the observation was taken. Used as a facet filter.

  • Site Visit Date: when the visit to the site happened. Used as a facet filter.

  • Feature of interest ID: the specific feature of interest (FOI) the observation belongs to. Not used as a filter.

  • Feature of interest type: the type of the FOI the observation belongs to. Used as a facet filter.

  • Observed property: the observed/measured property. Used as a facet filter.

  • Result: the result of the observation, i.e. the measured value (can be one of several data types). Will be used to filter by value.

  • Unit of measurement: unit of the measured result, if applicable. Not used as a filter.

  • Result time: when the observation was taken. Not used as a filter.

  • Used procedure: the method or procedure used to make the observation. Used as a facet filter.

  • Used instrument: the specific instrument used to make the observation, if applicable. Not used as a filter.

  • Regions: the site where the observation was taken belongs to geographical regions of Australia. There are multiple region types (states, local government areas, bioregions, etc.), so an observation can belong to one or more regions (but to exactly one region per region type). Used as a facet filter.

  • Site Attributes: the site/plot can have different attributes (e.g. dimensions, shape, description…). A site has many observations, so every attribute is duplicated in every document (observation). Will be used to filter by value.

  • Site Visit Attributes: many observations are made during a visit to the site, so every attribute is duplicated in every document (observation). Will be used to filter by value.

  • FOI attributes: a FOI has many observations, so every attribute is duplicated in every document (observation). Will be used to filter by value.

  • Observation attributes: an observation can have multiple attributes. Will be used to filter by value.

  • Instrument attributes: an instrument can have multiple attributes. Will be used to filter by value.

An attribute has the following fields:

  • Attribute ID: unique identifier across all attributes in the system. Not used as a filter.

  • Attribute: the specific attribute (e.g. type of soil observation, plot dimensions, scientific species name…). Used as a facet filter.

  • Value: value of the attribute. Will be used to filter by value.

  • Unit of measurement: unit of the attribute value, if applicable. Not used as a filter.

 

{
  "id": "http://linked.data.gov.au/dataset/ausplots/soil_characterisation-obs-colour_when_moist-119526",
  "dataset": "Ausplots Rangelands",
  "feature_id": "http://linked.data.gov.au/dataset/ausplots/id-119526",
  "feature_type": "soil profile",
  "foi_attributes": [
    {
      "attribute": "soil depth max",
      "id": "http://linked.data.gov.au/dataset/ausplots/soil_characterisation-attr-lower_depth-119526",
      "unit_of_measure": "http://qudt.org/vocab/unit/M",
      "value": 0.22
    },
    {
      "attribute": "soil depth min",
      "id": "http://linked.data.gov.au/dataset/ausplots/soil_characterisation-attr-upper_depth-119526",
      "unit_of_measure": "http://qudt.org/vocab/unit/M",
      "value": 0.07
    },
    {
      "attribute": "type of soil observation",
      "id": "http://linked.data.gov.au/dataset/ausplots/soil_characterisation-attr-soil_observation_type-119526",
      "unit_of_measure": null,
      "value": "http://linked.data.gov.au/def/tern-cv/e2505a19-b277-4f83-b146-bc9cd9c691a0"
    }
  ],
  "instr_attributes": [],
  "instrument_type": "munsell soil colour chart",
  "obs_attributes": [],
  "observed_property": "wet soil colour",
  "regions": [
    { "dataset": "States and territories", "label": "Northern Territory" },
    { "dataset": "Subregions", "label": "McArthur" },
    { "dataset": "Local government areas", "label": "Roper Gulf (S)" },
    { "dataset": "Bioregions", "label": "Gulf Fall and Uplands" },
    { "dataset": "WWF ecoregions", "label": "Carpentaria tropical savanna" },
    { "dataset": "Terrestrial CAPAD regions", "label": "Limmen" }
  ],
  "result_time": "2012-06-12T00:00:00Z",
  "result_value": "7.5YR56",
  "site_id": "NTAGFU0026",
  "site_visit_date": "2012-06-12T00:00:00Z",
  "site_visit_id": "53673",
  "unit_of_measure": null,
  "used_procedure": "Soil characterisation to 1 m+"
}

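As a minimal sketch of the "exactly one region per region type" rule, the snippet below indexes the regions of a document by region type and rejects duplicates. It assumes the document layout of the example above ("regions" entries with "dataset" and "label" keys); the function name regions_by_type is hypothetical and not part of Ecoplots.

```python
from typing import Dict


def regions_by_type(document: dict) -> Dict[str, str]:
    """Map each region type (the "dataset" key of a region) to its label.

    Raises ValueError if a region type appears more than once, since an
    observation should belong to exactly one region per region type.
    """
    result: Dict[str, str] = {}
    for region in document.get("regions", []):
        region_type = region["dataset"]
        if region_type in result:
            raise ValueError(f"more than one region for type {region_type!r}")
        result[region_type] = region["label"]
    return result


# Trimmed-down version of the example document.
doc = {
    "site_id": "NTAGFU0026",
    "regions": [
        {"dataset": "States and territories", "label": "Northern Territory"},
        {"dataset": "Bioregions", "label": "Gulf Fall and Uplands"},
    ],
}
print(regions_by_type(doc))
```
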
All this information is shown in the Ecoplots-UI (https://ecoplots-test.tern.org.au/search) as rows in a table. The information shown in this datagrid is pulled from ES through an API, using the filters selected by the user in the facets section.

Visual graph example

The above diagram would be translated into the following table (very simplified):

| dataset | site_id | feature_id | foi_attributes | observation | obs_attributes |
|---|---|---|---|---|---|
| Ausplots | site_id-1 | plant-pop-123456 | [attr-species_name-123456-] | obs-hits-123456-2 | [] |
| Ausplots | site_id-1 | plant-pop-123456 | [attr-species_name-123456-] | obs-basal_area-123456-2 | [attr-point_id-obs-123456-1] |

Notice that the main objects are the individual “observations”, which may have observation attributes (these attributes are specific and unique to an observation).
The rest of the items shown in the graph (dataset, site, site_visit, feature_id…) are also embedded in the same document. This means that, for example, many documents share the same “feature_id” (e.g. plant-pop-123456), and all attributes of that feature_id instance are duplicated in every document connected to the same feature_id.
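The duplication described here can be illustrated with a small sketch that flattens one feature of interest and its observations into one document per observation. The field names follow the simplified table; the function flatten_observations is a hypothetical helper, not the actual Ecoplots ingestion code.

```python
from typing import List, Sequence, Tuple


def flatten_observations(
    dataset: str,
    site_id: str,
    feature_id: str,
    foi_attributes: Sequence[str],
    observations: Sequence[Tuple[str, Sequence[str]]],
) -> List[dict]:
    """Return one document per observation, copying the shared fields into each."""
    return [
        {
            "dataset": dataset,
            "site_id": site_id,
            "feature_id": feature_id,
            # Shared FOI attributes are duplicated in every document.
            "foi_attributes": list(foi_attributes),
            "observation": observation_id,
            # Observation attributes are specific to a single document.
            "obs_attributes": list(obs_attributes),
        }
        for observation_id, obs_attributes in observations
    ]


docs = flatten_observations(
    "Ausplots",
    "site_id-1",
    "plant-pop-123456",
    ["attr-species_name-123456-"],
    [
        ("obs-hits-123456-2", []),
        ("obs-basal_area-123456-2", ["attr-point_id-obs-123456-1"]),
    ],
)
for doc in docs:
    print(doc["observation"], doc["foi_attributes"], doc["obs_attributes"])
```
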

Faceted search

Faceted search (basic functionality)

As introduced above, an observation has many fields, some of which we want to filter by through the faceted search in the left-hand menu.

  • Filter by region_type and regions:
    Allows the user to filter by one “region_type” at a time, and then by one or more regions of the specified region_type:

Currently 6-7 values, not expected to grow significantly (maybe a few more region_types will be added).

Around a hundred to a few hundred options in most “region_types”.

  • Filter by dataset -> site -> site_visit:
    First, the user filters by a specific dataset. Once a dataset is selected, a new facet with all available site_ids is displayed, and once a site is selected, the site_visit_id facet is shown.

Currently 1 value, fewer than a hundred in the future.

Each dataset may have tens of thousands of sites.

Every site usually has 1-3 site visits, but potentially more.

  • Filter by Feature of Interest (FOI) → FOI attributes: a required future feature is to let the user filter by the “value” of the attributes.
    E.g. the user selects an attribute (e.g. reliability) and can then filter by its value. If the value is categorical (high, medium, low), we would show a new facet with the different options. If not, a new input would let the user enter the desired value (e.g. vegetation height > 1.5 m).

Fewer than a hundred options.

A few hundred options across all future datasets.

  • Filter by parameter (aka Observed property) → Observation attributes:
    Same expected behaviour as the Feature_type/attributes filter. We would like to let the user filter by attribute values.

A few hundred options.

Fewer than a hundred options across all future datasets.

  • Filter by site_visit_date:
    Allows the user to set a date range and filter observations whose visit_date falls within that range.

Not implemented yet:

  • Filter by site_attributes

  • Filter by site_visit attributes

    Same expected behaviour as FOI and observation attributes.

Faceted search (extended functionality)

Nested sites

In the current version of the UI / ES document structure, each observation has only one site, but our initial data model allows nested sites (and they do occur in practice).

This means that we would want to allow the user to filter by all the levels of the site hierarchy:

  1. Firstly, we show all the options for top level sites (plot1 in the example).

  2. If a top level site is selected, then show a new facet with the next level (transect)

  3. And so on…
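The level-by-level behaviour above can be sketched in memory as follows. This is only an illustration of the facet logic, not the proposed ES mapping (which is covered in the linked page). It assumes each observation carries its site hierarchy as a path from the top-level site downwards; the function name and the site ids other than plot1 are hypothetical.

```python
from typing import Iterable, List, Sequence


def next_level_options(
    site_paths: Iterable[Sequence[str]], selected: Sequence[str]
) -> List[str]:
    """Return the distinct site ids one level below the selected path."""
    depth = len(selected)
    options = {
        path[depth]
        for path in site_paths
        # Keep only paths that extend the current selection by at least one level.
        if len(path) > depth and list(path[:depth]) == list(selected)
    }
    return sorted(options)


# Hypothetical site hierarchy: plot1 contains two transects, plot2 has none.
paths = [("plot1", "transect1"), ("plot1", "transect2"), ("plot2",)]

print(next_level_options(paths, []))         # top-level sites
print(next_level_options(paths, ["plot1"]))  # next level under plot1
```

With nothing selected, the facet lists the top-level sites; once plot1 is selected, a second facet lists its transects, and an empty result means there is no deeper level to show.
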

More details and proposed solution: Mapping and querying nested sites (or any other nested field) in Elasticsearch

 

Filtering by attributes value (categorical values)

Filtering by attribute value is a required functionality still to be implemented in the UI. This means:

  • Categorical values: Once the user has selected a specific attribute, a new facet (combobox) with all the possible values/results must be shown in the UI. The user can then select a concrete value, narrowing the search to those observations that have the specific attribute with the specific value.
    E.g. the user selects an attribute (e.g. reliability) and can then filter by its value. If the value is categorical (high, medium, low), we would show a new facet with the different options.

Possible categorical values of the “type of soil observation” attr.

{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "feature_type.value": [
              "http://linked.data.gov.au/def/tern-cv/80c39b95-0912-4267-bb66-2fa081683723"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "nested_agg": {
      "nested": { "path": "foi_attributes" },
      "aggs": {
        "filtering": {
          "filter": {
            "term": {
              "foi_attributes.attribute.value": "http://linked.data.gov.au/def/tern-cv/8e7dfefe-e3ee-40ac-9024-ede48922bee6"
            }
          },
          "aggs": {
            "value": {
              "terms": {
                "field": "foi_attributes.value.value.keyword",
                "size": 1000
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}

Result: (2 possible values, “soil pit” and “auger boring” in the example)

{
  "took": 43,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 10000, "relation": "gte" },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "nested_agg": {
      "doc_count": 120620,
      "filtering": {
        "doc_count": 26175,
        "value": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "http://linked.data.gov.au/def/tern-cv/e2505a19-b277-4f83-b146-bc9cd9c691a0",
              "doc_count": 25735
            },
            {
              "key": "http://linked.data.gov.au/def/tern-cv/2747e2d9-04b7-4115-8f4b-ca0264eb9ad2",
              "doc_count": 440
            }
          ]
        }
      }
    }
  }
}
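To populate the new facet, the UI only needs the bucket keys and counts from this response. A minimal sketch of that extraction is shown below, assuming the aggregation names used in the query above (nested_agg, filtering, value); the function name attribute_value_counts is hypothetical.

```python
from typing import Dict


def attribute_value_counts(response: dict) -> Dict[str, int]:
    """Extract {attribute value: document count} from the nested aggregation."""
    buckets = response["aggregations"]["nested_agg"]["filtering"]["value"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Trimmed-down copy of the response above.
response = {
    "aggregations": {
        "nested_agg": {
            "doc_count": 120620,
            "filtering": {
                "doc_count": 26175,
                "value": {
                    "buckets": [
                        {
                            "key": "http://linked.data.gov.au/def/tern-cv/e2505a19-b277-4f83-b146-bc9cd9c691a0",
                            "doc_count": 25735,
                        },
                        {
                            "key": "http://linked.data.gov.au/def/tern-cv/2747e2d9-04b7-4115-8f4b-ca0264eb9ad2",
                            "doc_count": 440,
                        },
                    ]
                },
            },
        }
    }
}

for value, count in attribute_value_counts(response).items():
    print(value, count)
```
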

 

  • Non-categorical values: If the selected attribute is not a controlled vocabulary (CV), a new input would let the user enter the desired value (e.g. vegetation height > 1.5 m). This could include, for example, range queries on numbers and full-text search on strings.
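A range filter on a numeric attribute could be expressed with an ES nested query that matches the attribute and its value inside the same nested object. The sketch below is an assumption, not the implemented query: the attribute URI is a placeholder, and the numeric value field name (foi_attributes.value.value) is a guess that depends on how the mapping stores numbers.

```python
from typing import Optional


def nested_attribute_range_filter(
    path: str,
    attribute_uri: str,
    value_field: str,
    gt: Optional[float] = None,
    lt: Optional[float] = None,
) -> dict:
    """Build a nested filter: attribute == attribute_uri AND value within bounds."""
    bounds = {}
    if gt is not None:
        bounds["gt"] = gt
    if lt is not None:
        bounds["lt"] = lt
    if not bounds:
        raise ValueError("at least one of gt/lt is required")
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "filter": [
                        # Both clauses must match within the same nested attribute.
                        {"term": {f"{path}.attribute.value": attribute_uri}},
                        {"range": {value_field: bounds}},
                    ]
                }
            },
        }
    }


# Hypothetical attribute URI and value field, e.g. "vegetation height > 1.5 m".
height_filter = nested_attribute_range_filter(
    path="foi_attributes",
    attribute_uri="http://example.org/attribute/vegetation-height",
    value_field="foi_attributes.value.value",
    gt=1.5,
)
print(height_filter)
```

The resulting clause can be appended to the "filter" list of the bool query alongside the existing terms filters.
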

ES implementation

Document examples can be easily extracted from ES:
https://es-test.tern.org.au/plotdata_ecoplots-data/_search

Document mapping

Most fields of an observation (dataset, site, site_visit, feature_type, etc.) are stored as key-value pairs, which we call “label” and “value”. The “label” is the human-readable name of an item; the “value” usually contains a URI.

All filters and aggregations made in the UI use the value field. The query that retrieves the label fields is executed just once, and its result is cached for label-value mapping.
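The "fetch labels once, then reuse" idea can be sketched as a small cache. This is written in Python for illustration only; the real cache lives in the UI, and the class name, the fetch function and the example URIs are all hypothetical.

```python
from typing import Callable, Dict, Optional


class LabelCache:
    """Lazily loads the value-to-label mapping once and serves lookups from memory."""

    def __init__(self, fetch_labels: Callable[[], Dict[str, str]]):
        self._fetch_labels = fetch_labels
        self._labels: Optional[Dict[str, str]] = None

    def label_for(self, value: str) -> str:
        if self._labels is None:
            # The slow labels query runs only on the first lookup.
            self._labels = self._fetch_labels()
        # Fall back to the raw value (usually a URI) when no label is known.
        return self._labels.get(value, value)


calls = []


def fetch_labels() -> Dict[str, str]:
    """Stand-in for the ES labels query; records how often it is executed."""
    calls.append(1)
    return {"http://example.org/cv/high": "high", "http://example.org/cv/low": "low"}


cache = LabelCache(fetch_labels)
print(cache.label_for("http://example.org/cv/high"))
print(cache.label_for("http://example.org/cv/low"))
print(len(calls))  # number of times the labels query ran
```
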

 

ES queries

Many query examples have been grouped into a Postman REST client collection (can also be imported into Insomnia REST client):

Data query

Example of a query for reading the observations (documents) that match the user’s selection.
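As a rough sketch of how such a data query could be assembled, the function below turns a facet selection into a bool query with one terms filter per facet, an optional date range, and from/size pagination. The "<field>.value" naming follows the feature_type.value filter used elsewhere on this page; the site_visit_date field name, the paging parameters and the function itself are assumptions rather than the actual API code.

```python
from typing import Dict, List, Optional, Tuple


def build_data_query(
    selection: Dict[str, List[str]],
    date_range: Optional[Tuple[str, str]] = None,
    page: int = 0,
    page_size: int = 50,
) -> dict:
    """Build an ES search body from the facets selected by the user."""
    filters: List[dict] = [
        {"terms": {f"{field}.value": values}}
        for field, values in sorted(selection.items())
        if values  # facets without a selection do not restrict the search
    ]
    if date_range is not None:
        start, end = date_range
        filters.append({"range": {"site_visit_date": {"gte": start, "lte": end}}})
    return {
        "query": {"bool": {"filter": filters}},
        "from": page * page_size,
        "size": page_size,
    }


query = build_data_query(
    {
        "feature_type": [
            "http://linked.data.gov.au/def/tern-cv/80c39b95-0912-4267-bb66-2fa081683723"
        ],
        "dataset": [],
    },
    date_range=("2012-01-01", "2012-12-31"),
    page=2,
    page_size=25,
)
print(query)
```
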

 

FACETS queries / list of aggregations

  • Simple aggregations: dataset, site_id, site_visit_id, feature_type, observed_property/parameter…

  • Nested aggregations: region_type, regions, foi_attributes, obs_attributes (future site_attributes, etc.)…

  • Composite aggregations: in order to retrieve the labels of more than 10,000 sites, we need to perform a series of composite aggregations using the “after” keyword, as simple aggregations are limited to a maximum of 10,000 buckets.
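The paging loop for a composite aggregation can be sketched as follows: each request passes the after_key of the previous response as "after" until no more buckets come back. The search argument stands for whatever function sends the body to ES and returns the parsed response; here it is replaced by an in-memory fake, and the field name site_id.value is an assumption.

```python
from typing import Callable, Iterator


def iter_composite_buckets(
    search: Callable[[dict], dict], field: str, page_size: int = 1000
) -> Iterator[dict]:
    """Yield all buckets of a composite terms aggregation, one page at a time."""
    after = None
    while True:
        composite = {
            "size": page_size,
            "sources": [{"key": {"terms": {"field": field}}}],
        }
        if after is not None:
            composite["after"] = after  # resume after the last bucket already seen
        body = {"size": 0, "aggs": {"paged": {"composite": composite}}}
        agg = search(body)["aggregations"]["paged"]
        yield from agg["buckets"]
        after = agg.get("after_key")
        if after is None or not agg["buckets"]:
            break


def fake_search(body: dict) -> dict:
    """In-memory stand-in for ES that pages over three hypothetical site ids."""
    composite = body["aggs"]["paged"]["composite"]
    keys = ["site-a", "site-b", "site-c"]
    after = composite.get("after", {}).get("key")
    remaining = [k for k in keys if after is None or k > after]
    page = remaining[: composite["size"]]
    agg = {"buckets": [{"key": {"key": k}, "doc_count": 1} for k in page]}
    if page:
        agg["after_key"] = {"key": page[-1]}
    return {"aggregations": {"paged": agg}}


site_ids = [
    bucket["key"]["key"]
    for bucket in iter_composite_buckets(fake_search, "site_id.value", page_size=2)
]
print(site_ids)
```
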

Example of a query that aggregates the possible values of a specific facet (it also respects the user’s selection).
Query used in the /facet API endpoint (based on the user’s selection, it gets all the possible site_id values):

Example of aggregating a nested field (region_type):
Note that region.dataset in ES corresponds to region_type.
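A nested terms aggregation of this kind might look like the sketch below. The nested path and field names are passed in as parameters because the exact names depend on the mapping; the values used in the example ("regions" and "regions.dataset") are assumptions based on the example document, and any ".value" or ".keyword" suffix would need to be checked against the real mapping.

```python
def nested_terms_aggregation(path: str, field: str, size: int = 100) -> dict:
    """Build a search body that lists the distinct values of a nested field."""
    return {
        "size": 0,  # only the aggregation is needed, not the documents
        "aggs": {
            "nested_agg": {
                "nested": {"path": path},
                "aggs": {"values": {"terms": {"field": field, "size": size}}},
            }
        },
    }


# Assumed names: nested path "regions", region type stored under "regions.dataset".
region_type_query = nested_terms_aggregation("regions", "regions.dataset")
print(region_type_query)
```
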

Labels aggregations

Note that the “labels” queries are slow, but they are executed only once and their results are then kept in browser storage, so they are not the source of the performance problem.

 

Ecoplots API

The Ecoplots UI renders results and triggers new search and facet requests through the Ecoplots API (https://ecoplots-test.tern.org.au/api/v1.0/ui).

The core endpoints used through the UI are:

  • /data: based on the user’s selection in the facets menu, it requests a page of observations from the API (documents in ES, displayed as rows in the datagrid).

  • /facet: based on the user’s selection in the facets menu, it requests the lists of options that fit the filtering, to populate all the facets (combo-boxes in the UI).

Using the request parameters above, the API generates and executes one or more ES queries and returns the ES response, which is processed in the UI.

  • The data endpoint is translated into a single ES query.

  • The facet endpoint is translated into many ES queries.

The number of internal queries to ES increases when some facets are already selected. For example, once the user has selected a dataset, a new request is triggered, and on this occasion an aggregation for getting the site_ids is also executed.
This behaviour can be easily seen through the UI.
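A minimal sketch of this behaviour: given the current selection, decide which facet aggregations need to run. The dependency rules (regions after region_type, site_id after dataset, site_visit_id after site_id) follow the facet descriptions on this page; the list of always-present facets and the function name are assumptions, not the actual API logic.

```python
from typing import Dict, List


def facets_to_aggregate(selection: Dict[str, List[str]]) -> List[str]:
    """Return the facets whose options must be aggregated for this selection."""
    # Assumed set of facets that are always populated.
    facets = ["region_type", "dataset", "feature_type", "observed_property"]
    if selection.get("region_type"):
        facets.append("regions")
    if selection.get("dataset"):
        facets.append("site_id")
        # The site visit facet only appears once a site has been chosen.
        if selection.get("site_id"):
            facets.append("site_visit_id")
    return facets


print(facets_to_aggregate({}))
print(facets_to_aggregate({"dataset": ["Ausplots Rangelands"]}))
print(facets_to_aggregate({"dataset": ["Ausplots Rangelands"], "site_id": ["NTAGFU0026"]}))
```
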

 
