Information model introduction
EcoPlots uses a common underlying information model based on the SOSA ontology. All data is mapped to this information model, and feature types and observed properties are drawn from controlled vocabularies. In summary, an observer visits a site to make observations related to features of interest.
A plot-based site observation (the main object in our model) has the following fields:
Observation ID: unique identifier through all observations in the system. Not used to filter by
Dataset: obs. belongs to 1 specific dataset. used to filter by in facets
Site ID: obs. was taken within an ecological site. used to filter by in facets
Site Visit ID: obs. was taken during a specific visit to the site. used to filter by in facets
Site Visit Date: when the visit to the site happened. used to filter by in facets
Feature of interest ID: obs. belongs to a specific feature of interest (aka foi). Not used to filter by
Feature of interest type: obs. belongs to a foi which has a type. used to filter by in facets
Observed property: the observed/measured property. used to filter by in facets
Result: the result of the observation / measured value (can be multiple data types). WILL be used to filter by its value
Unit of measurement: unit of the measured result, if applicable. Not used to filter by
Result time: when the observation was taken. Not used to filter by
Used procedure: Which method or procedure was used to make the obs. used to filter by in facets
Used instrument: Which specific instrument was used to make the obs., if applicable. Not used to filter by
Regions: the site where the obs. was taken belongs to a geographical region of Australia. There are multiple region types (States, Local government areas, bioregions, etc.), so an obs. can belong to 1 to many regions (but to exactly 1 region per region type). used to filter by in facets
Site Attributes: the site/plot can have different attributes (e.g. dimensions, shape, description…). A site has many observations, so every attribute will be duplicated in every document (observation). WILL be used to filter by its value
Site Visit Attributes: During a visit to the site, many observations are made, so every attribute will be duplicated in every document (observation). WILL be used to filter by its value
FOI attributes: A FOI has many observations, so every attribute will be duplicated in every document (observation). WILL be used to filter by its value
Observation attributes: An observation can have multiple attributes. WILL be used to filter by its value
Instrument attributes: An instrument can have multiple attributes. WILL be used to filter by its value
An attribute has the following fields:
Attribute ID: unique identifier through all attributes in the system. Not used to filter by
Attribute: specific attribute (e.g. type of soil observation, plot dimensions, scientific species name…) used to filter by in facets
Value: value of the attribute. WILL be used to filter by its value
Unit of measurement: unit of the measured result, if applicable. Not used to filter by
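As a reading aid, the two field lists above can be sketched as Python dataclasses. This is only an illustration of the shape of the model: the class names, attribute names and types are assumptions made for this sketch and do not necessarily match the real ES mapping.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Attribute:
    attribute_id: str                           # unique across all attributes
    attribute: str                              # e.g. plot dimensions, species name
    value: Any                                  # intended to be filterable by value
    unit_of_measurement: Optional[str] = None   # only if applicable


@dataclass
class Observation:
    observation_id: str                         # unique across all observations
    dataset: str
    site_id: str
    site_visit_id: str
    site_visit_date: str
    feature_of_interest_id: str
    feature_type: str
    observed_property: str
    result: Any                                 # can be one of several data types
    result_time: str
    used_procedure: str
    unit_of_measurement: Optional[str] = None
    used_instrument: Optional[str] = None
    # Exactly one region per region type, so a dict keyed by region type
    # (hypothetical representation) captures the constraint.
    regions: Dict[str, str] = field(default_factory=dict)
    # Site, site visit and FOI attributes are duplicated in every observation.
    site_attributes: List[Attribute] = field(default_factory=list)
    site_visit_attributes: List[Attribute] = field(default_factory=list)
    foi_attributes: List[Attribute] = field(default_factory=list)
    observation_attributes: List[Attribute] = field(default_factory=list)
    instrument_attributes: List[Attribute] = field(default_factory=list)
```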
All this information is shown in the Ecoplots-UI (https://ecoplots-test.tern.org.au/search) as rows in a table. The information shown in this datagrid is pulled from ES through an API, using the filters selected by the user in the facets section.
Visual graph example
The above diagram would be translated into the following table (very simplified):
dataset | site_id | feature_id | foi_attributes | observation | obs_attributes |
---|---|---|---|---|---|
Ausplots | site_id-1 | plant-pop-123456 | [attr-species_name-123456-] | obs-hits-123456-2 | [] |
Ausplots | site_id-1 | plant-pop-123456 | [attr-species_name-123456-] | obs-basal_area-123456-2 | [attr-point_id-obs-123456-1] |
Notice that the main objects are the “observations”, which may have observation attributes (these attributes are specific and unique to an observation).
The rest of the items shown in the graph (dataset, site, site_visit, feature_id…) are also embedded in the same document. This means that, for example, many documents have the same “feature_id” (e.g. plant-pop-123456), and all attributes of that feature_id instance are duplicated in every document connected to it.
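The denormalisation described above can be sketched as follows. This is a minimal illustration that reuses the identifiers from the simplified table; the function name and the input structure are hypothetical and are not the actual ingestion code.

```python
def to_documents(dataset, site_id, feature, observations):
    """Flatten one feature of interest and its observations into documents.

    The feature's attributes are copied into every document, which is the
    duplication described in the text above.
    """
    docs = []
    for obs in observations:
        docs.append({
            "dataset": dataset,
            "site_id": site_id,
            "feature_id": feature["id"],
            "foi_attributes": list(feature["attributes"]),    # duplicated per document
            "observation": obs["id"],
            "obs_attributes": list(obs.get("attributes", [])),  # unique to this observation
        })
    return docs


feature = {"id": "plant-pop-123456", "attributes": ["attr-species_name-123456-"]}
observations = [
    {"id": "obs-hits-123456-2"},
    {"id": "obs-basal_area-123456-2", "attributes": ["attr-point_id-obs-123456-1"]},
]
docs = to_documents("Ausplots", "site_id-1", feature, observations)
```

Each entry in `docs` corresponds to one row of the table, and both rows carry their own copy of the same `foi_attributes` list.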
Faceted search
Faceted search (basic functionality)
As introduced above, an observation has many fields, some of which we want to filter by through the faceted search in the left-hand menu.
Filter by region_type and regions:
Allows the user to filter by one “region_type” at a time, and then by 1 to many regions of the specified region_type:
region_type: currently 6-7 values, not expected to grow significantly in the future (perhaps a few more region_types will be added).
regions: around a hundred to a few hundred options for most “region_types”.
Filter by dataset -> site -> site_visit:
The user first filters by a specific dataset. Once a dataset is selected, a new facet with all available site_ids is displayed, and once a site is selected, the site_visit_id facet is shown.
Datasets: currently 1 value, fewer than a hundred in the future.
Each dataset may have tens of thousands of sites.
Every site usually has 1-3 site visits, but could potentially have more.
Filter by Feature of Interest (FOI) → FOI attributes: a required future feature is to allow the user to filter by the “value” of the attributes.
E.g. the user selects an attribute (e.g. reliability) and can then filter by its value. If the value is categorical (high, medium, low), we would show a new facet with the different options. If not, a new input would allow the user to enter the desired value (e.g. vegetation height > 1.5m).
Fewer than a hundred options.
A few hundred options across all future datasets.
Filter by parameter (aka Observed property) → Observation attributes:
Same expected behaviour as the Feature_type/attributes filter. We would like to allow the user to filter by attribute values.
A few hundred options.
Fewer than a hundred options across all future datasets.
Filter by site_visit_date:
Allows the user to set a date range to filter observations whose visit_date falls within that range.
Not implemented yet
Filter by site_attributes
Filter by site_visit attributes
Same expected behaviour as FOI and observation attributes.
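For the site_visit_date filter listed above, one possible translation into ES is a range query. The sketch below assumes the date is stored in the field site_visit_date.value (the field used by the min/max date aggregation in the facet logs) and that bounds are passed as ISO date strings; the function name is hypothetical.

```python
def site_visit_date_filter(start=None, end=None):
    """Build an ES range filter for the visit date, or None if no bound is set."""
    bounds = {}
    if start is not None:
        bounds["gte"] = start   # inclusive lower bound
    if end is not None:
        bounds["lte"] = end     # inclusive upper bound
    if not bounds:
        return None
    return {"range": {"site_visit_date.value": bounds}}
```

The returned clause could be appended to the "filter" list of the bool query alongside the other facet filters.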
Faceted search (extended functionality)
Nested sites
In the current version of the UI / ES document structure, each observation has only one site, but our initial data model allows nested sites (and in practice they do occur):
This means that we would want to allow the user to filter by all the levels of the site hierarchy:
Firstly, we show all the options for top level sites (plot1 in the example).
If a top level site is selected, then show a new facet with the next level (transect)
And so on…
More details and proposed solution: https://ternaus.atlassian.net/wiki/spaces/EE/pages/2226520950
Filtering by attributes value (categorical values)
Filtering by the value of attributes is a required functionality that must be implemented in the UI. This means:
Categorical values: Once the user has selected a specific attribute, a new facet (combobox) with all the possible values/results must be shown in the UI. This then allows the user to select a concrete value, so the search is narrowed to those observations that have that specific attribute with that specific value.
E.g. the user selects an attribute (e.g. reliability) and can then filter by its value. If the value is categorical (high, medium, low), we would show a new facet with the different options.
Non-categorical values: If the selected attribute is not a CV, a new input would allow the user to enter the desired value (e.g. vegetation height > 1.5m). This could include range queries on numbers and full-text search on strings, for example.
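A minimal sketch of how such a value filter might be expressed as an ES query, assuming the attributes are stored as nested objects (as the nested aggregations on foi_attributes and obs_attributes suggest). The sub-field names `<path>.attribute.value` and `<path>.value`, the function name and the example URI are assumptions made for illustration only.

```python
from typing import Any, Dict, Iterable, Optional


def attribute_value_filter(
    path: str,
    attribute_uri: str,
    values: Optional[Iterable[Any]] = None,
    bounds: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a nested filter matching one attribute and, optionally, its value.

    values: categorical options selected in the facet (terms query).
    bounds: range operators for non-categorical values, e.g. {"gt": 1.5}.
    """
    # Hypothetical field holding the URI that identifies the attribute.
    conditions = [{"term": {f"{path}.attribute.value": attribute_uri}}]
    if values is not None:
        conditions.append({"terms": {f"{path}.value": list(values)}})
    elif bounds:
        conditions.append({"range": {f"{path}.value": dict(bounds)}})
    # Both conditions sit inside the same nested query so that they must
    # match within the same attribute object, not across different ones.
    return {"nested": {"path": path, "query": {"bool": {"filter": conditions}}}}
```

For example, `attribute_value_filter("foi_attributes", "http://example.org/attr/height", bounds={"gt": 1.5})` would express "vegetation height > 1.5" under these assumptions.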
ES implementation
Document examples can be easily extracted from ES:
https://es-test.tern.org.au/plotdata_ecoplots-data/_search
Document mapping
Most fields of an observation (dataset, site, site_visit, feature_type, etc.) are key-value style fields with two parts, which we call “label” and “value”. The “label” is the human-readable name of an item, and the “value” usually contains a URI.
All filters and aggregations made in the UI use the value field. A query to get the label fields is executed just once, and the result is stored/cached for label-value mapping.
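The "fetch once, then cache" idea can be sketched as below. In the real system the cache lives in the UI; this Python version only illustrates the pattern. The class name, the `fetch_labels` callable and the example label are hypothetical.

```python
class LabelCache:
    """Resolve values (URIs) to labels, fetching the mapping only once."""

    def __init__(self, fetch_labels):
        # fetch_labels: callable returning (value, label) pairs, e.g. built
        # from the slow "labels" query. Hypothetical interface.
        self._fetch_labels = fetch_labels
        self._labels = None

    def label_for(self, value):
        if self._labels is None:            # first use: run the slow query
            self._labels = dict(self._fetch_labels())
        # Fall back to the raw value if no label is known.
        return self._labels.get(value, value)


calls = []

def fetch_labels():
    calls.append(1)  # count how often the slow query would run
    return [("http://linked.data.gov.au/dataset/ausplots", "Ausplots")]

cache = LabelCache(fetch_labels)
first = cache.label_for("http://linked.data.gov.au/dataset/ausplots")
second = cache.label_for("http://example.org/unknown")
```

After both lookups the slow query has run only once, which is why its cost does not affect later filtering.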
ES queries
Many query examples have been grouped into a Postman REST client collection (can also be imported into Insomnia REST client):
Data query
Example of a query for reading the observations (documents) that comply with the user’s selection.
FACETS queries / list of aggregations
Simple aggregations: dataset, site_id, site_visit_id, feature_type, observed_property/parameter…
Nested aggregations: region_type, regions, foi_attributes, obs_attributes (future site_attributes, etc.)…
Composite aggregations: to retrieve the labels of more than 10,000 sites, we need to run a series of composite aggregations, paging through the results with the “after” parameter, because a simple aggregation is limited to a maximum of 10,000 buckets per request.
Example of a query for aggregating the possible values of a specific facet (it also complies with the user’s selection).
Query used in the /facet API endpoint (based on the user’s selection, it gets all the possible site_id values):
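The paging loop for composite aggregations could look roughly like the sketch below. It relies on the standard ES behaviour of returning an "after_key" with each page, which is passed back as "after" in the next request. The `search` callable is a stand-in for whatever client executes the request body and returns the parsed JSON response; the function and aggregation names are hypothetical.

```python
def iter_composite_buckets(search, field, page_size=1000, query=None):
    """Yield every bucket of a composite terms aggregation, page by page."""
    after = None
    while True:
        composite = {
            "size": page_size,
            "sources": [{"value": {"terms": {"field": field}}}],
        }
        if after is not None:
            composite["after"] = after       # resume after the previous page
        body = {"size": 0, "aggs": {"values": {"composite": composite}}}
        if query is not None:
            body["query"] = query            # keep the user's selection applied
        agg = search(body)["aggregations"]["values"]
        yield from agg["buckets"]
        after = agg.get("after_key")
        # ES omits after_key when there are no more buckets to return.
        if after is None or not agg["buckets"]:
            break
```

Each yielded bucket has the form `{"key": {"value": ...}, "doc_count": ...}`, so collecting all site values is a matter of reading `bucket["key"]["value"]`.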
{
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "regions",
            "query": {
              "terms": {
                "regions.uri": [
                  "http://linked.data.gov.au/dataset/asgs2016/stateorterritory/3"
                ]
              }
            }
          }
        },
        {
          "terms": {
            "dataset.value": [
              "http://linked.data.gov.au/dataset/ausplots"
            ]
          }
        },
        {
          "terms": {
            "feature_type.value": [
              "http://linked.data.gov.au/def/tern-cv/60d7edf8-98c6-43e9-841c-e176c334d270"
            ]
          }
        },
        {
          "terms": {
            "observed_property.value": [
              "http://linked.data.gov.au/def/tern-cv/09296da0-c645-4165-950c-780c21b3c140"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "value": {
      "terms": {
        "field": "site_id.value",
        "size": 200
      }
    }
  },
  "size": 0
}
Example of aggregating a nested field (region_type):
Note that regions.dataset in ES corresponds to region_type.
{
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "regions",
            "query": {
              "terms": {
                "regions.uri": [
                  "http://linked.data.gov.au/dataset/asgs2016/stateorterritory/3"
                ]
              }
            }
          }
        },
        {
          "terms": {
            "dataset.value": [
              "http://linked.data.gov.au/dataset/ausplots"
            ]
          }
        },
        {
          "terms": {
            "feature_type.value": [
              "http://linked.data.gov.au/def/tern-cv/60d7edf8-98c6-43e9-841c-e176c334d270"
            ]
          }
        },
        {
          "terms": {
            "observed_property.value": [
              "http://linked.data.gov.au/def/tern-cv/09296da0-c645-4165-950c-780c21b3c140"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "nested_agg": {
      "nested": {
        "path": "regions"
      },
      "aggs": {
        "value": {
          "terms": {
            "field": "regions.dataset.uri",
            "size": 1000
          }
        }
      }
    }
  },
  "size": 0
}
Labels aggregations
Note that “labels” queries are slow, but they are executed only once and then stored in the browser store, so they are not the source of the performance problem.
Ecoplots API
The Ecoplots UI renders results and triggers new searches and facet updates through the Ecoplots API (https://ecoplots-test.tern.org.au/api/v1.0/ui):
The core endpoints used by the UI are:
/data: based on the user’s selection in the facets menu, it requests a paginated set of observations from the API (documents in ES, displayed as rows in the datagrid).
{ "sorting":[], "query":{ "dataset":[ "http://linked.data.gov.au/dataset/ausplots" ] }, "page_size":50, "page_num":1 }
/facet: based on user’s selection in the facets menu, it requests the lists of options that fit the filtering to populate all the facets (combo-boxes in the UI)
{
  "query": {
    "dataset": [
      "http://linked.data.gov.au/dataset/ausplots"
    ]
  }
}
Notice that all API request queries use the “value” of the selected option, instead of the label shown.
Using the request parameters above, the API generates and executes one or more ES queries and returns the ES response, which is then processed in the UI.
Data endpoint is converted into only 1 ES query:
[2021-09-06 14:57:38,213] DEBUG in data: {'from': 0, 'size': 50, 'track_total_hits': True}
127.0.0.1 - - [06/Sep/2021 14:57:38] "POST /api/v1.0/data HTTP/1.1" 200
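The pagination part of this translation can be sketched as follows. It assumes page_num is 1-based (consistent with page_num 1 producing 'from': 0 in the log) and, as a simplification, that every facet key in the request maps to a `<key>.value` field, which is not true for nested filters such as regions. Sorting is ignored. The function name is hypothetical and this is not the API's actual code.

```python
def build_data_query(request):
    """Translate a /data request body into a single ES search body."""
    page_size = request.get("page_size", 50)
    page_num = request.get("page_num", 1)
    body = {
        "from": (page_num - 1) * page_size,   # offset of the first document
        "size": page_size,
        "track_total_hits": True,             # exact total for the datagrid
    }
    # Simplification: one terms filter per selected facet on <facet>.value.
    filters = [
        {"terms": {f"{facet}.value": values}}
        for facet, values in request.get("query", {}).items()
        if values
    ]
    if filters:
        body["query"] = {"bool": {"filter": filters}}
    return body
```

With an empty selection, page_size 50 and page_num 1, this produces the same body as the logged query above.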
Facet endpoint is converted into many ES queries:
[2021-09-06 14:57:38,295] DEBUG in facet: {'aggs': {'nested_agg': {'nested': {'path': 'regions'}, 'aggs': {'value': {'terms': {'field': 'regions.dataset.uri', 'size': 1000}}}}}, 'size': 0}
[2021-09-06 14:57:38,346] DEBUG in facet: {'aggs': {'value': {'terms': {'field': 'dataset.value', 'size': 200}}}, 'size': 0}
[2021-09-06 14:57:38,403] DEBUG in facet: {'aggs': {'value': {'terms': {'field': 'feature_type.value', 'size': 200}}}, 'size': 0}
[2021-09-06 14:57:38,452] DEBUG in facet: {'aggs': {'value': {'terms': {'field': 'observed_property.value', 'size': 200}}}, 'size': 0}
[2021-09-06 14:57:38,509] DEBUG in facet: {'aggs': {'min_date': {'min': {'field': 'site_visit_date.value'}}, 'max_date': {'max': {'field': 'site_visit_date.value'}}}, 'size': 0}
127.0.0.1 - - [06/Sep/2021 14:57:38] "POST /api/v1.0/facet HTTP/1.1" 200 -
The number of internal ES queries increases when some facets are already selected. For example, once the user has selected a dataset, a new request is triggered, and this time an aggregation to get the site_ids is also executed.
This behaviour can easily be observed through the UI.
[2021-09-06 15:03:17,240] DEBUG in facet: {'query': {'bool': {'filter': [{'terms': {'dataset.value': ['http://linked.data.gov.au/dataset/ausplots']}}]}}, 'aggs': {'nested_agg': {'nested': {'path': 'regions'}, 'aggs': {'value': {'terms': {'field': 'regions.dataset.uri', 'size': 1000}}}}}, 'size': 0}
[2021-09-06 15:03:17,296] DEBUG in facet: {'aggs': {'value': {'terms': {'field': 'dataset.value', 'size': 200}}}, 'size': 0}
[2021-09-06 15:03:17,346] DEBUG in facet: {'query': {'bool': {'filter': [{'terms': {'dataset.value': ['http://linked.data.gov.au/dataset/ausplots']}}]}}, 'aggs': {'value': {'terms': {'field': 'site_id.value', 'size': 200}}}, 'size': 0}
[2021-09-06 15:03:17,440] DEBUG in facet: {'query': {'bool': {'filter': [{'terms': {'dataset.value': ['http://linked.data.gov.au/dataset/ausplots']}}]}}, 'aggs': {'value': {'terms': {'field': 'feature_type.value', 'size': 200}}}, 'size': 0}
[2021-09-06 15:03:17,489] DEBUG in facet: {'query': {'bool': {'filter': [{'terms': {'dataset.value': ['http://linked.data.gov.au/dataset/ausplots']}}]}}, 'aggs': {'value': {'terms': {'field': 'observed_property.value', 'size': 200}}}, 'size': 0}
[2021-09-06 15:03:17,537] DEBUG in facet: {'query': {'bool': {'filter': [{'terms': {'dataset.value': ['http://linked.data.gov.au/dataset/ausplots']}}]}}, 'aggs': {'min_date': {'min': {'field': 'site_visit_date.value'}}, 'max_date': {'max': {'field': 'site_visit_date.value'}}}, 'size': 0}
127.0.0.1 - - [06/Sep/2021 15:03:17] "POST /api/v1.0/facet HTTP/1.1" 200 -
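The pattern visible in the two sets of logs can be reproduced with the sketch below. It is reverse-engineered from the logged queries and is not the API's actual code: function names are hypothetical, only simple `<facet>.value` filters are handled (nested filters such as regions are left out), and excluding a facet's own filter from its own aggregation is confirmed by the logs only for the dataset facet and assumed for the others.

```python
def _filters(selection, exclude=None):
    """One terms filter per selected facet, optionally skipping one facet."""
    return [
        {"terms": {f"{facet}.value": values}}
        for facet, values in selection.items()
        if values and facet != exclude
    ]


def _body(aggs, filters):
    body = {"aggs": aggs, "size": 0}      # size 0: aggregations only, no hits
    if filters:
        body["query"] = {"bool": {"filter": filters}}
    return body


def _terms(field, size=200):
    return {"value": {"terms": {"field": field, "size": size}}}


def build_facet_queries(selection):
    """Return the list of ES bodies needed to populate the facets."""
    region_types = {
        "nested_agg": {
            "nested": {"path": "regions"},
            "aggs": _terms("regions.dataset.uri", 1000),
        }
    }
    queries = [
        _body(region_types, _filters(selection)),
        # The dataset facet ignores its own filter so all datasets stay listed.
        _body(_terms("dataset.value"), _filters(selection, exclude="dataset")),
    ]
    if selection.get("dataset"):
        # The site_id facet only appears once a dataset has been selected.
        queries.append(_body(_terms("site_id.value"), _filters(selection)))
    for facet in ("feature_type", "observed_property"):
        queries.append(_body(_terms(f"{facet}.value"), _filters(selection, exclude=facet)))
    dates = {
        "min_date": {"min": {"field": "site_visit_date.value"}},
        "max_date": {"max": {"field": "site_visit_date.value"}},
    }
    queries.append(_body(dates, _filters(selection)))
    return queries
```

Calling `build_facet_queries({})` yields five bodies, and `build_facet_queries({"dataset": ["http://linked.data.gov.au/dataset/ausplots"]})` yields six, matching the logged requests.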