Mapping and querying nested sites (or any other nested field) in Elasticsearch
As a consequence of the TERN ontology data model, in which sites/samples (plots, transects, quadrats, etc.) can be nested to any depth, Elasticsearch documents containing observations/sites need a specific structure so that they can be queried and aggregated to build the faceted search.
Examples of nested sites in the data model:
Modelling sites, transects and quadrats
Another requirement for the design is that, in the Ecoplots facets, we first need to show the “top parent site” facet. Then, once the user has selected a site, they may select a site nested within it (e.g. another site, a transect or a quadrat), and so on.
RDF data for observations only contains the proximate site, i.e. the closest one in the hierarchy, so specific SPARQL queries are needed during indexing to discover the whole tree.
The proposed solution is based on storing, for every document, the chain of nested sites, so that every site level in the hierarchy can be queried independently to build a facet. The tricky part is how to store the “tree/hierarchy” of sites in ES.
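As a rough illustration of the indexing step, the sketch below builds the chain of nested sites for one observation. It assumes the SPARQL queries have already produced a child-to-parent lookup; the names `build_site_chain` and `parent_of` are hypothetical and not part of the existing indexer.

```python
def build_site_chain(proximate_site_ids, parent_of):
    """Return the 'sites' entries for the given sites and all their ancestors.

    parent_of is a hypothetical dict mapping site_id -> parent site_id
    (None for a top parent site), assumed to come from the SPARQL step.
    """
    def path_to_root(site_id):
        # Walk upwards: [site, parent, ..., top parent site].
        path, seen, current = [], set(), site_id
        while current is not None:
            if current in seen:
                raise ValueError(f"cycle detected at site {current!r}")
            seen.add(current)
            path.append(current)
            current = parent_of.get(current)
        return path

    entries = {}
    for site_id in proximate_site_ids:
        # Depth is counted from the top parent site (depth 0).
        for depth, sid in enumerate(reversed(path_to_root(site_id))):
            entries[sid] = {
                "site_id": sid,
                "depth": str(depth),  # stored as a string, as in the example docs
                "parent_site_id": parent_of.get(sid),
            }
    # Deepest sites first, then by id, so the output order is deterministic.
    return sorted(entries.values(), key=lambda e: (-int(e["depth"]), e["site_id"]))


parent_of = {"1-1-1": "1-1", "1-2-1": "1-2", "1-2-2": "1-2",
             "1-1": "1", "1-2": "1", "1": None}
chain = build_site_chain(["1-1-1", "1-2-1", "1-2-2"], parent_of)
```

Each ancestor appears only once in the chain, even when several proximate sites share it.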
Proposed example docs (where “sites” is an ES nested field):
{
  "title": "obs1",
  "sites": [
    {
      "site_id": "1-1-1",
      "depth": "2",
      "parent_site_id": "1-1"
    },
    {
      "site_id": "1-2-1",
      "depth": "2",
      "parent_site_id": "1-2"
    },
    {
      "site_id": "1-2-2",
      "depth": "2",
      "parent_site_id": "1-2"
    },
    {
      "site_id": "1-1",
      "depth": "1",
      "parent_site_id": "1"
    },
    {
      "site_id": "1-2",
      "depth": "1",
      "parent_site_id": "1"
    },
    {
      "site_id": "1",
      "depth": "0",
      "parent_site_id": null
    }
  ]
}
{
  "title": "obs2",
  "sites": [
    {
      "site_id": "2-1",
      "depth": "1",
      "parent_site_id": "2"
    },
    {
      "site_id": "2",
      "depth": "0",
      "parent_site_id": null
    }
  ]
}
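The index mapping itself is not shown here. A minimal sketch of what it might look like, written as a Python dict that could be sent as the index-creation body, follows. The `keyword` types are an assumption, chosen because the example docs store `depth` as a string and the facet queries use exact `term` filters; the type of `title` is likewise a guess.

```python
import json

# Assumed mapping, not taken from the real index: "sites" is declared as a
# nested field so that each site entry is indexed as its own hidden document
# and its site_id/depth/parent_site_id values stay tied together.
SITES_INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "keyword"},  # assumption; could also be "text"
            "sites": {
                "type": "nested",
                "properties": {
                    "site_id": {"type": "keyword"},
                    "depth": {"type": "keyword"},
                    "parent_site_id": {"type": "keyword"},
                },
            },
        }
    }
}

print(json.dumps(SITES_INDEX_MAPPING, indent=2))
```

Without the `nested` type, Elasticsearch would flatten the array of objects, and a filter such as depth "0" could no longer be tied to a specific `site_id`.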
ES query example 1:
This query would be used to generate the first site facet, containing only the top parent sites (depth=0).
{
  "aggs": {
    "agg1": {
      "nested": {
        "path": "sites"
      },
      "aggs": {
        "agg2": {
          "filter": {
            "bool": {
              "filter": [
                {
                  "term": {
                    "sites.depth": "0"
                  }
                }
              ]
            }
          },
          "aggs": {
            "value": {
              "terms": {
                "field": "sites.site_id",
                "size": 1000
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}
Result:
{
  "took": 44,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "agg1": {
      "doc_count": 8,
      "agg2": {
        "doc_count": 2,
        "value": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "1", # Site 1
              "doc_count": 1
            },
            {
              "key": "2", # Site 2
              "doc_count": 1
            }
          ]
        }
      }
    }
  }
}
ES query example 2:
This query would be used to generate the second site facet, once the user has selected a depth=0 site.
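The query body for this example is missing from this page. One plausible shape, sketched here as an assumption, reuses the structure of example 1 and swaps the depth filter for a filter on `sites.parent_site_id`. The helper name `build_site_facet_query` is hypothetical.

```python
def build_site_facet_query(parent_site_id=None, size=1000):
    """Build the aggregation body for one level of the site facet.

    With no parent selected, this reproduces the depth=0 filter of example 1.
    With a selected parent, it keeps only the sites directly nested in it
    (an assumed approach, since the original query is not shown).
    """
    if parent_site_id is None:
        level_filter = {"term": {"sites.depth": "0"}}
    else:
        level_filter = {"term": {"sites.parent_site_id": parent_site_id}}

    return {
        "aggs": {
            "agg1": {
                "nested": {"path": "sites"},
                "aggs": {
                    "agg2": {
                        "filter": {"bool": {"filter": [level_filter]}},
                        "aggs": {
                            "value": {
                                "terms": {"field": "sites.site_id", "size": size}
                            }
                        },
                    }
                },
            }
        },
        "size": 0,
    }


# Facet shown after the user has selected the depth=0 site "1".
second_facet_query = build_site_facet_query("1")
```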
Result:
The same pattern can be followed to query sites at any depth.
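To make the level-by-level pattern concrete, the sketch below imitates the nested, filter and terms aggregation in plain Python over the two example docs, without contacting Elasticsearch. It assumes obs2 uses the same field names as obs1 (`sites`, `parent_site_id`). The functions `facet_buckets` and `drill_down` are hypothetical helpers for illustration only.

```python
from collections import Counter

# The two example docs, with obs2 assumed to use the same field names as obs1.
DOCS = [
    {"title": "obs1", "sites": [
        {"site_id": "1-1-1", "depth": "2", "parent_site_id": "1-1"},
        {"site_id": "1-2-1", "depth": "2", "parent_site_id": "1-2"},
        {"site_id": "1-2-2", "depth": "2", "parent_site_id": "1-2"},
        {"site_id": "1-1", "depth": "1", "parent_site_id": "1"},
        {"site_id": "1-2", "depth": "1", "parent_site_id": "1"},
        {"site_id": "1", "depth": "0", "parent_site_id": None},
    ]},
    {"title": "obs2", "sites": [
        {"site_id": "2-1", "depth": "1", "parent_site_id": "2"},
        {"site_id": "2", "depth": "0", "parent_site_id": None},
    ]},
]


def facet_buckets(docs, parent_site_id=None):
    """Count nested site entries per site_id for one facet level."""
    counts = Counter()
    for doc in docs:
        for site in doc["sites"]:
            if parent_site_id is None:
                matches = site["depth"] == "0"  # top parent sites
            else:
                matches = site["parent_site_id"] == parent_site_id
            if matches:
                counts[site["site_id"]] += 1
    return dict(counts)


def drill_down(docs, selected_path):
    """Return the facet shown at each step as the user selects sites in turn."""
    levels = [facet_buckets(docs)]
    for site_id in selected_path:
        levels.append(facet_buckets(docs, site_id))
    return levels


levels = drill_down(DOCS, ["1", "1-2"])
```

The first level matches the buckets returned by query example 1 (sites "1" and "2"), and each further selection narrows the facet to the direct children of the selected site.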
How it looks with real data:
https://es-test.tern.org.au/plotdata_ecoplots_sitess-data/_search