Mapping and querying nested sites (or any other nested field) in Elasticsearch

As a side effect of the TERN ontology data model, which allows sites and samples (plots, transects, quadrants, etc.) to be nested to arbitrary depth, Elasticsearch documents containing observations/sites need a specific structure so that they can be queried and aggregated to build the faceted search.

Examples of nested sites in the data model:
https://ternaus.atlassian.net/wiki/spaces/DAWE/pages/2217771011

A further design requirement is that the Ecoplots facets first show the “top parent site” facet. Once the user has selected a site, they may then select a site nested within it (e.g. another site, a transect or a quadrant), and so on.
RDF data for observations only contains the proximate site, i.e. the closest one in the hierarchy, so specific SPARQL queries are needed during indexing to discover the whole tree.

The proposed solution is based on storing, for every document, the chain of nested sites, so that every site level in the hierarchy can be queried independently to build a facet. The tricky part is how to store the “tree/hierarchy” of sites in ES.
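As a minimal sketch of how the chain could be assembled at indexing time, assume the SPARQL step has already produced a child-to-parent lookup. The function `build_site_chain` and the `parent_of` dictionary are hypothetical names used only for illustration, and the hierarchy is assumed to be a tree with no cycles.

```python
from typing import Dict, Iterable, List, Optional


def build_site_chain(
    proximate_site_ids: Iterable[str],
    parent_of: Dict[str, Optional[str]],
) -> List[dict]:
    """Return one entry per site on the path from each proximate site to its root."""
    entries: Dict[str, dict] = {}

    def depth_of(site_id: str) -> int:
        # Depth is the number of ancestors; a top parent site has depth 0.
        # Assumes the hierarchy has no cycles.
        depth = 0
        parent = parent_of.get(site_id)
        while parent is not None:
            depth += 1
            parent = parent_of.get(parent)
        return depth

    for site_id in proximate_site_ids:
        current: Optional[str] = site_id
        # Walk upwards, stopping early at ancestors that were already collected.
        while current is not None and current not in entries:
            entries[current] = {
                "site_id": current,
                "depth": str(depth_of(current)),  # stored as a string, as in the example docs
                "parent_site_id": parent_of.get(current),
            }
            current = parent_of.get(current)
    return list(entries.values())


# Hypothetical parent lookup, as it might be returned by the SPARQL queries
parent_of = {
    "1": None,
    "1-1": "1",
    "1-2": "1",
    "1-1-1": "1-1",
    "1-2-1": "1-2",
    "1-2-2": "1-2",
}
chain = build_site_chain(["1-1-1", "1-2-1", "1-2-2"], parent_of)
```

Each shared ancestor is emitted once per document, so the list stays as small as the part of the tree the observation actually touches.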

Proposed example docs (where “sites” is an ES nested field):

{
  "title": "obs1",
  "sites": [
    {
      "site_id": "1-1-1",
      "depth": "2",
      "parent_site_id": "1-1"
    },
    {
      "site_id": "1-2-1",
      "depth": "2",
      "parent_site_id": "1-2"
    },
    {
      "site_id": "1-2-2",
      "depth": "2",
      "parent_site_id": "1-2"
    },
    {
      "site_id": "1-1",
      "depth": "1",
      "parent_site_id": "1"
    },
    {
      "site_id": "1-2",
      "depth": "1",
      "parent_site_id": "1"
    },
    {
      "site_id": "1",
      "depth": "0",
      "parent_site_id": null
    }
  ]
}

{
  "title": "obs2",
  "sites": [
    {
      "site_id": "2-1",
      "depth": "1",
      "parent_site_id": "2"
    },
    {
      "site_id": "2",
      "depth": "0",
      "parent_site_id": null
    }
  ]
}
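The index mapping is not shown here, so the following is only a sketch of what it might look like, written as a Python dictionary that could be sent as the body of an index-creation request. The field types are assumptions: `keyword` is chosen because the queries use exact `term` filters and a `terms` aggregation on these fields. The `nested` type matters because a plain object array is flattened by Elasticsearch, which would let `depth` from one site entry match together with `parent_site_id` from another.

```python
import json

# Hypothetical mapping: the field types are assumptions, not taken from the TERN index.
SITES_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "keyword"},
            "sites": {
                # "nested" keeps each site entry as its own hidden document,
                # so filters on depth and parent_site_id apply to the same entry.
                "type": "nested",
                "properties": {
                    "site_id": {"type": "keyword"},
                    "depth": {"type": "keyword"},
                    "parent_site_id": {"type": "keyword"},
                },
            },
        }
    }
}

print(json.dumps(SITES_MAPPING, indent=2))
```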


ES query example 1:

This query would be used to generate the first site facet, containing only the top parent sites (depth=0).

{
  "aggs": {
    "agg1": {
      "nested": {
        "path": "sites"
      },
      "aggs": {
        "agg2": {
          "filter": {
            "bool": {
              "filter": [
                {
                  "term": {
                    "sites.depth": "0"
                  }
                }
              ]
            }
          },
          "aggs": {
            "value": {
              "terms": {
                "field": "sites.site_id",
                "size": 1000
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}

Result (the bucket keys "1" and "2" are Site 1 and Site 2):

{
  "took": 44,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "agg1": {
      "doc_count": 8,
      "agg2": {
        "doc_count": 2,
        "value": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "1",
              "doc_count": 1
            },
            {
              "key": "2",
              "doc_count": 1
            }
          ]
        }
      }
    }
  }
}
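To turn a response like this into facet entries, the buckets can be read from the `agg1` > `agg2` > `value` path. The helper `facet_buckets` below is a hypothetical name, and the sketch assumes the aggregation names used in the query above.

```python
from typing import Dict, List, Tuple


def facet_buckets(response: Dict) -> List[Tuple[str, int]]:
    """Return (site_id, doc_count) pairs from a response shaped like the result above."""
    buckets = response["aggregations"]["agg1"]["agg2"]["value"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Trimmed copy of the result above
example_response = {
    "aggregations": {
        "agg1": {
            "doc_count": 8,
            "agg2": {
                "doc_count": 2,
                "value": {
                    "buckets": [
                        {"key": "1", "doc_count": 1},
                        {"key": "2", "doc_count": 1},
                    ]
                },
            },
        }
    }
}

print(facet_buckets(example_response))  # [('1', 1), ('2', 1)]
```

Note that inside a nested aggregation, `doc_count` counts nested site entries rather than observation documents. That is why `agg1` reports 8 (six entries in obs1 plus two in obs2) while `hits.total` is 2.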

ES query example 2:

This query would be used to generate the second site facet, once the user has selected a depth=0 site.

{
  "aggs": {
    "agg1": {
      "nested": {
        "path": "sites"
      },
      "aggs": {
        "agg2": {
          "filter": {
            "bool": {
              "filter": [
                {
                  "term": {
                    "sites.depth": "1"
                  }
                },
                {
                  "term": {
                    "sites.parent_site_id": "1"
                  }
                }
              ]
            }
          },
          "aggs": {
            "value": {
              "terms": {
                "field": "sites.site_id",
                "size": 1000
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}

Result:

{
  "took": 43,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "agg1": {
      "doc_count": 8,
      "agg2": {
        "doc_count": 2,
        "value": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "1-1",
              "doc_count": 1
            },
            {
              "key": "1-2",
              "doc_count": 1
            }
          ]
        }
      }
    }
  }
}

The same pattern can be repeated for each deeper level, so sites nested to any depth can be queried.
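Since only the depth and the selected parent change between levels, the request body could be generated by a small helper. This is a sketch under that assumption: `site_facet_query` is a hypothetical function name, and the aggregation names (`agg1`, `agg2`, `value`) simply mirror the examples above.

```python
from typing import Optional


def site_facet_query(
    depth: int,
    parent_site_id: Optional[str] = None,
    size: int = 1000,
) -> dict:
    """Build the facet aggregation for one level of the site hierarchy."""
    # depth is stored as a string in the example docs, so it is filtered as a string.
    filters = [{"term": {"sites.depth": str(depth)}}]
    if parent_site_id is not None:
        # Restrict the facet to children of the site the user selected.
        filters.append({"term": {"sites.parent_site_id": parent_site_id}})
    return {
        "aggs": {
            "agg1": {
                "nested": {"path": "sites"},
                "aggs": {
                    "agg2": {
                        "filter": {"bool": {"filter": filters}},
                        "aggs": {
                            "value": {
                                "terms": {"field": "sites.site_id", "size": size}
                            }
                        },
                    }
                },
            }
        },
        "size": 0,
    }


top_level = site_facet_query(0)           # same body as query example 1
second_level = site_facet_query(1, "1")   # same body as query example 2
third_level = site_facet_query(2, "1-2")  # children of site 1-2
```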

Here is how it looks with real data:

https://es-test.tern.org.au/plotdata_ecoplots_sitess-data/_search

{
  "aggs": {
    "agg1": {
      "nested": {
        "path": "sites"
      },
      "aggs": {
        "agg2": {
          "filter": {
            "bool": {
              "filter": [
                {
                  "term": {
                    "sites.depth": "1"
                  }
                },
                {
                  "term": {
                    "sites.parent_site_id": "http://linked.data.gov.au/dataset/ausplots/site-satkan0001"
                  }
                }
              ]
            }
          },
          "aggs": {
            "value": {
              "terms": {
                "field": "sites.site_id",
                "size": 1000
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}

Result:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "agg1": {
      "doc_count": 3232871,
      "agg2": {
        "doc_count": 9981,
        "value": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "http://linked.data.gov.au/dataset/ausplots/site-satkan0001-transect-n2-s2",
              "doc_count": 1153
            },
            {
              "key": "http://linked.data.gov.au/dataset/ausplots/site-satkan0001-transect-s3-n3",
              "doc_count": 1126
            },
            {
              "key": "http://linked.data.gov.au/dataset/ausplots/site-satkan0001-transect-s1-n1",
              "doc_count": 1042
            },
            {
              "key": "http://linked.data.gov.au/dataset/ausplots/site-satkan0001-transect-w1-e1",
              "doc_count": 1023
            },
            {
              "key": "http://linked.data.gov.au/dataset/ausplots/site-satkan0001-transect-w3-e3",
              "doc_count": 944
            },
            {
              "key": "http://linked.data.gov.au/dataset/ausplots/site-satkan0001-transect-w5-e5",
              "doc_count": 853
            },
            {
              "key": "http://linked.data.gov.au/dataset/ausplots/site-satkan0001-transect-e4-w4",
              "doc_count": 799
            },
            {
              "key": "http://linked.data.gov.au/dataset/ausplots/site-satkan0001-transect-e2-w2",
              "doc_count": 618
            },
            {
              "key": "http://linked.data.gov.au/dataset/ausplots/site-satkan0001-transect-n4-s4",
              "doc_count": 531
            },
            {
              "key": "http://linked.data.gov.au/dataset/ausplots/site-satkan0001-transect-s5-n5",
              "doc_count": 494
            },
            {
              "key": "http://linked.data.gov.au/dataset/ausplots/site-satkan0001-transect-s4-n4",
              "doc_count": 488
            },
            {
              "key": "http://linked.data.gov.au/dataset/ausplots/site-satkan0001-transect-n5-s5",
              "doc_count": 465
            },
            {
              "key": "http://linked.data.gov.au/dataset/ausplots/site-satkan0001-transect-w2-e2",
              "doc_count": 445
            }
          ]
        }
      }
    }
  }
}