Following some of the recommendations presented on Monday 25th, here are some stats to help us decide whether the changes are worthwhile, based on how much Elasticsearch (ES) performance improves.
https://docs.google.com/document/d/11rdMT-ZoFpOmJY4ND1ujb5kwyiSZCMro7_47QbkiY9U
Region field
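Change the mapping of the “regions” field: get rid of the “nested” type and use a plain “keyword” field whose values concatenate the region type (the dataset URI) and the region URI, separated by “|”. Aggregations then use a regex (the terms “include” option) to select a region type.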
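The concatenation described above can be sketched in Python. This is a minimal sketch, not the indexing code actually used: "flatten_regions" is a hypothetical helper, and it assumes each nested region carries "uri" and "dataset.uri" as in the example record in this section, that “|” never occurs inside a URI, and that “region_types” should list each dataset URI once (the example record has no duplicates, so de-duplication is an assumption).

```python
import json


def flatten_regions(regions):
    """Turn nested region objects into the keyword layout used by the new mapping.

    Each region becomes "<dataset uri>|<region uri>"; the dataset URIs are
    collected once each, in first-seen order, into "region_types".
    Labels are dropped, as in the "Data after" example.
    """
    flat = []
    types = []
    for region in regions:
        dataset_uri = region["dataset"]["uri"]
        # Assumes "|" never appears inside a URI, so the value can be split back.
        flat.append(f"{dataset_uri}|{region['uri']}")
        if dataset_uri not in types:  # de-duplication is an assumption
            types.append(dataset_uri)
    return {"regions": flat, "region_types": types}


# Two entries taken from the example record in this section.
before = [
    {
        "uri": "http://linked.data.gov.au/dataset/asgs2016/stateorterritory/5",
        "label": "Western Australia",
        "dataset": {
            "uri": "http://linked.data.gov.au/dataset/asgs2016/stateorterritory",
            "label": "States and territories",
        },
    },
    {
        "uri": "http://linked.data.gov.au/dataset/bioregion/GES",
        "label": "Geraldton Sandplains",
        "dataset": {
            "uri": "http://linked.data.gov.au/dataset/bioregion/IBRA7",
            "label": "Bioregions",
        },
    },
]

print(json.dumps(flatten_regions(before), indent=2))
```

Keeping the dataset URI as a prefix of every value is what lets a single keyword field answer both "which region types exist" (via “region_types”) and "which regions belong to type X" (via a prefix regex on “regions”).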
Data changes
New mapping for regions:
```json
"regions": { "type": "keyword" },
"region_types": { "type": "keyword" }
```
Data before:
```json
"regions": [
  { "uri": "http://linked.data.gov.au/dataset/asgs2016/stateorterritory/5", "label": "Western Australia", "dataset": { "uri": "http://linked.data.gov.au/dataset/asgs2016/stateorterritory", "label": "States and territories" } },
  { "uri": "http://linked.data.gov.au/dataset/wwf-terr-ecoregions/14110", "label": "Southwest Australia savanna", "dataset": { "uri": "http://linked.data.gov.au/dataset/wwf-terr-ecoregions", "label": "WWF ecoregions" } },
  { "uri": "http://linked.data.gov.au/dataset/local-gov-areas-2011/56790", "label": "Northampton (S)", "dataset": { "uri": "http://linked.data.gov.au/dataset/local-gov-areas-2011", "label": "Local government areas" } },
  { "uri": "http://linked.data.gov.au/dataset/nrm-2017/5010", "label": "Northern Agricultural Region", "dataset": { "uri": "http://linked.data.gov.au/dataset/nrm-2017", "label": "NRM regions" } },
  { "uri": "http://linked.data.gov.au/dataset/capad-2018-terrestrial/BHA_26", "label": "Eurardy", "dataset": { "uri": "http://linked.data.gov.au/dataset/capad-2018-terrestrial", "label": "Terrestrial CAPAD regions" } },
  { "uri": "http://linked.data.gov.au/dataset/bioregion/GES01", "label": "Geraldton Hills", "dataset": { "uri": "http://linked.data.gov.au/dataset/bioregion", "label": "Subregions" } },
  { "uri": "http://linked.data.gov.au/dataset/bioregion/GES", "label": "Geraldton Sandplains", "dataset": { "uri": "http://linked.data.gov.au/dataset/bioregion/IBRA7", "label": "Bioregions" } }
]
```
Data after:
```json
"regions": [
  "http://linked.data.gov.au/dataset/asgs2016/stateorterritory|http://linked.data.gov.au/dataset/asgs2016/stateorterritory/5",
  "http://linked.data.gov.au/dataset/wwf-terr-ecoregions|http://linked.data.gov.au/dataset/wwf-terr-ecoregions/14110",
  "http://linked.data.gov.au/dataset/local-gov-areas-2011|http://linked.data.gov.au/dataset/local-gov-areas-2011/56790",
  "http://linked.data.gov.au/dataset/nrm-2017|http://linked.data.gov.au/dataset/nrm-2017/5010",
  "http://linked.data.gov.au/dataset/capad-2018-terrestrial|http://linked.data.gov.au/dataset/capad-2018-terrestrial/BHA_26",
  "http://linked.data.gov.au/dataset/bioregion|http://linked.data.gov.au/dataset/bioregion/GES01",
  "http://linked.data.gov.au/dataset/bioregion/IBRA7|http://linked.data.gov.au/dataset/bioregion/GES"
],
"region_types": [
  "http://linked.data.gov.au/dataset/asgs2016/stateorterritory",
  "http://linked.data.gov.au/dataset/wwf-terr-ecoregions",
  "http://linked.data.gov.au/dataset/local-gov-areas-2011",
  "http://linked.data.gov.au/dataset/nrm-2017",
  "http://linked.data.gov.au/dataset/capad-2018-terrestrial",
  "http://linked.data.gov.au/dataset/bioregion",
  "http://linked.data.gov.au/dataset/bioregion/IBRA7"
]
```
Index size stats
Approach | Top-level docs | Total docs (incl. hidden nested docs) | Total / top-level |
---|---|---|---|
Nested regions | 2,563,630 | 27,284,158 | x10.64278 |
Keyword regions | 2,563,630 | 10,292,406 | x4.014778 |
The old (nested) index is ~2.65 times bigger in terms of total number of documents (27,284,158 vs 10,292,406).
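The ratios in the table, plus one derived figure, can be checked with a few lines of Python. The last figure rests on an assumption not stated in the measurements: that the “regions” mapping is the only difference between the two indices, so the hidden docs that disappear are exactly the former nested region objects. Under that assumption an average record had about 6.6 regions (the example record above has 7).

```python
# Document counts copied from the index size table.
TOP_LEVEL_DOCS = 2_563_630
NESTED_TOTAL = 27_284_158   # nested "regions" mapping, incl. hidden docs
KEYWORD_TOTAL = 10_292_406  # keyword "regions" mapping, incl. hidden docs

nested_ratio = NESTED_TOTAL / TOP_LEVEL_DOCS
keyword_ratio = KEYWORD_TOTAL / TOP_LEVEL_DOCS
old_vs_new = NESTED_TOTAL / KEYWORD_TOTAL

# Assumption: only the "regions" mapping changed between the two indices, so
# the difference in total docs equals the number of former nested region docs.
regions_per_doc = (NESTED_TOTAL - KEYWORD_TOTAL) / TOP_LEVEL_DOCS

print(f"nested:  x{nested_ratio:.2f}")            # nested:  x10.64
print(f"keyword: x{keyword_ratio:.2f}")           # keyword: x4.01
print(f"old/new: x{old_vs_new:.2f}")              # old/new: x2.65
print(f"avg regions per record: {regions_per_doc:.2f}")  # avg regions per record: 6.63
```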
ES Queries
Old queries for the regions aggregations (first listing the region types, then the regions of a single type):
```json
{
  "aggs": {
    "nested_agg": {
      "nested": { "path": "regions" },
      "aggs": {
        "value": {
          "terms": { "field": "regions.dataset.uri", "size": 1000 }
        }
      }
    }
  },
  "size": 0
}
```
```json
{
  "aggs": {
    "nested_agg": {
      "nested": { "path": "regions" },
      "aggs": {
        "filtering": {
          "filter": {
            "term": { "regions.dataset.uri": "http://linked.data.gov.au/dataset/asgs2016/stateorterritory" }
          },
          "aggs": {
            "value": {
              "terms": { "field": "regions.uri", "size": 1000 }
            }
          }
        }
      }
    }
  },
  "size": 0
}
```
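New queries (unlike the old ones, these set “track_total_hits” and leave the terms “size” unset, so ES returns its default of 10 buckets rather than up to 1000):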
```json
{
  "aggs": {
    "regions": {
      "terms": { "field": "region_types" }
    }
  },
  "size": 0,
  "track_total_hits": true
}
```
```json
{
  "aggs": {
    "regions": {
      "terms": {
        "field": "regions",
        "include": "http://linked.data.gov.au/dataset/bioregion\\|.*"
      }
    }
  },
  "size": 0,
  "track_total_hits": true
}
```
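For other region types, the “include” pattern of the second new query can be generated rather than written by hand. The sketch below is a hypothetical helper ("regions_of_type_query"), assuming the terms “include” option takes a Lucene regular expression that must match the whole term. Unlike the hand-written query, it escapes the dots in the URI (an unescaped “.” matches any character, which is harmless here but looser than intended), and it sets “size” to 1000 to mirror the old queries; that size is an assumption, not what was benchmarked. The escaped “|” is what keeps “.../bioregion” from also matching terms of the “.../bioregion/IBRA7” type.

```python
import json
import re

# Characters treated specially by Lucene regular expressions, which is the
# syntax of the terms aggregation "include" option. Escaping all of them is
# deliberately conservative: a backslash makes the next character literal.
LUCENE_REGEX_SPECIALS = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regex(text):
    """Backslash-escape every special character so it matches literally."""
    return "".join("\\" + ch if ch in LUCENE_REGEX_SPECIALS else ch for ch in text)


def regions_of_type_query(region_type, size=1000):
    """Hypothetical helper: terms aggregation over "regions" for one region type."""
    # The escaped "|" stops ".../bioregion" from also matching ".../bioregion/IBRA7|...".
    pattern = escape_lucene_regex(region_type) + r"\|.*"
    return {
        "aggs": {
            "regions": {
                # size=1000 mirrors the old queries; an assumption, not what was tested.
                "terms": {"field": "regions", "include": pattern, "size": size}
            }
        },
        "size": 0,
        "track_total_hits": True,
    }


query = regions_of_type_query("http://linked.data.gov.au/dataset/bioregion")
print(json.dumps(query, indent=2))

# Lucene regexes must match the whole term; re.fullmatch imitates that locally.
pattern = query["aggs"]["regions"]["terms"]["include"]
print(bool(re.fullmatch(pattern, "http://linked.data.gov.au/dataset/bioregion|http://linked.data.gov.au/dataset/bioregion/GES01")))   # True
print(bool(re.fullmatch(pattern, "http://linked.data.gov.au/dataset/bioregion/IBRA7|http://linked.data.gov.au/dataset/bioregion/GES")))  # False
```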
Requests time stats
Summary Excel:
JSON files with the test results:
Postman queries (collection) → importable into Postman by anyone.
The non-nested (keyword) regions approach is slightly faster in most executions.
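To put a number on "most executions", paired runs can be compared using the “took” field (server-side time in milliseconds) that ES includes in every search response. This is a minimal sketch, assuming the result JSONs can be loaded as lists of parsed _search responses, with the i-th old run paired with the i-th new run; "compare_took" and the file names in the usage comment are hypothetical.

```python
from statistics import median


def compare_took(old_responses, new_responses):
    """Compare paired ES search responses by their "took" field (milliseconds).

    Assumes the i-th old response and the i-th new response come from the
    same run, and that each item is a parsed _search response body.
    """
    old = [r["took"] for r in old_responses]
    new = [r["took"] for r in new_responses]
    if not old or len(old) != len(new):
        raise ValueError("need the same non-zero number of old and new runs")
    # Count runs where the keyword (new) query was strictly faster.
    new_faster = sum(n < o for o, n in zip(old, new))
    return {
        "runs": len(old),
        "new_faster_runs": new_faster,
        "share_new_faster": new_faster / len(old),
        "median_old_ms": median(old),
        "median_new_ms": median(new),
    }


# Usage (hypothetical file names; adapt to the exported result JSONs):
#   import json
#   with open("old_results.json") as f: old_runs = json.load(f)
#   with open("new_results.json") as f: new_runs = json.load(f)
#   print(compare_took(old_runs, new_runs))
```

Using “took” rather than the response time shown in Postman leaves out network and client overhead, which can otherwise hide a small server-side difference.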