Copy of Summary of recommendations/changes in Mapping

Following some of the recommendations presented on Monday 25th, here are some stats to help decide how much the changes improve ES performance.

https://docs.google.com/document/d/11rdMT-ZoFpOmJY4ND1ujb5kwyiSZCMro7_47QbkiY9U


Change the mapping for the “regions” field: get rid of the “nested” type, use a plain “keyword” field instead, and use “regex” filtering in aggregations.

Changes implemented

New mapping for regions:

    "regions": {"type": "keyword"},
    "region_types": {"type": "keyword"},

Data before:

    "regions": [
      {
        "uri": "http://linked.data.gov.au/dataset/asgs2016/stateorterritory/5",
        "label": "Western Australia",
        "dataset": {
          "uri": "http://linked.data.gov.au/dataset/asgs2016/stateorterritory",
          "label": "States and territories"
        }
      },
      {
        "uri": "http://linked.data.gov.au/dataset/wwf-terr-ecoregions/14110",
        "label": "Southwest Australia savanna",
        "dataset": {
          "uri": "http://linked.data.gov.au/dataset/wwf-terr-ecoregions",
          "label": "WWF ecoregions"
        }
      },
      {
        "uri": "http://linked.data.gov.au/dataset/local-gov-areas-2011/56790",
        "label": "Northampton (S)",
        "dataset": {
          "uri": "http://linked.data.gov.au/dataset/local-gov-areas-2011",
          "label": "Local government areas"
        }
      },
      {
        "uri": "http://linked.data.gov.au/dataset/nrm-2017/5010",
        "label": "Northern Agricultural Region",
        "dataset": {
          "uri": "http://linked.data.gov.au/dataset/nrm-2017",
          "label": "NRM regions"
        }
      },
      {
        "uri": "http://linked.data.gov.au/dataset/capad-2018-terrestrial/BHA_26",
        "label": "Eurardy",
        "dataset": {
          "uri": "http://linked.data.gov.au/dataset/capad-2018-terrestrial",
          "label": "Terrestrial CAPAD regions"
        }
      },
      {
        "uri": "http://linked.data.gov.au/dataset/bioregion/GES01",
        "label": "Geraldton Hills",
        "dataset": {
          "uri": "http://linked.data.gov.au/dataset/bioregion",
          "label": "Subregions"
        }
      },
      {
        "uri": "http://linked.data.gov.au/dataset/bioregion/GES",
        "label": "Geraldton Sandplains",
        "dataset": {
          "uri": "http://linked.data.gov.au/dataset/bioregion/IBRA7",
          "label": "Bioregions"
        }
      }
    ],

Data after:

    "regions": [
      "http://linked.data.gov.au/dataset/asgs2016/stateorterritory|http://linked.data.gov.au/dataset/asgs2016/stateorterritory/5",
      "http://linked.data.gov.au/dataset/wwf-terr-ecoregions|http://linked.data.gov.au/dataset/wwf-terr-ecoregions/14110",
      "http://linked.data.gov.au/dataset/local-gov-areas-2011|http://linked.data.gov.au/dataset/local-gov-areas-2011/56790",
      "http://linked.data.gov.au/dataset/nrm-2017|http://linked.data.gov.au/dataset/nrm-2017/5010",
      "http://linked.data.gov.au/dataset/capad-2018-terrestrial|http://linked.data.gov.au/dataset/capad-2018-terrestrial/BHA_26",
      "http://linked.data.gov.au/dataset/bioregion|http://linked.data.gov.au/dataset/bioregion/GES01",
      "http://linked.data.gov.au/dataset/bioregion/IBRA7|http://linked.data.gov.au/dataset/bioregion/GES"
    ],
    "region_types": [
      "http://linked.data.gov.au/dataset/asgs2016/stateorterritory",
      "http://linked.data.gov.au/dataset/wwf-terr-ecoregions",
      "http://linked.data.gov.au/dataset/local-gov-areas-2011",
      "http://linked.data.gov.au/dataset/nrm-2017",
      "http://linked.data.gov.au/dataset/capad-2018-terrestrial",
      "http://linked.data.gov.au/dataset/bioregion",
      "http://linked.data.gov.au/dataset/bioregion/IBRA7"
    ]
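
The before/after shapes suggest a simple per-document transformation: each nested region becomes one "dataset URI|region URI" string, and the dataset URIs are collected separately. A minimal Python sketch of that conversion follows. The function name flatten_regions is hypothetical, and skipping duplicate dataset URIs is an assumption, not something the data above confirms. Note that the labels are not carried over, as in the "Data after" example.

```python
def flatten_regions(nested_regions):
    """Convert nested "regions" objects into the flat keyword form.

    Each region becomes "<dataset uri>|<region uri>". Dataset URIs are
    collected in first-seen order as "region_types" (duplicates skipped,
    which is an assumption about the intended behaviour).
    """
    regions = []
    region_types = []
    for region in nested_regions:
        dataset_uri = region["dataset"]["uri"]
        regions.append(f"{dataset_uri}|{region['uri']}")
        if dataset_uri not in region_types:
            region_types.append(dataset_uri)
    return {"regions": regions, "region_types": region_types}


# Two of the regions from the "Data before" example
before = [
    {
        "uri": "http://linked.data.gov.au/dataset/asgs2016/stateorterritory/5",
        "label": "Western Australia",
        "dataset": {
            "uri": "http://linked.data.gov.au/dataset/asgs2016/stateorterritory",
            "label": "States and territories",
        },
    },
    {
        "uri": "http://linked.data.gov.au/dataset/bioregion/GES",
        "label": "Geraldton Sandplains",
        "dataset": {
            "uri": "http://linked.data.gov.au/dataset/bioregion/IBRA7",
            "label": "Bioregions",
        },
    },
]

after = flatten_regions(before)
```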


Index size stats

Approach    | No. docs  | No. hidden docs | Docs increase
------------|-----------|-----------------|--------------
Nested docs | 2,563,630 | 27,284,158      | x10.64278
Keyword     | 2,563,630 | 10,292,406      | x4.014778

The new index is ~2.65 times smaller in terms of number of documents (27,284,158 / 10,292,406 ≈ 2.65).
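
The ratios above can be re-derived with a few lines of arithmetic. The sketch below only uses the counts from the index size stats; the variable names are illustrative.

```python
# Counts taken from the index size stats above
docs = 2_563_630                  # "No docs" (same for both approaches)
nested_hidden_docs = 27_284_158   # "No hidden docs", nested approach
keyword_hidden_docs = 10_292_406  # "No hidden docs", keyword approach

# "Docs increase" column: hidden docs relative to docs
nested_increase = nested_hidden_docs / docs
keyword_increase = keyword_hidden_docs / docs

# Ratio between the two approaches (the ~2.65 figure)
nested_vs_keyword = nested_hidden_docs / keyword_hidden_docs

print(f"{nested_increase:.2f} {keyword_increase:.2f} {nested_vs_keyword:.2f}")
# prints "10.64 4.01 2.65"
```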


ES Queries

Old queries for regions aggregation:

    {
      "aggs": {
        "nested_agg": {
          "nested": {
            "path": "regions"
          },
          "aggs": {
            "value": {
              "terms": {
                "field": "regions.dataset.uri",
                "size": 1000
              }
            }
          }
        }
      },
      "size": 0
    }

    {
      "aggs": {
        "nested_agg": {
          "nested": {
            "path": "regions"
          },
          "aggs": {
            "filtering": {
              "filter": {
                "term": {
                  "regions.dataset.uri": "http://linked.data.gov.au/dataset/asgs2016/stateorterritory"
                }
              },
              "aggs": {
                "value": {
                  "terms": {
                    "field": "regions.uri",
                    "size": 1000
                  }
                }
              }
            }
          }
        }
      },
      "size": 0
    }

New queries:

    {
      "aggs": {
        "regions": {
          "terms": {
            "field": "region_types"
          }
        }
      },
      "size": 0,
      "track_total_hits": true
    }

    {
      "aggs": {
        "regions": {
          "terms": {
            "field": "regions",
            "include": "http://linked.data.gov.au/dataset/bioregion\\|.*"
          }
        }
      },
      "size": 0,
      "track_total_hits": true
    }

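
Since the second new query depends on a regex built from the region type URI, it may help to generate the "include" pattern rather than write it by hand. The following is a minimal Python sketch, not the implementation used here: the helper names are hypothetical, and the size parameter is an added assumption (the new queries above do not set it, so the terms aggregation default of 10 applies). Unlike the hand-written pattern, this sketch also escapes the dots in the URI, which makes the match slightly stricter.

```python
import json

# Characters with special meaning in Lucene regular expressions
# (standard operators plus the optional ones).
LUCENE_REGEX_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regex(text):
    """Backslash-escape every reserved regex character in text."""
    return "".join("\\" + ch if ch in LUCENE_REGEX_RESERVED else ch for ch in text)


def regions_agg_for_type(region_type_uri, size=10):
    """Build a terms aggregation over "regions" limited to one region type.

    Keys have the form "<region type uri>|<region uri>", so the pattern is
    the escaped type URI, an escaped pipe, then ".*".
    """
    include = escape_lucene_regex(region_type_uri + "|") + ".*"
    return {
        "aggs": {
            "regions": {
                "terms": {"field": "regions", "include": include, "size": size}
            }
        },
        "size": 0,
        "track_total_hits": True,
    }


def region_uri_from_key(bucket_key):
    """Recover the region URI from a "<type uri>|<region uri>" bucket key."""
    return bucket_key.split("|", 1)[1]


query = regions_agg_for_type("http://linked.data.gov.au/dataset/bioregion")
print(json.dumps(query, indent=2))
```

Because the pattern ends the type URI with an escaped pipe, it matches "http://linked.data.gov.au/dataset/bioregion|..." keys but not "http://linked.data.gov.au/dataset/bioregion/IBRA7|..." keys.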
Requests time stats