As of 1/11/2021, with one full dataset ingested, the total number of fields is 290.

Denormalise regions

Force Merge API

...

Code Block
...
"region_types": [
  "http://linked.data.gov.au/dataset/local-gov-areas-2011",
  "http://linked.data.gov.au/dataset/nrm-2017",
  "http://linked.data.gov.au/dataset/bioregion/IBRA7",
  "http://linked.data.gov.au/dataset/bioregion",
  "http://linked.data.gov.au/dataset/asgs2016/stateorterritory",
  "http://linked.data.gov.au/dataset/wwf-terr-ecoregions"
],
"region:local-gov-areas-2011": "http://linked.data.gov.au/dataset/local-gov-areas-2011/32250",
"region:nrm-2017": "http://linked.data.gov.au/dataset/nrm-2017/3080",
"region:bioregion/IBRA7": "http://linked.data.gov.au/dataset/bioregion/GUP",
"region:bioregion": "http://linked.data.gov.au/dataset/bioregion/GUP01",
"region:asgs2016/stateorterritory": "http://linked.data.gov.au/dataset/asgs2016/stateorterritory/3",
"region:wwf-terr-ecoregions": "http://linked.data.gov.au/dataset/wwf-terr-ecoregions/12945",
...
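
The flattened `region:*` fields above could be derived along the following lines. This is only a minimal Python sketch, not the actual indexer code: `denormalise_regions` and `DATASET_PREFIX` are hypothetical names, and the sketch assumes that each field key is the region-type URI with the shared `http://linked.data.gov.au/dataset/` prefix removed, as the example document suggests.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Assumption: every region-type URI shares this prefix (as in the example above).
DATASET_PREFIX = "http://linked.data.gov.au/dataset/"


def denormalise_regions(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Any]:
    """Flatten (region type URI, region URI) pairs into top-level document fields."""
    region_types: List[str] = []
    fields: Dict[str, Any] = {}
    for type_uri, region_uri in pairs:
        region_types.append(type_uri)
        # Field key is "region:" plus the type URI without the shared prefix.
        if type_uri.startswith(DATASET_PREFIX):
            suffix = type_uri[len(DATASET_PREFIX):]
        else:
            suffix = type_uri
        fields["region:" + suffix] = region_uri
    fields["region_types"] = region_types
    return fields


doc_fields = denormalise_regions([
    ("http://linked.data.gov.au/dataset/nrm-2017",
     "http://linked.data.gov.au/dataset/nrm-2017/3080"),
    ("http://linked.data.gov.au/dataset/bioregion/IBRA7",
     "http://linked.data.gov.au/dataset/bioregion/GUP"),
])
```

Because each region type becomes its own keyword field, a filter on one region type is a plain term query on a single field.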

ES document mapping:

Code Block
"region:asgs2016/stateorterritory" : {
  "type" : "keyword"
},
"region:bioregion" : {
  "type" : "keyword"
},
"region:bioregion/IBRA7" : {
  "type" : "keyword"
},
"region:capad-2018-terrestrial" : {
  "type" : "keyword"
},
"region:local-gov-areas-2011" : {
  "type" : "keyword"
},
"region:nrm-2017" : {
  "type" : "keyword"
},
"region:wwf-terr-ecoregions" : {
  "type" : "keyword"
},
"region_types" : {
  "type" : "keyword"
},

The mapping is generated dynamically using dynamic templates (https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html):

Code Block
"mappings" : {
  "dynamic_templates" : [
    {
      "region_as_keyword" : {
        "match" : "region:*",
        "mapping" : {
          "type" : "keyword"
        }
      }
    },
    ...
  ]
}

Force Merge API

Index segments are merged after every indexing run.

Disable refresh during indexing

Disabling index refresh makes indexing notably faster (throughput: ~1000 every two seconds).

A single refresh is performed manually after indexing. The index segments are then merged (force merge).
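
The sequence described above (disable refresh, index, refresh once, force merge) can be sketched as a list of REST calls. This is a hedged illustration in Python rather than the project's actual code: the function names and the index name are hypothetical, and restoring `refresh_interval` afterwards is an assumption that the text above does not state. The endpoints themselves (`_settings`, `_refresh`, `_forcemerge`) are standard Elasticsearch APIs.

```python
from typing import Any, Dict, List, Optional, Tuple

# (HTTP method, path, optional JSON body)
Step = Tuple[str, str, Optional[Dict[str, Any]]]


def before_indexing(index: str) -> List[Step]:
    """Calls to issue before bulk indexing: switch off periodic refresh."""
    return [
        ("PUT", f"/{index}/_settings", {"index": {"refresh_interval": "-1"}}),
    ]


def after_indexing(index: str, restore_interval: str = "1s") -> List[Step]:
    """Calls to issue once indexing has finished."""
    return [
        # One manual refresh so the new documents become searchable.
        ("POST", f"/{index}/_refresh", None),
        # Merge the index segments (force merge).
        ("POST", f"/{index}/_forcemerge", None),
        # Assumption: periodic refresh is re-enabled afterwards ("1s" is the ES default).
        ("PUT", f"/{index}/_settings", {"index": {"refresh_interval": restore_interval}}),
    ]


# "my-index" is a placeholder index name.
plan = before_indexing("my-index") + after_indexing("my-index")
```

Each step can be sent with any HTTP client; the indexing requests themselves go between the two halves of the plan.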

Dynamic mapping

To ensure that each attribute value is stored in ES with the correct datatype, dynamic templates are applied during indexing according to the following rules:

Code Block
"mappings" : {
  "dynamic_templates" : [
    {
      "region_as_keyword" : {
        "match" : "region:*",
        "mapping" : {
          "type" : "keyword"
        }
      }
    },
    {
      "attribute_field" : {
        "path_match" : "*_attr_*.attribute",
        "mapping" : {
          "type" : "keyword"
        }
      }
    },
    {
      "id_field" : {
        "path_match" : "*_attr_*.id",
        "mapping" : {
          "type" : "keyword"
        }
      }
    },
    {
      "unit_field" : {
        "path_match" : "*_attr_*.unit_of_measure",
        "mapping" : {
          "type" : "keyword"
        }
      }
    },
    {
      "value_label_field" : {
        "path_match" : "*_attr_*.value.label",
        "mapping" : {
          "type" : "text"
        }
      }
    },
    {
      "value_type_field" : {
        "path_match" : "*_attr_*.value.type",
        "mapping" : {
          "type" : "keyword"
        }
      }
    },
    {
      "value_value_field" : {
        "path_match" : "*_attr_*.value.value_float",
        "mapping" : {
          "coerce" : true,
          "doc_values" : true,
          "ignore_malformed" : true,
          "type" : "float"
        }
      }
    },
    {
      "value_value_field" : {
        "path_match" : "*_attr_*.value.value_int",
        "mapping" : {
          "coerce" : true,
          "doc_values" : true,
          "ignore_malformed" : true,
          "type" : "integer"
        }
      }
    },
    {
      "value_value_field" : {
        "path_match" : "*_attr_*.value.value_bool",
        "mapping" : {
          "normalizer" : "lowercase_normalizer",
          "type" : "keyword"
        }
      }
    },
    {
      "value_value_field" : {
        "path_match" : "*_attr_*.value.value_datetime",
        "mapping" : {
          "format" : "yyyy-MM-dd'T'HH:mm:ss'Z'||yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||d/MM/yyyy||epoch_millis",
          "ignore_malformed" : "true",
          "type" : "date"
        }
      }
    },
    {
      "value_value_field" : {
        "path_match" : "*_attr_*.value.value_date",
        "mapping" : {
          "format" : "yyyy-MM-dd'T'HH:mm:ss'Z'||yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||d/MM/yyyy||epoch_millis",
          "ignore_malformed" : "true",
          "type" : "date"
        }
      }
    },
    {
      "value_value_field" : {
        "path_match" : "*_attr_*.value.value_uri",
        "mapping" : {
          "type" : "keyword"
        }
      }
    },
    {
      "value_value_field" : {
        "path_match" : "*_attr_*.value.value_string",
        "mapping" : {
          "type" : "keyword"
        }
      }
    }
  ],
  ...
}
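
To check which rule a given field would hit, the template lookup can be approximated offline. The Python sketch below is an illustration under stated assumptions, not Elasticsearch's implementation: it treats `*` as the only wildcard, tests `match` against the leaf field name and `path_match` against the full dotted path, and returns the first template that matches. `TEMPLATES` is an abridged copy of the rules above, and the attribute path `height_attr_1...` is a hypothetical example.

```python
import re
from typing import List, Optional, Tuple


def simple_match(pattern: str, text: str) -> bool:
    """Wildcard match where '*' matches any run of characters (including dots)."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


# (kind, pattern, mapped type), in template order; abridged from the rules above.
TEMPLATES: List[Tuple[str, str, str]] = [
    ("match", "region:*", "keyword"),
    ("path_match", "*_attr_*.unit_of_measure", "keyword"),
    ("path_match", "*_attr_*.value.label", "text"),
    ("path_match", "*_attr_*.value.value_float", "float"),
    ("path_match", "*_attr_*.value.value_int", "integer"),
    ("path_match", "*_attr_*.value.value_datetime", "date"),
]


def resolve_type(full_path: str) -> Optional[str]:
    """Return the mapped type of the first template matching the field, if any."""
    field_name = full_path.rsplit(".", 1)[-1]
    for kind, pattern, es_type in TEMPLATES:
        candidate = field_name if kind == "match" else full_path
        if simple_match(pattern, candidate):
            return es_type
    return None  # no template applies; default dynamic mapping would be used
```

For instance, `resolve_type("region:nrm-2017")` resolves through the `region:*` rule, while a path ending in `.value.value_float` resolves through the float rule.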

Clear cache API | Elasticsearch Guide [7.10] | Elastic

Clear the cache after each test query; otherwise cached results will mask the real effect of any performance improvement.
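
A timing loop that follows this rule might look like the Python sketch below. It is a minimal illustration, not an existing test harness: `time_queries`, `clear_cache_path`, and the two callables are hypothetical, with `run_query` and `clear_cache` standing in for whatever client code actually talks to the cluster. The only Elasticsearch-specific detail is the clear cache endpoint, `POST /<index>/_cache/clear`.

```python
import time
from typing import Callable, List


def clear_cache_path(index: str) -> str:
    """Path of the clear cache API for one index (to be sent as a POST)."""
    return f"/{index}/_cache/clear"


def time_queries(
    run_query: Callable[[str], object],
    clear_cache: Callable[[], object],
    queries: List[str],
) -> List[float]:
    """Time each query in seconds, clearing the cache after every one."""
    timings: List[float] = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)
        timings.append(time.perf_counter() - start)
        # Cleared outside the timed section so the next query starts cold.
        clear_cache()
    return timings
```

Keeping the cache clear outside the timed section means the recorded durations cover only query execution.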