
  • Uwe - Because you have multiple indices, the queries run in parallel and the results are merged in the response.

    • Javi - We are using the scroll API to fetch about 20 million documents; since the indices are smaller overall, there is a slight improvement there as well.

    • Javi - for the UI we are using pagination

  • Uwe - is date range working too?

    • Javi - yes, and the queries now are simpler too.

  • Javi - We no longer need to use nested aggregations either.

    • Uwe - Yes, just terms aggregations and nothing more. That's good. Really a great success in my opinion.

    • Javi - Yes I am quite happy with the improvement

  • Javi - Gerhard did a few changes to the cluster, so maybe he wants to update Uwe.

    • Gerhard - I changed the heap settings to 25% of the maximum RAM and added a few more CPU cores and a bit more RAM.

    • Uwe - Last time we said I'll get access to the machines via SSH, is that still required? For now, seems fine.

    • Gerhard - We are using the ReadonlyREST extension for authorization.

      • Uwe - Be careful and disable scripting if you're not using the script API.

        • Not really sure whether the read-only setup affects this, but if you're not using scripting, disable it to be safer.

      • Gerhard - it's not even read-only from the outside, and it will eventually be completely blocked from the outside

  • Uwe - Have you thought about using OpenSearch?

    • Gerhard - Thought about it but haven't tried it.

    • Uwe - I think most of it is the same; the Elasticsearch parts are identical, but authorization handling is different.

  • Uwe - You're using Postman and not Kibana?

    • Javi - I use Postman in my day-to-day work and it can save queries.

  • Uwe - How many code changes were needed in the UI and API?

    • Javi - Project is split into 3

      • API

      • UI

      • Indexer

    • Javi - Most of the changes were in the indexer.

    • Javi - The API project was just changing the queries.

    • Javi - the UI now looks up the labels

    • Uwe - Oh, I thought the API would look for the labels.

  • Uwe - Will the API be used by external users?

    • Javi - Yes

    • Uwe - will they also have to use URIs?

    • Javi - For now yes

    • Gerhard - we can also have R or Python wrappers around the APIs to improve the UX.

  • Javi - We have some issues with the map and were wondering if you can help us, Uwe.

    • Javi - We are doing geo aggregations.

    • Uwe - So are the coordinates saved here in the documents?

    • Javi - (shows Kibana)

    • Javi - We are using the geo_point field type.

    • Uwe - unfortunately I don't have much experience with the geo aggregations

    • Javi - (shows the clustering of sites and how each zoom in or out recalculates the aggregation, which returns clusters of sites)

    • Javi - We will have tens of thousands of sites in the coming datasets.

      • Javi - We like square data grids like the ones DataOne uses (shows the DataOne portal).

      • Javi - We are looking at the geotile API in Elasticsearch.

      • Uwe - Unfortunately I have no idea and haven't tried this before.

      • Javi - Currently we are using the geohash API, but I want to use the geotile API.

    • Uwe - At PANGAEA we assign region names, but we have nothing like this map with rectangular clustering.

    • Uwe - I prefer full-text search instead of arbitrary maps broken into clusters of things

    • Uwe - I prefer the current clustering that you have instead of the rectangular one.

  • Uwe - probably worthwhile to think about utilising full-text search in combination with the current search options.

  • Guru - Javi is currently doing aggregations in real time, which causes some slowness. Is there a better way to do this, such as computing it in advance at index time?

    • Uwe - A bit strange that it's slow for only around 800 sites.

    • Javi - Every time you zoom in or out, it performs a new aggregation. So it's not really slow, just inefficient because it performs a lot of aggregations.

    • Uwe - At PANGAEA we add full-text search fields and everything else at indexing time; the grid is also computed and indexed then.

    • Gerhard - It's not really slow currently; it's the zoom in/out animation that is blocking the UI.

    • Uwe - I don't think there's much to do right now, as the aggregations are fast. You can precalculate them in the future.

    • Javi - I read many people are moving to Elasticsearch for geo instead of using something like PostGIS

  • Uwe - https://wiki.pangaea.de/wiki/Topic
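
Note on the geohash vs. geotile discussion above: a minimal sketch, in Python, of what a request for rectangular (tile-based) clusters could look like. It only builds the request body for Elasticsearch's `geotile_grid` aggregation and converts the returned `zoom/x/y` bucket keys into rectangles to draw; it does not contact a cluster. The field name `location`, both function names, and the "map zoom + 2" precision offset are assumptions for illustration and are not taken from the project.

```python
import math


def build_geotile_query(top, left, bottom, right, zoom,
                        field="location", extra_precision=2):
    """Build a search body that buckets the sites in the viewport into map tiles.

    `field` is a hypothetical geo_point field name; `extra_precision` is an
    assumed offset so each visible map tile is split into smaller cells.
    """
    # geotile_grid accepts precision values from 0 to 29.
    precision = max(0, min(29, zoom + extra_precision))
    viewport = {
        "top_left": {"lat": top, "lon": left},
        "bottom_right": {"lat": bottom, "lon": right},
    }
    return {
        "size": 0,  # only the buckets are needed, not the documents
        "query": {"geo_bounding_box": {field: viewport}},
        "aggs": {
            "grid": {"geotile_grid": {"field": field, "precision": precision}}
        },
    }


def tile_bounds(key):
    """Convert a bucket key such as "7/64/42" into its lat/lon rectangle."""
    zoom, x, y = (int(part) for part in key.split("/"))
    n = 2 ** zoom  # number of tiles per axis at this zoom level

    def row_to_lat(row):
        # Standard Web Mercator tile formula.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return {
        "north": row_to_lat(y),
        "south": row_to_lat(y + 1),
        "west": x / n * 360.0 - 180.0,
        "east": (x + 1) / n * 360.0 - 180.0,
    }


if __name__ == "__main__":
    body = build_geotile_query(top=60, left=-10, bottom=35, right=30, zoom=5)
    print(body["aggs"]["grid"]["geotile_grid"])
    print(tile_bounds("1/0/0"))
```

Because the bucket keys depend only on the zoom level and tile position, the UI could in principle cache responses per zoom level instead of re-running the aggregation on every zoom step, which is one possible way to address the repeated aggregations mentioned above.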

Action items

  •  

Decisions