We are using Datomic to store time-series data. We get the data hourly.
We have around ~40 billion datoms.
When we query, we typically load the data for the last ~5 days. We have defined indexes on the relevant attributes and use date-range-based queries, roughly like the sketch below.
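A rough sketch of the shape of our queries (the attribute names here are made up for illustration; ours differ):

```clojure
(require '[datomic.api :as d])

;; Hypothetical attributes :reading/timestamp and :reading/value stand in
;; for our real schema; the query shape is what matters.
(defn readings-in-range
  "Return [entity timestamp value] tuples whose :reading/timestamp falls
   in [start, end). Datomic's built-in comparison predicates in query
   work across comparable values, including instants."
  [db start end]
  (d/q '[:find ?e ?ts ?v
         :in $ ?start ?end
         :where
         [?e :reading/timestamp ?ts]
         [(<= ?start ?ts)]
         [(< ?ts ?end)]
         [?e :reading/value ?v]]
       db start end))

;; For an attribute with :db/index true, walking the AVET index directly
;; between two values is another option:
;; (d/index-range db :reading/timestamp start end)
```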
At times, some queries take up to ~1 minute. The problem is that the latency is very inconsistent.
Sadly, we have realized Datomic is not optimized for such a huge amount of data.
“Time-series data” is called out in various places as one of the things Datomic is not well suited for. You might want to consider a store that’s more geared to that use case, and potentially use Datomic for ‘projections’ of that data, etc., depending on your needs.
40B datoms doesn’t sound that bad. I believe Nubank and HCA both roll over to a new database every quarter or so when they reach the limit, and there is a crossover technique; see this talk for the details: http://2018.clojure-conj.org/igor-ges/
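To make the crossover idea concrete, here is one rough way to picture it; this is purely illustrative and not necessarily the technique the talk describes:

```clojure
(require '[datomic.api :as d])

;; Illustrative only: while the most recent days of data straddle the old
;; and new databases, run the same range query against both database
;; values and merge the result sets.
(defn q-across-rollover
  [query old-db new-db & args]
  (into (set (apply d/q query old-db args))
        (apply d/q query new-db args)))

;; e.g. (q-across-rollover range-query (d/db old-conn) (d/db new-conn) start end)
```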
Thanks Dustin. The video was helpful; it mentions sharding and rollover. I had watched the Nubank presentation on InfoQ, and they also mention sharding.
Is there any way we can identify how many trips are made to storage to complete a query?
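Ideally something per query, e.g. along the lines of the io-stats support mentioned for newer Datomic releases. A rough sketch of what I have in mind (the exact call shape and result keys are assumptions on my part, not verified against the docs):

```clojure
(require '[datomic.api :as d])

;; ASSUMPTION: modeled on the io-stats feature in recent Datomic releases;
;; the exact API shape and result keys may differ, so treat as pseudocode.
(defn count-in-range-with-io-stats
  [db start end]
  (let [{:keys [ret io-stats]}
        (d/query {:query      '[:find (count ?e)
                                :in $ ?start ?end
                                :where
                                [?e :reading/timestamp ?ts]
                                [(<= ?start ?ts)]
                                [(< ?ts ?end)]]
                  :args       [db start end]
                  :io-context :app/last-5-days})]
    ;; io-stats should break reads down by source (object cache vs. storage),
    ;; which is essentially "how many trips to storage did this query make?"
    (println io-stats)
    ret))
```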
Thanks