Where are queries being cached?


#1

What memory setting affects how many queries can be cached?
https://docs.datomic.com/on-prem/capacity.html
Is it the Object Cache?


#2

Are you referring to the results of the query being cached on your peer? If so, that will (indirectly) be controlled by the Object Cache, as you suggest.

Datomic also caches processed query representations (https://docs.datomic.com/on-prem/query.html#query-caching). The size of the parsed query cache is not user-configurable.


#3

Well, query results must go somewhere? Can I tune it somehow indirectly with -Xmx?

Is a processed query representation a query with an argument?


#4

The final result of a query is returned to your application and isn’t cached ‘as is’ by Datomic.

However, the Object Cache holds segments of the Datomic database in local memory. The Object Cache is an LRU cache of raw index segments and its size can be tuned with a command line argument to your peer process (https://docs.datomic.com/on-prem/system-properties.html).

The query engine will first look in the Object Cache when it needs a database segment, so if you have sufficient memory on your machine and have configured a large Object Cache, you can often serve a large proportion of your query requirements via the in-memory local cache, instead of having to read segments from a remote location (i.e. storage).
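
To make the lookup order concrete, here is a rough sketch of an LRU segment cache that falls back to storage on a miss. It is only an illustration, not Datomic's implementation: `SegmentCache`, `slow_storage_read`, and the segment ids are made-up names, and the toy cache is bounded by entry count where the real Object Cache is sized in bytes.

```python
from collections import OrderedDict


class SegmentCache:
    """Toy LRU cache of index segments (hypothetical, bounded by entry count)."""

    def __init__(self, max_entries, fetch_from_storage):
        self._max_entries = max_entries
        self._fetch = fetch_from_storage
        self._segments = OrderedDict()  # least recently used entry comes first
        self.hits = 0
        self.misses = 0

    def get(self, segment_id):
        if segment_id in self._segments:
            # Local hit: mark the segment as most recently used.
            self._segments.move_to_end(segment_id)
            self.hits += 1
            return self._segments[segment_id]
        # Miss: pay for the remote read, then keep the segment locally.
        self.misses += 1
        segment = self._fetch(segment_id)
        self._segments[segment_id] = segment
        if len(self._segments) > self._max_entries:
            self._segments.popitem(last=False)  # evict the least recently used
        return segment


def slow_storage_read(segment_id):
    # Stand-in for a network round trip to storage.
    return f"segment-bytes-{segment_id}"


cache = SegmentCache(max_entries=2, fetch_from_storage=slow_storage_read)
for segment_id in ["a", "b", "a", "c", "b"]:
    cache.get(segment_id)
```

With room for only two segments, the second read of "b" misses because "c" pushed it out. A larger cache turns more of those reads into local hits.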


#5

https://docs.datomic.com/on-prem/best-practices.html#parameterize-queries
Does this mean that query representations are cached, but query results are not!?


#6

The result of a query is returned to your calling application code.
The segments used during that query’s execution will likely be cached in your local object cache (assuming it’s large enough to hold them all).

The recommendation to parameterize queries is strictly about performance enhancement at the level of query parsing and “compilation”.
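
A rough way to picture why this helps, assuming the parsed representation is looked up by the query itself: a query with values spliced into its text is a different query each time, while a parameterized one is parsed once and reused. The `parse` and `run` functions and the `:user/name` attribute below are hypothetical stand-ins, not Datomic APIs.

```python
parse_calls = 0
parsed_cache = {}


def parse(query_text):
    """Stand-in for the expensive parse/"compile" step."""
    global parse_calls
    parse_calls += 1
    return ("parsed", query_text)


def run(query_text, *args):
    # Reuse the parsed form when this exact query text was seen before.
    if query_text not in parsed_cache:
        parsed_cache[query_text] = parse(query_text)
    return parsed_cache[query_text], args


names = ["alice", "bob", "carol"]

# Values spliced into the text: every call is a brand-new query to parse.
for name in names:
    run(f'[:find ?e :where [?e :user/name "{name}"]]')
interpolated_parses = parse_calls

# Values passed as inputs: one query text, parsed once.
parse_calls = 0
parsed_cache.clear()
for name in names:
    run("[:find ?e :in $ ?name :where [?e :user/name ?name]]", name)
parameterized_parses = parse_calls
```

Here the interpolated form is parsed three times and the parameterized form once.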


#7

Thank you for the clarification. Do you think it would be useful to implement query result caching? It would be tied to the timestamp of the query execution.


#8

The query execution timestamp should be irrelevant in terms of the query results, since the database “value” passed to query is immutable. You would only need to key your result cache on query inputs (in the case of Database inputs, its id and basisT would represent this adequately).

I’ve found that, for the most part, the results of a query are an intermediate form: there might be multiple queries, serialization for clients, etc. happening in your application, and it is better to cache the end result. Since Datomic doesn’t know anything about these details, “built in” query result caching would add complexity and consume resources for very little benefit. As Marshall mentioned, the segments are cached. Segments are expensive to retrieve (since they must traverse the network) and can be reused across many different queries, so there is more benefit in caching them.

If you really do need to serve the same query result again and again, adding this caching layer is nearly trivial for the application. Use the list of query inputs as your cache key.
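
A minimal sketch of such a layer, under a few assumptions: `DbValue`, `ResultCache`, and `fake_run_query` are hypothetical names that stand in for your database value and query call, every input is hashable, and the database is a plain (unfiltered) value, so its id plus basisT identifies it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbValue:
    """Hypothetical stand-in for an immutable database value."""
    db_id: str
    basis_t: int


class ResultCache:
    """Application-level result cache keyed on the full list of query inputs."""

    def __init__(self, run_query):
        self._run_query = run_query
        self._results = {}
        self.executions = 0

    def query(self, query_text, db, *args):
        # The database input is represented by its id and basisT.
        key = (query_text, db.db_id, db.basis_t, args)
        if key not in self._results:
            self.executions += 1
            self._results[key] = self._run_query(query_text, db, *args)
        return self._results[key]


def fake_run_query(query_text, db, *args):
    # Stand-in for the real query call; echoes what it was asked.
    return (db.basis_t, args)


cache = ResultCache(fake_run_query)
query_text = "[:find ?e :in $ ?name :where [?e :user/name ?name]]"
db_t100 = DbValue("my-db", 100)
db_t101 = DbValue("my-db", 101)

first = cache.query(query_text, db_t100, "alice")
again = cache.query(query_text, db_t100, "alice")  # same inputs: served from cache
newer = cache.query(query_text, db_t101, "alice")  # new basisT: executed again
```

The dictionary here grows without bound, so a real application would probably want an eviction policy on top.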


#9

I think the basisT advances too quickly in many apps for a result cache to be useful. We would need an incremental query engine for it to pay off.


#10

Indeed, I completely agree. Sometimes the application can annotate transactions with additional attributes to establish coarser grains than basisT, but that is still nothing Datomic knows about.

In terms of reactive query, are you referring to something like this? http://2018.clojure-conj.org/nikolas-gobel/


#11

Yes, Niko’s research is awesome.


#12

I meant some kind of a timestamp that is tied to the query result. If this is basisT in Datomic, then that’s it.
I don’t follow your rationale for why caching wouldn’t be useful in Datomic. Materialized views in an accumulate-only database like Datomic are more useful than in a regular database. You could even get correct, up-to-date results instead of stale cached results:

  1. get cached results from materialized view
  2. fix them based on recent transaction log changes
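
The two steps above can be sketched roughly as follows. This is only a toy model: the log is a plain list of `(t, datoms)` pairs with datoms as `(entity, attribute, value, added)` tuples, the view is a simple count per value of a made-up `:user/country` attribute, and none of the function names are Datomic APIs. Views built from joins would need a far more involved update step.

```python
from collections import Counter

ATTRIBUTE = ":user/country"  # hypothetical attribute the view aggregates over


def apply_datoms(counts, datoms):
    """Fold assertions (+1) and retractions (-1) into the per-value counts."""
    for _entity, attribute, value, added in datoms:
        if attribute == ATTRIBUTE:
            counts[value] += 1 if added else -1
    return counts


def full_recompute(log):
    """Build the view from scratch by replaying the whole log."""
    counts = Counter()
    for _t, datoms in log:
        apply_datoms(counts, datoms)
    return +counts  # unary plus drops entries whose count fell to zero


def refresh(view_counts, view_t, log):
    """Step 1: start from the cached view. Step 2: apply only newer transactions."""
    counts = Counter(view_counts)
    latest_t = view_t
    for t, datoms in log:
        if t > view_t:
            apply_datoms(counts, datoms)
            latest_t = max(latest_t, t)
    return +counts, latest_t


log = [
    (1, [(10, ":user/country", "NO", True)]),
    (2, [(11, ":user/country", "SE", True)]),
    (3, [(10, ":user/country", "NO", False), (10, ":user/country", "SE", True)]),
]

cached_view = full_recompute(log[:2])  # view materialized as of t = 2
fresh_view, fresh_t = refresh(cached_view, 2, log)
```

The refreshed view matches a full recomputation over the whole log, while only touching the transactions after t = 2.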