Memory needed for as-of

We’ve got an application that regularly needs to see how the database looked at a specific time and we’re using as-of. The logic works perfectly, but we’re seeing memory pressure issues. I’ve read the filtering docs, so my basic understanding is that the as-of function filters the indexes (and the doc says this can (or perhaps it means will?) cause a full scan of the indexes.

Now, since the peer needs those to be in the object cache, I assume this means if the object cache isn’t big enough to hold all of the indexes that this is likely to cause GC thrashing (created the filtered version of the indexes, which become garbage at some near pt in the future when the as-of db itself is collected, I would assume).

I’m guessing the filtered indexes are stored on the heap of the app, as opposed to the object cache. So, if the indexes fit in the object cache the memory pressure is really on the heap of the application itself (creating filtered copies of the indexes).

So, from a memory tuning perspective I’m thinking I might want the object cache split to be enough to hold all of the indexes, but also enough VM RAM to house at least 1 set of filtered ones?

Basically I’m just trying to understand if there are internal optimizations that make the memory and CPU overhead of as-of much less than the size of all of the indexes. We’re running several pulls and queries against each of these databases.

At the end of the day, I guess the answer is “give it as much RAM as you can afford”, but I’m wondering if the objectCacheMax split should be adjusted on a system that uses as-of a lot.

The object cache is an LRU cache of index segments, so I don’t think GC references really impact it. In other words, there isn’t any memory dependency between the cached index segments in the object cache and the Database objects that may need those segments when queried. It is query (or raw index access) that puts demands on the cache, and it is memory pressure which will cause the least recently used segments to be evicted from the cache. Query and index APIs read from that cache, Databases don’t point directly to it.

T is the last element in all the non-log indexes with the highest T first.
Since as-of constrains acceptable T values, whenever a datom is found via its index, the filter will then cause the Peer to scan backward in time until it finds a datom with T <= the as-of time. That scan may cause additional segments to be retrieved and populate the object cache. I believe it would be object cache memory pressure that will cause thrashing, not general heap pressure. I assume the memory requirements will be based on “how far back” in time you are going with as-of, measured as the novelty accumulated since that time. That is, if your as-of t is >= the last time all the datoms accessed by your query were changed, it would be a no-op (memory wise). If your as-of t is the beginning of time, then every datom accessed would require a “full scan” of that datom’s index.

Before increasing heap size you should experiment with adjusting how much heap is allocated to the object cache. If your GC thrashing is due to true garbage being generated by query, then a relatively smaller object cache may help. If, on the other hand, the object cache is too small to hold all the segments needed to complete your index time travel, then a larger object cache may help. If both of these are too small, then, yeah, buy more RAM.

Another thing you could try tuning is query workload. If you have two distinct query workloads that tend to operate over entirely different index segments (e.g. maybe your as-of queries are competing with some other unrelated queries) you could try splitting them into separate VMs to improve index cache coherence.

2 Likes