We’ve been seeing frequent application crashes caused by our app running out of memory. Our AWS metrics show a strange pattern: memory spikes periodically, then plateaus. Once the plateau is high enough, a single spike is enough to freeze the Docker container, and we have to restart it manually. We see a similar pattern in our dev environment, just on a much smaller scale.
After analyzing a heap dump, we found that a large portion of the heap was attributed to a Datomic class, “datomic.index.TransposedData”. Judging by the data types used and by some of the data it holds, I believe this is the object cache on the peer.
(heap dump showing datomic TransposedData)
We’re trying to understand why these objects are building up in the cache and what we can do to clear them out. I tried passing datomic.ObjectCacheMax through JVM_OPTS to limit the cache on the peer, but it didn’t seem to have much of an impact when testing in our dev environment. Another possibility we’re considering is that our own code is holding onto the head of some reference into the cache.
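To rule out the flag simply not reaching the peer JVM, something like the following could be run inside the peer process. This is only a sketch: the class name PeerFlagCheck and the summarize helper are made up for illustration, and the property name datomic.objectCacheMax (lower-case "o") is the spelling we believe the Datomic docs use, which is an assumption worth double-checking against the version in use.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class PeerFlagCheck {
    // Assumed property name (lower-case "o"); verify against the Datomic docs.
    static final String CACHE_PROP = "datomic.objectCacheMax";

    // Build a one-line summary from the property value (null if the flag
    // never reached the JVM) and the maximum heap size in bytes.
    static String summarize(String cacheMax, long maxHeapBytes) {
        String cache = (cacheMax == null) ? "not set" : cacheMax;
        long maxHeapMb = maxHeapBytes / (1024L * 1024L);
        return CACHE_PROP + "=" + cache + ", max heap=" + maxHeapMb + " MB";
    }

    public static void main(String[] args) {
        // Arguments the JVM was actually started with (-Xmx, -D..., and so on).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM args: " + jvmArgs);
        System.out.println(summarize(System.getProperty(CACHE_PROP),
                                     Runtime.getRuntime().maxMemory()));
    }
}
```

If the summary shows "not set", the option never made it into the JVM as a system property, which would explain why changing it had no visible effect.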
We are running datomic-pro 0.9.5561.62 with PostgreSQL as the storage backend, and are in the process of upgrading to peer 1.0.7075, since it looks like there’s an administer-system API we can try to use. We’ve also seen some improvement in our dev environment from upgrading, so we’re going to push the upgrade to production to see what effect it has.
As an aside, does Datomic have any mechanism to roll back the internal PostgreSQL schema if something goes wrong during the upgrade and we have to revert, or will we have to use our own system to restore the database?
Any advice would be greatly appreciated.
Here’s some more information about our system:
JVM_OPTS: -Xms32m -Xmx12g
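With these flags the heap starts at 32 MB and may grow up to 12 GB, so the container-level numbers we see in AWS may be following committed heap rather than live objects. A rough way to tell the two apart from inside the process is sketched below; the class name HeapSnapshot and the format helper are hypothetical, and only the standard java.lang.management API is used.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapSnapshot {
    // Render used / committed / max heap in whole megabytes.
    // getMax() can return -1 when no limit is defined; that prints as 0MB here.
    static String format(long usedBytes, long committedBytes, long maxBytes) {
        long mb = 1024L * 1024L;
        return "used=" + (usedBytes / mb) + "MB"
             + " committed=" + (committedBytes / mb) + "MB"
             + " max=" + (maxBytes / mb) + "MB";
    }

    public static void main(String[] args) {
        // Snapshot of the heap as the JVM itself sees it right now.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println(format(heap.getUsed(), heap.getCommitted(), heap.getMax()));
    }
}
```

Logging this periodically next to the AWS graph should show whether "used" drops back after a spike while "committed" stays at the plateau.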