Performance degradation and heap overflow with large collection binding input

nikolayandr · July 9, 2025, 10:25pm

Hi everyone,

I’m working on a query pattern where I pass a large number of entity IDs into the query as a collection binding. Here’s a simplified version of the query:

[:find ?MaterialImpl1
 :in $ [?MaterialImpl1 ...] ?MaterialImpl1_value_0_VR
 :where
 [?MaterialImpl1 :Model/typeName ?MaterialImpl1_value_0_VR]]

With a relatively small number of IDs (up to about 100,000), the query completes within a few seconds. Once the input grows to around 1,000,000 IDs, it slows down drastically and eventually fails with a Java heap space error (OutOfMemoryError), which I assume is caused by memory pressure during query execution.

What I’m trying to understand: is there a known upper limit or best practice for large collection bindings like [?e ...]?

Any advice or experiences would be appreciated. Thank you!

nikolayandr · July 11, 2025, 2:57pm

Additionally, I’ve noticed that if we run a query like this:

[:find ?MaterialImpl1
 :in $ ?MaterialImpl1_value_0_VR
 :where
 [?_ :someAttribute ?MaterialImpl1]
 [?MaterialImpl1 :Model/typeName ?MaterialImpl1_value_0_VR]]

where ?MaterialImpl1 ends up containing exactly the same set of entity IDs as in the problematic case, the query executes much faster and is more memory-efficient.

The issue is that it’s not always possible to construct the query this way, since the set of entity IDs bound to ?MaterialImpl1 may come from multiple places in the application logic, not just from a single [?_ :someAttribute ?MaterialImpl1] clause. In those cases, we are forced to pass a large collection as an input binding, which causes the performance degradation and memory pressure described in this thread.
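
One workaround that may be worth trying, sketched here under assumptions rather than as a confirmed fix, is to split the ID collection into fixed-size batches, run the query once per batch, and union the results. This is only valid because the query above filters each entity independently; it would not carry over to queries with aggregates or joins across the input IDs. In the sketch, `query-in-batches`, `query-fn`, and `fake-query` are hypothetical names: `query-fn` stands in for a call such as `#(d/q query db % type-name)`, and the batch size is an arbitrary placeholder, not a measured optimum.

```clojure
(defn query-in-batches
  "Runs query-fn once per batch of ids and unions the result tuples.
  query-fn is a hypothetical one-argument function that takes a batch
  of entity ids and returns a collection of result tuples."
  [query-fn batch-size ids]
  (into #{}
        (mapcat query-fn)                 ; concatenate per-batch results
        (partition-all batch-size ids)))  ; the last batch may be smaller

;; Stand-in for the real query so the sketch runs without a database:
;; it keeps only even ids, returned as 1-tuples like a :find ?e result.
(defn fake-query [batch]
  (for [id batch :when (even? id)] [id]))

(query-in-batches fake-query 3 (range 10))
```

Whether this actually avoids the slowdown and the heap exhaustion would need to be measured; it only bounds the size of the collection bound in any single query call.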

@jaret I’d really appreciate it if you could confirm whether an efficient solution for this problem exists.

P.S. It’s quite surprising that two queries operating over exactly the same set of entities can differ so much in performance.
