Hi Datomic team,
We’re trying to understand the exact guarantees of (d/sync conn) (without a t argument), as described on the “Datomic - Sync” documentation page.
Our infrastructure workflow:
- Peer A writes data via a synchronous transaction and waits for confirmation.
- After confirmation, Peer A publishes an event to a third-party service.
- That service queries Datomic for the newly written data. Its requests go through a load balancer, so a given request may land on either Peer A or Peer B.
- Before querying, the third-party service calls sync() (without t) on whichever peer handles the request.
- When the request lands on Peer B, we fairly frequently observe that Peer B still does not see the data written by Peer A, even though sync() has already completed.
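For concreteness, here is a simplified sketch of that flow using the Datomic peer API (`datomic.api`). The namespace, the function names `transact-and-publish!` and `handle-event`, the `publish!` callback, and the use of a string tempid are illustrative assumptions, not our production code:

```clojure
(ns example.sync-workflow
  (:require [datomic.api :as d]))

;; Peer A (sketch): transact synchronously, then notify the third-party service.
;; `publish!` is a hypothetical stand-in for the real event publisher.
(defn transact-and-publish!
  [conn tx-data tempid publish!]
  ;; Deref blocks until the transaction result is available (or throws).
  (let [{:keys [tempids]} @(d/transact conn tx-data)
        eid (get tempids tempid)] ; string tempid -> resolved entity id
    (publish! {:entity-id eid})
    eid))

;; Third-party service handler (sketch), running against whichever peer
;; the load balancer picked: sync WITHOUT t, then look up the entity.
(defn handle-event
  [conn {:keys [entity-id]}]
  (let [db @(d/sync conn)]
    (d/pull db '[*] entity-id)))
```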
Local reproduction:
We reproduced this locally with two processes: a producer (Peer A, which creates entities and records the basis-t of db-after) and a consumer (Peer B, which receives a notification about the created entity along with the producer’s basis-t, calls sync() without t, and then looks up the entity). In the logs we can see that, after sync() completes, the consumer’s basis-t is behind the producer’s basis-t for that transaction. Locally this is very hard to reproduce (we waited several hours for a single occurrence), while in our cloud environment it happens fairly frequently. We suspect that the network infrastructure plays a significant role here.
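The consumer-side check in the reproduction amounts to roughly the following. This is a simplified sketch; `consume`, the event keys `:entity-id` and `:basis-t`, and the log message are hypothetical names chosen for illustration:

```clojure
(ns example.sync-repro
  (:require [datomic.api :as d]))

;; Consumer (Peer B) in the reproduction (sketch). The event carries the
;; producer's basis-t, taken from db-after of its transaction.
(defn consume
  [conn {:keys [entity-id basis-t]}]
  (let [db         @(d/sync conn)   ; sync WITHOUT t
        consumer-t (d/basis-t db)
        stale?     (< consumer-t basis-t)]
    (when stale?
      ;; The condition we occasionally see in the logs.
      (println (format "stale after sync(): consumer basis-t %d < producer basis-t %d"
                       consumer-t basis-t)))
    {:consumer-t consumer-t
     :stale?     stale?
     :entity     (d/pull db '[*] entity-id)}))
```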
Related thread:
A similar situation is described in the mailing list thread “How to make sure transaction and peer are in sync?”, where the reporter confirms that switching from sync() to sync(t) (using the basis-t from the transaction) resolved the issue.
Our question:
Is this expected behavior? That is, does sync() without t fail to guarantee that a peer will see a transaction that had already committed by the time sync() was called?
The only solution we can think of is to include the producer’s basis-t in the event payload and have the consumer call sync(t) with that value instead of sync(). Before changing our architecture, however, we’d like to confirm whether the observed behavior is a known limitation.
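The change we have in mind would look roughly like this. It is a sketch under the same hypothetical names and event shape as above; our understanding is that the two-arity `(d/sync conn t)` returns a future of a database value that includes `t`:

```clojure
(ns example.sync-t
  (:require [datomic.api :as d]))

;; Producer (Peer A, sketch): include basis-t of db-after in the event payload.
(defn transact-and-publish!
  [conn tx-data tempid publish!]
  (let [{:keys [db-after tempids]} @(d/transact conn tx-data)
        event {:entity-id (get tempids tempid)
               :basis-t   (d/basis-t db-after)}]
    (publish! event)
    event))

;; Consumer (any peer, sketch): sync WITH t, so the database value we query
;; should already include the producer's transaction.
(defn handle-event
  [conn {:keys [entity-id basis-t]}]
  (let [db @(d/sync conn basis-t)]
    (d/pull db '[*] entity-id)))
```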
Thank you.