Is sync() without t guaranteed to see a transaction committed by another peer?

Hi Datomic team,

We’re trying to understand the exact guarantees of (d/sync conn) (without a t argument), as described in the Datomic documentation page "Sync".

Our infrastructure workflow:

  1. Peer A writes data via a synchronous transaction and waits for confirmation.
  2. After confirmation, Peer A publishes an event to a third-party service.
  3. That service queries Datomic for the newly written data, but it goes through a load balancer, so the request may land on either Peer A or Peer B.
  4. Before querying, the third-party service calls sync() (without t) on whichever peer it lands on.
  5. When the request lands on Peer B, we fairly frequently observe that Peer B still does not see the data written by Peer A, even though sync() had already completed.

Local reproduction:

We reproduced this locally with two processes: a producer (Peer A), which creates entities and records basis-t from db-after, and a consumer (Peer B), which receives a notification about the created entity along with the producer’s basis-t, calls sync() without t, then looks up the entity. In the logs we can see that after sync() completes, the consumer’s basis-t is behind the producer’s basis-t from that transaction. Locally this is very hard to reproduce (we waited several hours for a single occurrence), while in our cloud environment it happens fairly frequently. Our intuition is that network infrastructure plays a significant role here.

Related thread:

A similar situation is described in the mailing list thread "How to make sure transaction and peer are in sync?", where the reporter confirms that switching from sync() to sync(t) (using the basis-t from the transaction) resolved the issue.

Our question:

Is this expected behavior? That is, can sync() without t return a database that does not yet include a transaction that had already been committed by the time sync() was called?

The only solution we can think of is to include the producer’s basis-t in the event payload, and on the consumer side call sync(t) with that value instead of sync(). But we’d like to confirm whether the observed behavior is a known limitation before changing our architecture.
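
For concreteness, here is a minimal sketch of the change we have in mind, using the peer library (datomic.api). The namespace, the function names, and the event keys :entity-id and :basis-t are hypothetical and only illustrate the shape of the payload; it also assumes the producer already knows an entity id or lookup ref that the consumer can resolve. It needs a live connection to run.

```clojure
(ns example.sync-handoff
  (:require [datomic.api :as d]))

;; Producer (Peer A): transact synchronously, then read basis-t from db-after.
(defn write-and-build-event
  "Transacts tx-data and returns a (hypothetical) event map that carries
  the basis-t of the database produced by this transaction."
  [conn tx-data entity-id]
  (let [{:keys [db-after]} @(d/transact conn tx-data)]
    {:entity-id entity-id               ; hypothetical payload key
     :basis-t   (d/basis-t db-after)})) ; hypothetical payload key

;; Consumer (Peer A or Peer B, whichever the load balancer picks):
;; wait until this peer's database includes the producer's t, then read.
(defn read-after-event
  "Blocks until the local peer has a database whose basis includes basis-t,
  then pulls the entity from that database value."
  [conn {:keys [entity-id basis-t]}]
  (let [db @(d/sync conn basis-t)]
    (d/pull db '[*] entity-id)))
```

The consumer reads from the database value returned by the sync call rather than calling (d/db conn) afterwards, so the lookup runs against a database that is known to include the producer's transaction.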

Thank you.

Sorry, this issue is no longer relevant: it turned out not to be the cause of the missing data in production. Although I was able to reproduce it locally, it only happened a few times over a long period under a very high transaction load, so it can probably be considered negligible.