Are new facts streamed to all peers?


#1

When the transactor broadcasts novelty to all connected peers, does it really stream all new data to every peer, even if that data is not in that peer's working set?

If one peer works on one set of app objects and another peer works on a different set (as a result of load balancing), neither is interested in the changes made by the other (unless one fails and the second starts receiving requests for both sets). Streaming all data (pure datoms, or can transaction functions be broadcast too?) will produce unnecessary traffic and memory pressure on peers (they need to keep in memory all the novel, not-yet-indexed datoms, even if these datoms are not of interest and were never touched by the application code running in that peer, right?).

Or, if I connect a large number of peers to take advantage of Datomic’s unlimited read scalability, the transactor's network will become a bottleneck if it needs to broadcast a copy of all new data to every peer. So write performance would degrade as the number of connected peers grows. Does that happen?

Best regards,
Anton


#2

Yes, I believe you’re correct in your assertions, but in my mind this would still be better than having all reads and writes hit the same server, or losing ACID compliance by having many write servers (or alternatively increasing read complexity along with read traffic and latency).

Not sure, but I think the memcached [1] option might work towards reducing that write traffic for high volume write apps.

Adding memcached to a system can reduce the load on storage, and can reduce the latency associated with populating the object cache.

  1. https://docs.datomic.com/on-prem/caching.html#memcached

It would be nice if we could define a limited set of schema attributes for a peer to cache data with, but I see no evidence that can be done.

Someone correct me if I am wrong! Thanks.


#3

I would add that the novelty window kept by peers is modest in size and discarded after each index job.


#4

@tim, thanks for the link. But memcached is for the object cache, while new, unindexed datoms are stored in the memory-index. Still, the page you linked gives the terminology (like memory-index) and explains other nuances.

So I guess the transactor doesn’t send new datoms to peers directly; it writes them to the log and just informs peers: “new transactions!”. The peers then fetch the new datoms from the log into the memory-index.

And, as @stu said, the memory-index only holds datoms that are not yet included in the storage index. Since the transactor runs the indexing job when memory-index-threshold is reached, the size of the novelty area on peers stays around the memory-index-threshold.

So the transactor's network does not become a bottleneck for writes when more peers are connected, because peers receive new datoms from storage, not from the transactor directly. If the storage is scalable, a large number of peers is OK.


#5

For estimating write load, Rich laid out the following math at the Conj '18 party: 10 billion datoms in a year is 317 datoms per second, so if you are maintaining write volumes like that, I might anticipate getting bottlenecked by index size. Some off-the-cuff math: 317 datoms/s * 5 bytes/datom (a guess) * 60 seconds ≈ 100 kB per minute (plus any additional blobs like strings, dates, etc.) until the next indexing job. Not a very scary guesstimate to me.
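
For anyone who wants to plug in their own numbers, the arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope check: the 5 bytes/datom figure is the same guess as above, not a measured value, and the variable names are just illustrative.

```python
# Back-of-the-envelope estimate of novelty written per minute.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (non-leap year)

datoms_per_year = 10_000_000_000  # 10 billion datoms/year, from the post
bytes_per_datom = 5               # a guess, as stated above; not measured

# Average write rate implied by the yearly volume.
datoms_per_second = datoms_per_year / SECONDS_PER_YEAR

# Use the whole-number rate, as the post does, to get bytes per minute.
bytes_per_minute = int(datoms_per_second) * bytes_per_datom * 60

print(f"{datoms_per_second:.0f} datoms/s")
print(f"{bytes_per_minute / 1000:.1f} kB/min")
```

With these inputs it comes out to 317 datoms/s and about 95 kB per minute, which rounds to the 100 kB figure. Changing `bytes_per_datom` shows how sensitive the estimate is to that guess.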


#6

@dustingetz, my writes will be deleting datoms as well as adding them, so the total index size will not be large. (I’m looking for operational storage to keep data that previously fit into the memory of a Java process. I need a kind of shared memory between Java processes which were previously using ad hoc data sync methods.)

But the stream of modifications (both deletions and additions) will be moderately intense. That’s why I’m trying to understand the write mechanics clearly.