Probably like some other people, I am curious how the architecture of a single Transactor affects performance. Does the Transactor have any multi-threaded aspects (like accepting data from Peers in parallel)? Can someone draw a (conceptual) comparison of how the write performance of Datomic would compare to a regular SQL database (e.g. PostgreSQL)? People I’ve talked to about Datomic get nervous when they learn that all the data written to the system passes through a single-threaded Transactor. What are the advantages of this approach, besides it making the Transactor easier to implement?
Datomic writes to storage sequentially. Internally, the transactor runs parallel processes that handle different parts of ingestion, and I believe these are separated by queues, but essentially it is a sequential process.
Peers read directly from storage, so the transactor is involved only in recording novelty. If you have a very high write load, you may find that Datomic struggles, but I’ve used it under pretty heavy conventional load (i.e. more reads than writes) and it’s been fine. I doubt (but maybe someone else can confirm) that Datomic adds much to the write latency of the underlying storage, although I would guess that index updates are the first thing to break down when the write load becomes too high.
It’s worth starting this response by acknowledging that Datomic’s architecture is not targeting the highest workloads, but in our experience, most workloads are more than adequately supported by the transactor and the Datomic model.
Does the Transactor have any multi-threaded aspects (like accepting data from Peers in parallel)?
Yes, the transactor is multi-threaded in much of the work it does, including some communication with peers, logging, and indexing. The “single writer” nature of Datomic means that writing to the transactional log is serialized and occurs on a single thread.
Peers write new facts by asking the Transactor to add them to the Storage Service. The Transactor processes these requests using ACID transactions, ensuring they succeed or fail atomically and do not interfere with one another. The Transactor notifies all Peers about new facts so that they can add them to their caches.
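To make the "multi-threaded around a single writer" shape concrete, here is a minimal sketch in Python. It is not Datomic's implementation: the `Transactor` class, its `transact` method, and the in-memory `log` list are hypothetical names invented for illustration. Many peer threads submit requests concurrently, but only one thread ever appends to the log.

```python
import queue
import threading


class Transactor:
    """Toy single-writer model (hypothetical, not Datomic's real API)."""

    def __init__(self):
        self._inbox = queue.Queue()  # requests arrive here from any thread
        self.log = []                # append-only; touched only by the writer thread
        self._writer = threading.Thread(target=self._run, daemon=True)
        self._writer.start()

    def transact(self, facts):
        """Called by peers from any thread; blocks until the commit is done."""
        request = {"facts": facts, "done": threading.Event(), "tx_id": None}
        self._inbox.put(request)
        request["done"].wait()
        return request["tx_id"]

    def _run(self):
        # The only place the log is written: one request at a time, in order.
        while True:
            request = self._inbox.get()
            if request is None:      # shutdown signal
                break
            tx_id = len(self.log)
            self.log.append((tx_id, request["facts"]))
            request["tx_id"] = tx_id
            request["done"].set()

    def close(self):
        self._inbox.put(None)
        self._writer.join()


def run_demo(n_peers=8, tx_per_peer=50):
    """Several peer threads submit concurrently; returns the resulting log."""
    transactor = Transactor()

    def peer(peer_id):
        for i in range(tx_per_peer):
            transactor.transact([("peer", peer_id, "seq", i)])

    threads = [threading.Thread(target=peer, args=(p,)) for p in range(n_peers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    transactor.close()
    return transactor.log


if __name__ == "__main__":
    log = run_demo()
    print(len(log), "transactions committed with consecutive ids")
```

Accepting requests is concurrent, while the commit step needs no locks because exactly one thread performs it. The queue is the only synchronization point.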
Can someone draw a (conceptual) comparison of how the write performance of Datomic would compare to a regular SQL database (e.g. PostgreSQL)?
One underappreciated aspect of the tradeoff that Datomic makes is that, by stripping out all the other work normally done by the server in comparable SQL databases (queries, reads, locking, disk sync), the transactor can support many workloads in this configuration. So it’s hard to compare against a “regular” SQL database, which, for instance, doesn’t separate reads from writes and runs into issues like locking. In most SQL databases reads can slow writes and vice versa. This is not the case in Datomic.
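A rough way to picture why reads and writes need not contend: if readers work against an immutable snapshot, a writer can publish a new version without taking any lock that readers hold. The sketch below is a toy model under that assumption. The `Connection` class and the attribute names are hypothetical, and it copies the whole state on each write, which a real system would avoid by sharing structure between versions.

```python
from types import MappingProxyType


class Connection:
    """Toy model: the current database is an immutable value that gets replaced."""

    def __init__(self):
        self._current = MappingProxyType({})

    def db(self):
        # Readers grab the current snapshot; it never changes underneath them.
        return self._current

    def transact(self, updates):
        # The writer builds a new version and swaps the reference.
        new_state = dict(self._current)
        new_state.update(updates)
        self._current = MappingProxyType(new_state)
        return self._current


conn = Connection()
conn.transact({"account/balance": 100})

snapshot = conn.db()                      # a reader starts a long query here
conn.transact({"account/balance": 250})   # a write lands in the meantime

print(snapshot["account/balance"])        # the reader still sees 100
print(conn.db()["account/balance"])       # a fresh snapshot sees 250
```

The reader's view stays consistent for as long as it holds the snapshot, and the writer never waits for it.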
Was reading back this thread and I’m still intrigued. How are transactions isolated before the commit? Are you processing all transactions concurrently in multiple threads using optimistic updates? Or are you processing them sequentially in one thread? If sequential, what is processed in multiple threads? What kind of isolation are you doing?
Transactions are always processed in a single thread, fully isolated and serial.
Datomic is fully ACID and does not perform concurrent processing of multiple transactions.
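As an illustration of what serial processing buys you, here is a small Python sketch. The helpers `apply_serially` and `transfer` are hypothetical and say nothing about how Datomic implements this internally. Each transaction runs alone against a private copy of the state and either commits as a whole or is discarded, so no transaction ever observes another one half-applied.

```python
def apply_serially(state, transactions):
    """Apply transaction functions one at a time; each commits or aborts as a unit."""
    for tx in transactions:
        draft = dict(state)      # private working copy for this transaction
        try:
            tx(draft)
        except ValueError:
            continue             # abort: the draft is thrown away, state is untouched
        state = draft            # commit: the next transaction sees this result
    return state


def transfer(src, dst, amount):
    """Build a transaction that moves `amount` from `src` to `dst`."""
    def tx(db):
        db[dst] += amount        # deliberately applied first to show the rollback
        if db[src] < amount:
            raise ValueError("insufficient funds")
        db[src] -= amount
    return tx


initial = {"alice": 100, "bob": 0}
final = apply_serially(initial, [
    transfer("alice", "bob", 70),   # commits: alice 30, bob 70
    transfer("alice", "bob", 70),   # aborts: alice only has 30 left
    transfer("bob", "alice", 20),   # commits: alice 50, bob 50
])
print(final)  # {'alice': 50, 'bob': 50}
```

The second transfer fails after it has already credited `bob` in its draft, yet none of that partial work leaks into the committed state. With one transaction in flight at a time, this falls out without locks or conflict detection.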