How to avoid empty transaction entity from being persisted?

I’ve been working on a way to optionally transact something in Datomic Cloud and noticed that if my tx-data vector is empty (like {:tx-data []}), the response from d/transact still includes a single datom in :tx-data.

{:db-before {:database-id "4c842e8a-aee2-4b52-9361-fd5e7183c02c",
             :db-name "2018-11-13",
             :t 733484,
             :next-t 733485,
             :type :datomic.client/db},
 :db-after {:database-id "4c842e8a-aee2-4b52-9361-fd5e7183c02c",
            :db-name "2018-11-13",
            :t 733485,
            :next-t 733486,
            :type :datomic.client/db},
 :tx-data [#datom[13194140266797 50 #inst"2019-04-24T19:42:43.232-00:00" 13194140266797 true]],
 :tempids {}}
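
For reference, the call producing a response like this is just a transact with an empty tx-data vector. Here is a minimal sketch, assuming the Cloud client API and placeholder connection settings:

(require '[datomic.client.api :as d])

;; placeholder client/connection config; substitute your own system values
(def client (d/client {:server-type :cloud
                       :system      "my-system"
                       :region      "us-east-1"
                       :endpoint    "https://entry.my-system.us-east-1.datomic.net:8182"}))
(def conn (d/connect client {:db-name "2018-11-13"}))

;; even with nothing to assert, a transaction entity is still created
(d/transact conn {:tx-data []})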

I’m assuming that datom is the transaction entity itself. Is there any way to prevent the transaction entity from being created?
Is throwing an exception inside a transaction function sufficient? If so, are there problems with doing that frequently (hundreds of times per second), assuming I wrap the transaction in a try/catch?

As you surmised, transactions will always include at least one datom (the :db/txInstant datom).
You can definitely throw from within your transaction function to abort the transaction without writing anything, but I would recommend detecting the potential empty transaction before issuing it if possible. Frequent aborted transactions will not inherently be a problem, but the transaction stream is serialized, so they will be “in line” with any other transactions the system is attempting to issue.
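
For example, a minimal guard along these lines (a sketch assuming the Cloud client API; conn is an existing connection and transact-if-nonempty is just an illustrative name) avoids issuing the empty transaction in the first place:

(require '[datomic.client.api :as d])

(defn transact-if-nonempty
  "Illustrative helper: only issue the transaction when there is actually
   something to assert, so no empty transaction entity is created.
   Returns nil when tx-data is empty."
  [conn tx-data]
  (when (seq tx-data)
    (d/transact conn {:tx-data tx-data})))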

To follow up on this discussion, we discovered one of our processes doing something similar (transacting empty values). We were also adding a custom attribute, :transactionData/requestId, to each transaction: a string containing a request ID from our HTTP server.
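
For context, that kind of annotation is typically added by reifying the transaction, roughly like this (a sketch using the on-prem peer API; the schema map and transact-with-request-id are placeholders, and "datomic.tx" is the reserved tempid for the current transaction entity):

(require '[datomic.api :as d])

;; schema sketch for the custom transaction attribute
(def request-id-schema
  [{:db/ident       :transactionData/requestId
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc         "Request ID from the HTTP server, attached to each transaction."}])

(defn transact-with-request-id
  "Asserts the request ID on the transaction entity itself via the
   reserved datomic.tx tempid."
  [conn tx-data request-id]
  @(d/transact conn
               (conj (vec tx-data)
                     {:db/id                     "datomic.tx"
                      :transactionData/requestId request-id})))

Note that when tx-data is otherwise empty, the transaction still goes through with just the :db/txInstant and :transactionData/requestId datoms, which is exactly what accumulated here.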

Is there any way to clean up these empty transactions that have accumulated? We are using an on-prem system, so excision is an option for us, but will excision just add more transactions, minus the custom attribute? Or could we run excisions in batches?

Hi @csm

How many datoms are we talking about exactly?

Are these erroneous datoms causing any performance impacts or issues? Or is this just a mistake you’d like to clean up?

As you probably suspect, deletion is not exactly compatible with Datomic, which was designed to be immutable and accumulate-only. In general, we do not recommend using excision to correct erroneous data. It was specifically designed for scenarios where you need to forget data for liability reasons, primarily the General Data Protection Regulation (GDPR) in the EU.

If you do decide to excise, you will want to keep a keen eye on the performance implications. Excision puts a substantial burden on background indexing. Large excisions can trigger indexing jobs whose execution time is proportional to the size of the entire database, leading to back pressure and reduced write availability. Try to avoid excising more than a few thousand datoms at a time on a live system.
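
For illustration only, a batched excision request for the custom attribute might look roughly like the sketch below (on-prem peer API; empty-tx-eids is a batch of transaction entity ids gathered beforehand, e.g. from a log scan). As I understand it, :db/txInstant datoms cannot be excised, so the transaction entities themselves remain, and each excision request is itself recorded as a new transaction:

(require '[datomic.api :as d])

(defn excise-request-ids!
  "Sketch: request excision of the :transactionData/requestId datoms on a
   small batch of transaction entity ids. Keep each batch to at most a few
   thousand datoms, per the guidance above."
  [conn empty-tx-eids]
  @(d/transact conn
               (vec (for [eid empty-tx-eids]
                      {:db/id           (str "excise-" eid) ; string tempid
                       :db/excise       eid
                       :db.excise/attrs [:transactionData/requestId]}))))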

An alternative approach would be to implement a process we have previously called “decanting”: an ETL job that re-transacts the current DB into a new DB while filtering out the datoms you do not wish to keep. You can do this by using the transaction log to “replay” transactions into a new system, effectively pouring the existing DB into a new one. This approach requires maintaining a mapping of entity relationships so that history is preserved in the new system. You would also likely need downtime to perform the final switchover, and depending on the size of your DB it may be a poor fit performance-wise.
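
A rough sketch of that log-replay idea follows (on-prem peer API; empty-tx?, remap-eid, and decant! are illustrative names, and the sketch glosses over remapping ref-typed values, re-pointing transaction annotations such as :transactionData/requestId at the new transaction entity, and recording how each batch’s tempids resolve):

(require '[datomic.api :as d])

(defn empty-tx?
  "True when a log entry contains only the transaction entity's own datoms."
  [db {:keys [data]}]
  (every? #{:db/txInstant :transactionData/requestId}
          (map #(d/ident db (:a %)) data)))

(defn remap-eid
  "Placeholder: translate an old entity id into a tempid (or an already
   resolved new entity id) recorded in id-map across transactions."
  [id-map e]
  (get @id-map e (str e)))

(defn decant!
  "Replay src-uri's transaction log into dest-uri, skipping empty transactions.
   The old->new entity-id bookkeeping is deliberately elided."
  [src-uri dest-uri]
  (let [src    (d/connect src-uri)
        dest   (d/connect dest-uri)
        src-db (d/db src)
        id-map (atom {})]
    (doseq [{:keys [data] :as entry} (d/tx-range (d/log src) nil nil)
            :when (not (empty-tx? src-db entry))]
      (let [tx-data (for [{:keys [e a v added]} data
                          :let [attr (d/ident src-db a)]
                          ;; the new db creates its own :db/txInstant
                          :when (not= attr :db/txInstant)]
                      [(if added :db/add :db/retract)
                       (remap-eid id-map e)
                       attr
                       v])] ; ref-typed values also need remapping
        @(d/transact dest (vec tx-data))))))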

If you decide that neither of these approaches is worth the operational overhead, I’d be happy to take a closer look at the situation in a support case (or you can e-mail support@cognitect.com). Perhaps we can address any issues these unintentional datoms are causing with performance tuning, schema alterations, etc.

I think it’s potentially a lot of datoms. Possibly the majority of the 1.6 billion datoms currently in this database.

I’m attempting to count them all and was trying the following; I don’t know whether it is correct:

(require '[datomic.api :as d]) ; peer API

(defn count-empty-txns
  [url & {:keys [batch-size stop-after] :or {batch-size 1000}}]
  (let [conn (d/connect url)
        log (d/log conn)
        db (d/db conn)
        total (volatile! 0)]
    ;; keep only transactions whose datoms are all :db/txInstant or
    ;; :transactionData/requestId, i.e. the "empty" transactions
    (loop [tx-log (filter #(every? #{:db/txInstant :transactionData/requestId}
                                   (map (fn [datom] (d/ident db (:a datom))) (:data %)))
                          (d/tx-range log nil nil))]
      ;; count a batch of matching transactions, print a running total,
      ;; then recur on the rest of the sequence
      (let [batch (take batch-size tx-log)]
        (vswap! total + (count batch))
        (prn @total)
        (when (or (nil? stop-after) (< @total stop-after))
          (recur (rest tx-log)))))
    @total))

(This got to around 500M datoms before I killed it, which seems like it was not counting correctly)
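
Looking at the loop again, the over-count is probably because it recurs with (rest tx-log) after counting an entire batch, so each matching transaction gets counted up to batch-size times. A version that counts each matching transaction once might look like this (still a sketch; note it counts transactions, and each empty transaction here carries two datoms, :db/txInstant plus the request ID):

(require '[datomic.api :as d])

(defn count-empty-txns
  "Sketch: count log entries whose only datoms are :db/txInstant and
   :transactionData/requestId, counting each matching transaction once."
  [url & {:keys [stop-after]}]
  (let [conn      (d/connect url)
        db        (d/db conn)
        empty-tx? (fn [{:keys [data]}]
                    (every? #{:db/txInstant :transactionData/requestId}
                            (map #(d/ident db (:a %)) data)))
        matches   (filter empty-tx? (d/tx-range (d/log conn) nil nil))]
    (count (if stop-after (take stop-after matches) matches))))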

I believe the act of transacting these empty transactions was itself leading to problems (namely, indexing in Datomic running more frequently than it should). So the issue this is really causing is just storage utilization, which is not so much that it worries me, but it is causing our backups to run long. We currently back up to S3, “rotate” to a new object prefix every week, and have a bucket rule to delete older objects (this may not be a good idea; is there a better way to expire old backup values in S3?).

The “decanting” process sounds like it might be a good option; I’ll think that through more.