4096 character string limit

Hi there.

I was testing what error I would get when hitting the Datomic Cloud string limit (4096 characters), and was surprised to find that transacting strings larger than 4096 characters does not result in an error.

Is the Datomic Cloud string size limit a soft limit? If so, what other problems could I run into by storing strings larger than 4096 characters?

Here’s a sample of the code I’m running:

(ns user
  (:require [datomic.client.api :as d]))

(def db-name "test")

(def get-client
  "Return a shared client. Set datomic/ion/starter/config.edn resource
  before calling this function."
  #(d/client {:server-type :ion
              :region      "us-west-2"
              :system      "<system>"
              :endpoint    "<endpoint>"
              :proxy-port  8182}))

(defn get-conn
  "Get shared connection."
  []
  (d/connect (get-client) {:db-name db-name}))

(defn get-db
  "Returns current db value from shared connection."
  []
  (d/db (get-conn)))

(def schema
  [{:db/ident       :string
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}])

(d/create-database (get-client) {:db-name db-name})
(d/transact (get-conn) {:tx-data schema})

;; test string limit
(let [s         (apply str (repeat 10000 "a"))
      tx-report (d/transact (get-conn)
                            {:tx-data [{:db/id  "tempid"
                                        :string s}]})
      id        (get-in tx-report [:tempids "tempid"])
      stored-s  (:string (d/pull (get-db) '[*] id))]
  (println "s length in: " (count s))
  (println "s length out: " (count stored-s))
  (println "equal? " (= s stored-s)))
;; =>
;; s length in:  10000
;; s length out:  10000
;; equal?  true

Hi Josh, large strings cause memory pressure on the node and can degrade query performance. In particular, storing large strings in the datom log that change only slightly between transactions has been seen to cause problems. I can’t find the source on this yet, though; see below.
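If you want to treat 4096 as a hard limit on your side, you can enforce it before the data ever reaches the transactor. This is just a sketch, not Datomic API: `max-string-len`, `oversized-strings`, and `guarded-transact` are hypothetical names, and it only inspects simple entity-map tx-data.

(ns user.guard
  (:require [datomic.client.api :as d]))

;; Hypothetical limit matching the documented Cloud recommendation.
(def max-string-len 4096)

(defn oversized-strings
  "Return [attr length] pairs for string values longer than max-string-len."
  [tx-data]
  (for [m     tx-data
        :when (map? m)
        [k v] m
        :when (and (string? v) (> (count v) max-string-len))]
    [k (count v)]))

(defn guarded-transact
  "Transact only if no string value exceeds the limit; otherwise throw."
  [conn tx-data]
  (if-let [bad (seq (oversized-strings tx-data))]
    (throw (ex-info "String value(s) exceed 4096 chars" {:oversized bad}))
    (d/transact conn {:tx-data tx-data})))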


Here is a Slack log copied from an internal document I have:

jchen: Is there a way to shrink the size of blobs written to storage? Our transactors are trying to write 27MB at a time to MySQL (5.6.34), which throws the MySQL exception The size of BLOB/TEXT data inserted in one transaction is greater than 10% of redo log size. Increase the redo log size using innodb_log_file_size . We can’t change that parameter without incurring downtime – I’m hoping there’s a knob we can turn in datomic to split that 27mb blob up. As it is, that exception means our transactors cycle every 2-3 minutes because they can’t finish indexing (edited)

ambroise: related to above, can I query in datalog for all transactions that have a size greater than N?

marshall: @jchen you should not have segments that large. are you making very large transactions?

Datomic should not be used to store BLOBs or other large unstructured data

Also, you should consider this a critical issue @jchen if your system can’t complete an indexing job it will eventually reach a point where it is no longer available for writes

You need to get the system past this indexing job and resolve the underlying issue that is causing the large segments

the most common causes for large segments are: storing large blob values, making very large transactions (many datoms), or storing large strings that are frequently updated in a way that the leading bits are unchanged (i.e. we’ve had users storing serialized HTML or CSS in a string and updating it frequently with updates that only alter content somewhere deep into the string)

jchen: Thanks for the reply @marshall. The index job eventually succeeded: we noticed that the :kv-cluster/create-val bufsize peaked at around 27.8mb while throwing exceptions, then slowly dropped to 26.8mb and succeeded. This is the second time in two days we’ve had this issue – the first time it also seemed to resolve itself without any intervention.

dustingetz: @marshall, how large strings are we talking (since this is Onprem and no 4k limit)

marshall: i’d avoid anything that smells like a serialized value

marshall: don’t have a specific size limit, but if it isnt a “fact” it doesnt belong in a datom
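One common way to follow this advice for payloads like serialized HTML or CSS is to keep the blob in external storage and record only a stable key (plus a content hash for change detection) as datoms. A sketch under assumptions: `put-blob!` is a placeholder for your object store (e.g. an S3 client), and the `:doc/*` attributes are made up for illustration.

(ns user.blob-ref
  (:require [datomic.client.api :as d]))

(def blob-schema
  [{:db/ident       :doc/blob-key     ; key into external blob storage
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}
   {:db/ident       :doc/content-hash ; SHA-256 of the blob, to detect changes
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}])

(defn sha-256 [^String s]
  (->> (.digest (java.security.MessageDigest/getInstance "SHA-256")
                (.getBytes s "UTF-8"))
       (map #(format "%02x" %))
       (apply str)))

(defn store-doc!
  "Write the large string to blob storage; record only key + hash in Datomic."
  [conn put-blob! k s]
  (put-blob! k s)
  (d/transact conn {:tx-data [{:doc/blob-key     k
                               :doc/content-hash (sha-256 s)}]}))

This keeps each datom small and stable, so re-uploading a slightly changed document only rewrites the hash datom rather than a large string value deep in the log.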