Syncing to DataScript

We're trying to sync part of Datomic to DataScript, and we ran into an issue: the entity ids Datomic produces are too big for DataScript.

I believe the best solution likely involves syncing around a UUID rather than the :db/id. This fixes the issue of the server's :db/id being too large for the client, and it generically fixes the issue of uniquely identifying an entity across multiple databases.

One downside is that we have to add a UUID to every entity in the db. And when we transact data to Datomic that we then need to ship to DataScript, we would need a way to get that UUID (as it wouldn't be in the tx-data log, since it wasn't touched). However, that could just be done with a lookup, probably not too slow.
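
That lookup might look like the following sketch (Datomic peer API; :entity/uuid is a placeholder for whatever UUID attribute we end up adding):

(d/q '[:find ?e ?uuid
       :in $ [?e ...]
       :where [?e :entity/uuid ?uuid]]
     db-after
     (distinct (map :e tx-data)))
;; => #{[eid uuid] ...} for every entity touched in tx-data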

Alternatively, if we don't want to add any facts to Datomic, we could sync over an in-memory, unique-value, client-side attribute called :server/id that holds the :db/id from the server.
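
On the DataScript side that attribute could be declared like this (a minimal sketch):

(require '[datascript.core :as d])

;; :server/id holds the Datomic :db/id as a string;
;; :db.unique/value enforces one local entity per server id
(def conn
  (d/create-conn {:server/id {:db/unique :db.unique/value}}))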

One common issue in both approaches is modifying the transaction log to add information (the UUID or :server/id). The only idea I have is to do a non-side-effecting transaction with d/with and get the tx-data (alongside :db-after) from its result, which always seems to be a flat log, as opposed to the nested maps that can be submitted to transact!. I just assume it will be hard to properly parse nested map tx's, so using the log will be far easier.
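
For concreteness, with the peer API that could look like this sketch (nested-map-tx stands in for whatever transaction we submitted):

;; d/with runs the transaction against a db value with no side effects;
;; the returned :tx-data is always a flat log of [e a v tx added?] datoms,
;; even when the input transaction used nested maps
(let [{:keys [db-after tx-data]} (d/with (d/db conn) nested-map-tx)]
  tx-data)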

The code to change the DataScript client tx-data log to add this :server/id is currently taking this shape:


  (reduce-kv
   (fn [tx-data idx [too-long-server-db-id attr value _tx _added?]]
     (let [temp-id (str idx)]
       (conj tx-data
             ;; replace the server id, which is too long for DataScript,
             ;; with a temporary id based on the idx of the log
             [:db/add temp-id attr value]
             ;; and record the original server id on the new entity
             [:db/add temp-id :server/id (str too-long-server-db-id)])))
   #{}
   (vec (take 1 tx-data)))
  ;; => #{[:db/add "0" :server/id "316659348816869"]
  ;;      [:db/add "0" :internal-team/id
  ;;       #uuid "5caf6e45-f54d-4c0e-9658-33f63c069569"]}

If you have any thoughts, let me know 🙂 (yes, I see that fact has a UUID, but they aren't on everything!).

  • Here is a slack discussion on the topic that some might find useful
  • datasync does this, maybe I could pull inspiration from there. I wonder if it predates unique.value in DataScript, though, or at least why they wouldn't use that (I don't see it mentioned).

I ended up forking DataScript and bumping the emax number, as exceeding it seemed to be just a perf hit.

The way I did it was to keep a map of the server (Datomic) ids -> DataScript ids, which I use every time I transact to the server and back (a sketch follows below). My hope is that this will allow me to have a DataScript database that reads data from several Datomic (and/or Datahike) databases and combines them locally.
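
A hypothetical sketch of that mapping (the names here are illustrative, not my actual code):

;; Datomic :db/id -> DataScript :db/id
(defonce server->local (atom {}))

(defn local-eid
  "Look up the DataScript eid for a server eid, falling back to a
  string tempid so DataScript allocates a fresh entity for it."
  [server-eid]
  (get @server->local server-eid (str server-eid)))

(defn remap-tx
  "Rewrite server tx-data datoms [e a v tx added?] into DataScript tx-data."
  [tx-data]
  (for [[e a v _tx added?] tx-data]
    [(if added? :db/add :db/retract) (local-eid e) a v]))

After transacting, the :tempids in the DataScript transaction report can be merged back into the atom so the next sync reuses the same local eids.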

Consider using a custom (unique identity) attribute in DataScript that represents the :db/id of the Datomic instance. Use type string to avoid integer limits.

;; Datomic
{:db/id 213891283192 :db/doc "My Entity"}
;; DataScript
{:db/id 8 :db/doc "My Entity" :datomic/id "213891283192"}
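
A minimal sketch of the matching DataScript schema (assuming datascript.core as d): the attribute is a plain string, so it never hits integer limits, and :db.unique/identity makes repeated transacts upsert the same local entity:

(def conn
  (d/create-conn {:datomic/id {:db/unique :db.unique/identity}}))

;; re-transacting the same :datomic/id updates the existing entity
(d/transact! conn [{:datomic/id "213891283192" :db/doc "My Entity"}])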

If you use DataScript in ClojureScript, raising emax will probably cause you an issue. That is the max representable integer in JavaScript. Everything above that will function as a double. This will result in many phantom bugs, because past emax some numbers will round to the same floating-point representation.


Thanks for the input, @ttallman!

To make sure we're on the same page, I'm going to add some details I think are missing, namely what we increased emax to, and why I think it will still allow our system to do correct number comparisons.

If, in the end, you're not convinced our solution is reasonable, I would be extremely interested to know why, as this is a critical part of our code. I would be happy to dig up any more information you need as well.

What I understand is that emax is a var in the DataScript code base:

(def ^:const emax  0x7FFFFFFF)

Which is the hexadecimal representation of 2147483647 (2^31 − 1):

parseInt("0x7FFFFFFF", 16)
2147483647

Which, as you imply, is there to keep eids from becoming too large to be compared:

(if (> eid emax)
  (raise "Highest supported entity id is " emax ", got " eid
         {:error :entity-id :value eid})
  eid)

However, it's far less than the largest safe number, which is Number.MAX_SAFE_INTEGER:

Number.MAX_SAFE_INTEGER
9007199254740991

Which Mozilla describes as:

Safe" in this context refers to the ability to represent integers exactly and to compare them correctly. For example, Number.MAX_SAFE_INTEGER + 1 === Number.MAX_SAFE_INTEGER + 2 will evaluate to true, which is mathematically incorrect. See Number.isSafeInteger() for more information.

I took this to mean that DataScript's emax was overly conservative, so I set emax to the max safe integer.
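
In the fork, that bump amounts to one line (2^53 − 1, written in hex):

(def ^:const emax 0x1FFFFFFFFFFFFF) ;; 9007199254740991 = Number.MAX_SAFE_INTEGER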

Here is a JavaScript console sandbox session illustrating whether the numbers are safe:

Number.isSafeInteger(parseInt("0x7FFFFFFF", 16))
true
Number.isSafeInteger(Number.MAX_SAFE_INTEGER)
true
Number.isSafeInteger(Number.MAX_SAFE_INTEGER + 1)
false
Number.MAX_SAFE_INTEGER + 1 === Number.MAX_SAFE_INTEGER + 2
true

So given emax is no larger than the max safe integer, we should be safe!

Here is a conversation where I talked it over in the datascript channel. I think it jumps around a bit, and I haven't read it myself in a while, but it might contain more information I have forgotten.

You are correct, I wrongly assumed 0x7FFFFFFF was MAX_SAFE_INTEGER.

Would it make sense to increase emax upstream, then?

The way I understood this conversation is that it basically has no downsides and makes the DataScript db/ids compatible with Datomic?!

In older JavaScript, bit operations on numbers convert them to 32-bit integers, which I think would not be compatible with how DataScript uses the eid to access the datoms internally.
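
A quick ClojureScript illustration of that 32-bit coercion (a sketch, not DataScript's actual code path):

;; JS bitwise ops coerce their operands to signed 32-bit integers,
;; so any eid above 0x7FFFFFFF is silently mangled
(bit-or 0x7FFFFFFF 0)  ;; => 2147483647 (survives)
(bit-or 0x80000000 0)  ;; => -2147483648 (wraps to negative)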

In more recent JavaScript there is (finally) a BigInt available, which can store 64-bit integers and more.

But BigInts cannot, for instance, be serialized into JSON, and cannot be intermixed with Numbers in mathematical expressions (which seems sound). So I think the update needed is more extensive than it looks, and I'm not sure of the performance characteristics of BigInt compared to Number/sometimes-32-bit integers.
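
A few ClojureScript REPL sketches of those caveats (illustrative, not from the thread):

(js/BigInt "9007199254740993")     ;; exact, beyond MAX_SAFE_INTEGER
(js/JSON.stringify (js/BigInt 1))  ;; throws TypeError: can't serialize a BigInt
(+ (js/BigInt 1) 1)                ;; throws TypeError: can't mix BigInt and Number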