Database Value Equality

I’m seeing behavior in Datomic Cloud that is contrary to what I’ve been led to believe about “the database as a value”. When I acquire a “current” DB value from a connection using the d/db fn and compare it with = to another DB value of the same t basis, I get false. This is the case both over a remote connection to a system and from within an Ions application.

;; Remote connection, using :server-type :ion
(let [c   (conn "core-prod" "XXXX.core.prod")
      db1 (d/db c)
      db2 (d/db c)]
  {:db1      db1
   :db2      db2
   :eq?      (= db1 db2)
   :type     (type db1)})
;; => {:db1
;;     {:t 3150, :next-t 3151, :db-name "XXXX.core.prod", :database-id "de0a365c-eb28-4cf4-a490-bd0bcfff8104", :type :datomic.client/db},
;;     :db2
;;     {:t 3150, :next-t 3151, :db-name "XXXX.core.prod", :database-id "de0a365c-eb28-4cf4-a490-bd0bcfff8104", :type :datomic.client/db},
;;     :eq? false, ;; ** This is not expected **
;;     :type datomic.client.impl.shared.Db}

When using dev-local, however, I do see the expected behavior:

;; dev-local connection, using dl/divert-system
(let [c   (conn)
      db1 (d/db c)
      db2 (d/db c)]
  {:db1      db1
   :db2      db2
   :eq?      (= db1 db2)
   :type     (type db1)})
;; => {:db1
;;     #datomic.core.db.Db{:id "import-202123-1720", :basisT 3151, :indexBasisT -1, :index-root-id nil, :asOfT nil, :sinceT nil, :raw nil},
;;     :db2
;;     #datomic.core.db.Db{:id "import-202123-1720", :basisT 3151, :indexBasisT -1, :index-root-id nil, :asOfT nil, :sinceT nil, :raw nil},
;;     :eq? true, ;; This was expected
;;     :type datomic.core.db.Db}

Having a tangible database value that can be compared to other database values (with the = fn) was a very big reason to choose Datomic; see Rich’s talk. I understand that the two values may not be identical?, but surely, since they have the same t basis and the same as-of and since bases, the two DBs should be equal.

My primary use case here (and why I can’t just look at the :t value) is memoizing functions of the db value; since a db value is a value and is immutable, it should be usable as a parameter of a memoized fn. All of my code written this way works correctly when connected to dev-local, but it does not work as intended when deployed to an Ion.
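For illustration, here is a minimal sketch of the pattern I have in mind (the query and attribute name are hypothetical); it only behaves as a cache if two db values at the same basis actually compare equal:

;; A minimal sketch of the intended pattern (hypothetical query and attribute).
;; This only works as a cache if db values at the same basis are =.
(require '[datomic.client.api :as d])

(def active-user-count
  (memoize
   (fn [db]
     (ffirst
      (d/q '[:find (count ?e)
             :where [?e :user/active? true]]
           db)))))

;; Calling (active-user-count (d/db conn)) twice in a row should hit the cache
;; only when the two db values are =, which is exactly the behavior in question.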

On a conceptual level I’m interested to hear an official answer to this question. On a practical level, I wonder whether changes to indexes would be an issue here. In other words, two database values with the same database-id and the same basisT could point to different segment roots in storage. They would be semantically equal, up until the point where the storage root for one is garbage collected, at which point they would not be equal at all…

At present, the Datomic API does not make any promises about the hashCode or equals methods of a db value, so you should not rely on them doing anything more than identity.

Can you tell us more about your use of memoization? What problem is it solving? How do you expire the memo to prevent unbounded memory growth?

I’m using this memoization library, which allows for various cache expiration strategies: GitHub - clojure/core.memoize: A manipulable, pluggable, memoization framework for Clojure
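For example (the heavy-query placeholder and thresholds below are illustrative, not from our code), the expiration strategies look like this:

(require '[clojure.core.memoize :as memo])

;; heavy-query stands in for any expensive pure function of a db value,
;; e.g. a d/q call.
(defn heavy-query [db]
  ;; ... expensive query against db ...
  )

;; Cache results for up to 60 seconds, then recompute on the next call.
(def heavy-query-ttl
  (memo/ttl heavy-query :ttl/threshold 60000))

;; Or keep only the 128 most recently used results.
(def heavy-query-lru
  (memo/lru heavy-query :lru/threshold 128))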

We’ve been using Datomic Cloud + Ions in production for almost 6 months now, and in general the performance has been adequate. However, some of the requirements we’ve been implementing recently involve many repetitive queries (from separate app-level API requests), making performance for these operations slower than desired, even after optimizing the queries.

I know we’ll eventually need to develop a more pre-computed approach for the type of requests we’re serving in order to truly scale. But this early in our system/product, with such a small amount of data (100k datoms / 1.6 MB in DDB), a very slow trickle of transactions, and lots of reads, a couple of well-placed memoize wrappers would have been a quick win to keep most requests fast (rather than implementing our own caching layer before it’s needed), buying us more time to develop those features.

TL;DR: memoizing (with this library) would have been a quick and convenient way to cache certain heavy queries, without having to build out our caching layer yet.

I haven’t tried this, but I would expect you could cache on an argument that proxied for the database (maybe even simply t).
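Something along these lines (a sketch only; the query, threshold, and helper names are made up):

(require '[datomic.client.api :as d]
         '[clojure.core.memoize :as memo])

;; Sketch: cache by basis t instead of by the db value itself.
(defn cached-query-fn
  "Returns a fn of a db value whose results are cached, keyed by the db's t."
  [conn]
  (let [by-t (memo/ttl
              (fn [t]
                (d/q '[:find (count ?e)
                       :where [?e :user/active? true]]
                     (d/as-of (d/db conn) t)))
              :ttl/threshold 60000)]
    (fn [db]
      (by-t (:t db)))))

;; Usage sketch: create one cached fn per connection and reuse it.
;; (def q (cached-query-fn conn))
;; (q (d/db conn))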

I would also be interested in knowing more about your queries. With that tiny data size, the data needed to answer your query should always be in memory anyway. What kind of performance do you see for the uncached query, and what do you need?

Thanks for the response @stu, sorry for the delay. I ended up finding an excellent feature in that memoize lib that lets me customize the cache-key fn using metadata on the original fn. I extracted the parts of the db value that make it unique (system/name/as-of/since/history/etc.) to produce a consistent cache-key basis, and it’s working pretty well.
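Roughly, the shape of it (sketched from memory; I’m assuming core.memoize’s :clojure.core.memoize/args-fn metadata hook here, and the query and exact db fields pulled out are illustrative):

(require '[datomic.client.api :as d]
         '[clojure.core.memoize :as memo])

(defn db->cache-key
  "Extract the parts of a client db value that identify it."
  [db]
  (select-keys db [:db-name :database-id :t :as-of :since :history]))

(def heavy-query
  (memo/ttl
   (with-meta
     (fn [db]
       (d/q '[:find (count ?e)
              :where [?e :user/active? true]]
            db))
     ;; core.memoize applies this fn to the argument list to build the cache
     ;; key, so two distinct-but-equivalent db values hit the same cache entry.
     {:clojure.core.memoize/args-fn (fn [[db]] [(db->cache-key db)])})
   :ttl/threshold 60000))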

As for the performance expectations (and possible optimizations), I’d love to discuss that further (since this caching thing was really just a temporary solution anyway). Let me get some more solid numbers and data/query examples, and post under a new topic…