Hello Datomic Forum! I hope my first post is up to snuff.
I’m trying to work out the most efficient way to query against a large
set of entities for a given time range. Specifically, we need to return
only the members of this group that have changed since a certain point
in time.
Since we’re dealing with a business entity that has multiple
component-ref levels, it isn’t sufficient to only consider the root when
looking for changes. We use a recursive rule to find all the component
refs given the root:
;; Assumes the Datomic Peer library is on the classpath.
(require '[datomic.api :as d])

(def component-rules
  "All component and transitive component entities."
  '[[(component ?e ?a ?child)
     [?e ?a ?child]
     [?a :db/isComponent true]]
    [(component ?e ?b ?grandchild)
     [?e ?a ?child]
     [?a :db/isComponent true]
     (component ?child ?b ?grandchild)]])

(defn component-tree
  "Given an eid, return all eids including it that can be directly or
  transitively traversed through component attributes."
  [db eid]
  (let [eid         (d/entid db eid) ;; allow for passing a lookup ref
        descendents (d/q '[:find [?descendant ...]
                           :in $ % ?e
                           :where
                           (component ?e ?a ?descendant)]
                         db component-rules eid)]
    (into #{eid} descendents)))

We can then ask if an entity has changed since a given t:
(defn updated-since-t?
  "Return true if the given business entity -- i.e. the reference tree
  implied by eid -- has had any alterations since t."
  [db t eid]
  (let [db-since (d/since db t)
        eids     (component-tree db eid)]
    (boolean
     (d/q '[:find ?e .
            :in $ [?e ...]
            :where [?e]]
          db-since eids))))
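
For illustration, a call might look like the sketch below. The names here are assumptions, not part of our code: `conn` stands for an open Peer connection and `:order/id` is a placeholder unique attribute used as a lookup ref. Since `d/since` accepts a t, a tx id, or a `java.util.Date`, an instant works as the time point.

```clojure
(require '[datomic.api :as d])

;; Hypothetical: `conn` is an existing Datomic Peer connection and
;; :order/id is a placeholder unique attribute identifying a root entity.
(let [db (d/db conn)]
  ;; component-tree runs against the full db; only the final existence
  ;; check runs against the since-filtered db.
  (updated-since-t? db #inst "2020-01-01T00:00:00Z" [:order/id 42]))
```
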
Here I need to get a hold of all the roots. A wasteful approach would
be to filter (d/datoms db :aevt :membership-root-attr) by
updated-since-t?, but that’s prohibitive given the number of members
and our time requirements. Note also that we can’t pass db-since to
d/datoms here, because we would only iterate over business entities
that had been created since t: our root membership attribute is only
touched on creation.
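
To make the cost concrete, the wasteful approach amounts to roughly the following sketch. The function name `changed-roots-naive` is made up for illustration; it relies on `updated-since-t?` as defined above and scans the full database value, not the since-filtered one.

```clojure
(require '[datomic.api :as d])

(defn changed-roots-naive
  "Hypothetical helper: scan every entity carrying membership-attr and keep
  those whose component tree has changed since t. This issues one
  component-tree query plus one since-query per root, which is what makes
  it prohibitive for a large membership."
  [db membership-attr t]
  (->> (d/datoms db :aevt membership-attr) ; all datoms for the root attr
       (map :e)                            ; root entity ids
       distinct
       (filter #(updated-since-t? db t %))))
```
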
At this point, I started to wonder whether time is a more selective
criterion and whether we could start with a tx range instead. I’ll
elide the code, but we can efficiently compute the maximal attribute
set for a given business entity, and then pass those attributes to a
query against a tx range. This is pretty fast for a one-day range and
the 125 possible attributes across all levels of the business entity.
(defn changes-for-span-by-attrs
  [db log attributes t-start t-end]
  (d/q '[:find [?e ...]
         :in $ ?log [?a ...] ?t1 ?t2
         :where
         [(tx-ids ?log ?t1 ?t2) [?tx ...]]
         [(tx-data ?log ?tx) [[?e ?a]]]]
       db log attributes t-start t-end))
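
A hedged sketch of how this might be invoked follows. `conn` and `attr-idents` are placeholder names: an open Peer connection and the precomputed attribute set as keywords. The sketch assumes the attributes have to be passed as entity ids rather than idents, because the datoms returned by `tx-data` carry numeric attribute ids, which would not unify with keywords.

```clojure
(require '[datomic.api :as d])

;; Hypothetical: `conn` is an open Peer connection and `attr-idents` is the
;; precomputed collection of attribute keywords for the business entity.
(let [db    (d/db conn)
      log   (d/log conn)
      ;; Resolve idents to attribute entity ids before joining on ?a.
      attrs (map #(d/entid db %) attr-idents)]
  ;; tx-ids accepts t values, tx ids, or instants as range bounds.
  (changes-for-span-by-attrs db log attrs
                             #inst "2020-01-01"
                             #inst "2020-01-02"))
```
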
Now we need to filter these down to their common roots, which is where
I can’t make this approach fast enough.
(def get-root-rules
  '[[(component-ancestors ?e ?a ?parent)
     [?parent ?a ?e]
     [?a :db/isComponent true]]
    [(component-ancestors ?e ?a2 ?ancestor)
     [?parent ?a ?e]
     [?a :db/isComponent true]
     (component-ancestors ?parent ?a2 ?ancestor)]
    [(get-root ?e ?root ?membership-attr)
     [?e ?membership-attr]
     [(identity ?e) ?root]]
    [(get-root ?e ?ancestor ?membership-attr)
     (component-ancestors ?e _ ?ancestor)
     [?ancestor ?membership-attr]]])

(defn changed-entities-for-range-by-attrs
  ;; Takes too much time; I haven't seen it finish for a day range
  [db log membership-attr attributes t-start t-end]
  (d/q '[:find ?root
         :in $ ?log % ?membership-attr [?a ...] ?t1 ?t2
         :where
         [(tx-ids ?log ?t1 ?t2) [?tx ...]]
         [(tx-data ?log ?tx) [[?e ?a]]]
         ;; Then get root which could be the given ?e or an ancestor:
         (get-root ?e ?root ?membership-attr)]
       db log get-root-rules membership-attr attributes t-start t-end))

Happy to have any suggestions!