Our team has been debugging a slow query and I wanted to ask a question about why one query variant is slower than the other
(comment
(def members
(d/q
'[:find [?u ...]
:in $ ?b
:where [?u :user/follows ?b]]
(d/db @conn)
board-id)))
(def query1
'[:find ?u (count ?views)
:in $ % [?u ...] ?b
:where
(or
[?b :board/admins ?instructors]
[?b :board/created-by ?instructors])
[?post :post/author ?instructors]
[?post :post/board ?b]
[?post :post/views ?views]
[?views :post-view/viewer ?u]
[(!= ?u ?instructors)]])
(def query2
'[:find ?viewer (count ?views)
:in $ % [?u ...] ?b
:where
(or
[?b :board/admins ?instructors]
[?b :board/created-by ?instructors])
[?post :post/author ?instructors]
[?post :post/board ?b]
[?post :post/views ?views]
[?views :post-view/viewer ?viewer]
[?viewer :user/follows ?b]
[(!= ?viewer ?instructors)]])
(comment
;; Elapsed time: 6489.551387 msecs
(time
(d/q
query1
(d/db @conn)
(yd.rules/rules)
(take 250 members)
board-id))
;; "Elapsed time: 87.187617 msecs"
(time
(d/q
query2
(d/db @conn)
(yd.rules/rules)
(set (take 250 members))
board-id)))
The queries shown above are aiming to find the count of views on posts made by instructors for each user. In query1 we pass in a subset of the “members” of the board, who are just users with the :user/follows
relation as shown. In query2, the input is ignored in favor of an additional binding [?viewer :user/follows ?b]
which should constrain the results to produce the same output (not withstanding thetake
).
We did identify some :where
clauses which can be moved around to increase the performance, but I’m a bit confused about why the version which accepts the input vector would be a factor of ~75x slower. The members collection is around 7000 eids. My guess here is that since the input is de-structured it has to be probed or scanned against each :post-view/viewer
value, whereas the version in query2 is binding a set and thus can do a set whatever-the-big-o intersection operation is.
Is there any reason passing in a large-ish collection is not recommended? Is it possible to pass in a set as a query input?
I should note that putting [?views :post-view/viewer ?u]
as the first where clause did bring the query down to around 564ms
for the whole input collection, but it’s still slower than the no input version by about 5x
Any suggestions?