Our team has been debugging a slow query and I wanted to ask a question about why one query variant is slower than the other
(comment (def members (d/q '[:find [?u ...] :in $ ?b :where [?u :user/follows ?b]] (d/db @conn) board-id))) (def query1 '[:find ?u (count ?views) :in $ % [?u ...] ?b :where (or [?b :board/admins ?instructors] [?b :board/created-by ?instructors]) [?post :post/author ?instructors] [?post :post/board ?b] [?post :post/views ?views] [?views :post-view/viewer ?u] [(!= ?u ?instructors)]]) (def query2 '[:find ?viewer (count ?views) :in $ % [?u ...] ?b :where (or [?b :board/admins ?instructors] [?b :board/created-by ?instructors]) [?post :post/author ?instructors] [?post :post/board ?b] [?post :post/views ?views] [?views :post-view/viewer ?viewer] [?viewer :user/follows ?b] [(!= ?viewer ?instructors)]]) (comment ;; Elapsed time: 6489.551387 msecs (time (d/q query1 (d/db @conn) (yd.rules/rules) (take 250 members) board-id)) ;; "Elapsed time: 87.187617 msecs" (time (d/q query2 (d/db @conn) (yd.rules/rules) (set (take 250 members)) board-id)))
The queries shown above are aiming to find the count of views on posts made by instructors for each user. In query1 we pass in a subset of the “members” of the board, who are just users with the
:user/follows relation as shown. In query2, the input is ignored in favor of an additional binding
[?viewer :user/follows ?b] which should constrain the results to produce the same output (not withstanding the
We did identify some
:where clauses which can be moved around to increase the performance, but I’m a bit confused about why the version which accepts the input vector would be a factor of ~75x slower. The members collection is around 7000 eids. My guess here is that since the input is de-structured it has to be probed or scanned against each
:post-view/viewer value, whereas the version in query2 is binding a set and thus can do a set whatever-the-big-o intersection operation is.
Is there any reason passing in a large-ish collection is not recommended? Is it possible to pass in a set as a query input?
I should note that putting
[?views :post-view/viewer ?u] as the first where clause did bring the query down to around
564ms for the whole input collection, but it’s still slower than the no input version by about 5x