Query performance with large(r) inputs

Our team has been debugging a slow query and I wanted to ask a question about why one query variant is slower than the other

(comment
  (def members
    (d/q
     '[:find [?u ...]
       :in $ ?b
       :where [?u :user/follows ?b]]
     (d/db @conn)
     board-id)))

(def query1
  '[:find ?u (count ?views)
    :in $ % [?u ...] ?b
    :where
    (or
     [?b :board/admins ?instructors]
     [?b :board/created-by ?instructors])
    [?post :post/author ?instructors]
    [?post :post/board ?b]
    [?post :post/views ?views]
    [?views :post-view/viewer ?u]
    [(!= ?u ?instructors)]])

(def query2
  '[:find ?viewer (count ?views)
    :in $ % [?u ...] ?b
    :where
    (or
     [?b :board/admins ?instructors]
     [?b :board/created-by ?instructors])
    [?post :post/author ?instructors]
    [?post :post/board ?b]
    [?post :post/views ?views]
    [?views :post-view/viewer ?viewer]
    [?viewer :user/follows ?b]
    [(!= ?viewer ?instructors)]])

(comment
  ;; Elapsed time: 6489.551387 msecs
  (time
   (d/q
    query1
    (d/db @conn)
    (yd.rules/rules)
    (take 250 members)
    board-id))
  ;; "Elapsed time: 87.187617 msecs"
  (time
   (d/q
    query2
    (d/db @conn)
    (yd.rules/rules)
    (set (take 250 members))
    board-id)))

The queries shown above are aiming to find the count of views on posts made by instructors for each user. In query1 we pass in a subset of the “members” of the board, who are just users with the :user/follows relation as shown. In query2, the input is ignored in favor of an additional binding [?viewer :user/follows ?b] which should constrain the results to produce the same output (not withstanding thetake).

We did identify some :where clauses which can be moved around to increase the performance, but I’m a bit confused about why the version which accepts the input vector would be a factor of ~75x slower. The members collection is around 7000 eids. My guess here is that since the input is de-structured it has to be probed or scanned against each :post-view/viewer value, whereas the version in query2 is binding a set and thus can do a set whatever-the-big-o intersection operation is.

Is there any reason passing in a large-ish collection is not recommended? Is it possible to pass in a set as a query input?

I should note that putting [?views :post-view/viewer ?u] as the first where clause did bring the query down to around 564ms for the whole input collection, but it’s still slower than the no input version by about 5x

Any suggestions?

Try putting

As ?u is passed in this will unify. Reorder to put the (or …) further down.