Non-Database data sources

When crafting an input to query we have a few options as to how the values get bound: namely a “binding” (scalar/vector/tuple/relation), a “source” variable, a rules variable or a pattern name binding.

We are successfully passing non-Database values to “source” inputs (i.e. input variables starting with $). I see this hinted at in the documentation (emphasis mine):

Often you will have only a single, or primary, data source (usually a database). In this case you can call that data source $, and elide it in the data clauses.

Specifically, passing an [[e a v]] relation (i.e. a List of Lists) seems to work quite well as a source. This is useful for a couple of use cases, such as making arbitrary non-persisted reference data available to queries and unit testing queries. [[e a v t]] also works, and I assume [[e a v t added?]] might work, although I’m not sure how the basis-t would be determined for such a source.
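For illustration, here is a minimal sketch of passing a List of Lists as the $ source. It assumes the on-prem peer library (datomic.api) is on the classpath; the :country/* attribute names and the data are made up for the example.

```clojure
(require '[datomic.api :as d])

;; Hypothetical, non-persisted reference data shaped as [e a v] tuples.
(def ref-data
  [[1 :country/code "NZ"]
   [1 :country/name "New Zealand"]
   [2 :country/code "AU"]
   [2 :country/name "Australia"]])

;; The plain collection is passed where a Database value would normally go.
;; Attribute keywords are matched as ordinary values, since there is no schema.
(def result
  (d/q '[:find ?name
         :in $ ?code
         :where [?e :country/code ?code]
                [?e :country/name ?name]]
       ref-data
       "NZ"))
```

The same query text can be pointed at a real Database value in production and at a literal like ref-data in a unit test.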

Are there any documented details on how non-Database sources are handled, and what is “supported”? I haven’t tried, but I also imagine implementing parts of the Database API could work (e.g. to give query some leverage over indexes), but generally creating such things using memory databases and/or with seems like a better route in that case.

Hi @adam

The shapes of other data sources are implied by the binding support for inputs. I am looking at updating our current docs to be clear about that connection, but you can pass anything you can bind per the binding forms docs: https://docs.datomic.com/cloud/query/query-data-reference.html#binding-forms

As far as implementing the API, that is undocumented. If you run into a specific scenario that you feel requires you to do so, please feel free to add that to the post, because like you said, I too think most use cases get covered by memory DBs and/or with.

Thanks @jaret for the doc reference. These details aren’t in the on-prem docs(!). In cloud, this section cements the link for me https://docs.datomic.com/cloud/query/query-data-reference.html#data-patterns … the key quote is:

A data pattern is a tuple that begins with an optional src-var which binds to a relation. The src-var is followed one or more elements that match the tuples of that relation in order. The relation is almost always a Datomic database, so the components are E, A, V, Tx, and Op.

The cloud docs are certainly clearer and more explicit than the on-prem docs. Even so, if you take a pass at improving the docs, I feel the subtlety here is often missed, since the first thought that comes to mind is to bind relations using the relation binding form. A src-var is for handling relations bound as scalars. One case destructures the relation in :in, the other in :where.
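To make the distinction concrete, here is a rough sketch of the same relation consumed both ways. It assumes the on-prem peer library (datomic.api); the :item/price attribute, the $rows name and the data are invented for the example.

```clojure
(require '[datomic.api :as d])

;; Hypothetical relation of [e a v] tuples.
(def rows
  [[1 :item/price 10]
   [2 :item/price 25]])

;; Relation binding: the tuples are destructured in :in.
(def via-in
  (d/q '[:find ?e ?v
         :in [[?e _ ?v]]
         :where [(> ?v 20)]]
       rows))

;; src-var: the relation is bound whole to $rows and
;; destructured by a data pattern in :where.
(def via-where
  (d/q '[:find ?e ?v
         :in $rows
         :where [$rows ?e _ ?v]
                [(> ?v 20)]]
       rows))
```

Both queries should find the same tuples; the difference is only where the shape of the relation is spelled out.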

The use cases where we find this sort of thing helpful (powerful) show up when it comes to building extensible data processing systems. What I find is that I want the :where part of the query to be an extension point, usually :find is fixed by the specific use-case (a contract that :where must satisfy, in an arbitrary way) and :in comes from the context. It is always “easy” to toss a Database value into that context, but sometimes (e.g. due to data access control requirements) it is necessary to tightly restrict what data the query is allowed to operate over (Database filters are another way). Materializing that data and passing it as a relation has worked well for us in this case, when it is “derived” data (e.g. the result of some other aggregation operation already applied).
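As a sketch of that extension-point idea (not our actual system), the map form of a query lets :find and :in stay fixed while :where is supplied by the caller. This assumes the on-prem peer library (datomic.api); run-extension, the :order/total attribute and the data are hypothetical.

```clojure
(require '[datomic.api :as d])

;; Hypothetical helper: :find and :in are fixed by the use case,
;; :where is the extension point supplied by the caller.
(defn run-extension
  [where-clauses source]
  (d/q {:find  '[?id ?total]
        :in    '[$]
        :where where-clauses}
       source))

;; Materialized, restricted data handed to the query instead of a Database.
(def allowed-rows
  [[1 :order/total 50]
   [2 :order/total 250]])

(def large-orders
  (run-extension '[[?id :order/total ?total]
                   [(> ?total 100)]]
                 allowed-rows))
```

Because the source is just the materialized relation, the supplied :where clauses can only see the tuples that were explicitly passed in.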

We’ve avoided this approach in cases where the input sizes are large enough to need extra indexing, but the concept is intriguing to me. Would there be a way to provide source inputs that capture the “indexed” nature of Database without having to cover the entire API? Some way of expressing “here are multiple views of the same data indexed by different component orderings”. Maybe query is already building these indexes on the fly? Another way to solve this for us would be to provide an API to “prepare” a relation for repeated querying. Even if the indexing were performed lazily (again, maybe you’re already doing this, internally?), the “prepared” relation could be used across multiple calls to query without re-indexing it each time.