Query result pagination


I have a hypothetical situation that I'd like to solve, but I can't find the ideal answer. Suppose you have a huge data set that could be returned from a query: how do you paginate it so that the impact on memory is minimal? The options I've considered:

- The datoms API, iterating over the datoms and filtering them one by one?
- The index-range API, where I would have to do the same thing as with the datoms API, iterating over the items and filtering them one by one?
- An initial query that returns only ids, then paginating those ids so they can be used in another query to retrieve the full data set?

In SQL you can usually define pagination in the query itself:

SELECT col1, col2, ...
 FROM ...
 WHERE ...
 ORDER BY ...             -- pagination requires an ORDER BY clause
-- the paging comes here
OFFSET     10 ROWS       -- skip 10 rows
FETCH NEXT 10 ROWS ONLY; -- take 10 rows



I would just run a regular Datomic query, then use Clojure or Java for skipping (offset) and limiting (page size). After all, this is what a SQL server does with the fetched records…
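A minimal sketch of that approach in Clojure, assuming a Datomic peer connection `conn` and a hypothetical `:user/name` attribute (both illustrative, not from the thread):

```clojure
(require '[datomic.api :as d])

(defn paginate
  "Run the full query eagerly, then page in application code.
   offset and page-size play the roles of OFFSET / FETCH NEXT."
  [db offset page-size]
  (->> (d/q '[:find ?e ?name
              :where [?e :user/name ?name]]
            db)
       (sort-by second)  ; Datomic doesn't sort results; sort here
       (drop offset)     ; skip `offset` rows
       (take page-size)))  ; take one page

;; usage: skip 10 rows, take the next 10
;; (paginate (d/db conn) 10 10)
```

Note the explicit `sort-by`: without a stable ordering, the same offset can yield different rows across calls, just as SQL pagination is unreliable without ORDER BY.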



Datomic does not have any query result sorting built in. As Cam suggests, you'll want to handle this in your application code. I will point out that you need to account for the memory usage of queries within your peer application.


Datomic's Datalog is set-oriented and eager, and it does not spill large results to disk. Queries are designed to run over datasets much larger than memory; however, each intermediate representation of a query must fit in local memory. Datomic doesn't spool intermediate representations to disk the way some server-based RDBMSs do.

If you need to work with result sets larger than can fit in memory, you will need to break up the query into smaller pieces whose results can fit in memory. The datoms and index-range APIs are often useful for this.
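One way to do that is to walk an index lazily instead of materializing a full query result. A sketch, assuming `:user/name` is an indexed attribute (so it appears in the AVET index) and that lazily dropping leading datoms is acceptable for your page depths:

```clojure
(require '[datomic.api :as d])

(defn page-of-datoms
  "Stream datoms from the AVET index and realize only one page,
   rather than holding the whole result set in memory."
  [db attr offset page-size]
  (->> (d/datoms db :avet attr)  ; lazily streams datoms in index order
       (drop offset)
       (take page-size)
       ;; Datomic datoms support positional destructuring [e a v t ...]
       (map (fn [[e _ v]] {:entity e :value v}))))

;; usage: second page of 10, ordered by :user/name
;; (page-of-datoms (d/db conn) :user/name 10 10)
```

For deep pagination, seeking with index-range (or resuming from the last value seen on the previous page) avoids re-scanning the datoms skipped by `drop`.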