Query result pagination

I have a hypothetical situation that I’d like to solve, but I can’t find the ideal answer. Suppose you have a huge data set that could be returned from a query, how do you paginate it, so that the impact on memory is minimal? The datoms API, iterating over the datoms and filtering one by one? The index-range API, but I would have to do the same thing as in the datoms API, iterate over the items and filter one by one? Perform an initial query that would return only ids, and the paginate those ids so that they could be used in another query to retrieve the entire data set?

In SQL you usually can define a pagination in the query itself:

SELECT col1, col2, ...
 FROM ...
 WHERE ... 
 ORDER BY -- this is a MUST there must be ORDER BY statement
-- the paging comes here
OFFSET     10 ROWS       -- skip 10 rows
FETCH NEXT 10 ROWS ONLY; -- take 10 rows

Hi,

I would just run a regular datomic query, then use Clojure or Java for skipping (offset) and limiting (page size). After all, this is what a SQL server does on fetched records…

Best,
Cam

Datomic does not have any query result sorting built in. As Cam suggests, you’ll want to handle this in your application code. I will point out that you need to make memory considerations for query within your peer application.

http://docs.datomic.com/query.html#memory-usage

Datomic’s Datalog is set-oriented and eager, and does not spill large results to disk. Queries are designed to be able to run over datasets much larger than memory. However, each intermediate representation step of a query must fit into local memory. Datomic doesn’t spool intermediate representations to disk like some server-based RDBMS’s.

If you need to work with result sets larger than can fit in memory, you will need to break up the query into smaller pieces whose results can fit in memory. The datoms and index-range APIs are often useful for this.

1 Like

I was wondering about sorting and pagination with the client API on the cloud.

My example use case would be e.g. to sort all posts by number of likes, and return the top 20, then the next 20 etc. This is an extremely common web application requirement. The number of posts could be tens of thousands.

Leaving out considerations that the underlying result set might have changed when the next request comes in, how would Datomic handle this?

My first guess would be to install a few sort-related functions as Ions, that would return all results sorted, and also optionally skip/limit. It would be great to have a ready made example for this though, or even better some built in support.

3 Likes

I’m also very surprised something like this isn’t built in. As @orestis said, it’s such a normal web app requirement.

Are there any example solutions out there?

Bump: also interested in how this would work in Datomic Cloud.

2 Likes