Composite index

Say that I’m working on a Google Calendar-like database where people store meetings.

  1. A meeting have a start time and end time (mostly an hour or two).
  2. A meeting can occupy multiple rooms.
  3. Meetings might temporarily overlap (we don’t enforce strict collision control).

Assuming that a user often need to query things like “give me all the meeting today at this specific room(s)”, what’s the recommended index strategy?

(01) R-tree

In PostgreSQL we might do

create table meetings (room int, during tstzrange);
create index meeting_index on meetings using gist (during);

But since Datomic does not have a GiST index (R-tree) so it’s not an option.

(02) Composite Tuples

Since the rooms are limited and most of the meetings took place within an hour or two, we can cheat a little by creating a date index.

[{:db/ident       :meeting/rooms
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/many}
 {:db/ident       : meeting/dates
  ;; Datomic does not support java.time.LocalDate so we have to go with strings
  :db/valueType   :db.type/string
  ;; It's :db.cardinality/many since a meeting might take place at the midnight (yikes) and span two dates
  :db/cardinality :db.cardinality/many}]

But the query might be inefficient if we have lots of meeting (more than a million) within a day. So I thought maybe we can do

[{:db/ident       :meeting/date-room-index
  :db/valueType   :db.type/tuple
  :db/tupleAttrs  [:meeting/dates :meeting/rooms]
  :db/cardinality :db.cardinality/many}]

Which trades storage space for query time, but Datomic does not allow :db.cardinality/many on composite tuples…

Execution error (ExceptionInfo) at datomic.client.api.async/ares (async.clj:58).
First error: :db.error/tuple-of-attrs-must-be-card-one

Ref: https://portal.feedback.eu.pendo.io/app/#/case/113782

(03) Hilbert Curves

I did find a report that implements R-Tree / Hilbert Curves on top of Datomic on-prem, but the codebase is no longer maintained.

(04) …

I have not thought of a elegant solution. Dear mighty Clojure / Datomic community, any thought?

Hi @hden. You can use :db.type/instant instead of :db.type/string. :db.type/instant is represented as UTC instant time.

Let’s assume you have two attributes: :meeting/startLocalDate and :meeting/endLocalDate as db.type/instant.

Then your query then might look:

[
:find ?meeting
:in $ ?roomRef ?lookedLocalDate
:where
   [?meeting :meeting/rooms ?roomRef]
   [?meeting :meeting/startLocalDate ?meetingStartDate]
   [?meeting :meeting/endLocalDate ?meetingEndDate]
   (or
       [(= ?meetingStartDate ?lookedLocalDate)]
       [(= ?meetingEndDate ?lookedLocalDate)]
   )
]

If you really need some kind of indexing which is not available/reasonable to implement within the datomic data model (without very much extra transactional data to keep the index up to date, that is), and you can keep an index process going, preferably in your local on-prem application, you can always keep an index as a separate datastructure and update it by the transaction queue, and feed it as data to your queries.

This may not be worth it in terms of extra complexity, but it is certainly one way to implement this feature. I guess the fulltext searching in Datomic is implemented this way.

And I realize I need to try this out myself.

Thanks for the suggestion. Unfortunately for my case, I need to atomically query the index (in a Datomic Cloud instance) to determine if some transaction should be committed or rejected. If you could do that with an external index I’m all ears.