Can retractions improve performance?


#1

I have a schema where users can have a session token. My application is a web app, so when it receives a request with a <appname>_session cookie, it looks up the user who has that token and considers the user logged in if one is found.

Now, tokens aren’t valid forever. So I look up the transaction that added the token and check whether the token was asserted within a reasonable window (one week).
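A minimal sketch of the week-long validity check described above, in Python. It assumes you have already fetched the :db/txInstant of the transaction that asserted the token (that lookup is not shown); the function name is_token_valid and the MAX_TOKEN_AGE constant are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical maximum token age; the post uses "a week".
MAX_TOKEN_AGE = timedelta(days=7)


def is_token_valid(tx_instant: datetime, now: datetime,
                   max_age: timedelta = MAX_TOKEN_AGE) -> bool:
    """Return True if a token asserted at tx_instant is at most max_age old.

    tx_instant stands in for the :db/txInstant of the transaction that
    asserted the token.
    """
    return now - tx_instant <= max_age


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(is_token_valid(datetime(2024, 1, 5, tzinfo=timezone.utc), now))  # True (5 days old)
print(is_token_valid(datetime(2024, 1, 1, tzinfo=timezone.utc), now))  # False (9 days old)
```

Because validity is derived from the transaction time, an expired token never logs anyone in, whether or not it has been retracted.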

The interesting thing here is that I don’t really need to retract tokens to avoid logging in users with invalid tokens, since I always check the transaction time. But I’m wondering whether there are advantages to periodically retracting invalid tokens anyway.

As an example, say I have a user who has logged in one thousand times (and so has one thousand session tokens). Would there be any benefit to retracting all the invalid tokens, given that I always check a token’s validity by its creation time?


#2

It’s a good question. Personally, I wouldn’t do that. As I understand it, Datomic acts like an append-only log file where the newest entries are accessed first in any query. So when you retract, you add a new entry on top that nullifies the earlier one. If that’s true, then it’s also possible that performance could be worse[1]. Really, I think the Datomic team can give a better answer, but personally I’d be more interested to see whether using the ‘since’ filter [2] is a better option in your case.

  1. Theoretically, though practically speaking I doubt it would make any real difference.

  2. https://docs.datomic.com/on-prem/filters.html#since


#3

Wouldn’t :db/noHistory essentially remove the need for retraction altogether?


#4

If you never do a retraction (or never assert a new value) then :db/noHistory has no effect.


#5

Right. I’m new to all this, so take this with a grain of salt.

I thought overwriting the session token counts as an implicit retraction, so :db/noHistory would kick in.


#6

Oh. No, you’ve got the right idea. It’s just that in my app, a user can have more than one session token :slight_smile:
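The exchange above hinges on cardinality: overwriting a cardinality-one value is an implicit retraction, while adding another value to a cardinality-many attribute retracts nothing. The following is an idealized toy model in Python, not Datomic’s implementation; the function past_values_retained is hypothetical and simply treats :db/noHistory as “superseded values are not kept”.

```python
def past_values_retained(new_values, cardinality_many, no_history):
    """Return (current, past) after asserting new_values on one entity/attribute.

    Toy model: with cardinality one, each new value implicitly retracts the
    previous one; with no_history, retracted values are not kept in history.
    """
    current = []
    past = []  # values that are no longer current but kept in history
    for v in new_values:
        if not cardinality_many and current:
            # Cardinality one: asserting a new value supersedes the old one.
            past.append(current.pop())
        current.append(v)
    if no_history:
        past = []  # noHistory: superseded values are not retained
    return current, past


tokens = [f"t{i}" for i in range(1000)]
cur, past = past_values_retained(tokens, cardinality_many=True, no_history=True)
print(len(cur), len(past))  # 1000 0  (nothing was retracted, so noHistory changes nothing)
cur, past = past_values_retained(tokens, cardinality_many=False, no_history=False)
print(len(cur), len(past))  # 1 999
```

With many tokens per user, old tokens stay current until they are explicitly retracted, which is why :db/noHistory alone does not remove them.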


#7

If you don’t retract datoms, the index grows and data access becomes more expensive. The official Datomic team says that beyond 10 billion datoms the index may become a bottleneck.

So for performance it’s better to remove unused data.


#8

Datomic is accumulate only, not append only. There are important semantic and performance differences. See https://docs.datomic.com/on-prem/indexes.html#accumulate-only

In particular, Datomic does not pay a performance penalty for the “present” (i.e. all those things that are true now).

To answer @Robin’s original question:
Yes, theoretically there may be a slight advantage to retracting old expired tokens. However, I strongly suspect that you will never see the difference in any practical implementation.


#9

Note that the accumulation of facts means that your total datom count goes up when you issue a retraction.
That said, it’s very unlikely either approach will have any measurable difference in performance.

I would personally retract the expired/invalid tokens just for semantic/ease-of-use reasons: a query for all of a user’s tokens would then return only the valid ones.
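A periodic sweep along these lines could collect the expired tokens and build retraction operations for them. This Python sketch only assembles the data: each tuple mirrors the shape of Datomic’s list form [:db/retract e a v], but the attribute name :user/session-token, the SessionToken record, and the function name are hypothetical, and submitting the result as a transaction is not shown.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class SessionToken(NamedTuple):
    user_id: int          # entity id of the owning user (hypothetical)
    token: str
    tx_instant: datetime  # when the token was asserted


def expired_token_retractions(tokens, now, max_age=timedelta(days=7)):
    """Return one retraction tuple per token older than max_age."""
    return [
        (":db/retract", t.user_id, ":user/session-token", t.token)
        for t in tokens
        if now - t.tx_instant > max_age
    ]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
tokens = [
    SessionToken(42, "old", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    SessionToken(42, "fresh", datetime(2024, 1, 9, tzinfo=timezone.utc)),
]
print(expired_token_retractions(tokens, now))
# [(':db/retract', 42, ':user/session-token', 'old')]
```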


#10

@marshall, I think retraction operations only add datoms to the log. The current index version becomes smaller after retraction. So, unless you do historic queries, the active set of datoms a peer deals with becomes smaller after retraction.
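Both this point and the previous one can hold at once: the accumulated history grows with every retraction, while the current view shrinks. A toy accumulate-only store in Python makes the distinction concrete. It is only an illustration (ToyDb and its methods are hypothetical) and says nothing about how Datomic actually stores or indexes datoms.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Datom:
    e: int
    a: str
    v: str
    tx: int
    added: bool  # True for an assertion, False for a retraction


@dataclass
class ToyDb:
    """Toy accumulate-only store: every operation is recorded, nothing is overwritten."""
    history: list = field(default_factory=list)
    next_tx: int = 1

    def transact(self, ops):
        """ops: iterable of (added, e, a, v) tuples, all recorded in one transaction."""
        tx = self.next_tx
        self.next_tx += 1
        for added, e, a, v in ops:
            self.history.append(Datom(e, a, v, tx, added))

    def current(self):
        """The 'present': facts whose most recent datom is an assertion."""
        latest = {}
        for d in self.history:  # history is in tx order, so later datoms win
            latest[(d.e, d.a, d.v)] = d.added
        return {key for key, added in latest.items() if added}


db = ToyDb()
db.transact([(True, 1, ":user/session-token", f"t{i}") for i in range(1000)])
print(len(db.history), len(db.current()))  # 1000 1000
db.transact([(False, 1, ":user/session-token", f"t{i}") for i in range(999)])
print(len(db.history), len(db.current()))  # 1999 1
```

Retracting 999 of the 1000 tokens nearly doubles the total datom count while leaving a single current fact.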

Actually, those who want to be sure can just do 10 billion additions and retractions and compare query performance afterwards to query performance after doing just 10 billion additions. (Instead of 10 billion single-datom transactions, one can do 10,000 transactions of 1 million datoms each.)