Data non-retention and redaction

I’m currently evaluating Datomic for some upcoming projects, and while it seems like an excellent fit in many regards, I am getting a bit hung up on its indelibility. While there are many contexts in which this sounds like a tremendously powerful and useful feature, I generally find that the ability to forget things is just as important as the ability to remember them, if for different reasons and in different contexts. Has anyone else grappled with the security, privacy, and compliance implications of this property?

I can add a bit of context to make this a little less abstract. One project I have in mind is a kind of case-management application for nonprofits working with (potentially) vulnerable populations. In general, the history-preserving behavior of Datomic is as useful here as anywhere. However, once the relationship with a client ends, much of the personal information rapidly becomes the equivalent of toxic waste: it has no analytical use, and if the system were ever compromised, it could be used to harass or harm the individuals. Losing control of a couple of months’ worth of PII is bad; losing control of ten years’ worth is catastrophic.

That’s a somewhat specific case, although variations of it come up in a number of other contexts. Customer support applications such as Zendesk typically have a feature to redact passwords, credit card numbers, and any other sensitive data that people occasionally put in email. And, of course, applications that operate under security- and privacy-focused compliance regimes such as PCI will be very concerned with ensuring that they never hold data they’re not supposed to have (including after the fact).

For the time being, is it reasonable to say that any application that wishes (or is legally required) to protect data through non-retention or redaction is simply outside the scope of Datomic? Has anyone else wrestled with this issue and perhaps found creative solutions? Is there any reason to think that this might change in the future?

Greetings @psagers

Good questions; articulate write-up. I too am working on a system architecture with similar considerations.

I also found this article helpful:

Best of luck.

That’s interesting, thanks. Another variant I had considered was storing encrypted data and turning it into a key-rotation problem.
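The encrypted-data variant is usually called crypto-shredding: sensitive values go into the immutable store only as ciphertext, the keys live in a separate mutable store, and "forgetting" a client means destroying that client's keys. Below is a minimal sketch under stated assumptions: `ShreddableKeyStore`, `encrypt`, `decrypt`, and `shred` are hypothetical names, nothing here is a Datomic API, and a one-time pad is used only so the sketch runs on the Python standard library. A real system would more likely use a vetted authenticated cipher such as AES-GCM with per-client keys held in a KMS.

```python
import secrets


class ShreddableKeyStore:
    """Hypothetical mutable key store kept OUTSIDE the immutable database.

    Each encrypted value gets its own fresh random pad (a one-time pad; chosen
    only so this sketch needs no third-party crypto library). Pads are grouped
    by subject (e.g. a client) so all of a subject's keys can be destroyed at once.
    """

    def __init__(self) -> None:
        self._pads: dict[str, dict[str, bytes]] = {}

    def encrypt(self, subject: str, value_id: str, plaintext: str) -> bytes:
        # value_id must be unique per value; reusing it would overwrite the old pad.
        data = plaintext.encode("utf-8")
        pad = secrets.token_bytes(len(data))  # fresh pad, never reused
        self._pads.setdefault(subject, {})[value_id] = pad
        return bytes(a ^ b for a, b in zip(data, pad))

    def decrypt(self, subject: str, value_id: str, ciphertext: bytes) -> str:
        pad = self._pads.get(subject, {}).get(value_id)
        if pad is None:
            raise KeyError(f"key for {subject}/{value_id} has been shredded")
        return bytes(a ^ b for a, b in zip(ciphertext, pad)).decode("utf-8")

    def shred(self, subject: str) -> None:
        """Forget every key for a subject; its ciphertexts stay in history but are unreadable."""
        self._pads.pop(subject, None)


if __name__ == "__main__":
    keys = ShreddableKeyStore()
    history: list[tuple[str, str, bytes]] = []  # append-only stand-in for the immutable log

    ct = keys.encrypt("client-42", "phone-1", "555-0100")
    history.append(("client-42", "phone-1", ct))
    print(keys.decrypt("client-42", "phone-1", ct))  # prints 555-0100

    keys.shred("client-42")  # relationship ended: destroy this client's keys
    try:
        keys.decrypt("client-42", "phone-1", history[0][2])
    except KeyError as err:
        print("unreadable:", err)
```

Grouping pads by subject is the design choice that matters here: a single `shred` call covers everything written for one client, while the history itself is never modified.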

Setting aside the complexity, one thing that strikes me about any such solution is that it’s still predicated on a 100% success rate at identifying, in advance, every value that might contain information that eventually needs to be purged or degraded. I recall a flurry of news stories some years back when researchers started to realize just how much information could be recovered from supposedly anonymized data sets. This is very much a moving target.

If excision comes to the cloud version (I hadn’t dug into the on-prem docs), that might be enough of an escape hatch. Until then, if I have to decide between a storage layer that makes it hard to remember the past and easy to forget, and one that makes it easy to remember and impossible to forget, I think I’ll have no choice but to go with the former every time.