Questions about Datomic backup integrity and file structure

leblowl · April 16, 2018, 7:45pm

Hello,

I have 3 questions

1st question: How can I verify the integrity of a Datomic backup? For example, is there a way to calculate a hash from a DB value? That way I could run the hash against my production DB, run the same hash against a restored backup of it, and compare the two.

2nd question: How is a Datomic backup structured? What are values? What are roots?

3rd question: I have a Datomic DB running on AWS and have two backups of the same database. One is a recurring backup to an S3 bucket, running every hour; the other is a local backup that I run pseudo-randomly, whenever I feel like it. When I run the Linux diff command against these two backups, the S3 backup has more files in both the roots subdirectory and the values subdirectory, although none of the files present in both backups differ according to the output. Why is this?
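For anyone who wants to reproduce this comparison without diff, here is a minimal Python sketch. It assumes both backups are available as local directories (the S3 one would have to be synced down first), and the function names are just made up for illustration. It only compares files byte by byte and knows nothing about Datomic's format.

```python
import filecmp
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def compare_backups(dir_a, dir_b):
    """Compare two backup directories file by file.

    Returns (only_in_a, only_in_b, differing), each a sorted list of
    paths relative to the backup root.
    """
    files_a = relative_files(dir_a)
    files_b = relative_files(dir_b)
    common = files_a & files_b
    differing = sorted(
        rel for rel in common
        # shallow=False forces a byte-by-byte comparison.
        if not filecmp.cmp(os.path.join(dir_a, rel),
                           os.path.join(dir_b, rel), shallow=False)
    )
    return sorted(files_a - files_b), sorted(files_b - files_a), differing
```

In my case this would report an empty differing list and an empty list for the local backup, with all the extra files showing up only on the S3 side.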

Thanks!

leblowl · April 17, 2018, 10:25pm

Possible answer to #3. I think Datomic stores data regarding the point-in-time that the backup occurs, in order to support point-in-time restore: https://docs.datomic.com/on-prem/backup.html#sec-5. So based on that assumption it would make sense that a backup run every hour would have more data than a backup run every week or month. It’s a little tricky to answer questions for ourselves with closed source code haha. We have to rely on Datomic support.

marshall · April 18, 2018, 6:58pm

You’re correct regarding #3. The point-in-time backups will account for more data in the backup location.

There is not currently a specific built-in method for verifying a backup’s integrity. I would recommend restoring a specific point-in-time backup to a secondary storage (dev would be fine) and ensuring that it restores fully.

I would also recommend periodically (perhaps weekly) running a full backup to a secondary empty site (i.e. a separate S3 bucket that isn’t used for incremental backups), as incremental backups to a single location will never “re-copy” segments that are already present in the existing backup location.

Regarding the structure - the roots and values are the internal representations of the Datomic indexes (segments).

This part of Rich’s talk on Datomic describes Datomic’s use of storage in more detail.

leblowl · April 18, 2018, 7:42pm

Awesome,

thanks @marshall!

I will take a look at the video. Do you have any theories on ways to implement an integrity check for a Datomic backup? Do you think that simply restoring a backup without errors is good enough? Could queries on a successfully restored backup ever uncover data corruption that datomic restore-db would not uncover? Thanks again.

marshall · April 19, 2018, 1:37pm

There are various ways of “ensuring” a backup, depending on the degree of assurance you require.

Restoring into a storage and connecting to it to be sure it can be read is a pretty good assurance, but, as you surmise, there are (unlikely) scenarios in which corruption could potentially exist at the datom level.

As with any approach of this sort, ultimately you could implement something O(n) that examined the entire database (i.e. walk the log or the indexes). Whether that is warranted is largely up to you.
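As a rough illustration of the O(n) idea (this is not a built-in Datomic feature), the sketch below folds a stream of datoms into a single digest. It assumes you have already exported the datoms from each database, for example by walking the log or an index, as (e, a, v, tx, added) tuples of JSON-serializable values; that export step and the digest_datoms name are hypothetical and not part of any Datomic API.

```python
import hashlib
import json


def digest_datoms(datoms):
    """Fold an iterable of datoms into (count, SHA-256 hex digest).

    Each datom is assumed to be a tuple (e, a, v, tx, added) of
    JSON-serializable values. The digest depends on iteration order,
    so both databases must be walked in the same order.
    """
    h = hashlib.sha256()
    count = 0
    for datom in datoms:
        # One compact JSON array per line; JSON escapes newlines inside
        # strings, so the line separator cannot be forged by a value.
        line = json.dumps(list(datom), separators=(",", ":"))
        h.update(line.encode("utf-8"))
        h.update(b"\n")
        count += 1
    return count, h.hexdigest()
```

Running this over the production database and over the restored copy, walked in the same order, and comparing the two results would flag any datom that differs, at the cost of reading every datom on both sides.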

leblowl · April 19, 2018, 2:30pm

Ok, thanks for the input
