At the moment you are forced to use AWS S3 or the file system for Datomic backups. The latest Datomic Pro version uses the aws-s3-1.x SDK, where it is not possible to change the S3_ENDPOINT via an env var. That option was introduced in AWS SDK for Java v2 2.28.1.
Any chance to get a Datomic-specific env var or Java property to modify the S3_ENDPOINT? This would allow backups to Google Cloud Storage and many other S3-compatible storage services.
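For comparison, here is roughly what the endpoint override looks like on the v2 SDK (just a sketch, not Datomic code; the GCS HMAC keys and the "auto" region are placeholders). If I read the release notes correctly, since 2.28.1 the v2 SDK can also pick the endpoint up from the AWS_ENDPOINT_URL_S3 env var without any code change:

import java.net.URI;

import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

public class GcsS3V2Client {
    public static void main(String[] args) {
        // Point the v2 client at the GCS interoperability endpoint.
        S3Client s3 = S3Client.builder()
                .endpointOverride(URI.create("https://storage.googleapis.com"))
                .region(Region.of("auto")) // required by the SDK, ignored by GCS
                .forcePathStyle(true)      // GCS requires path-style access
                .credentialsProvider(StaticCredentialsProvider.create(
                        AwsBasicCredentials.create("GCS_ACCESS_KEY", "GCS_SECRET_KEY")))
                .build();

        // Example usage
        s3.listBuckets().buckets().forEach(b -> System.out.println(b.name()));
    }
}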
@maxweber I think I follow, and I recall that you requested this in a past conversation with me. We have a current project to move to AWS SDK v2, and I am glad you brought this up because I want to ensure we exercise this option when we update to v2.
I am curious whether you have tried using a file path pointing at a Google Cloud Storage location and if that works.
I just realized there is no way that could work. I will discuss with dev.
@jaret thanks a lot for discussing this with the dev team. Having this would be awesome, and we could get rid of our AWS account (which we only keep for the backups).
@maxweber Incremental backup should work if the file location is the same. Have you tried it?
Additionally, I believe it's worth re-testing, as the backup code has changed over the years (reducing the number of operations needed; see releases like Datomic Pro Change Log | Datomic), and it's possible that whatever friction you are recalling has actually been resolved. We also now have a first-class feature for verifying backups, called verify-backup, which lets you read all the files of a backup and verify its integrity.
How frequently are you backing up?
Do you have any other problems backing up, such as speed?
If you want to move to GCP, you could just take a full backup and point it at GCP using a file backup, as sketched below. If it goes too slowly, we can help with tuning to possibly make it faster.
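Something along these lines (just a sketch; the db URI and the gcsfuse mount path are placeholders, and bin/datomic help shows the exact arguments for each command):

# Full backup to a file URI on a locally mounted GCS bucket (placeholder paths)
bin/datomic backup-db datomic:dev://localhost:4334/my-db file:/mnt/gcs-backups/my-db

# Read the backup back and check its integrity (see the docs for exact options)
bin/datomic verify-backup file:/mnt/gcs-backups/my-db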
Tried it multiple times. I guess the problem is that the Datomic backup does a .listFiles call on java.io.File (or something similar). That's totally fine for a normal file system, but on gcsfuse that operation causes thousands of list calls to the Google Cloud Storage API, since gcsfuse does not cache the entire directory tree.
On our old system (Storrito 1.0), every 4 hours. On our new system (Storrito 2.0), we plan to back up much more often (there we also have one logical db per customer).
I gave up on finding a solution based on gcsfuse or something similar. Adding an option to set the S3 endpoint would probably solve all the associated troubles. ChatGPT gave me this example code for the AWS S3 SDK 1.x lib:
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class GcsS3Client {
    public static void main(String[] args) {
        // Example credentials: GCS does not use them in the same way, but the SDK requires them.
        BasicAWSCredentials credentials = new BasicAWSCredentials("GCS_ACCESS_KEY", "GCS_SECRET_KEY");

        // GCS S3-compatible endpoint
        String gcsEndpoint = "https://storage.googleapis.com";

        // Region doesn't matter for GCS, but is required by the SDK
        String region = "auto"; // Can be anything, commonly "auto" or "us-east1"

        AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
                .withEndpointConfiguration(new EndpointConfiguration(gcsEndpoint, region))
                .withCredentials(new AWSStaticCredentialsProvider(credentials))
                .withPathStyleAccessEnabled(true) // GCS requires path-style access
                .build();

        // Example usage
        s3Client.listBuckets().forEach(bucket -> System.out.println(bucket.getName()));
    }
}
Configuring the endpoint, region, and maybe PathStyleAccessEnabled via env vars or Java system properties could probably make Datomic backup compatible with most storage services that offer an S3-compatible API. I guess upgrading to the AWS 2.x SDKs is a lot of work and would delay a solution for this challenge for quite a while?
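Just to make the idea concrete, an override on the current 1.x SDK could look roughly like this. The property and env-var names (datomic.s3Endpoint, DATOMIC_S3_ENDPOINT, etc.) are purely hypothetical, invented here for illustration; nothing like them exists in Datomic today:

import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class ConfigurableS3Client {

    // Hypothetical sketch: the property/env-var names below are made up for this example.
    static AmazonS3 buildClient() {
        String endpoint = System.getProperty("datomic.s3Endpoint", System.getenv("DATOMIC_S3_ENDPOINT"));
        String region = System.getProperty("datomic.s3Region", "auto");
        boolean pathStyle = Boolean.parseBoolean(System.getProperty("datomic.s3PathStyle", "true"));

        AmazonS3ClientBuilder builder = AmazonS3ClientBuilder.standard()
                .withCredentials(new DefaultAWSCredentialsProviderChain());

        if (endpoint != null) {
            // Only override when an endpoint is explicitly configured; otherwise keep the AWS defaults.
            builder.withEndpointConfiguration(new EndpointConfiguration(endpoint, region))
                   .withPathStyleAccessEnabled(pathStyle);
        }
        return builder.build();
    }
}

With something along those lines, pointing backups at GCS, MinIO, or any other S3-compatible storage would just be a matter of setting a couple of env vars.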