Hello, we’ve run into the following issue in our Datomic environment:
For our app we use PostgreSQL 9.6 as the external storage for the Datomic database, with Datomic Pro v. 0.9.5561.54. In addition, we run the REST API client so that non-JVM applications can communicate with Datomic.
Our environment is deployed in a Kubernetes cluster, so each Datomic component runs inside a Docker container (k8s pod).
A few days ago we executed a read query that was supposed to return a large set of records. The query was issued from a .NET application against the REST API client.
However, possibly because of a non-optimized Datomic query or wrong memory configuration for the PostgreSQL/Kubernetes pod, we hit an “Out of Memory” condition in the postgres container, and the process was presumably killed by the Linux OOM killer.
After we tried to restart the pg storage container, we found that the postgres transaction log (WAL) was in a bad state: the pg server reported a “PANIC” and couldn’t start (https://stackoverflow.com/questions/8799474/postgresql-error-panic-could-not-locate-a-valid-checkpoint-record).
We had to run the pg_resetxlog command (without -f) to clear the log of partially applied transactions.
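For reference, the reset looked roughly like this; the data-directory path is a placeholder for our setup, not necessarily yours:

```shell
# WARNING: pg_resetxlog discards WAL contents and is a last resort;
# it can leave the database logically inconsistent even when the
# server starts cleanly afterwards. (In PostgreSQL 10+ the tool was
# renamed to pg_resetwal.)

# Run inside the storage container, with postgres stopped:
pg_resetxlog /var/lib/postgresql/data   # data dir is an assumption

# Then start the server again and watch the logs:
pg_ctl -D /var/lib/postgresql/data start
```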
After we started the postgres storage server, we weren’t able to connect to it from the transactor. I’ve attached an image with the transactor logs from after the storage was started. It seems something was corrupted during the crash.
We tried several ways to recover our data:
- We did a pg_dump of the corrupted storage; when we restored it into another db, the transactor connects, but there seem to be no data (datoms) in the Datomic db, only the database names remained.
- We redeployed the transactor and the other Datomic nodes several times, in different orders.
- We tried to take a Datomic-native backup via the backup-db tool, but it fails with an error as well.
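Roughly, the dump/restore and backup attempts looked like this; hosts, database names, user, and URIs below are placeholders for our setup:

```shell
# 1) Dump the (corrupted) storage database and restore it into a
#    fresh postgres instance. Names are assumptions.
pg_dump -h old-pg -U datomic -Fc datomic > datomic.dump
pg_restore -h new-pg -U datomic -d datomic datomic.dump
# After this, the transactor connects to the restored storage, but
# only the database names survive; the datoms appear to be gone.

# 2) Attempt a Datomic-native backup from the transactor host.
bin/datomic backup-db \
  "datomic:sql://mydb?jdbc:postgresql://old-pg:5432/datomic?user=datomic&password=..." \
  file:/backups/mydb
# This step is the one that fails for us.
```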
So the questions are:
- Is there any way to recover the Datomic external storage with the data we had before the crash?
- Why does a read-only query through the REST API client write anything to the postgres transaction log? I understand that the wrong memory settings on Kubernetes/postgres are our fault, but I’m somewhat worried that a single read query can kill the database.
- Could you suggest something to prevent such situations in the future? I know there are recommended settings for the transactor in the official docs; maybe you also have something similar for external storages?
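For context, one guard we are considering on our side is an explicit memory request/limit on the postgres pod, so the OOM killer targets become predictable. A minimal sketch, assuming the storage runs in a deployment named `postgres` (the name and sizes are placeholders):

```shell
# Set memory request/limit on the postgres storage deployment.
# Sizes here are illustrative, not a recommendation.
kubectl set resources deployment/postgres \
  --requests=memory=2Gi \
  --limits=memory=4Gi
```

We would still appreciate guidance on sizing postgres memory (shared_buffers, work_mem) relative to the pod limit for use as Datomic storage.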