Solo node became unresponsive and required reboot

Last week I couldn’t connect to Datomic through the bastion, and the lambdas timed out. CloudWatch showed no new events, and the instance system log showed nothing unusual as far as I could tell. The EC2 health checks all indicated that everything was OK.

I rebooted the instance and had to wait 4 minutes, which likely means that EC2 had to force the reboot. When it came back up, everything worked as normal.

Have you seen similar issues? Is there a good way to handle it? Can I monitor the Datomic instance in a way that catches errors like this?

Max,

We have seen a couple of issues that could explain this behavior. They have been corrected in the latest release of Datomic Cloud: https://docs.datomic.com/cloud/releases.html#477-8741

If you upgrade to the latest version and see this behavior again, please let us know and we can look into the logs/system details to determine what might be the underlying cause.

-Marshall

Today this happened again. The log went completely silent on Friday night, and I had to reboot the solo node to be able to connect again. We’re just about to upgrade to the production topology. I would be happy if you could have a look. What do you need?

Max,

You can file a support ticket at support.cognitect.com and we can help investigate.
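
On the monitoring question raised earlier in the thread: since the symptom both times was a log that went silent while the EC2 health checks stayed green, one possible approach is to alert when the system's CloudWatch log group has received no events for some window. The following is a minimal sketch, not an official Datomic recommendation. The function name `has_recent_events`, the 15-minute window, and the log group name in the usage note are assumptions made for illustration; the client is expected to behave like a boto3 CloudWatch Logs client.

```python
import time


def has_recent_events(logs_client, log_group, window_s=900, now_ms=None):
    """Return True if the log group received at least one event in the last window_s seconds.

    logs_client is assumed to be a boto3 CloudWatch Logs client
    (boto3.client("logs")) or any object with the same paginator interface.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    start_ms = now_ms - window_s * 1000

    # filter_log_events can return empty pages before it reaches matching
    # events, so walk the pages instead of trusting the first response.
    paginator = logs_client.get_paginator("filter_log_events")
    pages = paginator.paginate(
        logGroupName=log_group,
        startTime=start_ms,
        PaginationConfig={"MaxItems": 1},  # one event is enough to prove liveness
    )
    for page in pages:
        if page.get("events"):
            return True
    return False
```

With boto3 installed and credentials configured, this could be called as `has_recent_events(boto3.client("logs"), "datomic-my-system")`, where the log group name is a placeholder for your own system's group. A `False` result would be the signal to raise an alarm, for example from a scheduled job running outside the Datomic node itself.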