Solo node became unresponsive and required reboot


Last week I couldn’t connect to Datomic through the bastion, and the lambdas timed out. CloudWatch showed no new events, and the instance system log showed nothing unusual as far as I could tell. The EC2 health checks all indicated that everything was OK.

I rebooted the instance and had to wait 4 minutes, which likely means EC2 had to force the reboot. When it came back up, everything was working normally.

Have you seen similar issues? Is there a good way to handle it? Can I monitor the Datomic instance in a way that catches errors like this?