Rollback after failed deploy

In the Ions reference [1], under “Deploy”, it states:

When you deploy, Datomic will use an AWS Step Machine to … Automatically roll back to the previous deployment if the application does not deploy correctly (e.g. if loading a namespace throws an exception.)

This is not what I’m seeing. Whenever I get :deploy-status "FAILED", my app is down.

Is there something I need to do to enable automatic rollbacks?

I’m on the solo topology.

Thanks

[1] Ions Reference | Datomic

FWIW, the auto-rollback works for me. I didn’t have to do anything to set it up. I’m also on the solo topology. I’d check the CodeDeploy logs if you haven’t already; there might be a clue as to why the auto-rollback isn’t working.
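
For anyone who wants to check this from a script rather than the console, here is a minimal sketch. It assumes you have saved the JSON output of `aws deploy get-deployment --deployment-id <id> --output json` and that the response carries the usual CodeDeploy `deploymentInfo` fields (`status`, `errorInformation`, `rollbackInfo`). The helper name `summarize_deployment` and the sample data are made up for illustration; they are not real output from a Datomic deploy.

```python
import json


def summarize_deployment(response_json: str) -> dict:
    """Extract the rollback-related fields from a get-deployment response.

    Missing fields come back as None, so the function is safe to run on
    deployments that never attempted a rollback.
    """
    info = json.loads(response_json).get("deploymentInfo", {})
    error = info.get("errorInformation", {})
    rollback = info.get("rollbackInfo", {})
    return {
        "status": info.get("status"),
        "error_code": error.get("code"),
        "error_message": error.get("message"),
        "rollback_deployment_id": rollback.get("rollbackDeploymentId"),
        "rollback_message": rollback.get("rollbackMessage"),
    }


# Hypothetical sample response (IDs and messages are invented).
sample = json.dumps({
    "deploymentInfo": {
        "deploymentId": "d-EXAMPLE111",
        "status": "Failed",
        "errorInformation": {
            "code": "HEALTH_CONSTRAINTS",
            "message": "Example failure message",
        },
        "rollbackInfo": {
            "rollbackDeploymentId": "d-EXAMPLE222",
            "rollbackMessage": "Example rollback message",
        },
    }
})

summary = summarize_deployment(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```

If `rollback_deployment_id` is set, running the same summary on that second deployment should show whether the rollback itself succeeded or failed.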

Thanks Jacob,

I also just discovered the CodeDeploy logs and saw that it is attempting the rollback but failing.

It occurred to me that it could be because I’m typically deploying using a uname rather than a rev. That would make sense, except that it’s not just committed files that are deployed. For example, the gitignored stuff in js/out is all deployed, so a rollback based entirely on git wouldn’t be possible.

Hm… I think I’ve had uname deploys roll back before (most of my deploys have used uname). Although a rollback based on git alone wouldn’t be possible, I believe all the deployed files are kept around in S3.

I’ve previously had an issue where all my deploys would fail because of an out-of-memory error, and the rollback would also fail (so I was stuck with downtime). I eventually found that I could get deploys to work again simply by terminating the compute instance in the EC2 console, waiting for a new instance to be auto-started, and then re-deploying right away. For a while I had to do that every time, but it mysteriously stopped being an issue a week or so ago.

Good to know you have seen successful rollbacks to uname deploys. That will save me some time troubleshooting.

I also had the all-deploys-failing thing, and came to the same solution, but it’s far from ideal as it results in downtime with every deploy. It would be great to find out how to avoid this (hence I’m asking about EC2 memory in another thread).

I ran into a similar issue: a deployment failed but did not roll back, and I could not connect to the Datomic system. I wrote about the issue here: Bastion refuses connection after failed deployment

Thanks for reporting this as well, @tslocke. It helps to give clarity as to what may have occurred.