So, after pushing and then running the deploy-status command for about two minutes, I got the dreaded {:deploy-status "FAILED", :code-deploy-status "FAILED"}.
Where do I go from here? I’d look in CloudWatch Logs, but the stack creates more than 30 log groups, so it’s a bit of a needle in a haystack.
I checked CodeDeploy, and all I found there was this message:
The overall deployment failed because too many individual instances failed deployment, too few healthy instances are available for deployment, or some instances in your deployment group are experiencing problems
as well as the event:
Error code: ScriptFailed
Script name: scripts/deploy-validate
Message: Script at specified location: scripts/deploy-validate run as user datomic failed with exit code 1
Logs:
[stdout]Received 503
[stdout]Received 503
// [... a few dozen identical lines elided]
[stdout]Received 503
[stdout]WARN: validation did not succeed after two minutes
This does not seem to be related to the application code, as deploying a revision that previously worked fails similarly.
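For anyone unfamiliar with this kind of validation step: the log output suggests that deploy-validate repeatedly polls something on the node, prints each non-success status (here 503 Service Unavailable), and gives up with exit code 1 after two minutes. The script’s actual contents are not shown in this thread, so what follows is only a hypothetical Python sketch of that polling pattern, not the real script. The names `http_status`, `wait_until_healthy` and `main`, the 2-second interval, and whatever URL you pass in are all illustrative assumptions, not part of Datomic or CodeDeploy.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout_s: float = 5.0) -> int:
    """GET `url` (a hypothetical health URL) and return its HTTP status code.

    Returns 0 when the server cannot be reached at all.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses such as 503 land here
        return err.code
    except OSError:  # URLError, connection refused, DNS failure, socket timeouts
        return 0


def wait_until_healthy(
    check: Callable[[], int],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` until it returns 200 or `timeout_s` seconds have elapsed."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        status = check()
        if status == 200:
            return True
        # Output format chosen to resemble the log lines quoted in the post.
        print(f"Received {status}")
        sleep(interval_s)
    print(f"WARN: validation did not succeed after {timeout_s:.0f} seconds")
    return False


def main(url: str) -> int:
    """Return a process exit code: 0 on success, 1 on failure (assumed convention)."""
    return 0 if wait_until_healthy(lambda: http_status(url)) else 1
```

Against a real endpoint you would run something like `sys.exit(main(url))`. Note that a loop like this only tells you the node never reported healthy within the window. The reason it stayed at 503 has to be found elsewhere, e.g. in the instance logs or the CodeDeploy deployment page.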
Generally speaking, a checklist for troubleshooting Ions deployments (or more detailed error messages) would be appreciated.
I believe this answer is out of date. My guess is that the naming of the internal log groups has changed and that the deployment logs no longer go in datomic-<your-system-name>. I see groups with CreateCodeDeploy, CreateLambdaFromArray, GetDeploymentStatus, and others in their names. I imagine one of those has it.
I see these in “CloudWatch -> Logs -> Log groups”.
If anyone knows more precisely where this information is, I’d appreciate the help because I cannot find it either.
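To narrow the 30+ log groups down without clicking through the console, here is a minimal Python sketch that lists log group names and keeps the ones containing the substrings mentioned above (CreateCodeDeploy, CreateLambdaFromArray, GetDeploymentStatus). It assumes you pass in a boto3 CloudWatch Logs client, whose `describe_log_groups` paginator returns pages with a `logGroups` list. The helper names are hypothetical, and whether the deployment logs actually live in one of these groups is the same unconfirmed guess as above.

```python
from typing import Iterable, Iterator, List, Sequence

# Substrings taken from the log group names mentioned in this thread (assumed relevant).
DEPLOY_HINTS: Sequence[str] = (
    "CreateCodeDeploy",
    "CreateLambdaFromArray",
    "GetDeploymentStatus",
)


def list_log_group_names(logs_client) -> Iterator[str]:
    """Yield every log group name, following pagination.

    `logs_client` is assumed to be a boto3 CloudWatch Logs client,
    e.g. boto3.client("logs").
    """
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        for group in page.get("logGroups", []):
            yield group["logGroupName"]


def candidate_log_groups(
    names: Iterable[str], hints: Sequence[str] = DEPLOY_HINTS
) -> List[str]:
    """Return the names that contain any hint (case-insensitive), sorted."""
    lowered = [h.lower() for h in hints]
    return sorted(n for n in names if any(h in n.lower() for h in lowered))
```

Usage would look like `candidate_log_groups(list_log_group_names(boto3.client("logs")))`, after which you can open just the matching groups under “CloudWatch -> Logs -> Log groups”.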
I actually found an error message under “CodeDeploy -> Deployments”, on the deployment that was in progress. It was a strange one, probably worth starting another topic for: right at the top of the deployment information page, AWS displayed “memory allocation error” for the rm command! I reset the instances in the deployment’s Auto Scaling group and everything now works fine.
@jaret, I forgot to mention that this “memory allocation error” only started happening after we updated our Datomic Cloud stacks to the latest (free) version.
@cch1, I’m not sure what you mean by “free”. Datomic changed its license in April 2023 so that all Datomic versions are free. The AWS resources, however, have never been free for any version of Datomic.
I never figured out what caused the memory allocation problem. I have not seen it since, and I have deployed many times. Perhaps something was fixed, or I upgraded to a more recent version. I’m confident I am not running the latest version as of this writing.
Things have been working for me for some time now (including new deployments) without a problem.
Hi Jaret,
We upgraded our instance size from the smallest to the next-to-smallest (t3.medium), and that seems to have resolved the issue. Before the upgrade, we were running our development and production environments on the same instance size and only had the problem in production. Worth noting that we have no live users yet, so the load was minimal in both cases. The one notable difference: I had detailed metrics enabled in prod but not in dev.