I’m deploying an ion and it is failing. Where should I start looking for the smoking gun: CloudWatch, Step Functions, CodeDeploy, or somewhere else?
Hi @jarrod, what is the status of your deploy when you’re seeing this? Are you running solo or production, and which CFT version? Generally, I recommend that you start by reviewing CloudWatch (for alerts and messages), but this looks symptomatic of a timeout in loading your deps or your app at the “ValidateService” step. I’d be curious what changed between the deployments that worked and this one.
Thanks for the references @jaret, I have looked over those and found them helpful, though not for my particular issue. I did find that a dependency was causing the issue. I was hoping for a more explicit log or error that would indicate which dependency.
I am not quite sure where to find the CFT version. I am running a solo topology.
As for identifying dependency issues: when you run the Ion push operation, you should see output like:
{:rev "8baf1c47e0bb62faf68c76cf7fefa05635f2ed01",
 :uname "mt-ion-test",
 :deploy-groups (mt-test-solo),
 :dependency-conflicts
 {:deps
  {commons-codec/commons-codec #:mvn{:version "1.10"},
   com.cognitect/http-client #:mvn{:version "0.1.80"},
   org.slf4j/slf4j-api #:mvn{:version "1.7.14"},
   org.clojure/core.async #:mvn{:version "0.3.442"}},
  :doc
  "The :push operation overrode these dependencies to match versions already running in Datomic Cloud. To test locally, add these explicit deps to your deps.edn."},
The first step I would take is to test your Ion locally, with any deps reported in that response explicitly included in your local deps.edn file.
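As a rough sketch of what that could look like, assuming the push output shown above: the conflicting entries are copied into the :deps map of deps.edn. The #:mvn{:version "1.10"} form in the push output is just the namespaced-map spelling of {:mvn/version "1.10"}. The versions here are the ones from that example output; yours will come from your own push response, and your existing deps stay alongside them.

```clojure
;; deps.edn (sketch; versions taken from the example push output above,
;; substitute whatever your own :dependency-conflicts map reports)
{:deps
 {;; ... your existing dependencies stay here ...
  commons-codec/commons-codec {:mvn/version "1.10"}
  com.cognitect/http-client   {:mvn/version "0.1.80"}
  org.slf4j/slf4j-api         {:mvn/version "1.7.14"}
  org.clojure/core.async      {:mvn/version "0.3.442"}}}
```

With those pinned, starting a local REPL and loading your ion namespaces should exercise the same dependency versions that the push operation says are running in Datomic Cloud.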
I have a similar problem, except the error code is ScriptTimedOut. The problem began when I changed my compute stack from solo to production. I’ve tried all day long and it keeps timing out at the ValidateService step. The logs just show “[stdout]Received 000” which doesn’t mean much to me.
Update: it now also happens when I use the solo compute stack. So this may be related to the updated templates and/or updated ion dependencies, because other than that no code has been changed.
I narrowed it down to one problematic dependency: leiningen. I was using leiningen as a library, and for some reason it didn’t like that. Luckily I was using it for a pretty narrow purpose so I was able to remove it entirely, and now the deploys work for me. I basically had to use the process of elimination to figure it out. Definitely would have been nice to see the cause of the timeout, but I guess AWS doesn’t provide that.