Ions Silent Timeout Failure

So my ions deployment has been silently failing for a while, just responding with {"message": "Endpoint request timed out"} . Looking at the logs there’s no info that anything is wrong, no alerts are being generated, it just appears to be silently timing out.

I turned on x-ray tracing errors, which just shows the request being sent to the lambda app and then timing out after 1 minute.

The thing that’s puzzling is that it should be just calling a ring get request, so I’m not sure why it’s timing out? All throughout development it was working fine, I released a version of it which my users were happy with. And have been subsequently working on other things for a bit, before I get some time to go back to it again.
So 3 questions:

  1. Any idea how I can debug what’s wrong?
  2. At the very least to get alerts etc if it just fails again? I really don’t want this to happen again…
  3. Do I have a wrong idea about the development model of ions? Do they need actively require a person to maintain them? This is sort of surprising for me, as that does essentially discount them as an option to deliver solutions for anyone but a dev shop.

Is this a solo or production system?
The first thing I would attempt is restarting your compute node(s) that are running the ion.

Have you examined your Datomic Cloudwatch dashboard and logs? Is the system up and reporting logs/metrics?