I’m really struggling to get a batch import job to work reliably with Datomic Cloud.
I’m running it from an EC2 instance in the same VPC as my Cloud stack (prod, i3.xlarge), with a client arg-map that looks like this:
{:server-type :cloud
:region "eu-central-1"
:system "linnaeus"
:endpoint "http://entry.linnaeus.eu-central-1.datomic.net:8182/"
:timeout (* 1000 60 5)}
Connecting to the db (via d/connect) fails about 3 out of 4 times with a Datomic Client Timeout error, and even when that works, d/transact fails after a few transactions with a Service Unavailable error:
#error{:cause "Service Unavailable",
:data {:cognitect.anomalies/category :cognitect.anomalies/unavailable,
:cognitect.anomalies/message "Service Unavailable",
:http-result {:status 503,
:headers {"cache-control" "must-revalidate,no-cache,no-store",
"content-length" "331",
"server" "Jetty(9.3.7.v20160115)",
"date" "Thu, 20 Dec 2018 12:25:05 GMT",
"content-type" "text/html;charset=ISO-8859-1"},
:body "<html>
<head>
<meta http-equiv=\"Content-Type\" content=\"text/html;charset=ISO-8859-1\"/>
<title>Error 503 </title>
</head>
<body>
<h2>HTTP ERROR: 503</h2>
<p>Problem accessing /api. Reason:
<pre> Async servlet timeout</pre></p>
<hr /><a href=\"http://eclipse.org/jetty\">Powered by Jetty:// 9.3.7.v20160115</a><hr/>
</body>
</html>
"}},
:via [{:type clojure.lang.ExceptionInfo,
:message "Service Unavailable",
:data {:cognitect.anomalies/category :cognitect.anomalies/unavailable,
:cognitect.anomalies/message "Service Unavailable",
:http-result {:status 503,
:headers {"cache-control" "must-revalidate,no-cache,no-store",
"content-length" "331",
"server" "Jetty(9.3.7.v20160115)",
"date" "Thu, 20 Dec 2018 12:25:05 GMT",
"content-type" "text/html;charset=ISO-8859-1"},
:body "<html>
<head>
<meta http-equiv=\"Content-Type\" content=\"text/html;charset=ISO-8859-1\"/>
<title>Error 503 </title>
</head>
<body>
<h2>HTTP ERROR: 503</h2>
<p>Problem accessing /api. Reason:
<pre> Async servlet timeout</pre></p>
<hr /><a href=\"http://eclipse.org/jetty\">Powered by Jetty:// 9.3.7.v20160115</a><hr/>
</body>
</html>
"}},
:at [datomic.client.api.async$ares invokeStatic "async.clj" 56]}],
:trace [[datomic.client.api.async$ares invokeStatic "async.clj" 56]
[datomic.client.api.async$ares invoke "async.clj" 52]
[datomic.client.api.sync$eval21871$fn__21876 invoke "sync.clj" 83]
[datomic.client.api.protocols$eval17089$fn__17125$G__17074__17132 invoke "protocols.clj" 58]
[datomic.client.api$transact invokeStatic "api.clj" 172]
[datomic.client.api$transact invoke "api.clj" 155]
[linnaeus.lab.crossref.datomic_import$transact_articles_from_chan_BANG_$fn__19902$fn__19905$f__15910__auto____19906
invoke
"datomic_import.clj"
106]
[clojure.lang.AFn run "AFn.java" 22]
[io.aleph.dirigiste.Executor$3 run "Executor.java" 318]
[io.aleph.dirigiste.Executor$Worker$1 run "Executor.java" 62]
[manifold.executor$thread_factory$reify__15792$f__15793 invoke "executor.clj" 44]
[clojure.lang.AFn run "AFn.java" 22]
[java.lang.Thread run "Thread.java" 748]]}
Exponential backoff doesn’t help, even after several minutes. The EC2 nodes of the Datomic stack show near-zero CPU, memory and network utilization, while the CloudWatch metrics show a constant HttpEndpointAsyncTimeout of 1.0.
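To make "exponential backoff" concrete, here is a minimal sketch of the kind of retry wrapper meant, not the exact code from the import job. The helper name with-backoff and its options (:max-retries, :base-ms, :max-ms) are hypothetical and not part of the Datomic client API; the sketch only assumes that failures surface as an ex-info whose data carries :cognitect.anomalies/category, as in the error above.

```clojure
(defn with-backoff
  "Calls (f) and retries with exponentially growing sleeps while it throws an
  ex-info categorised as :cognitect.anomalies/unavailable.
  Hypothetical helper, not part of the Datomic client API."
  [f {:keys [max-retries base-ms max-ms]
      :or   {max-retries 8 base-ms 100 max-ms 60000}}]
  (loop [attempt 0]
    ;; recur is not allowed inside try/catch, so the outcome is captured
    ;; as a map and the retry decision is made outside the try form.
    (let [result (try
                   {:ok (f)}
                   (catch clojure.lang.ExceptionInfo e
                     (if (and (< attempt max-retries)
                              (= :cognitect.anomalies/unavailable
                                 (:cognitect.anomalies/category (ex-data e))))
                       {:retry e}
                       ;; Other anomalies, or retries exhausted: rethrow.
                       (throw e))))]
      (if (contains? result :ok)
        (:ok result)
        (do
          ;; Sleep base-ms * 2^attempt milliseconds, capped at max-ms.
          (Thread/sleep (long (min max-ms (* base-ms (Math/pow 2 attempt)))))
          (recur (inc attempt)))))))
```

A call would look like (with-backoff #(d/transact conn {:tx-data batch}) {}), where conn and batch stand in for the connection and one batch of tx-data. With the defaults this gives up after 8 retries, and the waits add up to roughly 25 seconds.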
What’s driving me crazy is that these failures seem so random. Everything works fine for a dozen txes, and then after virtually no load I get 100% failure.
What might be causing this?