Jetty max threads error when enabling ping /health

Hello, I’m running Datomic Pro within a Kubernetes cluster and I wanted to set up a liveness probe using the /health HTTP endpoint described here:

java.lang.RuntimeException: Unable to start ping endpoint localhost:9999
at datomic.transactor_ext$start_ping_endpoint.invokeStatic(transactor_ext.clj:56)
at datomic.transactor_ext$start_ping_endpoint.invoke(transactor_ext.clj:39)
at datomic.transactor_ext$start_pro.invokeStatic(transactor_ext.clj:63)
at datomic.transactor_ext$start_pro.invoke(transactor_ext.clj:59)
at clojure.lang.Var.invoke(Var.java:381)
at datomic.transactor$run_STAR_.invokeStatic(transactor.clj:303)
at datomic.transactor$run_STAR_.invoke(transactor.clj:234)
at datomic.transactor$run$fn__26995.invoke(transactor.clj:358)
at clojure.core$binding_conveyor_fn$fn__5476.invoke(core.clj:2022)
at clojure.lang.AFn.call(AFn.java:18)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Insufficient threads: max=3 < needed(acceptors=1 + selectors=2 + request=1)
at org.eclipse.jetty.server.Server.doStart(Server.java:368)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at cognitect.http_endpoint.jetty$start.invokeStatic(jetty.clj:123)
at cognitect.http_endpoint.jetty$start.invoke(jetty.clj:120)
at cognitect.http_endpoint.jetty$server$fn__9606.invoke(jetty.clj:136)
at cognitect.http_endpoint$create_endpoint.invokeStatic(http_endpoint.clj:306)
at cognitect.http_endpoint$create_endpoint.invoke(http_endpoint.clj:231)
at cognitect.nano_impl.server$create.invokeStatic(server.clj:221)
at cognitect.nano_impl.server$create.invoke(server.clj:187)
at cognitect.nano_impl$create.invokeStatic(nano_impl.clj:228)
at cognitect.nano_impl$create.invoke(nano_impl.clj:152)
at datomic.transactor_ext$start_ping_endpoint.invokeStatic(transactor_ext.clj:47)

But I’m getting this exception and I can’t seem to find a workaround (a system property, environment variable, etc. to tune the Jetty thread pool).

I’m running the latest version of datomic-pro 0.9.5703. Any suggestions on how I should proceed? Thanks in advance

Anybody have any insight on this? Is on-prem just not used anymore or is no longer supported?

Hey @nestrada, I’m using datomic-pro on-prem, but I’ve actually not had to set up the heartbeat step you’re running.
Any way I could give you a hand?

Hey @Folcon, I’m also able to run the transactor on-prem without the k8s heartbeat configured, but our notification system needs to detect failures so that we can investigate later. I’m simply trying to enable the ping endpoint described here: Health Check Endpoint.

Is anyone able to get this working? Thanks for any help

Reading that stack trace, is there any way you can increase the thread pool size in Jetty?

Caused by: java.lang.IllegalStateException: Insufficient threads: max=3 < needed(acceptors=1 + selectors=2 + request=1)
at org.eclipse.jetty.server.Server.doStart(Server.java:368)

That seems to be the main problem.

I’ve never run into this, are you limiting the pool size in some way?
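
Read literally, the message says the pool's maximum (3) is smaller than the sum of the threads Jetty wants to reserve (1 acceptor + 2 selectors + 1 request thread = 4). The Java sketch below reproduces only that arithmetic. The class and method names are invented for illustration, and this is not Jetty's actual source.

```java
// Hypothetical helper mirroring the arithmetic in the exception message.
// It is NOT Jetty code; the names are made up for this example.
final class ThreadBudget {
    // Threads the message says are needed: acceptors + selectors + 1 request thread.
    static int needed(int acceptors, int selectors) {
        return acceptors + selectors + 1;
    }

    // True when the pool's maximum thread count covers the requirement.
    static boolean sufficient(int maxThreads, int acceptors, int selectors) {
        return maxThreads >= needed(acceptors, selectors);
    }

    public static void main(String[] args) {
        // Values taken from the stack trace: max=3, acceptors=1, selectors=2.
        System.out.println("needed=" + needed(1, 2));
        System.out.println("sufficient=" + sufficient(3, 1, 2));
    }
}
```

With the values from the trace this prints needed=4 and sufficient=false, so a maximum of 4 or more would have passed the check for that acceptor and selector count.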

I’m not doing anything besides running the transactor with the provided scripts and these lines added in my transactor.properties file.

ping-host=localhost
ping-port=9999

Try it out and you’ll see (using latest datomic-pro btw)

Would it be possible for you to provide the output of running:

(.availableProcessors (java.lang.Runtime/getRuntime))

from a REPL in the environment where you’re hitting this issue?
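
If a REPL is not available in that environment, the same value can be read with a few lines of plain Java compiled and run inside the container. This is a minimal sketch; the class name CpuCount is arbitrary and not part of Datomic.

```java
// Prints the processor count the JVM sees. Run it with the same JDK and inside
// the same container as the transactor so the value is comparable.
final class CpuCount {
    static int availableProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors=" + availableProcessors());
    }
}
```

Saved as CpuCount.java, it can be compiled with javac CpuCount.java and run with java CpuCount.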

I don’t know of any way to run the transactor from a REPL. However, you may be onto something, as I’m running the transactor within a Docker container on JDK 8.

I’ve been unable to reproduce the issue. I’ve tried instances ranging from 1 core to more than 12 cores, and in all cases I’m able to start the transactor with ping-host and ping-port specified in the properties file and then curl that endpoint successfully.

    
    Executing : /docker-java-home/bin/java -server -showversion -XshowSettings:vm -XX:+PrintCommandLineFlags -XX:+ExitOnOutOfMemoryError -XX:+UseStringDeduplication -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1 -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -Xms4g -Xmx4g -Ddatomic.pingHost=localhost -Ddatomic.pingPort=9999 -classpath resources:lib/*:datomic-transactor-pro-0.9.5783.jar:samples/clj:bin: clojure.main --main datomic.launcher /oscaro/etc/transactor.properties
    -XX:+ExitOnOutOfMemoryError -XX:InitialHeapSize=4294967296 -XX:MaxGCPauseMillis=50 -XX:MaxHeapSize=4294967296 -XX:MaxRAMFraction=1 -XX:+PrintCommandLineFlags -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseStringDeduplication 
    VM settings:
        Min. Heap Size: 4.00G
        Max. Heap Size: 4.00G
        Ergonomics Machine Class: server
        Using VM: OpenJDK 64-Bit Server VM
    
    openjdk version "1.8.0_171"
    OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-1~deb9u1-b11)
    OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)
    
    2018-11-14 17:51:33.166 INFO  default    org.eclipse.jetty.util.log - Logging initialized @4709ms
    2018-11-14 17:51:35.147 INFO  default    org.eclipse.jetty.server.Server - jetty-9.3.7.v20160115
    2018-11-14 17:51:35.181 ERROR default    datomic.process - {:message "Critical failure, cannot continue: Error starting transactor", :pid 1, :tid 27}
    java.lang.RuntimeException: Unable to start ping endpoint localhost:9999
      at datomic.transactor_ext$start_ping_endpoint.invokeStatic(transactor_ext.clj:56)
      at datomic.transactor_ext$start_ping_endpoint.invoke(transactor_ext.clj:39)
      at datomic.transactor_ext$start_pro.invokeStatic(transactor_ext.clj:63)
      at datomic.transactor_ext$start_pro.invoke(transactor_ext.clj:59)
      at clojure.lang.Var.invoke(Var.java:381)
      at datomic.transactor$run_STAR_.invokeStatic(transactor.clj:305)
      at datomic.transactor$run_STAR_.invoke(transactor.clj:236)
      at datomic.transactor$run$fn__28715.invoke(transactor.clj:362)
      at clojure.core$binding_conveyor_fn$fn__5476.invoke(core.clj:2022)
      at clojure.lang.AFn.call(AFn.java:18)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.IllegalStateException: Insufficient threads: max=3 < needed(acceptors=1 + selectors=2 + request=1)
      at org.eclipse.jetty.server.Server.doStart(Server.java:368)
      at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
      at cognitect.http_endpoint.jetty$start.invokeStatic(jetty.clj:123)
      at cognitect.http_endpoint.jetty$start.invoke(jetty.clj:120)
      at cognitect.http_endpoint.jetty$server$fn__9606.invoke(jetty.clj:136)
      at cognitect.http_endpoint$create_endpoint.invokeStatic(http_endpoint.clj:306)
      at cognitect.http_endpoint$create_endpoint.invoke(http_endpoint.clj:231)
      at cognitect.nano_impl.server$create.invokeStatic(server.clj:221)
      at cognitect.nano_impl.server$create.invoke(server.clj:187)
      at cognitect.nano_impl$create.invokeStatic(nano_impl.clj:228)
      at cognitect.nano_impl$create.invoke(nano_impl.clj:152)
      at datomic.transactor_ext$start_ping_endpoint.invokeStatic(transactor_ext.clj:47)
      ... 13 common frames omitted

Above are my command-line arguments and the resulting output when running within Docker (and Kubernetes).

I was able to run the transactor with your same JVM args and use the ping endpoint successfully:

$ java -server -showversion -XshowSettings:vm -XX:+PrintCommandLineFlags -XX:+ExitOnOutOfMemoryError -XX:+UseStringDeduplication -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1 -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -Xms4g -Xmx4g -Ddatomic.pingHost=localhost -Ddatomic.pingPort=9999 -classpath resources:lib/*:datomic-transactor-pro-0.9.5773.jar:samples/clj:bin: clojure.main --main datomic.launcher ../config/dev.properties
Java HotSpot(TM) 64-Bit Server VM warning: Unable to open cgroup memory limit file /sys/fs/cgroup/memory/memory.limit_in_bytes (No such file or directory)
-XX:+ExitOnOutOfMemoryError -XX:InitialHeapSize=4294967296 -XX:MaxGCPauseMillis=50 -XX:MaxHeapSize=4294967296 -XX:MaxRAMFraction=1 -XX:+PrintCommandLineFlags -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseStringDeduplication
VM settings:
    Min. Heap Size: 4.00G
    Max. Heap Size: 4.00G
    Ergonomics Machine Class: server
    Using VM: Java HotSpot(TM) 64-Bit Server VM

java version "1.8.0_152"
Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)

Starting datomic:dev://localhost:4334/<DB-NAME>, storing data in: data ...
System started datomic:dev://localhost:4334/<DB-NAME>, storing data in: data

Ping Endpoint:

$ curl localhost:9999/health
ok
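
For a monitoring or liveness check that cannot shell out to curl, the same request can be made from Java. The sketch below assumes the endpoint answers a healthy request with the body ok, as in the curl output above; the class name HealthProbe and the timeout value are illustrative choices, not anything Datomic provides.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical probe: GETs the health URL and reports whether the body is "ok".
final class HealthProbe {
    // Any I/O failure (refused connection, timeout, HTTP error status) counts as unhealthy.
    static boolean isHealthy(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line = in.readLine();
                // ASSUMPTION: a healthy transactor replies with the single word "ok".
                return line != null && line.trim().equals("ok");
            }
        } catch (IOException e) {
            return false;
        } finally {
            if (conn != null) conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "http://localhost:9999/health";
        boolean ok = isHealthy(url, 2000);
        System.out.println(ok ? "ok" : "unhealthy");
        // Exit code 0 on success, 1 otherwise, so a script can act on the result.
        System.exit(ok ? 0 : 1);
    }
}
```

The non-zero exit code on failure is there so the class could be wrapped in an exec-style check; whether that is preferable to an HTTP probe depends on the orchestrator.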

I will continue to experiment on additional VMs, but I would suspect your docker environment.

FWIW, I’m having the same issue when trying to enable the ping health check when running in an OpenShift pod.

@marshall It definitely seems to be container-related. We just ran into this after attempting an upgrade to 0.9.5786. Some more info, collected on a 6-core (12 virtual) 2018 macbook pro:

In the datomic REPL running on the machine (java 8u181):

(.availableProcessors (java.lang.Runtime/getRuntime)) ;; => 12

In the datomic REPL in the container (java 8u181, Docker Desktop 2.0.0.0-mac81 29211):

(.availableProcessors (java.lang.Runtime/getRuntime)) ;; => 6

In that same docker environment, only when run with ping host/port set:

java.lang.IllegalStateException: Insufficient threads: max=4 < needed(acceptors=1 + selectors=3 + request=1)

Without the ping settings, it works as expected. Also, if I raise Docker’s CPU resource limit (in the UI, from the default of 6 to 12), it works with the ping setup. I’ll play with it some more and post back any useful info I find.
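
One possible reason the processor count matters, offered as an assumption rather than a confirmed diagnosis: if the ping endpoint relies on Jetty's default connector sizing, and if the Jetty 9.3 defaults are acceptors = max(1, min(4, cores/8)) and selectors = max(1, min(4, cores/2)) (recalled from memory, not verified against the bundled version), then 6 processors gives 1 acceptor and 3 selectors, which matches the message above. Nothing in this thread shows how the endpoint arrives at max=4, so the sketch leaves the pool size out entirely.

```java
// Sketch of how processor count could map to Jetty's thread requirement.
// ASSUMPTION: the two formulas below are Jetty 9.3's defaults and the ping
// endpoint does not override them. Neither point is confirmed in this thread.
final class JettyDefaultsSketch {
    static int defaultAcceptors(int cores) {
        return Math.max(1, Math.min(4, cores / 8));
    }

    static int defaultSelectors(int cores) {
        return Math.max(1, Math.min(4, cores / 2));
    }

    // Same sum as in the exception: acceptors + selectors + 1 request thread.
    static int needed(int cores) {
        return defaultAcceptors(cores) + defaultSelectors(cores) + 1;
    }

    public static void main(String[] args) {
        // 6 processors -> acceptors=1, selectors=3 under the assumed defaults,
        // the same split reported in the container's error message.
        for (int cores = 1; cores <= 12; cores++) {
            System.out.println("cores=" + cores
                + " acceptors=" + defaultAcceptors(cores)
                + " selectors=" + defaultSelectors(cores)
                + " needed=" + needed(cores));
        }
    }
}
```

If the assumption holds, the requirement grows in steps with the processor count, so whether startup succeeds depends on how the endpoint's maximum pool size grows alongside it.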

Hi Milt,

Thanks for the report.
We’re continuing to investigate this issue but haven’t yet found a smoking gun.

Lol, restricting the CPUs (using Docker’s --cpus arg) to 3 or fewer also seems to resolve the problem (when the global Docker setting is at 6, the default).

Hi Milt,

Another user found that having CPU set to 1, 3, and 8 worked. I’d be curious if 8 works in your case.

Thanks,
Jaret

I can confirm, 8 works. So far for me, that’s 1, 2, 3, and 8+ that work.

I’m hoping that a solution can be found without resorting to passing docker run parameters, as I’m running the transactor from Kubernetes and can specify CPU limits, but not through Docker. Perhaps a system property or environment variable of some sort?

So I just hit this same issue locally. Has anyone had any luck with deployments out to ECS, etc.? I changed the CPUs allocated to Docker Desktop to 3 and, presto, Datomic launched with a health endpoint.

@cqowsy To resolve this issue, we are making the thread calculation configurable in the next release, along with documentation on how to do so. We are working on some other features for that release, so I do not have a timetable for when you’ll see it. In the interim, I can share configurations (CPU set to 1, 3, or 8) that worked when you have control over the number of CPUs. I know that with Docker, OpenShift, etc. you may not have as much control over the max number of threads.

The following configs worked with the health endpoint:
openjdk 8, 1 cpu
openjdk 8, 3 cpu
oracle 8, 3 cpu
openjdk 8, 8 cpu
oracle 8, 8 cpu

The following configs failed:

openjdk 8, 2 cpu
oracle 8, 2 cpu
openjdk 8, 4 cpu
oracle 8, 4 cpu
openjdk 8, 5 cpu
oracle 8, 5 cpu
openjdk 8, 6 cpu
oracle 8, 6 cpu
openjdk 8, 7 cpu
oracle 8, 7 cpu

I’ll be sure to update this thread with a link to the documentation and the new feature once released.
