AMQ219010: Connection is destroyed issue

deivydasofc · March 8, 2022, 8:07am

Datomic version: 1.0.6362

After running for some time, our Clojure service starts failing to connect to the transactor:

org.apache.activemq.artemis.api.core.ActiveMQNotConnectedException: AMQ219006: Channel disconnected
	at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.connectionDestroyed(ClientSessionFactoryImpl.java:374)
	at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector$Listener$1.run(NettyConnector.java:1228)
	at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
...

org.apache.activemq.artemis.api.core.ActiveMQObjectClosedException: AMQ219017: Consumer is closed
	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.checkClosed(ClientConsumerImpl.java:971)
	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.receive(ClientConsumerImpl.java:204)
...
org.apache.activemq.artemis.api.core.ActiveMQNotConnectedException: AMQ219010: Connection is destroyed
	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:460)
	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434)
	at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQClientProtocolManager.createSessionContext(ActiveMQClientProtocolManager.java:300)
....


clojure.lang.ExceptionInfo: Error communicating with HOST 10.0.1.78 on PORT 4334
	at datomic.connector$endpoint_error.invokeStatic(connector.clj:53)
	at datomic.connector$endpoint_error.invoke(connector.clj:50)
	at datomic.connector.TransactorHornetConnector$fn__9390.invoke(connector.clj:224)
	at datomic.connector.TransactorHornetConnector.admin_request_STAR_(connector.clj:212)
	at datomic.peer.Connection$fn__9646.invoke(peer.clj:219)
...

The "Connection is destroyed" and "Error communicating" exceptions repeat until the service dies.

We cannot find any particular cause for this.
One possible clue is a GC allocation failure:

[Full GC (Allocation Failure)  57144M->33793M(57344M), 35.1363911 secs]
   [Eden: 0.0B(2864.0M)->0.0B(6800.0M) Survivors: 0.0B->0.0B Heap: 57144.5M(57344.0M)->33793.5M(57344.0M)], [Metaspace: 155619K->155616K(159744K)]
 [Times: user=50.45 sys=0.05, real=35.14 secs]

Could GC be the cause here? Or are there other possible causes?
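
To put numbers on the GC line quoted above, here is a minimal Python sketch that pulls the pause time and heap occupancy out of a JDK 8 G1 "Full GC" summary line. The function name `parse_full_gc` and the regular expression are illustrative assumptions fitted to the single line in this post, not part of Datomic or any JVM tooling, and other GC log formats would need a different pattern.

```python
import re

# Assumed pattern: matches a JDK 8 G1 "Full GC" summary line shaped like the
# one quoted in this post. Other collectors or JDK versions log differently.
FULL_GC = re.compile(
    r"\[Full GC \((?P<cause>[^)]*)\)\s+"
    r"(?P<before>\d+)M->(?P<after>\d+)M\((?P<total>\d+)M\),\s+"
    r"(?P<secs>\d+\.\d+) secs\]"
)


def parse_full_gc(line):
    """Return cause, pause in ms, and heap occupancy ratios, or None if no match."""
    m = FULL_GC.search(line)
    if m is None:
        return None
    before, after, total = (int(m.group(k)) for k in ("before", "after", "total"))
    return {
        "cause": m.group("cause"),
        "pause_ms": float(m.group("secs")) * 1000.0,
        "occupancy_before": before / total,  # fraction of heap used before the collection
        "occupancy_after": after / total,    # fraction of heap still used afterwards
    }


line = "[Full GC (Allocation Failure)  57144M->33793M(57344M), 35.1363911 secs]"
info = parse_full_gc(line)
print(info)
```

For the line above this reports a stop-the-world pause of roughly 35 seconds, with the heap almost completely full before the collection and still more than half full afterwards.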

jaret · March 8, 2022, 3:54pm

@deivydasofc Did you update your version of Java recently? What version are you running? Are you seeing this error on the peer or the transactor?

Also, if this is the transactor, are you using the recommended GC flags? See: Transactor | Datomic

deivydasofc · March 9, 2022, 8:45am

No, Java was not changed recently.
The current Java version is good old 1.8.
We are seeing this on the peer.

Flags are:

-Xms28g 
-Xmx28g 
-XX:ActiveProcessorCount=4 
-Ddatomic.txTimeoutMsec=120000 
-Ddatomic.peerConnectionTTLMsec=20000 
-Ddatomic.objectCacheMax=20480m 
-XX:+UseG1GC 
-XX:+PrintGCDetails -verbose:gc 
-XX:+PrintGCDateStamps 
-Dcom.sun.management.jmxremote.port=9000 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false

I see that -XX:MaxGCPauseMillis=50 is not set on my side. Could setting it help?

jaret · March 9, 2022, 12:40pm

A prolonged GC pause on the transactor could lead to a peer receiving timeouts or exceptions. I would set -XX:MaxGCPauseMillis as recommended. If you encounter this error again after setting that, I’d like to see the peer logs and transactor logs at the time of the event. I should be able to see the metrics for GC pause/query timeout or refusal by looking at both logs along with other context clues.

If you’d like me to review the logs in this instance I recommend e-mailing support@cognitect.com to open a case with us and then you can attach or link to the logs from that case.

deivydasofc · March 14, 2022, 8:31am

The GC pause was on the peer, not on the transactor.
The Java flags I sent earlier are also from the peer (my bad).
The transactor runs with JVM settings -Xms200g -Xmx200g -XX:+UseG1GC -XX:MaxGCPauseMillis=50.

jaret · March 14, 2022, 12:03pm

A 200g heap is large for a transactor. I’d be interested in hearing more about your system. The trade-off of providing a larger heap to a peer or transactor is increased GC overhead. If you have the logs, I should look at both the peer and transactor logs during this event. Could you supply me with your transactor logs (both active and standby) and peer logs for this event via our support portal? You can e-mail support@cognitect.com or use the website here: https://support.cognitect.com/hc/en-us/requests/new

Do you always run your peer with -XX:+PrintGCDetails -verbose:gc and -XX:+PrintGCDateStamps, or did you add those for this case? I’d recommend not setting them unless you are specifically troubleshooting a GC issue.
