REPL on the Compute nodes?

danie · February 13, 2019, 10:33am

While debugging my ion Lambda, I would really like to connect to a REPL on the compute node.

I see something is running on port 5555, but connecting with nc/telnet doesn’t give me a REPL prompt.

Is it possible to have a REPL into the running code?

stu · February 19, 2019, 3:05pm

Hi Danie,

Datomic does not start a server REPL, and nodes run in a security group and private VPC that is locked down.

We have worked to eliminate the need to connect to individual boxes, as it is both error-prone and a security risk.

When I am developing an ion, I sort out all the minor gotchas at a local REPL before ever deploying, and from there have found ion events sufficient for tracking down e.g. configuration errors. What sorts of problems are you encountering?

Cheers,
Stu

danie · February 20, 2019, 10:48am

Hi Stu,

I appreciate the locked-down nature of the nodes and the need for it, and I understand that individual boxes would become inconsistent if their state were changed via a REPL.

In this case, I wanted to see the state of a Kafka producer and send a few messages to trace what was happening, because it did not seem to pick up the properties that I gave it.

The cast events added to my confusion, because bootstrap.servers was printed correctly:

{
    "Msg": "Instantiating producer",
    "xxxProducerProps": {
        "bootstrap.servers": "ip-10-211-11-111.us-east-2.compute.internal:9092"
    },
    "Type": "Event",
    "Tid": 56,
    "Timestamp": 1550055900480
}

But the Producer error message still said localhost:

{
    "Msg": "[Producer clientId=clientid] Connection to node 0 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.",
    "DatomicCloudSlf4jLevel": "Warn",
    "DatomicCloudSlf4jSymbol": "org.apache.kafka.clients.NetworkClient",
    "DatomicCloudSlf4jSource": "SLF4J",
    "Type": "Event",
    "Tid": 57,
    "Timestamp": 1550055922977
}

At this point, I would have liked to poke around in a REPL.

I did figure out after a while that my Kafka broker had this set: advertised.host.name=localhost. The producer did actually connect to the broker, but the address the broker handed back was localhost, and from there it went weird, with the quite misleading error message.

This was not fun to track down.
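
For readers hitting the same symptom, here is a toy model of why the log line said localhost even though bootstrap.servers was correct. It is a sketch, not Kafka client code: the function names are hypothetical, and the fallback to the bind host is a simplification of how a broker picks its advertised address. The only facts taken from this thread are the bootstrap address and the advertised.host.name=localhost setting.

```python
# Toy model of Kafka's two-step connection. This is NOT Kafka client code;
# every function name here is hypothetical and only illustrates the flow.

def advertised_address(broker_config, bind_host):
    """Return the host:port a broker hands out in its metadata response.

    Simplification: if advertised.host.name is unset, fall back to bind_host.
    """
    host = broker_config.get("advertised.host.name", bind_host)
    port = broker_config.get("advertised.port", 9092)
    return f"{host}:{port}"


def addresses_used_by_producer(bootstrap_servers, broker_config, bind_host):
    """Step 1 uses bootstrap.servers; step 2 uses what the broker advertised."""
    return {
        "metadata_request": bootstrap_servers,
        "produce_requests": advertised_address(broker_config, bind_host),
    }


bootstrap = "ip-10-211-11-111.us-east-2.compute.internal:9092"
bind_host = "ip-10-211-11-111.us-east-2.compute.internal"

# Broker configured as in the post above.
misconfigured = addresses_used_by_producer(
    bootstrap, {"advertised.host.name": "localhost"}, bind_host)

# Same broker with the setting removed.
fixed = addresses_used_by_producer(bootstrap, {}, bind_host)

print(misconfigured["produce_requests"])  # localhost:9092
print(fixed["produce_requests"])
```

In this model the producer properties are used only for the first request, which is why the cast event looked right while later connections went to localhost on the compute node itself.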

eneroth · March 4, 2019, 1:38pm

We had an issue this morning, same symptoms as described over in…

… except this time it was persistent. I solved it by terminating the node. The node had been returning 500 on requests, but for some reason the NLB was perfectly fine with this and didn’t terminate it automatically.

We’re now looking at sprinkling casts more liberally throughout the code to catch it if it happens again, but this feels a bit like black-box engineering. It would have been nice to have a more direct way to inspect the state of the machine.
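
One possible explanation for the load balancer behaviour, assuming (this is not confirmed anywhere in the thread) that the target group uses a TCP health check: such a check only verifies that a connection can be opened, so a process that accepts connections and answers every request with 500 still counts as healthy. The sketch below reproduces that difference locally with a stand-in server; the handler and both check functions are hypothetical and are not part of Datomic or AWS.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class AlwaysFailing(BaseHTTPRequestHandler):
    """Stand-in for a broken node: every GET is answered with HTTP 500."""

    def do_GET(self):
        self.send_response(500)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def tcp_check(host, port):
    """Healthy if a TCP connection can be opened at all."""
    try:
        with socket.create_connection((host, port), timeout=2):
            return True
    except OSError:
        return False


def http_check(host, port, path="/"):
    """Healthy only if the response status is 2xx."""
    conn = http.client.HTTPConnection(host, port, timeout=2)
    try:
        conn.request("GET", path)
        return 200 <= conn.getresponse().status < 300
    except OSError:
        return False
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), AlwaysFailing)  # port 0: pick a free port
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

tcp_healthy = tcp_check(host, port)
http_healthy = http_check(host, port)

server.shutdown()
server.server_close()

print("TCP check healthy: ", tcp_healthy)   # True
print("HTTP check healthy:", http_healthy)  # False
```

If the assumption holds, the health check type on the target group would be the first thing to look at.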
