Thanks for the rationale; I'll need to dig deeper into tuning the Peer Server. I have some follow-up questions along these lines, so let me know if a new topic or a support case would be a better venue for discussion. Or, given the “preview” status, is it simply too soon for official answers?
For the Peer Server, how should we think about sizing this process? The launch script defaults to a 1GB heap, but that doesn't seem adequate for the tests I've run. Since I don't know what sort of queries the Presto connector issues, it's difficult to reason about memory requirements.
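For reference, this is roughly how I've been experimenting with the heap so far. It's only a sketch: I'm assuming the launch script passes `JAVA_OPTS` through to the JVM (that may not be the supported mechanism, and the host/port/keys/URI below are placeholders):

```shell
# Sketch only: assumes the peer-server launch script honors JAVA_OPTS.
# Check bin/run in your distribution; the supported mechanism may differ.
# Host, port, access key/secret, and db URI are all placeholders.
JAVA_OPTS="-Xms4g -Xmx4g" bin/run -m datomic.peer-server \
  -h localhost -p 8998 \
  -a myaccesskey,mysecret \
  -d mydb,datomic:dev://localhost:4334/mydb
```

Is bumping `-Xmx` like this even the right knob, or is there guidance on sizing relative to database size or query shape?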
For the analytics server, you've bundled what looks like a stripped-down and slightly dated Presto distribution, with some bits added for the Datomic connector. My attempts to take the Datomic bits (along with the requisite config files) and drop them into a newer Presto release didn't work. Is the use of Presto considered an implementation detail, where only the bundled version is supported? Or will analytics support eventually ship as a full-fledged Presto connector that we could deploy on our own Presto install, or even on something like Starburst Presto? The current status is somewhat confusing: the binaries are all bundled, but the documentation sometimes references the Presto docs directly.
Is it possible and/or recommended to run analytics on a multi-node presto cluster? If so, should we also run a dedicated Peer Server for each job node, e.g. to improve cache coherence in Peer Server?
One of the preview-status caveats is that certain performance optimizations haven't been made yet. What are the practical limits of this? Even in fairly simple tests I'm seeing enormous differences between a query issued through SQL and the equivalent datalog issued directly via the Peer API: tens of milliseconds for datalog versus hundreds of seconds for SQL. I'm still tuning the Peer Server and Presto, so there may not even be an issue here, but even with EXPLAIN it's difficult to see where things go wrong. Any tips for getting some execution transparency to help troubleshoot performance issues?
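For context, the most detail I've managed to get so far is from `EXPLAIN ANALYZE` through the CLI, which runs the query and reports per-operator timing and row counts rather than just the plan. A sketch of what I've been running, assuming the bundled distribution exposes the standard Presto CLI (server address, catalog, schema, and the query itself are placeholders):

```shell
# Placeholders throughout: server address, catalog/schema, table, query.
# EXPLAIN ANALYZE actually executes the query and reports per-operator
# CPU/wall time and row counts; plain EXPLAIN only shows the plan.
./presto --server localhost:8080 \
  --catalog datomic --schema mydb \
  --execute "EXPLAIN ANALYZE SELECT count(*) FROM mytable"
```

Even with that output, though, I can't tell whether the time is going into the connector's calls to the Peer Server or into Presto itself. Is there logging on the Peer Server side that would show what the connector is actually asking for?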