Monitoring Documentation Improvement

Ninja · October 18, 2019, 11:50am

Hi,

I’m currently writing a custom monitoring solution with the help of the documentation about monitoring and metrics, in order to enable Prometheus to scrape transactor metrics. Unfortunately, the metrics mentioned there don’t seem to completely represent what the transactor is currently able to produce and hand over to a registered callback function.

To be a bit more specific on this:

PodUpdateMsec was part of the metrics handed over to my callback function but it’s not part of the documentation
StorageBackoff has been declared deprecated as of version 0.8.3826 and should have been replaced by StorageGetBackoffMsec and StoragePutBackoffMsec, but it is still produced by the transactor in the current version (0.9.5966), without the replacements
WriterMemcachedPutMusec, WriterMemcachedPutFailedMusec, ReaderMemcachedPutMusec and ReaderMemcachedPutFailedMusec, which were introduced in version 0.9.5078, appear only in the changelog and not in the documentation itself

Furthermore, I would appreciate it if the monitoring documentation linked above included a sample of the metrics as handed to a registered callback function. This would eliminate the need to fire up a transactor just to get a grasp of the structural layout of the information. That layout is also harder to work out than it should be, since it’s not always clear at first glance what the metric statistics map to (:lo :hi :sum :count).

At this point I just wanted to ask whether there is a plan to improve the documentation by adding the missing information, bundling information that is spread over multiple sites (changelog included), and making it more accessible by being more specific about what each metric describes (cross-links are cool, too) and how metrics are structured within a metrics ‘blob’.

P.S.
Since this issue affects several categories (Datomic Cloud, Datomic On-Prem and Datomic Applications), General seemed to be a good fit. Please feel free to move it if desired.

stu · October 28, 2019, 5:08pm

Hi Ninja, and welcome to the Datomic forum. I want to make sure you have all the information you need to write a correct and robust custom monitoring callback.

The most important thing to understand is that the set of metrics is dynamic, open, and subject to change over time. So a correct implementation needs to be tolerant of

metric names it has never seen, and does not understand
metric values in either documented shape (numbers or maps), regardless of what shape it has seen for a metric previously
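As a rough illustration of those two rules, here is a minimal Java sketch. It is not taken from the Datomic docs: the class name TolerantMetrics, the methods flatten and handle, and the "name_stat" naming are all made up for this example. It also assumes that the callback receives a map whose keys print with a leading colon (as Clojure keywords do) and whose values are either numbers or maps of statistics. How the callback is registered, and the exact signature it needs, should be checked against the monitoring documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a handler that accepts unknown metric names and both value shapes. */
class TolerantMetrics {

    /** Flattens a metrics map into "Name" or "Name_stat" -> double samples. */
    static Map<String, Double> flatten(Object metrics) {
        Map<String, Double> out = new LinkedHashMap<>();
        if (!(metrics instanceof Map)) {
            return out; // unexpected payload: report nothing rather than throw
        }
        for (Map.Entry<?, ?> e : ((Map<?, ?>) metrics).entrySet()) {
            String name = clean(e.getKey()); // no whitelist: unknown names pass through
            Object value = e.getValue();
            if (value instanceof Number) {
                // Shape 1: a plain number.
                out.put(name, ((Number) value).doubleValue());
            } else if (value instanceof Map) {
                // Shape 2: a map of statistics; keep whichever numeric entries are present.
                for (Map.Entry<?, ?> stat : ((Map<?, ?>) value).entrySet()) {
                    if (stat.getValue() instanceof Number) {
                        out.put(name + "_" + clean(stat.getKey()),
                                ((Number) stat.getValue()).doubleValue());
                    }
                }
            }
            // Any other shape is skipped silently instead of failing the callback.
        }
        return out;
    }

    /** Strips the leading ':' that keyword keys print with (an assumption about the key type). */
    static String clean(Object key) {
        String s = String.valueOf(key);
        return s.startsWith(":") ? s.substring(1) : s;
    }

    /** Hypothetical entry point: prints one "name value" line per sample. */
    public static void handle(Object metrics) {
        flatten(metrics).forEach((k, v) -> System.out.println(k + " " + v));
    }
}
```

The shape of each value is decided on every call and never remembered per metric name, so a metric that arrives as a number once and as a map later is handled either way.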

With that in mind, we can turn to your specific questions:

There are and will continue to be undocumented metrics such as PodUpdateMsec. We document them only if they become important in helping users troubleshoot systems.
I see StorageGetBackoffMsec in my custom callback – can you please start a new forum post with a repro on this?
The memcached metrics you mention were high volume and low value, so we removed them in 0.9.5783. Thanks for pointing this out! We will fix the changelog on our next release.

I will update the docs to include the advice from this thread.

Thanks for your suggestions, and please let me know if this gives you the information you need to proceed.

Stu
