Initial Prometheus work #169
I am sick of all the hacky Python scripts that I am using to gather data and send it to Prometheus / Grafana, so I thought I may as well integrate it into the core codebase so anyone else who does something similar can have at it!
Let me know your thoughts - this is a very crude implementation as it calls the APIs internally, to save on duplicating cache collection code.
I'd prefer to have the cache update the data any time it changes - but the cache isn't reliably used and the API request entry points were the ones that made the decision to get from the cache or call the API in the background.
Happy to rework if required, let me know thoughts on this and what else is needed to merge it in, if it's something you'd consider :)
The exported metrics have the following shape, picking on AS10084 (IAA Content) on NSW-IX Route Server 1.
I'll need to review this a bit more in depth - but this is looking good. :)
Thank you!
Thanks @annikahannig - happy to keep workshopping this one.
One logical change I can see is moving the cache logic away from the API endpoints, so that metrics can be processed in the cache layer and the endpoints simply serve from the cache.
It looks like there are also a few conflicts with the dev branch, so let me merge that in locally. I'll revert one of the commits and clean it up a little; IntelliJ decided it would be a nice idea to reformat the markdown in the README.
I think this is looking good.
I've read a bit in the Prometheus documentation, and their naming guide states that metric names should carry suffixes clarifying the unit and type:
https://prometheus.io/docs/practices/naming/
So in this case the uptime metric would be `peer_uptime_seconds_total`, and maybe the peer state metric could change to `peer_info` with the state as a label. The rest of the metrics and labels look good, I think.
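To make the naming suggestion concrete, here is a rough Python sketch (illustration only, not the code in this PR) of how the two renamed metrics might look in the Prometheus text exposition format. The label names `asn` and `route_server`, the helper functions, and the sample values are all assumptions made up for the example.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name, labels, value):
    """Render one sample line: name{label="value",...} value."""
    label_str = ",".join(
        '{}="{}"'.format(key, escape_label_value(val))
        for key, val in sorted(labels.items())
    )
    return "{}{{{}}} {}".format(name, label_str, value)


def peer_samples(peer):
    """Build the two suggested samples for a single (hypothetical) peer record."""
    identity = {"asn": peer["asn"], "route_server": peer["route_server"]}
    # Uptime carries its unit in the metric name.
    uptime = render_sample("peer_uptime_seconds_total", identity, peer["uptime_seconds"])
    # State becomes a label on an info-style metric whose value is always 1.
    info_labels = dict(identity, state=peer["state"])
    info = render_sample("peer_info", info_labels, 1)
    return [uptime, info]


# Illustrative values only, not real exporter output.
example_peer = {
    "asn": "10084",
    "route_server": "rs1",
    "state": "up",
    "uptime_seconds": 86400,
}
for line in peer_samples(example_peer):
    print(line)
```

With the state as a label, a dashboard can filter or group on `state` instead of decoding a numeric state value.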
I'll merge this, but I'll likely change the config flag to just `enable_metrics`. Also, I think I'm going to rewrite this a bit using a custom metrics collector.
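For anyone following along, the custom collector idea could look roughly like the Python sketch below: samples are built at scrape time from whatever the cache currently holds, instead of going through the API entry points. This only illustrates the pattern and is not the planned implementation; `NeighborCache`, `PeerCollector`, and the record fields are hypothetical names.

```python
class NeighborCache:
    """Hypothetical stand-in for the neighbor cache, keyed by route server."""

    def __init__(self):
        self._neighbors = {}

    def update(self, route_server, neighbors):
        # Called whenever fresh data for a route server arrives.
        self._neighbors[route_server] = list(neighbors)

    def snapshot(self):
        # Return a copy so a scrape never observes a half-updated list.
        return {rs: list(ns) for rs, ns in self._neighbors.items()}


class PeerCollector:
    """Builds samples on demand from the cache instead of tracking state itself."""

    def __init__(self, cache):
        self._cache = cache

    def collect(self):
        # Yield (metric name, labels, value) tuples for every cached neighbor.
        for route_server, neighbors in sorted(self._cache.snapshot().items()):
            for neighbor in neighbors:
                labels = {
                    "route_server": route_server,
                    "asn": neighbor["asn"],
                    "state": neighbor["state"],
                }
                yield ("peer_info", labels, 1)


cache = NeighborCache()
collector = PeerCollector(cache)
cache.update("rs1", [{"asn": "10084", "state": "up"}])
for sample in collector.collect():
    print(sample)
```

The scrape path then only reads the cache, which fits the earlier suggestion of moving the cache logic away from the API endpoints.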