|
i really enjoyed when honeycomb did a demo of their tool on a real-world example, a service outage on their own platform, and discovered that one of their servers had run out of disk space, something that a loving nagios check would have picked up 20 years ago
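
for reference, the kind of check meant here fits in a few lines. this is a minimal sketch, not an actual nagios plugin: the function names and the 80/90 percent thresholds are made up for illustration, and the only real convention it leans on is the plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL).

```python
import shutil

# Nagios plugin convention: exit code 0 = OK, 1 = WARNING, 2 = CRITICAL.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(used_pct, warn_pct, crit_pct):
    """Map a used-space percentage onto a Nagios-style status code."""
    if used_pct >= crit_pct:
        return CRITICAL
    if used_pct >= warn_pct:
        return WARNING
    return OK


def check_disk(path="/", warn_pct=80.0, crit_pct=90.0):
    """Return (status, message) for the filesystem that holds `path`.

    The thresholds are illustrative defaults, not recommendations.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_pct = 100.0 * usage.used / usage.total
    status = classify(used_pct, warn_pct, crit_pct)
    return status, f"{path} is {used_pct:.1f}% full"
```

a wrapper script would print the message and pass the status to `sys.exit()` so the scheduler can alert on it.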
|
# ¿ Feb 23, 2019 20:27 |
|
|
# ¿ May 14, 2024 20:44 |
|
prometheus is pretty cool but shoehorning everything into a pull model annoys me
|
# ¿ Feb 23, 2019 20:29 |
|
r u ready to WALK posted:
https://www.cacti.net is extremely underrated for generic snmp poo poo

it’s ok if what you need to do is turn snmp (or snmplikes) into browser-viewable rrds on one host

also make sure you stay on top of the security updates or keep it inaccessible from the web
|
# ¿ Feb 23, 2019 20:42 |
|
tracing seems to be something you write into your web app

anybody doing non-http tracing, or collecting trace data from apps you don’t write yourself and that don’t have native tracing support?
|
# ¿ Feb 23, 2019 20:56 |
|
uncurable mlady posted:
i wouldn't say it's just in your web app - our entire application is instrumented from webapp all the way down to the db. that said, it's kind of a massive pain in the rear end right now to get trace data from resources you don't directly manage because not everyone uses opentracing and even if they did, wire formats are very tracer dependent. that said, w3c is working on a tracecontext/tracedata specification that's intended to address this problem by standardizing headers and wire formats for context so you could have a situation where you're using some sort of managed ingress proxy or w/e and it'd be able to create spans as part of a trace that started on a client, etc. could also see the same thing at a managed db where the database service on the provider side is able to pick up traces incoming from the application and emit spans that you'd collect.

not really tracing today. I have event data in ES that looks like spans, I think? (request A started routine P on node N, and another event when it completes), and time series data from node N and resources R, S, T that are slightly to tightly correlated (R can tag all requests from P, S can only show traffic from N, and T can only show high-level perf indicators.)

What I want is something to take this structured Elastic data, look at what resources are directly or indirectly used by that request, and show relevant TS data from Prometheus and log data from ES. If T crashes I want to be able to look at what requests are active in the system. Given 5 crashes, I want to bisect that down to see that requests like A were the only common requests in all five crashes; I'd also like to see that requests like A are taking longer than normal because resource T is reporting high utilization, etc.
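
The "which requests were common to every crash" part can be sketched as a set intersection over whatever was in flight at each crash time. This is only a rough illustration under assumptions: the event shape (a dict with `request`, `start`, `end`) and the function names are hypothetical, not an Elasticsearch schema or API, and pulling the events out of ES is left out entirely.

```python
def active_at(events, t):
    """Request types with at least one span covering time t (start <= t < end)."""
    return {e["request"] for e in events if e["start"] <= t < e["end"]}


def common_to_all_crashes(events, crash_times):
    """Request types that were in flight at every one of the given crash times."""
    in_flight = [active_at(events, t) for t in crash_times]
    if not in_flight:
        return set()
    return set.intersection(*in_flight)


# Hypothetical span-like events: request type plus start/end timestamps.
events = [
    {"request": "A", "start": 0, "end": 10},
    {"request": "B", "start": 5, "end": 8},
    {"request": "A", "start": 20, "end": 30},
    {"request": "C", "start": 22, "end": 40},
]
crash_times = [6, 25]

suspects = common_to_all_crashes(events, crash_times)
```

Here B is only active at the first crash and C only at the second, so `suspects` comes out as just A. It only narrows the list; correlating that with utilization on T would still need the Prometheus side.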
|
# ¿ Feb 24, 2019 17:20 |
|
my stepdads beer posted:
my prometheus keeps corrupting its data because it's on an nfs share. that's fine because i only use it for some pretty graphs sometimes. prom really suffers from not having good examples for what I think are common scenarios.

don’t use nfs for databases
|
# ¿ Mar 2, 2019 10:30 |