in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

i really enjoyed when honeycomb did a demo with their tool on the real world example of a service outage on their platform to discover that one of their servers ran out of disk space, something that a loving nagios check would have picked up 20 years ago

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

prometheus is pretty cool but shoehorning everything into a pull model annoys me

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

r u ready to WALK posted:

https://www.cacti.net is extremely underrated for generic snmp poo poo

the server i set up 10 years ago at work still works pretty much maintenance free

it’s ok if what you need to do is turn snmp (or snmplikes) into browser-viewable rrds on one host

also make sure you stay on top of the security updates or keep it inaccessible from the web

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

tracing seems to be something you write into your web app

anybody doing non-http tracing, or collecting trace data from apps you didn’t write yourself that don’t have native tracing support?

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

uncurable mlady posted:

i wouldn't say it's just in your web app - our entire application is instrumented from webapp all the way down to the db. that said, it's kind of a massive pain in the rear end right now to get trace data from resources you don't directly manage because not everyone uses opentracing and even if they did, wire formats are very tracer dependent. that said, w3c is working on a tracecontext/tracedata specification that's intended to address this problem by standardizing headers and wire formats for context so you could have a situation where you're using some sort of managed ingress proxy or w/e and it'd be able to create spans as part of a trace that started on a client, etc. could also see the same thing at a managed db where the database service on the provider side is able to pick up traces incoming from the application and emit spans that you'd collect.

are you using tracing now? something home-brewed, or opentracing/opencensus?
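
For reference, the header that the W3C Trace Context work standardizes is `traceparent`, with four dash-separated lowercase hex fields: version, trace id, parent (span) id, and flags. Below is a minimal sketch of parsing it, assuming version `00` semantics only; the names `TraceParent` and `parse_traceparent` are made up for illustration and are not from any tracing library.

```python
import re
from typing import NamedTuple, Optional

# traceparent layout: version(2 hex)-trace_id(32 hex)-parent_id(16 hex)-flags(2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    """Hypothetical container for the parsed header fields."""
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed.

    This sketch is strict: it only accepts the four-field layout,
    so a future version carrying extra fields would be rejected.
    """
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff is invalid, and all-zero ids are invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # Lowest flag bit is the "sampled" flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

A proxy or managed service that understands this header could reuse `trace_id` and record `parent_id` as the parent of whatever span it emits, which is the scenario described in the post above.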

not really tracing today. I have event data in ES that looks like spans, I think? (request A started routine P on node N, and another event when it completes), and time series data from node N and resources R, S, T that are anywhere from slightly to tightly correlated (R can tag all requests from P, S can only show traffic from N, and T can only show high-level perf indicators.)
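
A rough sketch of turning that kind of start/complete event pair into span-like records. The event field names (`ts`, `kind`, `request`, `routine`, `node`) are assumptions for illustration, not the actual Elasticsearch schema, and fetching the events from ES is left out.

```python
from dataclasses import dataclass


@dataclass
class Span:
    request: str
    routine: str
    node: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def pair_events(events):
    """Pair 'start' and 'complete' events into spans.

    Returns (spans, still_open), where still_open maps
    (request, routine, node) to the start time of anything
    that never logged a completion.
    """
    still_open = {}
    spans = []
    # Process in timestamp order so a start is seen before its completion.
    for ev in sorted(events, key=lambda e: e["ts"]):
        key = (ev["request"], ev["routine"], ev["node"])
        if ev["kind"] == "start":
            still_open[key] = ev["ts"]
        elif ev["kind"] == "complete" and key in still_open:
            spans.append(Span(*key, start=still_open.pop(key), end=ev["ts"]))
    return spans, still_open
```

Whatever is left in `still_open` at the moment a resource crashes is a candidate list of requests that were in flight at the time.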

What I want is something to take this structured Elastic data, look at what resources are directly or indirectly used by that request, and show relevant TS data from Prometheus and log data from ES. If T crashes I want to be able to look at what requests are active in the system. Given 5 crashes, I want to bisect that down to see that requests like A were the only requests common to all five crashes; I'd also like to see that requests like A are taking longer than normal because resource T is reporting high utilization, etc.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

my stepdads beer posted:

my prometheus keeps corrupting its data because it's on an nfs share. that's fine because i only use it for some pretty graphs sometimes. prom really suffers from not having good examples for what I think are common scenarios.

anyway we use cacti for all our network stuff because of inertia. prom+snmp-exporter+grafana was tedious as hell. nagios for our alerting and it's OK but kind of a pain re: config files.

for "tracing" we use sentry for catching our dumb php apps' various issues and fatals

don’t use nfs for databases
