cowboy beepboop
Feb 24, 2001

my prometheus keeps corrupting its data because it's on an nfs share. that's fine because i only use it for some pretty graphs sometimes. prom really suffers from not having good examples for what I think are common scenarios.

anyway we use cacti for all our network stuff because of inertia. prom+snmp-exporter+grafana was tedious as hell. we use nagios for our alerting; it's OK but kind of a pain re: config files.

for "tracing" we use sentry for catching our dumb php apps' various issues and fatals

cowboy beepboop
Feb 24, 2001

it's fine. thing breaks -> email and sms sent. it also runs happily for years without anyone touching it

cowboy beepboop
Feb 24, 2001

r u ready to WALK posted:

the reason nagios runs for years without anyone touching it is that nobody wants to actively maintain it even with a gun to their head

it can sort of work if you write your own custom scripts that autogenerate all the config files for it, good luck maintaining that poo poo by hand though

ya, you use ansible to template the config files. it's easy. it's not good, but it works.
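a rough sketch of that idea, with plain python string formatting standing in for an ansible jinja2 template: keep the host list as data and render the nagios object definitions from it, so nobody edits the .cfg files by hand. the inventory, addresses and service-to-command mapping below are made up for illustration; `linux-server` and `generic-service` are the templates from nagios' sample config, so swap in whatever your install defines.

```python
# Hypothetical inventory: in a real setup this would come from ansible's inventory/vars.
HOSTS = [
    {"name": "web01", "address": "10.0.0.11", "services": ["PING", "SSH"]},
    {"name": "db01", "address": "10.0.0.21", "services": ["PING"]},
]

# Hypothetical mapping from service name to a nagios check_command.
CHECKS = {
    "PING": "check_ping!100.0,20%!500.0,60%",
    "SSH": "check_ssh",
}


def render_host(host: dict) -> str:
    """Render one nagios host object followed by its service objects."""
    lines = [
        "define host {",
        "    use        linux-server",  # assumed template name from the sample config
        f"    host_name  {host['name']}",
        f"    address    {host['address']}",
        "}",
    ]
    for svc in host["services"]:
        lines += [
            "define service {",
            "    use                  generic-service",  # assumed template name
            f"    host_name            {host['name']}",
            f"    service_description  {svc}",
            f"    check_command        {CHECKS[svc]}",
            "}",
        ]
    return "\n".join(lines)


def render_all(hosts: list) -> str:
    """Render every host, separated by blank lines, ready to write to a .cfg file."""
    return "\n\n".join(render_host(h) for h in hosts) + "\n"


if __name__ == "__main__":
    print(render_all(HOSTS))
```

the point is the same as with ansible: the data lives in one place and the config files are disposable output you regenerate.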

cowboy beepboop
Feb 24, 2001

Blinkz0rz posted:

we use a combination of elk and the logging software we sell (dogfooding is good) for logging and datadog for monitoring. i think a small part still has some sensu + grafana for monitoring physical assets or something idk

elk or graylog are cool up until the point you have to learn about maintaining an elasticsearch cluster
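if you do end up janitoring an elasticsearch cluster and already alert through nagios, a small check plugin against the `_cluster/health` endpoint is one way to hear about a yellow or red cluster. minimal sketch, assuming the cluster answers on a plain http url with no auth; it uses the standard nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import json
import sys
import urllib.request

# Standard nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = ["OK", "WARNING", "CRITICAL", "UNKNOWN"]

# Elasticsearch reports cluster health as green / yellow / red.
STATUS_TO_EXIT = {"green": OK, "yellow": WARNING, "red": CRITICAL}


def evaluate(health: dict) -> tuple:
    """Map a parsed _cluster/health response to (exit_code, message)."""
    status = health.get("status")
    code = STATUS_TO_EXIT.get(status, UNKNOWN)
    return code, f"{LABELS[code]} - cluster status is {status}"


def main(base_url: str) -> int:
    """Query the cluster, print a one-line nagios status, return the exit code."""
    try:
        with urllib.request.urlopen(base_url + "/_cluster/health", timeout=10) as resp:
            health = json.load(resp)
    except (OSError, ValueError) as exc:  # network errors, bad JSON
        print(f"UNKNOWN - could not query cluster: {exc}")
        return UNKNOWN
    code, message = evaluate(health)
    print(message)
    return code


if __name__ == "__main__":
    # Hypothetical default URL; pass your own as the first argument.
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "http://localhost:9200"))
```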

cowboy beepboop
Feb 24, 2001

Blinkz0rz posted:

yeah ama about maintaining an elk stack that processes a few tb of logs a day

it loving sucks

tb?! no thank you

cowboy beepboop
Feb 24, 2001

my bitter bi rival posted:

well nagios is free so it looks like im owned then.

prom's alerts and alertmanager seem good but i have never gotten around to migrating

cowboy beepboop
Feb 24, 2001

the prom / grafana guys are making a log thing now

https://grafana.com/loki

no full text search though, also it only works with k8s atm

cowboy beepboop
Feb 24, 2001

uncurable mlady posted:

lol y tho


i assume they got sick of waking up to CLUSTER: RED

cowboy beepboop
Feb 24, 2001

Sylink posted:

Prometheus owns, if anyone has questions we use it all the time.

what do you find useful to monitor
do you install node exporter on every vm
use it for alerting?

cowboy beepboop
Feb 24, 2001


ty that was very helpful

cowboy beepboop
Feb 24, 2001

i want to set up graylog or elk again but i hate elasticsearch

cowboy beepboop
Feb 24, 2001

vector.dev looks nice

cowboy beepboop
Feb 24, 2001

yeah I gave it a go yesterday; it's very early days for it.

cowboy beepboop
Feb 24, 2001

CRIP EATIN BREAD posted:

I'm picturing someone calling a customer service rep with billing questions and the person on the line saying "hold on for one second, sir" and then SSHing into a server and grepping logs

lol my first tech support job was this, but it was a shell command that rsh'd into the relevant router and grepped the output of some commands
nothing like letting your phone people touch production equipment 👩‍🍳👌

cowboy beepboop
Feb 24, 2001

Pardot posted:

i want to send structured logs/events to some service that I won't have to janitor. the events will mostly be an entity with one or more uuids, plus some info like state changes, maybe some timestamps or durations, idk. Nothing automated needs to consume this; just sometimes a person will search on the uuids to figure out what happened. I only need like 7, maybe 14 days of retention, and the volume of incoming events won't be very high. Probably like 1gb/day, for sure less than 100gb/day. Only like 5 people will need access.

what service should i use?

graylog or ELK probably
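for events like the ones described above, one low-effort shape is one JSON object per line, with the uuid in a fixed field so people can search on it. sketch only: the field names (`entity_id`, `event`, etc.) are hypothetical, and how the lines get shipped into graylog or elk (gelf, filebeat, whatever) is left out.

```python
import json
import uuid
from datetime import datetime, timezone


def make_event(entity_id: str, event: str, **fields) -> str:
    """Build one structured event as a single JSON line.

    Field names are hypothetical; the main thing is keeping them stable
    so a person can search on entity_id later.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "entity_id": entity_id,
        "event": event,
        **fields,  # extra info: state changes, durations, other uuids...
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    order = str(uuid.uuid4())
    print(make_event(order, "state_change", old_state="pending", new_state="shipped"))
    print(make_event(order, "job_finished", duration_ms=1234))
```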

cowboy beepboop
Feb 24, 2001

promql sucks so bad. wishing I had gone with influxdb for its SQL-like language

cowboy beepboop
Feb 24, 2001

if you're up for a bit of learning, set up prometheus with alertmanager + blackbox exporter

prometheus and promql are annoying and aren't a turn-key solution; it requires a bit of setup, but it works and it scales well.
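to give a feel for what blackbox exporter does, here's a toy stand-in in python, not the real exporter: it does one http GET and prints the result as `probe_success` / `probe_duration_seconds` in prometheus' text exposition format (the real exporter exposes metrics with those names). the `target` label is my own choice for this sketch; with the real thing, prometheus passes the target to the exporter and labels it via relabeling.

```python
import time
import urllib.request


def probe_http(url: str, timeout: float = 5.0) -> tuple:
    """Return (success, duration_seconds) for a single HTTP GET."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 300
    except OSError:  # connection errors, timeouts, and 4xx/5xx (HTTPError)
        ok = False
    return ok, time.monotonic() - start


def render_metrics(target: str, success: bool, duration: float) -> str:
    """Render a probe result in Prometheus' text exposition format."""
    return (
        "# TYPE probe_success gauge\n"
        f'probe_success{{target="{target}"}} {int(success)}\n'
        "# TYPE probe_duration_seconds gauge\n"
        f'probe_duration_seconds{{target="{target}"}} {duration:.3f}\n'
    )


if __name__ == "__main__":
    target = "http://localhost:8080/"  # hypothetical endpoint to probe
    ok, took = probe_http(target)
    print(render_metrics(target, ok, took), end="")
```

the usual next step is an alerting rule on something like `probe_success == 0`, with alertmanager handling the email/sms routing.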