|
my prometheus keeps corrupting its data because it's on an nfs share. that's fine because i only use it for some pretty graphs sometimes. prom really suffers from not having good examples for what I think are common scenarios. anyway we use cacti for all our network stuff because of inertia; prom+snmp-exporter+grafana was tedious as hell. nagios does our alerting and it's OK but kind of a pain re: config files. for "tracing" we use sentry to catch our dumb php apps' various issues and fatals
|
# ¿ Mar 2, 2019 06:18 |
|
|
# ¿ May 14, 2024 17:55 |
|
it's fine. thing break -> email and sms sent. it also runs happily for years without anyone touching it
|
# ¿ Mar 2, 2019 07:35 |
|
r u ready to WALK posted: the reason nagios runs for years without anyone touching it is that nobody wants to actively maintain it even with a gun to their head

ya you use ansible to template the config files, it's easy. it's not good. but it works.
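
fwiw a rough sketch of the templating idea. this is not ansible itself (ansible renders jinja2 templates); it uses python's stdlib string.Template as a stand-in, and the host names, addresses, and the generic-host parent template are made-up examples:

```python
from string import Template

# Hypothetical nagios host definition. "generic-host" is assumed to be a
# host template that already exists in your nagios config.
HOST_TEMPLATE = Template("""\
define host {
    use        generic-host
    host_name  $name
    address    $address
}
""")


def render_hosts(hosts):
    """Render one nagios host block per inventory entry."""
    return "\n".join(
        HOST_TEMPLATE.substitute(name=h["name"], address=h["address"])
        for h in hosts
    )


if __name__ == "__main__":
    # Example inventory; in ansible this would come from your inventory/vars.
    inventory = [
        {"name": "web1", "address": "10.0.0.11"},
        {"name": "db1", "address": "10.0.0.12"},
    ]
    print(render_hosts(inventory))
```

the point is just that the host list lives in one place and the config files get regenerated from it instead of being edited by hand.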
|
# ¿ Mar 2, 2019 21:47 |
|
Blinkz0rz posted: we use a combination of elk and the logging software we sell (dogfooding is good) for logging and datadog for monitoring. i think a small part still has some sensu + grafana for monitoring physical assets or something idk

elk or graylog are cool up until the point you have to learn about maintaining an elasticsearch cluster
|
# ¿ Mar 2, 2019 21:49 |
|
Blinkz0rz posted: yeah ama about maintaining an elk stack that processes a few tb of logs a day

tb?! no thank you
|
# ¿ Mar 5, 2019 09:45 |
|
my bitter bi rival posted: well nagios is free so it looks like im owned then.

prom's alerts and alertmanager seem good but i have never gotten around to migrating
|
# ¿ Mar 6, 2019 22:08 |
|
the prom / grafana guys are making a log thing now: https://grafana.com/loki
no full text search though, also it only works with k8s atm
|
# ¿ Mar 13, 2019 08:42 |
|
uncurable mlady posted: lol y tho

i assume they got sick of waking up to CLUSTER: RED
|
# ¿ Mar 14, 2019 02:54 |
|
Sylink posted: Prometheus owns, if anyone has questions we use it all the time.

what do you find useful to monitor? do you install node exporter on every vm? do you use it for alerting?
|
# ¿ Apr 29, 2019 08:51 |
|
Sylink posted: prom

ty that was very helpful
|
# ¿ Apr 30, 2019 10:21 |
|
i want to set up graylog or elk again but i hate elasticsearch
|
# ¿ Jul 10, 2019 11:49 |
|
vector.dev looks nice
|
# ¿ Jul 24, 2019 10:50 |
|
yeah I gave it a go yesterday, very early days for it.
|
# ¿ Jul 25, 2019 00:07 |
|
CRIP EATIN BREAD posted: I'm picturing someone calling a customer service rep with billing questions and the person on the line saying "hold on for one second, sir" and then SSHing into a server and grepping logs

lol my first tech support job was this, but it was a shell command that rsh'd into the relevant router and grepped the output of some commands

nothing like letting your phone people touch production equipment 👩‍🍳👌
|
# ¿ Jul 22, 2020 10:05 |
|
Pardot posted: i want to send structured logs/events at some service that I wont have to janitor. the events will mostly be an entity with one or more uuids, and then some info like state changes, maybe some timestamps or durations, idk. Nothing automated needs to consume this, just sometimes a person will do a search on the uuids to figure out what happened. I only need like 7 maybe 14 days of retention and the volume of incoming events wont be so high. Probably like 1gb/day, for sure less than 100. Only like 5 people will need access.

graylog or ELK probably
|
# ¿ Jul 29, 2020 10:45 |
|
promql sucks so bad. wishing I had gone with influxdb for its sql-like language
|
# ¿ Aug 19, 2020 02:43 |
|
if you're up for a bit of learning, set up prometheus with alertmanager + blackbox exporter. prometheus and promql are annoying and it isn't a turn-key solution, it requires a bit of setup, but it works and it scales well.
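
to give a feel for what blackbox exporter does, here's a small python sketch, not a real setup. it assumes the exporter is on its default port 9115 with the stock http_2xx module, and the target url is made up. normally prometheus scrapes the /probe endpoint for you; this just builds the same url and reads the probe_success metric out of the response text:

```python
from urllib.parse import urlencode


def probe_url(exporter, target, module="http_2xx"):
    """Build the blackbox exporter /probe URL for one target."""
    return f"{exporter}/probe?" + urlencode({"target": target, "module": module})


def probe_succeeded(metrics_text):
    """Return True if the exposition text reports probe_success 1."""
    for line in metrics_text.splitlines():
        # Skip "# HELP" / "# TYPE" comment lines; match only the sample line.
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    return False


if __name__ == "__main__":
    # Assumed exporter address and example target.
    print(probe_url("http://localhost:9115", "https://example.com"))

    # Canned response so the sketch runs without a live exporter.
    sample = (
        "# HELP probe_success Displays whether or not the probe was a success\n"
        "# TYPE probe_success gauge\n"
        "probe_success 1\n"
    )
    print(probe_succeeded(sample))
```

in a real deployment you'd alert on probe_success == 0 in a prometheus rule and let alertmanager handle the email/sms part.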
|
# ¿ Sep 4, 2020 13:20 |