what the fuck is prometheus anyway? a thread about monitoring

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > YOSPOS > what the fuck is prometheus anyway? a thread about monitoring

cowboy beepboop: Feb 24, 2001

my prometheus keeps corrupting its data because it's on an nfs share. that's fine because i only use it for some pretty graphs sometimes. prom really suffers from not having good examples for what I think are common scenarios.

anyway we use cacti for all our network stuff because of inertia. prom+snmp-exporter+grafana was tedious as hell. nagios for our alerting and it's OK but kind of a pain re: config files.

for "tracing" we use sentry for catching our dumb php apps' various issues and fatals

Mar 2, 2019 06:18

cowboy beepboop: Feb 24, 2001

it's fine. thing break -> email and sms sent. it also runs happily for years without anyone touching it
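for anyone who hasn't written one: a nagios check is just a program that prints one status line and exits 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), and nagios decides from that whether to send the email/sms. a minimal sketch of a disk-usage check in Python, assuming made-up thresholds of 80% warn / 90% crit (nothing from this post):

```python
import shutil

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_pct: float, warn: float, crit: float) -> tuple[int, str]:
    """Map a disk-usage percentage onto a Nagios state and a one-line status."""
    if used_pct >= crit:
        state = CRITICAL
    elif used_pct >= warn:
        state = WARNING
    else:
        state = OK
    # Everything after '|' is perfdata: label=value;warn;crit
    line = f"DISK {LABELS[state]} - {used_pct:.1f}% used | used_pct={used_pct:.1f}%;{warn};{crit}"
    return state, line


def main(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> int:
    """Check one filesystem, print the status line, return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    used_pct = 100.0 * usage.used / usage.total
    state, line = classify(used_pct, warn, crit)
    print(line)
    return state
```

to use it as a real plugin, `import sys` and end the script with `sys.exit(main())` so nagios actually sees the exit code.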

Mar 2, 2019 07:35

cowboy beepboop: Feb 24, 2001

r u ready to WALK posted:

the reason nagios runs for years without anyone touching it is that nobody wants to actively maintain it even with a gun to their head

it can sort of work if you write your own custom scripts that autogenerate all the config files for it, good luck maintaining that poo poo by hand though

ya, you use ansible to template the config files, it's easy. it's not good, but it works.
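the templating doesn't have to be ansible specifically, the trick is generating nagios object definitions from a host list instead of writing them by hand. same idea in plain Python rather than an ansible/jinja2 template, with a made-up inventory (`HOSTS` is hypothetical) and assuming the `generic-host` / `generic-service` templates and the `check_ping` command from the stock nagios sample config:

```python
# Hypothetical inventory; with ansible this would come from the inventory file.
HOSTS = [
    {"name": "web01", "address": "10.0.0.11"},
    {"name": "db01", "address": "10.0.0.21"},
]

# Doubled braces are literal braces for str.format.
HOST_TEMPLATE = """define host {{
    use         generic-host
    host_name   {name}
    address     {address}
}}
"""

SERVICE_TEMPLATE = """define service {{
    use                  generic-service
    host_name            {name}
    service_description  PING
    check_command        check_ping!100.0,20%!500.0,60%
}}
"""


def render(hosts: list[dict]) -> str:
    """Emit one host block and one PING service block per inventory entry."""
    parts = []
    for host in hosts:
        parts.append(HOST_TEMPLATE.format(**host))
        parts.append(SERVICE_TEMPLATE.format(**host))
    return "\n".join(parts)


if __name__ == "__main__":
    print(render(HOSTS))
```

write the output to something like a `hosts.cfg` that nagios loads and adding a box is one line in the inventory instead of two hand-edited blocks.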

Mar 2, 2019 21:47

cowboy beepboop: Feb 24, 2001

Blinkz0rz posted:

we use a combination of elk and the logging software we sell (dogfooding is good) for logging and datadog for monitoring. i think a small part still has some sensu + grafana for monitoring physical assets or something idk

elk or graylog are cool up until the point you have to learn about maintaining an elasticsearch cluster

Mar 2, 2019 21:49

cowboy beepboop: Feb 24, 2001

Blinkz0rz posted:

yeah ama about maintaining an elk stack that processes a few tb of logs a day

it loving sucks

tb?! no thank you

Mar 5, 2019 09:45

cowboy beepboop: Feb 24, 2001

my bitter bi rival posted:

well nagios is free so it looks like im owned then.

prom's alerts and alertmanager seem good but i have never gotten around to migrating

Mar 6, 2019 22:08

cowboy beepboop: Feb 24, 2001

the prom / grafana guys are making a log thing now

https://grafana.com/loki

no full text search though, also it only works with k8s atm

Mar 13, 2019 08:42

cowboy beepboop: Feb 24, 2001

uncurable mlady posted:

lol y tho

i assume they got sick of waking up to CLUSTER: RED

Mar 14, 2019 02:54

cowboy beepboop: Feb 24, 2001

Sylink posted:

Prometheus owns, if anyone has questions we use it all the time.

what do you find useful to monitor
do you install node exporter on every vm
use it for alerting?

Apr 29, 2019 08:51

cowboy beepboop: Feb 24, 2001

Sylink posted:

prom

ty that was very helpful

Apr 30, 2019 10:21

cowboy beepboop: Feb 24, 2001

i want to set up graylog or elk again but i hate elasticsearch

Jul 10, 2019 11:49

cowboy beepboop: Feb 24, 2001

vector.dev looks nice

Jul 24, 2019 10:50

cowboy beepboop: Feb 24, 2001

yeah I gave it a go yesterday, very early days for it.

Jul 25, 2019 00:07

cowboy beepboop: Feb 24, 2001

CRIP EATIN BREAD posted:

I'm picturing someone calling a customer service rep with billing questions and the person on the line saying "hold on for one second, sir" and then SSHing into a server and grepping logs

lol my first tech support job was this, but it was a shell command that rsh'd into the relevant router and grepped the output of some commands
nothing like letting your phone people touch production equipment 👩‍🍳👌

Jul 22, 2020 10:05

cowboy beepboop: Feb 24, 2001

Pardot posted:

i want to send structured logs/events at some service that I won't have to janitor. the events will mostly be an entity with one or more uuids, and then some info like state changes, maybe some timestamps or durations, idk. Nothing automated needs to consume this, just sometimes a person will do a search on the uuids to figure out what happened. I only need like 7 maybe 14 days of retention and the volume of incoming events won't be so high. Probably like 1gb/day, for sure less than 100. Only like 5 people will need access.

what service should i use?

graylog or ELK probably
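graylog takes GELF, which is just JSON with a few required fields (`version`, `host`, `short_message`) plus custom fields prefixed with an underscore, so uuid + state-change events map onto it pretty directly. a sketch of building one such message in Python; the field names `_entity_id`, `_old_state`, `_new_state` and `_duration_s` are made up for illustration, and actually shipping it to a graylog input is left out:

```python
import json
import socket
import time
import uuid


def gelf_event(entity_id: str, old_state: str, new_state: str, duration_s: float) -> str:
    """Build one GELF 1.1 message (as a JSON string) for an entity state change."""
    msg = {
        # Required GELF fields.
        "version": "1.1",
        "host": socket.gethostname(),
        "short_message": f"{entity_id} {old_state} -> {new_state}",
        # Seconds since the epoch; fractional seconds are allowed.
        "timestamp": time.time(),
        # Hypothetical custom fields; GELF wants an underscore prefix on these.
        "_entity_id": entity_id,
        "_old_state": old_state,
        "_new_state": new_state,
        "_duration_s": duration_s,
    }
    return json.dumps(msg)


if __name__ == "__main__":
    print(gelf_event(str(uuid.uuid4()), "queued", "running", 1.25))
```

searching on the uuid then means searching the `_entity_id` field instead of grepping free text.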

Jul 29, 2020 10:45

cowboy beepboop: Feb 24, 2001

promql sucks so bad. wishing I had gone with influxdb for its SQL-like language
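for comparison with a SQL-style aggregate, here's roughly what PromQL's `rate(counter[5m])` boils down to: total increase of a counter over the window divided by the time the samples cover, treating any drop as a counter reset. this is a simplified Python model, not the real implementation (actual `rate()` also extrapolates to the edges of the window), and `simple_rate` is a made-up name:

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter over (timestamp, value) samples.

    Handles counter resets, but skips the window-edge extrapolation
    that PromQL's real rate() does.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. process restart), so count from zero.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

e.g. samples every 15s of 100, 130, 160, 10, 40 give an increase of 30 + 30 + 10 + 30 = 100 over 60s, so about 1.67/s even though the raw value went down.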

Aug 19, 2020 02:43

cowboy beepboop: Feb 24, 2001

if you're up for a bit of learning, set up prometheus with alertmanager + blackbox exporter

prometheus and promql are annoying and it isn't a turn-key solution; it requires a bit of setup, but it works and it scales well.
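what blackbox exporter does is basically: hit a target, then report `probe_success` and `probe_duration_seconds` for prometheus to scrape, and a prometheus alert rule on `probe_success == 0` hands the notification to alertmanager. a toy Python version of that idea, only to show the shape of the output (the metric names match blackbox exporter's; the HELP text, function names and everything else here are made up, use the real exporter in practice):

```python
import time
import urllib.request


def probe(url: str, timeout: float = 5.0) -> tuple[bool, float]:
    """Fetch url once; return (success, seconds taken)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 300
    except Exception:
        # DNS failure, refused connection, timeout, 4xx/5xx: all count as down.
        ok = False
    return ok, time.monotonic() - start


def exposition(ok: bool, duration: float) -> str:
    """Render a probe result in the Prometheus text exposition format."""
    return (
        "# HELP probe_success Whether the probe succeeded.\n"
        "# TYPE probe_success gauge\n"
        f"probe_success {int(ok)}\n"
        "# HELP probe_duration_seconds How long the probe took.\n"
        "# TYPE probe_duration_seconds gauge\n"
        f"probe_duration_seconds {duration:.6f}\n"
    )
```

`print(exposition(*probe("https://example.com")))` shows what a scrape would see.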

Sep 4, 2020 13:20