|
current monitoring job status: i have 15 different unsee dashboards open and there's anything from 10 to 700 warnings/alerts bouncing around in each of them. i'm somehow supposed to keep an eye on all of these with only the two monitors i have.

these are monitoring a bunch of datacenters that are a hellish mix of mesos-marathon microservice container stuff and poo poo running straight on the metal, located around the world. i'm connected to them through a collection of ssh-tunnels over connections and vpns that sometimes just decide to stop working.

there's a lot of "gently caress A CONTAINER IS DOWN!!!" alerts, and then i go check it and everything is fine. a bunch of alerts, once fired, will hang around in the dashboard for 6 hours because ??? i'm never confident i'll catch the real problems with all this useless noise.

there's grafana too, but a lot of the stuff isn't configured right, so you have to go massage some prometheus queries by hand to get the graphs you need.

documentation is of course nonexistent and/or poo poo. configuring the monitoring is some other teams' job and a lot of the time all i can do is open a ticket and hope. when poo poo hits the fan i have only a vague idea who's responsible for what and who the hell i'm supposed to call, so i get to wake up my team leader at four in the morning so he can figure it out.

welp, that's my story, back to lurking
|
Mar 16, 2019 21:18 |