Corla Plankun
May 8, 2007

improve the lives of everyone
you need an append-only table in a database that keeps track of the transaction event stream, and you need to write to it in a more robust way than just using stdout or ddog's possibly inconsistent log sending
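
A minimal sketch of the append-only table idea, assuming SQLite as a stand-in for whatever database is actually in use. The table name `billing_events`, its columns, and the helper names are all hypothetical; the triggers are just one way to make updates and deletes fail.

```python
import json
import sqlite3
from datetime import datetime, timezone


def open_event_log(path=":memory:"):
    """Open (or create) the event log. Table and column names are illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS billing_events (
            seq        INTEGER PRIMARY KEY AUTOINCREMENT,
            entity_id  TEXT NOT NULL,
            event_type TEXT NOT NULL,
            payload    TEXT NOT NULL,
            created_at TEXT NOT NULL
        )
        """
    )
    # Triggers reject UPDATE and DELETE, so rows can only ever be appended.
    for op in ("UPDATE", "DELETE"):
        conn.execute(
            f"""
            CREATE TRIGGER IF NOT EXISTS billing_events_no_{op.lower()}
            BEFORE {op} ON billing_events
            BEGIN
                SELECT RAISE(ABORT, 'billing_events is append-only');
            END
            """
        )
    conn.commit()
    return conn


def record_event(conn, entity_id, event_type, payload):
    """Append one event inside a transaction and return its sequence number."""
    with conn:  # commits on success, rolls back if the insert raises
        cur = conn.execute(
            "INSERT INTO billing_events (entity_id, event_type, payload, created_at) "
            "VALUES (?, ?, ?, ?)",
            (entity_id, event_type, json.dumps(payload),
             datetime.now(timezone.utc).isoformat()),
        )
    return cur.lastrowid


def history(conn, entity_id):
    """Return every event for one entity, oldest first."""
    rows = conn.execute(
        "SELECT seq, event_type, payload FROM billing_events "
        "WHERE entity_id = ? ORDER BY seq",
        (entity_id,),
    )
    return [(seq, etype, json.loads(payload)) for seq, etype, payload in rows]
```

The point is that the write either commits with the rest of the transaction or fails loudly, which is not something stdout or a log shipper will promise you.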

CRIP EATIN BREAD
Jun 24, 2002

Hey stop worrying bout my acting bitch, and worry about your WACK ass music. In the mean time... Eat a hot bowl of Dicks! Ice T



Soiled Meat
I'm picturing someone calling a customer service rep with billing questions and the person on the line saying "hold on for one second, sir" and then SSHing into a server and grepping logs

Hed
Mar 31, 2004

Fun Shoe
my CSR would write a perl script for that

distortion park
Apr 25, 2011


how much do you have to spend on datadog before they will negotiate on price, they're so expensive it's like candlesmeme.tweet

cowboy beepboop
Feb 24, 2001

CRIP EATIN BREAD posted:

I'm picturing someone calling a customer service rep with billing questions and the person on the line saying "hold on for one second, sir" and then SSHing into a server and grepping logs

lol my first tech support job was this, but it was a shell command that rsh'd into the relevant router and grepped the output of some commands
nothing like letting your phone people touch production equipment 👩‍🍳👌

motedek
Oct 9, 2012
dropped some points from OP into my phone screen, thx OP

Progressive JPEG
Feb 19, 2003

the code that's writing the log should also be writing to a database

efb

pram
Jun 10, 2001

pointsofdata posted:

how much do you have to spend on datadog before they will negotiate on price, they're so expensive it's like candlesmeme.tweet

they only offer "discounts" on a bulk (static) amount of metrics from what ive seen

Michaellaneous
Oct 30, 2013

When im on call and we get a nagios alert (version probably 2014), i have to ssh into our production machines after getting an error code, search for a line with that error code in our log, get a second number that i can then use to trace a kibana log

CRIP EATIN BREAD
Jun 24, 2002

Hey stop worrying bout my acting bitch, and worry about your WACK ass music. In the mean time... Eat a hot bowl of Dicks! Ice T



Soiled Meat
hell yeah

Scud Hansen
Dec 13, 2015

Darkness and Evil
When I worked at [ALLIED MASTERCOMPUTER] we had really insane metrics and monitoring, but making any changes to it involved editing a 5000 line xml file, and when you parsed it, the thing that checked its validity couldn't tell you which line errors were on, only that there was one.

Hell.

Prometheus is pretty awesome especially when you stick a fancy dash with grafana on it. Jurassic park poo poo.

distortion park
Apr 25, 2011


I'm fairly certain our datadog spend this month was higher than our (covid impacted) revenue

motedek
Oct 9, 2012
a recruiter told swim datadog is creating a special task force to reduce cloud spend lmbo

Guy Axlerod
Dec 29, 2008

Scud Hansen posted:

When I worked at [ALLIED MASTERCOMPUTER] we had really insane metrics and monitoring, but making any changes to it involved editing a 5000 line xml file, and when you parsed it, the thing that checked its validity couldn't tell you which line errors were on, only that there was one.

Hell.

Prometheus is pretty awesome especially when you stick a fancy dash with grafana on it. Jurassic park poo poo.

I prometheus operator. If there's an error in some PrometheusRule object some other gently caress deployed last week, it just never updates the prometheus config again until you find the object causing the error. At least there's a log somewhere that might give you a clue about which part is broken.

animist
Aug 28, 2018

motedek posted:

a recruiter told swim datadog is creating a special task force to reduce cloud spend lmbo

:thunk:

poo poo guys what are we gonna do with all this data

my homie dhall
Dec 9, 2010

honey, oh please, it's just a machine
I’m beginning to regret entrusting my data to a dog

kitten emergency
Jan 13, 2008

get meow this wack-ass crystal prison
what do dogs know about cost accounting anyway

kitten emergency
Jan 13, 2008

get meow this wack-ass crystal prison
my understanding is that they basically have just a massive loving Kafka cluster and boy I’d love to hear some stories from their SREs

pram
Jun 10, 2001
ive recently freed myself from janitoring kafka. never again

Michaellaneous
Oct 30, 2013

why would you use kafka for monitoring

pram
Jun 10, 2001
you use it to stream monitoring events genius

Pardot
Jul 25, 2001




i want to send structured logs/events at some service that I wont have to janitor. the events will mostly be an entity with one or more uuids, and then some info like state changes, maybe some timestamps or durations, idk. Nothing automated needs to consume this, just sometimes a person will do a search on the uuids to figure out what happened. I only need like 7 maybe 14 days of retention and the volume of incoming events wont be so high. Probably like 1gb/day, for sure less than 100. Only like 5 people will need access.

what service should i use?

Michaellaneous
Oct 30, 2013

pram posted:

you use it to stream monitoring events genius

why not just use prometheus

Pardot posted:

i want to send structured logs/events at some service that I wont have to janitor. the events will mostly be an entity with one or more uuids, and then some info like state changes, maybe some timestamps or durations, idk. Nothing automated needs to consume this, just sometimes a person will do a search on the uuids to figure out what happened. I only need like 7 maybe 14 days of retention and the volume of incoming events wont be so high. Probably like 1gb/day, for sure less than 100. Only like 5 people will need access.

what service should i use?

an ELK stack

pram
Jun 10, 2001

Michaellaneous posted:

why not just use prometheus


i answered your question

Progressive JPEG
Feb 19, 2003

Pardot posted:

i want to send structured logs/events at some service that I wont have to janitor. the events will mostly be an entity with one or more uuids, and then some info like state changes, maybe some timestamps or durations, idk. Nothing automated needs to consume this, just sometimes a person will do a search on the uuids to figure out what happened. I only need like 7 maybe 14 days of retention and the volume of incoming events wont be so high. Probably like 1gb/day, for sure less than 100. Only like 5 people will need access.

what service should i use?

given just a gigabyte a day and only occasional use, it could honestly just be a set of hourly text files with regular backups that they would grep

if you're feeling fancy then write as a gzip stream and tell them to use zgrep
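
A rough sketch of the hourly-files approach, assuming one JSON object per line. The `events-YYYYMMDDTHH.jsonl.gz` naming and the function names are made up for illustration; nothing here is a recommendation over a real service.

```python
import gzip
import json
from datetime import datetime, timezone
from pathlib import Path


def hourly_path(log_dir, when):
    """One file per UTC hour, e.g. events-20200501T10.jsonl.gz (hypothetical naming)."""
    return Path(log_dir) / when.strftime("events-%Y%m%dT%H.jsonl.gz")


def append_event(log_dir, event, when=None):
    """Append one event as a JSON line to the gzip file for its hour."""
    when = when or datetime.now(timezone.utc)
    path = hourly_path(log_dir, when)
    path.parent.mkdir(parents=True, exist_ok=True)
    record = dict(event, ts=when.isoformat())
    # Mode "at" appends a new gzip member; Python's gzip module and zgrep
    # both read concatenated members as one continuous stream.
    with gzip.open(path, "at", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return path


def search(log_dir, needle):
    """Poor man's zgrep: return parsed events from lines containing needle."""
    hits = []
    for path in sorted(Path(log_dir).glob("events-*.jsonl.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            hits.extend(json.loads(line) for line in fh if needle in line)
    return hits
```

Reopening the file for every event compresses badly, so a long-running writer would keep the handle open for the hour instead. From a shell, `zgrep <uuid> events-*.jsonl.gz` does the same job as `search`, and retention is just deleting files older than 14 days.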

cowboy beepboop
Feb 24, 2001

Pardot posted:

i want to send structured logs/events at some service that I wont have to janitor. the events will mostly be an entity with one or more uuids, and then some info like state changes, maybe some timestamps or durations, idk. Nothing automated needs to consume this, just sometimes a person will do a search on the uuids to figure out what happened. I only need like 7 maybe 14 days of retention and the volume of incoming events wont be so high. Probably like 1gb/day, for sure less than 100. Only like 5 people will need access.

what service should i use?

graylog or ELK probably

Shaggar
Apr 26, 2006

Pardot posted:

i want to send structured logs/events at some service that I wont have to janitor. the events will mostly be an entity with one or more uuids, and then some info like state changes, maybe some timestamps or durations, idk. Nothing automated needs to consume this, just sometimes a person will do a search on the uuids to figure out what happened. I only need like 7 maybe 14 days of retention and the volume of incoming events wont be so high. Probably like 1gb/day, for sure less than 100. Only like 5 people will need access.

what service should i use?

Azure Monitor/Application Insights

Scud Hansen
Dec 13, 2015

Darkness and Evil

Ploft-shell crab posted:

I’m beginning to regret entrusting my data to a dog

kitten emergency
Jan 13, 2008

get meow this wack-ass crystal prison

Pardot posted:

i want to send structured logs/events at some service that I wont have to janitor. the events will mostly be an entity with one or more uuids, and then some info like state changes, maybe some timestamps or durations, idk. Nothing automated needs to consume this, just sometimes a person will do a search on the uuids to figure out what happened. I only need like 7 maybe 14 days of retention and the volume of incoming events wont be so high. Probably like 1gb/day, for sure less than 100. Only like 5 people will need access.

what service should i use?

honestly? honeycombs free tier might work out well for you

edit - assuming you want to share a password, I think they have a user limit on the free account

Pardot
Jul 25, 2001




uncurable mlady posted:

honestly? honeycombs free tier might work out well for you

edit - assuming you want to share a password, I think they have a user limit on the free account

I’ll take a closer look at that. I know a couple of people that work there but figured it was all more about “observability” whatever that is than just looking at a few logs.

I’m trying out timber.io right now and it seems exactly right.

cowboy beepboop
Feb 24, 2001

promql sucks so bad. wishing I had gone with influxdb for its sql like language

Guy Axlerod
Dec 29, 2008
Why yes I love having to take a delta and then sum, instead of a sum then a delta. I like having metrics get stuck whenever I redeploy, it's great.
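
For anyone wondering why the order matters: a small sketch with made-up numbers, assuming the usual counter behaviour where a value that drops means the process restarted from zero. `increase` here is a simplified stand-in, not Prometheus's actual implementation (no extrapolation, no timestamps).

```python
def increase(samples):
    """Total increase of one counter series, treating any drop as a reset to zero."""
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        # After a reset the counter restarts from 0, so the new value is all increase.
        total += (cur - prev) if cur >= prev else cur
    return total


# Two hypothetical instances of the same counter; pod_b restarts mid-window.
pod_a = [100, 110, 120, 130]
pod_b = [500, 510, 5, 15]

# Delta per series, then sum: each reset is handled where it is visible.
delta_then_sum = increase(pod_a) + increase(pod_b)

# Sum first, then delta: the restart looks like the whole aggregate reset,
# so the post-reset total gets counted as new increase.
summed = [a + b for a, b in zip(pod_a, pod_b)]
sum_then_delta = increase(summed)

print(delta_then_sum, sum_then_delta)  # prints "55 165"
```

Only 55 things actually happened in that window; summing first reports 165 because pod_a's running total is mistaken for fresh increase after pod_b restarts.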

distortion park
Apr 25, 2011


my stepdads beer posted:

promql sucks so bad. wishing I had gone with influxdb for its sql like language

don't worry the influxdb one is also bad

Chalks
Sep 30, 2009

we've been using Xymon for monitoring our systems for years and it's pretty dated and it's got rather out of hand as we've grown. i may have convinced people that we should migrate to something more modern and less garbage... but i have no idea what.

can anyone recommend a monitoring solution that would be suitable for stuff like checking connectivity to a couple of hundred servers, databases and http endpoints? we've got some complicated poo poo that alerts based on log file parsing but i'd be happy to just get the important "server is down" alerts onto something more modern as a starting point.

cowboy beepboop
Feb 24, 2001

if you're up for a bit of learning, set up prometheus with alertmanager + blackbox exporter

prometheus and promql are annoying and it isn't a turn-key solution, it requires a bit of setup, but it works and it scales well.
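
To make the "is the server down" check concrete, here is a bare-bones sketch of the kind of TCP probe a blackbox-style checker runs against each target. This is not the blackbox exporter or its config, just the idea in plain Python; `tcp_probe`, `probe_all`, and the example hosts are hypothetical.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, unreachable
        return False


def probe_all(targets, timeout=3.0, workers=32):
    """Probe (host, port) pairs concurrently; return {(host, port): is_up}."""
    targets = list(targets)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda t: tcp_probe(t[0], t[1], timeout), targets)
        return dict(zip(targets, results))
```

Something like `probe_all([("db1.example.internal", 5432), ("web1.example.internal", 443)])` covers a couple of hundred targets in a few seconds. What prometheus and alertmanager add on top is the part you don't want to write yourself: scheduling, history, "down for 5 minutes" rules, and routing the alert to a human.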

post hole digger
Mar 21, 2011

tfw you are seeing degraded hardware issues in aws

kitten emergency
Jan 13, 2008

get meow this wack-ass crystal prison
who up observing they apps

Captain Foo
May 11, 2004

we vibin'
we slidin'
we breathin'
we dyin'

kitten emergency posted:

who up observing they apps

who needs they prometheussy metriced

well-read undead
Dec 13, 2022

datadog is so expensive

Dans Macabre
Apr 24, 2004


I just ping poo poo and if it doesn't answer then I call the vendor.
