what the fuck is prometheus anyway? a thread about monitoring

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > YOSPOS > what the fuck is prometheus anyway? a thread about monitoring

pram: Jun 10, 2001

my company uses splunk for logs and datadog for tracing/monitoring/alerts. these both cost millions of dollars a year. thanks for listening!

# ¿ Feb 25, 2019 20:01

pram: Jun 10, 2001

nagios is extremely poo poo

# ¿ Mar 2, 2019 09:17

pram: Jun 10, 2001

just lol that you still have to restart the whole loving thing to update anything

# ¿ Mar 2, 2019 09:26

pram: Jun 10, 2001

also the latest version of opsview is shiiiiiitttttt

# ¿ Mar 2, 2019 09:27

pram: Jun 10, 2001

kafka is about 1000x shittier so count your blessings

# ¿ Mar 7, 2019 01:43

pram: Jun 10, 2001

Share Bear posted:

oh no! promotheus.

# ¿ Mar 9, 2019 05:36

pram: Jun 10, 2001

Progressive JPEG posted:

kafka is extremely good, maybe you're just holding it wrong

lol no. it isnt. youve never used it for anything serious stfu. for example

1) kafka doesnt rebalance topics, ever. if a node is down thats it. the replica is just gone. it doesnt 'migrate' because this is 1998
2) kafka doesnt rebalance storage, ever. if you use JBOD it will just randomly put segments wherever it feels like. if a disk is full it just breaks
3) topic compaction impacts the entire cluster performance if its big enough. nothing you can do about it
4) will randomly break and require a full restart if it lags on the zookeeper state
https://issues.apache.org/jira/browse/KAFKA-2729
5) will effortlessly end up with two cluster controllers if one has degraded performance
6) will spend literal hours 'recovering' on a hard restart (kill) if you have compacted segments
7) replicating data to a replaced node will impact the entire cluster performance, hammering the socket server. and this cant be prevented BECAUSE
8) if you throttle performance it impacts the replica manager AND producers
9) leader rebalancing can still temporarily break producers

and more!

pram fucked around with this message at 05:54 on Mar 9, 2019

# ¿ Mar 9, 2019 05:50

pram: Jun 10, 2001

people dont believe that kafka doesnt migrate anything or rebalance anything. because elasticsearch does so people assume something like kafka (which is pure magic ftw) does

but it literally doesnt. its all manual. if you want to reassign a partition replica, you have to do it yourself with the cli tools or some 3rd party thing. and the operation itself isnt transparent, it actually impacts all the consumers and producers while its doing it (tbf es does this too). its loving garbage
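
for anyone who hasnt had the pleasure, here is a rough sketch of what "do it yourself" looks like: you build a json plan listing the new replica set for every partition, then hand it to the reassignment tool. the json layout (`version`, `partitions`, `topic`, `partition`, `replicas`) is the one the stock cli expects. everything else here is made up for illustration: the `plan_reassignment` helper, the `events` topic, the broker ids, and the dumb "first spare broker" placement rule, which does no load balancing at all.

```python
import json


def plan_reassignment(assignments, dead_broker, live_brokers):
    """Build a reassignment plan that swaps dead_broker out of every replica list.

    assignments: dict mapping (topic, partition) -> list of broker ids.
    This helper is hypothetical; Kafka does not ship anything like it.
    """
    partitions = []
    for (topic, partition), replicas in sorted(assignments.items()):
        if dead_broker not in replicas:
            continue  # this partition has no replica on the dead broker
        # naive placement: first live broker that does not already hold a replica
        candidates = [b for b in live_brokers if b not in replicas]
        if not candidates:
            raise ValueError(f"no spare broker for {topic}-{partition}")
        new_replicas = [candidates[0] if b == dead_broker else b for b in replicas]
        partitions.append(
            {"topic": topic, "partition": partition, "replicas": new_replicas}
        )
    return {"version": 1, "partitions": partitions}


# hypothetical cluster state: broker 3 died and is not coming back
current = {
    ("events", 0): [1, 2, 3],
    ("events", 1): [2, 3, 4],
    ("events", 2): [3, 4, 1],
}
plan = plan_reassignment(current, dead_broker=3, live_brokers=[1, 2, 4, 5])
print(json.dumps(plan, indent=2))
```

you would save that output to a file and pass it to `kafka-reassign-partitions.sh` with `--reassignment-json-file` and `--execute`, plus whatever connection flag your kafka version wants. note that the script only writes the plan. the actual data copy is the part that hurts producers and consumers.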

# ¿ Mar 9, 2019 06:02

pram: Jun 10, 2001

ADINSX posted:

Huh, this is interesting. We were playing with kafka at old job because so many things support it and it has per-partition ordering. I knew it was a pain in the rear end to run one, but never knew the reasons why... so this is a lot of reasons.

We were working with Confluent to provide us with a managed instance... I guess they just do all this poo poo behind the scenes? I wonder how they'll do poo poo that actually affects cluster performance? Send the team a notification that it's gonna happen? Just never do it?

yes we use confluent (the platform, not their cloud) and they said they basically made a bunch of proprietary additions for their managed service. in that sense its like redislabs cloud vs 'redis' in that you cant replicate it with off the shelf stuff (or even their own provided tools like replicator lol)

amazon msk is straight up vanilla kafka and i think its a big joke right now. same with the azure one, i think its literally just the hortonworks ambari kafka

if you have a single topic you wont have many issues. if you run multi-tenant clusters where people are doing compaction and exactly-once and theres 10000 different consumer groups then its a total shitshow

# ¿ Mar 9, 2019 10:20

pram: Jun 10, 2001

'durrr kafka works fine on my laptop in docker'

# ¿ Mar 9, 2019 16:57

pram: Jun 10, 2001

software just wants to be free - jeff bezos, free software advocate

# ¿ Mar 14, 2019 03:11

pram: Jun 10, 2001

kafka is so very bad i cant even

# ¿ May 11, 2019 22:02

pram: Jun 10, 2001

im reminded of this beauty of an error, for example. kafka partitions would literally just get corrupted and the broken log file would prevent the broker from STARTING

https://issues.apache.org/jira/browse/KAFKA-3919

the only way you could fix it was to go delete the actual file sitting on the disk. if it affected multiple brokers (because of unclean leader election) hope you like data loss

and of course during all this you're totally offline so its a huge outage. epic and ftw

# ¿ May 11, 2019 22:06

pram: Jun 10, 2001

datadog is a metrics/alerting service you pay $$$ for. its nice

# ¿ Jul 8, 2019 04:38

pram: Jun 10, 2001

nothing more money wont fix

# ¿ Jul 8, 2019 09:49

pram: Jun 10, 2001

yes kafka is loving garbage software

# ¿ Jul 8, 2019 21:41

pram: Jun 10, 2001

pointsofdata posted:

how much do you have to spend on datadog before they will negotiate on price, they're so expensive it's like candlesmeme.tweet

they only offer "discounts" on a bulk (static) amount of metrics from what ive seen

# ¿ Jul 22, 2020 19:50

pram: Jun 10, 2001

ive recently freed myself from janitoring kafka. never again

# ¿ Jul 29, 2020 05:35

pram: Jun 10, 2001

you use it to stream monitoring events genius

# ¿ Jul 29, 2020 07:02

pram: Jun 10, 2001

Michaellaneous posted:

why not just use prometheus

i answered your question

# ¿ Jul 29, 2020 07:51
