ADINSX
Sep 9, 2003

Wanna run with my crew huh? Rule cyberspace and crunch numbers like I do?

thanks op didn't read

ADINSX
Sep 9, 2003

Wanna run with my crew huh? Rule cyberspace and crunch numbers like I do?

pram posted:

lol no. it isnt. youve never used it for anything serious stfu. for example


1) kafka doesnt rebalance topics, ever. if a node is down thats it. the replica is just gone. it doesnt 'migrate' because this is 1998
2) kafka doesnt rebalance storage, ever. if you use JBOD it will just randomly put segments wherever it feels like. if a disk is full it just breaks
3) topic compaction impacts the entire cluster performance if its big enough. nothing you can do about it
4) will randomly break and require a full restart if it lags on the zookeeper state
https://issues.apache.org/jira/browse/KAFKA-2729
5) will effortlessly end up with two cluster controllers if one has degraded performance
6) will spend literal hours 'recovering' on a hard restart (kill) if you have compacted segments
7) replicating data to a replaced node will impact the entire cluster performance, hammering the socket server. and this cant be prevented BECAUSE
8) if you throttle performance it impacts the replica manager AND producers
9) leader rebalancing can still temporarily break producers


and more!

Huh, this is interesting. We were playing with kafka at old job because so many things support it and it has per-partition ordering. I knew it was a pain in the rear end to run one, but never knew the reasons why... so this is a lot of reasons.

We were working with Confluent to provide us with a managed instance... I guess they just do all this poo poo behind the scenes? I wonder how they handle the poo poo that actually affects cluster performance? Send the team a notification that it's gonna happen? Just never do it?
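
For anyone who hasn't run into "per-partition ordering" before, here's a toy sketch of the idea in plain Python. It is not Kafka client code: `partition_for` and `produce` are made-up names, and the CRC32 hash is just a stand-in for whatever partitioner a real producer uses. The point is only that records sharing a key land in the same partition and keep their order there, while nothing is promised about order across partitions.

```python
import zlib
from collections import defaultdict

def partition_for(key: str, num_partitions: int) -> int:
    # Hypothetical partitioner: a deterministic hash of the key, modulo the
    # partition count. CRC32 is used here only as a stand-in hash.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def produce(records, num_partitions):
    # Append each (key, value) record to the log of the partition its key maps to.
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions

# Interleaved writes for two keys.
records = [("user-a", 1), ("user-b", 1), ("user-a", 2), ("user-b", 2), ("user-a", 3)]
log = produce(records, num_partitions=3)

for partition_id, entries in sorted(log.items()):
    print(partition_id, entries)
```

Reading any single partition back gives each key's values in the order they were written, which is the guarantee we cared about. A consumer that merges several partitions gets no such promise.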

ADINSX
Sep 9, 2003

Wanna run with my crew huh? Rule cyberspace and crunch numbers like I do?

lancemantis posted:

like spark had a super broken memory model for quite a while, lots of the Hadoop stack is brittle and needs a lot of babysitting

like the noteworthy parts of this stuff is it helps make some stuff feasible but it isn’t “good”

When you refer to a broken memory model, is that for spark streaming stuff where the application might leak memory over time? Or does the problem come up in batch execution? I haven't done much spark stuff so I'm curious.

We were able to come up with a pretty solid BIG DATA pipeline using a lot of managed google stuff... but... it was managed by someone else, for all the reasons listed in the thread.

During my interview with the Kinesis team I got the distinct impression that a lot of their job is fighting fires; I realized it's probably a lot more fun to USE these managed systems than it is to work on them
