tef
May 30, 2004

-> some l-system crap ->
people upgrade cassandra?

Soricidus
Oct 21, 2010
freedom-hating statist shill
the data was actually correct, they just didn't believe it

fritz
Jul 26, 2003

Soricidus posted:

the data was actually correct, they just didn't believe it

sarehu
Apr 20, 2007

(call/cc call/cc)

Suspicious Dish posted:

when you upgrade cassandra make sure to make a backup, because when upgrading it seems to just corrupt all your data

So is upgrading for these databases like, a thing, that you do? You don't just install a new version and have it run on the old files?

Captain Foo
May 11, 2004

we vibin'
we slidin'
we breathin'
we dyin'

Soricidus posted:

the data was actually correct, they just didn't believe it

prefect
Sep 11, 2001

No one, Woodhouse.
No one.




Dead Man’s Band

Soricidus posted:

the data was actually correct, they just didn't believe it

ryde
Sep 9, 2011

God I love young girls

Suspicious Dish posted:

my job now is writing big data cloud software and i'm stuck trying to figure out how anything gets done in the first place

A lot of people just tend to model things as a map-reduce job and put it into Hadoop or Spark, from what I've seen. Not a lot of people want to mess around with Zookeeper and figuring out how to distribute work to hosts. I'd honestly be asking if you can partition your data and run your jobs as a collection of map-reduce operations. Even if you don't want to do something heavyweight like a Hadoop cluster, it makes it easier to write a "good enough" system.

the talent deficit
Dec 20, 2003

self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture





Suspicious Dish posted:

the big data problem? it's asset processing and transformation -- that is, take html doc, normalize it, take image, thumbnail it, that sort of thing

none of it is really postgres-able except for indexing metadata to allow people to search through the thing, and even then, i'll probably throw it in elasticsearch

doesn't sound like a great fit for spark/hadoop really unless there's a reduce step in there you're leaving out. sounds like maybe you want something like storm although you probably could just use kafka/sqs and a bunch of services that consume off the queue

tef
May 30, 2004

-> some l-system crap ->

Suspicious Dish posted:

i was more talking about zookeeper and etcd than actual storage

eh, doesn't matter, they'll be used as storage engines anyway

cf. chubby http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf

quote:

Inevitably, these services grew until the use of Chubby was extreme: a single 1.5MByte file was being rewritten in its entirety on each user action, and the overall space used by the service exceeded the space needs of all other Chubby clients combined. We introduced a limit on file size (256kBytes), and encouraged the services to migrate to more appropriate storage systems. But it is difficult to make significant changes to production systems maintained by busy people—it took approximately a year for the data to be migrated elsewhere

meanwhile zookeeper and etcd are modelled as file systems, and not lock services. to the point where zookeeper has a well hidden page explaining how to implement locks in a convoluted fashion: http://zookeeper.apache.org/doc/r3.1.2/recipes.html#sc_recipes_Locks
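
the lock recipe tef links boils down to "ephemeral sequential znodes, lowest number wins, watch your predecessor". a minimal in-memory sketch of just that ordering logic -- `FakeZk` and its methods are a toy stand-in, not the real zookeeper client API:

```python
# toy model of the recipe: each contender creates an ephemeral
# *sequential* node; the lowest sequence number holds the lock, and
# each waiter watches only its immediate predecessor (watching the
# lock root would wake every waiter at once -- the herd effect)
class FakeZk:
    def __init__(self):
        self.next_seq = 0
        self.children = []          # sequence numbers of live znodes

    def create_sequential(self):    # "ephemeral sequential" znode
        node = self.next_seq
        self.next_seq += 1
        self.children.append(node)
        return node

    def delete(self, node):         # explicit release or session death
        self.children.remove(node)

    def holds_lock(self, node):
        return node == min(self.children)

zk = FakeZk()
first = zk.create_sequential()
second = zk.create_sequential()
# first holds the lock; when its session dies, second inherits it
zk.delete(first)
```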

with zk, some things use it for leader election to set up replication (kafka), some use it for pub-sub (kafka). with raft, it seems to be popular with service discovery even though it's probably the wrong tool for the job

https://tech.knewton.com/blog/2014/12/eureka-shouldnt-use-zookeeper-service-discovery/

on the other hand etcd and the other flavour of the month raft implementations have had known issues with stale reads so maybe it works out in the end :unsmith:

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
yeah, we're modelling our stuff on kafka, which seems like it's nice for what we want to do

there's still a weird thing in there -- how do you broadcast progress back to the originating client? you could have the service push out kafka messages saying "job 12345 is 50% done" on a job-progress channel, but then you need something on the other side that coalesces the updates, puts them in a database, and then somehow pushes them out to a websocket on the client side

i'd love a thing like kafka that was inverted in its channel-ness -- light-weight topic routing, without caring about persistence or availability, because these can always be recreated from the kafka layer. perhaps beanstalkd? but then we're getting into nosql territory.

or you could have a rest-based thing where the client just polls and asks the worker how done it is. i suppose that's the easiest solution, but now you need to figure out how to route the job id to the worker -- which means that you need a separate tracking daemon paying attention to which server is on which kafka partition
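
the coalescing step described above is small on its own. a sketch, assuming progress updates arrive as (job_id, percent) pairs:

```python
# keep only the newest percentage per job before pushing snapshots
# out to clients, instead of forwarding every single kafka message
def coalesce(updates):
    latest = {}
    for job_id, pct in updates:     # per-job arrival order is preserved
        latest[job_id] = pct
    return latest

stream = [("12345", 10), ("67890", 40), ("12345", 50), ("12345", 75)]
snapshot = coalesce(stream)         # what actually goes to websockets
```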

poo poo's hard, yo

MononcQc
May 29, 2007

etcd clusters also fail horribly with any asymmetric netsplit, as election cycles are called by peers when they lose visibility of the current leader. If the leader sees a message calling for a reelection with a higher epoch than its own, it steps down.

This means that in a case where updates from the master don't make it to a cluster member, but the node can contact the others, this node will repeatedly call elections with higher and higher epochs, and force masters to serially step down. Therefore, a single instance being isolated on an asymmetric netsplit or on an unreliable network can paralyze a raft cluster for as long as the issue remains.
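
a toy model of that failure mode, stripped to the term arithmetic (illustrative only; the known mitigation is a pre-vote phase, where a candidate checks it could actually win before bumping its term):

```python
# an isolated node keeps timing out and bumping its term; when its
# vote request finally reaches the leader, raft's rules force the
# leader to adopt the higher term and step down
class Node:
    def __init__(self):
        self.term = 1
        self.is_leader = False

leader = Node()
leader.is_leader = True
isolated = Node()

for _ in range(5):          # five election timeouts while partitioned
    isolated.term += 1      # one term bump per election attempt

# raft rule: seeing any message with a higher term makes you adopt it
# and, if you are leader, step down -- whether or not the sender can win
if isolated.term > leader.term:
    leader.term = isolated.term
    leader.is_leader = False
```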

MononcQc
May 29, 2007

for the longest time etcd was by default reading from the master's write-ahead log as an 'optimization' (so you didn't poll a majority of followers on every read), but it turned out this broke Raft and would allow uncommitted data to be read. woops. So they got told about it, added the full quorum read option, but advised against using it because it's slow.

crazypenguin
Mar 9, 2005
nothing witty here, move along
i saw a talk about kubernetes recently, and it looks really cool so i wanted to learn more about it

but it seems to use docker and etcd and all i hear is how these things are broken and now i dunno

or maybe just everything is broken?

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

crazypenguin posted:

i saw a talk about kubernetes recently, and it looks really cool so i wanted to learn more about it

but it seems to use docker and etcd and all i hear is how these things are broken and now i dunno

or maybe just everything is broken?

turns out distributed systems are hard to implement and people aren't good at it

the talent deficit
Dec 20, 2003

self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture





crazypenguin posted:

i saw a talk about kubernetes recently, and it looks really cool so i wanted to learn more about it

but it seems to use docker and etcd and all i hear is how these things are broken and now i dunno

or maybe just everything is broken?

it's all broken poo poo. raft is hilarious because every single production implementation has gaping implementation holes (like the ones MononcQc mentioned) that render them useless for linearization which is the only reason to use raft in the first place. if you just want a distributed k/v store you're better off with riak or cassandra or like a million other things that are better tested, faster and actually meet (some of...) the guarantees they make. if you really need linearizable operations then you should use zookeeper which is probably also broken but at least it's not obviously broken

if anyone wants a good introduction to distributed systems on an approachable level (more of 'why you want this and what to google to learn more' than 'what is this and how does it work') this talk from strange loop is fantastic: https://www.youtube.com/watch?v=ohvPnJYUW1E

Wheany
Mar 17, 2006

Spinyahahahahahahahahahahahaha!

Doctor Rope
what would be the worst possible software development case?

maybe some kind of distributed financial-medical thing?

people lose money and/or die if you gently caress up.

Bloody
Mar 3, 2013

Wheany posted:

what would be the worst possible software development case?

maybe some kind of distributed financial-medical thing?

people lose money and/or die if you gently caress up.

with hard real-time guarantees working in a radioactive/cosmic environment

jony neuemonic
Nov 13, 2009

Wheany posted:

what would be the worst possible software development case?

maybe some kind of distributed financial-medical thing?

people lose money and/or die if you gently caress up.

so, american healthcare?

tef
May 30, 2004

-> some l-system crap ->

Suspicious Dish posted:

yeah, we're modelling our stuff on kafka, which seems like it's nice for what we want to do

i've broken this in two so you can see you saying it's a good fit, and then saying that it doesn't work when you need a return path.

quote:

you could have the service push out kafka messages saying "job 12345 is 50% done" on a job-progress channel, but then you need something on the other side that coalesces the updates, puts them in a database, and then somehow pushes them out to a websocket on the client side

kafka is a big rear end syslog. clients append to it. it's partitioned behind the scenes. if you want to do anything other than send to all, you have to use another service (zookeeper) to track the shared iterator. kafka is best used as a firehose.
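
that "shared iterator" point in miniature -- a toy append-only log with per-group offsets, which is roughly all the consumer-side state kafka has (stored in zookeeper in older versions, in an internal topic later):

```python
# the broker just holds an append-only log; each consumer group owns
# its own offset, so "send to all" is free and anything else means
# tracking that offset somewhere
log = []
offsets = {}  # consumer group -> next offset to read

def produce(msg):
    log.append(msg)

def consume(group):
    pos = offsets.get(group, 0)
    batch = log[pos:]
    offsets[group] = len(log)       # commit the new position
    return batch

produce("a")
produce("b")
workers_saw = consume("workers")    # ["a", "b"]
produce("c")
workers_saw_next = consume("workers")   # ["c"] -- resumes from offset
audit_saw = consume("audit")        # ["a", "b", "c"] -- independent iterator
```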

quote:

i'd love a thing like kafka that was inverted in its channel-ness -- light-weight topic routing, without caring about persistence or availability, because these can always be recreated from the kafka layer. perhaps beanstalkd? but then we're getting into nosql territory.

i am sure you are not the first person to think about building a message queue atop kafka. i don't think you get to call using kafka as a persistence layer lightweight. you could use a message broker instead of writing one atop kafka.

quote:

how do you broadcast progress back to the originating client?

if you insist on using kafka, you're going to have a fun time putting acks in. you'll need to keep them in the same partition / topic to ensure that you have ordering.
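
a sketch of that keying constraint -- the deterministic byte-sum below is a stand-in for the client's real murmur2 partitioner, kept simple so the result is reproducible:

```python
# ordering in kafka holds only *within* a partition, so a job's acks
# have to be keyed the same way as the job's other messages to stay
# in order relative to them
def partition_for(key, num_partitions):
    # deterministic stand-in for the client's murmur2 hash
    return sum(key.encode()) % num_partitions

job_msg = partition_for("job-12345", 16)
job_ack = partition_for("job-12345", 16)
# same key -> same partition -> relative order preserved
```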

you'd be better off putting kafka inside the client, not inside the broker, and keep your brokers stateless. when it comes to recovery, just bring the clients up and get them to replay from the log. i.e. kafka is used to track the state of client requests, not to distribute them.

quote:

asset processing and transformation -- that is, take html doc, normalize it, take image, thumbnail it, that sort of thing

so here maybe each asset job would have its own kafka topic. you fire up a client with a given topic, it reads the state of the job and continues processing it, posting updates to the same topic to mark it as complete. to work out status, you fetch the kafka log for the task and read the last update. if you're feeling like suffering, you can even inline the data into the kafka log itself.
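
the status-by-replay idea is just a fold that keeps the last event. a sketch with made-up event strings:

```python
# a job's status is whatever the last record on its topic says, so
# "get status" is a replay that keeps only the final event
def current_status(topic_log):
    status = "unknown"              # empty topic: job never started
    for event in topic_log:
        status = event
    return status

job_topic = ["queued", "transcoding 10%", "transcoding 80%", "complete"]
status = current_status(job_topic)
```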

you can either homebrew or take something off github to handle firing off the workers/task allocation, or even do it in kafka too, where you have one topic for work requests/assignment and the usual kafka/zookeeper dance to load balance.

but tbh this sort of thing is kinda what map reduce is for: take this poo poo off s3 and dump this other poo poo on s3.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

tef posted:

so here maybe each asset job would have its own kafka topic. you fire up a client with a given topic, it reads the state of the job and continues processing it, posting updates to the same topic to mark it as complete. to work out status, you fetch the kafka log for the task and read the last update. if you're feeling like suffering, you can even inline the data into the kafka log itself.

This is ideally what I'd like, but Kafka doesn't scale with topics -- it's expected that you have a fixed number of topics which you explicitly manage. Maybe you raise the number of partitions once in a while, but it's not like code is supposed to create a new Kafka topic.

I'm not sure what you mean about "putting Kafka into the client". That seems to be what we're doing, no?

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
Our strategy for scaling and load balancing is to simply increase the number of workers and partitions. So, if we need to process 500 videos at once, we up the "worker-video-transcode" topic to 500 partitions, and then command Mesos to keep 500 workers running, distributed across our cluster. The hope is that the Kafka clients are smart enough to round robin through the partitions.
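
The "hope" in that last sentence is doing real work: consumer-group assignment is roughly a round-robin spread of partitions over workers, along these lines (simplified; the real client ships pluggable range and round-robin assignor strategies):

```python
# round-robin partition assignment: with 500 partitions and 500
# consumers in the group, each worker ends up owning exactly one
def assign(partitions, consumers):
    out = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        out[consumers[i % len(consumers)]].append(p)
    return out

small = assign(list(range(6)), ["w0", "w1", "w2"])
one_each = assign(list(range(500)), [f"w{i}" for i in range(500)])
```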

You can't easily transcode a video as a MapReduce job -- there's an inherent process in there that just takes a long-rear end time, and isn't easily parallelizable.

We've looked at various pipeline tools -- Hadoop/YARN, Storm, Spark, but we're not happy that they lock us into a certain processing methodology. That's why our goal has been to perhaps use one of those technologies internally, but consider it a black box on the part of the worker. All outside communication about status and progress go through either REST or Kafka.

tef
May 30, 2004

-> some l-system crap ->

Suspicious Dish posted:

Our strategy for scaling and load balancing is to simply increase the number of workers and partitions. So, if we need to process 500 videos at once, we up the "worker-video-transcode" topic to 500 partitions,

good luck with changing the number of partitions on a kafka topic

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
http://kafka.apache.org/documentation.html#basic_ops_modify_topic

???

tef
May 30, 2004

-> some l-system crap ->

Suspicious Dish posted:

So, if we need to process 500 videos at once,
[...]
there's an inherent process in there that just takes a long-rear end time, and isn't easily parallelizable.

JawnV6
Jul 4, 2004

So hot ...
"map" is ur first pass and 'reduce' is the second pass

tef
May 30, 2004

-> some l-system crap ->

Be aware that one use case for partitions is to semantically partition data, and adding partitions doesn't change the partitioning of existing data so this may disturb consumers if they rely on that partition. :shrug:
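
what "disturb consumers" means concretely: a key's partition is hash(key) % num_partitions, so raising the partition count moves keys. a sketch with a deterministic stand-in hash:

```python
# old records for a key stay in the partition they were written to,
# but new records for the same key can land somewhere else once the
# partition count changes
def partition_for(key, num_partitions):
    # deterministic stand-in for the client's murmur2 partitioner
    return sum(key.encode()) % num_partitions

before = partition_for("user-42", 4)    # 2
after = partition_for("user-42", 5)     # 4
# nothing guarantees they match, which is exactly the disturbance the
# kafka docs warn semantically-partitioning consumers about
```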

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
That is not our use case, though. We're using partitions unsemantically as a load balance technique.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

What I was saying is that you can't compress raw video to H.264 in parallel. The process of compression is heavily linearized.

The only scaling option you have is to just buy a gently caress-ton of servers and do a lot of them at once.

But that's not MapReduce -- the jobs aren't related or batched in any way, and there's no reduce phase.

crazypenguin
Mar 9, 2005
nothing witty here, move along

Suspicious Dish posted:

You can't easily transcode a video as a MapReduce job -- there's an inherent process in there that just takes a long-rear end time, and isn't easily parallelizable.

the input to the job would just be a list of file names or urls to each video. each map job would just do the transcoding of the videos it's handed. no real reduce phase but that's just fine.

i think you're maybe thinking of trying to do one video as a mapreduce job? that's not right. it's the aggregate.

or am i misunderstanding you
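
crazypenguin's framing in code: the job is a parallel map over video URLs with no reduce step. `transcode()` here is a placeholder for shelling out to a real encoder:

```python
from concurrent.futures import ThreadPoolExecutor

# each video is an independent unit of work: a parallel map, no reduce
def transcode(url):
    # placeholder for the real long-running ffmpeg-style encode
    return (url, "done")

urls = ["s3://bucket/a.mov", "s3://bucket/b.mov", "s3://bucket/c.mov"]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(transcode, urls))
# every video processed independently; nothing to aggregate afterwards
```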

suffix
Jul 27, 2013

Wheeee!

Suspicious Dish posted:

Our strategy for scaling and load balancing is to simply increase the number of workers and partitions. So, if we need to process 500 videos at once, we up the "worker-video-transcode" topic to 500 partitions, and then command Mesos to keep 500 workers running, distributed across our cluster. The hope is that the Kafka clients are smart enough to round robin through the partitions.

You can't easily transcode a video as a MapReduce job -- there's an inherent process in there that just takes a long-rear end time, and isn't easily parallelizable.

We've looked at various pipeline tools -- Hadoop/YARN, Storm, Spark, but we're not happy that they lock us into a certain processing methodology. That's why our goal has been to perhaps use one of those technologies internally, but consider it a black box on the part of the worker. All outside communication about status and progress go through either REST or Kafka.

that's a lot of partitions.
tbh it sounds like what you want is closer to a task queue, like celery.
i like the speed and reliability of kafka but it's a primitive, for any use case that's not just a big hose you'll end up reimplementing a lot of functionality on top

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

crazypenguin posted:

the input to the job would just be a list of file names or urls to each video. each map job would just do the transcoding of the videos it's handed. no real reduce phase but that's just fine.

i think you're maybe thinking of trying to do one video as a mapreduce job? that's not right. it's the aggregate.

or am i misunderstanding you

That would work in a case where we have a batch job of videos to process. We don't. Assets can be processed at any time.

AWWNAW
Dec 30, 2008

I think rabbit MQ has an exchange type for that?

you don't need topics or anything. just spin up a bunch of subscribers to the queue!

tef
May 30, 2004

-> some l-system crap ->
let me know when this fishmech thread is over

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

suffix posted:

that's a lot of partitions.
tbh it sounds like what you want is closer to a task queue, like celery.
i like the speed and reliability of kafka but it's a primitive, for any use case that's not just a big hose you'll end up reimplementing a lot of functionality on top

We want a task queue which has the ability to have failed tasks w/ limited retry capability, archives the status and logs of tasks nearly permanently (we regularly have cases where we want to see the log output of a task from 2 years ago), has audit trails about who started tasks, is polyglot-friendly (supports something other than your favorite language) and scales in a distributed fashion.

We didn't see anything like that existing, so we're planning to build it from scratch, using Kafka as a primitive.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

tef posted:

let me know when this fishmech thread is over

Will do, promise!

JawnV6
Jul 4, 2004

So hot ...

Suspicious Dish posted:

That would work in a case where we have a batch job of videos to process. We don't.

Suspicious Dish posted:

if we need to process 500 videos at once,

but You Don't???

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip
is fishmech still alive

the talent deficit
Dec 20, 2003

self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture





Suspicious Dish posted:

That would work in a case where we have a batch job of videos to process. We don't. Assets can be processed at any time.

it sounds like you just want an autoscale group that reads from sqs or kafka and scales up based on length of queue. have whatever you're using to do the encoding update a rest endpoint somewhere clients can query to get status. trying to write updates back to the queue seems like a lot of complexity for no reason
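
the scaling rule suggested above fits in one function; `jobs_per_worker` and the clamp bounds are made-up knobs, not anything from a real autoscaler:

```python
import math

# desired worker count driven purely by backlog, clamped to a floor
# (keep one warm worker) and a ceiling (the cluster budget)
def desired_workers(queue_len, jobs_per_worker=1, lo=1, hi=500):
    return max(lo, min(hi, math.ceil(queue_len / jobs_per_worker)))
```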

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

JawnV6 posted:

but You Don't???

We have around 2000 videos to process incoming from the Rachel Project and Khan Academy, and we also are in talks with newspapers to distribute their snippet videos of the day. We also process content other than videos.

the talent deficit posted:

it sounds like you just want an autoscale group that reads from sqs or kafka and scales up based on length of queue. have whatever you're using to do the encoding update a rest endpoint somewhere clients can query to get status. trying to write updates back to the queue seems like a lot of complexity for no reason

Right. We figured writing to Kafka was a way to notify that REST endpoint, or would you recommend the workers themselves poke the REST endpoint?

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Suspicious Dish posted:

What I was saying is that you can't compress raw video to H.264 in parallel. The process of compression is heavily linearized.

you split the video into segments, encode those segments in parallel, then concatenate them
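
that split/encode/concatenate scheme, sketched with a toy "codec" -- a real splitter has to cut on keyframe (GOP) boundaries, which this glosses over:

```python
from concurrent.futures import ThreadPoolExecutor

# split into n roughly equal segments, encode them in parallel,
# concatenate the results back in order
def split(video, n):
    size = len(video) // n
    parts = [video[i * size:(i + 1) * size] for i in range(n - 1)]
    parts.append(video[(n - 1) * size:])    # last part takes the remainder
    return parts

def encode_segment(seg):
    return seg.lower()                      # placeholder "compression"

video = "FRAMESFRAMESFRAMES"
with ThreadPoolExecutor() as pool:
    # pool.map yields results in input order, so concatenation is safe
    encoded = list(pool.map(encode_segment, split(video, 3)))
result = "".join(encoded)
```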
