Shaggar
Apr 26, 2006
apache is the best cause its designed by and for professionals who work in the real world. MIT/BSD are ok, but apache is more modern and clearly defines rights, whereas they are kind of assumed to exist in MIT/BSD (afaik).

you're not gonna be able to control your open sores. using apache over a restrictive license (one that requires code kick backs, or worse, no proprietary software) gives users the ability to change their mind about contributing later on instead of skipping your license or ignoring it entirely.

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

Shaggar posted:

you're not gonna be able to control your open sores. using apache over a restrictive license (one that requires code kick backs, or worse, no proprietary software) gives users the ability to change their mind about contributing later on instead of skipping your license or ignoring it entirely.

"i violate copyright at work, therefore everyone else does too"

Zombywuf
Mar 29, 2008

Who the gently caress is releasing code under a license that requires upstream contribution?

Max Facetime
Apr 18, 2009

Otto Skorzeny posted:

"i violate copyright at work, therefore everyone else does too"

copyright infringement is easy, prevention less so, but not impossible if you make the source code totally unappealing. gpl helps with this

Condiv
May 7, 2008

Sorry to undo the effort of paying a domestic abuser $10 to own this poster, but I am going to lose my dang mind if I keep seeing multiple posters who appear to be Baloogan.

With love,
a mod


https://www.youtube.com/watch?v=E3418SeWZfQ

shaggar was right?

NickFendon
May 4, 2009
Why does nobody release stuff under the Boost license? It's like MIT but without the requirement to include the license with compiled output / binaries, right?

Squinty Applebottom
Jan 1, 2013

NickFendon posted:

Why does nobody release stuff under the Boost license? It's like MIT but without the requirement to include the license with compiled output / binaries, right?

its identical

The copyright notices in the Software and this entire statement, including
the above license grant, this restriction and the following disclaimer,
must be included in all copies of the Software,

MononcQc
May 29, 2007

Quick rundown of eventual consistency methods https://research.microsoft.com/pubs/192621/sigtt611-bernstein.pdf -- a nice quick read to have a general idea of what's out there.

NickFendon
May 4, 2009

polpotpi posted:

its identical

The copyright notices in the Software and this entire statement, including
the above license grant, this restriction and the following disclaimer,
must be included in all copies of the Software,

in whole or in part, and
all derivative works of the Software, unless such copies or derivative
works are solely in the form of machine-executable object code generated by
a source language processor.

Squinty Applebottom
Jan 1, 2013

lol like I'm reading the whole drat thing

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

NickFendon posted:

Why does nobody release stuff under the Boost license? It's like MIT but without the requirement to include the license with compiled output / binaries, right?

because its useless

Nomnom Cookie
Aug 30, 2009



NickFendon posted:

Why does nobody release stuff under the Boost license? It's like MIT but without the requirement to include the license with compiled output / binaries, right?

Some C++ libs are. Java projects use Apache license because the ASF does

Also I don't mind if someone uses code I publish but no attribution? I'm not ok with that

Nomnom Cookie
Aug 30, 2009



MononcQc posted:

Quick rundown of eventual consistency methods https://research.microsoft.com/pubs/192621/sigtt611-bernstein.pdf -- a nice quick read to have a general idea of what's out there.

is there anything better than last write wins and stupid limited things like replicated counters

Gazpacho
Jun 18, 2004

by Fluffdaddy
Slippery Tilde

prefect posted:

artistic license

who doesn't want to be an artist?


also it's a joke on the phrase "artistic license", which i have always appreciated
autistic license: if you distribute a modified version of this code the author will send you e-mail tantrums until you agree to stop

Sapozhnik
Jan 2, 2005

Nap Ghost

Nomnom Cookie posted:

Some C++ libs are. Java projects use Apache license because the ASF does

Also I don't mind if someone uses code I publish but no attribution? I'm not ok with that

Copyright law requires it irrespective of the license. If someone copies your code into their work, then whether or not you allow them to do that you still own the copyright for your work and it is illegal for them to plagiarise it (claim that they wrote it and not you, aka "moral rights") or remove any copyright notices you've added in there

http://en.wikipedia.org/wiki/ISC_license

ISC license best license, it's basically a "do whatever the gently caress you want" license except without all the poo poo that's required by copyright law anyway. My only complaint is the HURR IF IT'S IN CAPITAL LETTERS IT'S MORE LEGALER second paragraph.

EMILY BLUNTS
Jan 1, 2005

polpotpi posted:

lol like I'm reading the whole drat thing

did you know?
there is software that can compare two bodies of text,

MononcQc
May 29, 2007

Nomnom Cookie posted:

is there anything better than last write wins and stupid limited things like replicated counters

Last write wins is a simpler version of logical clocks and others, which aim to track logical dependencies in order to limit how often you actually need to make a decision that will lose data. CRDTs will have you design your data types such that they can be merged in any way or order and give stable results (the paper above represents a way to do it with sets via counters, but it's not always counters).
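
To make the logical clock part concrete, here's a minimal sketch of comparing vector clocks, one common kind of logical clock. It's an illustration only, not something taken from the paper; the name `compare` and the dict-of-counters representation are assumptions made for the example. The idea is that you only need a lossy decision (or a merge) when neither clock dominates the other.

```python
from typing import Dict

# Hypothetical representation: node id -> number of events seen from that node.
VectorClock = Dict[str, int]


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify how version a relates to version b."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b already includes everything a saw
    if b_le_a:
        return "after"       # a already includes everything b saw
    return "concurrent"      # neither dominates: a real conflict


# One version strictly descends from the other: safe to keep the newer one.
print(compare({"a": 1}, {"a": 2, "b": 1}))
# Both sides wrote independently: something has to resolve or merge this.
print(compare({"a": 2, "b": 1}, {"a": 1, "b": 2}))
```

Plain last write wins would pick a winner in both cases; the clock comparison narrows that down to the "concurrent" case only.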

It always depends on what you can allow yourself to lose or not. The general ideas remain limited by the CAP theorem: you can't have updates that work and are reflected on all nodes of a system during failures. Either you stop operations, or you allow some of them and deal with the fallout later.

A lot of it is either designing a system such that you get the optimal amount of communication required to guarantee consistency (through a quorum or whatever), or organizing (some or all) operations such that they can be reconciled safely when the cluster is healthy.
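
For the quorum side, a rough sketch of the usual overlap rule: with N replicas, a read quorum of R nodes and a write quorum of W nodes are guaranteed to share at least one node when R + W > N. The function names are made up for the example, and the brute-force check is only there to show the arithmetic holds up on small clusters.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """The arithmetic rule: every read quorum meets every write quorum."""
    return r + w > n


def always_intersect(n: int, r: int, w: int) -> bool:
    """Brute force: try every possible read set against every write set."""
    nodes = range(n)
    return all(
        set(reads) & set(writes)
        for reads in combinations(nodes, r)
        for writes in combinations(nodes, w)
    )


# Print the rule next to the brute-force answer for a few small settings.
for n, r, w in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 1, 5)]:
    print(n, r, w, quorums_overlap(n, r, w), always_intersect(n, r, w))
```

Smaller R or W means less communication per operation, but once R + W drops to N or below a read can miss the latest write entirely.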

Max Facetime
Apr 18, 2009

So which software is the best for that?

Notorious b.s.d.
Jan 25, 2003

by Reene

Max Facetime posted:

So which software is the best for that?

Best for what?

If you require partition tolerance, do you want a CP or an AP system? This is a subtle and difficult choice. AP is really hard to do unless you build your software from the ground up to support it (e.g. datomic + riak makes AP kinda the default choice when designing your product)

CP is easier to wrap your (well, my) head around, and requires less of your time spent thinking about CRDTs, but having the whole world grind to a halt in the face of a problem is kinda scary.

And before you say "...but I don't need partition tolerance" consider how easy it is to have a partition. e.g. some clients able to speak to only one server due to a network fault. (Yes, clients are part of your distributed system!)

Cocoa Crispies
Jul 20, 2001

Vehicular Manslaughter!

Pillbug

Notorious b.s.d. posted:

And before you say "...but I don't need partition tolerance" consider how easy it is to have a partition. e.g. some clients able to speak to only one server due to a network fault. (Yes, clients are part of your distributed system!)

lol somebody read http://aphyr.com/tags/jepsen

Notorious b.s.d.
Jan 25, 2003

by Reene

yep. actually i saw him present on the topic. it was p. sweet

i had never really thought about the client's angle and that was a scary moment

Cocoa Crispies
Jul 20, 2001

Vehicular Manslaughter!

Pillbug

Notorious b.s.d. posted:

yep. actually i saw him present on the topic. it was p. sweet

i had never really thought about the client's angle and that was a scary moment

was that at ricon east? missed it to go to tefville instead

tef
May 30, 2004

-> some l-system crap ->
:toot:

tef
May 30, 2004

-> some l-system crap ->

Notorious b.s.d. posted:

yep. actually i saw him present on the topic. it was p. sweet

did you find out what vector clocks are?

spongeh
Mar 22, 2009

BREADAGRAM OF PROTECTION

tef posted:

did you find out what vector clocks are?

misread that as victor clocks, and i wanted to know more

Captain Foo
May 11, 2004

we vibin'
we slidin'
we breathin'
we dyin'

spongeh posted:

misread that as victor clocks, and i wanted to know more

i misread it as vector cocks

vapid cutlery
Apr 17, 2007

php:
<?
"it's george costanza" ?>
is victor still building Tsunami or was that someone else

Max Facetime
Apr 18, 2009

Notorious b.s.d. posted:

Best for what?

If you require partition tolerance, do you want a CP or an AP system? This is a subtle and difficult choice. AP is really hard to do unless you build your software from the ground up to support it (e.g. datomic + riak makes AP kinda the default choice when designing your product)

CP is easier to wrap your (well, my) head around, and requires less of your time spent thinking about CRDTs, but having the whole world grind to a halt in the face of a problem is kinda scary.

And before you say "...but I don't need partition tolerance" consider how easy it is to have a partition. e.g. some clients able to speak to only one server due to a network fault. (Yes, clients are part of your distributed system!)

:eek:

...best for those words you just mentioned?

Cocoa Crispies
Jul 20, 2001

Vehicular Manslaughter!

Pillbug

Max Facetime posted:

:eek:

...best for those words you just mentioned?

there's no best, just what works for what you need

Max Facetime
Apr 18, 2009

availability seems a more interesting problem than consistency, if I understood those jepsen articles right

I want to use Cassandra's CQL because it's finally making those column families somewhat understandable, but how do you figure out if there are pitfalls similar to the ones postgresql, redis, mongodb and riak had?

Max Facetime
Apr 18, 2009

Cocoa Crispies posted:

there's no best, just what works for what you need

so 1 step forward and 2 steps back

... thanks

MononcQc
May 29, 2007

Max Facetime posted:

availability seems a more interesting problem than consistency, if I understood those jepsen articles right.

They're two faces of the same coin. It's "easy" to go all in for consistency (require a two-phase commit, require everyone to agree on the value, and stall otherwise), and it's easy to go all-in for availability (just drop all consistency requirements).

It's harder to have consistency with multiple failures (quorums, see Paxos, Raft, ZAB), or to have high availability while trying to be as consistent as possible (Dynamo, CRDTs, etc.)
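
As a small taste of the "available but as consistent as possible" side, here's a sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. The function names and the dict representation are assumptions for the example. Each node only bumps its own slot, and merging takes the per-node maximum, so replicas can exchange state in any order, any number of times, and still converge.

```python
from typing import Dict

# Hypothetical representation: node id -> increments accepted by that node.
GCounter = Dict[str, int]


def increment(counter: GCounter, node: str, amount: int = 1) -> GCounter:
    """Return a new counter with this node's own slot bumped."""
    updated = dict(counter)
    updated[node] = updated.get(node, 0) + amount
    return updated


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Per-node max: commutative, associative and idempotent."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def value(counter: GCounter) -> int:
    """The counter's value is the sum of every node's slot."""
    return sum(counter.values())


# Two replicas keep accepting writes while they can't talk to each other.
replica_a = increment(increment({}, "a"), "a")  # node a took 2 increments
replica_b = increment({}, "b")                  # node b took 1 increment

# Once they can talk again, merge order doesn't matter.
merged_ab = merge(replica_a, replica_b)
merged_ba = merge(replica_b, replica_a)
print(merged_ab == merged_ba, value(merged_ab))
```

The catch is the one raised earlier in the thread: the data type is limited (this one can only grow), which is the price for never having to stall.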

In the last few years, a lot of effort has been devoted to the latter category. The requirements for low latency, huge clusters (multiple hundreds of nodes) and very dynamic environments or cross-datacenter data sets make it very interesting to find solutions that can work well there. It probably looks more interesting because I don't think there has been nearly as much formal research in that area as there has been for approaches that want to get consistency first.

spongeh
Mar 22, 2009

BREADAGRAM OF PROTECTION

Max Facetime posted:

availability seems a more interesting problem than consistency, if I understood those jepsen articles right

I want to use Cassandra's CQL because it's finally making those column families somewhat understandable, but how do you figure out if there are pitfalls similar to the ones postgresql, redis, mongodb and riak had?

We adopted Cassandra really early on (too early), and didn't keep up with updating it. We ran into huge performance issues that would cause it to become cripplingly slow under load, but didn't keep up with the various version breakages and API changes which made it really hard to just upgrade our cluster. I'm sure now that they've hit 1.0 it's a bit better, but even with CQL being relatively new, you'll probably want to make sure you keep up to date with new Cassandra updates, as even recently they still seemed to be making a lot of big changes between 1.x versions.

Notorious b.s.d.
Jan 25, 2003

by Reene

tef posted:

did you find out what vector clocks are?

like all distributed systems ninjas i rely on memes to communicate c.s. concepts

do u even kno how memes work

Edit: http://teespring.com/doyouevenknow for all y'all not on the tef inside baseball circuit (smug, proven correct, impoverished)

Notorious b.s.d. fucked around with this message at 04:06 on Jun 18, 2013

MeruFM
Jul 27, 2010
Are all these distributed, availability, consistency, etc problems just theoretical or do these issues actually happen in real life? And in what systems, if so?

I've used some variant of SQL mostly and the biggest problems still arise from poo poo code (like 99.999% of all computer problems really) causing deadlock and rollbacks.

The problems seem interesting, but at what size does MySQL/oracle stop being good enough, provided nothing stupid is going on?

MeruFM fucked around with this message at 09:04 on Jun 18, 2013

Squinty Applebottom
Jan 1, 2013

mysql starts being poo poo the second you install it

ultramiraculous
Nov 12, 2003

"No..."
Grimey Drawer

polpotpi posted:

mysql starts being poo poo the second you install it

but...but...facebook

Squinty Applebottom
Jan 1, 2013

oracle is cool cause it gives DBAs more power than they should reasonably have in a storage backend to do weird terrible poo poo so its lots of fun to play around with

Zombywuf
Mar 29, 2008

MeruFM posted:

The problems seem interesting, but at what size does MySQL/oracle stop being good enough, provided nothing stupid is going on?

MySQL is never good enough; Postgres damnit.

In general though, a simple relational db is fine up to a few thousand transactions per minute provided you code it properly. Decent architecture can often make most of your CAP theorem problems just stop existing. In general it seems the problems that the research into distributed systems is good for are MMOs (reads and writes in equal proportion with everyone-to-everyone data sharing) and systems with over 100,000 simultaneous users and fast growth.

MononcQc
May 29, 2007

You're fine with the usual SQL databases as long as your set of operations is simple enough that a single instance or a few more (if you have master-to-master replication) can deal with the load. When you go over that limit and you need to scale up the number of masters that can write, that's when you may start having serious data problems.

But even before you get there, you are hitting some distributed systems issues. Let's take cache for an example. Adding cache in front of a website is basically going for something that breaks a bunch of assumptions: the DB, the cache server (say memcached, redis for some counters, or whatever), and multiple clients (HTTP cache) may all see different values for the same resource.

Usually, this isn't a big deal because people will generally use a single browser, and programmers will treat the DB as the authoritative copy and ignore the values taken by other components -- the DB is still the master, the rest is just replication.

But let's say we've got 5 front-end nodes, and because we want cache to be faster, we cache things locally on each front-end node rather than a dedicated machine on the side (we might have better geo distribution, for example). Now, the timing of each request made can impact what a user hitting a front-end node will see.

Better, every time you refresh the page, you may get a different copy of the document, image, or whatever. It might not be a problem, but if what you're hosting is static content that visually impacts the site (like your templates, the CSS stylesheet, and the background images it loads), you may end up with a request for the template on node A, the CSS from node B, and the images from node C. If someone hits this partway through a deploy, you may end up with a broken website for most users. Then if users cache it client-side, they might still see the issue minutes or hours after you've fixed it. Woops.

Most of the time it's not gonna be a big deal -- you could invalidate the cache once the deploy is over and fetch a fresh copy (dropping state), change the URL of the new documents (my-background.png?v54 -- duplicate documents and differentiate between state and identity) or go to a CDN that could deal with it (where these guys deal with the distributed systems poo poo for you).
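
The "change the URL" option can be sketched like this. The example above uses a hand-bumped version (`?v54`); deriving the version from a hash of the file contents is a swapped-in variation, and `versioned_url` is a hypothetical helper, not part of any framework.

```python
import hashlib


def versioned_url(path: str, content: bytes, digest_len: int = 8) -> str:
    """Append a short content hash so a changed file gets a new URL."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return f"{path}?v={digest}"


old_css = b"body { background: url(my-background.png); }"
new_css = b"body { background: url(my-background.png); color: #333; }"

# Same bytes always map to the same URL, so caches stay valid;
# new bytes map to a new URL, so stale cached copies are never asked for.
print(versioned_url("site.css", old_css))
print(versioned_url("site.css", new_css))
```

Every front-end node computes the same URL from the same bytes, so nobody has to coordinate a version number across nodes during a deploy.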

Is this a hard, customer-impacting problem that requires a distributed system expert to step in? Not really. There are simple mitigation techniques and people have been dealing with it informally for years already, with good enough results. You still have an authoritative copy of everything somewhere and can deal with poo poo.

Even if it's not really complex, it still shows some properties of distributed systems where you choose to lack consistency in favor of lowered latency, as described in PACELC. The more state needs to be replicated over many nodes, and the more operations run over the data set, the more difficult it gets.
