|
apache is the best cause its designed by and for professionals who work in the real world. MIT/BSD are ok, but apache is more modern and clearly defines rights, whereas they are kind of assumed to exist in MIT/BSD (afaik). you're not gonna be able to control your open sores. using apache over a restrictive license (one that requires code kick backs, or worse, no proprietary software) gives users the ability to change their mind about contributing later on instead of skipping your license or ignoring it entirely.
|
# ? Jun 14, 2013 18:36 |
|
Shaggar posted:you're not gonna be able to control your open sores. using apache over a restrictive license (one that requires code kick backs, or worse, no proprietary software) gives users the ability to change their mind about contributing later on instead of skipping your license or ignoring it entirely.
"i violate copyright at work, therefore everyone else does too"
|
# ? Jun 14, 2013 19:00 |
|
Who the gently caress is releasing code under a license that requires upstream contribution?
|
# ? Jun 14, 2013 19:20 |
|
Otto Skorzeny posted:"i violate copyright at work, therefore everyone else does too"
copyright infringement is easy, prevention less so, but not impossible if you make the source code totally unappealing. gpl helps with this
|
# ? Jun 14, 2013 19:43 |
|
https://www.youtube.com/watch?v=E3418SeWZfQ shaggar was right?
|
# ? Jun 14, 2013 20:42 |
|
Why does nobody release stuff under the Boost license? It's like MIT but without the requirement to include the license with compiled output / binaries, right?
|
# ? Jun 15, 2013 01:32 |
|
NickFendon posted:Why does nobody release stuff under the Boost license? It's like MIT but without the requirement to include the license with compiled output / binaries, right?
its identical
The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software,
|
# ? Jun 15, 2013 01:36 |
|
Quick rundown of eventual consistency methods https://research.microsoft.com/pubs/192621/sigtt611-bernstein.pdf -- a nice quick read to have a general idea of what's out there.
|
# ? Jun 15, 2013 02:07 |
|
polpotpi posted:its identical
in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.
|
# ? Jun 15, 2013 02:15 |
|
lol like I'm reading the whole drat thing
|
# ? Jun 15, 2013 02:26 |
|
NickFendon posted:Why does nobody release stuff under the Boost license? It's like MIT but without the requirement to include the license with compiled output / binaries, right?
because its useless
|
# ? Jun 15, 2013 02:38 |
|
NickFendon posted:Why does nobody release stuff under the Boost license? It's like MIT but without the requirement to include the license with compiled output / binaries, right?
Some C++ libs are. Java projects use Apache license because the ASF does
Also I don't mind if someone uses code I publish but no attribution? I'm not ok with that
|
# ? Jun 15, 2013 03:05 |
|
MononcQc posted:Quick rundown of eventual consistency methods https://research.microsoft.com/pubs/192621/sigtt611-bernstein.pdf -- a nice quick read to have a general idea of what's out there.
is there anything better than last write wins and stupid limited things like replicated counters
|
# ? Jun 15, 2013 03:06 |
|
prefect posted:artistic license
|
# ? Jun 15, 2013 07:57 |
|
Nomnom Cookie posted:Some C++ libs are. Java projects use Apache license because the ASF does
Copyright law requires it irrespective of the license. If someone copies your code into their work, then whether or not you allow them to do that you still own the copyright for your work and it is illegal for them to plagiarise it (claim that they wrote it and not you, aka "moral rights") or remove any copyright notices you've added in there.
http://en.wikipedia.org/wiki/ISC_license
ISC license best license, it's basically a "do whatever the gently caress you want" license except without all the poo poo that's required by copyright law anyway. My only complaint is the HURR IF IT'S IN CAPITAL LETTERS IT'S MORE LEGALER second paragraph.
|
# ? Jun 15, 2013 13:56 |
|
polpotpi posted:lol like I'm reading the whole drat thing
did you know? there is software that can compare two bodies of text,
|
# ? Jun 15, 2013 15:04 |
|
Nomnom Cookie posted:is there anything better than last write wins and stupid limited things like replicated counters
Last write wins is a simpler version of logical clocks and others, which aim to track logical dependencies in order to limit how often you actually need to make a decision that will lose data. CRDTs will have you design your data types such that they can be merged in any way or order and give stable results (the paper above represents a way to do it with sets via counters, but it's not always counters). It always depends on what you can allow yourself to lose or not.
The general ideas remain limited by the CAP theorem: you can't have updates that work and are reflected on all nodes of a system during failures. Either you stop operations, or you allow some of them and deal with the fallout later. A lot of it is either designing a system such that you get the optimal amount of communication required to guarantee consistency (through a quorum or whatever), or organizing (some or all) operations such that they can be reconciled safely when the cluster is healthy.
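To make the "merged in any way or order" bit concrete, here's a minimal sketch of a grow-only counter (G-Counter), probably the simplest CRDT there is. The class and method names are made up for illustration and aren't taken from the paper or from any particular library; assume each node only ever increments its own slot.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per node, merge takes the per-node max."""
    node_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A node only ever bumps its own slot, and only upwards.
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of every node's slot.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-slot max is commutative, associative and idempotent,
        # so replicas converge no matter how often or in what order they merge.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


a = GCounter("a")
b = GCounter("b")
a.increment(2)   # happens on node a during a partition
b.increment(3)   # happens on node b during the same partition
a.merge(b)
b.merge(a)
b.merge(a)       # merging twice changes nothing
print(a.value(), b.value())
```

Nothing gets lost here because no decision ever has to be made: both sides' increments survive the merge. The price is that the type can't do much (no decrements, for one), which is the "stupid limited" complaint above.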
|
# ? Jun 15, 2013 18:09 |
|
So which software is the best for that?
|
# ? Jun 15, 2013 22:26 |
|
Max Facetime posted:So which software is the best for that?
Best for what?
If you require partition tolerance, do you want a CP or an AP system? This is a subtle and difficult choice.
AP is really hard to do unless you build your software from the ground up to support it (e.g. datomic + riak makes AP kinda the default choice when designing your product).
CP is easier to wrap your (well, my) head around, and requires less of your time spent thinking about CRDTs, but having the whole world grind to a halt in the face of a problem is kinda scary.
And before you say "...but I don't need partition tolerance" consider how easy it is to have a partition. e.g. some clients able to speak to only one server due to a network fault. (Yes, clients are part of your distributed system!)
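A toy sketch of the difference, in case it helps. `ToyReplica` is entirely hypothetical and has nothing to do with how datomic or riak actually work; it only assumes that a node knows the cluster size and how many peers it can currently reach.

```python
class ToyReplica:
    """Toy replica in a cluster of `cluster_size` nodes (illustration only)."""

    def __init__(self, cluster_size, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.cluster_size = cluster_size
        self.mode = mode
        self.data = {}
        self.needs_reconcile = []  # keys written without a majority in sight

    def write(self, key, value, reachable_peers):
        """Return True if the write was accepted, False if it was refused."""
        # Count ourselves plus the peers we can still talk to.
        has_majority = (reachable_peers + 1) > self.cluster_size // 2
        if not has_majority:
            if self.mode == "CP":
                # CP: refuse rather than risk diverging from the majority side.
                return False
            # AP: accept anyway and remember that this needs merging later.
            self.needs_reconcile.append(key)
        self.data[key] = value
        return True


cp = ToyReplica(cluster_size=3, mode="CP")
ap = ToyReplica(cluster_size=3, mode="AP")

# Both nodes are cut off from their two peers.
cp_accepted = cp.write("cart", ["book"], reachable_peers=0)
ap_accepted = ap.write("cart", ["book"], reachable_peers=0)
print(cp_accepted, ap_accepted, ap.needs_reconcile)
```

The CP node is the "world grinds to a halt" case; the AP node stays up but now owns a reconciliation problem, which is where the CRDT thinking comes in.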
|
# ? Jun 16, 2013 17:33 |
|
Notorious b.s.d. posted:And before you say "...but I don't need partition tolerance" consider how easy it is to have a partition. e.g. some clients able to speak to only one server due to a network fault. (Yes, clients are part of your distributed system!)
lol somebody read http://aphyr.com/tags/jepsen
|
# ? Jun 16, 2013 17:54 |
|
Cocoa Crispies posted:lol somebody read http://aphyr.com/tags/jepsen
yep. actually i saw him present on the topic. it was p. sweet
i had never really thought about the client's angle and that was a scary moment
|
# ? Jun 16, 2013 18:08 |
|
Notorious b.s.d. posted:yep. actually i saw him present on the topic. it was p. sweet
was that at ricon east? missed it to go to tefville instead
|
# ? Jun 16, 2013 18:53 |
|
|
# ? Jun 16, 2013 20:57 |
|
Notorious b.s.d. posted:yep. actually i saw him present on the topic. it was p. sweet
did you find out what vector clocks are?
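For anyone else who didn't: a rough sketch of vector clocks as plain dicts mapping node name to counter. The function names are invented for this example, and real implementations (Riak's included) carry more machinery such as pruning, so treat it as the bare idea only.

```python
def vc_increment(clock, node):
    """Return a copy of `clock` with `node`'s counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def vc_merge(a, b):
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def vc_compare(a, b):
    """Classify how clock `a` relates to clock `b`."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b
    if b_le_a:
        return "after"       # a happened after b
    return "concurrent"      # neither saw the other: a real conflict


base = vc_increment({}, "x")      # first write, coordinated by node x
left = vc_increment(base, "x")    # later write on node x
right = vc_increment(base, "y")   # write on node y that never saw `left`

print(vc_compare(base, left))     # left descends from base
print(vc_compare(left, right))    # siblings, somebody has to reconcile
print(vc_merge(left, right))
```

The point compared to last write wins: wall-clock timestamps would silently pick one of `left` and `right`, while the clocks tell you the two writes are concurrent and leave the decision to you.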
|
# ? Jun 16, 2013 21:00 |
|
tef posted:did you find out what vector clocks are?
misread that as victor clocks, and i wanted to know more
|
# ? Jun 16, 2013 21:29 |
|
spongeh posted:misread that as victor clocks, and i wanted to know more
i misread it as vector cocks
|
# ? Jun 17, 2013 05:00 |
|
is victor still building Tsunami or was that someone else
|
# ? Jun 17, 2013 07:11 |
|
Notorious b.s.d. posted:Best for what?
...best for those words you just mentioned?
|
# ? Jun 17, 2013 15:26 |
|
Max Facetime posted:
there's no best, just what works for what you need
|
# ? Jun 17, 2013 15:30 |
|
availability seems a more interesting problem than consistency, if I understood those jepsen articles right
I want to use Cassandra's CQL because it's finally making those column families somewhat understandable, but how do you figure out if there's similar pitfalls like postgresql, redis, mongodb and riak had?
|
# ? Jun 17, 2013 15:35 |
|
Cocoa Crispies posted:there's no best, just what works for what you need
so 1 step forward and 2 steps back ... thanks
|
# ? Jun 17, 2013 15:40 |
|
Max Facetime posted:availability seems a more interesting problem than consistency, if I understood those jepsen articles right.
They're two faces of the same coin. It's "easy" to go all in for consistency (require a two-phase commit, require everyone to agree on the value, and stall otherwise), and it's easy to go all-in for availability (just drop all consistency requirements).
It's harder to have consistency with multiple failures (quorums, see Paxos, Raft, ZAB), or to have high availability while trying to be as consistent as possible (Dynamo, CRDTs, etc.)
In the last few years, a lot of effort has been devoted to the latter category. The requirements for low latency, huge clusters (multiple hundreds of nodes) and very dynamic environments or cross-datacenter data sets make it very interesting to find solutions that can work well there. It probably looks more interesting because I don't think there has been nearly as much formal research in that area as there has been for approaches that want to get consistency first.
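The quorum idea boils down to one inequality: with N replicas, reads that contact R of them and writes that contact W of them are guaranteed to share at least one replica when R + W > N. A small sketch, with function names made up for the example, that also checks the rule by brute force on tiny clusters:

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """True when every read quorum of size r must intersect every write quorum of size w."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("need 1 <= r <= n and 1 <= w <= n")
    return r + w > n


def overlap_by_enumeration(n, r, w):
    """Same answer the slow way: try every pair of quorums (only sane for small n)."""
    replicas = range(n)
    return all(
        set(read_q) & set(write_q)
        for read_q in combinations(replicas, r)
        for write_q in combinations(replicas, w)
    )


def majority(n):
    """Smallest group size such that any two groups of that size overlap."""
    return n // 2 + 1


for n, r, w in [(3, 2, 2), (3, 1, 1), (5, 1, 5), (5, 3, 2)]:
    print(n, r, w, quorums_overlap(n, r, w), overlap_by_enumeration(n, r, w))
```

Picking R = W = majority(N) gives the consistency-leaning setup that still survives a minority of nodes failing; shrinking R and W buys latency and availability at the cost of possibly reading stale data. This only shows the overlap arithmetic, it is not Paxos or Raft.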
|
# ? Jun 17, 2013 16:49 |
|
Max Facetime posted:availability seems a more interesting problem than consistency, if I understood those jepsen articles right
We adopted Cassandra really early on (too early), and didn't keep up with updating it. We ran into huge performance issues that would cause it to become cripplingly slow under load, but didn't keep up with the various version breakages and API changes which made it really hard to just upgrade our cluster.
I'm sure now that they've hit 1.0 it's a bit better, but even with CQL being relatively new, you'll probably want to make sure you keep up to date with new Cassandra updates, as even recently they still seemed to be making a lot of big changes between 1.x versions.
|
# ? Jun 17, 2013 20:00 |
|
tef posted:did you find out what vector clocks are?
like all distributed systems ninjas i rely on memes to communicate c.s. concepts
do u even kno how memes work
Edit: http://teespring.com/doyouevenknow for all y'all not on the tef inside baseball circuit (smug, proven correct, impoverished)
Notorious b.s.d. fucked around with this message at 04:06 on Jun 18, 2013 |
# ? Jun 18, 2013 04:03 |
|
Are all these distributed, availability, consistency, etc problems just theoretical or do these issues actually happen in real life? And in what systems if so?
I've used some variant of SQL mostly and the biggest problems still arise from poo poo code (like 99.999% of all computer problems really) causing deadlock and rollbacks.
The problems seem interesting, but at what size does MySQL/oracle stop being good enough, provided nothing stupid is going on?
MeruFM fucked around with this message at 09:04 on Jun 18, 2013 |
# ? Jun 18, 2013 08:53 |
|
mysql starts being poo poo the second you install it
|
# ? Jun 18, 2013 10:59 |
|
polpotpi posted:mysql starts being poo poo the second you install it
but...but...facebook
|
# ? Jun 18, 2013 11:06 |
|
oracle is cool cause it gives DBAs more power than they should reasonably have in a storage backend to do weird terrible poo poo so its lots of fun to play around with
|
# ? Jun 18, 2013 11:15 |
|
MeruFM posted:The problems seem interesting, but at what size does MySQL/oracle stop being good enough, provided nothing stupid going on?
MySQL is never good enough; Postgres damnit. In general though, a simple relational db is fine up to a few thousand transactions per minute provided you code it properly. Decent architecture can often make most of your CAP theorem problems just stop existing.
In general it seems the problems that distributed systems research is good for are MMOs (reads and writes in equal proportion, with everyone-to-everyone data sharing) and systems with over 100,000 simultaneous users and fast growth.
|
# ? Jun 18, 2013 11:52 |
|
You're fine with the usual SQL databases as long as your set of operations is simple enough that a single instance or a few more (if you have master-to-master replication) can deal with the load. When you go over that limit and you need to scale up the number of masters that can write, that's when you may start having serious data problems.
But even before you get there, you are hitting some distributed systems issues. Let's take caching as an example. Adding cache in front of a website is basically going for something that breaks a bunch of assumptions: the DB, the cache server (say memcached, redis for some counters, or whatever), and multiple clients (HTTP cache) may all see different values for the same resource. Usually, this isn't a big deal because people will generally use a single browser, and programmers will treat the DB as the authoritative copy and ignore the values taken by other components -- the DB is still the master, the rest is just replication.
But let's say we've got 5 front-end nodes, and because we want cache to be faster, we cache things locally on each front-end node rather than on a dedicated machine on the side (we might have better geo distribution, for example). Now, the timing of each request made can impact what a user hitting a front-end node will see. Better, every time you refresh the page, you may get a different copy of the document, image, or whatever.
It might not be a problem, but if what you're hosting is static content that visually impacts the site (like your templates, the CSS stylesheet, and the background images it loads), you may end up with a request for the template on node A, the CSS from node B, and the images from node C. If someone hits this partway through a deploy, you may end up with a broken website for most users. Then if users cache it client-side, they might still see the issue minutes or hours after you've fixed it. Woops.
Most of the time it's not gonna be a big deal -- you could invalidate the cache once the deploy is over and fetch a fresh copy (dropping state), change the URL of the new documents (my-background.png?v54 -- duplicate documents and differentiate between state and identity) or go to a CDN that could deal with it (where these guys deal with the distributed systems poo poo for you).
Is this a hard, customer-impacting problem that requires a distributed system expert to step in? Not really. There are simple mitigation techniques and people have been dealing with it informally for years already, with good enough results. You still have an authoritative copy of everything somewhere and can deal with poo poo.
Even if it's not really complex, it still shows some properties of distributed systems where you choose to lack consistency in favor of lowered latency, as described in PACELC. The more state needs to be replicated over many nodes, and the more operations there are over the data set, the more difficult it gets.
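For the "change the URL" trick, a minimal sketch of one way to do it. Note this swaps the hand-bumped counter from my-background.png?v54 for a content hash, which is my substitution and not something prescribed above; `versioned_url` and the /static/ path are hypothetical names for the example.

```python
import hashlib


def versioned_url(path, content, digest_chars=8):
    """Build a URL whose query string changes whenever the file's bytes change."""
    # Same bytes -> same URL on every front-end node, with no shared counter.
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    return f"{path}?v={digest}"


old_css = b"body { background: url(my-background.png); }"
new_css = b"body { background: url(my-background.png); color: #333; }"

old_url = versioned_url("/static/site.css", old_css)
new_url = versioned_url("/static/site.css", new_css)
print(old_url)
print(new_url)
```

Because the version is derived from the content, a template rendered by node A can only ever point at the exact CSS it was built against, and any stale copy sitting in a browser or per-node cache is simply a different URL that nobody asks for anymore. It doesn't fix the case where node B hasn't received the new file yet; that part is still a deploy-ordering problem.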
|
# ? Jun 18, 2013 14:01 |