|
Notorious b.s.d. posted:so in this system, the physical timestamp is just for laffs? The physical timestamp is the primary mechanism by which you define which update wins. It's a Last-Write-Wins mechanism. It's inspired by Google's Spanner, which uses GPS and atomic clocks to give you a very predictable uncertainty interval between your updates, and to keep it as small as possible. These require hardware and specific deployments in a data center. Hybrid clocks, on the other hand, try to give you the Spanner semantics, but without the special handrolled hardware Google can afford to use and maintain. They replace the very small inaccuracy with a bigger one bounded by NTP, but compensate with a Lamport clock that retains some causality information to reduce the number of conflicts you'd otherwise have. it's a cool software-level fix to a hardware-level lack of capacity.
|
# ? Nov 23, 2014 01:21 |
|
suffix posted:so if you have two messages with the same time and counter, they came from different servers and are unordered? That's a cool conflict. So what you do is pick a winner (any; which one may depend on your system), and then increment the logical clock. Dealing with causality will necessarily yield these sooner or later, but hybrid clocks try to limit them as much as possible. the talent deficit posted:no it's like a best effort thing. event α probably happened at time t but definitely happened before event β. the causal part (α → β) is critical but the timestamp being a wall clock time is nice to have I think it's the opposite. In hybrid clocks, the hardware clock more or less dominates. The best effort is in using a Lamport clock for causality, which is the nice-to-have whenever the hardware clock is bad. That's why you reset the Lamport clock, but never reset the timestamp. the talent deficit posted:it doesn't really do this i don't think. not without some kind of synchronization on top (paxos or some kind of write throttling that prevents writes that fall outside some tolerance bracket). i mean, they're consistent in the sense they all make some sort of sense but they don't necessarily reflect reality. different nodes can have wildly different skew. you only get happened-before relationships on a single node It does this because it assumes NTP keeps a bound on the skew, and the paper mentions ignoring updates that are too far ahead in time, since they have to be an error. The synchronization is otherwise implicit in every clock (which is why you switch from the timestamp to the Lamport clock when the other one is more advanced, to keep things monotonic). It will not necessarily reflect reality, but it will be a best effort to reduce the divergence.
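For concreteness, here's a minimal sketch of the hybrid clock rules being discussed (following the Kulkarni/Demirbas HLC paper): the physical component dominates, and the logical counter only ticks to break ties, resetting whenever physical time advances. Class and method names here are mine, not from the paper.

```python
import time

class HLC:
    """Hybrid logical clock: (physical, counter) pairs, compared lexicographically."""

    def __init__(self, now=time.time):
        self.now = now  # physical clock source, injectable for testing
        self.l = 0.0    # highest physical component seen so far
        self.c = 0      # Lamport-style counter, breaks ties when l stalls

    def send(self):
        """Timestamp a local or outgoing event."""
        pt = self.now()
        if pt > self.l:
            self.l, self.c = pt, 0  # physical clock advanced: counter resets
        else:
            self.c += 1             # clock stalled or behind: counter carries causality
        return (self.l, self.c)

    def recv(self, m_l, m_c):
        """Merge a remote timestamp (m_l, m_c), keeping local time monotonic."""
        pt = self.now()
        if pt > self.l and pt > m_l:
            self.l, self.c = pt, 0
        elif m_l > self.l:
            self.l, self.c = m_l, m_c + 1  # remote clock is ahead: adopt it
        elif self.l > m_l:
            self.c += 1
        else:                              # same physical component: max the counters
            self.c = max(self.c, m_c) + 1
        return (self.l, self.c)
```

Note the counter resets in `send` but the physical component never goes backwards, which is the "reset the lamport clock, never reset the timestamp" behavior above.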
|
# ? Nov 23, 2014 01:25 |
|
the talent deficit posted:paxos is the only linearizable algorithm in common use. i know etcd and consul (and maybe something else i'm forgetting) are using raft but they are both very new. spanner and zookeeper aren't strictly paxos (spanner uses crazy gps-synchronized atomic clocks to skip some steps and zookeeper requires your transport make a bunch of guarantees to do the same) but they are based on paxos Zookeeper isn't a quorum system so to speak, but the result is the same in that there's a leader election and every write has to go through the master in a well-defined order. They are equivalent systems. Same with Raft. To be exact, Paxos only gives you linearization (with a guarantee of progress) when you first use it to elect a leader, and the leader thereafter becomes your central point of decision-making, which needs to be backed by a majority to ensure the information can't be clobbered or lost. Without the leader you lose the guarantee of progress. the talent deficit posted:the sacrifice crdts make over paxos-based systems is linearizability. they're serialized but not linear. if you apply the entirety of some set α you get result β but there's no guarantee that applying a partial set of α is consistent with any other partial set of α. in practice that means there's no read consistency except when you can apply some total ordering, ie, linearize. single nodes get read consistency but reading from two different nodes can give completely different results. two nodes may start in the same state and end in the same state in a serialized system but their intermediate states could be completely divergent CRDTs aren't serialized, because during a netsplit you will have inconsistent views that may never represent a view you could have serialized on a regular system. ACID transactions are serializable but not necessarily linearizable as a guarantee, because replaying the same requests in the same order could end up applied differently on a single system.
CRDTs provide 'strong eventual consistency', meaning they're still eventually consistent, but will never have conflicts. the talent deficit posted:in an op-based crdt (cmrdt) you can establish a causal relationship (i mean 'happened before') on events from a single node. that lets you relax commutativity because you can control the order of ops. you still need associativity for the reason you state (other nodes can essentially insert operations between yours). cmrdts are really just a specialization of crdts in which you can make the causal guarantee
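As an illustration of that 'strong eventual consistency' point, here's a sketch of the simplest state-based CRDT, a grow-only counter. The class is hypothetical, but the merge (pointwise max) is the standard construction: because merge is commutative, associative, and idempotent, replicas converge without conflicts no matter what order states arrive in.

```python
class GCounter:
    """Grow-only counter CRDT: one tally per replica, merged by pointwise max."""

    def __init__(self, node: str):
        self.node = node
        self.counts: dict[str, int] = {}

    def incr(self, n: int = 1):
        self.counts[self.node] = self.counts.get(self.node, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        # commutative, associative, idempotent: delivery order can't matter
        merged = GCounter(self.node)
        for k in set(self.counts) | set(other.counts):
            merged.counts[k] = max(self.counts.get(k, 0), other.counts.get(k, 0))
        return merged
```

During a netsplit each side keeps incrementing its own slot; when the split heals, merging in either direction (or repeatedly) yields the same total, which is exactly the no-conflict guarantee.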
|
# ? Nov 23, 2014 01:31 |
|
MononcQc posted:I think it's the opposite. In hybrid clocks, the hardware clock more or less dominates. The best effort is in using a lamport clock for causality, which is the nice to have whenever the hardware clock is bad. That's why you reset the lamport clock, but never reset the timestamp. right, but it can reset the lamport clock because the physical clock has incremented. it still guarantees α → β MononcQc posted:It does this because it assumes NTP keeps a bound on the skew, and the paper mentions ignoring updates that are too far ahead in time they have to be an error. The synchronization is implicit otherwise to every clock (which is why you switch from the timestamp to the lamport clock when the other one is more advanced, to make things monotonic) right, that's what i meant by throttling writes that are outside whatever your error bound is. how are you synchronizing clocks such that an event generated on one node happened physically before an event on another node? you get a consistent logical ordering but not necessarily an accurate one
|
# ? Nov 23, 2014 02:01 |
|
the talent deficit posted:how are you synchronizing clocks such that an event generated on one node happened physically before an event on another node? you get a consistent logical ordering but not necessarily an accurate one Well, again, NTP limits the uncertainty there. In most practical cases, my guess is the regular clock will be good enough. Whenever it conflicts, you can only ever get the 'happened before' relationship if the evidence is hard (you saw your peer's update before you performed yours, which the lamport clock ensures). It's true there's a risk that you will lose some updates or erroneously get them in the wrong order. That's when neither your causal nor your physical clock can be used to figure things out. That's a risk, and that's where the hybrid clock does a best effort, but not much more.
|
# ? Nov 23, 2014 02:34 |
|
lmao there is a for real programming language called "coq"
|
# ? Nov 23, 2014 03:01 |
|
akadajet posted:lmao there is a for real programming language called "coq" the (french) guy named it specifically to piss off english speakers
|
# ? Nov 23, 2014 03:04 |
|
every year the first 3 lectures of the course on it are basically non-stop giggling. from 20 year olds.
|
# ? Nov 23, 2014 03:04 |
|
yeah, i'm giggling at youtube videos right now of guys talking about how long they've been using coq and how it helps them with their work
|
# ? Nov 23, 2014 03:07 |
|
why would anyone use django's html template language? python code that outputs an html string: fast to write, easy to debug, very powerful. html template languages: inferior in every way.
|
# ? Nov 23, 2014 04:00 |
|
Ator posted:why would anyone use django's html template language? mainly b/c operating on strings is terrible. tho i have recently been using react.js, which uses an XML-like DSL compiled to function calls for markup, meaning you can do fun stuff like code:
also the only good templating language is jinja2 (and mozilla's javascript port nunjucks)
|
# ? Nov 23, 2014 04:11 |
|
suffix posted:nope. tried with 128 bit keys and still getting ridiculous distributions like consistent hashing is terrible, that's why we use vnodes
|
# ? Nov 23, 2014 13:22 |
|
suffix posted:nope. tried with 128 bit keys and still getting ridiculous distributions like my results say the opposite:
consistent hashing (without vnodes) has a crap distribution
rendezvous hashing (without vnodes) has a pretty good distribution
kademlia has a better distribution than consistent hashing, but not as good as rendezvous hashing, which is what i'd expect
|
# ? Nov 23, 2014 14:20 |
|
what i'm actually finding is rendezvous hashing is really, really good but using hash(a,b) as hash(a) ^ hash(b) is not as nice as i'd hoped
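A quick sketch of why hash(a) ^ hash(b) disappoints as a stand-in for hash(a,b): XOR is symmetric and self-cancelling, so distinct pairs collide in structured ways. Function names here are made up, and sha256 stands in for whatever 128-bit hash was actually used.

```python
import hashlib

def h(x: bytes) -> int:
    # 64-bit truncation of sha256, standing in for the real hash
    return int.from_bytes(hashlib.sha256(x).digest()[:8], "big")

def xor_combine(a: bytes, b: bytes) -> int:
    return h(a) ^ h(b)

def concat_combine(a: bytes, b: bytes) -> int:
    return h(a + b"|" + b)  # delimiter keeps (a, b) pairs from aliasing

# symmetric: the pair (key, node) hashes the same as (node, key)
assert xor_combine(b"key", b"node") == xor_combine(b"node", b"key")
# self-cancelling: equal inputs always collapse to 0
assert xor_combine(b"x", b"x") == 0
```

Hashing the concatenation avoids both degeneracies, at the cost of one full hash per (key, node) pair instead of reusing precomputed per-input hashes.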
|
# ? Nov 23, 2014 15:54 |
|
trying now with vnodes:
rendezvous hashing works amazingly, every time, even with small numbers of workers
kademlia works ok-ish, but adding vnodes you can clamp variance between nodes to something ok
consistent hashing works, but less well than kademlia most of the time. adding vnodes helps, but kademlia with the same amount of nodes usually performs better
it seems that rendezvous hashing is loving ace, far better than i thought. i haven't tried with rebalancing, but my gut tells me that i'll see similar results: rendezvous hashing gets the best result, kademlia isn't as good, better with vnodes, and consistent hashing does badly by comparison
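A small sketch reproducing this kind of experiment under assumed parameters (8 workers, 10k keys, sha256 as the hash): measure how uneven the key spread is, as the ratio of the most- to least-loaded node, for rendezvous hashing versus a hash ring with and without vnodes. All names are illustrative.

```python
import bisect
import hashlib
from collections import Counter

def h(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

def rendezvous_owner(key: str, nodes: list[str]) -> str:
    return max(nodes, key=lambda n: h(f"{key}|{n}"))

def build_ring(nodes: list[str], vnodes: int = 1):
    """Place vnodes points per node on the ring, sorted by hash position."""
    ring = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
    return [p for p, _ in ring], [n for _, n in ring]

def ring_owner(key: str, points, owners) -> str:
    i = bisect.bisect_left(points, h(key)) % len(points)  # wrap past the end
    return owners[i]

def spread(owner_of, keys) -> float:
    counts = Counter(owner_of(k) for k in keys)
    return max(counts.values()) / min(counts.values())  # 1.0 is perfectly even

nodes = [f"node{i}" for i in range(8)]
keys = [f"key{i}" for i in range(10000)]

rv = spread(lambda k: rendezvous_owner(k, nodes), keys)
p1, o1 = build_ring(nodes, vnodes=1)
r1 = spread(lambda k: ring_owner(k, p1, o1), keys)
p64, o64 = build_ring(nodes, vnodes=64)
r64 = spread(lambda k: ring_owner(k, p64, o64), keys)
print(f"rendezvous {rv:.2f}, ring {r1:.2f}, ring with 64 vnodes {r64:.2f}")
```

With a decent hash the rendezvous spread sits near 1, the bare ring is far worse, and vnodes pull the ring back toward even, which matches the results described above.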
|
# ? Nov 23, 2014 16:08 |
|
the thing is, i should be able to implement kademlia as fast as consistent hashing, and i think there will be a point where rendezvous is too slow and kademlia is good enough tef fucked around with this message at 16:52 on Nov 23, 2014 |
# ? Nov 23, 2014 16:09 |
|
If it were possible to have a crush on algorithms, Rendez-Vous hashing would be pretty drat near the top of my list.
|
# ? Nov 23, 2014 17:32 |
|
rendezvous hashing is perfect. with a good hash function each key gets an independent endpoint ordering consistent hashing needs a silly number of hashes to get good distribution kademlia does seem to perform slightly better if you add more hashes, but i think they're pretty similar. they both hash both the endpoints and the key and try to find the closest one. you could look at kademlia as a different way to divide the circle. is there a tradeoff here with behavior on node failure? we have some services that use consistent hashing but make the hash ring for each request, getting the worst of both worlds. i think i'll change those to rendezvous hashing code:
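The 'independent endpoint ordering' point, sketched (helper names are mine): each key ranks every node by hash(key, node), the top node owns the key, and the rest of the ordering is that key's failover list.

```python
import hashlib

def weight(key: str, node: str) -> int:
    # hypothetical weight function: hash key and node together
    return int.from_bytes(hashlib.sha256(f"{key}|{node}".encode()).digest()[:8], "big")

def owner(key: str, nodes: list[str]) -> str:
    return max(nodes, key=lambda n: weight(key, n))

def ranked(key: str, nodes: list[str]) -> list[str]:
    """Per-key failover order: primary first, then backups."""
    return sorted(nodes, key=lambda n: weight(key, n), reverse=True)
```

If the primary dies, `ranked(key, nodes)[1]` is exactly who `owner` picks from the survivors, and keys owned by other nodes don't move at all, which is the node-failure behavior being asked about.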
|
# ? Nov 23, 2014 18:31 |
|
The downside of rendezvous is that it costs more for each single call, so it makes sense to use it specifically when its objective is to save a lot more time than it costs by using it, i.e. cache dispatching and whatnot, and when you expect frequent variation in the nodes of your cluster with a need for low displacement. If you have a set of dedicated servers that you will grow or shrink once in a blue moon and where your objective is to find info (and the lookup is purely a cost, not a cost-saving measure), consistent hashing remains fairly great and has a lower cost. I'm not too aware of how Kademlia compares in either case; might be a compromise?
|
# ? Nov 23, 2014 19:39 |
|
rendezvous hashing is good if
- your server size is small
- you're not looking for very very low latency
- you want excellent balancing on addition/removal of nodes
- you have global knowledge of all workers

kademlia is I think a better choice than chord, because it does a bit better at balancing. it takes the same storage requirements + runtime as consistent hashing, and can be just as fast. it can also deal with partial knowledge too
|
# ? Nov 23, 2014 19:46 |
|
3 of those requirements seem pretty terrible. And 4 begets 1 so it's just 2 requirements i guess.
|
# ? Nov 23, 2014 20:25 |
|
The DS (Distributed Systems) thread: I'm glad Erlang is the current hipste
|
# ? Nov 23, 2014 20:36 |
|
MeruFM posted:3 of those requirements seem pretty terrible. And 4 begets 1 so it's just 2 requirements i guess. Well, let's see.

1. your server size is small
2. you're not looking for very very low latency
3. you want excellent balancing on addition/removal of nodes
4. you have global knowledge of all workers

1. Technically, the definition of 'small' depends on 2 and 4; rendezvous hashing is an O(n) algorithm requiring you to hash a key with all the servers in your set to find the best match. Whether you can do this or not will depend on how long you can take to hash, and how easy it is to know your server list.

2. Depending on the hash algorithm and language, the cost will be x*n, where x is the cost of hashing a message and n is the number of servers that are candidates to be routed to. If you need 5µs to hash once and have 150 servers, choosing where to send the data is going to take ~750µs. If using rendezvous hashing lets you save 30ms once in a while due to good data locality even through failures, it might be a winning proposition in order to shrink your long tail (p99 or p999) and get more predictable results.

3. This one is okay. If nodes come and go, the network is unreliable, or whatever, rendezvous hashing lets you pick primaries and failovers for a given key on the go. Adding or removing nodes from the cluster permanently guarantees the least disruption possible: 1/|nodes| of the keys move on average for each addition or removal of a node, the theoretical minimum, which consistent hashing matches on average but distributes very unevenly unless you use vnodes.

4. This fully depends on how dynamic your cluster is. Most people, for example, may have an idea of what machines they are running.
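Point 3's 1/|nodes| figure is easy to check empirically; a sketch with made-up node and key names:

```python
import hashlib

def h(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

def rendezvous_owner(key: str, nodes: list[str]) -> str:
    return max(nodes, key=lambda n: h(f"{key}|{n}"))

def moved_fraction(nodes, keys, leaver) -> float:
    """Fraction of keys whose owner changes when `leaver` goes away."""
    survivors = [n for n in nodes if n != leaver]
    before = {k: rendezvous_owner(k, nodes) for k in keys}
    return sum(before[k] != rendezvous_owner(k, survivors) for k in keys) / len(keys)

nodes = [f"node{i}" for i in range(10)]
keys = [f"key{i}" for i in range(2000)]
# only the leaver's own keys get reassigned, so this lands near 1/10
print(moved_fraction(nodes, keys, "node3"))
```

The moved fraction is exactly the fraction of keys the departed node owned, because every other key's highest-weight survivor is unchanged.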
|
# ? Nov 23, 2014 20:51 |
|
tef posted:rendezvous hashing is good if This is like 99.9% of poo poo inside a webserver DC, where who gives a gently caress about an extra hash call or something
|
# ? Nov 23, 2014 21:05 |
|
MononcQc posted:Well, let's see. lets say hashing takes 5us. 1k instances, going up and down between 600-1100 as demand goes up and down. so: 5ms. network latency is 10s-100s of ms, who gives a gently caress. if u tryin to route packets and trades and poo poo, sure
|
# ? Nov 23, 2014 21:08 |
|
Also the number of installs that have thousands of servers and no way to somehow filter or cull the total amount to find a candidate is fairly low. Still interesting from a theoretical point of view though.
|
# ? Nov 23, 2014 21:19 |
|
fwiw anyone interested may want to watch this http://vimeo.com/52446728 and maybe follow with this https://www.youtube.com/watch?v=ggCffvKEJmQ
|
# ? Nov 23, 2014 21:28 |
|
MononcQc posted:Also the number of installs that have thousands of servers and no way to somehow filter or cull the total amount to find a candidate is fairly low. Still interesting from a theoretical point of view though. i guess u could do it hierarchically
|
# ? Nov 24, 2014 03:20 |
|
Malcolm XML posted:lets say hashing takes 5us. 1k instances, goes up and down between 600-1100 as demand goes up and down so: 5ms. Network latency is 10s-100s of ms, who gives a gently caress Or if you make more than one call to the hash table per operation. I've seen systems make hundreds. Terrible, terrible systems.
|
# ? Nov 24, 2014 13:14 |
|
tef posted:fwiw anyone interested may want to watch this Reading the paper now. They've implemented it as a Ruby DSL
|
# ? Nov 24, 2014 17:33 |
|
i rather think that that is the legitimate use of ruby though. works out basically like tcl, but with worse libraries
|
# ? Nov 24, 2014 18:36 |
|
second talk is about Dedalus
|
# ? Nov 24, 2014 18:36 |
|
Cybernetic Vermin posted:i rather think that that is the legitimate use of ruby though. works out basically like tcl, but with worse libraries it's not a dsl if the errors are in another language, it is not a legitimate use, it is a terrible use
|
# ? Nov 24, 2014 18:37 |
|
Cybernetic Vermin posted:i rather think that that is the legitimate use of ruby though. works out basically like tcl, but with worse libraries see also: perl python
|
# ? Nov 24, 2014 20:24 |
|
so i have this crazy/dumb idea for a personal universal data network. basically what i want to do is have a bunch of agents feeding heterogeneous data streams into the system, then a bunch of internal agents transforming that data into other streams, and then some me-facing endpoints that these streams get muxed into. think something like an extremely chatty cattebot on an irc channel of one, where not only can i ask it about the weather but it can also tell me at random that a web server i run got 10 hits in the last minute or display the last 3 posts in the bitcoin thread. everything that i personally find "interesting" and can collapse into semi-structured data could be ingested into the system and eventually be formatted for ADD consumption in a box that scrolls by

i want to be able to write the agents in any language (so i could write the agent that consumes that web server log file in ruby but have internal agents written in scala, clojure, f#) and eventually i want the screen that is showing me this stream to be something like a REPL where i can bang out a small function and have it be turned into an agent - say i decided that i want to know about a specific IP hitting my web server, i could write a function that filters that log stream, and then turn it into an agent that would sometimes barf data at my screen, all without having to ssh anywhere

i dont need persistent storage of this data to start with but my concept of this being a signal-processing chain means that i could add it later with just another agent

anyway i am kind of stuck on what basic tech to use to hook these things together. my current "leader" for this is zeromq because it looks like it gives me several building blocks to construct my own network topology and has bindings for pretty much every language i care about. it also works like sockets which i already have a good mental model of. does anyone have any horror stories about zmq that would make it unsuitable for this? any suggestions of what would be a better choice? i can tell that i'm probably getting hung up on some pretty meaningless choices here but i'm trying to avoid reinventing too many wheels. also feel free to tell me that my project is dumb and ten thousand security incidents waiting to happen
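Not zmq advice, but the topology itself is easy to prototype; here's an in-process sketch with stdlib queues standing in for sockets (with pyzmq each hop would become a PUB/SUB or PUSH/PULL pair). All agent and topic names are invented.

```python
from queue import Queue

class Broker:
    """Topic fan-out: each subscriber gets its own queue per topic."""

    def __init__(self):
        self.subs = {}  # topic -> list of subscriber queues

    def subscribe(self, topic: str) -> Queue:
        q = Queue()
        self.subs.setdefault(topic, []).append(q)
        return q

    def publish(self, topic: str, msg):
        for q in self.subs.get(topic, []):
            q.put(msg)

broker = Broker()

# ingesting agents drop raw hits on "weblog.raw"; this transforming agent
# collapses them into a human-readable summary stream
hits = broker.subscribe("weblog.raw")

def count_agent():
    n = 0
    while not hits.empty():
        hits.get()
        n += 1
    broker.publish("weblog.summary", f"{n} hits in the last minute")

# me-facing endpoint: the scrolling box subscribes to the summary stream
screen = broker.subscribe("weblog.summary")

for _ in range(10):
    broker.publish("weblog.raw", {"ip": "203.0.113.9", "path": "/"})
count_agent()
print(screen.get())  # prints: 10 hits in the last minute
```

The nice property is that adding persistence later really is just another agent: subscribe it to whatever streams you care about and have it write them down.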
|
# ? Nov 24, 2014 20:30 |
|
ill start the wiki?
|
# ? Nov 24, 2014 20:30 |
|
https://hubot.github.com and write your own plugins or maybe go outside and get some fresh air imo
|
# ? Nov 24, 2014 20:31 |
|
power botton posted:https://hubot.github.com and write your own plugins or maybe go outside and get some fresh air imo quote:Today's version of Hubot is open source, written in CoffeeScript on Node.js, *record scratch*
|
# ? Nov 24, 2014 20:32 |
|
if you're gonna do dumb poo poo like reinvent spotlight/alfred plugins for fucktarded ideas you deserve all the pain you get
|
# ? Nov 24, 2014 20:34 |
|
power botton posted:if you're gonna do dumb poo poo like reinvent spotlight/alfred plugins for fucktarded ideas you deserve all the pain you get fair, but spotlight/Alfred only search my local machine, and they are pull only; I can't see my web server data on my desktop without some serious work, and then I still have to do yet more work to get it on my iPhone or my work machine. I will freely admit that part of the inspiration for this comes from my childish desire to have a window with scrolling text/graphics on all of my computing devices, except it should show me interesting (to me) data instead of fake placeholders. the ability to write code in a box and have it run on a machine in my cluster without any more effort needed than "type the code" is sort of important too
|
# ? Nov 24, 2014 20:40 |