Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

otoh enjoy not having to use matlab

Adbot
ADBOT LOVES YOU

coffeetable
Feb 5, 2006

TELL ME AGAIN HOW GREAT BRITAIN WOULD BE IF IT WAS RULED BY THE MERCILESS JACKBOOT OF PRINCE CHARLES

YES I DO TALK TO PLANTS ACTUALLY

dragon enthusiast posted:

enjoy your peers being unable to run or figure out your code because you didnt pick matlab
funnily enough all the other prototypes in the office are written in R, but i was strongly advised against picking it up because "it's a horrorshow and we wouldn't use it if we hadn't been using it for ten years"

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

r is better than matlab in this goon's opinion

Symbolic Butt
Mar 22, 2009

(_!_)
Buglord

dragon enthusiast posted:

enjoy your peers being unable to run or figure out your code because you didnt pick matlab

I doubt it, in my experience matlab code is usually harder to read than fortran

Symbolic Butt
Mar 22, 2009

(_!_)
Buglord

MeramJert posted:

r is better than matlab in this goon's opinion

r has nice graphs

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

just use gnu octave

Stringent
Dec 22, 2004


image text goes here

Symbolic Butt posted:

r has nice graphs

so does javascript

Soricidus
Oct 21, 2010
freedom-hating statist shill

Stringent posted:

so does javascript
congrats you thought of a lang that's less maintainable than r or matlab

Stringent
Dec 22, 2004


image text goes here

Soricidus posted:

congrats you thought of a lang that's less maintainable than r or matlab

:thejoke:

cowboy beepboop
Feb 24, 2001

use julia

pseudopresence
Mar 3, 2005

I want to get online...
I need a computer!
MononcQc and anyone else who has opinions: I'd like to get some ideas on good ways to manage exclusive access to a resource in a distributed environment, maybe how it would be done in Erlang, so I can adapt the ideas to my own work.

I'm working in a framework called ROS that does distributed messaging and launching of processes written in python or C++. It's pretty neat but it's still got plenty to learn from Erlang in my opinion (although I have yet to properly learn me some Erlang), and that's why I'm asking you.

We have a robot, and we only want one process to control the robot at a time, so there's a process that acts as a broker for controllers. Other processes can ask for exclusive access, i.e. lock a mutex, by sending a message to the broker, and they keep access until they send another message releasing the mutex. The problem is that the controller process could crash or be terminated or freeze while it has the lock; right now we deal with that by allowing the user to manually release the mutex from our GUI, but this isn't really a solution I want to keep. How would something like this be done in Erlang? Would you just use some kind of heartbeat and timeout the mutex if the process hasn't been heard from in a while?

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Fib posted:

MononcQc and anyone else who has opinions: I'd like to get some ideas on good ways to manage exclusive access to a resource in a distributed environment, maybe how it would be done in Erlang, so I can adapt the ideas to my own work.

I'm working in a framework called ROS that does distributed messaging and launching of processes written in python or C++. It's pretty neat but it's still got plenty to learn from Erlang in my opinion (although I have yet to properly learn me some Erlang), and that's why I'm asking you.

We have a robot, and we only want one process to control the robot at a time, so there's a process that acts as a broker for controllers. Other processes can ask for exclusive access, i.e. lock a mutex, by sending a message to the broker, and they keep access until they send another message releasing the mutex. The problem is that the controller process could crash or be terminated or freeze while it has the lock; right now we deal with that by allowing the user to manually release the mutex from our GUI, but this isn't really a solution I want to keep. How would something like this be done in Erlang? Would you just use some kind of heartbeat and timeout the mutex if the process hasn't been heard from in a while?

We handle this by heartbeats and supervising processes that restart the downed process, and we keep state persisted as best we can. Would be interested in hearing of better solutions

prefect
Sep 11, 2001

No one, Woodhouse.
No one.




Dead Man’s Band

Shaggar posted:

yes I correctly use punctuation.

shaggar was wrong

Mr. Glass
May 1, 2009

Fib posted:

MononcQc and anyone else who has opinions: I'd like to get some ideas on good ways to manage exclusive access to a resource in a distributed environment, maybe how it would be done in Erlang, so I can adapt the ideas to my own work.

I'm working in a framework called ROS that does distributed messaging and launching of processes written in python or C++. It's pretty neat but it's still got plenty to learn from Erlang in my opinion (although I have yet to properly learn me some Erlang), and that's why I'm asking you.

We have a robot, and we only want one process to control the robot at a time, so there's a process that acts as a broker for controllers. Other processes can ask for exclusive access, i.e. lock a mutex, by sending a message to the broker, and they keep access until they send another message releasing the mutex. The problem is that the controller process could crash or be terminated or freeze while it has the lock; right now we deal with that by allowing the user to manually release the mutex from our GUI, but this isn't really a solution I want to keep. How would something like this be done in Erlang? Would you just use some kind of heartbeat and timeout the mutex if the process hasn't been heard from in a while?

A more resilient approach would be to use a distributed consensus algorithm like Raft or Paxos to do this.

https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
http://static.googleusercontent.com/media/research.google.com/en/us/archive/chubby-osdi06.pdf

MononcQc
May 29, 2007

Fib posted:

MononcQc and anyone else who has opinions: I'd like to get some ideas on good ways to manage exclusive access to a resource in a distributed environment, maybe how it would be done in Erlang, so I can adapt the ideas to my own work.

I'm working in a framework called ROS that does distributed messaging and launching of processes written in python or C++. It's pretty neat but it's still got plenty to learn from Erlang in my opinion (although I have yet to properly learn me some Erlang), and that's why I'm asking you.

We have a robot, and we only want one process to control the robot at a time, so there's a process that acts as a broker for controllers. Other processes can ask for exclusive access, i.e. lock a mutex, by sending a message to the broker, and they keep access until they send another message releasing the mutex. The problem is that the controller process could crash or be terminated or freeze while it has the lock; right now we deal with that by allowing the user to manually release the mutex from our GUI, but this isn't really a solution I want to keep. How would something like this be done in Erlang? Would you just use some kind of heartbeat and timeout the mutex if the process hasn't been heard from in a while?

Alright, this is gonna be a long post, with pretty much 0 relationship to Erlang. First of all, there are a few things to consider:
  1. Can there be more than one type of lock possible? (Read/Write/Exclusive)
  2. Can there be more than one type of lock at the same time?
  3. Is there a need to queue locks and wait for your turn?
  4. Is your broker on the robot or on another node?
  5. Is your broker just handing over permission to talk to the robot or does it tunnel the communication for you?

To make things simple, I'm going to assume that:
  • The robot can return to a 'clean' state if disconnected. This means that any disconnection from a controller will never leave it in a state in which it is inconsistent or unable to accept new connections. No deadlock or whatever. If you get disconnected, assume the client can query the state of the robot back and do whatever it needs to do.
  • There is one type of lock possible at a time -- exclusive lock (having more is not necessarily that more complex, though it can be hairy)
  • The broker is not on the robot -- if your broker is on the robot, this can be simulated by having a single assigned master broker that tunnels your stuff to the robot, so I'm going a bit more general here.
  • Locks don't require queuing of blocked attempts -- just keep retrying (queues can often be added simply enough, but they're lovely state to carry around and I want to avoid it)
  • There are no byzantine failures (nobody is actively trying to break your poo poo, and the code is correct)
  • The broker tunnels connections

That being said, a lock manager has three big universal requirements: a request message from the client, a granted message from the manager, a release message.

Using a designated Central Broker

I'm going with a broker that tunnels connections because the other kind of stuff is immensely harder. Just have this failure scenario in mind:

1. The broker grants permission to p1
code:
        [broker]                [robot]
          /
    [p1]-'
    [p2]
2. p1 talks to the robot while the broker monitors it (indirectly or not)
code:
        [broker](p1)            [robot]
       /                        /
    [p1]-----------------------'
    [p2]
3. Any netsplit between the broker and p1, or the robot and p1 ends up having the broker not knowing the true state of the connection between p1 and robot. In this case, it becomes possible for the broker to grant access to the robot to p2.
code:
        [broker]                [robot]
       ?   /                     /
    [p1]--/-------?-------------'
    [p2]-'
To avoid that scenario, it's simpler to pipe the connection through the broker:

code:
        [broker]----------------[robot]
          /
    [p1]-'
    [p2]
Any split that happens at this point then still guarantees that only one process has access to the resources on the robot.

So this pattern is easy to enact. Just make the broker a central process with its own local lock management, and shove everything behind a TCP connection. The TCP connection (with a timeout) can deal with keeping a session alive, and the central authority will avoid having an asymetric split (Broker sees P1 as gone, but P1 still sees Broker) ever come back if you just kill it. There will be no way for a controller to keep sending orders to the robot after its lock has been forcefully removed. Killing the connection is releasing the lock, whoever does it. It does mean that lovely network may end up releasing locks more often than you need though.

It's tempting to decide to say that starting the TCP connection is your request message, that establishing it is the granted message, and killing it is the release message, but be aware that the kernel may play tricks on you. I'm thinking specifically of the TCP backlog here. The TCP backlog (if you don't know what it is) is an optimization where the kernel will decide to accept connections on your behalf, but not relay them to userland until you make manual calls to 'accept' in your code. The risk there is that the client will be able to see a 'request -> granted' sequence without your broker effectively ever seeing it, many times.

A way to fix this is to set the TCP backlog to 1, but many linux kernels have a hard-coded minimum value to 16, below which you can't go. This means that, for clients' sake, you should choose to establish a connection that makes it to userland, and have an end-to-end (part of your protocol) message to ask for a resource, and one message to grant the resource. In the case of a backlogged connection, you'll be sending a request that just won't get a response until the broker is ready to accept a connection, effectively acting as a queue.

You can still see the disconnection as an optional or non-optional release -- that one is safe. You could also have an explicit release, letting you distinguish between possibly normal or abnormal releases.

With that method, you've offshored timeout management to TCP, got your session handling ready, can do failure detection of all kinds, etc. and can enforce through your software on a central broker that only one connection is available at once.

The downside is that controlling your robot now has a central point of failure. If the broker isn't sitting on the robot itself, this sucks. We've made the whole thing sequential and that's the consequence of it.


Decentralized Locking

Decentralized locking is a poo poo term because all locking is somehow centralised. To decentralized locking and whatnot, what you get is effectively a logical equivalent to leader election. Or rather, something that would let you elect a custom broker, but if the broker doesn't share connections and keeps it to itself, you've got a lock. You can use leader election for both.

Leader election consists of having all involved processes talk to each other and pick a winner to lead whatever is going to happen next. You can see how a leader election grants an exclusive status across all nodes, and that's pretty much a lock. Coming into this territory, there's a lot that exists already in terms of algorithms:

The first one is the bully algorithm - this one kind of sucks because it assumes a synchronous system, which doesn't really happen that much in the wild. It's basically a dick-waving contest where each process sends a personal token (say their Pid or IP or whatever) and the one with the biggest token is the leader. When you decide to be elected, you broadcast your (victory, Token) everywhere, and if anyone has a higher pid, it can trigger new elections. You have to keep each process listening to broadcasts from all other and assert you're the biggest one on the block to keep being alive.

It needs to be synchronous because you need to be able to detect splits and failures and whatnot, and unless you know of all the processes in the cluster (which a tick/heartbeat to all peers can do, but can block all operations), you may run the risk of having >1 leaders, being a poo poo locking algorithm.

Then you have the Chang & Roberts Algorithm that looks a bit like the Bully algorithm, but also requires that you set all your processes in a unidirectional ring. When starting an election, you send your UID, and wait until it comes back. If any process on the ring has a bigger UID, they set themselves up as the new potential winner. When the message has gone around the ring, you've elected a leader. This algorithm sucks because it's not fault tolerant. Any process going down may stall an election. Don't use it unless you're sitting in a token ring network or something.

The HS algorithm is basically the same as Chang & Roberts, except the messages go both ways to be somewhat faster.

There's a shitload more ring-based leader election algorithm, but I tend to hate them because they suck at dealing with nodes going away short of some failure-prone ping to reconfigure. Similarly for the algorithms that tend to depend on finding a minimal spanning tree graph over the network.

The more robust models come from dealing with Quorums. Quorum systems will generally require you to know who participates in elections (i.e. all nodes know about all other nodes), or at the very least, have a value that represents any majority. Meaning that if you have 50 nodes, you don't have to know all of their names, but you know that you need at least 26 of them to agree on a value to keep going. If you ever add nodes without raising the quorums (say reach 52 nodes), then you're done, you lost the ability to elect a majority. You can set arbitrarily higher quorums (like 90%) if you want to. Be aware that setting a quorum may require either reconfiguring all nodes by hand, or going through a request in the quorum system (if you need a minimal quorum of 12 [23 nodes total], want to set it to 15, but set it to 11, you just broke your ability to set a new quorum reliably ever again!)

There are three main players there:
  • Paxos
  • ZAB (Zookeeper Atomic Broadcast), aka "just use zookeeper"
  • Raft

Paxos is the canonical quorum system. The paper is notoriously hard to understand, and Google took a long time to get it right with Chubby. It was proven to be optimal in most cases in terms of messages and roundtrips required, providing an upper-bound for good quorum. The gist of it is that you know it exists, and maybe use an existing solution, but don't write your own.

ZAB is a fancypants protocol behind Zookeeper, which has been proven to be pretty solid. Zookeeper assumes a small-ish set of nodes with TCP connections available to them. By using this (and fancy epochs and maths), it's able to set up the equivalent of a synchronous network that can tolerate a minority of failures (like all quorum systems). You can use a Zookeeper cluster to directly manage your locks, if you want. That might be the best practical option here.

Raft is the answer to Paxos' complexity. It's a leader election algorithm (a very basic one, probabilistic) and a protocol to replicate and commit data. It's still fairly young, and implementations like etcd, for the longest time, had not read the paper properly, which left them with plenty of cases where poo poo was inconsistent. It's my crowning achievement in distributed systems that I figured that one out in an internal e-mail thread at work 2 months before aphyr proved it (the thing is they used an internal master log to do fast-reads, which broke quorums in case where netsplits caused multiple leaders to coexist, a big no-no). It's more solid now, and could be worth trying.

Failure mode of Leader Election

The challenge with any leader election comes with specific netsplits and the timing required. Most leader election algorithms have a large-ish weakness in that it's possible that more than one process believe they are the leader. However, as long as followers are required to validate the leader's actions, processes that wrongly believe they are leaders won't cause any major issue (equivalent to the bug in etcd).

In our case though, because the robot is not part of that quorum mechanism, it cannot know if it's getting messaged by the true leader or not.

This means that every time you want to acquire a lock, you need to do a leader election, and the robot has to somehow be able to say to a beaten leader (who doesn't know about it) "move out jerk, you've been beaten by another leader". The best way to figure out a leader was beaten is often to use an 'epoch'. When the system starts, the session is at 0. Every time a lock is acquired, raise it by 1. Whenever the robot receives a session with a lock strictly greater than the one already established, it can terminate the older one (a new global lock was acquired). Whenever it's smaller or equal, you know there's a split somewhere. Make sure increasing the epoch requires a majority and you should be set.

Hopefully this helps.

MononcQc fucked around with this message at 14:37 on Aug 13, 2014

Bloody
Mar 3, 2013


please

i want julia to flourish

its so much better than every alternative

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

Bloody posted:

please

i want julia to flourish

its so much better than every alternative

lol 1-based indexing

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

r and matlab have 1 based indexing too iirc

Bloody
Mar 3, 2013

yup

Mr. Glass
May 1, 2009

this is a good post. thanks for the zab link, somehow i wasn't familiar with that one already

Stringent
Dec 22, 2004


image text goes here

i feel like i should have paid (more) to read this

Shaggar
Apr 26, 2006
lmao @ all you terrible typers not using double spaces. lol

Stringent
Dec 22, 2004


image text goes here

Shaggar posted:

lmao @ all you terrible typers not using double spaces. lol

I have tabs set to two spaces anyhow so I just use tab.

MononcQc
May 29, 2007

Shaggar posted:

lmao @ all you terrible typers not using double spaces. lol

Somehwat wrong. Two-space rule got more adoption because typewriters and some typesetters had a poo poo time doing anything but monospace, which made the text look wonky.

Except that YOSPOS uses a monospace font so hey, in this case it's okay-ish to double-space.

Tiny Bug Child
Sep 11, 2004

Avoid Symmetry, Allow Complexity, Introduce Terror

Shaggar posted:

lmao @ all you terrible typers not using double spaces. lol

shaggar was completely, objectively wrong

Shaggar
Apr 26, 2006
one space over 2 is wrong as hell and the only idiots who support it probably also use spaces instead of tabs and non-allman style bracing. they're just wrong and it doesn't matter if wrong people can write a blog, they're still wrong.

Tiny Bug Child
Sep 11, 2004

Avoid Symmetry, Allow Complexity, Introduce Terror
nope

MononcQc
May 29, 2007

The one-space rule is a thing both the Chicago Manual of Style and the AP Stylebook agree on. This is a pretty open-and-shut case.

Shaggar
Apr 26, 2006
do y'all not use monospaced fonts in ur code? that is some shameful poo poo.

also style guides are largely a joke made at the expense of high school students

Dicky B
Mar 23, 2004

the only reason to double space is to show everybody what a dork you are. see also: oxford comma

Shaggar
Apr 26, 2006
if you don't use the oxford comma you're an idiot

Shaggar
Apr 26, 2006
I mean why even use commas at all if you aren't gonna use them consistently

Tiny Bug Child
Sep 11, 2004

Avoid Symmetry, Allow Complexity, Introduce Terror
oxford comma is correct because commas go where you pause speaking and you pause between y and and when you say "x, y, and z"

Dicky B
Mar 23, 2004

no i don't

the talent deficit
Dec 20, 2003

self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture





Dicky B posted:

no i don't

you probably do. almost no one says x..yandz.

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

Shaggar posted:

if you don't use the oxford comma you're an idiot

shaggar was... right?

Dicky B
Mar 23, 2004

commas don't actually represent pauses. that just what you teach children to simplify the concept

nobody pauses between list items unless they have respiratory problems

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Shaggar posted:

if you don't use the oxford comma you're an idiot

I was a big supporter of the Oxford comma but now I'm doubting myself.

Shaggar
Apr 26, 2006
the commas separate the items you are listing and to stop using commas as a separator at the end is really stupid

Adbot
ADBOT LOVES YOU

Bloody
Mar 3, 2013

Subjunctive posted:

I was a big supporter of the Oxford comma but now I'm doubting myself.

this but also

Shaggar posted:

one space over 2 is wrong as hell and the only idiots who support it probably also use spaces instead of tabs and non-allman style bracing. they're just wrong and it doesn't matter if wrong people can write a blog, they're still wrong.

these things

also


Shaggar posted:

do y'all not use monospaced fonts in ur code? that is some shameful poo poo.

also style guides are largely a joke made at the expense of high school students

yeah all those sentences i write in my code i bet you write Console. writeline("my posts are bad. - shaggar");

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply