|
i never found the whole "what color are your functions" argument compelling because the await keyword is a pretty handy big syntactical neon sign saying "hey the entire world might change under your feet here" and having that called out in concurrent code is beneficial. the same is technically true of function calls yes but when a function call has random-rear end side effects on things that aren't its arguments then that is considered impolite. when you await and your shared state is mutated under your feet then that's kind of the entire point.
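a minimal sketch of that point in python's asyncio (names here are illustrative, not from anyone's actual code): the read-modify-write below is safe precisely because there's no await inside it, and the one suspension point is visible in the source.

```python
import asyncio

counter = {"value": 0}  # shared state touched by both tasks

async def bump(n):
    for _ in range(n):
        before = counter["value"]
        # no await between the read and the write: this read-modify-write
        # can't be interleaved, because control only yields at awaits
        counter["value"] = before + 1
        await asyncio.sleep(0)  # the neon sign: the world may change here

async def main():
    await asyncio.gather(bump(100), bump(100))

asyncio.run(main())
print(counter["value"])  # → 200, deterministically, on a single-threaded loop
```

move the await between the read and the write and the same code becomes a lost-update bug — which is exactly the "called out in concurrent code" benefit.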
|
# ? Sep 10, 2023 19:26 |
|
|
# ? May 27, 2024 03:25 |
crazypenguin posted:everyone memes about "concurrency vs parallelism" but that's the actual answer. Thanks for this answer; it was really helpful. To follow up on the quoted bit: I would have guessed that concurrent cases could be handled by crossbeam's queues / channels / etc., where you could have one thread listening for events and then send them to worker threads for processing via queues or channels. Under what sort of circumstances would that break down but async would work? I could imagine a problem if you have many types of heterogeneous tasks to handle, but in that case I would guess that a threadpool would work? Sorry if this is a dumb / obvious question; I've never really needed to use async and I'm trying to understand the use cases
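A sketch of the pattern the question describes, in python stdlib terms rather than crossbeam (the names and numbers are made up for illustration): one queue feeding a small pool of worker threads. This works fine; the breakdown the thread goes on to discuss is about scale — every blocked worker pins a whole OS thread — rather than about heterogeneous task types.

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    # each worker blocks a whole OS thread while waiting on the queue;
    # cheap at 4 workers, expensive at tens of thousands of waiters
    while True:
        item = tasks.get()
        if item is None:  # sentinel: shut this worker down
            break
        results.put(item * item)

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

for n in range(10):          # the "listening" side feeds work in
    tasks.put(n)
for _ in workers:            # one sentinel per worker
    tasks.put(None)
for w in workers:
    w.join()

total = sum(results.get() for _ in range(10))
print(total)  # → 285 (0² + 1² + … + 9²)
```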
|
|
# ? Sep 10, 2023 19:36 |
|
In general, I am skeptical of the amount of complexity hidden in the runtime for green threads. Lurking under the hood is an unholy monster of a runtime system that most people have no idea is there, because it tries to hide all the problems, and if you ever run afoul of it, god help you. Fun fact: The Go runtime has this heuristic thing where it tries to detect if a thread is blocked and moves it out of the main thread pool into a background one, and spawns a new OS thread for the main thread pool, to adapt to blocking code that might cause its level of concurrency to drop. The Java developers proposed green threads seemingly blithely unaware of any of the problems that can occur with a green threading runtime system. I don't dispute it's probably the right choice for Java, but I have a bad feeling about the first few major production deploys using them. Last I checked (admittedly 2 years ago), they didn't have any of the mechanisms for handling blocked threads like Erlang or Go have, and their proposal doc was just "haha, this will never be a problem" completely straightfaced. Ask your operators about "exciting new failure modes" today!
|
# ? Sep 10, 2023 19:36 |
|
Sapozhnik posted:i never found the whole "what color are your functions" argument compelling because the await keyword is a pretty handy big syntactical neon sign saying "hey the entire world might change under your feet here" and having that called out in concurrent code is beneficial. yeah, i have zero idea why you'd be upset by your syntax having something calling out "at this point, control flow gets taken over by your event loop"
|
# ? Sep 10, 2023 19:40 |
|
Cybernetic Vermin posted:......i have no idea what color threads that was, but os support for green threads seems an oxymoron the reason why green threads always get abandoned is because you need your OS kernel to upcall into your green thread scheduler every time a real thread would block, and the OS support for this is less than stellar
|
# ? Sep 10, 2023 19:44 |
|
I don’t know what a green thread is, but I do know about async await in c#
|
# ? Sep 10, 2023 19:54 |
|
and that the joinabletaskfactory is a magic pile of code that should be in the framework but isn’t
|
# ? Sep 10, 2023 19:54 |
|
i think it’s more typical to make all of your core blocking apis use non-blocking syscalls under the hood than to get callbacks when a syscall blocks, but of course that still does rely on a bunch of os non-blocking support being baked, comprehensive, and performant. anyway, there’s a bunch of interesting issues going on with green threads and how much sense they make in any particular environment
|
# ? Sep 10, 2023 19:57 |
|
the problem with that is that even if the OS has a complete set of non-blocking versions of their syscalls (and they don't), it becomes the programmer's responsibility to never use any of the blocking variants otherwise everything falls apart
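the failure mode in asyncio terms, as a small sketch (timings are illustrative): one accidentally-blocking call starves every other coroutine on the loop, because the loop only regains control at awaits.

```python
import asyncio
import time

start = time.monotonic()
timeline = {}

async def accidentally_blocking():
    time.sleep(0.2)  # blocking variant: monopolizes the loop's one thread

async def well_behaved():
    # this coroutine has no work to do, but it can't even start until
    # the blocking call above hands control back to the event loop
    timeline["resumed_after"] = time.monotonic() - start

async def main():
    await asyncio.gather(accidentally_blocking(), well_behaved())

asyncio.run(main())
print(timeline["resumed_after"] >= 0.2)  # → True
```

swapping `time.sleep(0.2)` for `await asyncio.sleep(0.2)` lets the second coroutine run immediately — the "use the non-blocking variant or everything falls apart" rule in miniature.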
|
# ? Sep 10, 2023 20:05 |
|
yeah, sort of. it doesn't actually all fall apart, because typically blocking syscalls aren't blocking on other work that's going to be done by the current process. but you do potentially waste a lot of time
|
# ? Sep 10, 2023 20:10 |
|
Subjunctive posted:cooperative multitasking is a lot easier to reason about than preemption. you don’t have to worry about flaky data races due to unexpected preemption points, because your preemption points are all explicit, if not necessarily syntactic it's easier to manage when you have like maybe 10 coroutines, but when you have thousands it's easy to lock up the entire system. i do recommend people use "one big lock" when it comes to concurrency for very similar reasons
|
# ? Sep 10, 2023 20:34 |
|
crazypenguin posted:The Java developers proposed green threads seemingly blithely unaware of any of the problems that can occur with a green threading runtime system. the first JVM I ever hacked on (trying to make two GCs talk to each other for LiveConnect) had green thread support, which is what it used for threading on Win16. so they just need to ask whoever was working on the JVM 25 years ago what a world with green threads is like, really
|
# ? Sep 10, 2023 21:43 |
|
Bloody posted:I don’t know what a green thread is, but I do know about async await in c#

so first we have to define what a non-green thread is

the absolute minimum that an os thread can be is... let's call it an "impulse" to avoid confusion. the os schedules a core to actually run instructions. if the impulse can't be pre-empted at all, then the os wouldn't really need to track anything about it, but that would be a pretty messed up abstraction for a number of reasons. so in fact there's always going to be some memory in the kernel to identify the impulse and save its register state while it's suspended. on modern architectures, this isn't actually a small amount of memory because register files are huge: avx-512 has 32 64-byte vector registers, so that's 2kb right there. if the impulse was only ever suspended at safe points, maybe not all the registers would have to be saved; but if the kernel can pre-empt the impulse at an arbitrary point (e.g. because it's been running for a long time) then you really do need all of it

but an "impulse" wouldn't really be enough to run anything except extremely, extremely specialized code because it doesn't include any local storage beyond registers. there are a lot of strategies for allocating memory for local storage, and we can talk about some of the others in a later post if people want, but the simplest is to set up a fixed-size, contiguous, and non-relocatable stack allocation, and that's what almost every c abi expects. it's contiguous because c functions allocate/deallocate stack space by just decreasing/increasing the stack pointer (stacks usually grow down in the address space). it's non-relocatable because you can take the address of a local allocation in c and store it wherever, and because there's no way for the system to reliably track down and fix up those pointers, in practice they just have to stay valid, which means the stack has to stay where it is.

that combination also means it has to be fixed-size, at least virtually — if you want the stack to be able to grow out to 1mb, the system doesn't have to immediately dedicate 1mb of memory to it, but it does have to set aside 1mb of contiguous address space that won't be used for anything else. (you might think of this as free as long as the stack doesn't actually grow that much, but in fact address space reservations can be a significant source of overhead — 1mb of address space mapped with 4kb pages is 256 page-table entries, which could take up to 16kb of wired memory, although there are some things the kernel can do to help with that.)

if you want to be able to call c functions, you need a c stack that looks like that. and unless you're very sure that the impulse is only going to do very specific things that won't need much memory, that stack allocation usually has to be pretty big or you'll be blowing out the stack for all sorts of reasonable code; 1mb is a pretty typical number here. and for safety you put a "guard page" below the bottom of the stack (a page reservation that makes sure that running off the stack traps instead of corrupting other memory)

for similar reasons, the kernel also needs local memory in order to execute syscalls, and again this is typically a contiguous c-style stack. usually kernel programmers do put in the effort to make sure that they don't need more than a few kb of kernel stack to complete any syscall, but still, you need a kernel stack. usually this gets colocated with the "impulse" data structures so that the overhead is down as close as possible to a single page per impulse (plus, again, a guard page reservation)

so a kernel thread is at least an "impulse" plus a kernel stack plus a user stack. and then different levels of os facilities often add capabilities on top of that which add extra memory requirements. like most ways of creating threads will support accessing thread-local storage, which often involves relatively large up-front allocations. windows used to offer a second kind of kernel thread ("fiber") which was supposed to be more lightweight than their normal kernel threads, but iiuc these have largely been unified with the standard implementation, such that now there's actually *extra* overhead in every thread to support them

the end result is that an absolute minimal kernel thread tends to require something on the order of 32kb of memory, and if it's doing general-purpose work, it's usually quite a bit more than that. that is a serious scalability problem — you really shouldn't write code that expects to be able to create thousands of kernel threads. and setting all of that up requires a lot of non-trivial work and back-and-forth between userspace and the kernel, so creating a new kernel thread is a relatively expensive operation. so all that gives rise to this idea of maybe re-using kernel threads

rjmccall fucked around with this message at 22:00 on Sep 10, 2023 |
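as a sanity check on the arithmetic above (same numbers as the post, nothing new):

```python
# avx-512 register file: 32 vector registers of 64 bytes each
register_state = 32 * 64
print(register_state)  # → 2048 bytes, the "2kb right there"

# a 1mb stack reservation mapped with 4kb pages
pages = (1 * 1024 * 1024) // (4 * 1024)
print(pages)  # → 256 page-table entries for the reservation
```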
# ? Sep 10, 2023 21:51 |
|
been saying this
|
# ? Sep 10, 2023 21:53 |
|
its too bad programmers aren't any good at making state machines otherwise we wouldn't need any of this green threads or async/await nonsense
|
# ? Sep 10, 2023 22:08 |
|
problems with state machines:
|
# ? Sep 10, 2023 22:18 |
|
tl;dr: this is a state machine
|
# ? Sep 10, 2023 22:21 |
|
c# makes async await decisions easy: is the thing calling your statement using async/await?
y -> you should use async/await
n -> you should probably not use async/await
|
# ? Sep 10, 2023 22:30 |
|
tef posted:"we learned it from watching you" huh CMPSCI~1.TEX but also as cybernetic vermin says sometimes you actually do want to be precise in your language, either because what you're describing warrants the precision or you're trying to baffle with bullshit
|
# ? Sep 10, 2023 22:47 |
|
so if you want to re-use kernel threads, the simplest thing you can do is just move responsibility for some of that stuff into userspace. the kernel doesn’t really need to care about the user-space state associated with its kernel-level state during a suspension — it’s just going to save it and then restore it later, whatever it is. so in principle you can just switch all of that stuff in userspace whenever you want by assigning new values into all of the appropriate registers, just like the kernel would. so the kernel-level thread structures can be shared between different user-level threads (called green threads). you have a small pool of kernel threads and schedule user threads onto them when those threads don’t have anything else to do. you create a new thread by reserving pages for a stack, setting up all your user-space thread structures, and then adding the new thread to some user-space queue of things to run whenever the kernel threads are unoccupied

there are some benefits to this beyond memory impact. there’s extra overhead whenever control transfers through the kernel, mostly associated with the trust boundary between the kernel and user processes. a user process is allowed to trust itself, so it can switch significantly more efficiently than the kernel can, as long as it’s not switching for some reason that requires entering the kernel anyway. and a lot of switches might happen at safe points that can therefore get away with not saving and restoring every register, because the process isn’t worried about information leaking between its own threads. if all possible suspensions are like that, the green thread can reserve a lot less space for saving and restoring registers

on the other hand, you do have some novel problems, most importantly that progress can be blocked just because all the kernel threads are currently blocking on something, even if there are other green threads waiting to run. if blocking includes potentially waiting on other green threads to do something — e.g. to signal some condition variable — then that’s not just inefficient, it’s a potential deadlock. so it’s critically important that every primitive like that gets reimplemented in a way that knows to only suspend the green thread and not the current kernel thread

also the details of your green thread scheduler matter a lot. iiuc early versions of the jvm used green threads, but they were strongly associated with specific kernel threads (maybe even permanently tied to them?), so certain patterns of green thread creation could lead to extremely suboptimal utilization because all of the runnable green threads were on the same kernel thread

maybe most importantly, if you followed all the math in my previous post, you’ll see the memory overheads were overwhelmingly associated with the user-space context — the huge stack reservation, its corresponding huge number of page table entries, all the thread-local storage and other threading library overheads, etc. so if you don’t change anything about how local state is stored, you haven’t changed the fundamental scalability problem
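the "reimplement every blocking primitive" point shows up even in a toy cooperative scheduler, sketched here with python generators standing in for green threads (entirely illustrative, not how any real runtime does it): the consumer's wait *must* be written as "check, then yield", because a real block would stall the only underlying thread.

```python
from collections import deque

class Scheduler:
    """round-robin cooperative scheduler; 'yield' is the suspension point"""
    def __init__(self):
        self.ready = deque()

    def spawn(self, green_thread):
        self.ready.append(green_thread)

    def run(self):
        while self.ready:
            t = self.ready.popleft()
            try:
                next(t)                   # run until the next yield
                self.ready.append(t)      # still alive: back of the queue
            except StopIteration:
                pass                      # finished, drop it

log = []

def producer(box):
    for i in range(3):
        box.append(i)
        yield                             # cooperative suspension point

def consumer(box):
    taken = 0
    while taken < 3:
        if box:
            log.append(box.pop(0))
            taken += 1
        yield  # "blocked" waiting on the producer: suspend only this
               # green thread, never the kernel thread running the loop

box = []
sched = Scheduler()
sched.spawn(producer(box))
sched.spawn(consumer(box))
sched.run()
print(log)  # → [0, 1, 2]
```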
|
# ? Sep 10, 2023 23:04 |
|
Cybernetic Vermin posted:specifically cooperative green thread concurrency is also real easy to design in a way where there are no surprises. or rather the surprises all have easy-to-explain causes. erlang's stuff is preemptive and therefore objectively better in terms of representing concurrent ideas robustly.
|
# ? Sep 10, 2023 23:08 |
|
MononcQc posted:erlang's stuff is preemptive and therefore objectively better in terms of representing concurrent ideas robustly. you are right. i was going to argue that it is cooperative in that it is a vm level thing which does check in on thread switching as an accounting part of interpretation of a couple of key opcodes, *but* i think ultimately that is indeed better understood as preemption.
|
# ? Sep 10, 2023 23:18 |
|
pseudorandom name posted:its too bad programmers aren't any good at making state machines otherwise we wouldn't need any of this green threads or async/await nonsense the history of programming is the history of trying to implement a state machine without actually having to write one
|
# ? Sep 10, 2023 23:25 |
|
so the green thread implementations that want to scale to tens or hundreds of thousands of threads all also rely on changing how local storage is allocated. usually this means using some variant of segmented stacks, in which you allocate a new slab of memory whenever the current allocation runs out. this means you can safely reserve a very small amount of memory up front and grow it lazily as needed. if the language permits, you can even grow the stack like a dynamic array, relocating existing frames and keeping it contiguous. there’s a lot of variation here. i know go used to have an allocator with really bad threshold effects — it would immediately deallocate on return, so if you turned around and made a new call, it had to allocate again

(many functional languages do something really different. i don’t even want to talk about it because i hate it so much)

the huge problem with these approaches is that you can’t use your funky stack for c. this is okay if you’re just calling into c and there aren’t calls back or suspensions within c — this admits an implementation where you actually host a green thread on top of a full c thread, not just a kernel-level thread. but otherwise you basically have to dynamically find and pin a c stack to your green thread while the c call is in flight

this is part of why environments using this kind of green thread really want to reimplement basically everything natively: there’s a ton of extra overhead when interacting with other environments. so e.g. go gets a lot of performance from directly using the syscall interface on linux, but on macos they have to call c because that’s the system abi
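the go threshold effect described above can be modeled in a few lines (a toy with made-up sizes — real segmented stacks manage raw memory, not python lists): eager segment freeing means a call/return loop sitting exactly at a segment boundary pays a fresh allocation on every single call.

```python
SEGMENT_SIZE = 4  # frames per segment; tiny on purpose

class SegmentedStack:
    def __init__(self):
        self.segments = [[]]
        self.allocations = 0   # count segment allocations to expose the effect

    def push_frame(self, frame):
        if len(self.segments[-1]) == SEGMENT_SIZE:
            self.segments.append([])     # out of room: allocate a new segment
            self.allocations += 1
        self.segments[-1].append(frame)

    def pop_frame(self):
        self.segments[-1].pop()
        if not self.segments[-1] and len(self.segments) > 1:
            self.segments.pop()          # eager free: the problematic policy

stack = SegmentedStack()
for _ in range(SEGMENT_SIZE):            # fill the first segment exactly
    stack.push_frame("caller")

for _ in range(100):                     # hot loop right at the boundary
    stack.push_frame("callee")           # allocates a fresh segment...
    stack.pop_frame()                    # ...which is immediately freed

print(stack.allocations)  # → 100: one allocation per call
```

keeping a spare segment around instead of freeing eagerly would drop that to a single allocation; iirc go ultimately sidestepped the whole thing by moving to relocatable contiguous stacks, the "grow like a dynamic array" option above.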
|
# ? Sep 10, 2023 23:40 |
|
rjmccall posted:also the details of your green thread scheduler matter a lot. iiuc early versions of the jvm used green threads, but they were strongly associated with specific kernel threads (maybe even permanently tied to them?), so certain patterns of green thread creation could lead to extremely suboptimal utilization because all of the runnable green threads were on the same kernel thread reaching out way back in my memory, green threads were M kernel threads for N green threads, but there were only certain circumstances in which a green thread could switch kernel threads and I think they were tied to GC scheduling? the release notes we got from Sun every month always had a stanza about blocking operations that hadn’t been properly shunted to a kernel thread for waiting and could lock up a cohort of threads (or everything on Win16, bless that mess) but were now fixed this time for sure
|
# ? Sep 10, 2023 23:41 |
|
fwiw, swift’s async/await is very intentionally hosted on normal c threads, with async functions split into funclets that tail-call each other so that they can suspend by just returning back to the scheduler. and the async functions use a segmented stack for any state that persists across suspensions. so basically async functions can call c functions (or normal swift functions) without any special magic, but those c functions cannot directly suspend the current async thread

it is prone to the same kind of exhaustion problems as green threads if you block a thread during an async function. but that was fully understood as a consequence, and we’ve been rigorous about telling people that no, they are not allowed to block tasks on work that is not currently running, and we are not going to find a way to make their code work
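the funclet splitting can be illustrated by hand-compiling a tiny async function in python (purely a sketch of the shape, nothing like swift's actual abi): each funclet runs to the next suspension point and then plain-returns to the driver, with the persistent state in an explicit frame rather than on the stack.

```python
# hand-compiling, roughly:
#   async def f(x): a = x + 1; await suspend(); return a * 2

def funclet_entry(frame):
    frame["a"] = frame["x"] + 1          # code before the await
    return ("suspend", funclet_resume)   # suspend = just return to the driver

def funclet_resume(frame):
    return ("done", frame["a"] * 2)      # code after the await

def drive(entry, frame):
    # a trivial scheduler that resumes immediately at every suspension
    state, payload = entry(frame)
    while state == "suspend":
        state, payload = payload(frame)
    return payload

print(drive(funclet_entry, {"x": 20}))  # → 42
```

because suspension is an ordinary return, nothing below the funclet on the c stack needs capturing — which is also why a plain c function in the middle can't suspend the task.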
|
# ? Sep 10, 2023 23:58 |
|
Cybernetic Vermin posted:tl;dr: this is a state machine ok so we write all of our programs as diagrams and use ML to generate the code, easy
|
# ? Sep 11, 2023 00:02 |
|
rjmccall posted:but that was fully understood as a consequence, and we’ve been rigorous about telling people that no, they are not allowed to block tasks on work that is not currently running, and we are not going to find a way to make their code work plus "here's gcd" i guess, so people didn't have to invent the universe to run something in the background
|
# ? Sep 11, 2023 00:06 |
Sweeper posted:ok so we write all of our programs as diagrams and use ML to generate the code, easy It worked for LabVIEW!
|
|
# ? Sep 11, 2023 00:11 |
|
tef posted:plus "here's gcd" i guess, so people didn't have to invent the universe to run something in the background that’s actually complicated. gcd is required to overcommit, swift async is not. they’re serviced by the same pool of threads, so we do have to track whether a job requires overcommit or not. but the rule we use is not “overcommit if there are any pending overcommit jobs”, it’s “overcommit if any of the currently-running jobs is an overcommit job”. so if the whole pool is tied up with swift async jobs, and they’re all blocked ultimately waiting for a gcd job, the system will deadlock even though gcd is overcommit. iow we really do enforce the “no blocking on future work” rule
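the rule as described, reduced to a predicate (a simplified model with invented names, not the actual runtime logic): pending overcommit work doesn't justify growing the pool; only a *running* overcommit job does.

```python
def should_grow_pool(running_jobs, pool_size):
    # grow past the pool's nominal width only if one of the jobs
    # *currently running* is an overcommit job; queued ones don't count
    return (len(running_jobs) >= pool_size
            and any(job["overcommit"] for job in running_jobs))

pool_size = 2
swift_jobs = [{"overcommit": False}, {"overcommit": False}]
gcd_job = {"overcommit": True}

# pool full of swift async jobs, gcd job still queued: no extra thread,
# so if those jobs block waiting on the gcd job, that's the deadlock
print(should_grow_pool(swift_jobs, pool_size))                 # → False

# a gcd job actually running does justify an extra thread
print(should_grow_pool([swift_jobs[0], gcd_job], pool_size))   # → True
```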
|
# ? Sep 11, 2023 00:18 |
|
I guess priority inversion there for the scheduler would be a bigger mess, hmm.
|
# ? Sep 11, 2023 00:22 |
|
the priority system is this whole other thing
|
# ? Sep 11, 2023 00:23 |
|
Subjunctive posted:reaching out way back in my memory, green threads were M kernel threads for N green threads, but there were only certain circumstances in which a green thread could switch kernel threads and I think they were tied to GC scheduling? the release notes we got from Sun every month always had a stanza about blocking operations that hadn’t been properly shunted to a kernel thread for waiting and could lock up a cohort of threads (or everything on Win16, bless that mess) but were now fixed this time for sure hmm, this matches with what i’ve heard. i wonder what the gc thing is, maybe they just took the opportunity to redistribute threads because iirc the collector was stop-the-world anyway
|
# ? Sep 11, 2023 00:28 |
|
pseudorandom name posted:its too bad programmers aren't any good at making state machines; otherwise we wouldn't need any of these green threads or async/await nonsense I had to write a non-blocking state machine for the latest project (bare metal). For times when we need to sleep, we cycle between the states and check a stored value of when a sleep began vs. when it should end. States are added to a struct of two method pointers:
- State init
- The actual state
It works quite well and is simple and easy to follow.
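A sketch of that shape in python (the state names and timing are invented; the original is C on bare metal): each state is an (init, run) pair, and "sleeping" is a deadline comparison on every pass rather than a blocking call.

```python
import time

ctx = {"log": []}

def sleep_init(ctx):
    ctx["sleep_start"] = time.monotonic()   # record when the sleep began

def sleep_run(ctx):
    if time.monotonic() - ctx["sleep_start"] >= 0.05:
        return "work"                       # deadline reached: advance
    return None                             # not yet: stay in this state

def work_init(ctx):
    ctx["log"].append("working")

def work_run(ctx):
    return "done"

# the struct of two method pointers from the post, as a dict of pairs
states = {
    "sleep": (sleep_init, sleep_run),
    "work": (work_init, work_run),
}

current = "sleep"
states[current][0](ctx)                     # run the first state's init
while current in states:
    nxt = states[current][1](ctx)           # never blocks
    if nxt is not None:
        current = nxt
        if current in states:
            states[current][0](ctx)         # entering a new state: init it

print(ctx["log"])  # → ['working']
```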
|
# ? Sep 11, 2023 00:30 |
|
Sweeper posted:ok so we write all of our programs as diagrams and use ML to generate the code, easy https://images-ext-2.discordapp.net/external/EsUAMvLmwnWy3tyBiNgrhzvK4WrivJ0YxWQYRFgv12s/https/i.imgur.com/lkmdL5U.mp4
|
# ? Sep 11, 2023 00:58 |
|
c bad, got it
|
# ? Sep 11, 2023 02:40 |
|
redleader posted:c bad, got it my shameful secret is I like tracking ownership of all my memory in C, it’s fun, a little graph one edge away from a segfault
|
# ? Sep 11, 2023 02:45 |
|
I write my big complex state machines as Erlang processes so I get them running concurrently for cheap
|
# ? Sep 11, 2023 02:47 |
|
the compiler writes my state machines for me. simple as.
|
# ? Sep 11, 2023 02:52 |
|
|
|
if anyone's interested, i recently wrote a post that delves into how to implement the game of life in lil and then take the speed from "dogshit slow" to "fine" (and also make the program much more concise) by taking better advantage of the features of the language and stdlib: https://beyondloom.com/blog/life.html
|
# ? Sep 11, 2023 03:49 |