Sapozhnik
Jan 2, 2005

Nap Ghost
i never found the whole "what color are your functions" argument compelling because the await keyword is a pretty handy big syntactical neon sign saying "hey the entire world might change under your feet here" and having that called out in concurrent code is beneficial.

the same is technically true of function calls, yes, but when a function call has random side effects on things that aren't its arguments, that is considered impolite. when you await and your shared state is mutated under your feet, that's kind of the entire point.

VikingofRock
Aug 24, 2008

crazypenguin posted:

everyone memes about "concurrency vs parallelism" but that's the actual answer.

Got asynchronous, non-deterministic events to handle? (i.e. concurrency?) async

Got lots of computation to do? (i.e. parallelism?) threads

[...]

Thanks for this answer; it was really helpful. To follow up on the quoted bit: I would have guessed that concurrent cases could be handled by crossbeam's queues / channels / etc., where you could have one thread listening for events and then sending them to worker threads for processing via queues or channels. Under what sort of circumstances would that break down where async would still work? I could imagine a problem if you have many types of heterogeneous tasks to handle, but in that case I would guess that a threadpool would work?

Sorry if this is a dumb / obvious question; I've never really needed to use async and I'm trying to understand the use cases
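
A minimal sketch of the setup described above: one listener thread feeding events to a small pool of workers through a crossbeam channel (crossbeam's channels are multi-consumer, so the receiver doubles as the work queue). The crossbeam crate is an assumed dependency, and Event, the worker count, and the produced values are invented for illustration.

```rust
use crossbeam::channel;
use std::thread;

#[derive(Debug)]
enum Event {
    Connected(u32),
    Payload(Vec<u8>),
}

fn main() {
    let (tx, rx) = channel::unbounded::<Event>();

    // worker pool: each worker blocks on the shared receiver until work shows up
    let workers: Vec<_> = (0..4)
        .map(|id| {
            let rx = rx.clone();
            thread::spawn(move || {
                // recv() returns Err once the channel is empty and all senders are gone
                while let Ok(event) = rx.recv() {
                    println!("worker {id} handling {event:?}");
                }
            })
        })
        .collect();
    drop(rx); // main doesn't consume events itself

    // "listener" thread: stands in for whatever produces events (accept loop, etc.)
    let producer = thread::spawn(move || {
        for i in 0..10 {
            tx.send(Event::Connected(i)).unwrap();
            tx.send(Event::Payload(vec![0; 16])).unwrap();
        }
        // dropping tx here closes the channel and lets the workers exit
    });

    producer.join().unwrap();
    for w in workers {
        w.join().unwrap();
    }
}
```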

crazypenguin
Mar 9, 2005
nothing witty here, move along
In general, I am skeptical of the amount of complexity hidden in the runtime for green threads.

Lurking under the hood is an unholy monster of a runtime system that most people have no idea is there, because it tries to hide all the problems, and if you ever run afoul of it, god help you.

Fun fact: The Go runtime has this heuristic thing where it tries to detect if a thread is blocked and moves it out of the main thread pool into a background one, and spawns a new OS thread for the main thread pool, to adapt to blocking code that might cause its level of concurrency to drop.
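
For comparison, Rust's mainstream async runtimes make that adaptation explicit rather than heuristic: you tell the runtime a call is going to block and it shunts it onto a separate pool. A hedged sketch, assuming tokio (with its full feature set) as the runtime; the file path is made up.

```rust
use tokio::task;

async fn load_config() -> std::io::Result<Vec<u8>> {
    // hand the known-blocking call to tokio's dedicated blocking pool so the
    // async worker threads stay free, instead of relying on runtime detection
    task::spawn_blocking(|| std::fs::read("config.toml"))
        .await
        .expect("blocking task panicked")
}

#[tokio::main]
async fn main() {
    match load_config().await {
        Ok(bytes) => println!("read {} bytes", bytes.len()),
        Err(e) => eprintln!("no config: {e}"),
    }
}
```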

The Java developers proposed green threads seemingly blithely unaware of any of the problems that can occur with a green threading runtime system. I don't dispute it's probably the right choice for Java, but I have a bad feeling about the first few major production deploys using them. Last I checked (admittedly 2 years ago), they didn't have any of the mechanisms for handling blocked threads that Erlang or Go have, and their proposal doc was just "haha, this will never be a problem", completely straight-faced.

Ask your operators about "exciting new failure modes" today!

abraham linksys
Sep 6, 2010

:darksouls:

Sapozhnik posted:

i never found the whole "what color are your functions" argument compelling because the await keyword is a pretty handy big syntactical neon sign saying "hey the entire world might change under your feet here" and having that called out in concurrent code is beneficial.

yeah, i have zero idea why you'd be upset by your syntax having something calling out "at this point, control flow gets taken over by your event loop"

pseudorandom name
May 6, 2007

Cybernetic Vermin posted:

......i have no idea what color threads that was, but os support for green threads seems an oxymoron

the reason why green threads always get abandoned is because you need your OS kernel to upcall into your green thread scheduler every time a real thread would block, and the OS support for this is less than stellar

Bloody
Mar 3, 2013

I don’t know what a green thread is, but I do know about async await in c#

Bloody
Mar 3, 2013

and that the joinabletaskfactory is a magic pile of code that should be in the framework but isn’t

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
i think it’s more typical to make all of your core blocking apis use non-blocking syscalls under the hood than to get callbacks when a syscall blocks, but of course that still does rely on a bunch of os non-blocking support being baked, comprehensive, and performant

anyway there’s a bunch of interesting issues going on with green threads and how much sense they make in any particular environment

pseudorandom name
May 6, 2007

the problem with that is that even if the OS has a complete set of non-blocking versions of its syscalls (and it doesn't), it becomes the programmer's responsibility never to use any of the blocking variants, otherwise everything falls apart

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
yeah, sort of. it doesn't actually all fall apart, because typically blocking syscalls aren't blocking on other work that's going to be done by the current process. but you do potentially waste a lot of time

tef
May 30, 2004

-> some l-system crap ->

Subjunctive posted:

cooperative multitasking is a lot easier to reason about than preemption. you don’t have to worry about flaky data races due to unexpected preemption points, because your preemption points are all explicit, if not necessarily syntactic

it's easier to manage when you have like maybe 10 coroutines but when you have thousands, it's easy to lock up the entire system

i do recommend people use "one big lock" when it comes to concurrency for very similar reasons

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

crazypenguin posted:

The Java developers proposed green threads seemingly blithely unaware of any of the problems that can occur with a green threading runtime system.

the first JVM I ever hacked on (trying to make two GCs talk to each other for LiveConnect :corsair:) had green thread support, which is what it used for threading on Win16. so they just need to ask whoever was working on the JVM 25 years ago what a world with green threads is like, really

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Bloody posted:

I don’t know what a green thread is, but I do know about async await in c#

so first we have to define what a non-green thread is

the absolute minimum that an os thread can be is... let's call it an "impulse" to avoid confusion. the os schedules it onto a core to actually run instructions. if the impulse can't be pre-empted at all, then the os wouldn't really need to track anything about it, but that would be a pretty messed up abstraction for a number of reasons. so in fact there's always going to be some memory in the kernel to identify the impulse and save its register state while it's suspended. on modern architectures, this isn't actually a small amount of memory because register files are huge: avx-512 has 32 64-byte vector registers, so that's 2kb right there. if the impulse were only ever suspended at safe points, maybe not all the registers would have to be saved; but if the kernel can pre-empt the impulse at an arbitrary point (e.g. because it's been running for a long time) then you really do need all of it

but an "impulse" wouldn't really be enough to run anything except extremely, extremely specialized code because it doesn't include any local storage beyond registers. there are a lot of strategies for allocating memory for local storage, and we can talk about some of the others in a later post if people want, but the simplest is to set up a fixed-size, contiguous, and non-relocatable stack allocation, and that's what almost every c abi expects. it's contiguous because c functions allocate/deallocate stack space by just decreasing/increasing the stack pointer (stacks usually grow down in the address space). it's non-relocatable because you can take the address of a local allocation in c and store it wherever, and because there's no way for the system to reliably track down and fix up those pointers, in practice they just have to stay valid, which means the stack has to stay where it is. that combination also means it has to be fixed-size, at least virtually — if you want the stack to be able to grow out to 1mb, the system doesn't have to immediately dedicate 1mb of memory to it, but it does have to set aside 1mb of contiguous address space that won't be used for anything else. (you might think of this as free as long as the stack doesn't actually grow that much, but in fact address space reservations can be a significant source of overhead — 1mb of page table entries with 4kb pages is 256 entries, which could take up to 16kb of wired memory, although there are some things the kernel can do to help with that.) if you want to be able to call c functions, you need a c stack that looks like that. and unless you're very sure that the impulse is only going to do very specific things that won't need much memory, that stack allocation usually has to be pretty big or you'll be blowing out the stack for all sorts of reasonable code; 1mb is a pretty typical number here. and for safety you put a "guard page" below the bottom of the stack (a page reservation that makes sure that running off the stack traps instead of corrupting other memory)

for similar reasons, the kernel also needs local memory in order to execute syscalls, and again this is typically a contiguous c-style stack. usually kernel programmers do put in the effort to make sure that they don't need more than a few kb of kernel stack to complete any syscall, but still, you need a kernel stack. usually this gets colocated with the "impulse" data structures so that the overhead is down as close as possible to a single page per impulse (plus, again, a guard page reservation)

so a kernel thread is at least an "impulse" plus a kernel stack plus a user stack. and then different levels of os facilities often add capabilities on top of that which add extra memory requirements. like most ways of creating threads will support accessing thread-local storage, which often involves relatively large up-front allocations. windows used to offer a second kind of kernel thread ("fiber") which was supposed to be more lightweight than their normal kernel threads, but iiuc these have largely been unified with the standard implementation, such that now there's actually *extra* overhead in every thread to support them

the end result is that an absolute minimal kernel thread tends to require something on the order of 32kb of memory, and if it's doing general-purpose work, it's usually quite a bit more than that. that is a serious scalability problem — you really shouldn't write code that expects to be able to create thousands of kernel threads. and setting all of that up requires a lot of non-trivial work and back-and-forth between userspace and the kernel, so creating a new kernel thread is a relatively expensive operation. so all that gives rise to this idea of maybe re-using kernel threads
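
To make the user-stack part of that concrete: the stack reservation is the one slice of the per-thread cost that's visible (and tunable) from ordinary user code. A small Rust sketch using std::thread::Builder; the 64 KiB figure and thread names are illustrative, and the default reservation varies by platform and runtime but is on the order of megabytes.

```rust
use std::thread;

fn main() {
    // each spawned thread gets its own contiguous stack reservation up front;
    // the kernel-side overhead described in the post comes on top of this
    let handles: Vec<_> = (0..8u64)
        .map(|i| {
            thread::Builder::new()
                .name(format!("worker-{i}"))
                .stack_size(64 * 1024) // shrink the reservation for shallow, known workloads
                .spawn(move || i * i)
                .expect("creating a kernel thread is a relatively expensive operation and can fail")
        })
        .collect();

    let total: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    println!("sum of squares: {total}");
}
```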

rjmccall fucked around with this message at 22:00 on Sep 10, 2023

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

been saying this

pseudorandom name
May 6, 2007

it's too bad programmers aren't any good at making state machines; otherwise we wouldn't need any of this green threads or async/await nonsense

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
problems with state machines:

  • describing state machines as a textual representation of graph structures is inconvenient
  • describing state machines as tables is also inconvenient, in different ways
  • if you make a small change, you usually need to carefully reconsider the whole design
  • very limited potential for adding buzzwords to your resume

Cybernetic Vermin
Apr 18, 2005

tl;dr: this is a state machine

Shaggar
Apr 26, 2006
c# makes async await decisions easy:
is the thing calling your statement using async/await?
y -> you should use async/await
n -> you should probably not use async/await
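
The analogous rule in Rust terms, as a hedged sketch: .await only works inside an async fn, so callers of async code either go async themselves or sit at the outermost edge and hand the future to an executor. load_config is a made-up name; block_on comes from the futures crate, assumed as a dependency.

```rust
async fn load_config() -> u32 {
    42 // stand-in for real async I/O
}

async fn async_caller() -> u32 {
    // we're already async, so we just propagate the "color" upward
    load_config().await
}

fn main() {
    // the sync edge of the program can't .await; it hands the future to an executor
    let value = futures::executor::block_on(async_caller());
    println!("{value}");
}
```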

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

tef posted:

"we learned it from watching you" huh

CMPSCI~1.TEX

but also as cybernetic vermin says sometimes you actually do want to be precise in your language, either because what you're describing warrants the precision or you're trying to baffle with bullshit

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
so if you want to re-use kernel threads, the simplest thing you can do is just move responsibility for some of that stuff into userspace. the kernel doesn’t really need to care about the user-space state associated with its kernel-level state during a suspension — it’s just going to save it and then restore it later, whatever it is. so in principle you can just switch all of that stuff in userspace whenever you want by assigning new values into all of the appropriate registers, just like the kernel would. so the kernel-level thread structures can be shared between different user-level threads (called green threads). you have a small pool of kernel threads and schedule user threads onto them when those kernel threads don’t have anything else to do. you create a new thread by reserving pages for a stack, setting up all your user-space thread structures, and then adding the new thread to some user-space queue of things to run whenever the kernel threads are unoccupied

there are some benefits to this beyond memory impact. there’s extra overhead whenever control transfers through the kernel, mostly associated with the trust boundary between the kernel and user processes. a user process is allowed to trust itself, so it can switch significantly more efficiently than the kernel can, as long as it’s not switching for some reason that requires entering the kernel anyway. and a lot of switches might happen at safe points that can therefore get away with not saving and restoring every register, because the process isn’t worried about information leaking between its own threads. if all possible suspensions are like that, the green thread can reserve a lot less space for saving and restoring registers

on the other hand, you do have some novel problems, most importantly that progress can be blocked just because all the kernel threads are currently blocking on something, even if there are other green threads waiting to run. if blocking includes potentially waiting on other green threads to do something — e.g. to signal some condition variable — then that’s not just inefficient, it’s a potential deadlock. so it’s critically important that every primitive like that gets reimplemented in a way that knows to only suspend the green thread and not the current kernel thread

also the details of your green thread scheduler matter a lot. iiuc early versions of the jvm used green threads, but they were strongly associated with specific kernel threads (maybe even permanently tied to them?), so certain patterns of green thread creation could lead to extremely suboptimal utilization because all of the runnable green threads were on the same kernel thread

maybe most importantly, if you followed all the math in my previous post, you’ll see the memory overheads were overwhelmingly associated with the user-space context — the huge stack reservation, its corresponding huge number of page table entries, all the thread-local storage and other threading library overheads, etc. so if you don’t change anything about how local state is stored, you haven’t changed the fundamental scalability problem
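
Proper green threads need stack switching, which can't be shown in a few lines of portable code, but the scheduling half of the scheme described above (a user-space queue of suspended tasks multiplexed onto a kernel thread) looks roughly like this toy round-robin executor in Rust, with stackless futures standing in for stackful green threads. Waker::noop needs Rust 1.85+; a real runtime parks instead of busy-polling and uses real wakers.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, Waker};

type Task = Pin<Box<dyn Future<Output = ()>>>;

// a future that suspends exactly once, standing in for "this user thread yielded"
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending // park this task; the queue below will come back to it
        }
    }
}

// toy user-space scheduler: one kernel thread, a queue of suspended tasks,
// round-robin until everything finishes
fn run_all(mut tasks: VecDeque<Task>) {
    let mut cx = Context::from_waker(Waker::noop());
    while let Some(mut task) = tasks.pop_front() {
        match task.as_mut().poll(&mut cx) {
            Poll::Ready(()) => {}                   // finished: drop it
            Poll::Pending => tasks.push_back(task), // suspended: rotate to the back
        }
    }
}

fn main() {
    let mut tasks: VecDeque<Task> = VecDeque::new();
    for id in 0..3u32 {
        tasks.push_back(Box::pin(async move {
            println!("task {id}: running");
            YieldOnce(false).await; // suspension point: other tasks get the thread here
            println!("task {id}: resumed");
        }));
    }
    run_all(tasks);
}
```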

MononcQc
May 29, 2007

Cybernetic Vermin posted:

specifically cooperative green thread concurrency is also real easy to design in a way where there are no surprises. or rather the surprises all have easy-to-explain causes.

erlang has its dna in that camp as well, and it is not incompetence that landed it there.

erlang's stuff is preemptive and therefore objectively better in terms of representing concurrent ideas robustly.

Cybernetic Vermin
Apr 18, 2005

MononcQc posted:

erlang's stuff is preemptive and therefore objectively better in terms of representing concurrent ideas robustly.

you are right. i was going to argue that it is cooperative in that it is a vm-level thing which checks in on thread switching as an accounting part of interpreting a couple of key opcodes, *but* i think ultimately that is indeed better understood as preemption.

tef
May 30, 2004

-> some l-system crap ->

pseudorandom name posted:

it's too bad programmers aren't any good at making state machines; otherwise we wouldn't need any of this green threads or async/await nonsense

the history of programming is the history of trying to implement a state machine without actually having to write one

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
so the green thread implementations that want to scale to tens or hundreds of thousands of threads all also rely on changing how local storage is allocated

usually this means using some variant of segmented stacks, in which you allocate a new slab of memory whenever the current allocation runs out. this means you can safely reserve a very small amount of memory up front and grow it lazily as needed. if the language permits, you can even grow the stack like a dynamic array, relocating existing frames and keeping it contiguous. there’s a lot of variation here. i know go used to have an allocator with really bad threshold effects — it would immediately deallocate on return, so if you turned around and made a new call, it had to allocate again

(many functional languages do something really different. i don’t even want to talk about it because i hate it so much)

the huge problem with these approaches is that you can’t use your funky stack for c. this is okay if you’re just calling into c and there aren’t calls back or suspensions within c — this admits an implementation where you actually host a green thread on top of a full c thread, not just a kernel-level thread. but otherwise you basically have to dynamically find and pin a c stack to your green thread while the c call is in flight

this is part of why environments using this kind of green thread really want to reimplement basically everything natively: there’s a ton of extra overhead when interacting with other environments. so e.g. go gets a lot of performance from directly using the syscall interface on linux, but on macos they have to call c because that’s the system abi

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

rjmccall posted:

also the details of your green thread scheduler matter a lot. iiuc early versions of the jvm used green threads, but they were strongly associated with specific kernel threads (maybe even permanently tied to them?), so certain patterns of green thread creation could lead to extremely suboptimal utilization because all of the runnable green threads were on the same kernel thread

reaching way back in my memory, green threads were M kernel threads for N green threads, but there were only certain circumstances in which a green thread could switch kernel threads and I think they were tied to GC scheduling? the release notes we got from Sun every month always had a stanza about blocking operations that hadn’t been properly shunted to a kernel thread for waiting and could lock up a cohort of threads (or everything on Win16, bless that mess) but were now fixed this time for sure

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
fwiw, swift’s async/await is very intentionally hosted on normal c threads, with async functions split into funclets that tail-call each other so that they can suspend by just returning back to the scheduler. and the async functions use a segmented stack for any state that persists across suspensions. so basically async functions can call c functions (or normal swift functions) without any special magic but those c functions cannot directly suspend the current async thread

it is prone to the same kind of exhaustion problems as green threads if you block a thread during an async function. but that was fully understood as a consequence, and we’ve been rigorous about telling people that no, they are not allowed to block tasks on work that is not currently running, and we are not going to find a way to make their code work
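
The same footgun in Rust terms, as a rough sketch (tokio with its full feature set assumed as the runtime; neither handler name refers to anything real):

```rust
use std::time::Duration;

async fn bad_handler() {
    // parks the worker thread itself; enough of these in flight and the whole
    // pool is tied up, which is exactly the exhaustion problem described above
    std::thread::sleep(Duration::from_secs(1));
}

async fn good_handler() {
    // suspends only this task and hands the thread back to the scheduler
    tokio::time::sleep(Duration::from_secs(1)).await;
}

#[tokio::main]
async fn main() {
    good_handler().await;
    bad_handler().await; // still "works", it just hogs a pool thread while it does
}
```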

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

Cybernetic Vermin posted:

tl;dr: this is a state machine

ok so we write all of our programs as diagrams and use ML to generate the code, easy

tef
May 30, 2004

-> some l-system crap ->

rjmccall posted:

but that was fully understood as a consequence, and we’ve been rigorous about telling people that no, they are not allowed to block tasks on work that is not currently running, and we are not going to find a way to make their code work

plus "here's gcd" i guess, so people didn't have to invent the universe to run something in the background

VikingofRock
Aug 24, 2008

Sweeper posted:

ok so we write all of our programs as diagrams and use ML to generate the code, easy

It worked for LabVIEW!

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

tef posted:

plus "here's gcd" i guess, so people didn't have to invent the universe to run something in the background

that’s actually complicated. gcd is required to overcommit, swift async is not. they’re serviced by the same pool of threads, so we do have to track whether a job requires overcommit or not. but the rule we use is not “overcommit if there are any pending overcommit jobs”, it’s “overcommit if any of the currently-running jobs is an overcommit job”. so if the whole pool is tied up with swift async jobs, and they’re all blocked ultimately waiting for a gcd job, the system will deadlock even though gcd is overcommit. iow we really do enforce the “no blocking on future work” rule

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

I guess priority inversion there for the scheduler would be a bigger mess, hmm.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
the priority system is this whole other thing

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Subjunctive posted:

reaching way back in my memory, green threads were M kernel threads for N green threads, but there were only certain circumstances in which a green thread could switch kernel threads and I think they were tied to GC scheduling? the release notes we got from Sun every month always had a stanza about blocking operations that hadn’t been properly shunted to a kernel thread for waiting and could lock up a cohort of threads (or everything on Win16, bless that mess) but were now fixed this time for sure

hmm, this matches what i’ve heard. i wonder what the gc thing is, maybe they just took the opportunity to redistribute threads because iirc the collector was stop-the-world anyway

FlapYoJacks
Feb 12, 2009

pseudorandom name posted:

it's too bad programmers aren't any good at making state machines; otherwise we wouldn't need any of this green threads or async/await nonsense

I had to write a non-blocking state machine for the latest project (bare metal). For times when we need to sleep, we cycle between the states and check a stored value of when the sleep began vs. when it should end.

Each state is a struct of two function pointers:
- State init
- The actual state

It works quite well and is simple and easy to follow.
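
A host-side Rust sketch of that shape, for reference: each state is an init hook plus a body, and "sleep" is just a stored deadline the main loop checks instead of blocking. All names, durations, and the exit path are invented for illustration; real bare-metal code would use a hardware timer rather than std::time.

```rust
use std::time::{Duration, Instant};

// shared context the states poke at; sleep_until is the stored "when the sleep
// began vs. when it should end" value described above
struct Ctx {
    sleep_until: Option<Instant>,
    ticks: u32,
}

// one state = an init hook plus the state body; Some(next) asks for a transition
struct State {
    init: fn(&mut Ctx),
    run: fn(&mut Ctx) -> Option<usize>,
}

fn noop_init(_: &mut Ctx) {}

fn blink_init(ctx: &mut Ctx) {
    ctx.ticks = 0;
}

fn blink_run(ctx: &mut Ctx) -> Option<usize> {
    ctx.ticks += 1;
    // "sleep" without blocking: record a deadline and let the main loop spin past us
    ctx.sleep_until = Some(Instant::now() + Duration::from_millis(100));
    if ctx.ticks >= 3 { Some(1) } else { None }
}

fn done_run(_: &mut Ctx) -> Option<usize> {
    std::process::exit(0) // host-side stand-in; bare metal would just idle here
}

fn main() {
    let states = [
        State { init: blink_init, run: blink_run },
        State { init: noop_init, run: done_run },
    ];
    let mut ctx = Ctx { sleep_until: None, ticks: 0 };
    let mut current = 0;
    (states[current].init)(&mut ctx);
    loop {
        // non-blocking sleep check: skip the state body until the stored deadline passes
        if let Some(t) = ctx.sleep_until {
            if Instant::now() < t {
                continue;
            }
            ctx.sleep_until = None;
        }
        if let Some(next) = (states[current].run)(&mut ctx) {
            current = next;
            (states[current].init)(&mut ctx);
        }
    }
}
```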

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

Sweeper posted:

ok so we write all of our programs as diagrams and use ML to generate the code, easy

https://images-ext-2.discordapp.net/external/EsUAMvLmwnWy3tyBiNgrhzvK4WrivJ0YxWQYRFgv12s/https/i.imgur.com/lkmdL5U.mp4

redleader
Aug 18, 2005

Engage according to operational parameters
c bad, got it

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

redleader posted:

c bad, got it

my shameful secret is I like tracking ownership of all my memory in C, it’s fun, a little graph one edge away from a segfault

MononcQc
May 29, 2007

I write my big complex state machines as Erlang processes so I get them running concurrently for cheap

redleader
Aug 18, 2005

Engage according to operational parameters
the compiler writes my state machines for me. simple as.
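
For the Rust readers, roughly the kind of state machine the compiler is writing: an async fn lowers to a type with an enum's worth of suspension states plus a poll function that advances it one step per call. This is a hand-rolled sketch, not actual compiler output; block_on comes from the futures crate, assumed as a dependency.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

enum Countdown {
    Running(u32),
    Done,
}

impl Future for Countdown {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        // Countdown holds no self-references, so it's Unpin and get_mut is fine
        let this = self.get_mut();
        match *this {
            Countdown::Running(0) | Countdown::Done => {
                *this = Countdown::Done;
                Poll::Ready(())
            }
            Countdown::Running(ref mut n) => {
                *n -= 1;
                cx.waker().wake_by_ref(); // ask the executor to poll us again
                Poll::Pending
            }
        }
    }
}

fn main() {
    // any executor can drive the state machine to completion
    futures::executor::block_on(Countdown::Running(3));
    println!("state machine ran to completion");
}
```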

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
if anyone's interested, i recently wrote a post that delves into how to implement the game of life in lil and then take the speed from "dogshit slow" to "fine" (and also make the program much more concise) by taking better advantage of the features of the language and stdlib: https://beyondloom.com/blog/life.html
