|
Sapozhnik posted: i never found the whole "what color are your functions" argument compelling because the await keyword is a pretty handy big syntactical neon sign saying "hey the entire world might change under your feet here" and having that called out in concurrent code is beneficial.

note this for yosgoons.xls, agreeing with Sapozhnik here
|
# ? Sep 11, 2023 08:18 |
|
VikingofRock posted: Thanks for this answer; it was really helpful. To follow up on the quoted bit: I would have guessed that concurrent cases could be handled by crossbeam's queues / channels / etc, where you could have one thread listening for events, and then send them to worker threads for processing via queues or channels. Under what sort of circumstances would that break down, but async would work? I could imagine a problem if you have many types of heterogenous tasks to handle, but in that case, I would guess that a threadpool would work?

Hmm, it's very complicated. Not wholly because it's complicated in and of itself, but this is one of those things with history.

Because computation in the physical world is inherently a concurrent events sort of thing. Even on old single-core machines with no network, you're submitting requests to disks, and getting events (IRQs) from keyboard and sound cards and so on.

But we started out with a non-concurrent sequential programming model for everything. Oops, the real world is concurrent, how do we cope? Blocking APIs. Read from disk? Ok, send the request and just wait for the response.

But blocking means we're wasting time when we could be doing something else. And so we get threads. For concurrency. (Confusing everything, because today we'd usually say threads are for parallelism, and not the best abstraction for concurrent programming. But they do both!) All this before multicore processors were even a thing. Now one thread can block waiting, but you can do work on another!

So we hosed up by programming a concurrent world with non-concurrent languages and APIs that blocked, then papered over it with threads. But for all the reasons rjmccall posted, threads don't scale well. (Again, not inherently, exactly, but at least partially for path-dependency historical reasons.) There used to be this thing called the "C10k problem" about handling bigger and bigger servers that might interest anyone who likes history here.
So we also get NON-blocking APIs you can poll... for some things. Mostly network sockets (for those servers). And depressingly, we don't get non-blocking APIs for even really common things like disk i/o until depressingly recently. (SO depressingly recently, you still can't count on the APIs being available today because your linux distro's kernel might be too old.)

But now with events, you have a programming model/abstraction problem. Ok, you get an event for something, but what does your event loop do with it? How do you track what the next step of whatever "thing" was going on there? You've got a few options:

1. Well poo poo, just use threads. Still a common option, despite its lack of scaling. Thread wakes up where it was, blocking call done, continues.
2. "Green threads." They look like threads (cool!), but secretly use event loops.
3. Continuations / call-backs / promises. This was the "node.js way". You put together a closure that handles the response, so the event loop can answer "what do I do now?" with "just call the closure."
4. State machines / async/await. Explicitly represented state machines are uncommon for ??reasons?, but async/await is basically a way of automating their creation. Event loop gets event, finds the state machine it corresponds to, and makes it go brrr

So looping back around to your question about "where would that break down", it's mostly an efficiency thing. But personally I think there's a lot of value to the mental model differences between "threads" and "events". But this gets complicated by the lack of appropriate OS APIs for certain things (and the standard library APIs as well: Rust after all gives you blocking read from files because it doesn't really have a choice, but now you have a problem if you want to do that from a concurrent events context...)
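To make option 4 a bit more concrete, here is a minimal sketch of an explicitly represented state machine plus the "find it and step it" half of an event loop. Everything in it is an assumption for illustration: the `Event` and `Conn` names, the two-step read-then-respond task, and the stand-in request length are all made up, and no real runtime's API is being shown. The point is only that the enum is roughly the shape of thing async/await generates for you.

```rust
use std::collections::HashMap;

/// Hypothetical events an event loop might hand us (names are made up).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    Readable,
    Writable,
}

/// Where one connection's "thing" currently stands. An `async fn` with two
/// await points would compile down to something of roughly this shape.
#[derive(Debug, PartialEq)]
enum Conn {
    WaitingForRequest,
    WaitingToSendResponse { request_len: usize },
    Done,
}

impl Conn {
    /// Advance by one event. Events that don't fit the current state are ignored.
    fn on_event(self, ev: Event) -> Conn {
        match (self, ev) {
            (Conn::WaitingForRequest, Event::Readable) => {
                // A real loop would do a non-blocking read here; 42 is a stand-in.
                Conn::WaitingToSendResponse { request_len: 42 }
            }
            (Conn::WaitingToSendResponse { .. }, Event::Writable) => Conn::Done,
            (state, _) => state,
        }
    }
}

/// The event loop's job: find the state machine for this id and step it.
fn dispatch(conns: &mut HashMap<u32, Conn>, id: u32, ev: Event) {
    if let Some(state) = conns.remove(&id) {
        conns.insert(id, state.on_event(ev));
    }
}
```

Calling `dispatch(&mut conns, 1, Event::Readable)` moves connection 1 forward and leaves every other connection where it was, which is the whole trick: the "where was I?" information lives in the enum instead of on a thread's stack.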
Representing the concurrency of "read two files at once" is much better off thought of as "kick off two non-blocking reads then wait for them to finish" than as "create an entire new thread to do that other read, then join the results." But regardless today we usually have to do "send two read requests to the 'sacrificial blocking-io worker threadpool' then wait for them to send finish events back."

I think it's a windmill worth tilting at. Maybe in 30 years we'll finally have kernels/languages/libraries/etc that let us write programs right.
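For what the "sacrificial blocking-io worker" pattern looks like in practice, here is a rough sketch using only the standard library. It is an assumption-laden toy: `read_all` and `Completion` are names invented for this example, it spawns one throwaway thread per request rather than managing a real pool, and the "finish events" are just messages on a `std::sync::mpsc` channel.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// A completion event: which request finished, and how it went.
type Completion = (usize, io::Result<Vec<u8>>);

/// Kick off one blocking read per path on its own throwaway thread, then
/// collect the completion events in whatever order they arrive.
/// Results come back in the same order as `paths`.
fn read_all(paths: Vec<PathBuf>) -> Vec<io::Result<Vec<u8>>> {
    let (tx, rx) = mpsc::channel::<Completion>();
    let count = paths.len();

    for (id, path) in paths.into_iter().enumerate() {
        let tx = tx.clone();
        // The blocking call lives on the worker; the caller only waits on the channel.
        thread::spawn(move || {
            let _ = tx.send((id, fs::read(&path)));
        });
    }
    drop(tx); // only the workers hold senders now, so `rx` ends when they finish

    let mut results: Vec<Option<io::Result<Vec<u8>>>> = (0..count).map(|_| None).collect();
    // This is the "wait for them to send finish events back" half.
    for (id, res) in rx {
        results[id] = Some(res);
    }
    results
        .into_iter()
        .map(|r| r.expect("every worker sends exactly once"))
        .collect()
}
```

Something like `read_all(vec!["a.txt".into(), "b.txt".into()])` expresses "read two files at once", but only by burning two threads on blocking calls, which is exactly the workaround being complained about.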
|
# ? Sep 11, 2023 18:32 |
crazypenguin posted: [awesome effortpost]

Thank you so much for this, this helps clear things up a lot. Posts like this are why I love this forum
|
|
# ? Sep 11, 2023 19:40 |
|
crazypenguin posted: So we also get NON-blocking APIs you can poll... for some things. Mostly network sockets (for those servers). And depressingly, we don't get non-blocking APIs for even really common things like disk i/o until depressingly recently. (SO depressingly recently, you still can't count on the APIs being available today because your linux distro's kernel might be too old.)

Linux has had signal-notified I/O completions for a long time, no? uring is new but AIO is twenty years old now

I recall that AIO and the aiocb */SIGIO was part of the response to c10k

(I remember being so impressed that IRIX in 1998 would let you pass a mutex to poll so you could wait on threads and sockets in a unified way)
|
# ? Sep 11, 2023 20:29 |
|
that was also when Linux folks were using lmbench to push on microbenchmarks and beat Solaris and friends on things like context switch overhead and localhost network performance. David Miller was adorably fierce about that
|
# ? Sep 11, 2023 20:31 |
|
lol at still using the terminology "non-blocking i/o" to refer to unsafe writes
|
# ? Sep 11, 2023 20:33 |
|
sure, I’ll walk into it—what is unsafe about the writes from async I/O like completion ports or uring or posix AIO?
|
# ? Sep 11, 2023 20:37 |
|
Subjunctive posted: Linux has had signal-notified I/O completions for a long time, no? uring is new but AIO is twenty years old now

On Linux "AIO" had so many caveats that I decided they were not worth remembering because they boiled down to "do not use." Like, I think it just didn't work on most filesystems, and required O_DIRECT, or otherwise it just actually silently did blocking IO. And if my memory isn't failing me, the only meaningful software that ever used it was Oracle.
|
# ? Sep 11, 2023 20:39 |
|
there were some toy “high performance” web and ftp servers that used it I think, but yeah it wasn’t great
|
# ? Sep 11, 2023 20:43 |
|
Subjunctive posted: sure, I’ll walk into it: what is unsafe about the writes from async I/O like completion ports or uring or posix AIO?

how are you avoiding blocking when writing to some device?
|
# ? Sep 11, 2023 20:53 |
|
glibc's aio implementation is basically a practical joke. it just spawns a thread for you and does blocking io there, and if that's what you want it's easier to do it directly than to use aio. it would make sense as a compatibility shim for running existing programs which were written for an os with a useful implementation of aio, but not much else
|
# ? Sep 11, 2023 21:19 |
|
Shaggar posted: how are you avoiding blocking when writing to some device?

passing ownership of the buffer to the kernel, and then getting ownership of that buffer back once it’s filled? the same way you do it if you’re blocking really
|
# ? Sep 11, 2023 22:45 |
|
but even without good ownership enforcement, racing writes into the buffer after it’s passed to the kernel isn’t necessarily unsafe any more than shearing on a non-vsync rendering output is. what behaviour are you worried about, and what safety principle is being violated?
|
# ? Sep 11, 2023 22:48 |
|
Sapozhnik posted: i never found the whole "what color are your functions" argument compelling because the await keyword is a pretty handy big syntactical neon sign saying "hey the entire world might change under your feet here" and having that called out in concurrent code is beneficial.

and part of the annoyance is that even if you're happy explicitly designing and marking functions as async-compatible, in most languages now they won't work in synchronous contexts. if I have to copy-paste a function just to use it both ways then something has gone wrong in the expressiveness of my programming language.
|
# ? Sep 11, 2023 23:52 |
|
Subjunctive posted: but even without good ownership enforcement, racing writes into the buffer after it’s passed to the kernel isn’t necessarily unsafe any more than shearing on a non-vsync rendering output is. what behaviour are you worried about, and what safety principle is being violated?

i just have distaste for the terminology of "non-blocking I/O" because it implies the I/O has happened when it hasn't. you've fired it into a pile of caches and power backups that let you pretend it doesnt have to block eventually. its mostly safe but i would prefer some other term
|
# ? Sep 12, 2023 00:14 |
|
punted i/o
|
# ? Sep 12, 2023 00:20 |
|
Plorkyeran posted: glibc's aio implementation is basically a practical joke. it just spawns a thread for you and does blocking io there, and if that's what you want it's easier to do it directly than to use aio. it would make sense as a compatibility shim for running existing programs which were written for an os with a useful implementation of aio, but not much else

there's also the Oracle-funded kernel AIO that does the bare minimum to make Oracle go brrrr and blocks at random if you use it outside of Oracle's happy path

and a Red Hat developed version of glibc POSIX AIO that uses the kernel AIO, that they gave up on because see above

Shaggar posted: i just have distaste for the terminology of "non-blocking I/O" because it implies the I/O has happened when it hasn't. you've fired it into a pile of caches and power backups that let you pretend it doesnt have to block eventually. its mostly safe but i would prefer some other term

non-blocking IO is when your socket read or write returns EWOULDBLOCK, it doesn't have anything to do with disk IO or caching
|
# ? Sep 12, 2023 00:20 |
|
blocking io isn’t guaranteed to write anything to disk anyways so who care
|
# ? Sep 12, 2023 00:22 |
|
Shaggar posted: i just have distaste for the terminology of "non-blocking I/O" because it implies the I/O has happened when it hasn't. you've fired it into a pile of caches and power backups that let you pretend it doesnt have to block eventually. its mostly safe but i would prefer some other term

i have bad news about blocking io for very similar reasons
|
# ? Sep 12, 2023 00:25 |
|
tef posted: i have bad news about blocking io for very similar reasons

it’s weirdly hard to make sure your writes are durable, might be impossible when your disk has its own cache you can’t flush
|
# ? Sep 12, 2023 00:28 |
|
pseudorandom name posted: non-blocking IO is when your socket read or write returns EWOULDBLOCK, it doesn't have anything to do with disk IO or caching

you can have non-blocking I/O for disk and should if you're triggering it from a UI thread. all of Chrome's disk I/O is async if it's for the UI thread to consume, I believe. putting your test profile data on a network filesystem is a great way to develop a sensitivity to blocking disk I/O

Shaggar posted: i just have distaste for the terminology of "non-blocking I/O" because it implies the I/O has happened when it hasn't. you've fired it into a pile of caches and power backups that let you pretend it doesnt have to block eventually. its mostly safe but i would prefer some other term

sorry, do you mean that you think that blocking I/O only returns once the data has been written to its final storage, be that disk or some SMB share or a socket or something?

even blocking I/O returns once it's written into the page cache or a socket buffer, unless you go out of your way to do much slower I/O that waits for a device sync (and even then you might get lied to and probably will on consumer OSes) or your buffers are full and you need to wait for the kernel to make space by processing previous requests

non-blocking I/O is explicitly about the I/O not having happened when the requesting API call is complete, but rather later when the kernel signals that the underlying interaction has completed
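a small sketch of the "go out of your way" version, assuming nothing beyond the Rust standard library. `write_durably` is a name made up for this example. the ordinary blocking `write_all` can return as soon as the bytes are in the page cache; `sync_all` is the much slower call that asks the OS to flush the file's data and metadata to the device.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Write `data` and ask the OS to push it to the device before returning.
fn write_durably(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = File::create(path)?;
    // A plain blocking write: it can return once the bytes are in the
    // kernel's page cache, well before anything reaches the disk.
    file.write_all(data)?;
    // The slow part: request a flush of data and metadata to the device.
    // A drive that lies about its write cache can still defeat this.
    file.sync_all()?;
    Ok(())
}
```

this is still not the whole durability story: for a newly created file on Linux you would typically also need to sync the containing directory, which this sketch skips.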
|
# ? Sep 12, 2023 00:29 |
|
so call it asynchronous i/o
|
# ? Sep 12, 2023 00:34 |
|
Sweeper posted: it’s weirdly hard to make sure your writes are durable, might be impossible when your disk has its own cache you can’t flush

cannot be emphasized enough that random disks will just decide to claim writes are durable when they aren't yet
|
# ? Sep 12, 2023 00:35 |
|
Shaggar posted: so call it asynchronous i/o

what do you think “nonblocking” means that’s different from “asynchronous” here? if you’re blocking on something, your flow of execution is synchronous with that something. if you don’t block on it, the execution is asynchronous with respect to the operation. you might not even care to find out if/when it completed at all, if you’re just chucking something in a best-effort log file
|
# ? Sep 12, 2023 00:38 |
|
Subjunctive posted: you can have non-blocking I/O for disk and should if you're triggering it from a UI thread. all of Chrome's disk I/O is async if it's for the UI thread to consume, I believe. putting your test profile data on a network filesystem is a great way to develop a sensitivity to blocking disk I/O

that's async IO

blocking IO - you initiate the IO operation and maybe it blocks or maybe it doesn't, depending on caching and what kind of durability guarantees you demand

non-blocking IO - you initiate the IO and either it happens or the kernel says no I'm not doing that right now

async IO - you initiate the IO operation and at some later point in time you are informed of its completion
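the "kernel says no" case is easy to see from Rust's standard library, where EWOULDBLOCK shows up as `io::ErrorKind::WouldBlock`. this is a minimal Unix-only sketch; `try_read` and `nonblocking_pair` are helper names invented here, and a local socket pair stands in for a real network connection.

```rust
use std::io::{self, Read};
use std::os::unix::net::UnixStream;

/// Try one read without ever suspending the thread.
/// `Ok(None)` means the kernel said "nothing for you right now".
fn try_read(sock: &mut UnixStream, buf: &mut [u8]) -> io::Result<Option<usize>> {
    match sock.read(buf) {
        Ok(n) => Ok(Some(n)),
        Err(e) if e.kind() == io::ErrorKind::WouldBlock => Ok(None),
        Err(e) => Err(e),
    }
}

/// Build a connected socket pair whose reading end is in non-blocking mode.
fn nonblocking_pair() -> io::Result<(UnixStream, UnixStream)> {
    let (reader, writer) = UnixStream::pair()?;
    reader.set_nonblocking(true)?;
    Ok((reader, writer))
}
```

with nothing written yet, `try_read` comes back immediately with `Ok(None)` instead of parking the thread; once the other end writes, the same call returns the byte count. nothing here tells you *when* to retry, which is the gap that poll/select-style readiness APIs or completion-based async IO fill.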
|
# ? Sep 12, 2023 00:41 |
|
it's nonblocking because the syscalls dont block until the io is finished. it's about the user facing programming model. that the io has to ""block"" at some point when it takes a non-zero amount of time to engrave the bits upon the physical media is unrelated to that. disks are block devices in linux there's no way around that.
|
# ? Sep 12, 2023 00:41 |
|
Subjunctive posted: what do you think “nonblocking” means that’s different from “asynchronous” here? if you’re blocking on something, your flow of execution is synchronous with that something. if you don’t block on it, the execution is asynchronous with respect to the operation. you might not even care to find out if/when it completed at all, if you’re just chucking something in a best-effort log file

nonblocking implies that its synchronous and immediate i.e it does the i/o without blocking. asynchronous implies you start the operation and it finishes later.
|
# ? Sep 12, 2023 00:50 |
|
pseudorandom name posted: that's async IO

yeah see i dont like that either cause in the event the kernel says no then you have non-blocking i/o literally blocking i/o from happening. maybe opportunistic i/o would be a better term.
|
# ? Sep 12, 2023 00:56 |
|
BobHoward posted: cannot be emphasized enough that random disks will just decide to claim writes are durable when they aren't yet

one of the nice things about targeting apple platforms is that on apple hardware when writing to internal storage the os can actually guarantee that F_FULLFSYNC works and everything has genuinely been persisted by the time it runs since they have control over the entire stack. the downside is that it turns out that's a really loving expensive thing to guarantee. thankfully they seem to have finally gotten write barriers working and that's usually all that you actually need.

Shaggar posted: nonblocking implies that its synchronous and immediate

no it doesn't. if the meaning you've made up for a word doesn't match how people use it, that means that the meaning you made up is wrong, not that everyone using the word is wrong.
|
# ? Sep 12, 2023 00:56 |
|
Vanadium posted: it's nonblocking because the syscalls dont block until the io is finished. it's about the user facing programming model. that the io has to ""block"" at some point when it takes a non-zero amount of time to engrave the bits upon the physical media is unrelated to that. disks are block devices in linux there's no way around that.

so dont call it non blocking if its going to eventually block.
|
# ? Sep 12, 2023 00:57 |
|
Plorkyeran posted: one of the nice things about targeting apple platforms is that on apple hardware when writing to internal storage the os can actually guarantee that F_FULLFSYNC works and everything has genuinely been persisted by the time it runs since they have control over the entire stack.

when someone adds "non" to a word it means the opposite of the following word. Non-blocking <X> means X does not block. i/o means input/output. its a term used for the process of writing or reading data. non-blocking i/o therefore means "a process of writing or reading data that does not block".

If you do block at some point when doing i/o, then you are not doing non-blocking i/o

if you do not write or read data at some point, then you are not doing non-blocking i/o

Shaggar fucked around with this message at 01:04 on Sep 12, 2023
|
# ? Sep 12, 2023 01:00 |
|
Subjunctive posted: Linux has had signal-notified I/O completions for a long time, no? uring is new but AIO is twenty years old now

and plenty of platforms have had asynchronous I/O for far longer, most minicomputer systems from 1960-1980 had asynchronous APIs for interactive I/O, as did the microcomputer systems that their developers worked on afterwards

for example, this is why the original Macintosh used IOParameterBlock structures that included support for completion routines; lots of people who worked on Lisa had previously worked on HP 3000, which used a similar parameter block design for asynchronous I/O, and the ideas filtered over from them. on Macintosh they were necessary for maintaining interactive performance in the face of a very slow floppy disk, on HP 3000 they were necessary to maintain interactive performance in the face of many interactive block-mode terminals (on 16-bit minicomputers running at 1MHz with 1MB of RAM)

as usual, UNIX was the odd one out here, taking a “blocking I/O and forked processes are all you’ll ever need” approach for implementation simplicity, just like EINTR was created so the kernel authors didn’t have to figure out how to implement interruptible system calls, and just pushed the problem off to clients

many things in the UNIX world have basically been rediscovery of things its contemporaries did and why they did them
|
# ? Sep 12, 2023 01:20 |
|
Sweeper posted: it’s weirdly hard to make sure your writes are durable, might be impossible when your disk has its own cache you can’t flush

or actively lies to you about flushing
|
# ? Sep 12, 2023 01:23 |
|
blocking/non blocking refers to the syscall. and synchronous/asynchronous, here, refers to the programming model

a synchronous program can make blocking calls, and it can make non blocking calls and poll for updates (hello select())

an asynchronous program can make non blocking or blocking calls, but if you call a blocking operation inside a coroutine, the entire thing pauses, because asynchronous programming relies on apartment/cooperative threading

and it's funny because shaggar might as well be yelling about close() throwing an exception
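the "entire thing pauses" part can be shown with a toy single-threaded executor, std only. this is a sketch, not how any real runtime is built: `YieldNow`, `NoopWaker`, `worker`, `run` and `trace` are all names made up here, the executor just re-polls every task round-robin, and "blocking" is simulated by a task that never awaits between its steps (a real blocking syscall has the same scheduling effect: nothing else gets polled until it returns).

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A future that returns `Pending` once, handing control back to the executor.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Waker that does nothing; the toy executor below simply re-polls everything.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

type Log = Rc<RefCell<Vec<String>>>;
type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Two-step task. If `cooperative`, it awaits between steps; otherwise it
/// barrels through both steps without giving anyone else a turn.
async fn worker(name: &'static str, cooperative: bool, log: Log) {
    for step in 1..=2 {
        log.borrow_mut().push(format!("{name}{step}"));
        if cooperative {
            YieldNow(false).await;
        }
    }
}

/// Single-threaded round-robin executor: poll each task in turn until all finish.
fn run(mut tasks: Vec<Task>) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    while !tasks.is_empty() {
        // Keep only the tasks that are still pending after this poll.
        tasks.retain_mut(|t| t.as_mut().poll(&mut cx).is_pending());
    }
}

/// Run tasks "a" and "b" and return the order in which their steps ran.
fn trace(a_cooperative: bool) -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let a: Task = Box::pin(worker("a", a_cooperative, log.clone()));
    let b: Task = Box::pin(worker("b", true, log.clone()));
    run(vec![a, b]);
    let out = log.borrow().clone();
    out
}
```

`trace(true)` interleaves the two tasks (a1, b1, a2, b2), while `trace(false)` lets "a" run to completion before "b" gets a single step (a1, a2, b1, b2). there is only one thread, so a task that doesn't hit an await point holds everyone else up.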
|
# ? Sep 12, 2023 01:27 |
|
Shaggar posted: when someone adds "non" to a word it means the opposite of the following word. Non-blocking <X> means X does not block. i/o means input/output. its a term used for the process of writing or reading data. non-blocking i/o therefore means "a process of writing or reading data that does not block".

"blocking" means "suspending the current thread"
|
# ? Sep 12, 2023 02:14 |
|
so its fire and forget?
|
# ? Sep 12, 2023 02:16 |
|
only if you forget to check that the aio_write that you fired off completes, op
|
# ? Sep 12, 2023 02:59 |
|
what happens to the thread if aio_write hasnt completed and i need to ensure it completes?
|
# ? Sep 12, 2023 03:17 |
|
the thread does other work until the IO completes

you know, the topic of the start of this conversation -- threads vs. state machines vs. async/await
|
# ? Sep 12, 2023 03:22 |
|
do you mean “what should i do if it’s time for pthread_exit() but I still have outstanding operations to complete”? you’re still free to block on pending events’ state changes, epoll-style, in a case like that I suppose
|
# ? Sep 12, 2023 03:29 |