eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

Sapozhnik posted:

i never found the whole "what color are your functions" argument compelling because the await keyword is a pretty handy big syntactical neon sign saying "hey the entire world might change under your feet here" and having that called out in concurrent code is beneficial.

the same is technically true of function calls yes but when a function call has random-rear end side effects on things that aren't its arguments then that is considered impolite. when you await and your shared state is mutated under your feet then that's kind of the entire point.

note this for yosgoons.xls, agreeing with Sapozhnik here
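A minimal sketch of the quoted point about `await` marking where shared state can change. It uses Python's asyncio, which is my choice for illustration and not something from the thread; `withdraw_stale`, `withdraw_fresh` and `run_two` are made-up names. Both tasks run on a single thread, so the only places the account can change underneath a task are its `await` points.

```python
import asyncio


async def withdraw_stale(account, amount):
    seen = account["balance"]            # read shared state
    await asyncio.sleep(0)               # suspension point: other tasks may run here
    account["balance"] = seen - amount   # write based on a possibly stale read


async def withdraw_fresh(account, amount):
    await asyncio.sleep(0)               # suspend first...
    account["balance"] -= amount         # ...then read and write with no await between


async def run_two(withdraw):
    account = {"balance": 100}
    await asyncio.gather(withdraw(account, 30), withdraw(account, 30))
    return account["balance"]


if __name__ == "__main__":
    print(asyncio.run(run_two(withdraw_stale)))  # 70: one withdrawal is lost
    print(asyncio.run(run_two(withdraw_fresh)))  # 40: both withdrawals apply
```

The bug in the first version sits exactly on the line with `await`, which is the "neon sign" being described.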

crazypenguin
Mar 9, 2005
nothing witty here, move along

VikingofRock posted:

Thanks for this answer; it was really helpful. To follow up on the quoted bit: I would have guessed that concurrent cases could be handled by crossbeam's queues / channels / etc, where you could have one thread listening for events, and then send them to worker threads for processing via queues or channels. Under what sort of circumstances would that break down, but async would work? I could imagine a problem if you have many types of heterogenous tasks to handle, but in that case, I would guess that a threadpool would work?

Sorry if this is a dumb / obvious question, I've never really needed to use async and I'm trying to understand the use cases

Hmm, it's very complicated. Not wholly because it's complicated in and of itself, but this is one of those things with history.

Because computation in the physical world is inherently a concurrent events sort of thing. Even on old single-core machines with no network, you're submitting requests to disks, and getting events (IRQs) from keyboard and sound cards and so on.

But we started out with a non-concurrent sequential programming model for everything. Oops, the real world is concurrent, how do we cope? Blocking APIs. Read from disk? Ok, send the request and just wait for the response.

But blocking means we're wasting time when we could be doing something else. And so we get threads. For concurrency. (Confusing everything, because today we'd usually say threads are for parallelism, and not the best abstraction for concurrent programming. But they do both!) All this before multicore processors were even a thing. Now one thread can block waiting, but you can do work on another!

So we hosed up by programming a concurrent world with non-concurrent languages and APIs that blocked, then papered over it with threads.

But for all the reasons rjmccall posted, threads don't scale well. (Again, not inherently, exactly, but at least partially for path-dependency historical reasons.) There used to be this thing called the "C10k problem" about handling bigger and bigger servers that might interest anyone who likes history here.

So we also get NON-blocking APIs you can poll... for some things. Mostly network sockets (for those servers). And depressingly, we don't get non-blocking APIs for even really common things like disk i/o until depressingly recently. (SO depressingly recently, you still can't count on the APIs being available today because your linux distro's kernel might be too old.)

But now with events, you have a programming model/abstraction problem. Ok, you get an event for something, but what does your event loop do with it? How do you track what the next step of whatever "thing" was going on there?

You've got a few options:

1. Well poo poo, just use threads. Still a common option, despite its lack of scaling. Thread wakes up where it was, blocking call done, continues.
2. "Green threads." They look like threads (cool!), but secretly use event loops.
3. Continuations / call-backs / promises. This was the "node.js way". You put together a closure that handles the response, so the event loop can answer "what do I do now?" with "just call the closure."
4. State machines / async/await. Explicitly represented state machines are uncommon for ??reasons?, but async/await is basically a way of automating their creation. Event loop gets event, finds the state machine it corresponds to, and makes it go brrr
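To make option 4 concrete, here is a rough sketch in Python (my choice of language for illustration; `HeaderThenBody`, `header_then_body` and the two-event "header then body" protocol are invented for the example). The class is a state machine written by hand; the coroutine expresses the same logic and leaves the bookkeeping of "where was I?" to the runtime.

```python
import asyncio


class HeaderThenBody:
    """Explicit state machine: the event loop calls on_event for each event."""

    def __init__(self):
        self.state = "WAIT_HEADER"
        self.header = None
        self.result = None

    def on_event(self, payload):
        if self.state == "WAIT_HEADER":
            self.header = payload
            self.state = "WAIT_BODY"
        elif self.state == "WAIT_BODY":
            self.result = (self.header, payload)
            self.state = "DONE"
        return self.state == "DONE"


async def header_then_body(events):
    """Same logic; each await is a state the runtime remembers for us."""
    header = await events.get()
    body = await events.get()
    return (header, body)


def run_state_machine(payloads):
    machine = HeaderThenBody()
    for payload in payloads:
        if machine.on_event(payload):
            break
    return machine.result


async def run_coroutine(payloads):
    events = asyncio.Queue()
    task = asyncio.create_task(header_then_body(events))
    for payload in payloads:
        await events.put(payload)
    return await task


if __name__ == "__main__":
    print(run_state_machine(["H", "B"]))
    print(asyncio.run(run_coroutine(["H", "B"])))
```

Both drivers produce the same pair; the difference is only in who keeps track of the next step.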

So looping back around to your question about "where would that break down", it's mostly an efficiency thing. But personally I think there's a lot of value to the mental model differences between "threads" and "events". But this gets complicated by the lack of appropriate OS APIs for certain things (and the standard library APIs as well: Rust after all gives you blocking read from files because it doesn't really have a choice, but now you have a problem if you want to do that from a concurrent events context...)

Representing the concurrency of "read two files at once" is much better off thought of as "kick off two non-blocking reads then wait for them to finish" than as "create an entire new thread to do that other read, then join the results." But regardless today we usually have to do "send two read requests to the 'sacrificial blocking-io worker threadpool' then wait for them to send finish events back."
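As a sketch of that last arrangement, here is the "sacrificial blocking-io worker threadpool" pattern in Python's asyncio (Python is my choice for illustration, and it needs 3.9+ for `asyncio.to_thread`; `read_file`, `read_both` and `demo` are made-up names). The code reads like "kick off two reads, then wait for both", but each read is an ordinary blocking read running on a pool thread.

```python
import asyncio
import os
import tempfile


def read_file(path):
    # Ordinary blocking read; it runs on a worker thread, not the event loop.
    with open(path, "rb") as f:
        return f.read()


async def read_both(path_a, path_b):
    # Start both reads, then wait for both completion events.
    return await asyncio.gather(
        asyncio.to_thread(read_file, path_a),
        asyncio.to_thread(read_file, path_b),
    )


def demo():
    with tempfile.TemporaryDirectory() as directory:
        paths = []
        for name, data in (("a.txt", b"alpha"), ("b.txt", b"beta")):
            path = os.path.join(directory, name)
            with open(path, "wb") as f:
                f.write(data)
            paths.append(path)
        return asyncio.run(read_both(*paths))


if __name__ == "__main__":
    print(demo())
```

`gather` returns results in argument order regardless of which read finishes first.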

I think it's a windmill worth tilting at. Maybe in 30 years we'll finally have kernels/languages/libraries/etc that let us write programs right.

VikingofRock
Aug 24, 2008




crazypenguin posted:

[awesome effortpost]

Thank you so much for this, this helps clear things up a lot. Posts like this are why I love this forum

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

crazypenguin posted:

So we also get NON-blocking APIs you can poll... for some things. Mostly network sockets (for those servers). And depressingly, we don't get non-blocking APIs for even really common things like disk i/o until depressingly recently. (SO depressingly recently, you still can't count on the APIs being available today because your linux distro's kernel might be too old.)

Linux has had signal-notified I/O completions for a long time, no? uring is new but AIO is twenty years old now

I recall that AIO and the aiocb */SIGIO was part of the response to c10k

(I remember being so impressed that IRIX in 1998 would let you pass a mutex to poll so you could wait on threads and sockets in a unified way)

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

that was also when Linux folks were using lmbench to push on microbenchmarks and beat Solaris and friends on things like context switch overhead and localhost network performance. David Miller was adorably fierce about that

Shaggar
Apr 26, 2006
lol at still using the terminology "non-blocking i/o" to refer to unsafe writes

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

sure, I’ll walk into it—what is unsafe about the writes from async I/O like completion ports or uring or posix AIO?

crazypenguin
Mar 9, 2005
nothing witty here, move along

Subjunctive posted:

Linux has had signal-notified I/O completions for a long time, no? uring is new but AIO is twenty years old now

I recall that AIO and the aiocb */SIGIO was part of the response to c10k

On Linux "AIO" had so many caveats that I decided they were not worth remembering because they boiled down to "do not use."

Like, I think it just didn't work on most filesystems, and required O_DIRECT, or otherwise it just actually silently did blocking IO.

And if my memory isn't failing me, the only meaningful software that ever used it was Oracle.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

there were some toy “high performance” web and ftp servers that used it I think, but yeah it wasn’t great

Shaggar
Apr 26, 2006

Subjunctive posted:

sure, I’ll walk into it—what is unsafe about the writes from async I/O like completion ports or uring or posix AIO?

how are you avoiding blocking when writing to some device?

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
glibc's aio implementation is basically a practical joke. it just spawns a thread for you and does blocking io there, and if that's what you want it's easier to do it directly than to use aio. it would make sense as a compatibility shim for running existing programs which were written for an os with a useful implementation of aio, but not much else
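For a sense of what "spawn a thread and do blocking io there" amounts to, here is a rough Python sketch of that shape (not glibc's actual code; `emulated_aio_write` and `demo` are made-up names, and the thread pool stands in for whatever the library does internally). The caller gets a completion callback, but underneath it is a plain blocking `write` on another thread.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def emulated_aio_write(pool, fd, data, on_complete):
    # "Asynchronous" write: a blocking os.write on a worker thread,
    # with the byte count handed to a completion callback.
    future = pool.submit(os.write, fd, data)
    future.add_done_callback(lambda f: on_complete(f.result()))
    return future


def demo():
    done = threading.Event()
    written = []

    def on_complete(count):
        written.append(count)
        done.set()

    with tempfile.TemporaryFile() as f, ThreadPoolExecutor(max_workers=1) as pool:
        fd = f.fileno()
        emulated_aio_write(pool, fd, b"hello", on_complete)
        # The calling thread is free to do other work here.
        done.wait(timeout=5)
        os.lseek(fd, 0, os.SEEK_SET)
        contents = os.read(fd, 16)
    return written, contents


if __name__ == "__main__":
    print(demo())
```

Written out like this, it is easy to see why doing the thread hand-off yourself is no harder than going through the aio interface.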

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Shaggar posted:

how are you avoiding blocking when writing to some device?

passing ownership of the buffer to the kernel, and then getting ownership of that buffer back once it’s filled? the same way you do it if you’re blocking really

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

but even without good ownership enforcement, racing writes into the buffer after it’s passed to the kernel isn’t necessarily unsafe any more than shearing on a non-vsync rendering output is. what behaviour are you worried about, and what safety principle is being violated?

Dylan16807
May 12, 2010

Sapozhnik posted:

i never found the whole "what color are your functions" argument compelling because the await keyword is a pretty handy big syntactical neon sign saying "hey the entire world might change under your feet here" and having that called out in concurrent code is beneficial.

the same is technically true of function calls yes but when a function call has random-rear end side effects on things that aren't its arguments then that is considered impolite. when you await and your shared state is mutated under your feet then that's kind of the entire point.
it's hard to impossible to avoid blocking or being rescheduled at random points and that will change the world out from under you just as badly as 'await'

and part of the annoyance is that even if you're happy explicitly designing and marking functions as async-compatible, in most languages now they won't work in synchronous contexts. if I have to copy-paste a function just to use it both ways then something has gone wrong in the expressiveness of my programming language.
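A small Python illustration of that split (Python is my choice for the example; `shout` and its callers are made-up names). Calling an async function from synchronous code does not run it; the synchronous caller has to start an event loop itself.

```python
import asyncio


async def shout(text):
    await asyncio.sleep(0)    # stands in for real async work
    return text.upper()


def call_without_await(text):
    coro = shout(text)        # no work has happened yet
    is_coroutine = asyncio.iscoroutine(coro)
    coro.close()              # avoid the "never awaited" warning
    return is_coroutine


def shout_from_sync_code(text):
    # A synchronous caller has to start (and block on) an event loop.
    return asyncio.run(shout(text))


if __name__ == "__main__":
    print(call_without_await("hi"))
    print(shout_from_sync_code("hi"))
```

The wrapper only works where no loop is already running, since `asyncio.run` raises `RuntimeError` when called from inside a running event loop. That restriction is part of why one function rarely serves both worlds.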

Shaggar
Apr 26, 2006

Subjunctive posted:

but even without good ownership enforcement, racing writes into the buffer after it’s passed to the kernel isn’t necessarily unsafe any more than shearing on a non-vsync rendering output is. what behaviour are you worried about, and what safety principle is being violated?

i just have distaste for the terminology of "non-blocking I/O" because it implies the I/O has happened when it hasn't. you've fired it into a pile of caches and power backups that let you pretend it doesnt have to block eventually. its mostly safe but i would prefer some other term

Grum
May 7, 2007
punted i/o

pseudorandom name
May 6, 2007

Plorkyeran posted:

glibc's aio implementation is basically a practical joke. it just spawns a thread for you and does blocking io there, and if that's what you want it's easier to do it directly than to use aio. it would make sense as a compatibility shim for running existing programs which were written for an os with a useful implementation of aio, but not much else

there's also the Oracle-funded kernel AIO that does the bare minimum to make Oracle go brrrr and blocks at random if you use it outside of Oracle's happy path

and a Red Hat developed version of glibc POSIX AIO that uses the kernel AIO, that they gave up on because see above


Shaggar posted:

i just have distaste for the terminology of "non-blocking I/O" because it implies the I/O has happened when it hasn't. you've fired it into a pile of caches and power backups that let you pretend it doesnt have to block eventually. its mostly safe but i would prefer some other term

non-blocking IO is when your socket read or write returns EWOULDBLOCK, it doesn't have anything to do with disk IO or caching
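A short sketch of that behaviour using Python's socket module (Python is my choice for illustration; `try_recv` and `demo` are made-up names). On a non-blocking socket with nothing queued, `recv` fails immediately with EWOULDBLOCK/EAGAIN, which Python surfaces as `BlockingIOError`, instead of suspending the thread.

```python
import errno
import select
import socket


def try_recv(sock):
    # Returns data if some is queued, or None if the call would have blocked.
    try:
        return sock.recv(1024)
    except BlockingIOError as exc:
        assert exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK)
        return None


def demo():
    a, b = socket.socketpair()
    try:
        a.setblocking(False)
        first = try_recv(a)                 # nothing queued yet
        b.sendall(b"ping")
        select.select([a], [], [], 1.0)     # wait until a is readable
        second = try_recv(a)
        return first, second
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(demo())
```

The kernel never promises to finish the operation later; the caller has to come back and try again, typically after `select`/`poll`/`epoll` reports readiness.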

Bloody
Mar 3, 2013

blocking io isn’t guaranteed to write anything to disk anyways so who care

tef
May 30, 2004

-> some l-system crap ->

Shaggar posted:

i just have distaste for the terminology of "non-blocking I/O" because it implies the I/O has happened when it hasn't. you've fired it into a pile of caches and power backups that let you pretend it doesnt have to block eventually. its mostly safe but i would prefer some other term

i have bad news about blocking io for very similar reasons

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

tef posted:

i have bad news about blocking io for very similar reasons

it’s weirdly hard to make sure your writes are durable, might be impossible when your disk has its own cache you can’t flush

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

pseudorandom name posted:

non-blocking IO is when your socket read or write returns EWOULDBLOCK, it doesn't have anything to do with disk IO or caching

you can have non-blocking I/O for disk and should if you're triggering it from a UI thread. all of Chrome's disk I/O is async if it's for the UI thread to consume, I believe. putting your test profile data on a network filesystem is a great way to develop a sensitivity to blocking disk I/O

Shaggar posted:

i just have distaste for the terminology of "non-blocking I/O" because it implies the I/O has happened when it hasn't. you've fired it into a pile of caches and power backups that let you pretend it doesnt have to block eventually. its mostly safe but i would prefer some other term

sorry, do you mean that you think that blocking I/O only returns once the data has been written to its final storage, be that disk or some SMB share or a socket or something? even blocking I/O returns once it's written into the page cache or a socket buffer, unless you go out of your way to do much slower I/O that waits for a device sync (and even then you might get lied to and probably will on consumer OSes) or your buffers are full and you need to wait for the kernel to make space by processing previous requests
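To spell out the layers involved, a minimal Python sketch (Python is my choice for illustration; `durable_write` is a made-up name). A plain blocking `write` returns once the data is in a buffer; getting it toward the device takes explicit extra steps, and as noted elsewhere in the thread the device may still claim durability it has not delivered.

```python
import os
import tempfile


def durable_write(path, data):
    with open(path, "wb") as f:
        f.write(data)          # lands in Python's userspace buffer
        f.flush()              # handed to the kernel (page cache)
        os.fsync(f.fileno())   # ask the kernel to push it to the device


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "log.bin")
        durable_write(path, b"important")
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo())
```

Reading the data back proves nothing about durability, since the read is served from the same page cache; only the `fsync` call (and an honest device) does.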

non-blocking I/O is explicitly about the I/O not having happened when the requesting API call is complete, but rather later when the kernel signals that the underlying interaction has completed

Shaggar
Apr 26, 2006
so call it asynchronous i/o

BobHoward
Feb 13, 2012

Sweeper posted:

it’s weirdly hard to make sure your writes are durable, might be impossible when your disk has its own cache you can’t flush

cannot be emphasized enough that random disks will just decide to claim writes are durable when they aren't yet

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Shaggar posted:

so call it asynchronous i/o

what do you think “nonblocking” means that’s different from “asynchronous” here? if you’re blocking on something, your flow of execution is synchronous with that something. if you don’t block on it, the execution is asynchronous with respect to the operation. you might not even care to find out if/when it completed at all, if you’re just chucking something in a best-effort log file

pseudorandom name
May 6, 2007

Subjunctive posted:

you can have non-blocking I/O for disk and should if you're triggering it from a UI thread. all of Chrome's disk I/O is async if it's for the UI thread to consume, I believe. putting your test profile data on a network filesystem is a great way to develop a sensitivity to blocking disk I/O

that's async IO

blocking IO - you initiate the IO operation and maybe it blocks or maybe it doesn't, depending on caching and what kind of durability guarantees you demand

non-blocking IO - you initiate the IO and either it happens or the kernel says no I'm not doing that right now

async IO - you initiate the IO operation and at some later point in time you are informed of its completion

Vanadium
Jan 8, 2005

it's nonblocking because the syscalls dont block until the io is finished. it's about the user facing programming model. that the io has to ""block"" at some point when it takes a non-zero amount of time to engrave the bits upon the physical media is unrelated to that. disks are block devices in linux there's no way around that.

Shaggar
Apr 26, 2006

Subjunctive posted:

what do you think “nonblocking” means that’s different from “asynchronous” here? if you’re blocking on something, your flow of execution is synchronous with that something. if you don’t block on it, the execution is asynchronous with respect to the operation. you might not even care to find out if/when it completed at all, if you’re just chucking something in a best-effort log file

nonblocking implies that its synchronous and immediate i.e it does the i/o without blocking. asynchronous implies you start the operation and it finishes later.

Shaggar
Apr 26, 2006

pseudorandom name posted:

that's async IO

blocking IO - you initiate the IO operation and maybe it blocks or maybe it doesn't, depending on caching and what kind of durability guarantees you demand

non-blocking IO - you initiate the IO and either it happens or the kernel says no I'm not doing that right now

async IO - you initiate the IO operation and at some later point in time you are informed of its completion

yeah see i dont like that either cause in the event the kernel says no then you have non-blocking i/o literally blocking i/o from happening. maybe opportunistic i/o would be a better term.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

BobHoward posted:

cannot be emphasized enough that random disks will just decide to claim writes are durable when they aren't yet

one of the nice things about targeting apple platforms is that on apple hardware when writing to internal storage the os can actually guarantee that F_FULLFSYNC works and everything has genuinely been persisted by the time it runs since they have control over the entire stack.

the downside is that it turns out that's a really loving expensive thing to guarantee. thankfully they seem to have finally gotten write barriers working and that's usually all that you actually need.

Shaggar posted:

nonblocking implies that its synchronous and immediate

no it doesn't. if the meaning you've made up for a word doesn't match how people use it, that means that the meaning you made up is wrong, not that everyone using the word is wrong.

Shaggar
Apr 26, 2006

Vanadium posted:

it's nonblocking because the syscalls dont block until the io is finished. it's about the user facing programming model. that the io has to ""block"" at some point when it takes a non-zero amount of time to engrave the bits upon the physical media is unrelated to that. disks are block devices in linux there's no way around that.

so dont call it non blocking if its going to eventually block.

Shaggar
Apr 26, 2006

Plorkyeran posted:

one of the nice things about targeting apple platforms is that on apple hardware when writing to internal storage the os can actually guarantee that F_FULLFSYNC works and everything has genuinely been persisted by the time it runs since they have control over the entire stack.

the downside is that it turns out that's a really loving expensive thing to guarantee. thankfully they seem to have finally gotten write barriers working and that's usually all that you actually need.

no it doesn't. if the meaning you've made up for a word doesn't match how people use it, that means that the meaning you made up is wrong, not that everyone using the word is wrong.

when someone adds "non" to a word it means the opposite of the following word. Non-blocking <X> means X does not block. i/o means input/output. its a term used for the process of writing or reading data. non-blocking i/o therefore means "a process of writing or reading data that does not block".

If you do block at some point when doing i/o, then you are not doing non-blocking i/o
if you do not write or read data at some point, then you are not doing non-blocking i/o

Shaggar fucked around with this message at 01:04 on Sep 12, 2023

eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

Subjunctive posted:

Linux has had signal-notified I/O completions for a long time, no? uring is new but AIO is twenty years old now

and plenty of platforms have had asynchronous I/O for far longer, most minicomputer systems from 1960-1980 had asynchronous APIs for interactive I/O, as did the microcomputer systems that their developers worked on afterwards

for example, this is why the original Macintosh used IOParameterBlock structures that included support for completion routines; lots of people who worked on Lisa had previously worked on HP 3000, which used a similar parameter block design for asynchronous I/O, and the ideas filtered over from them—on Macintosh they were necessary for maintaining interactive performance in the face of a very slow floppy disk, on HP 3000 they were necessary to maintain interactive performance in the face of many interactive block-mode terminals (on 16-bit minicomputers running at 1MHz with 1MB of RAM)

as usual, UNIX was the odd one out here, taking a “blocking I/O and forked processes are all you’ll ever need” approach for implementation simplicity, just like EINTR was created so the kernel authors didn’t have to figure out how to implement interruptible system calls, and just pushed the problem off to clients

many things in the UNIX world have basically been rediscovery of things its contemporaries did and why they did them

eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

Sweeper posted:

it’s weirdly hard to make sure your writes are durable, might be impossible when your disk has its own cache you can’t flush

or actively lies to you about flushing

tef
May 30, 2004

-> some l-system crap ->
blocking/non blocking refers to the syscall. and synchronous/asynchronous, here, refers to the programming model

a synchronous program can make blocking calls, and it can make non blocking calls and poll for updates (hello select())

an asynchronous program can make non blocking or blocking calls, but if you call a blocking operation inside a coroutine, the entire thing pauses, because asynchronous programming relies on apartment/cooperative threading
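A rough Python sketch of that last case (Python is my choice for illustration; `ticker`, `worker` and `run_demo` are made-up names). With cooperative scheduling, a blocking call inside one coroutine stalls every other coroutine on the loop, while an awaited sleep suspends only the coroutine that called it.

```python
import asyncio
import time


async def ticker(log, count):
    for i in range(count):
        log.append(f"tick {i}")
        await asyncio.sleep(0)        # yield to the event loop


async def worker(log, blocking):
    log.append("work start")
    if blocking:
        time.sleep(0.05)              # blocks the thread, and the loop with it
    else:
        await asyncio.sleep(0.05)     # suspends only this coroutine
    log.append("work end")


async def run_demo(blocking):
    log = []
    await asyncio.gather(worker(log, blocking), ticker(log, 3))
    return log


if __name__ == "__main__":
    print(asyncio.run(run_demo(blocking=True)))
    print(asyncio.run(run_demo(blocking=False)))
```

In the blocking run no tick can be logged until the worker has finished; in the awaiting run the ticks land between "work start" and "work end".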

and it's funny because shaggar might as well be yelling about close() throwing an exception

pseudorandom name
May 6, 2007

Shaggar posted:

when someone adds "non" to a word it means the opposite of the following word. Non-blocking <X> means X does not block. i/o means input/output. its a term used for the process of writing or reading data. non-blocking i/o therefore means "a process of writing or reading data that does not block".

If you do block at some point when doing i/o, then you are not doing non-blocking i/o
if you do not write or read data at some point, then you are not doing non-blocking i/o

"blocking" means "suspending the current thread"

Shaggar
Apr 26, 2006
so its fire and forget?

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

only if you forget to check that the aio_write that you fired off completes, op

Shaggar
Apr 26, 2006
what happens to the thread if aio_write hasnt completed and i need to ensure it completes?

pseudorandom name
May 6, 2007

the thread does other work until the IO completes

you know, the topic of the start of this conversation -- threads vs. state machines vs. async/await

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

do you mean “what should i do if it’s time for pthread_exit() but I still have outstanding operations to complete”? you’re still free to block on pending events’ state changes, epoll-style, in a case like that I suppose