The Linux Questions Thread: a bunch of pitfalls, but technically it's possible

Mescal: Jul 23, 2005

while you were posting i tried something i hadnt yet: hitting shift while restarting (while still in win) gives you the option to boot from a usb device, then gives me a blank dialog box with "ok." that's all. i click ok, then i finally get to 'boot manager', which wasn't available the usual way.
my only two options are
os boot mgr uefi hgst hts54 1075 a9e680
or
boot from efi file
i'll keep that where it is on option 1 and keep going. that thumb drive was fresh from amazon yesterday. i will try cpu viking's suggestions after i figure out secure boot. i don't care for this computer's bios at all.

edit: shift-restart, troubleshoot, then i get to the startup menu!! this fkn screen: is this the one that's been kept from my drat use by secure boot?
sysinfo
diagnos
boot device opt
bios setup
system recover (im taking photos of these screens if needed)

and then once again i'm in the boot opt menu; i can change from os boot mgr to boot from efi file. that didnt sound right a minute ago but now it does. ill change that.

Mescal fucked around with this message at 14:50 on Apr 27, 2023

Apr 27, 2023 14:44

Computer viking: May 30, 2011; Now with less breakage.

Mescal posted:

while you were posting i tried something i hadnt yet: hitting shift while restarting (while still in win) gives you the option to boot from a usb device, then gives me a blank dialog box with "ok." that's all. i click ok, then i finally get to 'boot manager', which wasn't available the usual way.
my only two options are
os boot mgr uefi hgst hts54 1075 a9e680
or
boot from efi file
i'll keep that where it is on option 1 and keep going. that thumb drive was fresh from amazon yesterday. i will try cpu viking's suggestions after i figure out secure boot. i don't care for this computer's bios at all.

That does sound like secure boot problems.

Apr 27, 2023 14:45

Mescal: Jul 23, 2005

Computer viking posted:

That does sound like secure boot problems.

yeah i choose option two, then it lets me click on "No volume label", my usb drive. yay

Apr 27, 2023 14:51

Computer viking: May 30, 2011; Now with less breakage.

Mescal posted:

yeah i choose option two, then it lets me click on "No volume label", my usb drive. yay

That's kind of good (the drive is working), but I don't think it'll solve the problem - the reason it's not listed as a boot alternative is probably that it didn't find any files on there that are signed for secure boot.

Apr 27, 2023 14:53

Mescal: Jul 23, 2005

Apr 27, 2023 14:57

Mescal: Jul 23, 2005

Computer viking posted:

That's kind of good (the drive is working), but I don't think it'll solve the problem - the reason it's not listed as a boot alternative is probably that it didn't find any files on there that are signed for secure boot.

author it again with something other than rufus maybe?

Apr 27, 2023 14:57

Computer viking: May 30, 2011; Now with less breakage.

Mescal posted:

author it again with something other than rufus maybe?

Did you disable secure boot?

e: It looks like it's under System Configuration > Boot Options on a typical HP BIOS, if that helps.

Computer viking fucked around with this message at 15:06 on Apr 27, 2023

Apr 27, 2023 14:59

unruly: May 12, 2002; YES!!!

Mescal posted:

yeah i choose option two, then it lets me click on "No volume label", my usb drive. yay

then ive got file explorer
.
..
microsoft
boot
hp (these are all in <> brackets)
i click boot and under boot there's boot64.efi, i selected it, and it booted to win

i fuckin installed windows on this thing cause i thought if i want to dual boot in the future it will probably be easier to install windows before linux. i honestly dont know whether i would have been able to get into the actual boot menu if win weren't installed. would i have had secure boot probs if there were no OS installed on the hdd?

anyway i selected the boot file thats on the thumb drive. but it still booted into win.

i can wipe the drive if thatll remove secure boot

i am following googled instructions and y'all advice

Secure Boot is managed by the EFI/BIOS system. Any bootloader needs to be signed by a trusted vendor, and so do the kernel and modules loaded after it, and so on down the chain.

Turning off Secure Boot in EFI/BIOS means that the firmware ignores signatures (or the lack thereof). This can help you install an OS that needs more setup to support it, or one that doesn't support it at all. It's not really needed unless you're paranoid about the security of the boot system and kernel modules.
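For reference, on a Linux system booted through UEFI the firmware's Secure Boot state is exposed as a standard EFI variable, so once a live USB boots you can confirm the BIOS toggle actually took effect (`mokutil --sb-state` reports the same thing). A minimal C sketch, assuming efivarfs is mounted at the usual /sys/firmware/efi/efivars; `parse_secureboot` and `secureboot_state` are made-up helper names:

```c
#include <stddef.h>
#include <stdio.h>

/* The standard "SecureBoot" variable under the EFI global-variable GUID. */
#define SECUREBOOT_EFIVAR \
    "/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"

/* efivarfs files begin with a 4-byte attribute word, followed by the
 * variable's data. SecureBoot's data is one byte: 1 = on, 0 = off.
 * Returns 1 (enabled), 0 (disabled), or -1 (malformed). */
int parse_secureboot(const unsigned char *buf, size_t len)
{
    if (len < 5)
        return -1;
    if (buf[4] == 1)
        return 1;
    if (buf[4] == 0)
        return 0;
    return -1;
}

/* Reads the variable from efivarfs. Returns -1 if it can't be read,
 * e.g. when the machine booted in legacy/CSM mode. */
int secureboot_state(void)
{
    unsigned char buf[16];
    FILE *f = fopen(SECUREBOOT_EFIVAR, "rb");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return parse_secureboot(buf, n);
}
```

A -1 from `secureboot_state` usually just means there's no UEFI variable store to read, not that anything is broken.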

I don't know what kind of interface you have available, but generally, in the EFI/BIOS menus or setup, they have an option to remove or change not only the boot order, but entries, too. If you have entries that aren't working, you can remove them.

If you're ultimately trying to just install Linux, then feel free to nuke anything in there and set the device order to ensure that USB is first (so you can boot the disk). Otherwise, it's a bit more of a mess. Windows likes to own the bootloader regardless of what else is installed on other disks. It will clobber GRUB or shim or whathaveyou on an update, leaving you with a Windows "only" booting system until you manually fix it. Honestly, in my opinion, dual boot hasn't been worth it for years.

My own recommendation is either use Windows (with WSL for your Linux needs) or Linux with libvirt or VirtualBox (or WINE/Proton) for your Windows needs.

Apr 27, 2023 15:06

Mescal: Jul 23, 2005

first of all how the heck do people get instructions like what you've given me if somebody like you isn't around? MS's instructions: step one was "contact your administrator." rofl

Computer viking posted:

On the windows side, could you try one thing?

With the USB drive connected:
Type cmd in the start menu, which should find the command prompt. Right-click it, and Run as Administrator.
In that command prompt, run diskpart
That should print something like "Microsoft DiskPart version 10.xxx" and give you a DISKPART> prompt.
At that prompt, type list disk
Ideally, this should list a big Disk 0 (the SSD in the laptop), and a smaller Disk 1 (the USB stick). If it has an SD card reader or some other weirdness, there may be more of them.
Type select disk 1 (or whichever disk is the USB drive).
Then run clean to remove any partition tables on the disk - it's not a complete wipe, but it will make it look factory fresh in the ways that count here.
Make sure it gets a GPT partition table with convert gpt
and create a single partition that covers the entire thing with create partition primary (the default is to cover the entire drive, which is fine here)
Format it with format fs=fat32 quick
and exit with exit

Then unplug/replug the drive, and test it a bit - copy some files over, try to view a few of them, and try to delete everything.
If that works, good - try writing the linux image again. If not, the USB stick is probably dead.

Like this:


disk0 online size 698gb free 5120kb
disk1 online 1000gb 0kb

i am formatting it in CMD and it's at zero percent with no movement yet

it is annoying how there are multiple ways to run as admin and usually all are grayed out. yesterday i found one place to un-gray an option and it was in the last place you would ever look. i wish there had been an option for "restore this pc to grown-up mode for admins where you can do things"

yeah, this format is not working. poo poo. i must have fried it with my corrupted data from an old drive?

Apr 27, 2023 15:10

Klyith: Aug 3, 2007; GBS Pledge Week

Mescal posted:

i can wipe the drive if thatll remove secure boot

i am following googled instructions and y'all advice

No, secure boot is an option in the BIOS settings of the PC.

On an HP computer that's normally pressing F10 when the machine first boots up. (But it might be something else, like F1, F11, F12, or Del. This can turn into a game of Twister with your fingers when you try to hit all of them on a PC where you don't know the right key.)

unruly posted:

There are some distros that can use TPM/Secure Boot.

Yeah, though even that can sometimes be unfortunately complicated. For ex, PopOS supports secure boot (because they use ubuntu's signed stuff), but their installer doesn't.

unruly posted:

I am, however, not pleased with how Microsoft-centric TPM and SecureBoot are.

Yeah. There's a bit where I understand how it happened and don't want to be too :tinfoil: about it -- back when the spec was made there wasn't a good single entity to trust with a secure boot key on behalf of linux. But it amounted to the same boil-the-frog control over alternate OSes in the end, now that MS made it mandatory.

IMO that should be revisited if MS wanted to show good faith, but it ain't gonna happen.

Apr 27, 2023 15:10

Mescal: Jul 23, 2005

how should i shut down the formatting process in cmd if it's not working? EDIT: wait, it just hit one percent. this is like when windows tried to repair the disk: it took forever, got to eighty, and failed. can formatting take about ten hours? that's what it'll take at this rate. should i shut down the format and try again on linux? EDIT: never mind, i canceled the format accidentally. im gonna put it in the steam deck and try another format in konsole

Mescal fucked around with this message at 15:27 on Apr 27, 2023

Apr 27, 2023 15:17

Mescal: Jul 23, 2005

ive gotten to the pt where i can manually boot from something other than windows, but haven't yet found where to turn off secure boot.

again, this screen says boot option menu - os boot mgr - boot from efi file. nothing for turning off secure boot.
you can hit f10 to go in the "main" (bullshit) bios menu, but obv that's not where the boot options are. that's under the boot options menu. but it doesnt tell me all the options that are usually in a boot options menu. ill upload screen shots

i would be FINE at this point if i hadnt corrupted the thumb drive

Apr 27, 2023 15:26

Kibner: Oct 21, 2008; Acguy Supremacy

You should be seeing a menu like this to disable Secure Boot:

Apr 27, 2023 15:30

Mescal: Jul 23, 2005

thanks! secure boot is now disabled. that was NOT where all the google results said. HP should fuckin have these instructions, their instructions for this model were not adequate.

i'll see if i can find space on a non corrupted drive somewhere.

Apr 27, 2023 15:37

spiritual bypass: Feb 19, 2008; Grimey Drawer

I have never seen BIOS/UEFI instructions that weren't terrible

Apr 27, 2023 16:10

ExcessBLarg!: Sep 1, 2001

Klyith posted:

Yeah. There's a bit where I understand how it happened and don't want to be too :tinfoil: about it -- back when the spec was made there wasn't a good single entity to trust with a secure boot key on behalf of linux. But it amounted to the same boil-the-frog control over alternate OSes in the end, now that MS made it mandatory.

Secure boot exists primarily as a deterrent to boot-time persistent rootkits since if you shim the boot chain without proper signatures machines won't boot anymore. For the most part, I think it was right that MS pushed for secure boot and making it mandatory to boot Windows, since that nixes a significant attack vector for malware.

The "wishful thinking" part was designing secure boot to be theoretically vendor agnostic, but the reality is that unless the OEM is preloading keys, getting your keys into the BIOS is practically impossible. Again, the mechanisms exist in theory, but who exactly is doing QA on that? Nobody, so nobody does it.

Is it actually mandatory to enable secure boot across UEFI hardware now? For years I bought Intel NUCs that had secure boot disabled by default since they didn't ship as complete builds (no disk, no Windows). In the most recent generation I do see secure boot enabled by default but like, during a build you have to go into the BIOS anyways so it's easy to disable that.

Apr 27, 2023 16:29

Volguus: Mar 3, 2009

There are vendors of lovely machines that enable it and do not allow disabling it, but as far as I know OSes can and do boot just fine with it disabled. On Windows, certain things like BitLocker may not work, but meh...

Apr 27, 2023 16:46

Mega Comrade: Apr 22, 2004; Listen buddy, we all got problems!

At my last job my work laptop's motherboard went, so Dell came out and replaced it.

They didn't reset the TPM, however, so I was hard locked out of Office 365 apps.

Apr 27, 2023 16:52

cruft: Oct 25, 2007

Klyith posted:

The new MS regime of enforced secure boot for win11 is gonna be a real fucker. Hoo boy.

Can you elaborate on this? I've been trying to pretend secure boot doesn't exist but it sounds like I may not be able to soon.

Apr 27, 2023 17:06

Klyith: Aug 3, 2007; GBS Pledge Week

Mescal posted:

thanks! secure boot is now disabled. that was NOT where all the google results said. HP should fuckin have these instructions, their instructions for this model were not adequate.

It's a relatively new thing. Secure boot was disabled by default on almost all PCs -- business-class OEM laptops were afaik the main exception -- until very recently when MS made it a requirement for Windows 11.

ExcessBLarg! posted:

Secure boot exists primarily as a deterrent to boot-time persistent rootkits since if you shim the boot chain without proper signatures machines won't boot anymore. For the most part, I think it was right that MS pushed for secure boot and making it mandatory to boot Windows, since that nixes a significant attack vector for malware.

Yes, but... against consumer machines you can attack the UEFI itself without that much additional difficulty. They're more standardized than you think, and the convenience features for in-OS firmware updates mean you just need to breach the normal admin & defender safeguards. There are systems that also lock down that vector, but they're not normal in consumer PCs (and are pretty customer-hostile if you don't know what you're getting).

Meanwhile, bootkit attacks that function under secure boot are happening. MS is making noise about disabling 3rd party secure boot entirely with the Pluton stuff. They're blaming grub for being insecure, but they've been poo poo themselves handing out far worse signatures to people other than linux.

I dunno, I see the security benefit, but it also seems like it may become a black box where MS says "trust us" and you have no alternative. I feel like MS has good motivations for securing their OS, but bad motivations for including anyone else or allowing consumers to stay in control of their own devices. And some of their PR on the subject has been straight up disingenuous -- TPM is also required for Win11, and when they announced it they said it was big for security. But then when Win11 actually came out they had to admit that the TPM isn't being used for anything that 10 wasn't doing.

tl;dr Windows 11 is why I'm using linux

cruft posted:

Can you elaborate on this? I've been trying to pretend secure boot doesn't exist but it sounds like I may not be able to soon.

ExcessBLarg! posted:

Is it actually mandatory to enable secure boot across UEFI hardware now?

Yes, if you want to sell a PC with an official Windows sticker, having secure boot on is mandatory. You're still allowed to have an on/off switch to turn it off in bios. So you don't need to deal with it, you can just do that... for now. :tinfoil:

(And windows 11 won't install / upgrade if you don't have it turned on, so many existing PCs got firmware updates that enabled it. My crap laptop, which isn't even win11-compatible because the CPU is too old, got a bios update delivered via windows update to turn it on.)

Apr 27, 2023 17:21

Mescal: Jul 23, 2005

sorry ive been on the phone all day. i've got a setting i'm about to hit here in the bios. this is different from the other one, which apparently disabled secure boot once. this one says clear all secure boot keys and certificates from secure boot databases.

Apr 27, 2023 17:44

Computer viking: May 30, 2011; Now with less breakage.

Mescal posted:

sorry ive been on the phone all day. i've got a setting i'm about to hit here in the bios. this is different from the other one, which apparently disabled secure boot once. this one says clear all secure boot keys and certificates from secure boot databases.

That sounds like something you should leave alone.

Apr 27, 2023 17:51

Mr. Crow: May 22, 2008; Snap City mayor for life

BrainDance posted:

As far as I could find out last time I went looking for a solution Jellyfin doesnt for shows, at least it hasnt since the last time I updated my library. I dont use Plex anymore so maybe they do now? But Jellyfin will only really add something as a show if it somewhat matches something in the databases (which is how I learned there's something called Arga Snickaren, The Angry Carpenter, in Sweden from my SNICK rips) and, even then, because the episodes don't really match it wont be right. And my Tech TV videos don't get picked up (they're probably in the databases but don't match what I have) besides a couple.

It's probably different for movies, I have some movies too that it added as just the filename because they're Chinese. But when I started asking about shows before I was told "you gotta just make the nfo files for it." If I made a movies playlist for them that would just be a mess because it's so many things (my big archive of videos from ZDTV/TechTV that I am nostalgic for is 6,723 files, probably about 80% are videos. Just love watching awkward people call Leo Laporte before we knew he was a pervert asking how they can connect their printer to their Windows 98 machine.) And, going back to Plex or running a Plex and Jellyfin server together is something I'd rather not do.

works fine op

Apr 27, 2023 18:02

ExcessBLarg!: Sep 1, 2001

Klyith posted:

It's a relatively new thing. Secure boot was disabled by default on almost all PCs -- business-class OEM laptops were afaik the main exception -- until very recently when MS made it a requirement for Windows 11.

Our 2015 Dell XPS had secure boot enabled, but maybe that's a business-class OEM laptop.

Klyith posted:

Yes, but... against consumer machines you can attack the UEFI itself without that much additional difficulty. They're more standardized than you think, and the convenience features for in-OS firmware updates mean you just need to breach the normal admin & defender safeguards.

I'm well aware of UEFI's shortcomings and that you can drive a truck through them.

Klyith posted:

Meanwhile, bootkit attacks that function under secure boot are happening. MS is making noise about disabling 3rd party secure boot entirely with the Pluton stuff. They're blaming grub for being insecure, but they've been poo poo themselves handing out far worse signatures to people other than linux.

For a while I think their main requirement for signing third-party bootloaders was that they couldn't be used to chainload Windows. Although I'm not really sure why they couldn't be, if the bootloader can boot EFI binaries and the Windows bootloader is also an EFI binary.

Klyith posted:

I dunno, I see the security benefit, but it also seems like it may become a black box where MS says "trust us" and you have no alternative.

I think the design by committee approach they've taken with Intel (and, presumably "other BIOS vendors") just keeps coming back to bite them. Chrome OS has a very solid security model that works tightly with its hardware while generally affording users significant freedom (except for enterprise enrolled devices). But Google has much more control over its ODMs than Microsoft has been able to achieve.

Klyith posted:

Yes, if you want to sell a PC with an official Windows sticker, having secure boot on is mandatory.

Intel NUCs are sold as barebones PCs without RAM or disk, no OS license, and no Windows stickers in the box so they're, presumably, not under these terms--unless the terms state that if you want to sell any Windows PCs then all PCs you sell have to have it enabled. As a practical matter if Windows 11 boot media won't boot without secure boot enabled then it would be prudent for them to enable it by default even if the machines aren't labeled.

So what's going to happen when the certificates for the current Microsoft secure boot keys expire in 2025? I can't remember the exact date, but it's around then. Will machines across the world stop booting?

Apr 27, 2023 18:17

Mescal: Jul 23, 2005

i gotta ask is it possible that as i was ferrying my backups around, some corrupted data permanently marred the boot sectors on both these thumb drives? or fried the usb controller drivers on two separate pcs? i'm beginning to suspect none of these had filesystem problems at all. oh, except the one. whatever screwed it up is making the 1tb thumbdrive take about 12 hours to format. and the things i'm changing in the bios, i save and exit and when i go back to the bios it shows the changes i made, but in one case they didnt take effect: where i changed the action keys mode. that makes the top keys function keys instead of play pause etc. that's the thing that makes it hard to get in the bios without an external full-size keyboard, and it's not disabled like it says. hell of a day for a toothache. i'm wondering if this thing would even want to boot from a cdrom.

^^thats a post i wrote earlier but did not submit. since then i disabled all safe boot forever, and prevented it from preloading usb3 until windows booted, which may well have been a thing in my way. but now i have to turn it back on because my usb devices arent being recognized.

i did also call my it guy, he couldnt think of anything i hadn't done.

Mescal fucked around with this message at 19:51 on Apr 27, 2023

Apr 27, 2023 19:46

Nitrousoxide: May 30, 2011; do not buy a oneplus phone

I played around with Vanilla OS and I guess it's sorta neat. It mostly seems like an Ubuntu flavor with an A/B immutable image deployment and a wrapper around distrobox called "apx". This lets you do most of the distrobox manipulation from the host OS rather than having to

code:

distrobox enter

the relevant container and then install it. I don't know if that really adds a lot of value tbh.

Looking at GNOME Disks, the A/B partition scheme takes up A LOT of space compared to an ostree system like Silverblue. Which, I mean, shouldn't be surprising but... I don't see the value in going A/B compared to ostree.

Otherwise it has some nice defaults for using userland flathub, setting up nvidia (if needed). Nothing really earth shattering there.

You can auto-initialize a nix system with apx and install stuff with that I guess, so that's kind of a neat thing there since it translates the really loving complicated nix installs into much simpler apx --nix install (package name) commands.

Too bad it doesn't look like nix works for gui apps nor does it have a version of distrobox export to auto-create .desktop files for stuff you install there.

Apr 27, 2023 19:59

bsaber: Jul 27, 2007

Mescal posted:

^^thats a post i wrote earlier but did not submit. since then i disabled all safe boot forever, and prevented it from preloading usb3 until windows booted, which may well have been a thing in my way. but now i have to turn it back on because my usb devices arent being recognized.

i did also call my it guy, he couldnt think of anything i hadn't done.

Does windows still boot? If so, check and see if fast boot is enabled and disable it if enabled.

Apr 27, 2023 20:41

BattleMaster: Aug 14, 2000

So just screaming into the void here, I've been playing around with lower-level Unix-like userspace networking stuff. Specifically, I've been writing servers that can accept and maintain many TCP/IP streams at a time. In my tests they just echo back what was sent, but in theory it could be expanded to handle HTTP, Telnet, FTP, etc. though those things are obviously well-serviced so aside from a learning exercise it wouldn't be useful to go that far.

In my first naive attempt, my server:

-Forks into several workers
-Each worker creates a host socket, enables address and port reuse on it, binds the same address to it, and listens to it
-Each worker then uses epoll (Linux specific) to wait for the host socket to be readable
-When the host socket is readable, the worker accepts the client socket and adds it to the epoll list
-When the client socket indicates it's readable, it gets read and then the contents gets written back to it.
-When the client socket indicates EOF, the client is considered disconnected. Also, epoll listens to a timerfd (Linux specific) that disconnects clients after one minute of inactivity.

So in my testing, this actually works great. There's no "thundering herd" problem; when a connection comes in, the kernel selects only one process to hand it over to, so only one process wakes up. (The disadvantage of the address reuse technique is that if you use a port above 1023, a rogue program could also listen on the same port and take a portion of the incoming connections. There might be a countermeasure for this but I didn't go that far.)

Also, it doesn't suffer any of the supposed problems with epoll claimed by people who use it in frankly incredibly boneheaded ways like sharing it across threads or creating it before the fork or not following directions properly (closing FDs without removing them from the epoll list) and stuff like that.
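A minimal sketch of the readable-client handler in that kind of epoll loop, written to avoid the close-without-removing mistake. One subtlety from epoll(7): closing an fd only drops it from the interest list once every duplicate of the underlying open file (for example, copies inherited across a fork) is closed, so deleting it explicitly first is the safer habit. `handle_readable` is a hypothetical name and the echo logic is simplified:

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical handler, run when epoll reports client fd readable. Echoes
 * what arrived; on EOF or a hard error it deregisters and closes the fd.
 * Returns 1 if the client is still connected, 0 if it was closed. */
int handle_readable(int epfd, int fd)
{
    char buf[4096];
    ssize_t n = recv(fd, buf, sizeof buf, 0);

    if (n > 0) {
        /* Simplification: a short or failed send is treated as fatal.
         * MSG_NOSIGNAL avoids SIGPIPE if the peer has already gone. */
        if (send(fd, buf, (size_t)n, MSG_NOSIGNAL) == n)
            return 1;
    } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK ||
                         errno == EINTR)) {
        return 1; /* nothing to read right now; wait for the next event */
    }

    /* n == 0 is EOF (the peer disconnected). Remove the fd from the epoll
     * set before closing it so the interest list never holds a stale fd. */
    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
    close(fd);
    return 0;
}
```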

I examined CERN's httpd source from 1996 to see how things were done back then. httpd can compile on a large number of platforms but with an eye for Linux specifically, Linux didn't have epoll or address/port reuse at the time, and didn't even reliably have support for poll (added to unstable kernels in 1996, support added to glibc in 1997, and reached stable kernels in 1999 or so.) The basic process was:

-Set up the host socket to listen
-Use select to wait for the host socket to be readable
-When it is, accept the client socket and fork, handling the client with the child process and going back to waiting for new connections on the host
-The child process uses blocking I/O to greatly simplify the logic of waiting for stuff without holding any of the other processes up
-The use of select with a timeout in the parent process wasn't for fancy I/O multiplexing but so that the server could go and do other stuff without blocking for very long times waiting for a connection

So a pretty effective way of handling lots of connections. There's the cost of fork for every new client but it makes the parallelization fairly trivial.
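The accept-then-fork model can be sketched in a few dozen lines of C. This is not httpd's actual code, just an illustration of the same shape assuming a simple echo protocol: the parent uses select with a timeout only so it isn't stuck in accept indefinitely, and the child does plain blocking I/O. `accept_and_fork_once` and `echo_blocking` are hypothetical names.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: plain blocking I/O keeps the per-client logic simple. */
static void echo_blocking(int fd)
{
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        if (write(fd, buf, (size_t)n) != n)
            break;
}

/* One pass of a hypothetical parent loop. Returns 1 if a child was forked
 * for a new client, 0 on timeout (the parent is free to do other work),
 * or -1 on error. */
int accept_and_fork_once(int listen_fd, int timeout_sec)
{
    fd_set rd;
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

    /* Reap any children that have already finished, without blocking. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;

    FD_ZERO(&rd);
    FD_SET(listen_fd, &rd);
    int r = select(listen_fd + 1, &rd, NULL, NULL, &tv);
    if (r <= 0)
        return r;                       /* 0 = timeout, -1 = error */

    int client = accept(listen_fd, NULL, NULL);
    if (client < 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {                     /* child: serve one client, exit */
        close(listen_fd);
        echo_blocking(client);
        close(client);
        _exit(0);
    }
    close(client);                      /* parent: the child owns it now */
    return pid > 0 ? 1 : -1;
}
```

The parent would call this in a loop and do its other housekeeping whenever it returns 0; the waitpid(WNOHANG) pass keeps finished children from piling up as zombies.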

So I wanted to see if there was a way to do what I did before (forking then epoll) with only the stuff available in Linux 1.0. I had to use select instead of epoll, which while crusty wasn't nearly as bad as I feared it would be. It takes more bookkeeping and you have to rebuild the set of FDs to listen to every time you call it so there's a loss of efficiency but it isn't super terrible. A single process can fairly efficiently handle many simultaneous connections with just select, but it doesn't scale for the multiple CPU systems I assume you'd probably use for a webserver that accepts lots of connections in 1996.

Also, crucially, address and port reuse didn't exist at the time so I couldn't just do a version of what I did before with epoll downgraded to select.

I found something that worked fine but not ideal:

-Set up the host socket to listen
-Fork into several workers which will all inherit that socket
-Use select to wait for the host socket
-Accept the client connection on the host socket and add the socket to the list of things to be added to the select on the next loop

Because timerfd didn't exist, I use time() to put timestamps on each client at the time of connection or last incoming message, and on each loop while building the FD set for select I check whether time() - timestamp >= 60 to see if I should boot that connection instead of adding it to the FD set. I use a 61-second timeout on select, which should ensure that the last connection gets booted after only one timeout unless there's insane jitter (PCs are very bad at accurate timing after all.)
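Roughly what the rebuild-the-set-every-loop step with time()-based idle eviction could look like. It's a sketch that assumes a fixed-size client array where fd == -1 marks a free slot; `struct client`, `build_read_set`, and `IDLE_LIMIT` are illustrative names, not from the original code.

```c
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

#define IDLE_LIMIT 60   /* seconds of silence before a client is dropped */

struct client {
    int fd;             /* -1 marks a free slot */
    time_t last;        /* time of connection or of the last successful read */
};

/* Rebuilds the read set for the next select() call. Clients idle for
 * IDLE_LIMIT seconds or more are closed instead of being added.
 * Returns the highest fd in the set (select's first argument is this + 1). */
int build_read_set(struct client *clients, int n, int listen_fd,
                   time_t now, fd_set *set)
{
    FD_ZERO(set);
    FD_SET(listen_fd, set);
    int maxfd = listen_fd;

    for (int i = 0; i < n; i++) {
        if (clients[i].fd < 0)
            continue;
        if (now - clients[i].last >= IDLE_LIMIT) {
            close(clients[i].fd);
            clients[i].fd = -1;
            continue;
        }
        FD_SET(clients[i].fd, set);
        if (clients[i].fd > maxfd)
            maxfd = clients[i].fd;
    }
    return maxfd;
}
```

The caller would pass `maxfd + 1` to select along with the 61-second timeout, and refresh `last` for a client every time a read on it succeeds.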

So the problem with this setup is that it actually does suffer from the "thundering herd" problem. Not ALL of the workers wake up when a connection comes in, but in my test with 4 workers 2 or 3 of them wake up for each connection. Secondly, I need to set the host socket to non-blocking because while one of those workers successfully accepts the connection, the others will block on the accept waiting for a connection that no longer exists (which would cause them to ignore any of the clients they were supposed to be listening to.) So by setting it to non-blocking I can have the process realize it was late and go back to sleep or handling clients.

So not ideal but may still actually be more efficient for very high amounts of traffic than forking for every connection? I guess I would need to go and write a post-forking version of my echo server to do a proper benchmark.

Anyway thanks for listening, I thought this stuff was pretty interesting and I didn't know where else to share it

Apr 27, 2023 20:55

ExcessBLarg!: Sep 1, 2001

BattleMaster posted:

Because timerfd didn't exist, I use time() to put timestamps on each client at the time of connection or last incoming message, and on each loop while building the FD set for select I check whether time() - timestamp >= 60 to see if I should boot that connection instead of adding it to the FD set.

Maybe I'm missing your explanation but why not use the SO_RCVTIMEO socket option and close the socket on EAGAIN/EWOULDBLOCK instead of tracking the idle time yourself?
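For reference, SO_RCVTIMEO takes a struct timeval and makes a blocking read fail with EAGAIN/EWOULDBLOCK once that much time passes without data. It only helps when a process is actually blocked in read, so it fits the fork-per-client model better than a select/epoll loop. A minimal sketch with hypothetical helper names:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Sets a receive timeout so a blocking read() on fd fails with
 * EAGAIN/EWOULDBLOCK after `ms` milliseconds without data. */
int set_recv_timeout(int fd, long ms)
{
    struct timeval tv = { .tv_sec = ms / 1000, .tv_usec = (ms % 1000) * 1000 };
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Blocking read that treats a timeout as "client idle too long":
 * closes the socket and returns -2. Otherwise behaves like read(). */
ssize_t read_or_drop(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        close(fd);
        return -2;
    }
    return n;
}
```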

Also if you're not already familiar with them, Poul-Hennings random outbursts is an interesting read. He's the lead on Varnish Cache. He comes from a BSD background so his perspective has that slant to it, but Varnish is a pretty battle-tested piece of software even if the design might seem a bit dated compared to the latest best practices.

Apr 27, 2023 21:22

BattleMaster: Aug 14, 2000

ExcessBLarg! posted:

Maybe I'm missing your explanation but why not use the SO_RCVTIMEO socket option and close the socket on EAGAIN/EWOULDBLOCK instead of tracking the idle time yourself?

First of all... because I didn't know it existed :v:

Second of all, it looks like it doesn't really work with sockets that are polled. It causes a failure if you do a blocking read on a socket that blocks for the timeout you set. Since I'm not doing blocking reads at all I don't think it would help me. (I think it would be useful in the accept/fork model that CERN uses - the worker process uses blocking reads at points where it realizes it needs more from the client. This simplifies logic a lot but you'd probably want it to give up after a time of not receiving anything.)

To avoid doing blocking reads in my own code, I'm using an event-driven system. I have a bunch of active client socket FDs that I'm waiting to be readable. I am not actually interacting with those sockets directly during the wait - I'm only waiting for select or epoll to report to me that the socket is readable.

If that client never sends me anything but remains connected, the socket never reports it is readable and so will be skipped over by my logic. So I have some timer mechanism in either version of the server (select's timeout or a timerfd hooked into epoll) which lets me keep track of how long it has been since the last successful read and disconnects the client if it takes too long.
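A minimal sketch of the select-side bookkeeping described here, assuming a hypothetical `struct client` array with one timestamp per connection and the 60-second limit mentioned earlier in the thread. Idle clients are closed while the read set is built, so the check rides along on the loop that already has to run.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

#define IDLE_LIMIT 60           /* seconds without incoming data */

struct client {                 /* hypothetical per-connection record */
    int fd;                     /* -1 when the slot is free */
    time_t last_active;         /* time of connect or last successful read */
};

/* Close clients idle for >= IDLE_LIMIT seconds and add the rest to *set.
 * Returns the highest fd added (pass maxfd + 1 to select), or -1 if none. */
static int build_read_set(struct client *clients, size_t n, time_t now, fd_set *set)
{
    int maxfd = -1;
    FD_ZERO(set);
    for (size_t i = 0; i < n; i++) {
        if (clients[i].fd < 0)
            continue;
        if (now - clients[i].last_active >= IDLE_LIMIT) {
            close(clients[i].fd);   /* boot the idle connection */
            clients[i].fd = -1;
            continue;
        }
        FD_SET(clients[i].fd, set);
        if (clients[i].fd > maxfd)
            maxfd = clients[i].fd;
    }
    return maxfd;
}
```

A caller would pass a select() timeout (for example one second) so the loop still runs when nothing is readable, and set `last_active = time(NULL)` after every successful read.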

If there's a way to have select or epoll report an error on that FD after a certain amount of time with no incoming data without having to directly interact with the socket in the meantime, that would simplify my code a bit.

If there isn't, I'm probably stuck with the system I have. At least with the select version, I have to iterate through all the clients to figure out which FDs to add to the watch sets, so I can do the timeout logic there at very little cost. In the epoll version, though, iterating through every client roughly once a second is probably not ideal.
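For the epoll version, a timerfd can be registered like any other FD so that epoll_wait() returns on a fixed tick. A minimal sketch, assuming a hypothetical `add_periodic_timer` helper; the real server would presumably use a one-second interval and run its idle-client sweep when the timer fires.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Create a periodic timerfd firing every interval_ms milliseconds and
 * register it with the epoll instance epfd. Returns the timerfd or -1. */
static int add_periodic_timer(int epfd, long interval_ms)
{
    struct itimerspec its;
    struct epoll_event ev;
    int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);

    if (tfd < 0)
        return -1;
    memset(&its, 0, sizeof its);
    memset(&ev, 0, sizeof ev);
    its.it_value.tv_sec = interval_ms / 1000;
    its.it_value.tv_nsec = (interval_ms % 1000) * 1000000L;
    its.it_interval = its.it_value;      /* repeat at the same period */
    ev.events = EPOLLIN;
    ev.data.fd = tfd;
    if (timerfd_settime(tfd, 0, &its, NULL) < 0 ||
        epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev) < 0) {
        close(tfd);
        return -1;
    }
    return tfd;
}

/* Call when epoll reports the timerfd readable. Reading the 8-byte counter
 * clears the readiness; returns how many ticks elapsed (0 if none). */
static uint64_t ack_timer(int tfd)
{
    uint64_t expirations = 0;
    if (read(tfd, &expirations, sizeof expirations) != (ssize_t)sizeof expirations)
        return 0;
    return expirations;
}
```

The read in `ack_timer` matters: with level-triggered epoll, the timerfd keeps being reported as readable until its expiration counter is read.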

quote:

Also if you're not already familiar with them, Poul-Hennings random outbursts is an interesting read. He's the lead on Varnish Cache. He comes from a BSD background so his perspective has that slant to it, but Varnish is a pretty battle-tested piece of software even if the design might seem a bit dated compared to latest best practices.

Nice, thanks. Aside from those posts there's probably a lot I can learn from the source code of Varnish. I was actually thinking of trying to do a layer 7 router which is not even that far off from something like that.

# ? Apr 27, 2023 21:53

cruft: Oct 25, 2007

BattleMaster posted:

You are firmly in my wheelhouse here, kiddo, and I strenuously approve of everything you're doing here.

I feel like I should say more, but you're playing around with stuff I consider to be very interesting, and all I have to contribute is "that's it, lookin' good"

# ? Apr 27, 2023 22:20

ExcessBLarg!: Sep 1, 2001

BattleMaster posted:

Second of all, it looks like it doesn't really work with sockets that are polled.

Ah yeah, socket(7) does say it's not useful for select, poll, or epoll_wait. That's a shame.

# ? Apr 27, 2023 22:55

ihafarm: Aug 12, 2004

Mescal posted:

��.

Give up. If all of this is real you couldn't even be here. Hmmmm, or ChatGPT.

# ? Apr 28, 2023 00:21

Mescal: Jul 23, 2005

https://ibb.co/gWCCPPB

This is the error message I get when I try to load the EFI boot file in the BIOS. It just says "OK."

I was so close.

# ? Apr 28, 2023 00:52

Subjunctive: Sep 12, 2006; ✨sparkle and shine✨

BattleMaster posted:

-Forks into several workers
-Each worker creates a host socket, enables address and port reuse on it, binds the same address to it, and listens to it

I missed something: why doesn't the initial process create the socket and have it inherited by the workers?

# ? Apr 28, 2023 01:22

cruft: Oct 25, 2007

Subjunctive posted:

I missed something: why doesn’t the initial process create the socket and have it inherited by the workers?

~~Just a design choice for some test code. I would have done it the same way (fork, then socket)~~

Oh, wait, I reread what OP wrote and now I have the same question.

# ? Apr 28, 2023 01:35

Yaoi Gagarin: Feb 20, 2014

Subjunctive posted:

I missed something: why doesn't the initial process create the socket and have it inherited by the workers?

IIUC: that would hit the thundering herd problem. If you open and then fork, all the processes inherit the same socket. So when a connection comes in, select/poll/epoll wakes up all of them. But if you open a new socket in each worker with SO_REUSEPORT, the kernel will put an incoming connection on just one of the sockets, and then only that process has to wake up.

e: one question, with this setup do you even need to fork at all? Could the workers just be pthreads in the same process?

# ? Apr 28, 2023 01:46

BattleMaster: Aug 14, 2000

cruft posted:

You are firmly in my wheelhouse here, kiddo, and I strenuously approve of everything you're doing here.

I feel like I should say more, but you're playing around with stuff I consider to be very interesting, and all I have to contribute is "that's it, lookin' good"

I set up Debian 3.1 Sarge (2005, Linux 2.4.27, glibc 2.3.2) in a VM and was able to compile and run my fork-select server just fine. I did notice something weird with how the processes were waking up, though: only the one with the lowest PID was.

Then I noticed that the kernel that came with it wasn't compiled with SMP support, so it was running on a single CPU core. I tried recompiling the kernel, but even after loading the same settings as the stock kernel and enabling SMP support, I'm getting a kernel panic on bootup while it tries to mount the filesystem. So I don't know what's going on, but I'm going to assume it works as expected with more cores.

Subjunctive posted:

I missed something: why doesn't the initial process create the socket and have it inherited by the workers?

By creating the sockets independently, with the SO_REUSEADDR and SO_REUSEPORT options set and the same port and address, you actually avoid a specific problem: the one SO_REUSEPORT was added to Linux to fix. And in fact, the problem is illustrated by the second thing I wrote using more primitive methods.
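A minimal sketch of the per-worker listener, assuming IPv4 on all interfaces and a hypothetical `make_reuseport_listener` helper that each worker calls after fork(). Without SO_REUSEPORT, the second bind() to a port that already has a listener fails with EADDRINUSE.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Each worker calls this after fork(): its own socket, same address and port.
 * Returns the listening fd or -1 on error. */
static int make_reuseport_listener(uint16_t port)
{
    struct sockaddr_in addr;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof one) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The parent would call fork() N times, with each child calling this and then running its own select or epoll loop on the returned FD. On Linux, socket(7) also requires every socket sharing a port this way to belong to the same effective UID, which limits the port-hijacking concern.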

The problem is the "thundering herd" - where you create one socket and have multiple threads or processes listening on it. Assume the processes are all asleep using select (or poll, or epoll, or alternatives that other Unix-likes have.) When a connection comes in on that socket, all or several of the processes will wake up and run to try to accept the connection.

Only one will succeed in accepting it, so the others will go back to sleep with nothing to do. This results in the scheduler having to give CPU time to a bunch of processes that aren't going to actually do any useful work with it.

Here's a text log from my version that makes the socket first, then forks:

quote:

S - Spawning workers
S - Spawned worker #0 (PID 149096)
0 - Starting server
S - Spawned worker #1 (PID 149097)
0 - Server started
1 - Starting server
1 - Server started
S - Spawned worker #2 (PID 149098)
2 - Starting server
2 - Server started
S - Spawned worker #3 (PID 149099)
S - All workers spawned
3 - Starting server
3 - Server started
3 - Woke up
1 - Woke up
1 - Thundering
3 - New connection from 192.168.1.3:49154 assigned id 0

When I connected, two of the processes woke up and only one succeeded in accepting in time. Sometimes 3 or even 4 wake up. Also, the distribution of which process accepts first is highly uneven and in tests where I used around 100 clients, a couple of the processes handled 40 and the other two handled 10 or so.

But if I fork and then have each worker make the sockets individually:

quote:

3 - Woke up
3 - New connection from 192.168.1.3:54996 assigned id 0
3 - Woke up
3 - Disconnected client 0: EOF received from client
2 - Woke up
2 - New connection from 192.168.1.3:41108 assigned id 0
2 - Woke up
2 - Disconnected client 0: EOF received from client
0 - Woke up
0 - New connection from 192.168.1.3:41124 assigned id 0
0 - Woke up
0 - Disconnected client 0: EOF received from client
3 - Woke up
3 - New connection from 192.168.1.3:41136 assigned id 0
3 - Woke up
3 - Disconnected client 0: EOF received from client
0 - Woke up
0 - New connection from 192.168.1.3:47994 assigned id 0

The kernel distributes the connections across all processes that are listening on that address and port. Not only does only one process wake up at a time, but the distribution ends up far more even over many connections.

I guess nowadays it's debatable how much of a problem the "thundering herd" is with how powerful CPUs are and how many cores systems have, but it was enough of a problem that I was able to find huge amounts of discussion and debate on it and on how to solve it.

From what I can tell, older stuff made the socket and then forked because you HAD to with older Linux kernels, but now I don't know if there's really a reason to do that. As I mentioned, a rogue program could theoretically listen in on the same port and grab a portion of your connections, but that's mitigated somewhat by needing root privileges to use ports under 1024. There may also be a more direct mitigation for that, but I haven't done a lot of research into it.

VostokProgram posted:

IIUC: that would hit the thundering herd problem. If you open and then fork, all the processes inherit the same socket. So when a connection comes in, select/poll/epoll wakes up all of them. But if you open a new socket in each worker with SO_REUSEPORT, the kernel will put an incoming connection on just one of the sockets, and then only that process has to wake up.

e: one question, with this setup do you even need to fork at all? Could the workers just be pthreads in the same process?

Probably yes but I'm not sure how much utility there would be for an application where the processes don't need to talk to each other. Also I know how to fork but not how to thread so I'm sticking to what I know.

Doing so would probably look very similar, though. You probably don't want to share epoll FDs across threads (the EPOLLEXCLUSIVE flag alleviates the issues, but like, why), and you may even want to make individual sockets per thread too, because I heard there are problems with sharing them across threads.
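For context on that flag: per epoll_ctl(2), EPOLLEXCLUSIVE (Linux 4.5 and later) is set when the same target FD is attached to several epoll instances, and a wakeup then goes to one or more of those instances rather than all of them. A minimal sketch, assuming each worker owns a private epoll FD while the listening socket is shared; `watch_listener_exclusive` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Per worker: create a private epoll instance and attach the shared
 * listening socket in exclusive-wakeup mode. Returns the epoll fd or -1. */
static int watch_listener_exclusive(int listen_fd)
{
    struct epoll_event ev;
    int epfd = epoll_create1(EPOLL_CLOEXEC);

    if (epfd < 0)
        return -1;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    ev.data.fd = listen_fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}
```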

edit: also, for my "historical" version that's theoretically compatible with very early Linux versions, I'm not actually certain threading was even an option

BattleMaster fucked around with this message at 02:18 on Apr 28, 2023

# ? Apr 28, 2023 02:13

Subjunctive: Sep 12, 2006; ✨sparkle and shine✨

Duh, of course. I need to think more before posting, thank you.

# ? Apr 28, 2023 02:14


ExcessBLarg!: Sep 1, 2001

BattleMaster posted:

The problem is the "thundering herd" - where you create one socket and have multiple threads or processes listening on it. Assume the processes are all asleep using select (or poll, or epoll, or alternatives that other Unix-likes have.) When a connection comes in on that socket, all or several of the processes will wake up and run to try to accept the connection.

While this might be suboptimal behavior it also seems to me like it shouldn't matter much. If your request rate is low enough that you're having extra wakeups when there isn't immediately another connection sitting in the accept queue, then your external load is low enough that overall performance shouldn't be impacted much.
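To illustrate the accept-queue point: a worker that, on each wakeup, accepts everything already queued before going back to sleep gets several connections out of one wakeup when load is high. A minimal sketch, assuming a non-blocking listening socket; `drain_accept_queue` and its `handle` callback are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Accept every connection already queued on a non-blocking listener.
 * handle() takes ownership of each client fd. Returns the number of
 * connections accepted, or -1 on a real error. */
static int drain_accept_queue(int listen_fd, void (*handle)(int cfd))
{
    int count = 0;
    for (;;) {
        int cfd = accept(listen_fd, NULL, NULL);
        if (cfd < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return count;   /* queue empty, or another process got it */
            return -1;
        }
        handle(cfd);
        count++;
    }
}
```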

# ? Apr 28, 2023 02:54
