hbag
Feb 13, 2021

i use btrfs on my laptop and i remember explicitly wanting that filesystem but i cant quite remember why. its a perfectly fine filesystem but i cant remember what it has over any others


fresh_cheese
Jul 2, 2014

MY KPI IS HOW MANY VP NUTS I SUCK IN A FISCAL YEAR AND MY LAST THREE OFFICE CHAIRS COMMITTED SUICIDE
it is faster than ext4 in all ways including losing your data

have backups. test your backups.

BlankSystemDaemon
Mar 13, 2009



fresh_cheese posted:

have backups. test your backups.

hbag
Feb 13, 2021

didn't they fix the losing your data part forever ago

Captain Foo
May 11, 2004

we vibin'
we slidin'
we breathin'
we dyin'

ok don’t have backups we don’t care

mystes
May 31, 2006

Captain Foo posted:

ok don’t have backups we don’t care
I don't think that's what they meant, hopefully

Antigravitas
Dec 8, 2019

Die Rettung fuer die Landwirte:
Snapshots, atomicity guarantees, and transparent compression are good.

Eating your data, less so.

Cybernetic Vermin
Apr 18, 2005

fresh_cheese posted:

it is faster than ext4 in all ways including losing your data

unless something has very suddenly changed btrfs is absolutely not faster than ext4 outside of some microbenchmarks. which makes sense in that ext4 is really simple and has been tuned for decades, baseline expectation is that it'll saturate most any io device you're likely to have. btrfs has to do a lot more work for its features (you can get into a whole thing optimizing specific tasks by disabling cow but tbh unless you work for a storage vendor or some such i don't see that being a sane use of time).

Nitrousoxide
May 30, 2011

do not buy a oneplus phone



As far as I know, BTRFS only had an issue with eating your data in specific raid setups. If you're using it in a one-drive situation none of those would have applied and it should be as safe as ext4, plus have the added benefit of quick snapshotting for easy rollbacks if you set up your subvolumes right, or your distro uses sane ones for /home.

Antigravitas
Dec 8, 2019

Die Rettung fuer die Landwirte:
And yet it corrupted a boring single-disk file system and a scrub would freeze all i/o. That was last year.

Cybernetic Vermin
Apr 18, 2005

it is kind of amazing how hard it apparently is to get this poo poo right.

not that i think it is obviously simple, but it seems notorious that filesystems have a huge complexity day 1, the entire area apparently resistant to doing things as simply as possible in a first rev and then iterating from there. freezing disk formats part of it i guess.

Nitrousoxide
May 30, 2011

do not buy a oneplus phone



Antigravitas posted:

And yet it corrupted a boring single-disk file system and a scrub would freeze all i/o. That was last year.

Are you sure that wasn't just a hardware failure?

The_Franz
Aug 8, 2003

eschaton posted:

incorrect use of Windows I/O, usually by pretending it's equivalent to UNIX I/O

because used correctly the VMS (and Windows NT, and Lisa, and classic Mac, and HP3000, and…) asynchronous-callback I/O model provides much better overall throughput than the UNIX “just block and let something else run hope you weren’t interactive LOL” I/O model

never thought i would see an earnest "you're holding it wrong" post, but in reference to a filesystem

and no, windows i/o really is just incredibly slow, and it's unfixable, as much of the performance cost comes from the internal layers, the removal of which would basically put the preinstalled security theater crapware industry out of business overnight, so they won't do it. you might be able to mask some of the slowness by queuing up async operations, but that only goes so far

Cybernetic Vermin
Apr 18, 2005

that's a rather different point. the way you interact with io on nt was probably always just better, and in present day so much stuff has converged (including increasingly linux) in the async direction that it is surely the baseline.

Jonny 290
May 5, 2005



[ASK] me about OS/2 Warp

eschaton posted:

(remember on UNIX to always check whether every syscall encountered EINTR or EAGAIN and retry!)
i don't have time for this sorry

Antigravitas
Dec 8, 2019

Die Rettung fuer die Landwirte:

Nitrousoxide posted:

Are you sure that wasn't just a hardware failure?

Yes, absolutely. Hardware is still working fine.

Tankakern
Jul 25, 2007

tested out poetterings pet project https://github.com/poettering/diskomator

pretty nifty, currently building a system on my htpc on another computer with the disk on the htpc exported through nvme-tcp

fresh_cheese
Jul 2, 2014

MY KPI IS HOW MANY VP NUTS I SUCK IN A FISCAL YEAR AND MY LAST THREE OFFICE CHAIRS COMMITTED SUICIDE
i would also like to point out that the internal snapshots btrfs does are absolutely not backups

a backup is an offline copy of data stored on entirely different devices as far away as is practical to keep them

it also only becomes a backup, as opposed to a copy, once you have verified that you are able to restore your data from it and you then place it somewhere safe
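
the "verify you can restore" step can be sketched roughly like this. it's a minimal illustration, not anyone's actual backup tooling: `tree_digests` and `restore_matches` are made-up helper names, and `shutil.copytree` is only a stand-in for a real backup-then-restore round trip.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def restore_matches(original, restored):
    """True only if both trees hold the same files with identical contents."""
    return tree_digests(original) == tree_digests(restored)

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp, "original")
    restored = Path(tmp, "restored")
    (original / "photos").mkdir(parents=True)
    (original / "photos" / "a.txt").write_text("hello")

    # Stand-in for "back up, then restore somewhere else".
    shutil.copytree(original, restored)
    ok_before = restore_matches(original, restored)

    # Simulate silent corruption in the restored copy.
    (restored / "photos" / "a.txt").write_text("bitrot")
    ok_after = restore_matches(original, restored)

print(ok_before, ok_after)
```

comparing content hashes rather than file sizes or timestamps is the point: a restore that produces the right file names with the wrong bytes should fail the check.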



list your favorite off site backup locations:

* shoe box with the lid taped shut at moms house
* US-West S3 region
* yosmas secret santa distributed storage

ryanrs
Jul 12, 2011

eschaton posted:

(remember on UNIX to always check whether every syscall encountered EINTR or EAGAIN and retry!)

code:
501 ~$ cat
^Z
[1]+  Stopped                 cat
502 ~$ fg
cat
cat: stdin: Interrupted system call
503 ~$ 
:hmmno:
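
For anyone who does have time for this: below is a rough sketch of the "retry on EINTR" pattern the quoted post is talking about. `retry_on_eintr` and `FlakyRead` are hypothetical names made up for the illustration, and the fake read only simulates an interrupted call. Note that Python itself (3.5 and later, per PEP 475) already retries most interrupted system calls automatically, so this is mainly to show the shape of the loop you would hand-write in C. EAGAIN is a different case: it means a non-blocking call would have blocked, and the usual response is to wait for readiness (select/poll) rather than spin.

```python
import errno

def retry_on_eintr(call, *args, max_tries=5):
    """Call `call(*args)`, retrying when it is interrupted by a signal (EINTR).

    Hypothetical helper for illustration; `max_tries` is an arbitrary cap.
    """
    for _ in range(max_tries):
        try:
            return call(*args)
        except InterruptedError:
            # EINTR: a signal arrived mid-call, so simply try again.
            continue
    raise OSError(errno.EINTR, "still interrupted after retries")


class FlakyRead:
    """Simulated read that is 'interrupted' a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self, nbytes):
        self.calls += 1
        if self.calls <= self.failures:
            raise InterruptedError(errno.EINTR, "Interrupted system call")
        return b"x" * nbytes


flaky = FlakyRead(failures=2)
data = retry_on_eintr(flaky, 4)
print(data, flaky.calls)
```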

sb hermit
Dec 13, 2016





fresh_cheese posted:

i would also like to point out that the internal snapshots btrfs does are absolutely not backups

a backup is an offline copy of data stored on entirely different devices as far away as is practical to keep them

it also only becomes a backup, as opposed to a copy, once you have verified that you are able to restore your data from it and you then place it somewhere safe



list your favorite off site backup locations:

* shoe box with the lid taped shut at moms house
* US-West S3 region
* yosmas secret santa distributed storage

that's an offline backup, and then an off-site backup

you can still have online backups that take incremental COW snapshots which are fine for hourly or daily activity but I agree that nothing beats an off site backup for making sure that records are sufficiently protected

sb hermit
Dec 13, 2016





in that respect, the difference between a copy and a backup is only in the eye of the system administrator

you can make a copy of a set of photos so that a client can give them to her granddaughter

you can also make a copy of a set of photos to set aside for the client's backup just in case

sb hermit
Dec 13, 2016





ryanrs posted:

Help! USB isn't working on my router.

It's an Aruba AP-303H running OpenWrt. It's based on a Qualcomm IPQ4019 SoC.



The SoC has two USB controllers, one for USB 2.0 and one for USB 3.0. The external USB port is on the USB 2.0 controller, I'm pretty sure. Lsusb can see the two hubs (part of the USB controllers inside the SoC).

Here's a bootlog. A keyboard is plugged into the USB port. But I think it fails when trying to talk to the USB 2.0 hub inside the SoC.

Pastebin: serial console full boot log


Very early in the boot, the kernel flips on a GPIO to power up the USB port. A USB power meter plugged into the port will turn on at exactly this point.
code:
[    0.029331] gpio-435 (USB-power): hogged as output/high
Later in the boot there are USB errors:
code:
[   16.601276] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[   16.683864] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[   16.691163] SCSI subsystem initialized
[   16.757409] fsl-ehci: Freescale EHCI Host controller driver
[   16.794479] ehci-platform: EHCI generic platform driver
[   16.861153] ehci-pci: EHCI PCI platform driver
[   16.923372] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[   16.976636] ohci-platform: OHCI generic platform driver
[   17.051339] uhci_hcd: USB Universal Host Controller Interface driver
[   17.216192] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[   17.216278] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
[   17.268632] xhci-hcd xhci-hcd.0.auto: hcc params 0x0220f665 hci version 0x100 quirks 0x0000008002010010
[   17.359082] xhci-hcd xhci-hcd.0.auto: irq 104, io mem 0x06000000
[   17.471671] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[   17.546543] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2
[   17.610039] xhci-hcd xhci-hcd.0.auto: Host supports USB 3.0 SuperSpeed
[   17.701491] hub 1-0:1.0: USB hub found
[   17.779884] hub 1-0:1.0: 1 port detected
[   17.825132] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[   17.874218] hub 2-0:1.0: USB hub found
[   17.976578] hub 2-0:1.0: config failed, hub doesn't have any ports! (err -19)
[   18.019194] usbcore: registered new interface driver usb-storage
[   18.101518] usbcore: registered new interface driver uas
[   18.172639] usb 1-1: new full-speed USB device number 2 using xhci-hcd
[   18.174335] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[   18.317168] init: - preinit -
[   18.398594] usb 1-1: device descriptor read/64, error -71
[   18.697219] usb 1-1: device descriptor read/64, error -71
[   18.967150] usb 1-1: new full-speed USB device number 3 using xhci-hcd
[   19.117157] usb 1-1: device descriptor read/64, error -71
[   19.387209] usb 1-1: device descriptor read/64, error -71
[   19.507528] usb usb1-port1: attempt power cycle
[   19.977180] usb 1-1: new full-speed USB device number 4 using xhci-hcd
[   19.977365] usb 1-1: Device not responding to setup address.
[   20.151882] ipqess-edma c080000.ethernet eth0: configuring for fixed/internal link mode
[   20.152645] qca8k-ipq4019 c000000.switch lan1: configuring for phy/psgmii link mode
[   20.233400] ipqess-edma c080000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[   20.324919] usb 1-1: Device not responding to setup address.
[   20.637094] usb 1-1: device not accepting address 4, error -71
[   20.787095] usb 1-1: new full-speed USB device number 5 using xhci-hcd
[   20.787228] usb 1-1: Device not responding to setup address.
[   21.067227] usb 1-1: Device not responding to setup address.
[   21.174953] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Searching the internet for error -71 showed people talking about usbcore.use_both_schemes=y and usbcore.old_scheme_first=y. I tried the 1st, then both, passed on the kernel command line (and verified as taking effect in /sys/module/usbcore/parameters/). Neither made any difference in the boot log.

Here's my package list. I just shoveled in anything USB-ish, to see if it was just a missing driver. Didn't help.
code:
ath10k-board-qca4019 ath10k-firmware-qca4019-ct base-files busybox ca-bundle dnsmasq dropbear firewall4 fstools kmod-ath10k-ct
kmod-gpio-button-hotplug kmod-leds-gpio kmod-nft-offload kmod-usb-dwc3 kmod-usb-dwc3-qcom kmod-usb3 libc libgcc libustream-mbedtls
logd luci mtd netifd nftables odhcp6c odhcpd-ipv6only opkg ppp ppp-mod-pppoe procd procd-seccomp procd-ujail uboot-envtools uci
uclient-fetch urandom-seed urngd wpad-basic-mbedtls

i2c-tools libi2c kmod-i2c-core kmod-i2c-gpio kmod-i2c-algo-bit kmod-i2c-smbus kmod-regmap-i2c kmod-hwmon-core kmod-tpm-i2c-atmel
kmod-eeprom-at24

block-mount e2fsprogs kmod-usb-storage-uas kmod-usb2 kmod-usb3 kmod-usb-audio usbutils luci-app-hd-idle kmod-fs-ext4 kmod-fs-cifs
kmod-fs-exfat kmod-fs-f2fs kmod-fs-ksmbd parted lsblk kmod-usb-core kmod-usb-hid kmod-usb-ledtrig-usbport kmod-usb-net
kmod-usb-net-cdc-ether kmod-usb-net-lan78xx kmod-usb-ohci kmod-usb-printer kmod-usb-serial kmod-usb-serial-ark3116 kmod-usb-serial-belkin
kmod-usb-serial-ch341 kmod-usb-serial-cp210x kmod-usb-serial-cypress-m8 kmod-usb-serial-ftdi kmod-usb-serial-pl2303 kmod-usb-serial-qualcomm
kmod-usb-serial-simple kmod-usb-serial-ti-usb kmod-usb-storage kmod-usb-uhci kmod-usb-xhci-hcd kmod-usb2-pci kmod-usbmon uhubctl
usbutils
If no fix is obvious, I'd like to at least better understand what is failing. I'm trying to get a USB flash drive working.

One thing you can do is blacklist xhci (and maybe even ehci) and see if it's the activation of the drivers that is causing issues and perhaps find a workaround in that respect.

It could be a driver incompatibility or, worse, an issue with your hardware.
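
As a starting point for understanding what is failing, the negative numbers in the boot log are kernel errno values. Here is a small sketch that pulls them out and names them. The table is hand-written from the generic Linux errno numbering, which I am assuming applies to this ARM SoC, and `decode_kernel_errors` is just an illustrative helper, not part of any tool.

```python
import re

# Linux errno values on the generic ABI (assumed to apply to this ARM SoC).
LINUX_ERRNO = {
    19: ("ENODEV", "No such device"),
    71: ("EPROTO", "Protocol error"),
}

def decode_kernel_errors(log_text):
    """Return (code, name, description) for each 'error -N' or 'err -N' found."""
    found = []
    for match in re.finditer(r"\berr(?:or)? -(\d+)", log_text):
        code = int(match.group(1))
        name, desc = LINUX_ERRNO.get(code, ("UNKNOWN", "not in table"))
        found.append((code, name, desc))
    return found

# Two lines copied from the boot log quoted above.
sample = """\
hub 2-0:1.0: config failed, hub doesn't have any ports! (err -19)
usb 1-1: device descriptor read/64, error -71
"""

for code, name, desc in decode_kernel_errors(sample):
    print(f"-{code}: {name} ({desc})")
```

In the USB stack, EPROTO generally points at a bus-level or signalling problem (no response, bad packets) rather than a missing driver, which would fit with piling on more kmod packages not changing anything.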

FlapYoJacks
Feb 12, 2009
Yep. It's only a backup if there isn't a single point of failure with the original data.
A "backup" on an external drive in the same house as the original data isn't a backup because the house burning down would cause total data loss.

ryanrs
Jul 12, 2011

sb hermit posted:

One thing you can do is blacklist xhci (and maybe even ehci) and see if it's the activation of the drivers that is causing issues and perhaps find a workaround in that respect.

It could be a driver incompatibility or, worse, an issue with your hardware.

Blacklisting is a good tip.

I have 2 of these Aruba APs and both have this USB problem, so it is not something like a damaged USB port.

sb hermit
Dec 13, 2016





fresh_cheese posted:

list your favorite off site backup locations:

* shoe box with the lid taped shut at moms house
* US-West S3 region
* yosmas secret santa distributed storage

binary blobs scattered throughout the internet, shared with bittorrent or usenet, encrypted with a 256-bit AES key

itself encrypted by a lattice-based asymmetric key, and hidden within various pieces of ai-generated art with steganography including an elaborate goatse mural painted on the inside of a walk-in closet of a waffle house in alabama

sb hermit
Dec 13, 2016





ryanrs posted:

Blacklisting is a good tip.

I have 2 of these Aruba APs and both have this USB problem, so it is not something like a damaged USB port.

good luck!

Woolie Wool
Jun 2, 2006


Sapozhnik posted:

i think about my filesystem a lot when i find myself using windows

because that filesystem is slow as poo poo

though technically that's because of windows io rather than any particular filesystem running on it

NTFS is unbearably slow for games on Linux so it's slow as poo poo on any os

mycophobia
May 7, 2008

FlapYoJacks posted:

Yep. It's only a backup if there isn't a single point of failure with the original data.
A "backup" on an external drive in the same house as the original data isn't a backup because the house burning down would cause total data loss.

i feel like i would have a lot more to worry about than data loss at that point

Nitrousoxide
May 30, 2011

do not buy a oneplus phone



It's pretty trivial to set up an rsync to a backblaze bucket if you want an off-site backup. I do that for my self-hosted configs and backups.

BlankSystemDaemon
Mar 13, 2009



hbag posted:

didn't they fix the losing your data part forever ago
once a filesystem has munched someones data, it's not likely they're going to keep using it
the only real way to mitigate that is if the filesystem also makes it easy for them to make, test, and restore from backups - and it has to be so easy that the user will do it

Antigravitas posted:

And yet it corrupted a boring single-disk file system and a scrub would freeze all i/o. That was last year.
corrupted what, and how?

at least with zfs, you just get a corrupted record, and provided it's not a system-critical file, the call will just return EIO

if you have a system that physically can't fit two disks, it's not a bad idea to enable dittoblocks by setting copies=2 at pool creation
although it does mean that each write takes double the amount of time, for obvious reasons

Cybernetic Vermin posted:

it is kind of amazing how hard it apparently is to get this poo poo right.

not that i think it is obviously simple, but it seems notorious that filesystems have a huge complexity day 1, the entire area apparently resistant to doing things as simply as possible in a first rev and then iterating from there. freezing disk formats part of it i guess.
we've had simple filesystems since ffs, and it works well with geom, even to the point of doing raid
the issue is that there's a whole host of things that simple filesystems can't* deal with like phantom writes, misdirected I/O, dma parity errors, and driver bugs

*: it might not be impossible to do checksum-in-blockpointer Merkle trees without the design zfs has, but nobody has made it yet

BlankSystemDaemon fucked around with this message at 18:12 on Apr 22, 2024

Sapozhnik
Jan 2, 2005

Nap Ghost
I use gnome backup and it writes encrypted incremental backups to google drive

it would be nice if they supported backblaze as well but their explicit goal is to only support "consumer" cloud storage and well yeah okay fair

FlapYoJacks
Feb 12, 2009

mycophobia posted:

i feel like i would have a lot more to worry about than data loss at that point

Depends on what was on your PC.

Cybernetic Vermin
Apr 18, 2005

just put your data on the yostop, ill tape it when it passes by np

FlapYoJacks
Feb 12, 2009
Send me your tax return backups and I will keep them safe for you.

Antigravitas
Dec 8, 2019

Die Rettung fuer die Landwirte:

BlankSystemDaemon posted:

corrupted what, and how?


Accessing certain files or scrubbing the file system would print a fairly meaningless stacktrace in dmesg, and btrfs would swallow all i/o forever. Any process trying to do any i/o to the fs would get stuck in kernel land.

I blkdiscard-ed the entire drive and restored from backup (saved on a mirror zpool on a small server sitting in a basement 20km from me).

btrfs is fine if you treat it as an utterly disposable thing.

FlapYoJacks
Feb 12, 2009

ryanrs posted:

Help! USB isn't working on my router.

It's an Aruba AP-303H running OpenWrt. It's based on a Qualcomm IPQ4019 SoC.



The SoC has two USB controllers, one for USB 2.0 and one for USB 3.0. The external USB port is on the USB 2.0 controller, I'm pretty sure. Lsusb can see the two hubs (part of the USB controllers inside the SoC).

Here's a bootlog. A keyboard is plugged into the USB port. But I think it fails when trying to talk to the USB 2.0 hub inside the SoC.

Pastebin: serial console full boot log


Very early in the boot, the kernel flips on a GPIO to power up the USB port. A USB power meter plugged into the port will turn on at exactly this point.
code:
[    0.029331] gpio-435 (USB-power): hogged as output/high
Later in the boot there are USB errors:
code:
[   16.601276] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[   16.683864] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[   16.691163] SCSI subsystem initialized
[   16.757409] fsl-ehci: Freescale EHCI Host controller driver
[   16.794479] ehci-platform: EHCI generic platform driver
[   16.861153] ehci-pci: EHCI PCI platform driver
[   16.923372] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[   16.976636] ohci-platform: OHCI generic platform driver
[   17.051339] uhci_hcd: USB Universal Host Controller Interface driver
[   17.216192] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[   17.216278] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
[   17.268632] xhci-hcd xhci-hcd.0.auto: hcc params 0x0220f665 hci version 0x100 quirks 0x0000008002010010
[   17.359082] xhci-hcd xhci-hcd.0.auto: irq 104, io mem 0x06000000
[   17.471671] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[   17.546543] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2
[   17.610039] xhci-hcd xhci-hcd.0.auto: Host supports USB 3.0 SuperSpeed
[   17.701491] hub 1-0:1.0: USB hub found
[   17.779884] hub 1-0:1.0: 1 port detected
[   17.825132] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[   17.874218] hub 2-0:1.0: USB hub found
[   17.976578] hub 2-0:1.0: config failed, hub doesn't have any ports! (err -19)
[   18.019194] usbcore: registered new interface driver usb-storage
[   18.101518] usbcore: registered new interface driver uas
[   18.172639] usb 1-1: new full-speed USB device number 2 using xhci-hcd
[   18.174335] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[   18.317168] init: - preinit -
[   18.398594] usb 1-1: device descriptor read/64, error -71
[   18.697219] usb 1-1: device descriptor read/64, error -71
[   18.967150] usb 1-1: new full-speed USB device number 3 using xhci-hcd
[   19.117157] usb 1-1: device descriptor read/64, error -71
[   19.387209] usb 1-1: device descriptor read/64, error -71
[   19.507528] usb usb1-port1: attempt power cycle
[   19.977180] usb 1-1: new full-speed USB device number 4 using xhci-hcd
[   19.977365] usb 1-1: Device not responding to setup address.
[   20.151882] ipqess-edma c080000.ethernet eth0: configuring for fixed/internal link mode
[   20.152645] qca8k-ipq4019 c000000.switch lan1: configuring for phy/psgmii link mode
[   20.233400] ipqess-edma c080000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[   20.324919] usb 1-1: Device not responding to setup address.
[   20.637094] usb 1-1: device not accepting address 4, error -71
[   20.787095] usb 1-1: new full-speed USB device number 5 using xhci-hcd
[   20.787228] usb 1-1: Device not responding to setup address.
[   21.067227] usb 1-1: Device not responding to setup address.
[   21.174953] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Searching the internet for error -71 showed people talking about usbcore.use_both_schemes=y and usbcore.old_scheme_first=y. I tried the 1st, then both, passed on the kernel command line (and verified as taking effect in /sys/module/usbcore/parameters/). Neither made any difference in the boot log.

Here's my package list. I just shoveled in anything USB-ish, to see if it was just a missing driver. Didn't help.
code:
ath10k-board-qca4019 ath10k-firmware-qca4019-ct base-files busybox ca-bundle dnsmasq dropbear firewall4 fstools kmod-ath10k-ct
kmod-gpio-button-hotplug kmod-leds-gpio kmod-nft-offload kmod-usb-dwc3 kmod-usb-dwc3-qcom kmod-usb3 libc libgcc libustream-mbedtls
logd luci mtd netifd nftables odhcp6c odhcpd-ipv6only opkg ppp ppp-mod-pppoe procd procd-seccomp procd-ujail uboot-envtools uci
uclient-fetch urandom-seed urngd wpad-basic-mbedtls

i2c-tools libi2c kmod-i2c-core kmod-i2c-gpio kmod-i2c-algo-bit kmod-i2c-smbus kmod-regmap-i2c kmod-hwmon-core kmod-tpm-i2c-atmel
kmod-eeprom-at24

block-mount e2fsprogs kmod-usb-storage-uas kmod-usb2 kmod-usb3 kmod-usb-audio usbutils luci-app-hd-idle kmod-fs-ext4 kmod-fs-cifs
kmod-fs-exfat kmod-fs-f2fs kmod-fs-ksmbd parted lsblk kmod-usb-core kmod-usb-hid kmod-usb-ledtrig-usbport kmod-usb-net
kmod-usb-net-cdc-ether kmod-usb-net-lan78xx kmod-usb-ohci kmod-usb-printer kmod-usb-serial kmod-usb-serial-ark3116 kmod-usb-serial-belkin
kmod-usb-serial-ch341 kmod-usb-serial-cp210x kmod-usb-serial-cypress-m8 kmod-usb-serial-ftdi kmod-usb-serial-pl2303 kmod-usb-serial-qualcomm
kmod-usb-serial-simple kmod-usb-serial-ti-usb kmod-usb-storage kmod-usb-uhci kmod-usb-xhci-hcd kmod-usb2-pci kmod-usbmon uhubctl
usbutils
If no fix is obvious, I'd like to at least better understand what is failing. I'm trying to get a USB flash drive working.

Asking in irc.oftc.net #openwrt may get you a much better answer.

shackleford
Sep 4, 2006

Cybernetic Vermin posted:

that was always a bit of a thing, but spending some unreasonable effort on getting something working is then quite disliked, as it can be construed as a claim that it didn't previously work.

code:
In article <54990q$n5e@caip.rutgers.edu>,
David S. Miller  wrote:
>bacon@mtu.edu (Jeff Bacon) writes:
>
>Since I have been able to find an intelligent posting in this thread,
>I will respond to it and explain what I can as chief architect of the
>SparcLinux port.
>
>> of course, then, the obvious question that comes up is WHY is it that
>> solaris has such higher overhead costs in doing things? 
>> 
>> obviously there's more code to work through to do any given thing. 
>> someone must have thought it necessary. but...why? 
>> 
>> obviously it's got lots of extra crud from SVR4. why not pitch it? 
>> 
>
>The answer to this is pretty straight forward actually.  The main
>points of interest are:
>
>1) Solaris's networking stack, in all of it's incantations (one breed
>   of it was the Lochman code in 2.0, 2.1 and early 2.2 releases, then
>   it was rewritten by another company for 2.3 onward) is SVR4 streams
>   based.  The performance penalty, even with lots of tricks, for
>   using a SVR4 streams networking architecture is well known.
>   Someone who happens to have a 2.2 Solaris CD around, or even a 2.3
>   Solaris CD, should install that thing and run lmbench on it to see
>   what "pure Streams based networking" without the tricks can really
>   do.
>
>   Linux on the other hand has a "no bullshit" networking architecture
>   that is not streams based.  Yet we also take advantage of the many
>   known networking performance enhancements that exist in the
>   research realm (ie. copy/checksum, the van jacobson hacks, etc.)
>
>2) Linux is light weight, Solaris is a pig.
>
>   One of the most critical things that contributes to performance
>   is cache/tlb footprint of the operating system.  Linux being small
>   (yet still provide a full POSIX unix environment!) solves the cache
>   footprint problem in a big way.  I've solved the TLB footprint
>   using Linux's small size and a Sparc specific trick.
>
>   The MMU's present on the sun4m/sun4d line of Sun machines possess a
>   three level page table scheme.  Using this, one has the capability
>   to use the normal 4k sizes pages, and also larger 256k and 16MB
>   sized pages.  The average TLB on these machines has 32 or 64
>   entries to cache these pte's, if the entry is not in the TLB
>   hardware has to go out to the memory bus and walk the software page
>   tables to "reload" the TLB so that the translation can be
>   satisfied.
>
>   This "miss processing" is very expensive.  Under SunOS and Solaris,
>   they do not take advantage of the 16MB and 256k sized pages to map
>   the operating system.  Therefore those two systems take many misses
>   in the TLB during even the most rudimentary trap into the kernel.   
>   However under Linux the TLB misses for the OS are quite minimal.
>   In fact I will gave an example:
>
>      On your average SparcClassic with a 32 entry TLB, consider such
>      a machine with 24MB of memory installed.  Under Linux I can map
>      the entire operating system (sans IO device register mappings
>      and Lance Ethernet DMA) in 3 (count 'em, 3!) TLB entries.  These
>      3 entries are enough to allow the kernel to access an arbitrary
>      physical page from kernel space.
>
>      Under Solaris, the OS would need 3 + (24MB / 256K) + (24MB / 4K)
>      TLB entries to map this same amount of space.  For a great many
>      number of operations, it is quite easy for an OS with this page
>      table strategy to blow the entire user context out of the
>      hardware TLB.  Which in turn means more processor stalls (in
>      fact many) for both the user level processes and the operating
>      system.
>
>      Result?  Severe degradation in performance for the latter
>      scheme.
>
>3) Every BSD and SVR4 based system today, except for Linux, has a very
>   broken System call mechanism.
>
>   You'd think that when people put together function call conventions
>   for a particular processor, the OS people would take a look at this
>   and find a way to take advantage of this.  In fact, believe it or
>   not, they have not to this very day.
>
>   Linux from day one, takes advantage of the procedure call
>   conventions on a particular architecture so that it can process
>   system calls in the most expediant way possible.  I will give
>   an example on the Sparc to prove this:
>
>    Consider your average 3 argument system call.  The user level
>    code does something like this:
>
>	mov	%arg0, %o0
>	mov	%arg1, %o1
>	mov	%arg2, %o3
>	mov	SYSTEM_CALL_NUMBER, %g1
>	t	SYSCALL_TRAP
>
>    At this point control reaches the operating system, it must
>    prepare to handle this request from the user.  On the Sparc, this
>    is either a two step or a three step process depending upon
>    whether you are doing it in the traditional broken UNIX way or the
>    clean, fast, and superior Linux way.  First I will show the Linux
>    method for doing this:
>
>	1) Step one, jump onto the kernel stack for this task
>	   and make sure the kernel has a register window to
>	   operate in safely.
>
>	   For Linux the code path for this runs at ~18 instructions
>	   for the common case (the kernel already has a valid
>	   register to use so now saving needs to be done).  It runs
>	   at ~42 instructions for the second most common case (the
>	   kernels needs to allocate a new register window and the
>	   user has a valid stack pointer) and ~82 instructions for
>	   the least common case (kernel needs a window, user has
>	   an invalid stack pointer, thus the kernel needs to save
>	   the user's window into a special per-task save area).
>
>	2) Take the system call number, check for valid value, use
>	   this to offset into a table of system call function ptrs.
>	   Move arguments into place and perform the syscall.
>
>	   Basically this is a simple operations a looks something
>	   like:
>
>	sll	%g1, 2, %l4		! produce offset
>	ld	[%l7 + %g1], %l7	! syscall ptr base was in %l7
>	SAVE_ALL			! perform step #1 above
>	mov	%i0, %o0
>	mov	%i1, %o1
>	mov	%i2, %o2
>	mov	%i3, %o3
>	mov	%i4, %o4
>	jmp	%l7, %o7
>	 mov	%i5, %o5
>
>	   That is it, that is the entire system call under Linux.
>
>    Under Solaris/SunOS things are wildly different.  Step one is
>    basically the same, but step 2 is disgustingly inefficient for
>    those systems.  Basically they do:
>
>	2) Call common system_call() C function.
>
>	3) This routine allocates a "system call argument package"
>	   structure on the kernel stack.  This is wasteful because
>	   we already have all of this information in registers or
>	   in guarenteed save areas.
>
>	4) Then this routine determines the function to call, and
>	   passed this "package" of arguments to the routine.
>
>	5) Every system call which expects arguments then must
>	   "unpack" this structure to get at the copy of the arguments
>	   again highly inefficient.
>
>   For every system call the system performs, you eat this unnecessary
>   overhead under SunOS/Solaris, under Linux only the bare minimum is
>   performed to do the system call successfully.
>
>4) Solaris cannot even do it's own optimizations correctly because
>   SunPRO is a broken compiler.
>
>   I won't make such a statement without explaining this with real
>   facts, here goes.
>
>   A neat part of the Sparc ABI is that it leaves you with a few
>   processor registers that the C compiler is not allowed to use
>   in the code it produces.  Two of which are "%g6" and "%g7".
>   A problem in unix kernels is that you are constantly accessing
>   the current tasks control structure ('proc' and 'uarea' on
>   traditional UNIX's, the 'task_struct" under Linux).  Hey, why
>   not put these pieces of information in those "extra" registers
>   and avoid the address computation all the time?  Yes, very
>   brilliant idea.
>
>   Under Solaris the trap entry code places the uarea and proc ptr
>   in %g6 and %g7.  Under Linux the trap entry code places the
>   current processes task_struct in %g6.  Now here comes where the
>   implementations differ.
>
>   Under Solaris all of the so called "locore" (basically all the
>   gook which has to be written in raw assembly) code can directly
>   take advantage of this.  However, the C code cannot do this
>   because SunPRO lacks a way for you to tell the compiler that
>   "hey you don't need to load things, it's already in these
>    hard coded registers"  So they have the C code call these little
>   assembly stubs to get the values:
>
>	get_uarea:
>		retl
>		 mov	%g6, %o0
>
>	get_proc:
>		retl
>		 mov	%g7, %o0
>
>   That is gross, why even do the optimization in the first place?
>
>   Now GCC has a way to fully take advantage of such an optimization,
>   basically all I have to do is put the following in a header file.
>
>	register struct task_struct *current asm("g6");
>
>   Tada, now GCC will fully understand what I have done for it.
>   Under SparcLinux this optimization alone took away 115 instructions
>   in the scheduler sources, and it took ~50 instructions out of the
>   exit() handling, and it took ~65 instructions out of the fork()
>   handling.
>
>So my question always is, in matters such as these.  Who are these
>processor cycles for anyways, the kernel or the user?  Think about
>this when you consider how much overhead is being saved from one OS to
>another, and to what scale this is occurring.
>
>I hope that explains some of it, and gives people at least some sort
>of idea of the kinds of things that makes Linux scream on just about
>any hardware.  If people would like more explainations like the above,
>I'd be more than happy to chat with you via email about it or
>similar.  I love talking about performance issues on various
>processors and systems.
>
>Oh, and one thing that has not been mentioned yet in this thread (and
>yes NetBSD/OpenBSD both have this as well, good work guys).  That
>SparcLinux kernel that gets all of this incredible performance runs on
>both sun4c and sun4m machines.  Sun Engineers way back when scratched
>their heads for months and couldn't figure out a way to pull it off
>(you need a seperate kernel image depending upon whether you are
>running on a sun4m or a sun4c, for SunOS/Solaris).  And on top of that
>Linux obviously pulls it off efficiently.
>
>One final note.  When you have to deal with SunSOFT to report a bug,
>how "important" do you have (ie. Fortune 500?) to be and how big of a
>customer do you have to be (multi million dollar purchases?) to get
>direct access to Sun's Engineers at Sun Quentin?  With Linux, all you
>have to do is send me or one of the other SparcLinux hackers an email
>and we will attend to your bug in due time.  We have too much pride in
>our system to ignore you and not fix the bug.
>
>David S. Miller
>davem@caip.rutgers.edu
>

Have you ever kissed a girl?

	- Bryan

----------------------------------------------------------------------
Bryan Cantrill, Solaris Performance.   bmc@eng.sun.com  (415) 786-3652
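
for scale, the TLB arithmetic in the quoted post works out as below. this just evaluates the post's own expression, 3 + (24MB / 256K) + (24MB / 4K), for a 24MB machine; whether that expression is a fair description of what Solaris did is the post's claim and isn't checked here. the function name is made up for the sketch.

```python
KB = 1024
MB = 1024 * KB

def tlb_entries_small_pages(mem_bytes, fixed=3, mid_page=256 * KB, small_page=4 * KB):
    """Evaluate the post's expression: fixed + mem/256K + mem/4K."""
    return fixed + mem_bytes // mid_page + mem_bytes // small_page

ram = 24 * MB
mid_entries = ram // (256 * KB)     # entries if mapped with 256K pages
small_entries = ram // (4 * KB)     # entries if mapped with 4K pages
total = tlb_entries_small_pages(ram)

print(mid_entries, small_entries, total)
```

against a 32-entry TLB, that total is a couple of hundred times more mappings than the hardware can cache at once, which is the gap the post is pointing at when it compares it with 3 entries.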

FlapYoJacks
Feb 12, 2009
YOSPOS -> Linux: Have you ever kissed a girl?

Share Bear
Apr 27, 2004

call up bryan from a payphone and say “yeah, your mother”


shackleford
Sep 4, 2006

lmao looks like linux kernel developers with long memories and/or wikimedia copyright license compliance nerds have selected the only photo that appears on bryan cantrill's wikipedia page
