Nomnom Cookie
Aug 30, 2009



my homie dhall posted:

I have no idea how they have convinced people that their product, which totally dumps decades of investment and research in memory management, is ready for prime time

and I say this as someone who runs k8s in production

it’s because linux is poo poo at managing swap so you are better off not having swap, eating the OOMs, and moving pods off the overloaded node to another one. distributed systems are designed around fail-stop and swap turns fail-stop into fail-slow. it doesn’t help for the scenarios k8s is deployed into. that’s why k8s didn’t support swap originally. idk why it’s starting to—demand from people like you maybe? I won’t be enabling it

my homie dhall
Dec 9, 2010

honey, oh please, it's just a machine

Nomnom Cookie posted:

it’s because linux is poo poo at managing swap so you are better off not having swap, eating the OOMs, and moving pods off the overloaded node to another one. distributed systems are designed around fail-stop and swap turns fail-stop into fail-slow. it doesn’t help for the scenarios k8s is deployed into. that’s why k8s didn’t support swap originally. idk why it’s starting to—demand from people like you maybe? I won’t be enabling it

I don't think it's a matter of fail-stop vs fail-slow, not having swap is strictly worse in low-memory conditions. you're going to be paging to disk no matter what, you probably want it to be the memory a bunch of programs allocated at startup and never used again rather than hot items in the page cache or the executable sections of your programs

if you don't want your programs to be able to overcommit memory, that's your choice. but my understanding is that in v2 cgroups the accounting is going to (correctly) include things outside of anonymous memory like pages and kernel allocations, so your users might try to allocate less than whatever their limit is and things will either be killed or "slow down" because you've reduced the number of pages they're allowed to have in the cache

Nomnom Cookie
Aug 30, 2009



my homie dhall posted:

I don't think it's a matter of fail-stop vs fail-slow, not having swap is strictly worse in low-memory conditions. you're going to be paging to disk no matter what, you probably want it to be the memory a bunch of programs allocated at startup and never used again rather than hot items in the page cache or the executable sections of your programs

if you don't want your programs to be able to overcommit memory, that's your choice. but my understanding is that in v2 cgroups the accounting is going to (correctly) include things outside of anonymous memory like pages and kernel allocations, so your users might try to allocate less than whatever their limit is and things will either be killed or "slow down" because you've reduced the number of pages they're allowed to have in the cache

in practice not having swap means the OOM killer comes out to play quite quickly. kubelet will also start evicting pods once free memory gets low enough. yeah the kernel may swap out some text pages but it can’t reclaim enough memory that way to bring the system to a halt. it will just give up and start axe murdering processes. which is what you want

especially with typical cloud deployments where all your storage is some kind of cloud disk throttled to like 1k iops, swapping is definitely not something you ever want to happen.

SYSV Fanfic
Sep 9, 2003

by Pragmatica

Nomnom Cookie posted:

it’s because linux is poo poo at managing swap

Swapping is probably the least important thing the page fault handler does. Not worth the performance hit to try to improve it.

BlankSystemDaemon
Mar 13, 2009



how about, i dunno, fixing the OOM killer? oh wait, people have been trying for over a decade

what it invariably comes down to is either people talking about performance as if that's the most important thing over system stability when the system is in production use, or it's some vague "you don't need that" post, as in the first response to the lwn.net article above

BlankSystemDaemon fucked around with this message at 06:30 on Dec 11, 2021

pseudorandom name
May 6, 2007

the OOM killer exists and is terrible because overcommit is a necessarily idiotic implementation detail of the fundamental Unix design flaw called fork(2)

SYSV Fanfic
Sep 9, 2003

by Pragmatica

There's nothing to fix. Exhausting all available physical and virtual memory is an error condition caused by bugs or bad design on the part of the user. "System stability" is kind of a moot point when the other option is to automatically flush all dirty buffers/cache to disk and halt the system or wait for a human oracle rather than a heuristic.

Giving people (more) control over the heuristic just makes it more likely the OOM killer won't be able to bring the system into a safe state to even respond to sysrq keys. Truly important work gets check pointed/has a commit log. Those mechanisms will fail if the kernel can't recover to the point that pending file writes can complete.

KozmoNaut
Apr 23, 2008

Happiness is a warm
Turbo Plasma Rifle


"You don't need a swap partition" and "just set vm.swappiness to 0 or 1" and "swap kills SSDs" are some of the worst advice ever given by Linux "experts".

Of course don't put swap on an IOPS-limited network disk or SD card or whatever, there's no reason to be daft, but unless you truly absolutely know that you have vastly more memory in your machine than your workload will ever need, you will want some swap to free up RAM for disk cache and actively running applications that need to be responsive, rather than relying on their own private caching setup.

This entire idea of swap only being something computers needed in the old days, because memory was expensive, is a myth.
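
For anyone who wants to see what swap and the page cache are actually doing on their own box before picking a side, here is a minimal sketch. It assumes the Linux `/proc/meminfo` layout ("Name:  value kB" lines); the function names and the sample text are made up for illustration, and only the field names (`SwapTotal`, `SwapFree`, `Cached`) come from the kernel.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields):
    """Swap in use = SwapTotal - SwapFree (0 if the box has no swap)."""
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


# Made-up sample, shaped like /proc/meminfo output (hypothetical numbers).
SAMPLE = """\
MemTotal:       32768000 kB
Cached:          8192000 kB
SwapTotal:       4194304 kB
SwapFree:        3145728 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        info = parse_meminfo(SAMPLE)  # not on Linux: fall back to the sample
    print("swap used (kB):", swap_used_kb(info))
    print("page cache (kB):", info.get("Cached", 0))
```

A nonzero "swap used" next to a large "page cache" figure is roughly the situation described above: cold pages pushed out so hot file data can stay in RAM.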

Truga
May 4, 2014
Lipstick Apathy
on the servers at work i even have swap on the vm hosts, because even with 256gb of ram, vm performance is just better, and it lets me overprovision ram in a couple hungry vms that gets swapped out when unneeded

on the cloud vms we host the webshits on i have 0b of swap, because even though digitalocean is supposedly all SSD, filling up ram absolutely murders performance somehow, usually to the point where even ssh becomes unresponsive. if a website is leaking enough to cause swapping, i'd rather it gets killed, which then gets logged, than me trying to figure out what the gently caress was going on on a vm before i had to power cycle it because it was unresponsive due to a single rogue process eating up ram

on my shitbox at home i have 0b of swap because an old drive died, but i haven't bothered fixing it yet because i have 64gb ram and i no longer require running local vms, and don't use chrome so my current peak usage is like 12-16gb with *everything* running :v:

i'll be replacing the last of my hdds with flash next year and will fix it then

KozmoNaut
Apr 23, 2008

Happiness is a warm
Turbo Plasma Rifle


Truga posted:

on my shitbox at home i have 0b of swap because an old drive died, but i haven't bothered fixing it yet because i have 64gb ram and i no longer require running local vms, and don't use chrome so my current peak usage is like 12-16gb with *everything* running :v:

I have 32GB and run both Firefox and Chrome for different purposes. I definitely need swap :v:

my homie dhall
Dec 9, 2010

honey, oh please, it's just a machine
for VMs you can also use zram for the swap partition

SYSV Fanfic
Sep 9, 2003

by Pragmatica
Any time my computer starts swapping I get paranoid that it's dying/freezing. Can't be the only one.

Antigravitas
Dec 8, 2019

Die Rettung fuer die Landwirte:
If you know a bunch of your pages are inactive, swapping them out to have more cache is good, actually.

How well does KSM perform with container weeaboo? I don't have big enough container hosts to test.

SYSV Fanfic
Sep 9, 2003

by Pragmatica

Antigravitas posted:

If you know a bunch of your pages are inactive, swapping them out to have more cache is good, actually.

Does the kernel still use LRU? On a desktop some unused pages have a lot more utility than others. You'll definitely notice if you come back to your computer and X has to be paged back in 4kb at a time. Dunno if you can give hints via malloc now either.

Antigravitas
Dec 8, 2019

Die Rettung fuer die Landwirte:
You can tell Linux to keep your pages resident.

Sadly, Linux does not have anything as good as the ARC…

Kazinsal
Dec 13, 2011
preemptively swapping means your OS is freeing up memory that's being used for no reason. if you have an SSD on a SATA3 port that's 600 MB/s of maximum I/O bandwidth and your random IOPS is going to be high enough that dumping a few hundred pages to disk doesn't matter. it can do that in preparation for something else being loaded, and native command queueing means that it can interleave swap flushes with loads for negligible I/O latency cost

SYSV Fanfic
Sep 9, 2003

by Pragmatica

Kazinsal posted:

preemptively swapping means your OS is freeing up memory that's being used for no reason. if you have an SSD on a SATA3 port that's 600 MB/s of maximum I/O bandwidth and your random IOPS is going to be high enough that dumping a few hundred pages to disk doesn't matter. it can do that in preparation for something else being loaded, and native command queueing means that it can interleave swap flushes with loads for negligible I/O latency cost

Why cause that negligible bit of wear on an SSD when I can just buy 32gb more ram instead?

Antigravitas posted:

You can tell Linux to keep your pages resident.

Sadly, Linux does not have anything as good as the ARC…

Hey yeah, are you talking about mlock? Looking for ways to do it and you can force it, but you can also set per cgroup swappiness now via systemd.

KozmoNaut
Apr 23, 2008

Happiness is a warm
Turbo Plasma Rifle


SYSV Fanfic posted:

Why cause that negligible bit of wear on an SSD when I can just buy 32gb more ram instead?

My motherboard is officially limited to 16GB, but works with 32GB*, anything beyond that is :lol: nope.

*The memory controller in the Phenom II can address 32GB and absolutely no more.

I've been using this SSD (Samsung 840 Evo) since it was the fanciest and newest you could get and this PC had 4GB RAM, with swap enabled. According to the conservative values from SMART, it has 95% wear life left. I'm not worried about SSDs wearing out.

SYSV Fanfic
Sep 9, 2003

by Pragmatica
Maybe with the SSDs you can buy on a KozmoNaut salary old person joke, sorry ur absolutely right though

Cybernetic Vermin
Apr 18, 2005

swapless oom killer should just unmap the lru page on the system, and have the page fault handler kill the process that touches a page unmapped this way.

SYSV Fanfic
Sep 9, 2003

by Pragmatica

Cybernetic Vermin posted:

swapless oom killer should just unmap the lru page on the system, and have the page fault handler kill the process that touches a page unmapped this way.

Rather than focusing on whether this idea is good or bad, I'm trying to envision how this policy would be represented in a metaphorical computer world you could go inside of, like TRON.

BlankSystemDaemon
Mar 13, 2009



SYSV Fanfic posted:

There's nothing to fix. Exhausting all available physical and virtual memory is an error condition caused by bugs or bad design on the part of the user. "System stability" is kind of a moot point when the other option is to automatically flush all dirty buffers/cache to disk and halt the system or wait for a human oracle rather than a heuristic.

Giving people (more) control over the heuristic just makes it more likely the OOM killer won't be able to bring the system into a safe state to even respond to sysrq keys. Truly important work gets check pointed/has a commit log. Those mechanisms will fail if the kernel can't recover to the point that pending file writes can complete.

SYSV Fanfic posted:

Any time my computer starts swapping I get paranoid that it's dying/freezing. Can't be the only one.
There's something deeply wrong with the entire loving virtual memory management when the response to fixing a tiny piece of software is to completely abandon (demand) paging and insist on going back to physical memory mapping, because there's apparently a guarantee that if any swapping happens, the system will crash.

SYSV Fanfic
Sep 9, 2003

by Pragmatica

BlankSystemDaemon posted:

There's something deeply wrong with the entire loving virtual memory management when the response to fixing a tiny piece of software is to completely abandon (demand) paging and insist on going back to physical memory mapping, because there's apparently a guarantee that if any swapping happens, the system will crash.

Talking about being anxious my hardware is dying when swapping happens. Like... did this thing just freeze for five seconds ten times in a row b/c I need to repaste something, am I going to have to do an RMA/buy a new board? Thermal throttling? Oh no, just firefox using 10,000% CPU and 150% available ram. Nope just working as intended, swapping out thousands of 4k chunks of memory like it's 1994 and I'm running emacs or some poo poo.

On a desktop, OOM killer always gets its man b/c it's always the browser.

SYSV Fanfic
Sep 9, 2003

by Pragmatica
Had it happen when I was helping my brother with his class on a true multi-user install of ubuntu. Some tab my mom had left open when we switched user went berserk. I'd just written him a nice little annuity calculator in python. System locked up, couldn't log in to a console, then BAM. OOM killer stepped in and saved me. Real hero of the hour.

BlankSystemDaemon
Mar 13, 2009



SYSV Fanfic posted:

Talking about being anxious my hardware is dying when swapping happens. Like... did this thing just freeze for five seconds ten times in a row b/c I need to repaste something, am I going to have to do an RMA/buy a new board? Thermal throttling? Oh no, just firefox using 10,000% CPU and 150% available ram. Nope just working as intended, swapping out thousands of 4k chunks of memory like it's 1994 and I'm running emacs or some poo poo.

On a desktop, OOM killer always gets its man b/c it's always the browser.
Yeah, something is deeply loving broken with the VM subsystem if that's what you're experiencing.

hifi
Jul 25, 2012

BlankSystemDaemon posted:

There's something deeply wrong with the entire loving virtual memory management when the response to fixing a tiny piece of software is to completely abandon (demand) paging and insist on going back to physical memory mapping, because there's apparently a guarantee that if any swapping happens, the system will crash.

you're conflating crashing and slowing down

hifi
Jul 25, 2012

anyways bsd probably sucks just as much poo poo if not more when some docker crapware is spawning itself 300 times a second

The_Franz
Aug 8, 2003

hifi posted:

anyways bsd probably sucks just as much poo poo if not more when some docker crapware is spawning itself 300 times a second

freebsd oom killer uses the "new guy in the prison yard" approach: find the biggest process and take it out. linux attempts to use some heuristics when trying to figure out which process to whack
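
To make the difference concrete, here is a toy model of the two policies. It is a sketch, not either kernel's real code: the "biggest process" rule follows the description above, and the Linux-like rule is a simplified rendering of how `oom_score_adj` shifts a size-based score (and how -1000 exempts a process). The `Proc` class, function names, and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    pages: int              # pages charged to the process (toy number)
    oom_score_adj: int = 0  # Linux tunable, range -1000..1000


def pick_biggest(procs):
    """'New guy in the prison yard': the largest process is the victim."""
    return max(procs, key=lambda p: p.pages)


def linux_like_score(p, total_pages):
    """Size plus an adjustment scaled by total memory; None means exempt."""
    if p.oom_score_adj == -1000:
        return None
    return p.pages + p.oom_score_adj * total_pages // 1000


def pick_linux_like(procs, total_pages):
    """Victim is the non-exempt process with the highest adjusted score."""
    scored = [(linux_like_score(p, total_pages), p) for p in procs]
    scored = [(s, p) for s, p in scored if s is not None]
    return max(scored, key=lambda sp: sp[0])[1] if scored else None


if __name__ == "__main__":
    total = 1_000_000
    procs = [
        Proc("db", 600_000, oom_score_adj=-500),
        Proc("browser", 400_000),
        Proc("sshd", 2_000, oom_score_adj=-1000),
    ]
    print("biggest-first victim:", pick_biggest(procs).name)
    print("linux-like victim:", pick_linux_like(procs, total).name)
```

In this made-up example the biggest-first rule takes out the database, while the adjusted score spares it and picks the browser instead.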

Mr. Crow
May 22, 2008

Snap City mayor for life
I haven't had dedicated swap in years and I've never had an issue. RAM is cheap as poo poo, even with multiple desktop VMs all with browsers overloaded with tabs because I'm incapable of closing anything and simultaneously running memory intensive games I've never noticed oom killer or any performance degradation/crashing due to memory pressure. I'm only using half my ram slots too.

:shrug:

Sapozhnik
Jan 2, 2005

Nap Ghost
desktop linux these days uses user-mode systemd to contain every application within its own cgroup

you'd think that would prevent the system from getting hosed by a process with runaway memory consumption and yet here we are.

of course, we would ideally migrate everything to posix_spawn or at the very least vfork, then configure our kernels not to write rubber checks for memory in the first place. but yeah that isn't happening any time soon.

mawarannahr
May 21, 2019

I run zen on my desktop with no swap. I’ve had a few hard freezes — I wonder if maybe I shouldn’t do one of these things on 32 GB RAM?

BlankSystemDaemon
Mar 13, 2009



hifi posted:

anyways bsd probably sucks just as much poo poo if not more when some docker crapware is spawning itself 300 times a second
I've seen people fail to forkbomb a FreeBSD based honeypot that I used to manage, without success.

BlankSystemDaemon fucked around with this message at 19:57 on Dec 11, 2021

Tankakern
Jul 25, 2007

so they succeeded?

mystes
May 31, 2006

Tankakern posted:

so they succeeded?

BlankSystemDaemon
Mar 13, 2009



Tankakern posted:

so they succeeded?
Ha, holy poo poo I did a double negative unintentionally. :v:

Progressive JPEG
Feb 19, 2003

does the linux oom killer just blindly pick a pid at random or what

seems real good at killing the wrong things ime

init, kubelet, whatever *blam*

BlankSystemDaemon
Mar 13, 2009



The_Franz posted:

freebsd oom killer uses the "new guy in the prison yard" approach: find the biggest process and take it out. linux attempts to use some heuristics when trying to figure out which process to whack
it might also be worth mentioning that protect(1) is meant to be used to protect processes from it, and that the _oomprotect suffix for rc.conf can set it for services

Nomnom Cookie
Aug 30, 2009



Sapozhnik posted:

desktop linux these days uses user-mode systemd to contain every application within its own cgroup

you'd think that would prevent the system from getting hosed by a process with runaway memory consumption and yet here we are.

of course, we would ideally migrate everything to posix_spawn or at the very least vfork, then configure our kernels not to write rubber checks for memory in the first place. but yeah that isn't happening any time soon.

if you set a memory limit on the cgroup this will work. cgroups are how docker enforces limits. but the system can’t set a limit on its own
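
a quick way to check whether a limit is actually set: on cgroup v2 each group has a `memory.max` file that holds either the literal string `max` (no limit) or a byte count. the sketch below only parses that value; the function names are made up, and the path you would read from (something under `/sys/fs/cgroup/`) depends on how your system lays out its groups, so it is left as an assumption.

```python
def parse_memory_max(text):
    """Return the cgroup v2 memory.max limit in bytes, or None if unlimited."""
    value = text.strip()
    if value == "max":
        return None
    return int(value)


def describe_limit(text):
    """Human-readable form of a memory.max value."""
    limit = parse_memory_max(text)
    if limit is None:
        return "no limit"
    return f"{limit / 2**20:.0f} MiB"


if __name__ == "__main__":
    # Hypothetical file contents; on a real box you would read them from
    # the memory.max file of the cgroup you care about.
    for raw in ("max\n", "536870912\n"):
        print(repr(raw), "->", describe_limit(raw))
```

with systemd the usual knob for this is the `MemoryMax=` setting on a unit or slice, which ends up in that same file.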

Nomnom Cookie
Aug 30, 2009



BlankSystemDaemon posted:

There's something deeply wrong with the entire loving virtual memory management when the response to fixing a tiny piece of software is to completely abandon (demand) paging and insist on going back to physical memory mapping, because there's apparently a guarantee that if any swapping happens, the system will crash.

my earlier point, which you seem to have either missed or ignored, is that in the scenarios kubernetes is designed for and deployed into, crashing the system is better than slowing down. a crash is a fail-stop, fail-stops are well understood and distributed systems are usually pretty good at dealing with them. fail-slows are a lot less well understood, are more complicated to handle, and some of the mechanisms used to detect fail-stops will aggravate the effects of a fail-slow. that is why kubernetes was built to operate without swap and why people who know what they’re doing won’t use kube with swap unless they really know what they’re doing and really need it

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug

BlankSystemDaemon posted:

it might also be worth mentioning that protect(1) is meant to be used to protect processes from it, and that the _oomprotect suffix for rc.conf can set it for services

there is a similar mechanism to give weights to processes for the linux oom killer, i have never looked into the details of how these are set but ive seen a bunch of oom killer output in dmesg, and most systems are configured out of the box so e.g. sshd will never be oom-killed
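
for reference, the linux knob is `/proc/<pid>/oom_score_adj`, an integer from -1000 to 1000 where -1000 means "never oom-kill this". a minimal sketch of reading it follows; the helper names are made up, and the fake `/proc` directory exists only so the example runs anywhere, not just on linux.

```python
import os
import tempfile

OOM_SCORE_ADJ_MIN = -1000  # value that exempts a process from the OOM killer
OOM_SCORE_ADJ_MAX = 1000   # value that makes a process the preferred victim


def read_oom_score_adj(pid, proc_root="/proc"):
    """Read the oom_score_adj value for a pid from a /proc-style tree."""
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path) as f:
        return int(f.read().strip())


def is_oom_exempt(adj):
    """True if this adjustment value disables OOM killing for the process."""
    return adj == OOM_SCORE_ADJ_MIN


def make_fake_proc(entries):
    """Build a throwaway /proc lookalike from {pid: adj}; returns its path."""
    root = tempfile.mkdtemp()
    for pid, adj in entries.items():
        os.makedirs(os.path.join(root, str(pid)))
        with open(os.path.join(root, str(pid), "oom_score_adj"), "w") as f:
            f.write(f"{adj}\n")
    return root


if __name__ == "__main__":
    # Hypothetical pids and values, standing in for a protected daemon
    # and an ordinary process.
    root = make_fake_proc({1: -1000, 4242: 0})
    for pid in (1, 4242):
        adj = read_oom_score_adj(pid, root)
        print(pid, adj, "exempt" if is_oom_exempt(adj) else "killable")
```

on a systemd box the per-service way to set this is `OOMScoreAdjust=` in the unit file, which is one way a daemon can end up protected out of the box.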
