VictualSquid
Feb 29, 2012

Gently enveloping the target with indiscriminate love.

BlankSystemDaemon posted:

Doesn't Linux have something like protect(1)?

It does, but I don't remember how to invoke it. I think it is a file in proc that you touch or write something to.


pseudorandom name
May 6, 2007

VictualSquid posted:

It does, but I don't remember how to invoke it. I think it is a file in proc that you touch or write something to.

choom(:420:)

Super-NintendoUser
Jan 16, 2004

COWABUNGERDER COMPADRES
Soiled Meat
anyone familiar with s390x IBM Linux? does it run regular x86 apps?

Klyith
Aug 3, 2007

GBS Pledge Week

a dingus posted:

I upgraded my GPU from a 5700xt to a 6900xt and I'm having some weird issues that seem driver related. My displays will randomly freeze, no mouse movement etc, but I can hear things continue in the background. My system will eventually respond after doing ctrl-alt-f2 and I can use the non-gui environment. I know AMD drivers are in the kernel so I'm not sure how to reinstall or what. I updated my system after I installed the card. Any ideas? I'm on Fedora 35 w/ Wayland & sway.

Fedora says they're on kernel 5.18.5, which should be totally up to date for your GPU. You can do uname -r to check your running kernel version.

Is there anything consistent about what you're doing when this happens? For example, does it happen at a plain desktop with only 2D windowed apps running and no video playing?

ExcessBLarg!
Sep 1, 2001

Jerk McJerkface posted:

anyone familiar with s390x IBM Linux? does it run regular x86 apps?
I haven't used it, but I'd assume that s390x distribution ports provide binaries natively compiled for z/Architecture.

If you have an x86 binary you have to run on s390x you could probably do it with QEMU. I wouldn't expect it to be terribly fast though.

Mr. Crow
May 22, 2008

Snap City mayor for life

a dingus posted:

I upgraded my GPU from a 5700xt to a 6900xt and I'm having some weird issues that seem driver related. My displays will randomly freeze, no mouse movement etc, but I can hear things continue in the background. My system will eventually respond after doing ctrl-alt-f2 and I can use the non-gui environment. I know AMD drivers are in the kernel so I'm not sure how to reinstall or what. I updated my system after I installed the card. Any ideas? I'm on Fedora 35 w/ Wayland & sway.

I think Wayland support is still iffy. I know 36 had some more improvements in that regard, so it's probably worth upgrading. You could try switching to X as a quick sanity check.

RFC2324
Jun 7, 2012

http 418

ExcessBLarg! posted:

I haven't used it, but I'd assume that s390x distribution ports provide binaries natively compiled for z/Architecture.

If you have an x86 binary you have to run on s390x you could probably do it with QEMU. I wouldn't expect it to be terribly fast though.

Pretty sure the normal approach is "get the source, compile it for the native arch, and hope (or engage) the devs that any firm able to afford an s390 has on staff to fix it"

E: wait, I think I'm thinking of z390? It's too early and I need coffee

Lifroc
May 8, 2020

Mr. Crow posted:

I think Wayland support is still iffy. I know 36 had some more improvements in that regard, so it's probably worth upgrading. You could try switching to X as a quick sanity check.

Wayland is pretty good on Fedora, I've been using it for the past 8 months on a 6800XT with no issues.

a dingus posted:

I upgraded my GPU from a 5700xt to a 6900xt and I'm having some weird issues that seem driver related. My displays will randomly freeze, no mouse movement etc, but I can hear things continue in the background. My system will eventually respond after doing ctrl-alt-f2 and I can use the non-gui environment. I know AMD drivers are in the kernel so I'm not sure how to reinstall or what. I updated my system after I installed the card. Any ideas? I'm on Fedora 35 w/ Wayland & sway.

That sounds like what used to happen with my 5500XT a couple years ago which put me off big time from AMD cards, apparently I did track it down to a driver/KMS issue that was still unresolved. I gave the card away recently and last I heard the card was working with latest kernels, so it must have been fixed.

Your best bet is figuring out what exactly is crashing. Check the logs with journalctl; if the system needs rebooting, you can check the previous boot's logs with journalctl -b -1

FWIW, I have a 6800XT on Fedora 36 GNOME+Wayland and it's smooth sailing.
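If the journal is noisy, a small filter can help narrow it down. This is only a sketch: the function name drm_errors is made up, and the pattern assumes the driver logs lines in the "amdgpu ... [drm] *ERROR*" shape, which may differ between kernel versions.

```shell
# Hypothetical helper: keep only amdgpu/drm error lines from journal text on stdin.
drm_errors() {
    grep -E 'amdgpu .*\[drm\] \*ERROR\*'
}

# Usage (kernel messages from the previous boot):
#   journalctl -k -b -1 --no-pager | drm_errors
```

Here -k limits the output to kernel messages and -b -1 selects the previous boot, so whatever comes out the other end is at least from the right driver.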

Klyith
Aug 3, 2007

GBS Pledge Week

Lifroc posted:

That sounds like what used to happen with my 5500XT a couple years ago which put me off big time from AMD cards, apparently I did track it down to a driver/KMS issue that was still unresolved. I gave the card away recently and last I heard the card was working with latest kernels, so it must have been fixed.

FWIW I have a 5700 and it's been working fine in Linux (Manjaro/KDE/Wayland) so far -- though I haven't done much 3d gaming yet to stress it out.

In one way it's working better than in Windows: I have a kinda crap Asus monitor that's connected via DisplayPort. In windows, the monitor frequently loses connection when it's powersaved, causing windows to drop the screen and shuffle all my apps onto the 2nd monitor. Super annoying, super random, happened about once a day. I actually wrote a window position saver into my autohotkey system script to restore them after those cut-outs.

In linux it has happened twice in 2 months. So there must be some OS/driver part to it. OTOH both times it's happened in linux have been very bad for the display manager, requiring restarts. So, uh, not a total victory.


(Losing autohotkey is the one thing I'm having a hard time getting over.)

Lifroc
May 8, 2020

Klyith posted:

FWIW I have a 5700 and it's been working fine in Linux (Manjaro/KDE/Wayland) so far -- though I haven't done much 3d gaming yet to stress it out.

In one way it's working better than in Windows: I have a kinda crap Asus monitor that's connected via DisplayPort. In windows, the monitor frequently loses connection when it's powersaved, causing windows to drop the screen and shuffle all my apps onto the 2nd monitor. Super annoying, super random, happened about once a day. I actually wrote a window position saver into my autohotkey system script to restore them after those cut-outs.

In linux it has happened twice in 2 months. So there must be some OS/driver part to it. OTOH both times it's happened in linux have been very bad for the display manager, requiring restarts. So, uh, not a total victory.


(Losing autohotkey is the one thing I'm having a hard time getting over.)

Just throwing some ideas around, but how's your PSU? I used to use an eGPU years ago and it would sometimes crash even when completely idle, randomly. I tracked it down to the PSU being defective, even though it seemed to be perfectly fine even under load.

a dingus
Mar 22, 2008

Rhetorical questions only
Fun Shoe

Lifroc posted:

Wayland is pretty good on Fedora, I've been using it for the past 8 months on a 6800XT with no issues.

That sounds like what used to happen with my 5500XT a couple years ago which put me off big time from AMD cards, apparently I did track it down to a driver/KMS issue that was still unresolved. I gave the card away recently and last I heard the card was working with latest kernels, so it must have been fixed.

Your best bet is figuring out what exactly is crashing. Check the logs with journalctl; if the system needs rebooting, you can check the previous boot's logs with journalctl -b -1

FWIW, I have a 6800XT on Fedora 36 GNOME+Wayland and it's smooth sailing.

Thanks all for the GPU help. I'm looking through the logs and all I can find that looks suspicious is this snippet. I hooked up a new display, which I didn't even think about... hooked it up at the same time as my new GPU. So maybe there's something going on between the two of them.

The lock-ups have occurred anywhere from 20 seconds after login to an hour or so in. I can be playing a game, browsing the web or working in the terminal.

code:
Jun 20 10:55:33 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* [CRTC:77:crtc-0] flip_done timed out
Jun 20 10:57:53 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* flip_done timed out
Jun 20 10:57:53 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* [CRTC:77:crtc-0] commit wait timed out
Jun 20 10:58:03 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* flip_done timed out
Jun 20 10:58:03 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* [CONNECTOR:105:DP-3] commit wait timed out
Jun 20 10:58:13 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* flip_done timed out
Jun 20 10:58:13 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* [PLANE:65:plane-5] commit wait timed out
Jun 20 10:58:24 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* flip_done timed out
Jun 20 10:58:24 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* [PLANE:75:plane-7] commit wait timed out
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/ldac
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/aac
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSink/sbc
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/sbc
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSink/sbc_xq
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/sbc_xq
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/faststream
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/faststream_duplex
Jun 20 10:58:24 fedora bluetoothd[919]: Player unregistered: sender=:1.37 path=/media_player0
Jun 20 10:58:24 fedora systemd[1]: Started Getty on tty2.


As far as PSUs go, I upgraded to a Corsair RM850x so it should be good for a 6900XT. Like I said, I've had no issues in Windows, idle or not, so I hope it's not a PSU problem.

I'm going to try an upgrade to Fedora 36 and see if that helps.

Klyith
Aug 3, 2007

GBS Pledge Week

Lifroc posted:

Just throwing some ideas around, but how's your PSU? I used to use an eGPU years ago and it would sometimes crash even when completely idle, randomly. I tracked it down to the PSU being defective, even though it seemed to be perfectly fine even under load.

Oh everything about my system is stable, nothing else that might indicate PSU or anything else hardware. The monitor is definitely the problem and more specifically the displayport part. It's done other dumb poo poo in the past, such that on my previous 1060 GPU I had it connected by HDMI because I thought the DP was actively broken. But now the 5700 only has one HDMI port and I need that for the 2nd monitor.

Lifroc
May 8, 2020

a dingus posted:

Thanks all for the GPU help. I'm looking through the logs and all I can find that looks suspicious is this snippet. I hooked up a new display, which I didn't even think about... hooked it up at the same time as my new GPU. So maybe there's something going on between the two of them.

The lock-ups have occurred anywhere from 20 seconds after login to an hour or so in. I can be playing a game, browsing the web or working in the terminal.

code:
Jun 20 10:55:33 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* [CRTC:77:crtc-0] flip_done timed out
Jun 20 10:57:53 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* flip_done timed out
Jun 20 10:57:53 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* [CRTC:77:crtc-0] commit wait timed out
Jun 20 10:58:03 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* flip_done timed out
Jun 20 10:58:03 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* [CONNECTOR:105:DP-3] commit wait timed out
Jun 20 10:58:13 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* flip_done timed out
Jun 20 10:58:13 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* [PLANE:65:plane-5] commit wait timed out
Jun 20 10:58:24 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* flip_done timed out
Jun 20 10:58:24 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* [PLANE:75:plane-7] commit wait timed out
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/ldac
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/aac
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSink/sbc
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/sbc
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSink/sbc_xq
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/sbc_xq
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/faststream
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/faststream_duplex
Jun 20 10:58:24 fedora bluetoothd[919]: Player unregistered: sender=:1.37 path=/media_player0
Jun 20 10:58:24 fedora systemd[1]: Started Getty on tty2.


As far as PSUs go, I upgraded to a Corsair RM850x so it should be good for a 6900XT. Like I said, I've had no issues in Windows, idle or not, so I hope it's not a PSU problem.

I'm going to try an upgrade to Fedora 36 and see if that helps.

That sure sounds like a driver problem. I would report it upstream on the mesa/amdgpu repo, though my experience with them has been less than stellar. I would give you a link but I'm on mobile atm

Splode
Jun 18, 2013

put some clothes on you little freak
I'm having an issue where when I copy files to a usb thumb drive (NTFS format), it's both slow and then sits on 100% for a really long time before finally finishing. Copying multiple files makes it much worse.

I'm running pop!_os if that's relevant

Any ideas?

Lifroc
May 8, 2020

Crappy USB drive? The cheaper it is, the more noticeable how slow they truly are.

Klyith
Aug 3, 2007

GBS Pledge Week

Splode posted:

I'm having an issue where when I copy files to a usb thumb drive (NTFS format), it's both slow and then sits on 100% for a really long time before finally finishing. Copying multiple files makes it much worse.

I have had slow / unresponsive write performance with NTFS on cheap flash sticks, and particularly with SD cards, in Windows. I can only imagine that the slow & cautious Linux NTFS would be worse.


NTFS, and other journaling filesystems like EXT4, are hard on crappy cheap flash drives because they produce extra small writes & erases for the journal. Due to the mechanics of flash that's not ideal. On a real SSD you have a sophisticated controller and multiple flash chips, so the drive can basically multitask that stuff. On a USB stick you have the cheapest dumbest controller and probably one flash chip of the cheapest NAND.

tl;dr Fat32 or exFat are better choices than NTFS for cheap usb sticks (plus linux likes them better than NTFS anyways)

Mr. Crow
May 22, 2008

Snap City mayor for life

a dingus posted:

Thanks all for the GPU help. I'm looking through the logs and all I can find that looks suspicious is this snippet. I hooked up a new display, which I didn't even think about... hooked it up at the same time as my new GPU. So maybe there's something going on between the two of them.

The lock-ups have occurred anywhere from 20 seconds after login to an hour or so in. I can be playing a game, browsing the web or working in the terminal.

code:
Jun 20 10:55:33 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* [CRTC:77:crtc-0] flip_done timed out
Jun 20 10:57:53 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* flip_done timed out
Jun 20 10:57:53 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* [CRTC:77:crtc-0] commit wait timed out
Jun 20 10:58:03 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* flip_done timed out
Jun 20 10:58:03 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* [CONNECTOR:105:DP-3] commit wait timed out
Jun 20 10:58:13 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* flip_done timed out
Jun 20 10:58:13 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* [PLANE:65:plane-5] commit wait timed out
Jun 20 10:58:24 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* flip_done timed out
Jun 20 10:58:24 fedora kernel: amdgpu 0000:28:00.0: [drm] *ERROR* [PLANE:75:plane-7] commit wait timed out
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/ldac
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/aac
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSink/sbc
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/sbc
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSink/sbc_xq
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/sbc_xq
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/faststream
Jun 20 10:58:24 fedora bluetoothd[919]: Endpoint unregistered: sender=:1.37 path=/MediaEndpoint/A2DPSource/faststream_duplex
Jun 20 10:58:24 fedora bluetoothd[919]: Player unregistered: sender=:1.37 path=/media_player0
Jun 20 10:58:24 fedora systemd[1]: Started Getty on tty2.


As far as PSUs go, I upgraded to a Corsair RM850x so it should be good for a 6900XT. Like I said, I've had no issues in Windows, idle or not, so I hope it's not a PSU problem.

I'm going to try an upgrade to Fedora 36 and see if that helps.

Quick google on phone so ymmv https://gitlab.freedesktop.org/drm/amd/-/issues/1717

BlankSystemDaemon
Mar 13, 2009



Klyith posted:

I have had slow / unresponsive write performance with NTFS on cheap flash sticks, and particularly with SD cards, in Windows. I can only imagine that the slow & cautious Linux NTFS would be worse.


NTFS, and other journaling filesystems like EXT4, are hard on crappy cheap flash drives because they produce extra small writes & erases for the journal. Due to the mechanics of flash that's not ideal. On a real SSD you have a sophisticated controller and multiple flash chips, so the drive can basically multitask that stuff. On a USB stick you have the cheapest dumbest controller and probably one flash chip of the cheapest NAND.

tl;dr Fat32 or exFat are better choices than NTFS for cheap usb sticks (plus linux likes them better than NTFS anyways)
Yeah, it's basically a crap-shoot.

Cheap non-volatile flash storage also tends to use the highest number of bits per cell to save on the amount of circuitry needed.
Since TLC and QLC have significantly worse write endurance, that hurts both the write speeds and the longevity of the drive.
This is further exacerbated by the razor-thin margin of additional cells that are included for the drive's firmware to move data to while it's powered, and it's made even worse if, as above, the drive is left unpowered for long periods of time, because that means the firmware can't move things around.

Basically, don't buy cheap flash storage. Buy a regular M.2 drive from a reputable manufacturer and something like this.

Kivi
Aug 1, 2006
I care
Used this guide, https://docs.rackspace.com/support/how-to/linux-out-of-memory-killer/

Tried giving both processes (pg_dump and pg_upgrade) this:

echo -17 > /proc/(PID)/oom_adj

but my process was still killed with free -mh reporting over 50 GB of memory available. :psyduck: I'm stumped.

E: Trying this instead now, as per handy hint in the dmesg

bash (33639): /proc/33905/oom_adj is deprecated, please use /proc/33905/oom_score_adj instead.
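For anyone following along, here is a minimal sketch of the newer interface that dmesg hint points at. oom_score_adj takes an integer from -1000 to 1000, and -1000 exempts the process from the OOM killer, which is what -17 meant for oom_adj. The helper name set_oom_score_adj and its optional third argument (an alternate /proc root, only there so it can be tried without touching a real process) are made up for illustration. Lowering the value generally needs root.

```shell
# Hypothetical helper: validate a score and write it to <proc_root>/<pid>/oom_score_adj.
# Valid range is -1000 (never pick this process) to 1000 (pick it first).
set_oom_score_adj() {
    pid=$1
    score=$2
    proc_root=${3:-/proc}

    # Reject anything that is not a plain (optionally negative) integer.
    case $score in
        ''|-|*[!0-9-]*|?*-*)
            echo "score must be an integer" >&2
            return 1
            ;;
    esac

    if [ "$score" -lt -1000 ] || [ "$score" -gt 1000 ]; then
        echo "score must be between -1000 and 1000" >&2
        return 1
    fi

    echo "$score" > "$proc_root/$pid/oom_score_adj"
}

# Usage (as root), assuming 33905 is the PID you want to protect:
#   set_oom_score_adj 33905 -1000
```

The range check is there because the kernel rejects out-of-range writes anyway, and failing early with a readable message is nicer than a bare write error.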

Kivi fucked around with this message at 09:47 on Jun 22, 2022

Lifroc
May 8, 2020

Kivi posted:

Used this guide, https://docs.rackspace.com/support/how-to/linux-out-of-memory-killer/

Tried giving both processes (pg_dump and pg_upgrade) this:

echo -17 > /proc/(PID)/oom_adj

but my process was still killed with free -mh reporting over 50 GB of memory available. :psyduck: I'm stumped.

E: Trying this instead now, as per handy hint in the dmesg

bash (33639): /proc/33905/oom_adj is deprecated, please use /proc/33905/oom_score_adj instead.

What are the logs saying? Do you see 50 GB available after the OOM killer has run? Because it is possible there wasn't much memory available when it triggered.

Be aware that in some cases any process with runaway memory consumption might cause the OOM killer to target another process, in this case postgres. Are you sure it is postgres itself the cause, and not something else that misbehaves?

BlankSystemDaemon
Mar 13, 2009



Lifroc posted:

Be aware that in some cases any process with runaway memory consumption might cause the OOM killer to target another process, in this case postgres.
How is that possible?

The logic should be fairly simple.

Kivi
Aug 1, 2006
I care

Lifroc posted:

What are the logs saying? Do you see 50 GB available after the OOM killer has run? Because it is possible there wasn't much memory available when it triggered.

Be aware that in some cases any process with runaway memory consumption might cause the OOM killer to target another process, in this case postgres. Are you sure it is postgres itself the cause, and not something else that misbehaves?
No, I'm monitoring the usage with free -h and top in tmux tabs.

In top, the process's memory usage goes to ca. 72 GB and then it is killed. Is there a better way to monitor it?

On free, free (memory) never goes to zero, swap is never used and there's always some memory in buffer / cached and available column, used only goes to same 70ish GB. No other software apart from monitoring is run on the box.

Logs have nothing.

RFC2324
Jun 7, 2012

http 418

BlankSystemDaemon posted:

How is that possible?

The logic should be fairly simple.

I have yet to meet anyone who could reliably predict what the linux oomkiller was gonna hit, other than the obvious "whichever option is gonna screw system stability the most"

It was a long time ago but I saw oomkiller kill pid 1 once, and that absolutely should not have been possible

RFC2324
Jun 7, 2012

http 418

My own current wtf is, as usual, the point where 2 projects meet.

Project 1 is a generic docker-compose file that starts up a tdarr compute node. It uses an env file to find $HOSTNAME and set that as the node's reporting name.

Project 2 is an ansible playbook that will do everything to prep the node and then kick off the compose file. It works perfectly other than the fact that it isn't parsing $HOSTNAME, so all the nodes end up with the same name and can't register/work.

As far as I can tell the docker compose implementation in ansible never calls bash, and so never actually populates or parses vars. I think the solution is going to be to use ansible to define the var as the inventory name? My hacky solution is just getting hackier

Mr. Crow
May 22, 2008

Snap City mayor for life
Not sure what the use case for compose is if you're already setting it up in ansible? Just tie them together in ansible

Voodoo Cafe
Jul 19, 2004
"You got, uhh, Holden Caulfield in there, man?"

RFC2324 posted:

My own current wtf is, as usual, the point where 2 projects meet.

Project 1 is a generic docker-compose file that starts up a tdarr compute node. It uses an env file to find $HOSTNAME and set that as the node's reporting name.

Project 2 is an ansible playbook that will do everything to prep the node and then kick off the compose file. It works perfectly other than the fact that it isn't parsing $HOSTNAME, so all the nodes end up with the same name and can't register/work.

As far as I can tell the docker compose implementation in ansible never calls bash, and so never actually populates or parses vars. I think the solution is going to be to use ansible to define the var as the inventory name? My hacky solution is just getting hackier

$HOSTNAME is a bashism. your container probably uses sh or dash or busybox. try
code:
HOSTNAME=`hostname`
in your script
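A slightly more defensive version of the same idea, as a sketch only. As far as I know bash sets HOSTNAME as a plain shell variable and does not export it by default, so anything launched outside an interactive bash may not see it. The names node_name and NODE_NAME are made up here, and this assumes your env file or compose file would then reference NODE_NAME.

```shell
# Hypothetical helper: use $HOSTNAME if the shell provides it, otherwise ask the system.
# uname -n is POSIX, so this also works under sh, dash or busybox.
node_name() {
    printf '%s\n' "${HOSTNAME:-$(uname -n)}"
}

# Export the result so child processes can actually see it.
NODE_NAME=$(node_name)
export NODE_NAME
```

The point of the export is that a variable only reaches child processes once it is in the environment, whichever shell set it.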

BlankSystemDaemon
Mar 13, 2009



Whoever bought me this redtext is hilarious, and right. :v:

RFC2324 posted:

It was a long time ago but I saw oomkiller kill pid 1 once, and that absolutely should not have been possible
Yeah, that seems like the wrong choice.

xdice
Feb 15, 2006

RFC2324 posted:


It was a long time ago but I saw oomkiller kill pid 1 once, and that absolutely should not have been possible

Suddenly, I'm reminded of PSDoom.

https://www.cs.unm.edu/~dlchao/flake/doom/chi/chi.html

Lifroc
May 8, 2020

BlankSystemDaemon posted:

How is that possible?

The logic should be fairly simple.

Not sure if thick BSD kernel code is fairly simple to read and understand, but have a look here for how it works in Linux: https://lwn.net/Articles/317814/

I haven't dealt with OOM in a decade, but in my past life as a DBA I remember seeing it kill another process that might have been considered "bad" by its heuristic though not necessarily the actual culprit. I have also vague memories of seeing gnome-shell killed by OOM because of a Firefox memory leak a few times.

When only one process is massively leaking, its behaviour is predictable, but under intense memory pressure where more than one process is consuming large amounts of memory, or a misbehaving process has been running for longer than an innocent one, you're never sure whose head is gonna fall.

xzzy
Mar 5, 2009

Don't worry, psi will fix it all.

Computer viking
May 30, 2011
Now with less breakage.

Lifroc posted:

Not sure if thick BSD kernel code is fairly simple to read and understand, but have a look here for how it works in Linux: https://lwn.net/Articles/317814/

I haven't dealt with OOM in a decade, but in my past life as a DBA I remember seeing it kill another process that might have been considered "bad" by its heuristic though not necessarily the actual culprit. I have also vague memories of seeing gnome-shell killed by OOM because of a Firefox memory leak a few times.

When only one process is massively leaking, its behaviour is predictable, but under intense memory pressure where more than one process is consuming large amounts of memory, or a misbehaving process has been running for longer than an innocent one, you're never sure whose head is gonna fall.

The BSD code seems kind of readable, in this case - basically, iterate through all processes that aren't system/locked/already killed/not running, kill the largest one. The only "fancy" thing is the rate limiting - it enforces a little bit of a cooldown between each OOM kill.
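To make the "kill the largest one" rule concrete, here is a toy userland approximation. It is not the kernel code, it ignores the system/locked filtering and the rate limiting, and it only prints the candidate instead of killing anything. The function name largest_process is made up.

```shell
# Toy sketch: read "PID RSS_KB COMMAND" lines on stdin and print the one with the largest RSS.
largest_process() {
    awk '$2 + 0 > max { max = $2 + 0; line = $0 }
         END { if (line != "") print line }'
}

# Usage, with headers suppressed by the trailing "=" on each column:
#   ps -eo pid=,rss=,comm= | largest_process
```

Resident set size is only one possible measure of "largest", which is part of why the real heuristics are harder to predict than this.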

Kivi
Aug 1, 2006
I care
I noticed that /proc/meminfo had CommitLimit of 72 or so GB so I put in vm.overcommit_ratio 90 and vm.overcommit_memory 2 as per some guides I found and will have a test if that's the issue.

E: didn't help, still process spits out "out of memory" after ca. 72 GB of memory used.
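A quick way to watch the two numbers that matter in strict overcommit mode, as a sketch. With vm.overcommit_memory set to 2, allocations start failing once Committed_AS reaches CommitLimit, and CommitLimit is roughly swap plus overcommit_ratio percent of RAM (huge pages excluded). The function name commit_summary is made up, and the optional file argument only exists so it can be run against a saved copy of meminfo.

```shell
# Hypothetical helper: print CommitLimit and Committed_AS in MiB.
# Reads /proc/meminfo by default, or a meminfo-formatted file given as $1.
commit_summary() {
    awk '/^(CommitLimit|Committed_AS):/ { printf "%s %d MiB\n", $1, $2 / 1024 }' \
        "${1:-/proc/meminfo}"
}

# Usage: run it in a tmux tab while the job is going
#   watch -n 5 'grep -E "^(CommitLimit|Committed_AS):" /proc/meminfo'
# or call commit_summary from a shell that has the function defined.
```

If Committed_AS climbs to the limit while free still shows plenty of memory available, the failures are coming from the commit accounting and not from the OOM killer.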

Kivi fucked around with this message at 10:53 on Jun 23, 2022

Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

I have a couple machines people remote into and run graphical apps.

Once in a while, they only get a black screen. Kernel logs are full of something like:
code:
NVRM: Xid 31, pid=x, name=gnome-shell, .... MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_0 faulted. fault is of type FAULT_VA_LIMIT_VIOLATION ACCESS_TYPE_READ
According to some googling, it appears it's either a driver issue or application issue. Not running out of video memory. Usually some random game and you install a patch and it's fixed. I've used the latest NVIDIA driver for Linux and still get it. It will happen, then a couple hours later it works fine.

How do I troubleshoot this further or pin it down? My vendors just shift the blame around.

Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

What do you guys use on Windows for terminal programs? I want something better than 7 putty sessions.

Super-NintendoUser
Jan 16, 2004

COWABUNGERDER COMPADRES
Soiled Meat

Bob Morales posted:

What do you guys use on Windows for terminal programs? I want something better than 7 putty sessions.

I use WSL with tmux. I've also had good experience with SecureCRT (not free though) and MobaXterm.

SolusLunes
Oct 10, 2011

I now have several regrets.

:barf:

Bob Morales posted:

What do you guys use on Windows for terminal programs? I want something better than 7 putty sessions.

The new windows terminal is actually pretty slick, and the tabbed interface got me to basically completely dump putty for it.

Klyith
Aug 3, 2007

GBS Pledge Week

Bob Morales posted:

What do you guys use on Windows for terminal programs? I want something better than 7 putty sessions.

a. gently caress putty, windows has SSH built in now

b.

Klyith posted:

Also if you get windows terminal you can put SSH sessions directly on the tab-launcher thing. poo poo's tight.



(As you can see I only have a Pi and a router at home, anyone with home servers and NAS stuff should definitely do this)

SolusLunes
Oct 10, 2011

I now have several regrets.

:barf:

Klyith posted:

a. gently caress putty, windows has SSH built in now

b.

oh poo poo I didn't know you could pre-setup connections on the tab launcher, you've made my day

xzzy
Mar 5, 2009

SolusLunes posted:

The new windows terminal is actually pretty slick, and the tabbed interface got me to basically completely dump putty for it.

Same here. You're gonna lose an afternoon setting up the settings.json (make sure you back it up somewhere too, because windows keeps it in a stupid spot you'll forget about if you ever wipe the hard drive) but once it's done it is really good.

W11 with WSL2 finally brings the OS to the same level as unix work on OSX. The X11 forwarding "just works." Under W10 it required some firewall rules.


RFC2324
Jun 7, 2012

http 418

Voodoo Cafe posted:

$HOSTNAME is a bashism. your container probably uses sh or dash or busybox. try
code:
HOSTNAME=`hostname`
in your script

Turns out I'm an idiot who just needed to learn about wrapping bash vars in curly brackets to import them. ${HOSTNAME} worked
