Bouchehog
Dec 19, 2002

The Campaign for Badger Rights
Hey there. I'm regularly getting BSODs but can't figure out what the issue is. There is no obvious pattern that I have yet discerned as to when, although it's often when I open a new program such as Outlook - that might just be that I notice it more when that happens. I sometimes leave the machine to make a drink and come back to find that it has restarted... That last happened while I was writing this post...

My full system is here but it's an i7-8700K, ASRock Fatal1ty Z370, 2x 8GB DDR4-3200 G.Skill Ripjaws V CL16, Gigabyte GeForce RTX 2080 8GB build from 2018 (with a bunch of legacy hard drives doing non-critical things).

I initially thought that it might be a driver conflict issue / system corruption / malware as I'd not wiped and restarted since 2018. I therefore formatted my M.2 drive and reinstalled Windows 11 from USB media and then fresh installed my various apps / drivers / games from new downloads. No joy. As part of the update process I had to update my BIOS (which is now the latest build) which also reset my CPU overclock and RAM timings (both of which were relatively conservative and had been running without issue for years). The RAM reset to default settings, still BSODs; I have now set it to XMP mode (3200 16-18-18-38-2T) and it has made no difference. I don't have spare components to swap in to see if that resolves the issue so I'm stuck with trying to identify the problem without resorting to physically swapping hardware.

Things tried thus far:
  • Reformat of HD and fresh install of Windows 11 and latest updates and drivers.
  • sfc /scannow reports no integrity violations.
  • BIOS update to latest version.
  • Removed overclocks on RAM / CPU (I've never overclocked my GPU).
  • Speccy suggests that all of my hard drives are good on their SMART reports.
  • MemTest86 gave my RAM a clean bill of health.
  • Prime95 blend test for 8 1/2 hours. All six cores were happy with no errors reported.
  • Played PUBG for five hours without issue. :)
  • FurMark (v1.31) ran for five hours without issue (GPU sat at 87C throughout)

I'm now out of ideas. I might try keeping Outlook closed today to see if that makes any difference. What are my next steps?

seance snacks
Mar 30, 2007

Sounds like you’ve covered the usual suspects

Next time it happens, or if you recall a previous timestamp to go off of now, check the Event Log. Select System and look at any Critical events right before the blue screen. You can try filtering the log too by some common BSOD/unexpected shutdown error codes.

If I had to guess, it’s probably a corrupted driver. Or a driver that's getting updated in the background, which causes a crash, and then Windows rolls it back on the next reboot.

Bouchehog
Dec 19, 2002

The Campaign for Badger Rights

seance snacks posted:

Sounds like you’ve covered the usual suspects

Next time it happens, or if you recall a previous timestamp to go off of now, check the Event Log. Select System and look at any Critical events right before the blue screen. You can try filtering the log too by some common BSOD/unexpected shutdown error codes.

If I had to guess, it’s probably a corrupted driver. Or a driver that's getting updated in the background, which causes a crash, and then Windows rolls it back on the next reboot.

That's a good plan. Thanks for the advice.

Event Viewer shows eight critical errors, all with event code "Event 41, Kernel-Power" and the detail suggests task category 63. 1 critical error in the last 24 hours and 4 in the last 7 days. That sounds about right, if perhaps lower than the actual number. Presumably some BSODs might be hidden in the 'error' section (53 last 24 hours, 222 last 7 days)?

A quick google suggests that this is usually associated with wrong drivers, which fits with your theory. I'm not used to event viewer - is any of the data listed likely to help me narrow down which driver is at fault?

seance snacks
Mar 30, 2007

Bouchehog posted:

Presumably some BSODs might be hidden in the 'error' section (53 last 24 hours, 222 last 7 days)?

Yup, I should have clarified, you'll want to check all the Event Level boxes to include Warning, Error, Information, etc.

And upon finding an unexpected shutdown event, you'll want to look at the events just before that happened for clues.
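The "look at the events just before the shutdown" step can be sketched in Python, assuming the System log has been exported from Event Viewer and loaded into simple records. The `Event` record shape, the sample rows, the source names and the 60-second window are all illustrative assumptions rather than data from anyone's machine; only the Kernel-Power 41 event ID comes from the discussion above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    time: datetime
    source: str
    event_id: int
    level: str
    message: str


def events_before_crashes(events, window_seconds=60):
    """Map each Kernel-Power 41 event to the events logged shortly before it."""
    ordered = sorted(events, key=lambda e: e.time)
    window = timedelta(seconds=window_seconds)
    result = {}
    for crash in ordered:
        is_crash = (crash.source == "Microsoft-Windows-Kernel-Power"
                    and crash.event_id == 41)
        if not is_crash:
            continue
        # Keep everything inside the window, including entries that share
        # the crash timestamp, but not the crash marker itself.
        result[crash.time] = [
            e for e in ordered
            if e is not crash and crash.time - window <= e.time <= crash.time
        ]
    return result


# Hypothetical sample records, for illustration only.
sample = [
    Event(datetime(2022, 9, 8, 9, 30, 0), "Service Control Manager", 7036,
          "Information", "unrelated service state change"),
    Event(datetime(2022, 9, 8, 9, 38, 9), "volmgr", 161,
          "Error", "Dump file creation failed"),
    Event(datetime(2022, 9, 8, 9, 38, 10), "Microsoft-Windows-Kernel-PnP", 219,
          "Warning", "A driver failed to load"),
    Event(datetime(2022, 9, 8, 9, 38, 10), "Microsoft-Windows-Kernel-Power", 41,
          "Critical", "The system has rebooted without cleanly shutting down"),
]

for crash_time, nearby in events_before_crashes(sample).items():
    print(crash_time, [(e.source, e.event_id) for e in nearby])
```

Widening `window_seconds` trades noise for coverage; a short window is usually enough to spot entries that recur at every crash.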


Bouchehog posted:

I'm not used to event viewer - is any of the data listed likely to help me narrow down which driver is at fault?

Yup, it's the bottom pane of the Event Viewer. It should tell you the Source and then it just kinda depends what text the error gives you. I looked at mine as an example:

An Error, "The driver detected an internal driver error on \Device\VBoxNetLwf."

I know that V-anything points to some sort of VM on the system. In this case, I know it's because I don't have Virtualization turned on in the MOBO.


A Warning, "The driver \Driver\WudfRd failed to load for the device ROOT\WindowsHelloFaceSoftwareDriver\0000."

This one points you towards the driver, at which point you might go into Device Manager and turn off automatic driver updates on the offender.

If you're not getting enough info, I would double check that Windows is logging everything, or change it to add Verbose logging.

seance snacks fucked around with this message at 21:27 on Sep 7, 2022

Canine Blues Arooo
Jan 7, 2008

when you think about it...i'm the first girl you ever spent the night with

Grimey Drawer
Upload the minidumps and I'll take a look at them this evening.

Bouchehog
Dec 19, 2002

The Campaign for Badger Rights

Canine Blues Arooo posted:

Upload the minidumps and I'll take a look at them this evening.

Very kind of you but I don't think that I need trouble you as the issue is clear from the logs.

I looked at the logs around the last five critical errors:
    08/09/2022 09:38:10
    07/09/2022 08:59:18
    05/09/2022 21:45:33
    05/09/2022 10:57:45
    04/09/2022 12:37:39

On each occasion the administrative events log has the same two identical entries with the exact same timestamp:
  • Error: Dump file creation failed due to error during dump creation. EventID: 161 \Device\HarddiskVolume9
  • Warning: The driver \Driver\WudfRd failed to load for the device ROOT\WINDOWSHELLOFACESOFTWAREDRIVER\0000. Event ID: 219 ROOT\WINDOWSHELLOFACESOFTWAREDRIVER\0000

The details tab of the driver event shows:

quote:

System

- Provider
[ Name] Microsoft-Windows-Kernel-PnP
[ Guid] {9c205a39-1250-487d-abd7-e831c6290539}
EventID 219
Version 0
Level 3
Task 212
Opcode 0
Keywords 0x8000000000000000

- TimeCreated
[ SystemTime] 2022-09-08T08:38:10.8129123Z
EventRecordID 4981
Correlation

- Execution
[ ProcessID] 4
[ ThreadID] 608
Channel System
Computer Study-PC

- Security
[ UserID] S-1-5-18

- EventData
DriverNameLength 40
DriverName ROOT\WINDOWSHELLOFACESOFTWAREDRIVER\0000
Status 3221226341
FailureNameLength 14
FailureName \Driver\WudfRd
Version 0

\Driver\WudfRd is obviously a driver related to Windows Hello. Not quite sure what Hello was up to during normal use of my PC. So presumably this is either something borked with Hello, or more likely the drivers around the associated camera.
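The `Status 3221226341` field in the event detail is an NTSTATUS code shown in decimal, which is hard to search for. A minimal Python sketch for turning it into the usual hex form and reading its severity bits follows; the helper names are made up for illustration. The hex value it produces, 0xC0000365, is, as far as I recall, listed as STATUS_FAILED_DRIVER_ENTRY in Microsoft's NTSTATUS table, but that name is worth confirming against the official list.

```python
def ntstatus_hex(decimal_status: int) -> str:
    """Format a decimal NTSTATUS value as 32-bit hex."""
    return f"0x{decimal_status & 0xFFFFFFFF:08X}"


def ntstatus_severity(decimal_status: int) -> str:
    """The top two bits of an NTSTATUS value encode its severity."""
    names = ["success", "informational", "warning", "error"]
    return names[(decimal_status >> 30) & 0b11]


status = 3221226341  # value from the EventData section above
print(ntstatus_hex(status), ntstatus_severity(status))
```

Searching for the hex form tends to give far better results than the decimal number Event Viewer displays.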

I have a Logitech BRIO webcam, which has the latest firmware [v2.0.58, 23/3/22] and software [v2.213.314, 13/8/2022]. There isn't any obvious way to roll back the firmware but I could look into how to do that. I can no doubt uninstall the software and go for an earlier incarnation. I can also try checking my USB controller drivers and make sure that I've got the latest ones for my Mobo. Will try all of that today.

Any other ideas? Am I on the right track here?

Bouchehog
Dec 19, 2002

The Campaign for Badger Rights
The Asrock page for my board doesn't have any obvious Windows 11 drivers relating to USB controllers or the intel chipset.

Windows update has the following optional updates:


Not quite sure how to proceed here. I've unplugged the webcam for the time being...

Bouchehog
Dec 19, 2002

The Campaign for Badger Rights
So, quick update. Unplugging the camera has made zero difference. I'm still getting BSODs. I have disabled Windows Hello as a sign in option in the settings menu but I'm still at a loss as to how to proceed from here.

Anyone have any ideas?

Canine Blues Arooo
Jan 7, 2008

when you think about it...i'm the first girl you ever spent the night with

Grimey Drawer

Canine Blues Arooo posted:

Upload the minidumps and I'll take a look at them this evening.

Bouchehog
Dec 19, 2002

The Campaign for Badger Rights
Many thanks for the kind offer, which I've been trying to take up. There was no C:\Windows\memory.dmp, or C:\memory.dmp (nor anything called Minidump.dump). Doing a complete search of my c: drive for '*.dmp' only brings up a couple of game related crashdumps in users\bouch\AppData\Local\CrashDumps for UE4Minidump.dmp, GalaxyClient.exe.25264.dmp, AcroLicApp.exe(1).12624.dmp, AcroLicApp.exe.12624.dmp and AcroLicApp.exe.12724.dmp. None of those files have a recent date and so I assume they are unrelated.

I've turned on 'Complete memory dump' in system failure control panel. It should dump to '%SystemRoot%\MEMORY.DMP'. I've just had two crashes. Nothing sitting in the root of my C: drive or in the windows folder.

The event log reports at the same time stamp as each crash 'Dump file creation failed due to error during dump creation.' (event id 161)

Geebs
Jun 1, 2022

The ø is silent.

Bouchehog posted:

  • Error: Dump file creation failed due to error during dump creation. EventID: 161 \Device\HarddiskVolume9
  • Warning: The driver \Driver\WudfRd failed to load for the device ROOT\WINDOWSHELLOFACESOFTWAREDRIVER\0000. Event ID: 219 ROOT\WINDOWSHELLOFACESOFTWAREDRIVER\0000

That error there is concerning. You won't get any BSOD dumps until that's solved, which is pretty vital to figuring out what happened. Is your C drive full? You should set that to a different drive so you can at least try to generate a minidump. You don't need to turn on full memory dump, a minidump is fine. Turning on full dump might make the problem worse (if it's a drive space issue, or if the machine crashes before the dump is able to complete).

Also, this might be indicative of the failure that's actually causing your BSOD. If \Device\HarddiskVolume9 is failing, that will cause dump creation to fail as well.

I know I'm not a typical poster round here but I have a knack for debugging BSODs (I work with the kernel), so if you get a minidump I can take a look as well.

Bouchehog
Dec 19, 2002

The Campaign for Badger Rights

Geebs posted:

That error there is concerning. You won't get any BSOD dumps until that's solved, which is pretty vital to figuring out what happened. Is your C drive full? You should set that to a different drive so you can at least try to generate a minidump. You don't need to turn on full memory dump, a minidump is fine. Turning on full dump might make the problem worse (if it's a drive space issue, or if the machine crashes before the dump is able to complete).

Also, this might be indicative of the failure that's actually causing your BSOD. If \Device\HarddiskVolume9 is failing, that will cause dump creation to fail as well.

I know I'm not a typical poster round here but I have a knack for debugging BSODs (I work with the kernel), so if you get a minidump I can take a look as well.

The C drive has 271gb free on it (it's a 500gb M.2). I've tried setting the minidump to another drive to see if that works. I've switched it to a small memory dump on the basis that the fewer the bytes, the more likely it is to sort it out before dying.

I have no idea what \Device\HarddiskVolume9 actually is...

Geebs
Jun 1, 2022

The ø is silent.

Bouchehog posted:

The C drive has 271gb free on it (it's a 500gb M.2). I've tried setting the minidump to another drive to see if that works. I've switched it to a small memory dump on the basis that the fewer the bytes, the more likely it is to sort it out before dying.

I have no idea what \Device\HarddiskVolume9 actually is...

That's an internal name that the eventlog is using. You can use this NirSoft application to see which drive that maps to: http://www.nirsoft.net/utils/drive_letter_view.html. However, since your C: drive was set to save dumps last, it is probably your C: drive.

Geebs fucked around with this message at 23:51 on Sep 13, 2022

Bouchehog
Dec 19, 2002

The Campaign for Badger Rights
Predictably volume9 is my c: drive / M.2 drive.

Running a few tools from the admin command prompt:
  • DISM.exe /Online /Cleanup-image /Restorehealth gave me a BSOD at 53% first time, completed successfully the second time.
  • DISM /Online /Cleanup-Image /ScanHealth completed successfully first time
  • DISM /Online /Cleanup-Image /RestoreHealth completed successfully first time
  • SFC /scannow completed successfully and found corrupt files

The two logs are here: CBS / DISM.

I have no idea how to read these. I seem to recall that I ran SFC after the very first BSOD on this fresh install of Windows 11 and it was fine; I therefore wonder if the corruptions, whatever they were, were caused by the repeated BSODs rather than the other way around.

There's the September cumulative update due to install (KB5017328) which I'm going to install now and see if the SFC corrections and new update make any difference....


[edit:] I am going with 'no'. I got a BSOD before the login screen after updating (I didn't babysit to check that it even fully finished). SFC /scannow reports no integrity violation. Still no minidumps anywhere (save for games/applications).

Bouchehog fucked around with this message at 10:14 on Sep 14, 2022

Geebs
Jun 1, 2022

The ø is silent.

Bouchehog posted:

Predictably volume9 is my c: drive / M.2 drive.

Running a few tools from the admin command prompt:
  • DISM.exe /Online /Cleanup-image /Restorehealth gave me a BSOD at 53% first time, completed successfully the second time.
  • DISM /Online /Cleanup-Image /ScanHealth completed successfully first time
  • DISM /Online /Cleanup-Image /RestoreHealth completed successfully first time
  • SFC /scannow completed successfully and found corrupt files

The two logs are here: CBS / DISM.

I have no idea how to read these. I seem to recall that I ran SFC after the very first BSOD on this fresh install of Windows 11 and it was fine; I therefore wonder if the corruptions, whatever they were, were caused by the repeated BSODs rather than the other way around.

There's the September cumulative update due to install (KB5017328) which I'm going to install now and see if the SFC corrections and new update make any difference....


[edit:] I am going with 'no'. I got a BSOD before the login screen after updating (I didn't babysit to check that it even fully finished). SFC /scannow reports no integrity violation. Still no minidumps anywhere (save for games/applications).
So you did switch the minidump dir to a new location and it didn't generate one? Does the event log have a new entry for the failure to write out the dump? It's going to be vital to get that minidump created somehow.

Bouchehog
Dec 19, 2002

The Campaign for Badger Rights
I switched the minidump location so that in the 'start-up and recovery' control panel it said 'd:\Minidump' not '%SystemRoot%\Minidump'. Opening it up again now it's back to pointing at '%SystemRoot%\Minidump' (i.e. C:\Windows). Not quite sure what is going on there. I've changed it again and it has survived a restart, still showing d:\Minidump so I will have to see what happens after the next crash. I suspect I won't have long to wait.
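One way to double-check what the 'start-up and recovery' dialog actually saved is to read the CrashControl registry key directly. The sketch below is a hedged illustration, not a fix: the function names are invented, the `CrashDumpEnabled` meanings are the commonly documented ones as I understand them, and the registry read only runs on Windows (elsewhere it is skipped).

```python
import sys

# Commonly documented meanings of the CrashDumpEnabled value (assumed here).
CRASH_DUMP_MODES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
    7: "automatic memory dump",
}


def describe_crash_control(values: dict) -> str:
    """Summarise the dump settings held in a dict of registry values."""
    mode = CRASH_DUMP_MODES.get(values.get("CrashDumpEnabled"), "unknown")
    return (f"mode={mode}; MinidumpDir={values.get('MinidumpDir')}; "
            f"DumpFile={values.get('DumpFile')}")


def read_crash_control() -> dict:
    """Read the live settings from the registry (Windows only)."""
    import winreg  # imported lazily so the module loads on other systems

    path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        for name in ("CrashDumpEnabled", "MinidumpDir", "DumpFile"):
            try:
                values[name], _ = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                values[name] = None  # value not present under the key
    return values


if sys.platform == "win32":
    print(describe_crash_control(read_crash_control()))
```

If the registry still shows the old directory after the dialog claims otherwise, that would at least explain a dump location silently reverting.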

Geebs
Jun 1, 2022

The ø is silent.

Bouchehog posted:

I switched the minidump location so that in the 'start-up and recovery' control panel it said 'd:\Minidump' not '%SystemRoot%\Minidump'. Opening it up again now it's back to pointing at '%SystemRoot%\Minidump' (i.e. C:\Windows). Not quite sure what is going on there. I've changed it again and it has survived a restart, still showing d:\Minidump so I will have to see what happens after the next crash. I suspect I won't have long to wait.

Not sure what happened there but a restart is definitely required to make that change stick. Hopefully that produces a minidump now.

Bouchehog
Dec 19, 2002

The Campaign for Badger Rights

Geebs posted:

So you did switch the minidump dir to a new location and it didn't generate one? Does the event log have a new entry for the failure to write out the dump? It's going to be vital to get that minidump created somehow.
So, I've just had another crash. Timestamp 20:42:31. The same timestamp had the usual \WudfRd Windows Hello driver issue and, a second earlier, an error notice: "Dump file creation failed due to error during dump creation." (ID 161). The event data suggests that it again relates to "\Device\HarddiskVolume9" - the system / C: drive. The small dump directory in the control panel is still listed as 'd:\Minidump'.

I am so lost as to what to do here.

[Edit:] And a second one at 20:57:33. Same story.

Bouchehog fucked around with this message at 20:59 on Sep 14, 2022

Canine Blues Arooo
Jan 7, 2008

when you think about it...i'm the first girl you ever spent the night with

Grimey Drawer
I so, so hate trying to diagnose BSoDs without a Minidump. It tends to devolve into blind guesswork almost immediately if the problem isn't dead obvious. However, given the circumstances...

WudfRd itself is Windows Driver Foundation, which is...not much to go on, but it is a smoking gun. Things you can do: Unplug and remove everything you don't need (in this case, especially your webcam). See if the problem improves.

There is a chance that something in the storage I/O pipe is busted. It might be hardware on the board. It might be the storage device itself. It might be a cable (it's probably not a cable).

As mentioned, all of this is big guesswork. Without a minidump, everything is super surface level and ranges from 'information is missing' to 'information is misleading'.

Geebs
Jun 1, 2022

The ø is silent.

Bouchehog posted:

So, I've just had another crash. Timestamp 20:42:31. The same timestamp had the usual \WudfRd Windows Hello driver issue and, a second earlier, an error notice: "Dump file creation failed due to error during dump creation." (ID 161). The event data suggests that it again relates to "\Device\HarddiskVolume9" - the system / C: drive. The small dump directory in the control panel is still listed as 'd:\Minidump'.

I am so lost as to what to do here.

[Edit:] And a second one at 20:57:33. Same story.

Unfortunately this is going to be the main difficulty here. Without a minidump the possibilities of issues are endless. However, I will say that *not* being able to generate a minidump is typically a sign of a pretty serious issue. Minidumps use a special path that really only relies on your drive, the CPU, memory, and power to function. If any of those stops working while this process is happening, you'll get a crash dump write failure. If the system resets during that period, you'll get a failure as well. Major hardware issues generally cause immediate resets as the system is too unstable to continue at all.

The issue here is that you already seem to have stress tested your machine fairly well. It could also be an actual power issue: if you have it plugged into an extension cable or surge protector, try plugging it directly into the wall instead. I might throw an SSD stress test in there as well, since SMART doesn't catch everything. Still, NVMe drives are not prone to failure.

If the BSOD happens while you're using it, do you notice the BSOD remain on the screen for very long or does it immediately disappear and reboot? If it does remain on the screen, take a photo of it with a camera and post that. That'll give a bit more info to work with.

If that fails and you're feeling *particularly* savvy and good at following instructions, you can set up a kernel debugger on a separate machine using kdnet to see the BSOD message over the network before it tries to write the crash dump out. You'd run windbg on one machine which will debug your main machine. There should be many tutorials on how to set up kdnet, including on MSDN. The only command you'd need to run is 'g' and it'll sit there waiting for the system to crash. Then you should see the initial BSOD message there. Again, not for the faint of heart.

You can also use the more aggressive process of elimination route to rule out hardware failure. You can try:

- Changing your pagefile drive to D:\. That could help mitigate some crash dump generation issues.
- Unplugging all external devices. That could narrow down a particular faulty device causing a power issue.
- Installing windows on your other drive and booting from there. That will narrow down the faulty OS drive case, although I'm skeptical that's the problem.
- Booting a linux livecd and seeing if that crashes with some stress. If it does, you can generally point to hardware issue as that shows it's not related to the OS.

Geebs fucked around with this message at 02:57 on Sep 15, 2022

Bouchehog
Dec 19, 2002

The Campaign for Badger Rights

Geebs posted:

Various helpful stuff

Canine Blues Arooo posted:

Other helpful stuff

So, significant updates. I managed to get a video of the BSOD screen to capture the message:



I now also have the following dumpfiles (I was set to small memory dump at the time, so I hope this is enough):
092322-8265-01.dmp (from D:\Minidump\)
DumpStack.log (from C:\)

I am not entirely sure what changed - I've not been around that much over the last week but of the seven crashes in that time (five before, one after) this is the only minidump created. The Event Viewer logs show no obvious difference in the event to the usual, save that the usual 'dump file fail because failed' message was a success message. There should have been fireworks.

Hopefully this will give someone more au fait with diagnosing these things a route in.

Bouchehog
Dec 19, 2002

The Campaign for Badger Rights
Just by way of update, I've just done some digging into 'dpc watchdog violation' errors and the main suggested causes are SSD or graphics card firmware issues.

Samsung Magician (v.7.1.1) reports that my firmware is up-to-date on the three relevant SSDs (C: 960 M.2 500gb, F: 840 SSD 120gb and E: 850 SSD 250gb). The app suggests that the drive health for all three drives is good.
Crucial Storage Executive (v.8.03) reports that the one relevant drive (D: CT2000MX500 1.8TB) is in good health with firmware up-to-date. I haven't bothered to check the firmware on the two legacy platter HDs (both Seagate drives, a ST2000D 2TB drive and a ST1000D 1TB drive), if indeed old platter HDs have firmware that gets updated, but both report good health via Speccy.

My Gigabyte WINDFORCE RTX 2080 is running nVidia v516.94 and the bios is reported in the adapter information as Version90.4.d.0.52. The webpage for the GPU shows two BIOS updates - one for Samsung memory, one for Micron memory. I have no idea which I have or how to check so I've not tried either file. I tried downloading 'AORUS ENGINE' to see if that would point me to the correct firmware but it's like opening a 1980's text based adventure game, reported LED driver errors and I immediately uninstalled it...

I have also gone onto the ASRock z370 webpage and downloaded and installed what it called Windows 10 "INF drivers", which I take it to mean the controller and chipset drivers. There were none for Windows 11.

I will see if that makes any difference pending any update from the dump file from someone more knowledgeable.

Bouchehog fucked around with this message at 13:26 on Sep 27, 2022

redeyes
Sep 14, 2002

by Fluffdaddy
I believe DPC watchdog violation means some device is spamming DPCs to the point the system dies. I'd actually remove ALL gigabyte branded software from the computer as a starting point. See if that does anything.

Stanley Pain
Jun 16, 2001

by Fluffdaddy
Do you have a PSU tester handy? Would be good to rule that out first.

DPC watchdog BSODs are typically IRQL-level BSODs, which are almost always hardware related (and sometimes driver related).

After ruling out the PSU, next culprits tend to be memory (both RAM and GPU), CPU (overheating?).

Canine Blues Arooo
Jan 7, 2008

when you think about it...i'm the first girl you ever spent the night with

Grimey Drawer
Multiple dmp files would be ideal if you get more in the meantime, but from this one, we have moderately high confidence about a few things:

Bugcheck with Parameters:

code:
BugCheck 133, {1, 1e00, fffff80015f1c340, 0}
First parameter being a 1 is generally a good sign - it's *probably* a software issue (everything here is 'probably' - computers can be goofy). 133 is particularly useful because we can just check what's in the DPC queue and probably get an answer.
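For anyone following along, the bugcheck line above can be unpacked mechanically. This is a rough Python sketch, not a substitute for WinDbg: the function names are invented, and the parameter meanings are my reading of Microsoft's documentation for bug check 0x133 (DPC_WATCHDOG_VIOLATION), so treat them as assumptions to verify.

```python
import re


def parse_bugcheck(line):
    """Split a WinDbg 'BugCheck' summary line into (code, [params]).

    WinDbg prints every value on this line in hexadecimal.
    """
    match = re.search(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line)
    if match is None:
        raise ValueError("not a BugCheck summary line")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def describe(code, params):
    """Give a one-line reading of a 0x133 bug check (other codes not handled)."""
    if code != 0x133:
        return f"bug check 0x{code:X} (not covered by this sketch)"
    if params[0] == 0:
        kind = "a single DPC or ISR ran past its time allotment"
    elif params[0] == 1:
        kind = "cumulative time at DISPATCH_LEVEL or above exceeded the watchdog period"
    else:
        kind = "unrecognised first parameter"
    return f"DPC_WATCHDOG_VIOLATION: {kind}; parameter 2 = {params[1]} ticks"


code, params = parse_bugcheck("BugCheck 133, {1, 1e00, fffff80015f1c340, 0}")
print(describe(code, params))
```

Note that 1e00 is hex, so the second parameter here works out to 7680 ticks.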

code:
CPU Type      KDPC       Function
 0: Normal  : 0xffffaf09df8440e0 0xfffff80044cf4ba0 nvlddmkm
 0: Normal  : 0xffffaf09e0cf27e0 0xfffff8004451ae10 nvlddmkm
 0: Normal  : 0xfffff80012015c68 0xfffff80015568a80 nt!KiEntropyDpcRoutine
 0: Normal  : 0xfffff80015e3bbc0 0xfffff800154149b0 nt!PpmCheckPeriodicStart
 0: Normal  : 0xffffaf09e02b2898 0xfffff80019277800 ndis!ndisInterruptDpc
 0: Normal  : 0xffffaf09dfde2a08 0xfffff80019277800 ndis!ndisInterruptDpc
 0: Normal  : 0xfffff80015e42260 0xfffff80015567440 nt!KiBalanceSetManagerDeferredRoutine
 0: Normal  : 0xffffaf09df8a1d00 0xfffff800185c6ca0 Wdf01000!FxInterrupt::_InterruptDpcThunk
 0: Normal  : 0xffffaf09dfbfc6e8 0xfffff8001a2b3ea0 dxgkrnl!DpiFdoDpcForIsr
 0: Normal  : 0xffffaf09db104838 0xfffff800155e36b0 nt!EtwpLoggerDpc
 0: Threaded: 0xfffff80012015758 0xfffff80015768370 nt!KiDpcWatchdog
To get some confidence in this, we can also look at the stack trace:

code:
fffff800`19c646b8 fffff800`1541d9ef : 00000000`00000133 00000000`00000001 00000000`00001e00 fffff800`15f1c340 : nt!KeBugCheckEx
fffff800`19c646c0 fffff800`1541c893 : 0000028d`2161dad8 fffff800`1200d180 00000000`004eeaec fffff800`1200d180 : nt!KeAccumulateTicks+0x23f
fffff800`19c64720 fffff800`1541c7d4 : 00000000`0000000c fffff800`1200d180 fffff800`15e41b00 000000bc`27805600 : nt!KiUpdateRunTime+0x93
fffff800`19c648e0 fffff800`1541ab3d : fffff800`15e41b00 ffffffff`ffffffff 00000000`00000000 00000000`00000009 : nt!KiUpdateTime+0x12f4
fffff800`19c64ba0 fffff800`1541a38a : fffff800`15e5fe10 fffff800`15f0d8e0 fffff800`15f0d8e0 00000000`00000000 : nt!KeClockInterruptNotify+0x39d
fffff800`19c64c50 fffff800`155370de : 000000bc`27e22337 fffff800`15f0d830 fffff800`1200d180 00000000`00000000 : nt!HalpTimerClockInterrupt+0x10a
fffff800`19c64c80 fffff800`15622bda : fffff800`19c64d90 fffff800`15f0d830 ffffaf09`dfbfc030 00000000`00000000 : nt!KiCallInterruptServiceRoutine+0x19e
fffff800`19c64cc0 fffff800`156231a7 : 00000000`00000000 fffff800`44cb66f9 00000000`00000000 00000000`00000000 : nt!KiInterruptSubDispatchNoLockNoEtw+0xfa
fffff800`19c64d10 fffff800`44cf7ba0 : fffff800`1a2b3a3d 00000000`00000000 00000000`00000120 00000000`00000000 : nt!KiInterruptDispatchNoLockNoEtw+0x37
fffff800`19c64ea8 fffff800`1a2b3a3d : 00000000`00000000 00000000`00000120 00000000`00000000 ffffd781`7aded000 : nvlddmkm+0x877ba0
fffff800`19c64eb0 fffff800`1a30105c : ffffd781`7aded000 00000000`00000002 00000000`00000120 ffffc20c`8a4efb90 : dxgkrnl!DpiFdoMessageInterruptRoutine+0x5d
fffff800`19c64f00 fffff800`155370ab : ffffc20c`8a4efd98 fffff800`154a6d3c ffffd781`7aded008 ffffd781`7aded000 : dxgkrnl!DpiFdoLineInterruptRoutine+0xc
fffff800`19c64f30 fffff800`156227ac : ffffc20c`8a4efb90 ffffd781`7aded000 00000000`00000000 ffffd781`7aded000 : nt!KiCallInterruptServiceRoutine+0x16b
fffff800`19c64f70 fffff800`156223d7 : 00000000`00000000 ffffd781`7aded000 00000000`00000000 00000000`00000000 : nt!KiScanInterruptObjectList+0x14c
All roads point pretty clearly to your GPU's driver as the fault.
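The "which module shows up first outside the kernel" reading of a stack trace can be sketched in Python as below. The frames are a few lines copied from the trace above; the function names and the choice to treat only `nt` and `hal` as core kernel modules are assumptions made for illustration, and a real analysis would still be done in WinDbg.

```python
import re

# A handful of frames copied from the stack trace above (top of stack first).
STACK = """
fffff800`19c646b8 fffff800`1541d9ef : 00000000`00000133 00000000`00000001 00000000`00001e00 fffff800`15f1c340 : nt!KeBugCheckEx
fffff800`19c64d10 fffff800`44cf7ba0 : fffff800`1a2b3a3d 00000000`00000000 00000000`00000120 00000000`00000000 : nt!KiInterruptDispatchNoLockNoEtw+0x37
fffff800`19c64ea8 fffff800`1a2b3a3d : 00000000`00000000 00000000`00000120 00000000`00000000 ffffd781`7aded000 : nvlddmkm+0x877ba0
fffff800`19c64eb0 fffff800`1a30105c : ffffd781`7aded000 00000000`00000002 00000000`00000120 ffffc20c`8a4efb90 : dxgkrnl!DpiFdoMessageInterruptRoutine+0x5d
"""


def frame_modules(stack_text):
    """Return the module name of each frame, in stack order."""
    modules = []
    for line in stack_text.strip().splitlines():
        symbol = line.rsplit(" : ", 1)[-1].strip()  # e.g. nt!KeBugCheckEx
        modules.append(re.split(r"[!+]", symbol, maxsplit=1)[0])
    return modules


def first_non_core(modules, core=("nt", "hal")):
    """First module from the top of the stack that is not core kernel code."""
    for name in modules:
        if name not in core:
            return name
    return None


mods = frame_modules(STACK)
print(mods)
print(first_non_core(mods))
```

As the post says, the first non-kernel module is only a suspect: it shows where the CPU was when the watchdog fired, not necessarily the root cause.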

There is a high likelihood that you can install the latest driver from nVidia and see the problem disappear. Failing that, roll back to an older version to see if the problem improves.

While this suggests your GPU driver as the faulting module, that does not mean that it is the root cause. It could be hardware (which would most likely be the GPU here). It could be something doing goofy poo poo with the driver. It could be the phase of the moon, but it's really not worth thinking about that until we take the easy steps first.

e: All this is for *this exact crash*. More data points would help determine if your GPU driver is at fault here for real, or if it's just the fall guy for a larger problem.

Canine Blues Arooo fucked around with this message at 22:53 on Sep 27, 2022

Bouchehog
Dec 19, 2002

The Campaign for Badger Rights

Canine Blues Arooo posted:

...All roads point pretty clearly to your GPU's driver as the fault.

There is a high likelihood that you can install the latest driver from nVidia and see the problem disappear. Failing that, roll back to an older version to see if the problem improves.

While this suggests your GPU driver as the faulting module, that does not mean that it is the root cause. It could be hardware (which would most likely be the GPU here). It could be something doing goofy poo poo with the driver. It could be the phase of the moon, but it's really not worth thinking about that until we take the easy steps first. ....

OK, so I installed the new nVidia drivers and in the last eight days have not seen a BSOD. I do however get a new regular fatal crash - every so often Windows just dies. I can move the mouse and, for some time, scroll windows but cannot click on anything. If a crash were to happen whilst typing this, for instance, the blinking cursor would stop blinking, I would not be able to change to another application or press the Windows start button or close any active window but I would be able to move the mouse (including between my three monitors) and use the scroll wheel to move the active window content up and down. The keyboard doesn't work. It all renders without any obvious artifact. After a few minutes of this the scrolling would no longer work but I would still be able to use my mouse (to do nothing other than move it around the three screens). That continues indefinitely and I have to hold the power button to restart my computer. If task manager is open, this just freezes the information at the point of the crash.

It is an intermittent fault, rather like the BSOD version. Sometimes I can load up and use the computer for hours without issue - either for games with discord and chrome open or word processing/outlook/excel/chrome for work purposes.

Googling the fault again suggests GPU drivers so I downloaded DDU, removed the driver in safe mode, installed the latest nVidia driver in safe mode and tried again. Same deal.

I am guessing that this is a hardware issue either with my GPU or with something on the motherboard which controls it.

As an aside, my ethernet connection now has a habit of disconnecting then reconnecting. This may be unrelated (something to do with my having installed the wrong drivers when I did a fresh install of Windows?) but I wonder if it all points to some hardware issue with a controller of some sorts on my motherboard.

Is there any way to narrow this down? Or am I going to have to start swapping out hardware to do so (in which case I think I might just go for a new build)

[Edit:] as luck would have it, I got exactly that crash in between posting the message and Chrome redirecting me back to the post. I could not scroll (the whole page was displayed) and I noticed that the Chrome 'page loading' swirl in the tab at the top of the page kept swirling for 20s or so after it was obvious that the PC had crashed.

Bouchehog fucked around with this message at 09:42 on Oct 4, 2022

Bouchehog
Dec 19, 2002

The Campaign for Badger Rights
Bumpity bump. The issue is still there. It may be no more than coincidence but it seems worse when I open Outlook - many of the freezes have been associated with that, but far from all. It also seems to be the case that it either hits me in the first five/ten minutes of use or, if it doesn't, I've got very good odds of it being totally fine.

I'm fast arriving at the point where I just build a new PC (which was on the cards soon anyway), pop in the old GPU and see if it was that. I can then move the HDs over and see if any of them cause issues.

Stanley Pain
Jun 16, 2001

by Fluffdaddy
Have you tested your PSU yet?

ihafarm
Aug 12, 2004

Bouchehog posted:

The Asrock page for my board doesn't have any obvious Windows 11 drivers relating to USB controllers or the intel chipset.

Windows update has the following optional updates:


Not quite sure how to proceed here. I've unplugged the webcam for the time being...

Did you install all these updates?

Bouchehog
Dec 19, 2002

The Campaign for Badger Rights

ihafarm posted:

Did you install all these updates?

No - I wasn't sure which ones to update. Happy to do so...


Stanley Pain posted:

Have you tested your PSU yet?
No idea how to do that. Corsair's website suggests this. I do have a multimeter and so could check the voltages if you think that might be an issue.

Or did you mean some other test?

Stanley Pain
Jun 16, 2001

by Fluffdaddy

Bouchehog posted:

No - I wasn't sure which ones to update. Happy to do so...

No idea how to do that. Corsair's website suggests this. I do have a multimeter and so could check the voltages if you think that might be an issue.

Or did you mean some other test?

Any $30-50 tester should work. They're pretty simple devices. I have the Thermaltake Dr. Power II and it's worked well.

taiyoko
Jan 10, 2008


Also I'd verify if the PSU is up to the wattage demands of your cpu+gpu, I had a lot of intermittent crashes on an old build because I hadn't paid attention when I upgraded my video card and was way overdrawing on a no-name PSU that came bundled with the case. Thankfully it didn't destroy my system when it finally released the magic smoke, but realizing I was trying to draw 600 watts on a 400 watt psu was "well, no loving wonder it was so much trouble."
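A rough way to sanity-check that, sketched in Python. The function names are made up for illustration and the 80% headroom figure is only a rule-of-thumb assumption, not a spec; plug in your own estimated draw and the rating printed on your PSU's label.

```python
def psu_load_fraction(draw_watts, rated_watts):
    """Fraction of the PSU's rated output that the estimated draw represents."""
    if rated_watts <= 0:
        raise ValueError("rated_watts must be positive")
    return draw_watts / rated_watts


def is_overdrawn(draw_watts, rated_watts, max_fraction=0.8):
    """True if the estimated draw exceeds max_fraction of the rating.

    max_fraction=0.8 is an assumed comfort margin, not a hard limit.
    """
    return psu_load_fraction(draw_watts, rated_watts) > max_fraction


# The numbers from my old build: roughly 600 W of draw on a 400 W unit.
fraction = psu_load_fraction(600, 400)
print(f"Load is {fraction:.0%} of the rating; overdrawn: {is_overdrawn(600, 400)}")
```

Anything well over 100% like that is asking for trouble; the harder part is getting an honest estimate of the draw in the first place.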

Klyith
Aug 3, 2007

GBS Pledge Week
Multimeters and cheap PSU testers are worthless. All they are looking at is basic voltages, which will test ok unless the PSU is seriously defective. They aren't testing ripple or short-duration voltage excursions. You need a decent oscilloscope for that.
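To show how little that kind of check covers, here is a minimal Python sketch of what a basic voltage test amounts to. It assumes the usual ATX tolerance of ±5% on the +3.3 V, +5 V and +12 V rails; the readings are hypothetical placeholders, not measurements from anyone's machine in this thread.

```python
# Nominal ATX rails and the tolerance assumed here (+/-5% on the main positive rails).
ATX_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def check_rails(readings):
    """Return {rail: bool} saying whether each static reading is inside the band."""
    results = {}
    for rail, nominal in ATX_RAILS.items():
        low = nominal * (1 - TOLERANCE)
        high = nominal * (1 + TOLERANCE)
        results[rail] = low <= readings[rail] <= high
    return results


# Hypothetical multimeter readings; a sagging-but-legal +12 V rail still passes.
readings = {"+3.3V": 3.31, "+5V": 5.08, "+12V": 11.52}
print(check_rails(readings))
```

Every rail passes here, and that result says nothing about ripple or what the rails do for a few milliseconds under a load spike.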

If you are concerned that a PSU is bad, the easiest way to find out is to just replace it. Problem goes away? PSU was bad. Still have crashes? Send the new one back (or keep it as a spare). An inexpensive PSU is a handy thing to have.

Stanley Pain
Jun 16, 2001

by Fluffdaddy

Klyith posted:

Multimeters and cheap PSU testers are worthless. All they are looking at is basic voltages, which will test ok unless the PSU is seriously defective. They aren't testing ripple or short-duration voltage excursions. You need a decent oscilloscope for that.

Counterpoint: a lot of failing PSUs will fail with the simple testers. In context, OP's problems aren't caused by ripple or anything like that. If the PSU is to blame it would easily fail an auto PSU test.

Telling someone they need an oscilloscope is LOL.

Klyith
Aug 3, 2007

GBS Pledge Week

Stanley Pain posted:

Counterpoint: a lot of failing PSUs will fail with the simple testers. In context, OP's problems aren't caused by ripple or anything like that. If the PSU is to blame it would easily fail an auto PSU test.

I have poked at bad PSUs with multimeters, and frequently seen in-spec voltage. This was in situations where the PSU was 100% the problem, because a replacement solved the issues.

That Thermaltake thing in particular is totally worthless because it tests with zero load. And it isn't even particularly accurate.

Stanley Pain posted:

Telling someone they need an oscilloscope is LOL.

I don't think they need an oscilloscope. I think someone with a potentially bad PSU needs a second PSU to swap out for.


Stanley Pain
Jun 16, 2001

by Fluffdaddy

Klyith posted:

I have poked at bad PSUs with multimeters, and frequently seen in-spec voltage. This was in situations where the PSU was 100% the problem, because a replacement solved the issues.

That Thermaltake thing in particular is totally worthless because it tests with zero load. And it isn't even particularly accurate.

I don't think they need an oscilloscope. I think someone with a potentially bad PSU needs a second PSU to swap out for.


I've had a completely different experience with PSU testers. It's something quick + cheap to rule something out. If other troubleshooting doesn't resolve whatever the issue is come back to the PSU.

I mean who doesn't have 3 or 4 spare PSUs just lurking about anyway? :q:
