|
Hey there. I'm regularly getting BSODs but can't figure out what the issue is. There is no obvious pattern that I have yet discerned as to when, although it's often when I open a new program such as Outlook - that might just be that I notice it more when that happens. I sometimes leave the machine to make a drink and come back to it having restarted... That last happened while I was writing this post... My full system is here but it's an i7-8700K, ASRock Fatal1ty Z370, 2x 8gb DDR4-3200 G.Skill Ripjaws V CL16, Gigabyte GeForce RTX 2080 8gb build from 2018 (with a bunch of legacy hard drives doing non-critical things). I initially thought that it might be a driver conflict issue / system corruption / malware as I'd not wiped and restarted since 2018. I therefore formatted my M.2 drive and reinstalled Windows 11 from USB media and then fresh installed my various apps / drivers / games from new downloads. No joy. As part of the update process I had to update my BIOS (which is now the latest build) which also reset my CPU overclock and RAM timings (both of which were relatively conservative and had been running without issue for years). The RAM reset to default settings, still BSODs; I have now set it to XMP mode (3200 16-18-18-38-2T) and it has made no difference. I don't have spare components to swap in to see if that resolves the issue so I'm stuck with trying to identify the problem without resorting to physically swapping hardware. Things tried thus far:
I'm now out of ideas. I might try keeping Outlook closed today to see if that makes any difference. What are my next steps?
|
# ? Sep 7, 2022 09:29 |
|
Sounds like you’ve covered the usual suspects. Next time it happens, or if you recall a previous timestamp to go off of now, check the Event Log. Select System and look at any Critical events right before the blue screen. You can also try filtering the log by some common BSOD/unexpected shutdown error codes. If I had to guess, it’s probably a corrupted driver, or a driver that's getting updated in the background, which causes a crash, and then Windows rolls it back on the next reboot.
|
# ? Sep 7, 2022 16:41 |
|
seance snacks posted: Sounds like you’ve covered the usual suspects. That's a good plan. Thanks for the advice. Event Viewer shows eight critical errors, all with event code "Event 41, Kernel-Power" and the detail suggests task category 63. 1 critical error in the last 24 hours and 4 in the last 7 days. That sounds about right, if perhaps lower than the actual number. Presumably some BSODs might be hidden in the 'error' section (53 last 24 hours, 222 last 7 days)? A quick google suggests that this is usually associated with wrong drivers, which fits with your theory. I'm not used to Event Viewer - is any of the data listed likely to help me narrow down which driver is at fault?
|
# ? Sep 7, 2022 19:40 |
|
Bouchehog posted:Presumably some BSODs might be hidden in the 'error' section (53 last 24 hours, 222 last 7 days)? Yup, I should have clarified, you'll want to check all the Event Level boxes to include Warning, Error, Information, etc. And upon finding an unexpected shutdown event, you'll want to look at the events just before that happened for clues. Bouchehog posted:I'm not used to event viewer - is any of the data listed likely to help me narrow down which driver is at fault? Yup, it's the bottom pane of the Event Viewer. It should tell you the Source and then it just kinda depends what text the error gives you. I looked at mine as an example: An Error, "The driver detected an internal driver error on \Device\VBoxNetLwf." I know that V-anything points to some sort of VM on the system. In this case, I know it's because I don't have Virtualization turned on in the MOBO. A Warning, "The driver \Driver\WudfRd failed to load for the device ROOT\WindowsHelloFaceSoftwareDriver\0000." This one points you towards the driver, at which point you might go into Device Manager and turn off automatic driver updates on the offender. If you're not getting enough info, I would double check that Windows is logging everything, or change it to add Verbose logging. seance snacks fucked around with this message at 21:27 on Sep 7, 2022 |
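For anyone who would rather pull these events from a command prompt than click through the Event Viewer filter dialog, here is a minimal sketch. It assumes the built-in `wevtutil` tool and only builds the command; the function name `kernel_power_query` is made up for illustration. The XPath filter is the same kind of query Event Viewer generates when you filter by event ID.

```python
import subprocess
import sys


def kernel_power_query(event_id=41, count=10):
    """Build a wevtutil command listing the newest System events with one event ID."""
    xpath = f"*[System[(EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",   # XPath filter on the event ID
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # read newest events first
        "/f:text",       # plain-text output instead of XML
    ]


if __name__ == "__main__" and sys.platform == "win32":
    # Only meaningful on Windows, where wevtutil exists.
    result = subprocess.run(kernel_power_query(), capture_output=True, text=True)
    print(result.stdout)
```

Event 41 only says the machine went down uncleanly, so the output is mostly useful for collecting timestamps to compare against the other log entries.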
# ? Sep 7, 2022 21:09 |
|
Upload the minidumps and I'll take a look at them this evening.
|
# ? Sep 7, 2022 23:20 |
|
Canine Blues Arooo posted:Upload the minidumps and I'll take a look at them this evening. Very kind of you but I don't think that I need trouble you as the issue is clear from the logs. I looked at the logs around the last five critical errors:
07/09/2022 08:59:18
05/09/2022 21:45:33
05/09/2022 10:57:45
04/09/2022 12:37:39
On each occasion the administrative events log has the same two entries with the exact same timestamp:
The details tab of the driver event shows: quote:System \Driver\WudfRd is obviously a driver related to Windows Hello. Not quite sure what Hello was up to during normal use of my PC. So presumably this is either something borked with Hello, or more likely the drivers around the associated camera. I have a Logitech BRIO webcam, which has the latest firmware [v2.0.58, 23/3/22] and software [v2.213.314, 13/8/2022]. There isn't any obvious way to roll back the firmware but I could look into how to do that. I can no doubt uninstall the software and go for an earlier incarnation. I can also try checking my USB controller drivers and make sure that I've got the latest ones for my Mobo. Will try all of that today. Any other ideas? Am I on the right track here?
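The "look at what was logged just before each crash" step can be sketched in a few lines once the log has been exported. This is only an illustration: the rows in `sample` are hypothetical, and `events_before` is a made-up helper, not part of any Windows tooling. The timestamp format is day-first, matching the crash times listed above.

```python
from datetime import datetime, timedelta

FMT = "%d/%m/%Y %H:%M:%S"  # day-first, as in the crash timestamps above


def events_before(crash_time, events, window_seconds=60):
    """Return events logged in the window_seconds up to and including crash_time."""
    crash = datetime.strptime(crash_time, FMT)
    start = crash - timedelta(seconds=window_seconds)
    return [
        (stamp, source, text)
        for stamp, source, text in events
        if start <= datetime.strptime(stamp, FMT) <= crash
    ]


# Hypothetical exported rows (timestamp, source, message), for illustration only.
sample = [
    ("07/09/2022 08:40:00", "Service Control Manager", "unrelated event"),
    ("07/09/2022 08:59:18", "Kernel-PnP", "driver failed to load"),
    ("07/09/2022 08:59:18", "Kernel-Power", "unexpected reboot"),
]

for row in events_before("07/09/2022 08:59:18", sample):
    print(row)
```

One caveat: entries stamped at the same second as the Kernel-Power event may have been written during the reboot rather than before the crash, so they are a lead, not proof.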
|
# ? Sep 8, 2022 10:11 |
|
The ASRock page for my board doesn't have any obvious Windows 11 drivers relating to USB controllers or the Intel chipset. Windows Update has the following optional updates: Not quite sure how to proceed here. I've unplugged the webcam for the time being...
|
# ? Sep 8, 2022 10:20 |
|
So, quick update. Unplugging the camera has made zero difference. I'm still getting BSODs. I have disabled Windows Hello as a sign in option in the settings menu but I'm still at a loss as to how to proceed from here. Anyone have any ideas?
|
# ? Sep 9, 2022 10:32 |
|
Canine Blues Arooo posted:Upload the minidumps and I'll take a look at them this evening.
|
# ? Sep 9, 2022 23:26 |
|
Many thanks for the kind offer, which I've been trying to take up. There was no C:\Windows\memory.dmp, or C:\memory.dmp (nor anything called Minidump.dump). Doing a complete search of my C: drive for '*.dmp' only brings up a couple of game related crashdumps in users\bouch\AppData\Local\CrashDumps for UE4Minidump.dmp, GalaxyClient.exe.25264.dmp, AcroLicApp.exe(1).12624.dmp, AcroLicApp.exe.12624.dmp and AcroLicApp.exe.12724.dmp. None of those files has a recent date and so I assume they are unrelated. I've turned on 'Complete memory dump' in the system failure control panel. It should dump to '%SystemRoot%\MEMORY.DMP'. I've just had two crashes. Nothing sitting in the root of my C: drive or in the Windows folder. The event log reports at the same time stamp as each crash 'Dump file creation failed due to error during dump creation.' (event id 161)
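The drive-wide search for '*.dmp' described above can also be scripted, which makes it easy to rerun after each crash and to sort by date. A minimal sketch, where `find_dumps` is a hypothetical helper name and the search root is whatever drive or folder you pass in:

```python
import os


def find_dumps(root):
    """Return (path, modified-time) for every .dmp file under root, newest first."""
    found = []
    # os.walk silently skips folders it is not allowed to read.
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".dmp"):
                full = os.path.join(folder, name)
                try:
                    found.append((full, os.path.getmtime(full)))
                except OSError:
                    continue  # file vanished or is unreadable
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Calling `find_dumps("C:\\")` from an elevated prompt should cover C:\Windows\Minidump as well; without elevation some system folders will be skipped.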
|
# ? Sep 13, 2022 20:36 |
|
Bouchehog posted:
That error there is concerning. You won't get any BSOD dumps until that's solved, which is pretty vital to figuring out what happened. Is your C drive full? You should set that to a different drive so you can at least try to generate a minidump. You don't need to turn on full memory dump, a minidump is fine. Turning on full dump might make the problem worse (if it's a drive space issue, or if the machine crashes before the dump is able to complete). Also, this might be indicative of the failure that's actually causing your BSOD. If \Device\HarddiskVolume9 is failing, that will cause dump creation to fail as well. I know I'm not a typical poster round here but I have a knack for debugging BSODs (I work with the kernel), so if you get a minidump I can take a look as well.
|
# ? Sep 13, 2022 22:15 |
|
Geebs posted:That error there is concerning. You won't get any BSOD dumps until that's solved, which is pretty vital to figuring out what happened. Is your C drive full? You should set that to a different drive so at least you can at least try to generate a minidump. You don't need to turn on full memory dump, a minidump is fine. Turning on full dump might make the problem worse (if it's a drive space issue, or if the machine crashes before the dump is able to complete). The C drive has 271gb free on it (it's a 500gb M.2). I've tried setting the minidump to another drive to see if that works. I've switched it to a small memory dump on the basis that the fewer the bytes, the more likely it is to sort it out before dying. I have no idea what \Device\HarddiskVolume9 actually is...
|
# ? Sep 13, 2022 22:59 |
|
Bouchehog posted:The C drive has 271gb free on it (it's a 500gb M.2). I've tried setting the minidump to another drive to see if that works. I've switched it to a small memory dump on the basis that the fewer the bytes, the more likely it is to sort it out before dying. That's an internal name that the eventlog is using. You can use this NirSoft application to see which drive that maps to: http://www.nirsoft.net/utils/drive_letter_view.html. However, since your C: drive was set to save dumps last, it is probably your C: drive. Geebs fucked around with this message at 23:51 on Sep 13, 2022 |
# ? Sep 13, 2022 23:48 |
|
Predictably volume9 is my c: drive / M.2 drive. Running a few tools from the admin command prompt:
The two logs are here: CBS / DISM. I have no idea how to read these. I seem to recall that I ran SFC after the very first BSOD on this fresh install of Windows 11 and it was fine; I therefore wonder if the corruptions, whatever they were, were caused by the repeated BSODs rather than the other way around. There's the September cumulative update due to install (KB5017328) which I'm going to install now and see if the SFC corrections and new update make any difference.... [edit:] I am going with 'no'. I got a BSOD before the login screen after updating (I didn't babysit to check that it even fully finished). SFC /scannow reports no integrity violation. Still no minidumps anywhere (save for games/applications). Bouchehog fucked around with this message at 10:14 on Sep 14, 2022 |
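On reading the CBS log: the lines that SFC itself writes are tagged `[SR]`, so filtering on that tag cuts the file down to something readable. A rough sketch follows; `sfc_lines` is a made-up helper and the sample lines are hypothetical, not taken from the logs linked above.

```python
def sfc_lines(log_text):
    """Keep only the lines SFC wrote to CBS.log; those carry the [SR] tag."""
    return [line for line in log_text.splitlines() if "[SR]" in line]


# Hypothetical excerpt, for illustration only.
sample = "\n".join([
    "2022-09-14 09:40:01, Info  CBS  Session initialized",
    "2022-09-14 09:40:05, Info  CSI  00000001 [SR] Verifying 100 components",
    "2022-09-14 09:41:12, Info  CSI  00000002 [SR] Verify complete",
])

for line in sfc_lines(sample):
    print(line)

# On the affected machine (path assumes a default Windows install):
# with open(r"C:\Windows\Logs\CBS\CBS.log", errors="replace") as log:
#     print("\n".join(sfc_lines(log.read())))
```

Lines mentioning a repair or a failure to repair are the ones worth posting; the rest is mostly progress noise.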
# ? Sep 14, 2022 09:58 |
|
Bouchehog posted:Predictably volume9 is my c: drive / M.2 drive.
|
# ? Sep 14, 2022 16:46 |
|
I switched the minidump location so that in the 'start-up and recovery' control panel it said 'd:\Minidump' not '%SystemRoot%\Minidump'. Opening it up again now it's back to pointing at '%SystemRoot%\Minidump' (i.e. C:\Windows). Not quite sure what is going on there. I've changed it again and it has survived a restart, still showing d:\Minidump so I will have to see what happens after the next crash. I suspect I won't have long to wait.
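To check whether the setting really stuck, rather than trusting the dialog, the values can be read straight from the registry. This is a sketch under the assumption that the control panel stores them under the CrashControl key as `CrashDumpEnabled` and `MinidumpDir`; the helper names are made up, and the registry read only runs on Windows.

```python
import sys

# Meaning of the CrashDumpEnabled value, as documented by Microsoft.
DUMP_MODES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
    7: "automatic memory dump",
}


def describe_dump_mode(value):
    """Translate the CrashDumpEnabled number into a readable name."""
    return DUMP_MODES.get(value, f"unknown ({value})")


def read_crash_control():
    """Windows only: read the dump mode and minidump folder from the registry."""
    import winreg  # imported here so the module still loads on other systems

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        mode, _ = winreg.QueryValueEx(key, "CrashDumpEnabled")
        folder, _ = winreg.QueryValueEx(key, "MinidumpDir")
    return describe_dump_mode(mode), folder


if sys.platform == "win32":
    print(read_crash_control())
```

If this still prints the old folder after a reboot, the change did not persist and the next crash will try the original location again.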
|
# ? Sep 14, 2022 19:40 |
|
Bouchehog posted:I switched the minidump location so that in the 'start-up and recovery' control panel it said 'd:\Minidump' not '%SystemRoot%\Minidump'. Opening it up again now it's back to pointing at '%SystemRoot%\Minidump (i.e. C:\Windows). Not quite sure what is going on there. I've changed it again and it has survived a restart, still showing d:\Minidump so I will have to see what happens after the next crash. I suspect I won't have long to wait. Not sure what happened there but a restart is definitely required to make that change stick. Hopefully that produces a minidump now.
|
# ? Sep 14, 2022 20:04 |
|
Geebs posted:So you did switch the minidump dir to a new location and it didn't generate one? Does the event log have a new entry for the failure to write out the dump? It's going to be vital to get that minidump created somehow. I am so lost as to what to do here. [Edit:] And a second one at 20:57:33. Same story. Bouchehog fucked around with this message at 20:59 on Sep 14, 2022 |
# ? Sep 14, 2022 20:51 |
|
I so, so hate trying to diagnose BSoDs without a Minidump. It tends to devolve into blind guesswork almost immediately if the problem isn't dead obvious. However, given the circumstances... WudfRd itself is Windows Driver Foundation, which is...not much to go on, but it is a smoking gun. Things you can do: Unplug and remove everything you don't need (in this case, especially your webcam). See if the problem improves. There is a chance that something in the storage I/O pipe is busted. It might be hardware on the board. It might be the storage device itself. It might be a cable (it's probably not a cable). As mentioned, all of this is big guesswork. Without a minidump, everything is super surface level and ranges from 'information is missing' to 'information is misleading'.
|
# ? Sep 14, 2022 22:27 |
|
Bouchehog posted:So, I've just had another crash. Timestamp 20:42:31. The same timestamp had the usual /WudfRd Windows Hello driver issue and a second earlier an error notice: "Dump file creation failed due to error during dump creation." (ID 161) event data suggests that it again relates to "\Device\HarddiskVolume9" - the system / C: drive. Small dump directory in the control panel is still listed as 'd:\Minidump'. Unfortunately this is going to be the main difficulty here. Without a minidump the possibilities of issues are endless. However, I will say that *not* being able to generate a minidump is typically a sign of a pretty serious issue. Minidumps use a special path that really only relies on your drive, the CPU, memory, and power to function. If any of those stops working while this process is happening, you'll get a crash dump write failure. If the system resets during that period, you'll get a failure as well. Major hardware issues generally cause immediate resets as the system is too unstable to continue at all. The issue here is that you already seem to have stress tested your machine fairly well. It could also be an actual power issue; if you have it plugged into an extension cable or surge protector, try plugging it directly into the wall instead. I might throw an SSD stress test in there as well, SMART doesn't catch everything. Still, NVMe drives are not prone to failure. If the BSOD happens while you're using it, do you notice the BSOD remain on the screen for very long or does it immediately disappear and reboot? If it does remain on the screen, take a photo of it with a camera and post that. That'll give a bit more info to work with. If that fails and you're feeling *particularly* savvy and good at following instructions, you can set up a kernel debugger on a separate machine using kdnet to see the BSOD message over the network before it tries to write the crash dump out.
You'd run WinDbg on one machine which will debug your main machine. There should be many tutorials on how to set up kdnet, including on MSDN. The only command you'd need to run is 'g' and it'll sit there waiting for the system to crash. Then you should see the initial BSOD message there. Again, not for the faint of heart. You can also use the more aggressive process of elimination route to rule out hardware failure. You can try:
- Changing your pagefile drive to D:\. That could help mitigate some crash dump generation issues.
- Unplugging all external devices. That could narrow down a particular faulty device causing a power issue.
- Installing Windows on your other drive and booting from there. That will narrow down the faulty OS drive case, although I'm skeptical that's the problem.
- Booting a Linux live CD and seeing if that crashes with some stress. If it does, you can generally point to a hardware issue as that shows it's not related to the OS.
Geebs fucked around with this message at 02:57 on Sep 15, 2022 |
# ? Sep 15, 2022 02:45 |
|
Geebs posted:Various helpful stuff Canine Blues Arooo posted:Other helpful stuff So, significant updates. I managed to get a video of the BSOD screen to capture the message: I now also have the following dumpfiles (I was set to small memory dump at the time, so I hope this is enough): 092322-8265-01.dmp (from D:\Minidump\) DumpStack.log (from C:\) I am not entirely sure what changed - I've not been around that much over the last week but of the seven crashes in that time (five before, one after) this is the only minidump created. The Event Viewer logs show no obvious difference in the event to the usual, save that the usual 'dump file fail because failed' message was a success message. There should have been fireworks. Hopefully this will give someone more au fait with diagnosing these things a route in.
|
# ? Sep 26, 2022 10:08 |
|
Just by way of update, I've just done some digging into 'dpc watchdog violation' errors and the main suggested causes are SSD or graphics card firmware issues. Samsung Magician (v.7.1.1) reports that my firmware is up-to-date on the three relevant SSDs (C: 960 M.2 500gb, F: 840 SSD 120gb and E: 850 SSD 250gb). The app suggests that the drive health for all three drives is good. Crucial Storage Executive (v.8.03) reports that the one relevant drive (D: CT2000MX500 1.8tb) is in good health with firmware up-to-date. I haven't bothered to check the firmware on the two legacy platter HDs (both Seagate drives, an ST2000D 2tb drive and an ST1000D 1tb drive), if indeed old platter HDs have firmware that gets updated, but both report good health via Speccy. My Gigabyte WINDFORCE RTX 2080 is running nVidia v516.94 and the bios is reported in the adapter information as Version90.4.d.0.52. The webpage for the GPU shows two BIOS updates - one for Samsung memory, one for Micron memory. I have no idea which I have or how to check so I've not tried either file. I tried downloading 'AORUS ENGINE' to see if that would point me to the correct firmware but it's like opening a 1980s text based adventure game, reported LED driver errors and I immediately uninstalled it... I have also gone onto the ASRock Z370 webpage and downloaded and installed what it called Windows 10 "INF drivers", which I take to mean the controller and chipset drivers. There were none for Windows 11. I will see if that makes any difference pending any update from the dump file from someone more knowledgeable. Bouchehog fucked around with this message at 13:26 on Sep 27, 2022 |
# ? Sep 27, 2022 13:12 |
|
I believe DPC watchdog violation means some device is spamming DPCs to the point the system dies. I'd actually remove ALL Gigabyte branded software from the computer as a starting point. See if that does anything.
|
# ? Sep 27, 2022 15:24 |
|
Do you have a PSU tester handy? Would be good to rule that out first. DPC watchdog BSODs are typically IRQL-level BSODs, which are almost always hardware related (and sometimes driver related). After ruling out the PSU, the next culprits tend to be memory (both RAM and GPU) and CPU (overheating?).
|
# ? Sep 27, 2022 20:17 |
|
Multiple dmp files would be ideal if you get more in the meantime, but from this one, we have moderately high confidence about a few things: Bugcheck with Parameters: code:
code:
code:
There is a high likelihood that you can install the latest driver from nVidia and see the problem disappear. Failing that, roll back to an older version to see if the problem improves. While this suggests your GPU driver as the faulting module, that does not mean that it is the root cause. It could be hardware (which would most likely be the GPU here). It could be something doing goofy poo poo with the driver. It could be the phase of the moon, but it's really not worth thinking about that until we take the easy steps first. e: All this is for *this exact crash*. More data points would help determine if your GPU driver is at fault here for real, or if it's just the fall guy for a larger problem. Canine Blues Arooo fucked around with this message at 22:53 on Sep 27, 2022 |
# ? Sep 27, 2022 21:11 |
|
Canine Blues Arooo posted:...All roads point pretty clearly to your GPU's driver as the fault. OK, so I installed the new nVidia drivers and in the last eight days have not seen a BSOD. I do however get a new regular fatal crash - every so often Windows just dies. I can move the mouse and, for some time, scroll windows but cannot click on anything. If a crash were to happen whilst typing this, for instance, the blinking cursor would stop blinking, I would not be able to change to another application or press the Windows start button or close any active window but I would be able to move the mouse (including between my three monitors) and use the scroll wheel to move the active window content up and down. The keyboard doesn't work. It all renders without any obvious artifact. After a few minutes of this the scrolling would no longer work but I would still be able to use my mouse (to do nothing other than move it around the three screens). That continues indefinitely and I have to hold the power button to restart my computer. If task manager is open, this just freezes the information at the point of the crash. It is an intermittent fault, rather like the BSOD version. Sometimes I can load up and use the computer for hours without issue - either for games with discord and chrome open or word processing/outlook/excel/chrome for work purposes. Googling the fault again suggests GPU drivers so I downloaded DDU, removed the driver in safe mode, installed the latest nVidia driver in safe mode and tried again. Same deal. I am guessing that this is a hardware issue either with my GPU or with something on the motherboard which controls it. As an aside, my ethernet connection now has a habit of disconnecting then reconnecting. This may be unrelated (something to do with my having installed the wrong drivers when I did a fresh install of Windows?) but I wonder if it all points to some hardware issue with a controller of some sort on my motherboard.
Is there any way to narrow this down? Or am I going to have to start swapping out hardware to do so (in which case I think I might just go for a new build)? [Edit:] as luck would have it, I got exactly that crash in between posting the message and Chrome redirecting me back to the post. I could not scroll (the whole page was displayed) and I noticed that the Chrome 'page loading' swirl in the tab at the top of the page was swirling for 20s or so after it was obvious that the PC had crashed. Bouchehog fucked around with this message at 09:42 on Oct 4, 2022 |
# ? Oct 4, 2022 09:23 |
|
Bumpity bump. The issue is still there. It may be no more than coincidence but it seems worse when I open Outlook - many of the freezes have been associated with that, but far from all. It also seems to be the case that it either hits me in the first five/ten minutes of use or I've got very good odds of it being totally fine. I'm fast arriving at the point where I just build a new PC (which was on the cards soon anyway), pop in the old GPU and see if it was that. I can then move the HDs over and see if any of them cause issues.
|
# ? Oct 9, 2022 16:55 |
|
Have you tested your PSU yet?
|
# ? Oct 9, 2022 22:43 |
|
Bouchehog posted:The Asrock page for my board doesn't have any obvious Windows 11 drivers relating to USB controllers or the intel chipset. Did you install all these updates?
|
# ? Oct 11, 2022 22:20 |
|
ihafarm posted:Did you install all these updates? No - I wasn't sure which ones to update. Happy to do so... Stanley Pain posted:Have you tested your PSU yet? Or did you mean some other test?
|
# ? Oct 13, 2022 14:13 |
|
Bouchehog posted:No - I wasn't sure which ones to update. Happy to do so... Any $30-50 tester should work. They're pretty simple devices. I have the Thermaltake Dr. Power II and it's worked well.
|
# ? Oct 13, 2022 14:52 |
Also I'd verify if the PSU is up to the wattage demands of your cpu+gpu, I had a lot of intermittent crashes on an old build because I hadn't paid attention when I upgraded my video card and was way overdrawing on a no-name PSU that came bundled with the case. Thankfully it didn't destroy my system when it finally released the magic smoke, but realizing I was trying to draw 600 watts on a 400 watt psu was "well, no loving wonder it was so much trouble."
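The wattage check is just arithmetic, sketched below with the figures from this post. `psu_headroom` is a made-up helper, and any per-component numbers you feed it would be rated or estimated draws, which is an assumption in itself since they don't capture short transient spikes.

```python
def psu_headroom(psu_watts, draws):
    """Return spare watts for a dict of component draws; negative means overdrawn."""
    return psu_watts - sum(draws.values())


# The case described above: roughly 600 W of demand on a 400 W unit.
spare = psu_headroom(400, {"whole system under load": 600})
print(f"headroom: {spare} W")
```

A negative result means the supply is overdrawn on paper; even a small positive number leaves little margin on an ageing or low-quality unit.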
|
|
# ? Oct 14, 2022 07:36 |
|
Multimeters and cheap PSU testers are worthless. All they are looking at is basic voltages, which will test ok unless the PSU is seriously defective. They aren't testing ripple or short-duration voltage excursions. You need a decent oscilloscope for that. If you are concerned that a PSU is bad, the easiest way to find out is to just replace it. Problem goes away? PSU was bad. Still have crashes? Send the new one back (or keep it as a spare). An inexpensive PSU is a handy thing to have.
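To illustrate why an averaged reading can look fine on a bad rail, here is a small sketch using synthetic data, not measurements from any real PSU. It assumes the commonly quoted ATX limits for the 12 V rail (plus or minus 5% on voltage, 120 mV peak-to-peak on ripple); the waveform and the `rail_stats` helper are invented for the example.

```python
import math


def rail_stats(samples):
    """Return (mean voltage, peak-to-peak ripple in volts) for a list of samples."""
    mean = sum(samples) / len(samples)
    ripple = max(samples) - min(samples)
    return mean, ripple


# Synthetic 12 V rail: correct on average, with a 0.15 V amplitude wobble on top.
samples = [12.0 + 0.15 * math.sin(2 * math.pi * i / 50) for i in range(1000)]

mean, ripple = rail_stats(samples)
print(f"mean = {mean:.3f} V, ripple = {ripple * 1000:.0f} mV peak-to-peak")
print("mean within 11.4-12.6 V:", 11.4 <= mean <= 12.6)
print("ripple within 120 mV:", ripple <= 0.120)
```

A multimeter effectively reports the first number, which passes, while the second is far out of spec and invisible without something that samples fast enough to see it.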
|
# ? Oct 18, 2022 15:03 |
|
Klyith posted:Multimeters and cheap PSU testers are worthless. All they are looking at is basic voltages, which will test ok unless the PSU is seriously defective. They aren't testing ripple or short-duration voltage excursions. You need a decent oscilloscope for that. Counter-Point, a lot of failing PSUs will fail with the simple testers. In context, OP's problems aren't caused by ripple or anything like that. If the PSU is to blame it would easily fail an auto PSU test. Telling someone they need an oscilloscope is LOL.
|
# ? Oct 18, 2022 18:01 |
|
Stanley Pain posted:Counter-Point, a lot of failing PSUs will fail with the simple testers. In context, OP's problems aren't caused by ripple or anything like that. If the PSU is to blame it would easily fail an auto PSU test. I have poked at bad PSUs with multimeters, and frequently seen in-spec voltage. This was in situations where the PSU was 100% the problem, because a replacement solved the issues. That Thermaltake thing in particular is totally worthless because it tests with zero load. And it isn't even particularly accurate. Stanley Pain posted:Telling someone they need an oscilloscope is LOL. I don't think they need an oscilloscope. I think someone with a potentially bad PSU needs a second PSU to swap out for.
|
# ? Oct 18, 2022 19:02 |
|
Klyith posted:I have poked at bad PSUs with multimeters, and frequently seen in-spec voltage. This was in situations where the PSU was 100% the problem, because a replacement solved the issues. I've had a completely different experience with PSU testers. It's something quick + cheap to rule something out. If other troubleshooting doesn't resolve whatever the issue is come back to the PSU. I mean who doesn't have 3 or 4 spare PSUs just lurking about anyway?
|
# ? Oct 19, 2022 22:28 |