Problem description:
Windows is crashing with, but not limited to, the following blue screen errors:
CRITICAL_PROCESS_DIED
SYSTEM_THREAD_EXCEPTION_NOT_HANDLED
acpi.sys
KERNEL_SECURITY_CHECK_FAILURE
MEMORY_MANAGEMENT
KERNEL_EXCEPTION_NOT_HANDLED
KERNEL_MODE_HEAP_CORRUPTION
There were also some about filesystem, network and Bluetooth drivers that I didn't write down.

Crashes happen during boot, after login, when idle, during the Windows recovery process, and once or twice when reinstalling from a USB stick.

Attempted fixes:
I've reinstalled Windows on a brand new drive (upgrading from an old SATA SSD to an NVMe M.2). That seemed to fix things for a couple of days. However, I had to take everything apart short of the CPU to get at the M.2 slot, so the RAM and most of the power cables got reseated. Now I'm back where I was before, with bluescreen crashes all the time.

I'm able to run the Intel CPU stress test and all the OCCT tests for an hour without triggering a crash.

MemTest86 ran for 10 hours without errors with both sticks of RAM. Then it got stuck at 4GB on the first test with only one stick, passed with the other stick, passed with the first one again, passed with each stick in both slots, and passed with both sticks in again. Running it again a couple of days later, I got ~500 errors in test 3 with both sticks, but that wasn't repeatable.

I've blasted everything with compressed air (with the power off).

It feels to me like my RAM is failing, but I can't get MemTest86 to show that reliably. Would reseating the memory temporarily hide bad RAM sticks? I've got some DDR 3600 I can swap in for testing, but it's much smaller capacity, and I'm slightly worried something in my PC is actively killing components. Plus, going by the last couple of days of troubleshooting, it might be a while before things go wrong again even if the RAM is good.

Recent changes:
My Radeon VII died slowly about 6 weeks ago, so I replaced it with a 1660 Super I scavenged from another PC. I had some crashes then that were solved by removing all the old display drivers, and things had been stable for about a month. Then I had to physically move the PC to another room for a couple of hours, and when I brought it back the current bluescreens started. The 1660 has now gone back to the other PC and I'm running on the Intel iGPU.

Operating system:
Windows 10 Pro 64-bit

System specs:
i7-8700K
ASRock Z370 Gaming-ITX/ac
2x16GB G.Skill DDR4 3200
Sapphire Radeon VII (confirmed deceased in another machine)
Gigabyte 1660 Super Mini ITX OC (currently back in its donor machine)
Silverstone SST-SX600-G 600W PSU
Corsair Force MP510 960GB M.2 SSD
Samsung 850 Evo 500GB (currently unplugged)

Location:
UK

I have Googled and read the FAQ: Yes
|
# ? Aug 31, 2021 21:56 |
|
|
|
CPU is my first guess. Disable features on it in your BIOS: SpeedStep, C-states, boost-related stuff... I've seen CPUs turn stable when you don't gently caress with the voltages or frequencies. Disabling C-states has been the best performer. It stops the CPU from dropping voltages real fast like.
|
# ? Sep 1, 2021 07:53 |
|
You're not boosting up and dropping down on a stress test. You're maxing everything out the whole time. Also, that CPU has the memory controller in it.
|
# ? Sep 1, 2021 07:55 |
|
Holy poo poo ASRock, you can disable individual components within the processor. That's terrifying and awesome. Good job lads.
|
# ? Sep 1, 2021 08:06 |
|
down1nit posted:
CPU is my first guess. Disable features on it in your bios. Speedstep, c states, boosting related....

Thanks! Disabling C-states seems to have made things more stable. I was previously able to force a bluescreen within 30 seconds of Windows startup, but I'm now 30 minutes in without a crash. Is this a long-term solution, or should I be shopping for a new CPU?
|
# ? Sep 1, 2021 09:24 |
|
I'd RMA it, but you can probably find a stable setting with all the poo poo ASRock packed into that board. I now realize the RMA period has probably passed.

Edit: it's just a power-saving thing really, so keeping it disabled is fine as long as your temperatures are good.

down1nit fucked around with this message at 16:53 on Sep 1, 2021 |
# ? Sep 1, 2021 16:49 |
|
I'm pretty sure the warranty expired last week. I had to take the heatsink off to find the serial number on the CPU, and after re-mounting it (plus cleaning off and reapplying thermal paste) everything is now much worse. I can't even boot the Windows recovery USB, and sometimes I don't get as far as the BIOS POST screen. I've put in a support request with Intel anyway, in case they're OK with the fact that it actually failed last week (while still in warranty) even if it's taken me until now to diagnose, but I think I'm probably hosed.

edit: Yeep fucked around with this message at 10:52 on Sep 2, 2021 |
# ? Sep 2, 2021 10:44 |
|
Yeep posted:
I've put in a support request with Intel anyway in case they're ok with the fact it actually failed last week (in warranty) even if it's taken me until now to diagnose, but I think I'm probably hosed.

No real update on this other than to say gently caress you Intel for making me buy a 15x macro lens for my phone before you'll consider an RMA.
https://www.intel.com/content/www/us/en/support/articles/000021613/processors/intel-core-processors.html
|
# ? Sep 4, 2021 13:45 |
|
Oh god. Yeah, they ask for that. It's just dots on a green background. How the gently caress is anyone going to read that? It's obviously a deterrent for returns, right? Like, it has to be. They are fully capable of printing that on the heat spreader as well as the PCB.
|
# ? Sep 5, 2021 08:45 |
|
|
|
Ohh wait, it's cheap and easy to delid a CPU. Harder to move the silicon to another PCB, though...? Still, better letters please, Intel?
|
# ? Sep 5, 2021 08:50 |