Artadius
Nov 5, 2012
Problem description: My Windows 11 desktop PC has been suffering from completely random hard freezes for quite a few months, and I'm at my wits' end trying to troubleshoot and fix it myself. It was a pre-built bought from Dell a couple of years ago. The only thing I've added since purchasing is a spare SSD on a free SATA port for additional game install space.

The freezes are hard (no blue screen of death), and if any sound is playing at the time, it keeps repeating whatever it was playing at that instant, stuck like everything else. The only remedy is to hold the tower power button down to force the power off, wait for it to power down, then press the power button again to restart. Oddly, this sometimes gives me the blue screen asking if I want to try a repair or just restart. Other times it just restarts normally. I haven't seen a pattern to this behavior.

When I say random, I mean random. I can leave the computer just sitting there, go do something, and come back an hour later to find it frozen. I can be playing a game and it'll freeze while playing. I can go a few days at a time without any freezes while doing the same things that froze it before. I've closed down all programs I'm not actively using and it'll still happen. I've closed everything while playing a game and it happens. I'd say on average I get one freeze a day. It's not enough to make me want to replace it... but it's enough that it's pretty freaking annoying.


Attempted fixes: I've done extensive searching online for similar issues and I've done the following:

*As mentioned before, I've closed as many tasks/programs as possible and still experienced the freezes both using the computer and it sitting idle / not being used.
*Checking the system logs hasn't provided any insight into the issue. I guess it's freezing before it can write anything to the logs?
*I regularly install required and optional Windows updates.
*I update my GeForce drivers regularly.
*I update my hardware drivers and BIOS fairly regularly from Dell's application.
*I've run a memtest and chkdsk and have found no issues.
*I've unplugged my second monitor and run on a single screen for a while.


I'm near the point where I'm going to do an OS reinstall and, if that doesn't work, a complete wipe and OS install... but I would REALLY like to avoid that if at all possible just because of the time commitment to get everything set back up / downloaded / etc.

Is there some kind of more robust logging that I can access on the system maybe I don't know about or something I can download for better logging than the default Windows system logs?
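
On the logging question: a hard freeze can stop Windows before it writes anything useful to the event log, so one low-tech supplement is a heartbeat file. The sketch below is only an illustration, not something anyone in this thread ran; the function names and the `heartbeat.log` filename are made up for the example. It appends a timestamp every few seconds and asks the OS to push each write to disk, so after a forced power-off the last line shows roughly when the machine stopped.

```python
import os
import time
from datetime import datetime
from pathlib import Path
from typing import Optional


def write_heartbeat(log_path: Path) -> str:
    """Append one timestamp line and try to force it to disk before returning."""
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(stamp + "\n")
        handle.flush()             # push Python's buffer to the OS
        os.fsync(handle.fileno())  # ask the OS to push it to the drive
    return stamp


def last_heartbeat(log_path: Path) -> Optional[str]:
    """Return the last non-empty line of the log, or None if there is none."""
    if not log_path.exists():
        return None
    text = log_path.read_text(encoding="utf-8")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[-1] if lines else None


def run(log_path: Path, interval_seconds: float = 5.0,
        max_beats: Optional[int] = None) -> None:
    """Write heartbeats until max_beats is reached (forever if it is None)."""
    beats = 0
    while max_beats is None or beats < max_beats:
        write_heartbeat(log_path)
        beats += 1
        time.sleep(interval_seconds)
```

Calling `run(Path("heartbeat.log"))` in a console window and leaving it open would be enough. After the next freeze, `last_heartbeat(Path("heartbeat.log"))` gives a time to compare against the Windows logs. It only narrows down when a freeze happened, not why.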



Recent changes: No. I don't use the computer for anything but installing and playing games on Steam / Xbox and your usual home PC stuff (email, browsing, etc)



--



Operating system: Windows 11 Home



System specs: Dell XPS Desktop 8940
Windows 11 Home 10.0.22631 (23H2)
Intel i7-11700 @ 2.50 GHz
RAM: 32 GB
Virtual: 21.0 GB
Page File 4.83 GB
Nvidia 3060ti
C: 1 TB NVME drive SN730
D: 1 TB HDD
F: 1 TB SSD 870 EVO
Display 1 is an HP X27q 2560x1440 @ 164Hz
Display 2 is an HP E243 1920x1080 @ 60Hz



Location: USA - South Texas



I have Googled and read the FAQ: Yes

Zogo
Jul 29, 2003

I'd run https://www.hdsentinel.com/hard_disk_sentinel_trial.php to check the health of all the drives.

If the drives all look okay then I'd use onboard GPU temporarily and see if the freezing continued or not.

Artadius
Nov 5, 2012

Zogo posted:

I'd run https://www.hdsentinel.com/hard_disk_sentinel_trial.php to check the health of all the drives.

If the drives all look okay then I'd use onboard GPU temporarily and see if the freezing continued or not.

I will try that this evening. Thank you.

Artadius
Nov 5, 2012
Ok, update.

HD Sentinel is reporting bad sectors on my Samsung EVO SSD.

Message:

There are 231 bad sectors on the disk surface. The contents of these sectors were moved to the spare area.
17677 errors occurred during data transfer.
231 errors reported during write to the device.
At this point, warranty replacement of the disk is not yet possible, only if the health drops further.
In case of sudden system crash, reboot, blue-screen-of-death, inaccessible file(s)/folder(s), it is recommended to verify data and power cables, connections - and if possible try different cables to prevent further problems.
More information: https://www.hdsentinel.com/hard_disk_case_communication_error.php
It is recommended to examine the log of the disk regularly. All new problems found will be logged there.
The TRIM feature of the SSD is supported and enabled for optimal performance.

It is recommended to continuously monitor the hard disk status.

Estimated remaining lifetime: 568 days

Looks like I'm replacing an SSD. Not a big deal, as it's an auxiliary drive for me.

I went ahead and downloaded the Samsung Magician utility since it's a Samsung drive, and got a "Failing LBA" warning on the SSD; the SMART tests are not completing either. I went ahead and updated the drive's firmware and will see if that helps any (not expecting it to).

Could this be causing this type of freeze that I've reported? It is not my OS HD, it is basically a tertiary drive that I use for installing games so I don't overload the OS drive.

Zogo
Jul 29, 2003

Artadius posted:

Could this be causing this type of freeze that I've reported? It is not my OS HD, it is basically a tertiary drive that I use for installing games so I don't overload the OS drive.

Yes, it's possibly the culprit.

I'd just disconnect the SSD from the machine temporarily and see if the freezes continue or stop.

Artadius
Nov 5, 2012

Zogo posted:

Yes, it's possibly the culprit.

I'd just disconnect the SSD from the machine temporarily and see if the freezes continue or stop.

Update.

Ran machine all day yesterday after disconnecting the SATA cable (I did not disconnect the power to the drive though, not sure if that mattered). Also let it stay on overnight (I don't use any power saving mode other than monitors turning off after 15 minutes of inactivity). No freezes yet!

I'm hopeful, but not willing to say it's fixed yet... going to do the same for a few days in case it's a coincidence. But fingers crossed. Thank you for your help so far, Zogo.
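
For a rough sense of how long "a few days" needs to be, here is a back-of-the-envelope sketch. It assumes freezes arrive at random at a steady average rate (a Poisson model), which is an assumption and probably too tidy for a machine that sometimes goes days without a freeze. The function names are made up for the example.

```python
import math


def chance_of_quiet_streak(freezes_per_day: float, days: float) -> float:
    """Probability of zero freezes over `days` under a steady-rate Poisson model."""
    return math.exp(-freezes_per_day * days)


def days_needed(freezes_per_day: float, max_chance: float) -> float:
    """Freeze-free days needed before a lucky streak is less likely than max_chance."""
    return -math.log(max_chance) / freezes_per_day


# At the stated average of about one freeze a day:
one_day = chance_of_quiet_streak(1.0, 1)    # about 0.37
four_days = chance_of_quiet_streak(1.0, 4)  # about 0.018

# If the real rate is lower, say 0.3 a day, four quiet days prove much less:
four_days_slow = chance_of_quiet_streak(0.3, 4)  # about 0.30
```

So one quiet day on its own says very little, and even several quiet days only mean much if the freeze rate really is close to one a day.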

Artadius
Nov 5, 2012
New update.

I had another freeze today while the aforementioned SSD was disconnected from SATA cable.

I'm now making sure I have the latest drivers for my onboard Intel graphics chipset and will switch to that output. Is it enough to disable the Nvidia card via Device Manager or should I remove the card completely from the motherboard as part of the next step of troubleshooting?

Zogo
Jul 29, 2003

Artadius posted:

(I did not disconnect the power to the drive though, not sure if that mattered).

Probably not but I'd do it anyway.

Artadius posted:

Is it enough to disable the Nvidia card via Device Manager or should I remove the card completely from the motherboard as part of the next step of troubleshooting?

It's better to fully remove it to be 100% sure.

Artadius
Nov 5, 2012
Ok, I ran a couple of days without any issues on the onboard Intel graphics. I then re-enabled the Nvidia card and ran the Display Driver Uninstaller utility you have recommended elsewhere to do a clean install... and got a freeze within an hour or so of that whole process, so I'm pretty sure it's the 3060 Ti.

I am no longer under warranty with Dell so I'm probably hosed there. The computer is just over 2 years old... is there an Nvidia warranty that I could pursue?

In the meantime, I've completely removed the 3060ti from the case and am running completely on the onboard graphics. I'll leave it running 24/7 just to be doubly sure.

I have another computer that is a few years older that I can probably drop the 3060 Ti into to see if I can replicate it there. If the freeze happens there too, I guess it's a hardware issue for sure.

Anything else you would recommend at this point?

Zogo
Jul 29, 2003

Artadius posted:

I am no longer under warranty with Dell so I'm probably hosed there. The computer is just over 2 years old... is there an Nvidia warranty that I could pursue?

It looks like that one may have a three year warranty. You'd have to contact them to be sure though.

Artadius posted:

Anything else you would recommend at this point?

Nothing at this point.

Artadius
Nov 5, 2012

Zogo posted:

It looks like that one may have a three year warranty. You'd have to contact them to be sure though.

Nothing at this point.

Spare computer is a no go. It only has a 6 pin male power connector and the 3060ti needs an 8 pin.

Anything else you can recommend I try at this point? Any gpu settings either via Nvidia's software or bios? Best to try a full system wipe and reinstall of Windows? Just pursue the warranty claim with Nvidia?

Thanks.

Zogo
Jul 29, 2003

Artadius posted:

Anything else you can recommend I try at this point? Any gpu settings either via Nvidia's software or bios? Best to try a full system wipe and reinstall of Windows? Just pursue the warranty claim with Nvidia?

Thanks.

I doubt a reinstall of Windows would fix the issue.

I'd just try the warranty route but you could try undervolting/underclocking the GPU and see if it becomes more stable.

Artadius
Nov 5, 2012
Got off chat with Nvidia. The card is actually made by Dell, so they won't warranty the card either.

:negative:

What tool is best for undervolting / underclocking?

Zogo
Jul 29, 2003

Artadius posted:

What tool is best for undervolting / underclocking?

I'd just try following this guide: https://www.wepc.com/how-to/underclock-gpu/

down1nit
Jan 10, 2004

outlive your enemies
This screams ram. Did you reseat anything, like an old school Nintendo? Unplug and reinsert?

Memtest?

Artadius
Nov 5, 2012

down1nit posted:

This screams ram. Did you reseat anything, like an old school Nintendo? Unplug and reinsert?

Memtest?

No, haven't reseated anything except for the 3060ti when I removed it to confirm I don't get the problem with the video card being used.

No issues with Memtest reported.

Going back to the previous recommendation, I downloaded Afterburner and as soon as I started messing with underclocking / setting a custom fan profile, I started immediately getting the freezes... like just a minute into setting the new values so it has to be the 3060 right?

I've ordered a new card... pretty much the only thing my power supply can take in the 4 series is the 4060 ti. I suspect this will eliminate the freezes. In the off chance it does not though, I'll simply return the new card and I'll just be at a loss at that point.

Zogo
Jul 29, 2003

Artadius posted:

Going back to the previous recommendation, I downloaded Afterburner and as soon as I started messing with underclocking / setting a custom fan profile, I started immediately getting the freezes... like just a minute into setting the new values so it has to be the 3060 right?

It's a very good bet.

Artadius posted:

I've ordered a new card... pretty much the only thing my power supply can take in the 4 series is the 4060 ti. I suspect this will eliminate the freezes. In the off chance it does not though, I'll simply return the new card and I'll just be at a loss at that point.

If the problems continue with another card then I'd suspect the PSU or the motherboard next. But it could theoretically be any piece of hardware.

Artadius
Nov 5, 2012

Zogo posted:

It's a very good bet.

If the problems continue with another card then I'd suspect the PSU or the motherboard next. But it could theoretically be any piece of hardware.

New card arrives later today. Is there a preferred method to go from one nvidia card to another?

My assumption:
1. Run the Wagnardsoft DDU to completely remove nvidia drivers/software
2. Shutdown computer and remove 3060 ti
3. Install 4060 ti
4. Boot and reinstall nvidia drivers/software

I don't need any other intermediate steps, right? Like rebooting back into Windows between steps 1 and 2, or switching over to the onboard Intel graphics first?

Zogo
Jul 29, 2003

Artadius posted:

New card arrives later today. Is there a preferred method to go from one nvidia card to another?

My assumption:
1. Run the Wagnardsoft DDU to completely remove nvidia drivers/software
2. Shutdown computer and remove 3060 ti
3. Install 4060 ti
4. Boot and reinstall nvidia drivers/software

I don't need any other intermediate steps, right? Like rebooting back into Windows between steps 1 and 2, or switching over to the onboard Intel graphics first?

Yeah, that should be fine.


If it was me I'd just install the new card and see if it worked. If any issues arose then I'd just run DDU with the 4060 installed and reinstall the GPU drivers.

Artadius
Nov 5, 2012
New card installed and it happened again

:negative:

Very strange because I could never replicate when I completely removed the video card from the system previously.

down1nit posted:

This screams ram. Did you reseat anything, like an old school Nintendo? Unplug and reinsert?

Memtest?

I'll do another memtest tonight. Any other ways to check the RAM? I'll reseat the modules also.

Over the weekend I'll run on just one module at a time and see if I can narrow down to a specific chip. Not really sure what else I can do at this point.

Extremely bummed right now too because now I've gotten a taste of the new video card and I'm honestly surprised at how much of an upgrade in performance it was vs the 3060 ti. I had read online to expect about 20% or so but it seems much more than that... and the biggest thing is that it is SO much quieter than the previous card.

Zogo
Jul 29, 2003

Artadius posted:

New card installed and it happened again

:negative:

Very strange because I could never replicate when I completely removed the video card from the system previously.

A degraded PSU could have trouble handling the card. Also, the motherboard could have an issue with the GPU slot etc.

Artadius posted:

I'll do another memtest tonight. Any other ways to check the RAM? I'll reseat the modules also.

Over the weekend I'll run on just one module at a time and see if I can narrow down to a specific chip. Not really sure what else I can do at this point.

Let Memtest run overnight at some point.

Yes, using one stick of RAM could narrow things down more. e.g. if you get freezes with one stick or slot but not another one.

Artadius
Nov 5, 2012

Zogo posted:

A degraded PSU could have trouble handling the card. Also, the motherboard could have an issue with the GPU slot etc.

Let Memtest run overnight at some point.

Yes, using one stick of RAM could narrow things down more. e.g. if you get freezes with one stick or slot but not another one.


I ran Memtest86 overnight and had 0 errors / 100% pass.

This evening I'll remove one of the memory modules and go from there.

down1nit
Jan 10, 2004

outlive your enemies
It's gonna be either ram or cpu probably.

Or board I guess.... I've seen over-clamped cooling systems do poo poo like this, literally bends the board so much it warps micrometer sized trace lengths. I've seen bad ram controller in cpus, RAM like I already said it'd be....

You started with the easiest thing. Now get to work. Ram first.

Pull all ram. Put one stick in and use the pc like normal. Report back! Underclock cpu, undervolt, disable c-states... Try a new power supply... a repair shop? .. Call me?

Owner/Lead Tech of Sagebrush Repair in Pinole, CA.

Artadius
Nov 5, 2012

down1nit posted:

It's gonna be either ram or cpu probably.

Or board I guess.... I've seen over-clamped cooling systems do poo poo like this, literally bends the board so much it warps micrometer sized trace lengths. I've seen bad ram controller in cpus, RAM like I already said it'd be....

You started with the easiest thing. Now get to work. Ram first.

Pull all ram. Put one stick in and use the pc like normal. Report back! Underclock cpu, undervolt, disable c-states... Try a new power supply... a repair shop? .. Call me?

Thanks for following up and continuing to check in. Same for Zogo. Very much appreciated.

I've removed one of the 16 GB modules and have now been running on the remaining one for going on three days without a freeze. I've even left the system running overnight and have queued up multiple hours-long YouTube videos (full-screen video has tended to be the most likely trigger for the freezes).

I think later today I'll switch them and run on the one I have out currently and see if any issues occur.

FYI, the ram is Adata brand. I've never heard of that brand... I guess that's what Dell is putting in their systems these days?

Artadius
Nov 5, 2012
Update for tonight.

Sat down to swap out the memory module like I said I was going to and noticed the system was frozen again. Though, like I said, this was the longest amount of consecutive time I remember it not freezing.

I went ahead and swapped out to the other 16 GB module. We'll see how it goes the next few days.

Artadius
Nov 5, 2012
4 days and counting on the second memory module only. Cautiously optimistic. Going to continue to let it stay on 24/7 hopefully into the weekend. Then I'll see what next steps are.

Speaking of which, what would be next steps? I would really like to go back to 32 GB of RAM... but if the issue is that one module... is it worth putting it back in and trying to underclock/undervolt etc... ?

If I decide to replace that stick, should I buy the exact same brand / model? Ok to get an equivalent of a different and more well regarded brand?

Zogo
Jul 29, 2003

Artadius posted:

Speaking of which, what would be next steps? I would really like to go back to 32 GB of RAM... but if the issue is that one module... is it worth putting it back in and trying to underclock/undervolt etc... ?

If the one stick of RAM is the issue then I wouldn't bother. It depends on your patience. If you're okay with doing weeks more of troubleshooting then you could try it.

Artadius posted:

If I decide to replace that stick, should I buy the exact same brand / model? Ok to get an equivalent of a different and more well regarded brand?

It's recommended to use pairs of RAM that are sold together. Sometimes mixing and matching works and sometimes not.

down1nit
Jan 10, 2004

outlive your enemies
oh look i was right. why did you not do this thing first.

Artadius
Nov 5, 2012

down1nit posted:

oh look i was right. why did you not do this thing first.-

Welllll......

I did finally have a freeze over the weekend. And then yesterday evening I had another freeze a mere half hour after booting up (was just sitting there downloading updates on Steam).

So I don't think it's either of the individual RAM modules (unless it's both).

Guess I'll start looking into underclocking/undervolting... but at this point I'm thinking it's just going to be something I have to live with. I don't think I have the time/patience to start going down the rabbit hole of buying and replacing the motherboard / PSU / CPU to troubleshoot this. At that point, I might as well have just built a new computer.
