Cyrano4747
Sep 25, 2006

Yes, I know I'm old, get off my fucking lawn so I can yell at these clouds.

Problem description: My computer has been getting steadily less stable over the last few months. At first it was the occasional BSOD in games etc. that I could write off as bad ports or whatever, but it's gotten steadily worse and got bad enough this weekend that I decided to try and tackle it. I suspected it was a RAM issue. I've got 32GB (two matched pairs of 8GB sticks) so my tack was to try and test the sticks, find the bad one, and then downshift to 16GB by tossing the pair with the bad stick.

Attempted fixes: To start with I stress tested my GPU with FurMark to make sure that wasn't the issue. No problems there. Also looked into my drive health and that's fine, shouldn't be any problems there. Then I reset my BIOS to all defaults (no XMP), nothing changed. From there I ran some tests with Windows' memory tester and got some errors, so I broke out Memtest86+. This is where it gets interesting. I'll spare the spreadsheet with all the permutations of stick/slot/combos of sticks that I tested, but the tl;dr is that any single stick on any single slot tested fine. When I put either pair on A2/B2 the test would run to completion but spit out errors, always in test #2, the address test. When I put either pair on A1/B1 the test would start running then crash, rebooting the machine. Putting any combination of sticks on non-paired slots (e.g. A1/A2) would make the machine not boot at all - just a boot loop without POSTing.

Running 16gb on a matched pair installed correctly on A2/B2 boots and is functional, and stability is better than it was with the full 32gb, but still not great. Running 8gb in a single slot is fine (I'm typing this right now doing that) but, well, that's 8gb of RAM. gently caress that noise.

Right now I'm wondering if it could be the memory controller just based on some googling of similar issues, but I've got no real idea, hence me asking you all.

Recent changes:

None. The most recent change would be upgrading the video card to a used 2080, but that was over a year ago now. Card tests fine with everything I can throw at it, so I don't think it's that. I've got an old 1070 kicking around if that's worth testing for some reason. No on-board video so I can't pull the card entirely or I would have already.

--

Operating system: Windows 11 Home 64 bit, 10.0.22000 Build 22000

System specs: Home brewed machine.
AMD Ryzen 7 2700X Eight-Core Processor 3.70 GHz
ASRock B450M Pro4 motherboard
RTX2080
32GB DDR4-3200 in two 8GBx2 matched pairs. GSkill Ripjaws.
OS drive - Samsung EVO 850 500GB
Other drives: Patriot Burst 1TB SSD, mystery meat 500gb laptop spinning HD I salvaged years ago, use for bulk storage for poo poo I don't care about, and which refuses to die.
edit: forgot the PSU: Corsair RM650

Location: USA

I have Googled and read the FAQ: Yep, that's what has me leaning towards this maybe being a memory controller issue. My understanding is that that is on the processor with AMD systems, is this correct?

I'm going to add one little last thing here, namely, what resolution I want. Obviously something's hosed and I'm pretty sure it's hardware. The way I see it there are three options:

1) new ram - I'm skeptical about this because each stick tests OK. But I'm extremely happy to be wrong about this because RAM is cheap.
2) new processor - this is the way I'm leaning, if it's a hosed memory controller. That's on the CPU on AMD chips, right?
3) new motherboard - I was kind of thinking it could be this earlier, but the fact that a single stick will test good on any individual slot makes me think it's not just a bad ram slot or something. Again, the memory controller is on the chip, not the mobo, right?

So, SH/SC, what do you all think? Are there any programs that I can use to specifically try to diagnose a hosed memory controller? Or any other approach I should be taking to narrow down what, exactly, the problem is? I'm very much trying to keep this to a single part replacement for financial reasons.

Thanks in advance.

Cyrano4747 fucked around with this message at 02:04 on Jul 5, 2023

Zogo
Jul 29, 2003

Cyrano4747 posted:

I've got an old 1070 kicking around if that's worth testing for some reason. No on-board video so I can't pull the card entirely or I would have already.

Probably not the issue but it wouldn't hurt to try before getting new hardware.

Other things I'd try before new hardware:

-Make sure you're on the latest motherboard BIOS.

-Disconnect all drives except the OS drive (not all drive issues will show up clearly with diagnostic programs).

Cyrano4747 posted:

So, SH/SC, what do you all think?

If the RAM you're using is fully compatible with the motherboard then it's probably okay.

The memory controller is on the CPU, so that would be a good bet. But I wouldn't be surprised if it was a motherboard issue. There's no great way to test besides switching parts out.

Cyrano4747
Sep 25, 2006

Yes, I know I'm old, get off my fucking lawn so I can yell at these clouds.

So a bit of an update.

As best as I can figure out there were two issues going on.

The first is that, after doing a bunch more reading online, it seems like the second gen AMD chips were a bit dodgy running at 3200 in the first place. Lots and lots of comments online - especially surrounding the 2700x I have - about people needing to down-shift to 3133 etc to get the system stable. Also lots of reports of people with 4 sticks only being able to get into the 2133 ballpark (or whatever the major step around there is, not looking at my notes while I type this up). I don't know enough about CPUs and memory controllers to say why, but the thrust I'm getting is that they were slightly dodgy in the first place in that generation and 3200 was a bit aspirational. The tl;dr is that maybe you got a good chip that can hit that, maybe you got unlucky and need to down-shift.
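For a rough sense of what down-shifting actually costs, here is a minimal sketch of theoretical peak bandwidth at the speeds mentioned above. It assumes a dual-channel setup with standard 64-bit DDR4 channels (8 bytes per transfer); the function name is just illustrative, and real-world throughput will be lower than these peak figures.

```python
# Theoretical peak DDR4 bandwidth at a given transfer rate.
# Assumption: 64-bit channels (8 bytes per transfer), dual channel.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(mt_per_s: int, channels: int = 2) -> float:
    """Peak bandwidth in GB/s for a speed given in megatransfers per second."""
    return mt_per_s * 1_000_000 * BYTES_PER_TRANSFER * channels / 1e9


if __name__ == "__main__":
    baseline = peak_bandwidth_gbs(3200)
    for speed in (3200, 3133, 2133):
        bandwidth = peak_bandwidth_gbs(speed)
        loss_pct = (1 - bandwidth / baseline) * 100
        print(f"DDR4-{speed}: {bandwidth:.1f} GB/s peak ({loss_pct:.1f}% below 3200)")
```

By this estimate, dropping to 3133 gives up about 2% of peak bandwidth, while falling back to 2133 gives up about a third.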

The second is that the older set of RAM seems to be slightly dodgy in and of itself. I was able to get clean memtest reports from it, but only at lower memory clocks. I don't know if this is an actual defect in the RAM or if it's me playing RNG lottery some more with how well a specific batch can handle higher speeds - again, I'm also seeing people online talking about some RAM batches liking/not liking higher speeds, and saying that if you really want to make sure you get your full performance you should buy some expensive RAM. This is just some mid-range G.Skill stuff so I assume that might be coming into play as well.

So with that in mind I started loving around with the newer set of RAM and the BIOS voltage, and got it running stable at 3200 when I goosed things to 1.4V instead of the normal XMP profile's 1.35V. Memtest comes clean at that voltage. It's been stable for about four days now, so I'm calling it good.

I still have no idea why my full 32gb showed that pattern of progressively more severe CTD/BSOD behavior, but if I can be functional and stable at 16GB/3200 with some higher voltage I'm perfectly fine for now.

One final question, though: What are the realistic risks of running more voltage through my RAM like this? It's not something I normally gently caress with. I saw someone online saying that it reduced the lifespan of the RAM from "death of the solar system" to "death of the user" time frames, but that was some rando on reddit.

This mobo and CPU are pushing 4 years old now. If I can kick the can another year or two for AM5 to really mature and start getting a bit cheaper then I really dgaf if I'm driving this RAM like it's a rental in the meantime. But if it's going to fry the RAM in the next couple of months I'll have to do something else.

down1nit
Jan 10, 2004

outlive your enemies
More volts directly means more watts. More watts = more heat = potentially melted chips or voltage regulators... is the general gist

Think hotter engine in a corvette or whatever.

Cyrano4747
Sep 25, 2006

Yes, I know I'm old, get off my fucking lawn so I can yell at these clouds.

down1nit posted:

More volts directly means more watts. More watts = more heat = potentially melted chips or voltage regulators... is the general gist

Think hotter engine in a corvette or whatever.

Again, I'm not someone with any kind of experience in this. Using my extremely scientific temperature sampling method of "turn on a benchmark program, reach in the side of my case, and touch the RAM" it's not burning my fingertips or anything. I just don't have a framework to understand if .05V is just goosing things a little bit or getting into :supaburn: territory.

down1nit
Jan 10, 2004

outlive your enemies
Ah, you likely wouldn't be able to feel it. They are low low power devices anymore. The heat might be hot enough to see in a thermal setup but not clearly.

It's not fire time yet at that voltage. You just cranked it past 11. It can go all the way to 20 if you cool it.
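To put a rough number on the 1.35V to 1.4V bump discussed above, here is a back-of-the-envelope sketch. It assumes the usual first-order CMOS approximation that dynamic power scales with voltage squared when capacitance and clock speed stay fixed; it ignores leakage and says nothing about lifespan, so treat it as an estimate of extra heat only. The function name is made up for illustration.

```python
# First-order estimate of extra dynamic power from a voltage bump.
# Assumption: P ~ C * V^2 * f, with capacitance C and frequency f unchanged.


def relative_dynamic_power(v_new: float, v_old: float) -> float:
    """Ratio of dynamic power at v_new to dynamic power at v_old."""
    return (v_new / v_old) ** 2


if __name__ == "__main__":
    xmp_voltage = 1.35     # stock XMP profile voltage from the thread
    raised_voltage = 1.40  # the manually raised setting

    ratio = relative_dynamic_power(raised_voltage, xmp_voltage)
    print(f"Estimated dynamic power increase: {(ratio - 1) * 100:.1f}%")
```

Under that assumption the bump works out to roughly 7.5% more dynamic power per stick, which on parts that only draw a few watts each is a small absolute increase.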
