Duuk
Sep 4, 2006

Victorious, he returned to us, claiming that he had slain the drought where even Orlanth could not. The god-talkers were not sure what to make of this.
Problem description:
Built a new PC around early May. Right after I bought the parts, AM5 became a processor-meltingly hot topic, so I upgraded to the TUF-GAMING-B650-PLUS-WIFI-ASUS-1413 BIOS as soon as it was available.

The PC worked flawlessly from then until now. Zero issues.
Since everything was going well, I decided to move all my stuff over and give my trusty laptop some well-earned rest.

I removed the SAMSUNG 840 EVO 500GB SSD from the laptop and installed it in the PC. I did this on the living room table, so I unplugged everything from the case beforehand. I am 95% sure I switched off the PSU before pulling the power cord out of the case and before plugging it back in later, because that's what I always do, but it's been a stressful month so I can't guarantee I didn't slip up.
The 870 EVO's SATA cable was in the port right beside the GPU. I plugged the 840 EVO's SATA cable into the adjacent port underneath the GPU (the BIOS listed them as ports 4 and 3, I believe). It's possible I brushed against the GPU while working it in (static?), but I didn't manhandle it. The cable clicked into the socket. I left the SSD loose on the bottom of the case so I wouldn't have to bend the power cables (there are two SSD trays on the back of the mobo, with relatively little space between the tray and the back cover). I took power from the same cable that fed the 870 EVO.

When I booted up the PC with the new SSD in, it hung on the login screen background image. No text, just the image. KB and mouse did nothing. A short press of the power button did nothing. After waiting several minutes I did a hard reset.
After this, the PC booted without issues and loaded into Windows, and I got to copying things over (840 EVO to 870 EVO, no traffic on the Kingston). I had copied maybe 80GB when it blue-screened and restarted.

From then on there were 3 or 4 blue screens, each with a different message, 5-20 minutes apart, while I was trying to troubleshoot. I'm sorry I can't give you the exact blue screen texts: the QR codes were displayed too briefly to photograph, and I formatted the drive afterwards to get a clean, fresh Windows install. I can tell you that some of them appeared to be disk-activity related (trying to read and write from the same place, or something like that), based on my quick Google searches while the loving thing decided to work for a few minutes. That kind of made sense, since I had been doing disk-heavy work during the first few sessions of uptime.

There was also an odd moment where the screen flickered and the desktop background image was briefly tiled 2x2, immediately followed by a blue screen or just an automatic restart.

After this first set of blue screens, Windows failed to load, referring to a broken file. I removed the 840 EVO in case it was somehow at fault and reinstalled Windows 10 on the NVMe drive (this is the same format+reinstall I referred to two paragraphs back). Windows loaded up and worked fine for an hour or two. I moved some more stuff over on a USB stick and then fired up Jagged Alliance 3 (Steam on the Kingston, game files on the 870 EVO) as a stress test. I could play for a few dozen minutes, after which something went wrong - I think the game just closed with no warning. In any case, the Steam library view failed to load any of its pictures/text, the browser failed to load any of its pages, and then it crashed again.

On reboot, a Windows file was reported broken again.

I may be off on the exact number and timing of the blue screens. There were several.

I've now retrieved my laptop to post this. I've been hesitant to boot up the PC again before having some kind of plan of action, because it's just been a constant shitfest of blue screens and restarts, and whatever is causing them can't be good for the hardware.

Current status:
The 840 EVO is back in the old laptop. It doesn't have Windows on it, but it cheerfully tallies folder sizes and opens random files quickly, so it seems not to be completely broken.
The Kingston Fury NVMe with the broken Windows is still in the new PC
The 870 EVO 1TB data drive is still in the new PC
I have a brand-new 500GB 870 EVO to install into the new PC as a Windows drive for troubleshooting, because I'm not compromising the 1TB data drive with a Windows install.
I have downloaded the latest drivers and BIOS. Not all of them have been updated since spring, but the BIOS could be significant, I guess. I haven't flashed the BIOS yet, in case somebody tells me it's probably the PSU, in which case flashing could brick the mobo too (?)

Attempted fixes:
Besides the format/windows reinstall and some rushed google searches based on the event viewer, not much.
The plan of action is currently:
- One of you will hopefully say "can't be sure, but it's unlikely to be the PSU". If so, I would flash the BIOS to the latest version and try another format+Windows install on the Kingston
- If the stupid thing stays up long enough to blink, try diagnostic tools - check the Kingston, memtest and ... ?
- If the stupid thing shits up the OS again, install Windows on the new 500GB SSD and see if that stays up. Is it safe to format the Kingston and leave it in so I can check it, or would it be smarter to try without it first?

Recent changes:
I opened the case, installed another SSD and tried copying files. This prompted many blue screens. The SSD is back out, but the problems persist.

Operating system:
Windows 10 Pro 64bit

System specs:
Dell LCD U2722D
CORSAIR 110R Tempered Mid-Tower Case (with 3 extra be quiet! fans (2x140+120) and front panel removed for airflow)
Processor AMD Ryzen™ 5 7600 BOX, 3.80GHz, AM5 (standard fan, temps were ~80-85°C with front panel removed when crumping things in BeamNG according to HWMonitor)
RAM Corsair Vengeance SACRR5032VENG01, DDR5, 32 GB, 4800 MHz
ASUS TUF GAMING B650-PLUS WIFI AM5 (was using the TUF-GAMING-B650-PLUS-WIFI-ASUS-1413 BIOS)
ASUS TUF RTX3060 12GB GDDR6 V2
CORSAIR RMx Series RM850x 80 PLUS Gold (I know it's a bit much, they were cheap)
SAMSUNG 870 EVO 1TB SSD
SAMSUNG 840 EVO 500GB SSD (The issue arose when this was added - although possibly not because of it (?))
KINGSTON FURY 500GB M.2 PCIe NVMe (Windows drive)

Location:
EU

I have Googled and read the FAQ:
Yes

Duuk fucked around with this message at 21:12 on Aug 8, 2023


grack
Jan 10, 2012

COACH TOTORO SAY REFEREE CAN BANISH WHISTLE TO LAND OF WIND AND GHOSTS!

quote:

After this first set of blue screens, Windows failed to load up, referring to a broken file. I removed the 840EVO in case that was somehow at fault and reinstalled Windows 10 on the NVMe drive (this is the same format+reinstall I refer to 2 paragraphs back). Windows loaded up and worked fine for an hour or 2. I moved some more stuff over on a USB stick and then fired up Jagged Alliance 3 (Steam on Kingston, game files on the 870 EVO) to stress test. I could play for a few dozen minutes, after which something went wrong - I think the game just closed with no warning. In any case, the steam library view failed to load any of its pictures/text, the browser failed to load any of its pages and then it crashed again.

It sounds like your 870 EVO is corrupted, since the errors seem to show up whenever this drive is in use. Unplug it and try using Windows for a while on just your NVMe.

If that doesn't work, your NVMe may be borked.

If it does work, get a SATA-to-USB cable, attach your 870 EVO through a USB port and scan it for errors. Don't start Windows with this drive installed.

Two questions:
1. Are all your drives on the same file system? You said the 840 EVO was in an older laptop.
2. Did you set boot priority in BIOS when you installed your old SSD into your Windows machine?

Zogo
Jul 29, 2003

Run https://www.hdsentinel.com/hard_disk_sentinel_trial.php to check drive health.

Also, try using different SATA cables and different SATA ports if possible.

Using onboard video temporarily would also rule out your GPU having an issue.

I wouldn't guess the PSU as a prime culprit at the moment.


edit: beaten by a minute :laffo:

Duuk
Sep 4, 2006

Thanks for the feedback! Quick update, I've been working through the issue bit by bit:

I've flashed the BIOS.
I've taken out the SATA drive and formatted/reinstalled Windows on the NVMe. The computer currently has only the NVMe in it and appears to be stable. I haven't had time to stress test it, but it has a couple of hours of uptime and several nights of sleep without issues.
Hard disk sentinel has given the NVMe a "PERFECT" bill of health. Media and Data Integrity Errors = 0

I bought a USB cable for the SATA drive, going to test it as soon as I have a reasonable evening's time to do it.
I'll check back with the results.

The old 840 EVO and the NVMe are NTFS. I'll check the 870 when I plug it back in over USB.

Edit: I did not specifically check boot priority when I installed the third drive, on the assumption that Windows would find itself (there is no Windows on the 840 EVO). Could this have created an issue?

Duuk fucked around with this message at 20:26 on Aug 14, 2023

grack
Jan 10, 2012

It's possible. Whenever you add a second drive, or even move drives to different ports, you should go into the BIOS and make sure the correct drive is set as the boot drive.

Duuk
Sep 4, 2006

Righto.

Nothing's changed in the configuration. Still just the NVMe.
Was playing Automation, screen goes grainy black, white and red (as if trying to display the image in just those colours, mostly black) and stops.
Then the PC restarts, tries to load into Windows and declares the Windows installation corrupt.

I guess it has to be the NVMe, or the GPU, or the motherboard. Or the PSU against odds. Or the RAM I guess, I'll have to admit I haven't done memtest yet.

It seems unlikely to be the GPU - if that were hosed, the computer might restart, but it shouldn't corrupt Windows (correct me if I'm wrong).
It seems unlikely to be the PSU, the mobo or even the RAM for the same reason - at least in the days of yore, you didn't have to reinstall Windows every time there was a power outage.

All signs point to the NVMe, regardless of the positive bill of health. Sounds about right?
The PC was working flawlessly for a week, including playing Automation for several hours. When I have time I'll tear out the NVMe, put in the little 870 EVO with a fresh Windows install and cross my fingers.

Zogo
Jul 29, 2003

Duuk posted:

All signs point to the NVMe, regardless of the positive bill of health. Sounds about right?

It'd be the most likely thing.

Duuk
Sep 4, 2006

Thanks for the confirmation.
Took out the NVMe. I was reluctant to mess with it physically earlier, for lack of fresh thermal material on hand to replace the pad, but perhaps I should have. There was a spot near the middle where the stock thermal pad clearly hadn't made good contact with the drive.

Now, I was running HWMonitor and never noticed high temps - but I guess if they only peaked at certain moments and the computer instantly shat itself as a result, I would never have seen them in the HWMonitor logs.
What also occurs to me is that the heatsinks on the mobo are massive slabs of metal, meaning that if the thermal pad was squished at the ends and out of contact in the middle, it would have been the NVMe that had to bend out of the way, not the heatsink. Honestly, the OEM thermal pad material seems horribly stiff and bubblegummy.
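A hypothetical sketch of the logging gap described above: if a monitoring log is buffered, the last readings before a hard crash may never reach the disk. The Python below appends one timestamped reading per line and forces each line to disk before taking the next one, so the final sample before a crash has the best chance of surviving. `read_temp` is a placeholder for whatever sensor source is available, not a real monitoring API, and this makes no claim about how HWMonitor itself writes its logs.

```python
import csv
import os
import time
from typing import Callable


def log_samples(read_temp: Callable[[], float], path: str,
                samples: int, interval_s: float) -> None:
    """Append one timestamped reading per line, committing each to disk.

    read_temp is a hypothetical stand-in for a real sensor reader.
    """
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for _ in range(samples):
            writer.writerow([f"{time.time():.3f}", f"{read_temp():.1f}"])
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to commit it to the drive
            time.sleep(interval_s)


# Hypothetical usage, with my_sensor_reader being whatever you can read from:
# log_samples(my_sensor_reader, "temps.csv", samples=3600, interval_s=1.0)
```

Syncing on every sample costs some disk writes, which is the point here: it trades efficiency for a log that is as complete as possible right up to the crash.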

I'm installing Windows on the little SATA drive now. I'll give it some time and see if it sticks. Wouldn't be surprised either way at this point.

Duuk
Sep 4, 2006

Not the NVMe.
Windows installed on the brand-new 500GB 870 EVO (SATA). No other drives in the PC.

Started Automation again and went into a photo scene, the same place where I got the issue last time.
First it crashed to desktop, no other issues. Being an idiot, I started the game up again and went back into the photo scene.
A few seconds in, I got a black image with some white and red pixels (playing-Doom-on-a-pregnancy-test graphics, same as happened with the NVMe). Upon restart:

The operating system couldn't be loaded, because the kernel is missing or contains errors

File: \windows\system32\ntoskrnl.exe
Error code: 0xc0000098
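The `0xc0000098` above is an NTSTATUS value, which Microsoft's NTSTATUS reference lists as `STATUS_FILE_INVALID`. A minimal Python sketch, assuming the documented NTSTATUS bit layout (2 severity bits, a customer bit, a reserved bit, a 12-bit facility and a 16-bit code), splits it into its fields. It only shows that this is an error-severity system status; it says nothing about which piece of hardware caused it.

```python
# Severity values for the top two bits of an NTSTATUS code.
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(value: int) -> dict:
    """Split a 32-bit NTSTATUS value into its documented fields."""
    return {
        "severity": SEVERITY[(value >> 30) & 0x3],  # bits 31-30
        "customer": bool((value >> 29) & 0x1),      # bit 29: set only for non-Microsoft codes
        "facility": (value >> 16) & 0xFFF,          # bits 27-16
        "code": value & 0xFFFF,                     # bits 15-0
    }


print(decode_ntstatus(0xC0000098))
# {'severity': 'error', 'customer': False, 'facility': 0, 'code': 152}
```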

RAM, PSU or mobo? I would need to reinstall Windows to run memtest. The PC only stayed up for 10 minutes after the hardware change, so if it's the RAM and it shits itself again during memtest, I won't know whether it was because of memtest or something else. I'm still not sure how the RAM would consistently break the Windows installation.

Honestly, I'm leaning towards ordering a new PSU first, so that if it's the culprit and giving out unstable voltages, I don't break anything else while troubleshooting.
It would seem to make sense that if the voltages are fucky, it might not be able to load data from any storage device, NVMe or SATA
It would seem to make sense that if the voltages are fucky, then putting load on the GPU would create issues

google:

quote:

On the other hand, the no entry problem may be caused by disk write errors, power outages, boot sector viruses, or errors made while configuring the BCD manually.
Another reason for 0xc0000098 error is an installed incompatible hard drive driver. For example, both the Ntfs.sys and the aswVmm.sys mentioned above are such drivers.

Power outages?

I have not specifically installed any storage drivers - neither NVMe nor SATA. I did download the RAID drivers from the mobo support site, but I didn't install them, I wasn't planning to use RAID.

Zogo
Jul 29, 2003

Duuk posted:

RAM, PSU or mobo?

I'd guess RAM or GPU before motherboard or PSU. But I wouldn't be surprised if it was any of those four things.

Duuk
Sep 4, 2006

Thanks for the feedback, looks like you were right on the PSU. Either that or it was several things at once.

Replaced the PSU and the PC became stable under load. Seeing as the latest two data points with the old PSU were crashes+Win corruptions 2 minutes into an Automation photoscene and with the new PSU the computer handled it just fine, I thought I was on the home stretch.

Shut down the PC yesterday after a stress test (Automation, BeamNG), no issues.
Today it refuses to start, ntoskrnl.exe missing (again).

RAM is next. I'm making a new Windows install stick to rule out the possibility that my installations have been broken from the get-go (I'll run System File Checker too, I guess). Then memory tests: software first, and if the machine dies before giving results, pulling the sticks out one by one and retrying.


I'll be replacing the whole loving thing bit by bit at this rate and wasting hours confirming it's still a piece of poo poo at every step.

Zogo
Jul 29, 2003

Duuk posted:

I'll be replacing the whole loving thing bit by bit at this rate and wasting hours confirming it's still a piece of poo poo at every step.

Yes, that's the annoying thing with errors like this. It could be any piece of the hardware, and unless you have a bunch of machines to swap parts into (to rule them out), the whole process can take a long time. It's even more complicated if two pieces of hardware are bad at the same time. Then there's the delay in troubleshooting when the error only happens every few weeks.
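The point about intermittent faults can be put in numbers under a simplifying assumption: if crashes arrive independently at a constant average rate, with a mean interval of T minutes, then the chance that a still-faulty machine gets through a t-minute test without crashing is e^(-t/T). A short Python sketch of that model follows; the 60-minute interval is a made-up example, and real faults that depend on load or temperature won't follow it exactly.

```python
import math


def p_survives(test_minutes: float, mean_interval_minutes: float) -> float:
    """Chance a still-faulty PC shows no crash during the test (Poisson model)."""
    return math.exp(-test_minutes / mean_interval_minutes)


def minutes_needed(mean_interval_minutes: float, miss_risk: float) -> float:
    """Test length that keeps the chance of a false 'all clear' at miss_risk."""
    return mean_interval_minutes * math.log(1.0 / miss_risk)


# Hypothetical fault that crashes the PC every 60 minutes on average.
print(round(p_survives(60, 60), 3))     # 0.368: a clean 1-hour run is weak evidence
print(round(minutes_needed(60, 0.05)))  # 180: about 3 hours for a 5% miss risk
```

By the same arithmetic, a fault that only shows up every few weeks needs weeks of clean running before a fix can be trusted.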


Probably not the issue, but you might eventually want to try a different keyboard, mouse, monitor etc. and disconnect any nonessential peripherals (printers etc.).

There are rare cases where peripherals cause strange hardware issues.

down1nit
Jan 10, 2004

outlive your enemies
Add a little more voltage to your CPU and RAM. Just a bit, like 0.01-0.06 V, just for a laugh. I've solved quite a few of these by messing with power delivery in the UEFI.

Sometimes by disabling C-states, sometimes by disabling SpeedStep.

Apple had an entire MacBook model year with undervolted CPUs causing random panics (and associated filesystem errors).

Duuk
Sep 4, 2006

Thanks for the feedback.

The PC was sitting for a while as I was busy with other things. The last time I tried to start it, it failed with the ntoskrnl.exe missing BSOD.

I made a memtest USB, stuck it in and tapped F2 to get into the BIOS and adjust the boot order - except I had forgotten that I'd unplugged the keyboard at some point.
The computer booted into Windows no problem. Lo and behold, ntoskrnl.exe had been found!

Windows System File Checker found a couple of errors and fixed them (if I read the log right, they were mostly about something being owned twice or not at all).
Windows Memory Diagnostic found zero RAM errors.
Memtest86 ticked over for four hours and found zero RAM errors (on the second run - the first run found all the errors in the world, because I assumed the CMOS settings were still at factory defaults since I hadn't made any significant changes, which was clearly the wrongest possible thing to assume).

Right.
The hard drives are fine
The PSU is fine
The RAM is fine

That leaves the GPU and the mobo (and the faint but disturbing possibility that something has been lurking in the CMOS settings all along, woken up by adding a drive and not cleared by the BIOS update - if that's even possible).
It seems testing the GPU is easier, I'll have to do some research - ideas welcome.

I'll do the peripheral swapping too. The monitor will be tough to replace, but the others are easy. The harder part is verifying whether the issue is fixed.
I'll leave the voltages until I've exhausted every other option. I know it's not a big deal, I just feel that if the drat thing is stock and under warranty, it should either run or be replaced.

Duuk
Sep 4, 2006

OKAY
I had a brainwave during the week and took out the GPU today to see if it still crashes without it (easiest way to test the GPU right there).

Loaded into Windows no problem. Had enough time to open the browser, and figured I would try to watch an HD video on YouTube or something as some semblance of a stress test.

The screen became artifacted, stripey and hosed up before the home page had loaded, showing some manner of BSOD that I couldn't make out.
Then it settled into a nice clear BSOD with 0xc0000098 ntoskrnl.exe missing.

So it's the motherboard... Unless it's actually the processor? I don't have spare bits to test them separately. The various errors related to the file system would have made me think it was the mobo, but all that also goes through the processor doesn't it, doubly so when the integrated GPU is used?
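The elimination running through this thread can be sketched as a set intersection, assuming a single faulty part: whatever is broken must have been installed during every run that crashed. The runs below are a simplified reading of the posts above (Kingston NVMe only, then the 500GB 870 EVO only, then the replacement PSU, then the replacement PSU with the GPU removed). The drives are left out of the last two runs because the first two already excluded both of them, so this doesn't change the result. Two bad parts at once, as Zogo pointed out, would break the single-fault assumption.

```python
def remaining_suspects(crashing_runs: list[set[str]]) -> set[str]:
    """Parts present in every crashing run (single-fault assumption)."""
    return set.intersection(*crashing_runs)


# Simplified from the posts above; each set lists the parts installed when it crashed.
crashing_runs = [
    {"CPU", "motherboard", "RAM", "GPU", "old PSU", "Kingston NVMe"},
    {"CPU", "motherboard", "RAM", "GPU", "old PSU", "870 EVO 500GB"},
    {"CPU", "motherboard", "RAM", "GPU", "new PSU"},
    {"CPU", "motherboard", "RAM", "new PSU"},
]

print(sorted(remaining_suspects(crashing_runs)))
# ['CPU', 'RAM', 'motherboard']
```

RAM stays in the result because it was never physically swapped out. The clean Memtest86 and Windows Memory Diagnostic runs make it less likely, but a passing memory test is evidence, not proof.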

Duuk fucked around with this message at 19:59 on Oct 7, 2023

Zogo
Jul 29, 2003

Duuk posted:

So it's the motherboard... Unless it's actually the processor? I don't have spare bits to test them separately. The various errors related to the file system would have made me think it was the mobo, but all that also goes through the processor doesn't it, doubly so when the integrated GPU is used?

Yeah, it could be either. Years ago motherboard failures were much more common (compared to CPU failures) but in recent years CPU failures have been increasing.

Duuk
Sep 4, 2006


Zogo posted:

Yeah, it could be either. Years ago motherboard failures were much more common (compared to CPU failures) but in recent years CPU failures have been increasing.

There have been some developments.

I sent both the mobo and the processor in under warranty.
Both were refunded.

I bought a new Ryzen 5 7600X, an Arctic tower cooler and a new mobo (exactly the same model as before) during the 2-month waiting period. I had already replaced the PSU some months ago with a Seasonic Prime PX-750 while the old one was being checked (it came back OK).

The loving thing still doesn't work right.

The behaviour is different though, I can see that the mobo/processor change has changed things and wonder whether they broke some other poo poo on the way out (or the other way around, or the whole initial shipment was struck by lightning)

I've had a few blue screens (IRQL_NOT_LESS_OR_EQUAL), but most commonly the screen just goes black while the fans stay on. Sometimes the screen freezes for a moment first. A few times early on it restarted on its own, but usually I have to hold down the power button.

Event viewer has a recurring error about the Windows Game Bar timing out.
It also says something along the lines of "the computer has rebooted due to a Bug Check", there have been some different versions of the very long alphanumeric code that follows.
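The "rebooted due to a bug check" entry normally includes the stop code, and the IRQL_NOT_LESS_OR_EQUAL screens correspond to bug check 0x0000000A. Grouping crashes by code can show whether they share a pattern. A minimal Python sketch, assuming the event text contains a phrase like `The bugcheck was: 0x0000000a (...)`; that wording is an assumption here, and the sample message is invented because the actual codes weren't posted. The name table covers only a handful of common codes.

```python
import re
from typing import Optional

# A few well-known bug check codes; far from a complete list.
BUGCHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1A: "MEMORY_MANAGEMENT",
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0x116: "VIDEO_TDR_FAILURE",
    0x124: "WHEA_UNCORRECTABLE_ERROR",
}

# Assumed wording of the event text; adjust if your Event Viewer entry differs.
BUGCHECK_RE = re.compile(r"bugcheck was:\s*(0x[0-9a-f]+)", re.IGNORECASE)


def bugcheck_name(message: str) -> Optional[str]:
    """Return the symbolic name of the first bug check code in the message."""
    match = BUGCHECK_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return BUGCHECK_NAMES.get(code, f"unknown (0x{code:08X})")


# Invented example message; the four arguments are placeholders.
sample = ("The computer has rebooted from a bugcheck.  The bugcheck was: "
          "0x0000000a (0x0000000000000000, 0x0000000000000002, "
          "0x0000000000000000, 0x0000000000000000).")
print(bugcheck_name(sample))  # IRQL_NOT_LESS_OR_EQUAL
```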

I can only assume it is either the GPU or the RAM (which previously passed Memtest86 and several rounds of Windows Memory Diagnostic), pretty much the last 2 components of the original shipment.

I've done a couple of BIOS updates and GPU driver updates to no avail. The mobo drivers are up to date.
Usually it crashes a few minutes in, but sometimes it has been stable for longer stretches (e.g. idling the whole night).
Last night I took out one of the sticks of RAM and disabled the GPU (using the integrated graphics instead). It was stable for about an hour, but I'm going to test it a couple more nights.
If it still shits itself, it probably has to be the RAM (?)
If it doesn't poo poo itself, I'll put the other stick of RAM back in and physically remove the GPU.
If it still doesn't poo poo itself, then it has to be the GPU, and I get to wait 70 days for them to confirm it.

...it's not likely to be Windows this time, is it?

Thanks for the help so far.

Zogo
Jul 29, 2003

Duuk posted:

I can only assume it is either the GPU or the RAM (which has passed memcheck and several rounds of windows memory checker previously), pretty much the last 2 components of the original shipment.

A few months back you had crashes without using the GPU but it's always possible more than one piece of hardware has an issue.

Duuk posted:

If it still shits itself, it probably has to be the RAM (?)

No way to be 100% sure yet. Since you have new hardware it could potentially be anything.

Duuk posted:

...it's not likely to be Windows this time, is it?

If it's fully updated it shouldn't be having BSODs.

Duuk
Sep 4, 2006


Zogo posted:

No way to be 100% sure yet. Since you have new hardware it could potentially be anything.

I know you're right. Mildly dreading an infinite loop of buying (or receiving from warranty) new bits that turn out to be also broken.
Like, I built this thing to do something with, not just games but personal projects, and while they are the kind of projects that can be postponed for six months without serious consequences, at some point I'm going to have to give up and buy an overpriced, incompetently specified pre-built turd where if it doesn't work I can just take the whole bloody thing back in one piece and call it their problem.

I've built several computers from scratch, the latest 1.5 years ago, which has worked with zero issues in this same room. I'm not an expert, but I know not to put RAM in the PCI-E slot.
Never seen anything like this.

Anyway, started it up again today, with the dedicated GPU still disabled in Device Manager from yesterday and 1 stick of RAM.
Got a dark screen with the little sky-blue circle (the Windows loading spinner) in the middle, frozen. Sat there for five minutes. Short-pressed the power button and it shut down.
Turned it on again: straight to the Windows login screen.
Event viewer suggests it might have been because of the KB5034441 update which is being widely reported as broken.

About the GPU, yes, I did have problems without it too. The new problems look different (black screen instead of blue, and hanging with the fans on instead of restart/Windows corruption), so it could be anything. As you say, it's possible it was all broken all along.
I'll just give it more time and see what crashes out.

Zogo
Jul 29, 2003

Duuk posted:

Dedicated GPU still disabled through Device Manager from yesterday and 1 stick of RAM.

You should remove the GPU entirely.

Duuk posted:

Event viewer suggests it might have been because of the KB5034441 update which is being widely reported as broken.

AFAIK that shouldn't be causing that.

Duuk
Sep 4, 2006


Zogo posted:

You should remove the GPU entirely.

AFAIK that shouldn't be causing that.

Done. Started up OK, I'll give it a few days and see what comes.

KB5034441 failed to install when I updated Windows the day before yesterday, so I assumed it was trying again yesterday at startup. Otherwise everything is up to date.
I don't know what caused it to hang then, but it was awake enough to shut down from a short press, as opposed to the black-screen crashes where the button has to be held down.


down1nit
Jan 10, 2004

If you want, you can take it to a pro. They will take forever (enjoy waiting for a computer to crash) but they will have all the parts on hand and can do the work of keeping track of poo poo.

For your sanity, though, I hope removing the GPU does it and you're able to exchange it for a not lovely one.
