Farecoal
Oct 15, 2011

There he go

Cygni posted:

I mean it's the same as everything else. Why have airlines cut every perk?

tendency of the rate of profit to fall

Taima
Dec 31, 2006

tfw you're peeing next to someone in the lineup and they don't know

Dr. Video Games 0031 posted:

It's worth noting that if you like to turn on ray tracing whenever possible, then it might help a lot more than 12%. But again, it depends on the game. Spider-man, Callisto Protocol, Witcher 3, HogLeg, and Hitman 3 are a handful of recent games that end up heavily CPU limited (usually single-thread limited) when turning on ray tracing. As in, they frequently struggle to maintain 60fps on zen 3. So it really all depends on what games you like to play.

Oh poo poo, I hadn't thought about this. So you're saying that, for example, the path traced CP2077 update coming out soon will require a beefy CPU as well? With ray tracing becoming more and more ubiquitous, does this mean that the future of PC gaming involves caring more about the CPU, even at 4K?

How pronounced is this interaction? Will upcoming path traced games be as vulnerable to CPU bottlenecking as GPU bottlenecking? Is there any data that shows, for example, how the new X3D chips uplift path tracing (or other high strain RTX features) in particular?

I grabbed everything I needed for the 7800X3D dropping soon and that makes me extra excited to see the uplift on games that really seem to benefit from it like, again, the forthcoming CP2077 path tracing. It's also, frankly, nice to see CPUs starting to matter again at 4K due to the sheer power of the 4090.

Taima fucked around with this message at 13:29 on Mar 24, 2023

71wn
Mar 25, 2018
Any recommendations for AMD GPUs to target for 1440p@240Hz gaming? I'm running a 5800X3D with a Seasonic 750W power supply.

I'm looking for more frames than my 5700xt can deliver these days, and I'm sticking with Radeon since I'm on Linux. Thanks.

sauer kraut
Oct 2, 2004
I mean anything from a 6600 to the 6800XT is great.
The more you pay the more you get.
I'm not a fan of the x900 top models or the RX 7000 chiplet series personally.

Kibner
Oct 21, 2008

Acguy Supremacy
Not going to pretend I fully understand what this means for end-users, but this seems good.

https://twitter.com/JirayD/status/1639319522216755204?s=20

repiv
Aug 13, 2009

https://gpuopen.com/gdc-presentations/2023/GDC-2023-Temporal-Upscaling.pdf

FSR3 is briefly discussed here (page 30) but there's not much to report besides "it's frame generation like everyone assumed"

they specifically use the word interpolation, so it's like DLSS3 in that it adds at least a frame of lag
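
For the latency point: the interpolated frame between N and N+1 can't exist until N+1 has rendered, so presentation of everything slips by roughly one native frame time. A minimal timing sketch (just the arithmetic, not any vendor's actual pipeline):

```python
# Rough timing sketch of frame interpolation latency; assumes a steady 60 fps
# native render rate and ignores the (small) cost of generating the in-between frame.
BASE_FPS = 60
FRAME_TIME_MS = 1000 / BASE_FPS  # ~16.7 ms per natively rendered frame

def present_without_fg(n):
    # Without frame generation, frame N is shown as soon as it finishes rendering.
    return n * FRAME_TIME_MS

def present_with_fg(n):
    # With interpolation, frame N is held back until N+1 has rendered, so it is
    # presented about one frame later, with the interpolated frame slotted in after it.
    return (n + 1) * FRAME_TIME_MS

n = 10
print(f"Added latency at {BASE_FPS} fps native: ~{present_with_fg(n) - present_without_fg(n):.1f} ms")
```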

71wn
Mar 25, 2018
What are your strikes against the 7900's, besides cost?

sauer kraut posted:

I mean anything from a 6600 to the 6800XT is great.
The more you pay the more you get.
I'm not a fan of the x900 top models or the RX 7000 chiplet series personally.

repiv
Aug 13, 2009

Kibner posted:

Not going to pretend I fully understand what this means for end-users, but this seems good.

it's not the kind of pre-compilation you're thinking of

shaders get compiled twice, first from source code to bytecode (which this AMD thing assists with) and then from bytecode to native code (which is what causes stutters if not done properly)
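
As a rough illustration of the two stages (the DirectX flavor; dxc being on PATH and "pixel.hlsl" are hypothetical here, and the second stage appears only as a comment because it lives inside the vendor driver):

```python
# Sketch of the two shader compilation stages described above. Assumes the
# DirectX Shader Compiler (dxc) is on PATH and that "pixel.hlsl" exists.
import subprocess

# Stage 1: source -> portable bytecode (HLSL -> DXIL). This can happen offline,
# at build time; it's the stage the AMD tooling can help with.
subprocess.run(
    ["dxc", "-T", "ps_6_6", "-E", "main", "pixel.hlsl", "-Fo", "pixel.dxil"],
    check=True,
)

# Stage 2: bytecode -> GPU-native ISA. This happens inside the graphics driver
# when the game creates a pipeline state object at runtime. If a game only does
# that the first time a shader is needed mid-gameplay, you get compilation
# stutter; the fix is to create (prewarm) those pipelines during load screens.
```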

Kibner
Oct 21, 2008

Acguy Supremacy

repiv posted:

it's not the kind of pre-compilation you're thinking of

shaders get compiled twice, first from source code to bytecode (which this AMD thing assists with) and then from bytecode to native code (which is what causes stutters if not done properly)

ahh, thanks!

MarcusSA
Sep 23, 2007

71wn posted:

What are your strikes against the 7900's, besides cost?

Honestly that’s it for me tbh. They aren’t bad but they should be cheaper.

sauer kraut
Oct 2, 2004

71wn posted:

What are your strikes against the 7900's, besides cost?

I really don't like the interconnects producing substantial amounts of heat and waste in Ryzen and RDNA3, with zero upside for the consumer.
Any cost savings (if there even are any) are kept by AMD.
Also the drivers for 7 series don't seem mature at all, just from perusing reddit a lot. 6 series are in a very nice spot comparatively.

sauer kraut fucked around with this message at 21:52 on Mar 24, 2023

71wn
Mar 25, 2018
Thanks. I held off upgrading w/ RDNA2 because I was hoping to get a huge uplift in performance by skipping a gen. I'm starting to think the difference in performance is not worth the instability and idle power consumption that RDNA3 delivers at the moment.

I guess my last concern is whether my 750W Seasonic would be sufficient to power a 6900/6950 XT with my 5800X3D. PCPartPicker says "yes," but the GPU spec pages list 850W+.
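
As a rough sanity check on the wattage (approximate reference figures, not measurements of this build):

```python
# Back-of-envelope PSU estimate; all wattages are approximate reference numbers.
gpu_board_power = 335    # RX 6950 XT reference total board power, W (approx.)
cpu_package_power = 142  # 5800X3D PPT limit, W (approx.)
rest_of_system = 100     # motherboard, RAM, SSDs, fans, USB, W (rough guess)

sustained = gpu_board_power + cpu_package_power + rest_of_system
spike_allowance = 1.25   # crude margin for short GPU power excursions

print(f"Estimated sustained draw: ~{sustained} W")                        # ~577 W
print(f"With spike allowance:     ~{sustained * spike_allowance:.0f} W")  # ~721 W
```

On those rough numbers a quality 750W unit still has some margin; the 850W recommendation mostly bakes in headroom for transients and weaker supplies, which lines up with the "I'd wager yes" below.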

sauer kraut posted:

I really don't like the interconnects producing substantial amounts of heat and waste in Ryzen and RDNA3, with zero upside for the consumer.
Any cost savings (if there even are any) are kept by AMD.
Also the drivers for 7 series don't seem mature at all, just from perusing reddit a lot. 6 series are in a very nice spot comparatively.

sauer kraut
Oct 2, 2004
Oh no the 7900 is a disappointing performance uplift on top of all its issues.


If you wanna spring for the full 6900XT chip please do. The 6950XT is just an overjuiced version of the same.
Whether your PSU can handle it, I'd wager yes but :shrug:

sauer kraut fucked around with this message at 22:22 on Mar 24, 2023

71wn
Mar 25, 2018
Good stuff. Didn't realize how close RDNA2 was in performance, at least for my use case.

sauer kraut posted:

Oh no the 7900 is a disappointing performance uplift on top of all its issues.


If you wanna spring for the full 6900XT chip please do. The 6950XT is just an overjuiced version of the same.
Whether your PSU can handle it, I'd wager yes but :shrug:

lordfrikk
Mar 11, 2010

Oh, say it ain't fuckin' so,
you stupid fuck!

sauer kraut posted:

Oh no the 7900 is a disappointing performance uplift on top of all its issues.


If you wanna spring for the full 6900XT chip please do. The 6950XT is just an overjuiced version of the same.
Whether your PSU can handle it, I'd wager yes but :shrug:

I liked my 5700 XT, but only after ~12 months of driver problems on Linux (and Windows, too, IIRC). I was fully set on getting another AMD card despite that, but the performance uplift and pricing are not competitive. Result: got an Nvidia 4090. Setting up Linux there involves a lot of weird gotchas, but gaming works perfectly fine and video encoding/ML is straight-up better than AMD.

71wn
Mar 25, 2018

lordfrikk posted:

I liked my 5700 XT, but only after ~12 months of driver problems on Linux (and Windows, too, IIRC). I was fully set on getting another AMD card despite that, but the performance uplift and pricing are not competitive. Result: got an Nvidia 4090. Setting up Linux there involves a lot of weird gotchas, but gaming works perfectly fine and video encoding/ML is straight-up better than AMD.

Are you running Wayland? I recently switched over to Hyprland (a Wayland compositor), and I can't go back to X at this point.

repiv
Aug 13, 2009

https://arstechnica.com/gadgets/2023/03/nvidia-quietly-boosts-the-video-encoding-capabilities-of-geforce-gpus/

nvidia forgot to tell anyone, but geforce cards now support 5 simultaneous nvenc streams without having to patch the driver
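
If you want to check the new cap yourself, a quick-and-dirty probe looks something like this (assumes an ffmpeg build with NVENC support on PATH; the synthetic test source keeps it self-contained, and encodes beyond the session limit should fail to initialize):

```python
# Launch more concurrent NVENC encodes than the advertised session limit and see
# which ones the driver rejects. Encoded output is discarded.
import subprocess

def start_encode():
    # 30-second synthetic 720p test pattern, encoded with NVENC, written nowhere.
    return subprocess.Popen(
        ["ffmpeg", "-v", "error",
         "-f", "lavfi", "-i", "testsrc=duration=30:size=1280x720:rate=30",
         "-c:v", "h264_nvenc", "-f", "null", "-"],
        stderr=subprocess.PIPE,
    )

procs = [start_encode() for _ in range(6)]  # one more than the 5-stream limit
for i, p in enumerate(procs, 1):
    _, err = p.communicate()
    print(f"encode {i}: {'ok' if p.returncode == 0 else err.decode(errors='ignore').strip()}")
```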

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

repiv posted:

https://arstechnica.com/gadgets/2023/03/nvidia-quietly-boosts-the-video-encoding-capabilities-of-geforce-gpus/

nvidia forgot to tell anyone, but geforce cards now support 5 simultaneous nvenc streams without having to patch the driver

Wow, they updated the GTX 750 Ti, that's impressive. I'm guessing the demand for this is Plex?

Rinkles
Oct 24, 2010

What I'm getting at is...
Do you feel the same way?
Finally tried out RTX super resolution. I was unimpressed by what it did to a 720p Twitch stream, but I'd say it did improve old 480p Onion videos.

I didn't check the power draw, but it warmed up my 3060 Ti by 5-10°C on the highest setting.

wargames
Mar 16, 2008

official yospos cat censor

sauer kraut posted:

Oh no the 7900 is a disappointing performance uplift on top of all its issues.


If you wanna spring for the full 6900XT chip please do. The 6950XT is just an overjuiced version of the same.
Whether your PSU can handle it, I'd wager yes but :shrug:

The reference model for the 7900xt is weak but the AIB from sapphire is quite nice

https://www.youtube.com/watch?v=h-qHtElnTbE

Dr. Video Games 0031
Jul 17, 2004

wargames posted:

The reference model for the 7900xt is weak but the AIB from sapphire is quite nice

https://www.youtube.com/watch?v=h-qHtElnTbE

TechPowerUp reviewed the Pulse, and it was 1.4% faster than the reference model. So I dunno what's happening here. https://www.techpowerup.com/review/sapphire-radeon-rx-7900-xt-pulse/30.html

edit: The other two comparisons I found only test a handful of games, but they show basically the same thing: no real change from the reference model. https://www.pcworld.com/article/1658218/sapphire-pulse-radeon-rx-7900-xt-review.html https://pcper.com/2023/03/sapphire-pulse-radeon-rx-7900-xt-review/

Level1Techs is the outlier here, and their findings are very strange because we basically never see this kind of improvement for AIB cards over the reference models in this day and age. Something had to have gone wrong in their testing.

Dr. Video Games 0031 fucked around with this message at 03:03 on Mar 25, 2023

Cygni
Nov 12, 2005

raring to post

sauer kraut posted:

Oh no the 7900 is a disappointing performance uplift on top of all its issues.


moores law is dead, free lunches are over. the future (hell, present for GPUs) is performant, cheap, or efficient: choose 1.

Dr. Video Games 0031
Jul 17, 2004

Except Nvidia managed to choose 2 for Ada, performant and efficient. It's just not cheap.

Cygni
Nov 12, 2005

raring to post

Dr. Video Games 0031 posted:

Except Nvidia managed to choose 2 for Ada, performant and efficient. It's just not cheap.

yeah, i was mostly just being flippant. i personally think 3d packaging/chiplets will give one last "free" bump, but then the party is over and we pretty firmly enter choose 1 territory.

Kazinsal
Dec 13, 2011

Cygni posted:

yeah, i was mostly just being flippant. i personally think 3d packaging/chiplets will give one last "free" bump, but then the party is over and we pretty firmly enter choose 1 territory.

We'll probably also be dropping the "cheap" option around that time.

Lockback
Sep 3, 2006

All days are nights to see till I see thee; and nights bright days when dreams do show me thee.
Cheap will just be the older, used cards.

Cygni
Nov 12, 2005

raring to post

my (purely pulled out of my rear end) thinking is that cheap parts will increasingly become based on older, monolithic process technologies (so not efficient or performant, but cost effective).

efficient parts would leverage the most efficient possible nodes with more expensive 3d stacking with small dies for mobile and some data center (but not cheap or performant)

and the Big Hoss gaming GPUs and workstation parts will be the most performant combinations of whatever can be put together, but it aint gonna be cheap or power efficient, because that doesnt really matter much for this segment.

This is all based on assumptions from current pricing trends and such, so it could be laughably wrong.

New Zealand can eat me
Aug 29, 2008

:matters:


Cygni posted:

yeah, i was mostly just being flippant. i personally think 3d packaging/chiplets will give one last "free" bump, but then the party is over and we pretty firmly enter choose 1 territory.

I recently read through this article from semi-eng that claimed that we'll be seeing ~30% reduction in power requirements once they get 'true 3d' nailed down. That's the kind of thermal difference that AMD would need to actually retain full overclocking capabilities on their X3D processors. But it also says that a lot of this work is 'theoretical' in that they can physically make it happen, but little effort has been applied to important things like clock sync, and the verification tools don't exist yet.

Really would not mind if the motherboard-processor-gpu-monoblock+aio was all one thing. Abstractly I hope that's where we're headed.

repiv posted:

https://gpuopen.com/gdc-presentations/2023/GDC-2023-Temporal-Upscaling.pdf

FSR3 is briefly discussed here (page 30) but there's not much to report besides "it's frame generation like everyone assumed"

they specifically use the word interpolation, so it's like DLSS3 in that it adds at least a frame of lag

The diagrams they presented made it seem like they're putting extra effort into detecting stale pixels/image kernels, especially for things like snowflakes, and presenting the scene as it should be seen rather than what would otherwise be a ghosty trail. I don't know enough about Nvidia's poo poo to say it doesn't do that, but they were acting like it was novel to have so many optional presentation points throughout frame rendering/generation: post-upscaling, but both pre- and post-resampling.


Also, I found it amusing that they put 650W through a modded & watercooled 7900XTX and it still wasn't getting to 110°C; I think the peak was 103°C. I won't hold this up and say "they totally could have built a 4090 killer like they claimed," given that it's something like a 12-14% performance uplift for 40% more power consumption, and that still falls far short of the 4090's scores in Time Spy Extreme. But! If there ends up being an XTXH model with HBM3, maybe we'll see that happen? My armchair theory is that they'd actually reduce the size of the caches if they do that, to get higher clocks and lean on the HBM?
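
Putting those figures in perf-per-watt terms (using the numbers quoted above, not a new measurement):

```python
# Relative efficiency of the 650W modded card vs stock, per the figures above:
# roughly +12-14% performance for roughly +40% power.
stock_perf, stock_power = 1.00, 1.00
modded_perf, modded_power = 1.13, 1.40  # midpoint of the 12-14% uplift

efficiency_ratio = (modded_perf / modded_power) / (stock_perf / stock_power)
print(f"Perf/W vs stock: {efficiency_ratio:.2f}x")  # ~0.81x, i.e. roughly 19% worse
```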

I understand I'm the only person on planet earth who would pay $1700 and use 700W just to beat nvidia but I don't care I will lick Lisa Su's boot.

New Zealand can eat me fucked around with this message at 07:20 on Mar 25, 2023

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

New Zealand can eat me posted:

I recently read through this article from semi-eng that claimed that we'll be seeing ~30% reduction in power requirements once they get 'true 3d' nailed down. That's the kind of thermal difference that AMD would need to actually retain full overclocking capabilities on their X3D processors. But it also says that a lot of this work is 'theoretical' in that they can physically make it happen, but little effort has been applied to important things like clock sync, and the verification tools don't exist yet.

I mean there will always be the desire to spend more power; there's never gonna be a platonic "time when we couldn't use 10 more watts in each CCD" or whatever.

And I think the mid-term answer is that the cache die goes under the compute die. Route the CCD signals down through through-silicon-vias, or have an "active interposer" where the cache is built into the interposer (effectively the same thing but with the IOD being bigger). Perhaps in the long term yeah you could pull parts of the chip onto each die to optimize efficiency a bit by reducing data movement.

iirc the cache dies are actually a special SRAM-optimized library (or nodelet?) so you can't currently mix-and-match quite like that (or at least you have to revert to the regular nodelet's SRAM cell, so there is a density penalty for mixing-and-matching).

quote:

But! If there ends up being an XTXH model with HBM3, maybe we'll see that happen? My armchair theory is that they'll actually reduce the size of the caches if they do that to get higher clocks and lean on the HBM?

In a way it's funny that we finally got to the "HBM is actually a cache while we slowly pull data from someplace else" dream. But the slow place is quad-pumped GDDR6 running at 34 GT/s and the HBM is actually just a fuckload of TSMC SRAM run on a max-density library/nodelet variant sitting on your IO die chiplets because that's way cheaper than actual HBM. Oh and yesterday's "slow place" pcie bus is now your disk drive and you can load stuff from it via directstorage RDMA at up to 8GB/s (theoretical limit of 4.0x4 NVMe), plus some more from compression.
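
For reference, the ~8GB/s ceiling falls straight out of the link parameters:

```python
# Theoretical bandwidth of a PCIe 4.0 x4 NVMe link.
gt_per_s_per_lane = 16e9   # PCIe 4.0: 16 GT/s per lane
encoding = 128 / 130       # 128b/130b line coding overhead
lanes = 4

gb_per_s = gt_per_s_per_lane * encoding * lanes / 8 / 1e9
print(f"PCIe 4.0 x4: ~{gb_per_s:.2f} GB/s")  # ~7.88 GB/s, i.e. the "up to 8GB/s"
```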

HBM is great, it's just that the cost has never been acceptable. And now it doesn't have the GDDR PHY onboard so you need some die space for that somewhere else too. But like an SRAM cache (RDNA3 or X3D CPUs), it would be interesting to see HBM sprinkled in as a local cache; I have no idea how that would compare in performance terms (bandwidth/latency/etc) to SRAM/MCD.

But yes I'd love to see the evil-supervillain-twin GK210 version of Ada where it's just a reticle-limit monolithic die with stacked HBM2 cache, H100 with an image pipeline. Or a GCD/MCD split (I think it can work, I think AMD just faceplanted with RDNA3) and go reticle-limit GCD and a bunch of MCDs for PHYs and cache. Yeah 4090 is baller, Blackwell will be baller too, but like, 4090 isn't trying as hard as possible here, they could go harder if money weren't an object. Life is good... but it could be better.

Paul MaudDib fucked around with this message at 09:55 on Mar 25, 2023

Zero VGS
Aug 16, 2002
ASK ME ABOUT HOW HUMAN LIVES THAT MADE VIDEO GAME CONTROLLERS ARE WORTH MORE
Lipstick Apathy

Dr. Video Games 0031 posted:

Except Nvidia managed to choose 2 for Ada, performant and efficient. It's just not cheap.

Ah yeah, I can see how efficient they are from the 3-slot 4080 vs the 2-slot 3080: efficiency so big it can't fit in anything.

Dr. Video Games 0031
Jul 17, 2004

Zero VGS posted:

Ah yeah, I can see how efficient they are from the 3-slot 4080 vs the 2-slot 3080: efficiency so big it can't fit in anything.

you do realize that power efficiency isn't measured by cooler size right?

KinkyJohn
Sep 19, 2002

Any tips from 4090 havers on how to get the most machine learning performance out of this card? I was doing 9 it/s generating 512x704 with euler-a; then I updated the CUDA DLLs to 8.11 in the lib folder and now it's doing 15 it/s, but I've heard of 4090s getting up to 60 it/s.

I have xformers installed, although auto moans about it being an older version

Palit 4090 gamerock, Windows 10, amd 5600x cpu if that matters

(crossposting this question from the gbs ai thread because people here might know more)

ConanTheLibrarian
Aug 13, 2004


dis buch is late
Fallen Rib

Cygni posted:

moores law is dead, free lunches are over. the future (hell, present for GPUs) is performant, cheap, or efficient: choose 1.

Substitute 'reasonable power draw' for 'efficiency' and you're spot on.

Zephro
Nov 23, 2000

I suppose I could part with one and still be feared...

71wn posted:

Good stuff. Didn't realize how close RDNA2 was in performance, at least for my use case.

Ooh, that chart is super useful. Those are all "real" frames, right? i.e. no interpolation / AI frame generation stuff?

New Zealand can eat me
Aug 29, 2008

:matters:


Saw this float by on r/amd: 2x EPYC 7313 16-core, 8x Radeon VII 16GB. They're using it to do CFD on what could be "as little as" a high-end gaming PC, using the GPU memory bandwidth to avoid moving TBs of simulation data to and from disk. I didn't see them mention what kind of throughput they're getting, but theoretically there's 8TB/s of aggregate GPU memory bandwidth in this picture. Said they only needed a slight undervolt to be happy. I'm assuming it's in a rack with forced air because they're packed in there like loving sardines.
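
The 8TB/s figure checks out against the per-card spec (Radeon VII: 4096-bit HBM2 at roughly 2 Gbps effective, about 1 TB/s per card):

```python
# Aggregate theoretical memory bandwidth of 8x Radeon VII.
bus_width_bits = 4096          # HBM2 bus width per card
effective_rate_gbps = 2.0      # Gbps per pin (approx. Radeon VII spec)
cards = 8

per_card_gb_s = bus_width_bits * effective_rate_gbps / 8   # ~1024 GB/s
print(f"Per card: ~{per_card_gb_s:.0f} GB/s, total: ~{per_card_gb_s * cards / 1000:.1f} TB/s")
```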



Paul MaudDib posted:

Life is good... but it could be better.

Always this. I wonder if the GPU clocks/throughput aren't high enough yet for it to even be a benefit. Like I mentioned earlier, this 7900XTX hits 3.5GHz doing Blender benchmarks with the memory running at 2600MHz and fast timings enabled. I also tried lower/stock/undervolted GPU clocks with much faster memory speeds and it was nowhere near as quick (-14%). It's too bad we don't have MPT this generation; I bet custom memory timings would do a lot here.

Sucks that we have to wait until 2H2023 to find out how many CDNA3 cores are in the MI300, I bet it's just a couple.

KinkyJohn posted:

Any tips from 4090 havers on how to get the most machine learning performance out of this card? I was doing 9 it/s generating 512x704 with euler-a; then I updated the CUDA DLLs to 8.11 in the lib folder and now it's doing 15 it/s, but I've heard of 4090s getting up to 60 it/s.

I have xformers installed, although auto moans about it being an older version

I would try updating that first, if it won't break anything. Fans should be at 100%. Otherwise it should be the same as AMD: undervolt/underclock the GPU, find the highest stable memory clock, and then back that down a step or two.

Are you sure those are Windows scores? I think I remember reading that disabling Hardware-Accelerated GPU Scheduling will give you +30% or so. Honestly though, gently caress doing this on Windows. Make a little Ubuntu partition and do it there so you don't have to fight the OS to get perf. Instead you can fight the OS in general :v:
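
If you want an apples-to-apples it/s number outside the webui, a minimal diffusers-based check looks something like this (model id, prompt, and step count are placeholders; assumes torch, diffusers, and xformers are installed; the timing includes text encoding and VAE decode, so it will read a bit lower than the webui's sampler-only figure):

```python
# Standalone Stable Diffusion throughput check with Hugging Face diffusers.
import time
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)  # "euler-a"
pipe.enable_xformers_memory_efficient_attention()  # skip this line if xformers isn't installed

steps = 20
_ = pipe("a lighthouse at dusk", width=512, height=704, num_inference_steps=steps)  # warmup

torch.cuda.synchronize()
start = time.time()
_ = pipe("a lighthouse at dusk", width=512, height=704, num_inference_steps=steps)
torch.cuda.synchronize()
print(f"~{steps / (time.time() - start):.1f} it/s")
```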

Dr. Video Games 0031
Jul 17, 2004

New Zealand can eat me posted:

Saw this float by on r/amd: 2x EPYC 7313 16-core, 8x Radeon VII 16GB. They're using it to do CFD on what could be "as little as" a high-end gaming PC, using the GPU memory bandwidth to avoid moving TBs of simulation data to and from disk. I didn't see them mention what kind of throughput they're getting, but theoretically there's 8TB/s of aggregate GPU memory bandwidth in this picture. Said they only needed a slight undervolt to be happy. I'm assuming it's in a rack with forced air because they're packed in there like loving sardines.



I really don't understand how they get decent cooling in these setups. I mean, these aren't even blower coolers, just regular coolers with axial fans. And the reference coolers being used here were kinda poo poo too, if I recall correctly. Even with forced air in a rack, does this really work?

K8.0
Feb 26, 2004

Her Majesty's 56th Regiment of Foot

Dr. Video Games 0031 posted:

Except Nvidia managed to choose 2 for Ada, performant and efficient. It's just not cheap.

Also Ada is cheap, they're just selling it to you at expensive prices.

Nvidia is really killing it from a technology standpoint; it's a shame that AMD is so busy milking the endless Epyc cash cow that they don't have the resources or need to really push to be competitive with GPUs.

Stanley Pain
Jun 16, 2001

by Fluffdaddy

Dr. Video Games 0031 posted:

I really don't understand how they get decent cooling in these setups. I mean, these aren't even blower coolers, just regular coolers with axial fans. And the reference coolers being used here were kinda poo poo too, if I recall correctly. Even with forced air in a rack, does this really work?

Those fans might as well be jet engines :)

Rinkles
Oct 24, 2010

What I'm getting at is...
Do you feel the same way?

K8.0 posted:

Also Ada is cheap, they're just selling it to you at expensive prices.

Do we know this?

K8.0
Feb 26, 2004

Her Majesty's 56th Regiment of Foot
Unless something that has never happened before is happening with the relative cost of producing GPUs, or Nvidia is somehow selling their flagship product at a loss, yes.

The prices of everything below the 4090 are incredibly juiced; comparing the hardware to historical analogs, they should be sold as products about two tiers below what they're marketed as, and priced accordingly.
