Hace
Feb 13, 2012

<<Mobius 1, Engage.>>
I was promised video cards, but all I have are Windows Phones :(

Rastor
Jun 2, 2001

Factory Factory posted:

Moore's law isn't dead and buried. It's an economic rule: the number of transistors that can be economically placed on a chip doubles every 24 months (the separate claim that chip performance doubles every 18 months came from a different Intel executive). Scaling with new processes is important to this, but it's not all there is to it. A number of times in the past, new nodes have taken more than two years, yet Moore's Law has held up regardless.


"the CPU scaling predicted by Moore’s Law is now dead. CPU performance no longer doubles every 18 months."

-- Bill Dally, chief scientist and senior VP of research, Nvidia

"While many have recently predicted the imminent demise of Moore’s Law, we need to recognize that this actually has happened at 28nm. From this point on we will still be able to double the amount of transistors in a single device but not at lower cost. And, for most applications, the cost will actually go up."

"Transistors are expected to cost more in the future; something Moore’s Law doesn’t prescribe."

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.

He says, misquoting Moore's Law.

And cost-per-wafer is also misleading when the industry has gone from processing 150mm wafers to 300mm and is now moving to 450mm.

Moore's Law is about the total economics of putting transistors into chips, not about chip performance.
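
To put rough numbers on the wafer point: usable wafer area grows with the square of the diameter, so each size jump roughly quadruples the dies per wafer. A back-of-the-envelope sketch in Python (the 100mm^2 die and the crude edge-loss correction are my own illustrative picks, not anyone's real yield model):

    import math

    def dies_per_wafer(diameter_mm, die_area_mm2):
        # Gross dies: usable wafer area over die area, minus a crude
        # correction for the partial dies lost around the wafer's edge.
        wafer_area = math.pi * (diameter_mm / 2) ** 2
        edge_loss = math.pi * diameter_mm / math.sqrt(2 * die_area_mm2)
        return int(wafer_area / die_area_mm2 - edge_loss)

    for diameter in (150, 300, 450):
        print(f"{diameter}mm wafer: ~{dies_per_wafer(diameter, 100)} dies")

Going from 150mm to 300mm slightly more than quadruples the die count (edge losses matter proportionally less on bigger wafers), which is why cost-per-wafer alone says little about cost-per-chip.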

Rastor
Jun 2, 2001

The articles I cited are about the total economics of putting transistors into chips getting worse, something Moore’s Law doesn’t prescribe.

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.
Even that's not the end of innovation. For example, Samsung's V-NAND is set to double capacity per die every year until 2017... all on a 40nm node, and it's already denser than 16nm planar NAND. Process node is not the final say in economic scaling when you can use more layers of an older, cheap node.
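
To sketch the layering math (normalized numbers of my own, not Samsung's): planar cell area scales roughly with the square of the feature size, so a 40nm cell takes about (40/16)^2 ~ 6x the area of a 16nm one, and stacking claws that back quickly:

    # Normalized bits-per-area: stacked 40nm NAND vs. a 16nm planar die.
    relative_density_40nm = (16 / 40) ** 2  # ~0.16x the 16nm planar density
    for layers in (1, 8, 24, 32):
        gain = layers * relative_density_40nm
        print(f"{layers} layer(s) of 40nm: {gain:.2f}x a 16nm planar die")

Eight layers already beats the planar die, and 32 layers is ~5x denser, all without fighting a cutting-edge node's yields.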

Maybe building upwards is less relevant to GPUs and CPUs than to NAND. Mobile power needs disfavor re-using older processes, but they also don't rule it out. All I'm saying is that there's some more to be had. Maybe not much. Maybe none from process nodes. Maybe progress has slowed. But there's a little more progress yet to be had.

Rastor
Jun 2, 2001

I did specifically say "there are plenty of tweaks left". :)

Rime
Nov 2, 2011

by Games Forum
At this point the vast gains need to come in the form of code being written in a non-lazy and non-lovely fashion anyways. There's oceans of performance left to be wrung out of the hardware we have today, hundreds of miles worth of fancy-rear end instruction sets that nobody bothers to use, favoring poo poo that dates back to the 90's or worse. It's depressing to see how much CPU and GPU power is just left sitting on the table, unused, each generation. :(

SlayVus
Jul 10, 2009
Grimey Drawer

Factory Factory posted:

Maybe building upwards is less relevant to GPUs and CPUs than to NAND. Mobile power needs disfavor re-using older processes, but they also don't rule it out. All I'm saying is that there's some more to be had. Maybe not much. Maybe none from process nodes. Maybe progress has slowed. But there's a little more progress yet to be had.

Doesn't building in 3D allow for more transistors in a smaller package, though? I thought this was why Intel was going to build CPUs in 3D.

Rime posted:

At this point the vast gains need to come in the form of code being written in a non-lazy and non-lovely fashion anyways. There's oceans of performance left to be wrung out of the hardware we have today, hundreds of miles worth of fancy-rear end instruction sets that nobody bothers to use, favoring poo poo that dates back to the 90's or worse. It's depressing to see how much CPU and GPU power is just left sitting on the table, unused, each generation. :(

Freaking computer games! Always using two cores when we have up to 16 available. Maybe current-gen consoles will make multithreaded CPU processing more likely on PC, since they resemble PCs so closely.

SlayVus fucked around with this message at 00:37 on Sep 9, 2014

isndl
May 2, 2012
I WON A CONTEST IN TG AND ALL I GOT WAS THIS CUSTOM TITLE

SlayVus posted:

Freaking computer games! Always using two cores when we have up to 16 available. Maybe current-gen consoles will make multithreaded CPU processing more likely on PC, since they resemble PCs so closely.

Concurrency is a serious bitch to work with and takes incredible pre-planning to make it work right. Games in particular have it rough in that everything is slaved to responding to real-time inputs with as little delay as possible - it's not like, e.g., video encoding, where you hit a button and wait for the result to be churned out.
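
To make the contrast concrete, here's a minimal sketch (mine, not from any real engine) of the frame-budget constraint: fan the per-frame jobs out to a pool, and they all still have to land inside ~16.7ms, every single frame:

    import time
    from concurrent.futures import ThreadPoolExecutor, wait

    FRAME_BUDGET = 1 / 60  # seconds per frame at 60 FPS

    def physics(dt): time.sleep(0.004)  # stand-ins for real per-frame work
    def ai(dt):      time.sleep(0.003)
    def audio(dt):   time.sleep(0.002)

    with ThreadPoolExecutor(max_workers=3) as pool:
        jobs = [pool.submit(f, FRAME_BUDGET) for f in (physics, ai, audio)]
        done, late = wait(jobs, timeout=FRAME_BUDGET)
        if late:
            print("blew the frame budget - visible hitch")

An encoder that runs 5% slow just finishes a bit later; a game that runs 5% slow misses vsync every frame and feels terrible.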

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.

SlayVus posted:

Doesn't building in 3D allow for more transistors in a smaller package, though? I thought this was why Intel was going to build CPUs in 3D.

Yeah, that's the idea. But it may not be the best strategy for CPUs for a couple reasons.

1) Leakage, i.e. the power use of the circuit. Larger process nodes require more voltage to operate at the same clock speed and, as a result, can be limited in their maximum clock speed when the same uarch on a smaller node is not. We haven't seen this much lately because Intel's uarch changes have overwhelmed the effect, with each new design inherently less able to clock high. But in a mobile, TDP-limited world, higher power use both at idle and under load can be a dealbreaker.

2) Heat density. For about 8-10 years now, the fundamental limit of CPU and GPU performance has been heat production and cooling the chip. Every time the process shrinks, the 95W or whatever of the processor gets packed into a smaller and smaller space. If you stack a chip very high, you could have trouble cooling the lower layers of the chip well enough for it to function.
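
Rough arithmetic of my own to show the squeeze (die areas are illustrative, not any specific product):

    # The same ~95W TDP packed into ever-smaller die areas.
    tdp_watts = 95
    for node_nm, die_mm2 in [(32, 216), (22, 160), (14, 120)]:
        print(f"{node_nm}nm, {die_mm2}mm^2: {tdp_watts / die_mm2:.2f} W/mm^2")

Stacking layers multiplies that figure again, with the bottom layers insulated by the silicon above them.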

Arzachel
May 12, 2012

Factory Factory posted:

Yeah, that's the idea. But it may not be the best strategy for CPUs for a couple reasons.

1) Leakage, i.e. the power use of the circuit. Larger process nodes require more voltage to operate at the same clock speed and, as a result, can be limited in their maximum clock speed when the same uarch on a smaller node is not. We haven't seen this much lately because Intel's uarch changes have overwhelmed the effect, with each new design inherently less able to clock high. But in a mobile, TDP-limited world, higher power use both at idle and under load can be a dealbreaker.

Leakage is actually worse on smaller nodes at the same voltage, since larger gates are better at keeping electrons from slipping past them. Per-shader performance isn't nearly as important for GPUs as per-thread performance is for CPUs, though, so the clockspeed <-> die size tradeoff might be workable for GPUs. Your second point still stands, so it's all conjecture anyway.

Welmu
Oct 9, 2007
Metri. Piiri. Sekunti.

Grim Up North posted:

Wasn't it the 19th for the actual NDA lift/paper launch?

nVidia is supposedly unveiling the Maxwell GTX 980/970 cards on September 18th at their 24-hour GAME24 event.

Leaked specs for the GTX980:

[chart: leaked GTX980 spec sheet]

SeaGoatSupreme
Dec 26, 2009
Ask me about fixed-gear bikes (aka "fixies")
What's a decent "max sane temp" to shoot for with a GTX 750? I've got it sitting stable at 1410MHz base and a 5.8GHz memory overclock. It is jammed in a tiny SFF case with a single 80mm intake fan and peaks in the mid-seventies during synthetic benchmarks. Can I push for any higher, or should I just be happy with what I have? It's already like a 40% overclock and I seriously didn't expect that from a tiny, quiet card like this.

Palladium
May 8, 2012

Very Good
✔️✔️✔️✔️

Nostrum posted:

AMD should sue Qualcomm for allowing them to make the worst business decision of the century. Even the judge would give them an "aww, you dumb kids!" kind of look before dismissing it.

You forgot failing to make the potential Nvidia merger a reality (JHH is a genius at running a company compared to the AMD jokers), purchasing ATI outright at a massively inflated price, and then retiring the ATI brand when it was more publicly recognized than AMD itself.

Rime
Nov 2, 2011

by Games Forum
Even today, I still slip up and refer to a card as ATI...

1gnoirents
Jun 28, 2014

hello :)

SeaGoatSupreme posted:

What's a decent "max sane temp" to shoot for with a GTX 750? I've got it sitting stable at 1410MHz base and a 5.8GHz memory overclock. It is jammed in a tiny SFF case with a single 80mm intake fan and peaks in the mid-seventies during synthetic benchmarks. Can I push for any higher, or should I just be happy with what I have? It's already like a 40% overclock and I seriously didn't expect that from a tiny, quiet card like this.

It's the standard 95 degrees, but I don't know if it throttles before that. I highly doubt you'll hit that before you hit stability issues, though, so I'm not sure I'd worry about it.

SeaGoatSupreme
Dec 26, 2009
Ask me about fixed-gear bikes (aka "fixies")

1gnoirents posted:

It's the standard 95 degrees, but I don't know if it throttles before that. I highly doubt you'll hit that before you hit stability issues, though, so I'm not sure I'd worry about it.

Stability goes down the tubes at 82C/1550MHz; if I pop off the side panel of this case, I'm pretty sure I could keep this stable around 1525MHz. It's absurd. Why are the stock clocks on these things so low? Did I get a miracle chip?

Hace
Feb 13, 2012

<<Mobius 1, Engage.>>
What are you using to test it? Have you tried Unigine Heaven?

1gnoirents
Jun 28, 2014

hello :)
Yes, that is ludicrous, to me at least. I think 1300 is more average. Don't be shocked if it crashes during actual gameplay, but if it's stable with Heaven (or something) you won't have to back off much.

efb

SeaGoatSupreme
Dec 26, 2009
Ask me about fixed-gear bikes (aka "fixies")
I've been using Heaven and just letting it sit for a half hour waiting for temps to stabilize or for it to crash, whichever happens first. It hasn't been throttling, so I'm gonna just assume it really is a miracle chip then! I think I'll leave it at 1450ish MHz and be done with it; the temp spikes a bit to get that last little bit of speed, and it's not worth cooking my CPU in this tiny case.

Zero VGS
Aug 16, 2002
ASK ME ABOUT HOW HUMAN LIVES THAT MADE VIDEO GAME CONTROLLERS ARE WORTH MORE
Lipstick Apathy
I don't know much about encoding, but does anyone here know if GPUs with hardware H.264 support will be able to accelerate H.265 in any way? Or will current CPUs be able to encode it effectively? I'm very interested in how it can give the same video quality at roughly half the bitrate, because I stream a lot of my games over Steam In-Home Streaming and Comcast is starting to look at me funny when I'm maxing out my 10 Mbps up for hours on end.

1gnoirents posted:

It's the standard 95 degrees, but I don't know if it throttles before that. I highly doubt you'll hit that before you hit stability issues, though, so I'm not sure I'd worry about it.

You can't overvolt the card, and even at the highest clocks on single-fan models I haven't seen it come close to 95C; just clock it as high as you can without crashing.

BurritoJustice
Oct 9, 2012

1gnoirents posted:

Yes, that is ludicrous, to me at least. I think 1300 is more average. Don't be shocked if it crashes during actual gameplay, but if it's stable with Heaven (or something) you won't have to back off much.

efb

1300 isn't really average. There are 750ti's that have stock clocks of like 1250 and still have mad headroom. I have two friends with 750ti's and both get 1400+. They are really good cards.

1gnoirents
Jun 28, 2014

hello :)
Yeah I'm just basing that off of vague reviews I remember from a while ago.

Khagan
Aug 8, 2012

Words cannot describe just how terrible Vietnamese are.
At worst the September 18th event is an announcement of an announcement. At best it's just a paper launch with JHH holding an ES or wooden mockup.

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.

Zero VGS posted:

I don't know much about encoding, but does anyone here know if GPUs with hardware H.264 support will be able to accelerate H.265 in any way? Or will current CPUs be able to encode it effectively? I'm very interested in how it can give the same video quality at roughly half the bitrate, because I stream a lot of my games over Steam In-Home Streaming and Comcast is starting to look at me funny when I'm maxing out my 10 Mbps up for hours on end.

Don't consider this definitive, but I don't think so re: any current hardware being h.265-ready. H.265 is like... 5x to 10x more compute intense at the same resolution, and nobody has fixed-function encode/decode ready. Broadwell will have only partial hardware decode - most of the algorithm will actually be run on shader hardware in order to make it work. Nvidia and AMD haven't said anything, but I think that means they're not doing much better.

In terms of encoding, well:

[chart: h.265 encode benchmarks, in frames per second]

An 8-core Haswell-E can manage about 2.5 FPS of 4K encode. 4K is between 4x and 16x more complex than 1080p in h.265, so even 1080p is still only going to come out to 10-40 FPS using an entire octocore CPU.

It's possible to offload this to the GPU and see great performance gains (there are some OpenCL encode/decode packages being developed), but without fixed-function hardware, that's going to come at the expense of game performance.
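
Spelling out that scaling arithmetic (the 4x-16x range is the complexity estimate above, not a measurement):

    fps_4k = 2.5  # 4K h.265 encode on an 8-core Haswell-E, per the chart
    for factor in (4, 16):
        print(f"1080p at {factor}x less complex: ~{fps_4k * factor:.0f} FPS")

So even in the best case, real-time 1080p encode eats an entire octocore, with nothing left over for the game itself.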

Don Lapre
Mar 28, 2001

If you're having problems you're either holding the phone wrong or you have tiny girl hands.

Factory Factory posted:

Don't consider this definitive, but I don't think so re: any current hardware being h.265-ready. H.265 is like... 5x to 10x more compute intense at the same resolution, and nobody has fixed-function encode/decode ready. Broadwell will have only partial hardware decode - most of the algorithm will actually be run on shader hardware in order to make it work. Nvidia and AMD haven't said anything, but I think that means they're not doing much better.

In terms of encoding, well:

[chart: h.265 encode benchmarks, in frames per second]

An 8-core Haswell-E can manage about 2.5 FPS of 4K encode. 4K is between 4x and 16x more complex than 1080p in h.265, so even 1080p is still only going to come out to 10-40 FPS using an entire octocore CPU.

It's possible to offload this to the GPU and see great performance gains (there are some OpenCL encode/decode packages being developed), but without fixed-function hardware, that's going to come at the expense of game performance.

Isn't encoding a lot more stressful than decoding, though?

Zero VGS
Aug 16, 2002
ASK ME ABOUT HOW HUMAN LIVES THAT MADE VIDEO GAME CONTROLLERS ARE WORTH MORE
Lipstick Apathy

Don Lapre posted:

Isn't encoding a lot more stressful than decoding, though?

Probably, but I need to both encode and decode; I'm streaming from my house to my friends' houses and work so that I don't have to lug a Deep Silence everywhere I go.

Edit: Kinda crazy that a Raspberry Pi can encode h.264 but nothing can handle h.265 so far.

Zero VGS fucked around with this message at 02:52 on Sep 10, 2014

DaNzA
Sep 11, 2001

:D
Grimey Drawer
Apple's A8 can already do FaceTime with H.265 for some magical reason :q:

Which means it should be able to do encode/decode using the same chip later on.

Methylethylaldehyde
Oct 23, 2004

BAKA BAKA

Zero VGS posted:

Probably, but I need to both encode and decode; I'm streaming from my house to my friends' houses and work so that I don't have to lug a Deep Silence everywhere I go.

Edit: Kinda crazy that a Raspberry Pi can encode h.264 but nothing can handle h.265 so far.

h.264 and h.265 are hilariously asymmetric in terms of horsepower to encode vs. horsepower to decode. 1200 seconds to encode a video using the x264 High profile, ~9 seconds to decode it using the same processor. That's the cost of having such nice video in such a small package. 15 years ago it would have been witchcraft of the highest sort. Now it's just that thing we use to share cat videos online.
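
As a ratio (using the post's own timings):

    encode_s, decode_s = 1200, 9  # x264 encode vs. decode of the same video
    print(f"encode is ~{encode_s / decode_s:.0f}x the work of decode")

~133x, which is exactly what you want from a distribution codec: pay the cost once at encode time, play it back cheaply everywhere.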

Lowen SoDium
Jun 5, 2003

Highen Fiber
Clapping Larry

Zero VGS posted:

Kinda crazy that a Raspberry Pi can encode h.264 but nothing can handle h.265 so far.

Not really, considering the Raspberry Pi has a hardware encoder that lets it encode H.264 at a decent rate, whereas H.265 is pretty new and there really aren't any hardware encoders available yet, so it has to be done (mostly) in software. H.265 is also an order of magnitude more complex than H.264.

It will be a while before you see H.265 streaming in Steam, not just for the reasons above but also because low-latency encoders and decoders for it are still a ways off.

Factory Factory
Mar 19, 2010

This is what
Arcane Velocity was like.
A bit late compared to other sites, but AnandTech finally reviewed the Radeon R9-285. They're indeed calling it GCN 1.2, and there's a bit more in there about what makes it different and how it keeps performance up on a lower thermal budget. Besides the framebuffer compression and geometry frontend improvements, the card also works its mojo by hobbling FP64 from 1/4th the FP32 rate to 1/24th (1/16th? I think the article says two different things).

As we saw in GK110 in Titan (and reinforced in the GeForce 780), FP64 is a power-hungry, compute-focused feature. By throttling FP64 compute, a lot of thermal budget is freed up. In the case of the Titan and other GK110 products, the budget is repurposed to higher clocks for increased graphics/gaming performance. On the R9-285, it was used to deliver R9-280 performance at only 10W more than an R9-270X.
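
For a sense of scale, a hypothetical sketch (the FP32 throughput is a number I made up; the ratios are the real part):

    fp32_gflops = 3300.0  # made-up FP32 throughput for a midrange card
    for label, divisor in [("1/4", 4), ("1/16", 16), ("1/24", 24)]:
        print(f"FP64 at {label} rate: {fp32_gflops / divisor:.0f} GFLOPS")

Dropping from 1/4 to 1/24 cuts FP64 throughput sixfold, and the power those double-precision units would have burned goes back into the clock and TDP budget.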

But that's not to say that the R9-285 isn't a compute card. Rather, it's built for efficient and consumer-focused compute. It introduces FP16 instructions particularly helpful to media and content creation GPGPU - especially in power-constrained mobile GPUs. GCN 1.2 also has some (currently black-box) improvements to GPGPU task scheduling and context-switching. This is presumably the product of AMD's 2012 roadmap goal to bring APU HSA features to discrete GPUs. That said, a new uarch, less RAM bandwidth, and less RAM - expect both regressions and advantages depending on workload.

The hardware video decoder is up to 4Kp60 h.264 (Level 5.2). Apparently previous GCN decoders were supposed to do Level 5.1/4Kp30, but there were bugs and they left it disabled. But now AMD has caught up to the Nvidia Maxwell and Intel Broadwell decode blocks. And the R9-285 is the best gaming card that can do 4Kp60, though I'm sure that won't last long. The encoder block has also been buffed.

Conspicuously missing, though? H.265 support in any form. Nobody has a full hardware decode block yet, but AMD isn't even doing a hybrid shader-based decode like Nvidia and Intel are. Though there's nothing stopping AMD from adding it in a driver update, like Nvidia just did for Kepler.

It looks like companies are re-using their R9-280 coolers, so on the lower-TDP R9-285, cards should be quiet as balls. E: Or at least you'd think. Apparently despite a 60W board power reduction, the at-the-wall savings were only 13W. :wtc: The article suggests that a stock R9-280 doesn't draw its full board power when gaming.

Mantle is a big issue, surprisingly. Because it's so low-level, Mantle on Tonga is not optimized. Mantle is apparently RAM-hungry, and Tonga has less RAM. It's impressive that such a low-level API runs at all on a new hardware revision, and further that there are no bugs or errors. But performance of Mantle + Tonga on BF4 and Thief is below Direct3D performance. And bringing performance up to snuff requires game patches, rather than driver updates. So, y'know, good luck there.

The article talks about everything I mentioned here and more, in greater depth. Pro read for giant nerds.

Factory Factory fucked around with this message at 21:32 on Sep 10, 2014

1gnoirents
Jun 28, 2014

hello :)
I think they'd have been better off not releasing that card before the others :/ Regardless of its actual improvements, I can guess what the general GPU-buying public is going to think of it.

Agreed
Dec 30, 2003

The price of meat has just gone up, and your old lady has just gone down

1gnoirents posted:

I think they'd have been better off not releasing that card before the others :/ Regardless of its actual improvements, I can guess what the general GPU-buying public is going to think of it.

I was going to be a jackass and say

"That it is very good at what it does and AMD isn't asleep at the wheel?"

but then I read http://store.steampowered.com/hwsurvey/videocard/ and woah

Edit: For what it's worth, this particular line of criticism is probably misplaced, if for no other reason than what's good for the goose is good for the gander - nV launched the 750 Ti to show off what they can do with architectural improvements on the same process; now it's AMD's turn. :shrug:

Agreed fucked around with this message at 00:11 on Sep 11, 2014

Hace
Feb 13, 2012

<<Mobius 1, Engage.>>

For what it's worth, I don't think the survey is able to tell who has an R7/R9 card, for whatever reason.

1gnoirents
Jun 28, 2014

hello :)

Agreed posted:

I was going to be a jackass and say

"That it is very good at what it does and AMD isn't asleep at the wheel?"

but then I read http://store.steampowered.com/hwsurvey/videocard/ and woah

Edit: For what it's worth, this particular line of criticism is probably misplaced, if for no other reason than what's good for the goose is good for the gander - nV launched the 750 Ti to show off what they can do with architectural improvements on the same process; now it's AMD's turn. :shrug:

That list is somewhat shocking. But I wonder what the "other" category is, since that is a massive percentage.

I tried to compare the 285 to the 750 Ti, but it just doesn't have that same feel. The 750 Ti was new and exciting. I know these aren't tangible or real things I'm talking about, but I feel like this release could have been handled a little differently.

Seamonster
Apr 30, 2007

IMMER SIEGREICH

I was hoping for an unfucking of Crossfire so two of these would crank on 4K gaming reasonably well, but then 2GB of 256-bit memory? That also means the added expense of the 4GB cards will make Crossfiring even more of a no-go.

Sidesaddle Cavalry
Mar 15, 2013

Oh Boy Desert Map
970 stuff is leaking in parts of the interwebs. Was there a GM210 or some professional-application part for Maxwell hinted at in any roadmap? I can smell money burning already at the possibility of another derivative for consumers again.

Yaoi Gagarin
Feb 20, 2014

Seamonster posted:

I was hoping for an unfucking of Crossfire so two of these would crank on 4K gaming reasonably well, but then 2GB of 256-bit memory? That also means the added expense of the 4GB cards will make Crossfiring even more of a no-go.

I'm pretty sure two of those wouldn't be enough for 4K even with more VRAM. 4K is in 2x780ti/2x290x territory right now.

The Lord Bude
May 23, 2007

ASK ME ABOUT MY SHITTY, BOUGIE INTERIOR DECORATING ADVICE

VostokProgram posted:

I'm pretty sure two of those wouldn't be enough for 4K even with more VRAM. 4K is in 2x780ti/2x290x territory right now.

Even a pair of 780 Tis or 290Xs is only just good enough, kinda like using a GTX 770 for 1440p.

Guni
Mar 11, 2010
Is the 970/980 going to be a significant improvement over, say, a 290 at 1440p? Debating whether to pick a 290 up now or wait till they're out.

Do we know prices?
