pyrotek
May 21, 2004



MarsellusWallace posted:

I've seen this bandied about a bunch of times - what does fast storage get you? I went from a Samsung 850 to a fast NVMe drive and other than file transfers have seen basically no difference. Even load times 'feel' very similar, and I have seen no impact whatsoever to framerates. NVMe has been out long enough that surely developers have had time to take advantage of whatever gains could be had. Is it marketing bupkis, or will game engines now actually benefit from blazing fast storage?

You are comparing going from a SATA SSD to an NVMe SSD. I was comparing to the current-generation consoles, so they are going from a platter drive to an SSD that is at least equivalent to the best currently available on PC.

As far as how it will benefit games, we shall see. This article is something of a puff piece but still contains some interesting information.

gradenko_2000 posted:

The 3600 is 3.6 GHz base / 4.2 GHz turbo
The 3600X is 3.8 GHz base / 4.4 GHz turbo
The 3600XT is 4.0 GHz base / 4.7 GHz turbo (so +200 MHz on the base, +300 MHz on the turbo, compared to the 3600X)

The 3700X is 3.6 GHz base / 4.4 GHz turbo
The 3800X is 3.9 GHz base / 4.5 GHz turbo
The 3800XT is 4.2 GHz base / 4.8 GHz turbo (so +300 MHz on the base and turbo, compared to the 3800X)

The 3900X is 3.8 GHz base / 4.6 GHz turbo
The 3900XT is 4.1 GHz base / 4.8 GHz turbo (so +300 MHz on the base, +200 MHz on the turbo, compared to the 3900X)

So are the XT chips expected to take the old retail prices of the X chips?

sincx posted:

the simple answer is that Sony undersized the PS5 GPU compared to the Xbox Series X (to about the same extent as when Microsoft undersized the Xbone's GPU compared to the PS4), so Sony is trying to get whatever marketing edge they can

no amount of fast storage can make up for the PS5's GPU being hobbled with only 36 CUs, vs the Xbox X's 52 CUs

The CUs on the PS5 are also clocked 22% faster, so the difference isn't as big as it sounds, except for ray tracing, where it will cause a huge deficit in performance.
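
For anyone who wants to check the CU-versus-clock argument, here is a rough sketch of the peak-throughput arithmetic. The CU counts are from the posts above; the clock speeds (2.23 GHz peak for the PS5, 1.825 GHz for the Series X) and the 64-lanes-times-2-ops-per-clock formula are assumptions brought in from the publicly announced specs, not from this thread. It is paper math only and says nothing about sustained clocks or ray tracing.

```python
# Peak FP32 throughput estimate: CUs * 64 lanes * 2 ops per clock (FMA) * clock.
# Clock figures are assumed from the announced specs, not taken from the thread.

def peak_tflops(cus: int, clock_ghz: float) -> float:
    """Theoretical peak TFLOPS for a GPU with `cus` compute units."""
    return cus * 64 * 2 * clock_ghz / 1000.0

PS5_CUS, PS5_CLOCK_GHZ = 36, 2.23    # assumed peak (variable) clock
XSX_CUS, XSX_CLOCK_GHZ = 52, 1.825   # assumed fixed clock

ps5 = peak_tflops(PS5_CUS, PS5_CLOCK_GHZ)
xsx = peak_tflops(XSX_CUS, XSX_CLOCK_GHZ)
clock_advantage = PS5_CLOCK_GHZ / XSX_CLOCK_GHZ - 1   # how much faster each PS5 CU runs
cu_deficit = 1 - PS5_CUS / XSX_CUS                    # how many fewer CUs the PS5 has

print(f"PS5: {ps5:.2f} TFLOPS, XSX: {xsx:.2f} TFLOPS")
print(f"PS5 clock advantage: {clock_advantage:.1%}, CU deficit: {cu_deficit:.1%}")
print(f"PS5 peak as a share of XSX peak: {ps5 / xsx:.1%}")
```

Under those assumed clocks the higher frequency closes part of the gap but not all of it, which is the point both posters are circling.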


HalloKitty
Sep 30, 2005

Adjust the bass and let the Alpine blast

pyrotek posted:

The CUs on the PS5 are also clocked 22% faster, so the difference isn't as big as it sounds, except for ray tracing, where it will cause a huge deficit in performance.

That's not guaranteed. The power budget needs to be shifted entirely to the GPU to provide those clocks, which will result in lower CPU clocks. There's no scenario in which the PS5 ends up with more compute power than the stupidly named Xbox Series X.

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
AMD turbo is basically a fantasy clock speed that might be hit by one core for a few milliseconds, once.

I'm more interested in sustained clocks or the average boost that all the cores can hit regularly. Seems like it's a 200-300 MHz bump across the board.

HalloKitty
Sep 30, 2005

Adjust the bass and let the Alpine blast

Malcolm XML posted:

AMD turbo is basically a fantasy clock speed that might be hit by one core for a few milliseconds, once.

Intel has subscribed to that school of thought these days too

ConanTheLibrarian
Aug 13, 2004


dis buch is late
Fallen Rib
Not to derail the thread too much, but considering the custom compression hardware in the PS5, does that mean a PC with the latest PCIe4 NVMe drive wouldn't be able to pull the same trick with streaming textures?


I suppose if the answer is no, there's always the simple and effective alternative of having enough RAM to store the textures in the first place.

SwissArmyDruid
Feb 14, 2014

by sebmojo
I mean, they can, once the industry stops limiting NVMe drives to x4?

What's the number they cite, 5 GB/sec? As soon as you get some more pins in there that's toast. Even on PCIe 3, x8 lanes will give you 7.88 GB/s of throughput. Which, since PCIe operates on doubling, means you'll be able to hit that on x4 lanes of PCIe 4.

Like, we only just started getting mainstream PCIe 4 SSDs right? Wasn't Samsung showing off a PCIe 4 NVMe drive at CES 2020 like, a billionty years ago?
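
A quick sketch of where the 7.88 GB/s figure comes from, for reference. It assumes the standard per-lane signalling rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and 128b/130b encoding, and it ignores protocol overhead, so real drives land below these numbers. The function name is made up for the example.

```python
# Theoretical PCIe link bandwidth, ignoring packet/protocol overhead.
# Signalling rates and 128b/130b encoding are the published values for gen 3 and 4.

GIGATRANSFERS_PER_S = {3: 8.0, 4: 16.0}   # per lane
ENCODING_EFFICIENCY = 128 / 130           # 128b/130b line code

def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Usable GB/s for a PCIe link of the given generation and lane count."""
    bits_per_s = GIGATRANSFERS_PER_S[gen] * ENCODING_EFFICIENCY  # Gbit/s per lane
    return bits_per_s / 8 * lanes

for gen, lanes in [(3, 4), (3, 8), (4, 4)]:
    print(f"PCIe {gen}.0 x{lanes}: {pcie_gb_per_s(gen, lanes):.2f} GB/s")
```

So a gen 3 x8 link and a gen 4 x4 link come out identical on paper, which is the "doubling" being described.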

SwissArmyDruid fucked around with this message at 17:27 on Jun 5, 2020

repiv
Aug 13, 2009

to make up for the lack of decompression hardware with sheer physical bandwidth you'd also need the data to be uncompressed on the drive though

hope you don't mind 400 GB games

SwissArmyDruid
Feb 14, 2014

by sebmojo
....or you could just download that poo poo compressed/copy it off the physical media compressed, and leave that poo poo compressed at all points up until after it gets streamed off the drive and into the hands of the CPU. You know, instead of compressing it and then decompressing it. Drive space is still at a premium here, they can't just pop the lid off and add another terabyte alongside.

Actually, in that regard, this is slightly brilliant on behalf of Sony. Assuming they can just copy things straight from the disk onto the drive without having to decompress anything, install times should be faster. Do we know what format they're using for compression?

SwissArmyDruid fucked around with this message at 17:40 on Jun 5, 2020

repiv
Aug 13, 2009

SwissArmyDruid posted:

....or you could just download that poo poo compressed/copy it off the physical media compressed, and leave that poo poo compressed at all points up until after it gets streamed off the drive and into the hands of the CPU. You know, instead of compressing it and then decompressing it. Drive space is still at a premium here, they can't just pop the lid off and add another terabyte alongside.

You can't decompress gigabytes per second on a consumer CPU even in a vacuum, never mind do it in the background while also running the rest of a game engine. The consoles can only get away with streaming compressed data at those rates because they have the fixed-function block for decompression; on PC you'd have no choice but to store the data uncompressed and throw more physical bandwidth at the problem to achieve the same throughput.

SwissArmyDruid posted:

Actually, in that regard, this is slightly brilliant on behalf of Sony. Assuming they can just copy things straight from the disk onto the drive without having to decompress anything, install times should be faster. Do we know what format they're using for compression?

Sony are licensing the proprietary Kraken codec from RAD: http://www.radgametools.com/oodlekraken.htm

repiv fucked around with this message at 17:52 on Jun 5, 2020

Klyith
Aug 3, 2007

GBS Pledge Week

ConanTheLibrarian posted:

Not to derail the thread too much, but considering the custom compression hardware in the PS5, does that mean a PC with the latest PCIe4 NVMe drive wouldn't be able to pull the same trick with streaming textures?


I suppose if the answer is no, there's always the simple and effective alternative of having enough RAM to store the textures in the first place.

If you assume that a PS5 game is maxing out the storage interface then no, a current PC drive won't be able to do the same. But that's a big if true assumption.

Textures, for example, are already stored compressed on the drive and stay compressed all the way to the GPU. So the Sony bandwidth-increasing compression trick doesn't work on them. Now maybe the cool thing about the new generation of hardware is that textures are gonna stay the same as before, and super-detailed models plus ray tracing are gonna be the new hotness. Those would have much more benefit from the custom storage system.



Additionally, there's a question of who is going to make these games that use sustained 8GB/s streaming from storage. That implies a few things about the install size of the game and the number of man-hours to create the assets. I'd love to get a breakdown of how much it cost to make a 7 minute demo trailer.

Cygni
Nov 12, 2005

raring to post

I am remaining fully skeptical that it will matter at all outside of exclusives.

Fantastic Foreskin
Jan 6, 2013

A golden helix streaked skyward from the Helvault. A thunderous explosion shattered the silver monolith and Avacyn emerged, free from her prison at last.

Cygni posted:

I am remaining fully skeptical that it will matter at all outside of exclusives.

:same:

Unless it's very easy or very very necessary to use, I have visions of multiplats ignoring any features exclusive to one console or the other.

SwissArmyDruid
Feb 14, 2014

by sebmojo

repiv posted:

You can't decompress gigabytes per second on a consumer CPU even in a vacuum, never mind do it in the background while also running the rest of a game engine. The consoles can only get away with streaming compressed data at those rates because they have the fixed-function block for decompression; on PC you'd have no choice but to store the data uncompressed and throw more physical bandwidth at the problem to achieve the same throughput.


Sony are licensing the proprietary Kraken codec from RAD: http://www.radgametools.com/oodlekraken.htm

Hmmmm. Welp, point still stands, even if the 5 GB/sec is compressed data, throw more PCIe lanes at an SSD to bump the throughput up to match.

This is gonna be absolute rear end for anyone not on Zen3 and later, though. =T

repiv
Aug 13, 2009

Klyith posted:

Additionally, there's a question of who is going to make these games that use sustained 8GB/s streaming from storage. That implies a few things about the install size of the game and the number of man-hours to create the assets. I'd love to get a breakdown of how much it cost to make a 7 minute demo trailer.

I don't think sustained streaming is the point outside of initial loading times, the benefit of having that much bandwidth on tap is latency. Developers will be able to use just-in-time streaming where only the assets required to render the current frame are necessarily resident in VRAM, because if the player moves the camera and brings a non-resident asset into view they can make it resident in a single frame. 9GB/sec effective bandwidth means they can pull 150MB of assets from the disk every frame at 60fps, if they need to.
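
The per-frame number is just bandwidth divided by frame rate. A minimal sketch, using decimal units (1 GB = 1000 MB) as the post does; the extra rows use other bandwidth figures quoted in this thread, and the function name is invented for the example.

```python
# Per-frame streaming budget: how much data can arrive from storage in one frame.

def per_frame_budget_mb(bandwidth_gb_s: float, fps: float) -> float:
    """MB of assets that can be pulled from storage during a single frame."""
    return bandwidth_gb_s * 1000 / fps

for bandwidth in (9.0, 5.5, 2.4):      # GB/s figures mentioned in the thread
    for fps in (30, 60, 120):
        budget = per_frame_budget_mb(bandwidth, fps)
        print(f"{bandwidth} GB/s at {fps} fps: {budget:.0f} MB per frame")
```

Whether a game can actually make an asset resident inside one frame also depends on request latency and on what else the frame is doing, which this arithmetic ignores.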

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
the compression ratios won't be as high as with a true fixed-function block, but most processors currently sold do have a nice little parallel processor right on the die that's normally sitting there idle while you game that can be used for some decompression type stuff. We call it an "iGPU".

AMD havers need not apply, of course, but most of those users already bought more processor than they really needed in order to make up for things like the lack of a decent hardware video encoding block on their AMD graphics cards (since all AMD users are very important streamers with millions of worldwide followers). would suck for 3600 and 3700X owners though, as they'd have neither the iGPU of an 8700K/9900K nor the pure core brawn of the 3900X.

it really won't actually be a thing outside first-party exclusive titles though

Paul MaudDib fucked around with this message at 18:49 on Jun 5, 2020

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

SwissArmyDruid posted:

I mean, they can, once the industry stops limiting NVMe drives to x4?

What's the number they cite, 5 GB/sec? As soon as you get some more pins in there that's toast. Even on PCIe 3, x8 lanes will give you 7.88 GB/s of throughput. Which, since PCIe operates on doubling, means you'll be able to hit that on x4 lanes of PCIe 4.

Like, we only just started getting mainstream PCIe 4 SSDs right? Wasn't Samsung showing off a PCIe 4 NVMe drive at CES 2020 like, a billionty years ago?

nvme physically can't do more than x4

just slap em on a riser card if you must have x16, or do what servers do with x4 u.2 drives and just chain em all up

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

repiv posted:

You can't decompress gigabytes per second on a consumer CPU even in a vacuum, never mind do it in the background while also running the rest of a game engine. The consoles can only get away with streaming compressed data at those rates because they have the fixed-function block for decompression; on PC you'd have no choice but to store the data uncompressed and throw more physical bandwidth at the problem to achieve the same throughput.


Sony are licensing the proprietary Kraken codec from RAD: http://www.radgametools.com/oodlekraken.htm

yes you can, LZO and some other compression algorithms can approach GB/s on a single core. Wasting that single core vs spending silicon area is a tradeoff, and Sony chose to spend the silicon.

repiv
Aug 13, 2009

Malcolm XML posted:

yes you can, LZO and some other compression algorithms can approach GB/s on a single core. Wasting that single core vs spending silicon area is a tradeoff, and Sony chose to spend the silicon.

Approaching 1GB/sec is still a way off from realtime decoding 2.4GB/sec or 5.5GB/sec off the wire :shrug:

Maybe that's enough but :rip: six-core havers like me if we'll need to have an entire core just decrunching data.
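
If anyone wants to put a number on their own machine rather than argue from memory, here is a rough sketch. It uses zlib only because it ships with Python; zlib is a stand-in and is generally slower than LZO, LZ4 or Kraken, so treat the measured figure as a pessimistic floor, not as evidence about those codecs. The synthetic sample data and the helper names are made up for illustration. The rate is measured on the compressed input side, since the 2.4 and 5.5 GB/s figures are raw off-the-drive rates.

```python
import math
import os
import time
import zlib

def measure_input_gb_s(payload: bytes, repeats: int = 5) -> float:
    """Best-of-N single-thread rate at which zlib consumes compressed input, in GB/s."""
    compressed = zlib.compress(payload, 6)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        out = zlib.decompress(compressed)
        best = min(best, time.perf_counter() - start)
    assert out == payload  # sanity check: the round trip is lossless
    return len(compressed) / best / 1e9

def cores_needed(target_gb_s: float, per_core_gb_s: float) -> int:
    """Whole cores required, assuming decompression scales perfectly across cores."""
    return math.ceil(target_gb_s / per_core_gb_s)

# Hypothetical sample: half random, half zeros, so it compresses roughly 2:1.
sample = b"".join(os.urandom(1024) + bytes(1024) for _ in range(4096))  # 8 MiB
rate = measure_input_gb_s(sample)
print(f"zlib on this machine: {rate:.2f} GB/s of compressed input per core")

for target in (2.4, 5.5):                 # raw rates quoted in the thread
    for per_core in (rate, 1.0, 4.0):     # measured, ~LZO claim, best-case claim
        print(f"{target} GB/s at {per_core:.2f} GB/s per core: "
              f"{cores_needed(target, per_core)} core(s)")
```

The perfect-scaling assumption is generous; a real engine would also pay for memory bandwidth and cache pollution, which is part of the case for a fixed-function block.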

Pablo Bluth
Sep 7, 2007

I've made a huge mistake.
I suspect completely bypassing the CPU when loading data to the GPU memory will be more important than the data compression side. Here's an NVIDIA devblog about their direct GPU storage solution, where they claim up to an 8x throughput speedup when the CPU doesn't have to make a copy in main memory as part of the process.

karoshi
Nov 4, 2008

"Can somebody mspaint eyes on the steaming packages? TIA" yeah well fuck you too buddy, this is the best you're gonna get. Is this even "work-safe"? Let's find out!
Also power. A fly-poop of silicon hardwired to a decompression algorithm is probably 3+ orders of magnitude more power efficient than a big fat OOO x86 core with its uncore posse. Also also, better latency, relevant for video games.

BlankSystemDaemon
Mar 13, 2009




Some Goon posted:

NVMe only pulls meaningfully ahead at higher queue depths than the average consumer uses their SSD, for games the difference isn't perceptible.

The PS5 is supposed to have dedicated decompression hardware such that it can stream assets directly from storage in 'new and revolutionary' way, but Sony's been pretty mum about the console as a whole so other than some game devs totally swearing it'll change everything, nobody knows.
If it's based off FreeBSD like PS4 was (signs point to yes, since it's supposedly backwards compatible with it), this is as simple as them using ZFS with transparent ARC LZ4 compression, which is done entirely in software.
It loving owns, here's my daily-driver laptop's output of 'top':
pre:
last pid: 76109;  load averages:  1.85,  1.40,  1.19; battery: 100%  up 4+23:29:40    22:28:16
94 processes:  1 running, 93 sleeping
CPU:  7.1% user,  0.0% nice,  3.4% system,  0.7% interrupt, 88.8% idle
Mem: 1262M Active, 1601M Inact, 319M Laundry, 11G Wired, 1917M Free
ARC: 8218M Total, 6993M MFU, 182M MRU, 420K Anon, 160M Header, 883M Other
     6574M Compressed, 13G Uncompressed, 2.10:1 Ratio
Swap: 2048M Total, 2048M Free
Basically, I'm getting about 7GB of memory completely free.
With zstandard, it'll likely be closer to 14GB or 21GB with 3:1 or 4:1 ratios which zstd can easily accomplish.
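
For the curious, a small sketch of how the "memory saved" figure falls out of the ARC line in that top output. It assumes the compressed footprint (6574M) stays fixed and only the ratio changes, which is one reading of the estimate above and not something top reports; the function name is invented for the example. Holding the uncompressed data fixed instead would give smaller savings.

```python
# Memory saved by ARC compression, derived from the "Compressed" size and the ratio.
# Assumption: the compressed footprint stays at 6574 MiB while the ratio varies.

def arc_saved_mib(compressed_mib: float, ratio: float) -> float:
    """MiB of RAM saved: uncompressed size (compressed * ratio) minus compressed size."""
    return compressed_mib * (ratio - 1)

COMPRESSED_MIB = 6574   # from the top output above

for label, ratio in [("lz4, as measured", 2.10),
                     ("zstd, hoped-for", 3.0),
                     ("zstd, hoped-for", 4.0)]:
    saved_gib = arc_saved_mib(COMPRESSED_MIB, ratio) / 1024
    print(f"{label} at {ratio:.2f}:1 saves about {saved_gib:.1f} GiB")
```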

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

repiv posted:

Approach 1GB/sec is still a way off from realtime decoding 2.4GB/sec or 5.5GB/sec off the wire :shrug:

Maybe that's enough but :rip: six-core havers like me if we'll need to have an entire core just decrunching data.

The best can hit 4GB/s on a skylake core

Pablo Bluth
Sep 7, 2007

I've made a huge mistake.
https://www.youtube.com/watch?v=4ehDRCE1Z38

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

D. Ebdrup posted:

If it's based off FreeBSD like PS4 was (signs point to yes, since it's supposedly backwards compatible with it), this is as simple as them using ZFS with transparent ARC LZ4 compression, which is done entirely in software.
It loving owns, here's my daily-driver laptop's output of 'top':
pre:
last pid: 76109;  load averages:  1.85,  1.40,  1.19; battery: 100%  up 4+23:29:40    22:28:16
94 processes:  1 running, 93 sleeping
CPU:  7.1% user,  0.0% nice,  3.4% system,  0.7% interrupt, 88.8% idle
Mem: 1262M Active, 1601M Inact, 319M Laundry, 11G Wired, 1917M Free
ARC: 8218M Total, 6993M MFU, 182M MRU, 420K Anon, 160M Header, 883M Other
     6574M Compressed, 13G Uncompressed, 2.10:1 Ratio
Swap: 2048M Total, 2048M Free
Basically, I'm getting about 7GB of memory completely free.
With zstandard, it'll likely be closer to 14GB or 21GB with 3:1 or 4:1 ratios which zstd can easily accomplish.

In-memory page compression is completely different from the NVMe stuff and is frankly something that could be done by either company.

Klyith
Aug 3, 2007

GBS Pledge Week

Cygni posted:

I am remaining fully skeptical that it will matter at all outside of exclusives.

Well obviously only Sony exclusives are gonna spend much time optimizing for the Sony-only storage controller. But the baseline of fast storage on both consoles will matter, and may eventually make the SATA vs NVMe gap on PC matter.


But it's definitely not gonna be any difference at all for at least 2021, because the new consoles are expected to be pretty expensive and they're launching into an economic collapse. Games gonna do cross-gen for at least a year in all likelihood. It's gonna be a bit before we even get rid of the HDD stink.

Fantastic Foreskin
Jan 6, 2013

A golden helix streaked skyward from the Helvault. A thunderous explosion shattered the silver monolith and Avacyn emerged, free from her prison at last.

The other thing to consider is that consoles only have 16GB of total memory, whereas your PC will have 16GB of system memory and X dedicated VRAM, so it's much more important that the console can load and unload assets quickly to minimize RAM usage. Of course, it all depends on how developers map out their RAM usage.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Klyith posted:

Well obviously only Sony exclusives are gonna spend much time optimizing for the Sony-only storage controller. But the baseline of fast storage on both consoles will matter, and may eventually make the SATA vs NVMe gap on PC matter.


But it's definitely not gonna be any difference at all for at least 2021, because the new consoles are expected to be pretty expensive and they're launching into an economic collapse. Games gonna do cross-gen for at least a year in all likelihood. It's gonna be a bit before we even get rid of the HDD stink.

My assumption, unburdened by actual facts, was that cross-gen games would use higher quality textures/geometry on the newer gen, and use the faster IO path to get them into play as quickly as the older gen systems could handle the lesser assets. That might require some asset duplication, I guess, or using Kraken/etc in software to decompress on older systems, but at least for digital delivery you could sort that all out at manifest or install time, right?

SwissArmyDruid
Feb 14, 2014

by sebmojo

Malcolm XML posted:

nvme physically can't do more than x4

just slap em on a riser card if you must have x16, or do what servers do with x4 u.2 drives and just chain em all up

I think you're conflating NVMe with M.2 here. M.2 is electrically limited to x4 lanes because that's what the connector supports. NVMe doesn't (shouldn't) give a poo poo what the interface is.

SwissArmyDruid fucked around with this message at 23:25 on Jun 5, 2020

BlankSystemDaemon
Mar 13, 2009




U.2 is the superior interface. :colbert:

v1ld
Apr 16, 2012

Pablo Bluth posted:

I suspect completely bypassing the CPU when loading data to the GPU memory will be more important than the data compression side. Here's an NVIDIA devblog about their direct GPU storage solution, where they claim up to an 8x throughput speedup when the CPU doesn't have to make a copy in main memory as part of the process.

The DMA from SSD direct to memory, with decompression not blowing CPU caches and related data paths, is definitely a touted feature. Thanks for the NVIDIA paper link.

I wonder if we'll see GPUs add some of these features over time. Would it be possible to have the GPU fetch data from the SSD and implement the decompress itself? It'll be interesting to see if GPUs implement any of these features directly. I.e., if some of the I/O complex here could move into the GPU (phone screenshot as you can see):


Cerny called the phase of pulling data into memory the Check In and quoted one Zen 2 core as being needed in some cases for just that copy overhead. He also quotes up to 9 x Zen 2 cores for the Kraken decompression. Even assuming those are the lower clock cores in the PS5 and he's taking a high usage boundary case, that's still a lot of CPU in better situations.

Boy, Cerny gives good presentation - he lays out the motivation for each feature so well. Spends a lot of time at the end on the #CUs vs sizeof(CU) discussion so that's obviously a sore point. But still, one of the cooler tech design presentations I've seen in a while. Even the reason for the abysmally slow patching on a PS4 is evident now.

SwissArmyDruid
Feb 14, 2014

by sebmojo

D. Ebdrup posted:

U.2 is the superior interface. :colbert:

Um, ackshually, u.3 is the superior interface.

But all of these interfaces, they can be iterated on. Here's hoping NVMe 2 or some poo poo allows copies to bypass the CPU.

ConanTheLibrarian
Aug 13, 2004


dis buch is late
Fallen Rib

v1ld posted:

He also quotes up to 9 x Zen 2 cores for the Kraken decompression.

Looking forward to justifying the need for a 4950X so as to keep up with the might of the new consoles.

WhyteRyce
Dec 30, 2001

U.3 is unnecessary garbage. U.2 supremacy.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

SwissArmyDruid posted:

Um, ackshually, u.3 is the superior interface.

But all of these interfaces, they can be iterated on. Here's hoping NVMe 2 or some poo poo allows copies to bypass the CPU.

Wait, NVMe can’t do direct DMA on copies? I thought it only had to touch the CPU to decompress. That sounds like madness. Can it at least bypass cache for those ops?

Worf
Sep 12, 2017

If only Seth would love me like I love him!

I miss IDE

Kraftwerk
Aug 13, 2011
i do not have 10,000 bircoins, please stop asking


Don’t. Please. SCSI and ISA too goddam. Those ribbon cables give me nightmares and those drat molex connectors that cut my fingertips every time I tried to plug them in or remove them. Or that godawful clump of tangled cables that sat in an unsightly ball in the middle of the computer. We’re seriously spoiled with all those modular power supplies and SSDs that plug straight into the board with no unsightly cables.

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

SwissArmyDruid posted:

I think you're conflating NVMe with M.2 here. M.2 is electrically limited to x4 lanes because that's what the connector supports. NVMe doesn't (shouldn't) give a poo poo what the interface is.

yep, yes m.2 cannot physically deal with more than x4

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

D. Ebdrup posted:

U.2 is the superior interface. :colbert:

kinda surprised it didn't take off. PCIe on a cable is pretty great! slap 4x of them for the gpu...

WhyteRyce
Dec 30, 2001

Subjunctive posted:

Wait, NVMe can’t do direct DMA on copies? I thought it only had to touch the CPU to decompress. That sounds like madness. Can it at least bypass cache for those ops?

I thought you can't DMA to anything other than memory, like most other interfaces.


Worf
Sep 12, 2017

If only Seth would love me like I love him!

Kraftwerk posted:

Don’t. Please. SCSI and ISA too goddam. Those ribbon cables give me nightmares and those drat molex connectors that cut my fingertips every time I tried to plug them in or remove them. Or that godawful clump of tangled cables that sat in an unsightly ball in the middle of the computer. We’re seriously spoiled with all those modular power supplies and SSDs that plug straight into the board with no unsightly cables.

everything is going according to plan

E; actually i cant even fake it. plopping an m2 ssd right into the mobo fuckin rules
