|
hi i'm a gpu guy. i work on graphics at video games, and previously i worked on the mali gpu driver to try and make it not suck (i wasn't successful). i'm planning some actual serious posts about why gpus suck, but for now, post about your matrox g200 station or your $800 s3 savage card. agp master race
|
# ? Sep 29, 2017 07:17 |
|
|
|
let's start off with what a gpu actually is. you know what a cpu is, right? imagine like a pentium 2. it's single-core, sucks at branching, and if you interrupt it, it has to think for a while before caring. if these things crash or segfault, the whole gpu basically breaks, so DON'T DO THAT. on all the console platforms i work on, if we have a gpu crash we have to literally unplug the kit and reboot it from scratch.

now imagine thousands of these cores grouped up in bizarre ways, all running in parallel. for strange and unnecessary reasons, gpu vendors prefer not to tell you how many of these cores a chip actually has, but even consumer gpus have thousands. nvidia calls them "cuda cores" and amd calls them "stream processors", but they're basically not-very-powerful cpus. they don't need to be powerful. usually there's 2 or 4 cpus (w/ alu) per fpu.

the way people program these things in the graphics world is known as a "shader". the gpu vendor provides a shader compiler which compiles from some high-level source to a gpu-specific (and sometimes private) isa. usually the input is shader model bytecode, but a lot of gpus also support glsl (poorly) and some others support spir-v. amd and intel publish their isas; no other vendors do. people have reverse engineered some of the others over time -- for instance, the people behind the nouveau open source drivers have some docs on the tesla gpu isa.

these really aren't complicated cores. let's talk about pixel shaders for now. when the gpu wants to render a polygon, every tiny pixel inside that polygon is run through the shader. scheduling these things happens in fixed-size groups -- nvidia calls this a "warp" (32 threads on their hardware), amd calls it a "wavefront", and most other vendors either don't have a name or say "thread group". so, for illustration, imagine 16 pixels being calculated at a time, in a 4x4 box.
this is done because instruction decode takes a lot of surface area, so they share one front end across a large group of cores and run them together in lockstep. this also means that branching and looping can be pretty expensive. if one pixel shader needs to loop 10 times but the other 15 only need to loop once, all 16 pixel shaders will loop 10 times, but the extra iterations will basically "not work" -- the memory writes they do will be masked away.

in the early days of gpus you didn't even have dynamic looping -- loops were implemented by being completely unrolled (copy/pasting the code N times). branching was also rare and expensive, and was done with a conditional mov rather than actual flow control.

by the way, i'm mostly talking about the big desktop vendors: amd / intel / nvidia. android is a whole other shitshow: vivante, arm mali, powervr, and adreno are the big players, i think, now that broadcom and ti dropped out.

hopefully this was interesting and cool. ask me questions about stuff. maybe next time we'll talk about tiling gpus vs. "brute force" gpus
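that lockstep masking can be sketched as a toy simulation in python. this is purely illustrative -- real hardware does this with execution masks in silicon, warp sizes vary by vendor, and nothing here is a real gpu api:

```python
# toy simulation of SIMD lockstep execution with write masking.
# each "lane" is one pixel shader invocation in the same warp/wavefront.
# illustrative only -- not how any real driver or isa exposes this.

def run_warp(loop_counts, body):
    """run `body` per lane; all lanes step together for max(loop_counts)
    iterations, with writes masked off once a lane's own loop is done."""
    n_lanes = len(loop_counts)
    results = [0] * n_lanes
    total_steps = 0
    for i in range(max(loop_counts)):       # every lane pays for the longest loop
        for lane in range(n_lanes):
            total_steps += 1                # the alu cycle is spent either way...
            if i < loop_counts[lane]:       # ...but the write is masked away
                results[lane] = body(results[lane])
    return results, total_steps

# 15 lanes need one iteration, one lane needs ten:
counts = [1] * 15 + [10]
results, steps = run_warp(counts, lambda acc: acc + 1)
print(results)  # each lane's value equals its own loop count
print(steps)    # 16 lanes x 10 iterations = 160 lane-steps, not 25
```

the punchline is in `steps`: even though only 25 "useful" iterations were requested, the warp burns 160 lane-steps, which is exactly why divergent loops hurt.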
|
# ? Sep 29, 2017 08:28 |
|
oh wait, first i wanna say: gpu drivers suck balls. every single gpu driver cheats and is absolutely horrible to work with. game vendors do not get secret access to any special nvidia debugging tools or anything. the driver will strcmp your process name, do nasty tricks, and basically bite you when you least expect it.

let me give a quick example. back in the 90s, engines wanted their games to go fast, so they started preloading all their textures at level load time. nvidia noticed that engines were preloading a texture to the gpu and then not actually using it right away, so they decided to "optimize their driver" by not actually loading the texture onto the gpu until you draw with it. this caused frame drops mid-level when the texture finally loaded in, but nvidia got to market their lovely dumb driver as "loads faster", gaming magazines tested this, and consumers bought the marketing hype.

so then games, during load times, started drawing a 1x1 square off screen in the top left to convince nvidia to actually load the texture. nvidia started checking for 1x1 sized rect draws and skipping those too. so the typical advice nowadays is to draw "random" sized rects (the engine i work on has a pattern that alternates between 1x4 and 1x6) so nvidia's driver actually loads poo poo at the loading screen. and since nvidia did it and got fast review scores, every other vendor started doing it too, but slightly differently, because nvidia didn't exactly tell people how the trick worked. so now everybody has to do this stupid rear end trick and figure out the "right" way to do it so it works on all gpus, and this is just tribal industry knowledge.

this is one example of many that i could give you about how lovely drivers are. the vendors have also lobbied consistently against compliance suite tests, because they claim they can't actually pass any of these "unrealistic" tests without breaking a ton of games and apps.
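the warm-up trick described above, sketched in python. the `bind_texture` / `draw_rect` names are made up for illustration (a real engine would issue these as tiny off-screen d3d/gl draws); the point is just the alternating, never-1x1 rect sizes so the driver can't pattern-match the warm-up away:

```python
# hypothetical sketch of the texture "warm-up" trick from the post.
# bind_texture/draw_rect are stand-ins for real graphics api calls;
# alternating 1x4 and 1x6 matches the pattern the post describes.
from itertools import cycle

def warm_up_textures(textures, bind_texture, draw_rect):
    sizes = cycle([(1, 4), (1, 6)])        # never 1x1, and not constant
    for tex, (w, h) in zip(textures, sizes):
        bind_texture(tex)                  # make the driver consider it "used"
        draw_rect(x=0, y=0, w=w, h=h)      # tiny draw in the top-left corner

# usage with dummy callbacks, just to show the call pattern:
calls = []
warm_up_textures(
    ["rock", "grass", "sky"],
    bind_texture=lambda t: calls.append(("bind", t)),
    draw_rect=lambda x, y, w, h: calls.append(("draw", w, h)),
)
print(calls)  # bind/draw pairs alternating between 1x4 and 1x6 rects
```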
nvidia also does not give you any ability to debug their gpus. there's no printf debugging, no breakpoint / step debugger, no disassembler to figure out what your shader code compiles into. with every major aaa game release there is an associated nvidia driver that "optimizes" the game. i'd say more about what this poo poo actually is, but i've already posted too much.

cool guy and forums user baldurk writes renderdoc for a living, and it's the best thing we have in terms of gpu debugging. all it does is record direct3d api calls, save them, and let us replay them and tweak the parameters. he doesn't work for a gpu vendor.
|
# ? Sep 29, 2017 08:38 |
|
wat
|
# ? Sep 29, 2017 09:02 |
|
I had an s3 card once (s3 virge or something) and it was a total piece of poo poo, but then again it was 2d only, which is still a concept I find weird to think about. it was one of those pieces of hardware that would get a specific box in system requirements for "this will not run on the following, get a good computer you scrub". op, have you considered coding your drivers in node.js???
|
# ? Sep 29, 2017 09:08 |
|
hm, you're painting a very grisly picture regarding debug tools, why is there a voice in my head telling me that the situation is not _that_ grim..? maybe i'm thinking of working with mesa or something? are you completely off linux nowadays? since you mention direct3d and such
|
# ? Sep 29, 2017 09:16 |
|
I haven't used Linux for like a year
|
# ? Sep 29, 2017 09:40 |
|
bah, tried to find my opencl error log with "struct float3 aka struct ___do_not_use_float3_type_not_supported___" but it's nowhere to be found
|
# ? Sep 29, 2017 09:41 |
|
Someone in the hn thread wanted me to make a thread for my gpu ramblings.
|
# ? Sep 29, 2017 09:41 |
|
Suspicious Dish posted:Someone in the hn thread wanted me to make a thread for my gpu ramblings. ty, I look forward to reading more of this thread. also, the fact that it's the Windows GPU drivers that detect and "optimize" the render pipelines for AAA titles may be a little bit of why there's a performance difference with the same games on the same hardware under other operating systems (of course there's plenty of blame to go around; there are also developers who can't handle API impedance mismatch)
|
# ? Sep 29, 2017 10:01 |
|
what do you think of the use of GPGPU stuff for machine learning, do you think it'll lead to GPUs being more consistent due to the need for repeatability?
|
# ? Sep 29, 2017 10:03 |
|
also MechWarrior 2 with hardware acceleration on the Mac was real nice, I wish it hadn't been a pack-in with only certain cards, it would've been awesome on a G4 with a Radeon something
|
# ? Sep 29, 2017 10:06 |
|
I used to have to go to many trade show meetings with the HW vendors, listen to their waffle, and then try to blag as much free hardware as possible for the studio. I never had the heart to tell the vendors that unless they gave us money to cover the costs, we would mostly ignore what they said and just try to get the drat thing running on the shittiest 3 year old gpus out there (S3s - hah!*) as that's what most customers actually have. if it worked on them, then high end cards would work too.

* nVidia was first out the gate with HW vertex shaders; S3 were a year or so behind. unfortunately they discovered a crash bug in the clipping after they had built the cards, so the last S3 consumer cards had it all turned off. then they had to focus on cheap OEM stuff that made embedded intel gpus look good. then they went bust. which was a shame, cos the S3 trade show tactic was to hand you hw the minute you walked in (we didn't want it), and then crack open a case of beer and bitch about ati/nvidia being boring farts at trade shows.
|
# ? Sep 29, 2017 10:10 |
|
eschaton posted:what do you think of the use of GPGPU stuff for machine learning, do you think it'll lead to GPUs being more consistent due to the need for repeatability? I have no thoughts on anything related to that, except that apparently Google uses machine learning and somehow Google Now thinks I live twenty miles south of where I do, so I don't trust any of that junk. If machine learning is going to continue to be a giant black box of an algorithm, then the GPUs can be as inconsistent as they want and nobody cares
|
# ? Sep 29, 2017 10:17 |
|
Suspicious Dish posted:back in the 90s, engines wanted their games to go fast so they started preloading all their textures at level load time. nvidia noticed that engines were preloading a texture to the gpu and then not actually using it right away, so they decided to "optimize their driver" by not actually loading the texture into the gpu until you actually draw with it. this caused frame drops mid-level when the texture loaded in but nvidia got to market their lovely dumb driver as "loads faster" and gaming magazines tested this and consumers bought the marketing hype. This is really interesting; I knew about the big card manufacturers' "cheats" for benchmarks and stuff like that (I remember reading that they could detect when 3dmark was playing and change their behavior) but I never assumed that cheating would have adverse effects on games... So now nvidia can claim it "loads textures faster" but it has to explain why its frame rates drop randomly mid game... or did they pass that blame off on the game developers? Do triple a studios really not have access to nvidia/ati reps? I would have assumed they'd fall over themselves to either give developers tips or even change their driver behavior per title (I had thought the "game specific driver tweaks" were made IN CONJUNCTION with the game developers, not totally isolated from them). Why wouldn't either want developers/the studio earnestly saying "our poo poo works better with <whatever> because they worked with us and made sure it did."
|
# ? Sep 29, 2017 13:49 |
|
ADINSX posted:So now nvidia can claim it "loads textures faster" but it has to explain why its frame rates drop randomly mid game... or did they pass that blame off on the game developers? Oh sweet summer child. Of course they passed on the blame.
|
# ? Sep 29, 2017 13:56 |
|
so I'm trying to get ffmpeg running nice and fast on my dinky old server box just for fun, and I notice it has CUDA support. great, cool. I go and get my old graphics card and shove it in (only like 5 years old at this point) and no, actually, it's the wrong kind of CUDA; you need the kind of CUDA that does video, which is only available on the newest cards. what the heck?
|
# ? Sep 29, 2017 14:36 |
|
this is extremely interesting and I love video cards that go vroom vroom, which is why I use only the top nvidia cards for my videogames. good thread op. edit: first 3d card: voodoo extreme 3 feet long version
|
# ? Sep 29, 2017 14:44 |
|
Suspicious Dish posted:so then games, during load times, draws a 1x1 square off screen in the top left to convince nvidia to actually load the texture. nvidia started checking for 1x1 sized rect draws and then skipping those too. so the typical advice nowadays is to actually draw "random" sized rects (the engine i work on has a pattern that alternates between 1x4 and 1x6) so nvidia's driver actually loads poo poo at the loading screen. I thought vulkan/metal/whatever was supposed to put an end to this.
|
# ? Sep 29, 2017 14:45 |
|
akadajet posted:I thought vulkan/metal/whatever was supposed to put an end to this. my prediction: it will put an end to it until a manufacturer notices they can juice their benchmark numbers by ignoring the spec (iow, it won't at all, because they'll be doing out-of-spec, visually lovely-looking, but better numbers tricks right from the beginning)
|
# ? Sep 29, 2017 14:48 |
|
i thought you could step through debug shaders. i've never done it, but nvidia's nsight tool advertises it can - catch is you need a second machine since the GPU is halted
|
# ? Sep 29, 2017 14:50 |
|
so i was doing some ML stuff at work, and we have a bunch of servers with as many titans as we can fit/buy at the time (because there are shortages) shoved into them for this purpose. outside of the display card, OSes and GPGPU libraries have no real concept of handling scheduling, so processes can't share cards; if someone starts using one, they're going to bogart the whole thing no matter what they're doing. this is really great in a research environment, because people love repls and jupyter notebooks, so they will hold onto cards even when their program is doing absolutely nothing. i used to have a script to grab a card and loop forever on demonstration days so I wouldn't be surprised by finding no free resources
|
# ? Sep 29, 2017 15:02 |
|
opencl or cuda? or are they both terrible? also thanks for the thread, always wondered how much more fuckery is going on after the quake/quack thing and what nvidia is doing with those 'game optimized' drivers
|
# ? Sep 29, 2017 15:06 |
|
lancemantis posted:
it's so horrible. i wonder how the big hpc systems do it
|
# ? Sep 29, 2017 15:26 |
|
Thank you for this thread suspicious dish
|
# ? Sep 29, 2017 16:17 |
|
nice thread. I know gpu vendors, like all proprietary hardware vendors, are just terrible shitheads about hiding their implementation. however, we do have an unreliable narrator here. my experience with game devs is that they will do the most awful hacks imaginable if they find that it saves them a microsecond of render time. and of course those hacks break on every new chip / os / driver version.
|
# ? Sep 29, 2017 16:25 |
|
ADINSX posted:Do triple a studios really not have access to nvidia/ati reps? I would have assumed they'd fall over themselves to either give developers tips or even change their driver behavior per title (I had thought the "game specific driver tweaks" were made IN CONJUNCTION with the game developers, not totally isolated from them) . Why wouldn't either want developers/the studio earnestly saying "our poo poo works better with <whatever> because they worked with us and made sure it did. really, really big game releases (think Skyrim or Call of Duty) have 2-3 nv/amd reps dedicated to the project. i don't believe they get anything special in terms of debugging tools (though i'm not 100% sure). my understanding is that they are special reps that basically rewrite shaders for the game (this is extremely common, and it's caused breakage before -- i'll explain later), write special code paths to make sure the game runs as fast as possible, and help answer debug / support questions. i know that the studio i work for doesn't have dedicated support engineers.
|
# ? Sep 29, 2017 16:46 |
|
akadajet posted:I thought vulkan/metal/whatever was supposed to put an end to this. it's really supposed to. vulkan has all the right ideas: an optional open-source validation layer, an optional open-source debugging layer with associated tooling, and a really thin api surface that's hard to "optimize" behind your back. basically the biggest thing left is the shader compiler, since it's the largest remaining surface. they'll probably continue to "cheat" there (swap in hand-optimized shaders for big, important games, since their shader compiler has to run in real-time, has to be fast, and can't do that much in terms of optimization)
|
# ? Sep 29, 2017 16:51 |
|
akadajet posted:I thought vulkan/metal/whatever was supposed to put an end to this. explicit apis make it much harder for the driver to guess at what is going on, especially without incurring significant overhead. nvidia is still going to inject their own micro-optimized shaders so they can put "18% performance increase in aaa shootman game!" in the driver release notes though.
|
# ? Sep 29, 2017 16:51 |
|
Spatial posted:i thought you could step through debug shaders. i've never done it, but nvidia's nsight tool advertises it can - catch is you need a second machine since the GPU is halted yeah, they're slowly getting better about this, but the last time i tried nsight it didn't really work. the watch window was missing basically every variable, the single-stepping jumped all over the place (optimization), and on some shaders simply running under debugging mode influenced the result (which usually implies some sort of undefined behavior caused by / taken advantage of by optimization). it very well might have been our fault, but debugging is the tool that's supposed to help find that!
|
# ? Sep 29, 2017 16:54 |
|
fritz posted:it's so horrible hpc systems don't share node resources; they just try to pack what they can into slices of the cluster
|
# ? Sep 29, 2017 16:58 |
|
this thread needs more power
|
# ? Sep 29, 2017 17:00 |
|
Suspicious Dish posted:yeah they're slowly getting better about this but the last time i tried nsight it didn't really work. the watch window basically was missing every variable, the single-stepping went all over the place (optimization), and on some shaders simply running it under debugging mode influenced the result (usually implies some sort of undefined behavior caused by / taken advantage of by optimization). it very well might have been our fault, but debugging is the tool that's supposed to help find that! so basically at about the same level as every other embedded debugging toolchain out there.
|
# ? Sep 29, 2017 17:03 |
|
honestly yeah. it's maybe not as bad as i think it is. i just know that nsight was less of a help than renderdoc and tracking down the nan manually by excluding terms. most shaders by definition are functional things, so there's rarely any major state and you can run them over and over without affecting too much.
|
# ? Sep 29, 2017 17:06 |
|
posting in a suspicious dish thread. I just started doing some work with directx8 because I'm a weirdo, so I'm definitely interested here
|
# ? Sep 29, 2017 17:29 |
|
Encrypted posted:opencl or cuda? i think the only dnn framework with any opencl support beyond 'lol good luck' is caffe
|
# ? Sep 29, 2017 18:04 |
|
the biggest of the big games might have direct support from amd/nvidia, but when i was working on some AAA "big" games, but not like... super mega huge, we'd have an nvidia or amd rep come by the studio with a handful of new cards and be like "hey, can you give these to the graphics guys to make sure they all work with your game? here's my business card, let me know if you guys want to do a marketing promo or something"

the programmers would take those cards home and put them in their gaming rigs, and we'd just continue working on whatever 3 year old graphics cards were in the work computers.
|
# ? Sep 29, 2017 18:37 |
|
Lightbulb Out posted:this thread needs more power
|
# ? Sep 29, 2017 18:55 |
|
anthonypants posted:what in the gently caress it's an EVGA EPOWER V. it's for mad overclocking, yo. http://forum.kingpincooling.com/showthread.php?t=3900 the gpu hardware is real interesting to me and i'm curious if anyone has any interesting tales
|
# ? Sep 29, 2017 19:06 |
|
|
|
The_Franz posted:explicit apis make it much harder for the driver to guess at what is going on, especially without incurring significant overhead. nvidia is still going to inject their own micro-optimized shaders so they can put "18% performance increase in aaa shootman game!" in the driver release notes though. ~challenge eeeeverything~
|
# ? Sep 29, 2017 19:09 |