Shame Boy
Mar 2, 2010

so I'm trying to get ffmpeg running nice and fast on my dinky old server box just for fun and I notice it has CUDA support, great, cool. I go and get my old graphics card and shove it in (only like 5 years old at this point) and no, actually it's the wrong kind of CUDA: you need the kind of CUDA that does video, which is only available on the newest cards. what the heck?


Shame Boy
Mar 2, 2010

The_Franz posted:

explicit apis make it much harder for the driver to guess at what is going on, especially without incurring significant overhead. nvidia is still going to inject their own micro-optimized shaders so they can put "18% performance increase in aaa shootman game!" in the driver release notes though.

~challenge eeeeverything~

Shame Boy
Mar 2, 2010

Suspicious Dish posted:

webgl is honestly pretty cool to get started with and it's where i do all my side project work

have some more random resources:

3d basics talk i gave for a bunch of random developers at my previous linux job https://www.youtube.com/watch?v=v8jUNC5WdYE
source code at https://github.com/magcius/sw3dv

some random open-source things i have written:

https://magcius.github.io/model-viewer/ https://github.com/magcius/model-viewer
https://magcius.github.io/bmdview.js/bmdview.html https://github.com/magcius/bmdview.js
https://magcius.github.io/pbrtview/src/pbrtview.html https://github.com/magcius/pbrtview

oh hey i'm pretty sure i used your model viewer (and another one) to base my own model viewer project on and then got bored and forgot about it half-done, neat

Shame Boy
Mar 2, 2010

yeah I was whining about this earlier - ffmpeg can actually use it (in fact it can use both, and use them in a way that the entire transcode process takes place on the graphics card, so it doesn't have to send the video to the card, decode it, send it back to the cpu, send it back to the card again, encode it, etc). i kinda thought someone would have implemented a bog-standard CUDA/GPGPU/whateveritscalled way of doing this without the weird special hardware, but if they have, ffmpeg doesn't support it, so you need one of the specific graphics card architectures with the magic encode / decode hardware block glued on to the side.
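(for reference, the all-on-the-card path looks roughly like this. just a sketch, not from anyone in the thread: the filenames are made up, it assumes an ffmpeg build with nvdec/nvenc support and a card that actually has both hardware blocks, and i've wrapped it in python because that's how i end up scripting these jobs anyway)

```python
# rough sketch: decode with NVDEC, keep the frames in GPU memory, encode with
# NVENC, so nothing bounces back through system RAM between the two steps
import subprocess

subprocess.run([
    "ffmpeg",
    "-hwaccel", "cuda",                # hardware decode on the card
    "-hwaccel_output_format", "cuda",  # leave decoded frames in VRAM
    "-i", "input.mp4",                 # hypothetical input file
    "-c:v", "h264_nvenc",              # hardware encode on the card
    "-b:v", "5M",
    "output.mp4",
], check=True)
```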

Shame Boy
Mar 2, 2010

josh04 posted:

nvidia had H.264 running in CUDA

mind directing me to this, because i'd really like to be able to actually use graphics card acceleration for encoding jobs on my server and i wasn't able to find anything like it

Shame Boy
Mar 2, 2010

Truga posted:

the gpu itself wouldn't be all that good at encoding. having several threads helps getting quantization of a frame right faster, but having hundreds does not help with speed at all, since you need the previous frame to encode the next. plus gpus have very simple cores, as mentioned before, so you're likely to have to run frames through a gpu multiple times anyway.

most of the encoding happens on the specialised chip, though I think a few cores do get used to process some video things when using nvenc or similar.

all that said, even the specialized hardware that's encoding streams on your gpu doesn't produce quite as good results as software solutions like x264 though, and that's just when limited to realtime encoding. if you go into multipass territory, software encoding just runs away with quality and never looks back. doesn't matter much when just recording clips since you can just throw bitrate at it, but for archival or streaming (i.e. as small as possible without visible loss of quality), the difference can be pretty stark.

huh, i kinda thought video encoding would be great for massive parallelization but yeah i guess there's too much like, cross-referencing between frames and different parts of the same frames and stuff. oh well

i just wish i could get VP9 to encode more than 4 frames per second durnit :sigh:
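(fwiw the within-frame parallelism that does exist in VP9 is exposed through tile columns and row multithreading in libvpx, which helps a lot over the defaults. a sketch with made-up filenames, assuming a reasonably recent libvpx behind your ffmpeg, with -cpu-used trading some quality for speed)

```python
# sketch: speed up libvpx-vp9 by enabling its frame-internal parallelism
# (tile columns + row-based multithreading) and a faster -cpu-used preset
import subprocess

subprocess.run([
    "ffmpeg", "-i", "input.mkv",   # hypothetical input file
    "-c:v", "libvpx-vp9",
    "-row-mt", "1",                # row-based multithreading
    "-tile-columns", "2",          # log2 scale: 2^2 = 4 tile columns
    "-threads", "8",
    "-cpu-used", "4",              # faster encode, some quality cost
    "-crf", "32", "-b:v", "0",     # constant-quality mode
    "output.webm",
], check=True)
```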

Shame Boy
Mar 2, 2010

josh04 posted:

this was back in ~2013 and i think they've removed/broken it since. it was called NVCUVENC, anyhow, and it claimed to be CUDA but it's possible they were lying.

yeah that's the "special" one that uses the dedicated hardware. they call it CUDA or CUVENC to make it seem like it's normal CUDA but it actually uses a separate little module on the card. it still works fine / isn't broken but it can't really do "real" CUDA

Shame Boy
Mar 2, 2010

ufarn posted:

is there quality parity from using cuda for video encoding versus nvenc, or will cpu encoding always be preferable?

idk about nvidia's stuff specifically, but according to google their hardware implementation of VP9 gets consistently better quality than the software implementation, so it's certainly possible that it could go either way

Shame Boy
Mar 2, 2010

Doc Block posted:

main memory is still slower than most dedicated GPU's fast dedicated VRAM.

i agree with your post but presumably this part at least would go away if they were commonly / always integrated from the get-go and the system just had that fast RAM itself

Shame Boy
Mar 2, 2010

josh04 posted:

i don't mean NVENC, which replaced it, but the thing they're talking about taking out of the driver here, which worked on pre-kepler cards which afaik didn't have encoding hardware: https://web.archive.org/web/20150804115020/https://developer.nvidia.com/nvidia-video-codec-sdk

oh i was getting it mixed up with NVCUVID which is both dedicated hardware (that's been on cards going back much farther than NVENC) and also only decodes (so it's basically worthless to me), huh

Shame Boy
Mar 2, 2010

atomicthumbs posted:

"cuda" might not actually be cuda in this case; graphics cards have video encoding and decoding blocks built in these days, and their capabilities differ by generation

yeah that's what I was saying in that post, and then what we've discussed for like two days since that post lol

Shame Boy
Mar 2, 2010

atomicthumbs posted:

well excuuuuuuuuuse me mister "i read the entire thread for context before responding"

after i made that post i got a bit worried that you had like, made another post acknowledging it or something because i sure as hell didn't read any posts after yours before mashing reply :v:

Shame Boy
Mar 2, 2010

Doc Block posted:

consoles get worse texturing etc performance IIRC. even on equal hardware with the GPU directly attached to system memory, the fact that the memory is shared is gonna give the system worse performance than dedicated VRAM.

and as long as desktop GPUs are removable/upgradeable, I don't think you're gonna see them have a direct connection to main memory vs over an expansion bus. as to your earlier post, them being removable is why the display controller etc is on the graphics card instead of the motherboard.

how practical would it be to just make them like, a standard socket like the CPU? so instead of swapping out a whole massive card you just have a second CPU-like socket that you put a chip in, and then it shares the rest of the stuff (or maybe has its own RAM sticks, but you can swap them too like normal RAM)

Shame Boy
Mar 2, 2010

feedmegin posted:

It's kind of been covered sort of, but I have bad news about how 'standard' CPU sockets are.

eh i was kinda thinking it wouldn't be that different from how PCIe gets a new revision every couple of years, but then i remembered that that's actually a proper standard and it's backwards compatible, and good luck doing that with a chip lol

Shame Boy
Mar 2, 2010

so i dabbled in a tiny bit of opencl a while ago and eventually i'd like to come back around to it and play around with it a bit, because i'm a turbodork who finds this kinda thing interesting. should i even bother with it if it's going out of style, or should i look into something else instead?

Shame Boy
Mar 2, 2010

josh04 posted:

it still has the advantage of being multi-platform and multi-manufacturer. on my macbook pro i can run tasks on both the internal gpus etc. and nvidia added opencl 1.2 support to their driver in the last two years so they all support roughly the same featureset.

it's just sad reading about cl 2.0 and not being able to get any of the nice features unless you have a specific AMD card and do the magic driver rain dance.

yeah the thing that appealed to me was supposedly being able to compile it to run on an FPGA somehow??? which seems real cool since I have a few FPGA dev boards I'd like more excuses to play with

someone above mentioned they made some translation layer (or are making?) to translate to the new thing (vulkan? i have no idea what these things are) so i guess it's reasonably future-proof to go with for at least a little while then?
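(if anyone else wants to dip a toe in, the hello-world of opencl is pretty small. here's roughly what a vector add looks like through pyopencl, just a sketch assuming you have pyopencl installed and some working OpenCL platform, gpu or cpu)

```python
# minimal OpenCL "vector add" via pyopencl, just to show the moving parts:
# pick a device, copy buffers over, build a tiny kernel, run it, read back
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()      # grabs whatever platform/device is around
queue = cl.CommandQueue(ctx)

a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)

mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

prg = cl.Program(ctx, """
__kernel void add(__global const float *a,
                  __global const float *b,
                  __global float *out) {
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
""").build()

prg.add(queue, a.shape, None, a_buf, b_buf, out_buf)

out = np.empty_like(a)
cl.enqueue_copy(queue, out, out_buf)
assert np.allclose(out, a + b)
```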

Shame Boy
Mar 2, 2010

i have an original Ageia PhysX card that i just found lying in a desk at my last job. i took the fan off and polished up the shiny chip to a mirror finish just for fun but idk what else to do with it

i guess i could try very carefully desoldering it and putting it in a little frame?

Shame Boy
Mar 2, 2010

Cybernetic Vermin posted:

i assume the closest electronics recycling bin is too far away for the obvious answer?

idk i like weird hardware relics of dead/failed ideas

Shame Boy
Mar 2, 2010

Phobeste posted:

you don't have to be that careful. if you don't want the rest of the card, just point a hot air gun at it til it falls off. or put it in a reflow oven aka a toaster, upside down

i have a hot-air pencil that can dial in temperatures just for this sort of job, but i did a test run on a lovely old graphics card with the same construction of chip (green board-like material with a shiny metal die in the middle) and it wound up discoloring it pretty badly despite not being turned up that hot at all. maybe i just need to tweak the temperature even more...

Shame Boy
Mar 2, 2010

they're pretty to look at but real fuckin' easy to crack if you actually work on them in my experience, plus they cost a ton more

Shame Boy
Mar 2, 2010

blowfish posted:

People who want to miniaturise hand soldered parts (ie three dozen hobbyists)?

you can hand-solder surface mount parts, it looks really hard but once you do it a few times it's actually pretty easy :shobon:

Shame Boy
Mar 2, 2010

sorry, could you also explain what the hell a 1-bit texture is or how that's useful? is it just for like, masking?

Shame Boy
Mar 2, 2010

josh04 posted:

weird, what changed?


in the valve paper they're drawing a 2D vector shape (the text "no trespassing") with no built-in texturing onto a 3D object, so the 1-bit texture is a high-res mask of the vector shape. the point is to draw a crisp vector image at any distance, but only using a raster texture on the gpu.

ah ok, that's kinda what i was thinking, thanks
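(for anyone else who was wondering, here's the masking idea in toy form, with numpy arrays standing in for the textures. nothing to do with the actual valve shader, just the 1-bit-texture-as-mask part)

```python
# toy illustration of a 1-bit texture used as a mask:
# 1 = draw the decal texel, 0 = keep whatever the surface already had
import numpy as np

mask = np.zeros((64, 64), dtype=bool)  # the "1-bit texture"
mask[20:44, 8:56] = True               # pretend this is the rasterized vector shape

surface = np.full((64, 64, 3), 0.6)    # base surface color
decal = np.array([0.8, 0.1, 0.1])      # decal color

out = surface.copy()
out[mask] = decal                      # masked write, same idea as an alpha test
```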

Shame Boy
Mar 2, 2010

Bloody posted:

I'm the 300 Amp gpu

um it's called a mining rig

Shame Boy
Mar 2, 2010

Bloody posted:

im also the use of amps when watts would be more meaningful

yeah well im as powerful as a 120 volt battery :smug:

Shame Boy
Mar 2, 2010

Farmer Crack-rear end posted:

:staredog: is that like in apartment complexes or something because that seems like it would get outrageously expensive real quick

i assume the server company would subsidize at least part of the electricity since otherwise it's just an even more inefficient electric furnace that "wastes" some of the power it could be turning into heat by shuffling bits around

Shame Boy
Mar 2, 2010

Cocoa Crispies posted:

the shuffling bits around still manifests as heat tho

actually yeah for some reason i thought information would consume energy in some kind of magical other way

Shame Boy
Mar 2, 2010

NEED MORE MILK posted:

as a person with a rack in their house: you cant hear anything unless youre right outside the door. just put it in a utility room or something

i have a closet that's conveniently where the fiber comes into the apartment, where the patch panel for the ethernet in the walls is, and it even has its own little A/C vent for cooling. I put my lil' half-height rack in there, close the door, and don't even hear it, though all my servers are 4U ones with fancy quiet fans in them that aren't loud normally

Shame Boy
Mar 2, 2010

Malcolm XML posted:

lol how do you think actual work gets done? I mean you're like 99% right but the actual computation consumes some energy, just a very tiny bit of it

how does computation itself consume energy? where does that energy go if it doesn't just become heat?

Shame Boy
Mar 2, 2010

i'm seriously asking, like is "ordered information" some kind of potential energy since it's not as low an energy state as random / higher entropy states could be?

Shame Boy
Mar 2, 2010

Sagebrush posted:

i guess i don't really see how the energy is converted to anything but heat, if you take it far enough.

like take the example of that spinning hard drive. assume it's in a perfect vacuum and the only thing we have to consider is the inertia of the platter. some electrical energy is converted into rotational kinetic energy. when the platter spins down, it loses that energy through friction in its bearings, converting it to heat (and sound, which becomes heat).

the majority of the power drawn by a graphics card is required to push the electrons against whatever resistance is in the circuits. that energy by definition is turned into heat. where is the actual "work" of computation, though? i guess every time you flip a transistor you have to build up a charge on the gate, and that takes some energy. but when that charge decays or is neutralized, it doesn't just disappear...it turns into heat.

like, this is the fundamental concept behind entropy, right? every kind of energy just gradually turns into a slightly lower frequency form of energy, which we experience as heat -- spreading that heat around the universe as everything gets cooler and cooler.

yeah but (as other posters pointed out) for at least a short time the results of that computation aren't 100% heat, because some of it is "the potential energy of a bunch of electrons being shoved in a box and kept there," which then decays slowly over time as they leak out (and then becomes heat or super low frequency EM radiation or whatever)
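(the textbook answer to "does the computation itself eat energy", as far as i understand it, is landauer's principle: only erasing a bit has a hard minimum energy cost, and it's absurdly tiny next to the resistive losses everyone's been talking about. rough numbers, not from the thread:)

```latex
% Landauer bound: minimum energy dissipated to erase one bit at temperature T
E_{\min} = k_B T \ln 2
         \approx (1.38 \times 10^{-23}\,\mathrm{J/K})(300\,\mathrm{K})(0.693)
         \approx 2.9 \times 10^{-21}\,\mathrm{J}
% compare a ~250 W graphics card: 250 J every second, essentially all of it
% ending up as heat regardless of what it computed
```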

Shame Boy
Mar 2, 2010


is stuff like this generated by some other tool? because i know i can read some basic verilog just from playing around with a dinky little FPGA i have, but then i look in this and get to giant blobs of weirdly-named poo poo with thousands of different wires and my eyes just glaze over

Shame Boy
Mar 2, 2010

Suspicious Dish posted:

apparently i wont effortpost again because you guys are more interested in linux

i appreciate your effortposts and hope you continue, friend :shobon:

Shame Boy
Mar 2, 2010

peepsalot posted:

the problem happens when some computer scientist decides to train a nn to be a computer scientist/neural network design expert.

google's literally already doing that and they saw increases in performance of *gasp* a whole 20%!!

turns out there's a ton of saddle points and local minima that neural networks love to get stuck in and the whole "exponentially self-improving AI" thing is entirely fiction

Shame Boy
Mar 2, 2010

Malcolm XML posted:

how is it that difficult u just put a few reverse biased transistors and whiten the bits using von Neumann's trick

genuinely interested

i assume it's challenging to do it in a way that's like, provably reliable and also fast? even with the whitening, if the reverse-biased transistors get out of whack you could wind up with it drifting to the point where it's spitting out all-ones (and thus not emitting anything) or something

like mine did :v:

e: i think the "real" ones use avalanche diodes or some other mechanism that's a bit more predictable for this reason actually
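(von neumann's trick from the quote, for the curious: read the raw stream in pairs, keep 01 as 0 and 10 as 1, throw away 00 and 11. it removes bias but not correlation, and if the source drifts to nearly all-ones like mine did, almost every pair is 11 and the output rate collapses to nothing. quick sketch:)

```python
# sketch of von Neumann debiasing: unequal pairs produce one output bit,
# equal pairs are discarded entirely
def von_neumann_whiten(bits):
    out = []
    for b1, b2 in zip(bits[0::2], bits[1::2]):
        if b1 != b2:
            out.append(b1)   # 01 -> 0, 10 -> 1
    return out

# a heavily biased source still gives unbiased output bits, just very few of them
raw = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
print(von_neumann_whiten(raw))   # [1, 0] for this toy input
```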

Shame Boy
Mar 2, 2010

atomicthumbs posted:

i'm running with an integrated gpu so that cuda can have all the gtx 780's ram to itself because boy howdy does it need it

i tried to do this on my server box and the motherboard hates the idea: you either have to set it to use the integrated graphics adapter by default (in which case the GPU doesn't even show up at all once the system boots), or in any other configuration the integrated graphics adapter is never started and the only display signal comes out of the card itself

Shame Boy
Mar 2, 2010

people use wayland for dork reasons so that really doesn't matter at all

it's like complaining that gentoo doesn't make any money

Shame Boy
Mar 2, 2010

Notorious b.s.d. posted:

but no one ever complains that vendors are unresponsive to gentoo's needs as a project

i guarantee tons of people do, you just don't hear them yelling from the basement


Shame Boy
Mar 2, 2010

PCjr sidecar posted:

funny to see intel pushing opencl now that they sell fpgas

yeah i somehow missed Intel buying Altera and was super confused when Mouser was advertising INTEL CYCLONE 10 FPGAs
