eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

Suspicious Dish posted:

also the main developer of lima literally put out a blog post to insult me, lol.

where's your response

will this turn into a rap battle

eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

Harik posted:

was that in the kernel driver or the libutgard-mali.so? I'm assuming the latter since the former is basically forced to be open sores because it's a kernel driver. the part that pisses me off the most is the incompatibility is entirely due to laziness. they just have an enum for driver operations and change it at a whim on point releases. it's a ridiculously stupid way of doing things. if they ever get dragged kicking & screaming into upstream that poo poo will stop instantly because nobody will let them break existing ABIs.

lol if you think ABI matters in the Linux world

they probably learned that behavior by watching Linus himself

at least good on them for not buying the "all drivers must be entirely Open Source" bullshit and just implementing a stub for their driver

Harik
Sep 9, 2001

From the hard streets of Moscow
First dog to touch the stars


Plaster Town Cop

eschaton posted:

lol if you think ABI matters in the Linux world

they probably learned that behavior by watching Linus himself

at least good on them for not buying the "all drivers must be entirely Open Source" bullshit and just implementing a stub for their driver

p. much every driver is the same way, a kernel stub - maybe even mainlined - that basically passes through the binary-only library to the hardware. AMD open-sourced their new openGL/vulkan library version because they don't have anywhere near enough programmers to keep up and want community help.

not sure what the linus crack meant though - if he catches a whiff of the mainline kernel->userspace ABI changing in any way he drops in and releases a thunderous shitstorm down on everyone's head. the only way to change something is to make a v2, mark v1 as deprecated, and add it to the eventual removal schedule. the only way around that is if you can demonstrate that v1 hasn't actually worked in years because nobody actually cares about it, and even that usually just buys you a warn_once on use and an accelerated removal schedule.

outside the kernel nobody at all even tries.

libmali is tied a lot harder to specific kernel builds than anything in desktop linux is, though, due to the ioctl enum being essentially random every point release.
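
to make the ioctl gripe concrete, here's a rough sketch of the two approaches. everything in it -- the fakegpu name, the structs, the command numbers -- is invented for illustration, not the real mali interface:

code:
/* hypothetical uapi header -- driver name and structs invented for illustration */
#include <linux/types.h>
#include <linux/ioctl.h>

struct fakegpu_submit_v1 { __u64 cmdbuf; __u32 size; __u32 pad; };
struct fakegpu_submit_v2 { __u64 cmdbuf; __u32 size; __u32 flags; __u64 fence; };

/* mainline style: v1 keeps its number forever (at worst it grows a
 * deprecation warning), v2 gets a brand-new number, old binaries keep working */
#define FAKEGPU_IOCTL_SUBMIT     _IOW('F', 0x01, struct fakegpu_submit_v1)
#define FAKEGPU_IOCTL_SUBMIT_V2  _IOW('F', 0x02, struct fakegpu_submit_v2)

/* blob style: the same FAKEGPU_IOCTL_SUBMIT silently becomes 0x02 in the next
 * point release, so any libmali built against the old header stops working */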

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

eschaton posted:

where's your response

will this turn into a rap battle

https://www.reddit.com/r/linux/comments/43zvn9/odroid_c2_2ghz_quad_core_64bit_arm_board_2gb_ram/czn9yaj/

fritz
Jul 26, 2003


Oh wow. You really should have refrained from coming with more slander.

Shame Boy
Mar 2, 2010

Suspicious Dish posted:

webgl is honestly pretty cool to get started with and it's where i do all my side project work

have some more random resources:

3d basics talk i gave for a bunch of random developers at my previous linux job https://www.youtube.com/watch?v=v8jUNC5WdYE
source code at https://github.com/magcius/sw3dv

some random open-source things i have written:

https://magcius.github.io/model-viewer/ https://github.com/magcius/model-viewer
https://magcius.github.io/bmdview.js/bmdview.html https://github.com/magcius/bmdview.js
https://magcius.github.io/pbrtview/src/pbrtview.html https://github.com/magcius/pbrtview

oh hey i'm pretty sure i used your model viewer (and another one) to base my own model viewer project on and then got bored and forgot about it half-done, neat

Xarn
Jun 26, 2015

:mediocre:

BobHoward
Feb 13, 2012

The only thing white people deserve is a bullet to their empty skull

Harik posted:

ARM is both an instruction set (really a range of them) and an implementation. Everybody licenses one of their cores in verilog/vhdl source form, then buys a bunch of random peripherals like upgraded memory controllers or sound codecs, mashes them together, then poops the whole thing out to a fab like glofo or samsung. this means there's a lot of very very bad ARM processors out there.

nitpick derail: most companies sublicense arm intellectual property (ip) from glofo or samsung or some other intermediary, not direct from arm, and they don't get source for the crown jewels like cpu cores

this arrangement is cheaper for licensees (arm charges a buttload of money for cpu source code) and gives superior results. the foundry pays to harden key high performance cores and transform them into macros -- chunks of finished physical design that can be copy/pasted into any chip using that core. after a few licensees have taped out chips using a given hardened macro it will be very well characterized and debugged for everyone who follows. besides confidence that it will work the first time, you also get higher clock speed and reduced power compared to running source code through synthesis and automated place & route.

basically, the only reason to buy a source code license is if you're a huge multinational company with the resources to do hardening in-house and you want to make a splash by having the latest and greatest core at the highest clock rate first. for everyone else it's almost always best to sublicense from the foundry.

Suspicious Dish posted:

yeah, though ARM has apparently been trying to corner the market by making the other peripherals too. and all the low-end soc shops just use arm's since it's cheaper than maintaining their own. so now arm sells a memory controller, ethernet controller, gpu, display controller...

most of the peripherals are relatively low value, and sold as a package deal. you sublicensed a cpu? ok you get the entire stable of low performance peripherals for free, friend!

soc shops will frequently use in-house ip instead of arm ip anyways. i worked at a network chip company on an arm based soc and you better believe we used our in-house ethernet mac, a custom packet routing engine, and so on. the network poo poo is what we were supposed to be doing better than generic after all

The_Franz
Aug 8, 2003

happy s3tc patent expiration day :toot:

feedmegin
Jul 30, 2008

Suspicious Dish posted:

also the main developer of lima literally put out a blog post to insult me, lol.

Yeah, Lima was pretty much two dudes, one of whom, the guy you're talking about here, is an ex-coworker of mine from a few years back and an 'interesting' person.

(Typical thinks-he's-Linus sort of nerd)

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

feedmegin posted:

Yeah, Lima was pretty much two dudes, one of whom, the guy you're talking about here, is an ex-coworker of mine from a few years back and an 'interesting' person.

(Typical thinks-he's-Linus sort of nerd)

oh man, you worked with libv? i'm sorry.

i assume it was at suse, since i don't think anybody else has actually hired him since. any stories other than his insane martyrdom?

feedmegin
Jul 30, 2008

Suspicious Dish posted:

oh man, you worked with libv? i'm sorry.

i assume it was at suse, since i don't think anybody else has actually hired him since. any stories other than his insane martyrdom?

Nope, he used to work for a consultancy in Manchester for a bit. If you check out his LJ you can probably figure it out.

Not really tbh, he worked remotely and spent a lot of time on customer sites doing (paid closed source) GPU driver stuff. So mostly he was the guy being an rear end in a top hat on IRC.

Sapozhnik
Jan 2, 2005

Nap Ghost
:stare:

that sounds like an nda lawsuit waiting to happen, what the gently caress

feedmegin
Jul 30, 2008

Sapozhnik posted:

:stare:

that sounds like an nda lawsuit waiting to happen, what the gently caress

He never did any work for ARM, that would have been right out of course, but otherwise it's not really a problem. For that matter, the Adreno guy is employed by a competitor of Adreno's manufacturer.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
you mean freedreno? i thought rob was still at red hat. rob's awesome btw, worked with him in person, along with ajax / jerome glisse

Cocoa Crispies
Jul 20, 2001

Vehicular Manslaughter!

Pillbug
for the small world that gpu-on-linux stuff seems to be, it sure sounds like it's full of assholes

Sapozhnik
Jan 2, 2005

Nap Ghost
the pool of people with reverse-engineering skills seems to have an above-average incidence of dickheads, idk why

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
eh, most everyone is awesome. libv sucks but it seems we pissed him off enough and ran him out of town

https://cgit.freedesktop.org/xorg/driver/xf86-video-radeonhd/commit/?h=spigot&id=231683e2f111bb064125f64f2da797d744cde7fa

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Sapozhnik posted:

the pool of people with reverse-engineering skills seems to have an above-average incidence of dickheads, idk why

you have to be a certain kind of special to enjoy slamming your head at a computer repeatedly

Spatial
Nov 15, 2007

computer touchers are inherently scum and the only possible redemption comes from the things we do outside that field. if you don't do anything else, you remain pure and awful for life

feedmegin
Jul 30, 2008

Suspicious Dish posted:

you mean freedreno? i thought rob was still at red hat. rob's awesome btw, worked with him in person, along with ajax / jerome glisse

Sorry, was. He was at TI when he started it, working with PowerVR stuff that's under NDA I think? There was definitely a reason he picked that particular GPU: it wasn't going to get him sued.

Farmer Crack-Ass
Jan 2, 2001

this is me posting irl
where's Dr. Honked? i bet he'd have some cool stories to share about games and giraffics cards

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang



Suspicious Dish posted:

eh, most everyone is awesome. libv sucks but it seems we pissed him off enough and ran him out of town

https://cgit.freedesktop.org/xorg/driver/xf86-video-radeonhd/commit/?h=spigot&id=231683e2f111bb064125f64f2da797d744cde7fa

lol

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
ok, i want to write up my next infopost. today is going to be on what a "graphics card" is, besides just a gpu, which i covered in my first infopost. graphics cards are a quirk of the desktop pc architecture, where the only way to insert new modules is through the pcie expansion card slot, and thus the pcie bus.

pcie is -- compared to a native connection to the memory controller -- a fairly narrow bus with limited bandwidth. this is the reason your graphics card contains the monitor out, and the reason your graphics card has its own separate vram. it's not horribly small, but, well, 1920*1080 pixels * 4 bytes per pixel * 60 frames per second already works out to roughly 500 MB/s just to push finished frames around.

this is the primary rule of graphics cards on desktop machines: *it's expensive to move poo poo from the cpu to the gpu, and the gpu to the cpu*.

let's talk about how other systems work: on laptops, on consoles, on phones -- basically every other kind of device -- the cpu and gpu are integrated and both of them have equal access to system memory. i mentioned that a gpu is basically a giant parallel array of horribly broken and underpowered cpu's, and really what they're doing is computing a giant set of pixels in ram. usually, the cpu allocates some part of system memory and tells the gpu "render into this bit of memory and tell me when you're done", so there's sort of a hand-off, and it's not a free-for-all.
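
a toy model of that hand-off, just to show its shape -- alloc_shared / gpu_submit / fence_wait are made-up names and the "gpu" is simulated with a loop, this isn't any real driver API:

code:
/* hypothetical unified-memory hand-off; every name below is invented */
#include <stdint.h>
#include <stdlib.h>

struct fence { int signaled; };
static struct fence done_fence;

static uint32_t *alloc_shared(size_t bytes)      /* memory both sides can see */
{
    return malloc(bytes);
}

static struct fence *gpu_submit(uint32_t *dst, int w, int h)
{
    for (int i = 0; i < w * h; i++)              /* pretend-gpu renders into system ram */
        dst[i] = 0xff202020u;
    done_fence.signaled = 1;                     /* "tell me when you're done" */
    return &done_fence;
}

static void fence_wait(struct fence *f)
{
    while (!f->signaled) { /* a real driver would sleep on an interrupt here */ }
}

int main(void)
{
    int w = 64, h = 64;
    uint32_t *pixels = alloc_shared((size_t)w * h * sizeof(*pixels));
    struct fence *f = gpu_submit(pixels, w, h);  /* "render into this bit of memory" */
    fence_wait(f);                               /* cpu blocks until the gpu signals */
    free(pixels);                                /* the pixels never crossed a pcie bus */
    return 0;
}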

btw, this is known as a "unified memory architecture". fun fact: amd actually tried to introduce this on desktops by bundling a radeon in their cpus but nvidia countered it with some marketing that plays right into /r/pcmasterrace: "amd's gpus are so lovely they can only sell them by bundling them with cpu's" and it died.

on these other devices, the "display controller" is separate from the gpu, and oftentimes the two are made by different companies. it's a separate piece of hardware whose primary job is called "scanout": sending the finished image of the screen across a wire to the monitor or panel.

on consoles, this wire is hdmi, and the display controller is often called the "video interface". on mobiles, this wire tends to be either FPD-Link or eDP

on desktops, as a reminder, it would be expensive to send the frame back to the cpu after the gpu renders it, so the display controller is on "the other side of the fence".

usually you just send the display controller a memory address, a width, and a height, and it does the rest. most display controllers can also tell the main cpu when it's "vsync" time with an interrupt, so that the system can swap in a new buffer while the display is "in between frames", preventing tearing.
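
a pretend memory-mapped display controller, boiled down to the "address, width, height, go" interface above -- the register layout is invented (on linux the real thing hides behind DRM/KMS), it's only meant to show how little the display controller needs:

code:
/* invented register layout for a toy display controller */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct dc_regs {
    volatile uint64_t fb_addr;    /* physical address of the frame to scan out */
    volatile uint32_t width;
    volatile uint32_t height;
    volatile uint32_t stride;     /* bytes per row */
    volatile uint32_t enable;
};

static struct dc_regs *dc;        /* would be an ioremap()'d mmio region in a real driver */
static uint64_t front_buf, back_buf;

/* runs from the vsync interrupt: the panel is "in between frames", so it's
 * safe to point scanout at the freshly rendered buffer without tearing */
static void dc_vsync_irq(void)
{
    uint64_t tmp = front_buf;
    front_buf = back_buf;
    back_buf = tmp;
    dc->fb_addr = front_buf;      /* the display controller does the rest */
}

int main(void)
{
    dc = calloc(1, sizeof(*dc));  /* stand-in for the real mmio mapping */
    front_buf = 0x10000000; back_buf = 0x10800000;
    dc->width = 1920; dc->height = 1080; dc->stride = 1920 * 4; dc->enable = 1;
    dc_vsync_irq();
    printf("scanout now reads from %#llx\n", (unsigned long long)dc->fb_addr);
    free(dc);
    return 0;
}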

also, the display controller often has some basic compositing and rendering support. for instance, it's in charge of putting the cursor on the screen -- older display controllers had a special "cursor plane", usually a hardcoded 64x64 image you could set and move around on the screen. these days, display controllers often have multiple planes, RGB or YUV, and can do basic alpha blending and scaling on them. they're slowly increasing in complexity, mostly because the real gpu is too expensive in battery life and heat on mobile devices.

graphics cards also tend to have some other parts unrelated to gpu's and display controllers. these days they also include hardware video decoders / encoders (often called "VDEC" / "VENC" blocks) because why not. they seem to just be par for the course on modern graphics cards. both amd and nvidia include them. i think the idea was to prove how powerful CUDA and GPUs were back before GPGPUs caught on but it's a cheat because it's custom designed hardware for the purpose.

next time we'll talk about gpu pipelines in general at a high level, beyond the "tons of cpus" description, and talk about older, fixed-function gpus and how they worked.

eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

Suspicious Dish posted:

graphics cards also tend to have some other parts unrelated to gpu's and display controllers. these days they also include hardware video decoders / encoders (often called "VDEC" / "VENC" blocks) because why not.

couldn't possibly have anything to do with making it harder to decode video frames for anything other than user playback

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
You'd think they'd decode some magical proprietary format if that was the case but nah it's just the same H.264 that you can already decode in software.

Shame Boy
Mar 2, 2010

yeah I was whining about this earlier - ffmpeg can actually use it (in fact it can use both the decoder and the encoder, and use them in a way that the entire transcode takes place on the graphics card, so it doesn't have to send the video to the card, decode it, send it back to the cpu, then send it back to the card again to encode it, etc). i kinda thought someone would have implemented a bog-standard CUDA/GPGPU/whatever-it's-called way of doing this without the weird special hardware, but if they have, ffmpeg doesn't support it, so you need one of the specific graphics card architectures with the magic encode/decode hardware block glued onto the side.

Workaday Wizard
Oct 23, 2009

by Pragmatica
good thread rated :five:

can someone tell me how gpus interact with virtualization? i heard gpus are stateful as gently caress

munce
Oct 23, 2010

i'm making a game that relies on a shader to run. It works fine on desktop, laptop and one 5 year old phone. I want to test it on a newer phone with higher resolution, so i get the phone and load it up. nothing.

many hours of testing, android debugging and searching eventually leads me to the culprit - a single line in the vertex shader: a simple texture lookup that gets the value of a pixel from a texture in memory. the problem is that because that line is in the Vertex shader, not the Fragment shader, it just doesn't work. why? more searching reveals that it's because the gpu on that phone can't do a Vertex Texture Fetch due to how it is constructed.

having identified the problem i try to get an idea of what phones/gpus i could run my thing on. Turns out that spec sheets and documentation for phones and their gpus are pretty bad. Trying to find out which gpu is in a phone can be a bit of work, and finding out if the gpu can do a VTF is almost impossible.

so after a load of effort all i know is that my thing will work on some adreno gpus and not on some mali ones. beyond that i have no idea and i'm not buying one of every phone to test it on.

in summary: mobile phones and their gpus suck.

josh04
Oct 19, 2008


"THE FLASH IS THE REASON
TO RACE TO THE THEATRES"

This title contains sponsored content.

gpus have video encoders so you can do screen capture without sending a full image of the screen back down the bus to main memory to be encoded on the CPU, which is slow and would also cut into the CPU resources available for xtreme gaming action. nvidia had H.264 running in CUDA but it was kinda slow and crappy and also cut into the amount of GPU time available for graphics drawing, and they already had a video decoder up there so w/e.

fun fact: the nvidia hardware encoder supports something silly like 8 simultaneous encode streams so you can put out an entire HLS block at once.

Truga
May 4, 2014
Lipstick Apathy

ate all the Oreos posted:

i kinda thought someone would have implemented a bog-standard CUDA/GPGPU/whatever-it's-called way of doing this without the weird special hardware.

the gpu itself wouldn't be all that good at encoding. having several threads helps get the quantization of a frame right faster, but having hundreds does not help with speed at all, since you need the previous frame to encode the next. plus gpus have very simple cores, as mentioned before, so you're likely to have to run frames through a gpu multiple times anyway.

most of the encoding happens on the specialised chip, though I think a few cores do get used to process some video things when using nvenc or similar.

all that said, even the specialized hardware that encodes streams on your gpu doesn't produce results quite as good as software solutions like x264, and that's just when limited to realtime encoding. if you go into multipass territory, software encoding just runs away with quality and never looks back. it doesn't matter much when just recording clips, since you can throw bitrate at it, but for archival or streaming (i.e. as small as possible without visible loss of quality), the difference can be pretty stark.

Shame Boy
Mar 2, 2010

josh04 posted:

nvidia had H.264 running in CUDA

mind directing me to this, because i'd really like to be able to actually use graphics card acceleration for encoding jobs on my server and i wasn't able to find anything like it

Shame Boy
Mar 2, 2010

Truga posted:

the gpu itself wouldn't be all that good at encoding. having several threads helps get the quantization of a frame right faster, but having hundreds does not help with speed at all, since you need the previous frame to encode the next. plus gpus have very simple cores, as mentioned before, so you're likely to have to run frames through a gpu multiple times anyway.

most of the encoding happens on the specialised chip, though I think a few cores do get used to process some video things when using nvenc or similar.

all that said, even the specialized hardware that encodes streams on your gpu doesn't produce results quite as good as software solutions like x264, and that's just when limited to realtime encoding. if you go into multipass territory, software encoding just runs away with quality and never looks back. it doesn't matter much when just recording clips, since you can throw bitrate at it, but for archival or streaming (i.e. as small as possible without visible loss of quality), the difference can be pretty stark.

huh, i kinda thought video encoding would be great for massive parallelization but yeah i guess there's too much like, cross-referencing between frames and different parts of the same frames and stuff. oh well

i just wish i could get VP9 to encode more than 4 frames per second durnit :sigh:

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
I wonder how much compression you'd lose if you had a format where each frame is based on the one N frames ago rather than the one immediately previous. You'd definitely lose something, the question is whether it'd be more or less than what you can gain by being able to usefully throw N times more parallel computation at it.

josh04
Oct 19, 2008


"THE FLASH IS THE REASON
TO RACE TO THE THEATRES"

This title contains sponsored content.

ate all the Oreos posted:

mind directing me to this, because i'd really like to be able to actually use graphics card acceleration for encoding jobs on my server and i wasn't able to find anything like it

this was back in ~2013 and i think they've removed/broken it since. it was called NVCUVENC, anyhow, and it claimed to be CUDA but it's possible they were lying.

Cybernetic Vermin
Apr 18, 2005

Jabor posted:

I wonder how much compression you'd lose if you had a format where each frame is based on the one N frames ago rather than the one immediately previous. You'd definitely lose something, the question is whether it'd be more or less than what you can gain by being able to usefully throw N times more parallel computation at it.

h264 permits 16 reference frames iirc, so in principle encoding h264 can trivially be parallelized 16-way if you can deal with the hit to quality (just run 16 independent encodes, one per residue class of the frame index mod 16, each permitting only one reference frame, then interleave the outputs). 16-way is not even a start on a gpu though
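
a toy C sketch of that interleaving, just to show the frame assignment -- encode_frame() here is a stand-in that only prints what it would do; a real version would drive x264 or libavcodec once per lane:

code:
/* 16-way interleaved-encode sketch; encode_frame() is a placeholder */
#include <stdio.h>

#define LANES        16
#define TOTAL_FRAMES 64

/* "encode this frame using only `ref` as a reference" (or as an I-frame) */
static void encode_frame(int frame, int ref)
{
    if (ref < 0)
        printf("lane %2d: frame %2d is an I-frame\n", frame % LANES, frame);
    else
        printf("lane %2d: frame %2d references frame %2d (16 back)\n",
               frame % LANES, frame, ref);
}

int main(void)
{
    /* lane k owns frames k, k+16, k+32, ...; lanes never reference each
     * other, so all 16 could run in parallel, one thread per lane */
    for (int lane = 0; lane < LANES; lane++)
        for (int frame = lane; frame < TOTAL_FRAMES; frame += LANES)
            encode_frame(frame, frame >= LANES ? frame - LANES : -1);

    /* the per-lane bitstreams then get interleaved back into display order;
     * h264's 16-frame reference window is what makes "16 back" legal */
    return 0;
}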

Shame Boy
Mar 2, 2010

josh04 posted:

this was back in ~2013 and i think they've removed/broken it since. it was called NVCUVENC, anyhow, and it claimed to be CUDA but it's possible they were lying.

yeah that's the "special" one that uses the dedicated hardware. they call it CUDA or CUVENC to make it seem like it's normal CUDA but it actually uses a separate little module on the card. it still works fine / isn't broken but it can't really do "real" CUDA

ufarn
May 30, 2009
is there quality parity from using cuda for video encoding versus nvenc, or will cpu encoding always be preferable?

Doc Block
Apr 15, 2003
Fun Shoe

munce posted:

i'm making a game that relies on a shader to run. It works fine on desktop, laptop and one 5 year old phone. I want to test it on a newer phone with higher resolution, so i get the phone and load it up. nothing.

many hours of testing, android debugging and searching eventually leads me to the culprit - a single line in the vertex shader: a simple texture lookup that gets the value of a pixel from a texture in memory. the problem is that because that line is in the Vertex shader, not the Fragment shader, it just doesn't work. why? more searching reveals that it's because the gpu on that phone can't do a Vertex Texture Fetch due to how it is constructed.

having identified the problem i try to get an idea of what phones/gpus i could run my thing on. Turns out that spec sheets and documentation for phones and their gpus are pretty bad. Trying to find out which gpu is in a phone can be a bit of work, and finding out if the gpu can do a VTF is almost impossible.

so after a load of effort all i know is that my thing will work on some adreno gpus and not on some mali ones. beyond that i have no idea and i'm not buying one of every phone to test it on.

in summary: mobile phones and their gpus suck.

you can check via opengl whether or not the GPU supports texture fetch in the vertex shader. of course, we're talking about android, so the driver might lie, but still. I forget the exact enum, but it's something like GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS.
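
for reference, a minimal version of that check (the enum is indeed GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS; this assumes a current GLES2 context already exists, e.g. from EGL or a GLSurfaceView):

code:
/* query vertex texture fetch support at runtime on GL ES 2.0 */
#include <GLES2/gl2.h>
#include <stdio.h>

int vertex_texture_fetch_supported(void)
{
    GLint units = 0;
    glGetIntegerv(GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS, &units);
    printf("vertex texture image units: %d\n", units);
    return units > 0;   /* 0 is a legal value on GLES2: no VTF, use a fallback */
}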

Shame Boy
Mar 2, 2010

ufarn posted:

is there quality parity from using cuda for video encoding versus nvenc, or will cpu encoding always be preferable?

idk about nvidia's stuff specifically, but according to google their hardware implementation of VP9 gets consistently better quality than the software implementation, so it's certainly possible that it could go either way
