echinopsis
Apr 13, 2004

by Fluffdaddy
i like this thread


Qwertycoatl
Dec 31, 2008

i used to work for a company making a now-defunct mobile gpu (i just did video stuff, not 3d, so don't blame me too hard please)

we ran into a problem with a game that didn't render properly because they had some code like
code:
switch (gpu_type) {
    case MALI:
        render_things_using_mali_workarounds();
        break;
    case POWERVR:
        render_things_using_powervr_workarounds();
        break;
    /* ... */
    default:
        // don't bother rendering anything lol
        break;
}
and we weren't in the list

actually there's no point to this story but i've typed it now so here you go, my experience of gpu-related shittiness

Arcteryx Anarchist
Sep 15, 2007

Fun Shoe
yeah i've done some stuff with caffe and it's a great reminder of how army-of-grad-students-and-postdocs-maintained this kind of stuff still is

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

rjmccall posted:

gpu hardware is the absolute worst and half of why gpu drivers are always terrible is that they are full of terrible workarounds for the broken-rear end hardware. they do not want to give you a specification for the hardware because an honest specification would have to say stuff like "never put a bundle with operations 1 and 2 immediately after a bundle with operation 3 or the gpu will hard lock"

rjmccall posted:

the vast majority of cpu errata involve system facilities where it's more understandable, like to take the first entries from the first google hit for intel errata:
- a specific system register used in virtualization sometimes reports incorrect information
- writing directly to the page table without invalidating existing entries can gently caress things up
- invalid encodings similar to certain virtualization instructions trigger the wrong exception

gpus get super basic poo poo wrong all the time

JawnV6 posted:

even then, 50%+ of errata only apply before a ucode patch is applied, but BIOS authors still need to be aware of & avoid those situations, so they're published

way back when i was proximate to GPU workers, the bar for bugs was staggering. CPUs kinda sorta need to be rock solid, while a GPU kicking out incorrect results in the lower bits of a division is "uhhh so is there a driver workaround?"

posting these from the hn thread. i haven't worked at any gpu company so i don't know all the dirty details but here

Qwertycoatl
Dec 31, 2008

having worked at two hardware companies: hardware is full of bugs and it's firmware writers' job to desperately try to hide all the shittiness from users

The Management
Jan 2, 2010

sup, bitch?

Qwertycoatl posted:

having worked at two hardware companies: hardware is full of bugs and it's firmware writers' job to desperately try to hide all the shittiness from users

hardware is just software that's been etched into silicon. it's just as buggy. and it turns out that if you wait long enough, all hardware bugs become software's problem so #yolo

Qwertycoatl
Dec 31, 2008

The Management posted:

hardware is just software that's been etched into silicon. it's just as buggy. and it turns out that if you wait long enough, all hardware bugs become software's problem so #yolo

yeah, fairly often i come across people who think hardware isn't buggy. i don't know whether to laugh or drink

Cybernetic Vermin
Apr 18, 2005

Qwertycoatl posted:

yeah, fairly often i come across people who think hardware isn't buggy. i don't know whether to laugh or drink

they go together really well

also: a really fun thread for people like me, who know very little about this in depth despite being strangely invested in it for several decades

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
also please remember that gamers suck. they are extremely gullible to marketing while barely understanding what game developers actually do.

gamers generally have a lot of misconceptions about how graphics and gpus actually work because marketing for these companies relies on tricking people into thinking "numbers and specs matter"

nvidia absolutely loves it when /r/pcmasterrace continues to spew garbage about gigabit clock speeds and the number of smx per sli slot, because they can sell more things to more idiots, but it makes it hard to figure out how any of this poo poo actually works.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

lancemantis posted:

so i was doing some ML stuff at work and we have a bunch of servers with as many titans shoved into them as we can fit/buy at the time (there are shortages) for this purpose

outside of the display card, OSes and GPGPU libraries have no real concept of scheduling, so processes can't share cards; if someone starts using one they're going to bogart the whole thing no matter what they're doing

this is really great in a research environment because people love repls and jupyter notebooks so they will hold onto cards even when their program is doing absolutely nothing

i used to have a script to grab a card and loop forever on demonstration days so I wouldn't be surprised by finding no free resources

tell your server monkeys to use nvidia-smi to set compute mode to default/shared instead of exclusive (usually will make things worse when inexperienced people are actually using the cards; welcome to the social)
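
(for reference: that's nvidia-smi -c DEFAULT on the command line. here's a minimal sketch of the same thing through NVML, the C library nvidia-smi sits on top of; it assumes device index 0 and needs root to actually flip the mode:)
code:
#include <stdio.h>
#include <nvml.h>

int main(void) {
    nvmlDevice_t dev;

    if (nvmlInit() != NVML_SUCCESS)
        return 1;

    /* device 0 only; loop over nvmlDeviceGetCount() for a whole box */
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS) {
        /* DEFAULT lets processes share a card; EXCLUSIVE_PROCESS is
           the mode that lets one idle notebook hog the whole gpu */
        if (nvmlDeviceSetComputeMode(dev, NVML_COMPUTEMODE_DEFAULT) == NVML_SUCCESS)
            printf("gpu 0 set to default/shared compute mode\n");
        else
            fprintf(stderr, "couldn't set compute mode (not root?)\n");
    }

    nvmlShutdown();
    return 0;
}
build with -lnvidia-ml.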

fritz posted:

it's so horrible

i wonder how the big hpc systems do it

for big centers the minimum scheduled increment is usually the node; what the gently caress are you doing there if you need less than a thousand
midsize centers usually have some workload software that does moderate fuckery to split up nodes and gpus per user

Powerful Two-Hander
Mar 10, 2004

Mods please change my name to "Tooter Skeleton" TIA.


The Management posted:

hardware is just software that's been etched into silicon. it's just as buggy. and it turns out that if you wait long enough, all hardware bugs become software's problem so #yolo

wait so what you're saying is as a gpu you either die as a villain or are patched enough to be a hero?

ADINSX
Sep 9, 2003

Wanna run with my crew huh? Rule cyberspace and crunch numbers like I do?

Beast of Bourbon posted:

the biggest of the big games might have direct support from amd/nvidia, but when i was working on some AAA "big" games, but not like...super mega huge, we'd have an nvidia or amd rep come by the studio with a handful of new cards and be like "hey can you give these to the graphics guys to make sure they all work with your game, here's my business card, let me know if you guys want to do a marketing promo or something"

the programmers would take those cards home and put them in their gaming rigs, and we'd just continue working on whatever 3 year old graphics cards were in the work computers.

pro

Qwertycoatl posted:

i used to work for a company making a now-defunct mobile gpu (i just did video stuff, not 3d, so don't blame me too hard please)

we ran into a problem with a game that didn't render properly because they had some code like
code:
switch (gpu_type) {
    case MALI:
        render_things_using_mali_workarounds();
        break;
    case POWERVR:
        render_things_using_powervr_workarounds();
        break;
    /* ... */
    default:
        // don't bother rendering anything lol
        break;
}
and we weren't in the list

actually there's no point to this story but i've typed it now so here you go, my experience of gpu-related shittiness

lol

ADINSX
Sep 9, 2003

Wanna run with my crew huh? Rule cyberspace and crunch numbers like I do?

Seems like a consistent theme that people are developing on relatively old graphics cards, because if it works there, it'll work on basically anything.

So where do you think we go from here? What's the next big thing that would actually make you use some of the new tech new cards supposedly have? Other than "poo poo should just work right" and "poo poo should be debuggable", what features would you wanna see in some theoretical next generation of graphics cards?

Volmarias
Dec 31, 2002

EMAIL... THE INTERNET... SEARCH ENGINES...

Suspicious Dish posted:

also please remember that gamers suck. they are extremely gullible to marketing while barely understanding what game developers actually do.

gamers generally have a lot of misconceptions about how graphics and gpus actually work because marketing for these companies relies on tricking people into thinking "numbers and specs matter"

nvidia absolutely loves it when /r/pcmasterrace continues to spew garbage about gigabit clock speeds and the number of smx per sli slot, because they can sell more things to more idiots, but it makes it hard to figure out how any of this poo poo actually works.

I would love an idea of what's actually worth knowing that doesn't require me to be a professional in the field.

Truga
May 4, 2014
Lipstick Apathy

Volmarias posted:

I would love an idea of what's actually worth knowing that doesn't require me to be a professional in the field.

same.

all i know is, bigger numbers push more frames onto the monitor, get super expensive in the high end, and also the higher ones are basically hairdryers, thermally (1080ti/vega will happily draw 300W, though gigantic heatsinks usually make them very quiet hairdryers these days, at least). that's it. i look at benchmarks, see what's good for my monitor, go buy it.

i tried stepping into the rendering-on-a-gpu thing once and couldn't figure out poo poo by myself. rendering things onto the screen with d3d/gl is simple enough to pick up, but understanding how it all works at a basic level ended up being super beyond me, and i'm not sure it can be described well without posting a novel about it, so even just pointers in the right direction would be super appreciated at this point :v:

Arcteryx Anarchist
Sep 15, 2007

Fun Shoe

PCjr sidecar posted:

tell your server monkeys to use nvidia-smi to set compute mode to default/shared instead of exclusive (usually will make things worse when inexperienced people are actually using the cards; welcome to the social)

yeah I should probably tell them that if they haven't figured it out by now

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
here's some really good introductory material from a guy much smarter than me

how gpu's work: https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/

how 3d triangles work: https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index/

and here's an article by me describing the basics of 2d rendering on a cpu: http://magcius.github.io/xplain/article/rast1.html

(i'm working on a followup)
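
(not code from any of those articles, just a toy sketch of the core idea they build up to: a pixel is inside a triangle iff all three edge functions agree in sign:)
code:
#include <stdint.h>
#include <string.h>

#define W 256
#define H 256

static uint32_t fb[W * H];  /* 0xAARRGGBB framebuffer, row-major */

/* twice the signed area of triangle (a, b, p); the sign says which
   side of edge a->b the point p falls on */
static int edge(int ax, int ay, int bx, int by, int px, int py) {
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

static void fill_tri(int x0, int y0, int x1, int y1, int x2, int y2,
                     uint32_t color) {
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            int e0 = edge(x0, y0, x1, y1, x, y);
            int e1 = edge(x1, y1, x2, y2, x, y);
            int e2 = edge(x2, y2, x0, y0, x, y);
            /* accept either winding: inside iff all signs agree */
            if ((e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                (e0 <= 0 && e1 <= 0 && e2 <= 0))
                fb[y * W + x] = color;
        }
    }
}

int main(void) {
    memset(fb, 0, sizeof fb);
    fill_tri(20, 20, 20, 200, 200, 200, 0xffff0000);  /* opaque red */
    return 0;
}
real rasterizers do this incrementally and hierarchically instead of brute-forcing every pixel, but it's the same test underneath.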

Volmarias
Dec 31, 2002

EMAIL... THE INTERNET... SEARCH ENGINES...
.

Fuzzy Mammal
Aug 15, 2001

Lipstick Apathy
when i did graphics in uni in 2004 the lab had just got its first non-fixed-function gpus, and i remember writing some asm shader for my term project that was a bitch to get working and barely had any documentation. but now it sounds like there's pretty good toolchains and a high level language and poo poo. is there a good reference for poking around and learning that stuff now? just making a toy app in visual studio or something?

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
webgl is honestly pretty cool to get started with and it's where i do all my side project work
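
(to give a taste of the "high level language and poo poo" part: every gl app ships glsl source strings and the driver compiles them at runtime. a minimal sketch in gles2-flavored C, assuming a context is already current via egl/glfw/sdl/whatever:)
code:
#include <stdio.h>
#include <GLES2/gl2.h>

/* trivial fragment shader source, compiled by the driver at runtime */
static const char *frag_src =
    "precision mediump float;\n"
    "void main() { gl_FragColor = vec4(1.0, 0.5, 0.0, 1.0); }\n";

GLuint compile_frag(void) {
    GLuint sh = glCreateShader(GL_FRAGMENT_SHADER);
    GLint ok = 0;
    char log[1024];

    glShaderSource(sh, 1, &frag_src, NULL);
    glCompileShader(sh);  /* the vendor's compiler runs right here */
    glGetShaderiv(sh, GL_COMPILE_STATUS, &ok);
    if (!ok) {
        glGetShaderInfoLog(sh, sizeof log, NULL, log);
        fprintf(stderr, "compile failed: %s\n", log);
    }
    return sh;
}
webgl has the same calls, just hanging off the gl context object in js.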

have some more random resources:

3d basics talk i gave for a bunch of random developers at my previous linux job https://www.youtube.com/watch?v=v8jUNC5WdYE
source code at https://github.com/magcius/sw3dv

some random open-source things i have written:

https://magcius.github.io/model-viewer/ https://github.com/magcius/model-viewer
https://magcius.github.io/bmdview.js/bmdview.html https://github.com/magcius/bmdview.js
https://magcius.github.io/pbrtview/src/pbrtview.html https://github.com/magcius/pbrtview

Comfy Fleece Sweater
Apr 2, 2013

You see, but you do not observe.

ADINSX posted:

Other than "poo poo should just work right" and "poo poo should be debuggable", what features would you wanna see in some theoretical next generation of graphics cards?

sucks my dick and tells me I'm a good boy

anthonypants
May 6, 2007

by Nyc_Tattoo
Dinosaur Gum

Comfy Fleece Sweater posted:

sucks my dick and tells me I'm a good boy

yeah put me down for one of these also

peepsalot
Apr 24, 2007

        PEEP THIS...
           BITCH!

hey op have you used this https://developer.nvidia.com/linux-graphics-debugger and do you think it's a POS?

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
not that specific build, but i've used the tegra version of it, and another private version of it

https://developer.nvidia.com/tegra-graphics-debugger

it sucks.

feedmegin
Jul 30, 2008

Suspicious Dish posted:

since their shader compiler has to run in real-time and be fast and can't do that much in terms of optimization

Why

josh04
Oct 19, 2008


"THE FLASH IS THE REASON
TO RACE TO THE THEATRES"

This title contains sponsored content.

my fellow, have you heard of SPIR-V???

Main Paineframe
Oct 27, 2010

Suspicious Dish posted:

webgl is honestly pretty cool to get started with and it's where i do all my side project work

webgl seems like one of those things that should be either hilarious or terrifying

"hello, we're just gonna let random javascript on some random website talk directly to your graphics card without your knowledge or confirmation, this sounds like a fantastic idea with no downsides whatsoever"

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
nah, google basically made their own libgl, called angle, which compiles to direct3d for safety (and driver workaround) purposes

webgl doesn't just run the driver's libgl.

fun fact: microsoft said that webgl would be a security disaster, and people thought microsoft was just being anticompetitive, until people found a bunch of actual stupid security things

Suspicious Dish fucked around with this message at 17:18 on Sep 30, 2017

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

feedmegin posted:

Why

because users want fast loading times

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
random side-project update

android ships a software libgl called "pixelflinger": https://android.googlesource.com/platform/system/core/+/master/libpixelflinger/

i thought it died out years ago since it was really only meant as a hw bringup kind of thing, but apparently some hardware drivers still use it in a fallback path or something, because i just got a call stack on my android tablet which had pixelflinger in it

Notorious b.s.d.
Jan 25, 2003

by Reene

Suspicious Dish posted:

random side-project update

android ships a software libgl called "pixelflinger": https://android.googlesource.com/platform/system/core/+/master/libpixelflinger/

i thought it died out years ago since it was really only meant as a hw bringup kind of thing, but apparently some hardware drivers still use it in a fallback path or something, because i just got a call stack on my android tablet which had pixelflinger in it

i thought android tablets were a wasteland of broken code and abandoned SoCs

this kinda confirms that impression

The Management
Jan 2, 2010

sup, bitch?
wait, they still make android tablets?

fishmech
Jul 16, 2006

by VideoGames
Salad Prong

The Management posted:

wait, they still make android tablets?

they're massively more popular than the failing ipad

Doc Block
Apr 15, 2003
Fun Shoe
android tablets are doing great work and they're being recognized more and more. not like the failing ipad. sad!

feedmegin
Jul 30, 2008

The Management posted:

wait, they still make android tablets?

Literally reading this on my Galaxy Tab S2

Harik
Sep 9, 2001

From the hard streets of Moscow
First dog to touch the stars


Plaster Town Cop

Suspicious Dish posted:

random side-project update

android ships a software libgl called "pixelflinger": https://android.googlesource.com/platform/system/core/+/master/libpixelflinger/

i thought it died out years ago since it was really only meant as a hw bringup kind of thing, but apparently some hardware drivers still use it in a fallback path or something, because i just got a call stack on my android tablet which had pixelflinger in it

apparently it's still a thing, since Android Backstage still has talks about it in 2017.

Suspicious Dish posted:

hi i'm a gpu guy i work on the graphics at video games and previously i worked on the mali gpu driver to try and make it not suck (i wasn't successful)

oh man, MALI. god that pile of bad decisions never fails to piss me off. quick background for people who don't work with embedded poo poo:

ARM is both an instruction set (really a range of them) and an implementation. Everybody licenses one of their cores in verilog/vhdl source form, then buys a bunch of random peripherals like upgraded memory controllers or sound codecs, mashes them together, and poops the whole thing out to a fab like glofo or samsung. this means there's a lot of very very bad ARM processors out there.

MALI came about because there were approximately 17 billion competing GPU companies selling incompatible modules to ARM licensees. remember the good old days when games only worked on some subset of phones? Qwertycoatl ran into that because their SoC's gpu wasn't on a lovely game's list the way mali, tegra or powervr were.

ARM bought falanx and started shipping MALI with their cores, to start some sort of standardization. Yay, great. Problem: GPUs are memory hungry, and ARM didn't bother with things like "standards for embedded graphics memory access" so every individual SOC has to roll their own memory management.

naturally the place to do this is in the graphics library, not in the graphics driver. there's no such thing as a "MALI" graphics library, because a generic one couldn't do anything at all, since the gpu has no RAM of its own. instead, there's the X-brand MALI graphics library, compiled against X-brand bionic or libc or eglibc or whatever they make their BSP against, and you can never upgrade your software unless the SOC vendor deigns to support a design that's not their current one.

to compound this, nearly every lovely SOC has at least one autist on staff who thinks they know all about GPUs because they have a liquid cooled AMD Titan they use for playing shootmans and makes "improvements" to the HDL or driver layer (they are never improvements)

people keep trying to make an open-sores version (LIMA) but the completely fractured nature of the mali landscape and developer-hostile attitude of everyone involved make it super slow going.

OP which mali project were you working on?

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

Harik posted:

apparently it's still a thing, since Android Backstage still has talks about it in 2017.


oh man, MALI. god that pile of bad decisions never fails to piss me off. quick background for people who don't work with embedded poo poo:

ARM is both an instruction set (really a range of them) and an implementation. Everybody licenses one of their cores in verilog/vhdl source form, then buys a bunch of random peripherals like upgraded memory controllers or sound codecs, mashes them together, and poops the whole thing out to a fab like glofo or samsung. this means there's a lot of very very bad ARM processors out there.

yeah, though ARM has apparently been trying to corner the market by making the other peripherals too. and all the low-end soc shops just use arm's since it's cheaper than maintaining their own. so now arm sells a memory controller, ethernet controller, gpu, display controller...

Harik posted:

MALI came about because there were approximately 17 billion competing GPU companies selling incompatible modules to ARM licensees. remember the good old days when games only worked on some subset of phones?

lol, they never left, buddy. the latest disaster was samsung patching the mali driver to break it for literally everybody in the galaxy s8.

Harik posted:

ARM bought falanx and started shipping MALI with their cores, to start some sort of standardization. Yay, great. Problem: GPUs are memory hungry, and ARM didn't bother with things like "standards for embedded graphics memory access" so every individual SOC has to roll their own memory management.

naturally the place to do this is in the graphics library, not in the graphics driver. there's no such thing as a "MALI" graphics library, because a generic one couldn't do anything at all, since the gpu has no RAM of its own. instead, there's the X-brand MALI graphics library, compiled against X-brand bionic or libc or eglibc or whatever they make their BSP against, and you can never upgrade your software unless the SOC vendor deigns to support a design that's not their current one.

nah, mali standardized on UMP a while back, which is a kernel driver that hands out carve-out chunks to the gpu. it's bad, but at least it's standard now.
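
("carve-out" meaning a chunk of physical RAM reserved for the gpu at boot that the kernel driver parcels out. a toy sketch of the idea, nothing like ump's actual code; assumes power-of-two alignment:)
code:
#include <stddef.h>
#include <stdint.h>

/* one physically contiguous region, reserved at boot, handed out by a
   dumb bump allocator. real drivers track and free chunks; this doesn't. */
struct carveout {
    uintptr_t base;  /* physical base address of the reserved region */
    size_t size;
    size_t used;
};

static uintptr_t carveout_alloc(struct carveout *c, size_t n, size_t align) {
    /* align must be a power of two */
    uintptr_t p = (c->base + c->used + align - 1) & ~(uintptr_t)(align - 1);
    if (n > c->size || p > c->base + c->size - n)
        return 0;  /* out of gpu memory */
    c->used = p + n - c->base;
    return p;      /* the gpu dma's straight to this physical address */
}
roughly speaking, ump is a grown-up version of this with refcounting and shareable buffer ids so different drivers can pass memory around.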

Harik posted:

to compound this, nearly every lovely SOC has at least one autist on staff who thinks they know all about GPUs because they have a liquid cooled AMD Titan they use for playing shootmans and makes "improvements" to the HDL or driver layer (they are never improvements)

people keep trying to make an open-sores version (LIMA) but the completely fractured nature of the mali landscape and developer-hostile attitude of everyone involved make it super slow going.

OP which mali project were you working on?

the company i was at before paid something to the tune of $200,000 to arm for the source code for their mali 450 utgard driver, and my job was to literally just fix bugs and get it to work on the embedded device we were shipping.

also the main developer of lima literally put out a blog post to insult me, lol.

Harik
Sep 9, 2001

From the hard streets of Moscow
First dog to touch the stars


Plaster Town Cop

Suspicious Dish posted:

lol, they never left, buddy. the latest disaster was samsung patching the mali driver to break it for literally everybody in the galaxy s8.
:lol:

Suspicious Dish posted:

nah, mali standardized on UMP a while back, which is a kernel driver that hands out carve-out chunks to the gpu. it's bad, but at least it's standard now.
yeah, my post was already too long. there's still a bunch of per-chip tweaks for whatever reason, which means you need the exact library for your exact SoC and gently caress you if they only build it against kitkat.

Suspicious Dish posted:

the company i was at before paid something to the tune of $200,000 to arm for the source code for their mali 450 utgard driver, and my job was to literally just fix bugs and get it to work on the embedded device we were shipping.

also the main developer of lima literally put out a blog post to insult me, lol.
was that in the kernel driver or in libutgard-mali.so? I'm assuming the latter, since the former is basically forced to be open sores because it's a kernel driver. the part that pisses me off the most is that the incompatibility is entirely due to laziness. they just have an enum for driver operations and change it at a whim on point releases. it's a ridiculously stupid way of doing things. if they ever get dragged kicking & screaming into upstream, that poo poo will stop instantly because nobody will let them break existing ABIs.
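
(to make the enum thing concrete, with hypothetical op names since i'm making these up: the kernel driver and every vendor's libmali.so each bake in a copy of this, and the raw values are the ABI:)
code:
/* hypothetical driver ABI, release N's header: */
enum mali_op {
    MALI_OP_MAP_MEM,     /* 0 */
    MALI_OP_SUBMIT_JOB,  /* 1 */
    MALI_OP_WAIT_FENCE,  /* 2 */
};

/* release N+1 quietly inserts an op in the middle (same enum name in
   the real header; renamed here so the snippet compiles as one file) */
enum mali_op_next {
    MALI_OP2_MAP_MEM,     /* 0 */
    MALI_OP2_UNMAP_MEM,   /* 1  <- new */
    MALI_OP2_SUBMIT_JOB,  /* 2  <- an old libmali.so built against
                                   release N still sends "submit job"
                                   as op 1, which the new kernel
                                   driver reads as "unmap memory" */
    MALI_OP2_WAIT_FENCE,  /* 3 */
};
upstream review rejects exactly this kind of change, which is why mainlining would stop it instantly.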

i guess I should find the best -400 MP4 implementation out there, replace this SoC's garbage kernel driver with theirs, and use their libraries. Already had to do it once to get sound working. how garbage is this SoC? it has my all-time favorite kernel commit message:

quote:

In A.D. 1582 Pope Gregory XIII found that the existing Julian calendar insufficiently represented reality, and changed the rules about calculating leap years to account for this. Similarly, in A.D. 2013 Rockchip hardware engineers found that the new Gregorian calendar still contained flaws, and that the month of November should be counted up to 31 days instead. Unfortunately it takes a long time for calendar changes to gain widespread adoption, and just like more than 300 years went by before the last Protestant nation implemented Greg's proposal, we will have to wait a while until all religions and operating system kernels acknowledge the inherent advantages of the Rockchip system. Until then we need to translate dates read from (and written to) Rockchip hardware back to the Gregorian format.
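
(for flavor, a toy sketch of the translation that message describes, emphatically not the actual rockchip rtc driver: the hardware thinks november has 31 days, so dates read from it get re-expressed in gregorian:)
code:
#include <stdbool.h>

/* month lengths: gregorian vs the "rockchip calendar" (november = 31) */
static const int greg_days[12] = {31,28,31,30,31,30,31,31,30,31,30,31};
static const int rk_days[12]   = {31,28,31,30,31,30,31,31,30,31,31,31};

static bool is_leap(int y) {
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

static int mdays(const int *tab, int y, int m) {  /* m is 0-based */
    return tab[m] + (m == 1 && is_leap(y));
}

/* re-express a hardware (rockchip-calendar) date as gregorian, in place.
   month and day are 1-based. */
static void rk_to_gregorian(int *y, int *m, int *d) {
    int i, doy = *d - 1;

    /* day-of-year under the hardware's idea of the calendar */
    for (i = 0; i < *m - 1; i++)
        doy += mdays(rk_days, *y, i);

    /* walk the same day-of-year back through the real calendar */
    for (i = 0; doy >= mdays(greg_days, *y, i); i++) {
        doy -= mdays(greg_days, *y, i);
        if (i == 11) { i = -1; (*y)++; }  /* spill into the next year */
    }
    *m = i + 1;
    *d = doy + 1;
}
so a raw 2013-11-31 out of the chip comes back as 2013-12-01.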

anthonypants
May 6, 2007

by Nyc_Tattoo
Dinosaur Gum

Harik posted:

i guess I should find the best -400 MP4 implementation out there, replace this SoC's garbage kernel driver with theirs, and use their libraries. Already had to do it once to get sound working. how garbage is this SoC? it has my all-time favorite kernel commit message:

quote:

In A.D. 1582 Pope Gregory XIII found that the existing Julian calendar insufficiently represented reality, and changed the rules about calculating leap years to account for this. Similarly, in A.D. 2013 Rockchip hardware engineers found that the new Gregorian calendar still contained flaws, and that the month of November should be counted up to 31 days instead. Unfortunately it takes a long time for calendar changes to gain widespread adoption, and just like more than 300 years went by before the last Protestant nation implemented Greg's proposal, we will have to wait a while until all religions and operating system kernels acknowledge the inherent advantages of the Rockchip system. Until then we need to translate dates read from (and written to) Rockchip hardware back to the Gregorian format.

some nerd posted:

Well, perhaps I'm having a sense of humour failure, but IMO in a setting which is supposed to be for information, jokes should not be present. People reading patch notes, release notes, and other informational material should be able to assume that everything written there is 100% sincere.


Truga
May 4, 2014
Lipstick Apathy

jesus loving christ

also, thank you for posting the articles on gpus, op, much appreciated
