OzyMandrill
Aug 12, 2013

Look upon my words
and despair

I used to have to go to many trade show meetings with the HW vendors, listen to their waffle, and then try to blag as much free hardware as possible for the studio. I never had the heart to tell the vendor that unless they gave us money to cover the costs, we would mostly ignore what they said and just try to get the drat thing running on the shittiest 3-year-old gpus out there (S3s - hah!*), as that's what most customers actually had. if it worked on those, then high-end cards would work too.

* nVidia was first out of the gate with HW vertex shaders, S3 were a year or so behind. unfortunately they discovered a crash bug in the clipping after they had built the cards, so the last S3 consumer cards shipped with it all turned off. then they had to focus on cheap OEM stuff that made embedded intel gpus look good. then they went bust. which was a shame, cos the S3 trade show tactic was to hand you hw the minute you walked in (we didn't want it), and then crack open a case of beer and bitch about ati/nvidia being boring farts at trade shows.

OzyMandrill
Aug 12, 2013

Look upon my words
and despair

pvr is tile-based, but internally uses a scanline renderer. tris within a tile are z-sorted, and then only the topmost pixel is drawn, scanline by scanline. the actual fill rate is pretty poor compared to amd/nvidia as it uses more general processors instead of dedicated vertex/pixel pipelines, which are considerably simpler in terms of the instructions they can do. pvr really falls down on multiple layered transparencies and full-screen effects, which the normal gpus chomp through with much better performance.
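
roughly, the per-tile idea looks like this toy c++ sketch (illustrative only, not real pvr internals - every 'prim' here covers the whole tile, just to show the sort-then-shade-the-topmost shape):

```cpp
// Toy sketch of per-tile sorting: collect the prims touching a tile, sort them
// front to back, shade only the nearest one per pixel so hidden layers cost nothing.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Prim { float depth; unsigned colour; };   // flat-shaded stand-in for a triangle

constexpr int kTileSize = 32;

void ShadeTile(const std::vector<Prim>& tilePrims, unsigned* tilePixels) {
    // Sort this tile's primitive list front to back, once per tile.
    std::vector<Prim> sorted = tilePrims;
    std::sort(sorted.begin(), sorted.end(),
              [](const Prim& a, const Prim& b) { return a.depth < b.depth; });

    for (int i = 0; i < kTileSize * kTileSize; ++i) {
        // In this toy version every prim covers the whole tile, so the front-most
        // prim wins everywhere and is the only one "shaded".
        tilePixels[i] = sorted.empty() ? 0u : sorted.front().colour;
    }
}

int main() {
    std::vector<Prim> prims = {{0.8f, 0xff0000ffu}, {0.3f, 0xff00ff00u}, {0.6f, 0xffff0000u}};
    unsigned pixels[kTileSize * kTileSize];
    ShadeTile(prims, pixels);
    std::printf("tile pixel 0 = 0x%08x\n", pixels[0]);   // the depth-0.3 prim wins
    return 0;
}
```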

deferred rendering is becoming ubiquitous. for those who don't follow this stuff, that's where you first draw everything with no textures just to work out the z-buffer - you still have to run all the vertex shaders, but the pixels only write depth. then you render everything again with full pixel shaders, so only the visible pixels incur the cost of the pixel shader. but you don't write out final colours; you just store the normal/base colour/roughness/conductivity/etc. then you do a full-screen pass that applies shadows/lighting/reflections based on the normals/physical parameters, once for each pixel in the final image. then you do the full-screen post-process effects (colour balance, tone mapping, antialiasing, depth of field, fog, etc). as the hw is designed to do multiple passes over the scene, it's optimised to do these passes very efficiently.
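
a toy cpu-side version of that shape (not a real renderer - the g-buffer layout and names are just illustrative; the point is that the geometry pass stores parameters per pixel and a single full-screen pass does the shading):

```cpp
// Deferred shading in miniature: geometry pass fills a G-buffer with
// normal/base colour/roughness, then one lighting pass shades each final pixel once.
#include <array>
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };
static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// One G-buffer texel: no final colour yet, just what the lighting pass needs.
struct GBufferTexel {
    float depth     = 1e30f;        // filled in by the depth test
    Vec3  normal    = {0, 0, 1};
    Vec3  baseColor = {0, 0, 0};
    float roughness = 1.0f;         // stored but unused by this toy diffuse-only pass
};

constexpr int kWidth = 4, kHeight = 4;
using GBuffer     = std::array<GBufferTexel, kWidth * kHeight>;
using FrameBuffer = std::array<Vec3, kWidth * kHeight>;

// Geometry pass: rasteriser stand-in that stamps one surface into the G-buffer
// wherever it wins the depth test. Hidden surfaces never reach the lighting pass.
void GeometryPass(GBuffer& gbuf) {
    for (auto& texel : gbuf) {
        const float surfaceDepth = 0.5f;
        if (surfaceDepth < texel.depth) {
            texel.depth     = surfaceDepth;
            texel.normal    = {0.0f, 0.0f, 1.0f};
            texel.baseColor = {0.8f, 0.2f, 0.2f};
            texel.roughness = 0.4f;
        }
    }
}

// Full-screen lighting pass: runs exactly once per final pixel.
void LightingPass(const GBuffer& gbuf, FrameBuffer& out) {
    const Vec3 lightDir = {0.0f, 0.0f, 1.0f};
    for (int i = 0; i < kWidth * kHeight; ++i) {
        const GBufferTexel& t = gbuf[i];
        const float ndotl = std::fmax(0.0f, Dot(t.normal, lightDir));
        out[i] = {t.baseColor.x * ndotl, t.baseColor.y * ndotl, t.baseColor.z * ndotl};
    }
}

int main() {
    GBuffer gbuf;
    FrameBuffer frame;
    GeometryPass(gbuf);
    LightingPass(gbuf, frame);   // post-process passes (tone map, AA, etc) would follow here
    std::printf("pixel 0 = (%.2f, %.2f, %.2f)\n", frame[0].x, frame[0].y, frame[0].z);
    return 0;
}
```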

recent hardware has introduced 'tiles' to these buffers too, but it is fundamentally different. pvr draws each tile full of final colours from z-sorted lists of prims. regular cards use tiles as cache units and to track simple state flags to accelerate the composite/clear passes - there's no storing and sorting of prims like pvr does.

OzyMandrill
Aug 12, 2013

Look upon my words
and despair

Rosoboronexport posted:

What's the hot take on DX12? I guess people are hoping for a major speed boost, but currently on Nvidia's cards most likely one should stick with the DX11 rendering path. Maybe things will change when middleware (Unity etc) has fast and bug-free DX12 rendering.

it removes a lot of the responsibility for memory management from the gpu/driver side and lets the programmer handle it, which opens up a whole bunch of cool optimisations. virtually no commercial engines take account of this yet, and to do it justice would require building a new graphics engine from scratch, so don't expect it to fully kick in for a couple of years. as the boost is probably 10% at best and will be coming in gradually, you would be hard pressed to ever notice the difference. i guess it makes life simpler on the driver side so there's less chance for fuckups there, but i'm sure they'll manage somehow.
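
to give a flavour of the 'programmer handles it' part, here's a rough c++ sketch of the dx12 pattern: the app creates one big heap up front and then 'places' buffers into it at offsets it picks itself, instead of the driver allocating each resource for you. the sizes, flags and names are just illustrative, and all error handling is skipped:

```cpp
// Sketch of DX12 explicit memory management: one app-owned heap, app-chosen offsets.
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Reserve one big GPU-local heap that the application will suballocate from.
ComPtr<ID3D12Heap> CreateBufferHeap(ID3D12Device* device, UINT64 sizeInBytes) {
    D3D12_HEAP_DESC heapDesc = {};
    heapDesc.SizeInBytes     = sizeInBytes;
    heapDesc.Properties.Type = D3D12_HEAP_TYPE_DEFAULT;                    // GPU-local memory
    heapDesc.Alignment       = D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT; // 64KB
    heapDesc.Flags           = D3D12_HEAP_FLAG_ALLOW_ONLY_BUFFERS;

    ComPtr<ID3D12Heap> heap;
    device->CreateHeap(&heapDesc, IID_PPV_ARGS(&heap));
    return heap;
}

// Place a buffer at an offset the app chose (offsets must be 64KB aligned):
// the driver no longer decides where this resource lives or when to move it.
ComPtr<ID3D12Resource> PlaceBufferInHeap(ID3D12Device* device, ID3D12Heap* heap,
                                         UINT64 offset, UINT64 sizeInBytes) {
    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width            = sizeInBytes;
    desc.Height           = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels        = 1;
    desc.Format           = DXGI_FORMAT_UNKNOWN;
    desc.SampleDesc.Count = 1;
    desc.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    ComPtr<ID3D12Resource> buffer;
    device->CreatePlacedResource(heap, offset, &desc, D3D12_RESOURCE_STATE_COMMON,
                                 nullptr, IID_PPV_ARGS(&buffer));
    return buffer;
}
```

an engine built around this can pool, alias and recycle memory per frame instead of hoping the driver guesses right, which is where those cool optimisations come from.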

OzyMandrill
Aug 12, 2013

Look upon my words
and despair

ate all the Oreos posted:

i'm seriously asking, like is "ordered information" some kind of potential energy since it's not as low an energy state as random / higher entropy states could be?

physicist in the house! (tho it was many years ago and i've been touching computers ever since)

yes
in order to not be random, you need to expend energy to change the bits. 1 bit is 1 unit of 'Shannon entropy', which (up to a factor of Boltzmann's constant) has been shown to be the same entropy we are familiar with in thermodynamics - it's the bit-scale unit of it. the theory goes something like this...
for a given algorithm, you expect a certain output (in x bits). the more precise you need the answer, the fewer combinations of output bits are allowed, the lower the entropy, and the more work it takes to generate (which is why approximations run faster). there is a theoretical lower limit on the energy required, which varies depending on the algorithm.
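
for reference, the textbook floor for setting/erasing a single bit is Landauer's limit at temperature T:

$$ E_{\min} = k_B T \ln 2 \approx (1.38\times10^{-23}\,\mathrm{J/K})(300\,\mathrm{K})(0.693) \approx 2.9\times10^{-21}\,\mathrm{J} \ \text{per bit at room temperature} $$

real hardware spends many orders of magnitude more than this per bit, which is the 'massively inefficient systems' point further down.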

take the simple case of generating the square root of a number. a double-precision root op is expensive, 32-bit is less expensive, and on a cpu we can use some bit twiddling to get a close approximation with fewer instructions. the fastest method of all (depending on memory transaction costs) would be a lookup table, but this requires spending memory (fixed entropy) to save runtime energy - the memory is literally a store of pre-generated entropy that can be used, and it has to be filled by running the algorithm on all inputs (spent energy). a typical optimisation would be to only store 1% of the inputs. if we use that as-is, there will be some lower bits that are 'wrong'. this can be improved with interpolation - but that is obviously just changing the balance: 99% less stored entropy in exchange for the extra operations (energy spent) to interpolate between two entries each time the algorithm is used. the trick for optimisation is balancing the cost of work done (cycles on the processor) against static entropy (memory) to get the desired result within a desired margin of error.
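
something like this c++ toy version of the lookup-table trade (the table size, the [0,1] range and the linear interpolation are all just illustrative choices):

```cpp
// Trade memory (a pre-computed table) for runtime work: approximate sqrt(x)
// on [0,1] from a coarse table plus linear interpolation, vs calling sqrt() directly.
#include <cmath>
#include <cstdio>

constexpr int kTableSize = 256;              // store only a small fraction of possible inputs
static double g_sqrtTable[kTableSize + 1];

// "Spent energy": run the full algorithm once per table entry, up front.
void BuildTable() {
    for (int i = 0; i <= kTableSize; ++i)
        g_sqrtTable[i] = std::sqrt(static_cast<double>(i) / kTableSize);
}

// Extra work (the interpolation) in exchange for a much smaller table.
double FastSqrt01(double x) {
    double scaled = x * kTableSize;
    int idx = static_cast<int>(scaled);
    if (idx >= kTableSize) idx = kTableSize - 1;     // keep idx+1 in range when x == 1.0
    double frac = scaled - idx;
    return g_sqrtTable[idx] * (1.0 - frac) + g_sqrtTable[idx + 1] * frac;
}

int main() {
    BuildTable();
    for (double x : {0.1, 0.25, 0.5, 0.9})
        std::printf("x=%.2f  lut=%.6f  sqrt=%.6f\n", x, FastSqrt01(x), std::sqrt(x));
    return 0;
}
```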

this also means that theoretically a formatted hard disc weighs slightly more than an unformatted one, by a comedically tiny, unmeasurable amount, as energy is spent on changing the state. and it turns out to be true - a 1TB drive can contain something like 5J worth of extra energy in the potential energy of the magnetic dipoles. the dipoles want to be randomly aligned - alternating n/s/n/s, or lined up in rings & random swirly fractal shapes - but when formatted, the area of each bit needs the dipoles aligned together. if you could attach little pulleys to the dipoles and let em go, they would swing around to the random shape and the energy could be 'extracted'. chip-based memory has similar properties - in order to store a 0 or a 1, the atoms need to be put into an unnatural state and will want to decay back; the difference in energy between the states is the potential energy. solid-state memory has a larger energy barrier before a bit can flip, which allows it to remember stuff when unpowered (thermal fluctuations in the atoms' energy are not enough to push it over the energy hump), which means it can preserve its entropy for longer. but by definition this means the energy cost of using it is higher - no free lunches with thermodynamics.

note that most of the energy costs are due to the massively inefficient systems we use. single-atom-scale devices would use less than a billionth of what we use now, but the same principles would apply. making any pattern of bits/information requires changing the state of something, and the states must sit at different energy levels of some kind. changing state requires energy, and this energy must be larger than the 'thermal noise', else the information will decay. the energy difference between these states is the potential energy stored per bit, and by e=mc^2, the formatted version has a tiny bit of extra mass.
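
the mass side of that is just the usual rearrangement, e.g. for the 5J figure above:

$$ \Delta m = \frac{\Delta E}{c^2} = \frac{5\,\mathrm{J}}{(3\times10^{8}\,\mathrm{m/s})^2} \approx 5.6\times10^{-17}\,\mathrm{kg} $$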

OzyMandrill
Aug 12, 2013

Look upon my words
and despair

amd have the lead in the games market - playstation & xbox both use amd hardware. i guess that's a significant chunk of profit that nvidia can't touch.
this has the knock-on effect that unless it's a dedicated pc game, all the time/money will be spent optimising shaders for amd hardware. pc performance is secondary - go buy a faster board if you want bigger numbers.

OzyMandrill
Aug 12, 2013

Look upon my words
and despair

a quick google suggests that intel are the bosses of the gpu market thanks to their integrated stuff, and they've started on a combined radeon/intel core for laptops. i wouldn't be surprised if, long term, radeon becomes the default 'high-performance' integrated gpu for cheap mass-market 'gaming rigs'.
