3D graphics questions that do not deserve their own thread (OpenGL / Dx10)

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > 3D graphics questions that do not deserve their own thread (OpenGL / Dx10)

Doc Block: Apr 15, 2003; Fun Shoe

Am I allowed to ask Metal questions in here?

Nov 5, 2015 23:47

Doc Block: Apr 15, 2003; Fun Shoe

Well, here's the thing. I want to add a bloom effect to my AppleTV game (whose engine I wrote using Metal), and I've got it working, but it's slooooow.

The basic steps I'm doing right now:
1) Render the scene into a framebuffer (RGBA16F) that's the same resolution as the screen
2) Create a specular buffer by rendering the output of Step 1 into a texture that's 1/4th the screen resolution, using a shader that drops any values below a brightness threshold.
3) Blur the specular buffer
4) Render a fullscreen quad to the screen, with a shader that takes the output of Step 1 and Step 3 and combines them.
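For reference, here is roughly what Step 2 (the bright-pass downsample) computes, written as a plain C++ CPU sketch rather than Metal shader code. Assumptions: single-channel luminance input (in line with the R8 specular buffer), a box average over each factor x factor block before thresholding, and dimensions that divide evenly. The function name brightPassDownsample is hypothetical. On the GPU, each output texel would be one fragment or compute thread doing the same work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for Step 2: average each factor x factor block of the
// full-resolution luminance image, then zero anything below `threshold`.
std::vector<float> brightPassDownsample(const std::vector<float>& lum,
                                        std::size_t width, std::size_t height,
                                        std::size_t factor, float threshold) {
    assert(factor > 0 && width % factor == 0 && height % factor == 0);
    assert(lum.size() == width * height);
    const std::size_t outW = width / factor;
    const std::size_t outH = height / factor;
    std::vector<float> out(outW * outH, 0.0f);
    for (std::size_t oy = 0; oy < outH; ++oy) {
        for (std::size_t ox = 0; ox < outW; ++ox) {
            float sum = 0.0f;
            for (std::size_t dy = 0; dy < factor; ++dy)
                for (std::size_t dx = 0; dx < factor; ++dx)
                    sum += lum[(oy * factor + dy) * width + (ox * factor + dx)];
            const float avg = sum / static_cast<float>(factor * factor);
            // Sub-threshold pixels go to black so only the highlights bloom.
            out[oy * outW + ox] = (avg >= threshold) ? avg : 0.0f;
        }
    }
    return out;
}
```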

This pretty much kills the device's fillrate and is not kind to the GPU's tile cache. But there doesn't really seem to be a better way.

I can use multiple render targets to get rid of Step 2, but then the specular buffer has to be full resolution and makes Step 3 a lot slower. And there's still the need to store values outside of the tile cache and then read them back in, which is a performance no-no on PowerVR.

Basically, does anyone have a better way to do this in Metal? Hopefully one that's able to do most/all of the work inside the tile cache so the GPU doesn't have to touch main memory until the tile is finished?

Nov 6, 2015 19:32

Doc Block: Apr 15, 2003; Fun Shoe

And, of course, now that I've written it all out it occurs to me that large chunks of that could possibly be combined into just one or two compute shaders. Welp. v :shobon:

Nov 6, 2015 19:36

Doc Block: Apr 15, 2003; Fun Shoe

That's what I'm using to do the blur, but there's a performance penalty involved in switching the GPU from draw mode to compute mode and then back within the same frame. The Metal Performance Shader documentation suggests either doing all your compute at the beginning or at the end of the frame to avoid this.

Even without that penalty, I imagine it'd be a lot faster if I wrote a compute shader (or two) that took the initial color buffer, downscaled it & clipped the sub-threshold colors to black, blurred that, then combined that with the original and just wrote it to the final drawable in as few operations as possible instead of a bunch of discrete steps like I'm doing now.
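A rough CPU sketch of the final combine step such a kernel would end with, assuming single-channel data, nearest-neighbour upsampling of the bloom buffer, and a hypothetical strength parameter (the name compositeBloom is made up too). In a fused compute kernel, each thread would run the body of the inner loop for one output pixel; a real GPU version would more likely let bilinear sampling smooth the upscale.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Add the low-resolution, already-blurred bloom buffer back onto the
// full-resolution colour buffer (the "composite" step).
std::vector<float> compositeBloom(const std::vector<float>& color,
                                  std::size_t width, std::size_t height,
                                  const std::vector<float>& bloom,
                                  std::size_t factor, float strength) {
    assert(factor > 0 && width % factor == 0 && height % factor == 0);
    const std::size_t bloomW = width / factor;
    assert(color.size() == width * height);
    assert(bloom.size() == bloomW * (height / factor));
    std::vector<float> out(color.size());
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            // Nearest-neighbour lookup into the smaller bloom buffer.
            out[y * width + x] = color[y * width + x] +
                                 strength * bloom[(y / factor) * bloomW + x / factor];
    return out;
}
```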

Plus, Metal Performance Shaders are only available on A8 GPUs and up, so if I bring the game to iPhone later I'd have to come up with an alternate solution for A7 devices anyway (never mind having to do an OpenGL renderer).

Doc Block fucked around with this message at 20:45 on Nov 6, 2015

Nov 6, 2015 20:43

Doc Block: Apr 15, 2003; Fun Shoe

I'm using RGBA16F for the offscreen color buffer so that it preserves brightness above 1.0 instead of just clipping, which is useful for doing post processing effects like bloom.

I've tried downsampling the specular buffer further, but if I downsample it enough to make a noticeable dent in performance, it looks blocky and shimmers. Also, it's actually 1/4 the screen resolution in each direction, which works out to 1/16th of the pixels, not 1/4 as I said earlier (divided by 4 in each direction = 1/16th, whoops).
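Worked arithmetic for the resolution mix-up: dividing both width and height by 4 divides the pixel count by 4 x 4 = 16. The helper name is hypothetical, and the 1920x1080 size used in the check is only an example.

```cpp
#include <cassert>

// Pixel count after dividing BOTH width and height by `divisor`,
// so the total shrinks by divisor * divisor.
long long downsampledPixels(long long width, long long height, long long divisor) {
    assert(divisor > 0);
    return (width / divisor) * (height / divisor);
}
```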

The specular buffer pixel format is just R8.

I could definitely stand to have at least the bloom effect be a frame behind. Hadn't thought of that...
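One way to run bloom a frame behind, sketched in C++ under the assumption of two bloom buffers (the BloomPingPong name is made up): each frame writes its bloom into one buffer while the composite reads the other, so the composite never has to wait on the current frame's blur. In Metal the two buffers would be two textures, and frame N+1's composite would sample what frame N produced.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Two bloom buffers used alternately: frame N writes one and composites
// the bloom that frame N-1 left in the other. Cost: one frame of bloom lag.
struct BloomPingPong {
    std::array<std::vector<float>, 2> buffers;
    std::size_t writeIndex = 0;

    std::vector<float>& writeTarget() { return buffers[writeIndex]; }
    const std::vector<float>& previousFrame() const { return buffers[1 - writeIndex]; }
    void endFrame() { writeIndex = 1 - writeIndex; }  // swap roles for the next frame
};
```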

Nov 7, 2015 07:37

Doc Block: Apr 15, 2003; Fun Shoe

Xcode has some pretty nice tools for Metal. You can do a GPU frame capture and it will give you an exact breakdown of everything, including how long it all takes, what the various state objects are set to at any given point during the frame, and even what the frame looked like at any specific draw call (with funky green outlines showing you what exactly was drawn in that particular call).

And yeah, the two biggest bottlenecks are the blur kernel and the composite pass, which combines the color buffer with the (downsampled and blurred) specular buffer and then writes it to the framebuffer.

(the "3D Mesh (normal)" pipeline isn't really a bottleneck since in that frame it's drawing multiple models with large, poorly-optimized geometry sets that take up large portions of the screen and use a crappy fragment shader that I haven't really optimized yet).

edit: those 8 million compute pipelines are the implementation of a Gaussian Blur filter from Apple's Metal Performance Shaders framework.

Doc Block fucked around with this message at 08:39 on Nov 7, 2015

Nov 7, 2015 08:36

Doc Block: Apr 15, 2003; Fun Shoe

The Metal Performance Shader library has tons of permutations for each operation, optimized for various image sizes, kernel sizes, and pixel formats. The API chooses which to use at runtime.

If memory serves, in one of the WWDC sessions Apple said they had created 60+ different Gaussian blur implementations.

Nov 7, 2015 23:27

Doc Block: Apr 15, 2003; Fun Shoe

They explain in the WWDC session video: https://developer.apple.com/videos/play/wwdc2015-607

Anyway, I'll try a 2-stage blur, but given what I've seen so far of dependent texture reads on PowerVR, I'm not confident it'll be fast. Especially not if it has to swap out the tile cache to main memory. I think having to do multiple tile cache stores per frame is part of my current problem.
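Assuming "2-stage" means a separable Gaussian (a horizontal 1D pass, then a vertical one), this CPU sketch shows why it's cheaper: 2(2r+1) taps per pixel instead of (2r+1)^2 for the equivalent 2D kernel. The function names, clamp-to-edge behaviour, and single-channel data are assumptions. On the GPU each pass writes an intermediate texture, which is exactly the extra tile store the post is worried about.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Normalised 1D Gaussian weights for taps -radius..radius.
std::vector<float> gaussianWeights(int radius, float sigma) {
    assert(radius >= 0 && sigma > 0.0f);
    std::vector<float> w(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        w[i + radius] = std::exp(-static_cast<float>(i * i) / (2.0f * sigma * sigma));
        sum += w[i + radius];
    }
    for (float& v : w) v /= sum;  // weights sum to 1, so flat regions keep their brightness
    return w;
}

// One 1D pass: (dx, dy) = (1, 0) blurs horizontally, (0, 1) vertically.
// Out-of-range taps are clamped to the edge, like a clamp-to-edge sampler.
std::vector<float> blurPass(const std::vector<float>& src, int width, int height,
                            const std::vector<float>& w, int dx, int dy) {
    const int radius = static_cast<int>(w.size() / 2);
    std::vector<float> dst(src.size(), 0.0f);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int i = -radius; i <= radius; ++i) {
                const int sx = std::min(std::max(x + i * dx, 0), width - 1);
                const int sy = std::min(std::max(y + i * dy, 0), height - 1);
                acc += w[i + radius] * src[sy * width + sx];
            }
            dst[y * width + x] = acc;
        }
    return dst;
}

// Separable Gaussian blur: horizontal pass, then vertical pass.
std::vector<float> separableGaussian(const std::vector<float>& src, int width, int height,
                                     int radius, float sigma) {
    assert(static_cast<int>(src.size()) == width * height);
    const std::vector<float> w = gaussianWeights(radius, sigma);
    return blurPass(blurPass(src, width, height, w, 1, 0), width, height, w, 0, 1);
}
```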

Nov 7, 2015 23:56

Doc Block: Apr 15, 2003; Fun Shoe

Whatever they're called, then: I mean when the texcoords are computed in the fragment shader, so the GPU can't know ahead of time which texels to prefetch.

Or is that not an issue anymore? I'll have to play around and see.

Nov 8, 2015 02:37

Doc Block: Apr 15, 2003; Fun Shoe

I'm just way overthinking this, it seems. I'll try to get back to obsessing over this stupid bloom effect that I just had to have in my game :rolleyes: later this week or next weekend.

Thanks guys.

Doc Block fucked around with this message at 08:26 on Nov 8, 2015

Nov 8, 2015 08:12

Doc Block: Apr 15, 2003; Fun Shoe

Can you read from the destination color buffer in OpenGL? If so, just write a shader that reads the corresponding pixel in the destination color buffer, and if its alpha > 0, discard the fragment (in GLSL that's the discard keyword, not a function call).

What are you trying to do?

Nov 20, 2015 05:01

Doc Block: Apr 15, 2003; Fun Shoe

You can't read the destination color buffer in a shader in OpenGL? I thought you could...

You can in Metal :colbert:

Doc Block fucked around with this message at 05:35 on Nov 20, 2015

Nov 20, 2015 05:30

Doc Block: Apr 15, 2003; Fun Shoe

It seems like no one agrees on the definition for the "handedness" of a coordinate system. I've heard OpenGL's called left handed, heard it called right handed, same for DirectX and Metal.

Some people say that your thumb points towards +X, index towards +Y, and middle finger towards +Z, and whether you use your left or right hand is the "handedness" (i.e. left handed means +Z goes away from the viewer, right handed has +Z coming towards the viewer), but that doesn't account for weirdos who insist X and Y are both horizontal and that Z represents height.
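One convention-free test, as a sketch: write the three axis directions in a reference frame that is itself right-handed, then check the sign of the scalar triple product (X x Y) . Z. Positive means right-handed and negative means left-handed, regardless of which axis you call "up". The Vec3 helpers and function names are hypothetical. In the checks below, the reference frame is +x = screen right, +y = screen up, +z = toward the viewer.

```cpp
#include <cassert>

struct Vec3 { double x, y, z; };

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// A basis (X, Y, Z) is right-handed when X cross Y points along +Z,
// i.e. the scalar triple product (X x Y) . Z is positive.
bool isRightHanded(const Vec3& xAxis, const Vec3& yAxis, const Vec3& zAxis) {
    return dot(cross(xAxis, yAxis), zAxis) > 0.0;
}
```

With X right and Y up, +Z into the screen comes out left-handed. The "Z is up" layout (X right, Y into the screen, Z up) comes out right-handed, so it's not a third kind of handedness.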

What would you call this coordinate system?

I'm asking because I seem to have screwed up my matrix math somewhere, using examples from sources using different definitions of handedness.

Doc Block fucked around with this message at 06:28 on Nov 23, 2015

Nov 23, 2015 06:22

Doc Block: Apr 15, 2003; Fun Shoe

You can do that with either hand, though. The discussion is complicated by the weirdos who insist on having X and Y be the horizontal axes, with Z being up instead of depth.

But yeah, the general consensus seems to be that the coordinate system in the image I posted is left handed. But it makes me wonder why I've seen people say OpenGL's coordinate system is right handed, since IIRC in OpenGL +Z is going into the screen.

Ugh. I just want my matrixes to not be weird. Right now I have to do a special case for rotation, and invert the angle when I rotate something manually but leave it alone when I'm just plugging the physics engine's rotation values into the rotation matrix function.

Doc Block fucked around with this message at 08:44 on Nov 23, 2015

Nov 23, 2015 08:40

Doc Block: Apr 15, 2003; Fun Shoe

I see...

3D math is my greatest weakness when it comes to game development.

Nov 23, 2015 08:45

Doc Block: Apr 15, 2003; Fun Shoe

I'm using Metal, which is device space as well. I ~~stole~~ copied Apple's simd matrix math functions from their Metal example code, whose projection matrix functions set up a left-handed coordinate system (+Z into the screen).

For whatever reason, their rotation matrix function seems to be rotating things counter-clockwise (so +45 degrees around the Z axis results in the object being tilted to the left), while the physics engine I'm using is rotating things clockwise, so I have to special case out manual vs physics engine rotation and invert the angle for one of them.
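A minimal sketch of the two rotation conventions, assuming a +Y-up plane (rotation about Z with z = 0). The textbook matrix [cos -sin; sin cos] turns +X toward +Y for positive angles, which is counter-clockwise on screen. A clockwise-positive convention is the same matrix with the angle negated. If one system has +Y pointing down, as many 2D screen coordinate systems do, the same formula appears clockwise, which is one common cause of this kind of mismatch. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { double x, y; };

// Standard rotation: with +Y up, a positive angle is counter-clockwise.
Vec2 rotate(Vec2 v, double radians) {
    const double c = std::cos(radians), s = std::sin(radians);
    return {c * v.x - s * v.y, s * v.x + c * v.y};
}

// Clockwise-positive convention: the same matrix with the angle negated.
Vec2 rotateClockwise(Vec2 v, double radians) { return rotate(v, -radians); }
```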

I had thought that maybe rotations appearing "wrong" had to do with the Z axis getting flipped somewhere, but that doesn't seem to be the case. And when I started looking online for alternate rotation matrix functions that applied the rotation clockwise, my sleep-deprived brain got confused.

Doc Block fucked around with this message at 09:08 on Nov 23, 2015

Nov 23, 2015 08:57

Doc Block: Apr 15, 2003; Fun Shoe

Joda posted:

Are you sure your physics engine doesn't have a way to get the model transformation matrix directly? Not sure about other physics engines/libraries, but Bullet's motion states have a getOpenGLMatrix() function that will fill out an OpenGL formatted 4x4 matrix for you. I imagine something like that would be a fairly standard feature for any physics engine aimed at games. Of course, Metal may use a different major form than OpenGL, but that should be an easy fix.

I should point out that my physics engine is 2D. Since my game is top-down, I'm cheating and using Chipmunk2D for physics since I've used it to make games before, and know it can do exactly what I need (namely planetary/solar orbiting).

In the interest of doing what a 2D game developer would expect, Chipmunk2D handles rotation so that positive values result in clockwise rotation, and 0 degrees points North instead of East.
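If you ever need to convert between that compass-style convention (0 = North, clockwise positive) and the usual math convention (0 = East, counter-clockwise positive), the mapping is a reflection about the 45 degree line, so the same formula works in both directions. Sketch only, assuming degrees and +Y = North; the function names are hypothetical and this isn't taken from Chipmunk2D's API.

```cpp
#include <cassert>

// compass: 0 = North (+Y), positive = clockwise
// math:    0 = East (+X),  positive = counter-clockwise
double compassToMath(double compassDegrees) { return 90.0 - compassDegrees; }
double mathToCompass(double mathDegrees) { return 90.0 - mathDegrees; }
```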

I've fixed my game now, so that it knows to invert the rotation value if it isn't coming from the physics engine. I know that counter-clockwise is probably the mathematically correct way, but it's easier on my brain if I keep it so that positive value = clockwise.

Nov 24, 2015 00:27

Doc Block: Apr 15, 2003; Fun Shoe

I don't think he means in terms of "this is what API calls you make to do X" but in terms of "this is how a modern PC GPU works" and "these are the techniques modern 3D engines use"?

b/c that's more important than how to do phong shading etc IMHO

edit: of course, you have to get that stuff out of the way before you can get to subsurface scattering, SSAO, etc

Dec 6, 2015 05:58

Doc Block: Apr 15, 2003; Fun Shoe

One little nitpick is that OpenGL was based on Iris GL, which wasn't some private internal API but rather was simply specific to SGI machines and their IRIX operating system.

Dec 10, 2015 07:58

Doc Block: Apr 15, 2003; Fun Shoe

On iOS, the call to glClear() is treated as a hint to the GPU that it doesn't need to copy the tile back in. I'd be kinda surprised if even Android didn't do the same.

edit: Also, in the third paragraph you state, "OpenGL has the advantage of being implemented independently by most vendors, and is generally platform-specific." Should that be "is generally not platform-specific" ?

Dec 10, 2015 08:34

Doc Block: Apr 15, 2003; Fun Shoe

IIRC, Apple's OpenGL docs for iOS specifically say that you should always call glClear() at the start of a frame because it's a hint to their OpenGL driver that the application doesn't care about the previous frame's contents and so the GPU can just clear the tile cache instead of loading the previous frame into it.

edit: that's for iOS/tvOS, not OS X, obviously.

edit 2: which isn't as fast as being able to tell the GPU not to bother loading the previous frame AND not to bother clearing it, but is better than having the GPU fill the tile cache with the previous frame's contents every frame, though.

Doc Block fucked around with this message at 09:13 on Dec 10, 2015

Dec 10, 2015 09:07

Doc Block: Apr 15, 2003; Fun Shoe

A small side benefit of Vulkan/Metal/DX12 being new, modern APIs is that all the information you find online about them is also new and modern. Right now, if you Google for "how to do X in OpenGL", half the links are to articles covering OpenGL 1.x or 2.x.

Dec 12, 2015 01:49

Doc Block: Apr 15, 2003; Fun Shoe

Keep in mind that Metal doesn't draw anything when you encode a draw call. So if you make a single, universal uniform buffer for modelViewProjectionMatrix and friends and then keep changing the contents before committing the command buffer, only the last contents will be in there when it actually does the draw calls.

edit: there's a method you can call that basically does what OpenGL does (copy the uniforms into driver-managed memory) without the overhead of creating a new buffer, with the caveat that your uniform struct has to be at least 256 bytes and a multiple of 16 or 256 bytes long (depending on if you're writing for iOS or OS X). I'll look it up when I get home in a few hours.
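A sketch of the usual fix when uniforms change between draws: give every draw its own slice of a per-frame buffer and bind that slice by offset (setVertexBuffer:offset:atIndex: on the render command encoder), so later writes don't clobber earlier draws. The UniformArena class is hypothetical. The 256-byte alignment is an assumption matching the OS X alignment mentioned later in this thread, and a CPU byte vector stands in for the Metal buffer's contents.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Per-frame uniform arena: each draw copies its uniforms into a fresh,
// aligned slice, because the GPU reads them when the command buffer
// executes, not when the draw is encoded.
class UniformArena {
public:
    UniformArena(std::size_t capacity, std::size_t alignment)
        : storage_(capacity), alignment_(alignment) {
        assert(alignment > 0 && (alignment & (alignment - 1)) == 0);  // power of two
    }

    // Copies `size` bytes into the next aligned slice and returns its byte
    // offset, which is what you would bind for this draw.
    std::size_t push(const void* data, std::size_t size) {
        const std::size_t offset = (head_ + alignment_ - 1) & ~(alignment_ - 1);
        assert(offset + size <= storage_.size());  // arena too small for this frame
        std::memcpy(storage_.data() + offset, data, size);
        head_ = offset + size;
        return offset;
    }

    void reset() { head_ = 0; }  // only once the GPU has finished with this frame
    const unsigned char* data() const { return storage_.data(); }

private:
    std::vector<unsigned char> storage_;
    std::size_t alignment_;
    std::size_t head_ = 0;
};
```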

Doc Block fucked around with this message at 18:04 on Dec 12, 2015

Dec 12, 2015 17:59

Doc Block: Apr 15, 2003; Fun Shoe

If that's what you want to do, yeah.

My engine cheats and just gives each object its own uniforms buffer. So it attaches their buffer before issuing the draw call for them, and since each -setVertexBuffer... call is an individual command it's OK.

Most objects don't change often in my game, so their uniforms don't change often and so it was easier to just have the objects cache them in a Metal buffer and set that buffer at draw time. Please don't laugh at me...

Dec 12, 2015 18:12

Doc Block: Apr 15, 2003; Fun Shoe

Turn off back face culling. The shading language should have a way to determine if the current fragment is on the front or back face (Metal shading language does, GLSL probably does too).

edit: never mind, that doesn't help determine if the winding is wrong.
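What does reveal bad winding is the sign of each triangle's screen-space area, which is essentially what the rasterizer uses to decide front vs back facing. Below is a CPU sketch you could run over projected vertices to find the offending triangles. It assumes +Y up, so flip the sign for a Y-down space. The names are hypothetical.

```cpp
#include <cassert>

struct Point2 { double x, y; };

// Twice the signed area of a 2D triangle (shoelace formula).
// With +Y up, a positive value means counter-clockwise winding.
double signedArea2(Point2 a, Point2 b, Point2 c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

bool isCounterClockwise(Point2 a, Point2 b, Point2 c) { return signedArea2(a, b, c) > 0.0; }
```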

Feb 16, 2016 17:16

Doc Block: Apr 15, 2003; Fun Shoe

The_Franz posted:

Apple's Metal is actually somewhat of a nice middle-ground since it's basically GL4/DX11 with pipeline objects and command buffers. It's explicit enough to not introduce unexpected stalls without requiring you to manually manage memory and barriers. Unfortunately it still has some design decisions that make it feel like you are working with mittens on and lacks support for things like tessellation and geometry shaders.

Metal does kinda require you to manage memory, just not to the level that Vulkan seems to.

The thing about Metal is that it was designed for mobile GPUs with shared memory, where the memory management concerns are about whether all your textures will fit in memory alongside your other assets without getting your app killed, rather than about how to slice up available VRAM.

That's one of the things that I like about Metal as opposed to OpenGL ES: Metal doesn't pretend your little mobile GPU has its own VRAM. Textures and vertexes and shader arguments are all just backed by untyped buffers in main memory at the end of the day. You don't need to map/unmap/lock/unlock/whatever a buffer before you change it, since there's no copying between main memory and VRAM that needs to happen. Makes the API cleaner and nicer to use IMHO. Of course, that means there's nothing stopping you from modifying a buffer or texture while the GPU is using it, either...
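The usual guard against that last problem is to keep several copies of each per-frame buffer and block the CPU until the GPU has released one. Apple's Metal sample code typically does this with a dispatch semaphore. Below is a portable C++ stand-in using a mutex and condition variable. The class name, the slot count, and the assumption that command buffers complete in submission order are all illustrative.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Limits how many frames the CPU can run ahead of the GPU, handing out
// buffer slots round-robin so the CPU never writes a slot still in use.
class FramesInFlight {
public:
    explicit FramesInFlight(std::size_t count) : count_(count), available_(count) {}

    // Blocks until a slot is free, then returns the slot index to write.
    std::size_t beginFrame() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return available_ > 0; });
        --available_;
        const std::size_t slot = next_;
        next_ = (next_ + 1) % count_;
        return slot;
    }

    // Call from the command buffer's completion handler.
    void gpuFinished() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++available_;
        }
        cv_.notify_one();
    }

    std::size_t available() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return available_;
    }

private:
    const std::size_t count_;
    std::size_t available_;
    std::size_t next_ = 0;
    mutable std::mutex mutex_;
    std::condition_variable cv_;
};
```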

Feb 20, 2016 01:07

Doc Block: Apr 15, 2003; Fun Shoe

Metal on OS X has some tacked on stuff that lets you specify you'd like a given buffer/texture to go in VRAM if the GPU has any, which almost certainly requires mapping then unmapping the buffer when you modify it. (I've only glanced at Metal OS X since my iMac is too old for it).

It also has methods to let you choose which GPU to use if the system has more than one, so you can pick the integrated GPU if your app isn't too demanding and you want to be nice to laptop users.

It's got different alignment requirements in some areas too, and obviously the GPU features are different.

Feb 20, 2016 02:54

Doc Block: Apr 15, 2003; Fun Shoe

So... just don't do any scaling? Lots and lots of 2D games use OpenGL so they can have hardware accelerated scaling, rotation, and alpha blending, plus shaders etc

Apr 5, 2016 06:24

Doc Block: Apr 15, 2003; Fun Shoe

To be helpful, set up your orthographic projection so that coordinates match the viewport pixels 1:1.
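For example, an orthographic matrix built with the standard glOrtho formula and left = 0, right = width, top = 0, bottom = height makes one unit equal one pixel, with the origin at the top-left. This sketch assumes column-major layout and OpenGL's -1..1 depth range. Metal's NDC depth is 0..1, which doesn't matter for flat 2D at z = 0. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<double, 16>;  // column-major

// Standard glOrtho-style orthographic projection.
Mat4 ortho(double l, double r, double b, double t, double n, double f) {
    Mat4 m{};  // all zeros
    m[0]  = 2.0 / (r - l);
    m[5]  = 2.0 / (t - b);
    m[10] = -2.0 / (f - n);
    m[12] = -(r + l) / (r - l);
    m[13] = -(t + b) / (t - b);
    m[14] = -(f + n) / (f - n);
    m[15] = 1.0;
    return m;
}

// One unit = one pixel, (0, 0) at the top-left corner.
Mat4 pixelOrtho(double widthPx, double heightPx) {
    return ortho(0.0, widthPx, heightPx, 0.0, -1.0, 1.0);
}

// Transforms (x, y, 0, 1) and returns normalised device x and y.
std::array<double, 2> toNdc(const Mat4& m, double x, double y) {
    return {m[0] * x + m[4] * y + m[12], m[1] * x + m[5] * y + m[13]};
}
```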

What are you scaling, and why? Your 2D artwork still has to match the size you want it to appear on screen, or else you'll have to scale it up/down.

Doc Block fucked around with this message at 06:32 on Apr 5, 2016

Apr 5, 2016 06:26

Doc Block: Apr 15, 2003; Fun Shoe

You can set it to match the viewport, that's fairly common for 2D games. Then your map quads will have what are effectively screen-space coordinates.

But then you have to take into account different screen resolutions in your map drawing code...

edit: you may want to look at offsetting the ortho projection by a half pixel, so each texel of your map texture gets drawn in the center of each screen pixel. This used to be standard advice for GPU-accelerated 2D games, but IDK if anyone does it anymore.
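Whether that offset is needed depends on where the API puts pixel centres. OpenGL and Direct3D 10+ sample at half-integer coordinates, so with a 1:1 ortho the interpolated texcoord already lands on texel centres. The half-pixel trick was mainly a Direct3D 9 fix, where pixel centres sat on integer coordinates. A small sketch of that arithmetic (function names hypothetical):

```cpp
#include <cassert>

// Normalised coordinate of the centre of texel i in a texture `size` texels wide.
double texelCenterUV(int i, int size) { return (i + 0.5) / size; }

// u interpolated at screen pixel x for a quad spanning [quadOffset, quadOffset + size]
// pixels with u running 0..1 across it. `pixelCenter` is where the API samples
// inside each pixel: 0.5 for OpenGL / Direct3D 10+, 0.0 for Direct3D 9.
double sampledU(int x, int size, double pixelCenter, double quadOffset) {
    return (x + pixelCenter - quadOffset) / size;
}
```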

Doc Block fucked around with this message at 15:56 on Apr 5, 2016

Apr 5, 2016 15:50

Doc Block: Apr 15, 2003; Fun Shoe

I heard recently that ARGB is a faster texture format for modern GPUs than RGBA.

Is there any truth to this?

Doc Block fucked around with this message at 04:32 on Apr 8, 2016

Apr 8, 2016 01:35

Doc Block: Apr 15, 2003; Fun Shoe

What I mean is that images are stored on disk as RGBA, and so once they're loaded I've just been uploading them as-is. For OpenGL/OpenGL ES this is fine since (if memory serves) the driver is free to turn an RGBA texture into ARGB etc, whereas Metal doesn't change the texture format (and probably neither do Vulkan or DX12, I'd imagine) and it's up to the application to use the most optimal texture format.

So I guess I need to add a byte-swapping stage to my texture loader.

edit:

Suspicious Dish posted:

For rendering to or texturing from?

For texturing from.

Doc Block fucked around with this message at 04:30 on Apr 8, 2016

Apr 8, 2016 04:27

Doc Block: Apr 15, 2003; Fun Shoe

If I'm remembering correctly, when Apple introduced Metal they said in the corresponding WWDC session that textures don't get changed. But I could be mistaken. v :shobon:

Apr 8, 2016 05:55

Doc Block: Apr 15, 2003; Fun Shoe

I just rewatched the WWDC Metal intro session video, and you're right. They specifically say that Metal reformats textures and the underlying buffer is "implementation private". Must've misheard the first time or misremembered.

So there's no point in adding a byte swapping stage to my texture loader, at least.

edit: If you really want to, though, applications can access the underlying texture data by supplying their own Metal buffer for storage, but Apple warns this prevents texture optimization, and rows have to be padded out to 64 byte boundaries. It only works on iOS, though.
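The row padding is simple to compute: round each row's byte count up to the next multiple of 64 (the figure given above), then size the buffer as bytesPerRow x height. Sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>

// Round `value` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t alignUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Row pitch for a buffer-backed texture whose rows must start on 64-byte boundaries.
constexpr std::size_t paddedBytesPerRow(std::size_t widthPx, std::size_t bytesPerPixel) {
    return alignUp(widthPx * bytesPerPixel, 64);
}
```

For example, a 100-pixel-wide RGBA8 texture needs 400 bytes of pixels per row, padded to 448.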

Doc Block fucked around with this message at 08:28 on Apr 8, 2016

Apr 8, 2016 06:45

Doc Block: Apr 15, 2003; Fun Shoe

How complicated/expensive is your fragment shader? How many values are being interpolated for each fragment? What's the blend mode?

edit: is this on iOS or OS X? I'm not familiar with Metal on OS X (my 2011 iMac is too old for it). I know some of the alignment requirements are different. And if your buffers are in system memory instead of VRAM then obviously things will be slower.

Doc Block fucked around with this message at 20:15 on Apr 22, 2016

Apr 22, 2016 20:08

Doc Block: Apr 15, 2003; Fun Shoe

lord funk posted:

I did a test and made a fragment shader that just returns a constant color. Still chugs when zoomed in. I've also tried turning blending off completely, and depth testing on / off.

Here is a video of it in action (the results are the same even when not blending):
https://www.youtube.com/watch?v=b7vut8k_tOc

OS X. Yep - everything on OS X has to be 256 byte aligned.

I am going to look into the memory location. Xcode is telling me that I'm using all CPU and no GPU in the debug view, but that may just be Xcode being its usual POS broken self (the Metal system trace tool isn't even supported on OS X).

Seems like it only happens once you're inside the models. Maybe something to do with clipping?

Apr 24, 2016 00:24

Doc Block: Apr 15, 2003; Fun Shoe

lord funk posted:

Well it's not because of filling with triangles - lines do the same thing.

Here is a new datapoint: this seems to only happen on my hi-dpi monitor. At 1280x800 on a projection it runs at 60fps.

Maybe a weird driver bug?

If this is on a Retina Mac, try disabling Retina mode or whatever just for your application. This will be in different places depending on how you're setting up your window and whether you're using MetalKit or not.

Apr 24, 2016 00:39

Doc Block: Apr 15, 2003; Fun Shoe

He's running it on OS X, so it's either a desktop/laptop AMD GPU, or an integrated Intel one.

Metal restricts your framebuffer to being whatever the drawable's format is, which is in turn whatever format the windowing system uses. For now, that's BGRA8 on iOS and probably also OS X (offscreen FBOs can be in whatever format you want, obviously).

Apr 24, 2016 14:02

Doc Block: Apr 15, 2003; Fun Shoe

Zerf posted:

Fill rate would be my guess - check the blending states and especially measure overdraw - if you see a lot of objects at the same time doing alphablending and ignoring the z-buffer, and fill the entire screen with it, you could see large performance drops. But instead of speculating about it, there should be some performance analyzer program that you could download? Intel GPA maybe? I know very little about the tools you have available on OSX, so can't give you any proper recommendations.

On iOS, Apple has performance analyzers for Metal that will detect some things that hurt performance and give you a complete breakdown/trace analysis for a frame, showing you every API call made during that frame and how long each draw call took etc. But according to Lord Funk, the trace analyzer for Metal isn't available on OS X.

edit: I kinda wanna say it's related to clipping somehow (driver bug?), since it really only seems to happen once the view goes inside the models but not when the models take up a lot of the screen but the viewer is still outside them. Lord Funk said changing to lines and/or just using a shader that outputs solid white with no blending doesn't make the problem go away, so it doesn't seem like a fill rate problem.

Doc Block fucked around with this message at 00:21 on Apr 25, 2016

Apr 25, 2016 00:16

Doc Block: Apr 15, 2003; Fun Shoe

Would there even be a 3rd party performance analyzer for Metal? A quick Google search doesn't turn anything up.

Apr 25, 2016 15:48
