Schmerm
Sep 1, 2000
College Slice
Would it be possible to generate vertex normal data using a geometry shader?

slovach
Oct 6, 2005
Lennie Fuckin' Briscoe
I'm not sure if these warnings are legit or not when I exit. I have it down to basically a bare window that just clears to a color. Here's all the objects:

obj:1,,DXGI Swap Chain,,0,,0,0,0,0,1
obj:2,,D3D11 Device,*,0,,0,0,0,0,1
obj:3,,D3D11 Device Context,*,0,,0,0,0,0,1
obj:4,,DXGI Device,,0,,0,0,0,0,1
obj:5,,DXGI Surface,,0,DXGI_FORMAT_R8G8B8A8_UNORM,0,640,480,0,1
obj:6,,D3D11 Texture2D,,1228800,DXGI_FORMAT_R8G8B8A8_UNORM,1,640,480,0,1
obj:7,,D3D11 Render Target View,*,0,,0,0,0,0,1
obj:8,,D3D11 Texture2D,,1228800,DXGI_FORMAT_D24_UNORM_S8_UINT,1,640,480,0,1
obj:9,,D3D11 Depth-Stencil View,*,0,,0,0,0,0,1
obj:10,,D3D11 Depth-Stencil State,*,0,,0,0,0,0,1
obj:11,,D3D11 Rasterizer State,*,0,,0,0,0,0,1


Then I close and it whines about stuff like:
code:
DXGI WARNING: Live Object :      1 [ STATE_CREATION WARNING #0: ]

(Lots of these, and it changes per run. I don't get the same amount)
D3D11 WARNING: 	Live Object at 0x0000000002B52E20, Refcount: 0. [ STATE_CREATION WARNING #0: UNKNOWN]
D3D11 WARNING: 	Live Object at 0x0000000002B536B0, Refcount: 0. [ STATE_CREATION WARNING #0: UNKNOWN]
D3D11 WARNING: 	Live Object at 0x0000000002B53A80, Refcount: 0. [ STATE_CREATION WARNING #0: UNKNOWN]
D3D11 WARNING: 	Live Object at 0x0000000002BF11F0, Refcount: 0. [ STATE_CREATION WARNING #0: UNKNOWN]
D3D11 WARNING: Live                         Object :     32 [ STATE_CREATION WARNING #0: UNKNOWN]
Well, that's the swap chain to start, which I guess is reasonable enough... but it's definitely released before this. I have no idea what those other warnings can be for, and object 32 doesn't even look like anything I own. Can this be some kind of driver thing? Because I'm starting to think this list is composed of lies, or I'm misunderstanding something.

edit: I'm loving dumb and really was leaking an object somewhere. Somewhere along the line :downs: happened, and the end result was that, for example with my backbuffer, I was releasing the texture object but not the render target view.

slovach fucked around with this message at 05:52 on Aug 31, 2013

Madox
Oct 25, 2004
Recedite, plebes!
Not sure if you know this or not, but you can assign a string to each object, which then shows up in that end-of-program spew and makes it easier to tell what's what.

code:
resource->SetPrivateData(WKPDID_D3DDebugObjectName, strlen(name), name);
Where 'resource' is pretty much anything that is created by a DX call.
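For example (a hypothetical name and object; WKPDID_D3DDebugObjectName comes from d3dcommon.h and needs dxguid.lib):
code:
// Name a render target view so it's identifiable in the live-object report.
// 'renderTargetView' stands in for whatever object you want to label.
const char rtvName[] = "Backbuffer RTV";
renderTargetView->SetPrivateData(WKPDID_D3DDebugObjectName,
                                 sizeof(rtvName) - 1, rtvName);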

Boz0r
Sep 7, 2006
The Rocketship in action.
Crossposting from the Linux thread:

I'm trying to install the CUDA Toolkit in Ubuntu, but I can't get it to work.

I'm following the guide on NVIDIA's homepage, but it doesn't work.

This is what I'm getting:
code:
$ apt-get install cuda
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 cuda : Depends: cuda-5-5 (= 5.5-22) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
What am I doing wrong? I don't know a lot about Linux yet.

Boz0r
Sep 7, 2006
The Rocketship in action.
I hope it's okay that I make a new post just after my last one, but I don't think anyone will read it if I just add it :)

I have to do a report on parallel computing in CUDA, on a topic of my own choice. I've been looking at different forms of particle physics like fluids, as I mentioned a few posts back, and N-body simulation. Does anyone have any recommendations for introductory material in this area, or another area in particle physics that would be cool to implement? Nothing too fancy, as I haven't tried it before and I have three weeks to do it in.
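For reference, the brute-force N-body core is roughly this (a plain C++ sketch with made-up names, just to illustrate the O(N^2) step that a CUDA kernel would parallelize, one thread per body):
code:
#include <cmath>
#include <vector>

struct Body { float x, y, z, vx, vy, vz, mass; };

void step(std::vector<Body>& bodies, float dt, float softening = 1e-3f)
{
    const float G = 6.674e-11f;
    // Accumulate the acceleration on each body from every other body.
    for (Body& bi : bodies) {
        float ax = 0.0f, ay = 0.0f, az = 0.0f;
        for (const Body& bj : bodies) {
            float dx = bj.x - bi.x, dy = bj.y - bi.y, dz = bj.z - bi.z;
            float r2 = dx * dx + dy * dy + dz * dz + softening; // softening avoids r = 0
            float inv_r3 = 1.0f / (r2 * std::sqrt(r2));
            ax += G * bj.mass * dx * inv_r3;
            ay += G * bj.mass * dy * inv_r3;
            az += G * bj.mass * dz * inv_r3;
        }
        bi.vx += ax * dt; bi.vy += ay * dt; bi.vz += az * dt;
    }
    // Integrate positions only after all forces have been computed from the old positions.
    for (Body& b : bodies) { b.x += b.vx * dt; b.y += b.vy * dt; b.z += b.vz * dt; }
}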

Xerophyte
Mar 17, 2008

This space intentionally left blank
This is perhaps more on the rendering side than the simulation side, but "A survey of ocean simulation and rendering techniques in computer graphics" is a nice paper to start with if you want pretty oceans (fluid simulation in general, not so much). It's got a handy list of papers with appropriate models that one may implement; looking up the more recent ones should give something you can use. I hope.

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

I have a couple of newbie questions, if anyone can help.

I've been doing a bit of work learning OpenGL ES 2.0, and I'm having a bit of trouble finding information. It might be best if I explain what I think is going on and what I'm trying to do:

So I have a bunch of 2D textures generated and filled with bitmaps, and when it comes to using them in the shader program I need to activate a texture unit and bind the texture to it. Then I pass in to the shader the numbers of the texture units I'm using, and the shader can use them. This all works fine.

My understanding is that a texture unit is actually in hardware, maybe some dedicated memory near the GPU? Different devices have different numbers of texture units available - the one I'm using (a Galaxy Nexus, if it matters) reports 8 available texture units when I call
code:
GLES20.glGetIntegerv(GLES20.GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS,maxTexUnits,0);
which is the maximum number available, not the max any one shader can use.

If these are in hardware, it seems like there would be a performance benefit in keeping textures in a texture unit as long as possible, and even if they're not I'd also avoid unnecessary glActiveTexture and glBindTexture calls which don't look cheap. But my phone is reporting 8 texture units, and I can actually activate and bind to any of the texture unit constants, right up to GL_TEXTURE31 (the highest in the API) and everything still renders just fine.

So my questions are really: are texture units in fast GPU memory, so am I taking the right approach trying to fill them and not swap too much in and out? And what's with the ID numbers, are they just for my reference (with the actual memory arrangement handled however the system sees fit) so I can use any 8 I like, but no more?

Or is this a really bad way to do things, and they're really meant for passing several textures to the shader at once and not for any kind of actual storage?

MarsMattel
May 25, 2001

God, I've heard about those cults Ted. People dressing up in black and saying Our Lord's going to come back and save us all.
I'm working on an isosurface renderer in OpenGL, and have hit a bit of a problem. I have started using textures (previously I just used vertex colours) and now have a bit of a problem (I think) with the way I'm storing my vertex & index data.

The world is split into "blocks", each of which contains a "mesh", which is a list of vertices and the triangles that use them. When my materials just corresponded to a vertex colour, this was fine as I could just use each vertex's colour directly. Now my vertices have a material index which I use to determine which sampler2D should be used. My code in the fragment shader looks a bit like this:

code:
vec4 colour;
if (vertexMaterial == 0)
{
    colour = texture(tex0, coord);
} 
else if (vertexMaterial == 1)
{
    colour = texture(tex1, coord);
}
I'm under the impression that branching in shaders is a bad idea performance-wise, and of course this doesn't seem particularly scalable -- I'll need a separate sampler2D for each material I want to use (or worse, if I want multiple textures per material).

I know the correct/standard/traditional way of handling this would be to sort my triangles by material and draw all the triangles of a given material type, then the next material type and so on.

However, this is not really compatible with how I'm managing my buffers for the meshes. Since the meshes represent a voxel dataset, I want to be able to update the mesh quickly when adding & removing vertices via CSG operations. To that end, I manage the vertex and triangle lists as a compact array -- when one element is removed, the end element is copied into its place and the size of the list decremented. Sorting this list by material type would impact performance quite a bit (I suspect, not actually tried) due to additional swap operations. Additionally, I often have triangles that contain more than 1 material type, so I'm not sure how those would be handled with the batching of triangles by material approach.
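Concretely, the removal is roughly this (a minimal generic sketch of the idea, not my exact code):
code:
#include <cstddef>
#include <vector>

// Remove element 'index' in O(1) by overwriting it with the last element and
// shrinking the array; ordering is not preserved.
template <typename T>
void swapRemove(std::vector<T>& items, std::size_t index)
{
    items[index] = items.back();
    items.pop_back();
}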

I hope that all makes sense :)

My questions then are:
1. Is branching like this in a shader a bad idea?
2. Is there a way to do something like the batching of triangles by material while also keeping my compact array approach? I've not benchmarked any other approaches to managing the VBOs really, so I'm open to other approaches here.

Any other input / questioning of my sanity would also be welcome :)

Orzo
Sep 3, 2004

IT! IT is confusing! Say your goddamn pronouns!
Branching in a shader is not recommended, as you've pointed out. Hell, I do it anyway because it hasn't impacted performance for me, but you could just do this:
code:
vec4 c1 = texture(tex0, coord) * (1 - vertexMaterial);
vec4 c2 = texture(tex1, coord) * vertexMaterial;
colour = c1 + c2;
Remember to profile both solutions to see which is better.

It seems counter-intuitive, but texture lookups are presumably faster than branching.

MarsMattel
May 25, 2001

God, I've heard about those cults Ted. People dressing up in black and saying Our Lord's going to come back and save us all.
Interesting, I had forgotten about that technique. I've seen it mentioned before in avoid-branching-in-shaders discussions. I suspect I'll need to do some benchmarking to see which approach is best.

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...
Branching in a shader like that is completely harmless, since it should be uniform across all pixels generated by the same triangle. In other words, the triangle will have a single material id, so the branch will essentially be constant. Branching only has a real cost (beyond the relatively cheap ops needed to compute the branch condition) if the branch is divergent -- i.e. if two pixels that are part of the same SIMD workload on the core ("close in screenspace" for lack of a better term) take different paths at the branch. Otherwise, it might as well not be there.

The texture approach, on the other hand, is going to be far worse because you're effectively doubling your texture bandwidth requirements (not to mention texture fetch requests) in every case. Again, if a triangle has a uniform material type, then what you've effectively done is cut your texture cache size in half by doubling the number of requests for no real gain.

Of course, in reality the original shader is still flawed. Assuming that tex0 and tex1 are sampled with mip-maps, the compiler will actually have to move BOTH fetches outside of the branches and replace them with a conditional move anyway, meaning that the two versions will end up generating almost exactly the same code at the GPU level. Why? Because in order to find the mip level, the shader needs the screen-space derivatives of the texture coordinates (dDX(U), dDY(U), dDX(V), dDY(V)) to figure out the minification power. Since the GPU does this by comparing registers from adjacent pixels, it requires that all shaders have a common code path to the point where the derivatives are calculated (and thus to where the textures are fetched). Thus your shader effectively becomes:

code:
vec4 colour0 = texture(tex0, coord);
vec4 colour1 = texture(tex1, coord);
vec4 colour = (vertexMaterial == 0) ? colour0 : colour1;
You can get around this problem by computing the derivatives manually outside the branch, and then doing the fetch in a branch as before:

code:
vec4 colour;
vec2 coord_ddx = dFdx(coord);
vec2 coord_ddy = dFdy(coord);
if (vertexMaterial == 0)
{
    colour = textureGrad(tex0, coord, coord_ddx, coord_ddy);
}
else if (vertexMaterial == 1)
{
    colour = textureGrad(tex1, coord, coord_ddx, coord_ddy);
}
This should be the most efficient method, unless you're already saturating your texture fetch pipeline in the shader (because TextureGrad can't be issued as fast as the simpler Texture instruction).

MarsMattel
May 25, 2001

God, I've heard about those cults Ted. People dressing up in black and saying Our Lord's going to come back and save us all.
Interesting, thanks!

Now, I'm not sure this is the correct way of doing things (i.e. I'm not sure if I'm handling multiple materials correctly in my polygon generation), but I can currently occasionally end up with a single triangle having multiple materials. How would that impact things? Would there be a performance hit only on those triangles with multiple materials (which since there would be a small number of these triangles, probably isn't much of a problem)?

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

MarsMattel posted:

Interesting, thanks!

Now, I'm not sure this is the correct way of doing things (i.e. I'm not sure if I'm handling multiple materials correctly in my polygon generation), but I can currently occasionally end up with a single triangle having multiple materials. How would that impact things? Would there be a performance hit only on those triangles with multiple materials (which since there would be a small number of these triangles, probably isn't much of a problem)?

Correct; you'd pay the cost for the divergence only for triangles with fragments (actually 32-fragment clusters -- something like 8x4 pixel blocks usually) that follow both branches.

Check this out http://www.nvidia.com/content/PDF/GDC2011/Nathan_Hoobler.pdf (p. 22)
This is a bit more technical/specific than you are looking for, but the concepts are basically the same, and the diagrams should make it a bit more clear. The other sections of that article are also applicable.

Hubis fucked around with this message at 04:35 on Nov 14, 2013

MarsMattel
May 25, 2001

God, I've heard about those cults Ted. People dressing up in black and saying Our Lord's going to come back and save us all.
That's great, thanks again.

Spite
Jul 27, 2001

Small chance of that...

baka kaba posted:

I have a couple of newbie questions, if anyone can help.

I've been doing a bit of work learning OpenGL ES 2.0, and I'm having a bit of trouble finding information. It might be best if I explain what I think is going on and what I'm trying to do:

So I have a bunch of 2D textures generated and filled with bitmaps, and when it comes to using them in the shader program I need to activate a texture unit and bind the texture to it. Then I pass in to the shader the numbers of the texture units I'm using, and the shader can use them. This all works fine.

My understanding is that a texture unit is actually in hardware, maybe some dedicated memory near the GPU? Different devices have different numbers of texture units available - the one I'm using (a Galaxy Nexus, if it matters) reports 8 available texture units when I call
code:
GLES20.glGetIntegerv(GLES20.GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS,maxTexUnits,0);
which is the maximum number available, not the max any one shader can use.

If these are in hardware, it seems like there would be a performance benefit in keeping textures in a texture unit as long as possible, and even if they're not I'd also avoid unnecessary glActiveTexture and glBindTexture calls which don't look cheap. But my phone is reporting 8 texture units, and I can actually activate and bind to any of the texture unit constants, right up to GL_TEXTURE31 (the highest in the API) and everything still renders just fine.

So my questions are really: are texture units in fast GPU memory, so am I taking the right approach trying to fill them and not swap too much in and out? And what's with the ID numbers, are they just for my reference (with the actual memory arrangement handled however the system sees fit) so I can use any 8 I like, but no more?

Or is this a really bad way to do things, and they're really meant for passing several textures to the shader at once and not for any kind of actual storage?

So, this is old but I can perhaps shed some light on it.

Firstly, calling glActiveTexture with anything higher than GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS - 1 should be an error. It is a bug in the implementation if you can set, say, texture unit 8. You are probably just resetting the same unit over and over (which would be the one you specified with the last call to glActiveTexture that succeeded).

You're misunderstanding what it means by "texture unit."

On a basic level, texturing involves some memory on the GPU to hold the actual texture data in some format (which is almost always something other than the linear array of pixels you're used to thinking of). There's usually a texture cache, because texture accesses tend to be pretty coherent spatially (if you are sampling a pixel, it's very likely you'll be looking at the ones immediately around it soon).

There is also usually specific hardware to take your texture coordinates, use them to look up into whatever format the data is actually in, fetch it, and do whatever sampling, interpolation, etc. you've asked for. If you set up filtering, this will actually result in multiple lookups and some interpolation to return an actual color to you. This piece of hardware is what you usually refer to as a texture unit - if you could read memory directly you could write code to do it all by hand, but it's always doing the same thing and it's faster to have hardware do it.

So your hardware has 8 of these units, which means you can access 8 different textures in a shader, one per unit. They aren't specific caches or pieces of memory, they are more like lookup and translation units.

As for keeping them bound, you are correct that it will be faster, but not for that reason. Calling glActiveTexture and glBindTexture does not actually move data around; it changes GL state. And it actually changes two levels of state, because OpenGL is a terribly designed API.

Think of OpenGL as a giant state machine. When you create a context, you are making a big struct with all the OpenGL state that is mandated by the OpenGL spec. You can't access this struct directly, you have to use the GL API calls. Originally, OpenGL only allowed one texture, so it only had the one call to change the current texture state, which is glBindTexture. You give it the type and the texture 'name' (that numerical id you got from glGenTextures) and that's that. Eventually they added support for multiple textures, but instead of letting you specify which unit to modify directly, they added the glActiveTexture call, which tells GL which texture unit to modify in subsequent calls.

Think of the code looking like this:
code:
struct Giant_GL_Context
{
.....
//some state
uint32_t currentActiveTexture;
.....
//texture state
GLInternalDriverTexture* textures[INTERNAL_MAX_TEXTURES];
.....
};

void glActiveTexture(GLenum texture)
{
    //there would be error checks here

    //note: GL_TEXTURE0 is 0x84C0 but we want a real index
    context->currentActiveTexture = texture-GL_TEXTURE0;
}

void glBindTexture(GLenum target, GLuint texture)
{
    //there would also be error checking to make sure target is sane and texture exists, etc
    GLInternalDriverTexture* tex = internalLookupTextureByName(context, texture);
    context->textures[context->currentActiveTexture] = tex;

    //other stuff
}
State changes are the most expensive thing in a graphics API on the CPU side. So if you can reorder things to minimize them you'll see a benefit. If your app is small, it's probably not worth the effort, but if you are doing 100s of draws a frame, it's worth your while.
Instead of doing:
set texture A
draw mesh A
set texture B
draw mesh B
set texture A
draw mesh C
set texture B
draw mesh D
try:
set texture A
draw mesh A
draw mesh C
set texture B
draw mesh B
draw mesh D
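If the draw order isn't fixed, a rough way to get that second ordering automatically is to sort the draw list by texture and skip redundant binds. A sketch, assuming the usual GL headers; Draw and issueDraw are made-up stand-ins for your own submission code:
code:
#include <algorithm>
#include <vector>

struct Draw { GLuint texture; /* mesh handle, uniforms, ... */ };

void issueDraw(const Draw& d);   // hypothetical: binds buffers, calls glDrawElements, etc.

void submit(std::vector<Draw>& draws)
{
    // Group draws that share a texture so each texture is bound at most once.
    std::sort(draws.begin(), draws.end(),
              [](const Draw& a, const Draw& b) { return a.texture < b.texture; });

    GLuint bound = 0;  // 0 is never returned by glGenTextures
    for (const Draw& d : draws) {
        if (d.texture != bound) {            // only touch GL state when it actually changes
            glBindTexture(GL_TEXTURE_2D, d.texture);
            bound = d.texture;
        }
        issueDraw(d);
    }
}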

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Thanks for this, I really appreciate it!

Yeah I'm doing a fair amount of draws on each frame, so I'm trying to get as much general efficiency as I can - I don't control the actual content so I can't necessarily draw things grouped by texture, but I'll look into that later. Right now I'm doing some basic redundancy checks (don't rebind a buffer or texture if it was the last one bound), and I have a little function that juggles bound texture units as a cache. That last one turned out not to work with more than two texture units, for some reason - weird things happen, maybe I've done something very silly that I can discover later.

If I'm understanding you right, texture units are really like hardware assistants - you tell them which texture they'll be working with, and changing state is really just reconfiguring them to work with another set of data. I guess my confusion is in understanding exactly where my textures exist. When I call glTexImage2D, where is that data actually held? I was assuming it was somewhere in the normal memory space, and that by binding a texture it was actually shuttling it across to the GPU's memory, which is why I was trying to cache as much as I could. Does OpenGL manage this internally, moving textures to the GPU when necessary, or am I responsible for managing GPU memory myself, unloading and reloading textures as necessary?

These sound like some seriously basic questions and I feel like I have a massive gap in my understanding here, so if anyone has some good resources that deal with this I'd appreciate it. I'm starting to work on memory management and if GPU memory is getting involved I can see a world of hurt heading my way

MarsMattel
May 25, 2001

God, I've heard about those cults Ted. People dressing up in black and saying Our Lord's going to come back and save us all.
I've implemented a basic deferred rendering setup where I write position, normal and diffuse colour data to 3 textures and then read them in a second pass to produce the final image. I'm now trying to access the depth buffer as a texture, but when I view the texture in gDEBugger all the values are set to 255.

My code looks like this:
code:
glActiveTexture(GL_TEXTURE3);
glGenTextures(1, &depthBuffer);
glBindTexture(GL_TEXTURE_2D, depthBuffer);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT, screenWidth_, screenHeight_, 0, GL_DEPTH_COMPONENT, GL_UNSIGNED_BYTE, 0);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_COMPARE_MODE, GL_COMPARE_REF_TO_TEXTURE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_COMPARE_FUNC, GL_LESS);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, depthBuffer, 0);

createGBufferTexture(GL_TEXTURE0, GL_RGB32F, posTex);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, posTex, 0);

createGBufferTexture(GL_TEXTURE1, GL_RGB32F, normalTex);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, normalTex, 0);

createGBufferTexture(GL_TEXTURE2, GL_RGB32F, colourTex);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, colourTex, 0);

GLenum drawBuffers[] = { GL_NONE, GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1, GL_COLOR_ATTACHMENT2 };
glDrawBuffers(4, drawBuffers);
If I remove the first glFramebufferTexture2D() call then the depth testing goes wonky, which would suggest that OpenGL is using depthBuffer for the depth testing.

Am I right in thinking then if I examine depthBuffer in gDEBugger it should have the depth data OpenGL used? Is there anything obvious I've missed? Or am I just misreading gDEBugger?

Grocer Goodwill
Jul 17, 2003

Not just one kind of bread, but a whole variety.

baka kaba posted:

When I call glTexImage2D, where is that data actually held? I was assuming it was somewhere in the normal memory space, and that by binding a texture it was actually shuttling it across to the GPU's memory, which is why I was trying to cache as much as I could. Does OpenGL manage this internally, moving textures to the GPU when necessary, or am I responsible for managing GPU memory myself, unloading and reloading textures as necessary?

Textures live in VRAM whether they're bound or not. The driver won't move them out into system memory unless VRAM is exhausted. (This is a simplification, since the GPU is a shared device and drivers have all sorts of fancy caching schemes, but this is the GL programmer's view of things.) You don't have to worry about texture memory management unless you have so many of them that they can't all fit in VRAM at once. And even then, the only control you have over it is the order of your draw calls.

MarsMattel
May 25, 2001

God, I've heard about those cults Ted. People dressing up in black and saying Our Lord's going to come back and save us all.
It seems the problem was that my depth buffer is non-linear, so when I viewed the image in gDEBugger all I saw was white because all the values were either 1 or very close to it. When I implemented a linearising transform using the value read from my depth buffer, I got the output I was expecting.

I think what's really thrown me here is that the default FBO's depth buffer seems to have this conversion done automatically (or perhaps it's linear?), so when I view that in gDEBugger, I see a linear image. This is quite confusing :)
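For reference, the transform is roughly the standard one (a sketch, assuming a perspective projection with the given near/far planes):
code:
// Convert a [0,1] depth-buffer value back to a linear eye-space distance.
float linearizeDepth(float d, float zNear, float zFar)
{
    float ndc = d * 2.0f - 1.0f;  // window-space depth back to NDC [-1, 1]
    return (2.0f * zNear * zFar) / (zFar + zNear - ndc * (zFar - zNear));
}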

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Grocer Goodwill posted:

Textures live in VRAM whether they're bound or not. The driver won't move them out into system memory unless VRAM is exhausted. (This is a simplification, since the GPU is a shared device and drivers have all sorts of fancy caching schemes, but this is the GL programmer's view of things.) You don't have to worry about texture memory management unless you have so many of them that they can't all fit in VRAM at once. And even then, the only control you have over it is the order of your draw calls.

Do you mean (and again sorry for my slow drawl approach to OpenGL here) that it's only a problem if you have a shader program trying to make use of >VRAM-worth of textures in one draw call? And otherwise there'll be some internal housekeeping going on that will work fine, it'll just suffer some performance hits if there's a lot of shifting going on?

So could I technically assign any amount of texture data (provided the memory is available on my app's heap), and as long as I don't try to access too much of it simultaneously in one draw call, OpenGL will take care of the rest? I'm currently only ever using a single texture with each mesh, but I'd like to keep as much texture data cached as possible since loading and converting bitmaps is fairly expensive.

Grocer Goodwill
Jul 17, 2003

Not just one kind of bread, but a whole variety.

baka kaba posted:

Do you mean (and again sorry for my slow drawl approach to OpenGL here) that it's only a problem if you have a shader program trying to make use of >VRAM-worth of textures in one draw call?

It's not per draw, it's over the lifetime of your app.

baka kaba posted:

And otherwise there'll be some internal housekeeping going on that will work fine, it'll just suffer some performance hits if there's a lot of shifting going on?

That's correct.

baka kaba posted:

So could I technically assign any amount of texture data (provided the memory is available on my app's heap), and as long as I don't try to access too much of it simultaneously in one draw call, OpenGL will take care of the rest? I'm currently only ever using a single texture with each mesh, but I'd like to keep as much texture data cached as possible since loading and converting bitmaps is fairly expensive.

OpenGL will always take care of it. Once you call glTexImage you can (and should) free the memory that you passed in. Once GL has the texture data, you will never need to upload it again unless you delete the texture object altogether with glDeleteTextures.
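A minimal sketch of that lifecycle (assuming the usual GL headers; the checker pattern just stands in for real image data):
code:
#include <cstdint>
#include <vector>

GLuint makeCheckerTexture(int w, int h)
{
    // Build the pixel data on the CPU side.
    std::vector<std::uint8_t> pixels(w * h * 4);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            std::uint8_t v = ((x / 8 + y / 8) & 1) ? 255 : 32;
            std::uint8_t* p = &pixels[(y * w + x) * 4];
            p[0] = p[1] = p[2] = v;
            p[3] = 255;
        }

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    return tex;
    // 'pixels' is destroyed here; GL keeps its own copy of the data until
    // glDeleteTextures(1, &tex) is called.
}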

haveblue
Aug 15, 2005



Toilet Rascal

Grocer Goodwill posted:

It's not per draw, it's over the lifetime of your app.

The lifetime of the scene, really. It depends a lot on what your app is trying to do (i.e. a game transitioning to a new level cannot avoid shuffling a lot of stuff around) but in general don't mix draw calls and changes to the working set.

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!
So, one thing's really been bugging me about heightmap terrain. Generally, the ideal size of a terrain heightmap is a power of 2 + 1 height samples per axis, because then performing LOD on it is just a matter of collapsing it to half resolution, and it can be partially collapsed.

How do you align textures with that though (i.e. alpha masks for terrain features), since those generally need to be (or perform much better if they're) a power of 2 on each axis instead?

OneEightHundred fucked around with this message at 00:07 on Jan 1, 2014

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

OneEightHundred posted:

So, one thing's really been bugging me about heightmap terrain. Generally, the ideal size of a terrain heightmap is a power of 2 + 1 height samples per axis, because then performing LOD on it is just a matter of collapsing it to half resolution, and it can be partially collapsed.

How do you align textures with that though (i.e. alpha masks for terrain features), since those generally need to be (or perform much better if they're) a power of 2 on each axis instead?

Well, two parts:

First, performance on non-power-of-2 textures is on par with every other surface on any reasonably modern GPU -- if someone is saying otherwise, either it's very old advice, or they've got results I'd be very interested in seeing.

Second, it doesn't matter because your geometry is at (2^N + 1) but your textures can still just be (2^M). If M = N, you'll end up with one texel per "square" in the geometry, since the heightmap describes the edges of the patches. Draw it out and you'll see what I mean.

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!

Hubis posted:

Second, it doesn't matter because your geometry is at (2^N + 1) but your textures can still just be (2^M). If M = N, you'll end up with one texel per "square" in the geometry, since the heightmap describes the edges of the patches. Draw it out and you'll see what I mean.
That's the problem though, they'll have one texel per quad, but they'll be centered on the terrain quads, which has problems with the edges in particular because the area between the edge texels and the edge of the terrain quads will either be clamped or mirror the opposite side of the terrain.

What I'd like is for the terrain texels to be centered on the mesh points, but doing that requires a 2^N+1 texture.

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

OneEightHundred posted:

That's the problem though, they'll have one texel per quad, but they'll be centered on the terrain quads, which has problems with the edges in particular because the area between the edge texels and the edge of the terrain quads will either be clamped or mirror the opposite side of the terrain.

What I'd like is for the terrain texels to be centered on the mesh points, but doing that requires a 2^N+1 texture.

Ah, I see what you're saying. In those cases, your options are either to resize the textures to (2^N+1) like you said, OR to resize them to (2^N+2) and create a guard band by copying the second/inner row of the neighboring texture into the outer row of the texture so that it interpolates seamlessly. Note that you'll have to be crafty with how you generate mip maps in either case, and you'll still have discontinuities with anisotropic filtering. On the other hand, the guard-band method can be extended to be up to 16px wide (for example) so that anisotropic filtering still works correctly.

Madox
Oct 25, 2004
Recedite, plebes!
Hey guys - I am having a problem using multisampled textures in DX11 and really have nowhere else to turn.

I am using a multisampled render target for the main scene and need to do some post processing on it, which is no big deal, so I am rendering to a texture which gets fed into the post processing shader. I discovered I had to use Texture2DMS<float4> as the resource type and use .Load() to sample the texture, but it wants the sample index as a parameter.

Assuming I want the post processing shader to just do nothing and only output the input texture, how do I output the multisampled nature of the input properly? Do I need to .Load() each sample (i.e. 8 loads if it's 8x MSAA)? How do I output those 8 values from the pixel shader?

Also as a test, I only read sample 0 of the texture and it always seemed to return unexpected colors, whereas everything is fine with a normal texture. Do I need to combine all the samples somehow to get the expected color?

edit:
I'm aware that I can probably copy the multisample texture to a plain texture and then use that instead, but wonder if there is a proper way to use the multisample texture directly without that extra step

Madox fucked around with this message at 19:56 on Jan 2, 2014

ani47
Jul 25, 2007
+
I've never touched dx11 but I'll give it a go...

When you have an MSAA surface you apply the AA by doing a resolve of the surface (combining the 2, 4, or 8 samples into 1 pixel). The driver/DX normally takes care of this for you; in DX11, though, you seem to be able to call ResolveSubresource to do it manually. You would nearly always do this before any post process.

If you needed to read or write the un-resolved surface (I guess) you would have to alias the surface with a render target of an adjusted size. So for a 2x MSAA 1280x720 surface the alias target would be 2560x720, reading and writing the same pixels. This is how you would write a resolve shader yourself, but I don't know if you can do this through DX (this is how you do it on the consoles).

Hopefully that's some help :).

Madox
Oct 25, 2004
Recedite, plebes!
Yes, ResolveSubresource() is what I'm using for the 'copy the multisample texture to a plain texture' step and it works great. I didn't realize that I was asking how to write my own resolver, which makes sense now. I still don't really know what the specifics of that would be, so I'll stick with ResolveSubresource.

High Protein
Jul 12, 2009
I've done it by just binding a same-size non-MSAA render target, using Load() for each sample of the original MSAA target, and averaging them. However, I was using the DX11 Effects framework and compiled the shader at run-time, which made it easy to substitute the actual MSAA level into my loop.

DX11 can also execute shaders for each individual sample, for what it's worth.

BattleMaster
Aug 14, 2000

I'm playing around with stencil buffer reflections in OpenGL 4. I've discovered that creating a mirror of my geometry with a reflection matrix will flip the polygons around, which turns things inside out when back face culling is on. Should I just switch to front face culling (or switch the "front" of my polygons to the clockwise direction) when rendering the reflection or is there a nicer way to do it?

BattleMaster fucked around with this message at 07:13 on Jan 3, 2014

haveblue
Aug 15, 2005



Toilet Rascal
No, that's the right way to do it. If you reflect polygons across a plane you effectively reverse their winding, and it's easier to tell OpenGL to match that than try to undo it by further modifying the geometry.
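Something along these lines (a sketch; the draw functions are placeholders):
code:
glEnable(GL_CULL_FACE);
glCullFace(GL_BACK);

glFrontFace(GL_CW);      // the reflection reverses winding, so front faces are now clockwise
drawReflectedScene();    // hypothetical: geometry multiplied by the reflection matrix

glFrontFace(GL_CCW);     // back to the default convention for the normal pass
drawScene();             // hypothetical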

BattleMaster
Aug 14, 2000

Cool thanks, it works great. It's funny, a few days ago I was looking at the different options for face culling and I was wondering "why would anyone want to use anything other than back?" I guess I should assume that if something exists in the spec it exists for a reason.

Earlier I was also puzzling over why the UniformMatrix family of functions had an argument to transpose your matrix. And last night I was having trouble getting an orthographic projection matrix to work. It turns out that I was formatting my matrix as row-major when OpenGL wants them column-major. So I set transpose to true when I fed it my matrix and it worked perfectly. I guess they put that there for people/libraries that prefer to work in row-major. Honestly it's a wonder that my rotation and perspective matrices worked when I was giving it the transpose by accident.
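In other words, something like this (a rough sketch; 'loc' and the l/r/b/t/n/f ortho bounds are made up):
code:
// 'ortho' is written out row by row in C (row-major); GL_TRUE tells GL to
// transpose it into the column-major layout it expects.
GLfloat ortho[16] = {
    2.0f / (r - l), 0.0f,            0.0f,            -(r + l) / (r - l),
    0.0f,           2.0f / (t - b),  0.0f,            -(t + b) / (t - b),
    0.0f,           0.0f,           -2.0f / (f - n),  -(f + n) / (f - n),
    0.0f,           0.0f,            0.0f,             1.0f,
};
glUniformMatrix4fv(loc, 1, GL_TRUE, ortho);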

Edit: Kind of a silly question, but would it be problematic to feed my shader several transform matrices (for instance, translation, rotation, and scale matrices) and have the GPU multiply them together, rather than doing it on the CPU and feeding only one to it? I have a feeling that the GPU is faster at multiplying floating-point matrices than the CPU is, but I'm not sure how expensive it is to feed uniforms to a shader program. (Though I guess doing it once on the CPU instead of having the GPU do it 3 times for every triangle might be better.)

BattleMaster fucked around with this message at 09:32 on Jan 3, 2014

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

BattleMaster posted:


Edit: Kind of a silly question, but would it be problematic to feed my shader several transform matrices (for instance, translation, rotation, and scale matrices) and have the GPU multiply them together, rather than doing it on the CPU and feeding only one to it? I have a feeling that the GPU is faster at multiplying floating-point matrices than the CPU is, but I'm not sure how expensive it is to feed uniforms to a shader program. (Though I guess doing it once on the CPU instead of having the GPU do it 3 times for every triangle might be better.)

I thought this too, but remember that every GPU thread will then have to recompute it (maybe it'll be cached per workgroup/warp, but I don't know how much that applies to shaders).

Sending three matrices is probably gonna take longer than a few optimized mults here and there.

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

Madox posted:

edit:
I'm aware that I can probably copy the multisample texture to a plain texture and then use that instead, but wonder if there is a proper way to use the multisample texture directly without that extra step

This has kind of already been answered, but to underline -- No, there's no way to Sample() from a MSAA texture. This is because the samples in MSAA are not at regular intervals, and so the interpolator hardware can't know how to automatically interpolate at a given texture coordinate. Load()ing individual samples explicitly and then resolving them yourself (either as a pre-process, or in the sampling shader) is the only semantically meaningful way to access it.

ResolveSubresource() triggers a fast path that does a box filter on the MSAA texture to a non-MSAA texture of the same resolution (in other words, averaging the samples for each pixel). It's worth noting that this actually isn't the best way to resolve an MSAA texture quality-wise, but that gets into sampling theory. The other option would be to, as you said, either write your own resolve pass or do the MSAA resolve per-sample in the shader (probably bad performance-wise).
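For reference, the fast-path resolve is a single call (a sketch; the context, resources, and format are placeholders for whatever you created):
code:
// Average each pixel's samples from the MSAA render target into a regular
// texture of the same size and format, ready for the post-process pass.
context->ResolveSubresource(resolvedTex, 0,   // dest, dest subresource
                            msaaTex, 0,       // source, source subresource
                            DXGI_FORMAT_R8G8B8A8_UNORM);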

Spite
Jul 27, 2001

Small chance of that...

BattleMaster posted:

Cool thanks, it works great. It's funny, a few days ago I was looking at the different options for face culling and I was wondering "why would anyone want to use anything other than back?" I guess I should assume that if something exists in the spec it exists for a reason.

Don't assume this :) The spec is a giant political document as much as a functional one.

BattleMaster posted:

Earlier I was also puzzling over why the UniformMatrix family of functions had an argument to transpose your matrix. And last night I was having trouble getting an orthographic projection matrix to work. It turns out that I was formatting my matrix as row-major when OpenGL wants them column-major. So I set transpose to true when I fed it my matrix and it worked perfectly. I guess they put that there for people/libraries that prefer to work in row-major. Honestly it's a wonder that my rotation and perspective matrices worked when I was giving it the transpose by accident.

Direct3D also takes row-major matrices. That argument was mainly a bone thrown to people porting stuff to GL from D3D (Transgaming, etc.). I think it's a bad feature, since I feel GL should not do conversions itself, and this requires the CPU to do the transpose.

BattleMaster posted:

Edit: Kind of a silly question, but would it be problematic to feed my shader several transform matrices (for instance, translation, rotation, and scale matrices) and have the GPU multiply them together, rather than doing it on the CPU and feeding only one to it? I have a feeling that the GPU is faster at multiplying floating-point matrices than the CPU is, but I'm not sure how expensive it is to feed uniforms to a shader program. (Though I guess doing it once on the CPU instead of having the GPU do it 3 times for every triangle might be better.)

Generally, hoist everything you can. So if you can hoist something off the GPU (or push it up the pipeline), it's worthwhile. You'd have one matrix multiply instead of one per vertex.

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

Spite posted:

Generally, hoist everything you can. So if you can hoist something off the GPU (or push it up the pipeline), it's worthwhile. You'd have one matrix multiply instead of one per vertex.

While this is generally true, I'd only do it if it doesn't require you to update constant buffers more often. Mapping and updating buffers adds memcpy and buffer-management overhead in the API, whereas the cost of the transform is probably negligible (it will only matter if your vertex transform is actually bottlenecking throughput).

Sauer
Sep 13, 2005

Socialize Everything!
With Direct3D 11 Microsoft introduced safe multi-threading capabilities to the Direct3D API such as letting you create resources in a different thread and build lists of rendering commands in a context running on another thread. You can then submit the command lists to the main thread for drawing and the API/Driver handles the synchronization for you.

Is there an equivalent to this in OpenGL? I imagine multi-threaded resource creation can be done by creating shared contexts in separate threads; but what about building rendering commands? I have this notion of using old display lists on a shared context but that seems silly.

I've been out of touch with OpenGL for a long time and want to port some code to a non-Windows platform.

BattleMaster
Aug 14, 2000

Spite posted:

Direct3D also takes row-major matrices. That argument was mainly a bone thrown to people porting stuff to GL from D3D (Transgaming, etc.). I think it's a bad feature, since I feel GL should not do conversions itself, and this requires the CPU to do the transpose.

Oh so the CPU has to transpose it instead of the GPU doing it with hardware? I guess I'll just get used to column-major matrices.

quote:

Generally, hoist everything you can. So if you can hoist something off the GPU (or push it up the pipeline), it's worthwhile. You'd have one matrix multiply instead of # of vertexes multiplies.

I did some more reading and it turns out that uniforms are treated as constant during the execution of the shader, so I'm guessing that a number of uniforms multiplied together is only done once until one of them is updated again.

Grocer Goodwill
Jul 17, 2003

Not just one kind of bread, but a whole variety.

BattleMaster posted:

Oh so the CPU has to transpose it instead of the GPU doing it with hardware? I guess I'll just get used to column-major matrices.

It's slightly more efficient to leave the matrices row-major and reverse the multiplication in the shader, i.e. mul(vec, mat) instead of mul(mat, vec). This compiles to 4 independent dot-product instructions instead of 4 dependent madds. Though this is less relevant on modern scalar-only hardware.

BattleMaster posted:

I did some more reading and it turns out that uniforms are treated as constant during the execution of the shader, so I'm guessing that a number of uniforms multiplied together is only done once until one of them is updated again.

They're not constant in the same sense as a compile-time constant in C/C++. There's no magic the shader compiler can do to coalesce several uniform matrix multiplies down to one.
