Zerf
Dec 17, 2004

I miss you, sandman

Scaevolus posted:

In OpenGL, what's the best way to render a million cubes? (in a 100x100x100 grid) They won't be moving. Should I use display lists or vertex buffer objects?

Take this with a grain of salt, I haven't worked with GL for a few years. Why don't you just use both? VBOs and display lists aren't mutually exclusive.

VBOs make your data reside on the graphics card, which will be much faster than immediate mode (glVertex3f etc.). Display lists just record your GL function calls.

If we entertain the thought that you would issue a draw call for each of the million boxes (which you're not, I hope), you could make a VBO for the box and record the million draw calls into a display list, so when you need to draw everything you have an already-compiled command buffer to use (i.e. you've traded function call time for memory).
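To make that concrete, here's a rough sketch of what I mean - legacy GL, untested, and the cubeVertices array (36 positions for a unit cube) is assumed to exist already:

code:
// Sketch: one VBO holding a unit cube, plus a display list that
// records a translated draw call per grid position.
GLuint cubeVbo;
glGenBuffers(1, &cubeVbo);
glBindBuffer(GL_ARRAY_BUFFER, cubeVbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(cubeVertices), cubeVertices, GL_STATIC_DRAW);
glVertexPointer(3, GL_FLOAT, 0, 0);
glEnableClientState(GL_VERTEX_ARRAY);

GLuint gridList = glGenLists(1);
glNewList(gridList, GL_COMPILE);            // record the million draw calls once
for (int z = 0; z < 100; ++z)
    for (int y = 0; y < 100; ++y)
        for (int x = 0; x < 100; ++x) {
            glPushMatrix();
            glTranslatef((float)x, (float)y, (float)z);
            glDrawArrays(GL_TRIANGLES, 0, 36);  // 12 triangles per cube
            glPopMatrix();
        }
glEndList();

// each frame:
glCallList(gridList);
But again, you really don't want a draw call per box; batch it down to far fewer calls if you can.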

Zerf
Dec 17, 2004

I miss you, sandman

Contero posted:

Can I get some recommendations on papers or tutorials for rendering fire effects? Any kind of fire.

Do you want it to be useful, or do you just want to play around? If I had some more time to just experiment I'd definitely look into this: http://users.skynet.be/fquake/

Zerf
Dec 17, 2004

I miss you, sandman

Hammertime posted:

what's the best way for me to gain a thorough modern OpenGL education?

I find that the easiest way is just to skip the specifics (i.e. which API you're using) and just read whatever the IHVs give presentations about, be it DirectX or OpenGL. Working with graphics, you're going to be bound by hardware anyway, so what works well in DirectX is probably going to work well in OpenGL too.

What I would do is just skim through the Nvidia/AMD dev websites and look at performance articles/presentations such as http://developer.download.nvidia.com/presentations/2008/GDC/GDC08-D3DDay-Performance.pdf (this is a bit old, I just picked something as an example).

You'll find lots of information on http://developer.amd.com/, http://developer.nvidia.com/ and maybe even http://software.intel.com/en-us/visual-computing/. And when in doubt, use a profiler!

Zerf fucked around with this message at 20:29 on Jul 19, 2010

Zerf
Dec 17, 2004

I miss you, sandman

Harokey posted:

This has worked fine for normal shapes, but I'm having a bit of trouble doing it for my arbitrary "polygon" shape. Is there an algorithm to do this? Or am I maybe going about this the wrong way?

I don't know how far you want to push this, but if you want an algorithm that handles odd cases you should look up http://en.wikipedia.org/wiki/Straight_skeleton

The algorithm I used last time I implemented straight skeletons was quite difficult to get robust though, so I'd advise against using them unless it's really necessary.

Zerf
Dec 17, 2004

I miss you, sandman

unixbeard posted:

Does anyone know a good book on intermediate/advanced opengl programming? I'd like something that covers commonly used techniques that are beyond introductory basics, so stuff like SSAO, volumetric rendering, etc. I know I can find lots of info on the web about this stuff but I'd like it if there was a book I could just work through that has consistent writing and code style. I've been reading through realtime rendering which is great, so something around that level but perhaps less comprehensive and with more code.

Try any of the GPU Gems, ShaderX or GPU Pro books. They're not OpenGL specific, but they have some really neat articles. The GPU Gems books are also available for free from Nvidia (https://developer.nvidia.com/content/gpu-gems-3 for example), so you can see if there's something you're interested in. I'd also rate the GPU Gems series higher than the others in terms of quality, but it's been a while since the latest release.

I don't know if the exact book you're asking for exists; more advanced stuff like this you usually learn from the web, articles and/or talks at SIGGRAPH/GDC.

Zerf
Dec 17, 2004

I miss you, sandman

czg posted:

So, I'm trying to write some stuff using SlimDX which is a C# wrapper around DirectX, using DirectX 11.
What's the best way to debug the rendering here?
PIX stopped working a long time ago after some .NET update.
The Visual Studio graphics diagnostics tool works perfectly fine, until I try stepping through a shader and all my locals are either 0 or NaN which makes it pretty much worthless.
Nvidia Nsight is set up right and I think it works, except when it starts the .exe it just immediately throws an exception with "The operation completed successfully" and closes.

Working with SlimDX has been pretty smooth so far, but now I'm trying to debug my shadow mapping and it's a nightmare not being able to tell exactly what is going on in the shaders.

You could give Intel GPA a shot I suppose - but pretty much every PC app for graphics debugging sucks compared to the console tools. Personally I find that GPA does at least a decent job, but it's far from perfect.

Zerf
Dec 17, 2004

I miss you, sandman

Raenir Salazar posted:

*skinning stuff*

It's kind of hard to know exactly what your transformations look like just from this code, but in general this is what you want to do for each joint:

code:
jointTransform = ...(parent transforms here)... * transform[parentJoint] * transform[thisJoint] * inverseBindPose[thisJoint]
Note that inverseBindPose should only occur once in a jointTransform and not be "inherited" from the parent. Also, don't rely on the exact matrix multiplication order here, because I'm too tired to think about the correct order.

Once you have a transform for each joint, you can upload all of those to a shader and use the formula you posted, something like this (if you limit the influence to four joints):

code:
for( int i = 0; i < 4; i++ ) {
  skinnedPosition += mul( jointTransform[jointIndex[i]], Position ) * Weight[i];
}
It's maybe not the best of explanations, but if you are missing any of these parts, it could perhaps give you a hint on where your bugs are hiding.
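If a CPU-side sketch of that first part helps, this is roughly what building the per-joint matrices looks like (glm and the parent-index layout are my own assumptions, and as said above, don't trust the multiplication order blindly):

code:
#include <glm/glm.hpp>
#include <vector>

// localTransform[i]  : joint i's transform relative to its parent
// inverseBindPose[i] : maps model space into joint i's bind-pose local space
// parent[i]          : parent joint index, -1 for the root; parents come before children
std::vector<glm::mat4> computeJointTransforms(
    const std::vector<glm::mat4>& localTransform,
    const std::vector<glm::mat4>& inverseBindPose,
    const std::vector<int>& parent)
{
    std::vector<glm::mat4> global(localTransform.size());
    std::vector<glm::mat4> skinning(localTransform.size());
    for (size_t i = 0; i < localTransform.size(); ++i) {
        // accumulate the parent chain, then this joint's own transform
        global[i] = (parent[i] < 0) ? localTransform[i]
                                    : global[parent[i]] * localTransform[i];
        // the inverse bind pose is applied exactly once, at the very end
        skinning[i] = global[i] * inverseBindPose[i];
    }
    return skinning; // upload these as jointTransform[] to the shader
}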

Zerf
Dec 17, 2004

I miss you, sandman

Raenir Salazar posted:

by shaders do you mean modern opengl stuff? We've been using immediate mode/old opengl so far.

What actually outputs the mesh is:


So what I've been trying to do is transform the mesh, upload the new mesh, and then that gets drawn.

I don't have a real hierarchy for my skeleton, I just draw lines between two points for each bone, and each bone is a pair of two joints kept in a vector array.

So to clarify, my skeleton animation works perfectly, but making the jump from that to my mesh is what's difficult.

Then I think what you are missing is the inverse bind pose transform. Is this a school assignment? Is that transform mentioned somewhere? A simple example of why it's needed:

Imagine that we have two joints, one at position j1abs(5,0,0) and one at position j2abs(5,2,0). j2abs has a relative transform to j1abs which looks like j2rel(0,2,0). Now we have a vertex which we want to skin. This vertex is placed at v1abs(5,3,0). For simplicity, we want to attach this vertex only to the j2 joint. We cannot apply j2's absolute transform to the vertex position right away (that would give us a new position v1'abs(5+5,2+3,0+0)=(10,5,0), which is not what we want). Therefore, we define the inverse bind pose transform to be the transformation from a joint to the origin of the model. In other words, we want a transform which takes a position in the model into the local space of a joint. With translations this is simple; we can just invert the transform by negating it, giving us j2invBindPose(-5,-2,0).

Now, let's try to apply these transformations. First we take the vertex v1 and multiply it by the inverse bind pose for j2. This results in the position (0,1,0) (see, we are now in joint-local space). Now we can simply apply j1rel(5,0,0) and j2rel(0,2,0), which gives us v1'abs(0+5+0,1+0+2,0+0+0)=(5,3,0), right where we started.

Now imagine we change j1rel's transform to j1rel(6,1,0). We again take v1abs(5,3,0)*j2invBindPose(-5,-2,0)*j1rel(6,1,0)*j2rel(0,2,0) = (5 + -5 + 6 + 0, 3 + -2 + 1 + 2, 0+0+0+0) = v1'abs(6,4,0), which is exactly what we want.
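If it helps, here's the same arithmetic as a tiny sketch (translations only, so composing the transforms is just vector addition; glm is my own assumption here):

code:
#include <glm/glm.hpp>
#include <cassert>

int main()
{
    glm::vec3 v1abs(5, 3, 0);           // vertex in model space
    glm::vec3 j2invBindPose(-5, -2, 0); // negated absolute bind position of j2
    glm::vec3 j1rel(6, 1, 0);           // j1's new relative transform
    glm::vec3 j2rel(0, 2, 0);           // j2's transform relative to j1

    // pure translations compose by addition
    glm::vec3 skinned = v1abs + j2invBindPose + j1rel + j2rel;
    assert(skinned == glm::vec3(6, 4, 0)); // matches the worked example above
}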

So, does this explanation make sense to you or have I succeeded in making you more confused? :)

Zerf
Dec 17, 2004

I miss you, sandman

Raenir Salazar posted:

I think I see what you mean but isn't that handled by assigning weights?



Example, vertex[0] = <0.0018 0.0003 0.8716 0.0003 0.0004 0 0 0.0006 0.0007 0.0001 0 0.0063 0.0046 0 0.0585 0.0546 0.0002>

We're given the file that has the associated weights for every vertex. Each float is for a particular bone from 0 to 16; the file has the 17 weights for each of the 6669 vertexes.

e: Out of curiosity, Zerf, do you have Skype, and could I maybe add you, not just for figuring this out but for OpenGL help and advice in general? :)

There might be other ways to do this, but the most common way is to use weights and apply them to the different joint transforms (usually in shaders, but there's nothing stopping you from doing it on the CPU if you really want to).

Following my example, say that you have three joints: j1, j2 and j3. j2 has j1 as parent and j3 has j2 as parent. You then compute all three joint transforms, including the inverse bind pose, like so:

j1compound = j1inverseBindPose * j1rel
j2compound = j2inverseBindPose * j1rel * j2rel
j3compound = j3inverseBindPose * j1rel * j2rel * j3rel

Then, for a vertex that is affected by all three with the following weights <0.2,0.3,0.5>, calculate its skinned position with the formula you first posted, i.e.:

v1' = v1 * j1compound * 0.2 + v1 * j2compound * 0.3 + v1 * j3compound * 0.5

That should give you the correct position for a vertex that is skinned to all three joints.
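As a CPU-side sketch of that formula (again assuming glm, and that the compound matrices above have already been built), it's just a weighted sum:

code:
#include <glm/glm.hpp>

// position     : vertex position in model space
// compound[i]  : the per-joint compound matrices from above
// index/weight : this vertex's joint indices and weights
glm::vec3 skinVertex(const glm::vec3& position,
                     const glm::mat4* compound,
                     const int* index, const float* weight, int count)
{
    glm::vec4 result(0.0f);
    for (int i = 0; i < count; ++i)
        result += weight[i] * (compound[index[i]] * glm::vec4(position, 1.0f));
    return glm::vec3(result);
}
For the example above you'd call it with indices {0,1,2} and weights {0.2f, 0.3f, 0.5f}.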

As for my Skype, I'm really rusty on OpenGL and I usually don't have much time to answer questions like these, so I'd rather not give it away. You can always PM me questions though, just beware that sometimes I might not find time to answer them for a couple of days.

Zerf
Dec 17, 2004

I miss you, sandman

Raenir Salazar posted:

Okay so to compute the inverseBindPose, you said:


The problem I see here is that the skeleton/joints, the animations and the mesh coordinates were all given separately, meaning that they actually are not aligned. I have no idea which vertexes vaguely align with which joint, so I would need the skeleton aligned first (I have only managed this imperfectly, with mostly trial and error).

I think I do have the 'center' (let's call it C) of the mesh, so when I take the coordinates of a joint, make a vector between it and C and then transform them, it's moved close to but not exactly to it, and is off by some strange offset which I've determined is roughly <.2,.1>.

So take that combined value, the constant offset, plus the vector <C-rJnt>; and now make a vector between that and the C center of the mesh? Each animation frame is with respect to the rest pose and not sequential.

I now have the animation vaguely working and rendering, not using the above, but the skeleton desyncs with the mesh and seems to slowly veer away from it as the animation goes on.

Well, by "origin of the model" I actually meant (0,0,0). You don't need any other connection between the joint and the vertices other than the inverse bind pose, because that will transform the vertex into joint-local space no matter where the vertex is to begin with. You don't need to involve the point C at all in your calculations.

Also note that the inverse bind pose is constant over time; you only need to calculate it once. The compound transforms you need to compute each frame (obviously, since the relative positions between joints change).

Zerf
Dec 17, 2004

I miss you, sandman

Raenir Salazar posted:

code:
if (b[0] == 0)
{
 if (b[1] == 1)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(H, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(I, N).rgb;
 }
 else if (b[1] == 2)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(H, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(J, N).rgb;
 }
 else if (b[1] == 3)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(H, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(K, N).rgb;
 }
 else if (b[1] == 4)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(H, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(L, N).rgb;
 }
 else if (b[1] == 5)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(H, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(M, N).rgb;
 }
}
else if (b[0] == 1)
{
 if (b[1] == 0)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(I, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(H, N).rgb;
 }
 else if (b[1] == 2)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(I, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(J, N).rgb;
 }
 else if (b[1] == 3)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(I, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(K, N).rgb;
 }
 else if (b[1] == 4)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(I, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(L, N).rgb;
 }
 else if (b[1] == 5)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(I, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(M, N).rgb;
 }
}
else if (b[0] == 2)
{
 if (b[1] == 0)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(J, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(H, N).rgb;
 }
 else if (b[1] == 1)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(J, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(I, N).rgb;
 }
 else if (b[1] == 3)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(J, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(K, N).rgb;
 }
 else if (b[1] == 4)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(J, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(L, N).rgb;
 }
 else if (b[1] == 5)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(J, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(M, N).rgb;
 }
}
else if (b[0] == 3)
{
 if (b[1] == 0)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(K, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(H, N).rgb;
 }
 else if (b[1] == 1)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(K, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(I, N).rgb;
 }
 else if (b[1] == 2)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(K, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(J, N).rgb;
 }
 else if (b[1] == 4)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(K, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(L, N).rgb;
 }
 else if (b[1] == 5)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(K, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(M, N).rgb;
 }
}
else if (b[0] == 4)
{
 if (b[1] == 0)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(L, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(H, N).rgb;
 }
 else if (b[1] == 1)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(L, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(I, N).rgb;
 }
 else if (b[1] == 2)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(L, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(J, N).rgb;
 }
 else if (b[1] == 3)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(L, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(K, N).rgb;
 }
 else if (b[1] == 5)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(L, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(M, N).rgb;
 }
}
else if (b[0] == 5)
{
 if (b[1] == 0)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(M, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(H, N).rgb;
 }
 else if (b[1] == 1)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(M, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(I, N).rgb;
 }
 else if (b[1] == 2)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(M, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(J, N).rgb;
 }
 else if (b[1] == 3)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(M, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(K, N).rgb;
 }
 else if (b[1] == 4)
 {
  tempColor = (a[0]/(a[0] + a[1]))*tex2D(M, N).rgb + (1 -  (a[0]/(a[0] + a[1]))) * tex2D(L, N).rgb;
 }
}

:stare:

Did you mean to write this?

code:
sampler2D lookup[6] = { H, I, J, K, L, M };
tempColor = lerp( tex2D(lookup[b[1]], N).rgb, tex2D(lookup[b[0]], N).rgb, a[0]/(a[0] + a[1]) );
(I'm unfamiliar with Unity/C#/whatever Unity uses to write shaders, but surely the code you posted can be written in a much better way...)

Zerf
Dec 17, 2004

I miss you, sandman

Raenir Salazar posted:

The error I got was that the sampler2D had to be a "literal expression" which seemed to rule out any sort of variable assignment and thus ruled out much simpler code. It's possible that stuffing them into an array would be more helpful, I'll try it out.

If you haven't got support for texture arrays, this should work with your existing code:

code:
vec3 lookup[6] = { 
	tex2D(H, N).rgb,
	tex2D(I, N).rgb,
	tex2D(J, N).rgb,
	tex2D(K, N).rgb,
	tex2D(L, N).rgb,
	tex2D(M, N).rgb };
tempColor = lerp( lookup[b[1]], lookup[b[0]], a[0]/(a[0] + a[1]) );
It's of course slower (6 samples instead of 2), but since you're trying things out I think you should opt for code that is easy to understand and modify. Texture arrays (or an atlas texture) can be implemented later if it turns out you're GPU bound.

Zerf
Dec 17, 2004

I miss you, sandman

Joda posted:

Could it be fill related? When you're zoomed in, the triangles take up more of the viewport, so if you're committing them all to memory you're gonna be doing way more writes when you're zoomed in. What happens if you just draw lines instead of full triangles?

Fill rate would be my guess - check the blending states and especially measure overdraw. If a lot of objects are visible at the same time, doing alpha blending, ignoring the z-buffer and filling the entire screen, you could see large performance drops. But instead of speculating about it, there should be some performance analyzer program you could download? Intel GPA maybe? I know very little about the tools available on OS X, so I can't give you any proper recommendations.

Zerf
Dec 17, 2004

I miss you, sandman

Doc Block posted:

On iOS, Apple has performance analyzers for Metal that will detect some things that hurt performance and give you a complete breakdown/trace analysis for a frame, showing you every API call made during that frame and how long each draw call took etc. But according to Lord Funk, the trace analyzer for Metal isn't available on OS X.

edit: I kinda wanna say it's related to clipping somehow (driver bug?), since it really only seems to happen once the view goes inside the models but not when the models take up a lot of the screen but the viewer is still outside them. Lord Funk said changing to lines and/or just using a shader that outputs solid white with no blending doesn't make the problem go away, so it doesn't seem like a fill rate problem.

How did I miss that entire post? My reading comprehension yesterday must've been broken. Sorry about that, Lord Funk.

The advice still stands though: try downloading a performance analyzer of some kind to get more information on what's taking the most time.

Zerf
Dec 17, 2004

I miss you, sandman
Distance fields also have some other nice properties when it comes to text effects, like drop shadows, outer/inner glow etc. Doing similar things for meshes/vector fonts is non-trivial and involves computing the straight skeleton or something similar.

Ralith posted:

They're also both slower to render (even ignoring preprocessing!)...

Please elaborate on this. If we ignore preprocessing, rendering distance field fonts is just a plain texture lookup and some simple maths (which is essentially free next to the texture lookup).
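For reference, the "simple maths" part is basically a smoothstep around the 0.5 isoline - here's a rough CPU-style sketch of what the fragment shader does with the sampled value (the threshold and smoothing width are just the usual conventions, not anything from a specific implementation):

code:
#include <algorithm>

// sampledDistance: value read from the distance-field texture (0..1, 0.5 = glyph edge)
// smoothing      : half-width of the anti-aliasing band; in a shader this usually comes from fwidth()
float sdfCoverage(float sampledDistance, float smoothing)
{
    float t = (sampledDistance - (0.5f - smoothing)) / (2.0f * smoothing);
    t = std::min(1.0f, std::max(0.0f, t)); // clamp to [0,1]
    return t * t * (3.0f - 2.0f * t);      // smoothstep; use as the glyph's alpha
}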

Zerf
Dec 17, 2004

I miss you, sandman

Ralith posted:

It surprised me too. I linked experimental data. There's discussion of implementation details as well.

I skimmed through the link, but where do you come to the conclusion that this is faster than distance field rendering? All the comparisons seem to be against CPU-based rasterizers, and the GPU part seems non-trivial to implement.

It's probably faster than distance fields if you include the preprocessing they require, but ideally you preprocess each glyph once (or once per desired resolution) and end up with a super-low-res image that can easily be cached and is fully satisfactory for most use cases.

Don't get me wrong, the approach in that link seems like a good idea for rasterizing fonts, but I still believe distance fields provide much more bang for the buck.

Zerf
Dec 17, 2004

I miss you, sandman

Xerophyte posted:

GLyphy is a GPU implementation that uses signed distance fields:


It surprises me; the last I heard anyone say anything on the subject, Loop-Blinn was considered complicated and (with filtering, at least) pretty slow.

Oh, I see, that's why. Thanks. On the other hand, here's an excerpt from the GLyphy Github repo:

quote:

The main difference between GLyphy and other SDF-based OpenGL renderers is that most other projects sample the SDF into a texture. This has all the usual problems that sampling has. Ie. it distorts the outline and is low quality.

GLyphy instead represents the SDF using actual vectors submitted to the GPU. This results in very high quality rendering, though at a much higher runtime cost.

So sure, if you are going to use SDF without computing it to a texture, it's going to be expensive. I still believe regular, texture-based SDF variants will be both simpler and faster than doing any other font rasterization on the GPU (but with the caveat that generating the texture is expensive and sampling artifacts can occur).

Zerf fucked around with this message at 12:17 on Apr 14, 2017

Zerf
Dec 17, 2004

I miss you, sandman

peepsalot posted:

Are there any online calculators or utilities that help create transformation matrices?

I use WolframAlpha quite a lot; it's super handy. For example, it can do symbolic matrix inversions etc.

What are you after specifically?

Zerf
Dec 17, 2004

I miss you, sandman

lord funk posted:

Yeah that makes total sense! Thanks for the approach details.

I do want to render the objects each frame, so they can react to environment lighting changes.

Heh, funny you should bring this up. I just implemented this last week. My solution was to handle it in the shader. Since the perspective transform is non-linear, each affected vertex now needs to be multiplied by two matrices instead of one, with some meddling in between the multiplications. Quite a simple solution, but it works well.

Zerf
Dec 17, 2004

I miss you, sandman

Hubis posted:

Sorry, was phone-posting!

Geometry Shaders: The reason they're usually bad is because DirectX did not provide any relaxation to the "Rasterization Order" requirement -- the primitives must be rasterized downstream in the exact order in which they are generated (at least in circumstances where they would overlap). This can become a problem if you do expansion (or culling) in the GS because now each GS invocation has to serialize to make sure the outputs are written to a buffer for later rasterization in the right order. It might not be an issue if you're not actually geometry-limited, but it's generally something to be concerned about. Slow isn't useless though, and NVIDIA has come up with some cool ways to use the GS without invoking any major performance penalty (like Multi-Projection) but it has a bad rap in general.

Quad-Per-Instance: GPUs are wide processors. NVIDIA shader units essentially process 32 threads in parallel each instruction (and they have many such shader units). One quirk is that, at least on some iterations of the hardware, a given 32-thread "warp" can only process one instance at a time when executing vertex shaders. This means that if you have a 4-vertex instance then 4 threads are going to be enabled and 28 threads are going to be predicated off (essentially idle). Your vertex processing will be running at 12.5% efficiency! If you're doing particle rendering it might be that you're going to be pixel shader/blending rate bound before the vertex shading becomes an issue, but often you have enough vertices that it will bite you.

So if you use instancing (but have multiple quads/instance so you are using all 32 threads) then you avoid all these potholes.

Graphics is fun!

Nice - I'm currently sitting here doing some Vulkan stuff, enjoying bindless and doing a lot of stuff in batches, so this was really informative. But I take it, then, that instancing in itself isn't that bad; it's just the extreme cases where you get a really low vertex count per instance?

Zerf
Dec 17, 2004

I miss you, sandman
Can you even model that using blend modes? The equation contains abs(base-blend), and there's no standard way of doing that AFAIK (but I think Nvidia has a ton of blend extensions). Reference formula: https://github.com/jamieowen/glsl-blend/blob/master/difference.glsl
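Since the fixed-function blend unit can't do abs(), the usual fallback is to compute it in the shader instead (reading the destination via render-to-texture or a framebuffer-fetch extension). A rough per-channel sketch of that reference formula:

code:
#include <cmath>

// "Difference" blend mode, per channel: abs(base - blend).
// 'base' is the destination color, 'blend' is the incoming source color;
// in a shader the destination has to be available as a texture or via framebuffer fetch.
float blendDifference(float base, float blend)
{
    return std::fabs(base - blend);
}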

Edit:
Here's GL's extension: https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_blend_equation_advanced.txt

Maybe something like that exists for Metal?

Zerf fucked around with this message at 18:58 on Mar 20, 2019

Zerf
Dec 17, 2004

I miss you, sandman

Doc Block posted:

Vulkan question:

Is there really no better way to be able to change textures mid-frame than by having a separate descriptor set for each and every image and then binding them as needed?

Sure, there's the insane bind-less "Just make a huge array of textures, index into it with a push constant" method, but that seems to bring a bunch of problems with it. What if you don't know ahead of time how many images you'll be loading? What if you pick an array size that's too big for what the hardware supports? Both of those seem to suggest building and compiling the shaders at run-time, which sucks. And might mean rebuilding and recompiling the shader(s) if you wind up needing to load more images later.

The push descriptors extension seems custom-made for fixing this and bringing a bit of sanity and programmer harm reduction to Vulkan, so of course AMD hates it and doesn't support it.

There's the descriptor update templates extension, but it looks like you can't queue the updates up in a command buffer unless you've also got the push descriptors extension, in which case you're still screwed if you want to support AMD. Otherwise you're stuck updating between frames, which completely misses the point.

We use the bindless approach with an array of texture arrays, so texture count is not really an issue. We bind each texture array once upon creation, but the rest of the time a texture manager keeps track of which indices in each array point to which texture, and streams them in/out when needed. As such, there's no need to compile shaders at runtime, because we're well within the limits.

We don't use push constants for the texture indices either; rather, they are placed in a storage buffer. Each model/mesh then gets fed an entity index in some way (via per-instance data/push constant/glBaseInstanceIndex etc.), and this index is used to look up the different model/mesh settings.

Zerf
Dec 17, 2004

I miss you, sandman

Doc Block posted:

Doesn't that still leave you having to create a million descriptors, though? I read somewhere that some implementations have a really low limit on the number of descriptors. And it still seems like a hassle if you need to load or unload images on the fly, since wouldn't you have to rebuild the buffer of descriptor sets?

We create a total of 2 descriptor sets, since we double buffer most things. Each descriptor set contains approx 100 entries.

The only extra work after initial setup is if we run out of space in a texture array and need to create and bind another one. Again, no biggie to update 2 descriptor sets...

Zerf
Dec 17, 2004

I miss you, sandman
If I understand your problem correctly, you don't actually want to use lerp at all, because mixing red and blue doesn't make sense for the middle values.

You might be looking for the over operator found here: https://en.wikipedia.org/wiki/Alpha_compositing

I.e. "out = outline + shadow * (1-outline.a);"
