Rhusitaurion
Sep 16, 2003

One never knows, do one?

Ralith posted:

Semaphores introduce an execution dependency, not a memory barrier. You cannot use semaphores as a substitute for memory barriers under any circumstances. For operations that span queues you need both; for operations on a single queue, semaphores aren't useful.

I'm probably misinterpreting the spec here, but the section on semaphore signaling says that all memory accesses by the device are in the first access scope, and similarly for waiting, all memory accesses by the device are in the second scope. Granted it might not be the best way to do it, but it seems like relying on a semaphore for memory dependencies is allowed.

Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today

Rhusitaurion posted:

I'm probably misinterpreting the spec here, but the section on semaphore signaling says that all memory accesses by the device are in the first access scope, and similarly for waiting, all memory accesses by the device are in the second scope. Granted it might not be the best way to do it, but it seems like relying on a semaphore for memory dependencies is allowed.
No, you're right, I misremembered. Using a semaphore as you were is not unsound, just unnecessary effort for extra overhead. Note that you do typically need explicit barriers when expressing inter-queue dependencies regardless, but that's for managing ownership transitions when using resources with exclusive sharing mode.
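For reference, a minimal sketch of the "release" half of such an exclusive-mode ownership transfer (illustrative only; cmd, buffer, transferFamily and graphicsFamily are made-up placeholder names, not from anyone's actual code):
C++ code:
// Sketch only: the release barrier recorded on the source (e.g. transfer)
// queue before its semaphore signal. A matching "acquire" barrier with the
// same queue family indices, and the real destination stage/access masks,
// has to be recorded on the destination queue after the semaphore wait.
VkBufferMemoryBarrier release{};
release.sType               = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
release.srcAccessMask       = VK_ACCESS_TRANSFER_WRITE_BIT;
release.dstAccessMask       = 0;               // ignored on the releasing queue
release.srcQueueFamilyIndex = transferFamily;  // placeholder
release.dstQueueFamilyIndex = graphicsFamily;  // placeholder
release.buffer              = buffer;
release.offset              = 0;
release.size                = VK_WHOLE_SIZE;

vkCmdPipelineBarrier(cmd,
                     VK_PIPELINE_STAGE_TRANSFER_BIT,
                     VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                     0,
                     0, nullptr,
                     1, &release,
                     0, nullptr);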

Rhusitaurion
Sep 16, 2003

One never knows, do one?

Ralith posted:

No, you're right, I misremembered. Using a semaphore as you were is not unsound, just unnecessary effort for extra overhead. Note that you do typically need explicit barriers when expressing inter-queue dependencies regardless, but that's for managing ownership transitions when using resources with exclusive sharing mode.

Got it. Thanks for the advice - I've switched over to a single command buffer with barriers, and it seems like it works. Not sure if I got the src and dst masks and whatnot correct, but the validation layers are not complaining, at least!

Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today

Rhusitaurion posted:

Got it. Thanks for the advice - I've switched over to a single command buffer with barriers, and it seems like it works. Not sure if I got the src and dst masks and whatnot correct, but the validation layers are not complaining, at least!
The validation layers mostly can't detect errors in barrier specification, unfortunately.

peepsalot
Apr 24, 2007

        PEEP THIS...
           BITCH!

I need help debugging some OpenGL code which is very old and crusty (still has a mix of some fixed function pipeline stuff in there :stonk:).
Right now I'm trying to find the source of some weird graphical glitches which only show up on the Mac CI server, which is running:
OpenGL Version: 2.1 APPLE-16.7.4
GL Renderer: Apple Software Renderer

The report from the server includes framebuffer screenshots, and the glitches show as perfectly horizontal blank lines for various 3d rendered triangles, and the background just shows through. Each tri has a different set of these lines missing (exact lines are not global to screen/framebuffer).

One thing I just noticed is that the shader code is basically written with single precision float/vec3/vec4 variables in mind, but when vertex attributes are passed to the GPU, glVertexAttrib3d is called, so it's passing in doubles.
So my question at the moment is: would mixing single/double precision in that way likely cause problems?
Does OpenGL just know to coerce doubles into floats, or is there some risk of writing out of bounds with these double-width values, or is the behaviour undefined in such cases, or what?

I don't use Macs so it's a bit difficult to debug the problem via the remote CI server.
I updated the shader recently, which introduced these glitches, but it tested fine on other Linux systems etc.

The only other thing I can think of is that the Apple Software Renderer has some bug in its fwidth function, which was one part of my changes.

Xerophyte
Mar 17, 2008

This space intentionally left blank

peepsalot posted:

Does OpenGL just know to coerce doubles into floats, or is there some risk of writing out of bounds with these double-width values, or is the behaviour undefined in such cases, or what?

Double-precision vertex attributes do not exist in GL 2.1; using glVertexAttrib3d merely specifies that the input data is doubles, and they will be converted to floats. I don't believe 2.1 has integer vertex attributes either: using glVertexAttrib3s will likewise convert the input int16s to floats (only the N-suffixed variants like glVertexAttrib4Nub do the normalized [-1,1] mapping). Functions that let you set true 64-bit vertex attributes using doubles have an L suffix, like glVertexAttribL3d, and were added in GL 4.1.
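In other words (a tiny illustration, not from the code in question; loc, x, y, z are placeholders):
C++ code:
// GL 2.1: the doubles are converted on the way in; the shader's vec3 only ever sees floats.
glVertexAttrib3d(loc, x, y, z);   // behaves like glVertexAttrib3f(loc, (float)x, (float)y, (float)z)

// GL 4.1+ only: the L variant actually feeds 64-bit doubles to a dvec3 attribute.
glVertexAttribL3d(loc, x, y, z);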

I would be surprised if your error was due to the lowering conversion doing something especially strange on mac.

Xeom
Mar 16, 2007
I've started to experiment a little with OpenGL for some 2D rendering. I've been coding my own math functions and such because hey, it's a fun hobby.
I got a little scene with some quads going, but I've run into a problem I can't seem to figure out.

I decided to use different Z depths to control which quad goes in front of the other, but they begin to shrink as they go away from the camera, even though I'm using an orthographic matrix, which as far as I understand means that distance should not affect size. Yet they do shrink, and their location relative to the x and y axes also changes. It's almost as if everything is being scaled towards the origin. Everything seems to work fine until I play with the Z axis.

I do all my scaling and rotation in a 2x2 matrix, and then "promote" that matrix into a 4x4 matrix, meaning I just copy the 2x2 values into the 4x4 identity matrix. Then I multiply that matrix by a 4x4 translation matrix. All my functions seem correct, and I'm following the second edition of 3D Math for Graphics and Game Development for the math. Left-handed convention and row-ordered.

Here is some of the pertinent code.
https://pastebin.com/2CuKTUwW

Absurd Alhazred
Mar 27, 2010

by Athanatos

Xeom posted:

I've started to experiment a little with OpenGL for some 2D rendering. I've been coding my own math functions and such because hey, it's a fun hobby.
I got a little scene with some quads going, but I've run into a problem I can't seem to figure out.

I decided to use different Z depths to control which quad goes in front of the other, but they begin to shrink as they go away from the camera, even though I'm using an orthographic matrix, which as far as I understand means that distance should not affect size. Yet they do shrink, and their location relative to the x and y axes also changes. It's almost as if everything is being scaled towards the origin. Everything seems to work fine until I play with the Z axis.

I do all my scaling and rotation in a 2x2 matrix, and then "promote" that matrix into a 4x4 matrix, meaning I just copy the 2x2 values into the 4x4 identity matrix. Then I multiply that matrix by a 4x4 translation matrix. All my functions seem correct, and I'm following the second edition of 3D Math for Graphics and Game Development for the math. Left-handed convention and row-ordered.

Here is some of the pertinent code.
https://pastebin.com/2CuKTUwW

Your "orthographic" projection looks like perspective to me.

Xerophyte
Mar 17, 2008

This space intentionally left blank
Remember that OpenGL matrices are column-major. Your ortho matrix sets w = n2 * p.z + 1 if using column-vector math, which means you will do a sort of perspective division.
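Spelled out, using the names from the posted code and assuming the float list is handed to GL untransposed:
code:
// The floats as written:      Read column-major and applied to a column
//   zoom_x 0      0   0       vector (x, y, z, 1), GL computes:
//   0      zoom_y 0   0
//   0      0      n1  n2        x' = zoom_x * x
//   0      0      0   1         y' = zoom_y * y
//                                z' = n1 * z
//                                w' = n2 * z + 1   <-- depends on z
//
// The divide by w' then shrinks everything toward the origin as z grows.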

Xeom
Mar 16, 2007

Xerophyte posted:

Remember that OpenGL matrices are column-major. Your ortho matrix sets w = n2 * p.z + 1 if using column-vector math, which means you will do a sort of perspective division.

AAaaahhh!! I remember telling myself this before implementing it and then I totally forgot. I remember even having to convince myself that row-major would actually work for everything else. Now it all looks weird to me and I'll have to convince myself again.

Thanks.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
row-major still works you just have to set the layout(row_major) flag in GLSL and hope that your GPU vendor didn't gently caress it up (they probably did)

Absurd Alhazred
Mar 27, 2010

by Athanatos

Suspicious Dish posted:

row-major still works you just have to set the layout(row_major) flag in GLSL and hope that your GPU vendor didn't gently caress it up (they probably did)

But the translation is column major. It's best to stick to just one type instead of mixing them.

Xeom
Mar 16, 2007
So my question is: for functions like "glUniformMatrix4fv" that can be put into row-major mode, what does that mean inside GLSL?
Currently everything seems to be working right, but I really feel like I'm missing some key element, because all my math seems to mostly be in row-major form, yet I had to make that change yesterday. I am doing the math inside GLSL in row-major form, meaning things read left to right rather than right to left.

Why is it working?

I should just switch to column-major mode, as ugly as it is to my eyes.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
glUniformMatrix4fv uploads a float[16] matrix to the GPU's memory. If you pass the row-major flag, it will transpose it to column-major when it uploads it.

GLSL has the ability to load a row-major matrix from memory with the layout(row_major) flag. This came into existence much later than the row-major flag in glUniformMatrix4fv, and was originally intended for uniform buffers, which you're not using.
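A rough sketch of what that flag looks like in a uniform block (the block and variable names are made up for illustration):
code:
#version 330 core
// row_major here only changes how the mat4 is read out of the buffer;
// the multiplication below is still ordinary column-vector style.
layout(std140, row_major) uniform Transforms {
    mat4 u_mvp;
};

in vec4 a_position;

void main() {
    gl_Position = u_mvp * a_position;
}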

All array access inside the GLSL language is column-major, but just don't ever try to take apart or put back together matrices in GLSL and you shouldn't run into major problems.

There are other options you have, but I won't mention them so I don't confuse you.

Absurd Alhazred posted:

But the translation is column major. It's best to stick to just one type instead of mixing them.

you don't have to mix them? I don't know what this means.

Absurd Alhazred
Mar 27, 2010

by Athanatos

Suspicious Dish posted:

you don't have to mix them? I don't know what this means.

In the code that we were presented, the translation and "ortho" projection matrices were in opposite majority for the intended use. You should stick to a single majority instead of mixing them.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
oh, sure. I just use row-major for everything, I like the "3 vec4s" notation a lot for affine matrices and it fits in my head nicely.

Xeom
Mar 16, 2007
Now I'm even more confused, because the book I'm using claims that matrix is already in row-major form, the 4th row being in the form {dx, dy, dz, 1}. In fact, the other matrix being in row form shouldn't have mattered either, because my uniform call is set with GL_TRUE.

code:
	glUniformMatrix4fv(glGetUniformLocation(ID, name), 1, GL_TRUE, matrix.n);

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
to me (I could be wrong!)

row major is this, which is what GL accepts if you pass GL_TRUE to glUniformMatrix4fv

[ a b c tx ]
[ d e f ty ]
[ g h i tz ]
[ 0 0 0 1 ]

The translation components are in indexes 3, 7 and 11.

column major is this, which is what GL accepts if you pass GL_FALSE to glUniformMatrix4fv

[ a d g 0 ]
[ b e h 0 ]
[ c f i 0 ]
[ tx ty tz 1 ]

The translation components are in indexes 12, 13 and 14.

I prefer the first one because you can get a bit extra packing efficiency and store as 3 vec4s instead of 4 vec3s.
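If it helps, here's the same thing as code -- where the translation ends up in the float[16] you hand to glUniformMatrix4fv for each convention (R is the upper-left 3x3, t the translation; a hypothetical helper for illustration, not anyone's actual library):
C++ code:
// Row-major layout: rows stored one after another, translation at indices 3, 7, 11.
// Pass GL_TRUE as the transpose argument.
void pack_row_major(const float R[3][3], const float t[3], float out[16]) {
    for (int r = 0; r < 3; ++r) {
        out[4*r + 0] = R[r][0];
        out[4*r + 1] = R[r][1];
        out[4*r + 2] = R[r][2];
        out[4*r + 3] = t[r];
    }
    out[12] = 0.0f; out[13] = 0.0f; out[14] = 0.0f; out[15] = 1.0f;
}

// Column-major layout: columns stored one after another, translation at 12, 13, 14.
// Pass GL_FALSE as the transpose argument.
void pack_column_major(const float R[3][3], const float t[3], float out[16]) {
    for (int c = 0; c < 3; ++c) {
        out[4*c + 0] = R[0][c];
        out[4*c + 1] = R[1][c];
        out[4*c + 2] = R[2][c];
        out[4*c + 3] = 0.0f;
    }
    out[12] = t[0]; out[13] = t[1]; out[14] = t[2]; out[15] = 1.0f;
}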

Absurd Alhazred
Mar 27, 2010

by Athanatos
I'm going to be honest, I always have to double-check myself whenever I'm adding any new matrix code, because I get confused, and different matrix libraries handle their rows of initializers in inconsistent ways.

Xerophyte
Mar 17, 2008

This space intentionally left blank
I suspect the book you're looking at is also using row vectors, in addition to row-major storage layout.

One of the annoying parts of the entire row-vs-column kerfuffle is that a lot of guides and textbooks tend to conflate matrix memory layout and vector math convention. It's historically common to use row-major matrices with the (awful, no good, very bad) row vector math convention -- i.e. float3 p1 = p0 * M -- and column-major matrices with the column vector math convention -- i.e. float3 p1 = M * p0. This convention split happened sometime in the 90s: early IRIS GL used row-major storage and row vectors, OpenGL swapped to column-major storage and column vectors.

You get subtle errors because both changing the layout type and changing the vector convention have the same result as transposing the matrix, since (Mv)^T = v^T M^T for a vector v and matrix M. This was in fact the entire reason OpenGL swapped the storage convention in the first place: it let them swap to a column-vector math convention in the documentation without changing the API and breaking existing code.

tl;dr: what's a "row major" vs "column major" transform matrix depends on how the author is applying the transforms.

I'm not 100% sure that's your problem but what I can definitely say is that
C++ code:
matrix4 matrix4::ortho(const float width, const float height, 
                       const float near_plane, const float far_plane) {
    return matrix4{{ zoom_x, 0, 0, 0,
                     0, zoom_y, 0, 0,
                     0, 0, n1, n2,
                     0, 0, 0, 1,}};
}
and
C++ code:
matrix4 matrix4::translation(const float x, const float y, const float z) { 
    return matrix4{{ 1, 0, 0, 0,
                     0, 1, 0, 0,
                     0, 0, 1, 0,
                     x, y, z, 1,}};
}
are not using the same convention. If you are using the column vector-style application (float3 p1 = M * p0) then the first matrix is written assuming row-major storage, and the second is written assuming column-major storage.
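For illustration, a guess at what the ortho function would look like rewritten in the same convention as the translation (keeping the zoom_x/zoom_y/n1/n2 terms from the pastebin, whose definitions aren't shown here):
C++ code:
// Sketch: with column-major storage and column vectors, n2 belongs in the
// translation column (acting on z) rather than in the w row, so w stays 1.
matrix4 matrix4::ortho(const float width, const float height,
                       const float near_plane, const float far_plane) {
    return matrix4{{ zoom_x, 0,      0,  0,
                     0,      zoom_y, 0,  0,
                     0,      0,      n1, 0,
                     0,      0,      n2, 1,}};
}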

If it's any consolation, I'm pretty sure everyone who has ever written any CG-related code has at least one "oh for gently caress's sake"-moment per year related to this mess.

Xeom
Mar 16, 2007
When multiplying a vec4 (row) by a mat4, I don't see how your row matrix would lead to translation. The w term in your vector would have all the translation information.

https://imgur.com/Y5ITil8

Seems to make sense to me, but clearly I'm missing something.

EDIT: written towards Suspicious Dish; reading Xero's post now.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
Xero's post has the more complete answer -- I intentionally didn't mention the full explanation because 99% of people only need the shorthand, but if you're going through a book, it will have its own conventions!

It's worth noting that math has had its own conventions for years, and they have goals that don't necessarily apply to modern computer science. If your book takes a maths-first perspective, then both HLSL and GLSL are backwards!

Xerophyte
Mar 17, 2008

This space intentionally left blank

Xeom posted:

When multiplying a vec4 (row) by a mat4, I don't see how your row matrix would lead to translation. The w term in your vector would have all the translation information.

https://imgur.com/Y5ITil8

Seems to make sense to me, but clearly I'm missing something.

EDIT: written towards Suspicious Dish; reading Xero's post now.

That excerpt says

quote:

Then we could rotate and then translate a point v to compute a new point v' by

v' = vRT
meaning it's using the row-vector math convention when applying transforms. If you're instead using the column-vector math convention, where you'd do v' = TRv, you need to transpose those matrices for the same result.

The row vector style is not typical in GL and is not used in any of the GL documentation. Also, and this is just my personal opinion, it's a lovely pile of unintuitive garbage that should be set on fire and shot into the sun.

Xeom
Mar 16, 2007
I did understand the difference between row and column vectors, but I totally forgot that GLSL was doing the math assuming a column vector. Currently my shader is set up as if GLSL did row-vector multiplication. Somehow it all worked out, because everything seems fine in my test program. I'll have to figure out exactly WHY it worked at a later time. At least I can go about fixing everything now. Bugs and math can be really weird sometimes.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
If you do vec * mtx in GLSL, it will type-cast the vector to a row-vector, so it's effectively the same as transpose(mtx) * vec. Your two mistakes canceled each other out.

Xeom
Mar 16, 2007
:goofy:

MrMoo
Sep 14, 2000

I am having a bad time with alpha blending because I have no idea what I am doing,

So, I have MSDF fonts on a curve which look ok:



I have an outline shader because I don't really want to re-encode the fonts with an SDF channel, and I have managed to get anti-aliasing to some degree on both the inside and outside:



However adding a drop-shadow shader highlights the failures in processing the anti-aliasing applied at the outside of the outline.



Where there should be a lerp between the shadow and the outline there is a black halo, which could be taken as an artistic effect, but I would like to address it.

So, one of the ugly shaders for creating the shadow:
code:
uniform sampler2D tDiffuse;
uniform vec3 shadow_color;
uniform float shadow_width;
uniform float shadow_direction;
varying vec2 vUv;

void main() {
  vec2 u_textureRes = vec2(float(textureSize(tDiffuse, 0).x), float(textureSize(tDiffuse, 0).y));

  vec2 testPoint = vec2(shadow_direction * shadow_width/u_textureRes.x, shadow_width/u_textureRes.y);
  testPoint = vUv + testPoint;
  float shadowAlpha = texture( tDiffuse,  testPoint ).a;

  vec4 shadow = vec4(shadow_color, shadowAlpha);
  vec4 texel = texture2D(tDiffuse, vUv);
  if(texel.a < 1.0) {
//            gl_FragColor = shadow;
    gl_FragColor = mix(shadow, texel, texel.a);
  } else {
    gl_FragColor = texel;
  }
}
If I use gl_FragColor = shadow; it highlights the leak coming from the inside anti-aliased border,



If I remove the anti-aliasing on the outline then I can get a correct aliased shadow, but the aliasing is pretty bad on the thin typeface at such a low resolution.

I made the outline shader a bit worse than when it started,

code:
uniform sampler2D tDiffuse;
uniform vec3 outline_color;
uniform float outline_width;
varying vec2 vUv;

#define PI 3.14159265359
#define SAMPLES 32

void main() {
  vec2 u_textureRes = vec2(float(textureSize(tDiffuse, 0).x), float(textureSize(tDiffuse, 0).y));
  
  float outlineAlpha = 0.0;
  float angle = 0.0;
  for( int i=0; i<SAMPLES; i++ ){
    angle += 1.0/(float(SAMPLES)/2.0) * PI;
    vec2 testPoint = vec2( (outline_width/u_textureRes.x)*cos(angle), (outline_width/u_textureRes.y)*sin(angle) );
    testPoint = vUv + testPoint;
    float sampledAlpha = texture( tDiffuse,  testPoint ).a;
    outlineAlpha = max( outlineAlpha, sampledAlpha );
  }
//        gl_FragColor = mix( vec4(0.0), vec4(outline_color, 1.0), outlineAlpha );
  
  vec4 texel = texture2D(tDiffuse, vUv);
//          gl_FragColor = mix(gl_FragColor, texel, texel.a);
  
  vec4 outline = vec4(outline_color, outlineAlpha);
  if(texel.a < 1.0) {
//            gl_FragColor = outline;
    gl_FragColor = mix(outline, texel, texel.a);
  } else {
    gl_FragColor = texel;
  }
}

MrMoo fucked around with this message at 18:47 on Aug 22, 2020

Zerf
Dec 17, 2004

I miss you, sandman
If I understand your problem correctly, you don't actually want to use lerp at all, because mixing red and blue doesn't make sense for the middle values.

You might be looking for the over operator found here: https://en.wikipedia.org/wiki/Alpha_compositing

I.e. "out = outline + shadow * (1-outline.a);"
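In GLSL terms, something like this (a sketch; note the formula assumes the colors are premultiplied by their alpha, so premultiply first if they aren't):
code:
// Porter-Duff "src over dst", premultiplied-alpha form.
vec4 over(vec4 src, vec4 dst) {
    return src + dst * (1.0 - src.a);
}

// e.g. in the shadow shader's main(), instead of the mix():
//   vec4 glyph  = vec4(texel.rgb * texel.a, texel.a);
//   vec4 shadow = vec4(shadow_color * shadowAlpha, shadowAlpha);
//   gl_FragColor = over(glyph, shadow);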

MrMoo
Sep 14, 2000

This saturates the shadow, although it does clean up the halo for the most part.



Changing the outline shader from lerp/mix to the over function fixes the remaining halo issues,



So now all that remains is the colour itself.

MrMoo fucked around with this message at 19:54 on Aug 22, 2020

MrMoo
Sep 14, 2000

Ironically the halo works perfectly for Cincinnati,

Xeom
Mar 16, 2007
After those interesting posts I'm going to post a beginner question that is completely boring.

I'm making a font texture atlas using Freetype2. Everything seems to be working well and I can get a png out with the exact results I want, but something goes completely wrong when I try to load it into opengl. The best way I can describe it is that the texture becomes skewed and compressed. Funnily enough I can load the png I saved into that same quad with the same VAO and shader and it looks completely fine.

code:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RED, width, height, 0, GL_RED, GL_UNSIGNED_BYTE, mem);
Is there something special I should be doing when working with a single byte per pixel image?
I tried vec4(1,1,1,texture(blah,blah).r), but it doesn't seem to help.

haveblue
Aug 15, 2005



Toilet Rascal
A skewed image usually means the row length is wrong; are you calling glPixelStore?
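(For reference, the usual fix for tightly packed one-byte-per-pixel uploads, with width, height and mem being the values from the post above:)
C++ code:
// GL's default unpack alignment is 4 bytes per row, so a GL_RED atlas whose
// width isn't a multiple of 4 comes out skewed unless you drop it to 1.
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RED, width, height, 0,
             GL_RED, GL_UNSIGNED_BYTE, mem);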

Xeom
Mar 16, 2007

haveblue posted:

Skewed image usually means the row length is wrong, are you calling glPixelStore?

Thank you for the help with the stupid questions.
Everything looks good, just gotta flip it now.

Xeom
Mar 16, 2007
I'm rendering a font, but it seems to be taking a long time to render even with a texture atlas. Printing the string "The quick fox jumped over the brown fence" takes about 0.05 to 0.1 milliseconds, which seems like a long time for this sort of thing. Right now I'm using a texture atlas and using glBufferSubData to update the texture coordinates for each character printed. I'm also using glUniform to provide updates to a projection, view, and model matrix. Only the model matrix gets updated per character. I'm guessing updating the texture coordinates is what is taking so long, but I'm not quite sure what to do. Should I just build a VAO for each character and switch between those?

I originally switched to the texture atlas to avoid switching between textures, but I guess updating a VBO is worse.

Doc Block
Apr 15, 2003
Fun Shoe
You don’t need to change the model matrix per character. Just output a VBO of 2D textured quads with the vertex coordinates in screen space, and then draw the whole thing at once.

Depending on how you do it, you don’t even need to multiply by a modelViewProjection matrix in your shader; just pass the view width & height, and use that to convert screen space coords into NDC coords (aka -1.0 to 1.0)
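A minimal sketch of that vertex shader (the uniform and attribute names are invented for illustration):
code:
// Glyph quads are specified in pixels; the shader maps them straight to NDC.
uniform vec2 u_viewSize;     // viewport width and height in pixels
attribute vec2 a_position;   // quad corner in screen space
attribute vec2 a_texCoord;
varying vec2 v_texCoord;

void main() {
    vec2 ndc = a_position / u_viewSize * 2.0 - 1.0;
    gl_Position = vec4(ndc.x, -ndc.y, 0.0, 1.0);   // flip y so +y points down the screen
    v_texCoord = a_texCoord;
}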

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!
I'm porting something from D3D11 to OpenGL ES 2. D3D11 pretty much eliminated the standard attribute bindings, and the D3D version just produces screen coordinates from a generic attribute. I know GLES2 has generic attributes, but will the draw succeed if nothing is ever specified via glVertexPointer?

Xerophyte
Mar 17, 2008

This space intentionally left blank
I was going to ask a question about how best to do asynchronous, progressive compute work in Vulkan/modern GPU frameworks when I want to continually display the work in progress, but I think that in typing it out I managed to figure out what the best approach -- well, an approach, at least -- would be. Thank you, pseudonymous rubber duck collective.

However, it did make me come up with another, related question: what is the current state of atomics in GPU land? I was planning on accumulating path tracer samples by using atomic increments to scatter, which I expect would be helpful in a wavefront path tracer, as I'll probably be grouping work by the rays' Morton order instead of the source pixel. However, if I understand correctly, base Vulkan only offers atomic writes for single int values, and float atomics are a very recent Nvidia-only extension. Do people just do it with floatBitsToInt and atomicExchange & co? Are atomics currently a thing to avoid outside of very specific and limited cases?
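(For concreteness, the usual workaround goes through a compare-and-swap loop rather than atomicExchange -- a sketch with made-up buffer and variable names, and it is indeed slow when many invocations hit the same element:)
code:
layout(std430, binding = 0) buffer Accum { uint samples[]; };

// Emulate "atomically add a float" on an int-typed buffer element.
void atomicAddFloat(uint idx, float value) {
    uint expected = samples[idx];
    for (;;) {
        uint desired = floatBitsToUint(uintBitsToFloat(expected) + value);
        uint old = atomicCompSwap(samples[idx], expected, desired);
        if (old == expected) break;  // no one raced us; the add landed
        expected = old;              // retry against the value we actually saw
    }
}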

Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today
Contended global atomics are very slow. I've had good results from using subgroup operations to do one atomic op per subgroup, though.
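Roughly, the pattern looks like this in GLSL (a sketch; the buffer and variable names are made up):
code:
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_arithmetic : require

layout(std430, binding = 0) buffer Histogram { uint total; };

void accumulate(uint mine) {
    uint sum = subgroupAdd(mine);  // reduce across the subgroup first...
    if (subgroupElect()) {         // ...then only one invocation touches memory
        atomicAdd(total, sum);
    }
}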

Xerophyte
Mar 17, 2008

This space intentionally left blank

Ralith posted:

Contended global atomics are very slow. I've had good results from using subgroup operations to do one atomic op per subgroup, though.

This led me down a rabbit hole of looking at the subgroup stuff from 1.1, which I was completely unaware existed; I'm not very current or good with GPU framework stuff, which is why I started this little hobby project. Thanks! I noticed that the 1 atomic/subgroup trick was exactly what the subgroup tutorial recommends too. I expect the subgroup operations will be very useful for stuff like sampling, since I can make the subgroup collectively vote on the BRDF to sample, which should be efficient. Unfortunately I don't think I can boil down path tracing sample accumulation to scan local group + one atomic op in that way.

The problem with GPU path tracing has always been that it's incoherent: paths started in nearby pixels will very quickly diverge and veer off into paths that access completely different parts of the scene. Most GPU path tracers deal with this by doing wavefront tracing: generate a lot of subpath rays, sort them by their position and direction, and dispatch work according to that order so the local work group always accesses the same region of the scene. The problem with that is that now the local work group will include paths with vastly different origin pixels instead, and writing any new samples is a big incoherent scatter write. I expect I can deal with that by just sorting the samples back into image-space buckets or something like that, it'll just be a little more annoying than just atomically adding them to the target accumulation storage image immediately when I have them.

Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today
If the target locations are effectively random, contention might not be too big an issue, though I suppose that's scene dependent.
