Ralith
Jan 12, 2011

I think glVertexAttribPointer converts values to floats. Try glVertexAttribIPointer.


Ralith
Jan 12, 2011

Handedness is simple. Pick a hand, stick your thumb out to the side, index finger parallel with your palm, and middle finger in the direction your palm is facing. If you can orient your hand to match the positive axes, then that's the handedness of the coordinate system.

Ralith
Jan 12, 2011


Doc Block posted:

You can do that with either hand, though...
Er, sorry, I forgot to mention that your thumb is X, index finger Y, middle finger Z.

Doc Block posted:

But yeah, the general consensus seems to be that the coordinate system in the image I posted is left handed, which leaves me wondering why I'm seeing people say OpenGL's coordinate system is right handed, since IIRC in OpenGL +Z is going into the screen.
IIRC the old fixed-function OpenGL/GLU convenience functions took inputs in right-handed coordinates, while the low-level device space has always been left-handed. Or maybe it's the other way around. The confusion arises from them differing, so when you say "OpenGL's coordinate system" nobody really knows what you're talking about.

Ralith
Jan 12, 2011

To be clear, if you're using modern OpenGL and constructing matrices yourself, you're working directly in device space. The GLU functions (and many libraries that imitate them) constructed their matrices such that vertex data specified in the other, right-handed convention got reflected into device space.

Ralith
Jan 12, 2011


Suspicious Dish posted:

By default, GL defines clip space as being right-handed, but, again, this is just a function of the near and far planes in clip space. You can change it with glDepthRange to flip the near and far planes around, which has existed since day 1, no extension required.

OpenGL documentation posted:

After clipping and division by w, depth coordinates range from -1 to 1, corresponding to the near and far clipping planes.
Huh? This sounds like left-handed coordinates. Or is +y down?

Suspicious Dish posted:

I've never actually heard of or considered backface culling to be about handed-ness, but I can see your point. You can change it with glFrontFace, as usual.
It's not that backface culling is about handedness, it's that winding changes when you reflect geometry.

Ralith
Jan 12, 2011


Xerophyte posted:

Well, it kinda is. The direction of the geometry normal is defined by the triangle's vertex order, and the normal can be either pointing out from a counter-clockwise rotation (a right-handed system) or pointing out from a clockwise rotation (a left-handed system). Facing is a type of handedness in that sense. Mirror transforms reverse the direction of the winding and therefore also the direction of the geometry normal. This is mostly academic and you're not going to find an API asking you to select your primitive handedness or anything.

I work on a component called "Raytracing API Development" on the org chart, so my view of a graphics API is probably less hardware-oriented than most in this thread. We don't have to worry about some of the handedness issues of a raster API -- our projection transform is an arbitrary user-defined function that maps from image raster positions to rays -- and I'll happily admit that my knowledge of exactly how clip space works in GL is fuzzy at best.
The normal of a triangle is determined by its winding, as specified by glFrontFace. If the matrices you transform your vertex data by involve a reflection, then you'll need to account for this in your winding. Handedness is irrelevant; you just need to be sure your winding is as intended at the final stage.

Ralith
Jan 12, 2011

I think the trick is that the time from starting development to having an efficient game engine ends up being much shorter in turn.

Ralith
Jan 12, 2011

If anything, Vulkan represents a drastic increase in accessibility--no longer will we be so dependent upon proprietary "this is how to make the driver happy" incantations.

Ralith
Jan 12, 2011


mobby_6kl posted:

Ok stupid question time because I'm obviously not able to figure this out on my own: How do I convert between the coordinates on the screen/viewport and the X/Y at a particular z depth in the scene? I found some information that mostly focuses on identifying objects in the scene, so the solution involves intersecting rays and crap that I obviously don't need - my goal is to place an object at, say, z=-5 (away from the camera) so that it appears at a particular x,y in the viewport.
The graphics card is doing a specific sequence of well-defined linear transformations on the vertex data you upload to decide which pixels to color. You basically just need to run those backwards, which is generally easy because you can just invert the matrices. Depending on how you're doing your rendering, you might either already be intensely familiar with those transformations due to having written them yourself in shaders, or you might need to dig the model/view/projection matrices out of a monolithic engine somewhere.

Ralith
Jan 12, 2011

If you really want to learn a low level graphics API these days you should probably just go straight to Vulkan. It's big and complicated and the drivers aren't super stable yet, but at least it makes sense.

Ralith
Jan 12, 2011

Uploading texture data to the GPU in parallel with rendering is a good idea (if you map the memory you can even avoid going through userspace system memory first) and, executed correctly, will provide a much better user experience. Precisely controlling parallelism is much easier with Vulkan than GL, though; in GL you may find yourself relying on the driver correctly guessing your intention.

Ralith
Jan 12, 2011


Colonel J posted:

I'm trying to finally wrap up my Master's degree, and I'm looking for a scene to try out the technique I'm developing; surprisingly finding a good free scene on Google is a pretty awful process, and I'm not having much luck finding good stuff. I'm looking mainly for an interior type scene, such as an apartment, ideally with a couple rooms and textures / good color contrast between the different surfaces (I'm working on indirect illumination).

Can you share anything about your work? I've been reading about realtime GI lately and am quite interested in new developments.

Ralith
Jan 12, 2011


Static lighting and environments then? I've been particularly interested in totally dynamic solutions; I've played with a very compelling demo based on Radiance Hints (Papaioannou, 2011), which uses a regular grid of "probes" computed live by sampling reflective shadow maps. It uses screen-space techniques to reduce leakage, which isn't perfect but reduces the incidence of visually obvious errors. Someday I want to try applying it to massive environments by using toroidally-addressed clipmaps for the radiance cache.

It surprises me a bit to hear that you can work in terms of individual SH coefficients for gradient descent and error bounding. Is that mathematically rigorous? I'm 100% prepared to believe it is; my grasp of SH math is pretty loose.

I wonder if you could improve your results by adjusting your error function. It sounds like you're currently minimizing the global error, but are dissatisfied with the results due to the visual impact of local errors. What if you defined your error function to be the maximum local error? I imagine this might require a more stochastic approach to gradient descent than you currently need, and likely quite a lot more CPU time, but it seems more consistent with your desired results.

I'm having trouble following why every probe has global influence on shading. That certainly seems like an issue, since then the time complexity of shading a single point is proportional to the size of the entire scene.


Ralith
Jan 12, 2011


Xerophyte posted:

You could try a non-bitmap approach. The maximum-quality solution is to render the TrueType splines directly, but it's quite slow, filtering and aliasing are challenging to handle, and you have annoying corner cases for overlapping patches. I think at least some of our applications at Autodesk used to do fonts by switching between mipmapped bitmaps for small glyphs and tessellation for large glyphs; not really sure if that's still the case though.

Far as I know the current recommended approaches for both good quality and good performance are distance fields or median-of-3 distance fields, both of which effectively boil down to storing the glyphs as filterable, piecewise linear edge data in a low resolution texture. They can provide zoomable and alias-free text rendering at a single modest storage resolution. The drawback is that the fields can be somewhat tricky to generate, especially for the 3-channel median version. There are tools available to do the baking, I have no idea how easy they are to use.
I wouldn't recommend distance fields for a dynamic application. They're usually computed by rasterizing your vector data at very high resolution and then running a somewhat expensive preprocessing pass on that output to compute a low-resolution distance field. They're also both slower to render (even ignoring preprocessing!) and lower-quality than a good GPU-accelerated rasterizer, which doesn't even require super recent GPU hardware.

I think for 2D CAD purposes where aesthetics aren't super important people often just approximate text outlines with a series of lines that you can handle the same as any other lines. Definitely don't try to store data for every possible zoom level; that will force you to use discrete zoom levels and require more space than just storing the high-resolution version and using hardware downsampling as necessary.

Ralith
Jan 12, 2011


Hyvok posted:

Just rendering the text as a mesh might be an option as well, I just saw some site recommend against that due to the amount of vertices you will need to handle. Which did sound a bit odd since you can have A LOT of vertices nowadays...
Yes, rendering text as geometry should work fine if you don't mind visible pointy bits when people zoom waaaaaay in.

Zerf posted:

Please elaborate on this. If we ignore preprocessing, rendering distance field fonts is just a plain texture lookup and some simple maths (which is essentially free because of the texture lookup).
It surprised me too. I linked experimental data. There's discussion of implementation details as well.

haveblue posted:

How much unicode do you want to support? The occasional accented character is probably easy enough, but if you want to get really into the weeds with composed characters and bidirectional writing and such, you might want to consider letting the OS text service handle the whole thing and make you one finished texture per label. Rolling your own unicode-aware layout is not something to be entered into lightly.
Also, this. Correct text layout is really really hard. Make it someone else's problem if you possibly can. If you can't, you're going to have to go learn how to use HarfBuzz (which isn't really documented at all) or something similar.


Ralith
Jan 12, 2011


Zerf posted:

Oh, I see, that's why. Thanks. On the other hand, here's an excerpt from the GLyphy Github repo:
So sure, if you are going to use SDF without computing it to a texture, it's going to be expensive. I still believe regular, texture-based SDF variants will be both simpler and faster than doing any other font rasterization on the GPU (but with the caveat that generating the texture is expensive and sampling artifacts can occur).
Ahah, good find! That explains it. I wonder if I can get the Pathfinder guy to contrast against msdfgen or similar as well.

Ralith
Jan 12, 2011


peepsalot posted:

Hi this is not specifically for OpenGL/DX but I have a 3D geometry question. I'm looking for some info on how to implement a general algorithm that can take a 3d triangle mesh (closed/solid) and a Z value and return the 2d intersection of the 3d object with the plane at Z.

So far, I'm thinking to loop over every edge, and if the edge spans the given z value, calculate the intersection point with simple linear interpolation and then you can connect the dots sort of. But the trickier part seems to be knowing how to connect all these points, and knowing which polygons of the 2d cut are "holes".

The input data I have is in the form of a list of 3D points plus a list of indices of points for each triangle in the mesh.

Every triangle in the 3D mesh contributes 0, 1, or 2 vertices to the set of 2D polygons that represent the slice you're looking for. If the mesh is watertight, each generated vertex should lie on an even number of edges whenever the polygon it's associated with is nonempty. One approach would therefore be to loop over the set of triangles, compute the edge or vertex (if any) contributed by each triangle, and then assemble the polygons by deduplicating vertices. To determine which side of any given edge is "inside," just project the normal of the associated triangle onto your plane (or store your edges in a manner that encodes the sidedness in the first place).

Ralith
Jan 12, 2011


peepsalot posted:

How would you handle coplanar tris, or a tri having a single edge coincident with the plane? Also, I'm thinking that the tris where only one point intersects the plane can safely be ignored?
If you need to handle degenerate solids, you can resolve the ambiguity by labeling each vertex you generate with the edge or input vertex it arises from, then in the second pass only merging vertices that arise from the same edge or input vertex. Alternatively, you could forgo two passes entirely and keep a lookup table while generating vertices to ensure you never emit duplicates; for each triangle you'd then generate either nothing, a single vertex, a single edge, a vertex and an edge, or two vertices and an edge.

Triangles that intersect only at a single point need to be handled only if you care about them. I don't know what exactly your application is, so I can't answer that for you, but if you're going to re-generate a mesh of the object and want it to fit pretty well, then you'll probably want to retain them so that pointed shapes with an axis perpendicular to your planes don't get blunted. For sufficiently high plane density/low probability of an exact intersection this of course isn't necessary, but ignoring the case entirely will make things fragile.

peepsalot posted:

I should probably also mention that this is intended to eventually re-mesh the whole 3d object, so that all the vertices in the new mesh are aligned with layers (similar to how 3d printing "slicer" programs work).
So another challenge is that I'd like to be able to determine which vertices connect between two different Z layers.
I'm not sure there's a trivial connectivity-based solution to this, because there's no limit on how complex the geometry between any two successive layers might be. I haven't studied this sort of problem much, but I'd start by trying to look purely at pairs of output layers and generate triangles that connect them in a way that makes sense, and seal off any gaps that remain. Or maybe some sort of search on the input mesh for a path between a pair of output vertices on two adjacent layers that isn't incident to any other output vertex?


Ralith
Jan 12, 2011

Speaking of projection matrices, I learned about reversed Z projections not too long ago and got one working in Vulkan recently. Infinite far planes without Z fighting are fun! There's basically no downside AFAICT, if your hardware supports 32-bit float depth buffers (it should) and you aren't almost out of memory.

Ralith
Jan 12, 2011


Jewel posted:

Same, only yesterday! Porting an AAA title and noticed they have a depth buffer that goes from 0 to 250000. Strangely they still use a normal one too, but I haven't seen how much use each gets.
Huh? The depth buffer should be 0-1. You get a really weird precision distribution otherwise.

Ralith
Jan 12, 2011


Joda posted:

Don't APIs act real strange if you put the near-plane at exactly 0 though?
When your depth buffer is 0-1, 0 is what the near plane gets mapped to, not from. Don't put your near plane at 0.

Xerophyte posted:

Spontaneously, it seems like if you're using float32 depth storage then it makes sense to use a bigger range than [0,1] to make better use of the full range of the type. I have no idea if the different distribution of quantization points will interact badly with the reciprocal, I don't immediately see why it would though. Floats have logarithmic-ish spacing over the entire range (well, ignoring denormals).
I believe using a larger range will result in higher precision for closer things and lower precision for further things. If you play with this neat interactive chart, you can see that 0-1 alone is already way, way more than enough precision for any conceivable scene. For example, if your far plane is at infinity and you're using 0-1, the maximum depth error at any point in the first 100km is about +/-6mm, and in the first 1,000km is about +/-5cm. If you're rendering things 5cm apart a million meters away, you need to reexamine your LoD mechanism. If you have a finite far plane, the error is even smaller.

0-1 keeps the math simple, there's practically no benefit to chasing extra precision, and with a larger range you might well end up with significant errors for astronomically distant stuff.

Ralith
Jan 12, 2011


peepsalot posted:

I have this idea to define a sort of B-rep solid 3D model in a probabilistic way. Given a set of trivariate normal distributions (Gaussian probability blobs in 3D space), I want to take a weighted sum of each probability distribution in the set, and render a contour surface wherever this sum equals some threshold.

Has this sort of thing been done? I'm thinking I'd need to do raymarching to find the surface?
Sounds like signed distance fields. Not probabilistic per se, but you define a scalar field in 3D space and define surfaces as the points where the field has value 0. It's popular in the demoscene, and you should be able to find lots of examples on Shadertoy. I'm not an expert, but I think rendering is usually done with raymarching, yeah.

Ralith
Jan 12, 2011


haveblue posted:

The mathematical name for that is implicit surface and while that wiki page is way above my pay grade it may give you some ideas to start with.
Signed distance functions are a strict subset of implicit surfaces, to be clear; they're particularly suitable for rendering since they make raymarching and computing good ambient occlusion trivial.

Ralith
Jan 12, 2011


Xerophyte posted:

I'm not really that familiar with meshing algorithms. I don't believe SDFs can be meshed in a better way than any other implicit surface there.
The fact that you can fairly reliably extract a normal means they're easier to mesh well than things that only give you in/out, at least. Obviously that's only relevant if you're using a more advanced meshing algo than marching cubes.

Xerophyte posted:

Anecdotally, I know the approach Weta took for meshing the various SDF fractals they used for Ego in Guardians of the Galaxy 2 was to do a bunch of simple renders of the SDF, feed those renders into their photogrammetry software, then get a point cloud, then mesh that. I don't think I'd recommend that approach, but apparently it's good enough for production.
That's hilariously hacky.

Ralith
Jan 12, 2011


Absurd Alhazred posted:

I know you're not supposed to multiply too many primitives, but it is standard to use it for points -> billboards, right?
This is massive overkill, and I wouldn't be shocked if it actually performed slower than, say, instancing.

Ralith
Jan 12, 2011


Absurd Alhazred posted:

So having most of the information in the instance variables and applying it to a single quad is better than having the same number of vertices and using a geometry shader to expand them into quads?
That's my intuition, yes. Geometry shaders are a big hammer, for when there's no other viable approach, i.e. when your geometry is actually unpredictable in advance. Just turning them on may make your driver pessimize the whole pipeline.

The only way to be sure is to find or build some benchmarks that model your usecase on your hardware, of course.

Ralith
Jan 12, 2011


schme posted:

I have failed to find what this is about :

code:
vec3 thing = (0., 1., time);
It seems to just make thing equal to the rightmost thing, but I've no idea why that syntax works or what it does. Found it while removing a function in Kodelife.. My google-fu is weak.
You're using the comma operator, which evaluates to the right-hand value. The other two are being discarded.

Ralith
Jan 12, 2011

Side-effects are rarer in GLSL, so it's even more esoteric a pattern, but yeah the semantics are the same.

Ralith
Jan 12, 2011


Absurd Alhazred posted:

WMR's native API is a mess. Unless they've changed things significantly, you need to have a managed Windows app just to access it directly. Your best bet is to use SteamVR, instead, which doesn't have the same issues. There's Windows Mixed Reality for Steam VR, they're all free as far as I know (once you have the WMR set).
The SteamVR API is a disaster zone itself, actually, though maybe for different reasons. Did anybody not screw up their VR APIs?

At this point, I've put off future VR development until OpenXR comes out. I have a lot of respect for the Unity/Epic engineers who were faced with the task of wrapping SteamVR up into something approximately reliable.

Ralith
Jan 12, 2011


Absurd Alhazred posted:

I mean, at least SteamVR lets you just use it in C++ with OpenGL, DirectX, or Vulkan. What don't you like about it? I do have more experience with Oculus, I will admit.
Man, I have a whole list somewhere, but I'm not sure where I left it. The short story is that it's practically undocumented and rife with undefined behavior and short-sighted design decisions. Yeah, it's nice that it isn't opinionated about the other tools you use, but that's just baseline sanity--a standard which it otherwise largely fails to meet. Their own headers get their ABI totally wrong in places.

Ralith
Jan 12, 2011


Absurd Alhazred posted:

Might they have gotten better since you last used them? I just added a new feature to our code last week using a more recent addition to the API, and it was mostly painless.
Some of the new APIs are better thought out--for example, the SteamVR Input stuff is basically a preview of OpenXR input, and while they insisted on injecting some weird idiosyncrasies it's still pretty good--but the core stuff is as garbage as it ever was, and they're ignoring the issues.

Ralith
Jan 12, 2011

I dunno what graphics API you're using, but have you correctly specified the memory dependency between that shader and whatever previously wrote to Texture?

Ralith
Jan 12, 2011

Only way to be sure is by testing, especially on diverse hardware. Note that the maximum varies considerably.

Ralith
Jan 12, 2011


Rhusitaurion posted:

Many months later, I actually ended up doing something like this, using Vulkan, but I'm wondering if there's a better way than what I've done.

For each object, I allocate one large buffer that will contain the input vertices, space for computed vertices/indices, an indirect draw struct, and another SSBO with a vertex counter. Then, on each frame for each object
1. In one command buffer (not per-object), reset the vertex and index counters to 0 with fill commands
2. In another command buffer, dispatch the compute shader. It operates on the input vertex buffer and atomically increments the vertex and index count SSBOs to get indices of the output buffers to which to write vertices and indices.
3. In another command buffer, do an indirect draw call.

Then I submit the 3 command buffers with semaphores to make sure that they execute in the order above. The first submission also depends on a semaphore that the draw submission from the last frame triggers.

This seems to work fine (except when I hard lock my GPU with compute shader buffer indexing bugs), but I'm wondering if I'm doing anything obviously stupid. I could double-buffer the computed buffers, perhaps, but I'm not sure if it's worth the hassle. I thought about using events instead of semaphores, but 1. not sure if it's wise to use an event per rendered object and 2. can't use events across queues, and compute queue is not necessarily the same as graphics queue.

Thoughts?

No comment on the abstract algorithm, but there are a few technical errors here.

First, you don't need three command buffers. If you're only using a single queue, which is probably the case, you only need one command buffer for the entire frame. Semaphores are only used for synchronizing presentation and operations that span multiple queues. Note that you don't need to use a dedicated compute queue just because it's there; the graphics queue is guaranteed to support compute operations, and for work that your frame is blocked on it's the right place. Events definitely aren't appropriate. What you need here is a memory barrier between your writes and the following reads, and between your reads and the following writes. Without suitable barriers your code is unsound, even if it appears to work in a particular case.

Second, maybe I misunderstood but it sounds like you're zeroing out memory, then immediately overwriting it? That's not necessary.

Third, a single global atomic will probably serialize your compute operations, severely compromising performance. Solutions to this can get pretty complex; maybe look into a parallel prefix sum scheme to allocate vertex space.

A separate set of buffers per frame is a good idea, because it will allow one frame's vertex state to be pipelined with the next frame's compute stage.


Ralith
Jan 12, 2011


Rhusitaurion posted:

Yeah I realize now that I didn't explain this well. The compute stage treats an indirect draw struct's indexCount as an atomic, to "allocate" space in a buffer to write index data in. That index data changes per-frame, so I have to re-zero the counter before each compute dispatch. There's also another atomic that works the same way for the vertex data that the indices index. Is there some other way to reset or avoid resetting these?
Oh, for some reason I read buffers where you wrote counters. Yeah, that makes sense.

Rhusitaurion posted:

Well, it's 2 atomics per object, but yeah, it's probably not great. Thanks for the pointer. I'll look into it, but it sounds complicated so the current solution may remain in place for a while...
Yeah, it's a whole big complicated thing; I don't blame you for punting on it. It'd be nice if there were reusable code for this somewhere, but reusable abstractions in GLSL are hard.

Ralith
Jan 12, 2011


Rhusitaurion posted:

Dumb question about memory barriers - this page says that no GPU gives a poo poo about VkBufferMemoryBarrier vs. VkMemoryBarrier. This seems to imply that if I use a VkBufferMemoryBarrier per object to synchronize reset->compute->draw, it will be implemented as a global barrier, so I might as well just do all resets, then all computes, then all draws with global barriers in between.
I don't have any special knowledge about implementation behavior (if you really want to know, you can go spelunking in AMD's or Intel's open source drivers), but I've heard similar things. You should be structuring things phase-by-phase like that regardless to reduce pipeline switching, of course.

Rhusitaurion posted:

But as far as I can tell, this is essentially what my semaphore solution is currently accomplishing, since semaphores work like a full memory barrier.
Semaphores introduce an execution dependency, not a memory barrier. You cannot use semaphores as a substitute for memory barriers under any circumstances. For operations that span queues you need both; for operations on a single queue, semaphores aren't useful.

Ralith
Jan 12, 2011


Rhusitaurion posted:

I'm probably misinterpreting the spec here, but the section on semaphore signaling says that all memory accesses by the device are in the first access scope, and similarly for waiting, all memory accesses by the device are in the second scope. Granted it might not be the best way to do it, but it seems like relying on a semaphore for memory dependencies is allowed.
No, you're right, I misremembered. Using a semaphore as you were is not unsound, just unnecessary effort for extra overhead. Note that you do typically need explicit barriers when expressing inter-queue dependencies regardless, but that's for managing ownership transitions when using resources with exclusive sharing mode.

Ralith
Jan 12, 2011


Rhusitaurion posted:

Got it. Thanks for the advice - I've switched over to a single command buffer with barriers, and it seems like it works. Not sure if I got the src and dst masks and whatnot correct, but the validation layers are not complaining, at least!
The validation layers mostly can't detect errors in barrier specification, unfortunately.

Ralith
Jan 12, 2011

Contended global atomics are very slow. I've had good results from using subgroup operations to do one atomic op per subgroup, though.


Ralith
Jan 12, 2011

If the target locations are effectively random, contention might not be too big an issue, though I suppose that's scene dependent.
