|
Anybody ever implemented Dual Contouring? I'm working on implementing it to meshify signed distance functions, and I've got it mostly working except for the quadratic error function stuff. All of the implementations I can find online (including the reference implementation from the paper) do stupid poo poo like type out every single individual multiply in every matrix operation, rather than using a library. I think I've worked out what that nonsense is doing, and translated it to use Eigen, but I don't really know anything about least-squares minimization or how this stuff is supposed to work: code:
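(The original code block didn't survive archiving.) For anyone following along, here's a rough sketch of the QEF solve in plain Python rather than Eigen. It minimizes Σᵢ(nᵢ·(x−pᵢ))² over x via the normal equations, with a small regularizer pulling toward the mass point standing in for the paper's clamped-SVD pseudoinverse; the function name and the `strength` value are made up for illustration:

```python
def solve_qef(points, normals, strength=0.05):
    """Minimize sum_i (n_i . (x - p_i))^2 over x in R^3.

    Instead of the paper's SVD with clamped singular values, this adds a
    small Tikhonov term pulling the solution toward the mass point, which
    handles rank-deficient cells (flat or underconstrained) similarly.
    """
    # Mass point: average of the edge-intersection points.
    mp = [sum(p[k] for p in points) / len(points) for k in range(3)]

    # Build A^T A (3x3) and A^T b (3), where row i of A is n_i
    # and b_i = n_i . p_i.
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for p, n in zip(points, normals):
        d = sum(n[k] * p[k] for k in range(3))
        for r in range(3):
            for c in range(3):
                ata[r][c] += n[r] * n[c]
            atb[r] += n[r] * d

    # Regularize: solve (A^T A + s*I) x = A^T b + s*mp.
    for k in range(3):
        ata[k][k] += strength
        atb[k] += strength * mp[k]

    # Gauss-Jordan elimination with partial pivoting on the 3x3 system.
    m = [row[:] + [rhs] for row, rhs in zip(ata, atb)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, 4):
                    m[r][c] -= f * m[col][c]
    return [m[k][3] / m[k][k] for k in range(3)]
```

With three orthogonal planes through a corner, this recovers the corner; in flat regions the regularizer keeps the vertex near the mass point instead of letting it fly off, which is the failure mode that makes cells look wrong when the system is near-singular.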
If I just use the midpoint (i.e. "return mp") it at least looks symmetrical, but still kind of "boxy":
|
# Apr 8, 2019 04:08
|
|
# May 12, 2024 01:57
|
Nippashish posted:I don't know anything about dual contouring, but I think this line is wrong: Yup, looks like you're right: I misinterpreted what Eigen's SVD would give me - I thought V would come back as V-transpose (for some reason), and I'd have to transpose it back. Thanks!
|
# Apr 9, 2019 02:51
|
I have a question about geometry shaders. I'm using them to generate 3D geometry from 4D geometry. For example: https://imgur.com/zD9J15J The way this works is I have a tetrahedral mesh that I send into the geometry shader as lines_adjacency (since it gives you 4 points at a time - very convenient). There (and this is the sketchy part), I have a bunch of branchy code that determines if every tetrahedron intersects the view 3-plane, and emits somewhere between 0 and 6 (for the case where the whole tetrahedron is in-plane) vertices in a triangle strip. It's a neat trick, but it seems sketchy. I'm no GPU wizard, but my understanding is that geometry shaders are slow, and branchy shaders are slow. Additionally they don't seem to be supported in WebGL or Metal. Is there any reasonable alternative for generating geometry that's dependent on transformed vertices? I could do this on the CPU, but I'd have to end up doing essentially all the vertex transforms there, which seems lovely. I could save a lot of work with some kind of BVH, but still. Compute shaders seem promising, but I think I'd have to send the transformed vertices back to the CPU to get the 4-to-many vertices thing.
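For what it's worth, the branchy per-tetrahedron logic can be sketched independently of the GS. Here's a hedged CPU-side version in Python, assuming the fourth coordinate w holds each vertex's signed distance to the view 3-plane (names invented for illustration; it skips the degenerate exactly-in-plane branches):

```python
def slice_tetrahedron(verts):
    """Intersect one tetrahedron with the hyperplane w = 0.

    verts: four (x, y, z, w) tuples, where w is the vertex's signed
    distance to the view 3-plane. Returns the 3D intersection polygon:
    0 points (no crossing), 3 (one vertex split off from the other
    three), or 4 (a 2-2 split, giving a quad).
    """
    EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    poly = []
    for a, b in EDGES:
        wa, wb = verts[a][3], verts[b][3]
        if (wa < 0) != (wb < 0):   # this edge crosses the hyperplane
            t = wa / (wa - wb)     # interpolation parameter at w = 0
            poly.append(tuple(verts[a][k] + t * (verts[b][k] - verts[a][k])
                              for k in range(3)))
    return poly
```

The quad case may need its points reordered before being emitted as a strip or fan, and the whole-tetrahedron-in-plane case would need separate handling, but the branch structure is the same as in the shader version.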
|
# Nov 11, 2019 05:11
|
Hubis posted:Basically, have a shared memory array that is the size of your maximum possible triangles per dispatch, then compute your actual triangles into there and use atomic increment on a global counter to fetch and then increment a write offset into the output array by that amount. You are effectively reimplementing the GS behavior, but completely relaxing the order dependency. That makes sense, thanks! I've not really messed with compute shaders before, so I wasn't sure what is and isn't possible. I think this will be pretty doable, since the maximum number of vertices is not that much more than the number of input vertices. Suspicious Dish posted:Also worth pointing out that it becomes a lot easier if you generate indexed triangles instead of triangle strips, since you can just jam triangles through without guaranteeing that strips are similar. I think using indexed triangles would actually make some of the logic easier as well, so that's good to know. I'm thinking something like: have a vertex buffer with 6 output vertices per tetrahedron, one for each of its (4 choose 2) vertex pairs. Then also have an index buffer to connect up the ones that actually land in the view 3-plane, using Hubis's suggestion.
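The reserve-then-write pattern Hubis describes can be simulated on the CPU. A sketch in Python with threads standing in for workgroups and a locked counter standing in for atomicAdd (all names invented for illustration):

```python
import threading

class AtomicCounter:
    """Stand-in for a GLSL atomicAdd on a global counter."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def fetch_add(self, n):
        with self._lock:
            old = self.value
            self.value += n
            return old

def emit_workgroup(local_indices, counter, out):
    """One 'workgroup': build indices locally, then reserve a contiguous
    block in the output with a single atomic add, and copy the local
    block there. Order between workgroups is unspecified, exactly as in
    the compute-shader version."""
    base = counter.fetch_add(len(local_indices))
    out[base:base + len(local_indices)] = local_indices

# Simulate a few workgroups appending concurrently.
counter = AtomicCounter()
out = [None] * 9
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
threads = [threading.Thread(target=emit_workgroup, args=(g, counter, out))
           for g in groups]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The key property is that each workgroup does one atomic per block rather than one per triangle, and each block lands contiguously even though block order is arbitrary.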
|
# Nov 12, 2019 01:48
|
Hubis posted:attempt at a Grand Unified Geometry Pipeline to fix both Geometry Shaders and Tessellation, but are still not broadly supported. For what you need to do I would concur with the suggestion of using a compute shader that reads a vertex array as input and produces an index buffer as output. For optimal performance you might have to be creative by creating a local index buffer in shared memory and then appending it to your output IB as a single block (to preserve vertex reuse). Basically, have a shared memory array that is the size of your maximum possible triangles per dispatch, then compute your actual triangles into there and use atomic increment on a global counter to fetch and then increment a write offset into the output array by that amount. You are effectively reimplementing the GS behavior, but completely relaxing the order dependency. Many months later, I actually ended up doing something like this, using Vulkan, but I'm wondering if there's a better way than what I've done. For each object, I allocate one large buffer that will contain the input vertices, space for computed vertices/indices, an indirect draw struct, and another SSBO with vertex and index counters. Then, on each frame, for each object:
1. In one command buffer (not per-object), reset the vertex and index counters to 0 with fill commands.
2. In another command buffer, dispatch the compute shader. It operates on the input vertex buffer and atomically increments the vertex and index counters to get indices of the output buffers to which to write vertices and indices.
3. In another command buffer, do an indirect draw call.
Then I submit the 3 command buffers with semaphores to make sure that they execute in the order above. The first submission also depends on a semaphore that the draw submission from the last frame triggers. This seems to work fine (except when I hard lock my GPU with compute shader buffer indexing bugs), but I'm wondering if I'm doing anything obviously stupid.
I could double-buffer the computed buffers, perhaps, but I'm not sure if it's worth the hassle. I thought about using events instead of semaphores, but 1. not sure if it's wise to use an event per rendered object and 2. can't use events across queues, and compute queue is not necessarily the same as graphics queue. Thoughts? Rhusitaurion fucked around with this message at 00:16 on May 8, 2020 |
# May 7, 2020 23:50
|
Ralith posted:First, you don't need three command buffers. If you're only using a single queue, which is probably the case, you only need one command buffer for the entire frame. Semaphores are only used for synchronizing presentation and operations that span multiple queues. Note that you don't need to use a dedicated compute queue just because it's there; the graphics queue is guaranteed to support compute operations, and for work that your frame is blocked on it's the right place.
quote:Events definitely aren't appropriate. What you need here is a memory barrier between your writes and the following reads, and between your reads and the following writes. Without suitable barriers your code is unsound, even if it appears to work in a particular case.
quote:Second, maybe I misunderstood but it sounds like you're zeroing out memory, then immediately overwriting it? That's not necessary.
quote:Third, a single global atomic will probably serialize your compute operations, severely compromising performance. Solutions to this can get pretty complex; maybe look into a parallel prefix sum scheme to allocate vertex space.
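The prefix-sum idea in miniature: instead of every workgroup fighting over one global atomic, each publishes its vertex count, and an exclusive scan over the counts yields non-overlapping write offsets. A sketch in Python, with a serial loop standing in for the parallel GPU scan (e.g. a Blelloch-style work-efficient scan):

```python
def exclusive_scan(counts):
    """Exclusive prefix sum: offsets[i] = sum(counts[:i]).

    On the GPU this would run as a parallel scan over per-workgroup
    counts; a serial loop shows the result any such scan must produce.
    """
    offsets, total = [], 0
    for c in counts:
        offsets.append(total)
        total += c
    return offsets, total

# Per-workgroup vertex counts -> non-overlapping write offsets.
counts = [3, 0, 6, 4]
offsets, total = exclusive_scan(counts)
# offsets == [0, 3, 3, 9], total == 13
```

Each workgroup then writes its vertices starting at its offset, with no atomics at all in the write pass; `total` also gives you the vertex count for the indirect draw struct.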
|
# May 8, 2020 15:02
|
Dumb question about memory barriers - this page says that no GPU gives a poo poo about VkBufferMemoryBarrier vs. VkMemoryBarrier. This seems to imply that if I use a VkBufferMemoryBarrier per object to synchronize reset->compute->draw, it will be implemented as a global barrier, so I might as well just do all resets, then all computes, then all draws with global barriers in between. But as far as I can tell, this is essentially what my semaphore solution is currently accomplishing, since semaphores work like a full memory barrier. Is that post full of poo poo, or can I use VkBufferMemoryBarriers as they seem to be intended, i.e. to provide fine-grained synchronization? Rhusitaurion fucked around with this message at 19:46 on May 8, 2020 |
# May 8, 2020 19:01
|
Ralith posted:Semaphores introduce an execution dependency, not a memory barrier. You cannot use semaphores as a substitute for memory barriers under any circumstances. For operations that span queues you need both; for operations on a single queue, semaphores aren't useful. I'm probably misinterpreting the spec here, but the section on semaphore signaling says that all memory accesses by the device are in the first access scope, and similarly for waiting, all memory accesses by the device are in the second scope. Granted it might not be the best way to do it, but it seems like relying on a semaphore for memory dependencies is allowed.
|
# May 8, 2020 20:57
|
|
Ralith posted:No, you're right, I misremembered. Using a semaphore as you were is not unsound, just unnecessary effort for extra overhead. Note that you do typically need explicit barriers when expressing inter-queue dependencies regardless, but that's for managing ownership transitions when using resources with exclusive sharing mode. Got it. Thanks for the advice - I've switched over to a single command buffer with barriers, and it seems like it works. Not sure if I got the src and dst masks and whatnot correct, but the validation layers are not complaining, at least!
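For what it's worth, the mask chain I think I need, sketched as pseudocode (stage/access names abbreviated from the VK_PIPELINE_STAGE_* / VK_ACCESS_* enums; treat this as my best guess, not gospel):

```
// fill (vkCmdFillBuffer) -> compute dispatch:
//   srcStageMask = TRANSFER,        srcAccessMask = TRANSFER_WRITE
//   dstStageMask = COMPUTE_SHADER,  dstAccessMask = SHADER_READ | SHADER_WRITE
//
// compute dispatch -> indirect draw:
//   srcStageMask = COMPUTE_SHADER,  srcAccessMask = SHADER_WRITE
//   dstStageMask = DRAW_INDIRECT | VERTEX_INPUT
//   dstAccessMask = INDIRECT_COMMAND_READ | VERTEX_ATTRIBUTE_READ | INDEX_READ
//
// draw -> next frame's fill (write-after-read):
//   srcStageMask = DRAW_INDIRECT | VERTEX_INPUT, srcAccessMask = 0
//   (reads don't need to be made available; an execution dependency suffices)
//   dstStageMask = TRANSFER,        dstAccessMask = TRANSFER_WRITE
```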
|
# May 9, 2020 17:36