Minsky
May 23, 2001

Tres Burritos posted:

So the deal with SPIRV is that you'll compile your shader language to SPIRV, which is then what the GPU runs, right? This'll allow people to come up with new languages to run on GPUs? So maybe academic types will come up with something that works really well for astronomy computations (or something) and then they can write code in that instead of GLSL or whatever? Is this where the "new generation graphics and compute API" comes in?

I don't really "get" SPIRV.

Yes, drivers consume SPIRV exclusively as their input to describe shader code that they then internally translate to their proprietary instruction set. It's a much less error-prone method (though not trivial) than parsing GLSL code first and then doing the same thing, which is how OpenGL works.

Your first point is correct: it does let you write your shaders in basically whatever high-level language you want, but it doesn't necessarily allow some magical quantum leap in computational capability, since every language is still at the mercy of the SPIRV instruction set*, which is at the mercy of how current GPUs work. It's basically a Vulkan equivalent of HLSL bytecode, but that's not insignificant:

1. It lets you develop in whatever high-level language you want.
2. It lets you compile offline and ship a lighter-weight binary.
3. Because it's easier to translate SPIRV than to compile a high-level language, driver bugs are less likely.
4. You can run your SPIRV output through an optimizing compiler and in theory get those same optimizations on every HW platform (I think I saw an open source project that pipes SPIRV through the LLVM compiler).
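(As an aside on point 2, here's a minimal sketch of what consuming a precompiled SPIR-V binary looks like on the Vulkan side -- the file name and helper function are made up for illustration, but vkCreateShaderModule is the real entry point:)

code:
// Hypothetical example: load a SPIR-V blob that was compiled offline
// (e.g. with the reference glslang tool) and hand it to the driver.
#include <vulkan/vulkan.h>
#include <cstdint>
#include <fstream>
#include <vector>

VkShaderModule loadShaderModule(VkDevice device, const char* path /* e.g. "frag.spv" */)
{
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    size_t size = static_cast<size_t>(file.tellg());
    std::vector<uint32_t> words((size + 3) / 4, 0);   // SPIR-V is a stream of 32-bit words
    file.seekg(0);
    file.read(reinterpret_cast<char*>(words.data()), static_cast<std::streamsize>(size));

    VkShaderModuleCreateInfo info = {};
    info.sType    = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    info.codeSize = size;            // size in bytes
    info.pCode    = words.data();    // pointer to the word stream

    VkShaderModule module = VK_NULL_HANDLE;
    vkCreateShaderModule(device, &info, nullptr, &module);
    return module;
}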

Sex Bumbo posted:

Minsky, have you been using the reference glslang tool or something else?

I've used the reference glslang tool and other internal tools.

* With the exception that it's non-proprietary and is fully extendable by HW vendors like OpenGL is

Minsky fucked around with this message at 18:12 on Feb 17, 2016


Sex Bumbo
Aug 14, 2004
A bunch of mobile drivers do like zero glsl optimizations, it's awful.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
The worst thing about GLSL was that it had no compliance suite. Well, it had a compliance suite if you're part of Khronos, but no vendor paid attention to it, citing backwards-compatibility concerns.

I've dealt with enough dumb loving GLSL implementation bugs and I'm just so done with it. I'm happy SPIR-V has tighter semantics and a conformance suite that matters.

Sex Bumbo
Aug 14, 2004

Minsky posted:

In any case I've basically been breathing Vulkan for about six months now, so if there's technical questions about the API I can try to answer them.

I'm noticing a typical pattern of:

vkDeviceWaitIdle
vkAcquireNextImageKHR(...semaphore...)
(add stuff to queue)
vkQueueSubmit(queue, semaphore, etc)
vkQueuePresentKHR(queue)
vkQueueWaitIdle(queue) (x2) (bug?)
vkDeviceWaitIdle

from the Sascha Willems examples and others.

* I'm not sure where vkAcquireNextImageKHR documentation is supposed to be but I assume it's a nonblocking call that will signal the semaphore when the image index is actually available? And then queueSubmit waits on the semaphore before actually submitting? Just kind of guessing here.

* Why does vkDeviceWaitIdle need to be called? Furthermore why do the waits need to be called twice?

* vkQueueWaitIdle waits for the submitted command lists to finish, right? But why wait for it to finish? Shouldn't it be okay to cram multiple frames of poo poo into it? In DX12 you can create multiple command lists and reuse them, and you only need to make sure the command list isn't being used before resetting it -- not wait for the whole queue to flush out.

pseudorandom name
May 6, 2007

vkAcquireNextImageKHR documented on page 534 where it clearly says include::../VK_KHR_swapchain/wsi.txt.

The_Franz
Aug 8, 2003

pseudorandom name posted:

vkAcquireNextImageKHR documented on page 534 where it clearly says include::../VK_KHR_swapchain/wsi.txt.

https://github.com/KhronosGroup/Vulkan-Docs/blob/1.0-VK_KHR_swapchain/doc/specs/vulkan/chapters/VK_KHR_swapchain/wsi.txt

The formatting isn't pretty, but the info is there.

Sex Bumbo
Aug 14, 2004
Okay so queue submit says

quote:

If the pname:waitSemaphoreCount member of slink:VkSubmitInfo is not zero,
then pname:pWaitSemaphores is a pointer to an array of
pname:waitSemaphoreCount sname:VkSemaphore handles which will be waited on
before any further work is performed by the queue.

I also found https://github.com/KhronosGroup/Vulkan-Docs/blob/7e195e846e9b3d7b91c2fe57406a97288261fc5e/doc/specs/vulkan/man/vkAcquireNextImageKHR.txt
which says the semaphore is

quote:

A VkSemaphore that will become signaled when the presentation engine has released ownership of the image.

I don't really get this ordering though... When you acquire the next image, it's not owned by the presentation engine, right? After all, the wsi doc says:

quote:

7) Does vkAcquireNextImageKHR() block if no images are available?
RESOLVED: The command takes a timeout parameter. Special values
for the timeout are 0, which makes the call a non-blocking
operation, and UINT64_MAX, which blocks indefinitely. Values in
between will block for up to the specified time. The call will
return when an image becomes available or an error occurs. It
may, but is not required to, return before the specified timeout
expires if the swapchain becomes out of date.

which seems to imply that the presentation engine can block if there aren't any images available. I assume the semaphore never gets signaled if you pass a timeout of 0 E: and it fails?

But more importantly, the queue submission sounds like it waits on the semaphore after it submits its work -- this sounds backwards. Like, in DX12 if I had two swapchain buffers I'd do something simple like

Submit frame 0
add fence value 0

Submit frame 1
add fence value 1

wait on fence value 0
(frame 0's list is finished at this point)
submit frame 2
add fence value 2
etc
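(Concretely, a hypothetical sketch of that ring-buffer pattern with real D3D12 fence calls -- the device, queue, allocators and command lists are assumed to already exist:)

code:
// Hypothetical sketch: frame N records on list N % 2 and signals fence value N + 1.
// Before reusing a list, wait only for the frame that last used it.
#include <d3d12.h>
#include <windows.h>

void renderLoop(ID3D12Device* device, ID3D12CommandQueue* queue,
                ID3D12CommandAllocator* allocator[2],
                ID3D12GraphicsCommandList* cmdList[2])
{
    ID3D12Fence* fence = nullptr;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    HANDLE fenceEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);

    const UINT64 kFramesInFlight = 2;
    for (UINT64 frame = 0; ; ++frame) {
        UINT i = static_cast<UINT>(frame % kFramesInFlight);

        // Wait for the frame that last used list i (frame - 2) to retire on the GPU.
        if (frame >= kFramesInFlight) {
            UINT64 mustBeDone = frame - kFramesInFlight + 1;  // value it signaled
            if (fence->GetCompletedValue() < mustBeDone) {
                fence->SetEventOnCompletion(mustBeDone, fenceEvent);
                WaitForSingleObject(fenceEvent, INFINITE);
            }
        }

        allocator[i]->Reset();
        cmdList[i]->Reset(allocator[i], nullptr);
        // ... record this frame's commands ...
        cmdList[i]->Close();

        ID3D12CommandList* lists[] = { cmdList[i] };
        queue->ExecuteCommandLists(1, lists);
        queue->Signal(fence, frame + 1);   // "add fence value" for this frame
        // ... Present() on the swap chain ...
    }
}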

Vulkan samples seem to be like

get frame index 0
submit command list frame 0
- but also make the command list wait for the frame to present itself
wait for the command list to finish
wait for all command lists to finish
wait for godot
wait for heat death of universe

Are the samples just being dumb?

Sex Bumbo fucked around with this message at 03:08 on Feb 19, 2016

The_Franz
Aug 8, 2003

Sex Bumbo posted:

Vulkan samples seem to be like

get frame index 0
submit command list frame 0
- but also make the command list wait for the frame to present itself
wait for the command list to finish
wait for all command lists to finish
wait for godot
wait for heat death of universe

Are the samples just being dumb?

It probably generates cleaner traces if you wait for the pipeline to be flushed after every frame. Obviously in a real application you would want to keep the pipeline full so the GPU isn't sitting idle.

pseudorandom name
May 6, 2007

:thejoke:

Minsky
May 23, 2001

Sex Bumbo posted:

I'm noticing a typical pattern of:

vkDeviceWaitIdle
vkAcquireNextImageKHR(...semaphore...)
(add stuff to queue)
vkQueueSubmit(queue, semaphore, etc)
vkQueuePresentKHR(queue)
vkQueueWaitIdle(queue) (x2) (bug?)
vkDeviceWaitIdle

from the Sascha Willems examples and others.

* I'm not sure where vkAcquireNextImageKHR documentation is supposed to be but I assume it's a nonblocking call that will signal the semaphore when the image index is actually available? And then queueSubmit waits on the semaphore before actually submitting? Just kind of guessing here.

So, let me start by saying I think the 1.0 spec text is somehow screwed up at the moment. My go-to reference is the Vulkan 1.0 + WSI Extensions version because it mashes all the presentation extensions into the main spec, so you can look at everything in one place, but for some reason in the 1.0 version all the detail about the swapchain extension functions is missing. Maybe it's because they moved all the extension-specific language into the respective extension documents, but they still left all the examples and the Issues discussion in the main spec, so that doesn't really make sense. I also can't stand navigating the asciidoc source directory labyrinth, so I have to do this from memory and from what I can infer from what's there in that spec.

To answer your question: I think that the acquire function may block if you call it enough times to acquire more swap chain images than you have created without queuing presents for any of them. My understanding is that, conceptually, it removes a swapchain image that is "available" to be acquired and hands it back to you for use (I leave it in quotes because I'm not exactly sure what constitutes availability, other than not having been acquired since it was last presented), and at the same time directs the present queue to signal the OS VkSemaphore object once the GPU is finished presenting to that image, so that future command buffers waiting in the queue may start executing writes to it (a VkSemaphore is an OS-level inter-queue synchronization object at command buffer granularity). The driver is free to immediately start building command buffers that write to that acquired image (and may even do so in parallel with multiple images), but it must enqueue a wait on that acquired semaphore before executing any of those command buffers. There is also a similar semaphore on the back end, prior to presenting, to wait for the rendering to complete. Example 3, under the VK_KHR_swapchain section of that spec I linked, is a great simple example of it.

As you're probably aware since you work with DX12, there is a degree of asynchrony here: you are given the next image handle immediately and you are free to start referencing it in command buffers, but that image might still be visible on your monitor. Once it is time to submit those new command buffers that overwrite the image, the OS semaphores are there to stall the cmdbufs from executing on the GPU before the image leaves the display. I am also abstracting things that happen under the hood; there are sometimes copies involved depending on what windowing system you are running.
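(To make that ordering concrete, here's a rough per-frame sketch using the actual WSI entry points -- the handles and pre-recorded command buffer are assumed to exist, and error handling is omitted:)

code:
// Hypothetical sketch of one frame: acquire -> submit (waiting on the acquire
// semaphore) -> present (waiting on the render-finished semaphore).
#include <vulkan/vulkan.h>

void drawFrame(VkDevice device, VkQueue queue, VkSwapchainKHR swapchain,
               VkCommandBuffer cmdBuf,        // assumed already recorded for this image
               VkSemaphore imageAvailable,    // signaled when presentation releases the image
               VkSemaphore renderFinished)    // signaled when our rendering completes
{
    uint32_t imageIndex = 0;
    vkAcquireNextImageKHR(device, swapchain, UINT64_MAX, imageAvailable,
                          VK_NULL_HANDLE, &imageIndex);

    // The queue won't execute cmdBuf's color writes until imageAvailable is signaled.
    VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    VkSubmitInfo submit = {};
    submit.sType                = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submit.waitSemaphoreCount   = 1;
    submit.pWaitSemaphores      = &imageAvailable;
    submit.pWaitDstStageMask    = &waitStage;
    submit.commandBufferCount   = 1;
    submit.pCommandBuffers      = &cmdBuf;
    submit.signalSemaphoreCount = 1;
    submit.pSignalSemaphores    = &renderFinished;
    vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);

    // The presentation engine waits for rendering before the image goes to the display.
    VkPresentInfoKHR present = {};
    present.sType              = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
    present.waitSemaphoreCount = 1;
    present.pWaitSemaphores    = &renderFinished;
    present.swapchainCount     = 1;
    present.pSwapchains        = &swapchain;
    present.pImageIndices      = &imageIndex;
    vkQueuePresentKHR(queue, &present);
}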

Sex Bumbo posted:

* Why does vkDeviceWaitIdle need to be called? Furthermore why do the waits need to be called twice?

To answer your first question: I haven't looked at the code, but I am guessing that the vkDeviceWaitIdle is just done out of laziness because these are just basic examples to get people up and running. That function, vkDeviceWaitIdle, does what it says: it blocks the CPU until all queues are idle (in that logical VkDevice), and is about as heavy-handed as it sounds. It is the Vulkan equivalent of glFinish(). More graceful synchronization is recommended, but is of course more complicated.

Probably the main reason for the vkDeviceWaitIdle is because in Vulkan you are not allowed to reset the command buffers (or the command pools that allocated them) for redefinition with new contents until you can guarantee the GPU is done executing them. This also applies to resources other than command buffers that you may want to reuse and redefine between frames, like descriptor sets or buffers containing dynamic data. So it quickly becomes a real engineering task which can really distract someone who just wants to learn how to get a triangle on screen. That's my guess at least regarding the motivation. As you sound like you're well aware, there are very few good reasons to do a pure vkDeviceWaitIdle.
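(For reference, a rough sketch of the more graceful alternative -- a per-frame VkFence passed to vkQueueSubmit, waited on only when that frame's resources are about to be reused; the handles are assumed to exist already:)

code:
// Hypothetical sketch: wait on just the fence guarding the resources you want
// to recycle, instead of idling the whole device or queue.
#include <vulkan/vulkan.h>

void recycleFrameResources(VkDevice device, VkFence frameFence, VkCommandPool frameCmdPool)
{
    // Block only until this frame slot's previous submission has retired on the GPU.
    vkWaitForFences(device, 1, &frameFence, VK_TRUE, UINT64_MAX);
    vkResetFences(device, 1, &frameFence);

    // Now it is legal to reset and re-record command buffers allocated from this pool.
    vkResetCommandPool(device, frameCmdPool, 0);
}

// When submitting that frame slot again:
//   vkQueueSubmit(queue, 1, &submitInfo, frameFence);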

Regarding calling vkQueueWaitIdle twice: it's certainly redundant if the queue handle is literally the same, but I think there's a bit more to it than that. In Vulkan I think it's required that there must be at least one queue type ("family index") that can support graphics or compute operations, and one queue type that can support presents if that extension is supported, but those may not necessarily be the same queue type (take this part with a grain of salt because I can't double-check right now -- it used to be true, but this "freedom" may have since been removed). To figure out which queue type supports presents, you need to call uhh... some function I forget when enumerating physical devices. The first example in the spec I linked shows it though. The point is that if the sample is idling two queues, it may be that they intended to idle the graphics queue and the present queue but accidentally used the same handle for both.
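(If it helps, I believe the query in question is vkGetPhysicalDeviceSurfaceSupportKHR; a rough sketch of picking a present-capable family, with the surface assumed to have been created already:)

code:
// Hypothetical sketch: find a queue family index that can present to a surface.
#include <vulkan/vulkan.h>
#include <vector>

int32_t findPresentQueueFamily(VkPhysicalDevice gpu, VkSurfaceKHR surface)
{
    uint32_t count = 0;
    vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, nullptr);
    std::vector<VkQueueFamilyProperties> families(count);
    vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, families.data());

    for (uint32_t i = 0; i < count; ++i) {
        VkBool32 supported = VK_FALSE;
        vkGetPhysicalDeviceSurfaceSupportKHR(gpu, i, surface, &supported);
        if (supported)
            return static_cast<int32_t>(i);  // may or may not be the graphics family
    }
    return -1;  // no present-capable family on this device
}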

quote:

* vkQueueWaitIdle waits for the submitted command lists to finish, right? But why wait for it to finish? Shouldn't it be okay to cram multiple frames of poo poo into it? In DX12 you can create multiple command lists and reuse them, and you only need to make sure the command list isn't being used before resetting it -- not wait for the whole queue to flush out.

Yes, the function vkQueueWaitIdle blocks the CPU until all command buffers submitted to that queue have finished executing. Like I mentioned above, my guess is that it's almost certainly there to handle waiting for the GPU to be finished with the previous frame's command buffers before redefining them. This is also required in Vulkan as it's illegal to reset a command buffer (or its pool) that is pending execution.

Your suspicion is correct in that doing it this way by idling the whole queue is heavy-handed, but I wouldn't read too much into it in simple sample apps like these.

Minsky fucked around with this message at 07:42 on Feb 19, 2016

The_Franz
Aug 8, 2003

What happened to the VOGL derived debugger they were showing off months ago? Did it get pushed aside due to more urgent work or was it dumped in favor of Renderdoc?

The_Franz fucked around with this message at 15:38 on Feb 19, 2016

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

Sex Bumbo posted:

Also I disagree with this, OpenGL should be forgotten forever. It sucks enough that I'd rather use, I don't know, literally anything else if I need some trivial graphics task done.

I don't really get the push to keep people futzing around on DX11. I know it's going to still be supported, but even if I'm just dicking around at home, DX11 is so unsatisfying now. People aren't idiots, they can figure out how to initialize swap chains and descriptor tables and learn about the hardware at the same time. It's fun.

Well, OpenGL will be around forever for the same reason that OpenGL has been such a PITA to push forward -- there's a ton of heavily used software packages that rely on it and aren't going to be ported to something new, let alone something fundamentally different.

I kind of agree about DX11 -- it feels like a bad middle-ground between the "low-level" APIs of DX12/Vulkan and the more high-level wrapper engines. That being said, I definitely think it will be around for quite a long time (longer than DX9, which I think is only just now finally dying off) because there is a huge complexity gap between DX11 and DX12, and there aren't a lot of developers who are going to be able to take advantage of the benefits it confers in the near term.

Minsky posted:

Swap chains and descriptors are all fine and nice and easy to understand.

The thing that scares me most about looking at Vulkan code as someone who writes drivers is that the responsibility for handling resource barriers falls on the application. Meaning, if you render to a texture in one draw and then want to read from it in another draw, you the app developer have to manually put a barrier command in between to make sure the first draw finishes and the relevant caches are synchronized. In OGL/DX, the driver would detect all of this for you.

This puts more control in your hands, but it also can introduce a lot more hardware-specific errors that you may not be aware of if you choose to primarily develop on a particular hardware vendor's GPU that happens to have coherent caches between those two kinds of operations. Vulkan ships with a debug runtime to catch these kinds of mistakes, but it is probably not very mature just yet.
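(A rough sketch of what one of those manual barriers looks like -- transitioning a just-rendered color attachment so a later draw can sample it; the image and command buffer are assumed, and the exact stage/access flags depend on your usage:)

code:
// Hypothetical sketch: render-target -> sampled-texture barrier between two draws.
#include <vulkan/vulkan.h>

void barrierColorToSampled(VkCommandBuffer cmd, VkImage image)
{
    VkImageMemoryBarrier barrier = {};
    barrier.sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    barrier.srcAccessMask       = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    barrier.dstAccessMask       = VK_ACCESS_SHADER_READ_BIT;
    barrier.oldLayout           = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    barrier.newLayout           = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.image               = image;
    barrier.subresourceRange    = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };

    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,  // producer stage
                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,          // consumer stage
                         0,                                              // dependency flags
                         0, nullptr, 0, nullptr, 1, &barrier);
}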

What I anticipate happening is that there will be a lot of growing pains where people start crashing machines because their code that was previously pretty well sandboxed in DX11 is now stomping all over video memory and hitting nasty race conditions. A lot of people will "fix" this by wrapping the low level APIs with classes that are overly aggressive with barriers and mutexes, making them "safe" but slow. Over time these wrapper layers will get better and you'll get closer to peak performance, but I think on average CPU-side performance will be *worse* for ported apps than their DX11 equivalents, and GPU performance may not be as good either because the driver isn't going to have the leeway to do some of the smart things that have "bloated" into it over the past decade or so.

There are definitely ways to get much better performance out of the low level APIs, but you've got to do a lot of work to get there -- and it will only really help if you're already being limited by driver/API overhead.

The_Franz
Aug 8, 2003

Hubis posted:

Well, OpenGL will be around forever for the same reason that OpenGL has been such a PITA to push forward -- there's a ton of heavily used software packages that rely on it and aren't going to be ported to something new, let alone something fundamentally different.

I kind of agree about DX11 -- it feels like a bad middle-ground between the "low-level" APIs of DX12/Vulkan and the more high-level wrapper engines. That being said, I definitely think it will be around for quite a long time (longer than DX9, which I think is only just now finally dying off) because there is a huge complexity gap between DX11 and DX12, and there aren't a lot of developers who are going to be able to take advantage of the benefits it confers in the near term.

Apple's Metal is actually somewhat of a nice middle-ground since it's basically GL4/DX11 with pipeline objects and command buffers. It's explicit enough to not introduce unexpected stalls without requiring you to manually manage memory and barriers. Unfortunately it still has some design decisions that make it feel like you are working with mittens on and lacks support for things like tessellation and geometry shaders.

We are definitely entering a time where there will be more wrapper libraries along the lines of bgfx. Personally I don't see what is so horrifically hard about Vulkan and DX12. Yes, they require you to plan ahead and think about what is actually going on in the global scope of your program, but the concepts aren't that difficult to wrap your head around, and the result will almost certainly be better than hurling a mess of calls and state changes at the driver and letting it sort it out, especially if you aren't a giant company with direct numbers to Nvidia and AMD driver guys. Then again, there are still people whining about OpenGL deprecating immediate mode, so :shrug:.

Sex Bumbo
Aug 14, 2004
Thanks for the response Minsky.

I use bgfx. I actually work with the guy who makes it. I like it a lot, of course being a little biased.

Doc Block
Apr 15, 2003
Fun Shoe

The_Franz posted:

Apple's Metal is actually somewhat of a nice middle-ground since it's basically GL4/DX11 with pipeline objects and command buffers. It's explicit enough to not introduce unexpected stalls without requiring you to manually manage memory and barriers. Unfortunately it still has some design decisions that make it feel like you are working with mittens on and lacks support for things like tessellation and geometry shaders.

Metal does kinda require you to manage memory, just not to the level that Vulkan seems to.

The thing about Metal is that it was designed for mobile GPUs with shared memory, where the memory management concerns are about whether all your textures will fit in memory alongside your other assets without getting your app killed, rather than about how to slice up available VRAM.

That's one of the things that I like about Metal as opposed to OpenGL ES: Metal doesn't pretend your little mobile GPU has its own VRAM. Textures and vertexes and shader arguments are all just backed by untyped buffers in main memory at the end of the day. You don't need to map/unmap/lock/unlock/whatever a buffer before you change it, since there's no copying between main memory and VRAM that needs to happen. Makes the API cleaner and nicer to use IMHO. Of course, that means there's nothing stopping you from modifying a buffer or texture while the GPU is using it, either...

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Doc Block posted:

Metal does kinda require you to manage memory, just not to the level that Vulkan seems to.

The thing about Metal is that it was designed for mobile GPUs with shared memory, where the memory management concerns are about whether all your textures will fit in memory alongside your other assets without getting your app killed, rather than about how to slice up available VRAM.

That's one of the things that I like about Metal as opposed to OpenGL ES: Metal doesn't pretend your little mobile GPU has its own VRAM. Textures and vertexes and shader arguments are all just backed by untyped buffers in main memory at the end of the day. You don't need to map/unmap/lock/unlock/whatever a buffer before you change it, since there's no copying between main memory and VRAM that needs to happen. Makes the API cleaner and nicer to use IMHO. Of course, that means there's nothing stopping you from modifying a buffer or texture while the GPU is using it, either...

And you can use Metal on OS X too. Does it expose a different memory model?

Doc Block
Apr 15, 2003
Fun Shoe
Metal on OS X has some tacked on stuff that lets you specify you'd like a given buffer/texture to go in VRAM if the GPU has any, which almost certainly requires mapping then unmapping the buffer when you modify it. (I've only glanced at Metal OS X since my iMac is too old for it).

It also has methods to let you choose which GPU to use if the system has more than one, so you can pick the integrated GPU if your app isn't too demanding and you want to be nice to laptop users.

It's got different alignment requirements in some areas too, and obviously the GPU features are different.

Xerophyte
Mar 17, 2008

This space intentionally left blank
I've been tinkering with Vulkan over the day. I knew that doing basic things would require a lot of code but, man, doing basic things sure requires a lot of code.

I found Baldurk's Vulkan in 30 minutes pretty useful for getting a handle on things, as well as answering the basic questions like what the hell a WriteDescriptorSet is as I'm trying to modify this code sample into doing something fun and useful. Plus you support your local goon tutorials and debugging tools or something.

Tres Burritos
Sep 3, 2009

Can someone explain what a "SwapChain" (from here) is? I've been working my way through vulkan examples and I've never heard of a SwapChain before.

edit : Jesus I guess wikipedia (of all things) has it https://en.wikipedia.org/wiki/Swap_Chain

Tres Burritos fucked around with this message at 23:39 on Feb 22, 2016

Minsky
May 23, 2001

Yeah, that wikipedia article gives you the general idea.

The way it's represented in Vulkan is that you make a swap chain object, you ask it to give you handles to all of the images in the chain, and then during each frame's render function you acquire an index of one of these images onto which you may render that frame's picture. Then you queue the image to be presented and continue to the next frame and repeat the process.
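(In API terms, getting those handles looks roughly like this -- a sketch that assumes the swapchain has already been created:)

code:
// Hypothetical sketch: enumerate the images owned by an existing swapchain.
#include <vulkan/vulkan.h>
#include <vector>

std::vector<VkImage> getSwapchainImages(VkDevice device, VkSwapchainKHR swapchain)
{
    uint32_t count = 0;
    vkGetSwapchainImagesKHR(device, swapchain, &count, nullptr);
    std::vector<VkImage> images(count);
    vkGetSwapchainImagesKHR(device, swapchain, &count, images.data());
    return images;  // each frame, vkAcquireNextImageKHR hands back an index into this set
}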

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
It gives you the index, then, right? So you get something like buffer_age for free?

Minsky
May 23, 2001

Yes, it gives you the image index, but you'd have to double check whether the spec guarantees that the previous image contents are still there. This is assuming your intent is to, for example, map from that image index to a previous frame id and then reuse part of that previous frame's picture instead of redrawing all of it, or something.

I don't see why it shouldn't be possible, but there's a lot of moving parts with swap chains so I can't say for sure whether that kind of behavior is safe or not. I haven't checked myself, so I honestly don't know one way or the other.

Minsky fucked around with this message at 15:18 on Feb 23, 2016

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
I drat hope nothing touched my image in the middle. Wouldn't it be possible to use that image as a texture, as well, or are images from swap chains not guaranteed to be textureable?

... which would be a really weird thing, because the compositor has to be able to texture it anyway when rendering the full frame in a windowed system.

baldurk
Jun 21, 2005

If you won't try to find coherence in the world, have the courtesy of becoming apathetic.

Vulkan spec posted:

Presentation is a read-only operation that will not affect the content of the presentable images. Upon reacquiring the image and transitioning it away from the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout, the contents will be the same as they were prior to transitioning the image to the present source layout and presenting it. However, if a mechanism other than Vulkan is used to modify the platform window associated with the swapchain, the content of all presentable images in the swapchain becomes undefined.

So basically as long as you do things normally the image will still be there unmolested.

There is a query vkGetPhysicalDeviceSurfaceCapabilitiesKHR that returns a VkSurfaceCapabilitiesKHR. The member VkSurfaceCapabilitiesKHR::supportedUsageFlags is an enumeration of VK_IMAGE_USAGE_*_BIT saying what you're allowed to do with the images, as if it had been created with that set of usages passed to vkCreateImage. I don't know how often that's a real limitation though at least on desktop. E.g. on nvidia it's set to 255 which actually sets some bits that don't even exist as usage flags, and last I checked AMD only sets COLOR_ATTACHMENT. But I still do resolve/copy operations onto it and it works fine on AMD. :getin:
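(A rough sketch of that query, assuming the physical device and surface handles already exist:)

code:
// Hypothetical sketch: check which usages the presentable images advertise.
#include <vulkan/vulkan.h>

bool surfaceSupportsSampling(VkPhysicalDevice gpu, VkSurfaceKHR surface)
{
    VkSurfaceCapabilitiesKHR caps = {};
    vkGetPhysicalDeviceSurfaceCapabilitiesKHR(gpu, surface, &caps);

    // Test the specific bit you care about before relying on it.
    return (caps.supportedUsageFlags & VK_IMAGE_USAGE_SAMPLED_BIT) != 0;
}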

I imagine they support most operations too. Note: that might not include image load/store since AFAIK DX12 doesn't allow UAV access to the backbuffer.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
Oh, yeah, there's the obvious caveat of "if the user resizes your window, of course you can't reuse the old buffers", but everyone already knows that one and never says it. Thanks, Mr. aldurk.

Sex Bumbo
Aug 14, 2004

Suspicious Dish posted:

I drat hope nothing touched my image in the middle. Wouldn't it be possible to use that image as a texture, as well, or are images from swap chains not guaranteed to be textureable?

... which would be a really weird thing, because the compositor has to be able to texture it anyway when rendering the full frame in a windowed system.

Why does it matter? When do you care about old swap chain contents? It's values from N-1 frames ago.

If you're trying to do a feedback effect, use a render target.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
Because my main job isn't actually building games, it's building UIs and compositors, which render mostly small updates when somebody clicks on a menu item. If my only frame update is a 16px animated spinner, I know I only need to scissor out that square. That's a lot of rendering I'm not doing, a lot of memory bandwidth saved, and a lot of tiles not even touched on a tiler.

High Protein
Jul 12, 2009
D3D12 question:

I'm porting over my D3D11 dynamic vertex buffer class. That worked basically like std::vector; I'd create a USAGE_DYNAMIC buffer, map it with WRITE_DISCARD, and if the size turned out to be insufficient, I'd unmap it, create a new larger buffer, CopySubresourceRegion() over the old data, and map the new buffer.

For D3D12 I'm creating a vertex buffer on an UPLOAD heap, I map it, and if it's too small, create a new buffer. Now comes the issue. I'm trying to copy over the old data using ID3D12GraphicsCommandList::CopyBufferRegion() function, and it doesn't seem to work properly. Note that the debug runtime doesn't complain about anything, which it will if trying to do a plain CopyResource() with an UPLOAD heap as destination.

One issue I can imagine is that the GPU tries to draw from my buffer before the copy is complete. However, I'm unable to insert a resource usage barrier, as for upload heap resources only D3D12_RESOURCE_STATE_GENERIC_READ usage is accepted; if I try to use anything else the debug runtime complains.

Any idea what I'm doing wrong? Do I need to use a fence, or copy to a default usage buffer?

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
gr... time to try to figure out why my specular light is creating a giant dent



edit: help welcome i guess if you want to stare at my awful shaders.

http://magcius.github.io/pbrtview/src/pbrtview.html

i've been trying to follow http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf as best i can, without too much success.

Suspicious Dish fucked around with this message at 04:00 on Feb 29, 2016

Joda
Apr 24, 2010

When I'm off, I just like to really let go and have fun, y'know?

Fun Shoe
Are you maxing all your cos() factors with 0? It looks like the sort of error you get when negative values are allowed in the rendering equation.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
Well, I found the issue -- my D term was simply denom * denom; adding in the alpha2 fixed it. I had a bit of trouble figuring out why my highlight then disappeared, until I realized that roughness was set to 0, which obviously makes the entire term go to zero. The paper didn't describe a range for roughness!
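(For reference, a sketch of the GGX/Trowbridge-Reitz D term as given in the Karis notes linked above, assuming the alpha = roughness^2 remapping from those notes -- written as plain C++ here rather than shader code:)

code:
// Hypothetical sketch of the UE4-style GGX normal distribution function:
//   D(h) = alpha^2 / (pi * ((N.H)^2 * (alpha^2 - 1) + 1)^2),  with alpha = roughness^2
float ggxD(float NdotH, float roughness)
{
    const float kPi = 3.14159265358979f;
    float alpha  = roughness * roughness;
    float alpha2 = alpha * alpha;
    float denom  = NdotH * NdotH * (alpha2 - 1.0f) + 1.0f;
    return alpha2 / (kPi * denom * denom);   // note the alpha2 in the numerator
}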

Tres Burritos
Sep 3, 2009

So if I had a bunch of vector data (think svg, lines, polygons, etc) and I wanted to be able to draw them on a plane as a texture, what's the best way to do that?

this looks pretty good to me. (and blog post)

It looks like it's doing a pass to make an SDF texture and then using that texture for rendering, correct?

Or is that a stupid dumb idea?

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
GPU accelerated 2D rendering is very much an unsolved problem. The general approach used by Direct2D, Skia, etc. is to do the conversion from splines to tris on the CPU, then upload that data as vertices. Loop-Blinn still has a large number of downsides and special cases before it can be performant -- you're doing a lot more work in the pixel shader per-spline.

The SDF-texture or general resolve approach of submitting the entire scene to the GPU doesn't work too well. I've seen a few variants of this where a compute shader first does a basic bounds check on each shape, breaks the scene into tiles (each one holding the shapes that cull into it), and then the pixel shader picks up its tile's slice of the scene. It doesn't perform as well as the tessellation approach, and it's usually a lot more complicated, but it does mean you can get performant Loop-Blinn.

I've been playing around with a GPU-accelerated 2D library in my spare time, to try and test some of these things. It's a really hard problem, much harder than 3D.

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...
If anyone is going to be at GDC this year, feel free to swing by and check out my talk Wednesday afternoon: Fast, Flexible, Physically-Based Volumetric Light Scattering

https://www.youtube.com/watch?v=lQIZzKBydk4

Sex Bumbo
Aug 14, 2004
Can you just tell us how the techniques work here?

baldurk
Jun 21, 2005

If you won't try to find coherence in the world, have the courtesy of becoming apathetic.

Hubis posted:

If anyone is going to be at GDC this year, feel free to swing by and check out my talk Wednesday afternoon: Fast, Flexible, Physically-Based Volumetric Light Scattering

https://www.youtube.com/watch?v=lQIZzKBydk4

This was already in my schedule planner :3:.

Also if we're plugging GDC talks then I am doing a short demo as part of Practical Development for Vulkan (presented by Valve Software).

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

Sex Bumbo posted:

Can you just tell us how the techniques work here?

Given it's part of Gameworks, doesn't that mean it's a proprietary SDK, or...

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

Sex Bumbo posted:

Can you just tell us how the techniques work here?

Sure, though since I'm still frantically editing slides I'll probably save going into great depth until I have some free time :)

Basically, the hard part of direct in-scattering (1 bounce) is the fact that the light source may not be visible from all points along the ray. If it were, you could just evaluate the integral from the eye to the scene depth once and be done with it. This technique takes advantage of the fact that, since the visibility function is binary, you can re-state it as the sum of integrals over the lit portions. Furthermore, you can re-state an integral over an interval as the integral to the end-point minus the integral to the starting point:

code:
L = I(a, b) + I(c, d)
L = [I(0, b) + I(0, d)] - [I(0, a) + I(0, c)]


So what we do is render a mesh that corresponds to the volume that is visible to the light, evaluating the integral in the pixel shader and adding the result if it's a front-face or subtracting the result if it's a back-face. We do this by using a tessellated volume corresponding to the world-space coverage of the light and using the shadow map to offset the depth so it matches the world. That's the wireframe view you see at the end.

There's some tricks in how the integrals are evaluated, how the media is modeled, filtering to provide better results, etc. but the concept itself is pretty straightforward once you wrap your head around the math.
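(As a quick sanity check on that identity, here's a toy numerical sketch -- the closed-form I(0, t) and the interval endpoints are arbitrary choices for illustration, not the actual media model from the talk:)

code:
// Toy check: the sum of per-segment integrals I(a,b) + I(c,d) equals the signed
// sum of integrals-from-zero to each endpoint, which is exactly what the
// per-face add/subtract accumulation computes.
#include <cmath>
#include <cstdio>

double I0(double t) { return 1.0 - std::exp(-0.5 * t); }   // arbitrary I(0, t)
double I(double a, double b) { return I0(b) - I0(a); }

int main()
{
    double a = 1.0, b = 2.5, c = 4.0, d = 6.0;   // two lit segments along the ray

    double perSegment = I(a, b) + I(c, d);
    double signedSum  = (I0(b) + I0(d)) - (I0(a) + I0(c));

    std::printf("per-segment = %f, signed endpoint sum = %f\n", perSegment, signedSum);
    return 0;
}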

baldurk posted:

This was already in my schedule planner :3:.

Also if we're plugging GDC talks then I am doing a short demo as part of Practical Development for Vulkan (presented by Valve Software).

:3::hf::3:

Suspicious Dish posted:

Given it's part of Gameworks, doesn't that mean it's a proprietary SDK, or...

I'll be going into all the details at the talk. :ninja:

Sex Bumbo
Aug 14, 2004
How does this compare to oldschool fog polygon volumes? Like http://developer.download.nvidia.com/SDK/9.5/Samples/DEMOS/Direct3D9/src/FogPolygonVolumes3/docs/FogPolygonVolumes3.pdf


Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Suspicious Dish posted:

GPU accelerated 2D rendering is very much an unsolved problem. The general approach used by Direct2D, Skia, etc. is to do the conversion from splines to tris on the CPU, then upload that data as vertices. Loop-Blinn still has a large number of downsides and special cases before it can be performant -- you're doing a lot more work in the pixel shader per-spline.

The SDF-texture or general resolve approach of submitting the entire scene to the GPU doesn't work too well. I've seen a few variants of this where a compute shader first does a basic bounds check on each shape, breaks the scene into tiles (each one holding the shapes that cull into it), and then the pixel shader picks up its tile's slice of the scene. It doesn't perform as well as the tessellation approach, and it's usually a lot more complicated, but it does mean you can get performant Loop-Blinn.

I've been playing around with a GPU-accelerated 2D library in my spare time, to try and test some of these things. It's a really hard problem, much harder than 3D.

Webrender is cool; pcwalton did a talk on it recently
