Another idea, assuming all you have for your heatmap is some points, is to keep those in a float2 array, and in the pixel shader use something that goes through each point and adds red based on the distance.
|
|
# ? Oct 30, 2015 15:00 |
|
Tres Burritos posted:What's the simplest way to make a heatmap using shaders? (Assuming that's even a good idea?)

It depends on how accurate you need it to be, and what you want to use it for. The easiest way would be the "splatting" method -- render your sample points as quads over your destination map with additive blending on. The size of the 'splat' should correspond to the importance/intensity of the sample (so that the 'edge' of the splat is where the weight function hits zero). Inside the pixel shader, you compute the distance of the fragment being shaded from the center of the splat, weight it with something like a windowed gaussian, and then output a value scaled by that function. All the splats will additively blend together and give you a localized density in the output texture, which you can then read from and feed into a color lookup if you want to get false-color rendering. This is also known as a "scatter" approach (you're "scattering" a single input to multiple outputs).

The above way is going to be the most efficient for most cases, though you might run into performance problems if you end up with most of your points in a very small area of the render target. In that case you'll run into a blending bottleneck, since the pixels will be contending for the same blending interlocks (although this isn't as much of a problem as it sounds for small amounts of overdraw).

The other approach is something closer to what is done with "tiled" deferred lighting: divide the output into regular tiles (8x8 is probably a good number) and go through all the points so that each tile has a list of the points which may be affecting it (points may appear in multiple lists). Then, using a compute shader pass (ideally -- though this is doable with a fullscreen PS pass as well, albeit less efficiently), figure out what tile each pixel belongs to and compute the sum of the distance-weighted value for every sample that hits that tile. Then output that, without blending.
This can be more efficient if there are a lot of "hot spots" with blend contention, but it's still not necessarily a win, because now you've got a much smaller number of threads doing a lot of work in serial, instead of the entire GPU distributing the workload (and then relying on the blend interlock to coalesce it). This is what's referred to as a "gather" approach (you're "gathering" multiple inputs to a single output).

e: And if the number of points is relatively small and performance is not critical, jam them into a constant buffer and just do a fullscreen PS pass that iterates over all the samples at every pixel and accumulates a weight. This is roughly equivalent to the second ("gather") approach except without the tiling step, which means you will probably have a lot of wasted work processing samples that don't affect the pixel at all. Still, if the number of samples is small (or their radius of effect is large) it could be roughly as efficient.

Hubis fucked around with this message at 15:57 on Oct 30, 2015 |
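The brute-force "gather" variant Hubis describes can be sketched on the CPU like this (Python standing in for the pixel shader; the point list, radius, and grid size are made-up illustration values, and a real implementation would run this per-fragment on the GPU):

```python
import math

def heatmap_gather(width, height, points, radius):
    """For every pixel, sum a windowed-gaussian weight over all sample points.
    This is the brute-force gather: every pixel visits every point."""
    sigma = radius / 3.0
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for (px, py) in points:
                d2 = (x - px) ** 2 + (y - py) ** 2
                if d2 <= radius * radius:  # window: weight is cut off at the splat edge
                    total += math.exp(-d2 / (2.0 * sigma * sigma))
            out[y][x] = total
    return out

densities = heatmap_gather(8, 8, [(2, 2), (5, 5)], radius=3)
```

The tiled version does exactly this inner loop, but only over the per-tile point lists instead of all points.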
# ? Oct 30, 2015 15:53 |
|
https://www.shadertoy.com/view/lt2SWG drag the mouse around to make it do something
|
# ? Oct 30, 2015 18:50 |
|
Joda posted:Assuming your data points are two dimensional, and the heat map you want to make is too, then draw a quad filling the entire screen and project it with an orthogonal projection. You have a couple of options on how to do the heat map for each of your data points.

One is to simply, as you said, upload your data points array as a uniform, then loop through it in a for-loop in the shader, adding the weighted* contribution of all data points to your fragment. This has the advantage of you being able to account for overexposure (colour values being over 1.)

Another option is to enable additive blending (gl.enable(gl.BLEND); gl.blendFunc(gl.ONE, gl.ONE); gl.blendEquation(gl.FUNC_ADD)) and draw your data points one at a time without clearing the buffer. This has the advantage that you can have an arbitrary amount of data points, and the shader won't have to know exactly how many there are. (As a general rule, the GPU has to know exactly how much uniform data you have to upload, and how many times you plan to loop over your data, at compile time.)

Ahhh yep, I follow.

Sex Bumbo posted:https://www.shadertoy.com/view/lt2SWG

Good poo poo. So where you put code:
I think I'm going to try that first and see how it goes.

Hubis posted:It depends on how accurate you need it to be, and what you want to use it for.

I'm not quite getting this one. So for each datapoint I'd create a quad (sized based on the intensity/whatever of the point), then I'd run the gaussian function for just that plane and then render that to a texture? And then when planes are overlapping the GPU would just know what's going on and do some behind-the-scenes blending? Would you have to make sure that the planes are in distinct layers (y = 0, y = 1, y = 2) so that you didn't get weird collision artifacts (I'm fairly certain I've seen that before)?
|
# ? Oct 30, 2015 23:13 |
|
Tres Burritos posted:I'd just be comparing that fragment against all the uniform points or whatever that got passed in. The problem with this seems to be that it doesn't run so hot on a 4k display, I'm guessing looping through all those fragments (for like 1000 points on a GTX 980) is a little expensive. Or maybe shaderToy just doesn't like fullscreen.

You can't splat sprites using Shadertoy, but the idea would be the same -- either partition your points in such a way that it makes the shader faster, or render the points as sprites and do a post-process to determine the color value. Notice it's doing an add operation for each point, equivalent to additive blending.

Or, as a trivial example, say you have an NxN grid and put each point into a bucket. Then each pixel only examines nearby buckets for points. You need a clever way to encode the buckets and sizes, but it shouldn't be too hard.
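The bucket idea sketched on the CPU in Python (the cell size and point coordinates are illustrative assumptions -- on the GPU you'd encode the buckets into a texture or buffer instead of a hash map):

```python
from collections import defaultdict

def build_buckets(points, cell_size):
    """Hash each point into a grid cell, done once up front."""
    buckets = defaultdict(list)
    for (px, py) in points:
        buckets[(int(px // cell_size), int(py // cell_size))].append((px, py))
    return buckets

def nearby_points(buckets, x, y, cell_size):
    """Per pixel: gather candidates from its own cell and the 8 neighbors,
    instead of scanning every point in the dataset."""
    cx, cy = int(x // cell_size), int(y // cell_size)
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            out.extend(buckets.get((cx + dx, cy + dy), []))
    return out

buckets = build_buckets([(1, 1), (9, 9), (50, 50)], cell_size=8)
candidates = nearby_points(buckets, 2, 2, cell_size=8)
```

The neighbor cells have to be searched too because a point near a cell boundary can still influence pixels in the adjacent cell (the cell size should be at least the splat radius for that to hold).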
|
# ? Oct 31, 2015 02:13 |
Tres Burritos posted:I'm not quite getting this one, so for each datapoint I'd create a quad (based on the intensity / whatever) of the point, then I'd run the gaussian function for just that plane and then render that to a texture?

With additive blending enabled and glBlendFunc set to GL_ONE, GL_ONE, the GPU takes whatever is already in the framebuffer/texture for a given fragment, and adds the value you just calculated for the fragment.

E: The idea behind using multiple planes is that you only draw the part of the texture the point actually influences.

Joda fucked around with this message at 04:50 on Oct 31, 2015 |
|
# ? Oct 31, 2015 04:43 |
|
Tres Burritos posted:Ahhh yep, I follow.

They can all be in the same plane -- if you configure for additive blending like Joda described, the blend/framebuffer unit will resolve overlapping regions correctly. You'd accumulate your splats to a single-valued texture (R16_FLOAT or whatever), and then when you rendered that texture, you'd remap the float value to a color via a lookup.
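That final remap step, sketched in Python (the blue-to-red ramp is just an illustrative choice -- a real renderer would sample a 1D lookup texture in the shader with the accumulated density as the coordinate):

```python
def false_color(density, max_density):
    """Map a scalar density to an RGB triple: blue (cold) -> red (hot)."""
    t = max(0.0, min(1.0, density / max_density))  # normalize and clamp to [0, 1]
    return (t, 0.0, 1.0 - t)  # linear blend blue -> red

cold = false_color(0.0, 10.0)
hot = false_color(10.0, 10.0)
```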
|
# ? Oct 31, 2015 16:39 |
|
Thanks goons! I got Sex Bumbo's solution working Demo And then some blended splatting(?) Demo

The splatting isn't as fancy right now, and it's missing the final step, which will be taking the red values and converting them to the same shading as the first heatmap, but it allows me to have around 2.5x more datapoints. See, you just needed to use tiny words
|
# ? Nov 1, 2015 01:32 |
|
I'm having trouble sending integers through shaders in OpenGL. I have a switch in the geometry shader that uses integers to see what shape it should make. The problem is that only the value zero goes through it like it should. One, two, three etc. seem to come out as really large numbers. I tried setting the value of the integer inside the vertex shader, which worked as intended. And if I send the values as floats and replace the switch with if's and else-if's, it also works. One thing I could find on the internet was that, since the sizes of integers could be different on different hardware, the values could get messed up, but the problem persisted even when I stored the integers as GLint.

Edit: gently caress this. Now it works and I have no idea why.

Edit2: Found out why it works now, but I have no idea why you would do this. code:
code:
Mugticket fucked around with this message at 21:42 on Nov 3, 2015 |
# ? Nov 3, 2015 17:11 |
|
I think glVertexAttribPointer converts values to floats. Try glVertexAttribIPointer.
|
# ? Nov 4, 2015 07:54 |
|
Ralith posted:I think glVertexAttribPointer converts values to floats. Try glVertexAttribIPointer. Thanks a lot!
|
# ? Nov 4, 2015 10:13 |
|
Am I allowed to ask Metal questions in here?
|
# ? Nov 5, 2015 23:47 |
|
Doc Block posted:Am I allowed to ask Metal questions in here? This would probably be the right place, though I'd be curious to see how much expertise is out there
|
# ? Nov 6, 2015 14:07 |
|
Well, here's the thing. I want to add a bloom effect to my AppleTV game (whose engine I wrote using Metal), and I've got it working, but it's slooooow. The basic steps I'm doing right now:

1) Render the scene into a framebuffer (RGBA16F) that's the same resolution as the screen
2) Create a specular buffer by rendering the output of Step 1 into a texture that's 1/4th the screen resolution, using a shader that drops any values below a brightness threshold.
3) Blur the specular buffer
4) Render a fullscreen quad to the screen, with a shader that takes the output of Step 1 and Step 3 and combines them.

This pretty much kills the device's fillrate and is not kind to the GPU's tile cache. But there doesn't really seem to be a better way. I can use multiple render targets to get rid of Step 2, but then the specular buffer has to be full resolution and makes Step 3 a lot slower. And there's still the need to store values outside of the tile cache and then read them back in, which is a performance no-no on PowerVR.

Basically, does anyone have a better way to do this in Metal? Hopefully one that's able to do most/all of the work inside the tile cache so the GPU doesn't have to touch main memory until the tile is finished?
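For reference, the data flow of those four steps looks like this as a CPU toy in Python (a 1-D grayscale "image" with a box blur standing in for the real gaussian; the numbers are illustrative and nothing here is Metal API):

```python
def bloom_1d(image, threshold, blur_radius):
    """Toy bloom: threshold -> blur -> composite back over the scene."""
    # Step 2: keep only values above the brightness threshold
    bright = [v if v > threshold else 0.0 for v in image]
    # Step 3: blur the bright pass (box blur stands in for a gaussian)
    n = len(bright)
    blurred = []
    for i in range(n):
        lo, hi = max(0, i - blur_radius), min(n, i + blur_radius + 1)
        blurred.append(sum(bright[lo:hi]) / (hi - lo))
    # Step 4: composite the blurred glow back over the original scene
    return [a + b for a, b in zip(image, blurred)]

scene = [0.1, 0.1, 2.0, 0.1, 0.1]   # one HDR-bright pixel (value > 1.0)
result = bloom_1d(scene, threshold=1.0, blur_radius=1)
```

The bright pixel "bleeds" into its neighbors after the blur, which is the whole effect; the HDR source format matters because the threshold has to see values above 1.0 to isolate the glow.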
|
# ? Nov 6, 2015 19:32 |
|
And, of course, now that I've written it all out it occurs to me that large chunks of that could possibly be combined into just one or two compute shaders. Welp. vv
|
# ? Nov 6, 2015 19:36 |
|
Shouldn't you be looking into the performance shaders for a blur effect? I thought that's specifically what those were made for.
|
# ? Nov 6, 2015 19:42 |
|
That's what I'm using to do the blur, but there's a performance penalty involved in switching the GPU from draw mode to compute mode and then back within the same frame. The Metal Performance Shaders documentation suggests doing all your compute either at the beginning or at the end of the frame to avoid this.

Even without that penalty, I imagine it'd be a lot faster if I wrote a compute shader (or two) that took the initial color buffer, downscaled it & clipped the sub-threshold colors to black, blurred that, then combined that with the original and wrote it to the final drawable in as few operations as possible, instead of a bunch of discrete steps like I'm doing now.

Plus, Metal Performance Shaders are only available on A8 GPUs and up, so if I bring the game to iPhone later I'd have to come up with an alternate solution for A7 devices anyway (never mind having to do an OpenGL renderer).

Doc Block fucked around with this message at 20:45 on Nov 6, 2015 |
# ? Nov 6, 2015 20:43 |
|
Why are you using RGBA16F, and can you avoid that? Can you downsample it further? What's the format of the specular buffer?

If you want to use compute, can you eat a frame of latency? That way you would have only one draw portion of your frame and one compute portion:

1: Compute last frame's blur
2: Draw new frame to different frame buffer
3: Draw/present last frame

Sex Bumbo fucked around with this message at 03:59 on Nov 7, 2015 |
# ? Nov 7, 2015 03:55 |
|
I'm using RGBA16F for the offscreen color buffer so that it preserves brightness above 1.0 instead of just clipping, which is useful for doing post-processing effects like bloom.

I've tried downsampling the specular buffer further, but then it looks blocky and shimmers if I downsample it enough to make a noticeable dent in performance. It's actually 1/8th the screen resolution, not 1/4 as I said earlier (divided by 4 in each direction = 1/8th not 1/4th, whoops). The specular buffer pixel format is just R8.

I could definitely stand to have at least the bloom effect be a frame behind. Hadn't thought of that...
|
# ? Nov 7, 2015 07:37 |
|
As an experiment you might want to try lowering the bit depth, even to something extreme like 565, just to see how it affects performance. Other than that, I'm not familiar with Metal but there aren't really that many different ways to downsample a texture and blur it that I'm aware of. Are you able to profile it somehow? You can do hacky profiling like doing a pass-through non-blur just to see if it's the blur kernel that's the bottleneck.
|
# ? Nov 7, 2015 07:47 |
|
Xcode has some pretty nice tools for Metal. You can do a GPU frame capture and it will give you an exact breakdown of everything, including how long it all takes, what the various state objects are set to at any given point during the frame, and even what the frame looked like at any specific draw call (with funky green outlines showing you what exactly was drawn in that particular call).

And yeah, the two biggest bottlenecks are the blur kernel and the composite pass, which combines the color buffer with the (downsampled and blurred) specular buffer and then writes it to the framebuffer. (The "3D Mesh (normal)" pipeline isn't really a bottleneck, since in that frame it's drawing multiple models with large, poorly-optimized geometry sets that take up large portions of the screen and use a crappy fragment shader that I haven't really optimized yet.)

edit: those 8 million compute pipelines are the implementation of a Gaussian blur filter from Apple's Metal Performance Shaders framework.

Doc Block fucked around with this message at 08:39 on Nov 7, 2015 |
# ? Nov 7, 2015 08:36 |
|
Sex Bumbo posted:As an experiment you might want to try lowering the bit depth, even to something extreme like 565, just to see how it affects performance. Other than that, I'm not familiar with Metal but there aren't really that many different ways to downsample a texture and blur it that I'm aware of. Are you able to profile it somehow? You can do hacky profiling like doing a pass-through non-blur just to see if it's the blur kernel that's the bottleneck. Does Metal support 10-11-10?
|
# ? Nov 7, 2015 18:49 |
|
Hubis posted:Does Metal support 10-11-10? Looking at the Metal Programming Guide and they have RG11B10Float as well as RGB9E5Float shared exponent formats, apparently, which might work for the HDR in this case. Maybe. I really don't know enough about the problem domain here.
|
# ? Nov 7, 2015 19:18 |
|
I'd look into trying a comparable non-compute blur. It might be doing a constant time blur for any blur width, which might lose out pretty hard in performance compared to a fragment shader blur until you make the blur width enormous. Compute Gaussian blurs seem to function better as example code in my experience.
|
# ? Nov 7, 2015 19:49 |
|
The Metal Performance Shader library has tons of permutations for each operation, optimized for various image sizes, kernel sizes, and pixel formats. The API chooses which to use at runtime. If memory serves, in one of the WWDC sessions Apple said they had created 60+ different Gaussian blur implementations.
|
# ? Nov 7, 2015 23:27 |
|
so you're saying they wrote a shader compiler
|
# ? Nov 7, 2015 23:29 |
|
They explain in the WWDC session video: https://developer.apple.com/videos/play/wwdc2015-607

Anyway, I'll try a 2-stage blur, but given what I've seen so far of dependent texture reads on PowerVR, I'm not confident it'll be fast. Especially not if it has to swap out the tile cache to main memory. I think having to do multiple tile cache stores per frame is part of my current problem.
|
# ? Nov 7, 2015 23:56 |
|
You should be able to avoid dependent texture reads in a blur shader. After all, the position of every sample is always the same every blur for each pixel.
|
# ? Nov 8, 2015 02:20 |
|
Whatever they're called then when you specify the texcoords in the fragment shader so the GPU doesn't know to prefetch the necessary texels. Or is that not an issue anymore? I'll have to play around and see.
|
# ? Nov 8, 2015 02:37 |
|
It should know how to prefetch the necessary texels is what I'm saying. Like, if you're reading 3 texels, encode all three positions into the vertex data so it gets interpolated and you don't need any math in the fragment shader.
|
# ? Nov 8, 2015 03:15 |
|
Doc Block posted:Whatever they're called then when you specify the texcoords in the fragment shader so the GPU doesn't know to prefetch the necessary texels.

Not a thing anymore (and hasn't been, really, since 2005 or so). "Dependent texture reads" refer to one texture fetch relying on the value of a previous texture fetch to determine its lookup location. A screen-space distortion shader is a perfect example -- the value you output is fetched from the rendered framebuffer using a coordinate offset by a second "distortion map". This is potentially bad because the first texture read injects round-trip latency, so the shader unit sits idle, then the texture unit sits idle while the shader unit computes the new sample position. If you have well-balanced workloads that will keep the shader unit busy with other things, then this isn't that big of a deal.

Reading something straight from a vertex interpolator will be no faster than if you read it and did some kind of (simple) math on it first. Either way it's executing some shader instructions to put the interpolated (and possibly modified) value into a register that it then feeds to the texture unit. There's no "fast path" where you just pump the TEXCOORD semantic straight to the texture unit anymore.
|
# ? Nov 8, 2015 04:55 |
|
I'm just way overthinking this, it seems. I'll try to get back to obsessing over this stupid bloom effect that I just had to have in my game later this week or next weekend. Thanks guys. Doc Block fucked around with this message at 08:26 on Nov 8, 2015 |
# ? Nov 8, 2015 08:12 |
|
Hubis posted:Not a thing anymore (and hasn't been really since 2005 or so).

I thought you were wrong about this, at least regarding A7, but quote:The Apple A7, A8, and A9 GPUs do not penalize dependent-texture fetches.

It is I who am the big dummy
|
# ? Nov 9, 2015 01:32 |
|
Sex Bumbo posted:I thought you were wrong about this, at least regarding A7, but It's a fair possibility -- my experience is mostly with discrete PC GPUs -- but yeah I would have been very surprised if the PowerVR architecture were hugely different in that area.
|
# ? Nov 9, 2015 02:41 |
This is kind of a tangential question, but as far as I can tell this is the place I'm most likely to find people who work/have worked with rendering academically. What drawing program do/did you use for theses and papers to demonstrate spatial concepts? I'm currently working in Geogebra, and it works great for 2D simplifications of concepts, but there are some things where a 3D drawing is simply needed, and doing those in Geogebra is a pain.
|
|
# ? Nov 15, 2015 00:19 |
|
Is there a way I can draw to a framebuffer in OpenGL such that if a pixel has been written to once, then it is locked into that color and cannot be overwritten? I can't use the stencil or depth buffer, and the pixel values could have alpha < 1. Like, if I clear it to (0,0,0,0), then only pixels with value == (0,0,0,0) should be allowed to be written to. Or to put it another way, if the alpha value is non-zero then don't let it be drawn over anymore.
|
# ? Nov 20, 2015 04:30 |
|
Can you read from the destination color buffer in OpenGL? If so, just write a shader that reads the corresponding pixel in the destination color buffer, and if its alpha > 0 then discard the fragment by calling gl_DiscardFragment() (or whatever it's actually named). What are you trying to do?
|
# ? Nov 20, 2015 05:01 |
|
As I understand it, shaders can't read pixel data out of the framebuffer.
|
# ? Nov 20, 2015 05:21 |
|
peepsalot posted:Is there a way I can draw to a framebuffer in OpenGL such that if a pixel has been written to once then it is locked into that color and cannot be overwritten? I can't use the stencil or depth buffer and the pixel values could have alpha < 1.

However, I think you also can't do it anyway - reading from the destination is broadly a no-no. If you could make the prior rendering go to an intermediate texture, then you could do a final combination render to the screen to get the effect you want. You could do some weird thing for the combination without branches, something like code:
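One way that branchless combine could look, sketched in Python (the `step`/`mix` helpers mirror the GLSL built-ins of the same names; the epsilon and the "nonzero alpha locks the pixel" rule are illustrative assumptions based on peepsalot's description, not the elided code above):

```python
def step(edge, x):
    """GLSL step(): 0.0 if x < edge, else 1.0."""
    return 0.0 if x < edge else 1.0

def mix(a, b, t):
    """GLSL mix(): componentwise linear blend a*(1-t) + b*t."""
    return tuple(av * (1.0 - t) + bv * t for av, bv in zip(a, b))

def locked_write(dst, src):
    """Keep dst wherever its alpha is already nonzero, else accept src."""
    locked = step(1e-6, dst[3])  # becomes 1.0 once anything was written
    return mix(src, dst, locked)

empty = (0.0, 0.0, 0.0, 0.0)
first = locked_write(empty, (1.0, 0.0, 0.0, 0.5))   # empty pixel accepts the write
second = locked_write(first, (0.0, 1.0, 0.0, 1.0))  # already written: stays locked
```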
|
# ? Nov 20, 2015 05:27 |
|
You can't read the destination color buffer in a shader in OpenGL? I thought you could... You can in Metal Doc Block fucked around with this message at 05:35 on Nov 20, 2015 |
# ? Nov 20, 2015 05:30 |