|
MarsMattel posted:Not sure if this is the right thread, but it seems the best fit. Which OpenCL version is this? http://www.khronos.org/message_boards/showthread.php/6788-Multiple-host-threads-with-single-command-queue-and-device this is old but apparently 1.0 does not support thread-safe kernel enqueueing?
|
# ? May 11, 2014 08:22 |
|
|
# ? May 31, 2024 04:12 |
|
It's version 1.2, so it should "just work", I think. I'll try per-thread contexts next.
|
# ? May 11, 2014 14:35 |
|
I've managed to get to the root of my problem. I narrowed it down to one of my two kernels, and the "weird behaviour" still occurred when using an almost empty kernel (just writing 0 to the two output images). That indicated the problem was on the CPU side. I'm building this from example code, so I was naively creating 4 image objects and destroying them for each kernel invocation. Creating the images once and then writing to them each time instead seems to have fixed the problem. The crashes were generally reported as things like heap corruption, so that would seem to fit with my hammering the resource create/destroy.
|
# ? May 12, 2014 18:44 |
|
I've been hacking away at my OpenCL code and have now run into another (seemingly) strange issue. In a single source file I have multiple kernels defined. The first kernel examines every edge on the voxel grid and runs a density function for each end point. If and only if the density on one end is +ve and -ve on the other is the edge selected for processing in the second kernel. The second kernel again calculates the density at either end and uses these values, d0 & d1, to do further computation. The only way an edge can be considered for the second kernel is if the first kernel determined that (d0 < 0 && d1 >= 0) || (d1 < 0 && d0 >= 0).

The strange problem is that in the second kernel, for at least one edge, d0 == d1. This should never happen. My first thoughts were that the wrong edge was being examined, or that the wrong edge was being nominated from the initial kernel. I've debugged it and as far as I can tell, the problem is that my density() function is returning a different result with the same inputs when invoked from a different kernel. This seems very strange to me, but I'm a total OpenCL novice so perhaps there is something in this.

The only explanation I can think of is either a floating point inaccuracy that only occurs in the second kernel, or that the second kernel does not run the density() function properly (i.e. it just lifts the same value twice, or something). All the calculations are carried out with 32-bit floats, and there are a lot of floating point calculations, so an inaccuracy appearing somewhere doesn't seem unlikely. The bit that's really confusing is that the density() function, when called with the same arguments, produces a different result -- the only way that could make sense is if the execution of the function depends on something other than the parameters, but it doesn't use global state or anything like that; the function should be "pure". Any ideas?
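One concrete way this can happen even with a "pure" function: the compiler is free (especially under relaxed/fast math) to evaluate floating point expressions in a different order in each kernel, and 32-bit float addition isn't associative. A small illustration, with deliberately extreme hypothetical values, emulating float32 rounding in Python:

```python
import struct

def f32(x):
    """Round a Python float to the nearest 32-bit float, like an OpenCL float."""
    return struct.unpack('f', struct.pack('f', x))[0]

a, b, c = 1.0e8, -1.0e8, 1.0

left_to_right = f32(f32(a + b) + c)   # (a + b) + c
right_to_left = f32(a + f32(b + c))   # a + (b + c): b + c rounds back to -1e8

print(left_to_right)   # 1.0
print(right_to_left)   # 0.0
```

If density() sits right at zero on a grid corner, a reordering like this is enough to flip its sign in one kernel but not the other. Checking whether the build uses `-cl-fast-relaxed-math` (or similar options) would be a first step.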
|
# ? May 20, 2014 15:57 |
|
Apologies for this becoming my own personal Misadventures in OpenCL thread. I found the problem! To get the positions of each corner I use an array of offsets: code:
code:
This seems to be some sort of alignment problem then?
|
# ? May 20, 2014 23:47 |
|
Hi, I'm using GLSL compute shaders and would like to perform operations on a bunch of XYZ values. My current strategy is to put them into an SSBO, but unless I pad them out to 16 bytes, I can't reference them in GLSL as vec3s. Or can I? i.e.: code:
Is this preferable to XYZXYZXYZ? Is it faster for some reason? If so, why? If not, is there an easy way to access XYZ elements, since declaring an array as vec3s won't provide the desired behavior?
|
# ? May 27, 2014 20:01 |
|
Vectors should be 16-byte aligned. You could also make a giant array of floats and then create vectors out of them in your shader, but I wouldn't recommend it.
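To make the 16-byte rule concrete: under both the std140 and std430 layouts, a vec3 has the base alignment of a vec4, so an array of vec3 in an SSBO gets a 16-byte stride whether you pad it on the CPU side or not. A small sketch of the arithmetic (simplified; it only covers float vectors):

```python
def vec_array_stride(components, component_size=4):
    """Array stride per std140/std430: a vec3's base alignment is that of
    a vec4 (16 bytes), so each array element gets padded out to 16."""
    base_align = {1: 4, 2: 8, 3: 16, 4: 16}[components]
    size = components * component_size
    # round the element size up to its base alignment
    return (size + base_align - 1) // base_align * base_align

print(vec_array_stride(3))  # 16 -> each vec3[] element occupies 16 bytes
print(vec_array_stride(4))  # 16
print(3 * 4)                # 12 -> tightly packed XYZ on the CPU side
```

So if you upload tightly packed 12-byte XYZ triples, the shader's vec3 indexing walks off by 4 bytes per element; either pad on upload or declare a float array and assemble the vec3s yourself.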
|
# ? Jun 4, 2014 01:51 |
|
Sex Bumbo posted:Is this preferrable to XYZXYZXYZ? Is it faster for some reason? If so, why? Hell yes. Unaligned accesses are slowwwwww...
|
# ? Jun 4, 2014 02:02 |
|
High Protein posted:I don't think I fully understand; so you want a piece of road with one rounded corner to somehow use the 'rounded corner' texture for one corner and the 'straight road' texture for others? Thanks for your help, my reply is really late but I hadn't had time to revisit this problem until now. The geometry itself is actually in the form of triangle strips, so I figure I actually need 4 different textures per vertex. I'm actually a little torn between using instancing and stuffing the texture data into the per-instance data, or putting the texture data per-vertex. Since I'm using DirectX9 I don't have access to array textures as far as I know. Can I just use 3D textures instead? I won't be interpolating between the textures, I just want to be able to access like 100 textures at once so I can draw all my terrain instances in one draw call. Is there some drawback to using a 3D texture as if it were a texture array?
|
# ? Jun 6, 2014 00:26 |
|
From what I remember, array textures are best if you want "just one or the other", whereas 3D textures try to interpolate between layers, so even if you're exactly on one layer, it'd still have some math to try and blend on the Z axis, I think? So it might be a bit slower, but hey, try it; if it works it works.
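Roughly what that extra math looks like, assuming OpenGL-style coordinate mapping along the W axis: a linearly filtered fetch at coordinate w blends slices floor(w*N - 0.5) and its neighbor. Sampling at exact slice centers, w = (i + 0.5)/N, gives the neighbor zero weight, so with careful coordinates (or nearest filtering on W) the bleed can be avoided:

```python
import math

def w_blend(w, num_slices):
    """Which two slices a linearly filtered 3D texture fetch blends along W,
    and the weight given to the second one (OpenGL-style texel centers)."""
    f = w * num_slices - 0.5
    i = math.floor(f)
    return i, i + 1, f - i

N = 8
print(w_blend((3 + 0.5) / N, N))  # (3, 4, 0.0): dead center -> no bleed
print(w_blend(0.45, N))           # blends slices 3 and 4, weight ~0.1
```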
|
# ? Jun 6, 2014 00:42 |
|
Suspicious Dish posted:Hell yes. Unaligned accesses are slowwwwww... Spite posted:Vectors should be 16byte aligned. Can you elaborate on why it's faster to use 16-byte aligned elements? I tested this out with a simple read-modify-write shader with linear memory access. It's about 10% slower (old GTS 250, gonna try some others too) to do unaligned, which I wouldn't qualify as "slowwwwww" but it's still significant. I imagined linear access would result in a chunk of memory being requested or scheduled, and if it were packed that would result in less bandwidth. Also, if it were reading it in as a chunk, the alignment of the elements wouldn't matter, would it? Sex Bumbo fucked around with this message at 19:33 on Jun 16, 2014 |
# ? Jun 16, 2014 19:23 |
|
Mata posted:Thanks for your help, my reply is really late but I hadn't had time to revisit this problem until now. The geometry itself is actually in the form of triangle strips, so I figure I actually need 4 different textures per vertex. I'm actually a little torn between using instancing and stuffing the texture data into the per-instance data, or putting the texture data per-vertex. Volume textures would work fine, just remember if you go down a mip you're losing half your layers. Also make sure they're available since you're using DX9 and presumably working with old hardware.
|
# ? Jun 16, 2014 20:31 |
|
Sex Bumbo posted:Can you elaborate on why it's faster to use 16 byte aligned elements? I tested this out with a simple read-modify-write shader with linear memory access. It's about 10% slower (old GTS 250, gonna try some others too) to do unaligned which I wouldn't qualify as "slowwwwww" but it's still significant. The float4 alignment requirement is probably due to how GPU memory accesses are coalesced for neighboring addresses.
|
# ? Jun 16, 2014 21:35 |
|
High Protein posted:The float4 alignment requirement is probably due to how GPU memory accesses are coalesced for neighboring addresses. That shouldn't matter though, because it coalesces the access from all threads in a group, right? Like if each thread accesses 12 bytes and the group is 32 threads big, the thread group needs 384 bytes. I don't know how much it would coalesce, but 384 bytes would be 16-byte aligned. I think this is the correct answer, maybe? It's for CUDA but presumably the rules should be the same for OpenGL on an nvidia card. http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#device-memory-accesses posted:Global memory instructions support reading or writing words of size equal to 1, 2, 4, 8, or 16 bytes. Any access (via a variable or a pointer) to data residing in global memory compiles to a single global memory instruction if and only if the size of the data type is 1, 2, 4, 8, or 16 bytes and the data is naturally aligned (i.e., its address is a multiple of that size). Sex Bumbo fucked around with this message at 05:00 on Jun 17, 2014 |
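For what it's worth, a toy model of what a warp actually touches (hypothetical 32-thread warp, 128-byte cache lines): packed 12-byte elements cover fewer lines than padded 16-byte ones, so raw bandwidth isn't the problem. The cost, per the CUDA doc quoted above, is that 12 isn't a supported load size, so each element compiles to three scalar loads instead of one vector load — which fits a modest ~10% penalty better than "slowwwwww":

```python
def lines_touched(stride, count, line=128, base=0):
    """Distinct cache lines covered when `count` consecutive elements of
    `stride` bytes are read starting at `base`."""
    lines = set()
    for i in range(count):
        start = base + i * stride
        end = start + stride - 1
        lines.update(range(start // line, end // line + 1))
    return len(lines)

WARP = 32
print(lines_touched(12, WARP))  # 3 lines for packed 12-byte xyz
print(lines_touched(16, WARP))  # 4 lines for padded 16-byte xyzw

# Loads must be 1/2/4/8/16 bytes and naturally aligned, so:
print(12 // 4)    # 3 scalar load instructions per packed element
print(16 // 16)   # 1 vector load instruction per padded element
```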
# ? Jun 17, 2014 00:01 |
|
Yay! I didn't realize this thread existed. Maybe one of you fine goons could help me with an issue I've been having for the past couple days. Whenever I try to run an OpenGL 3.0 executable that compiles with no issue under linux, I get the following: code:
cat Makefile: Pastebin This is the same makefile that I can compile and run the project on other machines fine with. I should add that removing -L/usr/lib/nvidia/current/ has no effect in this case, but was previously necessary on another build. cat /proc/driver/nvidia/version: Pastebin glxinfo: Pastebin Jehde fucked around with this message at 19:16 on Jun 17, 2014 |
# ? Jun 17, 2014 19:04 |
|
Try the solution here? https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-319/+bug/1248642
|
# ? Jun 17, 2014 20:13 |
|
Okay, I replicated their test case and had the exact same symptoms, and applying the fix of running "export LD_PRELOAD=/usr/lib/mesa-diverted/x86_64-linux-gnu/libGL.so.1" before compiling worked for the test case, but when I tried to do the same for my actual project, I get a new error that I can't figure out. So, progress? code:
ldd of project executable: Pastebin EDIT: Nevermind, my whole Debian install went kaput again. I'll probably be back once I get another distro up and running. Jehde fucked around with this message at 22:40 on Jun 17, 2014 |
# ? Jun 17, 2014 22:07 |
|
Read further into the bug; you want to preload libpthread, not libGL.
|
# ? Jun 18, 2014 02:11 |
|
I just managed to get it working on Debian 7.5 with relatively little trouble, surprisingly. I sacrifice being able to use Steam on linux, but my main purpose for the linux install is development, so. Thanks for the help anyways though.
|
# ? Jun 18, 2014 02:18 |
|
Sex Bumbo posted:Can you elaborate on why it's faster to use 16 byte aligned elements? I tested this out with a simple read-modify-write shader with linear memory access. It's about 10% slower (old GTS 250, gonna try some others too) to do unaligned which I wouldn't qualify as "slowwwwww" but it's still significant. It depends on the hardware. If the hardware only has vector units, then any unaligned values have to be read/masked/swizzled to get them in the right place. More hardware is scalar now, but it's still easiest to think of GPUs as consuming aligned vectors. OpenCL compiles down to CUDA on nv hardware. I'm not sure what they do with OpenGL compute.
|
# ? Jun 20, 2014 07:48 |
|
Has anyone ever toyed around with Ramamoorthi's article An Efficient Representation for Irradiance Environment Maps? I have a radiance probe I want to convert to irradiance with the spherical harmonics technique; I've started writing a Python program to implement the technique in the article, but I'm pretty sure I'm doing something wrong. If anybody has advice or even an example implementation I'd be forever grateful!
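In case it helps, here's a minimal pure-Python sketch of the pipeline from the paper: brute-force-project the radiance into the 9 SH coefficients, then weight them with the Lambertian convolution terms (Â0 = π, Â1 = 2π/3, Â2 = π/4). The `radiance` callback is a hypothetical stand-in for your probe; in practice you'd replace it with a lookup into your environment map:

```python
import math

def sh9(x, y, z):
    """The nine real spherical harmonic basis functions Y_lm, l <= 2,
    evaluated at a unit direction (x, y, z)."""
    return [
        0.282095,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ]

# Lambertian convolution weights from the paper: pi, 2pi/3 (x3), pi/4 (x5)
A_HAT = [math.pi] + [2.0 * math.pi / 3.0] * 3 + [math.pi / 4.0] * 5

def project(radiance, n_theta=64, n_phi=128):
    """Brute-force projection of a radiance function onto 9 SH
    coefficients, integrating over a lat-long grid of the sphere."""
    coeffs = [0.0] * 9
    dt, dp = math.pi / n_theta, 2.0 * math.pi / n_phi
    for i in range(n_theta):
        theta = (i + 0.5) * dt
        for j in range(n_phi):
            phi = (j + 0.5) * dp
            x = math.sin(theta) * math.cos(phi)
            y = math.sin(theta) * math.sin(phi)
            z = math.cos(theta)
            w = math.sin(theta) * dt * dp  # solid angle of this grid cell
            L = radiance(x, y, z)
            for k, Y in enumerate(sh9(x, y, z)):
                coeffs[k] += L * Y * w
    return coeffs

def irradiance(coeffs, x, y, z):
    """E(n) = sum over lm of A_hat_l * L_lm * Y_lm(n)."""
    return sum(a * c * Y for a, c, Y in zip(A_HAT, coeffs, sh9(x, y, z)))

# Sanity check: a constant unit environment must give E = pi everywhere.
c = project(lambda x, y, z: 1.0)
print(irradiance(c, 0.0, 0.0, 1.0))  # ~3.14159
```

The constant-environment check is a good first thing to verify in your own code: if you don't get π back, the basis normalization, the Â weights, or the solid-angle weighting is off.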
|
# ? Jul 9, 2014 16:54 |
I'm trying to logic my way through a shader or two. In summary, I want object A to be visible only if seen through object B. Imagine object A is a skeleton which I've rigged to match the interior of a character, and object B is X-Ray glass (a la that scene from the original Total Recall). I had some luck using a depth buffer trick, but that made the materials visible when behind _any_ material. Everything else should still render as normal behind the glass.
|
|
# ? Jul 19, 2014 19:04 |
|
Are you using the stencil buffer? That's exactly what it's for.
|
# ? Jul 19, 2014 19:56 |
haveblue posted:Are you using the stencil buffer? That's exactly what it's for. Ah, I'm sorry. I should have said I was using Unity Free, so I can't use the stencil buffer.
|
|
# ? Jul 19, 2014 21:30 |
|
Jo posted:I'm trying to logic my way through a shader or two. Summarily, I want object A to be visible only if seen through object B. Imagine object A is X-Ray glass and object B is a skeleton which I've rigged to match the interior of a character (a la that scene from the original Total Recall). I had some luck using a depth buffer trick, but that made the materials visible when behind _any_ material. Everything else should still render as normal behind the glass. Your depth trick probably was to not have the skeleton test depth; an alternative would be to not have the material write depth, however this would mean everything is visible through the material (as opposed to the skeleton visible through everything). You could always fake the stencil buffer by either rendering to texture in a separate pass or by using multiple render targets, drawing the skeleton using a special color. Then you can bind this texture as a shader input when drawing the wall and use it just like you'd do the stencil buffer, discarding pixels using the hlsl clip or discard function when you detect your special marker value. An advantage of rendering to a texture is that instead of the hard stencil pass/fail you could use the information to adjust the wall's alpha. High Protein fucked around with this message at 22:20 on Jul 19, 2014 |
# ? Jul 19, 2014 22:17 |
|
Colonel J posted:Has anyone ever toyed around with Ramamoorthi's article An Efficient Representation for Irradiance Environment Maps ? I have a radiance probe I want to convert to irradiance with the spherical harmonics technique; I've started writing a Python program to implement the technique in the article but I'm pretty sure I'm doing it the wrong thing. If anybody had advice or even an example of implementation I'd be forever grateful! A word of caution though: the "1% error" thing is deceptive. It's very low-error when you're integrating a shitload of mostly-similar contributions from a large number of directions, like an omnidirectional environment probe or GI, because the error gets flattened out by the monotony of the environment itself. It is NOT low-error when dealing with sources that come from a narrow angle range, or in the worst case, a single direction. It can suffer from ring artifacts in those cases due to the intensity being slightly positive at what should be the horizon, then going negative, and then going slightly positive again on the opposite side of the object. In other words, it's good for an indirect lighting probe; it's not very well-suited to integrating primary light sources.
|
# ? Jul 21, 2014 03:45 |
|
I am having a lot of trouble trying to wrap my head around modern OpenGL. I am trying to create a simple sprite sheet class, but I had a hard time trying to find answers to some structure-related questions with Google. Maybe someone could help?

If I have understood correctly, a single sprite sheet would need its own VAO, VBO (the EBO could be reused?), a shader program and an array of vertex data (just coordinates?). Before drawing you would determine the clips to match the texture coordinate range (0.0f - 1.0f). Should the coordinates of the rectangle be the same as the largest clip for a sprite, or should the size of the rectangle change? After that you would control the location/scale/rotation and the texture coordinates of the sprite with uniforms.

When drawing things you would first bind the sprite's VAO and then go through all instances of that sprite like this: glBindVertexArray -> glUniform (set the location and frame of sprite) -> glDraw -> glUniform (new location and frame) -> glDraw... and so on until everything that uses that sprite has been drawn.

Any advice on where to look for OpenGL tutorials/articles/books/sources that tell you good practices for planning the rendering structure? Most modern OpenGL tutorials on the internet don't really help much with this.
|
# ? Aug 5, 2014 16:58 |
High Protein posted:Your depth trick probably was to not have the skeleton test depth; an alternative would be to not have the material write depth, however this would mean everything is visible through the material (as opposed to the skeleton visible through everything). Sorry for the delayed response. Other projects piled up. Unity Free doesn't support render-to-texture, either. I guess I've gotta stick with the depth hack. Thanks for the advice, though. I'll try and make it happen if I switch to Unreal.
|
|
# ? Aug 7, 2014 02:28 |
|
I have a list of points in 3D forming a convex hull. How do I generate a triangle mesh from this fastest/easiest?
|
# ? Aug 19, 2014 11:08 |
|
Boz0r posted:I have a list of points in 3D forming a convex hull. How do I generate a triangle mesh from this fastest/easiest? Does this help? It's a list of techniques for isosurface extraction, taking a bunch of points in space and forming a mesh around them.
|
# ? Aug 19, 2014 13:27 |
|
Boz0r posted:I have a list of points in 3D forming a convex hull. How do I generate a triangle mesh from this fastest/easiest? Look at QHull (a quickhull implementation). Once you have each face, as each face is well formed, you can use Delaunay triangulation to produce your triangles. Hey presto, a mesh made of triangles.
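Since every face quickhull hands back is a convex polygon, a plain triangle fan is also enough (full Delaunay is optional there). A sketch, assuming each face arrives as an ordered loop of vertex indices:

```python
def fan_triangulate(face):
    """Split a convex polygon (an ordered list of vertex indices) into
    triangles by fanning out from its first vertex."""
    v0 = face[0]
    return [(v0, face[i], face[i + 1]) for i in range(1, len(face) - 1)]

# e.g. one quad face of a hull, wound counter-clockwise
print(fan_triangulate([4, 7, 2, 9]))  # [(4, 7, 2), (4, 2, 9)]

# A convex n-gon always yields n - 2 triangles:
print(len(fan_triangulate(list(range(6)))))  # 4
```

Keeping the face's winding order in the fan preserves consistent outward-facing normals across the hull.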
|
# ? Aug 19, 2014 22:20 |
|
Cross posted this to the iOS thread since it's only occurring on those devices, but it's OpenGL related so it definitely belongs here too. Firstly, I have some skeletal vertex shader code that is flickering the triangles whenever something moves. It looks as if the culling is suddenly backwards, but I have culling turned off and it's still occurring. The vertex data is contained in static VBOs that are specified once and never change, so I really doubt it's a matter of the vertex data itself getting overwritten, either. The triangles attached to bones that don't move in a given frame don't have this problem, and I'm reasonably sure the only thing changing frame to frame is the bone matrices and the overall clip matrix (and sometimes not even the latter, if I just leave a given object sitting in one spot, unmoving but animating). I doubt it's the shader, either, because I got identical (so far as I could tell, anyway) results when I switched the shader off and used CPU-side bone calculations instead. The part that has me most confused is that this is ONLY occurring so far on iOS 7.x devices (tested an iPad 2, iPad 3, and iPod Touch 5). The same code displays correctly on Android, OSX, Windows, iOS 5.x (I don't have any iOS 6 or 8 devices to test so I don't know on that either way), and the iOS simulator. Secondly, the same overall code base, on iOS 5, is stalling out pretty badly. According to the Instruments tool it's losing a lot of time in glInvalidateFramebuffer (about 6ms per frame), which is a function I'm not even calling directly, so I figure it must be part of the frame blit, and maybe it just thinks the function is stalling because that's where it's actually doing all of the command flushes. No other per-frame operations are taking any significant length of time, everything else in the top 10 in terms of "average time consumed" is one-off things like texture loads. 
This is happening on an iPad 3 with 5.5.1, and NOT an iPad 3 with 7.x, so I almost want to think it's an OS/Driver problem. But that doesn't really help me fix the issue since I don't really want to require people to update past 5 just to fix a weird performance problem that shouldn't even be occurring. Anybody have any ideas on how to solve/investigate either one of these problems?
|
# ? Aug 20, 2014 06:21 |
|
Thanks for the links. Now I'm looking for a simple OpenGL framework to visualise the hulls in 2D and 3D.
|
# ? Aug 25, 2014 10:57 |
|
Is it possible for the clip region to become misaligned when using glm::ortho? I used ortho to place the origin in the upper left corner like so: glm::ortho(0, 1920, 1080, 0). And this matrix is multiplied with the vertex position. The problem is that things that are near the left and top border do not get drawn when a single pixel of that image gets outside the screen. Bottom and right work as intended. I also tried it with glm::ortho(1000, 2920, 2080, 1000), to move the pictures near the centre, but even then the images in the top and left would get cut off even if they weren't near the edge of the screen. I am using SDL2 for the window/context, if that matters.
|
# ? Sep 17, 2014 21:39 |
|
I have no idea if this relates to your problem, but try using all floats as glm::ortho's arguments. I had a huge bug one time and THAT was the cause.
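For anyone curious why the all-int version misbehaves: glm deduces its value type from the arguments, and (in glm versions of that era, at least) all-int arguments make scale terms like 2/(right - left) integer divisions that truncate to zero, collapsing the matrix. Mimicking the C++ arithmetic in Python:

```python
def ortho_scale_x(left, right):
    """The m[0][0] term of an orthographic matrix: 2 / (right - left).
    Using // mimics what C++ does when the deduced type is int."""
    if isinstance(left, int) and isinstance(right, int):
        return 2 // (right - left)   # integer division, like T = int in glm
    return 2.0 / (right - left)

print(ortho_scale_x(0, 1920))      # 0 -> everything collapses onto x = 0
print(ortho_scale_x(0.0, 1920.0))  # ~0.00104 -> correct scale
```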
|
# ? Sep 18, 2014 02:41 |
|
Sorry. They are floats in the actual code. I just didn't remember that when I posted.
|
# ? Sep 18, 2014 08:56 |
I'm taking an introductory graphics course this semester that uses OpenGL (it uses the Angel book, which I assume some people here are familiar with.) One assignment this week is to reproduce the effects of applying a translation matrix and a rotation around the Y-axis by -120 degrees (e.g. view = RotateY(-120)*Translate(-4,-1,-1)) using a LookAt() function that takes an eye position, a point to look at and a normalized up-vector. Eye position and up-vector were intuitively (4,1,1) and (0,1,0) respectively. However, for the look-at point we assumed we had to rotate it 120 degrees in the X-Z plane, so we had (4 + cos((2*M_PI)/3), 1, 1 + sin((2*M_PI)/3)). When this wasn't right we made an implementation where we could gradually increment the angle by M_PI/6 (or 30 degrees.) As it turns out we had to rotate the camera by (5*M_PI)/6 or 150 degrees, not 120. Could anyone shed some light on why this is the case? I can't really make it make any sense. Joda fucked around with this message at 15:41 on Sep 18, 2014
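A sanity check of where the 150 comes from: the camera looks down -z, so its world-space forward direction is transpose(RotateY(-120)) applied to (0,0,-1), i.e. RotateY(+120)·(0,0,-1). Writing the look direction as (cos θ, 0, sin θ) like the offset above does, -z sits at θ = -90°, and rotating the view by -120° moves the camera's heading another -120°, to -210° ≡ 150°. The 120° offset applies to the camera's heading, not to the θ = 0 direction you started from. A quick numeric check:

```python
import math

def rotate_y(deg):
    """3x3 rotation about the Y axis."""
    a = math.radians(deg)
    return [[math.cos(a), 0.0, math.sin(a)],
            [0.0, 1.0, 0.0],
            [-math.sin(a), 0.0, math.cos(a)]]

def mul(m, v):
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

# forward = transpose(RotateY(-120)) * (0,0,-1) = RotateY(120) * (0,0,-1)
fwd = mul(rotate_y(120), [0.0, 0.0, -1.0])
theta = math.degrees(math.atan2(fwd[2], fwd[0])) % 360.0
print(theta)  # ~150
```

The LookAt point is then eye + fwd = (4 + cos(150°), 1, 1 + sin(150°)), matching what you found empirically.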
|
# ? Sep 18, 2014 15:37 |
|
Any OpenGL people know what could be causing a thing like this? Close up, the seam between 2 pieces of geometry looks O.K.; far away it looks like poo poo and the geometry seems to be clipping through each other. Edit: fixed it with quote:12.040 Depth buffering seems to work, but polygons seem to bleed through polygons that are in front of them. What's going on? DoubleEdit: vvvvv Ahhhh if I knew it'd be that easy when I asked I would've asked days ago vvvvvv Tres Burritos fucked around with this message at 16:34 on Sep 18, 2014 |
# ? Sep 18, 2014 15:54 |
|
It's caused by a lack of precision in the depth buffer. Most of its precision is distributed very close to the near clip plane, so a good start is to move that forward as far as you can allow. There's been quite a lot written about this subject, search around a bit and you'll find some good ideas. Spatial fucked around with this message at 16:35 on Sep 18, 2014 |
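To put numbers on how lopsided that distribution is: window-space depth for an eye-space distance z follows the usual hyperbolic mapping d(z) = (1/z - 1/n) / (1/f - 1/n), normalized to [0,1] (near/far values below are illustrative). With n = 0.1 and f = 1000, half of all depth values are used up by z = 0.2; raising the near plane to 1.0 moves that halfway point out to z = 2:

```python
def window_depth(z, n, f):
    """Normalized depth in [0,1] for eye-space distance z, given near and
    far planes n and f (the standard hyperbolic perspective mapping)."""
    return (1.0 / z - 1.0 / n) / (1.0 / f - 1.0 / n)

print(window_depth(0.2, 0.1, 1000.0))   # ~0.50005: half the range gone by z=0.2
print(window_depth(2.0, 1.0, 1000.0))   # ~0.5005: near=1 moves that out to z=2
print(window_depth(10.0, 0.1, 1000.0))  # ~0.99: almost nothing left past z=10
```

That's why pushing the near plane out (rather than pulling the far plane in) is the effective lever: precision scales roughly with the near/far ratio.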
# ? Sep 18, 2014 16:33 |
|
|
|
Spatial posted:It's caused by a lack of precision in the depth buffer. Most of its precision is distributed very close to the near clip plane, so a good start is to move that forward as far as you can allow. Thanks man, got it just before your response. Anyways, I somehow got the opportunity to work on 3D stuff at work. I'm using WebGL and I've got (mostly) no problem making geometry, moving the camera around and other basic stuff. I'm even using vertex shaders and VBOs to distort geometry/vertices. The huge gap in my knowledge is really the fragment shader, i.e. actually applying lighting and whatnot to geometry. I've copied and pasted a couple of Phong shaders into my code and got them to sort of work (not really), but I don't feel like I'm grasping the main concepts. Can anyone recommend any resources? Books, articles, whatever?
|
# ? Sep 19, 2014 02:25 |