|
passionate dongs posted:edit: Is there somewhere to look for OpenGL best practices? Is this something the Red Book would have? There's really only the most basic of "best practices" out there. You can look at: http://developer.apple.com/library/mac/#documentation/GraphicsImaging/Conceptual/OpenGL-MacProgGuide and the more recent WWDC talks on it. Also keep in mind that it varies (sometimes quite heavily) by GPU vendor and platform.
|
# ¿ Sep 16, 2011 01:28 |
|
Paniolo posted:Why are you using separate OpenGL contexts for everything? If you are using multiple contexts you MUST use a separate context per thread. To answer the main question, yes, multiple contexts can share resources. Note that not everything is shared, like Fences (but Syncs are). Check the spec for specifics. If you are trying to do something like:
Context A, B, C are all shared
Context B modifies a texture
Context A and C get the change
That should work. Create context A. Create contexts B and C as contexts shared with A. That should allow you to do this. Keep in mind you've now created a slew of race conditions, so watch your locks! Also, you have to make sure commands have been flushed to the GPU on the current context before expecting the results to show up elsewhere. For example:
Context A calls glTexSubImage2D on Texture 1
Context B calls glBindTexture(1), Draw
This will have undefined results. You must do the following:
Context A calls glTexSubImage2D on Texture 1
Context A calls glFlush
Context B calls glBindTexture(1), Draw
On OSX, you must also call glBindTexture before changes are picked up; not sure about Windows. It's probably vendor-specific. And again, you need a context per thread. Do not try to access the same context from separate threads; only pain and suffering awaits. You probably want to re-architect your design, as it doesn't sound very good to me. Also, can't Qt pass you an opaque pointer? You can't hold a single context there and lock around its use? Or have a single background thread doing the rendering? Spite fucked around with this message at 03:24 on Nov 3, 2011 |
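For what it's worth, the flush-before-consume ordering looks roughly like this in code. This is just a sketch, not anything from the post: make_current_context_a/b and the lock functions are placeholders for whatever your platform layer and your own locking actually are.
code:
#include <GL/gl.h>

/* hypothetical helpers -- whatever your platform layer provides */
extern void make_current_context_a(void);
extern void make_current_context_b(void);
extern void lock_texture(void);
extern void unlock_texture(void);

GLuint shared_tex;   /* created once, visible to both shared contexts */

void producer_thread_update(const void *pixels, int w, int h)
{
    make_current_context_a();
    lock_texture();
    glBindTexture(GL_TEXTURE_2D, shared_tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                    GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    glFlush();       /* required: push the commands to the GPU before
                        another context consumes the result */
    unlock_texture();
}

void consumer_thread_draw(void)
{
    make_current_context_b();
    lock_texture();
    glBindTexture(GL_TEXTURE_2D, shared_tex);  /* re-bind so the change is picked up */
    /* ... issue draw calls that sample shared_tex ... */
    unlock_texture();
}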
# ¿ Nov 3, 2011 03:22 |
|
shodanjr_gr posted:That's what I ended up doing. I create context on application launch and then any other contexts that are initialized share resources with that one context. This is overly complicated. Have you tried a single OpenGL context that does all your GL rendering and passes the result back to your widgets? The widgets can update themselves as the rendering comes back. Remember: there's only one GPU, so multiple threads will not help with the actual rendering. Also, GPU and CPU resources are separate from each other unless you are using CLIENT_STORAGE or just mapping the buffers and using them CPU-side. You can track what needs to be uploaded to the GPU by just adding dirty bits and setting them. Multiple threads should not be trying to update the same GPU object at once in general - that gets nasty very fast.
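As a rough sketch of the dirty-bit idea (not shodanjr_gr's actual code; the struct and field names are made up, and it assumes tightly packed RGBA8 data):
code:
#include <GL/gl.h>
#include <stdbool.h>

typedef struct {
    GLuint         tex;      /* GPU object, owned by the single GL thread */
    unsigned char *pixels;   /* CPU copy, protected by your own mutex */
    int            w, h;
    bool           dirty;    /* set by whoever edits pixels */
} Image;

/* Called only on the GL thread, e.g. once per frame or per render request. */
void sync_image(Image *img)
{
    if (!img->dirty)
        return;
    glBindTexture(GL_TEXTURE_2D, img->tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, img->w, img->h,
                    GL_RGBA, GL_UNSIGNED_BYTE, img->pixels);
    img->dirty = false;
}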
|
# ¿ Nov 6, 2011 00:43 |
|
shodanjr_gr posted:You mean as in having a single class that manages the context and rendering requests get posted to that class which does Render-To-Texture for each request then returns the texture to a basic widget for display? That's how I'd approach it myself. Just post requests to this thread/object and have it pass back whatever is requested or necessary for the UI to draw. I'm making quite a few assumptions because I don't know exactly what your architecture needs to be, but it will vastly simplify your locking and GPU handling to do it that way. As for the second part, do you think multiple threads will be accessing the same CPU object simultaneously very often? You can always treat your CPU-side data as more disposable - take copies of things and pass those around/modify the copy instead of locking. Your program's needs will better determine which approach is more efficient.
|
# ¿ Nov 7, 2011 08:34 |
|
Yikes, that's hard. It could be one of approximately a million things. I'm not familiar with SlimDX, but it's quite possible it's something with that? Do other 3D apps and games run fine? Have you tried writing a simple C D3D or OpenGL app that does the same thing?
|
# ¿ Nov 9, 2011 04:43 |
|
OneEightHundred posted:Yes, you could do that. You'd still want to use GetProcAddress for anything that you weren't sure would be available (i.e. calls exposed by extensions). Not really. What's happening is that wglGetProcAddress asks the driver for its implementations of the functions. So you'd need a .lib for each driver and probably each build of each driver. Then you'd need a separate exe for each and it would be nasty. This is also why you have to create a dummy context, get the new context creation function and create a new context if you want to use GL3+. It sucks.
|
# ¿ Nov 15, 2011 01:16 |
|
SAHChandler posted:I had a feeling this is what was being done. (especially since I did a dumpbin to a .def file of the ATI drivers, and only about 24 functions are actually present within the opengl driver dll. some of which are for egl ) There's an extension for ES2 compatibility, which is probably why they are there. As for OpenCL: also keep in mind that OpenCL is driven by Apple, which controls its driver stack and can release headers that work with all its parts.
|
# ¿ Nov 16, 2011 02:47 |
|
A sampler2D is essentially a number that corresponds to a texture unit. Most GPUs have 16 these days. ie:
glActiveTexture(GL_TEXTURE0)
glBindTexture(GL_TEXTURE_2D, tex)
will bind the texture 'tex' to unit 0. Then you set the sampler to 0 and it will sample from that texture. As for defaults, it will depend on the implementation. It should default to 0, but I'd check the spec to be sure.
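In code, the full unit/sampler plumbing looks something like this (a sketch; 'u_diffuse' is just a made-up uniform name):
code:
#include <GL/gl.h>

void bind_diffuse(GLuint program, GLuint tex)
{
    glActiveTexture(GL_TEXTURE0);       /* select texture unit 0 */
    glBindTexture(GL_TEXTURE_2D, tex);  /* put 'tex' on unit 0 */

    /* the sampler2D uniform holds the unit index (0), not the texture name */
    GLint loc = glGetUniformLocation(program, "u_diffuse");
    glUniform1i(loc, 0);
}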
|
# ¿ Nov 22, 2011 00:56 |
|
If you want to do some cross-referencing between GL and DX, try: http://developer.apple.com/graphicsimaging/opengl/capabilities/ I'd hope everything relatively recent/worth developing for supports depth reads.
|
# ¿ Nov 22, 2011 22:36 |
|
OneEightHundred posted:Yeah looks like I was confusing the X1000-series' lack of PCF with what looks like pre-PS2.0 inability to sample depth textures as floats. The ATI X1xxx series doesn't let you filter float textures in hardware, from what I recall. There's always room for a dirty shader hack though!
|
# ¿ Nov 23, 2011 01:09 |
|
zzz posted:I haven't touched GPU stuff in a while, but I was under the impression that static branches based on global uniform variables will be optimized away by all modern compilers/drivers and never get executed on the GPU, so it wouldn't make a significant difference either way...? You'd hope so, but I wouldn't assume that! The vendors do perform various crazy optimizations based on the data. I've seen a certain vendor attempt to optimize 0.0 passed in as a uniform by recompiling the shader and making that a constant. Doesn't always work so well when those values are part of an array of bone transforms, heh. Basically, you don't want to branch if you can avoid it. Fragments are executed in groups, so if you have good locality for your branches (ie, all the fragments in a block take the same branch) you won't hit the nasty case where the group has to execute both sides of the branch.
|
# ¿ Nov 24, 2011 08:18 |
|
nolen posted:Cross-posting this from the Mac OS/iOS Apps thread. Split your quad into 2 quads. Then look into an affine transform. Also cocos2D is one of the worst pieces of software known to man (not helpful, I know...)
|
# ¿ Dec 1, 2011 21:14 |
|
The default OpenGL implementation in Win32 doesn't have all the entry points. All the modern stuff is requested from the driver via wglGetProcAddress. So you can break on that and see what it returns. Or you can use one of the various tracing interposers to get a call trace and see what it's doing. When I've done similar stuff to what you are describing, I've taken a CRC of the texture data when it's passed in to glTexImage2D and recorded the id that's bound to that unit. Then you can store that away and do whatever you want when it's bound again.
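The CRC trick is roughly the following shape (not the original code; crc32_of, current_bound_texture and remember_texture_crc are hypothetical helpers you'd write yourself, and the sketch assumes tightly packed RGBA8 uploads):
code:
#include <GL/gl.h>
#include <stdint.h>
#include <stddef.h>

extern uint32_t crc32_of(const void *data, size_t len);
extern GLuint   current_bound_texture(void);   /* tracked via your glBindTexture hook */
extern void     remember_texture_crc(GLuint name, uint32_t crc);

void hook_glTexImage2D(GLenum target, GLint level, GLint internalformat,
                       GLsizei width, GLsizei height, GLint border,
                       GLenum format, GLenum type, const void *pixels)
{
    if (pixels) {
        size_t len = (size_t)width * height * 4;   /* RGBA8 assumption */
        remember_texture_crc(current_bound_texture(), crc32_of(pixels, len));
    }
    /* call through to the real entry point */
    glTexImage2D(target, level, internalformat, width, height, border,
                 format, type, pixels);
}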
|
# ¿ Dec 28, 2011 07:46 |
|
OneEightHundred posted:I believe VBOs are mandatory for all geometry in the forward-compatibility contexts. D3D switched to buffer-only ages ago, at least circa D3D8. Yes, use VBOs for sure. VAR (vertex array range) is probably super stale code in most drivers these days. Note: calling glBufferSubData with NULL doesn't discard the buffer. You should use glBufferData with NULL to get that orphaning behavior. glBufferSubData is specified to update the existing storage in place, never to reallocate it. You can also map with the unsynchronized flag (glMapBufferRange) to get nonblocking behavior.
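A minimal sketch of the orphaning pattern (not from the post; assumes a streaming VBO that gets rewritten every frame):
code:
#include <GL/gl.h>

void update_dynamic_vbo(GLuint vbo, const void *verts, GLsizeiptr size)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);

    /* Orphan: same size/usage, NULL data. The old storage is detached, any
       in-flight draws keep reading it, and the driver hands you a fresh block. */
    glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_STREAM_DRAW);

    /* Now fill the fresh storage. */
    glBufferSubData(GL_ARRAY_BUFFER, 0, size, verts);
}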
|
# ¿ Jan 11, 2012 21:17 |
|
Bisse posted:So the OpenGL driver may not implement the only currently allowed way to render? It's a bit more complicated and nasty than that. In Windows, the OS only implements OpenGL 1.1 or something like that. So that's all you are guaranteed to have. Everything else has to go through wglGetProcAddress, which queries the driver and returns a function pointer to the function. To get a modern (3+) OpenGL context, you have to call wglCreateContextAttribsARB. Which is not part of the old OpenGL, so you have to actually create an old context, call wglGetProcAddress to get the new creation function and _then_ create your real context. It's a disaster. And glBegin/End should absolutely be banned. One of OGL's biggest issues is that it has a billion ways to do things, but only one of them is fast, and the others don't tell you they are slow. As was said earlier, a more friendly API could be built on top of OGL to do similar stuff. Begin/End and fixed function are really out-of-date ways of thinking about modern GPUs and graphics - it may be user-friendly but it has nothing to do with how the hardware works or how you should organize your rendering. A good low-level graphics API should only have fast paths. The ARB is attempting to remove the slow crap. Unfortunately they'll never succeed because there are too many apps and people that are using the old stuff and not adopting the new stuff.
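The dance looks like this in practice. This is a sketch with all error handling stripped, assuming you already have a valid HDC with a pixel format set and that <GL/wglext.h> provides the ARB tokens and function-pointer typedef:
code:
#include <windows.h>
#include <GL/gl.h>
#include <GL/wglext.h>   /* PFNWGLCREATECONTEXTATTRIBSARBPROC, WGL_CONTEXT_* */

HGLRC create_gl3_context(HDC dc)
{
    /* 1. old-style context just so wglGetProcAddress works */
    HGLRC temp = wglCreateContext(dc);
    wglMakeCurrent(dc, temp);

    /* 2. fetch the modern creation entry point from the driver */
    PFNWGLCREATECONTEXTATTRIBSARBPROC wglCreateContextAttribsARB =
        (PFNWGLCREATECONTEXTATTRIBSARBPROC)
        wglGetProcAddress("wglCreateContextAttribsARB");

    /* 3. create the real core context and throw the temporary one away */
    const int attribs[] = {
        WGL_CONTEXT_MAJOR_VERSION_ARB, 3,
        WGL_CONTEXT_MINOR_VERSION_ARB, 2,
        WGL_CONTEXT_PROFILE_MASK_ARB,  WGL_CONTEXT_CORE_PROFILE_BIT_ARB,
        0
    };
    HGLRC real = wglCreateContextAttribsARB(dc, NULL, attribs);

    wglMakeCurrent(NULL, NULL);
    wglDeleteContext(temp);
    wglMakeCurrent(dc, real);
    return real;
}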
|
# ¿ Jan 20, 2012 10:49 |
|
PalmTreeFun posted:This is a pretty basic math theory question, but I'm taking a graphics class right now and I need a little help. Long story short, my teacher isn't the best at speaking English, much less at explaining things, and I had to go read the book just to figure out what convolution and reconstruction was. I have that figured out, but what I can't wrap my head around is resampling. You know, scaling images/resampling audio. I understand what it's supposed to do, but I don't quite get the math. How's your math? Convolution has different meanings depending on which domain you are in. The easiest way to think about it is as the overlap between two functions. Or you can think about it as multiplying every data point in function A by every data point in function B and adding the results together. Of course, this really isn't feasible in real time so you use a small 'kernel' as the second function. Take for example a Gaussian blur. You'll typically see something like this:
0.006 0.061 0.242 0.383 0.242 0.061 0.006
That's the kernel; that's function B. Your set is
0, 1, 4, 5, 3, 5, 7
Let's say I want to find the blurred value of the middle element, which is 5:
0*0.006 + 1*0.061 + 4*0.242 + 5*0.383 + 3*0.242 + 5*0.061 + 7*0.006
You repeat that for each value in your set to get the convolved set, ie
f(x-3)*0.006 + f(x-2)*0.061 + f(x-1)*0.242 + f(x)*0.383 + f(x+1)*0.242 + f(x+2)*0.061 + f(x+3)*0.006
That's for a 1D convolution; it can be extended to any number of dimensions. If you are curious about the math, you should probably take a class on signal processing as it gets quite complex. Or am I explaining the wrong thing?
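In code the whole thing is just two nested loops. A sketch (clamping at the borders; nothing here is from the post except the numbers):
code:
/* Convolve in[0..n-1] with kernel[0..klen-1] (klen assumed odd) into out. */
static void convolve_1d(const float *in, float *out, int n,
                        const float *kernel, int klen)
{
    int half = klen / 2;
    for (int i = 0; i < n; ++i) {
        float sum = 0.0f;
        for (int k = 0; k < klen; ++k) {
            int j = i + k - half;        /* neighbour index */
            if (j < 0)  j = 0;           /* clamp at the borders */
            if (j >= n) j = n - 1;
            sum += in[j] * kernel[k];
        }
        out[i] = sum;
    }
}

/* usage with the Gaussian kernel and data set above */
static const float kernel[7] = { 0.006f, 0.061f, 0.242f, 0.383f,
                                 0.242f, 0.061f, 0.006f };
static const float data[7]   = { 0, 1, 4, 5, 3, 5, 7 };
/* convolve_1d(data, blurred, 7, kernel, 7); blurred[3] is the example above */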
|
# ¿ Feb 9, 2012 23:30 |
|
PalmTreeFun posted:I think so. I understand the part you explained already, but basically what I want to know is how scaling/reconstructing a sound/image works. Like, you convert a set of discrete data to a continuous function somehow, and you can use a kernel (thanks for explaining what that was, I didn't know that that and the "filter" were the same thing, this teacher really sucks at explaining things) to extrapolate new, "in-between" data. It's kind of odd they'd have you do this without giving you the background theory around filtering and time domain vs frequency domain. Forgive me if I'm going a bit overboard with the explanation. The Fourier transform will convert a function in the time domain into its equivalent in the frequency domain. The easiest way to visualize this is to think about a sine wave. Its Fourier transform is simply two peaks, one at the positive frequency and one at the negative frequency of the wave. Now, every function (at least the ones you'll be interested in) has a transform that converts it to this domain. Why is this interesting? Because you can multiply two functions together in the frequency domain to apply a filter. For example, a box function can be a lowpass filter. However, this is a problem since you need all of both functions in order to do the transform. So we would like to apply them in the time domain as the data is fed in to us. A multiplication in the frequency domain corresponds to convolution in the time domain. So we can take the filter we are interested in and convert it to the time domain via the inverse Fourier transform. Then we can do the operation with the kernel we get in the time domain. The box/triangle filters are in the frequency domain. If you want to apply them to the dataset you need to apply the inverse Fourier transform and convolve that with the set. The box filter's inverse Fourier transform is the sinc function, so we can make a kernel out of sin(x)/x and use that if we wanted to. Gaussian is a special case since the Fourier transform of a Gaussian is also a Gaussian. The splines are slightly different because they reconstruct points on a path. You can think of something like Catmull-Rom, etc as simply a function F(t) that happens to pass through the control points.
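If it helps to see the reconstruction step concretely, here's a sketch of resampling a 1D signal with a truncated sinc kernel (not from the post; a real resampler would window the sinc, normalize the weights, and widen the kernel when downsampling to avoid ringing and aliasing):
code:
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static double sinc(double x)
{
    if (x == 0.0) return 1.0;
    double px = M_PI * x;
    return sin(px) / px;
}

/* Resample in[0..n_in-1] to out[0..n_out-1]. */
static void resample(const double *in, int n_in, double *out, int n_out)
{
    double ratio  = (double)n_in / (double)n_out;  /* output index -> input position */
    int    radius = 4;                             /* how many neighbours to keep */

    for (int i = 0; i < n_out; ++i) {
        double x   = i * ratio;     /* where this output sample falls in the input */
        double sum = 0.0;
        int    j0  = (int)floor(x) - radius;
        for (int j = j0; j <= j0 + 2 * radius; ++j) {
            if (j < 0 || j >= n_in) continue;
            sum += in[j] * sinc(x - j);   /* weight = reconstruction kernel at that offset */
        }
        out[i] = sum;
    }
}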
|
# ¿ Feb 13, 2012 21:58 |
|
Contero posted:In gl 3.1+, do vertex array objects give a performance benefit over manually binding VBOs? What is the advantage of using them? You have to use VAOs in GL 3+ core profile contexts. In legacy contexts they should ideally provide a performance benefit. However, most drivers don't implement them in a way that makes a noticeable difference.
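For reference, the bare-bones setup looks like this (a sketch; assumes a core-profile context, an extension loader already initialized, and that attribute 0 is your position):
code:
#include <GL/gl.h>

GLuint make_mesh_vao(const float *verts, GLsizeiptr bytes)
{
    GLuint vao, vbo;

    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);        /* all the attribute state below lands in 'vao' */

    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, bytes, verts, GL_STATIC_DRAW);

    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void *)0);

    glBindVertexArray(0);
    return vao;    /* at draw time: glBindVertexArray(vao); glDrawArrays(...); */
}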
|
# ¿ Feb 26, 2012 06:51 |
|
Firstly, what has changed since it used to work? What OS, HW, driver, etc? Do other gl calls work? Can you clear the screen to a color? Can you draw a simple quad? Do other glGet* calls work? Can you call glGetString(GL_RENDERER) or GL_VERSION or GL_EXTENSIONS? Also it occurs to me that you may not be creating the context correctly in the dynamically linked case. How is the context created? Are you using glXGetProcAddress, etc? Does that return valid function pointers?
|
# ¿ Feb 28, 2012 07:19 |
|
Sounds like you don't have a context. I'm not sure why that would occur and I don't have much experience with X11. Try calling glXGetProcAddress and seeing if it returns valid pointers. Where does the code create your context and how does it do so? Look for glXCreateContext or glXCreateNewContext.
|
# ¿ Mar 1, 2012 07:46 |
|
That makes sense, the Mesa lib has its own GL implementation. Fun stuff.
|
# ¿ Mar 2, 2012 01:51 |
|
Also, are you sure the Quadro actually implements the Tessellation Shaders in hardware? Sometimes they don't have full implementations. Or they fall back to software for stupid reasons.
|
# ¿ Mar 21, 2012 01:59 |
|
baka kaba posted:I have a couple of newbie questions, if anyone can help. So, this is old but I can perhaps shed some light on it. Firstly, calling glActiveTexture with anything above GL_TEXTURE0 + GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS - 1 should be an error. It is a bug in the implementation if you can set, say, texture unit 8. You are probably just resetting the same unit over and over (which would be the one you specified with the last call to glActiveTexture that succeeded). You're misunderstanding what it means by "texture unit." On a basic level, texturing involves some memory on the GPU to hold the actual texture data in some format (which is almost always something other than the linear array of pixels you're used to thinking of). There's usually a texture cache, because texture accesses tend to be pretty coherent spatially (if you are sampling a pixel, it's very likely you'll be looking at the ones immediately around it soon). There is also usually specific hardware to take your texture coordinates, use them to look up into whatever format the data is actually in, fetch it and do whatever sampling, interpolation, etc, you've asked for. If you set up filtering, this will actually result in multiple lookups and some interpolation to return you an actual color. This piece of hardware is what you usually refer to as a texture unit - if you could read memory directly you could write code to do it all by hand. But it's always doing the same thing and is faster to have hardware do it. So your hardware has 8 of these units, which means you can access 8 different textures in a shader, one per unit. They aren't specific caches or pieces of memory; they are more like lookup and translation units. As for keeping them bound, you are correct that it will be faster, but not for that reason. Calling glActiveTexture and glBindTexture does not actually move data around, it changes GL state. And it actually changes 2 levels of state, because OpenGL is a terribly designed API. Think of OpenGL as a giant state machine. When you create a context, you are making a big struct with all the OpenGL state that is mandated by the OpenGL spec. You can't access this struct directly, you have to use the GL API calls. Originally, OpenGL only allowed one texture, so it only had the one call to change the current texture state, which is glBindTexture. You give it the type and the texture 'name' (that numerical id you got from glGenTextures) and that's that. Eventually they added support for multiple textures, but instead of letting you specify which unit to modify directly, they added the glActiveTexture call, which tells GL which texture unit to modify in subsequent calls. Think of the code looking like this: code:
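/* Sketch of the two levels of state being described here -- not actual driver
   source, just the shape of it. */
#include <GL/gl.h>

#define MAX_UNITS 16

struct TextureUnit {
    GLuint bound_2d;      /* what glBindTexture(GL_TEXTURE_2D, ...) writes */
    GLuint bound_cube;    /* one slot per texture target */
};

struct GLState {
    int                active_unit;          /* written by glActiveTexture */
    struct TextureUnit units[MAX_UNITS];
};

/* glActiveTexture(GL_TEXTURE0 + i):   state.active_unit = i;
   glBindTexture(GL_TEXTURE_2D, tex):  state.units[state.active_unit].bound_2d = tex; */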
Instead of doing:
set texture A
draw mesh A
set texture B
draw mesh B
set texture A
draw mesh C
set texture B
draw mesh D
try:
set texture A
draw mesh A
draw mesh C
set texture B
draw mesh B
draw mesh D
|
# ¿ Dec 15, 2013 10:57 |
|
BattleMaster posted:Cool thanks, it works great. It's funny, a few days ago I was looking at the different options for face culling and I was wondering "why would anyone want to use anything other than back?" I guess I should assume that if something exists in the spec it exists for a reason. Don't assume this. The spec is a giant political document as much as a functional one. BattleMaster posted:Earlier I was also puzzling over why the UniformMatrix family of functions had an argument to transpose your matrix. And last night I was having trouble getting an orthogonal projection matrix to work. It turns out that I was formatting my matrix as row major when OpenGL wants them as column major. So I set transpose to true when I fed it my matrix and it worked perfectly. I guess they put that there for people/libraries that prefer to work with it in row-major. Honestly it's a wonder that my rotation and perspective matrices worked when I was giving it the transpose by accident. Direct3D also takes row-major matrices. That argument was mainly a bone to throw to people porting stuff to GL from D3D. Transgaming, etc. I think it's a bad feature since I feel the GL should not do conversions itself and this requires the CPU to do the transpose. BattleMaster posted:Edit: Kind of a silly question but would it be problematic to feed my shader several transform matrices (for instance, a transpose, rotation, and scale matrices) and have the GPU multiply them together rather than doing it on the CPU and feeding only one to it? I have a feeling that the GPU is faster at multiplying floating-point matrices than the CPU is but I'm not sure how expensive it is to feed uniforms to a shader program. (Though I guess doing it once on the CPU instead of having the GPU do it 3 times for every triangle might be better.) Generally, hoist everything you can. So if you can hoist something off the GPU (or push it up the pipeline), it's worthwhile. You'd have one matrix multiply on the CPU instead of one per vertex on the GPU.
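Both points in code, as a sketch (mat4_mul is a hypothetical column-major multiply helper from your own math library):
code:
#include <GL/gl.h>

extern void mat4_mul(float out[16], const float a[16], const float b[16]);

/* Option 1: let GL transpose a row-major matrix for you
   (costs a CPU-side transpose in the driver). */
void upload_row_major(GLint loc, const float *row_major)
{
    glUniformMatrix4fv(loc, 1, GL_TRUE, row_major);
}

/* Option 2 (hoisting): concatenate once on the CPU and upload a single MVP,
   instead of multiplying three matrices for every vertex on the GPU. */
void upload_mvp(GLint loc, const float *proj, const float *view, const float *model)
{
    float tmp[16], mvp[16];
    mat4_mul(tmp, view, model);
    mat4_mul(mvp, proj, tmp);
    glUniformMatrix4fv(loc, 1, GL_FALSE, mvp);
}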
|
# ¿ Jan 3, 2014 22:33 |
|
Malcolm XML posted:Yeah though I think the memory space uniforms inhabit is really fast and is cached as well This greatly depends on the hardware. For example, most of the modern cards do not actually have uniform registers anymore and treat everything as a constant buffer (it's just that what were previously 'uniforms' are now part of a magic global constant buffer). Constant buffer reads are cached similar to textures. A fun bit of trivia was that the early NV Tesla cards didn't cache constant buffer reads so they were slower than uniforms. OneEightHundred: That was more true for AMD's Northern Islands. Southern Islands and on are (more) scalar and I think the scheduling happens on the compute units themselves. I haven't read their docs in a while though so I don't remember all the details. EDIT: I may be misremembering NI, which I think is purely vector all the time. Is that what you meant? In terms of GPGPU they'd have to vectorize everything. Spite fucked around with this message at 06:58 on Jan 8, 2014 |
# ¿ Jan 8, 2014 06:52 |
|
Pixel buffers aren't really analogous to texture buffer objects. PBO is more for asynchronous upload/download. Why are you using them and not just straight textures? EDIT: To be more clear: PBO is just used for transferring data, not storing it. Basically, how they work is you bind the buffer, and the glTexImage call reads from that buffer instead of whatever pointer you pass. In this case, the argument that was a pointer becomes an offset. Spite fucked around with this message at 05:57 on Jan 10, 2014 |
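As a sketch of what that looks like (not from the post; synchronous map for simplicity, real async streaming would double-buffer the PBOs):
code:
#include <GL/gl.h>
#include <stddef.h>
#include <string.h>

void upload_via_pbo(GLuint pbo, GLuint tex, const void *pixels,
                    int w, int h, size_t bytes)
{
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, bytes, NULL, GL_STREAM_DRAW); /* orphan */

    void *dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    memcpy(dst, pixels, bytes);
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);

    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                    GL_RGBA, GL_UNSIGNED_BYTE, (const void *)0); /* offset 0, not a pointer */

    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0); /* back to normal client-pointer behavior */
}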
# ¿ Jan 10, 2014 05:50 |
|
slovach posted:Am I missing something here or what with initializing GL? PIXELFORMATDESCRIPTOR is a Windows-specific struct, not a general OpenGL struct. You fill that out yourself and use your Windows Device Context to get a pixel format (via ChoosePixelFormat). Then with the pixel format it returns, you call SetPixelFormat, then wglCreateContext. code:
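/* Sketch of the steps just described (not the original snippet from the post;
   no error checking, 'hwnd' is assumed to be your already-created window). */
#include <windows.h>
#include <GL/gl.h>

HGLRC create_legacy_context(HWND hwnd)
{
    PIXELFORMATDESCRIPTOR pfd = {0};
    pfd.nSize      = sizeof(pfd);
    pfd.nVersion   = 1;
    pfd.dwFlags    = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;
    pfd.cDepthBits = 24;

    HDC dc = GetDC(hwnd);
    int format = ChoosePixelFormat(dc, &pfd);   /* ask Windows for the closest match */
    SetPixelFormat(dc, format, &pfd);

    HGLRC ctx = wglCreateContext(dc);           /* legacy (<= 2.1) context */
    wglMakeCurrent(dc, ctx);
    return ctx;
}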
Because Microsoft didn't want to implement OpenGL 3, and OpenGL is a messy API in general, you have to create an OpenGL context using the Windows APIs first. Then you can request the function pointers for the newer OpenGL context creation methods. So you'll need to call wglGetProcAddress and find the address of wglCreateContextAttribsARB. Then you'll end up with two contexts, so you'll have to destroy the temporary one. Or use one of the utilities to do it for you.
|
# ¿ Feb 15, 2014 09:50 |
|
Vectors should be 16-byte aligned. You could also make a giant array of floats and then create vectors out of them in your shader, but I wouldn't recommend it.
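One place the 16-byte rule is baked right into the API is std140 uniform blocks; a sketch of a matching CPU/GLSL pair (block and field names are made up):
code:
#include <GL/gl.h>

/* GLSL side, kept as a comment next to the C struct it must match:
       layout(std140) uniform Params {
           vec4  color;       // offset 0,  16 bytes
           vec3  lightDir;    // offset 16, padded out to a 16-byte slot
           float intensity;   // offset 28, packs into lightDir's padding
       };
*/
struct Params {
    float color[4];      /* 16 bytes */
    float lightDir[3];   /* 12 bytes ... */
    float intensity;     /* ... + 4 fills the slot out to 16 bytes */
};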
|
# ¿ Jun 4, 2014 01:51 |
|
Sex Bumbo posted:Can you elaborate on why it's faster to use 16 byte aligned elements? I tested this out with a simple read-modify-write shader with linear memory access. It's about 10% slower (old GTS 250, gonna try some others too) to do unaligned which I wouldn't qualify as "slowwwwww" but it's still significant. It depends on the hardware. If the hardware only has vector units, then any unaligned values have to be read/masked/swizzled to get them in the right place. More hardware is scalar now, but it's still easiest to think of GPUs as consuming aligned vectors. OpenCL compiles down to CUDA on nv hardware. I'm not sure what they do with OpenGL compute.
|
# ¿ Jun 20, 2014 07:48 |