|
The problem is that there's a metric assload of code out there that still uses Begin/End and/or display lists. And there's still a bunch of people that have to maintain that code. Unfortunately most games use D3D these days, so there isn't as much pressure for good OGL tutorials. The amount of infighting in the ARB doesn't help the state of the API either. I wish there were a few good really impressive OGL3.2+ games out there, but there aren't. ES is the best bet since it still has the majority, unless they screw up that spec.
|
# ¿ Jul 17, 2010 00:09 |
|
OneEightHundred posted:(i.e. why is it still not possible to save a compiled shader to disk and reload it, even if it's vendor-specific bytecode?) That's really, really hard and annoying to do. The bytecode would have to be so generic as to essentially be the same as a shader source file or ARB program string. If they made vendor-specific bytecodes, those would have to be totally hardware-agnostic (you don't want to have to ship different bytecodes for r5xx, r6xx, r7xx, r8xx if you are ATI), and they'd have to be supported forever into the future. It's not going to happen unless OpenGL defines a format - but that's kind of pointless. Storing bytecode doesn't really help you that much - the GL stack and the driver stack still have to compile it into machine code and do their optimizations. You don't gain much time from hoisting out the parsing, and if you are compiling shaders constantly during rendering you are doing something very wrong. The only reason I see to do this is obfuscation, but that doesn't gain you much, since the people you are trying to hide the shaders from can just read the spec and reverse engineer it anyway.
|
# ¿ Jul 18, 2010 11:38 |
|
OneEightHundred posted:That isn't what I'm asking for. I'm saying that I should be able to have the driver give me the vendor-specific and probably hardware-specific code compiled from the GLSL so I only have to compile it once and then save it to disk so I never have to compile it again. No vendor is going to do that though - that's too much information for them to be comfortable letting the user keep around. And then you'd have to ship compiled shaders for every card model - r5xx, r6xx, etc, etc. It's not really worth it. And the agnostic bytecode D3D ships is still compiled by each driver - every vendor still has to convert it into machine code and run optimizations, or maybe even replace chunks altogether. It may look very different from the bytecode after all this. It only saves the parsing step, which is really not saving much time at all.
|
# ¿ Jul 19, 2010 20:45 |
|
OneEightHundred posted:Oh bullshit, CTM basically already works off of that exact concept. ATI even has tools that let you look at the compiled output of GLSL shaders. Right, but it's not _really_ what's being run on the card. That's just a representation of it. You can't just compile once, because it doesn't work that way. Every driver on every card will recompile shaders based on state to bake stuff in and do other optimizations. You'd need a huge number of different binaries to cover all possibilities. Turned on sRGB? Your fragment shader will be recompiled. Using texture borders? Recompile again, etc. It would be nice to have a specified GLSL parser that doesn't suck, definitely. But there's a whole lot more work that goes into compilation than just that. Hoisting the initial parsing step is really saving just a small part of the overall time and work for "compilation."
|
# ¿ Jul 20, 2010 00:44 |
|
Luminous posted:Do you really not understand what OneEightHundred is saying? Or are you just trolling him? You should have just stopped at your very first sentence. I'm coming from the perspective of someone writing the driver, not someone using the API, so I may be making my point poorly. I'm simply saying that bytecode and cached (shipping on disk) versions aren't all that simple and I don't feel they really save you anything. There's a lot more that happens at the driver and hardware level during shader compilation and linking and it's not widely understood. To really get at the heart of the issue, you'd have to save out a multitude of variants of each shader and you'd have to make it future-compatible. I'm not sure it's worth it, and it's a royal pain in the rear end for the driver developer, that's all.
|
# ¿ Jul 20, 2010 21:33 |
|
roomforthetuna posted:At the heart of this cross-purposes discussion is the fact that that's not what cached means. OneEightHundred is proposing that the card allow one to save the specific compiled shader for the current situation, to avoid unnecessary time spent recompiling. Like, the first time you load a game/area, it compiles the shaders and saves them out so that every other time you load that game/area it won't need to compile them again. Most drivers do this already behind your back, though. And saving to disk would almost certainly mean saving an intermediate format that has to be retranslated/recompiled once it's actually used anyway. What I mean to say is that what the app developer thinks of as "one specific situation" the driver thinks of as "potentially 2 dozen+ possibilities", and that's hard to reconcile easily.

And yes, every driver for every card currently shipping will recompile the active shader based on state. Think about a simple texture sample. What if you change the wrap mode and you don't have support for it in hardware? Your shader is already uploaded and compiled, but the driver has to fix up the shader to get the correct behavior - which causes a recompile. For example, nvidia does not support seamless cube maps in hardware (the gt100 series does, anything less does not), so if you turn them on and your shader samples a cube map, they modify the shader to do the fixup and blending. Or they will borderize each face when you draw and blend in the shader.

There are dozens and dozens of situations where a driver will take your current shader (that it has told you is already compiled) and recompile it to add/remove stuff based on state. Since the driver doesn't know what's coming, it's hard to cache; it would either have to guess at a common set of state or provide all possibilities in its binary blobs. Neither is particularly appealing, and as I said before, it doesn't really gain you a whole lot unless you can avoid all the recompiles and links completely.
|
# ¿ Jul 21, 2010 02:08 |
|
OneEightHundred posted:Thank God. Add the DX11 functionality as well and it'll FINALLY be at feature parity with D3D, and maybe we'll see some development with it again. Yeah, and my previous posts were mostly annoyance at having to implement it. I still don't think it will save nearly as much time as people hope, but it's good to have the appearance of parity with DX11. I'm curious to see how they make WebGL secure. ARB_robustness is kind of a pain, and being able to give the GPU arbitrary shader code over the web means you can do nasty things to the user.
|
# ¿ Jul 26, 2010 23:51 |
|
OneEightHundred posted:I don't see why that's such a big deal. Basically all that's needed to secure OpenGL, as far as I can tell, is to bounds check operations that pull from VBOs/textures and not have batshit retarded shader compilers. Bounds checking should already be in, since they're supposed to throw INVALID_VALUE in that case. The hard problem to solve is making sure a shader doesn't take so long that it hangs the entire system. You could easily do some sort of denial-of-service attack with a shader - there's only one GPU, and if you can peg it, the whole system will screech to a halt. I'm not super familiar with Managed DirectX, but from what I recall it requires marshalling data across the C# runtime boundary. WebGL will absolutely have to do something similar since it's based on JavaScript bindings. The fun part will be rectifying the garbage-collected, typeless JS stuff with the OpenGL runtime underneath.
|
# ¿ Jul 28, 2010 08:58 |
|
Well, there really is no "modern" opengl since no apps have really been written that use GL 4.0+ or even OpenGL 3.0+. I haven't gone through the 5th edition, but I'd hope it's decent since it's supposed to be approved by the ARB.
|
# ¿ Aug 2, 2010 22:21 |
|
OneEightHundred posted:I'd think "modern" would be not depending on features that have been deprecated since 3.0 at least, even if you're targeting OpenGL 2. Yeah, most definitely. A part of me dies inside when I see a new piece of code that still uses immediate mode. Forcing everyone to use vertex array objects, vertex buffers and shaders is the best thing the ARB ever did.
|
# ¿ Aug 3, 2010 22:00 |
|
Yup, OneEightHundred is correct. Batch that poo poo. Also, re: updating a VBO every frame - when you render in immediate mode you ARE updating geometry every frame, and you are doing it very slowly, since you're specifying each vertex individually.
|
# ¿ Aug 4, 2010 19:19 |
|
OneEightHundred posted:Use one dynamic VBO for dynamic data (or two, one for vert data one for indexes). Use glBufferSubData to stream data into it, ideally. When there isn't enough room in the buffer left to add more data, or you hit something that forces a state change, issue a draw call to whatever's in the buffer and start over. You can alleviate the locking problem by double buffering your VBOs. Use one VBO for even frames and one for odd frames. That also means you don't have to call BufferData, which may cause a slight stall as the driver deallocs and reallocs the memory. Just make sure you overwrite everything you're drawing so you don't get bad data.
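The fill-until-full scheme above is mostly bookkeeping. Here's a minimal sketch of just that bookkeeping - the `StreamBuffer` name and `flush` callback are made up; in real code the callback would issue the draw call for everything queued so far, and the offset returned by `append` is where you'd glBufferSubData the new data:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Stream into one dynamic buffer: append data at a moving offset, and
// when the next batch won't fit, flush (draw whatever's queued) and
// rewind to the front.
class StreamBuffer {
public:
    StreamBuffer(std::size_t capacity, std::function<void(std::size_t)> flush)
        : capacity_(capacity), offset_(0), flush_(std::move(flush)) {}

    // Returns the offset the caller should write (glBufferSubData) into.
    std::size_t append(std::size_t bytes) {
        if (offset_ + bytes > capacity_) {   // not enough room left
            flush_(offset_);                 // draw everything queued so far
            offset_ = 0;                     // start over at the front
        }
        std::size_t at = offset_;
        offset_ += bytes;
        return at;
    }

    std::size_t used() const { return offset_; }

private:
    std::size_t capacity_;
    std::size_t offset_;
    std::function<void(std::size_t)> flush_;
};
```

The same flush would also be triggered by a forced state change, exactly as the quoted post says.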
|
# ¿ Aug 5, 2010 21:50 |
|
OneEightHundred posted:The idea behind that behavior is that you don't need to double-buffer them, calling BufferData with a NULL pointer will give you the same memory region if it's not locked, and if it is locked, will give you a fresh one. But you still may cause an allocation in that case. And if the driver is using the memory to draw from you get better pipelining if you use two. Plus there are other considerations based on whether the driver can do out of line uploads, etc, which hits certain iPhone apps. If you can spare the memory, we always recommend using two. (Not trying to be contrarian, I swear!)
|
# ¿ Aug 6, 2010 00:15 |
|
Eponym posted:I'm a beginner, and have 2 questions... For drawing lines in screen space, the easiest way will be to use an orthographic projection. Check out glOrtho (http://www.opengl.org/sdk/docs/man/xhtml/glOrtho.xml). You can generate the matrix yourself if you're using a version without the matrix stack. An orthographic projection is a parallel projection - parallel lines remain parallel (as opposed to perspective projections, where parallel lines approach a vanishing point). This is what lets you map directly to pixels on the screen. glLineWidth works as you'd expect with antialiasing off. It's different with it on - check the spec for more details.
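If you do have to generate the matrix yourself, it's simple - this is a sketch following the formula from the glOrtho man page, in the column-major layout GL expects, plus a little helper to check the pixel-to-NDC mapping (function names here are made up):

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Build the matrix glOrtho specifies (see the man page), column-major.
std::array<float, 16> ortho(float l, float r, float b, float t,
                            float n, float f) {
    std::array<float, 16> m{};           // zero-initialized
    m[0]  =  2.0f / (r - l);
    m[5]  =  2.0f / (t - b);
    m[10] = -2.0f / (f - n);
    m[12] = -(r + l) / (r - l);          // translation lives in the last
    m[13] = -(t + b) / (t - b);          // column for column-major layout
    m[14] = -(f + n) / (f - n);
    m[15] =  1.0f;
    return m;
}

// Apply to a point (w = 1); ortho has no cross terms, so only the
// diagonal and the translation column matter.
std::array<float, 3> transform(const std::array<float, 16>& m,
                               float x, float y, float z) {
    return { m[0] * x + m[12],
             m[5] * y + m[13],
             m[10] * z + m[14] };
}
```

With ortho(0, 800, 0, 600, -1, 1), pixel (400, 300) lands at NDC (0, 0) - the direct screen mapping the post describes.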
|
# ¿ Aug 16, 2010 09:46 |
|
Do you not have a DX11 card? The reference rasterizer will always work, but as you say, it will be horribly slow.
|
# ¿ Aug 17, 2010 18:33 |
|
Tw1tchy posted:That's the thing, I DO have a DX11 card, and all the other examples in the directx11 sdk work fine. That's why I'm having problems, I have absolutely no idea at all what's wrong. That cube app - what's the actual error returned by the Device creation function?
|
# ¿ Aug 18, 2010 05:03 |
|
Why not just use a rectangle texture? http://www.opengl.org/registry/specs/ARB/texture_rectangle.txt
|
# ¿ Aug 24, 2010 20:56 |
|
Eponym posted:I am trying to draw anti-aliased lines, and this is how to do it, or so I've read: What OS, what GPU? AA lines are an odd duck. Does it work if you draw to GL_BACK?
|
# ¿ Sep 1, 2010 19:54 |
|
Well, for one, you aren't actually clearing the buffer. You have to call glClear(GL_COLOR_BUFFER_BIT). And I think there are a bunch of AA line bugs; I'd have to check. Does turning off blending fix it? Have you installed the GFX update?
|
# ¿ Sep 3, 2010 18:41 |
|
You need to unproject from 0 (near) and 1 (far) if you're going to go that way, from what I recall. I highly recommend doing the math yourself though. Think of it as the pick origin being at 0,0,0 and the pick point being on the near plane, then see what that ray intersects with. Since the screen itself covers the entire near plane you can convert from those coords to eye coords without too much trouble. http://www.opengl.org/resources/faq/technical/selection.htm DON'T USE GL_SELECT. Using unique colors is OK, but it forces a readback of a buffer, which probably isn't desirable.
|
# ¿ Sep 12, 2010 02:52 |
|
Keep in mind that branching, and especially stuff like discard/texkill, tends to run a lot better if you've got a lot of localized pixels that will take the same branch. On projection and picking: I wouldn't recommend using Unproject. Think about it this way: your screen maps directly to the near plane. Once you've transformed everything into eye space, you know that your eye is at 0,0,0 and you also know where your near plane is (you've specified your frustum's top, bottom, left and right, so you know those values). This means there's a simple mapping from a spot on your screen to a point on your near plane. You can then cast a ray from the origin through that point on the near plane to generate your pick ray. You'll have to transform it out of eye space (or everything into eye space), but that's the best way to do picking.
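That mapping really is only a couple lines of math. A sketch under the assumptions above (eye at the origin, near plane at z = -n, screen position normalized to 0..1; the function name and parameters are made up, and l/r/b/t/n are the values you'd pass to glFrustum):

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Eye-space pick ray: a normalized screen position picks a point on
// the near plane; the ray runs from the eye (origin) through it.
std::array<float, 3> pickRayDir(float sx, float sy,      // 0..1 screen pos
                                float l, float r,
                                float b, float t, float n) {
    float px = l + sx * (r - l);         // point on the near plane
    float py = b + sy * (t - b);
    float pz = -n;                       // near plane sits at z = -n
    float len = std::sqrt(px * px + py * py + pz * pz);
    return { px / len, py / len, pz / len };  // direction from origin
}
```

For a symmetric frustum, the center of the screen gives the ray (0, 0, -1), straight down the view axis, as you'd expect.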
|
# ¿ Sep 17, 2010 05:19 |
|
haveblue posted:OS X can do it because Apple put in a shim layer that uses LLVM to run shaders on the CPU in a pinch. I doubt the Intel embedded chips have true shader processors. This isn't precisely true. The Apple SWR can run shaders and the runtime will fall back to software depending on the input, but the x3100+ do have shader processors. It's not a shim layer - it's an entirely different renderer. You can actually force it on for everything via a defaults write, or by doing a safe boot. No GPU is interruptible in the classical sense, so you have no real defense against an infinite loop other than to kill the GPU. And since GPU recovery basically doesn't work anywhere, you are pretty much hosed. It's pretty bad across all the OSes. There's an extension called ARB_robustness that is supposed to lay out guidelines to make WebGL something other than an enormous security hole. Whether it works or is implemented correctly remains to be seen. http://www.opengl.org/registry/specs/ARB/robustness.txt The browsers that pass GLSL directly down are being very, very naughty and are asking for trouble.
|
# ¿ Dec 23, 2010 00:40 |
|
Sabacc posted:I'm trying to work on a port of an OpenGL 1.1 game to OpenGL ES 1.3. I came across this bit of code and I can't for the life of me figure out how to replicate it in OpenGL ES: Two things: you should never use glPush/PopAttrib - track that state yourself. And you should never call glGet* - track that yourself too (unless it's a fence or occlusion query, etc.). Not even to check for errors (don't check for errors in production code). To actually answer you: GL_LIGHTING includes a ton of stuff, as mentioned. It also tracks EXT_provoking_vertex and ARB_color_buffer_float's clamped vertex color.
|
# ¿ Jan 13, 2011 21:47 |
|
Sabacc posted:I'm not trying to be rude or daft but I really don't know what you mean here. Sorry if I was a bit unclear or rude. You shouldn't use glPush/PopAttrib even in normal OpenGL because it does a lot of extra work that probably isn't necessary. It's easier to shadow all the state you've changed and reset it as necessary. Set your shadow to whatever the default is and as you change state, update it. So if you only change, say, the GL_COLOR_MATERIAL enable bit, you'd keep track of that and set/reset it as necessary.
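A minimal sketch of that shadowing idea - all names here are made up, the `apply` callback stands in for the real glEnable/glDisable, and in practice you'd seed the shadow with GL's documented defaults for any state you care about:

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Cache each enable bit you've touched and only call through to the
// driver when the value actually changes.
class EnableShadow {
public:
    explicit EnableShadow(std::function<void(unsigned, bool)> apply)
        : apply_(std::move(apply)) {}

    void set(unsigned cap, bool on) {
        auto it = shadow_.find(cap);
        if (it != shadow_.end() && it->second == on)
            return;                      // redundant, skip the GL call
        shadow_[cap] = on;
        apply_(cap, on);                 // real code: glEnable/glDisable
    }

private:
    std::unordered_map<unsigned, bool> shadow_;
    std::function<void(unsigned, bool)> apply_;
};
```

Setting the same value twice costs nothing, which is exactly the win over glPush/PopAttrib blasting everything back.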
|
# ¿ Jan 14, 2011 03:15 |
|
FBO 0 isn't actually an object - it's the system drawable. So attaching things to it may have...odd effects (though it should just throw an error). One quick note about CPU overhead: OpenGL is pretty bad about CPU overhead as well. Using 3.1+ will mitigate this, as they removed a bunch of junk. But anything earlier (ie, anything that still has fixed function, etc) will require validation of all that legacy state which sucks rear end.
|
# ¿ Feb 16, 2011 20:59 |
|
Unormal posted:Am I using anything fixed function here? I figured since I'm entirely shader driven I was bypassing the 'fixed' pipeline, though I don't really know how using the framebuffer EXT functions vs the builtin framebuffer functions for 3.x would affect things. I guess I figured driver writers would just implement the EXT functions as special cases of the more general 3.0 functionality, and EXT would actually be more portable, even though the opengl pages tell you to use the more updated built in functions if you can. Well, it doesn't matter if you're not using the fixed function pipeline. If it's there, it needs to be validated because the spec says so. Hooray. Of course, every modern implementation will have a bunch of dirty bits and not do validation if nothing's changed (ie, won't validate the really old image-processing-type state if it hasn't been changed). GL 3.1+ doesn't _have_ all that old crap, so it can be totally ignored. Blending will absolutely be faster if you use the fixed hardware. Programmable blending doesn't exist on the desktop yet - and using multiple render targets will suck for perf. For limits, etc, this isn't a bad reference: http://developer.apple.com/graphicsimaging/opengl/capabilities/
|
# ¿ Feb 17, 2011 02:09 |
|
OneEightHundred posted:Current best practice is VBOs, glDrawRangeElements for everything, and use SSE uncached writes to VBOs if you're mapping (glBufferSubData does uncached writes). As a quick note, this will depend on the driver and OS (BufferSubData, I mean). Remember to keep an eye on your alignments with SSE. But yeah, you totally want to use uncached writes, especially for a big data set. If you do use MapBuffer for your VBO, remember to use FlushMappedBufferRange/MapBufferRange. And move to generic vertex attributes instead of using builtins if you can.
|
# ¿ Mar 7, 2011 05:34 |
|
OneEightHundred posted:The point of command buffers is that they can be stuffed with commands from any thread. Display lists don't really work that way because they're filled based on API calls that operate on the state machine, which is bound to a single thread. I agree that they are pointless, and pretty flawed from a design point of view. You can't optimize anything if the display list still obeys the state machine. But GL calls do have significant CPU overhead, especially on OS X. There's a ton of validation, conversion and other stuff that needs to be done because it's demanded by the spec. D3D10 is way better than GL is, at this point.
|
# ¿ Mar 25, 2011 22:57 |
|
octoroon posted:Okay, so I have a few (hopefully not-too-dumb) questions about OpenGL and VBO usage. I'd say it's more a caching and locality issue than a pure VBO size issue. There are paging and VRAM constraints to think of, but each vertex is just a number in VRAM to the GPU. Sourcing that number efficiently can be affected by layout and cache, however. Consider that if your entire world is in one VBO, pieces that are next to each other spatially may not be next to each other in memory. It's similar to why you'd store an image in Z-order or Hilbert-order instead of just straight linear. Plus, you do NOT want to be drawing things you can't see - so you'll either be sending multiple DrawRangeElements calls, or you can decompose your world into smaller chunks and bind and draw them individually. I'd prefer the latter myself, as you can then do frustum checks and just skip the draws if they aren't visible. And as a last caveat - how is your performance thus far? If it's not bad, you may not want to overcomplicate your problem just yet.
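The decompose-and-skip approach might look like this sketch. Names are made up, and a plain AABB-overlap check stands in for a real frustum-plane test just to keep it self-contained:

```cpp
#include <cassert>
#include <vector>

struct AABB { float min[3], max[3]; };

// Two boxes overlap iff their intervals overlap on all three axes.
bool overlaps(const AABB& a, const AABB& b) {
    for (int i = 0; i < 3; ++i)
        if (a.max[i] < b.min[i] || b.max[i] < a.min[i])
            return false;
    return true;
}

// Returns indices of the chunks that would actually be bound and drawn;
// everything else is skipped without touching the GPU at all.
std::vector<int> visibleChunks(const std::vector<AABB>& chunks,
                               const AABB& view) {
    std::vector<int> out;
    for (int i = 0; i < static_cast<int>(chunks.size()); ++i)
        if (overlaps(chunks[i], view))
            out.push_back(i);            // bind + draw this chunk
    return out;
}
```

Each visible chunk then gets its own bind + draw, and culled chunks cost you one cheap CPU test instead of a wasted draw call.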
|
# ¿ Mar 28, 2011 01:13 |
|
Clearing is always a cost, though it's certainly faster now. Of course, you have to consider that your framebuffer is likely to be much larger as well. There are a ton of things you can do. You can try to make a "fastclear" path that only clears pixels that have been touched. You can do Hi-Z. Some combination of the two will allow you to categorize blocks of the zbuffer that need to be cleared. Say you mark 4x4 blocks as dirty, then you know you need to clear them, etc. That also means you need to store your buffers in block-linear order, which is better than straight linear anyway.
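The 4x4 dirty-block bookkeeping could be sketched like this - all names are made up, and a real implementation would live in the rasterizer and clear the actual tiles rather than just counting them; dimensions are assumed to be multiples of 4 for brevity:

```cpp
#include <cassert>
#include <vector>

// Track which 4x4 tiles of a buffer were touched this frame, then
// "clear" only those instead of the whole buffer.
class DirtyBlocks {
public:
    DirtyBlocks(int width, int height)
        : bw_(width / 4), dirty_((width / 4) * (height / 4), false) {}

    void touch(int x, int y) {           // a pixel written this frame
        dirty_[(y / 4) * bw_ + (x / 4)] = true;
    }

    int clearDirty() {                   // clear dirty tiles, count them
        int cleared = 0;
        for (auto b : dirty_)
            if (b) ++cleared;
        dirty_.assign(dirty_.size(), false);
        return cleared;
    }

private:
    int bw_;
    std::vector<bool> dirty_;
};
```

This pairs naturally with the block-linear storage mentioned above, since a dirty tile is then one contiguous run of memory.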
|
# ¿ Apr 9, 2011 00:41 |
|
There are lots of things to think about, but I don't have a good reference off the top of my head, unfortunately. Bresenham is sort of the gold standard way to do it. There are variants, of course, but it's actually quite simple to implement. Other stuff to think about: Are you doing the z-cull in eye space or clip space? What are you doing to map to pixels? How are you interpolating (ie, you should do adds instead of mul/divide per pixel)? Are you working in floating point or with integers?
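Since Bresenham came up: this is the standard all-octant integer form, which answers the interpolation question above directly - the inner loop is adds and compares only, no per-pixel multiplies or divides:

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

// Integer Bresenham line from (x0,y0) to (x1,y1), inclusive.
std::vector<std::pair<int, int>> bresenham(int x0, int y0, int x1, int y1) {
    std::vector<std::pair<int, int>> pts;
    int dx = std::abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;                   // error term, updated by adds only
    for (;;) {
        pts.emplace_back(x0, y0);
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }   // step along x
        if (e2 <= dx) { err += dx; y0 += sy; }   // step along y
    }
    return pts;
}
```

A perfect diagonal visits one pixel per step; a horizontal line visits every column once.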
|
# ¿ Apr 9, 2011 07:12 |
|
First, I'm confused. If the object isn't physically changing shape, etc., you don't need to dynamically update its vertex/index data. Unless I'm missing your point... For text, most people pass a sysmem pointer to index data. Usually you want to double-buffer VBOs:

Frame 1: Modify VBO A, draw with A
Frame 2: Modify VBO B, draw with B

Think of it this way: all GL commands get put into a queue that will be pushed to the GPU (a "command buffer"). However, you can modify stuff in VRAM via DMA as well, so you cannot* modify something that is currently being used to draw. If you have 2 VBOs with the same data, you can get around this (and vertex data tends to be small, so it doesn't really hurt your mem usage). This is true for textures, vertex data, cbuffers, etc. (I've seen several DX10 apps that try to use one cbuffer and end up waiting after every draw).

*D3D's LOCK_DISCARD and GL's FlushMappedBufferRange will allow you to modify stuff in use - it's up to you to make sure you don't smash your own memory.
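The Frame 1 / Frame 2 alternation is easy to wrap up once. A tiny sketch with made-up names - `Handle` would be a GLuint buffer name in practice:

```cpp
#include <array>
#include <cassert>

// Keep two buffer handles and alternate which one is written and drawn
// each frame, so the GPU can still read last frame's buffer while the
// CPU fills this frame's.
template <typename Handle>
class DoubleBuffered {
public:
    DoubleBuffered(Handle a, Handle b) : bufs_{a, b}, frame_(0) {}

    Handle current() const { return bufs_[frame_ & 1]; }  // write + draw this
    void endFrame() { ++frame_; }                          // swap roles

private:
    std::array<Handle, 2> bufs_;
    unsigned frame_;
};
```

Each frame you modify and draw only `current()`, then call `endFrame()` - the buffer the GPU may still be reading is never the one you're writing.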
|
# ¿ Apr 10, 2011 23:02 |
|
OneEightHundred posted:Why would you do this instead of just using discard? Because the driver has to allocate you new VRAM when you discard or orphan the buffer. Any decent driver keeps a list of these so it doesn't have to do an actual allocation, but I feel it's a better design to handle this yourself. not a dinosaur posted:VBO stuff Fewer calls is better, yeah. You've got a couple options. For something like that, instancing works really well: you'd only have to upload a transform vector per quad instead of all 4 vertices. On modern hardware, sending points down to the geometry shader and generating quads would also work, but that's almost certainly an unnecessary optimization - I'm not a fan of geometry shaders in general, so I try to stay away from them. Or you can double buffer the VBO and update pieces as needed, which is essentially what you are doing now.
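To make the instancing saving concrete for the quad case: the per-instance stream holds one offset per quad, and the four corners are derived from it. This sketch (made-up names) does the expansion on the CPU purely to show the data relationship - with real instancing the expansion happens in the vertex shader, fed by an attribute with a divisor:

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Vec2 { float x, y; };

// Expand one offset per quad into four corner vertices, mirroring what
// the GPU would do per instance. You upload the offsets (N values)
// instead of the full vertex set (4N values).
std::vector<Vec2> expandQuads(const std::vector<Vec2>& offsets, float size) {
    static const std::array<Vec2, 4> corner = {
        Vec2{0, 0}, Vec2{1, 0}, Vec2{1, 1}, Vec2{0, 1}};
    std::vector<Vec2> verts;
    verts.reserve(offsets.size() * 4);
    for (const Vec2& o : offsets)        // one uploaded value per quad...
        for (const Vec2& c : corner)     // ...four corners derived from it
            verts.push_back({o.x + c.x * size, o.y + c.y * size});
    return verts;
}
```

The upload shrinks by 4x (more once you count per-vertex attributes beyond position), which is the whole appeal for grass/rocks/units.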
|
# ¿ Apr 11, 2011 20:42 |
|
octoroon posted:I was referring more to OpenGL. The ARB instancing extensions are 2008, unless I'm missing something. That's true, but consider that the g8x generation is like 5 years old at this point. Just about anything that's actually worth developing for supports instancing. Ironically, nvidia does not support instancing in hardware on those parts - it's implemented in the driver. It's still faster than making multiple draws yourself. As always, you have to play around with it and see what gives the best perf for your app.
|
# ¿ Apr 12, 2011 20:08 |
|
OneEightHundred posted:I still don't get the point of instancing. High-poly models don't benefit from it unless there are multiples of the same model at the same LOD, low-poly models are cheap enough to just dump to a dynamic buffer. Well, stuff like single-quad grass is cheap enough to do dynamically, but instancing can be really useful for stuff like rocks and trees. Or for stuff like RTS games where you have a crapload of the same type of unit running around onscreen.
|
# ¿ Apr 13, 2011 20:07 |
|
octoroon posted:Still curious if there's any tangible benefit to getting a newer 3.0+ OpenGL context. Is there some sort of optimization or practical improvement? Because right now it seems like just creating an old context and getting any entry points I need for newer functionality with GLEW is exactly the same as getting a 3.0+ compatibility context. Is there any reason at ALL to get a newer context if I don't care about asking for a core profile so that deprecated functionality literally throws an error? Yes. 3.2+ removes a ton of outdated and stupid poo poo. This means the runtime can skip the checks and not have to worry about the fixed function built-in stuff. Also, mandating the use of VAO means you can optimize vertex submission in a couple ways that would be harder with the standard Bind/VertexPointer, etc method. Now, I'm not sure if the windows drivers have all these optimizations, but it's worth 10% or so on OSX.
|
# ¿ Apr 16, 2011 19:32 |
|
octoroon posted:So I would assume asking for a compatibility context negates these benefits. It would depend on how the driver is implemented, but I would assume so. I haven't profiled it though.
|
# ¿ Apr 18, 2011 02:11 |
|
roomforthetuna posted:Yeah, only three meshes (or at least something on that order of magnitude) and it's taking me from my capped 60fps to 40fps. Which wouldn't be a problem except there's going to be more than a few guys running around. That's not really how it works in DX9. Constants don't work the same way as buffers. Without looking at the whole code it'd be hard for me to say for sure what's going on. What do your passes do? If you are uploading a bunch of matrices, and doing 3 passes, that's going to upload the data 3 times. Maybe that's why it's so slow? Dunno. One thing to keep in mind is that it's better to do:

Bind Shader 0
Bind Vtx Buffer 0
Draw
Bind Vtx Buffer 1
Draw
Bind Shader 1
Bind Vtx Buffer 0
Draw
Bind Vtx Buffer 1
Draw

than to do:

Bind Vtx Buffer 0
Bind Shader 0
Draw
Bind Shader 1
Draw
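The ordering advice can be sanity-checked with a toy cost model. Everything in this sketch is made up - including the 10:1 shader-vs-buffer bind cost, which just encodes "shader binds cost more" - but it shows why sorting draws by shader first wins:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct DrawCmd { int shader, vbo; };

// Walk a command stream and total the bind changes it needs, with a
// shader bind assumed ~10x the cost of a vertex-buffer bind.
int streamCost(const std::vector<DrawCmd>& cmds) {
    int cost = 0, shader = -1, vbo = -1;
    for (const DrawCmd& c : cmds) {
        if (c.shader != shader) { cost += 10; shader = c.shader; }
        if (c.vbo != vbo) { cost += 1; vbo = c.vbo; }
    }
    return cost;
}

// Sort pending draws by shader first, then vertex buffer.
std::vector<DrawCmd> sortByShader(std::vector<DrawCmd> cmds) {
    std::sort(cmds.begin(), cmds.end(),
              [](const DrawCmd& a, const DrawCmd& b) {
                  return a.shader != b.shader ? a.shader < b.shader
                                              : a.vbo < b.vbo;
              });
    return cmds;
}
```

Interleaving shaders across the same buffers costs 4 shader binds; sorted, it's only 2, and the extra buffer binds are cheap by comparison.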
|
# ¿ May 1, 2011 20:58 |
|
roomforthetuna posted:Thanks for that, which might help somewhat. I'm actually only using one shader object, but selecting from amongst multiple 'techniques' - is that the same thing as binding a new shader each time? I'm not super familiar with D3DX, to be honest. I'm not sure what it does under the hood. But CommitChanges has to do with state and not data - I would guess you don't need it if you are just changing constants. Also make sure you aren't having the runtime save state for you; you should manage it yourself. As for setting the shader, I think it depends on whether each technique is actually a different shader or not. Otherwise it may just be state. Not sure. You should run your app through PIX and see what the actual command stream looks like.
|
# ¿ May 2, 2011 22:46 |
|
YeOldeButchere posted:Welcome to graphics programming! I hope you like loving around until you find what your drivers and graphics hardware like. Working on drivers has given me a whole new perspective on it - to the point where I now think anyone writing high-perf apps should spend some time on the driver side. It sucks at times, because there are a lot of games that aren't written well, and the drivers end up having to optimize their (bad) usage of the API. Better documentation, examples, and some transparency would really help that, along with better education. Though every vendor is loath to describe the hints/rules/etc. because it would out all the ridiculously dirty hacks and unimplemented crap that's in every runtime/driver/etc.
|
# ¿ May 3, 2011 01:45 |