|
Um, they pretty much do the same thing: code:
code:
As for VBO: I've considered it, but well ... I can't be arsed to. I'd need to figure out how they work first (haven't seen a SIMPLE example tutorial yet) and then i'd have to figure out how to get them to work in Perl.
|
# ? Sep 6, 2008 01:41 |
|
|
Right, but how are you creating the lists? glVertex doesn't really mean anything outside of a begin/end pair - so the driver may be getting confused when you compile your display lists, or try to draw them. First thing that comes to mind for me, anyway.
|
# ? Sep 6, 2008 03:01 |
|
Try checking the code yourself: http://code.google.com/p/dwarvis/source/browse/trunk/livevis/dwarvis.pl The display lists get constructed in ourInit by iterating over a 3-dimensional array and drawing cubes when appropriate. Cubes are drawn via the stuff contained in ourDrawCube. Down in cbRenderScene the display lists then get called. Note that i know what you mean. When i leave out the begin/end things nothing gets rendered, but when i have them as noted above, it gets rendered either way. The only difference is the amount of CPU rape.
|
# ? Sep 6, 2008 03:58 |
|
Mithaldu posted:However when i take the begin and end out of the lists proper and put them in front and after the loop, it suddenly goes weird. The CPU's cores both go to 100%, all used by the Perl process and the framerate noticeably drops. It doesn't even matter whether the list calling loops is within begin/end or not. If it isn't it won't render, but the CPU will still freak out. As soon as you call glBegin(), you enter immediate mode and the display driver expects *nothing but* vertex data (glColor, glVertexAttrib, glVertex etc). When you called a display list while in this mode, your display driver was probably nice enough to notice this and pass the display list that way (hence the slowdown). Same deal if you're creating a display list full of vertices but have no glBegin anywhere, inside or out of it - the driver has no idea what to do with them, anything could happen. Basically the begin and end are meant to be in the display list definition, not when you call the display list, and Bad Things happen if you put them anywhere else :P. As for the threading thing, it's not really that surprising. Just because your program is single-threaded, doesn't mean the display driver it's calling is. If you do something weird like this it's not entirely unexpected that the driver will balance the load between cores to increase performance. Edit: pianoSpleen fucked around with this message at 07:34 on Sep 6, 2008 |
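The placement rule can be sketched with a toy model (plain Python standing in for the driver's state machine — not real GL bindings, just an illustration of why bare vertices in a list do nothing without their own begin/end):

```python
# Toy model of immediate mode: vertex commands only mean something
# between begin/end, so a display list should carry its own pair.
class ToyGL:
    def __init__(self):
        self.in_begin = False
        self.errors = 0             # vertex calls the "driver" had to ignore
        self.triangles_emitted = 0
        self._pending = 0

    def begin(self):
        self.in_begin = True
        self._pending = 0

    def vertex(self):
        if not self.in_begin:
            self.errors += 1        # outside begin/end: error flag, ignored
            return
        self._pending += 1
        if self._pending == 3:      # GL_TRIANGLES: every 3 verts = 1 triangle
            self.triangles_emitted += 1
            self._pending = 0

    def end(self):
        self.in_begin = False

def call_list_with_own_begin(gl):
    # the "right" way: begin/end live inside the list definition
    gl.begin(); gl.vertex(); gl.vertex(); gl.vertex(); gl.end()

def call_list_bare_vertices(gl):
    # the "wrong" way: bare vertices, relying on the caller's begin/end
    gl.vertex(); gl.vertex(); gl.vertex()
```

A list built like `call_list_with_own_begin` renders on its own; one built like `call_list_bare_vertices` only "works" if the caller wraps the call in begin/end, which is exactly the case the driver has to paper over.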
# ? Sep 6, 2008 04:46 |
|
pianoSpleen posted:As for the threading thing, it's not really that surprising. Just because your program is single-threaded, doesn't mean the display driver it's calling is. If you do something weird like this it's not entirely unexpected that the driver will balance the load between cores to increase performance. in fact, depending on what hardware/driver you are using, I can almost guarantee that this is the case.
|
# ? Sep 6, 2008 04:53 |
|
Remember that GL is designed as a client/server system. Using two cores is natural.
|
# ? Sep 6, 2008 05:30 |
|
Erm, and what about this? "Only a subset of GL commands can be used between glBegin and glEnd. The commands are glVertex, glColor, glSecondaryColor, glIndex, glNormal, glFogCoord, glTexCoord, glMultiTexCoord, glVertexAttrib, glEvalCoord, glEvalPoint, glArrayElement, glMaterial, and glEdgeFlag. Also, it is acceptable to use glCallList or glCallLists to execute display lists that include only the preceding commands. If any other GL command is executed between glBegin and glEnd, the error flag is set and the command is ignored." http://www.opengl.org/sdk/docs/man/xhtml/glBegin.xml To elaborate on that: I had already suspected some incompatibility and done my share of googling. Every kind of documentation i ran across told me that it's a-ok as long as i only have vertex stuff in my display lists. The threading thing is nice, now i only wish it could do that when it's not already massacring performance. Mithaldu fucked around with this message at 06:09 on Sep 6, 2008 |
# ? Sep 6, 2008 05:49 |
|
I think that's just worded badly. glVertex/Color/Normal, etc don't really mean anything by themselves - the display list will be optimized by the implementation, but in order to do that it needs to know what the data means. By not including begin/end, you aren't including that info so it can't really do anything. Most implementations let you turn off multithreading, try turning it off and seeing what happens.
|
# ? Sep 6, 2008 06:06 |
|
Ooh, ok, yea, the optimization thing makes sense. Thanks. As for the multithreading, i wouldn't know how. :/
|
# ? Sep 6, 2008 06:10 |
|
On windows, it's usually in the driver options. I know nvidia has it; not sure about ATI. On Mac, it's set programmatically (though that may be difficult if you're using Perl)
|
# ? Sep 6, 2008 09:44 |
|
Been using windows xp + nvidia for ages and never saw anything like that in the driver settings, unless you're talking about prerendered images.
|
# ? Sep 6, 2008 09:57 |
|
Another question about OpenGL: As far as i can see it only offers a spot light source, which i'd have to put really far away to emulate a global directional light. Is there any direct way to make a global directional light in OpenGL?
|
# ? Sep 18, 2008 15:26 |
|
Mithaldu posted:Another question about OpenGL: As far as i can see it only offers a spot light source, which i'd have to put really far away to emulate a global directional light. Is there any direct way to make a global directional light in OpenGL? Set the w coordinate (fourth parameter of GL_POSITION) to 0 to get a directional light source (technically a point light at infinite distance).
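A small numeric sketch of what that w component does (plain Python mirroring the fixed-function math, not actual GL calls): with w = 0 the light vector is the same for every surface point, with w = 1 it depends on where the surface is.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def light_vector(light_pos4, surface_point):
    """Direction from a surface point toward the light, GL-style.
    light_pos4 = (x, y, z, w): w == 0 means directional (a point
    light at infinite distance), w != 0 means positional."""
    x, y, z, w = light_pos4
    if w == 0.0:
        return normalize((x, y, z))  # same direction everywhere
    return normalize(tuple(l - s for l, s in zip((x, y, z), surface_point)))
```

So passing something like (0.5, 1.0, 0.3, 0.0) as GL_POSITION gives you the global directional light without moving anything "really far away".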
|
# ? Sep 18, 2008 15:33 |
|
Ah, exactly what i was looking for, thanks.
|
# ? Sep 18, 2008 15:51 |
|
I'm currently fiddling with more complex objects and i'm finding out that i cannot display them all with QUADS anymore. Currently i have the setup like this: code:
code:
Or should i alternatively simply duplicate one vertex per quad in the same location with the same tex coords, so it looks like a triangle? Mithaldu fucked around with this message at 15:16 on Sep 21, 2008 |
# ? Sep 21, 2008 15:13 |
|
Why not make it all triangles and save yourself a state change and an additional vertex and index buffer?
|
# ? Sep 22, 2008 12:10 |
|
heeen posted:Why not make it all triangles and save yourself a state change and an additional vertex and index buffer? Yes, you're almost certainly better off with all triangles (preferably, tri strips)
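Rough vertex counts make the saving concrete (back-of-the-envelope only, assuming a connected row of N quads; real meshes won't always strip this cleanly):

```python
def verts_as_quads(n):
    # GL_QUADS: 4 vertices per quad
    return 4 * n

def verts_as_tri_list(n):
    # GL_TRIANGLES: 2 triangles per quad, 3 vertices each
    return 6 * n

def verts_as_tri_strip(n):
    # GL_TRIANGLE_STRIP: 2 vertices to start, then 2 more per quad
    return 2 * n + 2
```

For 100 connected quads that's 400 vs 600 vs 202 vertices, so a strip beats even the quad list once the quads actually share edges.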
|
# ? Sep 22, 2008 13:05 |
|
God, i just realized that quads or tris doesn't matter to the display list at all, since it only collapses into triangles internally anyhow. Also: "and an additional vertex and index buffer" Huh? Anyhow, the reason why i'm using quads is that the generation of the display lists also happens constantly in the background while new objects come into view and while the content of the data source changes. I'm hoping that using quads improves the generation performance there a bit, but right now i just plain don't know what the gently caress. As for triangle-strips: I'd at most be using them for single quads anyhow, as i plan to use lighting and such and will need normals. Don't really see a method to do that easily. Edit: Thanks guys, going with triangles is really the thing to do. As a test i inserted something that's a rather extreme case and it went absolutely smooth: http://www.veoh.com/videos/v16023678hb23TCjA 3D performance is perfectly fine and even with the switch from quads to tris, cache generation did not slow down noticeably. Mithaldu fucked around with this message at 23:11 on Sep 24, 2008 |
# ? Sep 22, 2008 17:35 |
|
Everything will be decomposed to triangles eventually, so you should start with them. Constantly generating new display lists won't speed you up unless you re-use them a lot. You're almost certainly better off using a vertex buffer object and updating it.
|
# ? Sep 24, 2008 01:52 |
|
A word ahead: I'm doing this in Perl, so i'm trying to keep stuff relatively simple. The more work i can shove off to OpenGL itself, the faster this thing becomes. At any given time i'm rendering 45-180 display lists per frame. Additionally about 1-3 display lists get remade roughly every second or so. Due to not being able to figure out a way to thread the updates, i need to perform them in between frames. As such i try to keep them as few and fast as possible to avoid frame-rate hitching. I'm not sure if VBOs give me any advantage there. Additionally, I have no idea how VBOs work and I would need to figure out by trial and error how to get them working with the OpenGL interface i have. Which, i suspect, may be incomplete or buggy as well, since i can't get specular light reflection to work. As for triangles, yea, i already switched everything around. Being able to sketch up the models in Wings 3D is a godsend.
|
# ? Sep 24, 2008 14:38 |
|
Mithaldu posted:At any given time i'm rendering 45-180 display lists per frame. That's quite a bit. It might help to merge some of them if possible to avoid any overhead with state switching in between. Also to repeat what was said above: use tri strips if you can. They reduce the vertex data sent to card by up to 30% compared to triangle lists.
|
# ? Sep 24, 2008 22:08 |
|
Mithaldu posted:I'm not sure if VBOs give me any advantage there. VBOs would be an advantage for anything that's generated exactly once and never modified for the entire session. For any element that might be frequently rebuilt, they're a wash or worse.
|
# ? Sep 24, 2008 22:11 |
|
Cross_ posted:That's quite a bit. It might help to merge some of them if possible to avoid any overhead with state switching in between. Two things to keep in mind here: The data originates as (16x16)x(5-20) blocks. In the video above i am displaying a range of 3*3 blocks. One of the requirements for this application is the ability to slice off layers of the top at will to display the inside of the environment, since this is a digging/mountain-type stuff game. So i can only combine display lists on the 2D layer, which either means making the caching mechanism and general data->display chain a hell of a lot more complicated by assigning one cache to multiple blocks; or i could simply have the center block in each case cache enough data to display the blocks around it at once, which would make the memory use explode. Honestly though, right now i'm really happy with the performance of this thing and i'm not even optimizing much. Cross_ posted:Also to repeat what was said above: use tri strips if you can. They reduce the vertex data sent to card by up to 30% compared to triangle lists. sex offendin Link posted:VBOs would be an advantage for anything that's generated exactly once and never modified for the entire session. For any element that might be frequently rebuilt, they're a wash or worse. Also, thanks in general to any of you guys throwing stuff at me here. Even if i can't use all of it, it's hard to find brains off which i can bounce ideas regarding this. Mithaldu fucked around with this message at 23:13 on Sep 24, 2008 |
# ? Sep 24, 2008 23:10 |
|
sex offendin Link posted:VBOs would be an advantage for anything that's generated exactly once and never modified for the entire session. For any element that might be frequently rebuilt, they're a wash or worse. Of course, there are two obscure things to keep in mind: First is that writing into a mapped VBO can cause severe cache pollution, which is why the GPU manufacturers recommend doing uncached SSE writes to the GPU space after you map it. glBufferData/glBufferSubData do exactly that, but it also means that SSE-accelerated transforms can push results directly to the GPU with no performance hit. Another obscure thing is mirroring D3D's "discard" behavior: If you're not going to use the data in a VBO any more and want to map it again, call glBufferData with a NULL data pointer to throw out the existing contents. Doing that will cause the driver to give you a new region of memory to work with if it's not done with the old one, not doing it will cause a stall until it's done using that region. OneEightHundred fucked around with this message at 05:00 on Sep 25, 2008 |
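The stall-vs-orphan behavior can be modeled abstractly (a toy Python simulation, not real GL; `buffer_data_null` stands in for calling glBufferData with a NULL data pointer):

```python
# Toy model of VBO orphaning: mapping storage the GPU is still
# reading stalls; "discarding" first lets the driver hand you a
# fresh region instead.
class ToyBuffer:
    def __init__(self):
        self.gpu_busy_with = None   # storage id the GPU is still reading
        self.storage_id = 0         # storage the buffer name currently points at
        self.stalls = 0

    def draw(self):
        self.gpu_busy_with = self.storage_id  # GPU starts reading this storage

    def buffer_data_null(self):
        # glBufferData(..., NULL): orphan the old storage, get a new one
        self.storage_id += 1

    def map(self):
        if self.gpu_busy_with == self.storage_id:
            self.stalls += 1        # must wait until the GPU is done with it
        return self.storage_id
```

The point of the model: draw, then map the same storage → stall; draw, orphan, then map → the write goes to new memory and the GPU keeps reading the old region undisturbed.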
# ? Sep 25, 2008 04:20 |
|
Can anyone give me some guidelines on geometry shader performance? (a paper/article/tutorial for instance) I am using GSs to output GL_POINTS. If i output 4 or less points per shader call, performance is fine. Anything more than that, and i get 1-4 fps (down from > 75). I have a feeling that the shaders MIGHT be running in software for some reason, but i can't figure out what i am doing wrong... Any ideas? (also is there a way to include a "static" vec2 array inside a geometry shader? Such a thing is possible inside a Vertex/Fragment shader, but the compiler does not get past the "opengl does not allow C-style initializers" when compiling the GS).
|
# ? Sep 26, 2008 14:40 |
|
What GPU are you running on? I have no problems outputting dozens of vertices. There are some OpenGL variables you can query to determine how much data you can output. Keep in mind vertex attributes do count against these limits, so you can output fewer vertices if you assign a hundred varyings to each. code:
code:
|
# ? Sep 26, 2008 16:13 |
|
heeen posted:What GPU are you running on? I have no problems outputting dozens of vertices. I've checked those variables, and i am well within their limits (max output is 1024 primitives per call iirc).
|
# ? Sep 26, 2008 16:26 |
|
shodanjr_gr posted:Can anyone give me some guidelines on geometry shader performance? (a paper/article/tutorial for instance) The problem you are running into is due to the somewhat short-sighted way the Geometry Shader was fitted into the pipeline on a lot of hardware. Essentially, if you're using the geometry shader to do expansion in the data stream, you run into memory contention problems. This is due to the fact that the primitives (triangles, points, etc) need to be rasterized in issue order, which means that the memory buffer coming out of the geometry shader stage has to match the order of primitives coming into the geometry shader (i.e. primitive A, or any shapes derived from primitive A, must all come before primitive B). If you're doing expansion in the shader, this means that processing of primitive B can't start (or at least, the results can't be output) until primitive A is finished. Combine this with the fact that the clipping/rasterizer stage now has to process 4 times faster than the vertex shading stage and that the buffers were optimally sized to a 1:1 ratio, and you have the potential for a lot of bottlenecks. For what it's worth, you *might* want to try doing a two-stage approach: shading your vertices and using the GS to expand, then feeding the output to a Stream-Out buffer, then re-rendering that with a rudimentary vertex shader. Because you're streaming straight to memory, you might not run into the buffer limitation issues. I haven't tested this myself though, and the extra processing + slower memory speed might negate any gains you get. This, incidentally, is why tessellation in the DX10 Geometry Shader is generally discouraged. e: To be taken with a grain of salt. Looking at the post again, 75fps to 4fps seems like a very dramatic slowdown for this sort of thing. It could actually be possible that you're running into software mode, but that seems unlikely based on your description.
Hubis fucked around with this message at 18:25 on Sep 26, 2008 |
# ? Sep 26, 2008 18:09 |
|
Thanks for the input! Hubis posted:This is due to the fact that the primitives (triangles, points, etc) need to be rasterized in issue order, which means that the memory buffer coming out of the geometry shader stage has to match the order of primitives coming into the geometry shader (i.e. primitive A, or any shapes derived from primitive A, must all come before primitive B). I am just weirded out that generating a display list with my high-quality point mesh and sending it over to the card is a lot faster than generating a low quality mesh display list and expanding it in the shader... (for instance, i want to end up with an 800 * 800 point mesh, so i figured i'd send an 80*80 mesh, then generate further points inside the GS). If anyone else has any clues, please let me know. e: quote:e: To be taken with a grain of salt. Looking at the post again, 75fps to 4fps seems like a very dramatic slowdown for this sort of thing. It could actually be possible that you're running into software mode, but that seems unlikely based on your description. To give you a measure, let's assume that i send over a 200 * 200 grid display list to the card and i generate 1 point in my GS. This runs at < 75 fps and all is well. If i keep the display list constant, and i generate 4 points in my GS, i drop to 25 FPS! If i move up to 9 points, i drop to 12 FPS. 16 points brings me into single digits. At 16 points per grid point, i get a total of 640000 points. Now, if i send an 800 * 800 display list for rendering, and only generate a single point (for the same number of total points), i get at least 15 FPS. So for the same amount of geometry, using the GS to expand gives me a 75% reduction to frame rate compared to a display list... shodanjr_gr fucked around with this message at 18:39 on Sep 26, 2008 |
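For reference, the point counts in that comparison work out like this (just the arithmetic, assuming one GS invocation per grid vertex, which is what the fixed output counts imply):

```python
def total_points(grid_side, points_per_gs_invocation):
    # every grid vertex invokes the geometry shader once,
    # each invocation emits a fixed number of points
    return grid_side * grid_side * points_per_gs_invocation

# 200x200 grid expanded 16x in the GS ...
expanded = total_points(200, 16)
# ... versus an 800x800 grid with no expansion:
direct = total_points(800, 1)
assert expanded == direct == 640_000
```

Same 640,000 points either way, so the frame-rate gap really is coming from the expansion path, not from the amount of geometry.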
# ? Sep 26, 2008 18:32 |
|
shodanjr_gr posted:Thanks for the input! Unfortunately, no. Adding the ability to disable/relax that requirement has been suggested as a workaround to this, but as is it's an ironclad rule of the API. Remember, you're working with a pipeline, not a series of iterative stages being run on the whole data set at a time. Thus, if data transfer and vertex processing are not the bottlenecks in your pipeline, optimizing them isn't going to give you any performance benefit. Unless you only have one job going through the entire pipeline at a time, your render time is going to be only as fast as the single slowest stage of the pipeline (barring some strange edge cases). If you were using DirectX, I'd say run it through nvPerfHUD and see where your bottlenecks are, then optimize from there. From what you describe, it sounds like Blending/Raster Ops are your bottleneck, not Data Transfer/Input Assembly/Vertex Processing. shodanjr_gr posted:To give you a measure, lets assume that i send over a 200 * 200 grid display list to the card and i generate 1 point in my GS. This runs at < 75 fps and all is well. If i keep the display list constant, and i generate 4 points in my GS, i drop to 25 FPS! If i move up to 9 points, i drop to 12 FPS. 16 points brings me into single digits. At 16 points per grid point, i get a total of 640000 points. Now, if i send an 800 * 800 display list for rendering, and only generate a single point (for the same number of total points), i get at least 15 FPS. So for the same amount of geometry, using the GS to expand gives me a 75% reduction to frame rate compared to a display list... ah ok. Then yes, this sounds like you're running into the issue I described above. Hubis fucked around with this message at 20:41 on Sep 26, 2008 |
# ? Sep 26, 2008 20:39 |
|
Hubis posted:ah ok. Then yes, this sounds like you're running into the issue I described above. I'll look into your suggestion to see if i can get a buff in FPSes... i really don't want to can this idea, since it's been producing very nice results visually. Can you point me in the right direction for Stream-out buffers? edit: Is there a chance this is an nvidia-only problem? (i assume there are pretty large architectural differences between nVidia and ATi GPUs). shodanjr_gr fucked around with this message at 20:45 on Sep 26, 2008 |
# ? Sep 26, 2008 20:42 |
|
shodanjr_gr posted:I'll look into your suggestion to see if i can get a buff in FPSes...i really dont want to can this idea, since its been producing very nice results visually. shodanjr_gr posted:edit: I can't speak to AMD internals, but it's my understanding that they run into the same problem. The only difference would be the size of the post-GS buffer, which will determine where you hit those falloffs and how much they'll hurt performance.
|
# ? Sep 26, 2008 21:01 |
|
Thanks for the replies Hubis. I'll check out the spec you linked. Final question, is there a way to query the driver to see if it's running in software mode?
|
# ? Sep 26, 2008 21:06 |
|
shodanjr_gr posted:Thanks for the replies Hubis. Ill check out the spec you linked. Not that I'm aware of.
|
# ? Sep 26, 2008 21:07 |
|
Can't you request that software fallback be disabled when creating the context? That would cause it to error out instead of silently slowing down, right?
|
# ? Sep 26, 2008 21:11 |
|
sex offendin Link posted:Can't you request that software fallback be disabled when creating the context? That would cause it to error out instead of silently slowing down, right? Not sure... I've been using GLUT to handle context creation...
|
# ? Sep 26, 2008 21:12 |
|
Kind of a vague question, sorry. Let's say I'm working on an outdoor first person game in OpenGL. I want a large view distance, so I'm assuming I'm going to be drawing very large, very simple geometry on the horizon, and small complicated geometry close up. For whatever reason, some way of cheating with a 2 pass skybox rendering method isn't going to work. What are the performance considerations of doing this? For instance, does it poo poo all over my z buffer precision if in the same scene I'm drawing a 4 poly pyramid on the horizon that's hundreds of units tall and doing detail work close up? Also in that kind of situation, what's an acceptable scale for our coordinates? Is it worth spending time worrying about whether to make my map extents minfloat to maxfloat rather than something arbitrary like +- 1024?
|
# ? Oct 1, 2008 12:34 |
|
HauntedRobot posted:Kind of a vague question, sorry. You're going to run into significant z-buffer precision problems leading to Z-fighting for things that are near one another (within one exponential precision value from one another) unless they are drawn in strict back-to-front order, including sub-draw call triangle order.
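To put rough numbers on the falloff (the standard derivative-of-window-depth approximation for a perspective projection; the 24-bit buffer and the near/far values below are just illustrative assumptions):

```python
def depth_resolution(z, near, far, bits=24):
    """Smallest separation along the view axis that a fixed-point
    depth buffer can still resolve at eye distance z, for a standard
    perspective projection (derivative of window depth w.r.t. z)."""
    one_step = 1.0 / (2 ** bits)            # one depth-buffer increment
    return one_step * (far - near) * z * z / (far * near)

# With near=0.1 and far=10000, a 24-bit buffer at z=1000 can only
# resolve surfaces roughly 0.6 units apart; pushing the near plane
# out to 1.0 improves that by about 10x.
```

The precision loss is quadratic in distance, and the near plane matters far more than the far plane, which is why the usual advice is to push near out as far as the scene tolerates rather than shrink far.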
|
# ? Oct 1, 2008 21:30 |
|
HauntedRobot posted:What are the performance considerations of doing this? For instance, does it poo poo all over my z buffer precision if in the same scene I'm drawing a 4 poly pyramid on the horizon that's hundreds of units tall and doing detail work close up?
|
# ? Oct 1, 2008 22:25 |
|
|
What's better, drawing pixels directly, or using textures? Poking around the OpenGL documentation, I see that I can put pixel data into a raster and glDrawPixels() it onto the screen. I can also put the pixel data onto a texture, put the texture on a quad, and put that onto the screen. If I'm just doing two-dimensional imaging, then which is preferable? Also, am I crazy, or has the page for glutKeyboardUpFunc disappeared from the OpenGL documentation? I could swear I was looking at it a couple weeks ago, but it isn't there now.
|
# ? Oct 1, 2008 23:52 |