Spite
Jul 27, 2001

Small chance of that...
The problem is that there's a metric assload of code out there that still uses Begin/End and/or display lists. And there's still a bunch of people that have to maintain that code. Unfortunately most games use D3D these days, so there isn't as much pressure for good OGL tutorials. The amount of infighting in the ARB doesn't help the state of the API either.

I wish there were a few really impressive OpenGL 3.2+ games out there, but there aren't. ES is the best bet since it still has the majority, unless they screw up that spec.

Spite
Jul 27, 2001

Small chance of that...

OneEightHundred posted:

(i.e. why is it still not possible to save a compiled shader to disk and reload it, even if it's vendor-specific bytecode?)

That's really, really hard and annoying to do. The bytecode would have to be so generic that it's essentially the same as a shader source file or an ARB program string. Even a vendor-specific bytecode would have to be hardware-agnostic across that vendor's lineup (you don't want to ship different bytecodes for r5xx, r6xx, r7xx, r8xx if you're ATI), and you'd have to support it forever into the future.

It's not going to happen unless OpenGL defines a format - but that's kind of pointless. Storing bytecode doesn't really help you that much - the GL stack and the Driver stack still have to compile it into machine code and do their optimizations. You don't gain much time from hoisting out the parsing and if you are compiling shaders constantly during rendering you are doing something very wrong.

The only reason to do this I see is obfuscation, but that doesn't gain you much since the people you are trying to hide the shaders from can just read the spec and reverse engineer it anyway.

Spite
Jul 27, 2001

Small chance of that...

OneEightHundred posted:

That isn't what I'm asking for. I'm saying that I should be able to have the driver give me the vendor-specific and probably hardware-specific code compiled from the GLSL so I only have to compile it once and then save it to disk so I never have to compile it again.

Even that's a pretty large concession though: I do think it would probably still be better if they used semi-agnostic bytecode similar to D3D because it gives the driver writers fewer places to screw up. Being able to blue-screen my computer because the ARB decided to trust ATI with writing a compiler sucks rear end.

No vendor is going to do that though - that's too much information for them to be comfortable letting the user keep around. And then you'd have to ship compiled shaders for every card model - r5xx, r6xx, etc, etc. It's not really worth it.

And the agnostic bytecode D3D ships is still compiled by each driver - every vendor still has to convert it into machine code and run optimizations, or maybe even replace chunks altogether. It may look very different from the bytecode after all this. It only saves the parsing step, which is really not saving much time at all.

Spite
Jul 27, 2001

Small chance of that...

OneEightHundred posted:

Oh bullshit, CTM basically already works off of that exact concept. ATI even has tools that let you look at the compiled output of GLSL shaders.

I'm not saying you'd have to SHIP it with each one anyway, I'm saying that the app would only have to compile them once ever and cache them to disk.

Compile-then-save isn't a revolutionary feature either; Far Cry and a few Battlefield entries, for example, leave shaders uncompiled until the first time they're used.

And yes, the incompatible shader compilers thing is a complete pain in the rear end too, especially with int/float conversions which ATI will throw errors over while NVIDIA completely ignores them.

Right, but it's not _really_ what's being run on the card. That's just a representation of it. You can't just compile once, because it doesn't work that way. Every driver on every card will recompile shaders based on state to bake stuff in and do other optimizations. You'd need a huge number of different binaries to cover all possibilities. Turned on sRGB? Your fragment shader will be recompiled. Using texture borders? Recompile again, etc.

It would be nice to have a specified GLSL parser that doesn't suck, definitely. But there's a whole lot more work that goes into compilation than just that. Hoisting the initial parsing step is really saving just a small part of the overall time and work for "compilation."

Spite
Jul 27, 2001

Small chance of that...

Luminous posted:

Do you really not understand what OneEightHundred is saying? Or are you just trolling him? You should have just stopped at your very first sentence.

If options change that would cause a need for recompilation, then it would be recompiled. However, if not, then keep using the stored built version. I'm not sure why caching appears to be an alien concept to you. Maybe you're just being all goony semantic about it, or something.

I'm coming from the perspective of someone writing the driver, not someone using the API, so I may be making my point poorly.

I'm simply saying that bytecode and cached (shipped-to-disk) versions aren't all that simple, and I don't feel they really save you anything. There's a lot more that happens at the driver and hardware level during shader compilation and linking, and it's not widely understood. To really get at the heart of the issue, you'd have to save out a multitude of variants of each shader and you'd have to make it future-compatible. I'm not sure it's worth it, and it's a royal pain in the rear end for the driver developer, that's all.

Spite
Jul 27, 2001

Small chance of that...

roomforthetuna posted:

At the heart of this cross-purposes discussion is the fact that that's not what cached means. OneEightHundred is proposing that the card allow one to save the specific compiled shader for the current situation, to avoid unnecessary time spent recompiling. Like, the first time you load a game/area, it compiles the shaders and saves them out so that every other time you load that game/area it won't need to compile them again.

If it would really "recompile shaders based on state to bake stuff in and do other optimizations" then it could still be an option, where an attempt to load a precompiled shader that is flagged as being for a different situation (you could include whatever flags you need since it's a vendor-specific binary being dumped) would return a "can't use that saved shader, please recompile" return value. Or you could even include the uncompiled shader code inside your precompiled shader file, and then in the event of loading it in a situation where the precompilation isn't valid, automatically recompile.

(Does a card really recompile shaders based on state?)

Most drivers do this already behind your back, though. And saving to disk will almost certainly mean saving an intermediate format that has to be retranslated/recompiled once the shaders are actually used anyway. What I mean is that what the app developer thinks of as "one specific situation" the driver thinks of as "potentially 2 dozen+ possibilities", and that's hard to reconcile easily.

And yes, every driver for every card currently shipping will recompile the active shader based on state. Think about a simple texture sample. What if you change the wrap mode and you don't have support for it in hardware? Your shader is already uploaded and compiled, but the driver has to fixup the shader to get the correct behavior - which causes a recompile.
For example, nvidia does not support seamless cube maps in hardware (the gt100 series does, anything less does not), so if you turn them on and your shader samples a cube map, they modify the shader to do the fixup and blending. Or they will borderize each face when you draw and blend in the shader.

There are dozens and dozens of situations where a driver will take your current shader (that it has told you is already compiled) and recompile it to add/remove stuff based on state. Since the driver doesn't know what's coming, it's hard to cache; it would either have to guess at a common set of state or provide all possibilities in its binary blobs. Neither is particularly appealing, and as I said before, it doesn't really gain you a whole lot unless you can avoid all the recompiles and links completely.

Spite
Jul 27, 2001

Small chance of that...

OneEightHundred posted:

Thank God. Add the DX11 functionality as well and it'll FINALLY be at feature parity with D3D, and maybe we'll see some development with it again.

WebGL is the real news though: It's a major step towards what Google wants, which is turning web browsers into application development platforms, overthrowing the core operating system in the process. If they're aiming towards high-performance 3D, I suppose it's only a matter of time until we get better input and audio APIs and it becomes possible to write serious games on top of Chrome.

Yeah, and my previous posts were mostly an annoyance at having to implement it :)
I still don't think it will save nearly as much time as people hope, but it's good to have the appearance of parity with DX11.

I'm curious to see how they make WebGL secure. ARB_robustness is kind of a pain, and being able to give the GPU arbitrary shader code over the web means you can do nasty things to the user.

Spite
Jul 27, 2001

Small chance of that...

OneEightHundred posted:

I don't see why that's such a big deal. Basically all that's needed to secure OpenGL, as far as I can tell, is to bounds check operations that pull from VBOs/textures and not have batshit retarded shader compilers.

Granted that second point might be a bit much for certain vendors, but if you're going to say "but what if driver writers can't follow the spec that says shader code should never crash the application, even if malformed?" then you're basically saying we'll never have an API that adheres to standards they're already told to adhere to. I'm not sure that's really valid. We already have Managed DirectX for instance, why should it be so hard to make OpenGL pull the same trick?

Bounds checking should already be in since they're supposed to throw INVALID_VALUE on that case.

The hard problem to solve is making sure a shader doesn't take so long as to hang the entire system. You could easily do some sort of denial-of-service type attack with a shader - there's only one GPU and if you can peg it, the whole system will screech to a halt.

I'm not super familiar with Managed DirectX, but from what I recall it requires marshalling data across the C# runtime boundary. WebGL will absolutely have to do something similar since it's based on JavaScript bindings. The fun part will be reconciling the garbage-collected, typeless JS stuff with the OpenGL runtime underneath.

Spite
Jul 27, 2001

Small chance of that...
Well, there really is no "modern" OpenGL since hardly any apps have been written that use GL 4.0+, or even OpenGL 3.0+. I haven't gone through the 5th edition, but I'd hope it's decent since it's supposed to be approved by the ARB.

Spite
Jul 27, 2001

Small chance of that...

OneEightHundred posted:

I'd think "modern" would be not depending on features that have been deprecated since 3.0 at least, even if you're targeting OpenGL 2

Yeah, most definitely. A part of me dies inside when I see a new piece of code that still uses immediate mode.

Forcing everyone to use vertex array objects, vertex buffers and shaders is the best thing the ARB ever did.

Spite
Jul 27, 2001

Small chance of that...
Yup, OneEightHundred is correct. Batch that poo poo.

Also re: updating a VBO every frame. When you render in immediate mode you ARE updating geometry every frame. And you are doing it very slowly since you're specifying each vertex individually.

Spite
Jul 27, 2001

Small chance of that...

OneEightHundred posted:

Use one dynamic VBO for dynamic data (or two, one for vert data one for indexes). Use glBufferSubData to stream data into it, ideally. When there isn't enough room in the buffer left to add more data, or you hit something that forces a state change, issue a draw call to whatever's in the buffer and start over.

When you "start over", use glBufferData on with a NULL pointer so you get a fresh memory region if the driver is still locking the one you were streaming into. (This mimics the D3D "discard" behavior)

You can alleviate the locking problem by double buffering your VBOs. Use one VBO for even frames and one for odd frames. That also means you don't have to call BufferData, which may cause a slight stall as the driver deallocs and reallocs the memory. Just make sure you overwrite everything you're drawing so you don't get bad data.
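
Something like this is the pattern I mean - just a sketch, with made-up buffer names, assuming plain GL 1.5-style VBOs and the fixed-function vertex array to keep it short:
code:
// Allocate two stream VBOs once up front so nothing gets reallocated per frame.
GLuint g_streamVBO[2];

void InitStreamBuffers(GLsizeiptr size)
{
    glGenBuffers(2, g_streamVBO);
    for (int i = 0; i < 2; ++i) {
        glBindBuffer(GL_ARRAY_BUFFER, g_streamVBO[i]);
        glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_STREAM_DRAW);
    }
}

void DrawDynamicGeometry(unsigned frame, const void* verts, GLsizeiptr bytes, GLsizei vertCount)
{
    // Even frames write buffer 0, odd frames write buffer 1, so we never touch
    // the buffer the GPU may still be drawing from.
    glBindBuffer(GL_ARRAY_BUFFER, g_streamVBO[frame & 1]);
    glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, verts);  // overwrite everything we draw
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, (void*)0);
    glDrawArrays(GL_TRIANGLES, 0, vertCount);
}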

Spite
Jul 27, 2001

Small chance of that...

OneEightHundred posted:

The idea behind that behavior is that you don't need to double-buffer them, calling BufferData with a NULL pointer will give you the same memory region if it's not locked, and if it is locked, will give you a fresh one.

They did this very deliberately, and you can probably trust the driver to be smart about it.

But you still may cause an allocation in that case. And if the driver is using the memory to draw from, you get better pipelining if you use two. Plus there are other considerations based on whether the driver can do out-of-line uploads, etc, which hits certain iPhone apps. If you can spare the memory, we always recommend using two.

(Not trying to be contrarian, I swear!)

Spite
Jul 27, 2001

Small chance of that...

Eponym posted:

I'm a beginner, and have 2 questions...

About the coordinates of lines

I have an application where I'm trying to draw lines on a 512x512 texture. The texture is mapped to a screen aligned quad with vertices (-1.0f, -1.0f), (1.0f, -1.0f), (1.0f, 1.0f), (-1.0f, 1.0f).

Suppose I want to draw a line. I don't have any problem drawing lines in world space, but if I want the lines to start and end at particular pixel positions (eg. (48,48) to (72,72)), I get confused. Is it up to me to scale and transform those endpoints so that they are mapped to the appropriate world-space coordinates? Is there anything I can do so that I don't have to perform that transformation?

About the width of lines
I am using OpenGL 2.1, so glLineWidth can be set to values greater than 1 pixel. I read that in OpenGL 3.1, glLineWidth can't be set greater than 1. In that environment, what do people do instead?

For drawing lines in screen space, the easiest way will be to use an orthographic projection. Check out glOrtho (http://www.opengl.org/sdk/docs/man/xhtml/glOrtho.xml). You can generate the matrix yourself if you're using a version without the matrix stack. An orthographic projection is a parallel projection - parallel lines remain parallel (as opposed to perspective projections, where parallel lines approach a vanishing point). This will let you map directly to pixels on the screen.
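
A minimal sketch for the 512x512 case (GL 2.1 fixed-function style; immediate mode is used only to keep the example short):
code:
// Map x/y directly to pixels: one unit = one pixel of the 512x512 target.
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0.0, 512.0, 0.0, 512.0, -1.0, 1.0);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();

// Endpoints given straight in pixel coordinates.
glBegin(GL_LINES);
glVertex2f(48.0f, 48.0f);
glVertex2f(72.0f, 72.0f);
glEnd();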

glLineWidth works as you'd expect with antialiasing off. It's different with it on - check the spec for more details.

Spite
Jul 27, 2001

Small chance of that...
Do you not have a DX11 card? The reference rasterizer will always work, but as you say, it will be horribly slow.

Spite
Jul 27, 2001

Small chance of that...

Tw1tchy posted:

That's the thing, I DO have a DX11 card, and all the other examples in the directx11 sdk work fine. That's why I'm having problems, I have absolutely no idea at all what's wrong.

That cube app - what's the actual error returned by the Device creation function?

Spite
Jul 27, 2001

Small chance of that...
Why not just use a rectangle texture?
http://www.opengl.org/registry/specs/ARB/texture_rectangle.txt

Spite
Jul 27, 2001

Small chance of that...

Eponym posted:

I am trying to draw anti-aliased lines, and this is how to do it, or so I've read:

code:
     // Enable line antialiasing
     glEnable(GL_LINE_SMOOTH);
     glEnable(GL_BLEND);
     glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
     glHint(GL_LINE_SMOOTH_HINT, GL_NICEST);
However, I am trying to draw antialiased lines into a framebuffer object. Without the above, my lines draw fine. With the above, nothing draws.

I'm not sure what other details to provide. I am using textures and shaders.

What OS, what GPU? AA lines are an odd duck.
Does it work if you draw to GL_BACK?

Spite
Jul 27, 2001

Small chance of that...
Well, for one you aren't actually clearing the buffer.
You have to call glClear(GL_COLOR_BUFFER_BIT)

And I think there are a bunch of AA line bugs; I'd have to check. Does turning off blending fix it? Have you installed the GFX update?

Spite
Jul 27, 2001

Small chance of that...
You need to unproject from 0 (near) and 1 (far) if you're going to go that way, from what I recall. I highly recommend doing the math yourself though.

Think of it as the pick origin being at 0,0,0 and the pick point being on the near plane, then see what that ray intersects with. Since the screen itself covers the entire near plane you can convert from those coords to eye coords without too much trouble.

http://www.opengl.org/resources/faq/technical/selection.htm

DON'T USE GL_SELECT. Unique colors are OK, but that will force a readback of a buffer, which probably isn't desirable.

Spite
Jul 27, 2001

Small chance of that...
Keep in mind that branching, and especially stuff like discard/texkill, tends to run a lot better if you've got a lot of localized pixels that will take the same branch.

On projection and picking:

I wouldn't recommend using Unproject.
Think about it this way:
Your screen maps directly to the near plane. Once you've transformed everything into eye space, you know that your eye is at 0,0,0 and you also know where your near plane is (you've specified your frustum's top, bottom, left and right, so you know those values). This means there's a simple mapping from a spot on your screen to a point on your near plane. You can then cast a ray from the origin through that point on the near plane to generate your pick ray. You'll have to transform it out of eye space (or everything into eye space), but that's the best way to do picking.
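
In code, that mapping comes out to something like this (a rough sketch in eye space; the names are made up, and you'd still normalize the direction and transform the ray into whatever space your geometry lives in):
code:
struct Vec3 { float x, y, z; };

// Assumes a frustum set up as glFrustum(left, right, bottom, top, zNear, zFar)
// and a window of width x height pixels. The ray origin is the eye at (0,0,0).
Vec3 PickRayDirEyeSpace(int px, int py, int width, int height,
                        float left, float right, float bottom, float top, float zNear)
{
    // Map the pixel onto the near plane; window y usually grows downward, so flip it.
    float nx = left   + (right - left) * (px / (float)width);
    float ny = bottom + (top - bottom) * ((height - py) / (float)height);
    Vec3 dir = { nx, ny, -zNear };   // GL eye space looks down -z
    return dir;
}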

Spite
Jul 27, 2001

Small chance of that...

haveblue posted:

OS X can do it because Apple put in a shim layer that uses LLVM to run shaders on the CPU in a pinch. I doubt the Intel embedded chips have true shader processors.

This isn't precisely true. The Apple SWR can run shaders and the runtime will fall back to software depending on the input, but the x3100+ do have shader processors. It's not a shim layer - it's an entirely different renderer. You can actually force it on for everything via a defaults write, or by doing a safe boot.

No GPU is interruptible in the classical sense, so you have no real defense against an infinite loop other than to kill the GPU. And since GPU recovery basically doesn't work anywhere, you are pretty much hosed. It's pretty bad across all the OSes.

There's an extension called ARB_robustness that is supposed to lay out guidelines to make WebGL something other than an enormous security hole. Whether it works or is implemented correctly remains to be seen.
http://www.opengl.org/registry/specs/ARB/robustness.txt

The browsers that pass GLSL directly down are being very, very naughty and are asking for trouble.

Spite
Jul 27, 2001

Small chance of that...

Sabacc posted:

I'm trying to work on a port of an OpenGL 1.1 game to OpenGL ES 1.3. I came across this bit of code and I can't for the life of me figure out how to replicate it in OpenGL ES:

code:
glPushAttrib(GL_LIGHTING_BIT);
Now, I know pushing the attribute basically stores all things affecting lighting - lights, materials, directions, positions, etc. But how the hell am I supposed to capture this information in OpenGL ES? In OpenGL, I push the attribute, do some work, and pop it off.

Two things:
You should never use glPush/PopAttrib; you should track that state yourself.
You should never call glGet*; you should track that yourself too (unless it's a fence or occlusion query, etc). Not even to check for errors (don't check for errors in production code).

To actually answer you:
GL_LIGHTING includes a ton of stuff, as mentioned. It also tracks EXT_provoking_vertex and ARB_color_buffer_float's clamped vertex color.

Spite
Jul 27, 2001

Small chance of that...

Sabacc posted:

I'm not trying to be rude or daft but I really don't know what you mean here.

glPushAttrib is out of the OpenGL ES spec, so I'm not using it :)
GL_LIGHTING_BIT includes a ton of stuff, as you and the poster above noted.
Is it appropriate to keep track of my current state if there's a lot of things to manage, then?

So if previously I had

code:
glPushAttrib(GL_LIGHTING_BIT);
... perform some operations, some which affects masks and lighting ...
glPopAttrib();
do I now do:

code:
... store states of masks and lighting...
... perform some operations, some which affects masks and lighting...
... return states of masks and lighting ...
Is that (terrible) pseudocode accurate?

Sorry if I was a bit unclear or rude. You shouldn't use glPush/PopAttrib even in normal OpenGL because it does a lot of extra work that probably isn't necessary.
It's easier to shadow all the state you've changed and reset it as necessary. Set your shadow to whatever the default is and as you change state, update it. So if you only change, say, the GL_COLOR_MATERIAL enable bit, you'd keep track of that and set/reset it as necessary.
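
A minimal sketch of that shadowing, using the GL_COLOR_MATERIAL enable from the example above (the struct and function are made up; the point is you never Push/PopAttrib or glGet anything):
code:
struct GLStateShadow {
    bool colorMaterial;
};
static GLStateShadow g_shadow = { false };   // GL default: disabled

void SetColorMaterial(bool enable)
{
    if (g_shadow.colorMaterial == enable)
        return;                               // skip redundant GL calls entirely
    if (enable) glEnable(GL_COLOR_MATERIAL);
    else        glDisable(GL_COLOR_MATERIAL);
    g_shadow.colorMaterial = enable;
}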

Spite
Jul 27, 2001

Small chance of that...
FBO 0 isn't actually an object - it's the system drawable. So attaching things to it may have...odd effects (though it should just throw an error).

One quick note: OpenGL is pretty bad about CPU overhead as well. Using 3.1+ will mitigate this, as they removed a bunch of junk. But anything earlier (i.e., anything that still has fixed function, etc) will require validation of all that legacy state, which sucks rear end.

Spite
Jul 27, 2001

Small chance of that...

Unormal posted:

Am I using anything fixed function here? I figured since I'm entirely shader driven I was bypassing the 'fixed' pipeline, though I don't really know how using the framebuffer EXT functions vs the built-in framebuffer functions for 3.x would affect things. I guess I figured driver writers would just implement the EXT functions as special cases of the more general 3.0 functionality, and EXT would actually be more portable, even though the opengl pages tell you to use the more updated built-in functions if you can.

The only thing that feels 'built in/fixed' to me is using the OpenGL blend mode to render each deferred light into the intermediary buffer. That feels a little more auto-magic than the rest of the rendering I do, which is much more manual-direct-writes via shaders. Though I guess there's a lot of magic going on under there anyway. I can't figure out any way I could do the blending manually other than having 2 FBOs and swapping them each time I render a new light, which seems ridiculous and I can't imagine would be speedier than just using the built-in blend, though I haven't actually benchmarked it.

E: Is there any kind of good comprehensive guide to mainstream video cards and their capabilities in terms of OpenGL? (i.e. how many render targets they support, how many texture units, etc?)

Well, it doesn't matter if you're not using the fixed-function pipeline. If it's there, it needs to be validated because the spec says so. Hooray. Of course, every modern implementation will have a bunch of dirty bits and not do validation if nothing's changed (i.e., don't validate the really old image-processing-type state if it hasn't been changed). GL 3.1+ doesn't _have_ all that old crap, so it can be totally ignored.

Blending will absolutely be faster if you use the fixed hardware. Programmable blending doesn't exist on the desktop yet - and using multiple render targets will suck for perf.

For limits, etc, this isn't a bad reference:
http://developer.apple.com/graphicsimaging/opengl/capabilities/

Spite
Jul 27, 2001

Small chance of that...

OneEightHundred posted:

Current best practice is VBOs, glDrawRangeElements for everything, and use SSE uncached writes to VBOs if you're mapping (glBufferSubData does uncached writes).

As a quick note, this will depend on the driver and OS (BufferSubData, I mean). Remember to keep an eye on your alignments with SSE. But yeah, you totally want to use uncached writes, especially for a big data set.

If you do use MapBuffer for your VBO, remember to use FlushMappedBufferRange/MapBufferRange.
And move to generic vertex attributes instead of using builtins if you can.
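
A quick sketch of what that path looks like (vbo, offset, bytes and verts are placeholders; assumes GL 3.0 or ARB_map_buffer_range, and attribute 0 is just an example index):
code:
glBindBuffer(GL_ARRAY_BUFFER, vbo);
void* dst = glMapBufferRange(GL_ARRAY_BUFFER, offset, bytes,
                             GL_MAP_WRITE_BIT |
                             GL_MAP_UNSYNCHRONIZED_BIT |    // you promise not to stomp memory in use
                             GL_MAP_FLUSH_EXPLICIT_BIT);
memcpy(dst, verts, bytes);                                  // ideally streaming/uncached writes
glFlushMappedBufferRange(GL_ARRAY_BUFFER, 0, bytes);        // offset here is relative to the mapped range
glUnmapBuffer(GL_ARRAY_BUFFER);

// Generic attribute instead of the built-in vertex array:
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(float) * 3, (void*)0);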

Spite
Jul 27, 2001

Small chance of that...

OneEightHundred posted:

The point of command buffers is that they can be stuffed with commands from any thread. Display lists don't really work that way because they're filled based on API calls that operate on the state machine, which is bound to a single thread.

Personally I think they're kind of pointless because the inherent overhead of making API calls when you have user-mode drivers is practically nothing.

I agree that they are pointless, and pretty flawed from a design point of view. You can't optimize anything if the display list still obeys the state machine.

But GL calls do have significant CPU overhead, especially on OS X. There's a ton of validation, conversion and other stuff that needs to be done because it's demanded by the spec. D3D10 is way better than GL is, at this point.

Spite
Jul 27, 2001

Small chance of that...

octoroon posted:

Okay, so I have a few (hopefully not-too-dumb) questions about OpenGL and VBO usage.

Currently I'm working on a game that basically renders the world similar to minecraft, i.e., the world is made up of a lot of smaller blocks. I've been keeping all the world geometry in a giant VBO, but I've also read that VBOs are really optimized for holding about 2-4MB of data (or possibly a max of 8MB?).

My question is, would it be better to divide the world up into smaller sections that each have their own VBO in order to get my individual VBO size smaller? Should I look into making my vertex data smaller by using something like triangle fans with primitive restart, or what else would be a good way to maximize vertex sharing/reduce the size?

I'd say it's more a caching and locality issue than a pure VBO size issue. There are paging and VRAM constraints to think of, but each vertex is just a number in VRAM to the GPU. Sourcing that number efficiently can be affected by layout and cache, however. Consider that if your entire world is in one VBO, pieces that are next to each other spatially may not be next to each other in memory. It's similar to why you'd store an image in Z-order or Hilbert-order instead of just straight linear.

Plus, you do NOT want to be drawing things you can't see - so you'll either be sending multiple DrawRangeElements calls, or you can decompose your world into smaller chunks and bind and draw them individually. I'd prefer this myself, as you can then do frustum checks and just skip the draws if they aren't visible.

And as a last caveat - how is your performance thus far? If it's not bad, you may not want to overcomplicate your problem just yet.
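
Roughly what I mean by the per-chunk approach (just a sketch; Chunk, AABB and FrustumIntersects are made-up stand-ins for whatever structures and culling test you end up using):
code:
struct AABB { float min[3], max[3]; };
bool FrustumIntersects(const AABB& box);      // your frustum test

struct Chunk {
    GLuint  vbo, ibo;
    GLsizei indexCount;
    AABB    bounds;                           // world-space bounds of the chunk
};

void DrawWorld(const Chunk* chunks, int count)
{
    for (int i = 0; i < count; ++i) {
        if (!FrustumIntersects(chunks[i].bounds))
            continue;                         // skip the binds and the draw entirely
        glBindBuffer(GL_ARRAY_BUFFER, chunks[i].vbo);
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, chunks[i].ibo);
        glVertexPointer(3, GL_FLOAT, 0, (void*)0);
        glDrawElements(GL_TRIANGLES, chunks[i].indexCount, GL_UNSIGNED_INT, (void*)0);
    }
}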

Spite
Jul 27, 2001

Small chance of that...
Clearing is always a cost, though it's certainly faster now. Of course, you have to consider that your framebuffer is likely to be much larger as well.

There are a ton of things you can do. You can try to make a "fastclear" path that only clears pixels that have been touched. You can do Hi-Z. Some combination of the two will allow you to categorize blocks of the zbuffer that need to be cleared. Say you mark 4x4 blocks as dirty, then you know you need to clear them, etc.
That also means you need to store your buffers in block-linear order, which is better than straight linear anyway.
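
Purely as an illustration of the dirty-block idea for a software z-buffer (the storage here is plain linear just to keep it short, and it assumes the buffer dimensions are multiples of 4):
code:
const int TILE = 4;                                   // 4x4 pixel blocks

void MarkTouched(bool* tileDirty, int tilesPerRow, int x, int y)
{
    tileDirty[(y / TILE) * tilesPerRow + (x / TILE)] = true;
}

void FastClear(float* zbuf, bool* tileDirty, int width, int height)
{
    int tilesPerRow = width / TILE;
    for (int ty = 0; ty < height / TILE; ++ty)
        for (int tx = 0; tx < tilesPerRow; ++tx) {
            bool* dirty = &tileDirty[ty * tilesPerRow + tx];
            if (!*dirty) continue;                    // untouched block: nothing to clear
            for (int y = 0; y < TILE; ++y)
                for (int x = 0; x < TILE; ++x)
                    zbuf[(ty * TILE + y) * width + (tx * TILE + x)] = 1.0f;  // clear to far plane
            *dirty = false;
        }
}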

Spite
Jul 27, 2001

Small chance of that...
There are lots of things to think about, but I don't have a good reference off the top of my head, unfortunately.
Bresenham is sort of the gold standard way to do it. There are variants, of course, but it's actually quite simple to implement.

Other stuff to think about:
Are you doing the z-cull in eye space or clip space? What are you doing to map to pixels? How are you interpolating (ie, you should do adds instead of mul/divide per pixel)? Are you working in floating point or with integers?
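
For reference, the classic integer Bresenham (all octants) is only a handful of lines; PlotPixel here stands in for your framebuffer write, and per pixel it's nothing but adds and compares:
code:
#include <cstdlib>   // std::abs

void PlotPixel(int x, int y);                 // your framebuffer write

void DrawLine(int x0, int y0, int x1, int y1)
{
    int dx = std::abs(x1 - x0), sx = (x0 < x1) ? 1 : -1;
    int dy = -std::abs(y1 - y0), sy = (y0 < y1) ? 1 : -1;
    int err = dx + dy;                        // error accumulator
    for (;;) {
        PlotPixel(x0, y0);
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}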

Spite
Jul 27, 2001

Small chance of that...
First, I'm confused. If the object isn't physically changing shape, etc, you don't need to dynamically update its vertex/index data. Unless I'm missing your point...
For text, most people pass a sysmem pointer to index data.

Usually you want to double-buffer VBOs.
Frame 1:
Mod VBO A, Draw with A
Frame 2:
Mod VBO B, Draw with B

Think of it this way:

All GL commands get put into a queue that will be pushed to the GPU (a "command buffer"). However, you can modify stuff in VRAM via DMA as well. So you cannot* modify something that is currently being used to draw. However, if you have 2 VBOs with the same data, you can get around this (and vertex data tends to be small, so it doesn't really hurt your mem usage).
This is true for textures, vertex data, cbuffers, etc (I've seen several DX10 apps that try to use one cbuffer and end up waiting after every draw).

*D3D's LOCK_DISCARD and GL's FlushMappedBufferRange will allow you to modify stuff in use - it's up to you to make sure you don't smash your own memory.

Spite
Jul 27, 2001

Small chance of that...

OneEightHundred posted:

Why would you do this instead of just using discard?

Also, another option for getting discard behavior on OpenGL buffers is to use glBufferData with a NULL data pointer.

Because the driver has to allocate you new VRAM when you discard or orphan the buffer. Any decent driver keeps a list of these so it doesn't have to do an actual allocation, but I feel it's a better design to handle this yourself.

not a dinosaur posted:

VBO stuff

Fewer calls is better, yeah.
You've got a couple of options.
For something like that, instancing works really well. You'd only have to upload a transform vector per quad instead of all 4 vertices (see the sketch below).
On modern hardware, sending points down to the geometry shader and generating quads would also work, but it's almost certainly an unnecessary optimization. I'm not a fan of the geometry shader in general, so I try to stay away from it.
Or you can double buffer the VBO and update pieces as needed, which is essentially what you are doing now.
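
A minimal sketch of the transform-per-quad idea (assumes GL 3.3, or ARB_instanced_arrays with the ARB-suffixed entry point; quadVBO, offsetVBO and quadCount are placeholders, and the vertex shader is expected to apply the per-instance offset itself):
code:
// quadVBO holds the 4 corners of a unit quad; offsetVBO holds one vec4 per quad.
glBindBuffer(GL_ARRAY_BUFFER, quadVBO);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (void*)0);

glBindBuffer(GL_ARRAY_BUFFER, offsetVBO);
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, 0, (void*)0);
glVertexAttribDivisor(1, 1);                  // attribute 1 advances per instance, not per vertex

// One draw call for every quad; no per-quad vertex uploads.
glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, quadCount);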

Spite
Jul 27, 2001

Small chance of that...

octoroon posted:

I was referring more to OpenGL. The ARB instancing extensions are 2008, unless I'm missing something.

e: from what I can tell only stuff from the GeForce 8800+ era definitely supports the OpenGL extensions.

That's true, but consider that the g8x generation is like 5 years old at this point. Just about anything that's actually worth developing for supports instancing.

Ironically, NVIDIA does not support instancing in hardware on those parts - it's implemented in the driver. It's still faster than making multiple draws yourself.

As always, you have to play around with it and see what gives the best perf for your app.

Spite
Jul 27, 2001

Small chance of that...

OneEightHundred posted:

I still don't get the point of instancing. High-poly models don't benefit from it unless there are multiples of the same model at the same LOD, low-poly models are cheap enough to just dump to a dynamic buffer.

Well, stuff like single-quad grass is cheap enough to do dynamically, but instancing can be really useful for stuff like rocks and trees. Or for stuff like RTS games where you have a crapload of the same type of unit running around onscreen.

Spite
Jul 27, 2001

Small chance of that...

octoroon posted:

Still curious if there's any tangible benefit to getting a newer 3.0+ OpenGL context. Is there some sort of optimization or practical improvement? Because right now it seems like just creating an old context and getting any entry points I need for newer functionality with GLEW is exactly the same as getting a 3.0+ compatibility context. Is there any reason at ALL to get a newer context if I don't care about asking for a core profile so that deprecated functionality literally throws an error?

Yes. 3.2+ removes a ton of outdated and stupid poo poo. This means the runtime can skip the checks and not have to worry about the fixed-function built-in stuff. Also, mandating the use of VAOs means you can optimize vertex submission in a couple of ways that would be harder with the old Bind/VertexPointer-style method (see the sketch below).

Now, I'm not sure if the Windows drivers have all these optimizations, but it's worth 10% or so on OS X.
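
For reference, the core-profile path the VAO requirement pushes you toward looks like this (just a sketch; vbo, ibo and indexCount are placeholders):
code:
GLuint vao;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);

glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(float) * 3, (void*)0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);   // the element-array binding is captured by the VAO

// At draw time the whole vertex setup is a single bind:
glBindVertexArray(vao);
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, (void*)0);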

Spite
Jul 27, 2001

Small chance of that...

octoroon posted:

So I would assume asking for a compatibility context negates these benefits.

It would depend on how the driver is implemented, but I would assume so. I haven't profiled it though.

Spite
Jul 27, 2001

Small chance of that...

roomforthetuna posted:

Yeah, only three meshes (or at least something on that order of magnitude) and it's taking me from my capped 60fps to 40fps. Which wouldn't be a problem except there's going to be more than a few guys running around.

Oh, yeah, sorry, it's Direct3D 9 so I don't even know what Constant Buffers are (in a DirectX context anyway). It does sound like a thing that would help!

Basically, the problem resembles when I was doing vertexbuffer->Lock() , write , Unlock() , render, with non-discardable buffers - everything would get very very slow with only like 20-30 very small buffer rewrites. So I'm guessing my bone matrices declared in the shader are acting the same way, as a bit of graphics memory that has to be locked and overwritten, and thus delays until the previous render using that memory is completed.

But in this case I have no idea how to have multiple bits of memory for that, unless having multiple instances of the same shader is the way you do this, but that doesn't seem right.

That's not really how it works in DX9. Constants don't work the same way as buffers.

Without looking at the whole code it'd be hard for me to say for sure what's going on.
What do your passes do? If you are uploading a bunch of matrices and doing 3 passes, that's going to upload the data 3 times. Maybe that's why it's so slow? Dunno.

One thing to keep in mind is that it's better to do

Bind Shader 0
Bind Vtx buffer 0
Draw
Bind Vtx buffer 1
Draw
Bind Shader 1
Bind Vtx buffer 0
Draw
Bind Vtx buffer 1
Draw

than to do
Bind Vtx Buffer 0
Bind shader 0
Draw
Bind Shader 1
Draw

Spite
Jul 27, 2001

Small chance of that...

roomforthetuna posted:

Thanks for that, which might help somewhat. I'm actually only using one shader object, but selecting from amongst multiple 'techniques' - is that the same thing as binding a new shader each time?

I was uploading a new batch of matrices each time, but when I stopped doing that it didn't actually help much (maybe from 40fps to 41fps) so that wasn't the problem. The little code extract I posted is almost literally what's being added - it's not all neatly in a row like that, and the variables have meaningful names, but I walked through the code and that's all the DirectX-facing functions that are called, in that order, to bring it down from 60fps to 50fps (or 40fps if the shader is always skinning on 4 bones rather than 2/3).

'shader' is the same object in each of the 3 repetitions, the technique being set may or may not be the same one. I just googled for "SetTechnique" to see if maybe there'd be a hint about usage, and found someone saying they'd have a separate shader class for each of their objects, but that didn't seem like someone who really knew what they were doing. Is this a horrible idea?

Is there some way to render everything with technique X within one shader->Begin/shader->End block, given that they'll have different transform matrices? Would it be something like
code:
shader->BeginPass(0);
pD3DDevice->SetVertexDeclaration(the_appropriate_declaration);
pD3DDevice->SetStreamSource(vertexbuffer,offset,stride);
pD3DDevice->SetIndices(indexbuffer);
shader->SetMatrix(transformhandle,a_matrix);
shader->CommitChanges();
pD3DDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,0,0,numvertices,0,length);

pD3DDevice->SetVertexDeclaration(the_appropriate_declaration);
pD3DDevice->SetStreamSource(different_vertexbuffer,offset,stride);
pD3DDevice->SetIndices(different->indexbuffer);
shader->SetMatrix(transformhandle,another_matrix);
shader->CommitChanges();
pD3DDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,0,0,numvertices,0,length);

shader->EndPass(0);
This is not what I'm doing now - this question is really "is this what CommitChanges is for?"

I'm not super familiar with D3DX, to be honest. I'm not sure what it does under the hood. But CommitChanges has to do with state and not data - I would guess you don't need it if you are just changing constants. Also make sure you aren't having the runtime save state for you. You should manage it yourself.

As for setting the shader, I think it depends on whether each technique is actually a different shader or not. Otherwise it may just be state. Not sure.

You should run your app through PIX and see what the actual command stream looks like.

Spite
Jul 27, 2001

Small chance of that...

YeOldeButchere posted:

Welcome to graphics programming! I hope you like loving around until you find what your drivers and graphics hardware like.

Though to be fair, these are pretty complex systems with a whole lot of factors interacting, so often there's not much choice but to test and profile. But I do feel like a lot of the documentation could use some more hints and rules of thumb when it comes to performance concerns.

Working on drivers has given me a whole new perspective on it. To the point where I now think that anyone writing high perf apps should do so. It sucks at times, because there are a lot of games that aren't written well and the drivers have to end up optimizing their (bad) usage of the API. Better documentation, examples and some transparency would really help that. Along with better education.

Though every vendor is loath to describe the hints/rules/etc because it would out all the ridiculously dirty hacks and unimplemented crap that's in every runtime/driver/etc.
