Spatial
Nov 15, 2007

The half pixel offset thing is only for DirectX. The centre of a pixel is already [0.5,0.5] in OpenGL so it pretty much works out of the box.

For example, let's say you have your typical ortho projection where 1 unit maps to 1 pixel in the viewport. If you render a 32x32 quad (i.e. with corners at [0,0] and [32,32], covering pixels 0 through 31) with tex coords [0,0]->[1,1] mapping a 32x32 texture, it will be pixel accurate. Even if you use filtering and multisampling.
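To make the arithmetic concrete, here's a small sketch (hypothetical helper, not from the post) of why this works out: with a 1:1 ortho mapping, OpenGL samples each fragment at a half-integer window coordinate, which lands exactly on a texel center when the quad and texture sizes match.

```cpp
#include <cassert>
#include <cmath>

// With a 1:1 ortho projection, the fragment for pixel i is sampled at
// window x = i + 0.5. Interpolating texcoords 0..1 across a quad of
// quad_size pixels gives u = (i + 0.5) / quad_size; scaled into a
// tex_size-texel texture, that is i + 0.5 when the sizes match -- exactly
// the center of texel i, so even GL_LINEAR fetches one unblended texel.
bool pixel_hits_texel_center(int pixel, int quad_size, int tex_size) {
    double u = (pixel + 0.5) / quad_size;       // interpolated texcoord
    double texel_coord = u * tex_size;          // position in texel space
    double frac = texel_coord - std::floor(texel_coord);
    return std::fabs(frac - 0.5) < 1e-9;        // texel centers sit at k + 0.5
}
```

With mismatched sizes (say a 32-pixel quad over a 64-texel texture) the samples fall between texel centers and filtering blends neighbours, which is exactly the distortion multisampling makes easy to spot.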

Multisampling is the easiest way to test this sort of thing because the sub-pixel accuracy shows up any distortions right away.


Sex Bumbo
Aug 14, 2004
I'm trying to use ExecuteIndirect in DX12 on a compute command list. I want the indirect buffer to be written to by a compute shader. To get written to, I need to transition the buffer to a UAV, and to use it for ExecuteIndirect, I need to transition it to Indirect. I can't do this transition on a compute command list though, I get the error: "D3D12_RESOURCE_STATES has invalid flags for compute command list".

Is this just something compute command lists/queues can't do? This really sucks right?

Doc Block
Apr 15, 2003
Fun Shoe
I heard recently that ARGB is a faster texture format for modern GPUs than RGBA.

Is there any truth to this?

Doc Block fucked around with this message at 04:32 on Apr 8, 2016

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
For rendering to or texturing from?

Doc Block
Apr 15, 2003
Fun Shoe
What I mean is that images are stored on disk as RGBA, and so once they're loaded I've just been uploading them as-is. For OpenGL/OpenGL ES this is fine since (if memory serves) the driver is free to turn an RGBA texture into ARGB etc., whereas Metal doesn't change the texture format (and probably neither do Vulkan or DX12, I'd imagine), so it's up to the application to use the optimal texture format.

So I guess I need to add a byte-swapping stage to my texture loader.
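For reference, such a stage is just a per-pixel byte rotation; a minimal sketch (hypothetical function name, assuming tightly packed 8-bit channels):

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>
#include <vector>

// Convert an RGBA8 pixel buffer (byte order R,G,B,A) to ARGB8
// (byte order A,R,G,B) in place by rotating each 4-byte group.
void rgba_to_argb(std::vector<uint8_t>& pixels) {
    for (size_t i = 0; i + 3 < pixels.size(); i += 4) {
        uint8_t a = pixels[i + 3];
        pixels[i + 3] = pixels[i + 2];   // B -> byte 3
        pixels[i + 2] = pixels[i + 1];   // G -> byte 2
        pixels[i + 1] = pixels[i + 0];   // R -> byte 1
        pixels[i + 0] = a;               // A -> byte 0
    }
}
```

For example, one RGBA pixel stored as bytes {1, 2, 3, 4} becomes {4, 1, 2, 3}.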

edit:

Suspicious Dish posted:

For rendering to or texturing from?

For texturing from.

Doc Block fucked around with this message at 04:30 on Apr 8, 2016

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
in most cases texture upload is gonna tile or swizzle the texture anyway so it's not really any more or less efficient

i don't know any gpu that can texture from a linear bitmap like that

i'd be very surprised if metal doesn't do swizzle into an efficient format on upload

Doc Block
Apr 15, 2003
Fun Shoe
If I'm remembering correctly, when Apple introduced Metal they said in the corresponding WWDC session that textures don't get changed. But I could be mistaken. v:shobon:v

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
at the very least, for NPOT textures, it's gonna align the row stride to be a multiple of 8 during upload.

there is literally no way they wouldn't do poo poo like that. but i can't find very much good documentation on Metal and how the texture upload path in it works

Doc Block
Apr 15, 2003
Fun Shoe
I just rewatched the WWDC Metal intro session video, and you're right. They specifically say that Metal reformats textures and the underlying buffer is "implementation private". Must've misheard the first time or misremembered.

So there's no point in adding a byte swapping stage to my texture loader, at least.

edit: If you really want to, though, applications can access the underlying texture data by supplying their own Metal buffer for storage, but Apple warns this prevents texture optimization, and rows have to be padded out to 64 byte boundaries. It only works on iOS, though.
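The padding arithmetic mentioned here is just rounding a row's byte stride up to a power-of-two boundary; a sketch (hypothetical helper name):

```cpp
#include <cassert>
#include <cstddef>

// Round value up to the next multiple of alignment, where alignment is a
// power of two -- e.g. the 64-byte row boundaries required for
// app-provided buffer-backed textures, or an 8-texel row stride.
size_t align_up(size_t value, size_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}
```

So a 100-byte row padded to 64-byte boundaries occupies 128 bytes, while a 64-byte row stays at 64.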

Doc Block fucked around with this message at 08:28 on Apr 8, 2016

mobby_6kl
Aug 9, 2009

by Fluffdaddy

Doc Block posted:

You can set it to match the viewport, that's fairly common for 2D games. Then your map quads will have what are effectively screen-space coordinates.
...

Suspicious Dish posted:

OpenGL wants to convert everything to be in the range -1 to 1. The big secret is that you don't even need any projection matrix to do that, as long as the output of the vertex shader is in that space.
...

HappyHippo posted:

Yep. It's not the prettiest but these are the two shaders I've been using to draw simple sprites:
...
Thanks guys! I didn't have time to mess around with this until now, but thanks to all of this, my tiles are rendered pixel perfect now. Had to disable depthmask but that's fine as the background will always go to the very back.
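The projection-matrix point quoted above can be sketched as plain arithmetic: a vertex shader (shown CPU-side here for illustration; names hypothetical) can map pixel coordinates straight into the -1..1 clip range with no matrix at all.

```cpp
#include <cassert>

struct Vec2 { double x, y; };

// Map window-space pixel coordinates to normalized device coordinates.
// Y is flipped so pixel y grows downward, a common choice for 2D games.
Vec2 pixel_to_ndc(Vec2 p, double viewport_w, double viewport_h) {
    return { p.x / viewport_w * 2.0 - 1.0,
             1.0 - p.y / viewport_h * 2.0 };
}
```

On an 800x600 viewport, the top-left pixel corner (0,0) maps to (-1,1) and the center (400,300) maps to (0,0).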

Joda
Apr 24, 2010

When I'm off, I just like to really let go and have fun, y'know?

Fun Shoe
I'm laying some groundwork for a dynamic particle system, and was wondering what the most efficient way to handle it would be in terms of shaders and draw calls. My thinking is that I'll have a file defining different particle effects, and there are some basic types that will, of course, all require different shaders (i.e. a particle that follows a bezier curve will require different code than one that follows a parametrised curve, stateful particles will need to look up previous values and write updated ones, whereas stateless ones only depend on t, and so on.) The thing is, say I have 10 different subcategories of each basic type of particle (for instance, for parametrised curves I might have a spiral for wind effects using 100 particles, a modified sine curve for fire using 50 particles, etc.), I'm not sure how to handle this without defining 10 different shaders and binding each of them in turn, which would make the number of draw calls explode. One possible solution I thought of is to have one shader for each basic type that contains a switch-case in the vertex shader, and then use a vertex attribute to indicate which type of particle a vertex is, but I'm worried this would be inefficient if the GPU this has to work on doesn't support that sort of branching. Is there an ideal solution to this problem?

I'm working in OpenGL.

Sex Bumbo
Aug 14, 2004
If you're not on a mobile or lovely gpu, it's not going to have any trouble doing complicated stuff in a vertex shader. Unless you're rendering several million particles.

HappyHippo
Nov 19, 2003
Do you have an Air Miles Card?
100 particles is nothing. The explosion particle effect I posted in the other thread has 16000 particles, and I can get around 30-40 explosions before the framerate drops below 60fps. The shader for updating the particle positions is about 50 lines long.

HappyHippo posted:

Gif quality isn't the best but I've been working on a particle system for explosions for my RTS:



Higher quality gfycat video

Edit: Also if you want nice looking wind I recommend curl noise

Sex Bumbo
Aug 14, 2004
If you're using 3D curl noise it definitely isn't ~50 instructions long. I'm assuming you're using the typical implementation which requires a ton of 4D noise samples. But, you know, gpus are fast now and the noise doesn't add memory pressure so you can go hog wild.

HappyHippo
Nov 19, 2003
Do you have an Air Miles Card?

Sex Bumbo posted:

If you're using 3D curl noise it definitely isn't ~50 instructions long. I'm assuming you're using the typical implementation which requires a ton of 4D noise samples. But, you know, gpus are fast now and the noise doesn't add memory pressure so you can go hog wild.

Well the noise is precomputed, so it's just a lookup in the shader. Also I "cheat" the 3d noise. You take 3 2d curl noise textures, one each for the xy, yz and xz planes, then you add the vectors together to get a 3d vector.
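The cheat described above amounts to embedding each 2D curl sample in its coordinate plane and summing; a sketch with hypothetical names, where each Vec2 stands in for a 2D curl texture lookup:

```cpp
#include <cassert>

struct Vec2 { double x, y; };
struct Vec3 { double x, y, z; };

// Each 2D curl sample lives in one coordinate plane: the xy sample
// contributes to (x, y), the yz sample to (y, z), the xz sample to
// (x, z). Adding the embedded vectors gives a cheap stand-in for
// true 3D curl noise.
Vec3 cheap_curl3(Vec2 curl_xy, Vec2 curl_yz, Vec2 curl_xz) {
    return { curl_xy.x + curl_xz.x,
             curl_xy.y + curl_yz.x,
             curl_yz.y + curl_xz.y };
}
```

In the real shader this would be three texture fetches and two adds per particle, with the noise textures precomputed offline.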

Joda
Apr 24, 2010

When I'm off, I just like to really let go and have fun, y'know?

Fun Shoe
The 100 particles number was just an inconsequential number I pulled out of my rear end, what I'm really worried about is the number of changes to the state machine and draw calls vs. possible inability to branch if I do a switch-case.

Like say I have a file like this:

code:
name=name1
type=parametrised
number=1000
xt=sin(t)
yt=2*t
at=1/t
toffset=0.1

name=name2
type=parametrised
number=200
xt=t
yt=t
at=1
toffset=0.025
and I write an interpreter that generates the following geometry shader to handle parametrised particles:

code:
layout(points) in;
layout(triangle_strip, max_vertices=[A lot]) out;

in vec2 emitterPos;
flat in int type;

uniform mat3 PVM;
uniform float t;

out vec2 UV;
out float alpha;

void main() {

switch(type) {
case 0:
	for(int i = 0; i < 1000; i++) {
		float offset_t = t + 0.1*float(i);
		vec2 center = emitterPos + vec2(sin(offset_t),2*offset_t);
		alpha = 1/offset_t;
		//define the four corners of the quad, set UV and gl_Position, end each with EmitVertex();
		EndPrimitive();
	}
	break;
case 1:
	for(int i = 0; i < 200; i++) {
		float offset_t = t + 0.025*float(i);
		vec2 center = emitterPos + vec2(offset_t,offset_t);
		alpha = 1;
		//define the four corners of the quad, set UV and gl_Position, end each with EmitVertex();
		EndPrimitive();
	}
	break;
}
}
Is something like this better or worse than 1) having separate shaders for every single type of particle and calling draw for every emitter, or 2) having vertices for every single particle, as opposed to just their emitters, and doing the switch-case in the vertex shader instead?

Sex Bumbo
Aug 14, 2004

Joda posted:

code:
void main() {

switch(type) {
case 0:
	for(int i = 0; i < 1000; i++) {
		float offset_t = t + 0.1*float(i);
		vec2 center = emitterPos + vec2(sin(offset_t),2*offset_t);
		alpha = 1/offset_t;
		//define the four corners of the quad, set UV and gl_Position, end each with EmitVertex();
		EndPrimitive();
	}
	break;
case 1:
	for(int i = 0; i < 200; i++) {
		float offset_t = t + 0.025*float(i);
		vec2 center = emitterPos + vec2(offset_t,offset_t);
		alpha = 1;
		//define the four corners of the quad, set UV and gl_Position, end each with EmitVertex();
		EndPrimitive();
	}
	break;
}
}

whoa whoa whoa don't emit that much stuff from a single geometry shader invocation. It's this:


except way worse, because there are many more CPUs and CPU 0 is relatively slow. Do something else to allocate the number of work items for the GPU.

Joda
Apr 24, 2010

When I'm off, I just like to really let go and have fun, y'know?

Fun Shoe

Sex Bumbo posted:

whoa whoa whoa don't emit that much stuff from a single geometry shader invocation. It's this:


except way worse, because there are many more CPUs and CPU 0 is relatively slow. Do something else to allocate the number of work items for the GPU.

Maybe instancing would be better? Is there any way in OpenGL to tell the driver that I want it to repeat vertex A 1000 times and vertex B 200 times? If so I could just generate the quad in the geom shader and use the instance ID to determine the time offset.

E: I guess I can use separate draw calls, since most modern drivers are clever enough to realise that the shader hasn't changed, but afaik that behaviour isn't guaranteed?

Sex Bumbo
Aug 14, 2004
The only issue is that you're doing a ton of work in a single shader invocation. Assuming you have a lot of particles, you want work somewhat evenly distributed onto your gpu. So if you're using geometry shaders, you probably want each geometry shader to only process one particle, or maybe a few but probably just one. Then the gpu makes all its threads run that kernel and it goes really fast, even if the geometry shader is moderately complex, because it's doing it all in parallel.

Alternatively you could put the logic into a vertex or compute shader. Then the question is how to get it to dispatch X particles but there's a lot of ways to do that too.

Technically, geometry shaders outputting particle vertices is rather slow, mainly because geometry shaders outputting anything is kind of slow. But it's not slow enough to matter. If you're trying to process like >10 million particles, your best bet is probably to render them as a huge list of indexed quads, and then use the vertex ID to sample from a buffer -- kind of like instancing, but not, because instancing is slow on small vertex-count instances. If you don't need that many particles it doesn't matter how you render them.
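The indexed-quads scheme boils down to simple ID arithmetic; a CPU-side sketch for illustration (in GLSL the same math would run on gl_VertexID; names are hypothetical):

```cpp
#include <cassert>

// Each particle owns four consecutive vertices. The vertex shader can
// derive which particle and which quad corner it is from the vertex ID
// alone, then fetch per-particle data from a buffer with the particle
// index.
struct Corner { int particle; int corner; };

Corner decode_vertex(int vertex_id) {
    return { vertex_id / 4, vertex_id % 4 };
}

// Unit offsets for the four corners of an expanded point sprite,
// in triangle-strip order: (-1,-1), (1,-1), (-1,1), (1,1).
void corner_offset(int corner, float& dx, float& dy) {
    dx = (corner == 1 || corner == 3) ? 1.0f : -1.0f;
    dy = (corner >= 2) ? 1.0f : -1.0f;
}
```

Vertex 10, for example, is corner 2 of particle 2, so it would fetch particle 2's center from the buffer and offset it by (-1, 1) times the sprite radius.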

lord funk
Feb 16, 2004

A general rendering question: I've got a scene in Metal where there are a bunch of floating stretched icosphere models. When the camera is zoomed out, the frame rate is a solid 60fps. But zooming in causes the frame rate to drop.

Zoomed in / out pictures:
http://imgur.com/a/QibKz

Is there a general cause to this? something I can do to mitigate it?

Doc Block
Apr 15, 2003
Fun Shoe
How complicated/expensive is your fragment shader? How many values are being interpolated for each fragment? What's the blend mode?

edit: is this on iOS or OS X? I'm not familiar with Metal on OS X (my 2011 iMac is too old for it). I know some of the alignment requirements are different. And if your buffers are in system memory instead of VRAM then obviously things will be slower.

Doc Block fucked around with this message at 20:15 on Apr 22, 2016

Tres Burritos
Sep 3, 2009

Is it possible to use google maps satellite data to actually texture a sphere? I've been tasked with making a 3d tool for scientific use and it'd be sweet if we could use existing sat images.

Joda
Apr 24, 2010

When I'm off, I just like to really let go and have fun, y'know?

Fun Shoe

Tres Burritos posted:

Is it possible to use google maps satellite data to actually texture a sphere? I've been tasked with making a 3d tool for scientific use and it'd be sweet if we could use existing sat images.

It's certainly possible. Depending on your needs, the amount of work involved varies a lot, though; anything from half a day to a week. Do you know how the existing sat images are formatted in terms of projection onto the image plane? And do you need a variable level of detail based on zoom and stuff like that?

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

http://vterrain.org has a lot of resources for generating and modeling terrain, including from real world data sources.

Tres Burritos
Sep 3, 2009

Joda posted:

It's certainly possible. Depending on your needs, the amount of work involved varies a lot, though; anything from half a day to a week. Do you know how the existing sat images are formatted in terms of projection onto the image plane? And do you need a variable level of detail based on zoom and stuff like that?

Yeah, I know how the images are formatted, and it definitely needs a LOD system.

I assume once you zoom in far enough you can just treat the ground as a warped rectangular plane and get away with it, but I don't even know where to start with texturing a globe from image data that's been projected.

lord funk
Feb 16, 2004

Doc Block posted:

How complicated/expensive is your fragment shader? How many values are being interpolated for each fragment? What's the blend mode?

I did a test and made a fragment shader that just returns a constant color. Still chugs when zoomed in. I've also tried turning blending off completely, and depth testing on / off.

Here is a video of it in action (the results are the same even when not blending):
https://www.youtube.com/watch?v=b7vut8k_tOc

quote:

edit: is this on iOS or OS X? I'm not familiar with Metal on OS X (my 2011 iMac is too old for it). I know some of the alignment requirements are different. And if your buffers are in system memory instead of VRAM then obviously things will be slower.

OS X. Yep - everything on OS X has to be 256 byte aligned.

I am going to look into the memory location. Xcode is telling me that I'm using all CPU and no GPU in the debug view, but that may just be Xcode being its usual POS broken self (the Metal system trace tool isn't even supported on OS X).

Sex Bumbo
Aug 14, 2004

lord funk posted:

I did a test and made a fragment shader that just returns a constant color. Still chugs when zoomed in. I've also tried turning blending off completely, and depth testing on / off.

This didn't help? Are you doing some sort of complicated visibility testing?

Joda
Apr 24, 2010

When I'm off, I just like to really let go and have fun, y'know?

Fun Shoe

lord funk posted:

I did a test and made a fragment shader that just returns a constant color. Still chugs when zoomed in. I've also tried turning blending off completely, and depth testing on / off.

Could it be fill related? When you're zoomed in, the triangles take up more of the viewport, so if you're committing them all to memory you're gonna be doing way more writes when you're zoomed in. What happens if you just draw lines instead of full triangles?

E:

Tres Burritos posted:

Yeah, I know how the images are formatted, and it definitely needs a LOD system.

I assume once you zoom in far enough you can just treat the ground as a warped rectangular plane and get away with it, but I don't even know where to start with texturing a globe from image data that's been projected.

Well if you have a globe texture that looks something like this you should be able to use the normal of the sphere to do a look up with normalized spherical coordinates as your UV-coordinates (see this). Of course, LOD is slightly more complicated, since you will have to use different textures and different UV mappings depending on your level of zoom. You could probably use an array texture, and have every level in the array correspond to a region on the globe.
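The normal-to-UV lookup described above can be sketched like this for an equirectangular globe texture (hypothetical helper; assumes a unit-length normal and the usual plate carrée layout):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Vec2 { double u, v; };

// Convert a unit sphere normal to normalized spherical coordinates and
// use them as UVs: u comes from longitude (atan2 around the equator),
// v from latitude (asin toward the poles).
Vec2 sphere_uv(Vec3 n) {
    const double PI = std::acos(-1.0);
    Vec2 uv;
    uv.u = 0.5 + std::atan2(n.z, n.x) / (2.0 * PI);
    uv.v = 0.5 - std::asin(n.y) / PI;
    return uv;
}
```

A normal pointing along +x lands at the middle of the texture (0.5, 0.5), and the north pole (+y) maps to v = 0, the top row.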

Joda fucked around with this message at 23:06 on Apr 23, 2016

Doc Block
Apr 15, 2003
Fun Shoe

lord funk posted:

I did a test and made a fragment shader that just returns a constant color. Still chugs when zoomed in. I've also tried turning blending off completely, and depth testing on / off.

Here is a video of it in action (the results are the same even when not blending):
https://www.youtube.com/watch?v=b7vut8k_tOc


OS X. Yep - everything on OS X has to be 256 byte aligned.

I am going to look into the memory location. Xcode is telling me that I'm using all CPU and no GPU in the debug view, but that may just be Xcode being its usual POS broken self (the Metal system trace tool isn't even supported on OS X).

Seems like it only happens once you're inside the models. Maybe something to do with clipping?

lord funk
Feb 16, 2004

Well it's not because of filling with triangles - lines do the same thing.

Here is a new datapoint: this seems to only happen on my hi-dpi monitor. At 1280x800 on a projector it runs at 60fps.

@Doc Block: I was thinking something similar, like maybe once the model vertices are behind the camera they get stretched to infinity or something. But I'm not sure what to change here.

@Sex Bumbo: I haven't included any visibility testing on my end, not to say that I'm missing something about this engine.

Doc Block
Apr 15, 2003
Fun Shoe

lord funk posted:

Well it's not because of filling with triangles - lines do the same thing.

Here is a new datapoint: this seems to only happen on my hi-dpi monitor. At 1280x800 on a projector it runs at 60fps.

Maybe a weird driver bug?

If this is on a Retina Mac, try disabling Retina mode or whatever just for your application. This will be in different places depending on how you're setting up your window and whether you're using MetalKit or not.

Sex Bumbo
Aug 14, 2004
What's your framebuffer size/format? Is it on a tiled renderer?

Doc Block
Apr 15, 2003
Fun Shoe
He's running it on OS X, so it's either a desktop/laptop AMD GPU, or an integrated Intel one.

Metal restricts your framebuffer to being whatever the drawable's format is, which is in turn whatever format the windowing system uses. For now, that's BGRA8 on iOS and probably also OS X (offscreen FBOs can be in whatever format you want, obviously).

lord funk
Feb 16, 2004

I'm on a late 2013 Mac Pro w/ AMD FirePro D500 3072 MB graphics. I'm going to leave the issue alone for now, since I guess it'll work fine when I present it. There is a slight difference in the number of pixels this thing is pushing on the display v. projector:

(attachment: screenshot comparing the pixel counts on the display vs. the projector)

Zerf
Dec 17, 2004

I miss you, sandman

Joda posted:

Could it be fill related? When you're zoomed in, the triangles take up more of the viewport, so if you're committing them all to memory you're gonna be doing way more writes when you're zoomed in. What happens if you just draw lines instead of full triangles?

Fill rate would be my guess - check the blending states and especially measure overdraw - if you see a lot of objects at the same time doing alphablending and ignoring the z-buffer, and fill the entire screen with it, you could see large performance drops. But instead of speculating about it, there should be some performance analyzer program that you could download? Intel GPA maybe? I know very little about the tools you have available on OSX, so can't give you any proper recommendations.

Doc Block
Apr 15, 2003
Fun Shoe

Zerf posted:

Fill rate would be my guess - check the blending states and especially measure overdraw - if you see a lot of objects at the same time doing alphablending and ignoring the z-buffer, and fill the entire screen with it, you could see large performance drops. But instead of speculating about it, there should be some performance analyzer program that you could download? Intel GPA maybe? I know very little about the tools you have available on OSX, so can't give you any proper recommendations.

On iOS, Apple has performance analyzers for Metal that will detect some things that hurt performance and give you a complete breakdown/trace analysis for a frame, showing you every API call made during that frame and how long each draw call took etc. But according to Lord Funk, the trace analyzer for Metal isn't available on OS X.

edit: I kinda wanna say it's related to clipping somehow (driver bug?), since it really only seems to happen once the view goes inside the models but not when the models take up a lot of the screen but the viewer is still outside them. Lord Funk said changing to lines and/or just using a shader that outputs solid white with no blending doesn't make the problem go away, so it doesn't seem like a fill rate problem.

Doc Block fucked around with this message at 00:21 on Apr 25, 2016

Zerf
Dec 17, 2004

I miss you, sandman

Doc Block posted:

On iOS, Apple has performance analyzers for Metal that will detect some things that hurt performance and give you a complete breakdown/trace analysis for a frame, showing you every API call made during that frame and how long each draw call took etc. But according to Lord Funk, the trace analyzer for Metal isn't available on OS X.

edit: I kinda wanna say it's related to clipping somehow (driver bug?), since it really only seems to happen once the view goes inside the models but not when the models take up a lot of the screen but the viewer is still outside them. Lord Funk said changing to lines and/or just using a shader that outputs solid white with no blending doesn't make the problem go away, so it doesn't seem like a fill rate problem.

How did I miss that entire post? My reading comprehension yesterday must've been broken. Sorry about that, lord funk.

The advice still stands, though: try downloading a performance analyzer of some kind, to get more information on what's taking up the time.

Doc Block
Apr 15, 2003
Fun Shoe
Would there even be a 3rd party performance analyzer for Metal? A quick Google search doesn't turn anything up.

lord funk
Feb 16, 2004

Apple's been releasing some half-baked tools just to get features out the door. Like Doc said, the tools are already there on iOS. Then they got just enough support on OS X to say gently caress it, we'll finish it later.

I'm fairly sure we'll see announcements about tool support on OS X at WWDC next month.


Doc Block
Apr 15, 2003
Fun Shoe
Metal on OS X itself seems like something they brought over from iOS just for the sake of having it.
