Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

Xerophyte posted:

This led me down a rabbit hole of looking at the subgroup stuff from 1.1, which I was completely unaware existed; I'm not very current or good with GPU framework stuff, which is why I started this little hobby project. Thanks! I noticed that the 1 atomic/subgroup pattern was exactly what the subgroup tutorial recommends too. I expect the subgroup operations will be very useful for things like sampling, since I can make the subgroup collectively vote on the BRDF to sample, which should be efficient. Unfortunately I don't think I can boil path tracing sample accumulation down to scan local group + one atomic op in that way.

The problem with GPU path tracing has always been that it's incoherent: paths started in nearby pixels will very quickly diverge and veer off into paths that access completely different parts of the scene. Most GPU path tracers deal with this by doing wavefront tracing: generate a lot of subpath rays, sort them by their position and direction, and dispatch work according to that order so the local work group always accesses the same region of the scene. The problem with that is that now the local work group will include paths with vastly different origin pixels instead, and writing any new samples is a big incoherent scatter write. I expect I can deal with that by just sorting the samples back into image-space buckets or something like that; it'll just be a little more annoying than atomically adding them to the target accumulation storage image immediately when I have them.

Incoherent gathers are, all else being equal, going to be way faster than incoherent scatter in part because the GPU is built to handle them as a common case.
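For reference, the sort-then-dispatch scheme described in the quote can be sketched in Python; the `Ray` fields, grid cell size, and octant key here are illustrative assumptions, not any real tracer's API:

```python
# Sketch of wavefront-style ray binning: sort subpath rays by a coarse
# spatial/directional key so each dispatched batch touches a similar scene
# region. All names and constants below are made up for illustration.
from dataclasses import dataclass


@dataclass
class Ray:
    origin: tuple     # (x, y, z)
    direction: tuple  # (x, y, z)
    pixel: int        # originating pixel, needed to scatter the sample back


def coherence_key(ray, cell=4.0):
    # Quantize the origin into grid cells and the direction into octants.
    cx, cy, cz = (int(c // cell) for c in ray.origin)
    octant = (ray.direction[0] > 0, ray.direction[1] > 0, ray.direction[2] > 0)
    return (cx, cy, cz, octant)


def make_batches(rays, batch_size=64):
    # Sort by coherence key, then chop into fixed-size work batches.
    rays = sorted(rays, key=coherence_key)
    return [rays[i:i + batch_size] for i in range(0, len(rays), batch_size)]
```

Real wavefront tracers do this with a GPU radix sort on a packed key rather than a host-side sort, but the batching idea is the same.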

Ralith posted:

If the target locations are effectively random, contention might not be too big an issue, though I suppose that's scene dependent.

Actually, having threads from the same subgroup hitting the same global atomic is ironically probably going to be faster, because you can coalesce the atomic within a warp (essentially as a parallel sum) and then do a single write to the global address. Atomic contention within a thread group can add latency to the thread group; atomic contention between different thread groups can potentially serialize the entire machine.
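As a rough CPU-side model of the warp-aggregation idea (the function names and warp size are hypothetical; real implementations do the reduction with subgroup/warp shuffle intrinsics rather than a loop):

```python
# CPU model of warp-aggregated atomics: instead of 32 threads each doing a
# contended atomicAdd to the same global address, reduce within the warp
# first and have a single lane issue one atomic.

def naive_atomics(memory, addr, warp_values):
    atomics = 0
    for v in warp_values:       # one contended atomic per thread
        memory[addr] += v
        atomics += 1
    return atomics              # number of global atomics issued


def warp_aggregated(memory, addr, warp_values):
    partial = sum(warp_values)  # stands in for a parallel warp reduction
    memory[addr] += partial     # single atomic from one lane
    return 1
```

Both produce the same final value; the aggregated version just replaces 32 serialized global atomics with one.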


Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today

Hubis posted:

Actually, having threads from the same subgroup hitting the same global atomic is ironically probably going to be faster, because you can coalesce the atomic within a warp (essentially as a parallel sum) and then do a single write to the global address.

At least Intel doesn't do this; explicit subgroup ops made a shader of mine an order of magnitude faster.

fankey
Aug 31, 2001

Not really a 3d question but this is probably the right place to ask this question. I'm trying to encode a series of IDXGISurfaces with an IMFSinkWriter. Here's my encoder implementation:

code:
#pragma once
#define NOMINMAX
#include <windows.h>
#include <d3d11_2.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <wrl.h>

class Encoder
{
  Microsoft::WRL::ComPtr<IMFMediaType> mediaTypeOut;
  Microsoft::WRL::ComPtr<IMFMediaType> mediaTypeIn;
  Microsoft::WRL::ComPtr<IMFSinkWriter> writer;
  DWORD streamIndex{ 0 };
  DWORD frame{ 0 };
public:
  Encoder();
  void AddFrame(_In_ Microsoft::WRL::ComPtr<IDXGISurface> surf);
  void Save();
};
code:
#include "encoder.h"
#include <comdef.h>
#include <iostream>
#include <mfapi.h>

#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")
#pragma comment(lib, "Mfreadwrite.lib")
#pragma comment(lib, "dxguid.lib")

using namespace std;
using namespace Microsoft::WRL;

const UINT32 VIDEO_WIDTH = 1024;
const UINT32 VIDEO_HEIGHT = 768;
const UINT32 VIDEO_FPS = 30;
const UINT64 VIDEO_FRAME_DURATION = 10 * 1000 * 1000 / VIDEO_FPS;
const UINT32 VIDEO_BIT_RATE = 800000;
const GUID   VIDEO_ENCODING_FORMAT = MFVideoFormat_WMV3;
const GUID   VIDEO_INPUT_FORMAT = MFVideoFormat_RGB32;

class ComException : public std::exception
{
public:
	ComException(HRESULT errorCode, const std::string& message) :
		errorCode(errorCode),
		message(message)
	{}
	friend std::ostream& operator<<(std::ostream& s, const ComException& e)
	{
		_com_error comError(e.errorCode);

		char msg[512];
		size_t length;
		wcstombs_s(&length, msg, 512, comError.ErrorMessage(), _TRUNCATE);
		return s << "Error in '" << e.message << "': 0x" << std::hex << e.errorCode << " - " << std::string(msg,length);
	}
private:
	HRESULT errorCode;
	std::string message;
};


#define THROW_ON_FAIL(expr) {                                      \
	HRESULT _errorCode = expr;                                     \
	if(FAILED(_errorCode)) throw ComException(_errorCode, #expr);  \
}

Encoder::Encoder()
{
	try
	{
		MFStartup(MF_VERSION);
		
		THROW_ON_FAIL(MFCreateSinkWriterFromURL(
			L"c:\\logs\\output.wmv",
			NULL,
			NULL,
			&writer));

		// set output type
		THROW_ON_FAIL(MFCreateMediaType(&mediaTypeOut));
		THROW_ON_FAIL(mediaTypeOut->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video));
		THROW_ON_FAIL(mediaTypeOut->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264));
		THROW_ON_FAIL(mediaTypeOut->SetUINT32(MF_MT_AVG_BITRATE, VIDEO_BIT_RATE));
		THROW_ON_FAIL(mediaTypeOut->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive));
		THROW_ON_FAIL(MFSetAttributeSize(mediaTypeOut.Get(), MF_MT_FRAME_SIZE, VIDEO_WIDTH, VIDEO_HEIGHT));
		THROW_ON_FAIL(MFSetAttributeRatio(mediaTypeOut.Get(), MF_MT_FRAME_RATE, VIDEO_FPS, 1));
		THROW_ON_FAIL(MFSetAttributeRatio(mediaTypeOut.Get(), MF_MT_PIXEL_ASPECT_RATIO, 1, 1));
		THROW_ON_FAIL(writer->AddStream(mediaTypeOut.Get(), &streamIndex));

		// Set the input media type.
		THROW_ON_FAIL(MFCreateMediaType(&mediaTypeIn));
		THROW_ON_FAIL(mediaTypeIn->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video));
		THROW_ON_FAIL(mediaTypeIn->SetGUID(MF_MT_SUBTYPE, VIDEO_INPUT_FORMAT));
		THROW_ON_FAIL(mediaTypeIn->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive));
		THROW_ON_FAIL(MFSetAttributeSize(mediaTypeIn.Get(), MF_MT_FRAME_SIZE, VIDEO_WIDTH, VIDEO_HEIGHT));
		THROW_ON_FAIL(MFSetAttributeRatio(mediaTypeIn.Get(), MF_MT_FRAME_RATE, VIDEO_FPS, 1));
		THROW_ON_FAIL(MFSetAttributeRatio(mediaTypeIn.Get(), MF_MT_PIXEL_ASPECT_RATIO, 1, 1));
		THROW_ON_FAIL(writer->SetInputMediaType(streamIndex, mediaTypeIn.Get(), NULL));

		writer->BeginWriting();
	}
	catch(ComException ex)
	{
		std::cout << ex;
	}
}

void Encoder::AddFrame(_In_ Microsoft::WRL::ComPtr<IDXGISurface> surf)
{
	try
	{
		DXGI_SURFACE_DESC desc;
		THROW_ON_FAIL(surf->GetDesc(&desc));
		std::cout  << "w: " << desc.Width << " h: " << desc.Height << " format " << desc.Format << std::endl;
		ComPtr<IMFSample> sample;
		ComPtr<IMFMediaBuffer> buffer;

		THROW_ON_FAIL(MFCreateDXGISurfaceBuffer(
			IID_ID3D11Texture2D,
			surf.Get(),
			0,
			FALSE,
			&buffer));

		THROW_ON_FAIL(MFCreateSample(&sample));
		THROW_ON_FAIL(sample->AddBuffer(buffer.Get()));
		THROW_ON_FAIL(sample->SetSampleTime(frame * VIDEO_FRAME_DURATION));
		THROW_ON_FAIL(sample->SetSampleDuration(VIDEO_FRAME_DURATION));
		THROW_ON_FAIL(writer->WriteSample(streamIndex, sample.Get()));

		std::cout << "wrote ok" << std::endl;
	}
	catch (ComException ex)
	{
		std::cout << ex << std::endl;
	}
	frame++;
}

void Encoder::Save()
{
	writer->Flush(streamIndex);
}
The code runs fine without any errors up until the writer->WriteSample call, which returns an Invalid Parameter error. I've verified that the size and format of the surface match the size and format of the input media type. I've enabled DirectX debug messages and don't see anything of interest printed there. Any idea what might be causing this issue? Any way to better narrow it down?

For some context: my end goal is to encode an IddCx virtual display which has a SwapChain that supplies the IDXGISurface. Since developing and debugging drivers is painful at best, I want to prove out my encoder on a simple desktop app. Since I have zero experience with DirectX, I found a tutorial here which uses a SwapChain in a desktop app and added a call to my Encoder::AddFrame. I was able to save the contents of the IDXGISurface without any issue, so I know things are working at some level.

Edit: If I create the buffer with a static block of memory like this example things work fine so it's something to do with the surface I'm trying to use.

fankey fucked around with this message at 05:12 on Nov 10, 2020

peepsalot
Apr 24, 2007

        PEEP THIS...
           BITCH!

Hey, got a question about OpenGL VBOs and using indexed glDrawElements[BaseVertex], with non-interleaved attributes.

So let's say I'm rendering a cube: 8 vertices, 6 faces, 12 tris. Forget about triangle strips for the moment and let's say I'm using basic GL_TRIANGLES, so there would be 12*3 = 36 vertex indices.
I have a shader which takes a vertex attribute relating to how edges are rendered: eg "internal" edges of cube faces would be omitted.
I want to avoid duplication of vertex coordinate data, but each vertex of the cube could be used in anywhere from 3 up to 6 triangles (or 4.5 on avg), and I would want that vertex attribute to possibly be different in each case.

Since the attributes are not interleaved, is there any way to have them indexed separately from the vertices?

As I understand it, you can set up the initial location and stride with glVertexAttribPointer, but how does that work with indexed elements? Is it always going to follow vertex_index * attribute_stride when accessing these attributes, even from separate buffers, or can it be made to just go through the attribute buffer sequentially/directly while indexing the vertices indirectly?

Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today

peepsalot posted:

Since the attributes are not interleaved, is there any way to have them indexed separately from the vertices?

There is not, but you can always use a storage buffer instead of a vertex attribute, and index it with whatever logic you like.
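One common way to realize that, sketched with made-up quad data: issue a non-indexed draw over all corners, keep the index list itself in a buffer, and fetch positions through it while the per-corner attribute is read sequentially by the corner counter. In Python as pseudocode:

```python
# Model of the two fetch paths for one draw call: positions go through the
# index list, while a per-corner attribute is read sequentially (the
# storage-buffer approach, indexed by a sequential vertex counter).
# Data below is a single quad split into two triangles, purely illustrative.

positions = [(0, 0), (1, 0), (1, 1), (0, 1)]  # 4 unique vertices
indices = [0, 1, 2, 0, 2, 3]                  # 6 corners, 2 triangles
edge_flags = [1, 1, 0, 0, 1, 1]               # one value PER CORNER


def assemble(positions, indices, edge_flags):
    out = []
    for corner, vertex_index in enumerate(indices):
        pos = positions[vertex_index]  # indirect fetch through the index list
        flag = edge_flags[corner]      # direct sequential fetch
        out.append((pos, flag))
    return out
```

Note that corners 0 and 3 share a position but carry different attribute values, which is exactly the case a plain indexed vertex attribute can't express without duplicating the vertex.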

Raenir Salazar
Nov 5, 2010

"According to Wikipedia" there is a black hole that emits zionist hawking radiation where my brain should have been

I really should just shut the fuck up and stop posting forever
College Slice
I am struggling with this shader here:

Perlin Noise Shader

I'd like it so that regardless the dimensions of the plane it is on, it will properly display the generated noise, instead of stretching or squashing. Can I get this to use world coordinates instead of vertex coordinates?

Xerophyte
Mar 17, 2008

This space intentionally left blank
Not entirely sure that what you're asking for is what you want. Sure, you can do simplex/FBM noise in worldspace. Exactly how you do it in Unity I don't know. I'd guess that they have easier methods than editing that shader, but I don't know Unity.


Anyhow, that repo has a 3D version at https://github.com/Scrawk/GPU-GEMS-Improved-Perlin-Noise/blob/master/Assets/ImprovedPerlinNoise/Shader/ImprovedPerlinNoise3D.shader which is probably a better base. The only change you'd need to make is at the call point: in fBm(i.uv.xyz, 4);, use the worldspace position instead of the 3D UV. It should just work.

If your UVs have a consistent scale across the entire surface then another option is to keep using the 2D version but add a real world UV scale parameter to the shader (also works if your surface is a plane, in which case you can just project its worldspace coordinates). This is how basically every CAD material works: all the meshes have a consistent UV scale known by the application. All the textures, including simplex noise and other procedurals, have metadata specifying the real-world extent of a [0,1] slice. Those two values are used to compute a float2 texture scale that is applied to the UVs when rendering.
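The arithmetic for that texture scale is tiny; as a sketch with assumed units (meters per UV unit for the mesh, meters per [0,1] tile for the texture — the names and values are illustrative):

```python
# CAD-style texture scale: the mesh has a known, consistent UV density
# (meters per UV unit) and the texture declares the real-world extent of
# its [0,1] tile. The quotient is the per-axis UV multiplier to apply
# when rendering.

def texture_uv_scale(mesh_meters_per_uv, tile_extent_meters):
    # How many texture tiles fit in one UV unit of the mesh, per axis.
    return (mesh_meters_per_uv[0] / tile_extent_meters[0],
            mesh_meters_per_uv[1] / tile_extent_meters[1])
```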

I'd also note that shader is implementing a GPU Gems article by Ken Perlin from 2004 that focused largely on performance. Ken Perlin is, unsurprisingly, pretty good at Perlin noise so it's going to work fine. It also does a lot of things that aren't relevant 16 years of GPUs later.

Raenir Salazar
Nov 5, 2010

"According to Wikipedia" there is a black hole that emits zionist hawking radiation where my brain should have been

I really should just shut the fuck up and stop posting forever
College Slice
Basically I just want to (in Unity) generate procedural Perlin/Simplex noise to a texture to use for real-time terrain generation (I tried using DOTS/Unity's Jobs system but 5k by 2k was an unacceptable 10 seconds).

In Unity you use Graphics.Blit, which at first didn't work very well but eventually some people were able to help me solve it.

Changing it from
code:
o.uv = v.vertex;
to:
code:
o.uv = v.texcoord - float4(0.5, 0.5, 0, 0);
let Blit properly save it to a texture, and lets me zoom in and out from the center of the plane.

To then get it so that the displayed material on the plane doesn't squash/stretch when I change the scale of the plane to something that isn't square (1408, 1, 512), I did this:

code:
				o.uv = v.texcoord - float4(0.5, 0.5, 0, 0);
				o.uv.x = o.uv.x * _Scale.x;
				o.uv.y = o.uv.y * _Scale.y;
I was looking into "world space coordinates" because as far as I knew the points for the noise were determined by the object vertices, but with the above and subsequent changes that change is no longer needed. Basically, if I made a plane that's wider than a square I wanted the generated noise to "continue".

Where _Scale is something like 2.75, 1 in the case of 1408 by 512 to make sure the noise/material displayed to the plane isn't being stretched.

But now I want to be able to both zoom and pan (zoom is controlled by _Frequency), but I can't get that to work.

If I put the Offsets in inoise it pans and stays centered but has a weird parallax/cloud effect.
code:
			float inoise(float2 p)
			{
				p += _Offset;
				float2 P = fmod(floor(p), 256.0);	// FIND UNIT SQUARE THAT CONTAINS POINT
				p -= floor(p);                      // FIND RELATIVE X,Y OF POINT IN SQUARE.
				float2 f = fade(p);                 // COMPUTE FADE CURVES FOR EACH OF X,Y.

				P = P / 256.0;
				const float one = 1.0 / 256.0;

				// HASH COORDINATES OF THE 4 SQUARE CORNERS
				float A = perm(P.x) + P.y;
				float B = perm(P.x + one) + P.y;

				// AND ADD BLENDED RESULTS FROM 4 CORNERS OF SQUARE
				return lerp(lerp(grad(perm(A), p),
								   grad(perm(B), p + float2(-1, 0)), f.x),
							 lerp(grad(perm(A + one), p + float2(0, -1)),
								   grad(perm(B + one), p + float2(-1, -1)), f.x), f.y);

			}
I also get the same result doing it this way:
code:
			// fractal abs sum, range 0.0 - 1.0
			float turbulence(float2 p, int octaves)
			{
				float sum = 0;
				float freq = _Frequency, amp = 1.0;
				for (int i = 0; i < octaves; i++)
				{
					sum += abs(inoise(_Offset + p * freq))*amp;
					freq *= _Lacunarity;
					amp *= _Gain;
				}
				return sum;
			}
In Sebastian Lague's procedural landmass tutorial someone solved a similar parallax effect by doing this:

code:
float sampleX = ((x - halfWidth) / scale * frequency) + (octaveOffsets[i].x / scale * frequency);
float sampleY = ((y - halfHeight) / scale * frequency) + (octaveOffsets[i].y / scale * frequency);
But no matter what I try I can't seem to solve the parallax without also breaking the zoom being centered.

I don't really need 3D or 4D noise because I'm likely just going to use a falloff map to ensure the map is surrounded by water, so I don't really need it to be able to wrap around.

I can probably live with the parallax, but if there's a simple change to fix it, I'd be grateful.

Xerophyte
Mar 17, 2008

This space intentionally left blank

Raenir Salazar posted:

Basically I just want to (in Unity) generate procedural Perlin/Simplex noise to a texture to use for real-time terrain generation (I tried using DOTS/Unity's Jobs system but 5k by 2k was an unacceptable 10 seconds).

To then get it so that the displayed material on the plane doesn't squash/stretch when I change the scale of the plane to something that isn't square (1408, 1, 512), I did this:

code:
				o.uv = v.texcoord - float4(0.5, 0.5, 0, 0);
				o.uv.x = o.uv.x * _Scale.x;
				o.uv.y = o.uv.y * _Scale.y;
I was looking into "world space coordinates" because as far as I knew the points for the noise were determined by the object vertices, but with the above and subsequent changes that change is no longer needed. Basically, if I made a plane that's wider than a square I wanted the generated noise to "continue".

Where _Scale is something like 2.75, 1 in the case of 1408 by 512 to make sure the noise/material displayed to the plane isn't being stretched.

But now I want to be able to both zoom and pan (zoom is controlled by _Frequency), but I can't get that to work.

[...]

But no matter what I try I can't seem to solve the parallax without also breaking the zoom being centered.

I don't really need 3D or 4D noise because I'm likely just going to use a falloff map to ensure the map is surrounded by water, so I don't really need it to be able to wrap around.

Right, OK. I'm kind of avoiding doing a deep dive into that particular simplex noise implementation, because it should be irrelevant to your problem and it seems very Unity-specific. I would again be deeply surprised if Unity does not have a better solution to your problem than "edit this shader" but I don't know Unity. Someone who does could probably give a better answer.

This said, the "parallax" you're talking about sounds like what happens when you scale or translate the different octaves in simplex noise independently. I think inoise is called for each octave in that implementation, so if you insert a fixed transform there then the transforms will only be correct for at most one octave of the noise and you get the effect that the different layers of the noise move independently.


For texture transforms, you're on the right track by the sounds of it. You can always do those without touching anything about the details of the texture generation or even knowing what type of texture you are using. You don't need or want to touch noise generation parameters like frequency or bandwidth to zoom or pan, any more than you'd need to edit an image-based texture in paint to zoom or pan.

If you have a texture coordinate p, and you want to translate it so the texture origin is centered on a point p0, you can do the transform p = p + p0.
If you want to zoom in by a factor of k, you can do the transform p = p * k.

Transforming the texture lookup coordinate prior to look-up works for any texture, procedural or image, of any dimensions. So just do that wherever you're specifying the o.uv = v.texcoord - float4(0.5, 0.5, 0, 0); texture coordinate, which I assume is the vertex shader. You'd end up with something like:

code:
// Assuming we have the uniforms:
// uniform float2 _Scale;
// uniform float2 _Translate;
// uniform float2 _OutputSize;
// Your Uniforms May Vary

// Change the extents from [0,1] -> [-0.5,0.5] as before
o.uv = v.texcoord - float4(0.5, 0.5, 0, 0);

// Translate to center the texture on a new point
o.uv = o.uv + _Translate;

// Zoom in or out on the new center
o.uv = o.uv * _Scale;

// Correct for blitting to an image with a non-square aspect
float aspectRatio = _OutputSize.x / _OutputSize.y;
o.uv.x = o.uv.x * aspectRatio;
This will look different in your case; I don't know how Unity does texture transforms. As you noticed, you can bake the aspect ratio into the _Scale parameter, for instance. In general a texture transform can be expressed with homogeneous coordinates and a matrix like any other affine transform.
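For illustration, the same center/translate/scale chain written as one 3x3 homogeneous matrix (plain Python, values made up):

```python
# 2D texture transform as a 3x3 homogeneous matrix applied to (u, v, 1).
# Composing translate/scale matrices reproduces the per-step UV math above.

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m, uv):
    u, v = uv
    return (m[0][0] * u + m[0][1] * v + m[0][2],
            m[1][0] * u + m[1][1] * v + m[1][2])


def translate(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def scale(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


# scale * translate * center  ==  o.uv = _Scale * ((uv - 0.5) + _Translate)
m = mat_mul(scale(2, 2), mat_mul(translate(0.25, 0.25), translate(-0.5, -0.5)))
```

The advantage of the matrix form is that any further transforms (rotation, shear) just multiply in without changing the shader.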


I suggested 3D noise because I figured you were texturing 3D data. I'm not sure what you mean by wrap-around, but it sounds unrelated to texture dimension. Since you're texturing a 2D image, use a 2D texture.

Raenir Salazar
Nov 5, 2010

"According to Wikipedia" there is a black hole that emits zionist hawking radiation where my brain should have been

I really should just shut the fuck up and stop posting forever
College Slice

Xerophyte posted:

Right, OK. I'm kind of avoiding doing a deep dive into that particular simplex noise implementation, because it should be irrelevant to your problem and it seems very Unity-specific. I would again be deeply surprised if Unity does not have a better solution to your problem than "edit this shader" but I don't know Unity. Someone who does could probably give a better answer.

This said, the "parallax" you're talking about sounds like what happens when you scale or translate the different octaves in simplex noise independently. I think inoise is called for each octave in that implementation, so if you insert a fixed transform there then the transforms will only be correct for at most one octave of the noise and you get the effect that the different layers of the noise move independently.


For texture transforms, you're on the right track by the sounds of it. You can always do those without touching anything about the details of the texture generation or even knowing what type of texture you are using. You don't need or want to touch noise generation parameters like frequency or bandwidth to zoom or pan, any more than you'd need to edit an image-based texture in paint to zoom or pan.

If you have a texture coordinate p, and you want to translate it so the texture origin is centered on a point p0, you can do the transform p = p + p0.
If you want to zoom in by a factor of k, you can do the transform p = p * k.

Transforming the texture lookup coordinate prior to look-up works for any texture, procedural or image, of any dimensions. So just do that wherever you're specifying the o.uv = v.texcoord - float4(0.5, 0.5, 0, 0); texture coordinate, which I assume is the vertex shader. You'd end up with something like:

code:
// Assuming we have the uniforms:
// uniform float2 _Scale;
// uniform float2 _Translate;
// uniform float2 _OutputSize;
// Your Uniforms May Vary

// Change the extents from [0,1] -> [-0.5,0.5] as before
o.uv = v.texcoord - float4(0.5, 0.5, 0, 0);

// Translate to center the texture on a new point
o.uv = o.uv + _Translate;

// Zoom in or out on the new center
o.uv = o.uv * _Scale;

// Correct for blitting to an image with a non-square aspect
float aspectRatio = _OutputSize.x / _OutputSize.y;
o.uv.x = o.uv.x * aspectRatio;
This will look different in your case; I don't know how Unity does texture transforms. As you noticed, you can bake the aspect ratio into the _Scale parameter, for instance. In general a texture transform can be expressed with homogeneous coordinates and a matrix like any other affine transform.


I suggested 3D noise because I figured you were texturing 3D data. I'm not sure what you mean by wrap-around, but it sounds unrelated to texture dimension. Since you're texturing a 2D image, use a 2D texture.

Thanks for this! I was able to gradually solve it with some help in the Unity discord, so for reference I'll document the solution for your interest :) Unity doesn't really have any better noise generation solution that's real-time. I tried using the noise generation function from their math library in DOTS (their multithreading/jobs API) and it was still 10 seconds or more for a 5k by 2k texture, while this GitHub project was real-time on the shader. Basically, the built-in snoise(...) function is just a very expensive operation when performed on the CPU, no matter what. Maybe one of those "fastnoise" libraries would've made it better, but they use generators and would've required re-implementing them to work multithreaded.

The solution in the end looks like this:

code:
Shader "Noise/IPN_FBM_2D"
{
	Properties{
		_Frequency("Frequency", float) = 1
		_Lacunarity("Lacunarity", float) = 2
		_Gain("Persistence", float) = 0.5
		_Scale("Scaling", Vector) = (1,1,0,0)
		_Offset("Offset", Vector) = (0,0,0,0)
	}
	SubShader
	{
		Pass
		{

			CGPROGRAM

			#pragma vertex vert
			#pragma fragment frag
			#pragma target 3.0
			#include "UnityCG.cginc"

			sampler2D _PermTable1D, _Gradient2D;
			float _Frequency, _Lacunarity, _Gain;
			float2 _Scale;
			float2 _Offset;

			struct v2f
			{
				float4 pos : SV_POSITION;
				float2 uv : TEXCOORD;
			};
			v2f vert(appdata_base v)
			{
				v2f o;
				o.pos = UnityObjectToClipPos(v.vertex);
				o.uv = _Scale * (v.texcoord - 0.5) + _Offset / _Frequency;

				return o;
			}
			float2 fade(float2 t)
			{
				return t * t * t * (t * (t * 6 - 15) + 10);
			}

			float perm(float x)
			{
				return tex2D(_PermTable1D, float2(x,0)).a;
			}

			float grad(float x, float2 p)
			{
				float2 g = tex2D(_Gradient2D, float2(x*8.0, 0)).rg *2.0 - 1.0;
				return dot(g, p);
			}

			float inoise(float2 p)
			{
				float2 P = fmod(floor(p), 256.0);	// FIND UNIT SQUARE THAT CONTAINS POINT
				p -= floor(p);                      // FIND RELATIVE X,Y OF POINT IN SQUARE.
				float2 f = fade(p);                 // COMPUTE FADE CURVES FOR EACH OF X,Y.

				P = P / 256.0;
				const float one = 1.0 / 256.0;

				// HASH COORDINATES OF THE 4 SQUARE CORNERS
				float A = perm(P.x) + P.y;
				float B = perm(P.x + one) + P.y;

				// AND ADD BLENDED RESULTS FROM 4 CORNERS OF SQUARE
				return lerp(lerp(grad(perm(A), p),
								   grad(perm(B), p + float2(-1, 0)), f.x),
							 lerp(grad(perm(A + one), p + float2(0, -1)),
								   grad(perm(B + one), p + float2(-1, -1)), f.x), f.y);

			}

			// fractal sum, range -1.0 - 1.0
			float fBm(float2 p, int octaves)
			{
				float freq = _Frequency, amp = 0.5;
				float sum = 0;
				for (int i = 0; i < octaves; i++)
				{
					sum += inoise(p * freq) * amp;
					freq *= _Lacunarity;
					amp *= _Gain;
				}
				return sum;
			}

			half4 frag(v2f i) : COLOR
			{
				float n = fBm(i.uv, 4);

				return half4(n,n,n,1);
			}

			ENDCG

		}
	}
	Fallback "VertexLit"
}
So looking at specifically the vertex shader:
code:
			 v2f vert(appdata_base v)
			{
				v2f o;
				o.pos = UnityObjectToClipPos(v.vertex);
				o.uv = _Scale * (v.texcoord - 0.5) + _Offset / _Frequency;

				return o;
			}
I don't really get why dividing by frequency fixes it, but it is very similar to the corrected code posted in Sebastian Lague's procedural terrain video:

code:
float sampleX = ((x - halfWidth) / scale * frequency) + (octaveOffsets[i].x / scale * frequency);
float sampleY = ((y - halfHeight) / scale * frequency) + (octaveOffsets[i].y / scale * frequency);
It took a bit of effort to explain the problem because of confusion over "scaling" as in correcting for aspect ratio versus scaling as in zooming. Some solutions allowed zooming the texture but the noise was still off-center, and so on, until eventually we figured out the above solution.

code:
o.uv = _Scale * (v.texcoord - 0.5) + _Offset / _Frequency;
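For what it's worth, the reason dividing by frequency works is just algebra: fBm multiplies the UV by the frequency before sampling, so an offset added in UV space has to be pre-divided to come out as a fixed shift of the noise input. A quick numeric check (function names are just for illustration):

```python
# Why _Offset / _Frequency works: the base octave samples inoise(uv * freq),
# so an offset added to o.uv gets multiplied by freq. Pre-dividing cancels
# that multiplication, leaving a plain translation of the noise input.

def noise_coord_direct(p, freq, offset):
    return p * freq + offset           # shift applied inside the noise

def noise_coord_uv(p, freq, offset):
    return (p + offset / freq) * freq  # shift applied to o.uv, pre-divided
```

Because the offset now lands on the input coordinate before any frequency multiplication, every octave shifts together; the whole fBm pattern pans rigidly instead of the octaves sliding apart, which is the parallax effect you saw.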
From there I also went and decided to make some shaders to apply a fall off map to the generated texture.



I figure it's much faster to blend textures on the GPU and re-Blit the result to a new texture than to either use multithreading to apply photoshop blends or just to run a loop. Not that running a loop would've been slow, as it was the noise function that made it intolerably slow.

I hated the streaks on the diagonals though, so after discovering that Unity's Shader Forge has a Rounded Rectangle node, I spent some time reimplementing it as a custom node through sheer trial and error.

code:
float a = min(abs(r * 2), w);
float b = min(a, h);

r = max(b, 1e-5);
float2 uvs = abs(uv * 2 - 1) - float2(w, h) + f;
float d = length(max(0, uvs)) / v;
Out = d;
(where w,h is the offset; v and f used to be the radius, but I split it up to try to have better control, though they're probably not necessary given below).



I then plug the resulting value into another custom function node that raises the value to a power based on separate a,b inputs to adjust the falloff map. Basically I wanted the "fall off" of the falloff map but in the shape of a rectangle; the "linear" falloff map resulted in ugliness along the diagonals. If I knew how to do something like bicubic sampling or a Gaussian blur maybe that would've fixed it, but it seemed easier to reimplement the rounded rectangle.

code:
Out = pow(Value, A) / (pow(Value, A) + pow(B - B * Value, A));
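That remap is a bias/gain-style S-curve; a quick Python check of its endpoint behavior (the A and B values are arbitrary examples):

```python
# The falloff remap from the custom node above: keeps the endpoints pinned
# at 0 and 1 while A controls the steepness of the curve and B shifts the
# midpoint of the transition.

def falloff(value, a, b):
    return value ** a / (value ** a + (b - b * value) ** a)
```

Pinned endpoints are what make it safe to apply on top of any 0..1 mask without introducing clipping.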
I now do this for each Photoshop blend mode I'm interested in (Multiply, Screen, Blend, etc), as my goal here is to layer a bunch of different noise maps together to try to get improved and more natural results:



Multiplying ridged noise and fBm noise, for example, will give mountainous-looking islands with ridges and so on.

I've tested it and the results are Blit-able back to texture, so I'm pretty happy that I can get the results I want. I'm basically working on a procedural random world generator where I'd like to give the user/player the ability to create a world as they want.

fankey
Aug 31, 2001

I'm trying to write a simple HLSL 2D shader (the end result is a WPF Effect, if that matters) that needs to deal with alpha, and I am getting totally confused. This simple filter works fine and passes the color straight through:
code:
sampler2D input: register(s0);

float4 main(float2 uv:TEXCOORD) : COLOR
{
  float4 color = tex2D(input, uv);
  return color;
}
The problem is that if I modify either the alpha or the color, things get wonky, so I'm obviously not understanding basic concepts. I can set rgba to fixed values, but attempting to do any math on them gets weird. So this works to clear out red or to adjust the alpha:
code:
sampler2D input: register(s0);

float4 main(float2 uv:TEXCOORD) : COLOR
{
  float4 color = tex2D(input, uv);
  color.rgb.r = 0;
  // also this works 
  color.a = color.a / 2;
  return color;
}
But if I try and invert a pixel things get weird. This code tries to just invert the blue channel but ends up also modifying the alpha.
code:
sampler2D input: register(s0);

float4 main(float2 uv:TEXCOORD) : COLOR
{
  float4 color = tex2D(input, uv);
  color.b = 1.0f - color.b;
  return color;
}
I tried saving the alpha and reapplying but it didn't make any difference.

Absurd Alhazred
Mar 27, 2010

by Athanatos

fankey posted:

I'm trying to write a simple HLSL 2D shader (the end result is a WPF Effect, if that matters) that needs to deal with alpha, and I am getting totally confused. This simple filter works fine and passes the color straight through:
code:
sampler2D input: register(s0);

float4 main(float2 uv:TEXCOORD) : COLOR
{
  float4 color = tex2D(input, uv);
  return color;
}
The problem is that if I modify either the alpha or the color, things get wonky, so I'm obviously not understanding basic concepts. I can set rgba to fixed values, but attempting to do any math on them gets weird. So this works to clear out red or to adjust the alpha:
code:
sampler2D input: register(s0);

float4 main(float2 uv:TEXCOORD) : COLOR
{
  float4 color = tex2D(input, uv);
  color.rgb.r = 0;
  // also this works 
  color.a = color.a / 2;
  return color;
}
But if I try and invert a pixel things get weird. This code tries to just invert the blue channel but ends up also modifying the alpha.
code:
sampler2D input: register(s0);

float4 main(float2 uv:TEXCOORD) : COLOR
{
  float4 color = tex2D(input, uv);
  color.b = 1.0f - color.b;
  return color;
}
I tried saving the alpha and reapplying but it didn't make any difference.

Have you tried dividing the color by the alpha, inverting, then multiplying by the alpha? It might be pre-multiplied and get messed up if you don't account for it.
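The un-premultiply / invert / re-premultiply math can be sketched like this in Python (the function name and structure are illustrative, not anything from WPF; the assumption, which WPF effects usually satisfy, is that the sampled color is premultiplied, i.e. rgb has already been multiplied by a):

```python
# Sketch: invert a channel of a premultiplied-alpha color by converting to
# straight alpha first, inverting there, then re-premultiplying.
def invert_blue_premultiplied(r, g, b, a):
    if a == 0.0:
        # Fully transparent: the straight-alpha color is undefined, pass through.
        return (r, g, b, a)
    straight_b = b / a                 # back to straight (non-premultiplied) alpha
    straight_b = 1.0 - straight_b      # invert in straight-alpha space
    return (r, g, straight_b * a, a)   # re-premultiply before returning
```

Inverting directly on the premultiplied value is what makes the alpha appear to change: `1.0 - b` can exceed `a`, which the compositor then reads as a brighter, seemingly more opaque pixel.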

haveblue
Aug 15, 2005



Toilet Rascal
Also are you sure the range of the texture channel values is 0-1? If the texture uses an integer pixel format it could be 0-255 and casting it to float won't necessarily normalize it.

What happens if you assign 1.0 to red in the second example?

fankey
Aug 31, 2001

Absurd Alhazred posted:

Have you tried dividing the color by the alpha, inverting, then multiplying by the alpha? It might be pre-multiplied and get messed up if you don't account for it.
That was it - thanks!

peepsalot
Apr 24, 2007

        PEEP THIS...
           BITCH!

How can I calculate whether a planar polygon in 3D space is front- or back-facing relative to the camera? I need to apply a glPolygonOffset in the opposite direction depending on which side is visible.

haveblue
Aug 15, 2005



Toilet Rascal
If you're using the fixed-function pipeline, GL can do this for you with the cull options. Then you just render it twice, for front and back, with matching offset values.

If you're in a shader, check the sign of normal.z after the modelview transformation
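The underlying test is a one-liner: a planar polygon faces the camera when the dot product of its plane normal with the vector from the polygon to the camera is positive. A hedged Python sketch (names mine):

```python
# Illustrative sketch: front/back facing from the plane normal.
# Front-facing iff dot(normal, camera_pos - point_on_polygon) > 0.
def is_front_facing(normal, point_on_poly, camera_pos):
    to_camera = tuple(c - p for c, p in zip(camera_pos, point_on_poly))
    d = sum(n * t for n, t in zip(normal, to_camera))
    return d > 0.0
```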

Xerophyte
Mar 17, 2008

This space intentionally left blank
You can apply an offset to gl_FragDepth based on gl_FrontFacing in the fragment shader. That might limit early-z testing, depending on how you're offsetting, and I don't think you can offset by the same amount as glPolygonOffset (which I've never used).

In general you will not be able to do this in the vertex shader by just looking at the post-transform normal. A given vertex can be part of both a front-facing and a back-facing triangle, and a vertex normal can (and almost certainly will) be back-facing from some perspectives even while it belongs to a front-facing triangle. Offsetting the vertex along the normal can still work if you know your vertex data is well-behaved enough to allow it, of course, but not for arbitrary input.
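To make the "facing is a per-triangle property" point concrete, here's a toy Python sketch (names mine): in screen space, a triangle's facing comes from its winding, i.e. the sign of the 2D cross product of its edges, with no vertex normal involved.

```python
# Toy sketch: facing from winding order in screen space. Positive z of
# cross(p1 - p0, p2 - p0) means counter-clockwise, which is front-facing
# under OpenGL's default convention. Swapping two vertices flips the sign.
def winding_z(p0, p1, p2):
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p0[0], p2[1] - p0[1]
    return ax * by - ay * bx
```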

xgalaxy
Jan 27, 2004
i write code
Kind of a long shot.

Does anyone know how the Material Graph in Unreal Engine works?
I’m assuming a graph gets "cooked" into a shader + textures but I can’t seem to find where in the code this happens.

For example, in another engine I know, which has a much less sophisticated material graph, each node corresponds to a "template" (for lack of a better term) of shader code. When the graph gets cooked for release, it compiles the nodes together into a single shader text file and then compiles that to SPIR-V or whatever the platform's shader byte code is. In other words, each node is just some text, and the cook walks the graph and concatenates the text together (although it's a bit more complicated than that). But this doesn't appear to be how Unreal does it.
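That template-concatenation cook step can be sketched in Python (all names hypothetical; this illustrates the simpler engine's approach described above, not Unreal's):

```python
# Toy sketch of a template-based material cook: each node type maps to a text
# snippet, and cooking walks the graph in topological order, concatenating the
# filled-in snippets into one shader body.
NODE_TEMPLATES = {
    "texture_sample": "vec4 {out} = texture(u_tex, uv);",
    "multiply": "vec4 {out} = {a} * {b};",
}

def cook_material(nodes):
    # nodes: list of (template_name, output_var, inputs_dict), topologically sorted
    lines = [NODE_TEMPLATES[name].format(out=out, **inputs)
             for name, out, inputs in nodes]
    return "\n".join(lines)
```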

EDIT:
Was able to find what I was looking for in:
- Source/Runtime/Engine/Private/Materials
- Shaders/Private/MaterialTemplate.ush

xgalaxy fucked around with this message at 17:05 on May 25, 2022

giogadi
Oct 27, 2009

If I’m planning to do cross-platform 3d graphics, is there any reason not to do it in Vulkan? I.e., is there any advantage to the alternative of manually doing a dx12 backend and a metal backend?

(Talking mostly theoretically here just to better understand. I’m probably gonna just stick with OpenGL for my actual work for the foreseeable future)

Absurd Alhazred
Mar 27, 2010

by Athanatos

giogadi posted:

If I’m planning to do cross-platform 3d graphics, is there any reason not to do it in Vulkan? I.e., is there any advantage to the alternative of manually doing a dx12 backend and a metal backend?

(Talking mostly theoretically here just to better understand. I’m probably gonna just stick with OpenGL for my actual work for the foreseeable future)

My impression is that structurally there isn't much difference between the latest generation of APIs. That said, Vulkan might be unavailable, or not as well supported, on some low-end or specialized platforms like slightly older consoles and maybe certain smartphones, so you might end up having to create an abstraction layer with separate implementations anyway.

giogadi
Oct 27, 2009

I'm following vulkan-tutorial.com on an old mac with integrated graphics. I ran into a weird issue, and I wonder if any of y'all have a clue about what could be going on.

The first part of the tutorial involves drawing a single triangle to the screen without using a vertex buffer at all, so the vertex shader just hardcodes the vertex positions and colors, using gl_VertexIndex to select the appropriate position/color. I'm following the tutorial's code exactly, but my triangle's colors are wrong: only one corner is red, and the other two are black.

code:
// Vertex shader
#version 450

layout(location = 0) out vec3 vertColor;

vec2 positions[3] = vec2[](
     vec2(0.0, -0.5),    
     vec2(0.5, 0.5),
     vec2(-0.5, 0.5)
);

vec3 colors[3] = vec3[](
     vec3(1.0, 0.0, 0.0),
     vec3(0.0, 1.0, 0.0),
     vec3(0.0, 0.0, 1.0)
);

void main() {
     gl_Position = vec4(positions[gl_VertexIndex], 0.0, 1.0);
     vertColor = colors[gl_VertexIndex];
}
The fragment shader is super simple, but including it just for completeness:

code:
// fragment shader
#version 450

layout(location = 0) in vec3 vertColor;

layout(location = 0) out vec4 outColor;

void main() {
     outColor = vec4(vertColor, 1.0);
}
HOWEVER, here's the weird part: if I change main to re-set the values of the colors array individually, the result looks perfectly fine:

code:
// <above main, all the code is the same>
void main() {
     gl_Position = vec4(positions[gl_VertexIndex], 0.0, 1.0);

     colors[0] = vec3(1.0, 0.0, 0.0);
     colors[1] = vec3(0.0, 1.0, 0.0);
     colors[2] = vec3(0.0, 0.0, 1.0);
     vertColor = colors[gl_VertexIndex];
}
Does anyone have any idea why the 2nd main() function would work where the 1st one didn't? There seems to be something odd about the array of vec3's. But the weird thing is that the array of vec2's before that for the positions works just fine.

e: I realize that this specific shader is totally unrealistic, but I'm stuck on this because in the future if I want to use hardcoded constant arrays for some reason, I'm not gonna trust 'em if they can exhibit weird behavior I don't understand!!!


giogadi fucked around with this message at 22:09 on Nov 20, 2022

Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today
Is your code validation-clean? I wouldn't be shocked if you're hitting a MoltenVK or driver bug; an old integrated-graphics Mac is almost the worst environment you could've chosen.

It's also not really an unrealistic shader at all, most games do something similar for full screen passes.

Ralith fucked around with this message at 23:49 on Nov 23, 2022

Ranzear
Jul 25, 2013

Suppose I have a compute shader with a really sparse SSBO array that is ostensibly a tree structure that fills itself up (fixed depth, fixed size). Is it fine to just no-op any indexes that haven't been populated yet?

Edit: Found better leverage of my acceleration structure to still do everything parallel but get 90% of the benefit. Still posing the question whether a single if-then-no-op is potentially slow just because it needs to split the execution weirdly or something.

Ranzear fucked around with this message at 01:41 on Mar 20, 2023

Yaoi Gagarin
Feb 20, 2014

giogadi posted:

I'm following vulkan-tutorial.com on an old mac with integrated graphics. I ran into a weird issue, and I wonder if any of y'all have a clue about what could be going on.

The first part of the tutorial involves drawing a single triangle to the screen without using a vertex buffer at all, so the vertex shader just hardcodes the vertex positions and colors, using gl_VertexIndex to select the appropriate position/color. I'm following the tutorial's code exactly, but my triangle's colors are wrong: only one corner is red, and the other two are black.

code:
// Vertex shader
#version 450

layout(location = 0) out vec3 vertColor;

vec2 positions[3] = vec2[](
     vec2(0.0, -0.5),    
     vec2(0.5, 0.5),
     vec2(-0.5, 0.5)
);

vec3 colors[3] = vec3[](
     vec3(1.0, 0.0, 0.0),
     vec3(0.0, 1.0, 0.0),
     vec3(0.0, 0.0, 1.0)
);

void main() {
     gl_Position = vec4(positions[gl_VertexIndex], 0.0, 1.0);
     vertColor = colors[gl_VertexIndex];
}
The fragment shader is super simple, but including it just for completeness:

code:
// fragment shader
#version 450

layout(location = 0) in vec3 vertColor;

layout(location = 0) out vec4 outColor;

void main() {
     outColor = vec4(vertColor, 1.0);
}
HOWEVER, here's the weird part: if I change main to re-set the values of the colors array individually, the result looks perfectly fine:

code:
// <above main, all the code is the same>
void main() {
     gl_Position = vec4(positions[gl_VertexIndex], 0.0, 1.0);

     colors[0] = vec3(1.0, 0.0, 0.0);
     colors[1] = vec3(0.0, 1.0, 0.0);
     colors[2] = vec3(0.0, 0.0, 1.0);
     vertColor = colors[gl_VertexIndex];
}
Does anyone have any idea why the 2nd main() function would work where the 1st one didn't? There seems to be something odd about the array of vec3's. But the weird thing is that the array of vec2's before that for the positions works just fine.

e: I realize that this specific shader is totally unrealistic, but I'm stuck on this because in the future if I want to use hardcoded constant arrays for some reason, I'm not gonna trust 'em if they can exhibit weird behavior I don't understand!!!


This could very well be a compiler bug. Can you post the SPIR-V? The Vulkan SDK has a program called spirv-dis you can use to get a text listing of a SPIR-V binary. I'm not familiar with Mac development, but I assume you're using glslang to convert from GLSL to SPIR-V, right?


Ranzear posted:

Suppose I have a compute shader with a really sparse SSBO array that is ostensibly a tree structure that fills itself up (fixed depth, fixed size). Is it fine to just no-op any indexes that haven't been populated yet?

Edit: Found better leverage of my acceleration structure to still do everything parallel but get 90% of the benefit. Still posing the question whether a single if-then-no-op is potentially slow just because it needs to split the execution weirdly or something.

If you mean something like:


code:

if (is_populated[threadID]) {
  // Do stuff
}

It'll really depend on how much work is done in that branch. If it's really quick it's probably not a big deal, but if it's long you could be hurting your utilization over time.

Note that if you have an else branch on that, it gets even worse, because all the threads in a warp will have to run both branches if even one thread diverges.
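That cost model can be sketched in Python (a toy SIMT model, not any real GPU's scheduler):

```python
# Toy model of SIMT branch divergence: a warp serially executes every branch
# that at least one of its lanes takes, so a divergent warp pays for both
# sides of an if/else instead of just one.
def warp_cost(lane_takes_if, if_cost, else_cost):
    runs_if = any(lane_takes_if)           # some lane enters the if side
    runs_else = not all(lane_takes_if)     # some lane enters the else side
    return (if_cost if runs_if else 0) + (else_cost if runs_else else 0)
```

With all 32 lanes agreeing, the warp pays for one side; a single divergent lane makes it pay for both.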

giogadi
Oct 27, 2009

VostokProgram posted:

This could very well be a compiler bug. Can you post the SPIR-V? The Vulkan sdk has a program called spirv-dis you can use to get a text listing of a SPIR-V binary. I'm not familiar with Mac development but I assume you're using glslang to convert from GLSL to SPIR-V right?

A little while back I actually found a github issue on MoltenVK with my exact problem:

https://github.com/KhronosGroup/MoltenVK/issues/1499

It looks like it's a bug in Intel's Metal drivers. I've been meaning to put together a super minimal example in pure Metal that reproduces the issue that I can send over to Apple.

It really sucks that bugs like this exist - it makes me afraid to use any graphics API built on top of Metal because it could potentially have weird bugs like the above. It's shameful that support for intel-based macs is so bad - there are still soooo many of these macbooks out there that otherwise work really great!

Yaoi Gagarin
Feb 20, 2014

giogadi posted:

A little while back I actually found a github issue on MoltenVK with my exact problem:

https://github.com/KhronosGroup/MoltenVK/issues/1499

It looks like it's a bug in Intel's Metal drivers. I've been meaning to put together a super minimal example in pure Metal that reproduces the issue that I can send over to Apple.

It really sucks that bugs like this exist - it makes me afraid to use any graphics API built on top of Metal because it could potentially have weird bugs like the above. It's shameful that support for intel-based macs is so bad - there are still soooo many of these macbooks out there that otherwise work really great!

Tbf any API might have bugs, that's just the nature of software. I've seen stuff like this in OpenGL too.

Absurd Alhazred
Mar 27, 2010

by Athanatos
I'd go further and say every API absolutely has bugs. Right now.

Yaoi Gagarin
Feb 20, 2014

Has anyone here actually used mesh shaders in production? Did you have a problem for which you felt they were the best option?

I'm curious because they've been around a while, and the only game I've heard of that uses them is Alan Wake 2.


Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today
I think they're still not widely supported, so if you use them it can only be as an optional extra, which makes it a harder sell.
