|
SwissCM posted:Path tracing is a little different to ray tracing. Path tracing calculates light from the source, ray tracing calculates light from the camera. Ray tracing is more efficient, but less correct and can't do complex light interactions as well as path tracing can (caustics is a common example). Ray tracing is even more efficient if you add tech like eye tracking. Most production ray tracing based renderers use fairly straightforward path tracing. I suspect this is because it's easier to get right and offers more opportunities for SIMD and GPU acceleration than BDPT or particle tracing.
|
# ¿ Mar 23, 2018 09:07 |
|
SwissCM posted:Whats a good resource to find out more about path tracing?
|
# ¿ Mar 23, 2018 17:51 |
|
shrike82 posted:it's not clear the frame + movement vectors and previous frame CNN autoencoder approach they've gone with is the only/best way to do game upscaling - just look at how fast Nvidia is iterating on their implementations. given how fast the broader ML field moves, it would not surprise me to see Microsoft or a game developer roll out an implementation of a better approach that some random academic group (or even Microsoft Research) publishes out of nowhere. I've got some experience with neural networks and machine learning algorithms. Enough to implement a digit classifier with MNIST from scratch in C++ anyway. I'm also familiar with reconstruction algorithms for computer graphics, as writing ray tracers has been a dorky hobby of mine for the last 20 years. The work NVIDIA's been doing on neural denoising is extremely impressive and could be a total game changer for stuff like the architectural visualisation field. DLSS is also extremely impressive, but I don't think it's self-evident that they've hit the best spot on the quality/computation curve or that their algorithm is universally applicable. Nobody outside of NVIDIA knows exactly what they're doing, but one part of their algorithm we do know about is an autoencoder. Autoencoders are, put very simply, networks that accept a bunch of inputs, contain a "choke point" in the middle that forces the inputs to be mapped to a much smaller representation, and an expander that attempts to reconstruct the original inputs from the compressed middle layers. These are great at denoising, inpainting missing details, or, if you've got a larger number of outputs than inputs, upscaling. Running them is just a bunch of matrix multiplications, which is what the Tensor Cores are designed to do quickly. They usually produce crap results when the inputs are too far outside their training set though, and you can't train on everything.
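To make the autoencoder idea concrete, here's a toy sketch in Python with numpy. The layer sizes and random weights are made up for illustration; a real network like the one in DLSS is trained and vastly more elaborate:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Toy autoencoder: 64 inputs squeezed through a 16-unit "choke point" and
# expanded back to 64 outputs. Weights are random here; a real network is
# trained to minimise reconstruction error on noisy/aliased image patches.
W_enc = rng.normal(0.0, 0.1, (64, 16))
W_dec = rng.normal(0.0, 0.1, (16, 64))

def autoencode(x):
    code = relu(x @ W_enc)   # compress to the small middle representation
    return code @ W_dec      # attempt to reconstruct the original inputs

patch = rng.normal(size=(1, 64))   # e.g. a flattened 8x8 image patch
print(autoencode(patch).shape)     # (1, 64): same shape as the input
```

Running it really is just two matrix multiplies and a max, which is exactly the workload Tensor Cores accelerate.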
To engage in a bit of semi-informed, baseless speculation, I think part of the reason DLSS still isn't widely available and has shown up in so few shipped titles is that it requires a degree of per-title or even per-scene tuning in its current state to produce decent results. Not retraining the underlying network, but massaging and filtering its inputs. It's in UE4 now, but NVIDIA does approval on a per-developer basis before they let you turn it on. Throwing the doors open and letting everybody play with it would mean that its limits and weak points are quickly found. NVIDIA has determined, correctly in my opinion, that a few titles that implement DLSS really well, combined with the promise of more to come, is going to sell more cards than those same few games and a bunch of awful indie crap that uses DLSS badly. I guess what I'm getting at is that DLSS doesn't seem, to me, to be a general solution to the problem of game upscaling, and there are almost certainly cheaper, non-neural-network-based algorithms that will do well enough. Now that they've opened that door, I think we're going to see a lot of rapid progress in the field, and I can't wait to see where things are in a couple years.
|
# ¿ Oct 17, 2020 20:57 |
|
v1ld posted:So having the extra data available from a good TAA implementation may not be sufficient to have a good DLSS implementation in the same game? Interesting if so and may explain the gap between promise and reality right now. Thanks for the post, good read along with the one you responded to. Again, this is just speculation, but it's telling that getting access to DLSS involves jumping through hoops and seems to be behind an NDA. There haven't been, as far as I know, any really technical third-party deep dives or blog posts from gamedevs playing around with it. Why haven't we seen somebody add it to a Doom source port yet? This is the kind of stuff that gamedevs and graphics programmers would love to see and would be super informative, but it's been crickets so far.
|
# ¿ Oct 18, 2020 00:48 |
|
TAAU and FSR aren't mutually exclusive. You could, theoretically, use TAAU to get to the base resolution and then FSR up to a higher resolution. Depending on the overhead and your target frame rate and resolution, that could be a viable setup. The fact that FSR will work in purely forward renderers that don't necessarily have motion vectors can't be entirely ignored either. I dunno, it seems good enough. FSR ultra quality for a ~30% frame rate boost could be a reasonable compromise if you were struggling to hit 60Hz/120Hz/240Hz at your desired resolution on your hardware. If AMD can convince/pay enough developers to support it, it'll have done its job, which is to show up. I've always wondered why pure frame interpolation didn't see more uptake. I remember reading a paper from around 2011 about it from the Force Unleashed developers. They basically just used motion vectors to re-project every second frame while keeping the base rate at 30Hz. They sampled the controller every 16ms to ensure that camera motion vectors updated at 60Hz, which had the effect of reducing perceived latency as well. It looked pretty good at the time and was viable on an Xbox 360. Something like that, combined with rendering a quarter or eighth resolution intermediate frame to fill in occlusion artefacts, would probably look pretty good if it were constrained to high frame rates. steckles fucked around with this message at 20:22 on Jun 22, 2021 |
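The re-projection trick those developers described can be sketched in a few lines. This is a guess at the general shape of the technique, not their actual implementation; a real engine uses sub-pixel motion, filtered sampling, and depth tests:

```python
import numpy as np

def reproject(prev_frame, motion, fill_value=0.0):
    """Warp the previous frame along per-pixel motion vectors.

    prev_frame: (H, W) image; motion: (H, W, 2) integer (dy, dx) offsets
    describing how each pixel moved from the previous frame to this one.
    Pixels whose source falls outside the frame are disocclusions and get
    fill_value; a real engine would patch those from a cheap low-res render.
    """
    h, w = prev_frame.shape
    out = np.full((h, w), fill_value)
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = ys - motion[..., 0]
    src_x = xs - motion[..., 1]
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out[valid] = prev_frame[src_y[valid], src_x[valid]]
    return out

# Toy example: the whole scene pans one pixel to the right.
frame = np.zeros((3, 3))
frame[1, 1] = 1.0
motion = np.zeros((3, 3, 2), dtype=int)
motion[..., 1] = 1
print(reproject(frame, motion))  # the bright pixel lands at (1, 2)
```

The disoccluded left column is exactly the hole that the quarter/eighth resolution intermediate frame would fill in.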
# ¿ Jun 22, 2021 20:19 |
|
repiv posted:FSR does fit nicely there, but AFAIK there just aren't that many engines left that haven't already gone all-in on TAA Player relative motion aside, if you've got shadows in your game, you'd need motion vectors relative to shadow-casting lights and to reproject your shadow maps to avoid ghosting. Ray traced reflections have their own problems. Computing the motion vectors of reflected geometry is non-trivial. As far as I know, games just ignore this and treat all motion as relative to the player. I'm not saying stuff like this is gonna be the death of TAAU and DLSS, but it is an area where a purely spatial upscaler could have perceptually better image quality at a given resolution, depending on which artefacts you're sensitive to.
|
# ¿ Jun 22, 2021 21:30 |
|
Dr. Video Games 0031 posted:I think using DLSS Frame Generation will probably increase input lag by a lot. I would think that it'd work by looking at frames N-1 and N and trying to generate frame N+0.5 by extrapolating the detected motion vectors, rather than looking at frames N-1 and N and trying to generate frame N-0.5 based on the difference. Motion vectors should allow you to do this for opaque stuff already; I guess all the optical flow stuff is so they can project transparent stuff forward. I'm kinda surprised it's taken so long for somebody to come back to frame interpolation. The Force Unleashed devs showed an experimental 30 -> 60 FPS frame interpolation technique back in like 2011 that ran on an Xbox 360 and looked pretty good as I recall. They were projecting forward from the previous frame and polled controller input at 60Hz rather than 30Hz, which let them update camera movement motion vectors at 60Hz. This apparently had the effect of reducing perceived latency. I'd be curious to see how something like that would work today, projecting the last frame forward and rendering like a 1/16th resolution frame to fill in disocclusions.
|
# ¿ Sep 20, 2022 20:28 |
|
repiv posted:it's been a staple of VR rendering for years, any time you miss vsync they extrapolate the last frame to the current frame's head position to keep you from getting sick. AFAIK they even borrow the video encoder's optical flow engine like nvidia is doing.
|
# ¿ Sep 20, 2022 21:20 |
|
repiv posted:frozen 1 was the last disney film to use their old raster renderer and it looks conspicuously video-gamey compared to big hero 6 just a year later Apparently production scene sizes are getting so large that they won't actually fit into any reasonable amount of RAM any more and they're becoming impractical to just move around on company networks. They're totally disk and network bandwidth limited now, so a lot of current research is focusing on ray re-ordering and clever caching to increase locality of reference. I wonder if some of it will filter down to GPU-sized problems, where keeping local caches full is as important as ray/primitive intersection performance. Incidentally, UE5's Nanite has some REYES-ish qualities to it. They managed to crack the automatic mesh decimation problem in a way that nobody could figure out with REYES, but I think there's a lot of similarity there. I heard from somebody that Weta was experimenting with something like Nanite for their production renderer. Basically, they were running geometry through a preprocessing step every frame to reduce the whole scene down to 1-2 triangles/texels per pixel for visible geometry and used some super simple heuristics for off-screen geometry. The resulting geometry is absolutely tiny compared to the input scene, so they don't need to worry about disk or network latency; everything would fit in memory. Not sure if it went anywhere, but that's supposedly what they were researching. Something like that might be possible on a GPU too, although I think the API would need to expose some way to update BVH coordinates rather than the black box that it is right now. Exciting times.
|
# ¿ Nov 6, 2022 02:55 |
|
repiv posted:epic did a seriously impressive flex by loading up a scene from disney's moana and rendering it in realtime through nanite v1ld posted:There have to be really good reasons why this isn't done, it's pretty obvious, but would be good to know why a movie studio which has no realtime constraint on rendering a scene wouldn't pursue those kinds of approaches. On the GPU side, batching and re-ordering can be non-trivial to parallelize in a way that makes GPUs happy, so perhaps people have just been avoiding it. steckles fucked around with this message at 03:36 on Nov 6, 2022 |
# ¿ Nov 6, 2022 03:19 |
|
v1ld posted:E: Guess what I'm asking is if the fundamental bottleneck of pathtracing re: multi-machine parallelization is the full scene has to be on each machine? Rays can hit any part of the scene, so that would seem to be the case? VVVVV: Typically, paths are traced hundreds or thousands of times per pixel and averaged in off-line rendering. Fundamentally, path tracing would be impossible without averaging. Tracing even double-digit numbers of rays per pixel and getting decent results requires nutty space magic like ReSTIR and deep-learning-driven denoising filters. steckles fucked around with this message at 03:53 on Nov 6, 2022 |
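A toy illustration of why all that averaging is needed: a Monte Carlo pixel estimate converges at roughly 1/sqrt(N), so halving the noise costs four times the samples. The "radiance" here is just a stand-in random variable with a known mean:

```python
import random
random.seed(1)

def sample_radiance():
    # Stand-in for tracing one random path through a pixel: a noisy
    # estimate whose true mean happens to be 0.5.
    return random.random()

def pixel_estimate(n_samples):
    # Average N independent path samples, exactly as an offline renderer does.
    return sum(sample_radiance() for _ in range(n_samples)) / n_samples

# Error shrinks roughly as 1/sqrt(N): 4x the samples for half the noise,
# which is why offline renderers burn thousands of paths per pixel.
for n in (16, 256, 4096):
    print(n, abs(pixel_estimate(n) - 0.5))
```

That brutal square-root convergence is what ReSTIR and learned denoisers are trying to sidestep at low sample counts.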
# ¿ Nov 6, 2022 03:45 |
|
repiv posted:speaking of nutty space magic, some of the nerds here may find this overview interesting, from a channel that popped up out of nowhere with improbably high production values Saukkis posted:I was thinking about this and I guess it would be possible to divide the scene to separate boxes and put every box in its dedicated computer. When a ray travels between these boxes the computers would communicate the ray information between them.
|
# ¿ Nov 6, 2022 20:32 |
|
My friend picked up a 4090 and I tried out Portal and Witcher 3 at his place. DLSS3 is pretty cool and I didn't really notice any latency, but I must be particularly sensitive to the weird compression-y artefacts it introduces. I felt like I was watching a YouTube video on a TV with frame interpolation turned on. I'm sure it'll improve in time, but I'd personally leave it off right now. My friend didn't mind the look though. It seems to perform best when going from a high frame rate to an even higher frame rate, so I'm not sure how well something like a 4060, where the base frame rate will be way lower, would fare. I do think it's amusing that people were always complaining about all the "hacks" that developers needed to get good-looking lighting in video games before ray tracing was a thing. Now we've got ray tracing, but we're also burdened with an entirely new set of temporal hacks to get anything usable. Maybe one day we'll return to the good old days, when a frame was a frame, not some monstrosity stitched together from the corpses of previous, dead frames.
|
# ¿ Dec 21, 2022 22:17 |
|
mobby_6kl posted:I dunno. I'd have to see it in action but being mad about "fake frames" while the raster pipeline is full of horrible hacks that vaguely approximate what things should look like seems pretty weird. I'm kind of reminded of the mid-to-late 2000s when everyone started doing HDR, bokeh, and bloom in their games, but few did it well and most were a blurry brown mess. Now, we've got a few stand-out raytracing titles but need to put up with a lot of ghosting, interpolation, and blur. In the case of RTGI, I find the way lighting lags its cause to be very distracting. Shadows from fans and quickly moving objects in Portal RTX have an unrealistic, odd look to them that comes from the temporal reconstruction. We put up with it now because it's new and shiny, but nobody wants that. Honestly, we're probably at the dawn of a glorious new age of hacks that developers will create to bring, I dunno, "immediacy" back to ray traced lighting.
|
# ¿ Dec 22, 2022 00:50 |
|
JackBandit posted:Deep learning is actually an insane inflection point on image and video modeling and temporal data, this would not be just a “technology progressing as expected” case. I don’t have much experience with rendering but there’s a reason why Neurips and the other ML conferences sell out registration in minutes and SIGGRAPH is 75% deep learning papers. It really is a new world. repiv posted:yeah shadowmaps can model point lights pretty well, where they fall apart completely is with large or oddly shaped area lights. Truga posted:so what you're saying is, the first person who builds path tracing assisted shadowmaps to file off the edge cases wins the RT race?
|
# ¿ Dec 22, 2022 01:23 |
|
New Zealand can eat me posted:I'm not sure I agree, things like Restir DI (Reservoir Spatio-Temporal Importance Resampling) by itself seem to indicate the opposite. Any time you can throw monte carlo at a problem and see gains like this, a more efficient educated guess is right around the corner Edit: To clarify further: Stochastic algorithms, like ReSTIR or TAA, and deep learning/whatever passes for AI these days have very little to do with one another. Stochastic techniques have a real part to play in network training, and the application of carefully formed noise is integral to content generation with trained networks, but just because an algorithm can adapt to its input, like almost every light transport algorithm invented in the last 25 years does, doesn't make it AI or deep learning. I'm sure NVidia doesn't mind the public's confusion about where one ends and the other begins, but some DL algorithms are complementary to light transport, rather than a requirement for it. Edit2: One really cool thing that DL can do for path tracing is to compress extremely complicated BSDFs into very small networks. Super impressive stuff, but the training happens off-line and the CPU/GPU only needs to evaluate the final network. steckles fucked around with this message at 03:29 on Dec 22, 2022 |
# ¿ Dec 22, 2022 02:46 |
|
Paul MaudDib posted:the ray volume is so low that pixels aren't being sampled enough. when you've got a raster with motion, and it's harmonic/periodic movement and you sample that at a fixed rate, you're getting an aliased sample - basically the beat frequencies of the motion and the sampling will cause aliasing actually in the subpixel samples themselves. I think this issue is indicative of a broader problem facing all temporal reconstruction algorithms: You need to track a potentially infinite number of motion vectors per pixel to track all of the dis/occlusions that could affect it during any frame. You've got a pixel receiving light in the presence of moving geometry? You need to store relative vectors for all the moving lights and geometry that could be affecting it. That pixel is also receiving some bounce light? Well now you also need to store the relative motion vectors for all the moving lights and geometry for all of the points that might contribute some illumination to that pixel. And so on. You basically need as many motion vectors as there are path vertices if you want to accurately handle dis/occlusion without smearing. I'm not sure what a path tracer that could take that into account would look like. I doubt it would look anything like ReSTIR though.
|
# ¿ Dec 22, 2022 09:34 |
|
repiv posted:someone profiled portal on AMD and saw the issue was extremely high register pressure, which could be AMDs driver doing something stupid, or the way RTXDI is structured just being pathological for the way RDNA does raytracing half in software Yeah, utilization is appalling on RDNA. The cards hardly spend any time doing raytracing, or anything else for that matter; hardware utilization is like 5% on a 6900 XT. I don't know that NVidia went too far out of their way to make AMD look bad intentionally, but I doubt they're putting much effort into tuning RTXDI performance for anything but the 4000 series. steckles fucked around with this message at 21:55 on Apr 11, 2023 |
# ¿ Apr 11, 2023 21:51 |
|
Regarding path tracing, I don't know that we're going to see a true path-tracing-focused GPU until we start getting hundreds of megabytes of L2 and like a gigabyte of L3 cache standard. Even if you could make ray queries free, you wouldn't gain a gigantic amount of performance on current architectures because of memory thrashing. Both NVidia and AMD have put a lot of effort into making the retrieval of random bytes from huge arrays as efficient as possible in the last couple generations. I think that's had as much effect on the gen-on-gen RT improvements as increased ray/box and ray/triangle intersection performance has, but it's gonna need to be turbocharged if we want to start shooting a practical number of rays for next-level path tracing. Stuff like SER can definitely help in the narrow sense, but we'll need new engines built around keeping millions or billions of rays in flight to really make progress. Also, being able to prebuild BVHs and stream them as needed would be a benefit for all architectures even now; the fact that the BVH is treated as a black box was a misstep in the API design. Anyway, a true path-tracing-first engine would need to work something like this: Rasterize a depth/normal buffer. Using that, spawn millions or billions of ray queries. Batch those rays based on their origin and direction. Once your batches are large enough, clip them against the acceleration structure. Where a ray enters a leaf node, add it to another batch that's queued for geometry intersection. Once a batch of leaf-node ray queries gets large enough, load the actual triangles associated with the BLAS node, clip the rays against them, and batch the hit positions and surface normals. Once the batches of intersection data get large enough, load the relevant textures and run your surface shaders. Spawn more ray queries as needed and put them into new TLAS/BLAS batches.
After running the shaders, add the computed colour attenuation to a list that's kept per pixel and then, every frame, collapse the list to generate a final colour. The idea is basically to do one thing at a time, keep as much in L2/L3 as possible, and make sure that every request to glacially slow video memory is as contiguous as possible and can serve as many operations as possible. This is already best practice for current rasterization workloads; it's just being taken to a ridiculous extreme. It's not the kind of thing you could easily bolt onto an existing engine, nor is it the kind of thing hardware- and driver-level shenanigans are going to do for you. Some developer will need to be brave enough to write it from scratch. Hopefully the APIs and architectures will evolve to support such a thing, because RT as it currently exists is gonna be hard to scale otherwise.
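The batching idea above boils down to queueing rays per BVH node and only touching a node's geometry once a batch is worth the memory traffic. A heavily simplified illustration (node ids, the ray-to-node mapping, and the batch size are arbitrary toy values):

```python
from collections import defaultdict

def batch_rays(rays, node_of_ray, batch_size=4):
    """Group ray ids by the BVH node they need next; emit full batches.

    Instead of tracing each ray to completion (random memory access per
    ray), rays accumulate in per-node queues so one geometry load from
    slow memory can serve a whole batch of rays at once.
    """
    queues = defaultdict(list)
    batches = []
    for ray_id in rays:
        node = node_of_ray(ray_id)
        q = queues[node]
        q.append(ray_id)
        if len(q) >= batch_size:
            batches.append((node, q[:]))  # one geometry load serves the batch
            q.clear()
    # Flush the leftover partial batches at the end of the wave.
    for node, q in queues.items():
        if q:
            batches.append((node, q))
    return batches

# 16 rays, each mapped to one of 3 leaf nodes by a toy hash.
batches = batch_rays(range(16), lambda r: r % 3, batch_size=4)
print(len(batches))  # 6 batches: 3 full + 3 leftover
```

A real wavefront renderer would keep these queues resident on the GPU and feed each full batch straight into an intersection kernel.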
|
# ¿ Apr 14, 2023 22:38 |
|
repiv posted:unless we move towards "frameless" architectures where the GPU can churn on batches out of phase with the framerate, which sounds like a nightmare Nightmare is definitely the right word. True path tracing is gonna need to throw out a lot of what we're used to, but NVidia and AMD could give us hardware to support that. It's just not gonna look like what we have now.
|
# ¿ Apr 14, 2023 23:30 |
|
Paul MaudDib posted:Obviously this is tremendously expensive in transistor terms: it's not shocking at all that Intel is using the transistors of a 3070 to compete with a 3050 or whatever it is! It's an absolute ton of additional SM complexity/scheduler overhead/etc. But cache isn't going to shrink much between 6nm and 3nm, but logic is. And so all that SM complexity is going to get a lot cheaper, while cache isn't going to get much better in the meantime. That's my pet theory at least, Arc looks like a design aimed a couple nodes ahead of where it is. And it wouldn't be surprising if NVIDIA recognized the problem too - 4090 benefits from much higher clocks but 3090 had fairly poor SM scaling, and 2080 Ti wasn't really that great either. AMD had the same problem with wave64 with GCN, Fiji and Vega just didn't scale to 64CU very well, and they switched to wave32. I think at some point everyone is going to have to shrink wave size again. When the memory required by the raytracing step gets too large though, as it would with true path tracing, all the other work you could be scheduling gets stalled because what it needs isn't in cache any more. The short term solution to that is obviously more cache, but I doubt you could ever add enough that any sort of hardware level scheduling could hope to remain tenable. Ultimately, ray tracing as it exists now on GPUs is basically rays starting in random places going in random directions needing random geometry and random textures from random places in video memory. That's just an insanely hard thing to schedule for and we're gonna get more out of reducing the randomness to start with. We really haven't even scratched the surface though and there are so many different ways we could be using visibility queries that haven't been experimented with yet. Nobody has really come up with a good LOD method for raytracing either. 
Intel showed a method of stochastically picking a LOD on a per-pixel basis which looked great, but would be hard or impossible to implement with the current APIs. While I'm at it, one dumb idea I had was to treat a nanite mesh as the input for an old skool finite element radiosity solver. Basically have all the patches iteratively swap radiance with each other and store it in like some Gaussian Mixture Model or Spherical Harmonic nonsense as part of the surface properties. I'm sure there would be a billion corner cases, but it'd be an interesting experiment. steckles fucked around with this message at 02:16 on Apr 15, 2023 |
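For reference, the classic finite-element radiosity solve alluded to above is just an iterative fixed point: each patch's radiosity is its emission plus the light it reflects from every other patch, weighted by form factors. A three-patch toy version (the form factors, emission, and reflectivity are made-up numbers):

```python
import numpy as np

# Toy radiosity system: B = E + rho * F @ B, solved by Jacobi iteration.
# F[i, j] is the (hypothetical) form factor from patch i to patch j.
F = np.array([[0.0, 0.3, 0.2],
              [0.3, 0.0, 0.2],
              [0.2, 0.2, 0.0]])
E = np.array([1.0, 0.0, 0.0])   # patch 0 is the only light source
rho = 0.7                        # reflectivity shared by all patches

B = E.copy()
for _ in range(50):              # patches iteratively swap radiance
    B = E + rho * (F @ B)
print(B)                         # patch 0 stays brightest, bounce light fills the rest
```

With nanite-style patches, the storage problem the post mentions is exactly B: you'd need a directional representation per patch (spherical harmonics or similar) instead of a single scalar.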
# ¿ Apr 15, 2023 02:01 |
|
Paul MaudDib posted:is there a technical reason the async scheduling and reordering needs to happen on the GPU at all? If the state-per-ray is relatively minimal (position, vector, luminosity/whatever) then can't you just immediately spill to CPU, schedule/reorder, and dump over batches? Or spill certain parts of the rayspace, like spill unless it's some area or some vector-direction that is being modeled right then? I realize of course that may not be much of a useful coherence/alignment in itself since ray behavior is so random but the point is just to define some region that you're currently working on and to spill everything else until later. Paul MaudDib posted:On the other hand for a purely path-traced game this means that you can just partition your scene across multiple GPUs and get good scaling. And if the collective amount of VRAM is sufficient, that's fine, you don't need every GPU to hold the full queue (holding the full BVH tree would be really nice of course). Like if total VRAM can hold it all, you can totally do a multi-GCD/MCM design with path-tracing. (I think that's been remarked before, it's the reason why it works for big pixar movies and stuff too, right?) Paul MaudDib posted:I wonder if spilling favors L1/L2 or L3 - NVIDIA chose the former and AMD chose the latter. I’m guessing nvidia probably needed some extra for Shader Execution Reordering to work, perhaps a reason they went with that over an L3. Paul MaudDib posted:I wonder if AMD's thing about memory was because they modeled path-tracing (or even it's just self-evident that there's too little VRAM headroom to spill much) and they think path-tracing is going to do ironically poorly on NVIDIA's GPUs due to VRAM limitations. Paul MaudDib posted:That's really the money question around your objections in a practical sense I think - are we talking gigs of ray-state, tens of gigs of ray-state, or hundreds of gigs of ray-state for useful path-tracing? 
Obviously not too helpful if path-tracing requires people to have 128GB or 256GB of system memory even if it's feasible at a GPU level. Paul MaudDib posted::yea: Obviously a lot better to avoid randomness than to handle it better. But I mean, how is that really possible with raytracing given the positions and angles are essentially random? Paul MaudDib posted:What would be the point of having patches swap luminosity with each other, I guess you're fuzzing... something? And that produces a statistically useful property in the rays/bounces somehow?
|
# ¿ Apr 15, 2023 03:44 |
|
Paul MaudDib posted:it'd be interesting to see whether OFA could improve this at all. Being able to have some raw "draw poo poo here" metrics would be great. Although I guess at a certain level of detail the heatmaps don't have to be GPU accelerated at all. There's also been a ton of work on path guiding by basically storing an octree of directions where the signal is bright. It's super simple and probably wouldn't be hard to extend to big fat ray bundles, but it's also one of those highly async methods that would need a rethink of how presented frames and light transport work together. Realistically though, an algorithm that focuses on shooting as many dumb rays as possible rather than trying to be clever is probably gonna work better for games in the long run. Paul MaudDib posted:it seems like there's some fairly enlightening questions like "what is the number of BVH regions behind this raster region/space voxel/OFA motion estimation group" or "what is the average depth of rays shot in this region" that could guide LOD tuning too. Yes you can compute it but being like "yo there's a chunky thing at X but there's nothing ahead/behind" or even just "the time spent in this region is out of control" are relevant info for building an optimal tree. Paul MaudDib posted:The nanite guy has got the right idea though this isn't a "rebuild every frame" or even every N frames, you should rebalance it based on where poo poo's happening and where rays need a better performance level of both detail and sampling.
|
# ¿ Apr 15, 2023 20:17 |
|
repiv posted:the console APIs do allow developers to cook BVHes ahead of time, i wonder if they're already taking advantage of the fact that they can spend much more time on BVH refinement than is practical on PC I recall reading somewhere that Epic did that for their Matrix thingy on the consoles.
|
# ¿ Apr 15, 2023 21:06 |
|
repiv posted:yeah that's about the extent of what's publicly known about the console RT APIs, but in principle they should be able to expose a lot more of the nitty gritty details since they have a fixed hardware target. directly exposing the raw BVH representation ought to be on the table, which could enable console-exclusive tricks like streaming in sub-trees at arbitrary depth on the fly. maybe they could do nanite-style fine-grained cluster streaming while PC is stuck flipping between discrete LODs. NVidia is a bit more cagey about their low-level architecture, but I'd be shocked if they didn't have similar functionality for dynamically loading BVH data. Maybe what's holding it back on PC is the lack of an agreed upon binary BVH format.
|
# ¿ Apr 15, 2023 21:22 |
|
Subjunctive posted:I did always wonder why ray tracing used random rays rather than just iterating the light sources, generating sorted sample sets, and walking them. Anything that doesn’t get hit is in shadow, badabing.
|
# ¿ Apr 26, 2023 18:24 |
|
Subjunctive posted:You want a random sample but you don't have to walk them in random order, right? I don't get why you can't do bounced light the same way, hmm. The first problem is that to compute the incoming radiance at any given point, you need information about the entire scene. Each pixel needs to estimate the average colour within some cone of directions, depending on the BSDF of the material and the orientation of the geometry. To calculate that, you need to do the same for every single surface within that cone, and to calculate those, you need all the surfaces in those cones, and so on forever. That's obviously not a tractable problem, so sampling a random assortment of directions to some random depth of recursion is the best that we can do. Eliminate the randomness between pixels, and you end up with correlation artefacts which cause ugly banding when still and nasty flickering in motion. The more fundamental problem is that to answer the question "can these two points in space see each other?", you need to load an unknowable-in-advance amount of geometry to make that determination. You can have two rays that start and end 1mm apart and one will need to check 1KB of geometry while the other will need to check 50MB worth because it took a different path through the BVH. No matter how coherently you're processing samples, you're still gonna end up being hit by bad memory access patterns when determining visibility. There are a few ways to address these. For the visibility problem, going super async and batching huge numbers of rays together or making sure your whole scene will fit in L2 cache are basically the only strategies that work. If you didn't mind false occlusions and light leaking, you could trace against low-resolution proxy geometry.
For sampling the path integrals, you can pick your random numbers in a way that maximizes the distance between points on the hypercube (Quasi-Monte Carlo), you can sample clusters of rays when you find a path with a high contribution (MLT, MEMLT, Path Guiding, too many others to list), you can sample clusters of random numbers when you find a good point on the hypercube (PSSMLT), you can try to share rays between pixels when you find a path with a high contribution (ERPT, ReSTIR), you can use some proxy representation of scene radiance (Voxel Cone Tracing, Light Probes, VPLs), or you can use a surface-based approach where radiance is computed at fixed locations and each pixel is interpolated from its nearest neighbors (Radiosity, Surfels, Photon Mapping, plain old Light Maps). All of these serve to minimize randomness, but they come with various tradeoffs in maximum quality, time to image, or memory usage, and you can always come up with some pathological scene geometry that will make any algorithm perform badly. steckles fucked around with this message at 07:45 on Apr 28, 2023 |
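As a tiny illustration of the QMC/stratification idea, jittering one sample per stratum (the simplest way to "maximize the distance between points") already beats independent uniform sampling on a smooth 1D integrand. The integrand and sample counts here are arbitrary toy choices:

```python
import random
random.seed(7)

def estimate(sampler, n):
    # Integrate f(x) = x*x over [0, 1]; the true value is 1/3.
    return sum(x * x for x in sampler(n)) / n

def uniform_random(n):
    return [random.random() for _ in range(n)]

def stratified(n):
    # One jittered sample per stratum: samples can never clump together.
    return [(i + random.random()) / n for i in range(n)]

def avg_error(sampler, n=64, trials=200):
    # Average absolute error over many runs to smooth out luck.
    return sum(abs(estimate(sampler, n) - 1 / 3) for _ in range(trials)) / trials

print(avg_error(uniform_random), avg_error(stratified))
```

Stratified error falls roughly as N^-1.5 on smooth integrands versus N^-0.5 for pure random sampling, which is the same gap proper QMC sequences exploit in many dimensions.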
# ¿ Apr 28, 2023 07:43 |
|
repiv posted:the main thing that differentiates "proper" supersampling from just rendering at a higher resolution then scaling down is the sampling pattern, the points sampled within each pixel are rotated about 45 degrees from the pixel grid which has nicer visual properties Offline renderers use filters that touch 16, 36, or even 64 pixels per sample. 1000 samples with a box filter will usually look worse than 16 samples with a good filter that approximates sinc. It’d be interesting to see where the inflection point is on the GPU: the point where you’re better off throwing resources into filtering versus rendering. Of course, I guess it’d be pointless given that good sampling isn’t really compatible with all the temporal reconstruction and denoising shenanigans that are needed today.
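For reference, a sketch of the kind of "good filter" I mean: the Mitchell-Netravali kernel with the standard B = C = 1/3 constants. Its support spans a 4x4 pixel neighbourhood, and the negative lobe past |x| = 1 is what sharpens edges compared to a box.

```python
# Mitchell-Netravali reconstruction filter, B = C = 1/3. A single sample
# contributes to every pixel within two pixel-widths, unlike a box filter
# which only touches the pixel the sample landed in.
def mitchell(x, B=1/3, C=1/3):
    x = abs(x)
    if x < 1:
        return ((12 - 9*B - 6*C)*x**3 + (-18 + 12*B + 6*C)*x**2 + (6 - 2*B)) / 6
    if x < 2:
        return ((-B - 6*C)*x**3 + (6*B + 30*C)*x**2
                + (-12*B - 48*C)*x + (8*B + 24*C)) / 6
    return 0.0

# Centre weight, a nearby positive weight, and the negative lobe that a
# box filter can't produce:
print(mitchell(0.0), mitchell(0.7), mitchell(1.3))
```

Evaluating this per sample per pixel is exactly the extra filtering cost I'm talking about: 16x or more the weight computations of a box filter, in exchange for a dramatically better reconstruction.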
|
# ¿ May 20, 2023 16:55 |
|
Yeah, morphological AA is good at smoothing lines, but pretty bad at resolving Moiré patterns, which is what more-triangles-than-pixels will usually give you. The whole concept was pretty neat when it landed on the PS3 though. I recall being blown away by how smooth God of War 3 looked when it came out. I suppose it might still have a use in a pure SSAA setting, to remove some high frequencies before downsampling and get you a slightly smoother image for the same sample count. Actually, that reminds me: I would sometimes combat Moiré when resizing photos by adding a light noise layer over the affected areas in Photoshop before downsampling, and it'd often work really well to smooth stuff out while still looking natural. I wonder if you could apply something stupid like that, driven by some metric like the ratio of a Nanite mesh's surface area to its projected size, to break things up and trade Moiré for "filmic" noise. I'd guess it'd look terrible and require a ton of scene-specific tuning, but it might be a fun thing to code up.
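The 1D version of that trade is simple enough to demo (everything here is a toy): regular sampling of a sine whose frequency exactly matches the sample rate aliases to a structured false value, while jittered sample positions turn the same error into zero-mean noise.

```python
import math, random

# A sine whose frequency is an exact multiple of the sample rate,
# point-sampled on a regular grid, aliases to a constant -- the 1D
# analogue of a Moiré band. Jittering the positions trades that
# structured error for noise.
freq, n = 64, 64
signal = lambda x: math.sin(2 * math.pi * freq * x)

regular = [signal(i / n) for i in range(n)]
random.seed(3)
jittered = [signal((i + random.random()) / n) for i in range(n)]

print(max(abs(s) for s in regular))   # ~0: every sample hits the same phase
print(abs(sum(jittered) / n))         # small: the alias became noise
```

The jittered version is objectively "wrong" at every sample, but the errors are uncorrelated and average toward the true mean, which is why noise reads as more natural than banding.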
|
# ¿ Jun 28, 2023 20:57 |
|
repiv posted:maybe once they're all-in on software rasterization they could randomize the sample positions per-pixel to trade aliasing for noise? unreal isn't at the point where they can get that experimental though, they still use hardware rasterization for parts of the pipeline so the software rasterizer needs to match the regular sampling grid of the HW rasterizer for them to fit together seamlessly That no-TAA unreal video looked shimmery as hell, but I did really like the complete lack of ghosting. Maybe I've spent too long playing with graphics algorithms and the parts of my brain that notice small rendering errors are over-developed, but I've always been super sensitive to temporal artefacts.
|
# ¿ Jun 28, 2023 22:12 |
|
repiv posted:i wonder which way the pendulum will swing from here with hardware raster, do the IHVs add quadless rasterization modes to make dense geometry more efficient without having to resort to software raster ala nanite? Nanite proved you can ignore the hardware rasterizer and still get high performance. Maybe it's time to look at some more advanced algorithms. Computing a list of triangles which overlap each pixel and calculating coverage analytically might be viable for rendering billions of little triangles with high quality. That'd probably break a whole pile of other stuff as there'd no longer be a 1-to-1 correspondence between the colour and depth buffers, but it'd eliminate Moiré. Probably not practical on modern GPUs though, given the immense amount of ALU and memory you'd need to allocate to every pixel. Perhaps some of the "deep" buffer algorithms developed for order-independent transparency could be adapted to do AA on micro-triangles instead. repiv posted:maybe one day someone will come up with a practical way to do texture-space shading that doesn't cause more problems than it solves, then we could maybe move beyond doing temporal filtering in screen space
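For what it's worth, the analytic coverage part is easy to sketch, if not to make fast (completely unoptimized, and the per-pixel triangle lists are assumed to exist already): clip each triangle against the pixel square with Sutherland-Hodgman and take the exact area of whatever's left.

```python
# Exact coverage of a triangle over the unit pixel [0,1]x[0,1]: clip
# against each pixel edge, then apply the shoelace formula. No point
# sampling anywhere, hence no Moiré.
def clip_halfplane(poly, inside, intersect):
    out = []
    for i, cur in enumerate(poly):
        prev = poly[i - 1]
        if inside(cur):
            if not inside(prev):
                out.append(intersect(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(intersect(prev, cur))
    return out

def coverage(tri):
    poly = list(tri)
    # Clip against x>=0, x<=1, y>=0, y<=1 in turn.
    for axis, bound, keep_ge in ((0, 0, True), (0, 1, False),
                                 (1, 0, True), (1, 1, False)):
        if not poly:
            return 0.0
        inside = (lambda p, a=axis, b=bound, k=keep_ge:
                  p[a] >= b if k else p[a] <= b)
        def intersect(p, q, a=axis, b=bound):
            t = (b - p[a]) / (q[a] - p[a])
            return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
        poly = clip_halfplane(poly, inside, intersect)
    area = 0.0
    for i, (x0, y0) in enumerate(poly):
        x1, y1 = poly[(i + 1) % len(poly)]
        area += x0 * y1 - x1 * y0
    return abs(area) / 2

print(coverage([(0, 0), (1, 0), (0, 1)]))   # triangle covering exactly half
```

Doing this for every triangle overlapping every pixel is where the immense ALU and memory cost comes from, and you still haven't solved how those exact coverages composite when triangles at different depths overlap.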
|
# ¿ Jun 29, 2023 00:29 |
|
repiv posted:i was thinking along the lines of allowing you to export to the ROPs from compute, so a software rasterizer can do blending and output more than 64 bits at once
|
# ¿ Jun 29, 2023 01:03 |
|
Falcorum posted:As a dev, this "porting toolkit" is utterly bizarre and I can't see who it's for at all. It's not actually for porting so companies that aren't porting their games to Macs already won't care, and companies that are will have better tools anyway. On the flip side, perhaps leadership at Apple is nervous about games and this is a weird baby step dreamed up to gauge interest while not offending the ardent anti-game contingent too much. “Look at these people jumping through hoops to play games on our computers. Maybe there are enough dollars there to justify more support from us.”
|
# ¿ Jul 1, 2023 19:06 |
|
Indiana_Krom posted:Also the thing about path tracing is that it doesn't actually get significantly harder to do with more complex games. The computational complexity of rasterization is, in aggregate, O(n) in the absence of any occlusion culling: doubling the number of triangles will double the computation needed, whereas tracing a ray through a BVH is closer to O(log n). The difference is that rasterization is extremely memory friendly. Basically every part of the rasterization process can be broken down into huge contiguous reads and writes which play very nicely with memory, and the increase in overall compute utilization easily makes up for the worse theoretical complexity. steckles fucked around with this message at 18:26 on Jul 4, 2023 |
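As a toy illustration of that asymmetry (nothing here resembles real hardware): a rasterizer has to touch all n primitives at least once, while a BVH-style binary split only visits nodes whose bounds contain the query, roughly log2(n) of them.

```python
# Toy 1D "BVH" over disjoint (lo, hi) spans standing in for triangles.
# A point query descends only into nodes whose bounds contain it.
def build(prims):
    if len(prims) == 1:
        return ("leaf", prims[0])
    mid = len(prims) // 2
    bounds = (prims[0][0], prims[-1][1])
    return ("node", bounds, build(prims[:mid]), build(prims[mid:]))

def query(node, x):
    # Returns (hit, nodes_visited).
    if node[0] == "leaf":
        lo, hi = node[1]
        return lo <= x < hi, 1
    _, (lo, hi), left, right = node
    if not (lo <= x < hi):
        return False, 1          # bounds reject: whole subtree skipped
    hit, v = query(left, x)
    if hit:
        return True, v + 1
    hit2, v2 = query(right, x)
    return hit2, v + v2 + 1

prims = [(i, i + 1) for i in range(1024)]
tree = build(prims)
hit, visited = query(tree, 777.5)
raster_visits = len(prims)       # a rasterizer's lower bound: everything
print(hit, visited, raster_visits)   # found after a handful of node visits
```

The catch is exactly the memory behaviour above: those ~log2(n) node visits are scattered pointer chases, while the rasterizer's n triangles stream through in order.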
# ¿ Jul 4, 2023 18:19 |
|
Mark my words, in 15-20 years, on the off chance society somehow hasn't collapsed by then, we'll be looking back at the smeared temporally accumulated and interpolated mess of today's games with the same disdain we have for the blurry brownness that was that first crop of games with "HDR".
|
# ¿ Jul 12, 2023 19:20 |
|
K8.0 posted:Nah. Games now are doing a markedly better job of showing detail than they ever did before. We're years into the era of details being smaller than one pixel. Throw that stuff in a conventional renderer with no supersampling and it would look horrid. Plus I think people will just naturally become less tolerant of ghosting and weird artifacts once the shininess of real time path tracing and more-triangles-than-pixels wears off. Like, nobody actually wants those things, right?
|
# ¿ Jul 12, 2023 23:53 |
|
Truga posted:but in places where doubling fps would actually help a ton (shitboxes pushing 40fps on a good day), the additional latency can be noticeable, and the artifacts very visible lol
|
# ¿ Aug 10, 2023 01:49 |
|
shrike82 posted:yeah going to disagree with the walls of texts - LLMs have entrenched the dominance of Nvidia hardware if anything. there are a lot of toy repos out there that cut down LLMs to fit and be compatible with CPUs including Apple Silicon but it's not really useful beyond students and execs wanting something to run locally on their laptops. a) it's lossy and b) why try to penny pinch on on-prem or private cloud hardware - businesses are happy to throw money at racks of nvidia cards to demonstrate having an AI strategy Eh, I don’t know. We’re still relatively early in the development of machine learning and it’s hard to say where things are going. Nvidia has the best support and most developed software ecosystem for sure, but ultimately most DL algorithms just need to do as many matrix multiplications as possible. A simpler architecture without all the GPU baggage designed solely to feed billions of MADDs could end up being the most cost effective approach as models continue to grow. Plenty of companies are experimenting with such designs. I wouldn’t be surprised if we see a bunch more competing products, as Alphabet, Amazon, Meta, Microsoft, and others develop in house, single purpose hardware that is cheaper to rent if you’re already in their cloud ecosystem.
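Back-of-the-envelope, with made-up but plausible sizes for a single transformer feed-forward block, the "mostly matrix multiplications" claim looks like this:

```python
# Hypothetical layer sizes; the point is the ratio, not the numbers.
d_model, d_ff, tokens = 4096, 16384, 2048

# Two projections (up and down) at 2*m*n*k FLOPs each, versus one
# elementwise activation pass over the hidden dimension.
matmul_flops = 2 * (2 * tokens * d_model * d_ff)
elementwise_flops = tokens * d_ff

ratio = matmul_flops / (matmul_flops + elementwise_flops)
print(ratio)   # nearly all the arithmetic is matrix multiplication
```

If 99.99% of your arithmetic is MADDs, silicon that does nothing but feed MADDs is an obvious thing to try, which is why everyone and their dog is taping out matrix engines.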
|
# ¿ Aug 10, 2023 06:58 |
|
I can't remember the last time some GPU thing has caused this much hand wringing. We may be in our 30s/40s now, but there's an indignant 13 year old console warrior in all of us. You're all lucky I don't have a billion dollar company, I'd be paying companies to keep all TAA out of games altogether.
|
# ¿ Aug 18, 2023 20:51 |
|
Blorange posted:This is great, it's just fast and intense enough to trigger the uncanny valley. I suddenly care that it's not switching instantly. Anyway, Ray Reconstruction looks pretty cool. I'm sure Nvidia will remain cagey about what they're actually doing, but I'll take a guess: If you take ray direction and the BSDF/PDF of the surface into account, you can use smaller blur kernels for the same noise level by lowering the weight of intra-frame samples you'd expect to have high variance. That would need a bit of extra data supplied by the engine though. Such information would also let them lower the weight of inter-frame samples whose motion vectors would otherwise imply they should be included in the kernel. I think they might be shifting the colour of the accumulated frame based on the next intra-frame colour as well. "Reconstructing" it, if you will. Where the changes in hue come from a surface texture rather than a change in geometry, as is the case with the Cyberpunk footage shown, you could look at the current frame and say "these pixels used to be yellow, but now they're purple, let's just pretend they were always purple and accumulate them instead of discard them". I'm sure the heuristics for that would get pretty groady, but that seems like a great use case for a small neural network.
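To be clear, this is all speculation, but the "lower the weight of high-variance samples" part is basically inverse-variance weighting, which you can sketch with invented numbers:

```python
# Toy inverse-variance weighting: each accumulated sample carries an
# estimate of its own variance, and low-confidence (high-variance)
# samples contribute proportionally less to the kernel.
def weighted_accumulate(samples):
    # samples: (colour, estimated_variance) pairs
    total_w = total_c = 0.0
    for colour, var in samples:
        w = 1.0 / (1e-6 + var)       # inverse-variance weight
        total_w += w
        total_c += w * colour
    return total_c / total_w

# Two trustworthy samples and one noisy outlier: the outlier's high
# variance estimate keeps it from dragging the result away from 0.5.
result = weighted_accumulate([(0.5, 0.01), (0.5, 0.01), (5.0, 10.0)])
print(result)
```

The hard part isn't the weighting, it's producing those per-sample variance estimates in the first place, which is presumably where the extra engine-supplied data and the neural network earn their keep.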
|
# ¿ Aug 22, 2023 18:44 |