Zakalwe
May 12, 2002

Wanted For:
  • Terrorism
  • Kidnapping
  • Poor Taste
  • Unlawful Carnal Gopher Knowledge

j4cbo posted:



My kernel. :clint:

The design is basically UNIX-like, with the usual protected preemptive multitasking and fork(), exec(), wait(). There are a bit over twenty system calls total. I wrote it for an OS class, but then added some extra stuff like COW and SMP support. It's around 5700 lines of mostly C, some asm.

This summer's project: scheduler activations! :suicide:

I wrote an OS kernel, created a tiny libc and ported some minor unix programs to it for my 4th year CS project. I'm warning you now, a kernel is NEVER done. You'll always be adding "one more feature".

Zakalwe

Adhemar posted:

That said, ray casting is fun, so you should really consider it, just google for some papers/tutorials. You're already doing shader programming anyway, right? Shouldn't be that big of a learning curve.

My research is on real-time ray-tracing.

Ray tracing/casting is computationally expensive. With M rays and N triangles, you get M*N intersection tests. Let's say you're rendering a 1024x1024 image with 1 ray cast per pixel and 6 million triangles. That's 6,291,456,000,000 intersection tests per frame.

To speed this up you need to use an ADS (acceleration data structure) such as the KD-Tree or BVH. This reduces the number of intersections per frame to roughly M*log2(N), about 23,600,000 tests (that's around 0.000375% of the original!)
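
For anyone who wants to check that arithmetic, here is a rough C++ sketch. The function names are made up for illustration, and log2(N) steps per ray is the idealised figure for a balanced tree; real traversal cost depends heavily on tree quality and the scene.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Brute force: every ray is tested against every triangle (M * N).
std::uint64_t brute_force_tests(std::uint64_t rays, std::uint64_t triangles) {
    return rays * triangles;
}

// Idealised balanced tree: about log2(N) steps per ray (M * log2(N)).
double tree_tests(std::uint64_t rays, std::uint64_t triangles) {
    assert(triangles > 0);
    return static_cast<double>(rays) *
           std::log2(static_cast<double>(triangles));
}

// Tree cost expressed as a percentage of the brute-force cost.
double tree_percentage(std::uint64_t rays, std::uint64_t triangles) {
    return 100.0 * tree_tests(rays, triangles) /
           static_cast<double>(brute_force_tests(rays, triangles));
}
```

Calling `tree_percentage(1024ull * 1024ull, 6000000ull)` gives the fraction of the brute-force work left once a tree is in place.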

OK, now there's a few problems though. Your data pretty much needs to be static. The tree build process is obviously Nlog(N). You don't want to do that every frame (although some people do for dynamic scenes). If your data changes you need to rebuild all or part of your ADS.

Traversing a tree data structure on the GPU is painful due to the inherent problem with stacks on GPUs. Now, some people build shallow trees to overcome this, but that means tree quality suffers greatly. You can work around this using a rope-tree (Prof. Slusallek's crew in Saarland published on this about 2 years ago), but it's still pretty messy.

There's other non-tree based ADS methods like grids that are better suited to GPUs, but as they don't adapt to the underlying geometry, they have much lower performance.

The OP's app uses semi-transparent triangles, which means another ray shot per semi-transparent triangle intersected. Also, to get anti-aliasing he'll need to shoot more than one ray per pixel or do some adaptive sub-sampling. Anyway, that's at least, say, another 10% rays.

Funnily enough, many people in the RTRT field use GPU rasterisation for all primary rays. Colour each triangle a unique colour. Render the scene. Use the colour of each pixel as an index to the proper triangle shader (RTRT software shader - not GPU). Use the value from the depth buffer as the distance along the ray at which the intersection occurs. Spawn secondary rays from that point.
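
The colour-as-index trick could look something like the C++ sketch below. Everything here is an assumption for illustration: the names, the 24-bit RGB packing, and the reserved background value are not from any particular renderer. It also assumes the ID pass is rendered with blending, anti-aliasing and dithering switched off, and that the raw depth value has already been converted to a distance along the ray.

```cpp
#include <cassert>
#include <cstdint>

struct Rgb8 { std::uint8_t r, g, b; };
struct Vec3 { float x, y, z; };

// Hypothetical convention: pure white marks "no triangle hit".
const std::uint32_t kBackgroundId = 0xFFFFFFu;

// Pack a triangle index into a 24-bit colour (room for ~16.7M triangles).
Rgb8 encode_triangle_id(std::uint32_t id) {
    assert(id < kBackgroundId);
    return Rgb8{static_cast<std::uint8_t>(id & 0xFFu),
                static_cast<std::uint8_t>((id >> 8) & 0xFFu),
                static_cast<std::uint8_t>((id >> 16) & 0xFFu)};
}

// Recover the triangle index from a pixel read back from the ID buffer.
std::uint32_t decode_triangle_id(Rgb8 c) {
    return static_cast<std::uint32_t>(c.r) |
           (static_cast<std::uint32_t>(c.g) << 8) |
           (static_cast<std::uint32_t>(c.b) << 16);
}

// Point where the primary ray hit, given distance t along a unit direction.
// Secondary rays would be spawned from here.
Vec3 hit_point(Vec3 origin, Vec3 dir, float t) {
    return Vec3{origin.x + t * dir.x,
                origin.y + t * dir.y,
                origin.z + t * dir.z};
}
```

24 bits is enough for the 6 million triangle example; a bigger scene would need the alpha channel or an integer render target.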


My advice? Stick with the rasterisation for this application at least.

Zakalwe
HAHA whoops my bad. Yes, GPU ray casting for volume rendering is a different kettle of fish and will work in this instance.

Zakalwe

StickGuy posted:

I've finally gotten my Large-eddy Simulation on the GPU mostly working thanks to CUDA. Here's a screenshot of some simulated cumulus clouds:


I still have to improve the rendering and try to improve the numerical stability a bit.

How are you finding the CUDA SDK on Ubuntu? A friend of mine just got his Geforce 280 for CUDA and is looking at putting Ubuntu on his machine to use it (He generally uses FreeBSD). Any weird pitfalls you encountered?

Zakalwe

StickGuy posted:

Look up the papers "Data visualization: the end of the rainbow" and "Rainbow Color Map (Still) Considered Harmful" for a more in depth explanation.

Heh, gotta love the "considered harmful" papers. Everyone I know wants to write a CH paper some day.

Zakalwe

Screeb posted:

My biggest personal project is my ray tracer, though. I don't have any pics at the moment cause I'm not an artist so everything I render with it looks dumb. It's quite advanced though. Fully OO (even the camera lens), global illumination (including caustics), heavily savagely optimised Oct-Tree (optional, due to being OO. You could replace it with whatever. Or have a mix), depth of field, motion blur, etc. Basically everything you need to render a photorealistic picture.

Very nice feature set :)

I'll assume you're working on an offline renderer as you're doing GI and distribution ray-tracing and mention the word "photorealistic". You did however talk about savagely optimising your oct-tree so I guess you do care about speed. Forgive me if I'm giving unwanted advice, but this is my area of research and I love to talk about it :)

Dump the Oct-tree ASAP. SAH based KD-Trees perform much much better. Implement packet tracing also if you haven't yet for a ~3x speedup. There's a ton of literature out there, but Carsten Benthin's PhD is a great resource for getting a highly optimized ray-tracer up and running. http://graphics.cs.uni-sb.de/~benthin/phd.pdf Feel free to PM me if you want to chat about ray-tracing and making it FAST.

Zakalwe

Screeb posted:

Yep, it's offline, but of course, speed is still a priority - just as long as it doesn't introduce artifacts / isn't a big hack.

I started implementing a KD-tree a while ago, but didn't get around to using SAH (so right now it's just a very slow naive one). I'm probably going to implement a BVH as well, as it's almost as fast, but much simpler.

BVH is only really fast in a few scenarios, all of which are tied to dynamic scenes. Because BVH builds faster, in engines requiring per-frame rebuilds, the slowdown in render speed is offset by the speedup gained building the BVH. BVHs can also be deformed in O(log n) up to a point, negating the need for an O(n log n) full rebuild every frame. BVH *can* work nicely with very large packets and entry point search, but as you haven't implemented packet tracing yet that point is moot. Bottom line is that the kD-Tree is king and on static scenes the BVH is actually quite sucky. In fact your octree should beat the pants off it.



quote:

I haven't yet packetised it, as I want to figure out a clean way to implement it in my architecture (one of my focuses is good clean code - no nasty hacks if I can help it).

SSE intrinsics don't give the cleanest of code. You can however use the below in most compilers these days which helps a bit. Hooray for operator overloads.

code:
__m128 a,b,c;

a = b + c;
instead of

code:
__m128 a,b,c;

a = _mm_add_ps(b,c);
One of the main problems with optimizations is that they can become nasty looking. It really just depends on how fast you want to go. There are thankfully some clean things you can do, such as ensuring const correctness, that can help a lot. BTW what triangle/ray intersection method are you using? You can precalculate parts of the intersection on a per-tri basis for a decent speedup (~15% according to my figures).

quote:

I haven't seen that PHD - looks good, cheers.

Ingo Wald's thesis is worth a read too, as is Vlastimil Havran's. Some outdated stuff in the latter, but the section on kd-tree building and SAH is still relevant.


quote:

Thanks for the advice and the offer. Do you have your own pet renderer at all? If so, I'd love to hear about it.

My own renderer is a multi-core SIMD real-time ray-tracer. I mostly use an SAH O(n log n) kd-Tree builder with perfect splitting and a new entry point traversal similar to MLRTA (paper on new EP search algorithm hopefully to be published at Eurographics 2009 - fingers crossed). I do use BVH and Bounding Interval Hierarchy for parts of my dynamic scene support though. My pictures are boring as I'm mostly working on data structures and algorithms to improve performance :)

Zakalwe

tripwire posted:

That's amazing. Are you planning on sharing your source with anyone? I've always wanted to write my own raytracer but I've been daunted by how much work it seems to be... would you be able to drop in some C extension code when you're finished to speed up the computationally intensive stuff? Can I see (pretty please!)

The ray-tracing algorithm is incredibly simple. Writing a simple ray-tracer can be done in less than 100 lines of code. Making it fast is the hard thing.
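
To give a feel for how little code the basic algorithm needs, here is a toy C++ ray caster: one hard-coded sphere, one ray per pixel, output as ASCII. It is a sketch only. The scene, the camera setup and every name in it are invented for the example, and it does no shading, no secondary rays and no acceleration structure.

```cpp
#include <cmath>
#include <string>
#include <vector>

struct Vec3 { double x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 normalize(Vec3 v) {
    const double len = std::sqrt(dot(v, v));
    return Vec3{v.x / len, v.y / len, v.z / len};
}

// Distance along a unit-length ray to the nearest sphere hit in front of
// the origin, or -1.0 on a miss (origins inside the sphere count as a miss).
double hit_sphere(Vec3 origin, Vec3 dir, Vec3 centre, double radius) {
    const Vec3 oc = origin - centre;
    const double b = dot(oc, dir);
    const double c = dot(oc, oc) - radius * radius;
    const double disc = b * b - c;  // discriminant of t^2 + 2bt + c = 0
    if (disc < 0.0) return -1.0;
    const double t = -b - std::sqrt(disc);
    return t > 0.0 ? t : -1.0;
}

// Cast one ray per pixel from a camera at the origin looking down -z.
// '#' marks a hit on the sphere, '.' marks background.
std::vector<std::string> render(int width, int height) {
    const Vec3 eye{0.0, 0.0, 0.0};
    const Vec3 centre{0.0, 0.0, -3.0};
    std::vector<std::string> image(height, std::string(width, '.'));
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Map the pixel centre to [-1, 1] on an image plane at z = -1.
            const double u = (x + 0.5) / width * 2.0 - 1.0;
            const double v = 1.0 - (y + 0.5) / height * 2.0;
            const Vec3 dir = normalize(Vec3{u, v, -1.0});
            if (hit_sphere(eye, dir, centre, 1.0) > 0.0) image[y][x] = '#';
        }
    }
    return image;
}
```

Printing each string returned by `render(60, 30)` shows a disc in the middle of the frame. Swapping the hit test for a loop over triangles, plus a shading step, gets you to a basic ray tracer.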

If you want to learn ray-tracing and be taken through all the steps to write a fully featured one, buy the PBRT book or just grab the guy's source code.

http://www.pbrt.org/

Zakalwe

Azazel posted:

Ran across why the lucky stiff's github code repo, and found his Shoes graphic API project. Decided to wrap my current Go related project with it. Very slick and easy to use.



Very interesting project. I've heard that group identification and determining if a group is alive is a pretty difficult problem.

Also, did black have three stones here?

Zakalwe

Azazel posted:

Yeah, group tracking was an interesting problem to tackle. You have to check liberties of groups, check if it's a suicide move, merge existing groups, etc. I added the colorization option so I could identify that it was behaving correctly. Trying to debug from a console can be too much of a pain sometimes.

As for the above game. Black captured 3 stones, is that what you mean? The game itself was an even match I just finished today on the Dragon Go Server. Lost by 10.5, sigh.

Ahh, I thought black had a 3 handicap as 3 of the 4,4 points were occupied by black. Of course, white was opening with 3,4 or similar

Zakalwe
haha, good ol' Sponza. Jesus I must know every nook and cranny in that model.

Zakalwe

shodanjr_gr posted:

Sponza rules. If I ever make an obscene amount of money, I'm building a life size replica for myself. Then I will proceed to hand-paint realistic global illumination effects on the walls.

A colleague and I half-heartedly tried to get our funding to cover a trip so we could "gather realistic illumination data".

Zakalwe

Id4ever posted:

I just launched fontcapture.com,

I just broke fontcapture.com. It may be the 60mb PNG I tried to upload (1200dpi). I tried again with a 4.6MB PNG at 1/4 of the resolution and got

code:
Page unavailable

We're sorry, but the requested page is currently unavailable.

Please try again later. If the problem persists, please contact us at email@fontcapture.com, or use our Contact form.

Zakalwe

Id4ever posted:

Fixed now.

Excellent work. The font looks pretty drat good. It's a bit misaligned in places, but that's my own drat fault. I may write out another one more carefully and try again soon.

Zakalwe

Tommy Calamari posted:

Working out which points are neighbours was far more complicated than I anticipated, so I'm pretty proud of coming up with an algorithm that does it (even though I've since discovered that these things are called Voronoi diagrams, and there are already far better algorithms out there). You can see that I never quite figured out the map edges.

Fortune's O(n log n) algorithm is one of my favorite algorithms of all time.

Zakalwe

Kennedy posted:

Hmm, loads of work lately;

My dissertation was based on seeing how feasible it was to use HTNs to control AI on a per-agent level for gameplay. Long story short, it was something like 4% less efficient than the Valve AI in Half Life 2, but won 30/30 deathmatch games against the Valve AI. Resounding success.
https://www.youtube.com/watch?v=SOr4HwOhSVo


Second and most recent is a XNA space-sim I'm building. Hot off the presses as of a few mins ago:
https://www.youtube.com/watch?v=JTSO6Ugc7_U

Some custom shaders written for the sun combined with a tutorial-based post-processing bloom effect added (it's supposed to be subtle :ssh:).

(P.S any game devs in Dublin, Ireland looking for a graduate to shout at, give me a shout please :) )


Apply to Havok. They always have openings. Demonware are another good bet. If you're a TCD alumnus, then I believe Stefan Weber in the Lloyd Institute is a good man to talk to. He has a lot of contacts in the development and research council.

Zakalwe

seregrail7 posted:

I'm still deciding on whether or not I'm going to apply for the Masters in Interactive Entertainment in Trinity this year, I graduated last year but was a bit burned out with college so thought I'd leave it. I'm just not really into the whole academic stuff though and a masters is going to be even more of that than in my degree so that's putting me off.

IET is a great course. Nobody I've met who's done it has had a bad thing to say about it. Have a chat to John Dingliana about it (GV2, Ground floor, Lloyd institute). I believe he heads it up now.

Zakalwe

Foiltha posted:


I wrote a bounding volume hierarchy that encloses all the triangles in a AABB tree, which really makes all the difference when you want to start rendering meshes. Without some sort of acceleration structure (BVH, KD tree etc) it's really, really painful to render anything over a thousand triangles. If anyone cares, the BVH took about 700 ms to build.

A simple 4-rays-per-packet packet tracer will give you a ~3.5X performance boost. Sometimes it goes superlinear due to cache coherence. Consider building for 64-bit to get the 8 extra SIMD registers.

How are you choosing the splitting plane for the BVH? (btw, seriously consider going kD-tree for static scenes, especially with the sort of scenes you're using. Also, the tree nodes can get *much* smaller.) If you use the SAH or SIROH when building the tree you can get up to a 2X speedup in rendering performance over median splits. Of course, the builder becomes a *lot* more complicated unless you want to sort the primitives at each build step.
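
For reference, the usual textbook form of the SAH scores a candidate split by weighting each child's primitive count with the chance a ray hits that child, approximated by the ratio of surface areas. A minimal C++ sketch follows. The struct, the function names and the default cost constants are assumptions for illustration and would need tuning per renderer; SIROH is not covered here.

```cpp
struct Aabb {
    float min[3];
    float max[3];
};

// Surface area of an axis-aligned box.
float surface_area(const Aabb& b) {
    const float dx = b.max[0] - b.min[0];
    const float dy = b.max[1] - b.min[1];
    const float dz = b.max[2] - b.min[2];
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

// Estimated cost of splitting `parent` into `left` and `right`:
//   C = c_trav + c_isect * (SA(L)/SA(P) * n_left + SA(R)/SA(P) * n_right)
// c_trav and c_isect are hypothetical relative costs of one traversal
// step and one primitive intersection.
float sah_cost(const Aabb& parent,
               const Aabb& left, int n_left,
               const Aabb& right, int n_right,
               float c_trav = 1.0f, float c_isect = 1.0f) {
    const float inv_parent = 1.0f / surface_area(parent);
    const float p_left = surface_area(left) * inv_parent;
    const float p_right = surface_area(right) * inv_parent;
    return c_trav + c_isect * (p_left * n_left + p_right * n_right);
}

// Cost of not splitting at all: intersect every primitive in the node.
float leaf_cost(int n, float c_isect = 1.0f) {
    return c_isect * n;
}
```

A builder evaluates `sah_cost` for each candidate plane, keeps the cheapest, and makes a leaf when no split beats `leaf_cost`. Doing that without re-sorting the primitives at every step is where the complexity comes from.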

quote:

multithreading the trace algorithm because that's how I roll :v:

Tiling the screen into 32*32 pixel tiles seems to be the sweet spot for most people. To avoid the locking of the job queue, use a single int which represents the job number (an index into the tiles) and use atomic operations to fetch the next job.
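
A minimal C++11 sketch of that lock-free job counter, using `std::atomic` and `std::thread`. The function name and the visit-counting stand-in for real rendering are made up for the example; a real renderer would trace the tile where the comment indicates. With GCC or Clang this typically needs `-pthread`.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Split a width x height image into square tiles and let worker threads
// claim tiles by atomically incrementing a shared index. Returns how many
// times each tile was processed (each entry should end up as 1).
std::vector<int> render_tiles(int width, int height, int tile_size,
                              unsigned num_threads) {
    const int tiles_x = (width + tile_size - 1) / tile_size;
    const int tiles_y = (height + tile_size - 1) / tile_size;
    const int num_tiles = tiles_x * tiles_y;

    std::vector<int> visits(num_tiles, 0);
    std::atomic<int> next_tile(0);  // the single int acting as the job queue

    auto worker = [&]() {
        for (;;) {
            // fetch_add returns the previous value, so every tile index is
            // handed to exactly one thread without any lock.
            const int tile = next_tile.fetch_add(1, std::memory_order_relaxed);
            if (tile >= num_tiles) break;
            // A real renderer would trace the pixels of tile
            // (tile % tiles_x, tile / tiles_x) here.
            ++visits[tile];
        }
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < num_threads; ++i) pool.emplace_back(worker);
    for (std::thread& t : pool) t.join();
    return visits;
}
```

For a 512x512 frame with 32x32 tiles, `render_tiles(512, 512, 32, 4)` hands out 256 jobs across four threads. Threads that finish cheap tiles early simply grab the next index, which gives load balancing for free.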

Other cheap speedups come from pre-calculating some of the triangle intersection code offline. For my code I get about 15% faster using TriAccels.

Sorry if you know all this, I just like talking about RTRT :)

There's lots of other neat tricks like large packet tracing with frustum culling and entry point search techniques (MLRTA and AEPSA) that can seriously make a difference. My first render of Sponza took 2 hours. These days, I can walk around it at 200fps on a CPU (2 x 2.0GHz Xeons).

Zakalwe

Bob Morales posted:

Dumb question, but do you just compose it in RAM then blast it over to the video card in that kind of situation? What resolution/color depth is that (200fps) at?

Yup, it goes to a dumb frame-buffer and is blitted over via SDL. 32bit 512x512 with textures. I can go much faster in other scenes, depending on the view point.

Zakalwe

Foiltha posted:

Packet tracing is basically based on utilizing SSE calculations for ray packet intersections and traversals, right? I have never done anything with SSE so implementing all that might be a bit too hard for me, but I might look into it after I've actually got a decent non-RT raytracer working. Any good papers/sites about SSE in packet tracing or just SSE in vector calculations in general?

Simple 2x2 (4 rays per packet) packet tracing is just tracing 4 rays at once through your acceleration data structure and intersecting 4 at once per triangle at each leaf via SSE. Large packet tracing involves more than that (frustum culling etc).
GCC has great support for SSE intrinsics in that they are treated as a basic type for the most part with operators +-=*/ etc defined for them. This makes code a lot more readable. A lot of the demo code you might see will use intrinsics though.
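
As a small illustration of what operator-style SIMD code looks like, here is a C++ sketch that intersects a packet of four rays with a plane in one go. It uses the GCC/Clang `vector_size` extension directly instead of `__m128`, which is an assumption made to keep the example free of x86-specific headers; it will not build with MSVC. The packet layout and all names are invented for the example.

```cpp
// GCC/Clang vector extension: four floats with + - * / defined on them.
typedef float float4 __attribute__((vector_size(16)));

// Broadcast one scalar into all four lanes.
inline float4 splat(float v) {
    return float4{v, v, v, v};
}

// Four rays stored structure-of-arrays style: lane i of each member
// belongs to ray i.
struct RayPacket4 {
    float4 ox, oy, oz;  // origins
    float4 dx, dy, dz;  // directions
};

// Distance along each of the four rays to the plane dot(n, p) = d.
// Rays parallel to the plane divide by zero and yield inf or NaN.
float4 intersect_plane(const RayPacket4& r,
                       float nx, float ny, float nz, float d) {
    const float4 n_dot_o =
        r.ox * splat(nx) + r.oy * splat(ny) + r.oz * splat(nz);
    const float4 n_dot_d =
        r.dx * splat(nx) + r.dy * splat(ny) + r.dz * splat(nz);
    return (splat(d) - n_dot_o) / n_dot_d;
}
```

The arithmetic reads just like the scalar version, which is the readability win. A triangle test has the same shape, with extra vector comparisons to build a hit mask per lane.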

Carsten Benthin's thesis is *very* good for implementation detail. It's a bit out of date in one or two places, but you won't hit those until you go much deeper.

http://graphics.cs.uni-sb.de/~benthin/phd.pdf

quote:

As you seem to have guessed, I'm using a naive median split as my construction heuristic. I've only glanced at other heuristics but SAH seems to be the most popular choice and I'm most likely going to implement it when I have the chance to actually study it.

SAH is a good choice but we (a colleague and I) are trying to get people to try SIROH as it edges ahead a bit (a whopping 4%!).
