Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

IEatBabies posted:

There are alternatives to GLUT. There's also SDL, SFML (never used it), and, if you're strictly Windows, WGL.


GLUT is primarily just a platform-independent way of creating your window and handling I/O.
http://en.wikipedia.org/wiki/OpenGL_Utility_Toolkit

Also, I would outright refuse to use OpenGL without something like GLEW or GLee


OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!

IEatBabies posted:

GLUT is primarily just a platform-independent way of creating your window and handling I/O.
http://en.wikipedia.org/wiki/OpenGL_Utility_Toolkit
Yeah, I meant GLU, which is more comparable:
http://www.opengl.org/documentation/specs/glut/spec3/spec3.html
vs.
http://msdn.microsoft.com/en-us/library/bb172969(VS.85).aspx

Kimani
Dec 20, 2003

Anime avatar - check
ADTRW poster - check
Defends watching child porn - CHECK!!!
Alright guys, I finished my big rear end project. I've been working on it since December or so, balancing it with school work and real life and all that.


AIDemo2


It's a demo/game similar to a demo I made way back in the day called AIDemo. Hence the name. You can find it by browsing my site a bit, or there's a link on that page for AIDemo2. But where AIDemo featured a simple way to implement AI that I thought of at the time ( nothing special, just an interesting way I thought up ), AIDemo2 features something more substantial and interesting ( even though AI is in its name. )

A massively threaded game environment.

That's right, if you run this game and have Task Manager next to it, you'll see the number of threads climb to over 200 at times. Every entity in the game runs on its own thread - every character, every bullet. It made for an interesting game programming exercise, writing it using message passing and being event-driven, but yeah. Massively parallel.

Although the game itself could use more polish, has some bugs, and could stand way more internal optimization, I'm hoping that something like this can solve the issue of how to parallelize games. I don't think anyone has done something like this before - does anyone know? It's a general-purpose massively threaded environment - you can make any game in it. It's also using 3D, sound, and input engines I've written myself.

If this is as new and exciting as I think it is, I'm hoping it'll be a ticket for a sweet job somewhere in the future after Grad school.

Be sure to read the readme for what to do if it crashes - for some reason clicking causes a crash. I recommend dual-core or better; it seemed to chug when I set the affinity to one core.

Kimani fucked around with this message at 13:22 on Sep 14, 2008

Pfhreak
Jan 30, 2004

Frog Blast The Vent Core!

Kimani posted:

I recommend dual-core or better; it seemed to chug when I set the affinity to one core.

You know this was chugging because massively threaded designs are generally considered a bad idea, right? Remember that the processor has some overhead in switching between threads, which isn't a big deal for a small number of threads. But once you have many threads, the time dedicated to context switching can exceed the time dedicated to actually running your program.

Maybe some day this will become the way we do things once everyone has multicore systems, but rest assured, everyone who has ever played with threads has thought, "What if I put everything in its own thread?"

The current solution is generally to thread systems, not individual entities: rather than one thread per bullet, lump all the bullets into a thread, the AI into a thread, the drawing into a thread, etc.
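
For illustration, a minimal Win32-flavored sketch of the bullets-in-one-thread idea (Bullet and the 30 Hz tick are made-up stand-ins, not anyone's actual code):

code:
// One thread owns *all* bullets; one pass updates every bullet per tick.
// Synchronization with whatever spawns bullets is omitted for brevity.
#include <windows.h>
#include <vector>

struct Bullet { float x, y, vx, vy; };
static std::vector<Bullet> g_bullets;

DWORD WINAPI BulletSystemThread(LPVOID)
{
    for (;;) {
        for (size_t i = 0; i < g_bullets.size(); ++i) {
            g_bullets[i].x += g_bullets[i].vx;
            g_bullets[i].y += g_bullets[i].vy;
        }
        Sleep(33); // ~30 Hz for the whole system, instead of 200 sleeping threads
    }
}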

Obsurveyor
Jan 10, 2003

Pfhreak posted:

You know this was chugging because massively threaded designs are generally considered a bad idea, right?
The context switches alone thrash the cache to hell and back again. Especially with such disparate threads fighting for cycles.

ehnus
Apr 16, 2003

Now you're thinking with portals!
Or if you really plan on going with a massively multithreaded system try a language/environment that's better suited for it, like Erlang, Stackless Python, or if you want to be really masochistic, Win32 fibers.

Remember that threads not only incur the CPU cost of a context switch but also the memory cost of TLS data and their own stack (which defaults to 1 MB on Win32). Fibers will do the same, but at least you save the cost of a trip into kernel land when switching context.
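
To put a number on the stack cost: at the default 1 MB reserve, 200 threads eat around 200 MB of address space before doing any work. A hedged sketch of trimming the reservation if you really must spawn that many (EntityThread is a hypothetical thread function):

code:
#include <windows.h>

DWORD WINAPI EntityThread(LPVOID ctx); // defined elsewhere (hypothetical)

HANDLE SpawnSmallStackThread(LPVOID ctx)
{
    // With STACK_SIZE_PARAM_IS_A_RESERVATION, the size is the stack reserve,
    // not the initial commit -- here 64 KB instead of the 1 MB default.
    return CreateThread(NULL, 64 * 1024, EntityThread, ctx,
                        STACK_SIZE_PARAM_IS_A_RESERVATION, NULL);
}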

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...
If you really want to implement something massively threaded, you should look into writing a game engine using CUDA instead.

Dodger_ posted:

The context switches alone thrash the cache to hell and back again. Especially with such disparate threads fighting for cycles.

Not to mention any synchronization he might be doing

floWenoL
Oct 23, 2002

Kimani posted:

Although the game itself could use more polish, has some bugs, and could stand way more internal optimization, I'm hoping that something like this can solve the issue of how to parallelize games. I don't think anyone has done something like this before - does anyone know? It's a general-purpose massively threaded environment - you can make any game in it. It's also using 3D, sound, and input engines I've written myself.

No one has done something like this before because it totally kills performance.

Kimani posted:

If this is as new and exciting as I think it is, I'm hoping it'll be a ticket for a sweet job somewhere in the future after Grad school.

Oh, man I would like to know the company that would give you a job for this.

Scaevolus
Apr 16, 2007

Kimani posted:

That's right, if you run this game and have Task Manager next to it, you'll see the number of threads climb to over 200 at times. Every entity in the game runs on its own thread - every character, every bullet. It made for an interesting game programming exercise, writing it using message passing and being event-driven, but yeah. Massively parallel.
hahahahahaha

What would be better is having as many threads as processors, and then parceling out tasks to each worker thread. The Source engine does something like this.

If you're going to have more than a dozen threads, using OS threads is a really, really stupid idea.
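
A minimal sketch of that threads-equal-to-cores shape, assuming some shared task queue that a hypothetical WorkerThread drains:

code:
#include <windows.h>

DWORD WINAPI WorkerThread(LPVOID queue); // pops tasks until shutdown (assumed)

void StartWorkerPool(LPVOID queue)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si); // dwNumberOfProcessors = logical processor count
    for (DWORD i = 0; i < si.dwNumberOfProcessors; ++i)
        CreateThread(NULL, 0, WorkerThread, queue, 0, NULL);
}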

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

Kimani posted:

A massively threaded game environment.

I have not the words.

vapid cutlery
Apr 17, 2007

php:
<?
"it's george costanza" ?>
Do you at least use thread pooling?

Obsurveyor
Jan 10, 2003

We shouldn't totally poo poo all over him. If it's stable, he's learned a little about distributed programming (even if it's implemented in a ham-fisted way), more than just multi-threading at least. That could be useful in the right work environment.

Hamled
Sep 11, 2003

Dodger_ posted:

We shouldn't totally poo poo all over him. If it's stable, he's learned a little about distributed programming (even if it's implemented in a ham-fisted way), more than just multi-threading at least. That could be useful in the right work environment.

This might not be an accurate metric, but I don't think you can call it 'stable' if the application crashes when you click your mouse. Also, reports from other people using it have all ended in forcibly killing the process after a few minutes.

StickGuy
Dec 9, 2000

We are on an expedicion. Find the moon is our mission.
The mouse interaction for me was a bit awkward, but the rest worked OK.

Kimani
Dec 20, 2003

Anime avatar - check
ADTRW poster - check
Defends watching child porn - CHECK!!!

Pfhreak posted:

The current solution is generally to thread systems, not individual entities: rather than one thread per bullet, lump all the bullets into a thread, the AI into a thread, the drawing into a thread, etc.
Well, you can do that. The system is very flexible!

quote:

Maybe some day this will become the way we do things once everyone has multicore systems...
This is the plan. We'll have, say, 1024 core machines sometime in the future, and this system will take full advantage of them. All new PCs sold today are dual core or better as it is.

Dodger_ posted:

The context switches alone thrash the cache to hell and back again. Especially with such disparate threads fighting for cycles.

floWenoL posted:

No one has done something like this before because it totally kills performance.
Well, it actually runs pretty darn smoothly. I'm running a Q6600 and a 9800 GT and it works swell, but it was running okay on my laptop with an AMD Turion 64 X2 2.3GHz and a GeForce Go 6150.

As for threads fighting for cycles, it's actually pretty fine. You see, each entity isn't just looping as fast as it can. Some threads can be set to "inactive", where they only loop occasionally to process messages passed to them, but "active" threads... their Update function returns a value which says how long the entity wants to wait until the next update. I used 1/30th of a second for a lot of the entities. This means that, except for the time taken to do the updating and parse any messages, each entity is sleeping for the better part of 1/30th of a second.

Running it in the debugger affirms the theory I had when I designed it - whenever I paused execution, the overwhelming majority of the threads were asleep! Only some threads are really processing, and the ones that really matter - the primary update thread and the rendering thread - run at a much higher priority, so they're always chugging along. As long as those two aren't getting interrupted much, the framerate will always be just fine. I was getting ~200 fps at times.
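
From the description above, each entity thread presumably boils down to something like the following sketch - a guess at the shape of the loop, not the actual AIDemo2 source (Entity, ProcessPendingMessages, and Update are invented names):

code:
#include <windows.h>

struct Entity {
    volatile bool alive;
    void  ProcessPendingMessages(); // drain queued messages (hypothetical)
    float Update();                 // returns seconds until the next update
};

DWORD WINAPI EntityThread(LPVOID param)
{
    Entity* e = (Entity*)param;
    while (e->alive) {
        e->ProcessPendingMessages();
        float wait = e->Update();        // e.g. 1/30th of a second
        Sleep((DWORD)(wait * 1000.0f));  // asleep the vast majority of the time
    }
    return 0;
}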

ehnus posted:

Or if you really plan on going with a massively multithreaded system try a language/environment that's better suited for it, like Erlang, Stackless Python, or if you want to be really masochistic, Win32 fibers.
Although I can't say I know any of those, I chose Lua as the scripting language for a good reason - it's extremely fast and extremely small. The virtual machine takes about 100 KB of memory when it's fully up and running.

Hamled posted:

This might not be an accurate metric, but I don't think you can call it 'stable' if the application crashes when you click your mouse. Also, reports from other people using it have all ended in forcibly killing the process after a few minutes.
Yeah, it's not really a problem with the system; it's just some weird bug I still need to track down. It runs well enough that I wanted to get some input on it and declare it finished for the moment, so I can work on something else for a while.

Kimani fucked around with this message at 00:03 on Sep 15, 2008

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

Kimani posted:

This is the plan. We'll have, say, 1024 core machines sometime in the future, and this system will take full advantage of them. All new PCs sold today are dual core or better as it is.

Yeah, coding for a system that doesn't exist, and probably won't for another 10 years at least, seems like a great idea.

Kimani posted:

This means that, except for the time taken to do the updating and parse any messages, each entity is sleeping for the better part of 1/30th of a second.

Hahahahahahahaha

Hamled
Sep 11, 2003

Kimani posted:

Running it in the debugger affirms the theory I had when I designed it - whenever I paused execution, the overwhelming majority of the threads were asleep! Only some threads are really processing, and the ones that really matter - the primary update thread and the rendering thread - run at a much higher priority, so they're always chugging along. As long as those two aren't getting interrupted much, the framerate will always be just fine. I was getting ~200 fps at times.

It really seems like, given the small scale of the game you've shown here (I don't have any real numbers, but I'd guess somewhere on the order of 1,000-2,000 polygons and 100 distinct objects) and the level of consumer hardware you're running it on, 200fps is rather lovely.

When it comes down to it, there are simply better, more efficient ways to do what you're trying to do. At the very least, check out green threads.

floWenoL
Oct 23, 2002

Kimani posted:

As for threads fighting for cycles, it's actually pretty fine.

Kimani posted:

I recommend dual-core or better; it seemed to chug when I set the affinity to one core.

Hmmmm.

Painless
Jan 9, 2005

Turn ons: frogs, small mammals, piles of compost
Turn offs: large birds, pitchforks
See you at the beach!
This would be a terrible architecture even with 1024 cores. Synchronization delays don't shrink when you add cores, they grow. Also, "sleep for some arbitrary amount of time, then check for messages" is pretty much the second-worst possible way to implement a message queue.
EDIT: I forgot, saying it runs at X FPS is completely meaningless. We don't even know what's using up time there. Making a properly optimized single-threaded version and comparing FPS counts would be the first step to proving that your idea has any merit to it.

Painless fucked around with this message at 01:52 on Sep 15, 2008
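
For reference, the textbook alternative to sleep-and-poll is to block on a synchronization object until a message actually arrives. A minimal Win32 sketch, with int standing in for the real message type:

code:
#include <windows.h>
#include <climits>
#include <deque>

struct MessageQueue {
    CRITICAL_SECTION lock;
    HANDLE           sem;      // semaphore count == number of queued messages
    std::deque<int>  messages; // int is a stand-in payload

    MessageQueue() {
        InitializeCriticalSection(&lock);
        sem = CreateSemaphore(NULL, 0, LONG_MAX, NULL);
    }
    void Push(int msg) {
        EnterCriticalSection(&lock);
        messages.push_back(msg);
        LeaveCriticalSection(&lock);
        ReleaseSemaphore(sem, 1, NULL); // wakes exactly one waiting consumer
    }
    int Pop() {
        WaitForSingleObject(sem, INFINITE); // sleeps in the kernel, no polling
        EnterCriticalSection(&lock);
        int msg = messages.front();
        messages.pop_front();
        LeaveCriticalSection(&lock);
        return msg;
    }
};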

Kimani
Dec 20, 2003

Anime avatar - check
ADTRW poster - check
Defends watching child porn - CHECK!!!

Hamled posted:

It really seems like, given the small scale of the game you've shown here (I don't have any real numbers, but I'd guess somewhere on the order of 1,000-2,000 polygons and 100 distinct objects) and the level of consumer hardware you're running it on, 200fps is rather lovely.

When it comes down to it, there are simply better, more efficient ways to do what you're trying to do. At the very least, check out Green threads.
Well, my 3D engine could use more optimization too. I am but a man! I am sure with even a small team I could make it run way better.

Green threads are interesting, and would be nice for the whole "spawn X worker threads" method. In the system I've built, each thread ( or entity, as I call them ) is basically just given a Lua environment, plus bindings that allow you to create objects in the scene and manipulate them, load textures, etc. Given your own creativity, you can totally consolidate everything into a handful of threads/entities. The only tradeoff is that it wouldn't be as straightforward to develop a game for as the green-threads method looks like it would be. On the other hand, Lua is faster.

Painless posted:

This would be a terrible architecture even with 1024 cores. Synchronization delays don't shrink when you add cores, they grow. Also, "sleep for some arbitrary amount of time, then check for messages" is pretty much the second-worst possible way to implement a message queue.
EDIT: I forgot, saying it runs at X FPS is completely meaningless. We don't even know what's using up time there. Making a properly optimized single-threaded version and comparing FPS counts would be the first step to proving that your idea has any merit to it.
What kind of synchronization delays? I'm intrigued. It would depend on what kind of synchronization you're using - I'm not synchronizing every thread with a huge barrier or anything like that. Windows CRITICAL_SECTIONs constitute the bulk of the synchronization being used. Or maybe you mean something else entirely.

I could have some sort of semaphore deal where an incoming message triggers the entity to wake up. In fact I probably should do something like that for the "inactive" entities. I'll add that to the to-do list. As for having an incoming message interrupt an "active" entity's sleep by using a semaphore... I could, I suppose. It would make the thread wake up and sleep and wake up and sleep more often, though, trying to go back to sleep after each message to fulfill the requested sleep time. I don't know whether that increase in synchronization-primitive fiddling will have an adverse effect on performance or not. It would increase responsiveness, as an entity can shoot off a response message quicker, but you could do the same by making the entity update faster...

And oh man, the original plan I had when the whole thing was still in my head was to make the thing run on either a single thread or many threads, and let the user switch between them to test the performance difference. But then I was like - oh man - for the comparison to be fair both sides would have to be properly optimized, and also how do I structure the whole thing to switch like that, oh god. Way too much work there. As for the framerate, its significance to me is that the system works and that the environment is capable of decent framerates. The whole context-switching issue, and the issue of just having so many threads when I hadn't worked with anything on that scale before, was troublesome - I didn't know what kind of overhead I was looking at putting together here! But it doesn't seem to be much, because it works just fine when I run it. And again, consolidating all the bullets into 1 entity, for instance, is totally possible, in case a full-fledged game gets totally out of hand with threads.

vvvvvvvvvvvvv I know, but haven't touched, that you can initialize a CRITICAL_SECTION with a "spin count", as Microsoft calls it, where it'll busy-loop X times trying to get the lock before waiting on it. Most of the CRITICAL_SECTION usage is just pushing items into a queue, and is very fast. Do you think using the spin count would help?

Kimani fucked around with this message at 02:34 on Sep 15, 2008

ehnus
Apr 16, 2003

Now you're thinking with portals!
CRITICAL_SECTIONs are only fast in the case where there's no contention. If there is contention for the lock then it degenerates into a mutex, which means thousands of cycles wasted as you take a trip down into kernel land - and I'd imagine there's a fair bit of contention with that many threads.
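
On the spin-count question above: the API for it is a single call, and whether it helps depends entirely on how short the lock hold times are. A sketch:

code:
#include <windows.h>

CRITICAL_SECTION g_queueLock;

void InitQueueLock()
{
    // Busy-wait up to 4000 iterations before falling back to the kernel
    // wait; 4000 is the value MSDN cites for the Windows heap manager's
    // own critical section.
    InitializeCriticalSectionAndSpinCount(&g_queueLock, 4000);
}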

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

ehnus posted:

CRITICAL_SECTIONs are only fast in the case where there's no contention. If there is contention for the lock then it degenerates into a mutex, which means thousands of cycles wasted as you take a trip down into kernel land - and I'd imagine there's a fair bit of contention with that many threads.

Even then, lockless programming is significantly faster, and all that still ignores the issue of object-level parallelism for collision detection/response.

I'm really not even going to bother explaining anymore why your chosen method of parallelism is stupid, since it should be self-evident to anyone with even the most rudimentary knowledge of multithreading.

Avenging Dentist fucked around with this message at 02:37 on Sep 15, 2008
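
If you do go lock-free on Win32, one primitive that's hard to get wrong is the built-in SList, an interlocked LIFO stack that deals with the ABA problem internally. A minimal sketch:

code:
#include <windows.h>
#include <malloc.h>

struct Node {
    SLIST_ENTRY link; // must be the first member so the casts below are valid
    int         payload;
};

static SLIST_HEADER g_stack;

void SListDemo()
{
    InitializeSListHead(&g_stack);

    // SList entries must be MEMORY_ALLOCATION_ALIGNMENT-aligned (16 on x64).
    Node* n = (Node*)_aligned_malloc(sizeof(Node), MEMORY_ALLOCATION_ALIGNMENT);
    n->payload = 42;

    InterlockedPushEntrySList(&g_stack, &n->link);         // lock-free push
    Node* out = (Node*)InterlockedPopEntrySList(&g_stack); // lock-free pop
    if (out) _aligned_free(out);
}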

ehnus
Apr 16, 2003

Now you're thinking with portals!
True, though lockless programming is drat difficult and fraught with pain and misery even if you have experience doing it. You get to run into fun problems, like atomicity guarantees going away on multi-socket machines if you're performing a locked operation on an unaligned address.

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

ehnus posted:

True, though lockless programming is drat difficult and fraught with pain and misery even if you have experience doing it.

One good way to do lockless programming is to use a single thread. :xd:

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!
How is any of this advantageous over using work queues anyway?

ehnus
Apr 16, 2003

Now you're thinking with portals!

Avenging Dentist posted:

One good way to do lockless programming is to use a single thread. :xd:

Haha, touché.

Kimani
Dec 20, 2003

Anime avatar - check
ADTRW poster - check
Defends watching child porn - CHECK!!!

Avenging Dentist posted:

I'm really not even going to bother explaining anymore why your chosen method of parallelism is stupid, since it should be self-evident to anyone with even the most rudimentary knowledge of multithreading.
Explain "anymore"? You haven't really said anything besides hahaha and that you haven't the words. I'm interested in your insight.

OneEightHundred posted:

How is any of this advantageous over using work queues anyway?
Well, it is sort of using a work queue. It's just that the producers - the game logic, AI, etc. - are massively parallel.

Kimani fucked around with this message at 03:29 on Sep 15, 2008

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

Kimani posted:

Explain "anymore"? You haven't really said anything besides hahaha and that you haven't the words. I'm interested in your insight.

I'm not going to get into this too much, but parallelizing at the object level is inappropriate because it requires synchronization between any pair of objects to perform comparison between them (e.g. for collision detection). Parallelizing physics is extremely hard because - to create an efficient engine - operations on pairs of nearby objects should be as fast as possible (culling is very important too, but that's a separate issue). It would be better (though still not great) to have a thread per BSP partition rather than per object. This would greatly reduce the amount of synchronization and context switching required.

As other people have mentioned in brief, even for a supercomputer with arbitrarily many cores, your method of parallelism is inappropriate simply because of synchronization overhead. In general, you should always endeavor to put objects that "talk" to each other in a single thread.

Really trivial example of good parallelizing: imagine a 2d cartesian grid simulating the propagation of a wave. The equations for wave propagation at a given point are based on the adjacent points. By dividing the grid into an appropriate number of subgrids (one per core), you can achieve significant performance improvements in part because only the points at the edges of each subgrid depend on the points belonging to another core.
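
A hedged sketch of that wave example with the grid split into row bands, one band per core - here via OpenMP for brevity; grid size and constants are arbitrary:

code:
#include <omp.h>

enum { W = 512, H = 512 };
static float prev[H][W], cur[H][W], next[H][W];

void WaveStep(float c2) // c2 = (wave speed * dt / dx)^2
{
    // Each thread gets a contiguous band of rows; only the rows at a band's
    // edges read values "owned" by a neighboring thread.
    #pragma omp parallel for
    for (int y = 1; y < H - 1; ++y)
        for (int x = 1; x < W - 1; ++x) {
            float lap = cur[y-1][x] + cur[y+1][x]
                      + cur[y][x-1] + cur[y][x+1] - 4.0f * cur[y][x];
            next[y][x] = 2.0f * cur[y][x] - prev[y][x] + c2 * lap;
        }
    // Rotation of prev/cur/next between steps is omitted.
}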

Kimani
Dec 20, 2003

Anime avatar - check
ADTRW poster - check
Defends watching child porn - CHECK!!!
Okay, so collision detection is the issue. I've taken a class on parallel computing using MPI, doing things like you mention, so I know what you're talking about. Minimizing "surface area" and all that.

It's a little different than the supercomputer situation, since there's no shared memory in that case ( unless there is, but there's disadvantages to having such. ) In this case, there is shared memory, that is the heap. Namely, within this heap, the scene management structures that would be used to do the collision detection. So everything that "talks" together is in the same thread, it's just a matter of not getting interrupted while you talk.

Now before I go too far, while I do intend to use this game environment for future projects ( wouldn't want to waste the effort... ) for this program I sort of fudged the collision detection because it needed more work and it was fudgable - and I wanted to finish the drat thing.

The simplest way to do it is just lock the whole scene graph, do your collision test, and get out. But there's multiple things wrong with that. If all you're doing is testing, and not moving things around within the scene graph, then you don't need to impede other threads from doing testing at the same time. So you don't need to lock. But if you are moving things around in the scene graph, you do need to lock so that you don't get 'bad things'.

However, if you're moving things around in area A, there's no reason to deny another thread to do testing in area B. So rather than lock the whole scene graph, you only lock a portion.

This is my end-game plan for collision detection - find/think up some scene-management system that lets me do this effectively. The need for synchronization can be kept to a minimum, because threads accessing opposite ends or even adjacent areas of the graph need not interfere with each other - I'm betting that the number of threads vying for the same area will be minimal! I haven't done this yet because, well, I am but one man, it wasn't a priority, and I just haven't gotten to it yet. I don't think it's infeasible.

Also I would plan to have static geometry in the scene be in a lockless scene graph, the idea being that altering of it would take place on a higher level to avoid problems - level changes, for instance, when no collision tests are taking place.
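
The lock-a-portion idea maps pretty directly onto reader/writer locks. A sketch using the slim reader/writer locks added in Vista (Region and the function bodies are hypothetical):

code:
#include <windows.h>

struct Region {
    SRWLOCK lock; // call InitializeSRWLock(&lock) at startup (Vista+)
    // ... the objects living in this portion of the scene ...
};

bool TestCollisions(Region* r)
{
    AcquireSRWLockShared(&r->lock);   // readers never block other readers
    bool hit = false;                 // ... run the actual tests here ...
    ReleaseSRWLockShared(&r->lock);
    return hit;
}

void MoveObject(Region* r)
{
    AcquireSRWLockExclusive(&r->lock); // a writer gets the region to itself
    // ... update positions / re-link the object ...
    ReleaseSRWLockExclusive(&r->lock);
}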

floWenoL
Oct 23, 2002

quote:

Okay, so collision detection is the issue. I've taken a class on parallel computing using MPI, doing things like you mention, so I know what you're talking about. Minimizing "surface area" and all that.

lol. Is that the extent of your parallel programming experience? MPI is wholly inappropriate for intra-process parallelism.

Kimani posted:

It's a little different than the supercomputer situation, since there's no shared memory in that case ( unless there is, but there's disadvantages to having such. ) In this case, there is shared memory, that is the heap. Namely, within this heap, the scene management structures that would be used to do the collision detection. So everything that "talks" together is in the same thread, it's just a matter of not getting interrupted while you talk.

I have no idea what this paragraph says.

quote:

Now before I go too far, while I do intend to use this game environment for future projects ( wouldn't want to waste the effort... ) for this program I sort of fudged the collision detection because it needed more work and it was fudgable - and I wanted to finish the drat thing.

The simplest way to do it is just lock the whole scene graph, do your collision test, and get out. But there's multiple things wrong with that. If all you're doing is testing, and not moving things around within the scene graph, then you don't need to impede other threads from doing testing at the same time. So you don't need to lock. But if you are moving things around in the scene graph, you do need to lock so that you don't get 'bad things'.

However, if you're moving things around in area A, there's no reason to deny another thread to do testing in area B. So rather than lock the whole scene graph, you only lock a portion.

This is my end-game plan for collision detection - find/think up some scene-management system that lets me do this effectively. The need for synchronization can be kept to a minimum, because threads accessing opposite ends or even adjacent areas of the graph need not interfere with each other - I'm betting that the number of threads vying for the same area will be minimal! I haven't done this yet because, well, I am but one man, it wasn't a priority, and I just haven't gotten to it yet. I don't think it's infeasible.

Also I would plan to have static geometry in the scene be in a lockless scene graph, the idea being that altering of it would take place on a higher level to avoid problems - level changes, for instance, when no collision tests are taking place.

This is a lot of loving effort for something that shouldn't be parallelized (like how you're doing it) in the first place.

floWenoL fucked around with this message at 07:13 on Sep 15, 2008

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party
#cobol agrees: I am the resident parallel programming expert. So I'll respond to one aspect of this that kind of makes your idea completely retarded and leave the rest because it's a waste of time.

Kimani posted:

This is the plan. We'll have, say, 1024 core machines sometime in the future, and this system will take full advantage of them. All new PCs sold today are dual core or better as it is.
How coarse is your synchronization? Sounds like NOT VERY AT ALL, ergo all of the power of a hypothetical 1024 core processor is going to be wasted on synchronization. I bet it's not exactly optimized for SIMD workloads, either, so a super-flexible vector unit (like a GT200's SM or the Larrabee vector unit, which is what we're going to get instead of lol 1024 out-of-order superscalar cores) would be completely wasted.

Kimani
Dec 20, 2003

Anime avatar - check
ADTRW poster - check
Defends watching child porn - CHECK!!!

floWenoL posted:

lol. Is that the extent of your parallel programming experience? MPI is wholly inappropriate for intra-process parallelism.
Avenging Dentist's examples were supercomputer related. So I figured I'd note that I'm familiar with MPI. There are some other things I've written that are threaded.

quote:

I have no idea what this paragraph says.
He says you need to synchronize two objects to compare them, so I should put them together in the same thread. I say I put representations of both objects in a data structure on the heap, so any thread can access both objects, so you don't need to synchronize them.

quote:

How coarse is your synchronization? Sounds like NOT VERY AT ALL, ergo all of the power of a hypothetical 1024 core processor is going to be wasted on synchronization.
Opening up the debugger and freezing it, with 299 threads active, only 18 entities are running. That's, what, 6%? The rest are asleep. Each will probably send a couple of messages each time they're awake, but it's not like with a 1024-core processor every core will be sending a message at the same time. I do understand, I guess; it just might not be that bad.

What do you think would be a better solution? Getting as much parallelization as possible would, naturally, be ideal. Game logic can be parallelized very nicely ( which hopefully this program shows! ) and I'm betting that collision detection can be as well. Is having a certain number of worker threads, each assigned a bunch of entities to loop over, better? ( as suggested earlier ) Or maybe something more interesting can be done to better take advantage of what's available.

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...

Kimani posted:

Well, you can do that. The system is very flexible!
This is the plan. We'll have, say, 1024 core machines sometime in the future, and this system will take full advantage of them. All new PCs sold today are dual core or better as it is.

Probably not. The "Super-Multi Deep-pipelined core" Intel was trying to push 5 years or so ago is pretty much dead, as programmers are having trouble designing individual programs that effectively utilize more than 2 cores (games aside) and the perf/dollar just isn't there. Thus, the move to GPU computing/Larrabee.

floWenoL
Oct 23, 2002

Kimani posted:

He says you need to synchronize two objects to compare them, so I should put them together in the same thread. I say I put representations of both objects in a data structure on the heap, so any thread can access both objects, so you don't need to synchronize them.

Putting them "on the heap" in no way removes the need for synchronization as the data values in question aren't constant. Where else would that data live, anyway, on the thread stacks? :confused:

quote:

What do you think would be a better solution? Getting as much parallelization as possible would, naturally, be ideal. Game logic can be parallelized very nicely ( which hopefully this program shows! ) and I'm betting that collision detection can be as well. Is having a certain number of worker threads, each assigned a bunch of entities to loop over, better? ( as suggested earlier ) Or maybe something more interesting can be done to better take advantage of what's available.

Getting as much *performance* as possible is ideal, which is a different goal than getting as much parallelization as possible.

floWenoL fucked around with this message at 11:17 on Sep 15, 2008

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...
Exposing parallelism is good, but only a means to improving utilization -- the efficiency with which you use the silicon on the processor.

Kimani posted:

Avenging Dentist's examples were supercomputer related. So I figured I'd note that I'm familiar with MPI. There are some other things I've written that are threaded.
He says you need to synchronize two objects to compare them, so I should put them together in the same thread. I say I put representations of both objects in a data structure on the heap, so any thread can access both objects, so you don't need to synchronize them.
Opening up the debugger and freezing it, with 299 threads active, only 18 entities are running. That's, what, 6%? The rest are asleep. Each will probably send a couple of messages each time they're awake, but it's not like with a 1024-core processor every core will be sending a message at the same time. I do understand, I guess; it just might not be that bad.

What do you think would be a better solution? Getting as much parallelization as possible would, naturally, be ideal. Game logic can be parallelized very nicely ( which hopefully this program shows! ) and I'm betting that collision detection can be as well. Is having a certain number of worker threads, each assigned a bunch of entities to loop over, better? ( as suggested earlier ) Or maybe something more interesting can be done to better take advantage of what's available.

You can actually get a good deal of parallelism and reduce the amount of synchronization needed by "double-buffering" your game state -- having a read-only version of the current world state which all your "threads" (compute tasks mapped to your N worker threads) read from to do their processing, and a write-only buffer to which the threads write their output. After a processing frame, swap the two buffers' modes, so read becomes write and vice versa.

Of course, then your worker threads need to have little (ideally no) overlap in their write output ranges to eliminate contention. This means not relying on intermediate computations from other worker threads, which will require doing some redundant computation in each of your worker tasks; however, this will likely still be more efficient than the overhead from locking or incomplete hardware utilization.
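
A bare-bones sketch of that double-buffering scheme, with WorldState as a placeholder for whatever snapshot the tasks consume:

code:
struct WorldState { /* positions, velocities, ... */ };

static WorldState stateA, stateB;
static WorldState* readState  = &stateA; // every task reads from here
static WorldState* writeState = &stateB; // each task writes only its own slice

void EndOfFrameSwap()
{
    // Called once, after all worker tasks for the frame have completed.
    WorldState* tmp = readState;
    readState  = writeState;
    writeState = tmp;
}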

akadajet
Sep 14, 2003

Bad idea or not, I'm just impressed that you went all the way through with it.

Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

I'd like to build a small card game

At first I thought about just doing it in Visual Basic, but then I'd have EXE files which don't email well. I guess I could always just host them on a website and make people download them...

Then I thought, maybe I should do it on the web? I don't have any modern web experience so it might be fun to learn.

Any suggestions on what to do it in? I'd basically like to do a point/click interface with graphical cards. A suggestion of a pre-built card library would rock as well.

Bob Morales fucked around with this message at 14:46 on Sep 15, 2008

tyrelhill
Jul 30, 2006

tyrelhill posted:

Anyone know about a leak detector for COM objects, specifically DirectX objects in C/C++?

Anyone? I can't find anything!

Professor Science
Mar 8, 2006
diplodocus + mortarboard = party

Nuke Mexico posted:

Exposing parallelism is good, but only a means to improving utilization -- the efficiency with which you use the silicon on the processor.
well, you can have plenty of utilization without increasing performance--just spinlock during synchronization


OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!

Kimani posted:

Opening up the debugger and freezing it, with 299 threads active, only 18 entities are running. That's, what, 6%? The rest are asleep.
:psyduck:

quote:

What do you think would be a better solution?
Make things task-based, have a thread count based on the processor count, digest the task queue with those threads.

quote:

Probably not. The "Super-Multi Deep-pipelined core" Intel was trying to push 5 years or so ago is pretty much dead, as programmers are having trouble designing individual programs that effectively utilize more than 2 cores (games aside) and the perf/dollar just isn't there. Thus, the move to GPU computing/Larrabee.
That and the fact that the things that are sucking up the most CPU right now would benefit far more from a fast vector processor than a CPU designed for erratic branch-heavy computation.
