|
Comparison: My first attempt at a two-pass implementation; except it is only the first pass. I need to integrate the Weight Union-Find library I found to work with how I implemented the algorithm and I'm not sure the best approach yet; the library I found works with ints and not sure how to use it with node-like structures but I'll see. Since its the first pass I assume the current trippy appearance is due to not merging the equivalence labels yet; and I hope there's no weird bug in there somewhere. I think largely whats happening is pixels that are part of a larger landmass that occur sooner by being more south and east than some later pixel that is north and west is going to result in the first pass a split coloured area. Good news is doing this has a time of approximately 28 seconds; which is about 20 seconds faster than I was expecting it to be to scan both water and land zones. I even managed to figure out some nifty optimizations, like being able to avoid bounds checks, or even bother with checking for labels. Code: code:
So I think once I finish this off I'll next look into multithreading or GPU parallelization which is tricky as this algorithm is heavily dependent on the previous state of the mapping.
|
# ? Oct 25, 2021 07:38 |
|
|
# ? May 25, 2024 10:19 |
|
sorry this probably gets asked all the time but does anyone have a good Unity course they would recommend? I already know a bit of C# so ideally one that is a course on Unity rather than an introduction to programming in general. I started "Complete C# Unity Game Developer 2D" on udemy and the guy is very pleasant but it's just too slow (like he explains where the parentheses keys are on the keyboard level of slow). I want to make a strategy game hehe
|
# ? Oct 25, 2021 12:16 |
|
fuf posted:sorry this probably gets asked all the time but does anyone have a good Unity course they would recommend? Unity has some tutorials on their site which are quite handy. I like the FPS microgame one.
|
# ? Oct 25, 2021 12:45 |
|
Raenir Salazar posted:So I think once I finish this off I'll next look into multithreading or GPU parallelization which is tricky as this algorithm is heavily dependent on the previous state of the mapping. 28 seconds seems nuts for a flood filling. Perhaps it would be faster if you performed it in a structure that's less complicated that whatever structure you're actually using? Like if you first convert whatever representation your 2D grid is into a single flat array of bytes that are good enough to signify whatever the criteria are for edges of the flood fill, then perform the flood fill in your bytes, then map the outcome back onto your original structure. I don't know what's going on under the hood in your case, but flood-filling sections of even a 4000x4000 grid shouldn't be the work of tens of seconds on a modern CPU.
|
# ? Oct 25, 2021 13:15 |
|
Mr Shiny Pants posted:Unity has some tutorials on their site which are quite handy. I like the FPS microgame one. oh cool thanks, yeah those "pathways" on their site look pretty good
|
# ? Oct 25, 2021 14:36 |
|
A quadtree might be a useful data structure for this problem.
|
# ? Oct 25, 2021 16:27 |
|
roomforthetuna posted:Since you're flood-filling multiple regions, you could make each region take a thread? I mean, its 33,554,432 pixels for a 8192 by 4096 texture. I feel like that's gotta be large enough that no matter how modern the CPU, working within the Unity editor play mode its probably going to struggle? I went and commented out SetPixel(...) and it was 19 seconds. Fake edit, I went and stripped as much as I could so its only populated a 2D int array with labels and a int[2][2] array with neighbours which gets re-written. code:
With 16 threads I imagine this takes 2 seconds which is probably fine, but I think the reason why it seems slow is the Unity editor is just on a single thread? Without multithreading it isn't like its using the full power of my machine. Bongo Bill posted:A quadtree might be a useful data structure for this problem. Hrm, assuming you're talking to me, a quadtree is useful if you're trying to determine if certain features are close to other features. I imagine a quadtree would be useful at some point for my purposes because I'm going to be doing lots of checks to know if certain land masses are close to other landmasses and that would speed things up along with pathfinding algorithms like A* to check connectivity but I don't think its useful here where what I want to do is have each region water or land, be processed, and have the number of pixels comprising it and their coordinates. I think mathematically it doesn't really work, because instead of iterating over each pixel twice for two pass; each pixel then I think has to 16 or so iterations through the quadtree to see who the neighbours are? I don't think that works. e2: Maybe the Colour32 comparisons are slow? I converted a texture2D to a Colour32 native array outside my stopwatch wrapper which I figured was faster than Texture2D.GetPixel(...); maybe because the rgba values are floats that's slower? I doubt it though. Raenir Salazar fucked around with this message at 16:58 on Oct 25, 2021 |
# ? Oct 25, 2021 16:50 |
|
What type is map there, and does the type of thing it contains implement an overload of Equals that takes the specific type instead of object? If not, that might be boxing on every call to Equals. 18 seconds does seem slow to me, and that's my only theory as to why it is at the moment after looking at that code. e: The Unity docs don't mention implementations of Equals or the == operator for either of the color types, which I think would mean it's doing a slow thing there (a bit hazy on the specifics of the default implementation of Equals for structs, but pretty sure it would have to box). Of course, the docs could just be missing that info. chglcu fucked around with this message at 17:20 on Oct 25, 2021 |
# ? Oct 25, 2021 17:03 |
|
chglcu posted:What type is map there, and does the type of thing it contains implement an overload of Equals that takes the specific type instead of object? If not, that might be boxing on every call to Equals. 18 seconds does seem slow to me, and that's my only theory as to why it is at the moment after looking at that code. The name of the native array is "map" that is not the type. The .Equals is on Colour32 types. I realized I can actually get a byte[] array using the GetRawTextureData method which might be more performant but that leaves me with a tricky problem that I'm not sure about the representation. A byte array means for a RGBA32 format pixel for say a 16 by 8 texture with no mipmaps is a 16 by 8 by 4 array of bytes according to the docs. So do I do this: code:
code:
code:
code:
|
# ? Oct 25, 2021 17:42 |
|
I'm curious what would happen if you add this:code:
code:
code:
|
# ? Oct 25, 2021 18:02 |
|
Lol, that reduces it to 6 seconds. Wtf Unity. Helps I actually can also just skip the Alpha channel since they should never be anything other than full.
|
# ? Oct 25, 2021 18:10 |
|
Raenir Salazar posted:Lol, that reduces it to 6 seconds. Wtf Unity. The difference in the generated IL code, in case you're curious: Calling .Equals: code:
code:
|
# ? Oct 25, 2021 18:15 |
|
That's extremely interesting; there's a lot happening under the hood I often don't really get to appreciate until situations like these. I'll amend my earlier statement to just "lol". I did some more stop watches, my program also seems to have an initial 3-4 second lag since I save some textures to the hard drive, and added together is approximately how long it takes from pressing the play button to the editor unfreezing itself. Thanks for that, that saves me from having to do finnicky coding with bytes. e: For the record its apparently 0.006s to generate two render textures from the GPU for the initial heat map; so presumably its like 0.003s per. Or 3ms. I assume thats good. Raenir Salazar fucked around with this message at 18:34 on Oct 25, 2021 |
# ? Oct 25, 2021 18:22 |
|
You can probably just check the red channel for determining what's land/water.
|
# ? Oct 25, 2021 23:57 |
|
It's going to be pretty minor compared to silently allocating memory every single time you call Equals, but I also noticed you're iterating through your label array like:code:
A modern CPU can execute billions of instructions per second - with merely 33 million pixels to look at, I'd expect you could get times comfortably under a second with just a single thread if you can avoid the CPU stalling because it needs to access memory instead of local caches.
|
# ? Oct 26, 2021 00:25 |
|
Jabor posted:It's going to be pretty minor compared to silently allocating memory every single time you call Equals, but I also noticed you're iterating through your label array like: Are you suggesting to make it a 1D array? I'm not super well versed on how it works to access memory instead of local caches etc.
|
# ? Oct 26, 2021 00:41 |
|
Basically, accessing main memory is really really slow. Hundreds of clock cycles slow. Instead of waiting for main memory all the time, the processor will have a local cache that can be accessed much more quickly. This cache is organized as a series of 64-byte cache lines, and it will contain both lines that have been recently accessed, and sometimes lines that a (rather simple) prefetcher thinks will be used next. When you're doing a bulk operation, where you do the exact same thing to very many bits of data, you really really want things to be arranged so that you access memory in a linear sequence - that way you access memory that's already cached as much as possible, and when you step from one cache line to the immediate subsequent line, that's probably already been prefetched. Using a 1d array is one way to make it clear whether you're accessing the elements linearly, or if you're making big jumps between widely-spaced elements. But you can do it with a 2d array as well - the implementation of multidimensional arrays is pretty straightforward.
|
# ? Oct 26, 2021 01:04 |
|
Jabor posted:It's going to be pretty minor compared to silently allocating memory every single time you call Equals, but I also noticed you're iterating through your label array like: Nice catch (or cache, I suppose). I completely missed that. Yeah, that should help a bit. I’ll let Jabor explain fully since they found it, but the short version is that if you swap your i and j indices into labelMap, you’ll be accessing memory sequentially which is much more performant due to how CPUs handle reading from RAM and the layout of a 2d array in memory. CPU cache 2d array locality might be good search terms for the details.
|
# ? Oct 26, 2021 01:06 |
|
I *think* I did it, but didn't seem to be noticably faster. Clocking in total around 8.5s. My currently revised code: code:
|
# ? Oct 26, 2021 01:33 |
|
Now you've got the opposite problem - you're jumping all over map instead of accessing it in a linear fashion! You need to transpose labelMap so that it gets walked in the same way as map - with y positions first and x positions second - then you'll be able to set up your loops so that both arrays are always accessing adjacent elements.
|
# ? Oct 26, 2021 01:53 |
|
Jabor posted:Now you've got the opposite problem - you're jumping all over map instead of accessing it in a linear fashion! The idea of transposing is kinda breaking my brain though, like is my "0,0" still supposed to be the bottom left or technically is it supposed to be the bottom right? e: Would this be easier if I unrolled my 2D array into a 1D array so the entries matched up? Lets suppose I transpose my original 2D array like so: code:
For something like this? code:
code:
Raenir Salazar fucked around with this message at 03:10 on Oct 26, 2021 |
# ? Oct 26, 2021 02:53 |
|
0,0 is still the top left. Right now 0,1 is the leftmost column, one pixel down from the top, while 1,0 is top row, one pixel in from the left. You want it to be the other way around, so 0,1 is top row, one pixel in. This will make it so that map[i*width + j] and labelMap[i, j] correspond to the same position.
|
# ? Oct 26, 2021 03:06 |
|
That was fun to read from my last post, where I started to doubt myself in the face of "but it's a shitload of pixels so tens of seconds is fine" then other people chimed in and got it to be a simpler structure and now it's under ten seconds like I thought it should be. Hooray! I think done right it should be achievable in under 2 seconds (on one thread, even less if the work could be divided sensibly), but I'm not really sure what the final purpose of the data is so maybe it has to end in a less efficient structure. For most purposes if it was me I'd make the primary representation a flat array of 4096*8192 bytes, each byte being a "label" index, do most things based on that, and only convert to other formats on demand. You could offload converting a 32MB index array to a 128MB 32-bit texture to the GPU even, if it's a conversion you might need to do frequently.
|
# ? Oct 26, 2021 03:13 |
|
Jabor posted:0,0 is still the top left. Hrm, IIRC in Unity3D for texture coordinates the origin 0,0 is Bottom Left no? So when I import my data from my Texture2D its 0,0 assumes bottom left? I think its been a couple of times now I said bottom left and you said top left and I want to clear this up in case it matters. e: Unity documentation says lower left corner for "texture coordinates" https://docs.unity3d.com/ScriptReference/Texture2D.GetPixel.html but I think this is still unclear as to whether that also means "pixel coordinates" vs "UV coordinates". e2: https://answers.unity.com/questions/810616/which-corner-is-the-origin-for-getpixel.html the guy here also says bottom left; most people at the Unity forums seem to say bottom left. roomforthetuna posted:That was fun to read from my last post, where I started to doubt myself in the face of "but it's a shitload of pixels so tens of seconds is fine" then other people chimed in and got it to be a simpler structure and now it's under ten seconds like I thought it should be. Hooray! The purpose is to create a procedural world map from a height map; so I take my height map and then use a cut off to determine whats land and whats water; but I want to take the land tiles and carve out things like provinces/boundaries etc so I need to know information like all pixels that connect to each other to form separated land masses and similarly for water. My first goal is to use this to make a procedural world map generator for Paradox games so the water information is important because many inland water zones are un-interactable. I greatly appreciate all the help in trying to get this to operate in a reasonable time without resorting to the GPU. Raenir Salazar fucked around with this message at 03:27 on Oct 26, 2021 |
# ? Oct 26, 2021 03:17 |
|
Raenir Salazar posted:Hrm, IIRC in Unity3D for texture coordinates the origin 0,0 is Bottom Left no? So when I import my data from my Texture2D its 0,0 assumes bottom left? I think its been a couple of times now I said bottom left and you said top left and I want to clear this up in case it matters. I'm not familiar with Unity3D specifically, you're probably correct here. It doesn't really matter in this case though, the important thing is that the coordinates should be the same for both your labelMap array and your imported texture data. In the code you've presented they're swapped around (labelMap[j, i] corresponds to the map pixel map[i*width + j]), you need to un-swap them.
|
# ? Oct 26, 2021 03:28 |
|
Hrm, I'm still having a little trouble wrapping my head around as there were some revisions between postings of code but here's my updated code and I switched the 2D labels array to a 1D array and it shaved off another 2 seconds. This revision swaps i and j to x and y to hopefully be more clear. code:
e to add: I'm basically at 5.8s now when it used to be 8.8s; which is pretty good. e2: I also changed ColourEquals to only compare the red channel, which strangely didn't net very much. Raenir Salazar fucked around with this message at 04:43 on Oct 26, 2021 |
# ? Oct 26, 2021 04:35 |
|
Yup, that sounds about right. The way you have it is correct - note that immediately after processing, say, element 2037, you then proceed on to element 2038, which will be right next to element 2037 in both arrays. That gives you very good locality. I think further optimizations are possible, but you'd need to change the structure you're outputting (but at a guess, the "bag of coordinates" you're currently outputting is going to be difficult to do anything performant with downstream anyway...). My first thought would be that the whole "finding which label is smaller when there are two possibilities" part is totally redundant - if you're going to be using UnionFind to merge the labels together, it doesn't actually matter which one you picked for this node. So you could use an algorithm like: code:
e: I guess swap "above" with "below" for your use case, I know I'm going to keep messing that up so I won't even try to fix it. Jabor fucked around with this message at 05:06 on Oct 26, 2021 |
# ? Oct 26, 2021 04:57 |
|
I'll give that a look tomorrow! Someone in a discord I'm in suggesting Vectorizing my loops to take advantage of SIMD instructions; would that gain any benefit? I was doing some googling but nothing I could easily implement came up. e: A clever solution I think might just be not to do the "bag collection" of pixels here at all; just record all the label groups and then move the pixel collecting to the GPU where I can just assign all the pixels to the relevant arrays and return them somehow or use ParallelFor since its a simpler problem space by separating them? Raenir Salazar fucked around with this message at 05:24 on Oct 26, 2021 |
# ? Oct 26, 2021 05:08 |
|
Generally speaking, vectorizing with 4-way SIMD instructions will improve performance somewhere between "not at all" and a 4x improvement depending on the problem. I'm not too sure it'll help in this case - critically, the label you assign to the fourth value you look at is going to depend on what label gets assigned to the preceding three values - that dependency is going to heavily limit how much benefit you can get from vectorizing. It's more useful for totally-parallel operations where the output from one element only depends on a defined set of inputs, and not on the output from other elements. You could perhaps look at a diagonal stripe all at once - say, process (1,4),(2,3),(3,2) and (4,1) as one unit, followed by (1,5),(2,4),(3,3) and (4,2), that way all of the outputs each one depends on will have already been written. You'd need to write some special-case code to handle the start and end of each row. It's questionable whether that would actually improve things though - if your existing solution was constrained by memory bandwidth instead of CPU cycles, it probably wouldn't. If you really wanted more performance, you could learn about using profiling tools to identify what parts of your code are the bottlenecks where improving them would net the biggest gains.
|
# ? Oct 26, 2021 05:34 |
|
I went and did what you suggested plus I removed the code were I added Coords to a List and I'm down to 1.8s! e code: code:
Raenir Salazar fucked around with this message at 05:40 on Oct 26, 2021 |
# ? Oct 26, 2021 05:37 |
|
What to replace the bag collection with is going to depend a lot on what you intend to do with this output. For a lot of use cases, a single point within the region is equally as useful as the entire bag of points, because what the consuming code really wants to do is something like find all the boundary points, or work with a 2d bitmask of in vs. out points, or something like that. But it really depends on what you want to do.
|
# ? Oct 26, 2021 05:39 |
|
Raenir Salazar posted:What I could do now is ParallelFor those two initial setup loops and maybe investigate a pre-allocated int array for SimpleLabels and maybe this is as good as it gets? What are you trying to do with SimpleLabels? It looks like it's just an array of numbers that counts up to whatever currentLabel is at?
|
# ? Oct 26, 2021 05:49 |
|
Jabor posted:What to replace the bag collection with is going to depend a lot on what you intend to do with this output. For a lot of use cases, a single point within the region is equally as useful as the entire bag of points, because what the consuming code really wants to do is something like find all the boundary points, or work with a 2d bitmask of in vs. out points, or something like that. But it really depends on what you want to do. Figuring things out like the size of the regions is a little important because some of the "islands" are going to be really tiny and need to be consolidated together with other tiny islands as being the "same" province or with the nearest larger landmass province and be "merged" with them. (Merging being that they get coloured the same unique colour as the rest of the province they are a part of for province 'selection' purposes) I dunno off hand if I want information like all the pixels that form the outer edge of a area but I might. One important thing and partly why I went to this trouble is also so I can randomly select only land pixels or only water pixels for generating delaney triangulation points to form a voronoi map to create provinces and I want to be able to deterministically say "Okay I want the map to have 1,000 provinces" and have exactly that, and my initial idea is just selecting 1,000 random land pixels distributed by area size to proportionally distribute them out. Assuming the average province is 20,000 pixels in area and I know an island has 2,000,000 pixels they should have on average 100 provinces and each area I can determine their share of the total number of provinces and assign nearly exactly that many. And islands that are too small but are close together would be a special case to consolidate them together as 1 province instead of assigning 1 triangulation point to a blob of merely 30 pixels which might be too small to even click. TheLastManStanding posted:What are you trying to do with SimpleLabels? It looks like it's just an array of numbers that counts up to whatever currentLabel is at? Its essentially list of detected regions; prior to the second pass anyways. e: Now I think I see what you're getting at, as the code currently stands I don't need a list of them; but the second past of the algorithm I think might need them because it involves merging equivalent labels together. Raenir Salazar fucked around with this message at 05:57 on Oct 26, 2021 |
# ? Oct 26, 2021 05:51 |
|
roomforthetuna posted:I think done right it should be achievable in under 2 seconds Raenir Salazar posted:... and I'm down to 1.8s!
|
# ? Oct 26, 2021 12:43 |
|
This is fun, you went from 3 minutes to under 2 seconds
|
# ? Oct 26, 2021 15:07 |
|
Sad news is adding in the UnionFind code and the additional if comparison to record the equivalences adds between 4 to 6s to the total runtime. For reference, code from here: https://www.codeproject.com/Articles/336915/Connected-Component-Labeling-Algorithm code:
code:
Goes from 1.8s, to 3.2s with the additional if check to 5.2s with the Join. Without the != it would be 8.5s or so. My code for the second pass which seems to take 2.2s: code:
|
# ? Oct 28, 2021 04:30 |
|
I’m not seeing the declarations for Labels and labelMap. What types are they? I’m wondering if those index operators might be slow, but hard to tell without knowing what exactly they are. The join itself doesn’t immediately look like it should be that big of an impact to me.
|
# ? Oct 28, 2021 04:44 |
|
chglcu posted:I’m not seeing the declarations for Labels and labelMap. What types are they? I’m wondering if those index operators might be slow, but hard to tell without knowing what exactly they are. The join itself doesn’t immediately look like it should be that big of an impact to me. No problemo. code:
code:
|
# ? Oct 28, 2021 04:47 |
|
Raenir Salazar posted:No problemo. I guess it could be the dictionary lookup. Can that be made an array using your keys as indices? Can’t tell for sure at the moment if the keys it’s using are a contiguous sequence.
|
# ? Oct 28, 2021 04:51 |
|
|
# ? May 25, 2024 10:19 |
|
chglcu posted:I guess it could be the dictionary lookup. Can that be made an array using your keys as indices? Can’t tell for sure at the moment if the keys it’s using are a contiguous sequence. I think they are, making it an array instead actually makes it about 2.2s faster. Largely negating the time of the second pass. Makes it around 5s exactly now as the total runtime. So subtract 2.2s and we're down to 2.8s for the first pass. e: There's an alternate Weighted Quick Union Find out there here but I'm confused as to how I'm supposed to use it. It seems like you instantied the data structure with a pre-set number of "dots" and then can start doing your unions but I don't know if that's supposed to actually correspond to something, like the approximate number of labels? Do I pass in 20,000 and then use it as is? Raenir Salazar fucked around with this message at 05:05 on Oct 28, 2021 |
# ? Oct 28, 2021 04:58 |