|
PilotEvan posted:Okay I have a pretty specific question. Say you had a huge amount of data with a tree-like relationship, maybe for example a list of all species names (and their respective genera, families, etc.). And say you wanted to make queries like, "Output all children (and grandchildren and so on) of the class Mammalia" (or phylum Arthropoda or whatever the hell). What kind of data structure would you want for that? I'm completely inexperienced with hash tables but I'm pretty sure that's the right answer. A straight-up tree data structure seems like it might make traversing and outputting the data simpler/faster, but after doing some research it looks like it might be better to do some kind of hash list or even a hash tree to have pointers(?) to a child's parents or adjacent siblings (assuming there is ordering between siblings), with an added bonus of finding the starting node I want to begin with in a reasonable amount of time. If there aren't any changes made to the data set whatsoever, it makes me think that some augmented hash table is probably the way to go, but I'm still unsure. If hash tables do the trick I'm open for any book recommendations or other references that might put me on the right track.

Well, one of the big things you want is the path to the origin. With something like a list of species it might make sense to force the user to specify the path to the origin, but if you can't do that, it might make sense to build a hash set that you can use to find a reference to the origin node. Once you have the origin node you just recurse through the children. The big question is what your constraints are. If you're storage limited, that's a different problem than if you need it to be as fast as possible.
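A minimal sketch of the "user specifies the path" option — the nested-dict representation and the taxonomy names here are made up for illustration:

```python
# Hypothetical sketch: the tree is nested dicts of {name: subtree},
# and the user supplies the full path from the root, so finding the
# origin node is just a walk down the dict levels.
def find_by_path(tree, path):
    node = tree
    for name in path:
        node = node[name]        # raises KeyError if the path is wrong
    return node

def all_descendants(node):
    """Recurse through the children: names of every node below this one."""
    for name, subtree in node.items():
        yield name
        yield from all_descendants(subtree)
```

With, say, `tree = {"Animalia": {"Chordata": {"Mammalia": {"Primates": {}}}}}`, `find_by_path(tree, ["Animalia", "Chordata", "Mammalia"])` returns the `{"Primates": {}}` subtree, and recursing from there lists everything below it.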
|
# ? Sep 17, 2013 04:27 |
|
|
|
If there aren't any changes after initialization, and if the dataset is small enough that you can keep a couple of pointers for each node in memory, my (completely uneducated) approach would be
Warning: if there's even the slightest chance that somewhere down the road, users will want to add stuff to the structure, go with a hash table/tree/whatever. coffeetable fucked around with this message at 05:08 on Sep 17, 2013 |
# ? Sep 17, 2013 05:02 |
|
PilotEvan posted:"Output all children (and grandchildren and so on) of the class Mammalia" (or phylum Arthropoda or whatever the hell). What kind of data structure would you want for that?

For the tree itself, the data structure you want is called a rose tree, i.e. a tree with an arbitrary number of children at each node. Each node contains an element and a list (a singly linked list will do fine) of subtrees. (Leaf nodes simply have an empty list of children.) Once you have a node in the tree you can recurse over its children easily.

Getting to a tree node, unless you are navigating the tree from the top, is a separate problem, for which you want an index — i.e. a table or something that lets you find a tree node from whatever kind of query you give it. If you want actual Google-like search, you will have to learn how search indexes work. If you are content with exact string matching on the name of the node, a hash table is indeed what you want. (A hash table is always good if you want to map arbitrary strings to arbitrary values.) So you would map string names to tree nodes. Done.
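A sketch of the rose tree plus name index described here, in Python for brevity — a plain dict stands in for the hash table, and all the taxon names are invented:

```python
# Rose tree: each node holds an element plus a list of subtrees.
# Leaf nodes simply have an empty child list.
class RoseTree:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

def index_by_name(root):
    """One traversal builds the exact-match index: name -> tree node."""
    table = {root.name: root}
    for child in root.children:
        table.update(index_by_name(child))
    return table

def subtree_names(node):
    """Recurse over the children, collecting every descendant's name."""
    names = []
    for child in node.children:
        names.append(child.name)
        names.extend(subtree_names(child))
    return names
```

A query is then two steps: look the string up in the index, then recurse over the children of the hit.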
|
# ? Sep 17, 2013 11:17 |
|
But he's using %d anyway, on what platforms does that print more than, like, 2^31? There's plenty of space, clearly.
|
# ? Sep 17, 2013 11:39 |
|
Suspicious Dish posted:The smallest 64-bit signed integer takes up 20 characters, including the negative sign. So unless you have some magical 65-bit signed integers, I'm not seeing the overflow. Oh poo poo, I was typing in 2^64. Well never mind. I guess %d was the real problem.
|
# ? Sep 17, 2013 12:16 |
|
Yeah, it's clearly badly written code and the '2^64+1' part makes no sense, but at least that isn't exploitable bad code.
|
# ? Sep 17, 2013 13:27 |
|
What's he supposed to do in ANSI C tho
|
# ? Sep 17, 2013 17:29 |
|
I feel really dumb for not being able to figure this out, but how the hell do I elegantly compare two periodic numbers (specifically angles) for equality? The inputs can be any real numbers, I don't get to restrict to positive numbers or numbers between 0 and 2 pi or anything.
|
# ? Sep 17, 2013 17:52 |
|
mod them down to be between 0 and 2pi, subtract, divide by one of the inputs to get the relative error and compare the result to an epsilon. Check both against 0.0 to make sure you won't get NaN at some point. Real math won't ever be 'elegant'.
|
# ? Sep 17, 2013 17:59 |
|
You could modulo them to both be between 0 and 2 pi before comparing them, I think. Look up how your language's modulo operation works on negative numbers or w/e but you should be able to get angles normalized like that.
|
# ? Sep 17, 2013 17:59 |
|
Vanadium posted:You could modulo them to both be between 0 and 2 pi before comparing them, I think. Look up how your language's modulo operation works on negative numbers or w/e but you should be able to get angles normalized like that. Rather, lots of languages have a remainder operator and no modulo operator. ((X % m) + m) % m should get you where you wanna go. e: actually maybe I'm dumb and drawing a distinction that doesn't exist. Either way, if you want the positive modulo number that your lecturer would make you turn in on an exam, then you can use the above formula. Seashell Salesman fucked around with this message at 18:21 on Sep 17, 2013 |
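For concreteness, here's the formula from the post in Python. Python's own `%` already acts as a true modulo for a positive modulus, so `math.fmod` — which mimics a C-style remainder and keeps the dividend's sign — is used to show why the double-wrap trick is needed:

```python
import math

# math.fmod keeps the sign of the dividend, like C's remainder, so a
# negative angle comes out negative; wrapping twice fixes that:
# ((x % m) + m) % m, written with fmod.
def positive_mod(x, m):
    return math.fmod(math.fmod(x, m) + m, m)
```

`positive_mod(-math.pi / 2, 2 * math.pi)` lands at `3*pi/2`, where a bare `fmod` would return `-pi/2`.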
# ? Sep 17, 2013 18:06 |
|
e: whoops nm
|
# ? Sep 17, 2013 18:21 |
|
Is anyone else going through the exercises on hackermeter.com? A friend posted the link a while back on facebook (he doesn't appear to be the one who wrote/manages it) and I've found it kind of interesting. My job hasn't had a lot of challenging problems in the last few years and it's refreshing to work on some. I'm currently working through the dynamic programming problems. It's probably useless for its intended purpose, since it isn't widely known, but it is kind of fun.
|
# ? Sep 17, 2013 18:22 |
|
JawnV6 posted:mod them down to be between 0 and 2pi, subtract, divide by one of the inputs to get the relative error and compare the result to an epsilon. Check both against 0.0 to make sure you won't get NaN at some point. Won't this not work when comparing 2pi - (small number) and 2pi + (small number)? Their difference is almost 2pi.
|
# ? Sep 17, 2013 18:29 |
|
Test again in the range of pi to 3pi and OR the results?
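That works; an equivalent single-comparison version is to take the shorter of the two arc distances after normalizing, so that 2π − ε and 2π + ε come out as close. A hedged sketch:

```python
import math

def angles_close(a, b, eps=1e-9):
    """Compare two angles for equality, tolerant of wrap-around."""
    two_pi = 2 * math.pi
    d = abs(a - b) % two_pi          # normalized difference in [0, 2*pi)
    return min(d, two_pi - d) < eps  # shorter way around the circle
```

The choice of an absolute epsilon here is a simplification; for very large inputs the float error of the subtraction itself grows, so a relative tolerance may still be needed.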
|
# ? Sep 17, 2013 19:10 |
|
Sinestro posted:You wouldn't want to grab all the data and then futz with it. Use a search algorithm to find the item you are starting from, then recurse over the children of the item.

Well say I had all the data in a .txt file using the format "Root(Child1(GC1,GC2)Child2(...". Would straight-up searching through that be efficient enough or is there some other way I should hold the data?

NovemberMike posted:Well, one of the big things you want is the path to the origin. With something like a list of species it might make sense to force the user to specify the path to the origin, but if you can't do that then it might make sense to make a hash set that you can use to find a reference to the origin node. Once you have the origin node you just recurse through the children.

I would assume storage is limited and that time isn't a huge deal. Assuming I hold the data using the method above, having the user specify the path seems reasonable... Navigating from the top you would essentially just have to count parentheses, right? I'm not sure how efficient that is but it would work. There's probably a better way to do it that I'm not seeing though.

coffeetable posted:If there aren't any changes after initialization, and if the dataset is small enough that you can keep a couple of pointers for each node in memory, my (completely uneducated) approach would be

There won't be any changes whatsoever, but I'm thinking storage constraints would make having the entire tree in memory too difficult unfortunately.

double riveting posted:When you say this, are you thinking of having the node whose subtree you want as a string (i.e. user typed it or something), or are you getting the node directly, say by the user clicking through the tree?

Both ways of finding the node seem like something I might want to implement. Given space constraints, clicking through the tree from the top and building the tree layer by layer seems like a good idea, but again I'm not sure exactly how fast finding adjacent children will be using the storage method I described above, even though time isn't a huge constraint... There's probably a better way to hold the data but I'm not aware of one... Exact string matching seems like something I would be happy with. Since there won't ever be any changes, would a perfect hash function be something I could do? I mean, I could simply list every string in the code to be mapped, but there are better ways to do that, right?
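Counting parentheses amounts to writing a tiny recursive-descent parser for that format. A sketch against the "Root(Child1(GC1,GC2)Child2)" shape quoted above — commas between siblings are treated as optional, since the example mixes both styles; this is illustrative, not a robust file format:

```python
# Hand-rolled recursive-descent parser for "Root(Child1(GC1,GC2)Child2)".
# Returns nested (name, [children]) tuples.
def parse_tree(text):
    pos = 0

    def parse_node():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1                       # scan the node's name
        name = text[start:pos]
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1                       # consume '('
            while text[pos] != ")":
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1               # commas between siblings are optional
            pos += 1                       # consume ')'
        return (name, children)

    return parse_node()
```

Parsing the whole file once at startup and keeping the resulting structure in memory avoids ever scanning the raw text per query.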
|
# ? Sep 17, 2013 22:42 |
|
PilotEvan posted:There won't be any changes whatsoever, but I'm thinking storage constraints would make having the entire tree in memory too difficult unfortunately.

Let's take the species name example. If we limit each token in the name to say 64 bytes, and we assume you use 8 byte addresses to other nodes, you can fit ~10 million nodes in a gigabyte. You're seriously playing with orders of magnitude more data than that? Or are the tokens/keys much bigger than 64 bytes? How many levels from the root would you have to descend before each individual subtree could fit in memory? How many levels could you descend before the tree above that level wouldn't fit in memory?

e: poo poo gets very complicated when you try to optimize data structures that involve calls to the hard drive, because they're insanely expensive and random access assumptions go out the window. We really need to know more about the structure of the data you're working with (like the distribution of the node degrees), the kinds of queries, the frequency of each kind of query, and the hardware that you're dealing with.

coffeetable fucked around with this message at 23:39 on Sep 17, 2013 |
# ? Sep 17, 2013 23:34 |
|
coffeetable posted:
The space constraint I'm worried about comes from:

1. I'm aiming to develop for tablets.
2. The search terms won't be long, but the data might be larger than 64 bytes, since it will include several web addresses. There will be trimming here and there depending on what I decide to include.
3. I'm planning on using Unity3D, and there will be 3D rendering going on meanwhile (though I don't think it will be intense; this is the constraint I'm mostly unsure about since I haven't done 3D design before, not to mention on a tablet).

The species example is quite exaggerated but there are tens of thousands of leaves (the intermediate node count is unknown to me at the moment). Queries will be infrequent thankfully - minutes between - but I don't want the user to wait more than a few seconds for results. Again, unfortunately the 3D stuff meanwhile makes this a huge gray area and I'm not sure how it will turn out exactly.
|
# ? Sep 18, 2013 01:09 |
Okay, so say you have about 50k nodes on a 32-bit system, requiring 3 pointers per node, but round it up to 4. (One in the parent pointing to it, one in the node pointing to the parent, and some auxiliary.) That's 800 KB for the base tree structure. You could then spend two extra pointers per node to point into the raw source data where the text for each node can be found. That adds the storage required for the raw data. All in all, a few megabytes at most for the live structure. That should be okay for mobile device use too. If you have the option to store the data however you like and it won't be user generated, you can instead develop a file format that lets you do direct lookup in it. Augment the data in whatever convenient way to make lookups faster.
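The 800 KB figure checks out; as a back-of-envelope verification:

```python
# Numbers from the post: 50k nodes, four 32-bit (4-byte) pointers each.
nodes = 50_000
bytes_per_node = 4 * 4
base_structure = nodes * bytes_per_node
print(base_structure)  # 800000 bytes, i.e. ~800 KB
```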
|
|
# ? Sep 18, 2013 01:23 |
|
I'd just loving slurp the file on start and forget about the disk. As for the format, how about this:code:
Might be easiest to hand-roll a parser for.
|
# ? Sep 18, 2013 01:59 |
|
PilotEvan posted:The space constraint I'm worried about comes from: 1. I'm aiming to develop for tablets.

Normalize the data, throw it into a SQLite database, and write relational queries on the data in a non-rendering thread?
|
# ? Sep 18, 2013 02:48 |
|
Hughlander posted:Normalize the data, throw it into a SQLite database and write relational queries on the data in a non rendering thread ? This is a much better idea than inventing your own plain-text file format and parser. It also separates your business logic from the data backend in case you ever want to switch the two out independently of one another.
|
# ? Sep 18, 2013 03:00 |
|
PostgreSQL supports recursive common table expressions which can be used to perform hierarchical searches. Oracle also supports it with CONNECT BY syntax. edit: Nevermind, thought this would be running server-side.
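For what it's worth, SQLite also supports recursive common table expressions (from version 3.8.3 on), so the SQLite suggestion above covers the hierarchical queries too. A sketch with an invented adjacency-list schema and taxon names:

```python
import sqlite3

# Hypothetical schema: each row points at its parent (NULL for the root).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxon (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        parent_id INTEGER REFERENCES taxon(id)
    );
    INSERT INTO taxon VALUES
        (1, 'Animalia', NULL),
        (2, 'Chordata', 1),
        (3, 'Mammalia', 2),
        (4, 'Primates', 3),
        (5, 'Arthropoda', 1);
""")

def descendants(conn, name):
    """All children, grandchildren, and so on of the named node."""
    rows = conn.execute("""
        WITH RECURSIVE sub(id) AS (
            SELECT id FROM taxon
             WHERE parent_id = (SELECT id FROM taxon WHERE name = ?)
            UNION ALL
            SELECT t.id FROM taxon t JOIN sub ON t.parent_id = sub.id
        )
        SELECT name FROM taxon WHERE id IN (SELECT id FROM sub)
    """, (name,))
    return sorted(r[0] for r in rows)
```

The recursive CTE seeds with the node's direct children, then repeatedly joins to pull in each level below, so the whole subtree comes back from one query.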
|
# ? Sep 18, 2013 03:19 |
|
PilotEvan posted:The species example is quite exaggerated but there's 10's of thousands of leaves (the intermediate node count is unknown to me at the moment). Queries will be slow thankfully - minutes between - but I don't want the user to wait more than a few seconds for results. Again, unfortunately the 3D stuff meanwhile makes this a huge gray area and I'm not sure how it will turn out exactly. <100k nodes with the output being fed straight to the user? Christ, you can structure it however you want and it'll return instantly from the user's perspective. Hand-crafting a data structure for it would be serious premature optimization. Like Hughlander said, just use a SQLite database. Also, the megabyte or so it'd take to hold it in memory is tiny even by phone standards, so go nuts. And don't worry about graphics performance - all that stuff will be carted off to the GPU core of the chip. It won't even notice a handful of database queries.
|
# ? Sep 18, 2013 08:51 |
|
Hughlander posted:Normalize the data, throw it into a SQLite database and write relational queries on the data in a non rendering thread ? I was prematurely guessing this was probably for some academic stuff where every extra dependency would just annoy others and RAM is free.
|
# ? Sep 18, 2013 11:02 |
|
Everyone posted:SQLite That sounds pretty much perfect. Thanks for the help guys! Yall are awesome.
|
# ? Sep 18, 2013 20:54 |
|
Not sure if this is a "programming" question or a math question, but since this is algorithmic-complexity related, I figured this place is closer. Everybody knows the flash game Cookie Clicker, right? What is the algorithmic complexity of optimizing the total cash earned at a given time x in that game? If only I could remember what I learned about P and NP problems back in college.
|
# ? Sep 19, 2013 06:45 |
|
I don't know the game, and I'm probably not the only one. Please explain it for us. In particular, how do you measure the size of a given instance of the puzzle?
|
# ? Sep 19, 2013 07:14 |
|
ultrafilter posted:I don't know the game, and I'm probably not the only one. Please explain it for us. It's not a puzzle. Also it's a free web game that you can play and generally was a massive massive hit for reasons nobody really can understand yet. I'm wondering if you could make a formula for potential money at any given time then differentiate to get the maximum?
|
# ? Sep 19, 2013 07:33 |
|
Cookie Clicker is a basic optimization game, where the objective is to generate cookies. The most basic way of generating cookies is clicking on the giant cookie, where each click gives you one cookie. With cookies, you can buy contraptions that generate more cookies over time. Every object generates infinite cookies, but the pricing and cookie generation rate varies. Here's a link to the games thread.

The theoretical naive approach would be an algorithm that takes a set of options and constructs a tree where each node on the search graph is an action during a 1-second payout rate, and simply does the max-cost graph search. You can see that the size of this graph would depend on the number of options P and the maximum time to search. Given a constant time T and a setup of arbitrary complexity P, I suppose that approach would yield a complexity of P^T, which is pretty awful. The reason I ask is that this reminds me of the classic knapsack problem, so my naive solution might be wrong. Is there a canonical form of this class of problem, and what is its computational complexity?
|
# ? Sep 19, 2013 07:42 |
|
i suppose some contraptions require other contraptions to be built first? and possibly also consume other contraptions to build?
|
# ? Sep 19, 2013 12:05 |
double riveting posted:i suppose some contraptions require other contraptions to be built first? and possibly also consume other contraptions to build?

There are certain upgrades that only become available once you have a certain amount of something else (e.g. a second "factories double production" upgrade is available when you have 50 factories), but the only resource used to purchase things is cookies. I.e. there is a tech tree and one resource. Writing some sort of simulation taking "steps" (one step being one purchase) might work, simulating multiple branches at once, and using a heuristic to give up on a branch at some point.
|
|
# ? Sep 19, 2013 12:12 |
|
WarpedLichen posted:What is the algorithmic complexity of optimizing the total cash earned at a given time x in that game? If only I could remember what I learned about P and NP problems back in college. It takes constant time (O(1)). Numbers in Cookie Clicker are all 64 bits at most, so the number of cookies is bounded. All we need to do is enumerate every possible configuration of purchases (of which there are a constant number 'cause of the 64 bit thing), and then simulate each until it's accumulated 2^64 - 1 cookies (which takes a constant amount of time). We now have an answer for the maximal number of cookies that can be generated at each unit time, and we just print whichever value is asked for.
|
# ? Sep 19, 2013 13:01 |
|
coffeetable posted:It takes constant time (O(1)). Numbers in Cookie Clicker are all 64 bits at most, so the number of cookies is bounded. All we need to do is enumerate every possible configuration of purchases (of which there are a constant number 'cause of the 64 bit thing), and then simulate each until it's accumulated 2^64 - 1 cookies (which takes a constant amount of time). We now have an answer for the maximal number of cookies that can be generated at each unit time, and we just print whichever value is asked for. Sounds reasonable. Get to it!
|
# ? Sep 19, 2013 13:05 |
|
double riveting posted:Sounds reasonable. Get to it!

You didn't ask for a reasonable answer. Slightly less smart-arse: it sounds like a prime suspect for dynamic programming, because an optimal solution for time T should contain the optimal solutions for many earlier times too. I'm too busy to think about it in-depth, but you should start by throwing away status effects and upgrades and everything and just considering cursors alone. How would you calculate the optimal number of cookies if your only choice at each timestep was whether or not to buy a cursor?

e: What distinguishes this from knapsack is that in knapsack items are arbitrarily sized and there are arbitrarily many types of them. Here the items are of fixed size and there are a fixed number of types.

coffeetable fucked around with this message at 13:20 on Sep 19, 2013 |
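A toy version of that cursor-only exercise as dynamic programming. All the numbers are invented (not the game's real tables); the state collapses to "cursors owned", since for a fixed cursor count a bigger bank is never worse:

```python
def best_bank(timesteps, base_cps=1.0, cursor_cps=0.5,
              base_cost=15.0, growth=1.15):
    """Most cookies bankable in `timesteps` seconds when the only
    decision each step is: harvest, or buy one more cursor."""
    cost = lambda n: base_cost * growth ** n   # price of the (n+1)th cursor
    cps = lambda n: base_cps + n * cursor_cps
    best = {0: 0.0}                            # cursors owned -> best bank
    for _ in range(timesteps):
        nxt = {}
        for n, bank in best.items():
            # option 1: just harvest this step
            nxt[n] = max(nxt.get(n, -1.0), bank + cps(n))
            # option 2: buy a cursor, then harvest at the new rate
            if bank >= cost(n):
                nxt[n + 1] = max(nxt.get(n + 1, -1.0),
                                 bank - cost(n) + cps(n + 1))
        best = nxt
    return max(best.values())
```

With these made-up numbers a cursor doesn't pay for itself inside 40 steps, but over longer horizons buying wins — which is exactly the structure the DP has to capture.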
# ? Sep 19, 2013 13:17 |
|
At every step you have to decide: do I spend some cookies to increase my output, or do I keep harvesting cookies to buy more output later? Each output has a reward (increase in cps) and a cost (# of cookies required to buy, which is really a function of the current cps rate). It's kinda Markov in that your available cookies go to zero once you buy output. Also the costs change each iteration (i.e. after a purchase), so what was optimal at one stage will change. What's optimal at time t is independent of the answer at time t-1 in some sense. Not having thought about it too much, you would basically weigh up your options, e.g. I can buy a cursor for 15 now which will increase my cps by y, or wait x seconds based on my current cps and buy a grandma for whatever, which will increase my cps by some other value. So if you're just getting started you're better off buying cursors, but at some point you will be better off waiting a bit longer to buy a grandma. You would just evaluate all your options after each purchase of output.
|
# ? Sep 19, 2013 13:51 |
|
You'd have to weigh up what's faster: waiting to buy a grandma, or buying n cursors and then buying a grandma. If you can calculate the time of those (fairly easily) then the lowest time should in theory be the optimal approach. You could repeat that down the chain I think, and rule out the lower ones when you've gotten enough cp/s to afford the next step up in < 1s or something.
|
# ? Sep 19, 2013 14:22 |
|
Let us know when your Cookie Clicker TAS comes out, WarpedLichen!
|
# ? Sep 19, 2013 14:28 |
|
You can look at it starting from 0 cookies each time. Your current cps is x. The present value of every possible purchase is the increase in cps it brings and some function of the time in seconds it would take to get enough cookies to buy it. So if you're starting the game fresh, a factory would have a huge increase in cps but take friggin ages = high cost, while a cursor has a relatively low cps increase but a low time cost, so would be optimal. At some point you get enough cursors that the time taken to get a grandma and the cps increase that brings outweighs the value of a cursor. The nominal amount of cookies isn't the real cost; it's the time taken to generate the cookies to make the next purchase. You could also evaluate all the bonus items you could buy with a similar pov.
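That valuation — time to afford plus time for the purchase to repay itself — fits in a few lines. A sketch with invented prices and cps gains (not the game's real numbers):

```python
def next_purchase(bank, cps, options):
    """Pick the option minimizing: seconds until affordable plus
    seconds for the purchase to repay its own price.
    options: list of (name, price, cps_gain) tuples."""
    def score(price, gain):
        wait = max(0.0, (price - bank) / cps)  # time to save up
        payback = price / gain                 # time to recoup the price
        return wait + payback
    return min(options, key=lambda o: score(o[1], o[2]))

# Illustrative shop, not Cookie Clicker's actual tables.
SHOP = [("cursor", 15, 0.1), ("grandma", 100, 1.0), ("factory", 3000, 40.0)]
```

Early on (low cps) the cursor's low wait time dominates; as cps grows, the bigger items win out, matching the intuition in the post.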
|
# ? Sep 19, 2013 14:32 |
|
|
|
The optimal decision is to get back to work on whatever you are procrastinating about.
|
# ? Sep 19, 2013 14:32 |