Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
hey mom its 420
May 12, 2007

Yeah, go either with Python, Perl or Ruby. Check out some code examples to see which syntax you like the best. Their text parsing capabilities I'd say are on par, although Ruby and Perl have regexes build right into the syntax, but Python is good with regexes too.
Also I think Python would have a minimum learning curve. Also Ruby's main thing is a lot of stuff you probably won't really need for text parsing, like closures and metaprogramming. And Perl is known for not being able to read your own code after 5 minutes of writing it.

Adbot
ADBOT LOVES YOU

hey mom its 420
May 12, 2007

I am currently running GHCi on Windows, the version is 6.8.2 and I seem to be missing some functions that I find around in various Haskell references. For example when I try using nub, sort, or groupBy it says that they're not in scope. It looks like about 25% of functions are missing. Am I using an older version or do I have to import some modules for those functions to work or what?

hey mom its 420
May 12, 2007

I'm doing something for my physics class and it involves a particle in a box problem in 3 dimensions. I have a function that generates further quantum states based on a starting state and a depth.
code:
def quantum_states(start, depth):
    if depth > 0:
        yield start
        for i, x in enumerate(start):
            start_list = list(start)
            start_list[i] += 1
            for state in quantum_states(tuple(start_list), depth-1):
                yield state
So an example of its usage would be
code:
>>> list(quantum_states((1,1,1),2))
[(1, 1, 1), (2, 1, 1), (1, 2, 1), (1, 1, 2)]
>>> list(quantum_states((1,1,1),3))
[(1, 1, 1), (2, 1, 1), (3, 1, 1), (2, 2, 1), (2, 1, 2), 
(1, 2, 1), (2, 2, 1), (1, 3, 1), (1, 2, 2), (1, 1, 2), (2, 1, 2), (1, 2, 2), (1, 1, 3)]
It works as it is, but you have to supply it a depth it's going to traverse. It works much like depth first traversal in a tree.
What I'd like it to do is to generate an unlimited amount of states, not just the amount that is inferred from depth. However, if I change it to this
code:
def quantum_states(start):
    yield start
    for i, x in enumerate(start):
        start_list = list(start)
        start_list[i] += 1
        for state in quantum_states(tuple(start_list)):
            yield state
and then try to get the first 10 states, I get this
code:
>>> list(x for x,_ in zip(quantum_states((1,1,1)), xrange(10)))
[(1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 1, 1), (5, 1, 1), (6, 1, 1), 
(7, 1, 1), (8, 1, 1), (9, 1, 1), (10, 1, 1)]
Which makes sense, because without an edge condition, it just keeps going along the left branch forever. And that doesn't produce very useful results.

So how could I modify this so that it gives me an infinite sequence but doesn't just keep going along the left branch?

hey mom its 420
May 12, 2007

Thanks man, both those pieces of code work great!
I'm gonna write this thing in Haskell too, just for the heck of it.

hey mom its 420
May 12, 2007

Unless the requirements of your problem explicitly state several conditions on which different things occur, there's almost always a more elegant way than using big if else trees or big switch statements.

hey mom its 420
May 12, 2007

I doubt you'd find any noticeable performance difference either way.

hey mom its 420
May 12, 2007

Yeah, usually it's better if you tell us what you're actually trying to achieve overall (for instance: I want to find all the Sundays that fall on even dates or something) instead of asking how to solve your problem in a very specific way which just might be wrong. JavaScript has pretty decent date handling functions for messing around with the calendar and such.

hey mom its 420
May 12, 2007

Maybe you can give it an onEnterFrame event where it checks if the button is in its roll-over state but the mouse is not on it. If that's the case, then fire the roll-off event.

hey mom its 420
May 12, 2007

You don't actually have to fill out forms. All it has to do is make a HTTP POST request that contains all the same data that a request made from the form would. Use the Firefox extension live http headers to see what the POST request looks like when you fill out and send that form and to which URL it is made. Then just make a POST request to the same URL with the same POST data from your Java code.
As to how you make a POST request from Java, I wouldn't know exactly.

hey mom its 420
May 12, 2007

Triple Tech posted:

I was told this was a stupid idea because it didn't relate to state at all. Like, the functions don't necessarily change the concept, or anything else for that manner (again, functional, stateless).

I'm trying to see if there's like an OOP rule of thumb that will tell me who was right here.

e: spelling

Eh, I wouldn't worry about that. Some people say instance methods should only be the ones that change state, but I think it's OK for them to be instance methods if they just depend on the state or the identity of the object.

hey mom its 420
May 12, 2007

Debugging Haskell kind of really goes against the grain of strongly typed functional programming. Haskell programs are pure and have no concept of state. They're an equation instead of a series of steps. So if your function typechecks but is still giving you wrong results, try thinking and reasoning about your function a bit more. If that doesn't work, go into GHCI and try to transform some input into your desired output and then see what you did along the way.

hey mom its 420
May 12, 2007

Here's a little question for all you y'all. I'm doing a data structures assignment for school and I have to implement a system of two main roads that have roads crossing them. I also have to implement the shortest path search from town 1 to town 2. Here's an example:

I've implemented this in Python. Here's how the path search algorithm works:
I keep two shortest paths, left_path and right_path. They both start out as empty lists. I also keep left_price and right_price, which both start out as 0. And then, I see what the shortest path from the start to the next left node is. In the start, that could either be to take the left road directly (incur a cost of 10) or to take to right road and then cross over, which would incure a cost of 80. Obviously it's better to take the direct road. So new_left_price is now left_price + 10 and new_left_path is ['a']. Then I see what the shortest path to the next right node would be. Obviously, it's better to take the 10 road and then cross over for 30 instead of going over the road that costs 50. So new_right_price now becomes left_price + 10 + 30 and new_right_path becomes left_path + 'a' + 'r'. And this goes on for all the sections until I reach the end and then I just see which of the two possible paths is cheaper.

It works, but I think this is O(n2). If I was just reporting the shortest path cost, it would be O(n), but the thing is that I have to make copies of the left_path and right_path lists because at any moment the right path can become the left path plus something.

This problem could be easily avoided if I were using a purely functional language because with laziness and referential transparency you basically get persistence for free, but my assistant prof told be I can't do it in Haskell because he doesn't know wtf about Haskell.

So, does anyone have any hints or suggestions for making this something better than O(n2)?

hey mom its 420 fucked around with this message at 12:31 on Nov 24, 2008

hey mom its 420
May 12, 2007

rjmccall posted:

You can create immutable data structures in non-functional languages. The language might not guarantee that it's immutable, but it's easy enough to just. not. mutate it.
Well that's what I'm doing here, instead of mutating the list with the path, I'm copying it. It's just that when you have a lazy language, you usually get persistence for free, so you don't incur time penalties for making a copy.

I implemented the search with Dijkstra by using the algorithm description from Wikipedia. It works, only the constant factor is much bigger than the one in my own algorithm and it's still O(n^2) because I didn't use a priority queue to store the unoptimized vertices but a normal list. Because I'm a bit confused when it comes to using a priority queue to store the unoptimized vertices. Here's the pseudocode from Wikipedia:
code:
  function Dijkstra(Graph, source):
      for each vertex v in Graph:           // Initializations
          dist[v] := infinity               // Unknown distance function from source to v
          previous[v] := undefined          // Previous node in optimal path from source
      dist[source] := 0                     // Distance from source to source
      Q := the set of all nodes in Graph    // All nodes in the graph are unoptimized - thus are in Q
      while Q is not empty:                 // The main loop
          u := node in Q with smallest dist[]
          remove u from Q
          for each neighbor v of u:         // where v has not yet been removed from Q.
              alt := dist[u] + dist_between(u, v)       // be careful in 1st step - dist[u] is infinity yet
              if alt < dist[v]              // Relax (u,v)
                  dist[v] := alt
                  previous[v] := u
      return previous[]
If I use a binary heap for Q, I put them in by their distance from the source. That's all cool, but when relaxing (u,v), the distance for a vertice is changed and then suddenly the binary heap doesn't satisfy the heap property. If I rebuild the heap after relaxing (u,v), well that increases the complexity so much that it doesn't make sense to use a heap at all.

I did that with my own binary heap implementation, which takes a projection function, so I do something like heap.push(Q, vertex, key=lambda x:dist[x]) and u = heap.pop(Q, key=lambda x:dist[x])

hey mom its 420 fucked around with this message at 16:03 on Nov 25, 2008

hey mom its 420
May 12, 2007

Uh, wait, but I'm supposed to modify the distence of v, not u. u is the priority element and modifying it is log n, but I'm supposed to modify v, which is the neighbour of the priority element and I don't know where that element is in the priority queue.

hey mom its 420
May 12, 2007

Haha, nevermind, I actually managed to make a solution in O(n) time. Can someone confirm that this is O(n)?
code:
    def path(self):
        left_price, right_price = 0, 0
        previous = {}
        current = current_left, current_right = self.start
        while (current_left, current_right) != self.end: # O(n)
            _, a_len, next_left = current_left.next
            _, b_len, next_right = current_right.next
            _, c_len, _ = next_right.sideways
            if a_len + left_price <= b_len + c_len + right_price:
                new_left_price = left_price + a_len
                previous[next_left] = current_left, current_left.next
            else:
                new_left_price = right_price + b_len + c_len
                previous[next_left] = next_right, next_right.sideways
                previous[next_right] = current_right, current_right.next
            if b_len + right_price <= a_len + c_len + left_price:
                new_right_price = right_price + b_len
                previous[next_right] = current_right, current_right.next
            else:
                new_right_price = left_price + a_len + c_len
                previous[next_right] = next_left, next_left.sideways
                previous[next_left] = current_left, current_left.next
            right_price, left_price = new_right_price, new_left_price
            current_left, current_right = next_left, next_right
        end_left, end_right = self.end
        if left_price <= right_price:
            return extract_path(end_left, previous)
        else:
            return extract_path(end_right, previous)
Basically instead of storing the paths in a list and copying them, I store them in a dict where the key is a vertex and the value of that key corresponds to a tuple of (vertex, edge) which points to the key, so with that dict I can run extract_path on it to follow the best path. So it's sort of like a Dijkstra, only modified for this and I'm pretty sure it's O(n)

EDIT: I timed this, and it's a massive speed improvement over the previous version and the time looks like it's linear. w00t!

hey mom its 420 fucked around with this message at 18:56 on Nov 25, 2008

hey mom its 420
May 12, 2007

Hey, I wrote an AVL tree implementation in C (btw comments welcome, I think it's pretty dashing) and I'm having some problems. When I manually insert these words by doing insert(root, "word") and then manually remove them by remove(&root, "word"), everything seems to work fine. For example, the root is relevailles, the height ends up at 4 and then when I remove them in order, the height ends up at 0 because all the elements are removed and root is a null pointer.

However, when I try to insert from a file like in the source and then iterate over that file to remove words, I get strange results, because after removing all the elements, I get that the final height is 1, when it should be 0. I suck at doing I/O in C and I recon there's a problem in the way I read the lines from the file. Anyone know what I'm doing wrong?

hey mom its 420 fucked around with this message at 18:32 on Dec 6, 2008

hey mom its 420
May 12, 2007

Hmm, tried checking for feof after the call to fgets and breaking out then, but no change.

EDIT: Ah, got it, when doing malloc, I wasn't reserving space for the null termination byte. When I do malloc(strlen(buf) + 1)), it seems to work fine. I found another problem though, ugh. When I load up a file with 1,5M words that are in order and load it up, I get a reported tree height of 21, which sounds good because log2(1,5M) is just under 21. However, if I shuffle the lines in that file and try to insert them, I get a reported tree height of 25, which seems way too much. Can anyone see any problems on a quick glance with the code?

EDIT: UGHH!!! I'm so stupid, nevermind, it didn't come to my stupid head that a 1,5M node AVL tree with a height of 25 can still be balanced.

hey mom its 420 fucked around with this message at 20:26 on Dec 6, 2008

hey mom its 420
May 12, 2007

I'm going to also use a trie for storing words of the alphabet and searching through them, etc. What's the best way to store the edges in a node? If I use an array that I increase and decrease in size, I save space but I have to traverse the edges to find the next node. If I just use an array of 256 fields (one for each character in ASCII), I can just index them to get the next node but 256 fields times 8 for each (char, node pointer) pair is 2Kb. I could also use a hash table but I don't know if that's worth it.

hey mom its 420
May 12, 2007

Cool, I just implemented the trie by using a field which increases and decreases in size but I have to traverse when seeing which edge to follow. It performs alright for inserting, looking up and then deleting 1.5M words, but not as well as the AVL tree and hash table. I'll try doing some branching to see how fast it is if I switch to a hash table for storing edges or a field that I can index in O(1) time.

I've done an analysis of the input files and I've found that they contain 78 different ASCII characters, so that shouldn't be a problem.

Later I'll have to add the ability to find the n nearest words for any word by Levenshtein distance and I have a feeling tries will give good performance there. AVL trees will probably be alright for that too but I have no idea if it's even possible to implement that efficiently if I'm storing the words in a hash table.

hey mom its 420
May 12, 2007

narbsy posted:

That's pretty cool. There's another self-balancing tree that's faster than AVL called the DSW algorithm. It's different in the sense that it doesn't balance on each insert, delete but rather waits and then turns the tree into a "vine" (a linked list), and then re-forms the tree by unrolling the "vine". It is superb.

I always meant to implement a trie but never did :(
That's pretty cool stuff, I'll check out the paper, maybe even implement it for a bonus point in my class.
Just for fun I also went and implemented tries in Haskell, took me about 10 minutes, Haskell owns.
code:
data Trie a = Trie { isEnd :: Bool, getChildren :: [(a, Trie a)] }

trieInsert :: (Eq a) => [a] -> Trie a -> Trie a
trieInsert [] (Trie _ children) = Trie True children
trieInsert (x:xs) (Trie empty children) = 
    let (hit, rest) = partition ((==x) . fst) children in
    case hit of [(u, t)] -> Trie empty ((u, trieInsert xs t):rest)
                [] -> Trie empty ((x, trieInsert xs (Trie False [])):rest)

trieLookup :: (Eq a) =>  [a] -> Trie a -> Bool
trieLookup [] (Trie end _) = end
trieLookup (x:xs) (Trie end children) = 
    case lookup x children of Nothing -> False
                              Just nextTrie -> trieLookup xs nextTrie

trieDelete :: (Eq a) =>  [a] -> Trie a -> Trie a
trieDelete [] (Trie _ children) = Trie False children
trieDelete (x:xs) (Trie end children) = 
    let (hit, rest) = partition ((==x) . fst) children in
    case hit of [] -> Trie end children
                [(u, t)] -> let forwardDeleted = trieDelete xs t
                            in  if (null $ getChildren forwardDeleted) && (not $ isEnd forwardDeleted)
                                then Trie end rest
                                else Trie end ((u, trieDelete xs t):rest)

hey mom its 420
May 12, 2007

Hey, it's me again. I have implemented an AVL tree, a hash table and a trie for storing words. The source is here. I've also implemented a function that calculates the Levenshtein distance (similarity) of two words.

Now the last thing I have to implement here is that given a word W and a number k, I have to find the k most similar words to W. I have to implement this for each of the three data structures.

The naive and most obvious way of doing that is just to iterate over every word in the data structure and determine how similar it is to W. Then, sort those based on their similarity to W and return the top k results. If we assume the time complexity for determining the similarity of two words to be O(1), then the time complexity for finding the k most similar words is O(n) + M, where M is the time complexity for the sort, so in the end that's just M.

Do you think there's any way I can leverage the data structures to get better time complexity for that operation? For the hash table, well, obviously not, but for the AVL tree and trie I suspect there might be a way but I can't think of anything.

hey mom its 420
May 12, 2007

I need to see how long a piece of C code needs to run. What's the most reliable way to time it? I found this method:
code:
clock_t t1, t2;
float ratio;
ratio = 1./CLOCKS_PER_SEC;
// Record the start time:
t1 = clock();
/*************************
Run a segment of Code
*************************/
// Record the end time:
t2 = clock();
// Print out the difference, converting it to seconds:
printf("Time = %f", ratio*(long)t1 + ratio*(long)t2 );
and it seems to work, only I don't get why. There's a + there at the end. Shouldn't it be a -, because a time delta is t2 - t1?

EDIT: I can't use the unix time program because I have to time my program at 4 different sections (three of which take a really short time and one takes quite a long time) and I also have to disregard the time it takes to do some preliminary I/O (loading up a file into memory)

hey mom its 420 fucked around with this message at 23:26 on Jan 7, 2009

hey mom its 420
May 12, 2007

TSDK posted:

The only reason I can think for it apprearing to work, is that you're timing from very near the start of the program, when t1 is near to zero.

But yeah you're right, the time delta should be t2-t1.

Also the casting and pseudo-optimisation of turning one divide into two multiplies is a bit dumb. For quick and dirty, I generally use something like:
code:
clock_t start = clock();
// STUFF
clock_t stop = clock();
printf( "Time = %f ms\n", (float)( stop - start ) / ( CLOCKS_PER_SEC / 1000 ) );

Thanks, that works great! I don't even care for the exact timing, I'm just trying to see if the running time increases according to the algorithm time complexity.

hey mom its 420
May 12, 2007

Short answer: No
Long answer: Nooooooooooooo!

hey mom its 420
May 12, 2007

DB abstraction layer? If you know what you're going to put in it and how it's going to work, it doesn't matter what you call it.

EDIT: You beat me with your edit. AAA!

hey mom its 420
May 12, 2007

tef posted:

regardless of what question you ask, if it mentions 'text editing' and 'linux' you will get a few responses saying "vim" or "emacs"

these are probably not what you want.
Wow, what a substantiated claim. Those are probably exactly what you want.

hey mom its 420
May 12, 2007

Anyone know where to start if I want to create my own VST plugins? I know it will probably take me a hell of a long time before I make something useful, but I'd still like to get into it.

hey mom its 420
May 12, 2007

I would start it at 3 o'clock but then instead of going counter-clockwise, I'd go clockwise, because the y axis is flipped. If you start at 0 (that's 3 o'clock) and then turn by 45° and go forward, you should be going into a positive y and a positive x.

hey mom its 420
May 12, 2007

Okay guys but can you write fizz buzz? :smug:

hey mom its 420
May 12, 2007

It's just that people usually find it easier to help you with your problems if you present them in a wider context instead of just letting them see them through a small hole by witholding as much information about them as possible.

hey mom its 420
May 12, 2007

I'd say go with Haskell, it's purely functional and really fun to boot. Here's a list of tutorials, also drop by #haskell on freenode and ask questions if you don't get stuff, it's a really friendly channel.

hey mom its 420
May 12, 2007

You can probably hold a big list of the feeds that your users are subscribed to and then poll those at regular intervals and store them in a cache of some sorts. Only pull it from your cache when the user requests it i.e. lazily. Actually, you don't need to poll feeds that have more than a few subscribers, because requests for popular feeds are done often enough, so you can just cache them when users request them.

Also, take advantage of HTTP etags and Last-Modified + If-Modified-Since headers. No need to download what ain't new.

hey mom its 420 fucked around with this message at 01:14 on Jul 8, 2009

hey mom its 420
May 12, 2007

I'm just learning prolog, so say you have:
code:
beat(ali, frazier).
beat(ali, foreman).
beat(ali, liston).
beat(frazier, ali).
beat(foreman, frazier).
how do get prolog to tell you, say, how many people Ali has beaten?

hey mom its 420
May 12, 2007

Sup mate!

If you have a binary tree and a list like [Left,Right,Left,Left,Left] that tells you where you are in the tree i.e. the current node, and you want to modify it, you have to traverse the whole list before getting to your desired node and modifying it, so that's O(n) (or O(log n) if you consider n to be the number of elements in the tree and the tree is balanced). If you want to move around the tree a lot, you have to start from the root for each movement and traverse that list of Left and Right, so you'll be doing a lot of O(n) operations.

If you have a zipper, moving around is O(1) and so is modifying the element under focus. Instead of explaining zippers, I'll point to this excellent article about them: http://www.haskell.org/haskellwiki/Zipper also this new blog post about them is cool too: http://blog.ezyang.com/2010/04/you-could-have-invented-zippers/

hey mom its 420
May 12, 2007

I have a question about Visual C++ 2008 Express. My project files were generated by cmake. When I go to Build -> Build Solution everything builds nicely. But since I do most of the editing in gvim anyway, I'd rather build my project from the command line instead of having Visual C++ open just to press F7. Is there an easy way to see how Visual C++ is doing the building so that I can do the same in the command line?

hey mom its 420
May 12, 2007

Isilkor posted:

You can run devenv <solution> /Build <configuration> to achieve that. (Or use msbuild, which is the preferred way of building now, but I'm not sure if that was available in 2008.)
Where do I find this devenv?

hey mom its 420
May 12, 2007

Mine is Express, I don't have devenv in that folder :qq:

hey mom its 420
May 12, 2007

I'm thinking of doing a side project that gives shop owners the ability to set up a queue and print a QR code or something, and then their customers scan it (or enter a short link manually) and they get a number, like at the butcher shop or something. that way you can have people wait in your shop without crowding together and spreading rona all over the place.

i've seen some services that offer that solution, but they all look super corporate and costly to set up. my idea is to just have it be a simple website that's free for the shop keepers, requires minimal setup and works on all phones, so you don't need an app to get in the queue and you don't need to sign up.

how do I go about representing the queue and what kind of db to use for it? i was thinking of using firebase realtime db, but it looks like they charge up the wazoo once you get good usage. i could also just use normal postgresql and implement the pubsub manually with websockets. ideally the people waiting in line always want to have realtime info about where they are in the queue and how fast it's going.

hey mom its 420
May 12, 2007

hmm, so if I'm understanding correctly, you'd store the db completely on the client of the store guy in an IndexedDB? how reliable is that?

i was thinking you could also set up so the queue owner who's running his part of the website can also queue people manually and give them a printout, in case they don't know how/want to fiddle with phones and digital queues.

Adbot
ADBOT LOVES YOU

hey mom its 420
May 12, 2007

Anyone have any experience with EventStoreDB and event sourcing in general? I just recently learned about the concept and am intrigued.

I almost implemented this pattern a couple of times when my customers required tracking of events in a business sense and I added additional tables for that with JSON fields but it never occured to me to derive the complete state from that witha left fold

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply