|
homercles posted:What happens if you declare your methods static? You're polluting the global namespace for no reason. I'm considering just saying gently caress it and using Perl datatypes. Is there a significant speed penalty in doing so? uG fucked around with this message at 06:32 on Nov 3, 2012 |
# ? Nov 3, 2012 06:18 |
|
|
|
uG posted:I'm having some XS troubles. First, let me present the code: I've never worked with XS so I'm just shooting in the dark here, but it looks like you're going past the end of your arrays in the two "setup scoring matrix" loops.
|
# ? Nov 3, 2012 15:04 |
|
It is not the loops. This can be demonstrated by the if statement inside, which is where it's segfaulting on its first iteration. Removing the if statement (and just letting the code block inside it run every time) results in the code working as intended. So why don't I just take them out? I could, but I want the (struct dictionary) values to be unique. When those if statements are taken out, (struct dictionary) gets stuffed with duplicate values (keys), but later, when I change a specific value, it always iterates to the first occurrence and sets/gets its value. That leaves me with a bunch of junk we never use (which seems sloppy), so I'm not going to just leave it at that. Namespace was a pretty good guess, since a conflicting namespace could potentially only screw it up when compiling with the Perl headers. What I can say is that it's directly related to the linked list (item* head) in the if statement I mentioned above. Removing the Perl headers, the XS prototypes at the bottom, and the cxs_edistance function results in code that, when compiled, returns the expected value (so the C guys I know think I'm crazy). For what it's worth, here is the pure Perl version of the above: https://github.com/ugexe/Text--Levenshtein--Damerau/blob/master/lib/Text/Levenshtein/Damerau/PP.pm It's not complex; it's just an edit distance between strings. The difference between this and the XS above is that we're working with ints instead of chars (to handle different character widths), as demonstrated by the XS wrapper in the .pm: code:
uG fucked around with this message at 18:13 on Nov 3, 2012 |
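The char-to-int mapping described above might look something like this sketch (hypothetical helper name, not the module's actual wrapper): each string becomes a list of integer codepoints, so the C side can compare ints and sidestep multi-byte character-width issues.

```perl
use strict;
use warnings;

# Hypothetical sketch of the kind of wrapper described: map every
# character to its integer codepoint before handing it to the C code.
sub to_codepoints {
    my ($str) = @_;
    return map { ord } split //, $str;
}

my @src = to_codepoints("four");
print "@src\n";    # 102 111 117 114
```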
# ? Nov 3, 2012 18:11 |
|
uG posted:What I can say is that it's directly related to the linked list (item* head) in the if statement I mentioned above. Removing the Perl headers, the XS prototypes at the bottom, and the cxs_edistance function results in code that, when compiled, returns the expected value (so the C guys I know think I'm crazy). code:
|
# ? Nov 3, 2012 20:26 |
|
homercles posted:
There is no call to head->next->next: First call to hash (line 23): code:
|
# ? Nov 3, 2012 20:56 |
|
tonski posted:There is no call to head->next->next:

40: head is malloc'd. The contents of head contain garbage, as it has not been memset, so: head = { next = <garbage>, value = <garbage>, count = <garbage> }
58: hash(head, src[i]) is called. head's contents still have not been initialised.
23: item* iterator = head;
24: while(iterator->next){ // the truth of this condition is undefined. It might be true, might be false; it depends on the memory returned by malloc. We will assume it's true for this example, as that will cause a segfault.
25: if(iterator->value == index){ // undefined. Assume false for this example.
28: iterator = iterator->next; // that is, iterator = head->next. head->next contains garbage, so iterator now holds non-addressable garbage.
23: while(iterator->next){ // this may segfault: we're testing head->next->next, and since head->next was never initialised with a value, attempting to dereference it is undefined behaviour.

homercles fucked around with this message at 21:32 on Nov 3, 2012
# ? Nov 3, 2012 21:19 |
|
Alas, that is not the problem either. FWIW, you can compile this and it will spit out '1': http://pastebin.com/M86yLumM Same code with the Perl headers slapped on, exporting/calling main() (no arguments, the values are hard-coded) from the XS.pm wrapper (instead of xs_edistance), and we segfault in the same spot we've been discussing (which, again, works perfectly fine outside the Perlish environment).
|
# ? Nov 4, 2012 00:37 |
|
That's still assuming the Perl version isn't doing something different with malloc, which I'm not sure it can. (From my brief googling of "xs perl malloc", which mostly just hurt my head, especially one post recommending checking everything, including that the return value of malloc isn't NULL...)
|
# ? Nov 4, 2012 02:17 |
|
uG posted:Alas, that is not the problem either. FWIW, you can compile this and it will spit out '1': http://pastebin.com/M86yLumM The malloc stuff is a problem. It's not the problem, but it's a problem. I ran your code on my machine and it prints 3, not 1, because you've got array corruption too. On my machine, writing to scores[ax+1][ay+1] was changing the value of ay. Here's my version that works and prints 1: http://pastebin.com/uN0dp9DT I changed how push and hash work, and removed item *curr, *iterator from scores. I changed the scores array to int scores[ax+2][ay+2], because you're reading and writing past its bounds. Declaring an array as int x[1] and then reading/writing x[1] is out of bounds, so the array has to be one element larger. The same goes for declaring int scores[ax+1][ay+1] and then reading/writing scores[ax+1][ay+1].
|
# ? Nov 4, 2012 02:50 |
|
You, sir, have saved my week long downward spiral into madness. I can now happily stay the hell away from C for a little while
|
# ? Nov 4, 2012 03:24 |
|
I am trying to save user input as a variable in a Perl script. I already have this part down. The part I need help with is when the input has characters that require escaping. I have a feeling that substitution would help here, but I don't know how to apply it to the variable. How do I handle this? This is how I currently have it acquiring the data: print "Question: "; my $variable = <>; chomp( $variable ); I'd prefer to not call in modules and just use pure Perl for this if possible. Please help out this Perl n00b!
|
# ? Dec 5, 2012 04:37 |
|
Crush posted:I am trying to save user input as a variable in a Perl script. I already have this part down. The part I need help with is when the input has characters that require escaping. How you escape data has absolutely nothing to do with how you acquire a variable, and everything to do with how you intend to use it. Why do you think your input needs escaping? What are you actually trying to do?
|
# ? Dec 5, 2012 05:14 |
|
ShoulderDaemon posted:How you escape data has absolutely nothing to do with how you acquire a variable, and everything to do with how you intend to use it. Why do you think your input needs escaping? What are you actually trying to do? More specifically, I am trying to have the input go into the body of an HTML page. I believe it needs escaping because when I paste something that has parentheses, brackets, etc., it errors out, but when I just have alphanumeric characters, it doesn't error out.
|
# ? Dec 5, 2012 06:26 |
|
Crush posted:More specifically, I am trying to have the input go into the body of an HTML page. I believe it needs escaping because when I paste something that has parentheses, brackets, etc., it errors out, but when I just have alphanumeric characters, it doesn't error out. HTML does need escaping, but neither parentheses nor brackets should cause a problem. It'd be helpful if you posted an actual error rather than just saying "it errors out". To escape for HTML, it's probably easiest to use: Perl code:
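The pure-Perl version of that escaping is a chain of substitutions over the handful of characters HTML cares about; a minimal sketch of the technique (in real code, HTML::Entities' encode_entities() is the usual module answer):

```perl
use strict;
use warnings;

# Minimal HTML escaping by hand - the five characters that matter.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # must come first, or it re-escapes the rest
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&#39;/g;
    return $text;
}

print escape_html(q{<b>"Fish & Chips"</b>}), "\n";
# &lt;b&gt;&quot;Fish &amp; Chips&quot;&lt;/b&gt;
```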
|
# ? Dec 5, 2012 06:41 |
|
Crush posted:I'd prefer to not call in modules and just use pure Perl for this if possible. Please help out this Perl n00b! This defeats the purpose of Perl. If you're uninterested in its deep set of extensively tested libraries, there is pretty much no reason to use Perl for software development. It just becomes a sed/awk replacement in your cron jobs.
|
# ? Dec 8, 2012 17:25 |
|
In addition to that, many modules are pure perl.
|
# ? Dec 8, 2012 20:35 |
|
I wrote a quick and dirty perl script that uses HTTP::Request and LWP::UserAgent to pick out pieces from a page and spit them out to a file. Is there a way to tell these to click into links and then scrape? The problem I'm facing is that I can't recursively go through a URL (something like site.com/page/1) to get to each page, but there are links to each page divided by letters. The pages are all here, so it would need to go to "0-9", then the first link in the <table><tr><td>, and run the scraping part; go up one, click the second link, and run the scraper again; etc. Is there an easy way to do this, or should I go the python route? raej fucked around with this message at 04:10 on Dec 14, 2012
# ? Dec 14, 2012 03:55 |
|
raej posted:Is there a way to tell these to click into links then scrape? Two options come to mind: 1) Add HTML::TreeBuilder to the mix and use that to extract the links you need. Something like this: code:
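Option 1 might look something like this sketch (assumes HTML::TreeBuilder is installed; the HTML is hardcoded here so the example stands alone, where real code would feed in $response->decoded_content from an LWP::UserAgent request, and the link paths are made up):

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hardcoded stand-in for a fetched index page.
my $html = <<'HTML';
<table><tr>
  <td><a href="/brewers/foo/1/">Foo Brewing</a></td>
  <td><a href="/brewers/bar/2/">Bar Ales</a></td>
</tr></table>
HTML

my $tree  = HTML::TreeBuilder->new_from_content($html);
my @links = map { $_->attr('href') } $tree->look_down(_tag => 'a');
$tree->delete;    # TreeBuilder trees must be freed explicitly

# Each href would then be fetched in turn and handed to the scraper.
print "$_\n" for @links;
# /brewers/foo/1/
# /brewers/bar/2/
```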
|
# ? Dec 14, 2012 18:09 |
|
As another option there is Web::Scraper: code:
|
# ? Dec 16, 2012 00:56 |
|
HTML::LinkExtor
|
# ? Dec 16, 2012 19:26 |
|
Too many ways to do it, I'm switching to python!
|
# ? Dec 16, 2012 21:03 |
|
Shoulda just used IO::Pty to run an instance of lynx and feed it some keystrokes, problem solved!
|
# ? Dec 17, 2012 09:23 |
|
Mojo is the one I would use, since it's got both the UA stuff and DOM parsing built in.
|
# ? Dec 17, 2012 15:00 |
|
You could also probably just wget everything first, then do your html scraping locally, and not use any of your dumb scraper suggestions! Yeaaaahhhhh: Bash code:
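The load-bearing trick in a one-liner like that is bash brace expansion, which generates the per-letter index URLs before wget ever runs; a sketch with made-up page names:

```shell
# Brace expansion happens in bash before the command runs: each comma-
# separated alternative in {...} becomes its own URL. (Hypothetical
# page names for illustration.) cmd.exe has no brace expansion and
# passes the braces through literally, which is why the same one-liner
# fails on Windows outside of cygwin.
for url in http://example.com/browse-{0-9,A,B,C}.htm; do
    echo "$url"
done
```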
|
# ? Dec 17, 2012 16:34 |
|
Mario Incandenza posted:Shoulda just used IO::Pty to run an instance of lynx and feed it some keystrokes, problem solved! You laugh, but I had to deploy a rudimentary status page last week by telnetting into a common jump server, telnetting into a specific site server, SSHing into the wireless controller | tee logfile.txt, spewing commands blindly, parsing the results via 'typed-in' one-liners, and FTPing the results back home for further chewing by the script this is all wrapped in. Capturing the output of Net::Telnet's cmd() was failing due to flaky network connections, so we had to capture locally, then parse. And they removed Net::SSH years ago. It was disgusting, sneaky, and made me feel bad as a Perl coder. But when your sole customer says "We need this by X date" and also says "You may not install any software or scripts or make any changes until X date + 6 weeks", you figure out workarounds.
|
# ? Dec 17, 2012 18:57 |
|
Jonny 290 posted:telnetting into a common jump server, telnetting into a specific site server, SSHing into the wireless controller| tee logfile.txt, spew commands blindly, parsing the results via 'typed-in' one-liners, and FTPing the results back home for further chewing by the script this is all wrapped in. rgoldberg.pl
|
# ? Dec 17, 2012 19:09 |
|
Both those examples are awesome. But what I really need to do is crawl those links, then the links on each of those results, and on those pages extract certain portions out. I've written most of the extraction part for each brewery's page, but it's the crawling part I'm having difficulty with.
|
# ? Dec 17, 2012 20:22 |
|
raej posted:Both those examples are awesome. But what I really need to do is crawl those links, then the links on each of those results, and on those pages extract certain portions out. Are you just recursing on each link, maybe with a maximum recursion depth?
|
# ? Dec 17, 2012 21:09 |
|
That's what I'm trying to figure out. From that starting point of http://www.ratebeer.com/BrowseBrewers.asp I'd need to go to each Alphabetic category, then each brewery listed. On each brewery's page is where I'd scrape the data.code:
|
# ? Dec 17, 2012 21:19 |
|
raej posted:That's what I'm trying to figure out. From that starting point of http://www.ratebeer.com/BrowseBrewers.asp I'd need to go to each Alphabetic category, then each brewery listed. On each brewery's page is where I'd scrape the data. With that done you'll have all the data you need to deal with on disk, and you can just recurse through directories and run your extraction code on each index file.
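Once the site is mirrored to disk, the "crawl" collapses into a directory walk with core File::Find; something like this sketch (the directory layout and the extract() hook are hypothetical stand-ins):

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Build a fake two-level mirror so the sketch stands alone; in real
# use, $root would be the directory wget created.
my $root = tempdir(CLEANUP => 1);
for my $brewery (qw(foo bar)) {
    make_path("$root/brewers/$brewery");
    open my $fh, '>', "$root/brewers/$brewery/index.html" or die $!;
    print {$fh} "<html><body>$brewery</body></html>\n";
    close $fh;
}

# Stand-in for the real per-brewery extraction code.
sub extract { my ($path) = @_; print "would scrape: $path\n" }

# Walk the mirror and run the extractor on every index file found.
my @pages;
find(sub { push @pages, $File::Find::name if $_ eq 'index.html' }, $root);
extract($_) for sort @pages;
```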
|
# ? Dec 18, 2012 06:07 |
|
That's not a bad idea at all. I tried running that with wget.exe, but I got an exception on the curly braces and I have no Linux box :-/
|
# ? Dec 18, 2012 17:58 |
|
Ah, I tested it out in the cygwin terminal on Windows and it worked, so if that's something you're willing to install (base system + wget), it'll work for you. It'll mean you then have cygwin as a dependency for your project, which might not be something you want, though it shouldn't interfere with anything.
|
# ? Dec 18, 2012 18:09 |
|
code:
|
# ? Dec 18, 2012 21:10 |
|
Happy birthday perl
|
# ? Dec 19, 2012 03:35 |
|
tef posted:Happy birthday perl I love Perl, and screw all the computer scientists who look down their noses at me.
|
# ? Dec 19, 2012 13:45 |
|
So I wrote a script to feed AMQP feeds and raw log files into a Vertica database. It does this by parsing a text string, getting the relevant fields, building a new list of strings in comma-separated format, then, once it hits a set number of list items, blasting the list into a CSV file and making a system call to the binary responsible for bulk loading that file into the DB. Once that's confirmed as loaded, the list is cleared and the process continues. It functions in this roundabout way due to a number of limitations in the bulk-load methods provided by Vertica - the method I use is the only way I can see to bulk load without requiring a database user with excessive privileges.

The problem I've come across now is that the script seems to become very sluggish after processing a few million entries, with users reporting some oddly high CPU and memory usage. My guess is that constantly filling and flushing the list of entries is filling up memory - I'm not sure how good Perl is with garbage collection, so perhaps when the list is 'cleared', the memory it was taking up isn't actually being freed or reused. Is there anything I can do to confirm this, and does anybody know a better method to avoid such problems?
|
# ? Jan 13, 2013 03:15 |
|
Rohaq posted:So I wrote a script to feed AMQP feeds and raw log files into a Vertica database. It does this by parsing a text string, getting the relevant fields, building a new list of strings in comma-separated format, then, once it hits a set number of list items, blasting the list into a CSV file and making a system call to the binary responsible for bulk loading that file into the DB. Once that's confirmed as loaded, the list is cleared and the process continues. Perl's GC is "okay" - it's reference counting, so it can't handle cyclic references at all. If you have an array containing a bunch of references and then add a reference to the array to itself, perl will never be able to collect it. High CPU usage doesn't really suggest that, though (in my experience at least); you might want to look into Devel::Gladiator and a few others.
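The cyclic-reference failure mode is easy to demonstrate with the core Scalar::Util module; a minimal sketch, unrelated to the actual script under discussion:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# A cycle: the array holds a strong reference to itself, so its refcount
# never reaches zero and perl's refcounting GC can never reclaim it.
my $leak = [1, 2, 3];
push @$leak, $leak;
my $watch_leak = $leak;
weaken($watch_leak);    # observer only; doesn't keep the array alive
undef $leak;
print defined $watch_leak ? "cycle leaked\n" : "cycle freed\n";
# cycle leaked

# Same shape, but with the inner link weakened the cycle collapses:
my $ok = [1, 2, 3];
push @$ok, $ok;
weaken($ok->[3]);       # inner link no longer counts toward the refcount
my $watch_ok = $ok;
weaken($watch_ok);
undef $ok;
print defined $watch_ok ? "weakened leaked\n" : "weakened freed\n";
# weakened freed
```

As for the buffer itself: @list = () does release the elements for reuse by perl, but the interpreter may not hand that memory back to the OS, which can make a long-running process look larger than its live data.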
|
# ? Jan 13, 2013 06:05 |
|
I started learning Perl a few months ago for work, and I'd like to think I've gotten a pretty decent handle on things. It's been pretty fun so far - I've been picking things up as I code and going through Programming Perl at my leisure... One thing I'm not sure I totally get is subroutine prototypes. From my understanding, they're really only useful if you want to be able to call a subroutine without parens (like Perl's built-in functions) and don't really offer much else on top of that - nothing like what you'd get out of, say, a method signature in Java. Now, the code I'm working with uses prototypes everywhere. Am I not seeing some other benefit to using them?
|
# ? Jan 13, 2013 22:35 |
|
Crumbles posted:I started learning Perl a few months ago for work, and I'd like to think I've gotten a pretty decent handle on things. It's been pretty fun so far - I've been picking things up as I code and going through Programming Perl at my leisure... One thing I'm not sure I totally get is subroutine prototypes. From my understanding, they're really only useful if you want to be able to call a subroutine without parens (like Perl's built-in functions) and don't really offer much else on top of that. Now, the code I'm working with uses prototypes everywhere. Am I not seeing some other benefit to using them? This reminds me of some code that I was asked to maintain when I got my first real job. My coworker who wrote the code was a novice programmer and not too familiar with perl, so she wrote something like this: Perl code: There are other benefits, like transparently passing an array by reference so you can change it a la push(), but you can be virtually guaranteed that anyone who uses perl function prototypes constantly has no idea what they're for and shouldn't be using them.
|
# ? Jan 13, 2013 23:01 |
|
|
|
het posted:There are other benefits, like transparently passing an array by reference so you can change it a la push(), but you can be virtually guaranteed that anyone who uses perl function prototypes constantly has no idea what they're for and shouldn't be using them. One cool use for prototypes is passing blocks in without the need for sub. e.g. code:
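The block-passing trick hinges on the & prototype character, which lets a bare { ... } stand in for sub { ... } at the call site; a minimal sketch of the sort of thing meant (hypothetical function name):

```perl
use strict;
use warnings;

# The (&@) prototype tells the parser the first argument is a code
# block, so callers can write apply { ... } @list like a built-in,
# without the sub keyword or a comma after the block. The prototype
# only takes effect for calls compiled after this declaration.
sub apply (&@) {
    my ($code, @args) = @_;
    return map { $code->($_) } @args;
}

my @doubled = apply { $_[0] * 2 } 1, 2, 3;
print "@doubled\n";    # 2 4 6
```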
|
# ? Jan 13, 2013 23:42 |