  • Locked thread
Erasmus Darwin
Mar 6, 2001

qntm posted:

code:
open my $fh, ">>", $filename or die "urk"; # open for append

This should be:
code:
open my $fh, "+>>", $filename or die "urk"; # open for read and append
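For anyone wondering what the extra "+" buys you, here is a minimal sketch. The scratch file from File::Temp and the sample lines are assumptions made for illustration, not part of the original question. With "+>>" the handle can be read as well as appended to, but you have to seek before switching from writing to reading.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file standing in for $filename (assumption for this sketch).
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
print {$tmp_fh} "first line\n";
close $tmp_fh;

# "+>>" opens for read and append: writes always land at the end of the file.
open my $fh, '+>>', $filename or die "cannot open $filename: $!";
print {$fh} "second line\n";

# Rewind before reading; the seek also flushes the pending write.
seek($fh, 0, 0) or die "seek failed: $!";
my @lines = <$fh>;
close $fh;

print "read back: $_" for @lines;
```

With plain ">>" the print would still work, but reading from the handle would not.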

Schweinhund
Oct 23, 2004

:derp:   :kayak:                                     
On this page:
http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=y&type=8&season=2012&month=0&season1=2012&ind=0

1/3 of the way down is a link called "export data". When you click it, it's a download link to a spreadsheet. I'd like to be able to download that spreadsheet automatically without clicking the link. The problem is it's a javascript link, so I'm not sure it's possible to get with Perl. Is there any way to do it? It doesn't have to be a Perl solution really. I just need something that will run on Windows and download that link every day. I'd guess there may be a PHP equivalent to that link I could use with Perl, but I'm not sure how to generate that.

Rohaq
Aug 11, 2006

Schweinhund posted:

On this page:
http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=y&type=8&season=2012&month=0&season1=2012&ind=0

1/3 of the way down is a link called "export data". When you click it, it's a download link to a spreadsheet. I'd like to be able to download that spreadsheet automatically without clicking the link. The problem is it's a javascript link, so I'm not sure it's possible to get with Perl. Is there any way to do it? It doesn't have to be a Perl solution really. I just need something that will run on Windows and download that link every day. I'd guess there may be a PHP equivalent to that link I could use with Perl, but I'm not sure how to generate that.
EDIT:- Actually, ignore me. I'm wrong. Passing this through a capture proxy does however show that there's a parameter called "__EVENTTARGET" that gets set to "LB$cmdCSV". That's probably related.

Rohaq fucked around with this message at 14:14 on Jun 10, 2012

TiMBuS
Sep 25, 2007

LOL WUT?

looking at the source is easy enough man. there is a form with the id "form1", it has a bunch of hidden input vars.
the javascript link does nothing but post using that form. and as the post above says, "LB$cmdCSV" is the eventTarget. there is no eventArgument.

you could probably craft a couple of curl requests to get the data but there's a worryingly long session var that might get in the way. I'd probably use WWW::Mechanize to grab the form, set the __EVENTTARGET to "LB$cmdCSV", and then post it off. You should get the raw data as a response.

Perl code:
use strict; 
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();

$mech->get('http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=y&type=8&season=2012&month=0&season1=2012&ind=0');
$mech->form_id('form1');
$mech->field('__EVENTTARGET', 'LB$cmdCSV');
my $response = $mech->submit();

print $response->decoded_content;

Schweinhund
Oct 23, 2004

:derp:   :kayak:                                     

TiMBuS posted:

looking at the source is easy enough man.

I tried but I haven't really dealt with web stuff in a while so it was pretty confusing.


That works, thanks a lot.

Rohaq
Aug 11, 2006

TiMBuS posted:

looking at the source is easy enough man. there is a form with the id "form1", it has a bunch of hidden input vars.
the javascript link does nothing but post using that form. and as the post above says, "LB$cmdCSV" is the eventTarget. there is no eventArgument.

you could probably craft a couple of curl requests to get the data but there's a worryingly long session var that might get in the way. I'd probably use WWW::Mechanize to grab the form, set the __EVENTTARGET to "LB$cmdCSV", and then post it off. You should get the raw data as a response.

Perl code:
use strict; 
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();

$mech->get('http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=y&type=8&season=2012&month=0&season1=2012&ind=0');
$mech->form_id('form1');
$mech->field('__EVENTTARGET', 'LB$cmdCSV');
my $response = $mech->submit();

print $response->decoded_content;
Ah nice - I've not used WWW::Mechanize before, so this will no doubt be useful in the future, thanks!

Mithaldu
Sep 25, 2007

Let's cuddle. :3:
For those of you who couldn't attend YAPC::NA, here's a bunch of videos from the third day: http://www.youtube.com/playlist?list=PLE3F888A650339DDF

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip
Perl saved me a bunch of time today. Like, a whole bunch. And I think made me look good to my coworkers and boss. Thank you Larry Wall, you crazy diamond :)


It was also nice to take a break from thinking about special function registers and clocks causing noise on the ADC and the like for a while here in microcontroller-land. :)

Blotto Skorzany fucked around with this message at 23:36 on Jun 18, 2012

JawnV6
Jul 4, 2004

So hot ...
You can do inline asm in perl if you want. Best of both worlds :)

syphon
Jan 1, 2001
Any regular expression gurus in the house? We have a custom template file that denotes variables wrapped in ??'s. Here's some example data:
code:
dns = ??computer??.??domain??
I'm trying to make a Regex expand the variables, but I can't get a regular expression that isn't greedy. Here's what I've got so far:
code:
$_ =~ /(?:\?\?(\S+)\?\?)?/g;
No matter what I do, I can't get it to match just "??computer??", it always matches "??computer??.??domain??".

Bonfire Lit
Jul 9, 2008

If you're one of the sinners who caused this please unfriend me now.

You need to add that ? to the quantifier (your +) to make it non-greedy (so you'd end up with \S+?). You're using it as a zero-or-one quantifier on a group around the entirety of the regexp, which isn't going to do what you want it to.
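As a rough sketch of the non-greedy fix applied to the template problem: the sub name expand_template and the sample values in %vars are made up for illustration, and unknown variables are simply left untouched.

```perl
use strict;
use warnings;

# Replace each ??name?? with its value. \S+? is the non-greedy form,
# so the capture stops at the first closing ?? instead of the last one.
sub expand_template {
    my ($line, $vars) = @_;
    $line =~ s{\?\?(\S+?)\?\?}{
        exists $vars->{$1} ? $vars->{$1} : "??$1??"
    }ge;
    return $line;
}

# Hypothetical values for the two variables in the example data.
my %vars = (computer => 'web01', domain => 'example.com');
print expand_template('dns = ??computer??.??domain??', \%vars), "\n";
```

The /e modifier runs the replacement as code, which is what allows the hash lookup per match.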

syphon
Jan 1, 2001
Thanks. Every time I think I understand regular expressions, something like this throws me a curve ball!

Rohaq
Aug 11, 2006

syphon posted:

Thanks. Every time I think I understand regular expressions, something like this throws me a curve ball!
Get a decent regex tester to help you build expressions. One of my favourites is The Regex Coach; though it's a little dated, it does do Perl-compatible expressions, and even lets you step through your expression against a target string, so you can find out where your expression might be going wrong.

Also, that feature is great for building more efficient expressions. The following is more efficient than your expression above, for example, since it doesn't constantly check for the existence of "??" after your capture group:

code:
(?:\?\?([^(?:\?\?)]+)\?\?)

Bonfire Lit
Jul 9, 2008

If you're one of the sinners who caused this please unfriend me now.

It's also different because [^(?:\?\?)] is a character class (i.e. it doesn't mean "not ??", it means "none of the characters (, ), ?, :"). The closest thing you can get with character classes is [^?] but then ??foo?bar?? will not match.

Rohaq
Aug 11, 2006

Isilkor posted:

It's also different because [^(?:\?\?)] is a character class (i.e. it doesn't mean "not ??", it means "none of the characters (, ), ?, :"). The closest thing you can get with character classes is [^?] but then ??foo?bar?? will not match.
Ah, you're right. I should have tested that further.
code:
(?:\?\?([^\?]+)\?\?)
This works too, but as mentioned, doesn't support ? characters in your machine name or domain.
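To make the difference concrete, here is a small sketch comparing three patterns on a name that contains a single "?". The helper first_capture and the test string are assumptions for illustration. The third pattern uses a negative lookahead, which is one way to actually say "any character that does not start a ??".

```perl
use strict;
use warnings;

my %patterns = (
    'char class' => qr/\?\?([^?]+)\?\?/,            # no ? allowed inside
    'non-greedy' => qr/\?\?(.+?)\?\?/,              # shortest run up to ??
    'lookahead'  => qr/\?\?((?:(?!\?\?).)+)\?\?/,   # chars that do not begin ??
);

# Return the first capture, or undef if the pattern does not match.
sub first_capture {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

my $input = '??foo?bar??';
for my $name (sort keys %patterns) {
    my $got = first_capture($patterns{$name}, $input);
    printf "%-10s => %s\n", $name, defined $got ? $got : '(no match)';
}
```

The character class version finds nothing here, while the other two capture the whole name.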

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip
Generating 226 S19 eeprom images in various configs using Moose : 68 seconds
Generating 226 s19 eeprom images in various configs using Mouse : 14 seconds
Generating 226 s19 eeprom images in various configs using Moose*: 1.8 seconds





*Don't be a moron and exec a script once per file for 226 files when you could exec it once and loop over the loving files

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

Otto Skorzeny posted:

Generating 226 S19 eeprom images in various configs using Moose : 68 seconds
Generating 226 s19 eeprom images in various configs using Mouse : 14 seconds
Generating 226 s19 eeprom images in various configs using Moose*: 1.8 seconds





*Don't be a moron and exec a script once per file for 226 files when you could exec it once and loop over the loving files

Now try with Moo. :)

TiMBuS
Sep 25, 2007

LOL WUT?

Well okay. I've never seen Moo outperform Mouse. The only advantage I've seen is that it can work with Moose, whereas Mouse + Moose can collide on occasion.

If we need proof though:
Startup time. After 10 runs of each to avoid cold code:
code:
timbus@Timbox ~ $ time perl -MMoose -e0

real	0m0.132s
user	0m0.116s
sys	0m0.012s

timbus@Timbox ~ $ time perl -MMouse -e0

real	0m0.021s
user	0m0.020s
sys	0m0.000s

timbus@Timbox ~ $ time perl -MMoo -e0

real	0m0.026s
user	0m0.024s
sys	0m0.000s
And after running this lovely benchmark, which tests object creation, get/set and 'Int' type validation:
code:
Moo: 10 wallclock secs (10.39 usr +  0.00 sys = 10.39 CPU) @ 577478.34/s (n=6000000)
Moo w/quote_sub: 11 wallclock secs (11.70 usr +  0.00 sys = 11.70 CPU) @ 512820.51/s (n=6000000)
Moose: 10 wallclock secs (10.27 usr +  0.00 sys = 10.27 CPU) @ 584225.90/s (n=6000000)
Mouse:  1 wallclock secs ( 1.58 usr +  0.00 sys =  1.58 CPU) @ 3797468.35/s (n=6000000)
hash:  4 wallclock secs ( 3.72 usr +  0.00 sys =  3.72 CPU) @ 1612903.23/s (n=6000000)
hash, no check:  0 wallclock secs ( 0.96 usr +  0.00 sys =  0.96 CPU) @ 6250000.00/s (n=6000000)
manual: 10 wallclock secs (10.38 usr +  0.00 sys = 10.38 CPU) @ 578034.68/s (n=6000000)
manual, no check:  4 wallclock secs ( 4.57 usr +  0.01 sys =  4.58 CPU) @ 1310043.67/s (n=6000000)

Soo. Yeah..



Oh and while I'm at it:
code:
timbus@Timbox ~ $ time perl6 -e0

real	0m0.304s
user	0m0.236s
sys	0m0.056s
Slowwwly but surely...

TiMBuS
Sep 25, 2007

LOL WUT?

Before I get called a liar, I modified the code to put ->new into each test run so we are now actually benchmarking object creation.
I also added a Moo class that didn't validate the int -at all-
code:
Testing Perl 5.014002, Moose 2.0401, Mouse 0.97, Moo 0.091009
Benchmark: timing 6000000 iterations of Moo, Moo w/out validation, Mouse...
Moo: 24 wallclock secs (24.05 usr +  0.01 sys = 24.06 CPU) @ 249376.56/s (n=6000000)
Moo w/out validation: 17 wallclock secs (17.16 usr +  0.00 sys = 17.16 CPU) @ 349650.35/s (n=6000000)
Mouse: 10 wallclock secs ( 8.62 usr +  0.00 sys =  8.62 CPU) @ 696055.68/s (n=6000000)
Now I forget what my original point was.

welcome to hell
Jun 9, 2006
Moo will use Class::XSAccessor for simple accessors if it is available, which in my testing makes it faster than Mouse for that case.

TiMBuS
Sep 25, 2007

LOL WUT?

You're right. That's not even in the Moo docs :/

I now get slightly faster Moo accessors but only if it's not doing any validation (which is never, in my code). Object instantiation is still much slower, startup time is the same. Overall benchmark (creation+validation+accessor) still puts Moo at ~2.5x slower (same bench result as the previous post).

Stickin' with Mouse.

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

TiMBuS posted:

Well okay. I've never seen Moo outperform Mouse. The only advantage I've seen is that it can work with Moose, whereas Mouse + Moose can collide on occasion.
I didn't mean that as in "do a generic benchmark of things". Since Moo is pure Perl, I know it can't be as fast as Mouse. I was really more interested in seeing how it stacks up in a real-world situation like the one Otto wrote about.

Also, aside from being a smooth upgrade to Moose, the main advantages of Moo are that it is pure Perl and thus easier to debug when stuff does go wrong, and that it comes with zero dependencies requiring a compiler, meaning that you can easily bundle it for outdated clients or similar situations.

raej
Sep 25, 2003

"Being drunk is the worst feeling of all. Except for all those other feelings."
I'm trying to print lines from a flat file that have a string match of '3133' in a certain position but can't figure out a good way to move to position 1224, examine the next 4 bytes, and if they match, print the line out.

Eventually I'd like to dump this out to a file. Any pointers?

Anaconda Rifle
Mar 23, 2007

Yam Slacker

raej posted:

I'm trying to print lines from a flat file that have a string match of '3133' in a certain position but can't figure out a good way to move to position 1224, examine the next 4 bytes, and if they match, print the line out.

Eventually I'd like to dump this out to a file. Any pointers?

Position 1224 of the file, or position 1224 of each line in a file?

raej
Sep 25, 2003

"Being drunk is the worst feeling of all. Except for all those other feelings."

Anaconda Rifle posted:

Position 1224 of the file, or position 1224 of each line in a file?

Each line. See attached image. I want to print all of the lines out, except for lines 2 and 5.

Anaconda Rifle
Mar 23, 2007

Yam Slacker

raej posted:

Each line. See attached image. I want to print all of the lines out, except for lines 2 and 5.



I'd probably use unpack or a really lovely regex. How large are these files? Might want to benchmark both approaches.
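A rough sketch of what comparing the approaches could look like, using the core Benchmark module. The offset of 1224, the sample line, and the sub names are assumptions for illustration, and the timings will vary from machine to machine. The length guards are there because both unpack's x and substr complain on lines that are too short.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my ($offset, $want) = (1224, '3133');

# Three ways to test the same four bytes at a fixed offset.
sub match_substr { length($_[0]) >= $offset + 4 && substr($_[0], $offset, 4) eq $want }
sub match_unpack { length($_[0]) >= $offset + 4 && unpack("x$offset a4", $_[0]) eq $want }
sub match_regex  { $_[0] =~ /\A.{$offset}\Q$want\E/s }

# Made-up 1254-byte record with the marker at the assumed offset.
my $line = ('x' x $offset) . $want . ('y' x 26);

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    substr => sub { match_substr($line) },
    unpack => sub { match_unpack($line) },
    regex  => sub { match_regex($line) },
});
```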

raej
Sep 25, 2003

"Being drunk is the worst feeling of all. Except for all those other feelings."

Anaconda Rifle posted:

I'd probably use unpack or a really lovely regex. How large are these files? Might want to benchmark both approaches.

1254 bytes in length, ~50k lines for this guy.

het
Nov 14, 2002

A dark black past
is my most valued
possession

raej posted:

I'm trying to print lines from a flat file that have a string match of '3133' in a certain position but can't figure out a good way to move to position 1224, examine the next 4 bytes, and if they match, print the line out.

Eventually I'd like to dump this out to a file. Any pointers?

Obviously you can flesh this out more but I think this should do what you want.
Perl code:
while (<>) {
	print if (substr($_, 1224, 4) eq '3133');
}
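Fleshing this out toward the "dump it to a file" goal could look something like the sketch below. The sub name filter_lines is made up, and the demo uses in-memory filehandles with invented data; for real files you would open the input with '<' and the output with '>' instead. Note that substr offsets are zero-based, so if "position 1224" was counted from 1, the offset to pass would be 1223.

```perl
use strict;
use warnings;

# Copy lines whose bytes at $offset equal $want from one handle to another.
# Returns the number of lines written.
sub filter_lines {
    my ($in_fh, $out_fh, $offset, $want) = @_;
    my $kept = 0;
    while (my $line = <$in_fh>) {
        next if length($line) < $offset + length($want);   # too short to match
        if (substr($line, $offset, length $want) eq $want) {
            print {$out_fh} $line;
            $kept++;
        }
    }
    return $kept;
}

# Demo on invented data: three records, two of which carry the marker.
my $input = join '', map { ('x' x 1224) . $_ . "\n" } qw(3133 9999 3133);
open my $in,  '<', \$input      or die "cannot read in-memory input: $!";
my $output = '';
open my $out, '>', \$output     or die "cannot write in-memory output: $!";
my $kept = filter_lines($in, $out, 1224, '3133');
close $out;
close $in;

print "kept $kept lines\n";
```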

raej
Sep 25, 2003

"Being drunk is the worst feeling of all. Except for all those other feelings."

het posted:

Obviously you can flesh this out more but I think this should do what you want.
Perl code:
while (<>) {
	print if (substr($_, 1224, 4) eq '3133');
}

This was it, perfect, thank you!

Anaconda Rifle
Mar 23, 2007

Yam Slacker
Wow. I completely forgot substr existed. Where do I hand in my Perl badge?

syphon
Jan 1, 2001
I never use substr nowadays; I use regular expressions instead. I rarely seem to process strings of consistent length, where substr would actually be useful.

MacGowans Teeth
Aug 13, 2003

syphon posted:

I never use substr nowadays; I use regular expressions instead. I rarely seem to process strings of consistent length, where substr would actually be useful.

I use substr and unpack all the time, but I work with a LOT of fixed length records, because this industry (insurance) is still in the COBOL mindset, I think.

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip
I work with pack a bunch because it's a pretty reasonable way to pack things into s-records :engleft:
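For the curious, a minimal sketch of the idea, assuming the common S1 record layout (byte count, 16-bit big-endian address, data bytes, ones' complement checksum). The sub name s1_record is made up for illustration and this is not the poster's actual code.

```perl
use strict;
use warnings;

# Build one S1 record (16-bit address) from an address and a string of data bytes.
sub s1_record {
    my ($address, $data) = @_;

    # Byte count covers the 2 address bytes, the data, and the checksum byte.
    my $count = 2 + length($data) + 1;
    my $body  = pack('C n a*', $count, $address, $data);

    # Checksum: ones' complement of the low byte of the sum of all body bytes.
    my $sum = 0;
    $sum += $_ for unpack('C*', $body);
    my $checksum = 0xFF - ($sum & 0xFF);

    return 'S1' . uc(unpack('H*', $body . pack('C', $checksum)));
}

print s1_record(0x7AF0, "\x0A\x0A\x0D" . ("\x00" x 13)), "\n";
```

pack takes care of the byte order and field widths, which is most of the fiddly part of the format.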

Jonny 290
May 5, 2005



[ASK] me about OS/2 Warp
I discovered WWW::Mechanize today.

It's pretty fun. :toot:

jeeves
May 27, 2001

Deranged Psychopathic
Butler Extraordinaire
I'm teaching myself Perl this summer, and I am working on a problem about regular expression matching.

The problem has me stripping HTML tags out of text, with stuff like <br> turning into newlines, and I have all of that working correctly.

However, the part I am stuck on is putting dashes around all words contained within <em> and </em> tags. I really don't know where to start with this, besides what I've read about matching full words in REs via \b.

Should I do an if-check on whether <em> is found, and then throw the rest of the text up to </em> into an array, and then add the dashes around each array item using a foreach? I can't seem to figure out an easy way to 'look forward' for the closing </em> once the initial <em> is found.

Example input:
<em>This is a test
Okay</em>

Output:
-This- -is- -a- -test-
-Okay-

jeeves fucked around with this message at 19:59 on Jul 16, 2012

het
Nov 14, 2002

A dark black past
is my most valued
possession
Not to rain on your parade but the real answer is seriously "don't use regexps to parse html", it's the wrong tool for the job, you're better off using a module that'll parse it for you.

That said, you can use s///m for multi-line matching.
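For the exercise as stated, a single substitution can do it if the whole text is in one string. This is only a sketch under that assumption, and the sub name dash_emphasis is made up. One detail worth knowing: /m changes where ^ and $ match, while /s is the modifier that lets . match a newline, which is what a span crossing lines needs.

```perl
use strict;
use warnings;

# Put dashes around every word between <em> and </em>, dropping the tags.
sub dash_emphasis {
    my ($html) = @_;
    $html =~ s{<em>(.*?)</em>}{
        my $inner = $1;              # copy first: the inner s/// resets $1
        $inner =~ s/(\w+)/-$1-/g;
        $inner;
    }gse;
    return $html;
}

print dash_emphasis("<em>This is a test\nOkay</em>"), "\n";
```

It does not handle nested or malformed tags, which is part of why a real parser is the better tool outside of an exercise.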

jeeves
May 27, 2001

Deranged Psychopathic
Butler Extraordinaire

het posted:

Not to rain on your parade but the real answer is seriously "don't use regexps to parse html", it's the wrong tool for the job, you're better off using a module that'll parse it for you.

That said, you can use s///m for multi-line matching.

Yeah, I figured that there is a better way to do this, but I have just been doing the example questions and it seems whoever wrote this thought that this would be a good question for a RE substitution chapter or such.

Thanks for multi-line matching though, I'll look into that.

jeeves fucked around with this message at 20:20 on Jul 16, 2012

Polygynous
Dec 13, 2006
welp
The easiest / laziest thing I can think of is just to set a flag if there's an open <em> tag and then look for a closing tag instead while outputting appropriately.

And (efb of course) people are going to yell at you for trying to parse html with regexps in the first place. :)
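The flag idea could be sketched like this, processing one line at a time. The sub name dash_emphasis_lines is made up, and the sketch assumes the tags are written exactly as <em> and </em>.

```perl
use strict;
use warnings;

# Walk the lines, tracking whether we are currently inside <em>...</em>.
sub dash_emphasis_lines {
    my (@lines) = @_;
    my $in_em = 0;
    my @out;
    for my $line (@lines) {
        my $result = '';
        # The capturing group makes split return the tags as separate chunks.
        for my $chunk (split m{(</?em>)}, $line) {
            if    ($chunk eq '<em>')  { $in_em = 1 }
            elsif ($chunk eq '</em>') { $in_em = 0 }
            else {
                my $text = $chunk;
                $text =~ s/(\w+)/-$1-/g if $in_em;   # dash words only inside <em>
                $result .= $text;
            }
        }
        push @out, $result;
    }
    return @out;
}

print dash_emphasis_lines("<em>This is a test\n", "Okay</em>\n");
```

Because the flag lives outside the per-line loop, an <em> opened on one line is still in effect on the next.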

jeeves
May 27, 2001

Deranged Psychopathic
Butler Extraordinaire

spoon0042 posted:

The easiest / laziest thing I can think of is just to set a flag if there's an open <em> tag and then look for a closing tag instead while outputting appropriately.

I didn't think to set a flag and then automatically add dashes while also looking for the closing tag. Much easier than first trying to find the closing tag and then processing the in-between text after. Thanks!

And yeah, if this question was for a modules chapter I am sure I would use a module, but it was written for a RE chapter so oh well :v:

MacGowans Teeth
Aug 13, 2003

jeeves posted:

I'm teaching myself Perl this summer, and I am working on a problem about regular expression matching.

The problem has me stripping HTML tags out of text, with stuff like <br> turning into newlines, and I have all of that working correctly.

However, the part I am stuck on is putting dashes around all words contained within <em> and </em> tags. I really don't know where to start with this, besides what I've read about matching full words in REs via \b.

Should I do an if-check on whether <em> is found, and then throw the rest of the text up to </em> into an array, and then add the dashes around each array item using a foreach? I can't seem to figure out an easy way to 'look forward' for the closing </em> once the initial <em> is found.

Example input:
<em>This is a test
Okay</em>

Output:
-This- -is- -a- -test-
-Okay-

This isn't exactly what you're asking for, but you might be able to use it. I wrote (i.e., found on Perl Monks and then slightly modified) this to pull XML out of a bunch of log files a while back.
code:
            while (<$fh>) {
                if ( s/.*xml=[\s\S]*?(<$tag>)/$1/ .. s/(<(\/)$tag>).*/$1/ ) {
                    $extractedXML .= $_;
                    last if $2;
                }
            }
As I understand it, if you use .. inside an if like this, it starts returning true as soon as the left hand side matches, and it stays true over multiple lines until the right side matches. This piece of code is expecting to find garbage on each side of the opening and closing tags, hence the s//, but I think this could be used in a plain match as well.
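Here is a small sketch of the same range operator with plain matches instead of substitutions. The sub name lines_between and the sample text are made up for illustration. The operator remembers its state between iterations, so it assumes every opening tag is eventually closed.

```perl
use strict;
use warnings;

# Return the lines from the one containing <tag> through the one containing </tag>.
sub lines_between {
    my ($text, $tag) = @_;
    my @kept;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    while (my $line = <$fh>) {
        # Becomes true when the left match succeeds and stays true
        # until the right match succeeds (which may be on the same line).
        if ($line =~ /<\Q$tag\E>/ .. $line =~ m{</\Q$tag\E>}) {
            push @kept, $line;
        }
    }
    close $fh;
    return @kept;
}

print lines_between("junk\n<em>This is a test\nOkay</em>\nmore junk\n", 'em');
```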
