The Perl Short Questions Megathread: executable line noise

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > The Perl Short Questions Megathread: executable line noise

Erasmus Darwin: Mar 6, 2001

qntm posted:

code:

open my $fh, ">>", $filename or die "urk"; # open for append

This should be:

code:

open my $fh, "+>>", $filename or die "urk"; # open for read and append
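For anyone unsure what the extra "+" buys you, here is a minimal sketch, assuming the point of the correction is that the same handle gets read from as well as appended to. The sub name and the temp file are made up for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Append $text to $filename, then read the whole file back through the
# same handle. "+>>" is what makes the read legal; plain ">>" is write-only.
sub append_and_read_back {
    my ($filename, $text) = @_;
    open(my $fh, '+>>', $filename) or die "can't open $filename: $!";
    print {$fh} $text;
    seek($fh, 0, 0) or die "can't seek: $!";   # rewind before reading
    local $/;                                  # slurp mode: read everything at once
    my $contents = <$fh>;
    close $fh;
    return $contents;
}

# Throwaway file for the demo (hypothetical, removed at exit)
my (undef, $filename) = tempfile(UNLINK => 1);
append_and_read_back($filename, "one\n");
print append_and_read_back($filename, "two\n");   # shows both appended lines
```

Writes in this mode always land at the end of the file, so the seek only affects where the next read starts.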

# ? May 16, 2012 18:02

Schweinhund: Oct 23, 2004

On this page:
http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=y&type=8&season=2012&month=0&season1=2012&ind=0

1/3 of the way down is a link called "export data". When you click it, it's a download link to a spreadsheet. I'd like to be able to download that spreadsheet automatically without clicking the link. The problem is it's a javascript link, so I'm not sure it's possible to get with Perl. Is there any way to do it? It doesn't have to be a Perl solution really. I just need something that will run on Windows and download that link every day. I'd guess there may be a PHP equivalent to that link I could use with Perl, but I'm not sure how to generate that.

# ? Jun 10, 2012 11:26

Rohaq: Aug 11, 2006

Schweinhund posted:

On this page:
http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=y&type=8&season=2012&month=0&season1=2012&ind=0

1/3 of the way down is a link called "export data". When you click it, it's a download link to a spreadsheet. I'd like to be able to download that spreadsheet automatically without clicking the link. The problem is it's a javascript link, so I'm not sure it's possible to get with Perl. Is there any way to do it? It doesn't have to be a Perl solution really. I just need something that will run on Windows and download that link every day. I'd guess there may be a PHP equivalent to that link I could use with Perl, but I'm not sure how to generate that.

EDIT:- Actually, ignore me. I'm wrong. Passing this through a capture proxy does however show that there's a parameter called "__EVENTTARGET" that gets set to "LB$cmdCSV". That's probably related.

Rohaq fucked around with this message at 14:14 on Jun 10, 2012

# ? Jun 10, 2012 14:02

TiMBuS: Sep 25, 2007; LOL WUT?

looking at the source is easy enough man. there is a form with the id "form1", it has a bunch of hidden input vars.
the javascript link does nothing but post using that form. and as the post above says, "LB$cmdCSV" is the eventTarget. there is no eventArgument.

you could probably craft a couple of curl requests to get the data but there's a worryingly long session var that might get in the way. I'd probably use WWW::Mechanize to grab the form, set the __EVENTTARGET to "LB$cmdCSV", and then post it off. You should get the raw data as a response.

Perl code:

use strict; 
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();

$mech->get('http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=y&type=8&season=2012&month=0&season1=2012&ind=0');
$mech->form_id('form1');
$mech->field('__EVENTTARGET', 'LB$cmdCSV');
my $response = $mech->submit();

print $response->decoded_content;

# ? Jun 11, 2012 07:41

Schweinhund: Oct 23, 2004

TiMBuS posted:

looking at the source is easy enough man.

I tried but I haven't really dealt with web stuff in a while so it was pretty confusing.

That works, thanks a lot.

# ? Jun 11, 2012 13:25

Rohaq: Aug 11, 2006

TiMBuS posted:

looking at the source is easy enough man. there is a form with the id "form1", it has a bunch of hidden input vars.
the javascript link does nothing but post using that form. and as the post above says, "LB$cmdCSV" is the eventTarget. there is no eventArgument.

you could probably craft a couple of curl requests to get the data but there's a worryingly long session var that might get in the way. I'd probably use WWW::Mechanize to grab the form, set the __EVENTTARGET to "LB$cmdCSV", and then post it off. You should get the raw data as a response.
Perl code:
use strict; 
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();

$mech->get('http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=y&type=8&season=2012&month=0&season1=2012&ind=0');
$mech->form_id('form1');
$mech->field('__EVENTTARGET', 'LB$cmdCSV');
my $response = $mech->submit();

print $response->decoded_content;

Ah nice - I've not used WWW::Mechanize before, so this will no doubt be useful in the future, thanks!

# ? Jun 11, 2012 16:18

Mithaldu: Sep 25, 2007; Let's cuddle.

For those of you who couldn't attend YAPC::NA, here's a bunch of videos from the third day: http://www.youtube.com/playlist?list=PLE3F888A650339DDF

# ? Jun 16, 2012 05:49

Blotto Skorzany: Nov 7, 2008; He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

Perl saved me a bunch of time today. Like, a whole bunch. And I think made me look good to my coworkers and boss. Thank you Larry Wall, you crazy diamond

It was also nice to take a break from thinking about special function registers and clocks causing noise on the ADC and the like for a while here in microcontroller-land.

Blotto Skorzany fucked around with this message at 23:36 on Jun 18, 2012

# ? Jun 18, 2012 23:09

JawnV6: Jul 4, 2004; So hot ...

You can do inline asm in perl if you want. Best of both worlds

# ? Jun 18, 2012 23:28

syphon: Jan 1, 2001

Any regular expression gurus in the house? We have a custom template file that denotes variables wrapped in ??'s. Here's some example data:

code:

dns = ??computer??.??domain??

I'm trying to make a Regex expand the variables, but I can't get a regular expression that isn't greedy. Here's what I've got so far:

code:

$_ =~ /(?:\?\?(\S+)\?\?)?/g;

No matter what I do, I can't get it to match just "??computer??", it always matches "??computer??.??domain??".

# ? Jun 19, 2012 00:49

Bonfire Lit: Jul 9, 2008; If you're one of the sinners who caused this please unfriend me now.

You need to add that ? to the quantifier (your +) to make it non-greedy (so you'd end up with \S+?). You're using it as a zero-or-one quantifier on a group around the entirety of the regexp, which isn't going to do what you want it to.
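A minimal sketch of the fix in use, assuming the goal is to substitute values from a hash. The sub name and the sample values are invented for illustration; only the \S+? part comes from the advice above.

```perl
use strict;
use warnings;

# Replace each ??name?? with its value from %$vars.
# Names that are not in the hash are left untouched.
sub expand_vars {
    my ($line, $vars) = @_;
    # \S+? is the non-greedy form: it stops at the first closing ??
    $line =~ s/\?\?(\S+?)\?\?/exists $vars->{$1} ? $vars->{$1} : "??$1??"/ge;
    return $line;
}

# Hypothetical values for the two template variables
my %vars = (computer => 'web01', domain => 'example.com');
print expand_vars('dns = ??computer??.??domain??', \%vars), "\n";
```

With the greedy \S+ the capture would run all the way to the last ?? on the line, so the lookup key would be "computer??.??domain" and nothing useful would come back.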

# ? Jun 19, 2012 01:16

syphon: Jan 1, 2001

Thanks. Every time I think I understand regular expressions, something like this throws me a curve ball!

# ? Jun 19, 2012 01:26

Rohaq: Aug 11, 2006

syphon posted:

Thanks. Every time I think I understand regular expressions, something like this throws me a curve ball!

Get a decent regex tester to help you build expressions. One of my favourites is The Regex Coach; though it's a little dated, it does do Perl-compatible expressions, and even lets you step through your expression against a target string, so you can find out where your expression might be going wrong.

Also, that feature is great for building more efficient expressions. The following is more efficient than your expression above, for example, since it doesn't constantly check for the existence of "??" after your capture group:

code:

(?:\?\?([^(?:\?\?)]+)\?\?)

# ? Jun 19, 2012 10:51

Bonfire Lit: Jul 9, 2008; If you're one of the sinners who caused this please unfriend me now.

It's also different because [^(?:\?\?)] is a character class (i.e. it doesn't mean "not ??", it means "none of the characters (, ), ?, :"). The closest thing you can get with character classes is [^?] but then ??foo?bar?? will not match.
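A quick way to see the difference, as a sketch: the three subs below are made-up helpers, and the lookahead version is one common way to actually say "anything that is not ??", not something proposed earlier in the thread.

```perl
use strict;
use warnings;

# Each sub returns the first captured name, or undef when nothing matches.
sub capture_class     { return $_[0] =~ /\?\?([^?]+)\?\?/          ? $1 : undef }
sub capture_lazy      { return $_[0] =~ /\?\?(.+?)\?\?/            ? $1 : undef }
# (?!\?\?). consumes one character only if a ?? does not start there
sub capture_lookahead { return $_[0] =~ /\?\?((?:(?!\?\?).)+)\?\?/ ? $1 : undef }

for my $s ('??computer??.??domain??', '??foo?bar??') {
    printf "%-26s class=%s lazy=%s lookahead=%s\n", $s,
        map { defined $_ ? $_ : '(no match)' }
            capture_class($s), capture_lazy($s), capture_lookahead($s);
}
```

All three pull out "computer" from the first string; only the character class gives up on ??foo?bar??.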

# ? Jun 19, 2012 11:15

Rohaq: Aug 11, 2006

Isilkor posted:

It's also different because [^(?:\?\?)] is a character class (i.e. it doesn't mean "not ??", it means "none of the characters (, ), ?, :"). The closest thing you can get with character classes is [^?] but then ??foo?bar?? will not match.

Ah, you're right. I should have tested that further.

code:

(?:\?\?([^\?]+)\?\?)

This works too, but as mentioned, doesn't support ? characters in your machine name or domain.

# ? Jun 19, 2012 13:44

Blotto Skorzany: Nov 7, 2008; He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

Generating 226 S19 eeprom images in various configs using Moose : 68 seconds
Generating 226 s19 eeprom images in various configs using Mouse : 14 seconds
Generating 226 s19 eeprom images in various configs using Moose*: 1.8 seconds

*Don't be a moron and exec a script once per file for 226 files when you could exec it once and loop over the loving files

# ? Jun 20, 2012 15:22

Mithaldu: Sep 25, 2007; Let's cuddle.

Otto Skorzeny posted:

Generating 226 S19 eeprom images in various configs using Moose : 68 seconds
Generating 226 s19 eeprom images in various configs using Mouse : 14 seconds
Generating 226 s19 eeprom images in various configs using Moose*: 1.8 seconds

*Don't be a moron and exec a script once per file for 226 files when you could exec it once and loop over the loving files

Now try with Moo.

# ? Jun 23, 2012 03:53

TiMBuS: Sep 25, 2007; LOL WUT?

Well okay. I've never seen Moo outperform Mouse. The only advantage I've seen is that it can work with Moose, whereas Mouse + Moose can collide on occasion.

If we need proof though:
Startup time. After 10 runs of each to avoid cold code:

code:

timbus@Timbox ~ $ time perl -MMoose -e0

real	0m0.132s
user	0m0.116s
sys	0m0.012s

timbus@Timbox ~ $ time perl -MMouse -e0

real	0m0.021s
user	0m0.020s
sys	0m0.000s

timbus@Timbox ~ $ time perl -MMoo -e0

real	0m0.026s
user	0m0.024s
sys	0m0.000s

And after running this lovely benchmark, which tests object creation, get/set and 'Int' type validation:

code:

Moo: 10 wallclock secs (10.39 usr +  0.00 sys = 10.39 CPU) @ 577478.34/s (n=6000000)
Moo w/quote_sub: 11 wallclock secs (11.70 usr +  0.00 sys = 11.70 CPU) @ 512820.51/s (n=6000000)
Moose: 10 wallclock secs (10.27 usr +  0.00 sys = 10.27 CPU) @ 584225.90/s (n=6000000)
Mouse:  1 wallclock secs ( 1.58 usr +  0.00 sys =  1.58 CPU) @ 3797468.35/s (n=6000000)
hash:  4 wallclock secs ( 3.72 usr +  0.00 sys =  3.72 CPU) @ 1612903.23/s (n=6000000)
hash, no check:  0 wallclock secs ( 0.96 usr +  0.00 sys =  0.96 CPU) @ 6250000.00/s (n=6000000)
manual: 10 wallclock secs (10.38 usr +  0.00 sys = 10.38 CPU) @ 578034.68/s (n=6000000)
manual, no check:  4 wallclock secs ( 4.57 usr +  0.01 sys =  4.58 CPU) @ 1310043.67/s (n=6000000)

Soo. Yeah..

Oh and while I'm at it:

code:

timbus@Timbox ~ $ time perl6 -e0

real	0m0.304s
user	0m0.236s
sys	0m0.056s

Slowwwly but surely...

# ? Jun 24, 2012 07:08

TiMBuS: Sep 25, 2007; LOL WUT?

Before I get called a liar, I modified the code to put ->new into each test run so we are now actually benchmarking object creation.
I also added a Moo class that didn't validate the int -at all-

code:

Testing Perl 5.014002, Moose 2.0401, Mouse 0.97, Moo 0.091009
Benchmark: timing 6000000 iterations of Moo, Moo w/out validation, Mouse...
Moo: 24 wallclock secs (24.05 usr +  0.01 sys = 24.06 CPU) @ 249376.56/s (n=6000000)
Moo w/out validation: 17 wallclock secs (17.16 usr +  0.00 sys = 17.16 CPU) @ 349650.35/s (n=6000000)
Mouse: 10 wallclock secs ( 8.62 usr +  0.00 sys =  8.62 CPU) @ 696055.68/s (n=6000000)

Now I forget what my original point was.

# ? Jun 24, 2012 07:31

welcome to hell: Jun 9, 2006

Moo will use Class::XSAccessor for simple accessors if it is available, which in my testing makes it faster than Mouse for that case.

# ? Jun 24, 2012 09:27

TiMBuS: Sep 25, 2007; LOL WUT?

You're right. That's not even in the Moo docs :/

I now get slightly faster Moo accessors but only if it's not doing any validation (which is never, in my code). Object instantiation is still much slower, startup time is the same. Overall benchmark (creation+validation+accessor) still puts Moo at ~2.5x slower (same bench result as the previous post).

Stickin' with Mouse.

# ? Jun 24, 2012 09:51

Mithaldu: Sep 25, 2007; Let's cuddle.

TiMBuS posted:

Well okay. I've never seen Moo outperform Mouse. The only advantage I've seen is that it can work with Moose, whereas Mouse + Moose can collide on occasion.

I didn't mean that as in "do a generic benchmark of things". Since Moo is pure Perl, I know it can't be as fast as Mouse. I was really more interested in seeing how it stacks up in a real world situation like the one Otto wrote about.

Also, aside from being a smooth upgrade to Moose, the main advantages to Moo are that it is pure Perl and thus easier to debug when stuff does go wrong, and comes with zero dependencies requiring a compiler, meaning that you can easily bundle it for outdated clients or similar situations.

# ? Jun 28, 2012 10:58

raej: Sep 25, 2003; "Being drunk is the worst feeling of all. Except for all those other feelings."

I'm trying to print lines from a flat file that have a string match of '3133' in a certain position but can't figure out a good way to move to position 1224, examine the next 4 bytes, and if they match, print the line out.

Eventually I'd like to dump this out to a file. Any pointers?

# ? Jul 5, 2012 21:12

Anaconda Rifle: Mar 23, 2007; Yam Slacker

raej posted:

I'm trying to print lines from a flat file that have a string match of '3133' in a certain position but can't figure out a good way to move to position 1224, examine the next 4 bytes, and if they match, print the line out.

Eventually I'd like to dump this out to a file. Any pointers?

Position 1224 of the file, or position 1224 of each line in a file?

# ? Jul 5, 2012 21:17

raej: Sep 25, 2003; "Being drunk is the worst feeling of all. Except for all those other feelings."

Anaconda Rifle posted:

Position 1224 of the file, or position 1224 of each line in a file?

Each line. See attached image. I want to print all of the lines out, except for lines 2 and 5.

# ? Jul 5, 2012 21:32

Anaconda Rifle: Mar 23, 2007; Yam Slacker

raej posted:

Each line. See attached image. I want to print all of the lines out, except for lines 2 and 5.

I'd probably use unpack or a really lovely regex. How large are these files? Might want to benchmark both approaches.

# ? Jul 5, 2012 21:37

raej: Sep 25, 2003; "Being drunk is the worst feeling of all. Except for all those other feelings."

Anaconda Rifle posted:

I'd probably use unpack or a really lovely regex. How large are these files? Might want to benchmark both approaches.

1254 bytes in length, ~50k lines for this guy.

# ? Jul 5, 2012 21:42

het: Nov 14, 2002; A dark black past
is my most valued
possession

raej posted:

I'm trying to print lines from a flat file that have a string match of '3133' in a certain position but can't figure out a good way to move to position 1224, examine the next 4 bytes, and if they match, print the line out.

Eventually I'd like to dump this out to a file. Any pointers?

Obviously you can flesh this out more but I think this should do what you want.

Perl code:

while (<>) {
	print if (substr($_, 1224, 4) eq '3133');
}
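For the "dump this out to a file" part, here is a rough sketch of the same check done with unpack and explicit filehandles. The sub name, the in-memory handles and the offset of 10 are all stand-ins for the demo; with real data you would open the input and output files and pass 1224.

```perl
use strict;
use warnings;

# Copy lines from $in to $out when the field starting at $offset equals $code.
sub filter_lines {
    my ($in, $out, $offset, $code) = @_;
    my $width    = length $code;
    my $template = "x$offset a$width";   # skip $offset bytes, then take $width bytes
    while (my $line = <$in>) {
        # unpack dies if asked to skip past the end of the string
        next if length($line) < $offset + $width;
        my ($field) = unpack $template, $line;
        print {$out} $line if $field eq $code;
    }
    return;
}

# Demo with in-memory handles and a made-up offset of 10
my $data = "AAAAAAAAAA3133xx\nBBBBBBBBBB9999xx\nCCCCCCCCCC3133yy\n";
my $kept = '';
open(my $in,  '<', \$data) or die "can't read: $!";
open(my $out, '>', \$kept) or die "can't write: $!";
filter_lines($in, $out, 10, '3133');
close $out;
print $kept;   # the AAAA... and CCCC... lines
```

Offsets are counted from 0 here, the same way substr counts them.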

# ? Jul 5, 2012 21:49

raej: Sep 25, 2003; "Being drunk is the worst feeling of all. Except for all those other feelings."

het posted:

Obviously you can flesh this out more but I think this should do what you want.
Perl code:
while (<>) {
	print if (substr($_, 1224, 4) eq '3133');
}

This was it, perfect, thank you!

# ? Jul 5, 2012 21:56

Anaconda Rifle: Mar 23, 2007; Yam Slacker

Wow. I completely forgot substr existed. Where do I hand in my Perl badge?

# ? Jul 5, 2012 22:21

syphon: Jan 1, 2001

I never use substr nowadays; I use regular expressions instead. I rarely seem to process strings of consistent length where substr would actually be useful.

# ? Jul 5, 2012 22:44

MacGowans Teeth: Aug 13, 2003

syphon posted:

I never use substr nowadays; I use regular expressions instead. I rarely seem to process strings of consistent length where substr would actually be useful.

I use substr and unpack all the time, but I work with a LOT of fixed length records, because this industry (insurance) is still in the COBOL mindset, I think.

# ? Jul 6, 2012 02:01

Blotto Skorzany: Nov 7, 2008; He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

I work with pack a bunch because it's a pretty reasonable way to pack things into s-records :engleft:

# ? Jul 6, 2012 03:43

Jonny 290: May 5, 2005; [ASK] me about OS/2 Warp

I discovered WWW::Mechanize today.

It's pretty fun. :toot:

# ? Jul 6, 2012 16:55

jeeves: May 27, 2001; Deranged Psychopathic
Butler Extraordinaire

I'm teaching myself Perl this summer, and I am working on a problem about regular expression matching.

The problem has me stripping html tags out of text, stuff like turning into new lines, and I have all of that working correctly.

However, the part I am stuck on is putting dashes around all words contained within and tags. I really don't know where to start with this, besides what I've read about matching full words in RE via \b.

Should I do an if-check on if is found, and then throw the rest of the text between until it finds into an array, and then add the dashes around each array item using a foreach? I can't seem to figure out an easy way to do 'look forward' for the closing once the initial is found.

Example input:
This is a test
Okay

Output:
-This- -is- -a- -test-
-Okay-

jeeves fucked around with this message at 19:59 on Jul 16, 2012

# ? Jul 16, 2012 19:47

het: Nov 14, 2002; A dark black past
is my most valued
possession

Not to rain on your parade but the real answer is seriously "don't use regexps to parse html", it's the wrong tool for the job, you're better off using a module that'll parse it for you.

That said, you can use s///m for multi-line matching (/m lets ^ and $ match at every line boundary; add /s if . also needs to match newlines).

# ? Jul 16, 2012 20:07

jeeves: May 27, 2001; Deranged Psychopathic
Butler Extraordinaire

het posted:

Not to rain on your parade but the real answer is seriously "don't use regexps to parse html", it's the wrong tool for the job, you're better off using a module that'll parse it for you.

That said, you can use s///m for multi-line matching.

Yeah, I figured that there is a better way to do this, but I have just been doing the example questions and it seems whoever wrote this thought that this would be a good question for a RE substitution chapter or such.

Thanks for multi-line matching though, I'll look into that.

jeeves fucked around with this message at 20:20 on Jul 16, 2012

# ? Jul 16, 2012 20:10

Polygynous: Dec 13, 2006; welp

The easiest / laziest thing I can think of is just to set a flag if there's an open tag and then look for a closing tag instead while outputting appropriately.
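Something like this, as a rough sketch. It assumes, purely for illustration, that the tag is passed in by name (the demo uses a hypothetical "b"), that a word is just a run of \w characters, and that the tags themselves get dropped from the output.

```perl
use strict;
use warnings;

# Put dashes around every word that sits between <$tag> and </$tag>.
sub dash_words {
    my ($text, $tag) = @_;
    my $inside = 0;    # the flag: are we currently between the tags?
    my $out    = '';
    # The capturing group makes split hand back the tags as pieces too
    for my $piece (split /(<\/?\Q$tag\E>)/i, $text) {
        if    ($piece =~ /^<\Q$tag\E>$/i)   { $inside = 1 }   # opening tag: flag on
        elsif ($piece =~ /^<\/\Q$tag\E>$/i) { $inside = 0 }   # closing tag: flag off
        elsif ($inside) {
            $piece =~ s/(\w+)/-$1-/g;
            $out .= $piece;
        }
        else { $out .= $piece }
    }
    return $out;
}

print dash_words("<b>This is a test\nOkay</b>\n", 'b');
```

Because the flag lives outside the loop over pieces, it does not matter whether the closing tag is on the same line as the opening one.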

And (efb of course) people are going to yell at you for trying to parse html with regexps in the first place.

# ? Jul 16, 2012 20:10

jeeves: May 27, 2001; Deranged Psychopathic
Butler Extraordinaire

spoon0042 posted:

The easiest / laziest thing I can think of is just to set a flag if there's an open tag and then look for a closing tag instead while outputting appropriately.

I didn't think to set a flag and then automatically add dashes while also looking for the closing tag. Much easier than first trying to find the closing tag and then processing the in between text after. Thanks!

And yeah, if this question was for a modules chapter I am sure I would use a module, but it was written for a RE chapter so oh well :v:

# ? Jul 16, 2012 20:20

MacGowans Teeth: Aug 13, 2003

jeeves posted:

I'm teaching myself Perl this summer, and I am working on a problem about regular expression matching.

The problem has me stripping html tags out of text, stuff like turning into new lines, and I have all of that working correctly.

However, the part I am stuck on is putting dashes around all words contained within and tags. I really don't know where to start with this, besides what I've read about matching full words in RE via \b.

Should I do an if-check on if is found, and then throw the rest of the text between until it finds into an array, and then add the dashes around each array item using a foreach? I can't seem to figure out an easy way to do 'look forward' for the closing once the initial is found.

Example input:
This is a test
Okay

Output:
-This- -is- -a- -test-
-Okay-

This isn't exactly what you're asking for, but you might be able to use it. I wrote (i.e., found on Perl Monks and then slightly modified) this to pull XML out of a bunch of log files a while back.

code:

            while (<$fh>) {
                if ( s/.*xml=[\s\S]*?(<$tag>)/$1/ .. s/(<(\/)$tag>).*/$1/ ) {
                    $extractedXML .= $_;
                    last if $2;
                }
            }

As I understand it, if you use .. inside an if like this, it starts returning true as soon as the left hand side matches, and it stays true over multiple lines until the right side matches. This piece of code is expecting to find garbage on each side of the opening and closing tags, hence the s//, but I think this could be used in a plain match as well.
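Here is the plain-match version as a small sketch, with made-up tag names and a made-up sub name. In the boolean context of the if, .. acts as a flip-flop rather than building a list.

```perl
use strict;
use warnings;

# Return the lines from the first one matching $start through the next
# one matching $end, inclusive.
sub lines_between {
    my ($start, $end, @lines) = @_;
    my @kept;
    for my $line (@lines) {
        # False until $start matches, then true up to and including
        # the line where $end matches.
        push @kept, $line if $line =~ $start .. $line =~ $end;
    }
    return @kept;
}

my @log = ('junk', '<item>', 'a', 'b', '</item>', 'more junk');
print join(',', lines_between(qr/<item>/, qr{</item>}, @log)), "\n";
```

One thing to watch: the flip-flop remembers its state between calls, so if the input ends before the closing pattern shows up, the next call starts out already "on".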

# ? Jul 16, 2012 20:47
