The Perl Short Questions Megathread: executable line noise

Nevergirls: Jul 4, 2004; It's not right living this way, not letting others know what's true and what's false.

Rohaq posted:

Quick question, I'm using while(<>) to iterate through a file line by line at the moment, is there a quick way to remove the line being processed from the file at the end of the loop, without messing up the while loop? I'd like to process a line, then remove it from the file after it's done.

You can turn on in-place edit mode in the file, just like using -i on the command line:

code:

BEGIN { $^I = ".bak" }

while (<>) {
  print if /dick/;
}

Only the lines you tell perl to print will end up in the file.
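For anyone who wants the same thing inside a larger script, here is a minimal sketch of scoping the in-place edit to one sub. It assumes nothing beyond core Perl; `keep_lines` and its callback argument are names made up for this example, not part of the post above.

```perl
use strict;
use warnings;

# keep_lines() is a hypothetical helper: it rewrites each named file in
# place and keeps only the lines for which the callback returns true.
sub keep_lines {
    my ( $keep, @files ) = @_;
    local $^I   = '.bak';    # same effect as -i.bak: originals saved as FILE.bak
    local @ARGV = @files;    # <> reads from whatever is listed in @ARGV
    while ( my $line = <> ) {
        # While in-place editing is active, a bare print writes to the
        # replacement file rather than to the terminal.
        print $line if $keep->($line);
    }
    return;
}
```

Calling `keep_lines( sub { $_[0] !~ /^done/ }, 'queue.txt' )` would drop every line starting with "done" from the (hypothetical) queue.txt and leave the original in queue.txt.bak.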

# ? Dec 16, 2011 17:06

Mario Incandenza: Aug 24, 2000; Tell me, small fry, have you ever heard of the golden Triumph Forks?

wntd posted:

BEGIN { $^I = ".bak" }

Neat!

# ? Dec 19, 2011 00:36

Clanpot Shake: Aug 10, 2006; shake shake!

At my work we process very large XML files. They can be up to 1GB and are usually linearized or have only a couple of line breaks. We use Java's StAX to process the document, going through and replacing placeholder attributes with their correct value, which takes 30-40 minutes for the largest files.

I want to replace this with a Perl script using regular expressions. I've gotten them working, but only if I first pretty print the XML, which takes 7-10 minutes to do. It's still a net gain, but it would be better if I had something that was just one step. Is there a Perl utility out there that handles XML faster than Java that I should look into?

# ? Dec 29, 2011 21:33

JawnV6: Jul 4, 2004; So hot ...

Clanpot Shake posted:

At my work we process very large XML files.

I want to replace this with a Perl script using regular expressions.

No. No you don't.

# ? Dec 29, 2011 22:03

Anaconda Rifle: Mar 23, 2007; Yam Slacker

You might want to try one of the many XML parsers on CPAN. Perl's own FAQ advises against using regular expressions with markup languages.

# ? Dec 29, 2011 22:09

Ursine Catastrophe: Nov 9, 2009; It's a lovely morning in the void and you are a horrible lady-in-waiting.

don't ask how i know; Dinosaur Gum

JawnV6 posted:

No. No you don't.

This has probably been posted in this thread, somewhere, but it still seems obligatory.

# ? Jan 1, 2012 10:59

uG: Apr 23, 2003; by Ralp

edit: n/m I figured it out as soon as I posted this

# ? Jan 2, 2012 18:03

Anaconda Rifle: Mar 23, 2007; Yam Slacker

uG posted:

edit: n/m I figured it out as soon as I posted this

Post it anyway. Maybe it'll help someone with the same problem.

# ? Jan 2, 2012 19:09

TiMBuS: Sep 25, 2007; LOL WUT?

Clanpot Shake posted:

At my work we process very large XML files. They can be up to 1GB and are usually linearized or have only a couple of line breaks. We use Java's StAX to process the document, going through and replacing placeholder attributes with their correct value, which takes 30-40 minutes for the largest files.

I want to replace this with a Perl script using regular expressions. I've gotten them working, but only if I first pretty print the XML, which takes 7-10 minutes to do. It's still a net gain, but it would be better if I had something that was just one step. Is there a Perl utility out there that handles XML faster than Java that I should look into?

pfft, StAX is pretty fast dude. The only thing to compare to it is probably this: http://search.cpan.org/~codechild/XML-Bare-0.45/Bare.pm
See the benchmarks at the bottom.

If you really want speed, the best XML parser I can think of is the one in the D programming language: http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-updated-graphs-with-rapidxml/. I don't see you considering D too seriously, although it is very comparable to Java in syntax (it's almost like a superset of Java).
Note this is a sorta old benchmark, things might have changed and stax could be even faster?? In the end you can only try them all and find out for yourself.

e:whoops broken url

TiMBuS fucked around with this message at 05:15 on Jan 3, 2012

# ? Jan 3, 2012 04:06

Rohaq: Aug 11, 2006

TiMBuS posted:

pfft, StAX is pretty fast dude. The only thing to compare to it is probably this: http://search.cpan.org/~codechild/XML-Bare-0.45/Bare.pm
See the benchmarks at the bottom.

This is going to be super useful for a project I've got tomorrow, I was going to use XML::Simple, but that has a few niggles that make it annoying at times. Thanks!

# ? Jan 3, 2012 06:26

tef: May 30, 2004; -> some l-system crap ->

I've also heard of XML::Twig: http://search.cpan.org/~mirod/XML-Twig-3.39/Twig.pm

# ? Jan 3, 2012 08:39

uG: Apr 23, 2003; by Ralp

Anaconda Rifle posted:

Post it anyway. Maybe it'll help someone with the same problem.

I couldn't figure out why PAUSE was showing a module I just uploaded as 'Text' instead of the full namespace. I forgot it uses the uploaded archive's file name as the distro name.

# ? Jan 3, 2012 16:33

Clanpot Shake: Aug 10, 2006; shake shake!

OriginalPseudonym posted:

This has probably been posted in this thread, somewhere, but it still seems obligatory.

This is hilarious. In my defense, I'm looking for a very specific, known attribute - not trying to parse the entire document. Just looking for one distinct thing and replacing it (many times over). One of the answers in that link says it is sometimes acceptable to use regexes for a one-time use of a known dataset, which is kind of what this is.

# ? Jan 4, 2012 17:23

Rohaq: Aug 11, 2006

Clanpot Shake posted:

This is hilarious. In my defense, I'm looking for a very specific, known attribute - not trying to parse the entire document. Just looking for one distinct thing and replacing it (many times over). One of the answers in that link says it is sometimes acceptable to use regexes for a one-time use of a known dataset, which is kind of what this is.

You can't just do something like this?

code:

perl -pi -w -e 's/<tag>dataold<\/tag>/<tag>datanew<\/tag>/g' *.xml

# ? Jan 4, 2012 19:13

Clanpot Shake: Aug 10, 2006; shake shake!

Rohaq posted:

You can't just do something like this?
code:
perl -pi -w -e 's/<tag>dataold<\/tag>/<tag>datanew<\/tag>/g' *.xml

It's a bit more complicated. I'm counting things, and then replacing placeholder sequence numbers with actual sequence numbers. It requires 2 passes and some sort of structure to hold the sequence information. This can't be done at the time the XML is generated, unfortunately.
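A rough sketch of how the two passes could work on a linearized file without pretty-printing it first: set the input record separator `$/` to `>` so Perl reads one tag-sized chunk at a time. Everything specific here is an assumption made for illustration, including the `seq="@@SEQ@@"` placeholder, the `N/total` replacement format and both sub names. It is still a regex over XML, so it only holds up if the placeholder never appears where it shouldn't.

```perl
use strict;
use warnings;

# Assumption: placeholders look like seq="@@SEQ@@". The attribute name,
# the marker and both subs below are invented for this sketch.
my $PLACEHOLDER = qr/\bseq="\@\@SEQ\@\@"/;

# Pass 1: count placeholders without loading the whole file.
sub count_placeholders {
    my ($path) = @_;
    local $/ = '>';    # read up to each '>' instead of each newline
    open my $in, '<', $path or die "Cannot read $path: $!";
    my $count = 0;
    while ( my $chunk = <$in> ) {
        $count++ while $chunk =~ /$PLACEHOLDER/g;
    }
    close $in;
    return $count;
}

# Pass 2: copy $src to $dst, turning the Nth placeholder into seq="N/total".
sub number_placeholders {
    my ( $src, $dst ) = @_;
    my $total = count_placeholders($src);
    local $/ = '>';
    open my $in,  '<', $src or die "Cannot read $src: $!";
    open my $out, '>', $dst or die "Cannot write $dst: $!";
    my $n = 0;
    while ( my $chunk = <$in> ) {
        $chunk =~ s{$PLACEHOLDER}{ sprintf 'seq="%d/%d"', ++$n, $total }ge;
        print {$out} $chunk;
    }
    close $in;
    close $out or die "Cannot finish $dst: $!";
    return $total;
}
```

Because the placeholder itself contains no `>`, it can never be split across two chunks, which is what makes the chunked read safe for this particular pattern. Memory use stays proportional to the longest tag plus the text before it, not to the file size.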

# ? Jan 4, 2012 21:14

qntm: Jun 17, 2009

Is there a way to read an arbitrary set of command line options into a hash using Getopt::Long or something else? I want to be able to consume an arbitrary command line with any combination of arguments and return a hash containing all the arguments and values that were specified. Something like:

code:

perl opts.pl -a -b -c -c --d=e --f g

Resulting hash is

code:

(
 "a" => 1,
 "b" => 1,
 "c" => 2,
 "d" => "e",
 "f" => "g",
)

Then I can validate the hash myself.

# ? Jan 5, 2012 13:13

Mithaldu: Sep 25, 2007; Let's cuddle.

qntm posted:

Is there a way to read an arbitrary set of command line options into a hash using Getopt::Long or something else? I want to be able to consume an arbitrary command line with any combination of arguments and return a hash containing all the arguments and values that were specified.

Sounds like you want: Getopt::Long::Descriptive

(When in doubt look on metacpan at the list of modules by rjbs, dagolden, mst.)

# ? Jan 5, 2012 13:22

qntm: Jun 17, 2009

Mithaldu posted:

Sounds like you want: Getopt::Long::Descriptive

(When in doubt look on metacpan at the list of modules by rjbs, dagolden, mst.)

I don't see how Getopt::Long::Descriptive can do what I need it to do. You still need to specify an explicit list of options.

qntm fucked around with this message at 13:37 on Jan 5, 2012

# ? Jan 5, 2012 13:35

Mithaldu: Sep 25, 2007; Let's cuddle.

qntm posted:

I don't see how Getopt::Long::Descriptive can do what I need it to do. You still need to specify an explicit list of options.

Ah, sorry, i misunderstood you. I'm not entirely sure that what you're asking has been done yet. Command line parameters aren't generally interpreted by having one generic parser for the "language" of options, but one that is built from the specs you configure it with.

Why exactly would you need something like this?

Maybe you can generate the specs automatically?

# ? Jan 5, 2012 14:05

Filburt Shellbach: Nov 6, 2007; Apni tackat say tujay aaj mitta juu gaa!

Getopt::Whatever. I use it all the time for very short scripts that I don't expect anyone else to use.

There's also Getopt::Casual.

# ? Jan 5, 2012 14:15

prefect: Sep 11, 2001; No one, Woodhouse. No one.; Dead Man’s Band

Filburt Shellbach posted:

Getopt::Whatever. I use it all the time for very short scripts that I don't expect anyone else to use.

There's also Getopt::Casual.

It's this kind of thing that makes me all wistful that I could get back to Perl. CPAN is the best ever.

# ? Jan 5, 2012 15:34

qntm: Jun 17, 2009

Mithaldu posted:

Ah, sorry, i misunderstood you. I'm not entirely sure that what you're asking has been done yet. Command line parameters aren't generally interpreted by having one generic parser for the "language" of options, but one that is built from the specs you configure it with.

Why exactly would you need something like this?

Maybe you can generate the specs automatically?

I'm building a command-line wrapper for a pre-existing Perl API. It's a bunch of .pl scripts. Each .pl script corresponds to a single API method. Each API method

(1) accepts its parameters in the form of a simple hash,
(2) has its own perfectly adequate procedures for validating input parameters and
(3) doesn't expose the list of acceptable parameters programmatically.

So obviously, the low-effort solution is to have each .pl script convert command line arguments into a hash, pass the hash to the API method, retrieve the results from the API and return the results to the user. The only difference between each script would be the name of the method to invoke.

It surprises me that "just give me all of the command line options in a hash; I'll deal with them myself" is an unusual request. Especially in Perl, which has the very common "sub blah { my %args = @_; ... }" idiom, which applies precisely the same concept to subroutines.
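For readers who haven't met that idiom, a small sketch of it with defaults and a required-argument check. `make_label` and its argument names are invented for the example and have nothing to do with the API being wrapped.

```perl
use strict;
use warnings;

# make_label() is a hypothetical sub showing the "my %args = @_" idiom.
sub make_label {
    # Defaults come first; any pairs the caller passes override them.
    my %args = ( sep => ': ', @_ );

    # The sub validates its own input, as the API methods are said to do.
    defined $args{$_} or die "missing argument: $_\n" for qw(name value);

    return join $args{sep}, $args{name}, $args{value};
}

print make_label( name => 'op', value => 'search' ), "\n";
```

The caller can pass the pairs in any order, and a hash built from the command line could be flattened straight into the call with `make_label(%opts)`.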

# ? Jan 5, 2012 16:00

Filburt Shellbach: Nov 6, 2007; Apni tackat say tujay aaj mitta juu gaa!

qntm posted:

It surprises me that "just give me all of the command line options in a hash; I'll deal with them myself" is an unusual request.

It's not.

# ? Jan 5, 2012 16:29

Mithaldu: Sep 25, 2007; Let's cuddle.

qntm posted:

Thanks for explaining, that makes sense.

qntm posted:

It surprises me that "just give me all of the command line options in a hash; I'll deal with them myself" is an unusual request. Especially in Perl, which has the very common "sub blah { my %args = @_; ... }" idiom, which applies precisely the same concept to subroutines.

Well, consider this parameter string:

code:

--prefix -s

How would a generic parser decide which of the following hashes this is meant to be?

code:

{
    prefix => 1,
    s      => 1,
}

{
    prefix => '-s'
}

You can of course define a restricted set of parsing rules, but most Getopt modules try to implement the whole range of options.
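As an illustration of one such restricted rule set, here is a minimal core-only sketch. The rules are an assumption chosen for this example: `--name=value` sets a value, an option followed by a token that does not start with a dash takes that token as its value, and any other option is a counted flag. `parse_args` is a made-up name, not a CPAN function.

```perl
use strict;
use warnings;

# parse_args() is a hypothetical helper implementing the three rules above.
sub parse_args {
    my @argv = @_;
    my %opt;
    while (@argv) {
        my $arg = shift @argv;

        # Every token consumed here must look like -x, --name or --name=value.
        $arg =~ /^--?([^-=][^=]*)(?:=(.*))?$/s
            or die "Unexpected argument: $arg\n";
        my ( $name, $value ) = ( $1, $2 );

        if ( defined $value ) {
            $opt{$name} = $value;         # --d=e
        }
        elsif ( @argv && $argv[0] !~ /^-/ ) {
            $opt{$name} = shift @argv;    # --f g
        }
        else {
            $opt{$name}++;                # bare flag; repeats are counted
        }
    }
    return \%opt;
}
```

Under these rules `--prefix -s` always means two flags, so a value that begins with a dash has to be written as `--prefix=-s`. That is the trade-off for not needing a spec.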

# ? Jan 5, 2012 17:14

RICHUNCLEPENNYBAGS: Dec 21, 2010

I'm just finishing up Learning Perl and I'm wondering where I should go next. I guess there is a follow-up in Intermediate Perl, but it has pretty mixed reviews. I also have copies lying around that I borrowed of Programming Perl and Advanced Perl Programming ed. 1, but the former seems more like a reference book than an instructional book and the latter has some interesting content but is rather obviously antiquated from leafing through.

I am also pretty new to any sort of programming or scripting beyond HTML or bash so if any volume would be informative w/r/t general principles of programming as well that'd be great.

# ? Jan 17, 2012 06:24

syphon: Jan 1, 2001

I really liked Intermediate Perl. In fact, I think one of the most useful things I've ever learned in Perl is a mastery of complex data structures with judicious use of Data::Dumper (which that book covers).
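A quick sketch of what that looks like in practice, using only the core Data::Dumper module. The `%team` data is invented for the example.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # print hash keys in a stable order
$Data::Dumper::Indent   = 1;    # compact, fixed-width indentation

# Invented sample data: a hash of hashes that contain array refs.
my %team = (
    alice => { role => 'dev', langs => [ 'perl', 'c' ] },
    bob   => { role => 'ops', langs => ['shell'] },
);

# Autovivification and dereferencing are where nested structures get
# confusing; dumping the result shows exactly what was built.
push @{ $team{bob}{langs} },   'perl';
push @{ $team{carol}{langs} }, 'java';    # creates carol's entry on the fly

my $dump = Dumper( \%team );
print $dump;
```

Passing a reference (`\%team`) rather than the hash itself keeps the output as one nested structure instead of a flat list of `$VAR1`, `$VAR2`, and so on.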

# ? Jan 17, 2012 07:19

RICHUNCLEPENNYBAGS: Dec 21, 2010

Should I bother with the first-ed Panther book? I mean it's obviously out of date but on the other hand it's sitting right here on my desk. I guess if Intermediate Perl is much better I will just give in and go with that instead though.

# ? Jan 17, 2012 08:14

Mithaldu: Sep 25, 2007; Let's cuddle.

RICHUNCLEPENNYBAGS posted:

I'm wondering where I should go next.

You'll likely find something to your taste on http://perl-tutorial.org

# ? Jan 17, 2012 09:34

qntm: Jun 17, 2009

RICHUNCLEPENNYBAGS posted:

I'm just finishing up Learning Perl and I'm wondering where I should go next. I guess there is a follow-up in Intermediate Perl, but it has pretty mixed reviews. I also have copies lying around that I borrowed of Programming Perl and Advanced Perl Programming ed. 1, but the former seems more like a reference book than an instructional book and the latter has some interesting content but is rather obviously antiquated from leafing through.

I am also pretty new to any sort of programming or scripting beyond HTML or bash so if any volume would be informative w/r/t general principles of programming as well that'd be great.

I found Programming Perl close to useless, even as a reference. It's incredibly verbose and poorly structured, it spends far too much time attempting to explain features of Perl using long paragraphs of clever analogies instead of giving a single example line of code, it's full of bad, smug puns and it utterly fails to explain the important concepts with any clarity.

# ? Jan 17, 2012 14:06

MacGowans Teeth: Aug 13, 2003

qntm posted:

I found Programming Perl close to useless, even as a reference. It's incredibly verbose and poorly structured, it spends far too much time attempting to explain features of Perl using long paragraphs of clever analogies instead of giving a single example line of code, it's full of bad, smug puns and it utterly fails to explain the important concepts with any clarity.

I've only been using Perl for about a year, but I found Programming Perl useful. I haven't read it cover to cover, just the main sections, but every time I go back to it and browse, I find something interesting.

# ? Jan 17, 2012 16:11

Rohaq: Aug 11, 2006

Anybody here used the CGI module in Perl before?

I've loaded some info up using CGI::ReadParse:

code:

use CGI ':standard';
use CGI::Carp 'fatalsToBrowser'; 

our %input;
CGI::ReadParse(*input);

And want to load up multiple arguments with the same name, i.e.

code:

script.pl op=search_names search=dave search=bill

Which shows as the following once it's loaded in Perl:

code:

  DB<1> x %input
0  'op'
1  'search_names'
2  'search'
3  "dave\c@bill"

Obviously I could turn this into an array with

code:

@search_args = split('\c@',$input{search});

but I'm sure there's a better way of doing this with the CGI module.

Any suggestions?
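Some background on what is going on: `\c@` is the NUL character, which is what CGI.pm's ReadParse uses to pack repeated values into one string. CGI.pm's own `param('search')`, called in list context, returns the values as a list without any splitting. The sketch below only shows the unpacking step in core Perl, with a hand-built `%input` standing in for what ReadParse produces; `unpack_params` is a made-up helper name.

```perl
use strict;
use warnings;

# Stand-in for the hash ReadParse fills in, built by hand so the sketch
# runs without a web server: repeated values are joined by "\0" (NUL).
my %input = ( op => 'search_names', search => "dave\0bill" );

# unpack_params() is a hypothetical helper: it turns every value into an
# array ref, so single and repeated parameters are handled the same way.
sub unpack_params {
    my (%packed) = @_;
    my %unpacked;
    for my $key ( keys %packed ) {
        $unpacked{$key} = [ split /\0/, $packed{$key} ];
    }
    return \%unpacked;
}

my $param = unpack_params(%input);
print "searching for: @{ $param->{search} }\n";
```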

# ? Jan 17, 2012 16:44

Mithaldu: Sep 25, 2007; Let's cuddle.

Rohaq posted:

Anybody here used the CGI module in Perl before?

Yes, it's terribly old and crap. You'll have more fun by using either Plack directly (and Plack::Request) or Web::Simple or Dancer.

Also, in the debugger you want to run: x \%input

# ? Jan 17, 2012 17:09

Rohaq: Aug 11, 2006

Mithaldu posted:

Yes, it's terribly old and crap. You'll have more fun by using either Plack directly (and Plack::Request) or Web::Simple or Dancer.

Also, in the debugger you want to run: x \%input

It doesn't look like Plack or Web::Simple do anything I need that CGI doesn't do. My script connects to a database via DBI and returns sections of HTML for my jQuery page to use through AJAX, so all my script needs to give a poo poo about is data incoming, pulling data from the database, then printing stuff out for jQuery to play with.

Also, x \%input just returns the following:

code:

0  HASH(0x383a840)
   'op' => "search"
   'search' => 'dave\c@bill'

Which doesn't solve the issue of splitting the search inputs, so it's not much help to me, sorry.

Using split('\c@', $input{search}) for now; if anyone knows any better, please let me know.

# ? Jan 17, 2012 17:56

Mithaldu: Sep 25, 2007; Let's cuddle.

Rohaq posted:

It doesn't look like Plack or Web::Simple do anything I need that CGI doesn't do.

Look again:

search.cgi

code:

#!/usr/bin/env perl

use Web::Simple;

__PACKAGE__->run_if_script;

sub dispatch_request { '/ + %op=&@search=' => \&do_search }

sub html { [ 200, [ 'Content-type', 'text/plain' ], [@_] ] }

sub do_search {
    my ( $op, $search ) = @_;
    my $searches = join ', ', @{$search};
    return html( "Hello world! Today we are doing $op with: $searches" );
}

Mithaldu fucked around with this message at 19:07 on Jan 17, 2012

# ? Jan 17, 2012 19:03

Filburt Shellbach: Nov 6, 2007; Apni tackat say tujay aaj mitta juu gaa!

Why are you touting Web::Simple instead of a real framework that has both users and developers?

# ? Jan 18, 2012 02:58

Mithaldu: Sep 25, 2007; Let's cuddle.

Filburt Shellbach posted:

Why are you touting Web::Simple instead of a real framework that has both users and developers?

Haha, tell that to mst and watch what happens. :smug:

Anyhow, reasoning: He's using CGI.pm and AJAX, meaning he'd not be too thrilled about having each call take a second minimum on top of the db time, when running with Catalyst. Dancer v1 would be the next possible, but thanks to its current internal structure i will not recommend it to anyone, as it's a liability. That leaves pure Plack as a last option, but the code for that wouldn't have been as short.

# ? Jan 18, 2012 09:17

Rohaq: Aug 11, 2006

I'm trying to read around Plack now; is it just me, or is it running as a service in the background, and then making calls to that service in order to avoid calling the interpreter every time the script is called?

I mean, I'd like to be able to do it; having a single DB connector to handle everything would be super, but at this moment, I don't have the time to request another service to be enabled on the server - this is a work project, and I have a couple of weeks to produce this - not enough time to get a change order put through.

Or maybe I'm just wildly misinterpreting Plack. I'm having some real trouble finding decent workable tutorials around.

# ? Jan 18, 2012 14:32

Mithaldu: Sep 25, 2007; Let's cuddle.

Rohaq posted:

I'm trying to read around Plack now; is it just me, or is it running as a service in the background, and then making calls to that service in order to avoid calling the interpreter every time the script is called?

I mean, I'd like to be able to do it; having a single DB connector to handle everything would be super, but at this moment, I don't have the time to request another service to be enabled on the server - this is a work project, and I have a couple of weeks to produce this - not enough time to get a change order put through.

Or maybe I'm just wildly misinterpreting Plack. I'm having some real trouble finding decent workable tutorials around.

Plack is a middleman. You can either use plackup (or some other Plack server) to start up a server on a free port that your web server can reverse proxy to, or you can use Plack::Handler::CGI to make a small script.cgi file that you can just run as you're used to in normal CGI mode. Both will be fast, just that the former will be faster.

Oh, also, the example with Web::Simple i gave above does automatic double duty. You can either go:

# plackup script.pl

And get a server, or you can just drop it in your web root as script.cgi and let your web server run it. It will automatically detect that it's being used as CGI and run only once.
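To make the "middleman" idea concrete: underneath any framework, a PSGI application is just a code reference that takes an environment hash and returns an array ref of status, headers and body. Here is a minimal sketch of the search example in that raw form. It deliberately loads no modules, and the hand-rolled query parsing and the names `query_params` and `search_app` are assumptions made for illustration (Plack::Request would normally do this part).

```perl
use strict;
use warnings;

# query_params() is a hypothetical helper: it parses a query string into
# a hash of array refs, so repeated keys keep all their values.
sub query_params {
    my ($query) = @_;
    my %param;
    for my $pair ( split /[&;]/, $query // '' ) {
        next unless length $pair;
        my ( $key, $value ) = map {
            my $s = $_;
            $s =~ tr/+/ /;                              # '+' encodes a space
            $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # undo %XX escapes
            $s;
        } split /=/, $pair, 2;
        push @{ $param{$key} }, $value // '';
    }
    return \%param;
}

# The PSGI application itself: environment hash in, three-element
# response out.
sub search_app {
    my ($env) = @_;
    my $param    = query_params( $env->{QUERY_STRING} );
    my $op       = $param->{op} ? $param->{op}[0] : 'none';
    my @searches = @{ $param->{search} || [] };
    my $body     = "op=$op; searching for: " . join( ', ', @searches ) . "\n";
    return [ 200, [ 'Content-Type' => 'text/plain' ], [$body] ];
}

# In an app.psgi file the last line would hand the code ref to the server:
#   \&search_app;
```

Since the app is only a code ref, the same file can sit behind plackup as a long-running server or behind Plack::Handler::CGI as a one-shot CGI script, which is the double duty described above.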

# ? Jan 18, 2012 15:26

syphon: Jan 1, 2001

I'm sure people are gonna hate me for saying this... but I've used CGI + HTML::Template + CGI::Ajax for smaller webapps with great success before. I keep trying to get into Catalyst or Plack (this was before I knew about Dancer) and the added overhead just seemed tremendous for what I was trying to do.

# ? Jan 18, 2012 18:14

Ninja Rope: Oct 22, 2005; Wee.

syphon posted:

I'm sure people are gonna hate me for saying this... but I've used CGI + HTML::Template + CGI::Ajax for smaller webapps with great success before. I keep trying to get into Catalyst or Plack (this was before I knew about Dancer) and the added overhead just seemed tremendous for what I was trying to do.

Or CGI::Stateless, CGI::Session, CGI::Cookie, HTML::Template, and AnyEvent::FCGI? :whatup:

# ? Jan 18, 2012 22:31
