Nevergirls
Jul 4, 2004

It's not right living this way, not letting others know what's true and what's false.

Rohaq posted:

Quick question, I'm using while(<>) to iterate through a file line by line at the moment, is there a quick way to remove the line being processed from the file at the end of the loop, without messing up the while loop? I'd like to process a line, then remove it from the file after it's done.

You can turn on in-place edit mode in the file, just like using -i on the command line:

code:
BEGIN { $^I = ".bak" }

while (<>) {
  print if /dick/;
}
Only the lines you tell perl to print will end up in the file.
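Here is a minimal, self-contained sketch of the same idea that can be run without touching a real file. The file name, its contents and the `/^keep/` test are made up for illustration; the point is that any line you choose not to print (for example, one you have finished processing) disappears from the file, while the original survives as a `.bak` copy.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical demo file; the name and contents are invented for this sketch.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/queue.txt";
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "keep this\n", "drop this\n", "keep that\n";
close $out;

{
    # Setting $^I at run time is enough, as long as it happens
    # before <> opens the first file.
    local $^I   = '.bak';    # same effect as -i.bak on the command line
    local @ARGV = ($file);   # <> reads the files named in @ARGV
    while (<>) {
        print if /^keep/;    # printed lines are written back into the file
    }
}

# Read the file back to show what survived.
open my $in, '<', $file or die "Cannot read $file: $!";
my @remaining = <$in>;
close $in;
print @remaining;
```

Note that the file is rewritten as a whole while the loop runs, so this suits "filter the file in one pass" better than "delete one line at a time while something else reads the file".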

Mario Incandenza
Aug 24, 2000

Tell me, small fry, have you ever heard of the golden Triumph Forks?

wntd posted:

BEGIN { $^I = ".bak" }
Neat!

Clanpot Shake
Aug 10, 2006
shake shake!

At my work we process very large XML files. They can be up to 1GB and are usually linearized or have only a couple of line breaks. We use Java's StAX to process the document, going through and replacing placeholder attributes with their correct value, which takes 30-40 minutes for the largest files.

I want to replace this with a Perl script using regular expressions. I've gotten them working, but only if I first pretty print the XML, which takes 7-10 minutes to do. It's still a net gain, but it would be better if I had something that was just one step. Is there a Perl utility out there that handles XML faster than Java that I should look into?

JawnV6
Jul 4, 2004

So hot ...

Clanpot Shake posted:

At my work we process very large XML files.

I want to replace this with a Perl script using regular expressions.

No. No you don't.

Anaconda Rifle
Mar 23, 2007

Yam Slacker
You might want to try one of the many XML parsers on CPAN. Perl's own FAQ advises against using regular expressions with markup languages.

Ursine Catastrophe
Nov 9, 2009

It's a lovely morning in the void and you are a horrible lady-in-waiting.



don't ask how i know

Dinosaur Gum

JawnV6 posted:

No. No you don't.

This has probably been posted in this thread, somewhere, but it still seems obligatory.

uG
Apr 23, 2003

by Ralp
edit: n/m I figured it out as soon as I posted this

Anaconda Rifle
Mar 23, 2007

Yam Slacker

uG posted:

edit: n/m I figured it out as soon as I posted this

Post it anyway. Maybe it'll help someone with the same problem.

TiMBuS
Sep 25, 2007

LOL WUT?

Clanpot Shake posted:

At my work we process very large XML files. They can be up to 1GB and are usually linearized or have only a couple of line breaks. We use Java's StAX to process the document, going through and replacing placeholder attributes with their correct value, which takes 30-40 minutes for the largest files.

I want to replace this with a Perl script using regular expressions. I've gotten them working, but only if I first pretty print the XML, which takes 7-10 minutes to do. It's still a net gain, but it would be better if I had something that was just one step. Is there a Perl utility out there that handles XML faster than Java that I should look into?

pfft, StAX is pretty fast dude. The only thing to compare to it is probably this: http://search.cpan.org/~codechild/XML-Bare-0.45/Bare.pm
See the benchmarks at the bottom.

If you really want speed, the best XML parser I can think of is the one in the D programming language: http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-updated-graphs-with-rapidxml/. I don't see you considering D too seriously, although it is very comparable to Java in syntax (it's almost like a superset of Java).
Note this is a sorta old benchmark; things might have changed and StAX could be even faster. In the end you can only try them all and find out for yourself.

e:whoops broken url

TiMBuS fucked around with this message at 05:15 on Jan 3, 2012

Rohaq
Aug 11, 2006

TiMBuS posted:

pfft, StAX is pretty fast dude. The only thing to compare to it is probably this: http://search.cpan.org/~codechild/XML-Bare-0.45/Bare.pm
See the benchmarks at the bottom.

This is going to be super useful for a project I've got tomorrow, I was going to use XML::Simple, but that has a few niggles that make it annoying at times. Thanks!

tef
May 30, 2004

-> some l-system crap ->
I've also heard of XML::Twig: http://search.cpan.org/~mirod/XML-Twig-3.39/Twig.pm

uG
Apr 23, 2003

by Ralp

Anaconda Rifle posted:

Post it anyway. Maybe it'll help someone with the same problem.
I couldn't figure out why PAUSE was showing a module I just uploaded as 'Text' instead of the full namespace. I forgot it uses the uploaded archive's file name as the distro name.

Clanpot Shake
Aug 10, 2006
shake shake!

OriginalPseudonym posted:

This has probably been posted in this thread, somewhere, but it still seems obligatory.
This is hilarious. In my defense, I'm looking for a very specific, known attribute - not trying to parse the entire document. Just looking for one distinct thing and replacing it (many times over). One of the answers in that link says it is sometimes acceptable to use regexes for a one-time use of a known dataset, which is kind of what this is.

Rohaq
Aug 11, 2006

Clanpot Shake posted:

This is hilarious. In my defense, I'm looking for a very specific, known attribute - not trying to parse the entire document. Just looking for one distinct thing and replacing it (many times over). One of the answers in that link says it is sometimes acceptable to use regexes for a one-time use of a known dataset, which is kind of what this is.

You can't just do something like this?
code:
perl -pi -w -e 's/<tag>dataold<\/tag>/<tag>datanew<\/tag>/g' *.xml

Clanpot Shake
Aug 10, 2006
shake shake!

Rohaq posted:

You can't just do something like this?
code:
perl -pi -w -e 's/<tag>dataold<\/tag>/<tag>datanew<\/tag>/g' *.xml
It's a bit more complicated. I'm counting things, and then replacing placeholder sequence numbers with actual sequence numbers. It requires 2 passes and some sort of structure to hold the sequence information. This can't be done at the time the XML is generated, unfortunately.
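For what it's worth, the pretty-printing step may be avoidable: setting Perl's input record separator `$/` to `>` makes each read return roughly one tag, even when the whole file is a single line. Below is a rough sketch of the two-pass idea under made-up assumptions. The element name `item`, the attribute `seq`, the `?` placeholder and the "N of TOTAL" value are all hypothetical, and a `>` inside an attribute value or a comment would break the chunking, so this is not a substitute for a real parser.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical linearized input: one long line, seq="?" marks a placeholder.
my $dir = tempdir(CLEANUP => 1);
my ($src, $dst) = ("$dir/in.xml", "$dir/out.xml");
open my $fh, '>', $src or die "Cannot write $src: $!";
print {$fh} '<doc><item seq="?"/><note>x</note><item seq="?"/><item seq="?"/></doc>';
close $fh;

# Pass 1: count placeholders, reading one '>'-terminated chunk at a time
# so memory use stays small however long the single line is.
my $total = 0;
{
    local $/ = '>';
    open my $in, '<', $src or die "Cannot read $src: $!";
    while (my $chunk = <$in>) {
        $total++ if $chunk =~ /<item\b[^>]*\bseq="\?"/;
    }
    close $in;
}

# Pass 2: copy the file, replacing each placeholder with its sequence number.
my $next = 0;
{
    local $/ = '>';
    open my $in,  '<', $src or die "Cannot read $src: $!";
    open my $out, '>', $dst or die "Cannot write $dst: $!";
    while (my $chunk = <$in>) {
        $chunk =~ s/(<item\b[^>]*\bseq=")\?(")/$1 . ++$next . " of $total" . $2/e;
        print {$out} $chunk;
    }
    close $in;
    close $out or die "Cannot close $dst: $!";
}

# Slurp the result back in to inspect it.
open my $check, '<', $dst or die "Cannot read $dst: $!";
my $result = do { local $/; <$check> };
close $check;
print "$result\n";
```

Whether this beats StAX on a 1GB file is something only a benchmark on the real data can answer.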

qntm
Jun 17, 2009
Is there a way to read an arbitrary set of command line options into a hash using Getopt::Long or something else? I want to be able to consume an arbitrary command line with any combination of arguments and return a hash containing all the arguments and values that were specified. Something like:

code:
perl opts.pl -a -b -c -c --d=e --f g
Resulting hash is

code:
(
 "a" => 1,
 "b" => 1,
 "c" => 2,
 "d" => "e",
 "f" => "g",
)
Then I can validate the hash myself.

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

qntm posted:

Is there a way to read an arbitrary set of command line options into a hash using Getopt::Long or something else? I want to be able to consume an arbitrary command line with any combination of arguments and return a hash containing all the arguments and values that were specified.

Sounds like you want: Getopt::Long::Descriptive

(When in doubt look on metacpan at the list of modules by rjbs, dagolden, mst.)

qntm
Jun 17, 2009

Mithaldu posted:

Sounds like you want: Getopt::Long::Descriptive

(When in doubt look on metacpan at the list of modules by rjbs, dagolden, mst.)

I don't see how Getopt::Long::Descriptive can do what I need it to do. You still need to specify an explicit list of options.

qntm fucked around with this message at 13:37 on Jan 5, 2012

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

qntm posted:

I don't see how Getopt::Long::Descriptive can do what I need it to do. You still need to specify an explicit list of options.

Ah, sorry, i misunderstood you. I'm not entirely sure that what you're asking has been done yet. Command line parameters aren't generally interpreted by having one generic parser for the "language" of options, but one that is built from the specs you configure it with.

Why exactly would you need something like this?

Maybe you can generate the specs automatically?

Filburt Shellbach
Nov 6, 2007

Apni tackat say tujay aaj mitta juu gaa!
Getopt::Whatever. I use it all the time for very short scripts that I don't expect anyone else to use.

There's also Getopt::Casual.

prefect
Sep 11, 2001

No one, Woodhouse.
No one.




Dead Man’s Band

Filburt Shellbach posted:

Getopt::Whatever. I use it all the time for very short scripts that I don't expect anyone else to use.

There's also Getopt::Casual.

It's this kind of thing that makes me all wistful that I could get back to Perl. CPAN is the best ever.

qntm
Jun 17, 2009

Mithaldu posted:

Ah, sorry, i misunderstood you. I'm not entirely sure that what you're asking has been done yet. Command line parameters aren't generally interpreted by having one generic parser for the "language" of options, but one that is built from the specs you configure it with.

Why exactly would you need something like this?

Maybe you can generate the specs automatically?

I'm building a command-line wrapper for a pre-existing Perl API. It's a bunch of .pl scripts. Each .pl script corresponds to a single API method. Each API method

(1) accepts its parameters in the form of a simple hash,
(2) has its own perfectly adequate procedures for validating input parameters and
(3) doesn't expose the list of acceptable parameters programmatically.

So obviously, the low-effort solution is to have each .pl script convert command line arguments into a hash, pass the hash to the API method, retrieve the results from the API and return the results to the user. The only difference between each script would be the name of the method to invoke.

It surprises me that "just give me all of the command line options in a hash; I'll deal with them myself" is an unusual request. Especially in Perl, which has the very common "sub blah { my %args = @_; ... }" idiom, which applies precisely the same concept to subroutines.

Filburt Shellbach
Nov 6, 2007

Apni tackat say tujay aaj mitta juu gaa!

qntm posted:

It surprises me that "just give me all of the command line options in a hash; I'll deal with them myself" is an unusual request.

It's not.

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

qntm posted:

:words:
Thanks for explaining, that makes sense.

qntm posted:

It surprises me that "just give me all of the command line options in a hash; I'll deal with them myself" is an unusual request. Especially in Perl, which has the very common "sub blah { my %args = @_; ... }" idiom, which applies precisely the same concept to subroutines.
Well, consider this parameter string:
code:
--prefix -s
How would a generic parser decide which of the following hashes this is meant to be?
code:
{
    prefix => 1,
    s      => 1,
}

{
    prefix => '-s'
}
You can of course define a restricted set of parsing rules, but most Getopt modules try to implement the whole range of options.
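As a sketch of what such a restricted rule set could look like, here is a hand-rolled parser. `parse_args` is a hypothetical helper, not part of any CPAN module, and its rules are one arbitrary choice: `--name=value` sets a value, `--name value` takes the next word as the value only if that word does not start with a dash, and anything else is a flag whose repeats are counted. Under these rules the ambiguous `--prefix -s` is always read as two flags.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of command line words into a hash.
sub parse_args {
    my @argv = @_;
    my %opt;
    while (@argv) {
        my $arg = shift @argv;
        if ($arg =~ /^--?([^=]+)=(.*)$/s) {        # --name=value
            $opt{$1} = $2;
        }
        elsif ($arg =~ /^--?(.+)$/s) {
            my $name = $1;
            if (@argv && $argv[0] !~ /^-/) {       # --name value
                $opt{$name} = shift @argv;
            }
            else {                                 # bare flag: count repeats
                $opt{$name}++;
            }
        }
        else {
            push @{ $opt{_args} }, $arg;           # positional leftovers
        }
    }
    return %opt;
}

# The command line from qntm's example.
my %opt = parse_args(qw(-a -b -c -c --d=e --f g));
print "$_ => $opt{$_}\n" for sort keys %opt;

# The ambiguous case: these rules always pick the "two flags" reading.
my %ambiguous = parse_args('--prefix', '-s');
```

In a real script the call would be `parse_args(@ARGV)`. The trade-off is the one described above: a value that legitimately starts with a dash can only be passed in the `--name=value` form.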

RICHUNCLEPENNYBAGS
Dec 21, 2010
I'm just finishing up Learning Perl and I'm wondering where I should go next. I guess there is a follow-up in Intermediate Perl, but it has pretty mixed reviews. I also have copies lying around that I borrowed of Programming Perl and Advanced Perl Programming ed. 1, but the former seems more like a reference book than an instructional book and the latter has some interesting content but is rather obviously antiquated from leafing through.

I am also pretty new to any sort of programming or scripting beyond HTML or bash so if any volume would be informative w/r/t general principles of programming as well that'd be great.

syphon
Jan 1, 2001
I really liked Intermediate Perl. In fact, I think one of the most useful things I've ever learned in Perl is a mastery of complex data structures with judicious use of Data::Dumper (which that book covers).
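As a small illustration of the kind of nested structure meant here (the data is invented for the example), Data::Dumper prints the whole thing so you can see how the hashes and arrays nest before writing the code that digs into them:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # print hash keys in a stable order
$Data::Dumper::Indent   = 1;   # compact, fixed indentation

# A hash of hashes and arrays, nested a few levels deep.
my %inventory = (
    fruit => { apple => [ 3, 'red' ], pear => [ 1, 'green' ] },
    tools => [ 'hammer', { name => 'saw', sizes => [ 10, 12 ] } ],
);

# Pass a reference so the structure is dumped as one unit.
print Dumper(\%inventory);

# Once the shape is clear, reaching into it is straightforward.
my $size = $inventory{tools}[1]{sizes}[0];
print "First saw size: $size\n";
```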

RICHUNCLEPENNYBAGS
Dec 21, 2010
Should I bother with the first-ed Panther book? I mean it's obviously out of date but on the other hand it's sitting right here on my desk. I guess if Intermediate Perl is much better I will just give in and go with that instead though.

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

RICHUNCLEPENNYBAGS posted:

I'm wondering where I should go next.

You'll likely find something to your taste on http://perl-tutorial.org :)

qntm
Jun 17, 2009

RICHUNCLEPENNYBAGS posted:

I'm just finishing up Learning Perl and I'm wondering where I should go next. I guess there is a follow-up in Intermediate Perl, but it has pretty mixed reviews. I also have copies lying around that I borrowed of Programming Perl and Advanced Perl Programming ed. 1, but the former seems more like a reference book than an instructional book and the latter has some interesting content but is rather obviously antiquated from leafing through.

I am also pretty new to any sort of programming or scripting beyond HTML or bash so if any volume would be informative w/r/t general principles of programming as well that'd be great.

I found Programming Perl close to useless, even as a reference. It's incredibly verbose and poorly structured, it spends far too much time attempting to explain features of Perl using long paragraphs of clever analogies instead of giving a single example line of code, it's full of bad, smug puns and it utterly fails to explain the important concepts with any clarity.

MacGowans Teeth
Aug 13, 2003

qntm posted:

I found Programming Perl close to useless, even as a reference. It's incredibly verbose and poorly structured, it spends far too much time attempting to explain features of Perl using long paragraphs of clever analogies instead of giving a single example line of code, it's full of bad, smug puns and it utterly fails to explain the important concepts with any clarity.

I've only been using Perl for about a year, but I found Programming Perl useful. I haven't read it cover to cover, just the main sections, but every time I go back to it and browse, I find something interesting.

Rohaq
Aug 11, 2006
Anybody here used the CGI module in Perl before?

I've loaded some info up using CGI::ReadParse:
code:
use CGI ':standard';
use CGI::Carp 'fatalsToBrowser'; 

our %input;
CGI::ReadParse(*input);
And want to load up multiple arguments with the same name, i.e.
code:
script.pl op=search_names search=dave search=bill
Which shows as the following once it's loaded in Perl:
code:
  DB<1> x %input
0  'op'
1  'search_names'
2  'search'
3  "dave\c@bill"
Obviously I could turn this into an array with
code:
@search_args = split('\c@',$input{search});
but I'm sure there's a better way of doing this with the CGI module.

Any suggestions?

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

Rohaq posted:

Anybody here used the CGI module in Perl before?
Yes, it's terribly old and crap. You'll have more fun by using either Plack directly (and Plack::Request) or Web::Simple or Dancer. :)

Also, in the debugger you want to run: x \%input

Rohaq
Aug 11, 2006

Mithaldu posted:

Yes, it's terribly old and crap. You'll have more fun by using either Plack directly (and Plack::Request) or Web::Simple or Dancer. :)

Also, in the debugger you want to run: x \%input
It doesn't look like Plack or Web::Simple do anything I need that CGI doesn't do. My script connects to a database via DBI and returns sections of HTML for my jQuery page to use through AJAX, so all my script needs to give a poo poo about is data incoming, pulling data from the database, then printing stuff out for jQuery to play with.

Also, x \%input just returns the following:
code:
0  HASH(0x383a840)
   'op' => "search"
   'search' => 'dave\c@bill'
Which doesn't solve the issue of splitting the search inputs, so it's not much help to me, sorry.

Using split('\c@', $input{search}) for now; if anyone knows any better, please let me know.
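For reference, CGI.pm can hand back repeated parameters as a list without any manual splitting, provided you use its object interface rather than ReadParse. The sketch below assumes CGI.pm is installed (it is no longer bundled with recent Perls) and feeds it a literal query string so it can run outside a web server; a real script would just call `CGI->new` with no arguments.

```perl
use strict;
use warnings;
use CGI;

# Build the query object from a literal query string for demonstration.
my $q = CGI->new('op=search_names&search=dave&search=bill');

my $op = $q->param('op');

# multi_param exists in newer CGI.pm releases; on older ones, calling
# param('search') in list context returns the same list of values.
my @search_args = $q->can('multi_param')
    ? $q->multi_param('search')
    : $q->param('search');

print "$op: @search_args\n";
```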

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

Rohaq posted:

It doesn't look like Plack or Web::Simple do anything I need that CGI doesn't do.
Look again:

search.cgi
code:
#!/usr/bin/env perl

use Web::Simple;

__PACKAGE__->run_if_script;

sub dispatch_request { '/ + %op=&@search=' => \&do_search }

sub html { [ 200, [ 'Content-type', 'text/plain' ], [@_] ] }

sub do_search {
    my ( $op, $search ) = @_;
    my $searches = join ', ', @{$search};
    return html( "Hello world! Today we are doing $op with: $searches" );
}

Mithaldu fucked around with this message at 19:07 on Jan 17, 2012

Filburt Shellbach
Nov 6, 2007

Apni tackat say tujay aaj mitta juu gaa!
Why are you touting Web::Simple instead of a real framework that has both users and developers?

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

Filburt Shellbach posted:

Why are you touting Web::Simple instead of a real framework that has both users and developers?
Haha, tell that to mst and watch what happens. :smug:

Anyhow, reasoning: He's using CGI.pm and AJAX, meaning he'd not be too thrilled about having each call take a second minimum on top of the db time, when running with Catalyst. Dancer v1 would be the next possible, but thanks to its current internal structure i will not recommend it to anyone, as it's a liability. That leaves pure Plack as a last option, but the code for that wouldn't have been as short.

Rohaq
Aug 11, 2006
I'm trying to read around Plack now; is it just me, or is it running as a service in the background, and then making calls to that service in order to avoid calling the interpreter every time the script is called?

I mean, I'd like to be able to do it; having a single DB connector to handle everything would be super, but at this moment, I don't have the time to request another service to be enabled on the server - this is a work project, and I have a couple of weeks to produce this - not enough time to get a change order put through.

Or maybe I'm just wildly misinterpreting Plack. I'm having some real trouble finding decent workable tutorials around.

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

Rohaq posted:

I'm trying to read around Plack now; is it just me, or is it running as a service in the background, and then making calls to that service in order to avoid calling the interpreter every time the script is called?

I mean, I'd like to be able to do it; having a single DB connector to handle everything would be super, but at this moment, I don't have the time to request another service to be enabled on the server - this is a work project, and I have a couple of weeks to produce this - not enough time to get a change order put through.

Or maybe I'm just wildly misinterpreting Plack. I'm having some real trouble finding decent workable tutorials around.

Plack is a middleman. You can either use plackup (or some other Plack server) to start up a server on a free port that your web server can reverse proxy to, or you can use Plack::Handler::CGI to make a small script.cgi file that you can just run as you're used to in normal CGI mode. Both will be fast, just that the former will be faster.

Oh, also, the example with Web::Simple i gave above does automatic double duty. You can either go:

# plackup script.pl

And get a server, or you can just drop it in your web root as script.cgi and let your web server run it. It will automatically detect that it's being used as CGI and run only once.
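To make the "middleman" idea concrete: the thing a Plack server or handler runs is a PSGI application, which is nothing more than a code reference that takes the request environment as a hash reference and returns an array reference of status, headers and body. The sketch below needs no modules at all and calls the app by hand. The query-string parsing in it is deliberately naive (no URL decoding) and is only there for illustration; with Plack installed you would normally let Plack::Request do that part.

```perl
use strict;
use warnings;

# A PSGI application: environment hash ref in, [status, headers, body] out.
my $app = sub {
    my ($env) = @_;

    # Naive query-string parsing for illustration only: no URL decoding.
    my %param;
    for my $pair (split /[&;]/, $env->{QUERY_STRING} // '') {
        my ($key, $value) = split /=/, $pair, 2;
        push @{ $param{$key} }, $value // '';
    }

    my $op       = $param{op} ? $param{op}[0] : 'nothing';
    my $searches = join ', ', @{ $param{search} || [] };

    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        [ "Doing $op with: $searches\n" ],
    ];
};

# Call the app directly with a hand-built environment to see the response.
my $res = $app->({
    REQUEST_METHOD => 'GET',
    QUERY_STRING   => 'op=search_names&search=dave&search=bill',
});
print $res->[2][0];
```

Saved as a file whose last expression is `$app`, the same code reference is what plackup would serve, and what a CGI handler would run once per request.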

syphon
Jan 1, 2001
I'm sure people are gonna hate me for saying this... but I've used CGI + HTML::Template + CGI::Ajax for smaller webapps with great success before. I keep trying to get into Catalyst or Plack (this was before I knew about Dancer) and the added overhead just seemed tremendous for what I was trying to do.

Ninja Rope
Oct 22, 2005

Wee.

syphon posted:

I'm sure people are gonna hate me for saying this... but I've used CGI + HTML::Template + CGI::Ajax for smaller webapps with great success before. I keep trying to get into Catalyst or Plack (this was before I knew about Dancer) and the added overhead just seemed tremendous for what I was trying to do.

Or CGI::Stateless, CGI::Session, CGI::Cookie, HTML::Template, and AnyEvent::FCGI? :whatup:
