|
Rohaq posted:Quick question, I'm using while(<>) to iterate through a file line by line at the moment, is there a quick way to remove the line being processed from the file at the end of the loop, without messing up the while loop? I'd like to process a line, then remove it from the file after it's done. You can turn on in-place edit mode in the file, just like using -i on the command line: code:
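For anyone finding this later, here is a minimal sketch of in-place editing from inside a script. The sub name `drop_processed_lines` and its callback are made up for illustration; the only real machinery is `$^I` and the `<>` operator. Any line the loop does not print is left out of the rewritten file, and the original is kept with a `.bak` suffix.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite $path in place, dropping every line for
# which $done->($line) returns true. Lines that are not printed do not
# end up in the rewritten file.
sub drop_processed_lines {
    my ($path, $done) = @_;

    # Setting $^I is the in-script equivalent of -i.bak; localising @ARGV
    # makes <> read from the one file we care about.
    local $^I   = '.bak';
    local @ARGV = ($path);

    while (my $line = <>) {
        next if $done->($line);   # handled: skip the print, so the line is removed
        print $line;              # goes to the new copy of the file, not the terminal
    }
    return;
}
```

The new file is only complete once the loop has finished. If the script dies halfway through, the lines it never reached are missing from the rewritten file, although the `.bak` copy still has everything.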
|
# ? Dec 16, 2011 17:06 |
|
wntd posted:BEGIN { $^I = ".bak" }
|
# ? Dec 19, 2011 00:36 |
|
At my work we process very large XML files. They can be up to 1GB and are usually linearized or have only a couple line breaks. We use Java's StAX to process the document, going through and replacing placeholder attributes with their correct value, which takes 30-40 minutes for the largest files. I want to replace this with a Perl script using regular expressions. I've gotten them working, but only if I first pretty print the XML, which takes 7-10 minutes to do. It's still a net gain, but it would be better if I had something that was just one step. Is there a Perl utility out there that handles XML faster than Java that I should look into?
|
# ? Dec 29, 2011 21:33 |
|
Clanpot Shake posted:At my work we process very large XML files. No. No you don't.
|
# ? Dec 29, 2011 22:03 |
|
You might want to try one of the many XML parsers on CPAN. Perl's own FAQ advises against using regular expressions with markup languages.
|
# ? Dec 29, 2011 22:09 |
|
JawnV6 posted:No. No you don't. This has probably been posted in this thread, somewhere, but it still seems obligatory.
|
# ? Jan 1, 2012 10:59 |
|
edit: n/m I figured it out as soon as I posted this
|
# ? Jan 2, 2012 18:03 |
|
uG posted:edit: n/m I figured it out as soon as I posted this Post it anyway. Maybe it'll help someone with the same problem.
|
# ? Jan 2, 2012 19:09 |
|
Clanpot Shake posted:At my work we process very large XML files. They can be up to 1GB and are usually linearized or have only a couple line breaks. We use Java's StAX to process the document, going through and replacing placeholder attributes with their correct value, which takes 30-40 minutes for the largest files. pfft, StAX is pretty fast dude. The only thing to compare to it is probably this: http://search.cpan.org/~codechild/XML-Bare-0.45/Bare.pm See the benchmarks at the bottom. If you really want speed, the best XML parser I can think of is the one in the D programming language. http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-updated-graphs-with-rapidxml/, although I don't see you considering D too seriously, it is very comparable to Java in syntax (it's almost like a superset of Java). Note this is a sorta old benchmark, things might have changed and StAX could be even faster?? In the end you can only try them all and find out for yourself. e:whoops broken url TiMBuS fucked around with this message at 05:15 on Jan 3, 2012 |
# ? Jan 3, 2012 04:06 |
|
TiMBuS posted:pfft, StAX is pretty fast dude. The only thing to compare to it is probably this: http://search.cpan.org/~codechild/XML-Bare-0.45/Bare.pm This is going to be super useful for a project I've got tomorrow, I was going to use XML::Simple, but that has a few niggles that make it annoying at times. Thanks!
|
# ? Jan 3, 2012 06:26 |
|
I've also heard of XML::Twig: http://search.cpan.org/~mirod/XML-Twig-3.39/Twig.pm
|
# ? Jan 3, 2012 08:39 |
|
Anaconda Rifle posted:Post it anyway. Maybe it'll help someone with the same problem.
|
# ? Jan 3, 2012 16:33 |
|
OriginalPseudonym posted:This has probably been posted in this thread, somewhere, but it still seems obligatory.
|
# ? Jan 4, 2012 17:23 |
|
Clanpot Shake posted:This is hilarious. In my defense, I'm looking for a very specific, known attribute - not trying to parse the entire document. Just looking for one distinct thing and replacing it (many times over). One of the answers in that link says it is sometimes acceptable to use regexes for a one time use of a known dataset, which is kind of what this is. You can't just do something like this? code:
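One way to make a regex approach work on a linearized file without pretty-printing it first is to stop reading by line. What follows is only a sketch: the attribute name `value` and the `@@NAME@@` placeholder syntax are invented here, since the thread never shows the real ones, and `fill_placeholders` is a made-up name. Setting `$/` to `>` makes each read return at most one tag, so memory use stays flat even when the whole document sits on a single line.

```perl
use strict;
use warnings;

# Hypothetical helper: copy XML from $in to $out, replacing placeholder
# attribute values such as value="@@HOST@@" using the %$values lookup.
# The attribute name and the @@...@@ syntax are assumptions for the sketch.
sub fill_placeholders {
    my ($in, $out, $values) = @_;

    local $/ = '>';    # read up to the end of each tag instead of each line

    while (my $chunk = <$in>) {
        # Unknown placeholders are left exactly as they were.
        $chunk =~ s{value="\@\@(\w+)\@\@"}
                   {'value="' . ($values->{$1} // "\@\@$1\@\@") . '"'}ge;
        print {$out} $chunk;
    }
    return;
}
```

This is still plain text substitution, not XML parsing, so it only holds up under the conditions described in the quoted post: one known attribute in a known dataset.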
|
# ? Jan 4, 2012 19:13 |
|
Rohaq posted:You can't just do something like this?
|
# ? Jan 4, 2012 21:14 |
|
Is there a way to read an arbitrary set of command line options into a hash using Getopt::Long or something else? I want to be able to consume an arbitrary command line with any combination of arguments and return a hash containing all the arguments and values that were specified. Something like: code:
code:
|
# ? Jan 5, 2012 13:13 |
|
qntm posted:Is there a way to read an arbitrary set of command line options into a hash using Getopt::Long or something else? I want to be able to consume an arbitrary command line with any combination of arguments and return a hash containing all the arguments and values that were specified. Sounds like you want: Getopt::Long::Descriptive (When in doubt look on metacpan at the list of modules by rjbs, dagolden, mst.)
|
# ? Jan 5, 2012 13:22 |
|
Mithaldu posted:Sounds like you want: Getopt::Long::Descriptive I don't see how Getopt::Long::Descriptive can do what I need it to do. You still need to specify an explicit list of options. qntm fucked around with this message at 13:37 on Jan 5, 2012 |
# ? Jan 5, 2012 13:35 |
|
qntm posted:I don't see how Getopt::Long::Descriptive can do what I need it to do. You still need to specify an explicit list of options. Ah, sorry, i misunderstood you. I'm not entirely sure that what you're asking has been done yet. Command line parameters aren't generally interpreted by having one generic parser for the "language" of options, but one that is built from the specs you configure it with. Why exactly would you need something like this? Maybe you can generate the specs automatically?
|
# ? Jan 5, 2012 14:05 |
|
Getopt::Whatever. I use it all the time for very short scripts that I don't expect anyone else to use. There's also Getopt::Casual.
|
# ? Jan 5, 2012 14:15 |
|
Filburt Shellbach posted:Getopt::Whatever. I use it all the time for very short scripts that I don't expect anyone else to use. It's this kind of thing that makes me all wistful that I could get back to Perl. CPAN is the best ever.
|
# ? Jan 5, 2012 15:34 |
|
Mithaldu posted:Ah, sorry, i misunderstood you. I'm not entirely sure that what you're asking has been done yet. Command line parameters aren't generally interpreted by having one generic parser for the "language" of options, but one that is built from the specs you configure it with. I'm building a command-line wrapper for a pre-existing Perl API. It's a bunch of .pl scripts. Each .pl script corresponds to a single API method. Each API method (1) accepts its parameters in the form of a simple hash, (2) has its own perfectly adequate procedures for validating input parameters and (3) doesn't expose the list of acceptable parameters programmatically. So obviously, the low-effort solution is to have each .pl script convert command line arguments into a hash, pass the hash to the API method, retrieve the results from the API and return the results to the user. The only difference between each script would be the name of the method to invoke. It surprises me that "just give me all of the command line options in a hash; I'll deal with them myself" is an unusual request. Especially in Perl, which has the very common "sub blah { my %args = @_; ... }" idiom, which applies precisely the same concept to subroutines.
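To make the idea concrete, here is a rough hand-rolled sketch that needs no modules at all. The sub name `args_to_hash` and the `_rest` key are made up for illustration, and the option syntax it accepts (`--key=value`, `--key value`, bare `--flag`) is an assumption about what the wrapper scripts would want.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list such as
#   --name=foo --size 10 --verbose
# into ( name => 'foo', size => 10, verbose => 1 ) without declaring any
# option names up front. Anything that is not an option is returned under
# the made-up key '_rest'.
sub args_to_hash {
    my @argv = @_;
    my (%opt, @rest);

    while (defined(my $arg = shift @argv)) {
        if ($arg eq '--') {                       # conventional end of options
            push @rest, @argv;
            last;
        }
        elsif ($arg =~ /^--([^=]+)=(.*)$/s) {     # --key=value
            $opt{$1} = $2;
        }
        elsif ($arg =~ /^--(.+)$/s) {             # --key value, or a bare --flag
            my $key = $1;
            $opt{$key} = (@argv && $argv[0] !~ /^--/) ? shift @argv : 1;
        }
        else {
            push @rest, $arg;
        }
    }
    $opt{_rest} = \@rest;
    return %opt;
}

# Each wrapper script would then boil down to something like:
#   my %args = args_to_hash(@ARGV);
#   my $rest = delete $args{_rest};
#   SomeApi::some_method(%args);    # placeholder name, not a real module
```

The price of not declaring options is ambiguity: with no spec, `--flag file.txt` cannot be told apart from `--key value`, so this sketch always treats the next non-option word as the value.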
|
# ? Jan 5, 2012 16:00 |
|
qntm posted:It surprises me that "just give me all of the command line options in a hash; I'll deal with them myself" is an unusual request. It's not.
|
# ? Jan 5, 2012 16:29 |
|
qntm posted:It surprises me that "just give me all of the command line options in a hash; I'll deal with them myself" is an unusual request. Especially in Perl, which has the very common "sub blah { my %args = @_; ... }" idiom, which applies precisely the same concept to subroutines. code:
code:
|
# ? Jan 5, 2012 17:14 |
|
I'm just finishing up Learning Perl and I'm wondering where I should go next. I guess there is a follow-up in Intermediate Perl, but it has pretty mixed reviews. I also have copies lying around that I borrowed of Programming Perl and Advanced Perl Programming ed. 1, but the former seems more like a reference book than an instructional book and the latter has some interesting content but is rather obviously antiquated from leafing through. I am also pretty new to any sort of programming or scripting beyond HTML or bash so if any volume would be informative w/r/t general principles of programming as well that'd be great.
|
# ? Jan 17, 2012 06:24 |
|
I really liked Intermediate Perl. In fact, I think one of the most useful things I've ever learned in Perl is a mastery of complex data structures with judicious use of Data::Dumper (which that book covers).
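For a taste of what that looks like in practice, here is a small sketch with made-up sample data. `Data::Dumper` ships with Perl; the `Sortkeys` and `Indent` settings are optional and only there to make the dump easier to read.

```perl
use strict;
use warnings;
use Data::Dumper;

# Made-up sample data: a hash of array refs of hash refs.
my %hosts = (
    web => [ { name => 'web1', ports => [ 80, 443 ] } ],
    db  => [ { name => 'db1',  ports => [ 5432 ] } ],
);

# Sorted keys and a tighter indent make dumps easier to compare by eye.
local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Indent   = 1;

# Pass a reference, otherwise the hash is flattened into a list and
# dumped as separate $VAR1, $VAR2, ... entries.
print Dumper(\%hosts);

# Reaching into the structure follows the same shape the dump shows.
print "first web port: $hosts{web}[0]{ports}[0]\n";
```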
|
# ? Jan 17, 2012 07:19 |
|
Should I bother with the first-ed Panther book? I mean it's obviously out of date but on the other hand it's sitting right here on my desk. I guess if Intermediate Perl is much better I will just give in and go with that instead though.
|
# ? Jan 17, 2012 08:14 |
|
RICHUNCLEPENNYBAGS posted:I'm wondering where I should go next. You'll likely find something to your taste on http://perl-tutorial.org
|
# ? Jan 17, 2012 09:34 |
|
RICHUNCLEPENNYBAGS posted:I'm just finishing up Learning Perl and I'm wondering where I should go next. I guess there is a follow-up in Intermediate Perl, but it has pretty mixed reviews. I also have copies lying around that I borrowed of Programming Perl and Advanced Perl Programming ed. 1, but the former seems more like a reference book than an instructional book and the latter has some interesting content but is rather obviously antiquated from leafing through. I found Programming Perl close to useless, even as a reference. It's incredibly verbose and poorly structured, it spends far too much time attempting to explain features of Perl using long paragraphs of clever analogies instead of giving a single example line of code, it's full of bad, smug puns and it utterly fails to explain the important concepts with any clarity.
|
# ? Jan 17, 2012 14:06 |
|
qntm posted:I found Programming Perl close to useless, even as a reference. It's incredibly verbose and poorly structured, it spends far too much time attempting to explain features of Perl using long paragraphs of clever analogies instead of giving a single example line of code, it's full of bad, smug puns and it utterly fails to explain the important concepts with any clarity. I've only been using Perl for about a year, but I found Programming Perl useful. I haven't read it cover to cover, just the main sections, but every time I go back to it and browse, I find something interesting.
|
# ? Jan 17, 2012 16:11 |
|
Anybody here used the CGI module in Perl before? I've loaded some info up using CGI::ReadParse: code:
code:
code:
code:
Any suggestions?
|
# ? Jan 17, 2012 16:44 |
|
Rohaq posted:Anybody here used the CGI module in Perl before? Also, in the debugger you want to run: x \%inputs
|
# ? Jan 17, 2012 17:09 |
|
Mithaldu posted:Yes, it's terribly old and crap. You'll have more fun by using either Plack directly (and Plack::Request) or Web::Simple or Dancer. Also, x \%inputs just returns the following: code:
Using split('\c@', $inputs{search}) for now. If anyone knows a better way, please let me know.
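For reference, CGI's documentation says ReadParse packs a multi-valued parameter into a single string separated by "\0" (the character `\c@` matches), so splitting is the expected way to get the values back. A minimal sketch, using a stand-in hash so it runs without a web server; the field names and the `values_for` helper are made up:

```perl
use strict;
use warnings;

# Stand-in for the hash ReadParse fills in; the field names are made up.
# A field submitted more than once arrives as one "\0"-joined string.
my %inputs = (
    search => join("\0", 'perl', 'xml', 'cgi'),
    page   => '2',
);

# Hypothetical helper: always hand back a list, whether the field was
# sent once, several times, or not at all.
sub values_for {
    my ($params, $name) = @_;
    return () unless defined $params->{$name};
    return split /\0/, $params->{$name}, -1;   # -1 keeps trailing empty values
}

my @terms = values_for(\%inputs, 'search');
print scalar(@terms), " search terms: @terms\n";
```

CGI's object interface can also return the values directly as a list, via `param` in list context (newer releases spell this `multi_param`), which avoids the packed string altogether.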
|
# ? Jan 17, 2012 17:56 |
|
Rohaq posted:It doesn't look like Plack or Web::Simple do anything I need that CGI doesn't do. search.cgi code:
Mithaldu fucked around with this message at 19:07 on Jan 17, 2012 |
# ? Jan 17, 2012 19:03 |
|
Why are you touting Web::Simple instead of a real framework that has both users and developers?
|
# ? Jan 18, 2012 02:58 |
|
Filburt Shellbach posted:Why are you touting Web::Simple instead of a real framework that has both users and developers? Anyhow, reasoning: He's using CGI.pm and AJAX, meaning he'd not be too thrilled about having each call take a second minimum on top of the db time, when running with Catalyst. Dancer v1 would be the next possibility, but thanks to its current internal structure i will not recommend it to anyone, as it's a liability. That leaves pure Plack as a last option, but the code for that wouldn't have been as short.
|
# ? Jan 18, 2012 09:17 |
|
I'm trying to read around Plack now; is it just me, or is it running as a service in the background, and then making calls to that service in order to avoid calling the interpreter every time the script is called? I mean, I'd like to be able to do it; having a single DB connector to handle everything would be super, but at this moment, I don't have the time to request another service to be enabled on the server - this is a work project, and I have a couple of weeks to produce this - not enough time to get a change order put through. Or maybe I'm just wildly misinterpreting Plack. I'm having some real trouble finding decent workable tutorials around.
|
# ? Jan 18, 2012 14:32 |
|
Rohaq posted:I'm trying to read around Plack now; is it just me, or is it running as a service in the background, and then making calls to that service in order to avoid calling the interpreter every time the script is called? Plack is a middleman. You can either use plackup (or some other Plack server) to start up a server on a free port that your web server can reverse proxy to, or you can use Plack::Handler::CGI to make a small script.cgi file that you can just run as you're used to in normal CGI mode. Both will be fast, just that the former will be faster. Oh, also, the example with Web::Simple i gave above does automatic double duty. You can either go: # plackup script.pl And get a server, or you can just drop it in your web root as script.cgi and let your web server run it. It will automatically detect that it's being used as CGI and run only once.
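To show how little is involved at the bottom layer, here is a sketch of a bare PSGI application with no framework on top. The `search` parameter name is made up, and the query-string handling is deliberately naive (no URL decoding); the code itself does not need Plack loaded.

```perl
use strict;
use warnings;

# A PSGI application is just a code ref: it takes the request environment
# and returns [ status, [ headers ], [ body ] ]. The search parameter
# name is an assumption for the sketch.
my $app = sub {
    my ($env) = @_;

    my $query = defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
    my ($term) = $query =~ /(?:^|[&;])search=([^&;]*)/;   # no URL decoding here
    $term = '' unless defined $term;

    return [
        200,
        [ 'Content-Type' => 'text/plain; charset=utf-8' ],
        [ "you searched for: $term\n" ],
    ];
};

# Saved as app.psgi with a final line of just "$app;", this can be served
# with:  plackup app.psgi
# To run the same code ref as a plain CGI script instead, Plack ships
#   Plack::Handler::CGI->new->run($app);
```

Because the application is only a code ref, switching between the long-running server and the one-shot CGI mode changes how it is launched, not how it is written.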
|
# ? Jan 18, 2012 15:26 |
|
I'm sure people are gonna hate me for saying this... but I've used CGI + HTML::Template + CGI::Ajax for smaller webapps with great success before. I keep trying to get into Catalyst or Plack (this was before I knew about Dancer) and the added overhead just seemed tremendous for what I was trying to do.
|
# ? Jan 18, 2012 18:14 |
|
syphon posted:I'm sure people are gonna hate me for saying this... but I've used CGI + HTML::Template + CGI::Ajax for smaller webapps with great success before. I keep trying to get into Catalyst or Plack (this was before I knew about Dancer) and the added overhead just seemed tremendous for what I was trying to do. Or CGI::Stateless, CGI::Session, CGI::Cookie, HTML::Template, and AnyEvent::FCGI?
|
# ? Jan 18, 2012 22:31 |