Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Locked thread
qntm
Jun 17, 2009
Perl Best Practices says to avoid subroutine prototypes and gives a fairly compelling example to demonstrate why.

Adbot
ADBOT LOVES YOU

Rohaq
Aug 11, 2006

Sang- posted:

perl's gc is "okay", it can't handle cyclic references at all - so if you have an array containing a bunch of reference, then add a reference to the current array, perl will never be able to collect it.

high cpu usage doesn't really suggest that though (from my experience at least), might want to look into devel::gladiator and a few others
Hm, it should just be a list of strings, if I recall rightly (I haven't worked on the script for almost a year), so it should be clearing the list from memory, unless of course the issues are coming from reading in the file input or Net::Stomp subscription.

I've not used Devel::Gladiator before, I'll give it a go and see if it yields any clues. Thanks!

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

Sang- posted:

perl's gc is "okay", it can't handle cyclic references at all - so if you have an array containing a bunch of reference, then add a reference to the current array, perl will never be able to collect it.
That's not quite true. With Scalar::Util you can create weak references that aren't counted and will be garbage-collected even when it's cyclical.



qntm posted:

Perl Best Practices says to avoid subroutine prototypes and gives a fairly compelling example to demonstrate why.

What are you talking about? Prototypes are a great way to define routes in a web app! :haw:

Catalyst-proof
May 11, 2011

better waste some time with you
I'm interested in picking up a little Perl, mostly for web stuff and processing text and little scripty things. Does anyone know if Modern Perl is any good?

I understand Perl has a comprehensive perldoc system, are there things in there that are good tutorials?

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip
Modern Perl is good

IIRC you're coming from a lispy background, so you probably won't learn too much in terms of fundamentals from Higher Order Perl, but you may find it fun down the line. In any case, it's free and the author is a certified Cool Dude

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

WHOIS John Galt posted:

I'm interested in picking up a little Perl, mostly for web stuff and processing text and little scripty things. Does anyone know if Modern Perl is any good?

I understand Perl has a comprehensive perldoc system, are there things in there that are good tutorials?

It's awesome and Higher Order Perl is awesome too. Your best resource right now for just plain figuring out where to start looking will be http://perl-tutorial.org

Filburt Shellbach
Nov 6, 2007

Apni tackat say tujay aaj mitta juu gaa!

gently caress this noise.

Mario Incandenza
Aug 24, 2000

Tell me, small fry, have you ever heard of the golden Triumph Forks?
5.18.0 is out, some cool new things there, might want to hold back on upgrading in production though as some CPAN modules are broken under the new version ATM due to the hash fixes made by Yves (which has caused some butthurt on p5p, sigh).

Filburt Shellbach
Nov 6, 2007

Apni tackat say tujay aaj mitta juu gaa!

Mario Incandenza posted:

5.18.0 is out, some cool new things there

I added the three new DTrace probes :smug:

toadee
Aug 16, 2003

North American Turtle Boy Love Association

So I have a problem that I hope isn't too dumb. Basically, I have a directory that gets about 100 files in it a day, each file somewhere between 17 and 50 megs in size (the files contain call records). I'd like to scan through these files very quickly, as quick as possible, for an arbitrary string.

I've found this code: http://cseweb.ucsd.edu/~sorourke/wf.pl written for the Widefinder project. It is indeed very, very fast. I go through a file in a little under 0.1 seconds in fact, so this would be good. Unfortunately, after jamming it into a subroutine called via a loop through this list of files, performance seems to bog down after 10 files or so. I'm sure this has to do with file IO issues that I don't even begin to understand, but I have this nagging feeling I'm going about this in a very wrong way, and I'm wondering if anyone here has tackled similar issues/has any suggestions?

Rohaq
Aug 11, 2006

toadee posted:

So I have a problem that I hope isn't too dumb. Basically, I have a directory that gets about 100 files in it a day, each file somewhere between 17 and 50 megs in size (the files contain call records). I'd like to scan through these files very quickly, as quick as possible, for an arbitrary string.

I've found this code: http://cseweb.ucsd.edu/~sorourke/wf.pl written for the Widefinder project. It is indeed very, very fast. I go through a file in a little under 0.1 seconds in fact, so this would be good. Unfortunately, after jamming it into a subroutine called via a loop through this list of files, performance seems to bog down after 10 files or so. I'm sure this has to do with file IO issues that I don't even begin to understand, but I have this nagging feeling I'm going about this in a very wrong way, and I'm wondering if anyone here has tackled similar issues/has any suggestions?
I wouldn't scan through them constantly, for a start. Even if you need something 'instantly', figure out how 'instant' it has to be to be acceptable, and run your script periodically.

If it does need to be instant, then I suggest hooking into the inotify service present in most Linux kernels to check for file operations, and only scanning files that have changed. There's a helpful module for that. It looks like there's even an equivalent for Windows, if that's your thing. Though this is only useful if file changes are periodic, otherwise you'll be constantly scanning anyway.

If it's a constant stream of logs, I'd look into hooking into the program doing the logging directly, rather than scanning its log files. Does it provide any kind of API?

toadee
Aug 16, 2003

North American Turtle Boy Love Association

Unfortunately, the files are generated constantly, at 15 minute intervals. Their point of origination (a VoIP switch) would only have the last 15 minutes of call records before the next batch is slurped, and in general, the searches that need to be performed would cover many such intervals while looking for calls.

Gazpacho
Jun 18, 2004

by Fluffdaddy
Slippery Tilde
The reference code doesn't unmap the file in the parent process after searching it. You probably should, as a matter of general hygiene and to make sure you aren't creating any leaks that kill your performance which would be my first guess.

I'd also recommend looking at the process list to see whether you are creating zombie processes that the parent needs to clean up.

Gazpacho fucked around with this message at 23:39 on May 21, 2013

het
Nov 14, 2002

A dark black past
is my most valued
possession

toadee posted:

Unfortunately, the files are generated constantly, at 15 minute intervals. Their point of origination (a VoIP switch) would only have the last 15 minutes of call records before the next batch is slurped, and in general, the searches that need to be performed would cover many such intervals while looking for calls.
How much of a difference does that particular code make compared to like a normal straightforward perl script? Is it faster than just grep? What do you need to do with the lines once you find them, just save them to a file?

JawnV6
Jul 4, 2004

So hot ...
I'm not really following what "constantly, at 15 minute intervals" means? At any given second could there be new files, or is it just at 15:04, 30:04 past the hour, etc.? Once a new file is present, will more data be appended to it or is it static and must be searched within 15 minutes before the source deletes it?

I think if you nail down the problem statement it'll help, though I understand if that's a little too much detail to give.

toadee
Aug 16, 2003

North American Turtle Boy Love Association

het posted:

How much of a difference does that particular code make compared to like a normal straightforward perl script? Is it faster than just grep? What do you need to do with the lines once you find them, just save them to a file?

Much faster than grep, grep takes about 3 seconds to search one of the files (depending on how large it is ie. how many call records are in it). fgrep is a good deal faster, about half a second on average, but the code above is about 5 times faster than fgrep at this. I'm gathering these to deliver to a CGI request, basically a tool for us to use in our NOC to query these call records quickly.

JawnV6 posted:

I'm not really following what "constantly, at 15 minute intervals" means? At any given second could there be new files, or is it just at 15:04, 30:04 past the hour, etc.? Once a new file is present, will more data be appended to it or is it static and must be searched within 15 minutes before the source deletes it?

Essentially, every 15 minutes a file is grabbed from several VoIP switches that contains a ;-separated list of call data records, these get stored in a directory for us to reference when there are reports of issues with calls (they contain info like trunk group selection, outgoing carrier selection, etc). Right now we've just been using command line tools to grep through them but I got the idea to take some spare time and hack together a perl CGI/jquery formatted web interface for the searches, as it's actually a hell of a lot easier to parse through a big list of these things in a nicely formatted table.

Gazpacho posted:

The reference code doesn't unmap the file in the parent process after searching it. You probably should, as a matter of general hygiene and to make sure you aren't creating any leaks that kill your performance which would be my first guess.

I'd also recommend looking at the process list to see whether you are creating zombie processes that the parent needs to clean up.

So I just tried unmapping after finishing up searching through $str and curiously it slows the whole thing way down. I'm guessing this is why the original didn't end up doing so as well. I do end up with defunct processes as observed mega-scientifically via staring at top while it runs, but I'm not sure how to avoid that?

het
Nov 14, 2002

A dark black past
is my most valued
possession

toadee posted:

So I just tried unmapping after finishing up searching through $str and curiously it slows the whole thing way down. I'm guessing this is why the original didn't end up doing so as well. I do end up with defunct processes as observed mega-scientifically via staring at top while it runs, but I'm not sure how to avoid that?
You're creating J processes for each run of that script, and when a process exits, its parent process gets the SIGCHLD signal. Until the parent calls wait() (or waitpid() or whatever), the process will be a zombie/defunct process.

Have you considered using a database for this? If we're talking about CDRs, it's already a tabular format, and I'm assuming you're searching on fields like TN or whatever, which a database could index to optimize searches.

toadee
Aug 16, 2003

North American Turtle Boy Love Association

het posted:

You're creating J processes for each run of that script, and when a process exits, its parent process gets the SIGCHLD signal. Until the parent calls wait() (or waitpid() or whatever), the process will be a zombie/defunct process.

Have you considered using a database for this? If we're talking about CDRs, it's already a tabular format, and I'm assuming you're searching on fields like TN or whatever, which a database could index to optimize searches.

I have considered a DB however I'm not sure how best to go about this, I suppose I could make an arbitrary cutoff point for how long I'll want to store CDR records, then just drop tables that are older than that cutoff date, but continually filling it with each days' CDR would get unwieldy pretty quickly. It's something I will look into further to be sure, but in the meantime if anyone has any theories/suggestions on how best to search through files as quickly as possible I'd like to hear them. I tried using Coro with Coro::Handle but at least using the same method I've done before for concurrent HTTP requests but it didn't produce very concurrent looking results and performance.

het
Nov 14, 2002

A dark black past
is my most valued
possession

toadee posted:

I have considered a DB however I'm not sure how best to go about this, I suppose I could make an arbitrary cutoff point for how long I'll want to store CDR records, then just drop tables that are older than that cutoff date, but continually filling it with each days' CDR would get unwieldy pretty quickly. It's something I will look into further to be sure, but in the meantime if anyone has any theories/suggestions on how best to search through files as quickly as possible I'd like to hear them. I tried using Coro with Coro::Handle but at least using the same method I've done before for concurrent HTTP requests but it didn't produce very concurrent looking results and performance.
The best way to repeatedly search through data as quickly as possible is to index it somehow so that searches aren't starting with a blank slate every time. If you don't refine your problem definition beyond "search for arbitrary data in arbitrary datasets", performance optimizations become difficult.

toadee
Aug 16, 2003

North American Turtle Boy Love Association

het posted:

The best way to repeatedly search through data as quickly as possible is to index it somehow so that searches aren't starting with a blank slate every time. If you don't refine your problem definition beyond "search for arbitrary data in arbitrary datasets", performance optimizations become difficult.

Well the problem really is 'search for an arbitrary user provided string among a list of 96 files in semicolon delimited format'. I was hoping for some way to say run several concurrent processes and get return data from each. I do understand that the absolute best and quickest way to do this is if they were all in a database beforehand, but as of this writing that's not possible and while I'm trying to make it so, I was hoping there would be a way to do this more quickly than simply doing what amounts to a serialized line by line regex search for patterns.

Rohaq
Aug 11, 2006

toadee posted:

Well the problem really is 'search for an arbitrary user provided string among a list of 96 files in semicolon delimited format'. I was hoping for some way to say run several concurrent processes and get return data from each. I do understand that the absolute best and quickest way to do this is if they were all in a database beforehand, but as of this writing that's not possible and while I'm trying to make it so, I was hoping there would be a way to do this more quickly than simply doing what amounts to a serialized line by line regex search for patterns.
If it's semicolon delimited, might I suggest using a split, and choosing the right entry from the resulting list? Regex is great at pulling out data from oddly structured strings (and making you look like a goddamned wizard in the process), but even the most optimised of expressions will rarely beat a split in terms of performance if you're extracting from a character delimited string. I had one case where I found an area of a script where a regex was being used where a split would suffice, and saw performance increase massively as a result.

Other than that, I decided to search CPAN to see if there were any modules that might help with your tailing of continuously updated files. Have you tried giving File::Tail a go?

Rohaq fucked around with this message at 17:28 on May 22, 2013

het
Nov 14, 2002

A dark black past
is my most valued
possession

Rohaq posted:

If it's semicolon delimited, might I suggest using a split, and choosing the right entry from the resulting list? Regex is great at pulling out data from oddly structured strings (and making you look like a goddamned wizard in the process), but even the most optimised of expressions will rarely beat a split in terms of performance if you're extracting from a character delimited string. I had one case where I found an area of a script where a regex was being used where a split would suffice, and saw performance increase massively as a result.
He needs to know which field to search in, and might want to find results in multiple columns, e.g. one column is source number, one is destination, and he might want to search on both.

quote:

Other than that, I decided to search CPAN to see if there were any modules that might help with your tailing of continuously updated files. Have you tried giving File::Tail a go?
File::Tail is handy but is honestly not very good for performance (also he's not talking about continuously updated files, I think they are discrete files that are delivered in 15 minute intervals).

Eeyo
Aug 29, 2004

Hey so I'm trying to write a lil script that just takes a bunch of numbers of energy deposited and sums them up based on a dependent variable which I read from a file. So I think it's working pretty much as I expect, but there's some bizarro rounding stuff going on I think. Here's what the file mostly looks like (for example):

code:
Event: 0 Energy .16
1.4342
3.4342
2.3423
...
Event: 1 Energy .16
1.4222
3.1111
...
Event: 0 Energy .17
4.1123
3.2133
...
So I have it go through and look at each line with a regex matching the Event: Energy: line and when that happens it captures the energy and puts all the energies in an array at the (energy * 100)th index. So yeah there's probably a better way to do it, but anyway on to the problem:

The code seems to skip the 29th (and some other one too) index. It just adds all of the entries into the 28th index instead. I've gone through with the debugger and stepped through it and confirmed that it has properly read in the index to be 29 (printing the variable gives me 29), but when the array is accessed with the variable, it just goes to 28 instead. Is there a way to force perl to round in a way which would eliminate this issue? Here's the code:

code:
#!/usr/bin/perl

my @energy;
my $fileName = $ARGV[0];
my $currentEnergy;

open(my $fileHandle, "<", $fileName) or die "Error: Cannot open file $fileName";

while(<$fileHandle>) {
        if($_ =~ /Event:\t(\S+)\tEnergy:\t(\S+)/) {
                $currentEnergy = $2 * 100;
        }
        else {
                if($_ < 5.0 && $_ > 1.0) {
                        $energy[$currentEnergy] += $_;
                }
        }
}
It's definitely where I try to access the $currentEnergy'th element of the array. I'll also entertain suggestions for a better tool.

uG
Apr 23, 2003

by Ralp
code:
#!/usr/bin/perl

my @energy;
my $fileName = $ARGV[0];
my $currentEnergy;

open(my $fileHandle, "<", $fileName) or die "Error: Cannot open file $fileName";

while(<$fileHandle>) {
        if($_ =~ /Event:\t(\S+)\tEnergy:\t(\S+)/) {
                # add your initial array entry?
                $energy[$currentEnergy] = $currentEnergy = $2 * 100;
        }
        else {
                if($_ < 5.0 && $_ > 1.0) {
                        $energy[$currentEnergy] += $_;
                }
        }
}
Seems like you want to set $energy[$currentEnergy] inside your if statement, but i'm still not completely sure what you mean by 29th iteration without example input.

Eeyo
Aug 29, 2004

Yeah I probably described it kinda vague. Essentially the whole file looks like what I posted, so it's got some lines describing an index and a bunch of numbers I want to add into that index. So ideally like this:
code:
   .
   .
   .
Index: 20
2.3324     #Bunch of floats
1.3434
5.6445
Index:21
1.3242
3.3423
0.4324
...
Index:28
2.1345
3.8888
1.0553
Index:29    #this part doesn't work
1.6627
4.2222
1.6233
   .
   .
   .
So it adds 2.3324 + 1.3434 + 5.6445 to array[20] and 1.3242 + 3.3423 + 0.4324 to array[21] and same with the entries under Index:28 and Index:29. The problem is that when it gets to the "Index:29" line, it reads the index properly as 29, but when I try to assign 1.6627 + 4.2222 + 1.6233 to array[29], it gives it to array[28] instead for whatever reason. Make sense? So I just need a way to force the code to round the Index:29 that it reads in better. In my specific code, I need to round the $currentEnergy variable, since it doesn't work right when I add numbers to energy[$currentEnergy] for certain values of $currentEnergy.

Rohaq
Aug 11, 2006

het posted:

He needs to know which field to search in, and might want to find results in multiple columns, e.g. one column is source number, one is destination, and he might want to search on both.
I'm pretty sure that this:
Perl code:
my @fields = split(/;/, $string);
if ( $fields[3] eq 'foo' || $fields[6] eq 'bar' ) {
    # Do stuff
}
...would still be faster than this though, and much easier to read:

Perl code:
if ( $string =~ /^(?:[^;]*;){3}foo;/ || $string =~ /^(?:[^;]*;){6}bar;/ ) {
    # Do stuff
}
EDIT: Amended regex to allow for empty fields.

Rohaq fucked around with this message at 04:52 on May 29, 2013

Polygynous
Dec 13, 2006
welp

Eeyo posted:

So it adds 2.3324 + 1.3434 + 5.6445 to array[20] and 1.3242 + 3.3423 + 0.4324 to array[21] and same with the entries under Index:28 and Index:29. The problem is that when it gets to the "Index:29" line, it reads the index properly as 29, but when I try to assign 1.6627 + 4.2222 + 1.6233 to array[29], it gives it to array[28] instead for whatever reason. Make sense? So I just need a way to force the code to round the Index:29 that it reads in better. In my specific code, I need to round the $currentEnergy variable, since it doesn't work right when I add numbers to energy[$currentEnergy] for certain values of $currentEnergy.

Would just ignoring the '.' in the regex work? Based on your initial example which had like "Energy .16" anyway. (Or use a hash or hack it to work some other way if the decimal point is significant, I guess.)

JawnV6
Jul 4, 2004

So hot ...

quote:

code:
while(<$fileHandle>) {
        if($_ =~ /Event:\t(\S+)\tEnergy:\t(\S+)/) {
		...
        }
        else {
                if($_ < 5.0 && $_ > 1.0) {
                        ...
                }
        }
}
When I hit goofy issues parsing a file, I generally add structure and rely less and less on builtins and they tend to go away. I don't like how you're assuming if it's not an "Event:" line you assume it's a float without any checks besides range.

Add an 'else' to the second if and see what's getting by both the regex and the value check.

qntm
Jun 17, 2009
E: what the hell is this Python snippet doing in the Perl Short Questions Megathread

qntm fucked around with this message at 15:59 on Jun 2, 2013

Eeyo
Aug 29, 2004

Thanks for all the advice! I pretty much ended up changing it so it's reading in whole numbers instead of multiplying .29 by 100 or something goofy like that. Seemed to fix it. And yeah I probably should check the input more, but I'm in control of the output from the simulation I'm running so I know it's getting passed the correct stuff. But I should check what it's reading more rigorously in the future. I made another version where I just stored it in a hash, is that good procedure? I mean I just put the "energy" as the key and added to its value. May be better than doing some goofy array nonsense and is more flexible for what input it can parse.

Mario Incandenza
Aug 24, 2000

Tell me, small fry, have you ever heard of the golden Triumph Forks?
Most Perl hackers use hashes for pretty much everything, they're way more flexible than arrays. I only use arrays when I specifically need them (i.e. a numerically indexed, ordered list).

If you're writing Perl, learn to love hashes.

Powered Descent
Jul 13, 2008

We haven't had that spirit here since 1969.

Waking this thread up for a general newbie question: is there anything a native speaker of C can do to better "get" Perl? Because so far I feel like I'm scrambling to impose order on a chaotic mess.

Back in the day I spent several years as a C++ developer. I've now jumped headfirst into a new job as a Linux admin, and so I'm learning Perl as fast as I can. And by a lot of measures that's going very well -- I can write perfectly functional code that does what I need it to do. But my Perl code looks almost exactly like C, because I still think in C and then translate the syntax as necessary. Other people's code is enigmatic at best -- it's like figuring out a rebus. Anything that's described as "Perlish" seems designed to be as obfuscated and impenetrable as humanly possible.

Clearly I still don't "get" Perl. So much important stuff is left implicit and unstated. (I'm still completely :psyduck: at the very concept of a default variable. Why the hell would you DO that? I can't think of a single reason to ever use $_ in place of a real variable, and the very first thing that happens in my subroutines is to shift the real parameters out of that @_ thing.) All this implicit stuff feels like building a house out of jello.

I'm clearly still missing an essential concept here. Can anyone recommend any likely paths to enlightenment, to that "aha", to that moment of grokking just what the hell Larry Wall was driving at? Any books, tutorials, psychoactive drugs, meditations upon mountaintops?

magimix
Dec 31, 2003

MY FAT WAIFU!!! :love:
She's fetish efficient :3:

Nap Ghost

Powered Descent posted:

Waking this thread up for a general newbie question: is there anything a native speaker of C can do to better "get" Perl? Because so far I feel like I'm scrambling to impose order on a chaotic mess.

Back in the day I spent several years as a C++ developer. I've now jumped headfirst into a new job as a Linux admin, and so I'm learning Perl as fast as I can. And by a lot of measures that's going very well -- I can write perfectly functional code that does what I need it to do. But my Perl code looks almost exactly like C, because I still think in C and then translate the syntax as necessary. Other people's code is enigmatic at best -- it's like figuring out a rebus. Anything that's described as "Perlish" seems designed to be as obfuscated and impenetrable as humanly possible.

Clearly I still don't "get" Perl. So much important stuff is left implicit and unstated. (I'm still completely :psyduck: at the very concept of a default variable. Why the hell would you DO that? I can't think of a single reason to ever use $_ in place of a real variable, and the very first thing that happens in my subroutines is to shift the real parameters out of that @_ thing.) All this implicit stuff feels like building a house out of jello.

I'm clearly still missing an essential concept here. Can anyone recommend any likely paths to enlightenment, to that "aha", to that moment of grokking just what the hell Larry Wall was driving at? Any books, tutorials, psychoactive drugs, meditations upon mountaintops?

Personally, I don't think you are missing enlightenment, as such. Like any language, you are using Perl to implement solutions to problems. Nothing about that precludes the writing of well structured, readable, maintainable code. One can exploit the strengths of the language without being intentionally terse or cryptic. That said, Perl being as permissive as it can be, you might sometimes have to wrestle with some real poo poo. With that in mind, try to learn the language fundamentals. Understand its types (especially if you are coming from a strongly-typed background), understand context, and key concepts. Beyond that, if you haven't already done so, bookmark http://perldoc.perl.org/. It'll help you more ably exploit Perl's strengths (that is to say, to 'think' in Perl, rather than mentally transliterate from what you'd do in C++), and also deconstruct much of the pointlessly terse and cryptic Perl that so many people seem to poop out[1].

That said, the various (http://perldoc.perl.org/perlvar.html) predefined variables in Perl do for the most part perform important duties - often their use is definitely necessary, or at the least recommended. But, by virtue of them being all predefined and what-not, they can be, and often are (in my opinion) used in scenarios where said use is not necessary or beneficial. To touch upon a specific perlvar you mention - in a 'foreach' my preference is to use a lexical as opposed to $_. However, when using things like 'map' and 'grep, use of $_ is expected and appropriate.

[1] Its a sore spot for me, I guess. Over the years I've reviewed a poo poo-load of code from a wide variety of people, and I've also had to pick apart and rebuild understanding of large, ancient, hairy legacy Perl systems made by people that left donkeys years ago.

magimix fucked around with this message at 10:28 on Jun 23, 2013

het
Nov 14, 2002

A dark black past
is my most valued
possession

Powered Descent posted:

Clearly I still don't "get" Perl. So much important stuff is left implicit and unstated. (I'm still completely :psyduck: at the very concept of a default variable. Why the hell would you DO that? I can't think of a single reason to ever use $_ in place of a real variable, and the very first thing that happens in my subroutines is to shift the real parameters out of that @_ thing.)
I'm up too late to give a full answer, but this is a standard perl convention. Using default variables outside of one-liners or stuff like grep/map/postfix-for is not necessarily that common. Certainly it's not not-perlish to assign variables (though I'm not 100% certain if "my $foo = shift @_" doesn't different behavior than "my $foo = $_[0]", I suspect it might?)

edit: heh, beaten

qntm
Jun 17, 2009

Powered Descent posted:

Waking this thread up for a general newbie question: is there anything a native speaker of C can do to better "get" Perl? Because so far I feel like I'm scrambling to impose order on a chaotic mess.

Back in the day I spent several years as a C++ developer. I've now jumped headfirst into a new job as a Linux admin, and so I'm learning Perl as fast as I can. And by a lot of measures that's going very well -- I can write perfectly functional code that does what I need it to do. But my Perl code looks almost exactly like C, because I still think in C and then translate the syntax as necessary. Other people's code is enigmatic at best -- it's like figuring out a rebus. Anything that's described as "Perlish" seems designed to be as obfuscated and impenetrable as humanly possible.

Clearly I still don't "get" Perl. So much important stuff is left implicit and unstated. (I'm still completely :psyduck: at the very concept of a default variable. Why the hell would you DO that? I can't think of a single reason to ever use $_ in place of a real variable, and the very first thing that happens in my subroutines is to shift the real parameters out of that @_ thing.) All this implicit stuff feels like building a house out of jello.

I'm clearly still missing an essential concept here. Can anyone recommend any likely paths to enlightenment, to that "aha", to that moment of grokking just what the hell Larry Wall was driving at? Any books, tutorials, psychoactive drugs, meditations upon mountaintops?

Perl is a horrible programming language, built out of gotchas and idiotic design decisions. It's possible to write good code in almost any programming language and it's possible to write bad code in every programming language, and it's definitely possible to write C in Perl, but Perl's central philosophy "There's More Than One Way To Do It" actively encourages users to avoid best practice and to make full use of Perl's idiotic idioms. Your observation that "Perlish" seems to be the same thing as "obfuscated and impenetrable" is correct. Perhaps the following meditation will help: Larry Wall isn't all that.

You are correct that leaving things implicit and unstated in one's code is a bad thing, and that Perl has many features which encourage you to do this. As for the uses of $_ specifically, it's intended to let you turn something like this:

Perl code:
foreach my $elem (@array) {
    print $elem;
}
into

Perl code:
foreach (@array) {
    print $_;
}
Since print operates on $_ by default, this can become:

Perl code:
foreach (@array) {
    print;
}
and finally:

Perl code:
print foreach @array;
...and now that you know how it's done, I think you'll understand why you shouldn't do it.

There is a book, Perl Best Practices, which you may find illuminating. It explains and justifies which practices are good and which are bad.

Gazpacho
Jun 18, 2004

by Fluffdaddy
Slippery Tilde
A default variable is useful for programs that operate only, or primarily, on one variable such as the current line of input. Less redundancy and less typing.

I'm of the opinion that no implicit semantics should be used in any Perl program that is stored in its own file. It's hostile to other programmers and often enough it's hostile to yourself. However I have no reservations about using them on the perl command line.

Powered Descent posted:

Waking this thread up for a general newbie question: is there anything a native speaker of C can do to better "get" Perl?
With regard to this particular problem, the way to "get it" is to be put in a situation where you have to write a lot of Unix shell scripts, until you find yourself pining for a more integrated language that doesn't sacrifice the useful features of the scripting toolset.

Gazpacho fucked around with this message at 11:31 on Jun 23, 2013

magimix
Dec 31, 2003

MY FAT WAIFU!!! :love:
She's fetish efficient :3:

Nap Ghost

qntm posted:

Perl code:
print foreach @array;
...and now you know how it's done, I think you will understand why you shouldn't do it.

There is a book, Perl Best Practices, which you may find illuminating. It explains and justifies which practices are good and which are bad.

Your code is too verbose! :argh:

Perl code:
print @array;
:colbert:

Actually I'm just replying to get behind your mention of Perl Best Practices, and to say something I forgot to in my original post - Powered Descent, know that Larry Wall's "Programming Perl" is a poo poo book. It is a poor language reference, and I regard it as a *terrible* book for anyone trying to learn the language, let alone learn to use it well. 'Well' in this context having 'readability', 'clarity', and 'maintainability' as core requirements. (Also the style of the book rubs me up the wrong way. It's like the technical-book version of an aging hipster. I guess that is the least of its problems though.)

qntm
Jun 17, 2009

Powered Descent posted:

the very first thing that happens in my subroutines is to shift the real parameters out of that @_ thing.)

het posted:

Certainly it's not un-perlish to assign variables (though I'm not 100% certain whether "my $foo = shift @_" behaves differently from "my $foo = $_[0]"; I suspect it might?)


Unpacking @_ is best practice because Perl passes arguments by alias: each element of @_ is an alias for the caller's variable, so assigning to it modifies the caller's data.

Perl code:
my $x = "red";

sub modify {
    $_[0] = "blue";
}

modify($x);
print $x; # "blue"
You should always unpack @_ to protect yourself from this behaviour, which is very unexpected, because almost nobody ever sees or uses it, because everybody almost always unpacks @_.

Perl code:
sub modify {
    my $foo = $_[0];
    # now you can modify $foo without affecting the caller's variable
}

qntm
Jun 17, 2009

magimix posted:

Larry Wall's "Programming Perl" is a poo poo book. It is a poor language reference, and I regard it as a *terrible* book for anyone trying to learn the language, let alone learn to use it well. 'Well' in this context having 'readability', 'clarity', and 'maintainability' as core requirements. (Also the style of the book rubs me up the wrong way. It's like the technical-book version of an aging hipster. I guess that is the least of its problems though.)

This is absolutely true as well. If you're wondering whether Larry Wall has his head screwed on this is the book you need to look at. It's about twice as long as it should be because he spends pages and pages explaining Perl concepts using cute metaphors and not-very-funny jokes instead of letting example code do the talking.

welcome to hell
Jun 9, 2006
The only time you should use $_ in perl is in a grep or map block, or in a one-liner. Any other time is better served by using a named variable. Similarly, @_ (or $_[x]) should only ever appear at the top of subs when you unpack the parameters. Anyone writing reasonable code will avoid the things left implicit.

Unfortunately perl has plenty of warts (like everything in perlvar) and always gives you plenty of rope to hang yourself with. And plenty of people will take advantage of that either through ignorance or excess cleverness to make really terrible code.

Modern Perl is a pretty good book for learning perl that focuses mainly on the parts of the language you'd want to use. Not sure how useful it is for C++ programmers or people dealing with existing code bases.

About half of Perl Best Practices is pretty good, but at least half of the modules suggested in it are terrible. Damian Conway falls firmly in the "excess cleverness" category.
