|
Perl Best Practices says to avoid subroutine prototypes and gives a fairly compelling example to demonstrate why.
|
# ? Jan 13, 2013 23:58 |
|
|
# ? May 17, 2024 17:30 |
|
Sang- posted:perl's gc is "okay", it can't handle cyclic references at all - so if you have an array containing a bunch of references, then add a reference to the current array, perl will never be able to collect it. I've not used Devel::Gladiator before, I'll give it a go and see if it yields any clues. Thanks!
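For anyone following along, the cycle Sang- describes is easy to reproduce, and Scalar::Util's weaken (core since 5.8) is the usual escape hatch. A minimal sketch:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

my $array = [ 1, 2, 3 ];
push @$array, $array;     # the array now references itself, so its
                          # reference count can never drop to zero
weaken( $array->[3] );    # weak refs aren't counted, which breaks the cycle
```

Devel::Cycle is another CPAN module that can report cycles like this, though I haven't compared it against Devel::Gladiator.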
|
# ? Jan 19, 2013 15:14 |
|
Sang- posted:perl's gc is "okay", it can't handle cyclic references at all - so if you have an array containing a bunch of references, then add a reference to the current array, perl will never be able to collect it. qntm posted:Perl Best Practices says to avoid subroutine prototypes and gives a fairly compelling example to demonstrate why. What are you talking about? Prototypes are a great way to define routes in a web app!
|
# ? Jan 20, 2013 06:16 |
|
I'm interested in picking up a little Perl, mostly for web stuff and processing text and little scripty things. Does anyone know if Modern Perl is any good? I understand Perl has a comprehensive perldoc system, are there things in there that are good tutorials?
|
# ? Jan 21, 2013 22:44 |
|
Modern Perl is good. IIRC you're coming from a lispy background, so you probably won't learn too much in terms of fundamentals from Higher Order Perl, but you may find it fun down the line. In any case, it's free and the author is a certified Cool Dude
|
# ? Jan 21, 2013 23:00 |
|
WHOIS John Galt posted:I'm interested in picking up a little Perl, mostly for web stuff and processing text and little scripty things. Does anyone know if Modern Perl is any good? It's awesome and Higher Order Perl is awesome too. Your best resource right now for just plain figuring out where to start looking will be http://perl-tutorial.org
|
# ? Jan 23, 2013 22:29 |
|
gently caress this noise.
|
# ? Jan 24, 2013 03:34 |
|
5.18.0 is out, some cool new things there. You might want to hold back on upgrading in production, though, as some CPAN modules are broken under the new version ATM due to the hash fixes made by Yves (which have caused some butthurt on p5p, sigh).
|
# ? May 20, 2013 21:18 |
|
Mario Incandenza posted:5.18.0 is out, some cool new things there I added the three new DTrace probes
|
# ? May 21, 2013 03:14 |
|
So I have a problem that I hope isn't too dumb. Basically, I have a directory that gets about 100 files a day, each somewhere between 17 and 50 megs in size (the files contain call records). I'd like to scan through these files for an arbitrary string, as quickly as possible. I've found this code: http://cseweb.ucsd.edu/~sorourke/wf.pl written for the Widefinder project. It is indeed very, very fast - it goes through a file in a little under 0.1 seconds, in fact, so this would be good. Unfortunately, after jamming it into a subroutine called via a loop through this list of files, performance seems to bog down after 10 files or so. I'm sure this has to do with file IO issues that I don't even begin to understand, but I have this nagging feeling I'm going about this in a very wrong way. Has anyone here tackled similar issues, or have any suggestions?
|
# ? May 21, 2013 17:48 |
|
toadee posted:So I have a problem that I hope isn't too dumb. Basically, I have a directory that gets about 100 files in it a day, each file somewhere between 17 and 50 megs in size (the files contain call records). I'd like to scan through these files very quickly, as quick as possible, for an arbitrary string. If it does need to be instant, then I suggest hooking into the inotify service present in most Linux kernels to check for file operations, and only scanning files that have changed. There's a helpful module for that. It looks like there's even an equivalent for Windows, if that's your thing. Though this is only useful if file changes are periodic, otherwise you'll be constantly scanning anyway. If it's a constant stream of logs, I'd look into hooking into the program doing the logging directly, rather than scanning its log files. Does it provide any kind of API?
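Linux::Inotify2 is presumably the module meant here. It isn't core, but the underlying idea - only rescan files whose contents have changed - can be sketched with plain stat() polling (the sub name and hash layout are invented for the sketch):

```perl
use strict;
use warnings;

# Return only the files under $dir that are new or changed since the last
# call, judged by the mtimes remembered in %$seen (a caller-owned hash).
sub changed_files {
    my ( $dir, $seen ) = @_;
    my @changed;
    for my $path ( glob "$dir/*" ) {
        my $mtime = ( stat $path )[9];
        next if defined $seen->{$path} && $seen->{$path} == $mtime;
        $seen->{$path} = $mtime;
        push @changed, $path;
    }
    return @changed;
}
```

A real inotify watch avoids the polling loop entirely and fires as soon as a file is closed after writing.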
|
# ? May 21, 2013 20:51 |
|
Unfortunately, the files are generated constantly, at 15 minute intervals. Their point of origination (a VoIP switch) would only have the last 15 minutes of call records before the next batch is slurped, and in general, the searches that need to be performed would cover many such intervals while looking for calls.
|
# ? May 21, 2013 21:05 |
|
The reference code doesn't unmap the file in the parent process after searching it. You probably should, as a matter of general hygiene and to make sure you aren't creating any leaks that kill your performance - that would be my first guess. I'd also recommend looking at the process list to see whether you are creating zombie processes that the parent needs to clean up. Gazpacho fucked around with this message at 23:39 on May 21, 2013 |
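On the zombie point: a child stays defunct until its parent wait()s on it. A non-blocking reaper along these lines (the sub name is made up) can be called between forks or from a CHLD handler:

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Reap any already-exited children without blocking; returns how many
# were collected on this call.
sub reap_children {
    my $reaped = 0;
    $reaped++ while waitpid( -1, WNOHANG ) > 0;
    return $reaped;
}
```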
# ? May 21, 2013 23:33 |
|
toadee posted:Unfortunately, the files are generated constantly, at 15 minute intervals. Their point of origination (a VoIP switch) would only have the last 15 minutes of call records before the next batch is slurped, and in general, the searches that need to be performed would cover many such intervals while looking for calls.
|
# ? May 21, 2013 23:50 |
|
I'm not really following what "constantly, at 15 minute intervals" means? At any given second could there be new files, or is it just at 15:04, 30:04 past the hour, etc.? Once a new file is present, will more data be appended to it or is it static and must be searched within 15 minutes before the source deletes it? I think if you nail down the problem statement it'll help, though I understand if that's a little too much detail to give.
|
# ? May 21, 2013 23:54 |
|
het posted:How much of a difference does that particular code make compared to like a normal straightforward perl script? Is it faster than just grep? What do you need to do with the lines once you find them, just save them to a file?

Much faster than grep: grep takes about 3 seconds to search one of the files (depending on how large it is, i.e. how many call records are in it). fgrep is a good deal faster, about half a second on average, but the code above is about 5 times faster than fgrep at this. I'm gathering these to deliver to a CGI request, basically a tool for us to use in our NOC to query these call records quickly.

JawnV6 posted:I'm not really following what "constantly, at 15 minute intervals" means? At any given second could there be new files, or is it just at 15:04, 30:04 past the hour, etc.? Once a new file is present, will more data be appended to it or is it static and must be searched within 15 minutes before the source deletes it?

Essentially, every 15 minutes a file is grabbed from several VoIP switches that contains a ;-separated list of call data records. These get stored in a directory for us to reference when there are reports of issues with calls (they contain info like trunk group selection, outgoing carrier selection, etc). Right now we've just been using command line tools to grep through them, but I got the idea to take some spare time and hack together a perl CGI/jquery formatted web interface for the searches, as it's actually a hell of a lot easier to parse through a big list of these things in a nicely formatted table.

Gazpacho posted:The reference code doesn't unmap the file in the parent process after searching it. You probably should, as a matter of general hygiene and to make sure you aren't creating any leaks that kill your performance.

So I just tried unmapping after finishing up searching through $str and curiously it slows the whole thing way down. I'm guessing this is why the original didn't end up doing so either. I do end up with defunct processes, as observed mega-scientifically via staring at top while it runs, but I'm not sure how to avoid that?
|
# ? May 22, 2013 12:02 |
|
toadee posted:So I just tried unmapping after finishing up searching through $str and curiously it slows the whole thing way down. I'm guessing this is why the original didn't end up doing so as well. I do end up with defunct processes as observed mega-scientifically via staring at top while it runs, but I'm not sure how to avoid that? Have you considered using a database for this? If we're talking about CDRs, it's already a tabular format, and I'm assuming you're searching on fields like TN or whatever, which a database could index to optimize searches.
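Short of standing up a real database, the indexing idea can be sketched in core Perl - build the index once per batch of files, and lookups become hash fetches instead of fresh scans. The assumption that the calling number is the first ;-separated field is invented for the sketch:

```perl
use strict;
use warnings;

# Map each calling number (assumed first ;-separated field) to the
# [file, line number] pairs where its records live.
sub index_cdrs {
    my (@files) = @_;
    my %by_number;
    for my $file (@files) {
        open my $fh, '<', $file or die "can't open $file: $!";
        while ( my $line = <$fh> ) {
            my ($number) = split /;/, $line, 2;
            push @{ $by_number{$number} }, [ $file, $. ];
        }
    }
    return \%by_number;
}
```

In practice SQLite via DBD::SQLite would buy the same thing with persistence and multi-column indexes for free.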
|
# ? May 22, 2013 15:43 |
|
het posted:You're creating J processes for each run of that script, and when a process exits, its parent process gets the SIGCHLD signal. Until the parent calls wait() (or waitpid() or whatever), the process will be a zombie/defunct process. I have considered a DB; however, I'm not sure how best to go about it. I suppose I could make an arbitrary cutoff point for how long I'll want to store CDR records, then just drop tables that are older than that cutoff date, but continually filling it with each day's CDRs would get unwieldy pretty quickly. It's something I will look into further, to be sure, but in the meantime if anyone has any theories/suggestions on how best to search through files as quickly as possible, I'd like to hear them. I tried using Coro with Coro::Handle, using the same method I've used before for concurrent HTTP requests, but it didn't produce very concurrent-looking results or performance.
|
# ? May 22, 2013 15:50 |
|
toadee posted:I have considered a DB however I'm not sure how best to go about this, I suppose I could make an arbitrary cutoff point for how long I'll want to store CDR records, then just drop tables that are older than that cutoff date, but continually filling it with each days' CDR would get unwieldy pretty quickly. It's something I will look into further to be sure, but in the meantime if anyone has any theories/suggestions on how best to search through files as quickly as possible I'd like to hear them. I tried using Coro with Coro::Handle but at least using the same method I've done before for concurrent HTTP requests but it didn't produce very concurrent looking results and performance.
|
# ? May 22, 2013 16:50 |
|
het posted:The best way to repeatedly search through data as quickly as possible is to index it somehow so that searches aren't starting with a blank slate every time. If you don't refine your problem definition beyond "search for arbitrary data in arbitrary datasets", performance optimizations become difficult. Well the problem really is 'search for an arbitrary user provided string among a list of 96 files in semicolon delimited format'. I was hoping for some way to say run several concurrent processes and get return data from each. I do understand that the absolute best and quickest way to do this is if they were all in a database beforehand, but as of this writing that's not possible and while I'm trying to make it so, I was hoping there would be a way to do this more quickly than simply doing what amounts to a serialized line by line regex search for patterns.
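For "several concurrent processes and return data from each" in nothing but core Perl, one rough pattern is fork-per-file, with each child writing its hits to a scratch file the parent merges after reaping (the .hits suffix and sub name are invented for this sketch):

```perl
use strict;
use warnings;

# Fork one child per file; each child scans for $pattern and writes its
# matching lines to "<file>.hits", which the parent collects after reaping.
sub parallel_search {
    my ( $pattern, @files ) = @_;
    my @kids;
    for my $file (@files) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {                       # child
            open my $in,  '<', $file        or exit 1;
            open my $out, '>', "$file.hits" or exit 1;
            while ( my $line = <$in> ) {
                print $out $line if $line =~ /\Q$pattern\E/;
            }
            exit 0;
        }
        push @kids, $pid;
    }
    waitpid( $_, 0 ) for @kids;                  # reap - no zombies left behind
    my @hits;
    for my $file (@files) {
        open my $fh, '<', "$file.hits" or next;
        push @hits, <$fh>;
        close $fh;
        unlink "$file.hits";
    }
    return @hits;
}
```

With ~100 files this mostly buys overlap of IO waits; past the number of cores (or spindles) extra children just contend with each other.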
|
# ? May 22, 2013 17:03 |
|
toadee posted:Well the problem really is 'search for an arbitrary user provided string among a list of 96 files in semicolon delimited format'. I was hoping for some way to say run several concurrent processes and get return data from each. I do understand that the absolute best and quickest way to do this is if they were all in a database beforehand, but as of this writing that's not possible and while I'm trying to make it so, I was hoping there would be a way to do this more quickly than simply doing what amounts to a serialized line by line regex search for patterns. Other than that, I decided to search CPAN to see if there were any modules that might help with your tailing of continuously updated files. Have you tried giving File::Tail a go? Rohaq fucked around with this message at 17:28 on May 22, 2013 |
# ? May 22, 2013 17:26 |
|
Rohaq posted:If it's semicolon delimited, might I suggest using a split, and choosing the right entry from the resulting list? Regex is great at pulling out data from oddly structured strings (and making you look like a goddamned wizard in the process), but even the most optimised of expressions will rarely beat a split in terms of performance if you're extracting from a character delimited string. I had one case where I found an area of a script where a regex was being used where a split would suffice, and saw performance increase massively as a result. quote:Other than that, I decided to search CPAN to see if there were any modules that might help with your tailing of continuously updated files. Have you tried giving File::Tail a go?
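Rohaq's split point is easy to demonstrate. Both lines below pull the second field out of a made-up ;-delimited record; the split version is usually faster and clearer for fixed layouts:

```perl
use strict;
use warnings;

my $record = '2065550100;2125550199;TRUNK07;CARRIER_A';   # invented CDR layout

# Pulling the second field with a regex...
my ($dest_via_regex) = $record =~ /^[^;]*;([^;]*)/;

# ...versus slicing the result of split:
my ( $src, $dest ) = ( split /;/, $record )[ 0, 1 ];
```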
|
# ? May 22, 2013 19:20 |
|
Hey so I'm trying to write a lil script that just takes a bunch of numbers of energy deposited and sums them up based on a dependent variable which I read from a file. I think it's working pretty much as I expect, but there's some bizarro rounding stuff going on. Here's what the file mostly looks like (for example):code:
The code seems to skip the 29th index (and some other one too). It just adds all of the entries into the 28th index instead. I've gone through it with the debugger, stepped through it, and confirmed that it has properly read in the index as 29 (printing the variable gives me 29), but when the array is accessed with the variable, it just goes to 28 instead. Is there a way to force perl to round in a way which would eliminate this issue? Here's the code: code:
|
# ? May 24, 2013 05:04 |
|
code:
|
# ? May 24, 2013 05:21 |
|
Yeah, I probably described it kinda vaguely. Essentially the whole file looks like what I posted, so it's got some lines describing an index and a bunch of numbers I want to add into that index. So ideally like this:code:
|
# ? May 24, 2013 05:58 |
|
het posted:He needs to know which field to search in, and might want to find results in multiple columns, e.g. one column is source number, one is destination, and he might want to search on both. Perl code:
Perl code:
Rohaq fucked around with this message at 04:52 on May 29, 2013 |
# ? May 24, 2013 13:20 |
|
Eeyo posted:So it adds 2.3324 + 1.3434 + 5.6445 to array[20] and 1.3242 + 3.3423 + 0.4324 to array[21] and same with the entries under Index:28 and Index:29. The problem is that when it gets to the "Index:29" line, it reads the index properly as 29, but when I try to assign 1.6627 + 4.2222 + 1.6233 to array[29], it gives it to array[28] instead for whatever reason. Make sense? So I just need a way to force the code to round the Index:29 that it reads in better. In my specific code, I need to round the $currentEnergy variable, since it doesn't work right when I add numbers to energy[$currentEnergy] for certain values of $currentEnergy. Would just ignoring the '.' in the regex work? Based on your initial example which had like "Energy .16" anyway. (Or use a hash or hack it to work some other way if the decimal point is significant, I guess.)
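The symptom strongly suggests floating-point truncation rather than a regex problem: if the index comes from multiplying something like .29 by 100, the product lands fractionally below 29, and using it as an array subscript truncates downward. A quick demonstration, assuming typical IEEE-754 doubles:

```perl
use strict;
use warnings;

my $energy = 0.29;                       # stored as roughly 0.28999999999999998
my $index  = $energy * 100;              # roughly 28.999999999999996, not 29

my $truncated = int $index;              # 28 - what an array subscript does
my $rounded   = sprintf '%.0f', $index;  # "29" - round-to-nearest fixes it
```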
|
# ? May 24, 2013 14:13 |
|
quote:
Add an 'else' to the second if and see what's getting by both the regex and the value check.
|
# ? May 24, 2013 22:29 |
|
E: what the hell is this Python snippet doing in the Perl Short Questions Megathread
qntm fucked around with this message at 15:59 on Jun 2, 2013 |
# ? May 28, 2013 22:20 |
|
Thanks for all the advice! I pretty much ended up changing it so it's reading in whole numbers instead of multiplying .29 by 100 or something goofy like that, which seemed to fix it. And yeah, I probably should check the input more, but I'm in control of the output from the simulation I'm running, so I know it's getting passed the correct stuff. But I should check what it's reading more rigorously in the future. I made another version where I just stored it in a hash - is that good practice? I mean, I just put the "energy" as the key and added to its value. It may be better than doing some goofy array nonsense, and is more flexible for what input it can parse.
|
# ? May 29, 2013 02:30 |
|
Most Perl hackers use hashes for pretty much everything, they're way more flexible than arrays. I only use arrays when I specifically need them (i.e. a numerically indexed, ordered list). If you're writing Perl, learn to love hashes.
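Eeyo's hash version is exactly this pattern - autovivification means you can accumulate into keys without declaring them first, and the keys don't have to be small integers. The sample values here are made up:

```perl
use strict;
use warnings;

my @records = ( "29 1.6627\n", "29 4.2222\n", "28 0.4324\n" );

my %energy_total;
for my $record (@records) {
    my ( $index, $value ) = split ' ', $record;
    $energy_total{$index} += $value;   # key springs into existence on first use
}
```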
|
# ? May 29, 2013 10:06 |
|
Waking this thread up for a general newbie question: is there anything a native speaker of C can do to better "get" Perl? Because so far I feel like I'm scrambling to impose order on a chaotic mess.

Back in the day I spent several years as a C++ developer. I've now jumped headfirst into a new job as a Linux admin, and so I'm learning Perl as fast as I can. And by a lot of measures that's going very well - I can write perfectly functional code that does what I need it to do. But my Perl code looks almost exactly like C, because I still think in C and then translate the syntax as necessary. Other people's code is enigmatic at best - it's like figuring out a rebus. Anything that's described as "Perlish" seems designed to be as obfuscated and impenetrable as humanly possible.

Clearly I still don't "get" Perl. So much important stuff is left implicit and unstated. (I'm still completely baffled by the very concept of a default variable. Why the hell would you DO that? I can't think of a single reason to ever use $_ in place of a real variable, and the very first thing that happens in my subroutines is to shift the real parameters out of that @_ thing.) All this implicit stuff feels like building a house out of jello.

I'm clearly still missing an essential concept here. Can anyone recommend any likely paths to enlightenment, to that "aha", to that moment of grokking just what the hell Larry Wall was driving at? Any books, tutorials, psychoactive drugs, meditations upon mountaintops?
|
# ? Jun 23, 2013 08:17 |
|
Powered Descent posted:Waking this thread up for a general newbie question: is there anything a native speaker of C can do to better "get" Perl? Because so far I feel like I'm scrambling to impose order on a chaotic mess.

Personally, I don't think you are missing enlightenment, as such. Like any language, you are using Perl to implement solutions to problems. Nothing about that precludes the writing of well structured, readable, maintainable code. One can exploit the strengths of the language without being intentionally terse or cryptic. That said, Perl being as permissive as it can be, you might sometimes have to wrestle with some real poo poo.

With that in mind, try to learn the language fundamentals. Understand its types (especially if you are coming from a strongly-typed background), understand context, and key concepts. Beyond that, if you haven't already done so, bookmark http://perldoc.perl.org/. It'll help you more ably exploit Perl's strengths (that is to say, to 'think' in Perl, rather than mentally transliterate from what you'd do in C++), and also deconstruct much of the pointlessly terse and cryptic Perl that so many people seem to poop out[1].

That said, the various predefined variables (http://perldoc.perl.org/perlvar.html) in Perl do for the most part perform important duties - often their use is definitely necessary, or at the least recommended. But, by virtue of them all being predefined and what-not, they can be, and often are (in my opinion), used in scenarios where said use is not necessary or beneficial. To touch upon a specific perlvar you mention: in a 'foreach', my preference is to use a lexical as opposed to $_. However, when using things like 'map' and 'grep', use of $_ is expected and appropriate.

[1] It's a sore spot for me, I guess. Over the years I've reviewed a poo poo-load of code from a wide variety of people, and I've also had to pick apart and rebuild understanding of large, ancient, hairy legacy Perl systems made by people that left donkey's years ago. magimix fucked around with this message at 10:28 on Jun 23, 2013 |
# ? Jun 23, 2013 10:20 |
|
Powered Descent posted:Clearly I still don't "get" Perl. So much important stuff is left implicit and unstated. (I'm still completely at the very concept of a default variable. Why the hell would you DO that? I can't think of a single reason to ever use $_ in place of a real variable, and the very first thing that happens in my subroutines is to shift the real parameters out of that @_ thing.) edit: heh, beaten
|
# ? Jun 23, 2013 10:21 |
|
Powered Descent posted:Waking this thread up for a general newbie question: is there anything a native speaker of C can do to better "get" Perl? Because so far I feel like I'm scrambling to impose order on a chaotic mess. Perl is a horrible programming language, built out of gotchas and idiotic design decisions. It's possible to write good code in almost any programming language and it's possible to write bad code in every programming language, and it's definitely possible to write C in Perl, but Perl's central philosophy "There's More Than One Way To Do It" actively encourages users to avoid best practice and to make full use of Perl's idiotic idioms. Your observation that "Perlish" seems to be the same thing as "obfuscated and impenetrable" is correct. Perhaps the following meditation will help: Larry Wall isn't all that. You are correct that leaving things implicit and unstated in one's code is a bad thing, and that Perl has many features which encourage you to do this. As for the uses of $_ specifically, it's intended to let you turn something like this: Perl code:
Perl code:
Perl code:
Perl code:
There is a book, Perl Best Practices, which you may find illuminating. It explains and justifies which practices are good and which are bad.
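The code blocks in the post above didn't survive the archive, but the transformation being described is along these lines - the same loop written with an explicit lexical, then leaning on $_ (the sample data is invented):

```perl
use strict;
use warnings;

my @words = ( 'foo', 'bar' );

# Explicit lexical loop variable:
my @explicit;
for my $word (@words) {
    push @explicit, uc $word;
}

# The same loop using the default variable:
my @implicit;
for (@words) {
    push @implicit, uc;    # uc with no argument operates on $_
}
```

Both produce the same result; the argument is about whether the saved keystrokes are worth the implicitness.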
|
# ? Jun 23, 2013 10:57 |
|
A default variable is useful for programs that operate only, or primarily, on one variable such as the current line of input. Less redundancy and less typing. I'm of the opinion that no implicit semantics should be used in any Perl program that is stored in its own file. It's hostile to other programmers and often enough it's hostile to yourself. However I have no reservations about using them on the perl command line. Powered Descent posted:Waking this thread up for a general newbie question: is there anything a native speaker of C can do to better "get" Perl? Gazpacho fucked around with this message at 11:31 on Jun 23, 2013 |
# ? Jun 23, 2013 11:00 |
|
qntm posted:
Your code is too verbose! Perl code:
Actually I'm just replying to get behind your mention of Perl Best Practices, and to say something I forgot to in my original post - Powered Descent, know that Larry Wall's "Programming Perl" is a poo poo book. It is a poor language reference, and I regard it as a *terrible* book for anyone trying to learn the language, let alone learn to use it well. 'Well' in this context having 'readability', 'clarity', and 'maintainability' as core requirements. (Also the style of the book rubs me up the wrong way. It's like the technical-book version of an aging hipster. I guess that is the least of its problems though.)
|
# ? Jun 23, 2013 11:25 |
|
Powered Descent posted:the very first thing that happens in my subroutines is to shift the real parameters out of that @_ thing.) het posted:Certainly it's not not-perlish to assign variables (though I'm not 100% certain whether "my $foo = shift @_" has different behavior from "my $foo = $_[0]"; I suspect it might?) Unpacking @_ is best practice because Perl passes arguments into @_ as aliases to the caller's variables - modifying @_ directly modifies the caller's data. Perl code:
Perl code:
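The two code blocks here were lost in the archive; presumably they contrasted the aliasing behavior, something along these lines (the variable and sub names are invented):

```perl
use strict;
use warnings;

sub mangles { $_[0] = 'changed' }   # $_[0] is an alias to the caller's variable

sub unpacks {
    my ($copy) = @_;                # unpacking copies the value...
    $copy = 'changed';              # ...so this can't touch the caller
}

my $first = 'original';
mangles($first);                    # $first is now 'changed'

my $second = 'original';
unpacks($second);                   # $second is still 'original'
```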
|
# ? Jun 23, 2013 11:25 |
|
magimix posted:Larry Wall's "Programming Perl" is a poo poo book. It is a poor language reference, and I regard it as a *terrible* book for anyone trying to learn the language, let alone learn to use it well. 'Well' in this context having 'readability', 'clarity', and 'maintainability' as core requirements. (Also the style of the book rubs me up the wrong way. It's like the technical-book version of an aging hipster. I guess that is the least of its problems though.) This is absolutely true as well. If you're wondering whether Larry Wall has his head screwed on this is the book you need to look at. It's about twice as long as it should be because he spends pages and pages explaining Perl concepts using cute metaphors and not-very-funny jokes instead of letting example code do the talking.
|
# ? Jun 23, 2013 11:29 |
|
The only time you should use $_ in perl is in a grep or map block, or in a one-liner. Any other time is better served by a named variable. Similarly, @_ (or $_[x]) should only ever appear at the top of subs, where you unpack the parameters. Anyone writing reasonable code will avoid the things left implicit. Unfortunately perl has plenty of warts (like everything in perlvar) and always gives you more than enough rope to hang yourself with. And plenty of people will take advantage of that, either through ignorance or excess cleverness, to write really terrible code. Modern Perl is a pretty good book for learning perl that focuses mainly on the parts of the language you'd want to use. Not sure how useful it is for C++ programmers or people dealing with existing code bases. About half of Perl Best Practices is pretty good, but at least half of the modules suggested in it are terrible. Damian Conway falls firmly in the "excess cleverness" category.
|
# ? Jun 23, 2013 12:39 |