  • Locked thread
S133460
Mar 17, 2008
hello

Mithaldu posted:

I don't know a lot about threads, but i know this much: The best way of handling threads is literally this: Fork/Create as early as loving possible, if possible in a BEGIN block before anything else, and have communication happen only through shared scalars. The reason for that is that Perl creates threads by duplicating your entire application, which doesn't work entirely perfectly as far as creating actual separate duplicates goes, nor does it bode well for the memory use of your application.

That's a general rule with fork, perl aside. See fork(2).
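A minimal sketch of that rule with plain fork(2): spawn before the parent has allocated anything big, and communicate over an explicit channel (a pipe here) rather than assuming shared memory. This is a generic illustration, not code from the thread:

```perl
use strict;
use warnings;

# Fork early, while the process image is still small: the child
# starts life as a copy of the parent's entire address space.
pipe my $reader, my $writer or die "pipe: $!";

my $pid = fork;
die "fork: $!" unless defined $pid;

if ($pid == 0) {                 # child
    close $reader;
    print {$writer} "42\n";      # talk to the parent over the pipe
    close $writer;
    exit 0;
}

close $writer;                   # parent
chomp( my $answer = <$reader> );
waitpid $pid, 0;
print "child said: $answer\n";
```

On Windows, fork is emulated with interpreter threads, which is exactly why the duplicate-everything cost Mithaldu describes shows up there too.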

Mithaldu
Sep 25, 2007

Let's cuddle. :3:
I'm currently trying to figure out a performance issue i have when splitting a CSV file into arrays of arrays. The first run through a file is pretty fast. However EVERY run after that is very slow.

Here's a barebones example of the code affected:
code:
#! perl -w
use 5.010;
use strict;

sub x{
    open my $fh, '<', shift or die $!;
    my @AoA;
    push @AoA, [ split ',' ] while <$fh>;
    close $fh;
    return scalar @AoA;
}

for ( 1 .. 2 ) {
    my $start = time;
    printf "Records: %d in %.3f seconds\n",
        x( sprintf 'junk%d.dat', 1+ ($_ & 1) ),
        time() - $start;
}
And here's example output under 32 bit ActivePerl:
code:
# perl test.pl
Records: 308273 in 5.641 seconds
Records: 279997 in 98.281 seconds
Records: 308273 in 128.656 seconds
Records: 279997 in 96.953 seconds
Records: 308273 in 129.188 seconds
I've also run a few permutations of the code under activeperl, cygwin and strawberry perl through NYTProf and the results are here: http://drop.io/perl_performance/asset/ap-vs-cw-vs-sb-rar

Summary: Under ActivePerl and Strawberry Perl the splitting of the lines from the files into an array takes considerably more time on subsequent runs AS LONG as their reference gets pushed into another array. When i remove that pushing or use Cygwin, it runs at normal speed.

Any suggestions as to how this could be caused and how i can avoid it? (Suggestions as to who i could ask about this other than Perlmonks would be neat too.)

Mithaldu fucked around with this message at 10:53 on May 7, 2009

leedo
Nov 28, 2000

Mithaldu posted:

I'm currently trying to figure out a performance issue i have when splitting a CSV file into arrays of arrays. The first run through a file is pretty fast. However EVERY run after that is very slow.

...

Any suggestions as to how this could be caused and how i can avoid it? (Suggestions as to who i could ask about this other than Perlmonks would be neat too.)

While I can't tell you what is causing the slowdown, I can heartily recommend using Text::CSV_XS for any CSV processing.
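For reference, the Text::CSV_XS interface is small; a minimal sketch (the input string here is made up, and getline($fh) works the same way on a filehandle):

```perl
use strict;
use warnings;
use Text::CSV_XS;

# binary => 1 accepts embedded newlines/high-bit data;
# auto_diag => 1 dies with a useful message on malformed input
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

$csv->parse('foo,bar,baz') or die scalar $csv->error_diag;
my @fields = $csv->fields;    # ('foo', 'bar', 'baz')
print "@fields\n";
```

Unlike a bare split ',', it also handles quoted fields containing commas correctly.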

Filburt Shellbach
Nov 6, 2007

Apni tackat say tujay aaj mitta juu gaa!
The YAPC::NA schedule was just published. My "Extending Moose for Applications" talk was accepted :toot:

It's at 8am though.

Anyone planning on going? It's in Pittsburgh, June 22-24.

Puck42
Oct 7, 2005

Anyone here use HTML::Mason with virtual servers on Apache?

I'm having issues getting it to work with more than 2 sites.

Using PerlOptions +Parent works fine with 2 sites, as does using a handler hash in handler.pl.

But as soon as I add a third site, it starts loading files from the wrong folder. The original 2 still work though, it's just the new one.

Can anyone help?

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

leedo posted:

While I can't tell you what is causing the slowdown, I can heartily recommend using Text::CSV_XS for any CSV processing.
I'm aware that it is very fast, but honestly, for the amount of data i'm looking to process, splitting it like that is faster. Well, it would be if it didn't gently caress up when confronted with more than one file.

slipped
Jul 12, 2001

Mithaldu posted:

I'm currently trying to figure out a performance issue i have when splitting a CSV file into arrays of arrays. The first run through a file is pretty fast. However EVERY run after that is very slow.

Here's a barebones example of the code affected:
code:
#! perl -w
use 5.010;
use strict;

sub x{
    open my $fh, '<', shift or die $!;
    my @AoA;
    push @AoA, [ split ',' ] while <$fh>;
    close $fh;
    return scalar @AoA;
}

for ( 1 .. 2 ) {
    my $start = time;
    printf "Records: %d in %.3f seconds\n",
        x( sprintf 'junk%d.dat', 1+ ($_ & 1) ),
        time() - $start;
}
And here's example output under 32 bit ActivePerl:
code:
# perl test.pl
Records: 308273 in 5.641 seconds
Records: 279997 in 98.281 seconds
Records: 308273 in 128.656 seconds
Records: 279997 in 96.953 seconds
Records: 308273 in 129.188 seconds
I've also run a few permutations of the code under activeperl, cygwin and strawberry perl through NYTProf and the results are here: http://drop.io/perl_performance/asset/ap-vs-cw-vs-sb-rar

Summary: Under ActivePerl and Strawberry Perl the splitting of the lines from the files into an array takes considerably more time on subsequent runs AS LONG as their reference gets pushed into another array. When i remove that pushing or use Cygwin, it runs at normal speed.

Any suggestions as to how this could be caused and how i can avoid it? (Suggestions as to who i could ask about this other than Perlmonks would be neat too.)
your code doesn't match the file output.
$ for(1..2) {
> printf 'junk%d.dat', 1+ ($_ & 1); print "\n";
}
junk2.dat
junk1.dat


are you reparsing the same file 5 times and it slows down?

Mithaldu
Sep 25, 2007

Let's cuddle. :3:
The output was from when it was
code:
for ( 1 .. 5 )
It basically skips back and forth between the two files for a while. But the details of that aren't really important, as it happens as soon as it does a second one, no matter what.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
Wild-rear end guess: might be a memory configuration issue, if there's something wonky about the garbage collectors for those Perl implementations. You can certainly see this sort of rapid degradation if GC is thrashing and for some reason not finding those large allocations.

Filburt Shellbach
Nov 6, 2007

Apni tackat say tujay aaj mitta juu gaa!
There's only one implementation of Perl and it uses reference counting. :confused:

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Sartak posted:

There's only one implementation of Perl and it uses reference counting. :confused:

And that implementation has gone through multiple revisions and is distributed in different default configurations by various vendors, any of which might impact GC correctness/performance. Even reference-count GC can have bugs.

S133460
Mar 17, 2008
hello

Mithaldu posted:

I'm currently trying to figure out a performance issue i have when splitting a CSV file into arrays of arrays. The first run through a file is pretty fast. However EVERY run after that is very slow.

I'd try to simplify it further to isolate the culprit. Try taking IO out of the equation entirely. I don't know what your input files look like, but even something like kernel caching of the file on disk could affect your test between iterations. Try benchmarking split on its own between compilers. The problem could be how memory allocation is done for all those scalars you're creating, or umpteen other variances between how perl was compiled (compare "perl -V" outputs).

If you can isolate the real problem area, compare syntax trees (see B::Concise) and see how they differ between perls. You might be surprised by how differently the optimizer interprets your code between even minor versions. XS modules often perform more consistently since the C parts are not subjected to this.
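A quick way to take IO out of the picture and benchmark the split in isolation, as suggested above (Benchmark is a core module; the line contents and iteration count are arbitrary):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# A synthetic line, so the disk and filesystem cache are irrelevant.
my $line = join ',', 1 .. 20;
my @keep;

cmpthese( 10_000, {
    # split alone
    split_only => sub { my @f = split ',', $line },
    # split plus pushing the reference into a long-lived array,
    # which is the case that reportedly slows down
    split_push => sub { push @keep, [ split ',', $line ] },
} );
```

Run the same script under each perl and compare; `perl -MO=Concise scriptname` will dump the op tree if the numbers differ and you want to see why.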

Roseo
Jun 1, 2000
Forum Veteran
I'm coding an internal API to a bandwidth management system and taking the opportunity to learn Moose while I'm at it. My API makes a call to a database and populates the object attributes based on a unique field in the database via code in BUILDARGS(). However, I need to do error handling in the case of trying to base the object on a nonexistent db entry.

What's the best way to handle an error during Moose object creation, make the creation fail and pass an error back to the calling app, but not completely die?

My experience with error handling in Perl in general is pretty nonexistent. I use it primarily for scripting, so my experience is basically limited to if ($x) {action} else {die "Blaarg!"} type stuff.

Rapportus
Oct 31, 2004
The 4th Blue Man

Roseo posted:

I'm coding an internal API to a bandwidth management system and taking the opportunity to learn Moose while I'm at it. My API makes a call to a database and populates the object attributes based on a unique field in the database via code in BUILDARGS(). However, I need to do error handling in the case of trying to base the object on a nonexistent db entry.

What's the best way to handle an error during Moose object creation, make the creation fail and pass an error back to the calling app, but not completely die?

My experience with error handling in Perl in general is pretty nonexistent. I use it primarily for scripting, so my experience is basically limited to if ($x) {action} else {die "Blaarg!"} type stuff.

I don't know enough about Moose, but that sounds exactly like exception handling.

Roseo
Jun 1, 2000
Forum Veteran

Rapportus posted:

I don't know enough about Moose, but that sounds exactly like exception handling.

Well, yeah -- the problem is that there's no reason to die for an error that's recoverable. I want to pass an error like "Record not found" out of the object, so that the calling program can handle the error and continue on its way. A die() within the object's method kills the program calling it as well.

And I'd prefer not to have to stick every new object into an eval block; there has to be a better way to handle this.

Roseo
Jun 1, 2000
Forum Veteran
Quote's not edit

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
In my opinion... Generally, if you're okay with getting an undefined object back, the constructor should emit a warning. If the construction of the object is expected to be flawless in nearly all scenarios OR being unable to construct the object threatens the stability/safety of future operations, then you should die.

But yeah, no proper error handling framework. Unless you want to throw/catch an Error (it's a package).

Roseo
Jun 1, 2000
Forum Veteran

Triple Tech posted:

In my opinion... Generally, if you're okay with getting an undefined object back, the constructor should emit a warning. If the construction of the object is expected to be flawless in nearly all scenarios OR being unable to construct the object threatens the stability/safety of future operations, then you should die.

But yeah, no proper error handling framework. Unless you want to throw/catch an Error (it's a package).

I think I'm going to stay away from putting a module that claims that its syntax 'tends to break' into production. Thanks anyway, eval{} blocks it is!


Edit: After working with Moose, I'm beginning to see why people recommend using it. It certainly makes dealing with objects simple, and I'm loving not having to set up the attribute validation implementation. It lets me deal with how I want the object to be laid out and handle all the basic crap behind the scenes. If anyone else is on the fence, give it a try, I don't think you'll regret it.
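The eval-block pattern settled on above looks roughly like this; My::Record and its id attribute are made-up names, with a plain constructor standing in for a Moose class whose BUILDARGS db lookup die()s on a missing row:

```perl
use strict;
use warnings;

# Stand-in for the Moose class: new() dies when the record is
# missing, just like a failed db lookup in BUILDARGS would.
package My::Record;
sub new {
    my ($class, %args) = @_;
    die "Record not found\n" unless $args{id} == 1;   # pretend db lookup
    return bless { id => $args{id} }, $class;
}

package main;

my $obj = eval { My::Record->new( id => 42 ) };
if (!$obj) {
    warn "could not create record: $@";   # $@ holds the exception
    # ...recover and carry on instead of dying
}
```

Try::Tiny wraps the same eval/$@ dance in try/catch syntax if the bare blocks get tiresome.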

Roseo fucked around with this message at 22:41 on May 8, 2009

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

rjmccall posted:

Wild-rear end guess: might be a memory configuration issue, if there's something wonky about the garbage collectors for those Perl implementations. You can certainly see this sort of rapid degradation if GC is thrashing and for some reason not finding those large allocations.
I'm not entirely sure what you mean by this, as it's way over my head. For what it's worth: I have more than enough free RAM.

All I'm doing here is trying to find:
1. a way around the problem
2. the cause of it
3. a venue by which i can communicate the cause to the relevant people so it might get fixed

Due to my skill and experience level, however, i can only provide the raw data and some guesses, and will have to rely on other people to do point 2.

satest3 posted:

IO
IO is not an issue at all. I played around with it some more and when passing back a reference to the AoA created from the file and pushing it in another array, the performance jumps up again.

satest3 posted:

If you can isolate the real problem area, compare syntax trees (see B::Concise) and see how they differ between perls.
I have no idea whatsoever how to read this, but for the sake of allowing people who know more than me to find things, i've included the output of that in my next batch of benchmarks.



In other news: I found a way around the problem. As mentioned above, keeping the references to the AoAs around takes care of the performance drop. I have no idea how or why, but it's a good enough solution for me.
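The workaround described amounts to something like this sketch: a long-lived array keeps each run's reference alive instead of letting it be freed between calls.

```perl
use strict;
use warnings;

my @keep;    # outlives every call to x()

sub x {
    open my $fh, '<', shift or die $!;
    my @AoA;
    push @AoA, [ split ',' ] while <$fh>;
    close $fh;
    push @keep, \@AoA;    # retaining the ref sidesteps the slow free/realloc cycle
    return scalar @AoA;
}
```

Why retaining the memory avoids the slowdown is still the open question; this just pins the symptom down to deallocation.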

However, in the interest of resolving this for good, here's more benchmarks: http://drop.io/perl_performance/asset/arrays-zip

If anyone is in fact interested in resolving this, i'll happily provide more data, but at this point that's about all i can do.

slipped
Jul 12, 2001
I wasn't able to reproduce the odd behaviour on linux x86 (it ran as expected). One thing, you weren't using Time::HiRes. Not sure if that makes a difference on windows, but you should
use Time::HiRes qw(time); for benchmarking.

Mithaldu
Sep 25, 2007

Let's cuddle. :3:
I use Devel::NYTProf for benchmarking and originally did use HiRes to get an idea of the timings, but when a run goes 10 seconds, that's not really needed anymore. :)

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Just curious, have you tried using the non-XS version of Text::CSV to see if that's slow? Maybe your code is wrong. I dunno, I just can't shake that feeling.

But then again I'm kind of ignorant to implementation-level bugs, so whatever.

Mithaldu
Sep 25, 2007

Let's cuddle. :3:
Umm, i haven't even touched any CSV module at all. v:shobon:v This is all I've been doing:

code:
#! perl -w
use 5.010;
use strict;

sub x{
    open my $fh, '<', shift or die $!;
    my @AoA;
    push @AoA, [ split ',' ] while <$fh>;
    close $fh;
    return scalar @AoA;
}

for ( 1 .. 2 ) {
    my $start = time;
    printf "Records: %d in %.3f seconds\n",
        x( sprintf 'junk%d.dat', 1+ ($_ & 1) ),
        time() - $start;
}
If I've been doing anything that's wrong then it's in there.

leedo
Nov 28, 2000

Sartak posted:

The YAPC::NA schedule was just published. My "Extending Moose for Applications" talk was accepted :toot:

It's at 8am though.

Anyone planning on going? It's in Pittsburgh, June 22-24.

I'm going to be at YAPC::NA, I'll try to check it out. I am probably most interested in the Moose and Perl 6 talks.

Filburt Shellbach
Nov 6, 2007

Apni tackat say tujay aaj mitta juu gaa!
Cool. My talk is on Tuesday at 8AM. I hope I can even wake up in time.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
I see that YAPC is quickly approaching... So I guess I will attend and see if my company will reimburse me after the fact. I'll just use up all my vacation days and check out sunny Pittsburgh! Ugh...

How much of this bitch can I put on credit... And I guess now it's a great time to look into an Apple laptop. Can't do mobile hacking without one, right?

Edit: Gorsh, this poo poo is expensive. Are you guys going for like all the days, Thursday, Friday, Sunday?

Triple Tech fucked around with this message at 18:34 on May 15, 2009

leedo
Nov 28, 2000

Triple Tech posted:

I see that YAPC is quickly approaching... So I guess I will attend and see if my company will reimburse me after the fact. I'll just use up all my vacation days and check out sunny Pittsburgh! Ugh...

How much of this bitch can I put on credit... And I guess now it's a great time to look into an Apple laptop. Can't do mobile hacking without one, right?

Edit: Gorsh, this poo poo is expensive. Are you guys going for like all the days, Thursday, Friday, Sunday?

I'm just doing Monday - Wednesday, it doesn't look like too much is going on the rest of the week. And it was only 99 dollars if you ordered tickets a month or so ago, I can't think of a cheaper conference! :)

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Yeah I'm starting to reevaluate everything... I'm probably only going Mon-Wed, checking in Sun and leaving Thurs. Still getting the laptop though, haha, probably a Macbook.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Registered, housing, and MacBook... Just gotta get my plane tickets in order. See ya there!

Edit: And done with plane tickets!

Triple Tech fucked around with this message at 14:38 on May 18, 2009

CanSpice
Jan 12, 2002

GO CANUCKS GO
I know I'm not exactly active here, but I just thought I'd say I'm heading to YAPC as well. Thinking of submitting a talk, and if they've closed off submissions, I guess I'll take a stab at a lightning talk.

Edit: Eh well I see that the submission deadline was April 24th, but gently caress it, I still submitted a talk.

CanSpice fucked around with this message at 22:02 on Jun 5, 2009

gregday
May 23, 2003

I've searched around and tried a few things from the Google results, but nothing's worked so far. I'm reading in a file and doing some stuff on it that would go totally bonkers if it encountered an empty line, so I need to remove those first.

code:
sub DetailRecord {
        open(INFILE) or die "Could not open $INFILE";

        $INFILE =~ (stuff that would hopefully strip blank lines)

        foreach $line (<INFILE>) {
                chomp($line);
               ...my stuff...
}
Basically I want INFILE to not contain any empty lines by the time the foreach starts.

CanSpice
Jan 12, 2002

GO CANUCKS GO

gregday posted:

I've searched around and tried a few things from the Google results, but nothing's worked so far. I'm reading in a file and doing some stuff on it that would go totally bonkers if it encountered an empty line, so I need to remove those first.

code:
sub DetailRecord {
        open(INFILE) or die "Could not open $INFILE";

        $INFILE =~ (stuff that would hopefully strip blank lines)

        foreach $line (<INFILE>) {
                chomp($line);
               ...my stuff...
}
Basically I want INFILE to not contain any empty lines by the time the foreach starts.

After the chomp, just do:
code:
next if $line eq '';
That'll skip any blank lines and just go on to the next one.

Edit: I'd said "before", switched to "after". You could also do something like:
code:
next if $line =~ /^\s+$/;
...if you're worried about any lines being full of whitespace and nothing else.

Filburt Shellbach
Nov 6, 2007

Apni tackat say tujay aaj mitta juu gaa!
As a matter of style, you shouldn't use global file handles.

Instead of open INFILE, 'file'

you should write open my $infile, '<', 'file'

If another piece of the program opens up a file with the same global filehandle name, you can have weird interactions. Lexical filehandles also automatically close when they go out of scope.

As for explicitly passing in the "I want to open this for reading", consider a filename (perhaps chosen by the user) like ">app.db". Perl will helpfully overwrite your app.db file.
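The clobbering risk can be demonstrated like this; app.db is just an example name:

```perl
use strict;
use warnings;

my $name = '>app.db';    # imagine this came from the user

# Two-arg open lets the filename string itself pick the mode, so
# this would OPEN app.db FOR WRITING and truncate it:
#   open my $fh, $name;

# Three-arg open treats the whole string as a literal filename:
open my $fh, '<', $name
    or print "no file literally named '>app.db' to read: $!\n";
```

The same goes for leading '<', '>>', '|', and whitespace in two-arg open, which is why the three-arg form is the standard recommendation.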

but yeah what CanSpice said

gregday
May 23, 2003

Thanks! That's working great.

toben
Aug 23, 2007
I have a bash script that I can convert to perl if you help me with 1 part.

People ftp files to a directory. My script grabs the files and encodes them. However if a file is still being uploaded it messes stuff up and I have to tell the people to re-upload the file. I can delay the script, but as I never know when the upload is complete I need to change it.

I need a perl command that basically checks if the file is in use before doing anything with it.

So this line:
for movie in /home/smartdata/www/html/files/encode/mp4/720p/*.* ; do
Should be something like this:
for "movie file that is completely uploaded" in /home/smartdata/www/html/files/encode/mp4/720p/*.* ; do

Here is my current bash script:
#!/bin/bash
while true ; do
rm -f /home/smartdata/www/files/encode/mp4/720p/*.log
for movie in /home/smartdata/www/html/files/encode/mp4/720p/*.* ; do
NEWNAME=`basename $movie`
ORIGINALPATH=$movie
NEWNAME="${NEWNAME%.*}.mp4"
NEWPATH=/home/smartdata/www/html/files/encoding/mp4/$NEWNAME
FINALPATH=/home/smartdata/www/html/files/encoded/mp4/$NEWNAME
COPYPATH=/home/smartdata/www/html/files/encode/mp4/copy/$NEWNAME
echo Processing $movie
echo -- New name is $NEWNAME
echo -- New path is $NEWPATH
echo -- Final path is $FINALPATH
echo -- Original path is $ORIGINALPATH
ffmpeg -i "$movie" -y -an -pass 1 -vcodec libx264 -s 1280x720 -b 1500kb -flags +loop -cmp +chroma -partitions +parti4x4+partp8x8+partb8x8 -me epzs -subq 1 -trellis 0 -refs 1 -bf 16 -b_strategy 1 -coder 1 -me_range 16 -g 250 -keyint_min 25 -sc_threshold 40 -i_qfactor 0.71 -bt 1500kb -rc_eq 'blurCplx^(1-qComp)' -qcomp 0.6 -qmin 10 -qmax 51 -qdiff 4 "$NEWPATH"
ffmpeg -i "$movie" -y -threads auto -acodec libfaac -ab 128kb -pass 2 -vcodec libx264 -s 1280x720 -b 1500kb -flags +loop -cmp +chroma -partitions +parti8x8+parti4x4+partp8x8+partb8x8 -flags2 +brdo+dct8x8+wpred+bpyramid+mixed_refs -me umh -subq 6 -trellis 1 -refs 4 -bf 16 -directpred 3 -b_strategy 1 -bidir_refine 1 -coder 1 -me_range 16 -g 250 -keyint_min 25 -sc_threshold 40 -i_qfactor 0.71 -bt 1500kb -rc_eq 'blurCplx^(1-qComp)' -qcomp 0.6 -qmin 10 -qmax 51 -qdiff 4 "$NEWPATH"

mv $NEWPATH $FINALPATH

mv $ORIGINALPATH /home/smartdata/www/html/files/encode/mp4/finished/

done
echo Waiting 5 minutes
sleep 300 #wait 5 min
echo Waiting 5 minutes
done

Mario Incandenza
Aug 24, 2000

Tell me, small fry, have you ever heard of the golden Triumph Forks?
You don't really need perl for that, since the rest isn't anyway. Might as well replace for movie in /home/smartdata/www/html/files/encode/mp4/720p/*.* with for movie in $(find /home/smartdata/www/html/files/encode/mp4/720p/ -type f -mmin +5). This lists only files that haven't been modified in the last 5 minutes.
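If the loop ever does move into perl, the same not-modified-in-5-minutes check can be done with stat; the directory path is the one from the script above:

```perl
use strict;
use warnings;

my $dir = '/home/smartdata/www/html/files/encode/mp4/720p';

# Keep only plain files untouched for at least 5 minutes, i.e. ones
# whose upload has most likely finished. (stat _)[9] is the mtime,
# reusing the stat buffer that -f just filled.
my @ready = grep { -f $_ && time - (stat _)[9] > 300 } glob "$dir/*.*";

for my $movie (@ready) {
    print "Processing $movie\n";
}
```

This is only a heuristic, same as the find version: a stalled upload that resumes after 5 minutes would still slip through.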

toben
Aug 23, 2007

atomicstack posted:

You don't really need perl for that, since the rest isn't anyway. Might as well replace for movie in /home/smartdata/www/html/files/encode/mp4/720p/*.* with for movie in $(find /home/smartdata/www/html/files/encode/mp4/720p/ -type f -mmin +5). This lists only files that haven't been modified in the last 5 minutes.

Thanks,

I have been looking for a simple solution like that for months on end.

Filburt Shellbach
Nov 6, 2007

Apni tackat say tujay aaj mitta juu gaa!
Miyagawa has written a quick blog post about building native OS X applications out of Perl programs and their dependencies. It uses Platypus to do the heavy lifting.

http://remediecode.org/2009/06/binary-builds-for-os-x-leopard-users.html

I'm pretty excited by this, and hope it eventually becomes as easy to package up apps for Windows and Linux.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
What is up with your avatar? Should I post this in the Wave thread?

Also who's ready for YAPC? Ehh? EHH!? Do I need to pack anything? Or be mindful of something to prepare? Or are clothes and a laptop enough?

Filburt Shellbach
Nov 6, 2007

Apni tackat say tujay aaj mitta juu gaa!

Triple Tech posted:

What is up with your avatar? Should I post this in the Wave thread?

It's from the new Punch-Out game.

Triple Tech posted:

Also who's ready for YAPC? Ehh? EHH!? Do I need to pack anything? Or be mindful of something to prepare? Or are clothes and a laptop enough?

Not me I have to write all kinds of slides.

As far as what to bring: usual travel stuff. If you're staying in the dorms they provide you a roof, bed, towel, pillow, and a place to shower. That's about it, so bring your own toothpaste and deodorant and panties. Also plenty of cash for food and beer and (just in case) taxi or other stuff.

Laptop is good, bring em if you got em.
