  • Locked thread
TiMBuS
Sep 25, 2007

LOL WUT?

genericadmin posted:

Catalyst started off great, but it has become a gigantic bloated pig. The whole project went into the woods after a while.

I've looked at all the perl frameworks at length, and I can't say I like any of them anymore. What I'd really like to see is something that very efficiently processes the HTTP request of an environment (CGI, mod_perl, etc) into some object, and a fast url->action mapper. That's the only parts of a framework I find I actually want. The rest is all crap from CPAN (and I do mean crap).
I went hunting today, and I concur. All I really want are sessions handled for me, a pluggable, easy to manage template system and automatic dispatching. Oh and the ability to specify different methods of dispatch handling.

genericadmin posted:

Continuity is a neat idea, but it's not extremely useful since it's going to have much steeper memory requirements than the traditional approach. That's a bad thing in a perl environment.
True. But it's still cool. I guess dispatch tables and sessions suffice anyway.


Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Had a humbling moment just now... There's this employee here who's largely useless; I'm not sure what he does. But I was called in as programming muscle for his project to help him out. And he dabbles in Perl to learn.

So I do my report and I send it off, and then he says, oh, look, my numbers are the same as yours. I'm like, how is that possible? I look over at his screen and sure enough the stats are accurate. And I see multiple run-throughs of it on his screen. I'm like, how long does that take you to run? He's like, oh, just about a minute. And I'm thinking, no way. When I run it, it's at least seven minutes, and I already went through optimizing my script for speed.

Long story short, the data was fixed width, 1.3 million rows. I made a widths array, built up an unpack string, and used that to split out the data into an array. What he did was have 160 lines (that's how many columns of data there were) of my $scalar = substr. His ran about 2x as fast as mine. I was floored.

Brute force won this battle.

Edit: I was trying to come up with a compromise. If I had the data tables ready, could I just build up the substr'ish Perl code itself and eval that for speed? For some reason, doing an assignment inside of an eval isn't saving the scalar. Is it supposed to?
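A sketch of that compromise: generate the substr code once and eval it into a sub. The catch (and probably the missing-scalar mystery) is that a `my` variable created inside an eval'd string dies when the eval finishes, so have the generated code be a sub that returns its values instead. Widths here are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical widths -- the real ones come from the data's documentation.
my @widths = (10, 2, 6);

# Build the source for a parsing sub, one substr per column...
my $offset = 0;
my $src = "sub {\n    my \@c;\n";
for my $i (0 .. $#widths) {
    $src .= "    \$c[$i] = substr \$_[0], $offset, $widths[$i];\n";
    $offset += $widths[$i];
}
$src .= "    return \@c;\n}";

# ...and compile it exactly once.  Calling $parse per row has no eval
# overhead, and the lexicals live inside the sub, not inside the eval.
my $parse = eval $src or die $@;
my @col = $parse->("1234567890XXabcdef");
print "@col\n";   # 1234567890 XX abcdef
```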

Triple Tech fucked around with this message at 19:17 on Apr 7, 2008

heeen
May 14, 2005

CAT NEVER STOPS
how about
code:
$data[$_] = substr $input, $_ * $col_width, $col_width for 0 .. 159;
#or
@data = $input =~ /(.{$col_width})/g; #or a precompiled regexp

tef
May 30, 2004

-> some l-system crap ->
Post your code?

tef fucked around with this message at 19:34 on Apr 7, 2008

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
I tried jamming the substr's into a list (@array = substr, substr, substr), that made it slow. The col_width technique that heeen posted wouldn't work because each column is different. And I'm pretty sure using a regexp would be universally slower than unpack or substr.

Edit: I also tried pushing them onto a stack instead of having Perl make that temp list, but it was still slow. I can't believe scalars are so much faster than arrays...

Triple Tech fucked around with this message at 19:50 on Apr 7, 2008

tef
May 30, 2004

-> some l-system crap ->
(from irc)

Eval is not run in a sandbox:

code:
$ perl
$a = 10 ; $b = 'my $a = 11;' ; eval $b ; print $a ; print "\n";
10

$ perl
$a = 10 ; $b = '$a = 11;' ; eval $b ; print $a ; print "\n";
11
Welcome to the world of lexical scope :D

tef fucked around with this message at 19:50 on Apr 7, 2008

<deleted user>

Triple Tech posted:

I tried jamming the substr's into a list (@array = substr, substr, substr), that made it slow. The col_width technique that heeen posted wouldn't work because each column is different. And I'm pretty sure using a regexp would be universally slower than unpack or substr.

I'm not following your description of the data. I think you are saying: 1.3 million records, each record has 160 fields, and you know the width of each field? You should be able to smoke through that. Are the fields all text or are some binary?

Is the data in a flat file? How are you reading from that (seek/read I hope)?

quote:

Edit: I also tried pushing them onto a stack instead of having Perl make that temp list, but it was still slow. I can't believe scalars are so much faster than arrays...

You're probably doing this already, but make sure you are using array references and not pouring data through list context needlessly. Here's why:

code:
------------
#!/usr/bin/perl -w

package main;

use strict;
use Benchmark;

my @foo = qw/one two three four five/;
my (@bar, @bam, @baz);

Benchmark::cmpthese(-5, {
    bylist  => sub { push(@bar, [@foo]) },
    byref   => sub { push(@bam, \@foo) },
    byref2  => sub { my $aref = \@foo; push(@baz, $aref) },
});
code:
[genadmin@mancub sand]$ perl b.arrays.pl
            Rate bylist byref2  byref
bylist  333040/s     --   -75%   -81%
byref2 1341143/s   303%     --   -23%
byref  1738044/s   422%    30%     --
I'm sure you'd get more insight if some code were posted, but I highly doubt your co-worker's approach is optimal.

<deleted user>

TiMBuS posted:

I went hunting today, and I concur. All I really want are sessions handled for me, a pluggable, easy to manage template system and automatic dispatching. Oh and the ability to specify different methods of dispatch handling.

And sessions and templating are (rightly) not even provided by the framework. I think people go in this progression of "acknowledge anarchy -> adopt framework -> learn patterns -> rewrite large chunks of framework -> wonder why they need a framework".

And don't even get me started on template systems. :jihad:

Zombywuf
Mar 29, 2008

re: Frameworks

What about CGI::Application?

TiMBuS
Sep 25, 2007

LOL WUT?

genericadmin posted:

And sessions and templating are (rightly) not even provided by the framework. I think people go in this progression of "acknowledge anarchy -> adopt framework -> learn patterns -> rewrite large chunks of framework -> wonder why they need a framework".
Touché.
So in the end I just used my own crappy 'framework', complete with its post/get mixing of data, and you know what, I don't give a drat. User browsers don't even know the data is being mixed anyway, since I'm using URL mapping rather than "?page=thunk", so the w3c can suck it with their "OH GOD UNDEFINED BEHAVIOUR". :colbert:

genericadmin posted:

And don't even get me started on template systems. :jihad:
I use HTML::Template. It's the only simple template system I could find.
Template Toolkit and Mason forgot what the hell a template was a long time ago.

Zombywuf posted:

re: Frameworks

What about CGI::Application?
It's not bad, but it had a few cons that made me leave it alone. For starters, I wanted URL mapping, not this '?rm=name' crap, so I'd have to use a dispatch plugin that seemed to just hate working with fastCGI, let alone multiple files (because everyone wants every possible dispatch in one single namespace, amirite). However it had another plugin I could use for this, and it was called (I poo poo you not) "CGI::Application::Plugin::AutoRunmode::FileDelegate".
It was probably right at this point I thought "hey wait a second this is exactly what my crappy framework does except my framework is 8x smaller, probably faster, doesn't have a 51 character import and it's built specifically for fastCGI" and I just used that instead.

Zombywuf
Mar 29, 2008

I find your requirements strange and disturbing.

But hey, whatever floats your boat.

Khorne
May 1, 2002
How do I convert an a4 value in perl (DWORD) to a string of numbers based on the values of the bytes?

I tried sprintf("%08x", $var); and it just returns 8 zeros. I tried bit shifts with & and other solutions that were far more work than they should be. I'm willing to admit that I am overlooking something stupidly simple. Possibly pack.

Edit: I am so confused and want to go running home to C right now. I think it's treating $var as a string, but it has a 4 byte value, so the resulting string is completely messed up. I think I need to use unpack but I have no clue how, or if that's even what I have to use.

Edit2: Why do the people who coded this use a4 for something that isn't a string

Khorne fucked around with this message at 01:54 on Apr 8, 2008

TiMBuS
Sep 25, 2007

LOL WUT?

Zombywuf posted:

I find your requirements strange and disturbing.

But hey, whatever floats your boat.

fastCGI, url->handler mapping and the ability to delegate tasks over multiple files/namespaces are strange and disturbing requirements?

tef
May 30, 2004

-> some l-system crap ->

Khorne posted:

How do I convert an a4 value in perl (DWORD) to a string, or at least a decimal/hexadecimal number that I can pad some how?

I'm not sure what you need to do, but it sounds like you need :

http://perldoc.perl.org/functions/unpack.html

http://perldoc.perl.org/functions/pack.html

<deleted user>

Khorne posted:

How do I convert an a4 value in perl (DWORD) to a string, or at least a decimal/hexadecimal number that I can pad some how?

I tried sprintf("%08x", $var); and it just returns 8 0s. I tried bit shifts with & and other far more work than it should be solutions. I'm willing to admit that I am overlooking something stupidly simple.

Check out pack() and unpack().

code:
my $b = pack('l', 0xffffffff);
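And for the reverse direction (those a4 bytes back into a number you can print), unpack with an explicit byte order; a small sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Round-trip a DWORD through 4 raw bytes.  'V' is unsigned 32-bit
# little-endian; use 'N' instead if the data is big-endian.
my $bytes = pack('V', 0xDEADBEEF);
my ($n)   = unpack('V', $bytes);
printf "%08x\n", $n;   # deadbeef
```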

<deleted user>

Zombywuf posted:

I find your requirements strange and disturbing.

But hey, whatever floats your boat.

URL->action mapping is not a strange thing to want at all. It is much more in line with HTTP to make application requests by path than by query string, and having a good URL structure is very important for things like Google indexing.

TiMBuS posted:

Template Toolkit and Mason forgot what the hell a template was a long time ago.

Yeah, both of these could be put out to pasture. They are mostly relics of the big transition to embedded scripting many years ago, back when ASP and PHP first really caught on. Now the fad is frameworks. I wonder what's next after that fizzles. It's amazing that Perl has managed not to change at all during any of this. :frog:

Khorne
May 1, 2002

tef posted:

I'm not sure what you need to do, but it sounds like you need :

http://perldoc.perl.org/functions/unpack.html

http://perldoc.perl.org/functions/pack.html
Thanks, that did it.

I still don't understand why they'd make a long an a4, can anyone explain the benefits, if any, of doing so?

Khorne fucked around with this message at 03:31 on Apr 8, 2008

Zombywuf
Mar 29, 2008

TiMBuS posted:

fastCGI, url->handler mapping and the ability to delegate tasks over multiple files/namespaces are strange and disturbing requirements?

I looked up fastCGI, and all I found was something claiming to be faster than Netscape 1.1. Assuming this is the fastCGI you're thinking of, why not just use mod_perl? I have no idea if fastCGI is any good, but that site rings my BS meter up to 10.

Also, imports with long names?

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?

genericadmin posted:

I'm not following your description of the data. I think you are saying: 1.3 million records, each record has 160 fields, and you know the width of each field? You should be able to smoke through that. Are the fields all text or are some binary?

Is the data in a flat file? How are you reading from that (seek/read I hope)?

It's a text file with about 1.3 million rows. Each row is around 1200 characters long and has 160 columns, each of a different length. Given some magical source of the length of each column (like an array, or a PDF of documentation), what's the fastest way of reading in the data? None of the fields are binary.

For our current purposes, the data doesn't need to be stored, just accumulated for statistical purposes. So, there's no data structure at the end that has 1.3 million entries in it. Maybe just one or two scalars that calculate the density of a given column (is the data blank or is it meaningful? etc).

The fastest method to date (from my informal testing) looks like this:
code:
my $first_col  = substr $_, 0, 10;
my $second_col = substr $_, 10, 2;
my $third_col  = substr $_, 12, 6;
# and so on
Edit: Of course all of these methods were inside of a while loop, so pretend you already have $_. Unless you think some non-while method is even faster. Like reading the file raw with sysreads and stuff?

Filburt Shellbach
Nov 6, 2007

Apni tackat say tujay aaj mitta juu gaa!

Triple Tech posted:

The fastest method to date (from my informal testing) looks like this:
code:
my $first_col  = substr $_, 0, 10;
my $second_col = substr $_, 10, 2;
my $third_col  = substr $_, 12, 6;
# and so on

I'd really be surprised if this wasn't the fastest method:

code:
my @cols = unpack 'a10a2a6...', $_;

Erasmus Darwin
Mar 6, 2001

Triple Tech posted:

Unless you think some non-while method is even faster. Like reading the file raw with sysreads and stuff?

I would think that would give some improvements because of the line length. I'm a little fuzzy on the behind-the-scenes details, but I suspect Perl's traditional, user-friendly file reading mechanism might be inefficient when dealing with long lines. I wouldn't be surprised if there's a lot of string reallocation going on behind the scenes.
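Since every record is a known, fixed width, one way to sidestep readline entirely is to read() exact record-sized chunks. A toy sketch with an 18-byte record standing in for the real 1200-byte one:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy stand-in for the real file: 18-byte records plus a newline each.
my $rec_len = 18;
my $file = "/tmp/fixed_width_demo.$$";
open my $out, '>', $file or die "$file: $!";
print $out "1234567890XXabcdef\n" for 1 .. 3;
close $out;

# read() grabs exactly one record per call -- no scanning for "\n".
open my $in, '<', $file or die "$file: $!";
binmode $in;
my ($rec, $rows) = ('', 0);
while (read($in, $rec, $rec_len + 1)) {
    chomp $rec;          # drop the trailing newline
    $rows++;             # ...parse $rec with unpack/substr here...
}
close $in;
unlink $file;
print "$rows records\n";   # 3 records
```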

Sartak posted:

I'd really be surprised if this wasn't the fastest method:

code:
my @cols = unpack 'a10a2a6...', $_;

Wouldn't Perl still have to parse the 'a10a2a6...' string for every line of the file? I suspect that might be slower than the brute-force substr method which gets turned into bytecode once.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?

Sartak posted:

I'd really be surprised if [unpack] wasn't the fastest method:
I think it has something to do with the overhead of list context (which genericadmin may have alluded to previously). On 20k rows, the substr method runs in 2 seconds. The unpack method, with no assignment attached to it, runs in 2.9 seconds.

PS, using Unix to count Brawl trophies, very classy.

heeen
May 14, 2005

CAT NEVER STOPS
^^^ might look a bit hackish, but maybe ($col1, $col2, ..., $col160) performs faster?

I would really like to see a simple benchmark of the various methods posted so far, including the regexp. I think the regexp wouldn't perform too badly, since it gets compiled once at compile time and should be reasonably fast after that. If you build it with fixed length matches of any character it won't have backtracking or complicated matching cases.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Okay, here you go, the code expires in 24 hours.

http://rafb.net/p/EvRzEL74.html
code:
#
                   Rate by_regexp by_unpack_array by_unpack_scalar by_substr_scalar
by_regexp        2.19/s        --            -36%             -60%             -84%
by_unpack_array  3.42/s       56%              --             -37%             -74%
by_unpack_scalar 5.43/s      148%             59%               --             -59%
by_substr_scalar 13.4/s      511%            292%             147%               --
I think the regexp one is blindingly fast if you take the assignment away, but obviously the assignment is required because we need to stare at the data. Rarely do we need every column, so I'm wondering if there are further optimizations to be done.

Edit: A dummy data generator is included at the bottom of the script.
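Re: not needing every column -- unpack can skip bytes with 'x' counts, so the columns you don't want are never copied at all. A sketch with made-up widths:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fictional layout: a 10-wide, a 2-wide, and a 6-wide column.
my $row = "1234567890XXabcdef";

# Only want columns 1 and 3?  'x2' skips column 2 without copying it.
my ($first, $third) = unpack 'a10 x2 a6', $row;
print "$first $third\n";   # 1234567890 abcdef
```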

permanoob
Sep 28, 2004

Yeah it's a lot like that.
I am fairly dumb and posted this in SH/SC. But I need a hand with this script that a former employee wrote, that's supposed to extract data from a csv file and write the results to another csv. I can run the script, but it's not outputting the file. I've looked over it with my limited understanding of programming and can't see anything that stands out as being a problem. Any help on this would be greatly appreciated. Here's a link to the code:

http://rafb.net/p/77NHz999.html

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
It's not dying at the open RETURNS? Have you tried printing out (or dying with) the value of $dest_file immediately after line 8? And then looking for files of that name? Are you taking into account working/current directories?
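For reference, an open that can't fail silently looks like this (file name and fields invented; the real name comes from the script's line 8):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The point is the 'or die' with $!, so a bad path or a permissions
# problem announces itself instead of silently writing nothing.
my $dest_file = "/tmp/returns_demo.$$.csv";
open(my $returns, '>', $dest_file)
    or die "can't open $dest_file for writing: $!";
print $returns "tracking,date,amount\n";
close $returns;

my $wrote = -s $dest_file;       # size in bytes; 0/undef if missing
print "wrote $wrote bytes to $dest_file\n";
unlink $dest_file;
```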

Erasmus Darwin
Mar 6, 2001

permanoob posted:

I can run the script, but it's not outputting the file.

If nothing's late, it won't output a file.

Also, the code uses '=' (assignment) where it means the comparison operator '=='. By coincidence, because of how the code is structured, that bug doesn't break anything (unless months and days in the data are 0-based instead of 1-based).

Edit: Also, a lot of that crazy date comparison logic can be simplified.

code:
if ($year1 * 600 + $mn1 * 40 + $day1 < $year2 * 600 + $mn2 * 40 + $day2) {
  # It's late.  Write out the return.
} else {
  # It's not late.  Go to the next line.
}

Erasmus Darwin fucked around with this message at 20:06 on Apr 8, 2008

permanoob
Sep 28, 2004

Yeah it's a lot like that.

Triple Tech posted:

It's not dying at the open RETURNS? Have you tried printing out (or dieing) the value of $dest_file immediately after line 8? And then looking for files of that name? Are you taking into account working/current directories?

It isn't dying at the open RETURNS, no. I just tried printing and dying with the $dest_file value. Print outputs the correct file name it should be writing to. Die kills the script with

quote:

04_08_2008XXXXXXXXXX.csv at refund.pl line 9.

And I'm not finding any files on my computer with that name.

When I run the script, it gives this output

quote:

15746 iterations
171 failures!

There are 15746 lines in the csv I'm trying to query and very likely would be 171 returns on what I'm looking for. It's just not making the return file.

<deleted user>

permanoob posted:

It isn't dying and open RETURNS no. I just tried printing and dieing the $dest_file value. Print outputs the correct file name it should be writing to. Die is killing the script with

It sounds like the open() at line 100 is simply not being reached. Is the print statement on line 102 ever outputting anything? Try putting a print statement after line 56 to see if you ever make it that far.

The script looks pretty error-prone, so you may just want to break out the debugger.

TiMBuS
Sep 25, 2007

LOL WUT?

Zombywuf posted:

I looked up fastCGI, and all I found was something claiming to be faster than Netscape 1.1. Assuming this is the fastCGI you're thinking of, why not just use mod_perl? I have no idea if fastCGI is any good, but that site rings my BS meter up to 10.
You haven't heard of FastCGI before? Wow.
It's not a product, it's a specification. FastCGI is a method of loading a persistent process and feeding each server request to it. The process returns the output, which the server sends back to the user. Most servers support it as standard (including apache, lighttpd, abyss webserver, IIS).
Heck, I think FCGI and CGI::Fast come with a standard perl distribution, too.

As for mod_perl.
mod_perl is apache-specific, slower if you use registry handlers, usually not installed on webhosts, and generally requires more screwing around if you want to take advantage of the static behaviour of your script. You need to throw everything in a BEGIN{} block if you use mod_perl, whereas with FastCGI all you have to do is load all your static stuff before you enter the FCGI loop.

So to answer your question "why not just use mod_perl?" more directly:
Because fcgi is better.
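The whole "load static stuff, then loop" shape is easy to see with fake requests standing in for the accept call (in real code the loop condition would be `while (my $q = CGI::Fast->new)` or FCGI's `$request->Accept() >= 0`):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for CGI::Fast->new: three fake requests, then "no more".
my @fake_requests = ('GET /a', 'GET /b', 'GET /c');
sub next_request { shift @fake_requests }

# Everything up here runs ONCE per process: load modules, compile
# templates, open database handles...
my $served = 0;

# ...and every request the server hands over flows through the loop.
while (my $req = next_request()) {
    $served++;
}
print "served $served requests from one process\n";   # served 3 ...
```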

Zombywuf posted:

Also, imports with long names?
CGI::Application::Plugin::AutoRunmode::FileDelegate

nrichprime
May 29, 2004

permanoob posted:

I am fairly dumb and posted this in SH/SC. But I need a hand with this script that a former employee wrote, that's supposed to extract data from a csv file and write the results to another csv. I can run the script, but it's not outputting the file. I've looked over it with my limited understanding of programming and can't see anything that stands out as being a problem. Any help on this would be greatly appreciated. Here's a link to the code:

http://rafb.net/p/77NHz999.html

There's a lot of
code:
if ($var1 = $var2)
instead of
code:
if ($var1 == $var2)
statements peppered through the code, these may be the source of some of your problems.

Zombywuf
Mar 29, 2008

TiMBuS posted:

So to answer your question "why not just use mod_perl?" more directly:
Because fcgi is better.
Fair enough, fastcgi.org could do with some serious work then.

quote:

CGI::Application::Plugin::AutoRunmode::FileDelegate

The question mark was meant to imply "so what?" It fits on one line of a standard 80 column terminal.

Erasmus Darwin
Mar 6, 2001

nrichprime posted:

There's a lot of
code:
if ($var1 = $var2)
instead of
code:
if ($var1 == $var2)
statements peppered through the code, these may be the source of some of your problems.

Except that it's structured as:

code:
if ($var1 > $var2) {
} elsif ($var1 < $var2) {
} elsif ($var1 = $var2) {
}
Which means it works by coincidence, as long as $var1 and $var2 aren't 0 (which is true if the CSV is using normal, human-readable, 1-based dates). So even though it's wrong, I don't think it's the source of the problem unless the CSV uses 0-based dates (e.g. January = 0 rather than 1).

TiMBuS
Sep 25, 2007

LOL WUT?

Zombywuf posted:

Fair enough, fastcgi.org could do with some serious work then.
You mean because the 'official' CGI spec site is so much better? Or perhaps the SCGI page better tickles your fancy?

Zombywuf posted:

The question mark was meant to imply "so what?" It fits on one line of a standard 80 column terminal.
It's not just the length, it's the fact I need an extension to a plugin to an application that I inherit just to do something I can punch out in ~10 lines.

However the length itself does get on my nerves too. Have you seen the synopsis for it?
code:
package MyApp;
use base 'CGI::Application';
use CGI::Application::Plugin::AutoRunmode 
        qw [ cgiapp_prerun];
use CGI::Application::Plugin::AutoRunmode::FileDelegate();
        
        sub setup{
                my ($self) = @_;
                my $delegate = new CGI::Application::Plugin::AutoRunmode::FileDelegate
                        ('/path/to/runmodes')
                $self->param('::Plugin::AutoRunmode::delegate' => $delegate);
        }
Jesus.

permanoob
Sep 28, 2004

Yeah it's a lot like that.
There were two problems in this situation. Both have been worked out. First, the file format I was pulling from UPS was different than the previous employee had been pulling in the past. The field definitions weren't correct and it was failing on lines 26-31.

The second problem was the "sanity check" the guy had written in 27-31. I have a friend that knows some Perl basics and he's been looking over this. What he did was took out 27-31 and replaced it with:
code:
		if ($iterations < 1) {
			next;
		} else {
It's pulling the correct data out now but it's formatting it in an undesirable fashion. It's putting a break at the end of each row and putting quotation marks around the return info from $field5. I can't see where or why it's doing that.
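A likely culprit, sketched with invented field values: a naive split leaves the newline glued to the last field and the quotes glued to any quoted field, which looks a lot like these symptoms.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented row -- field five is quoted in the source CSV.
my $line = qq{a,b,c,d,"field five"\n};

# A naive split keeps the quotes AND the trailing newline on the last
# field, so printing it adds a stray break:
my @naive = split /,/, $line;

# chomp first, then strip surrounding quotes.  (Text::CSV from CPAN
# handles all of this properly, embedded commas included.)
chomp $line;
my @fields = map { s/^"//; s/"\z//; $_ } split /,/, $line;
print "<$fields[-1]>\n";   # <field five>
```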

Zombywuf
Mar 29, 2008

TiMBuS posted:

You mean because the 'official' CGI spec site is so much better? Or perhaps the SCGI page better tickles your fancy?

It doesn't have an article about being faster than Netscape 1.1 though.

quote:

However the length itself does get on my nerves too. Have you seen the synopsis for it?
code:
package MyApp;
use base 'CGI::Application';
use CGI::Application::Plugin::AutoRunmode 
        qw [ cgiapp_prerun];
use CGI::Application::Plugin::AutoRunmode::FileDelegate();
        
        sub setup{
                my ($self) = @_;
                my $delegate = new CGI::Application::Plugin::AutoRunmode::FileDelegate
                        ('/path/to/runmodes')
                $self->param('::Plugin::AutoRunmode::delegate' => $delegate);
        }
Jesus.

Yeah that's pretty bad, url rewriting would be simpler.

6174
Dec 4, 2004
How would I rewrite this C++ function in a Perlish manner?

code:
bool line_match(const string base, const string insert)
{
     if (base.substr(74, 8) == insert.substr(74, 8)) {
          if (base.substr(89, 8) == insert.substr(89, 8)) {
               if (base.substr(114, 8) == insert.substr(114, 8)) {
                    return true;
               }
          }
     }
     return false;
}

Erasmus Darwin
Mar 6, 2001

6174 posted:

How would I rewrite this C++ function in a Perlish manner?

code:
sub line_match($$) {
  my ($base, $insert) = @_;

  return (substr($base,  74, 8) eq substr($insert,  74, 8)) &&
         (substr($base,  89, 8) eq substr($insert,  89, 8)) &&
         (substr($base, 114, 8) eq substr($insert, 114, 8));
}
It's also worth noting that the C++ version can be rewritten the same way.
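If the offset list ever changes, a table-driven variant keeps the comparisons in data (same offsets as above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three [offset, length] windows that must agree.
my @windows = ([74, 8], [89, 8], [114, 8]);

sub line_match {
    my ($base, $insert) = @_;
    for my $w (@windows) {
        my ($off, $len) = @$w;
        return 0 unless substr($base, $off, $len) eq substr($insert, $off, $len);
    }
    return 1;
}

# Quick check on two 130-character lines differing outside the windows:
my $a = join '', ('a') x 130;
(my $b = $a) =~ s/^./X/;          # differs at offset 0 only
print line_match($a, $b) ? "match\n" : "no match\n";   # match
```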

6174
Dec 4, 2004

Erasmus Darwin posted:

code:
sub line_match($$) {
  my ($base, $insert) = @_;

  return (substr($base,  74, 8) eq substr($insert,  74, 8)) &&
         (substr($base,  89, 8) eq substr($insert,  89, 8)) &&
         (substr($base, 114, 8) eq substr($insert, 114, 8));
}
It's also worth noting that the C++ version can be rewritten the same way.

Thanks. I was expecting something that didn't translate so directly.


TiMBuS
Sep 25, 2007

LOL WUT?

Zombywuf posted:

It doesn't have an article about being faster than Netscape 1.1 though.
Yes, because I expect an article written 13 years ago to have up-to-date technologies or it just isn't worth reading. Never mind that the text still provides a well-written explanation of what FCGI is: IT'S OLD!

Zombywuf posted:

Yeah that's pretty bad, url rewriting would be simpler.
URL rewriting has nothing to do with separating 'run modes' into multiple files.

Troll elsewhere, I came here for advice from people who knew what they were talking about (thanks, genericadmin). I didn't post here to get replies I could have just gotten from 7chan.
