|
genericadmin posted:Catalyst started off great, but it has become a gigantic bloated pig. The whole project went into the woods after a while. genericadmin posted:Continuity is a neat idea, but it's not extremely useful since it's going to have much steeper memory requirements than the traditional approach. That's a bad thing in a perl environment.
|
# ? Apr 7, 2008 08:14 |
|
|
Had a humbling moment just now... There's this employee here who's largely useless, I'm not sure what he does. But I was called in as programming muscle for his project to help him out. And he tries to dabble in Perl to learn. So I do my report and I send it off, and then he says, oh, look, my numbers are the same as yours. I'm like, how is that possible? I look over at his screen and sure enough the stats are accurate. And I see multiple run-throughs of it on his screen, I'm like, how long does that take you to run? He's like, oh, just about a minute. And I'm thinking, no way. When I run it, it's at least seven minutes, and I already went through optimizing my script for speed. Long story short, the data was fixed width, 1.3 million rows. I made a widths array, built up an unpack string, and used that to split out the data into an array. What he did was have 160 lines (that's how many columns of data there were) of my $scalar = substr. His ran about 2x as fast as mine. I was floored. Brute force won this battle. Edit: I was trying to come up with a compromise. If I had the data tables ready, could I just build up the substr'ish Perl code itself and eval that for speed? For some reason, doing an assignment inside of an eval isn't saving the scalar. Is it supposed to? Triple Tech fucked around with this message at 19:17 on Apr 7, 2008 |
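On the eval question: a string eval does run in the enclosing lexical scope, so an assignment should stick, as long as the variable was declared *outside* the eval. A `my` declared inside the eval'd string is scoped to the eval and vanishes when it finishes. A minimal sketch of the distinction (the variable and field names here are made up, not the actual report code):

```perl
use strict;
use warnings;

# An assignment inside a string eval is saved if the target lexical
# already exists in the surrounding scope. Declaring it with 'my'
# inside the eval'd string would scope it to the eval itself.
my $record = "AliceNYC42";
my $name;                                     # declared outside the eval
my $code = '$name = substr($record, 0, 5);';  # generated code, as described
eval $code;
die $@ if $@;
# $name now holds "Alice" because the assignment hit the outer lexical.
```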
# ? Apr 7, 2008 19:06 |
|
how about code:
|
# ? Apr 7, 2008 19:28 |
|
Post your code?
tef fucked around with this message at 19:34 on Apr 7, 2008 |
# ? Apr 7, 2008 19:32 |
|
I tried jamming the substr's into a list (@array = substr, substr, substr), that made it slow. The col_width technique that heeen posted wouldn't work because each column is different. And I'm pretty sure using a regexp would be universally slower than unpack or substr. Edit: I also tried pushing them onto a stack instead of having Perl make that temp list, but it was still slow. I can't believe scalars are so much faster than arrays... Triple Tech fucked around with this message at 19:50 on Apr 7, 2008 |
# ? Apr 7, 2008 19:44 |
|
(from irc) Eval is not run in a sandbox: code:
tef fucked around with this message at 19:50 on Apr 7, 2008 |
# ? Apr 7, 2008 19:47 |
|
Triple Tech posted:I tried jamming the substr's into a list (@array = substr, substr, substr), that made it slow. The col_width technique that heeen posted wouldn't work because each column is different. And I'm pretty sure using a regexp would be universally slower than unpack or substr. I'm not following your description of the data. I think you are saying: 1.3 million records, each record has 160 fields, and you know the width of each field? You should be able to smoke through that. Are the fields all text or are some binary? Is the data in a flat file? How are you reading from that (seek/read I hope)? quote:Edit: I also tried pushing them onto a stack instead of having Perl make that temp list, but it was still slow. I can't believe scalars are so much faster than arrays... You're probably doing this already, but make sure you are using array references and not pouring data through list context needlessly. Here's why: code:
code:
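A plausible illustration of the list-context point (this is my sketch, not genericadmin's original snippet): passing an array by reference hands the sub one scalar, while passing it as a list copies every element onto the call stack per call, which adds up fast over 1.3 million rows of 160 fields.

```perl
use strict;
use warnings;

# Copies all fields into @_ and then again into @fields on every call.
sub count_nonempty_copy {
    my @fields = @_;
    return scalar grep { length } @fields;
}

# Receives a single reference; no per-element copying.
sub count_nonempty_ref {
    my ($fields) = @_;
    return scalar grep { length } @$fields;
}

my @row = ('a', '', 'b', '', 'c');
my $n = count_nonempty_ref(\@row);   # $n is 3
```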
|
# ? Apr 7, 2008 21:16 |
|
TiMBuS posted:I went hunting today, and I concur. All I really want are sessions handled for me, a pluggable, easy to manage template system and automatic dispatching. Oh and the ability to specify different methods of dispatch handling. And sessions and templating are (rightly) not even provided by the framework. I think people go in this progression of "acknowledge anarchy -> adopt framework -> learn patterns -> rewrite large chunks of framework -> wonder why they need a framework". And don't even get me started on template systems.
|
# ? Apr 7, 2008 21:24 |
|
re: Frameworks What about CGI::Application?
|
# ? Apr 7, 2008 22:23 |
|
genericadmin posted:And sessions and templating are (rightly) not even provided by the framework. I think people go in this progression of "acknowledge anarchy -> adopt framework -> learn patterns -> rewrite large chunks of framework -> wonder why they need a framework". So in the end I just used my own crappy 'framework' complete with its post/get mixing of data and you know what I don't give a drat. User browsers don't even know the data is being mixed anyway since I'm using URL mapping rather than "?page=thunk" so w3c can suck it with their "OH GOD UNDEFINED BEHAVIOUR". genericadmin posted:And don't even get me started on template systems. Template Toolkit and Mason forgot what the hell a template was a long time ago. Zombywuf posted:re: Frameworks It was probably right at this point I thought "hey wait a second this is exactly what my crappy framework does except my framework is 8x smaller, probably faster, doesn't have a 51 character import and it's built specifically for fastCGI" and I just used that instead.
|
# ? Apr 7, 2008 23:00 |
|
I find your requirements strange and disturbing. But hey, whatever floats your boat.
|
# ? Apr 7, 2008 23:12 |
|
How do I convert an a4 value in perl (DWORD) to a string of numbers based on the values of the bytes? I tried sprintf("%08x", $var); and it just returns 8 0s. I tried bit shifts with & and other far more work than it should be solutions. I'm willing to admit that I am overlooking something stupidly simple. Possibly pack. Edit: I am so confused and want to go running home to C right now. I think it's treating $var as a string, but it has a 4 byte value, so the resulting string is completely messed up. I think I need to use unpack but I have no clue how, or if that's even what I have to use. Edit2: Why do the people who coded this use a4 for something that isn't a string Khorne fucked around with this message at 01:54 on Apr 8, 2008 |
# ? Apr 8, 2008 01:43 |
|
Zombywuf posted:I find your requirements strange and disturbing. fastCGI, url->handler mapping and the ability to delegate tasks over multiple files/namespaces are strange and disturbing requirements?
|
# ? Apr 8, 2008 01:44 |
|
Khorne posted:How do I convert an a4 value in perl (DWORD) to a string, or at least a decimal/hexadecimal number that I can pad some how? I'm not sure what you need to do, but it sounds like you need : http://perldoc.perl.org/functions/unpack.html http://perldoc.perl.org/functions/pack.html
|
# ? Apr 8, 2008 01:54 |
|
Khorne posted:How do I convert an a4 value in perl (DWORD) to a string, or at least a decimal/hexadecimal number that I can pad some how? Check out pack() and unpack(). code:
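To spell out why pack/unpack is the answer here (sample bytes invented for illustration): four raw bytes read through an `a4` template are a byte *string*, not a number, so `sprintf '%08x'` numifies the string to 0. Re-unpacking those bytes as an unsigned 32-bit integer first fixes it; whether `V` (little-endian) or `N` (big-endian) is right depends on how the DWORD was written.

```perl
use strict;
use warnings;

# A scalar holding 4 raw bytes (as produced by an 'a4' unpack).
my $raw = "\x78\x56\x34\x12";        # sample little-endian DWORD

# sprintf '%08x', $raw would print 00000000: $raw numifies to 0.
my $num = unpack('V', $raw);         # 'V' = 32-bit unsigned, little-endian
my $hex = sprintf('%08x', $num);     # "12345678"
```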
|
# ? Apr 8, 2008 01:55 |
|
Zombywuf posted:I find your requirements strange and disturbing. URL->action mapping is not a strange thing to want at all. It is much more in line with HTTP to make application requests by path than by query string, and having a good URL structure is very important for things like Google indexing. TiMBuS posted:Template Toolkit and Mason forgot what the hell a template was a long time ago. Yeah, both of these could be put out to pasture. They are mostly relics of the big transition to embedded scripting many years ago, back when ASP and PHP first really caught on. Now the fad is frameworks. I wonder what's next after that fizzles. It's amazing that Perl has managed not to change at all during any of this.
|
# ? Apr 8, 2008 02:08 |
|
tef posted:I'm not sure what you need to do, but it sounds like you need : I still don't understand why they'd make a long an a4, can anyone explain the benefits, if any, of doing so? Khorne fucked around with this message at 03:31 on Apr 8, 2008 |
# ? Apr 8, 2008 02:40 |
|
TiMBuS posted:fastCGI, url->handler mapping and the ability to delegate tasks over multiple files/namespaces are strange and disturbing requirements? I looked up fastCGI, and all I found was something claiming to be faster than Netscape 1.1. Assuming this is the fastCGI you're thinking of, why not just use mod_perl? I have no idea if fastCGI is any good, but that site rings my BS meter up to 10. Also, imports with long names?
|
# ? Apr 8, 2008 13:43 |
|
genericadmin posted:I'm not following your description of the data. I think you are saying: 1.3 million records, each record has 160 fields, and you know the width of each field? You should be able to smoke through that. Are the fields all text or are some binary? It's a text file with about 1.3 million rows. Each row is like 1200 characters long. This row has 160 columns, each of different lengths. Given some magical source of the length of each column (like an array or a PDF of documentation), what's the fastest way of reading in the data? None of the fields are binary. For our current purposes, the data doesn't need to be stored, just accumulated for statistical purposes. So, there's no data structure at the end that has 1.3 million entries in it. Maybe just one or two scalars that calculate the density of a given column (is the data blank or is it meaningful? etc). The fastest method to date (from my informal testing) looks like this: code:
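The original code block didn't survive, but a sketch of the approach described (build one unpack template from the widths array, apply it per line, accumulate stats, store nothing) would look roughly like this; the widths and data here are invented stand-ins for the real 160-column layout:

```perl
use strict;
use warnings;

my @widths   = (10, 2, 6);                        # magical source of widths
my $template = join '', map { "a$_" } @widths;    # "a10a2a6"

# In-memory sample standing in for the 1.3-million-row flat file.
my $data = "Smith     NY100200\n"
         . "Jones" . (" " x 7) . "100150\n";      # second row: column 2 blank

my $blank = 0;
open my $fh, '<', \$data or die "open: $!";
while (my $line = <$fh>) {
    chomp $line;
    my @cols = unpack $template, $line;
    $blank++ if $cols[1] !~ /\S/;   # density stat: is the column meaningful?
}
close $fh;
# $blank is 1: only the second row's two-character column is empty.
```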
|
# ? Apr 8, 2008 14:44 |
|
Triple Tech posted:The fastest method to date (from my informal testing) looks like this: I'd really be surprised if this wasn't the fastest method: code:
|
# ? Apr 8, 2008 14:51 |
|
Triple Tech posted:Unless you think some non-while method is even faster. Like reading the file raw with sysreads and stuff? I would think that would give some improvements because of the line length. I'm a little fuzzy on the behind-the-scenes details, but I suspect Perl's traditional, user-friendly file reading mechanism might be inefficient when dealing with long lines. I wouldn't be surprised if there's a lot of string reallocation going on behind the scenes. Sartak posted:I'd really be surprised if this wasn't the fastest method: Wouldn't Perl still have to parse the 'a10a2a6...' string for every line of the file? I suspect that might be slower than the brute-force substr method which gets turned into bytecode once.
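If the records really are all the same length, one way to test the sysread idea is to skip line-based I/O entirely and pull fixed-size chunks with read(); a toy sketch (record length invented here; the real rows were ~1200 bytes):

```perl
use strict;
use warnings;

# In-memory sample: two fixed-width records of 6 data bytes + newline.
my $data   = "AAAABB\nCCCCDD\n";
my $reclen = 7;

open my $fh, '<', \$data or die "open: $!";
my @first_fields;
while (read($fh, my $rec, $reclen)) {   # read() returns 0 at EOF
    my @fields = unpack 'a4a2', $rec;   # newline falls off the template end
    push @first_fields, $fields[0];
}
close $fh;
# @first_fields is ('AAAA', 'CCCC')
```

Whether this beats the readline loop would need benchmarking; Perl's buffered readline is already quite good.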
|
# ? Apr 8, 2008 15:00 |
|
Sartak posted:I'd really be surprised if [unpack] wasn't the fastest method: PS, using Unix to count Brawl trophies, very classy.
|
# ? Apr 8, 2008 15:02 |
|
^^^ might look a bit hackish, but maybe ($col1, $col2, ..., $col160) performs faster? I would really like to see a simple benchmark of the various methods posted so far, including the regexp. I think the regexp wouldn't perform too badly, since it gets compiled once at compile time and should be reasonably fast after that. If you build it with fixed length matches of any character it won't have backtracking or complicated matching cases.
|
# ? Apr 8, 2008 15:03 |
|
Okay, here you go, the code expires in 24 hours. http://rafb.net/p/EvRzEL74.html code:
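For anyone reading after the paste expires, a stand-in harness along these lines (column widths invented, not the real 160-column layout) will compare the two approaches with the core Benchmark module:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @widths   = (10, 2, 6);
my $template = join '', map { "a$_" } @widths;   # "a10a2a6"
my $line     = 'x' x 18;

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    unpack => sub { my @cols = unpack $template, $line },
    substr => sub {
        my $f1 = substr($line, 0, 10);
        my $f2 = substr($line, 10, 2);
        my $f3 = substr($line, 12, 6);
    },
});
```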
Edit: A dummy data generator is included at the bottom of the script.
|
# ? Apr 8, 2008 16:26 |
|
I am fairly dumb and posted this in SH/SC. But I need a hand with this script that a former employee wrote, that's supposed to extract data from a csv file and write the results to another csv. I can run the script, but it's not outputting the file. I've looked over it with my limited understanding of programming and can't see anything that stands out as being a problem. Any help on this would be greatly appreciated. Here's a link to the code: http://rafb.net/p/77NHz999.html
|
# ? Apr 8, 2008 19:36 |
|
It's not dying at the open RETURNS? Have you tried printing out (or dieing) the value of $dest_file immediately after line 8? And then looking for files of that name? Are you taking into account working/current directories?
|
# ? Apr 8, 2008 19:52 |
|
permanoob posted:I can run the script, but it's not outputting the file. If nothing's late, it won't output a file. Also, the code uses '=' as a comparison operator instead of '=='. However, by coincidence because of how the code is structured, that bug doesn't break anything (unless months and days in the data are 0-based instead of 1-based). Edit: Also, a lot of that crazy date comparison logic can be simplified. code:
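The sort of simplification meant (this is my sketch of the general trick, not the script's actual code): instead of nested year/month/day comparisons with `=` bugs waiting to happen, normalize each date to a zero-padded YYYYMMDD string and compare once.

```perl
use strict;
use warnings;

# Zero-padded YYYYMMDD strings sort the same way the dates do,
# so one string comparison replaces the nested numeric logic.
sub date_key {
    my ($y, $m, $d) = @_;
    return sprintf '%04d%02d%02d', $y, $m, $d;
}

my $is_late = date_key(2008, 4, 9) gt date_key(2008, 4, 8);   # true
```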
Erasmus Darwin fucked around with this message at 20:06 on Apr 8, 2008 |
# ? Apr 8, 2008 19:58 |
|
Triple Tech posted:It's not dying at the open RETURNS? Have you tried printing out (or dieing) the value of $dest_file immediately after line 8? And then looking for files of that name? Are you taking into account working/current directories? It isn't dying and open RETURNS no. I just tried printing and dieing the $dest_file value. Print outputs the correct file name it should be writing to. Die is killing the script with quote:04_08_2008XXXXXXXXXX.csv at refund.pl line 9. And I'm not finding any files on my computer with that name. When I run the script, it gives this output quote:15746 iterations There are 15746 lines in the csv I'm trying to query and very likely would be 171 returns on what I'm looking for. It's just not making the return file.
|
# ? Apr 8, 2008 20:53 |
|
permanoob posted:It isn't dying and open RETURNS no. I just tried printing and dieing the $dest_file value. Print outputs the correct file name it should be writing to. Die is killing the script with It sounds like the open() at line 100 is simply not being reached. Is the print statement on line 102 ever outputting anything? Try putting a print statement after line 56 to see if you ever make it that far. The script looks pretty error-prone, so you may just want to break out the debugger.
|
# ? Apr 9, 2008 00:55 |
|
Zombywuf posted:I looked up fastCGI, and all I found was something claiming to be faster than Netscape 1.1. Assuming this is the fastCGI you're thinking of, why not just use mod_perl? I have no idea if fastCGI is any good, but that site rings my BS meter up to 10. It's not a product, it's a specification. FastCGI is the method of loading a static process and feeding each server request to it. The process returns the output which the server sends back through to the user. Most servers come with it as standard (including apache, lighttpd, abyss webserver, IIS). Heck, I think FCGI and CGI::Fast come with a standard perl distribution, too. As for mod_perl. mod_perl is apache specific, slower if you use registry handlers, usually not installed on most webhosts, and generally requires more screwing around if you want to take advantage of the static behaviour of your script. You need to throw everything in a BEGIN{} block if you use mod_perl, whereas with FastCGI all you have to do is load all your static stuff before you enter the FCGI loop. So to answer your question "why not just use mod_perl?" more directly: Because fcgi is better. Zombywuf posted:Also, imports with long names?
|
# ? Apr 9, 2008 08:21 |
|
permanoob posted:I am fairly dumb and posted this in SH/SC. But I need a hand with this script that a former employee wrote, that's supposed to extract data from a csv file and write the results to another csv. I can run the script, but it's not outputting the file. I've looked over it with my limited understanding of programming and can't see anything that stands out as being a problem. Any help on this would be greatly appreciated. Here's a link to the code: There's a lot of code:
code:
|
# ? Apr 9, 2008 08:38 |
|
TiMBuS posted:So to answer your question "why not just use mod_perl?" more directly: quote:CGI::Application::Plugin::AutoRunmode::FileDelegate The question mark was meant to imply "so what?" It fits on one line of a standard 80 column terminal.
|
# ? Apr 9, 2008 10:10 |
|
nrichprime posted:There's a lot of Except that it's structured as: code:
|
# ? Apr 9, 2008 13:38 |
|
Zombywuf posted:Fair enough, fastcgi.org could do with some serious work then. Zombywuf posted:The question mark was meant to imply "so what?" It fits on one line of a standard 80 column terminal. However the length itself does get on my nerves too. Have you seen the synopsis for it? code:
|
# ? Apr 9, 2008 14:27 |
|
There were two problems in this situation. Both have been worked out. First, the file format I was pulling from UPS was different than the previous employee had been pulling in the past. The field definitions weren't correct and it was failing on lines 26-31. The second problem was the "sanity check" the guy had written in 27-31. I have a friend that knows some Perl basics and he's been looking over this. What he did was took out 27-31 and replaced it with: code:
|
# ? Apr 9, 2008 18:52 |
|
TiMBuS posted:You mean because the 'official' CGI spec site is so much better? Or perhaps the SCGI page better tickles your fancy? It doesn't have an article about being faster than Netscape 1.1 though. quote:However the length itself does get on my nerves too. Have you seen the synopsis for it? Yeah that's pretty bad, url rewriting would be simpler.
|
# ? Apr 10, 2008 10:43 |
|
How would I rewrite this C++ function in a Perlish manner?code:
|
# ? Apr 10, 2008 17:45 |
|
6174 posted:How would I rewrite this C++ function in a Perlish manner? code:
|
# ? Apr 10, 2008 18:33 |
|
Erasmus Darwin posted:
Thanks. I was expecting something that didn't translate so directly.
|
# ? Apr 10, 2008 18:36 |
|
|
Zombywuf posted:It doesn't have an article about being faster than Netscape 1.1 though. Zombywuf posted:Yeah that's pretty bad, url rewriting would be simpler. Troll elsewhere, I came here for advice from people who knew what they were talking about (thanks, genericadmin). I didn't post here to get replies I could have just gotten from 7chan.
|
# ? Apr 11, 2008 06:09 |