  • Locked thread
Cheesus
Oct 17, 2002

Let us retract the foreskin of ignorance and apply the wirebrush of enlightenment.
Yam Slacker

German Joey posted:

the more complicated a parsing method can handle, the slower it'll be.
That's the theory I've quietly believed since he brought it up. He's given me some examples to try using Yapp and I'll be shocked if they're faster than the regexes I'm using in my test case.

quote:

the main question i have for your coworker then is: why do you need something more powerful than a regex for parsing a loving log file? and why do you need something more efficient? how big is this file, in the gigabytes?
Potentially the amount of data we'll be processing will be in the gigabytes, most likely per day.

Strict 9
Jun 20, 2001

by Y Kant Ozma Post
Regex question: How the hell do I remove weird ASCII characters? Like upside-down question marks. I've tried a copy and paste, and that doesn't work. I've tried "\xBF", which is the hex code, and that doesn't work either. It just won't match the drat things.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Have you tried making an inverted character class? "Substitute all NOT A-Z..."

EvilJay
Jul 25, 2005

Triple Tech posted:

Have you tried making an inverted character class? "Substitute all NOT A-Z..."

I had to do this like a year ago. It was A LOT easier to do exactly this; only allow what you need.

Zombywuf
Mar 29, 2008

Strict 9 posted:

Regex question: How the hell do I remove weird ASCII characters? Like upside-down question marks. I've tried a copy and paste, and that doesn't work. I've tried "\xBF", which is the hex code, and that doesn't work either. It just won't match the drat things.

Upside down question marks are not ASCII. Why do you need to remove them? Usually when I've seen code that does this kind of thing, it's doing something wrong.

npe
Oct 15, 2004
The "upside down question mark" is probably your editor/terminal/whatever program's way of indicating that there is an unsupported character of some sort there (wide unicode character in a terminal expecting 7bit ASCII or latin-1). You should examine the data in hex and see what it really is, it may be iso-8859 of some form, or unicode.

mister_gosh
May 24, 2002

How can I NOT store the path for the files I put in a zip archive?

In other words, when you open the zip file, each file has a path of abc/def (see $myDir below). I don't want a user extracting the zip and having it generate the directories abc/def. I just want it to extract to a directory equaling the filename of the zip.

For what it's worth, here's the code-

code:
use Archive::Zip qw( :ERROR_CODES :CONSTANTS );
              
my $myDir = "C:/abc/def";
          
my $zip = Archive::Zip->new();
my $zipped;
my @fileList;
find sub { push @fileList, $File::Find::name if basename($File::Find::name) =~ /\.xml/ }, $myDir;

foreach $file (@fileList) {
   $zipped = $zip->addFile( $file );
}
if($zip->writeToFileNamed( "C:/sa.zip" ) != AZ_OK) {
   print "you loving bitch\n";
}
Thanks in advance!

Zombywuf
Mar 29, 2008

yaoi prophet posted:

The "upside down question mark" is probably your editor/terminal/whatever program's way of indicating that there is an unsupported character of some sort there (wide unicode character in a terminal expecting 7bit ASCII or latin-1). You should examine the data in hex and see what it really is, it may be iso-8859 of some form, or unicode.

Unicode is not a character encoding.

npe
Oct 15, 2004
Yeah, I meant UTF-8.
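
The two suggestions above (look at the data in hex, then strip with an inverted character class) can be sketched roughly like this. The sample string is made up and assumes the data is UTF-8 bytes; the real input may be encoded differently, so treat it as an illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample: "caf", e-acute, space, inverted question mark, "que?",
# written out as raw UTF-8 bytes.
my $bytes = "caf\xC3\xA9 \xC2\xBFque?";

# Dump every byte as two hex digits to see what is really in the data.
my $hex = join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
print "$hex\n";

# Inverted character class: delete anything that is not printable ASCII,
# a tab, or a newline.
(my $clean = $bytes) =~ s/[^\x20-\x7E\t\n]//g;
print "$clean\n";
```

Whitelisting what you want to keep sidesteps the question of which encoding the unwanted characters are in, at the cost of also dropping legitimate accented letters.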

tef
May 30, 2004

-> some l-system crap ->

mister_gosh posted:

How can I NOT store the path for the files I put in a zip archive?

Your code doesn't run for me, but there is a second parameter to addFile which can specify the name of the file in the zip.

I.e. do $zip->addFile( $file,$file_name_without_directories )

http://search.cpan.org/dist/Archive-Zip/lib/Archive/Zip.pm#Zip_Archive_Member_Operations

fansipans
Nov 20, 2005

Internets. Serious Business.

mister_gosh posted:

How can I NOT store the path for the files I put in a zip archive?

In other words, when you open the zip file, each file has a path of abc/def (see $myDir below). I don't want a user extracting the zip and having it generate the directories abc/def. I just want it to extract to a directory equaling the filename of the zip.

For what it's worth, here's the code-

code:
use Archive::Zip qw( :ERROR_CODES :CONSTANTS );
              
my $myDir = "C:/abc/def";
          
my $zip = Archive::Zip->new();
my $zipped;
my @fileList;
find sub { push @fileList, $File::Find::name if basename($File::Find::name) =~ /\.xml/ }, $myDir;

foreach $file (@fileList) {
   $zipped = $zip->addFile( $file );
}
if($zip->writeToFileNamed( "C:/sa.zip" ) != AZ_OK) {
   print "you loving bitch\n";
}
Thanks in advance!

From the Archive::Zip documentation, it appears that addFile will also accept a new name for the file once it's added. Keep in mind "filename" for zips includes their path, so you should be able to do:

code:
  $zip->addFile("C:/punk/junk/funk.txt", "funk.txt");
or with File::Basename:

code:
  $zip->addFile($file,basename($file));
Which is funny, because I already had those two URLs open in my browser :psyduck:
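
One thing to watch when flattening paths with basename: two files with the same name in different subdirectories end up with the same name in the archive. Here is a minimal sketch of building the (path, member name) pairs first, using made-up paths in place of the @fileList from find(). The Archive::Zip call is left as a comment so the snippet runs without the module installed, and skipping duplicates is just one possible policy.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical paths standing in for the @fileList built by find().
my @fileList = (
    'C:/abc/def/one.xml',
    'C:/abc/def/sub/two.xml',
    'C:/abc/def/other/one.xml',
);

my %seen;
my @members;
for my $file (@fileList) {
    my $name = basename($file);    # name to store in the archive, path stripped
    if ($seen{$name}++) {
        warn "skipping $file: $name is already in the archive\n";
        next;
    }
    push @members, [$file, $name];
    # With Archive::Zip loaded, this pair would be passed as:
    #   $zip->addFile($file, $name);
}
print "$_->[0] -> $_->[1]\n" for @members;
```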

mister_gosh
May 24, 2002

fansipans posted:

Which is funny, because I already had those two URLs open in my browser :psyduck:

Haha, thanks guys!

I saw the new name parameter but dismissed it immediately because I wanted the name to be the same in the zip file (duh!)...I hadn't considered that the filename I was passing in was the full name, it makes sense now.

LightI3ulb
Oct 28, 2006

Standard pleasure model.
I'm really confused as to why my script keeps giving me the errors

DBD::mysql::st fetchrow_hashref failed: fetch() without execute() at ./tmcseed.pl line 24.
DBD::mysql::st fetchrow_hashref failed: fetch() without execute() at ./tmcseed.pl line 38.

on my code
code:
#!/usr/bin/perl -w
use strict;

use DBI;
my $dbh = DBI->connect( 'DBI:mysql:database=CONSUMERS;host=localhost;
port=3306;', '<user>', '<password>', undef);
my $sth0;
my $sth1;
my $sth2;

for(my $i=0;$i<1;$i++){
   $sth0 = $dbh->prepare(qq[select EMAIL, HOME_PHONE from RAW_CONSUMERS
 where date(creation_date)=date_sub(curdate(), INTERVAL '$i' DAY)
 and CONTRACT_ID='46']);
   $sth0->execute or die $dbh->errstr;
   my $num_rows = $sth0->rows;
   $sth1 = $dbh->prepare(qq[SELECT PHONE FROM PHONE_DETAILS
 where date(SCRUB_DATE)=date_sub(curdate(), INTERVAL '$i' DAY) 
and (NATIONAL_DNC='1' or STATE_DNC='1' or WIRELESS_IN_BLOCKED_STATE='1')
and VALID='1']);
   $sth1->execute or die $dbh->errstr;
   $sth2 = $dbh->prepare(qq[select EMAIL from RAW_CONSUMERS where
 CONTRACT_ID <> 46 and date(CREATION_DATE)=
date_sub(curdate(), INTERVAL '$i' DAY)]);
   $sth2->execute or die $dbh->errstr;
   my $dnccount = 0;
   my $dupcount = 0;
   while(my $data = $sth0->fetchrow_hashref()){

      while (my $dncdata = $sth1->fetchrow_hashref()){
         my $phone = "";
         my $home_phone = "";
         if (defined $dncdata->{'PHONE'}){
            $phone = $dncdata->{'PHONE'};
         }
         if (defined $data->{'HOME_PHONE'}){
            $home_phone = $data->{'HOME_PHONE'};
         }
         if ($phone eq $home_phone){
            $dnccount++;
         }
      }

      while (my $dupdata = $sth2->fetchrow_hashref()){
         my $email = "";
         my $email2 = "";
         if (defined $dupdata->{'EMAIL'}){
            $email = $dupdata->{'EMAIL'};
         }
         if (defined $data->{'EMAIL'}){
            $email2 = $data->{'EMAIL'};
         }
         if ($email eq $email2){
            $dupcount++;
         }
      }
   }
   print $num_rows." ".$dnccount." ".$dupcount;
}
The lines in question are the nested while loops where it's trying to fetch from $sth1 and $sth2. The output shows no errors from the executes, so I'm really not sure where this is going wrong.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Your code makes my mind hurt. Have you just tried writing my $sth = $dbh->do($sql) or die; instead? I'm not intimate with DBI but that's how I've always done it and I haven't had any problems.

LightI3ulb
Oct 28, 2006

Standard pleasure model.
What's headache-inducing?

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
There's a lot of unnecessary punctuation, not enough whitespace, and the SQL is hard to read.

magimix
Dec 31, 2003

MY FAT WAIFU!!! :love:
She's fetish efficient :3:

Nap Ghost
I might be flying off in totally the wrong direction, but can MySQL actually handle multiple active statements on a single database handle?

I've always found I couldn't execute a second statement on a DB handle without first fetching or discarding the result-set of the previously executed statement.

LightI3ulb
Oct 28, 2006

Standard pleasure model.

magimix posted:

I might be flying off in totally the wrong direction, but can MySQL actually handle multiple active statements on a single database handle?

I've always found I couldn't execute a second statement on a DB handle without first fetching or discarding the result-set of the previously executed statement.

Yeah, I've done it a ton of times.

Mario Incandenza
Aug 24, 2000

Tell me, small fry, have you ever heard of the golden Triumph Forks?

LightI3ulb posted:

I'm really confused as to why my script keeps giving me the errors
Replace your C-style for-loop with until ($counter++ == 1) or something like that. Move my $sth0 and the rest of the $sth handles into the loop's scope. Replace undef in your DBI->connect line with {RaiseError => 1}. Inspect all the return values from ->execute calls, keeping in mind DBI may return '0E0' which is true in a boolean context but zero when evaluated numerically. A result with zero rows shouldn't cause DBI to complain about a missing execute statement, but whatever.

Also, use placeholders: my $rows = $dbh->prepare("INSERT INTO foo (bar) VALUES (?)")->execute("this replaces the question mark");

LightI3ulb
Oct 28, 2006

Standard pleasure model.
See, that's the strangest part. I checked the return values from the executes, and it gave me the number of rows it returned just like it was actually working, but still giving me execute errors.
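
One likely explanation, offered as an assumption rather than a confirmed diagnosis: the inner loops drain $sth1 and $sth2 completely during the first pass of the outer loop, so on the second outer row they get fetched from again without a fresh execute. A way around that is to read the two inner result sets once into lookup hashes. The sketch below uses made-up rows in place of the three statement handles (with DBI they could come from something like fetchall_arrayref({})), and it counts each consumer at most once, which is slightly different from the pairwise counting in the original.

```perl
use strict;
use warnings;

# Hypothetical rows standing in for the results of $sth0, $sth1 and $sth2.
my @consumers = (
    { EMAIL => 'a@example.com', HOME_PHONE => '5550001' },
    { EMAIL => 'b@example.com', HOME_PHONE => '5550002' },
    { EMAIL => undef,           HOME_PHONE => '5550003' },
);
my @dnc_rows = ( { PHONE => '5550002' }, { PHONE => '5550003' } );
my @dup_rows = ( { EMAIL => 'a@example.com' } );

# Build each lookup once instead of re-reading a statement handle per outer row.
my %dnc_phone = map { ($_->{PHONE} => 1) } grep { defined $_->{PHONE} } @dnc_rows;
my %dup_email = map { ($_->{EMAIL} => 1) } grep { defined $_->{EMAIL} } @dup_rows;

my ($dnccount, $dupcount) = (0, 0);
for my $row (@consumers) {
    $dnccount++ if defined $row->{HOME_PHONE} && $dnc_phone{ $row->{HOME_PHONE} };
    $dupcount++ if defined $row->{EMAIL}      && $dup_email{ $row->{EMAIL} };
}
print scalar(@consumers), " $dnccount $dupcount\n";
```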

syphon^2
Sep 22, 2004
Can anyone share a really simple method of timing my code? I want to see how long certain code takes (in milliseconds), but can't find a clear, easy way to do this. I've tinkered around with time(), localtime(), Time::HiRes, and Benchmark.

I'm basically looking for something that'll do this...
code:
my $start_time = startclock();
# ... do a bunch of stuff
my $end_time = stopclock();
my $duration_in_ms = duration($start_time, $end_time);
print "Your code took ". $duration_in_ms . "ms to run!\n";
Nothing seems to do this as easily as I'd hope. Benchmark came close, but I have no idea how to interpret the results it's giving me.

dagard
Mar 31, 2005
I've honestly almost always just used Time::HiRes for things like that. Something like:
code:
use Time::HiRes qw(tv_interval gettimeofday);

my $start_time = [gettimeofday];
#
# things go here
#
my $end_time = [gettimeofday];
print "It took " . tv_interval($start_time, $end_time) . "\n";
will generally give me a decent amount of accuracy for how long (things) took.

Cheesus
Oct 17, 2002

Let us retract the foreskin of ignorance and apply the wirebrush of enlightenment.
Yam Slacker
Yes, Time::HiRes is a great way to time certain parts of code. I've been throwing results in a hash so I can time different items like:
code:
$t0 = [Time::HiRes::gettimeofday];
...
if( !defined( $TimedEvals{key} ) ){
   $TimedEvals{key} = 0;
}
$TimedEvals{key} += Time::HiRes::tv_interval( $t0 );
I've been using it all week to time code for my "regular expressions are slow you should use syntax trees instead" task. Which has had predictable results.

Stabby McDamage
Dec 11, 2005

Doctor Rope

magimix posted:

I might be flying off in totally the wrong direction, but can MySQL actually handle multiple active statements on a single database handle?

I've always found I couldn't execute a second statement on a DB handle without first fetching or discarding the result-set of the previously executed statement.

Yes and no. MySQL itself doesn't seem to support this, but library trickery makes it seem like it works.

By default, the entire result set is streamed to the DBI library at execute time. This is "mysql_store_result" mode. Because the whole result set is cached at the client before any fetches occur, MySQL never sees more than one query going on at a time.

This isn't a good solution when you want to iterate very large data sets that don't fit in RAM, as I had to do recently. For that, you need to use "mysql_use_result", which transfers records as you fetch them. The problem with this is that you can't do other queries inside of that loop, since the outer query is still in progress.

I couldn't find an elegant solution to this, so I just read my outer query out to disk first, then iterated from disk for my small inner queries, which worked fine in mysql_store_result mode.

Basically, if you can't do this for lack of memory:

code:
my $sth_giant_query = $dbh->prepare("SELECT * FROM bigtable") or die();
my $sth_inner_query = $dbh->prepare("UPDATE othertable SET thing=? WHERE id=?") or die();
$sth_giant_query->execute();
while ($row = $sth_giant_query->fetchrow_hashref()) {
  $sth_inner_query->execute(function($row),$row->{id});
}
Then you might try to change that first line so as to not cache the giant query results:

code:
my $sth_giant_query = $dbh->prepare("SELECT * FROM bigtable",{ "mysql_use_result" => 1}) or die();
But this will fail on the $sth_inner_query execute because multiple queries can't run at the same time on the MySQL side, so you might end up having to do something like this:

code:
my $sth_giant_query = $dbh->prepare("SELECT * FROM bigtable",{ "mysql_use_result" => 1}) or die();
my $sth_inner_query = $dbh->prepare("UPDATE othertable SET thing=? WHERE id=?") or die();
$sth_giant_query->execute();
while ($row = $sth_giant_query->fetchrow_hashref()) {
  write $row to disk
}

while ($row = read from disk) {
  $sth_inner_query->execute(function($row),$row->{id});
}
This kind of crap is what made me give up MySQL for any kind of data analysis. MySQL will do for powering a dopey blog or something, but nothing serious.
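
For the "write to disk, then read back" step above, here is a minimal sketch of one way to spool rows, assuming one JSON document per line in a temp file. The rows are made up and stand in for fetchrow_hashref results; the inner query is only mentioned in a comment so the snippet runs without a database.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use JSON::PP;

# Hypothetical rows standing in for $sth_giant_query->fetchrow_hashref results.
my @rows = ( { id => 1, thing => 'a' }, { id => 2, thing => 'b' } );

my $json = JSON::PP->new->canonical;
my ($fh, $path) = tempfile(UNLINK => 1);

# Pass 1: spool every outer row to disk, one JSON document per line.
print {$fh} $json->encode($_), "\n" for @rows;
close $fh or die "close $path: $!";

# Pass 2: the outer query is finished by now, so inner queries could run here.
open my $in, '<', $path or die "open $path: $!";
my @seen_ids;
while (my $line = <$in>) {
    chomp $line;
    my $row = $json->decode($line);
    push @seen_ids, $row->{id};    # real code: $sth_inner_query->execute(...)
}
close $in;
print "replayed ids: @seen_ids\n";
```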

LightI3ulb
Oct 28, 2006

Standard pleasure model.
Is the result set stored in the dbh or the sth? If it's dbh, then I understand why I had that problem. Regardless, I realised I was able to do that entire comparison with mysql joins. Thanks for the help.

syphon^2
Sep 22, 2004

dagard posted:

I've honestly almost always just used Time::HiRes for things like that.
I've got a stupid question...

From http://search.cpan.org/~jhi/Time-HiRes-1.9715/HiRes.pm

quote:

tv_interval

tv_interval ( $ref_to_gettimeofday [, $ref_to_later_gettimeofday] )

Returns the floating seconds between the two times, which should have been returned by gettimeofday(). If the second argument is omitted, then the current time is used.
Uhhh, what are "floating seconds"? Ironically, a google search just refers you back to that Time::HiRes doc, which doesn't explain it. I'm able to tell that querying this server took my app "0.315718" floating seconds... but I have no idea how long that is in relevant time.

dagard
Mar 31, 2005

syphon^2 posted:

I've got a stupid question...

From http://search.cpan.org/~jhi/Time-HiRes-1.9715/HiRes.pm
Uhhh, what are "floating seconds"? Ironically, a google search just refers you back to that Time::HiRes doc, which doesn't explain it. I'm able to tell that querying this server took my app "0.315718" floating seconds... but I have no idea how long that is in relevant time.

That just means it's .315718 seconds. The stock perl sleep and time functions are only accurate to the nearest second, so they're integers, whereas the functions in Time::HiRes use floats, since they can sleep < 1 second (or measure more accurately than 1 second).
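
Since the original question asked for milliseconds, here is a minimal sketch of the conversion: tv_interval returns fractional seconds, so multiplying by 1000 gives ms. The sleep call is only a stand-in for whatever is being timed.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval sleep);

my $start_time = [gettimeofday];
sleep(0.05);                               # stand-in for the code being timed
my $seconds = tv_interval($start_time);    # fractional seconds since $start_time
my $duration_in_ms = $seconds * 1000;
printf "Your code took %.1f ms to run!\n", $duration_in_ms;
```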

syphon^2
Sep 22, 2004
I've got another simple question (boy, this really shows how incompetent I am at reading documentation!)

I'm using Win32::TieRegistry to gather information from the registry of a series of remote servers. My script has 6 methods, each of which needs to connect to the registry and grab some data. After implementing Time::HiRes, I can see that this is causing quite a delay, as each of these methods grabs several keys, and the methods are called multiple times for various servers.

Since all the keys I'm interested in are stored under HKLM/Software/CompanyName, I'd like to connect once, get a dump of all sub keys, and then let Perl iterate through the data structure rather than connect multiple times to get the data it needs.

Any ideas? It seems like the whole module is built around connecting to the registry each time for individual keys.

Cheesus
Oct 17, 2002

Let us retract the foreskin of ignorance and apply the wirebrush of enlightenment.
Yam Slacker
What's the best way to interface with a C++ library from Perl?

And once I'm there, what's the best way to return a C++ map as a hash?

I've been reading up on swig and while it seems pretty easy to call functions, I'm having a difficult time understanding how to return complex, dynamic structures.

Subotai
Jan 24, 2004

Cheesus posted:

What's the best way to interface with a C++ library from Perl?

And once I'm there, what's the best way to return a C++ map as a hash?

I've been reading up on swig and while it seems pretty easy to call functions, I'm having a difficult time understanding how to return complex, dynamic structures.

The most efficient way would be to use XS. It's not that easy to use though. A lot of Perl modules use it to interface with C++.

German Joey
Dec 18, 2004

Cheesus posted:

What's the best way to interface with a C++ library from Perl?

And once I'm there, what's the best way to return a C++ map as a hash?

I've been reading up on swig and while it seems pretty easy to call functions, I'm having a difficult time understanding how to return complex, dynamic structures.

depends what you're doing. Win32::API, Inline::C (and Inline::CPP), and XS all offer different ways to interface with C++ with varying degrees of ease. if you don't want to do any post-processing, XS is probably your best bet of returning your data back to perl DIRECTLY as a hash, but if i were you i'd take a look at Win32::API and Inline first. you can also use XS with Inline if you like what you see.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
So I want to call a method from a super class that uses data defined in each sub class. Sort of like...

code:
# object method

sub parse {
  my $self = shift;

  _munge($self::parameter);
}
Is this the way to go? Is my point clear enough? I don't even know if that example works or if it's even best practice. Obviously, the other way would be to just define shallow methods...

code:
sub parse {
  my $self = shift;

  _munge($self->getParameter);
}

heeen
May 14, 2005

CAT NEVER STOPS

Triple Tech posted:

So I want to call a method from a super class that uses data defined in each sub class. Sort of like...

code:
# object method

sub parse {
  my $self = shift;

  _munge($self::parameter);
}
Is this the way to go? Is my point clear enough? I don't even know if that example works or if it's even best practice. Obviously, the other way would be to just define shallow methods...

code:
sub parse {
  my $self = shift;

  _munge($self->getParameter);
}

What type is your object? blessed hashref? inside-out object?

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Reg'lur old blessed hash.

Filburt Shellbach
Nov 6, 2007

Apni tackat say tujay aaj mitta juu gaa!
It's quite common to use regular methods for this. I use this pattern all the time. For example:

code:
package My::Widget;
use Carp qw(croak);

# Subclasses must override this to supply their input type.
sub type { croak "You must define a type in " . (ref $_[0] || $_[0]) }

sub render {
    my $self = shift;
    my $type = $self->type;
    print qq{<input type="$type" />};
}

package My::Widget::Text;
use parent -norequire, 'My::Widget';   # -norequire: the parent lives in this same file

sub type { 'text' }

package My::Widget::Submit;
use parent -norequire, 'My::Widget';

sub type { 'submit' }


My::Widget::Text->render;
My::Widget::Submit->render;

syphon^2
Sep 22, 2004
Does anyone know of a way to take a dump (heh heh) of a portion of the Win32 registry, and store it in a hash to be parsed through later?

A script I'm working on gathers data from a bunch of remote servers, some of them across the world. The data I need all resides within a key in HKLM/Software/KEYNAME, and isn't terribly large. I'd like to gather all these keys in one go, and use Perl to iterate through them.

I'm currently using Win32::TieRegistry to do this.
code:
#!perl
use Win32::TieRegistry;
$Registry->Delimiter("/");	
my $server = 'REMOTESERVERNAME';
my $value1 = $Registry->{"//$server/LMachine/SOFTWARE/KEYNAME/SUBKEY"};
my $value2 = $Registry->{"//$server/LMachine/SOFTWARE/KEYNAME/SUBKEY2"};
my $value3 = $Registry->{"//$server/LMachine/SOFTWARE/KEYNAME/SUBKEY3"};
# ... and so on
That's a really simplified example, but I need to gather about 20 registry values from various places inside this key. This needs to happen for several servers.

As I mentioned, some of the servers are (physically) very far away, so the latency is very high. Win32::TieRegistry seems to re-connect to the server each time it needs to gather a registry value. In some of the extreme cases, it's taking my script anywhere from 2-10 seconds to gather each registry key, for a total of about 30 seconds per server. This is SLOOOOOW.

Do you guys know of anything that will connect to the remote server, take a dump of the Key I specify (and all subkeys), then disconnect, maybe giving me a hash of the whole Key? I tried using Win32::TieRegistry, but it only seems to support tied-hashes (meaning, even though I try to store all the data at once, Win32::TieRegistry remotely queries each key as it's needed).

Any ideas?

Ninja Rope
Oct 22, 2005

Wee.
You should be able to enumerate all of the keys for a given registry entry just like a hash, something like:
code:
my %copy;
foreach my $key ( keys( %{ $Registry->{"//$server/LMachine/SOFTWARE/KEYNAME"} } ) ) {
 $copy{$key} = $Registry->{"//$server/LMachine/SOFTWARE/KEYNAME"}->{$key};
}
It's just like copying any other hash key by key. You might even be able to say:
code:
my %copy = %{ $Registry->{"//$server/LMachine/SOFTWARE/KEYNAME"} };
That's what I gather from reading the documents, anyway.
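
A key-by-key copy like this is shallow: if the values under a key are themselves references (subkeys), the copy may still point back into the original tied structure. Here is a rough sketch of a recursive copy into plain hashes. It runs on an ordinary nested hash with made-up key names; whether it behaves the same on a real Win32::TieRegistry object is an assumption I haven't tested.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Recursively copy a nested hash so the result holds no references
# back into the original (possibly tied) structure.
sub deep_copy {
    my ($node) = @_;
    return $node unless (reftype($node) // '') eq 'HASH';
    my %plain;
    $plain{$_} = deep_copy($node->{$_}) for keys %$node;
    return \%plain;
}

# Hypothetical data standing in for one registry key and its subkeys.
my $key = {
    '/Version' => '1.2',
    'SUBKEY/'  => { '/Path' => 'C:/app', 'Nested/' => { '/Flag' => '1' } },
};

my $copy = deep_copy($key);
print "$copy->{'SUBKEY/'}{'Nested/'}{'/Flag'}\n";
```

The walk still has to read every value once, so it won't make a slow remote connection faster. It only moves all the reads into a single pass.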

syphon^2
Sep 22, 2004
I'll try that tomorrow... but like I said, I suspect Win32::TieRegistry only works with tied-hashes. Thus, when it goes through the loop, it makes and breaks the connection for every iteration. That's what happened when I tried it earlier (although I did it slightly differently).

EDIT: It's as I suspected. Using either snippet of code from above, creating the hash is reasonably quick (around the order of 600-800ms on my test scenario). Executing 'print Dumper(%copy);', however, took approximately 51938ms.

EDIT2: Well, I'm starting to think the bottleneck isn't Win32::TieRegistry, but really just the act of querying a remote registry. On a whim, I changed to calling reg.exe with a backtick (reg.exe is WinXP's native command line registry tool)
code:
# Backslashes are doubled because backticks interpolate like a double-quoted string.
my $reg = `reg query \\\\$server\\HKLM\\SOFTWARE\\KEYNAME /s`;
and this process STILL took about 30s to complete.

syphon^2 fucked around with this message at 23:27 on Jun 30, 2008

Erasmus Darwin
Mar 6, 2001
Rather than relying on the registry query code to handle communicating with each remote server, what about installing a local script on each one that bundles up all the relevant registry keys and spits it out as a single blob of data to your main script?

syphon^2
Sep 22, 2004

Erasmus Darwin posted:

Rather than relying on the registry query code to handle communicating with each remote server, what about installing a local script on each one that bundles up all the relevant registry keys and spits it out as a single blob of data to your main script?
The script is responsible for querying servers out of a pool of ~300. New ones are added and old ones are deprecated on a nearly daily basis.
