Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Locked thread
raej
Sep 25, 2003

"Being drunk is the worst feeling of all. Except for all those other feelings."

Mithaldu posted:

It completely replaces open/close etc by wrapping the filehandle in an object that does all the sanity-checks and automatically closes the FH when the object is destroyed at the end of a function. Here's code of either streaming or slurping that should do what you need:

Perl code:
use strict;
use warnings;
use utf8;
use HTML::Entities 'decode_entities';
use IO::ALL -binary, -utf8;

binmode STDOUT, ":utf8";

my $in_file      = "input.csv";
my $out_file     = "output.csv";
my %replacements = ( '…' => '...' );

# stream_big_files();
slurp_small_files();

#read in output file and print to screen to confirm
print io( $out_file )->all;

sub clean_line {
    my ( $line ) = @_;

    decode_entities( $line );    # in place

    #Find and replace
    for my $k ( keys %replacements ) {
        $line =~ s/$k/$replacements{$k}/g;
    }

    return $line;
}

sub slurp_small_files {
    my @lines = io( $in_file )->all;
    $_ = clean_lines( $_ ) for @lines;
    io( $out_file )->print( @lines );
    return;
}

sub stream_big_files {
    my $in  = io( $in_file );
    my $out = io( $out_file );
    while ( my $line = $in->getline ) {
        last if !defined $line;
        $line = clean_line( $line );
        $out->print( $line );
    }
}

After installing the IO::All package, I'm not getting an "undefined subroutine" error on any calls to io() Is there something else I'm missing?

Adbot
ADBOT LOVES YOU

Anaconda Rifle
Mar 23, 2007

Yam Slacker
Am I the only one here who thinks raej should be using sed or awk?

raej
Sep 25, 2003

"Being drunk is the worst feeling of all. Except for all those other feelings."
Even trying to get a proof of concept unicode to utf8 conversion is proving difficult.

Perl code:
use HTML::Entities;
binmode STDIN, ":encoding(UTF-8)";

$input = "Mont Sal\u00E8ve Rye Porter \xE8";
$input =~ s/\u00/\x/g;
 print $input, "\n";
 print HTML::Entities::decode($input), "\n";
 print HTML::Entities::encode($input), "\n";
Which produces
code:
Mont Sal ve Rye Porter Φ
Mont Sal ve Rye Porter Φ
Mont Sal�ve Rye Porter è
the \u00E8 is what I'm seeing in my data file (Which is encoded at utf8 to begin with). I'm trying to get the \u00E8 to è.

If I change $input =~ s/\u00/\x/g; to $input =~ s/\u00E8/è/g;, the output from HTML::Entities::encode spits out the HTML &#egrave; but it seems silly to go through a find/replace->encode->decode

http://www.charbase.com/00e8-unicode-latin-small-letter-e-with-grave

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

raej posted:

After installing the IO::All package, I'm not getting an "undefined subroutine" error on any calls to io() Is there something else I'm missing?
I didn't noticed you'd typoed All as ALL. Also i got a typo of my own in there with clean_line/clean_lines.


Anaconda Rifle posted:

Am I the only one here who thinks raej should be using sed or awk?
You can do utf8/html entities with those?

raej
Sep 25, 2003

"Being drunk is the worst feeling of all. Except for all those other feelings."

Mithaldu posted:

I didn't noticed you'd typoed All as ALL. Also i got a typo of my own in there with clean_line/clean_lines.
You can do utf8/html entities with those?

Aha, ok, got it running now. But I'm still having the unicode notation conversion issue.

Even if I change the replacements to be my %replacements = ( '\\u00E8' => 'è' );

the output looks like
Mont Sal\ève Rye Porter

If I could get rid of the / I think I would be in business

Anaconda Rifle
Mar 23, 2007

Yam Slacker

Mithaldu posted:

You can do utf8/html entities with those?

You should be able to. You might have to fiddle with locale.

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

raej posted:

Aha, ok, got it running now. But I'm still having the unicode notation conversion issue.

Even if I change the replacements to be my %replacements = ( '\\u00E8' => 'è' );

the output looks like
Mont Sal\ève Rye Porter

If I could get rid of the / I think I would be in business
Upload a zip or make a github repo with the script and a tiny example where it doesn't work right.

Anaconda Rifle posted:

You should be able to. You might have to fiddle with locale.
I'm absolutely not a linux person, but i think html entities don't have much to do with locales. Am i missing someting?

Anaconda Rifle
Mar 23, 2007

Yam Slacker

Mithaldu posted:

I'm absolutely not a linux person, but i think html entities don't have much to do with locales. Am i missing someting?

I was just talking about UTF-8. Html entities are just text, right? "<" and stuff like that?

uG
Apr 23, 2003

by Ralp
You double slashed your Unicode character character hash key, so I imagine that extra slash is the first character of that key and why it's showing up before your 'è'. You have the key in single quotes so there is no need to escape it

raej
Sep 25, 2003

"Being drunk is the worst feeling of all. Except for all those other feelings."

Mithaldu posted:

Upload a zip or make a github repo with the script and a tiny example where it doesn't work right.

Here you go!
https://github.com/prognar/encodetoutf8

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

Sent you a pull request. You had two issues:

- don't put semicolons in lists
- you needed \Q\E around your $k, which instructs the regex to search for the string exactly as given instead of trying to interpolate it as a regex itself

Also, holy balls that is a hosed-up data set. :aaa:

raej
Sep 25, 2003

"Being drunk is the worst feeling of all. Except for all those other feelings."

Mithaldu posted:

Sent you a pull request. You had two issues:

- don't put semicolons in lists
- you needed \Q\E around your $k, which instructs the regex to search for the string exactly as given instead of trying to interpolate it as a regex itself

Also, holy balls that is a hosed-up data set. :aaa:

Woo, that's it!

I added in every Unicode translation and it parsed through them like a champ.

You weren't kidding about the source data, I had to put in a follow up replacements function to clean up all the Unicode that didn't have a heading \

Thank you so much!

Hughmoris
Apr 21, 2007
Let's go to the abyss!
I've written my first simple script in Python, and I want to try and write it in Perl to compare the two.

I'm getting stuck on understanding how calling CPAN modules and functions work. I'm on Windows 8, I installed Strawberry Perl, and I used its CPAN client to install CAM::PDF. Now, looking at the CAM::PDF page, there is a getpdftext.pl under the Documentation section that I want to use.

Is that getpdftext.pl part of the CAM::PDF module that I installed? How do I go about calling it in a very simple script? I think I understand how to use that script from a command line but I'm not sure how to incorporate it into another script.


My first goal is to write a very basic script that converts "testFile.pdf" to a text file, and I want to do it from within a script, not the command line.

Hughmoris fucked around with this message at 02:57 on Feb 22, 2015

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

Hughmoris posted:

Is that getpdftext.pl part of the CAM::PDF module that I installed? How do I go about calling it in a very simple script?
It's probably not installed. Just click source on its page, copy-paste the code into an appropiately named file, then in your script use backticks like so to call it:

code:
my $output =  `perl $scriptname $arguments`;
Additionally, you sound very confused, which is ok, but also like you didn't read *any* teaching materials on Perl, which is less ok. Please head to http://perl-tutorial.org and read some of the stuff linked there. (i recommend ovid's beginning perl, and modern perl)

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Mithaldu posted:

It's probably not installed. Just click source on its page, copy-paste the code into an appropiately named file, then in your script use backticks like so to call it:

code:
my $output =  `perl $scriptname $arguments`;
Additionally, you sound very confused, which is ok, but also like you didn't read *any* teaching materials on Perl, which is less ok. Please head to http://perl-tutorial.org and read some of the stuff linked there. (i recommend ovid's beginning perl, and modern perl)

Thanks for this.

You're right, I probably am not doing as much reading as I need to in order to understand the basics of Perl, but I have done some. I've been reading ovid's book. Either way, thanks again for posting a solution.

uG
Apr 23, 2003

by Ralp
code:
use Moops;
use Moo;
require MooX::Options;

our $VERSION = 0.01;

class Options {
    role VM {
        use MooX::Options;
        option 'install_vm' => (
            short       => 'vm',
            is          => 'rw',
            format      => 's@',
            default     => sub {['moar']},
            autosplit   => 1,
            required    => 1,
            doc         => 'VM backend(s) to install',
        );
    }


    role Compiler {
        use MooX::Options;
        option 'install_compiler' => (
            short       => 'c',
            is          => 'rw',
            format      => 's@',
            default     => sub {['rakudo']},
            autosplit   => 1,
            required    => 0,
            doc         => 'Perl6 compiler(s) to install',
        );
    }

    role ModuleInstaller {
        use MooX::Options;
        option 'install_panda' => (
            short       => 'panda',
            is          => 'ro',
            required    => 0,
            doc         => 'Install panda, a module installer for Rakudo Perl6',
        );
    }
} # class Options


# MAIN
class App::p6brew with Options::VM with Options::Compiler with Options::ModuleInstaller {
    use MooX::Options;
    option verbose => (
        is  => 'ro',
        isa => Bool,
    );
} # class App::p6brew



1;



__END__
I can call the above with `my $app = App::p6brew->new_with_options;` but I can't figure out how to apply all roles from class Options to App::p6brew (notice they are applied individually). I'm also trying to figure out how to structure it so I don't have to use MooX::Options inside each role. Any ideas?

Mithaldu
Sep 25, 2007

Let's cuddle. :3:
Don't have any ideas due to lack of experience with whatever that is you're doing, however i can recommend going onto irc.perl.org and asking in #web-simple.

Salt Fish
Sep 11, 2003

Cybernetic Crumb
Is there any easier/more concise/built in way to do this?

code:
sub split_entry{
        #take in a string, split it, then return the requested field (starting with field 0)
        #usage: split_entry(string, field separator, field number)
        #split_entry("Hello $ Goodbye",$,0) returns "Hello"
        @result = split($_[0],$_[1]);
        return @result[$_[2]];
        }

Basically in bash I would do cut -d$ -f2 or something, but afaik you can't have split just return a single field instead of the whole array.

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

Salt Fish posted:

Is there any easier/more concise/built in way to do this?

code:
sub split_entry{
        #take in a string, split it, then return the requested field (starting with field 0)
        #usage: split_entry(string, field separator, field number)
        #split_entry("Hello $ Goodbye",$,0) returns "Hello"
        @result = split($_[0],$_[1]);
        return @result[$_[2]];
        }

Basically in bash I would do cut -d$ -f2 or something, but afaik you can't have split just return a single field instead of the whole array.
A small note: Your example doesn't match what your code does. In any case, this is a little shorter:
code:
#take in a string, split it, then return the requested field (starting with field 0)
#usage: split_entry(string, field separator, field number)
#split_entry("Hello $ Goodbye",$,0) returns "Hello"
sub split_entry {
    my ( $string, $sep, $index ) = @_;
    return ( split $sep, $string )[$index];
}

print split_entry( 'Hello $ Goodbye', '\$', 0 );

Salt Fish
Sep 11, 2003

Cybernetic Crumb

Mithaldu posted:

A small note: Your example doesn't match what your code does. In any case, this is a little shorter:
code:
#take in a string, split it, then return the requested field (starting with field 0)
#usage: split_entry(string, field separator, field number)
#split_entry("Hello $ Goodbye",$,0) returns "Hello"
sub split_entry {
    my ( $string, $sep, $index ) = @_;
    return ( split $sep, $string )[$index];
}

print split_entry( 'Hello $ Goodbye', '\$', 0 );

Thats a great answer thank you. I'm curious now why I can't just use:

code:
my $string = "data 1 # comment 1";
print (split '#', $string)[0];
But I'm working my way through Learning Perl so I reckon I'll get that part soon. Thanks!

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

Salt Fish posted:

Thats a great answer thank you. I'm curious now why I can't just use:

code:
my $string = "data 1 # comment 1";
print (split '#', $string)[0];
But I'm working my way through Learning Perl so I reckon I'll get that part soon. Thanks!

Because print steals the parens. Try this: print( (split '#', $string)[0] );

Salt Fish
Sep 11, 2003

Cybernetic Crumb

Mithaldu posted:

Because print steals the parens. Try this: print( (split '#', $string)[0] );

That's awesome, thank you again. Here is the complete draw a big rear end table code which is simplier for sure:

code:
use CGI qw/:standard/;
        open(FILE, "file.txt") or die("died");
        print table({-border=>1},
                caption('A test table'),
                Tr({-align=>'CENTER'},
                [
                        th(["Data", "Comment"]),
                        map { td( (split '#', $_)[0] ) . td ((split '#', $_ )[1] ) } <FILE>,
                        ]
                        )
                );
        close(FILE);
Let me tell you, this looks a lot better than when I wrote it in php to compare. Perl is good imo.

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

Salt Fish posted:

That's awesome, thank you again. Here is the complete draw a big rear end table code which is simplier for sure:

[...]

Let me tell you, this looks a lot better than when I wrote it in php to compare. Perl is good imo.

It is quite good, but let me also tell you that Perl looks a lot nicer when you run code through Perl::Tidy. ;)

code:
use CGI qw/:standard/;
open( FILE, "file.txt" ) or die( "died" );
print table(
    { -border => 1 },
    caption( 'A test table' ),
    Tr(
        { -align => 'CENTER' },
        [
            th( [ "Data", "Comment" ] ),    #
            map { td( ( split '#', $_ )[0] ) . td( ( split '#', $_ )[1] ) } <FILE>,
        ]
    )
);
close( FILE );
Also, i only just now noticed you mentioned Learning Perl, and boy do i have bad news for you. You picked a book that is so old that i'd confidently describe it as poo poo. The things i see you use, bareword filehandles, CGI, no strict/warnings are not very good things to learn. Please instead try:

http://onyxneon.com/books/modern_perl/

and:

http://www.wrox.com/WileyCDA/WroxTitle/Beginning-Perl.productCd-1118013840.html

or

http://web.archive.org/web/20120709053246/http://ofps.oreilly.com/titles/9781118013847/index.html

EVGA Longoria
Dec 25, 2005

Let's go exploring!

Mithaldu posted:

It is quite good, but let me also tell you that Perl looks a lot nicer when you run code through Perl::Tidy. ;)

code:
use CGI qw/:standard/;
open( FILE, "file.txt" ) or die( "died" );
print table(
    { -border => 1 },
    caption( 'A test table' ),
    Tr(
        { -align => 'CENTER' },
        [
            th( [ "Data", "Comment" ] ),    #
            map { td( ( split '#', $_ )[0] ) . td( ( split '#', $_ )[1] ) } <FILE>,
        ]
    )
);
close( FILE );
Also, i only just now noticed you mentioned Learning Perl, and boy do i have bad news for you. You picked a book that is so old that i'd confidently describe it as poo poo. The things i see you use, bareword filehandles, CGI, no strict/warnings are not very good things to learn. Please instead try:

http://onyxneon.com/books/modern_perl/

and:

http://www.wrox.com/WileyCDA/WroxTitle/Beginning-Perl.productCd-1118013840.html

or

http://web.archive.org/web/20120709053246/http://ofps.oreilly.com/titles/9781118013847/index.html

There's still tons of CGI out there so there's no harm in learning it. Nothing you pick up in CGI will be harmful to good practice, but it's useful to understand it for when you get tasked with maintaining CGI.

That said, 100% use Modern Perl. It's the best source for learning perl anytime this decade.

Toshimo
Aug 23, 2012

He's outta line...

But he's right!
Ok, so I wrote some code and I'm like 300% sleep deprived and haven't eaten anything all day and it's 11pm and BRAAIINNNSSS. Can someone just look this over and make sure it's doing what I think it's doing? The output looks pretty good (considering it's a 260,000 long wall of text), but I don't even know at this point.

I've got a bunch of files that I merged together into a bigass file that is a list of IP address ranges in the format:
###.###.###.### - ###.###.###.###

However, the program I'm feeding it to takes great offense at overlapping ranges, so I needed to remove all duplication.

I lovingly crafted this:
code:
#!/usr/bin/perl

use Exporter;
use Net::IPAddress;
use strict;
use warnings;

my $path_to_file = 'data.txt';
open(my $handle, '<', $path_to_file);
chomp(my @lines = <$handle>);
close $handle;

my @ipranges;

foreach my $line (@lines)
{
        $line =~ /([0-9\.]+) - ([0-9\.]+)/;
        my @temp_ranges = (ip2num("$1"), ip2num("$2"));
        push @ipranges, \@temp_ranges;
}

my $duplicates = 0;

do{
        my @temp_ranges;
        my $test_lower = $ipranges[0][0];
        my $test_upper = $ipranges[0][1];
        for my $i ( 1 .. $#ipranges)
        {
                if($ipranges[$i][0]<$test_upper)
                {
                        $duplicates = 1;
                        if($ipranges[$i][1]>$test_upper)
                        {
                                $test_upper = $ipranges[$i][1];
                        }
                }
                else
                {
                        my @range_arr = ($test_lower, $test_upper);
                        push @temp_ranges, \@range_arr;
                        $test_lower = $ipranges[$i][0];
                        $test_upper = $ipranges[$i][1];
                }
        }
        @ipranges = @temp_ranges;
}while (!$duplicates);

for my $range (0 .. $#ipranges)
{
        print num2ip($ipranges[$range][0]) . " - " . num2ip($ipranges[$range][1]) . " , 000 ,  \n";
}
The program I'm feeding it to hasn't complained and I went from 562,000 lines of input to 212,000 lines of output so, hopefully, maybe...

het
Nov 14, 2002

A dark black past
is my most valued
possession

Toshimo posted:

Ok, so I wrote some code and I'm like 300% sleep deprived and haven't eaten anything all day and it's 11pm and BRAAIINNNSSS. Can someone just look this over and make sure it's doing what I think it's doing? The output looks pretty good (considering it's a 260,000 long wall of text), but I don't even know at this point.

I've got a bunch of files that I merged together into a bigass file that is a list of IP address ranges in the format:
###.###.###.### - ###.###.###.###

However, the program I'm feeding it to takes great offense at overlapping ranges, so I needed to remove all duplication.

I lovingly crafted this:


The program I'm feeding it to hasn't complained and I went from 562,000 lines of input to 212,000 lines of output so, hopefully, maybe...
I'm pretty sure this isn't doing what you intend it to do, I tested with this as data.txt:

quote:

0.0.0.0 - 1.0.0.0
0.5.0.0 - 0.6.0.0
0.5.0.0 - 1.1.1.1
1.1.0.0 - 2.0.0.0
126.0.0.0 - 127.0.0.0
and got this output:

quote:

0.0.0.0 - 2.0.0.0 , 000 ,
(edit: the input I used in this example was sorted already and still produced incorrect output but I found when using unsorted input it gave even more incorrect output, thus the comment below)


I would suggest sorting the ipranges first, it simplifies the algorithm:
code:
#!/usr/bin/perl

use Exporter;
use Net::IPAddress;
use strict;
use warnings;

my $path_to_file = 'data.txt';
open(my $handle, '<', $path_to_file);
chomp(my @lines = <$handle>);
close $handle;

my @ipranges;

foreach my $line (@lines)
{
        $line =~ /([0-9\.]+) - ([0-9\.]+)/;
        my @temp_ranges = (ip2num("$1"), ip2num("$2"));
        push @ipranges, \@temp_ranges;
}

@ipranges = sort { $a->[0] <=> $b->[0] } @ipranges;

my ($lower, $upper) = (0, 0);
my @result = ();
for my $i (0 .. $#ipranges) {
    if ($ipranges[$i][0] < $upper) {
        if ($ipranges[$i][1] > $upper) {
            $upper = $ipranges[$i][1];
        }
    }
    else {
        push(@result, [ $lower, $upper ]) unless ($lower == 0 and $upper == 0);
        ($lower, $upper) = @{$ipranges[$i]};

    }
}
push(@result, [ $lower, $upper ]);

@ipranges = @result;

for my $range (0 .. $#ipranges)
{
print num2ip($ipranges[$range][0]) . " - " . num2ip($ipranges[$range][1]) . " , 000 ,  \n";

het fucked around with this message at 07:38 on Mar 22, 2015

Roseo
Jun 1, 2000
Forum Veteran
Net::CIDR or Net::CIDR::Lite may be more appropriate for this use case than the IP address class.

Toshimo
Aug 23, 2012

He's outta line...

But he's right!

het posted:

I'm pretty sure this isn't doing what you intend it to do, I tested with this as data.txt:

and got this output:

(edit: the input I used in this example was sorted already and still produced incorrect output but I found when using unsorted input it gave even more incorrect output, thus the comment below)


I would suggest sorting the ipranges first, it simplifies the algorithm:

Thanks. I'll give this a shot. The data I'm working with is pre-sorted, but it can't hurt to sort anyway.

Roseo posted:

Net::CIDR or Net::CIDR::Lite may be more appropriate for this use case than the IP address class.

Yeah. I tried this with Net::CIDR and it's been going for 3 hours and hasn't showed any inclination on finishing yet. :ohdear:

code:
#!/usr/bin/perl

use Exporter;
use Net::CIDR;
use strict;
use warnings;

my $path_to_file = 'data.txt';
open(my $handle, '<', $path_to_file);
chomp(my @lines = <$handle>);
close $handle;

my @cidr_list;

foreach my $line (@lines)
{
        $line =~ s/ //;
        @cidr_list=Net::CIDR::cidradd($line, @cidr_list);
}

my @a=Net::CIDR::cidr2range(@cidr_list);

print join("\n", @a);

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Been reading up Modern Perl and trying to convert some of my simple Python scripts to Perl.

For my first: I have a folder full of .vgr files. For each file in the directory, I want to iterate line by line to see if it contains the string "LOAD_ORDER=OIS=208972". If it does, I want to stop, print the name of that file then move on to the next file.

Perl code:
use warnings;
use strict;
use diagnostics;

my @vgr_files = glob("*.vgr");
foreach my $file (@vgr_files)
{
	open my $fh, '<', $file
		or die "can't open it";
	
	foreach (<$fh>)
		{
		 	if ($_ =~ /LOAD_ORDER=OIS=208972/)
		 	{
		 		print "$file\n";
		 		last;
		 	}

		}
}
Can I make this code any cleaner/quicker? Also, I ran into a problem where I couldn't glob the .vgr files in the directory using a full file path. I'm thinking it has something to do with spaces in the file path (glob("c:\folder\new folder\*.vgr")). I couldn't get it working so I just moved my script to that folder and used glob("*.vgr")

uG
Apr 23, 2003

by Ralp
code:
use Modern::Perl;
use Path::Class;
use experimental qw/smartmatch/;
use FindBin;

my @exts = (qr/vgr$/);
dir("$FindBin::Bin/data_directory")->recurse(callback => sub {
    my $file = (-f $_[0] && $_[0] ~~ @exts) ? $_[0] : return;
    open my $fh, '<', $file; 
    while(<$fh>) { say $file and last if m/LOAD_ORDER=OIS=208972/ }
    close $fh;
});
This is how i'd do it

uG fucked around with this message at 04:02 on Mar 26, 2015

Hughmoris
Apr 21, 2007
Let's go to the abyss!

uG posted:

code:
use Modern::Perl;
use Path::Class;
use experimental qw/smartmatch/;
use FindBin;

my @exts = (qr/vgr$/);
dir("$FindBin::Bin/data_directory")->recurse(callback => sub {
    my $file = (-f $_[0] && $_[0] ~~ @exts) ? $_[0] : return;
    open my $fh, '<', $file; 
    while(<$fh>) { say $file and last if m/LOAD_ORDER=OIS=208972/ }
    close $fh;
});
This is how i'd do it

That hurts my brain. Can you break this down?
Perl code:
dir("$FindBin::Bin/data_directory")->recurse(callback => sub {
    my $file = (-f $_[0] && $_[0] ~~ @exts) ? $_[0] : return;

Hughmoris fucked around with this message at 04:23 on Mar 26, 2015

uG
Apr 23, 2003

by Ralp
code:
# An array containing precompiled regexes that we will use to match against the file name.
# In this case its a single regex, but you could put as many as you want into the array
my @exts = (qr/vgr$/);

# `dir` is a Path::Class object, and in this case it the directory given as an argument ($FindBin::Bin just points to the /path to the perl script) and recursively 
# runs the supplied callback each time it finds a file or folder
dir("$FindBin::Bin/data_directory")->recurse(callback => sub {
    
    # The first parameter supplied to the callback is the name of the folder or file. Since we are not interested in folders, we will test if its a file with `-f $file`
    # We also apply the filter on file extension here with the smart match operator, '~~', which basically says "does $_ match anything in @exts?" and tries to match appropriately 
    # @exts contains a regex, its doing something like `for (@exts) { return $file if $_ =~ m/vgr$/; }`
    my $file = (-f $_[0] && $_[0] ~~ @exts) ? $_[0] : return;

Hughmoris
Apr 21, 2007
Let's go to the abyss!

uG posted:

code:
# An array containing precompiled regexes that we will use to match against the file name.
# In this case its a single regex, but you could put as many as you want into the array
my @exts = (qr/vgr$/);

# `dir` is a Path::Class object, and in this case it the directory given as an argument ($FindBin::Bin just points to the /path to the perl script) and recursively 
# runs the supplied callback each time it finds a file or folder
dir("$FindBin::Bin/data_directory")->recurse(callback => sub {
    
    # The first parameter supplied to the callback is the name of the folder or file. Since we are not interested in folders, we will test if its a file with `-f $file`
    # We also apply the filter on file extension here with the smart match operator, '~~', which basically says "does $_ match anything in @exts?" and tries to match appropriately 
    # @exts contains a regex, its doing something like `for (@exts) { return $file if $_ =~ m/vgr$/; }`
    my $file = (-f $_[0] && $_[0] ~~ @exts) ? $_[0] : return;

Thanks for taking the time to write that up.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Trying to learn simple databases and Perl, together. I installed SQLite and am using the DBD::SQLite module. Any recommend readings to ease my way into using Perl with databases?

For my first database, I was going to try a project someone here in CoC recommended. Create a table full of randomized patient names, social security #s, home address and insurance carrier.

uG
Apr 23, 2003

by Ralp
I'd read the DBIx::Class documents and use that, but lots of people don't like ORMs

code:
perl -MDateTime -MModern::Perl -e 'use DBIx::Class::Schema::Loader qw/make_schema_at/; make_schema_at("New::Schema::Name", { debug => 0 }, [q/dbi:sqlite:ncaa_cfb/,"user","pass", {} ]); my $table = New::Schema::Name->resultset("Schedule"); my $query = $table->search({ date => {"<", DateTime->new( month => 9, day => 11, year => 2001 )->ymd } }); say "all columns in table:" . join(",", @{$row->result_source->columns}); while(my $row = $query->next) { say "date: " . $row->date;  }'
Learning how to setup the schemas can have a bit of a learning curve, but you can have them generated dynamically like the above command in the mean time.

My general boilerplate looks something like
code:
use Modern::Perl;
use Config::General;
use FindBin;
use DB::Schema;

# Database Connections
my $conf       = Config::General->new("$FindBin::Bin/../DB/db.cnf");
my %config    = $conf->getall;
my $DB        = DB::Schema->connect( "dbi:mysql:" . $config{mysql}{db} . ':host=' . $config{mysql}{host}, $config{mysql}{user}, $config{mysql}{pass}, { mysql_enable_utf8 => 1} );
my $teams_rs  = $DB->resultset('DimTeamNcaaId');
my $games_rs  = $DB->resultset('Schedule');
Then later on you can start handling your relationships/joins in a perl-y fashion, so you don't have to have separate schemas variables if they have a relationship setup in their schema
code:
$teams_rs->search({ name => 'Michigan State' })->games; 
Its a steep learning curve, but DBIx::Class is one of Perl's best tools. In fact its the only thing that keeps me from using Perl6 full time :getin:

uG fucked around with this message at 04:29 on Mar 28, 2015

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Thanks, this will give me some light reading to do over the weekend.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Played with perl and DBI a little more. I successfully created my first database and figured out how to loop over a file containing names, and enter those names into my database. However, it is slow as dirt.

I have a 1,220-line file, called combined_names.txt where each line has a first name and last name:
code:
alexander hamilton
harry dresden
sarah smith
My database has 3 columns: Id, fname, lname

Here's my ugly code that creates the database, then iterates over my combined_names list and inserts the names into the db. Takes about 30 seconds to insert 1,220 rows... Any tips?
Perl code:
#!perl
use warnings;
use strict;
use diagnostics;
use DBI;

my $dbfile = "sample.db";

my $dsn = "dbi:SQLite:dbname=$dbfile";
my $user = "";
my $password = "";
my $dbh = DBI->connect($dsn, $user, $password, {
	PrintError => 0,
	RaiseError => 1,
	AutoCommit => 1,
	FetchHashKeyName => 'NAME_1c',
	});

my $sql = <<'END_SQL';
CREATE TABLE patients (
	id		INTEGER PRIMARY KEY,
	fname 	VARCHAR(100),
	lname	VARCHAR(100)
	)
END_SQL

$dbh->do($sql);

my @split_names = ();

open my $fh, '<', 'C:/Strawberry/Projects/Names_project/combined_names.txt'
	or die "Can't open it";

foreach (<$fh>)
{
	@split_names = split / /, $_, 2;
	$dbh->do('INSERT INTO patients (fname, lname) VALUES (?,?)', undef, $split_names[0], $split_names[1]);
}

$dbh->disconnect;

uG
Apr 23, 2003

by Ralp
Look into turning autocommit off if transactional integrity doesn't have to be considered. Also prepare() the statement outside the loop

uG fucked around with this message at 07:39 on Mar 28, 2015

toadee
Aug 16, 2003

North American Turtle Boy Love Association

uG posted:

Look into turning autocommit off if transactional integrity doesn't have to be considered. Also prepare() the statement outside the loop

To expand on this, outside the loop:

code:
my $sth = $dbh->prepare('INSERT INTO patients (fname, lname) VALUES (?,?)');
Then inside the loop:

code:
$sth->execute( @bind_vars );
That should run quite a bit faster.

Adbot
ADBOT LOVES YOU

Hughmoris
Apr 21, 2007
Let's go to the abyss!

uG posted:

Look into turning autocommit off if transactional integrity doesn't have to be considered. Also prepare() the statement outside the loop


toadee posted:

To expand on this, outside the loop:

code:
my $sth = $dbh->prepare('INSERT INTO patients (fname, lname) VALUES (?,?)');
Then inside the loop:

code:
$sth->execute( @bind_vars );
That should run quite a bit faster.

Wow. Turning off autocommit and then preparing my statement outside the loop dropped my create & write time from ~ 32 seconds to being less than 1 second. Thanks.

  • Locked thread