The Perl Short Questions Megathread: executable line noise

Rohaq: Aug 11, 2006

qr is a quote-like operator that compiles a regex object, much like q and qq create single- and double-quoted strings in my previous example. You can't build that object with bare slashes, though: writing $re = /G{6,}/ doesn't store a pattern. It matches against $_ right away and stores the result.

So don't try that. If you want to keep a regex in a variable, either use qr//, or store the pattern in a single-quoted string and interpolate the variable into your match, like so:

Perl code:

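use strict;
use warnings;

my $counter1 = 0;

my $regexg = 'G{6,}';
my $regexc = 'C{6,}';

# MYFILE is assumed to have been opened earlier; read it one line at a time
while (my $line = <MYFILE>) {
    chomp($line);
    if ( $line =~ /$regexg/i && $line =~ /$regexc/i ) {
        $counter1++;
    }
}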

(Also, I chomp each line as it comes in. That's just good practice unless you need the newline for something.)
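For comparison, here is a minimal sketch of the same count using qr// objects instead of plain strings. qr// compiles the pattern once and lets a flag such as /i travel with the stored pattern. The sample lines are made up for illustration and stand in for the contents of MYFILE.

```perl
use strict;
use warnings;

# qr// stores a compiled pattern; the /i flag is part of the object.
my $regexg = qr/G{6,}/i;
my $regexc = qr/C{6,}/i;

# Hypothetical input lines standing in for MYFILE.
my @lines = ("AGGGGGGTCCCCCCA\n", "GGGGGGAAAA\n", "ccccccgggggg\n");

my $counter1 = 0;
for my $line (@lines) {
    chomp $line;
    # A qr// object can be used directly as the right-hand side of =~.
    $counter1++ if $line =~ $regexg && $line =~ $regexc;
}
print "$counter1\n";
```

Either style works; qr// has the advantage that the flags are stored with the pattern instead of being repeated at every match site.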

But Mithaldu is right, maybe you should try reading up on Perl first, before jumping right in?

Rohaq fucked around with this message at 23:33 on Nov 13, 2013

# ? Nov 13, 2013 23:28

Pollyanna: Mar 5, 2005; Milk's on them.

e: q != e

# ? Nov 13, 2013 23:42

Pollyanna: Mar 5, 2005; Milk's on them.

Mithaldu posted:

Wow. I'd strongly suggest you read that book before doing anything else. Wherever you've been learning Perl, it's been teaching you in the worst possible way.

Rohaq posted:

But Mithaldu is right, maybe you should try reading up on Perl first, before jumping right in?

It's...complicated. We've been tasked to rewrite our Python scripts that we've been working on for the past two months in Perl within 4 days of first being exposed to Perl. This is an online terminal Masters intro course.

:shepicide:

I'm gonna go curl up with Modern Perl and question my life choices.

# ? Nov 13, 2013 23:43

Rohaq: Aug 11, 2006

That's odd; is there any reason it has to be in Perl? Python has a perfectly capable regex module called 're'. It works differently from Perl's regexes: it's a separate module you import, not something built into the syntax the way Perl's =~ binding operator is.

Unless you're using some kind of custom Perl module for your work, of course, then that would make sense, but I thought that Python was pretty much the most popular language in the bioinformatics world.

# ? Nov 14, 2013 00:30

Mithaldu: Sep 25, 2007; Let's cuddle.

Pollyanna posted:

It's...complicated. We've been tasked to rewrite our Python scripts that we've been working on for the past two months in Perl within 4 days of first being exposed to Perl. This is an online terminal Masters intro course.

I'm gonna go curl up with Modern Perl and question my life choices.

In that case you will want to read this too: http://web.archive.org/web/20120709053246/http://ofps.oreilly.com/titles/9781118013847/index.html

Or maybe even order the full book.

Perl is rather easy to learn. However pretty much any resource for it written before 2011 teaches practically Perl 4 and is 20 years out of date. Once you have modern references you should be able to get up to speed pretty quickly.

quote:

I thought that Python was pretty much the most popular language in the bioinformatics world.

That depends on who you work with. The Genome Campus in Cambridge mainly uses Perl, and so do several French institutes.

# ? Nov 14, 2013 01:00

Pollyanna: Mar 5, 2005; Milk's on them.

Perl is common in the bioinformatics community because that's been the main language since like 2001. Most of the relevant modules and libraries are built in Perl. People stick to it because it's well-established and was used in things like the Human Genome Project.

(ps is that not a loving badass title for project cmon just say it out loud and try not to swoon)

But yeah, I prefer Python for programming in general. Regular expressions can be super useful in bioinformatics, but there's other tools we need as well and Python is just as good at those, possibly better than Perl.

# ? Nov 14, 2013 04:49

Rohaq: Aug 11, 2006

I use both languages; they've both got their strong points. I generally prefer Perl for anything ETL related, because the built-in regex binding operator (=~) makes quickly setting up regexes for data extraction a piece of piss. I prefer Python for writing small tools quickly, and for anything related to handling and visualising data, because numpy and pylab/matplotlib are quite frankly loving badass.

# ? Nov 14, 2013 05:23

EVGA Longoria: Dec 25, 2005; Let's go exploring!

Does anyone have any experience with NTLM auth for web apps, especially Dancer?

I can build an LDAP bind easily enough, but it would be best if I could use the NTLM automated negotiation on top of this.

Background is that I've got an internal web app that keeps expanding, and as it gets rolled out to new teams we end up either creating a dozen more accounts, or we end up creating one for an entire team, which I hate. Going to move to LDAP, but I'd also like to hook in NTLM, since it works with just about everything now.

# ? Dec 5, 2013 14:39

Pollyanna: Mar 5, 2005; Milk's on them.

Say I have a string "AGCT". If I try to run:

code:

$output =~ tr/A/T/;
$output =~ tr/T/A/;
$output =~ tr/G/C/;
$output =~ tr/C/G/;

The output will be "AGGA", not "AGCT". I think what's happening is that the transliterations happen successively instead of concurrently. How do I set off four of them at the same time?

EDIT: Whoops, the answer is "$output =~ tr/AGCT/TCGA/;". Wish that was a bit clearer.
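A minimal sketch of why the single tr/// works: it maps every character in one pass, so A->T and T->A can't undo each other the way four successive tr/// statements do. The revcomp helper is a name made up for this example; it also reverses the string, as the CGI script later in the thread does.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string (hypothetical helper for this example).
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;          # scalar context: reverses the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;   # complement all bases in a single pass
    return $rc;
}

# Complement only: copy the string, then transliterate the copy.
my $s = "AGCT";
(my $complement = $s) =~ tr/AGCT/TCGA/;
print "$complement\n";              # prints TCGA

my $rc = revcomp("GATTACA");
print "$rc\n";
```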

# ? Dec 13, 2013 22:48

Blotto Skorzany: Nov 7, 2008; He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

Successive statements are normally executed successively in an imperative language.

# ? Dec 13, 2013 22:58

Pollyanna: Mar 5, 2005; Milk's on them.

Otto Skorzeny posted:

Successive statements are normally executed successively in an imperative language.

Yeah, I totally brainfarted on that. I figured it out not long after, to be fair!

# ? Dec 14, 2013 00:50

Pollyanna: Mar 5, 2005; Milk's on them.

Sorry about the double post, but I've got another issue. I'm trying to use CGI.pm to make a webpage, and part of my code is not working correctly:

Perl code:

#!/usr/bin/perl

use CGI;
use CGI ('div');

my $cgi = CGI->new;

print $cgi->header("text/html");

print $cgi->start_html(-title=>"Reverse Complement", -BGCOLOR=>"gray");

my ($input,$output);

$input = $cgi->param("input");
$output = $cgi->param("output");

if ($input) {  # if an input exists/is present upon loading a page
	$output = reverse($input);
	$output =~ tr/AGCTUagctu/TCGAAtcgaa/;  # generate an output

	print "Input: ";
	print $cgi->br,"\n";
	print $cgi->div({-style=>"width: 800px; word-wrap: break-word;"}, $input);
	print $cgi->br,"\n";
	print "Output: ";
	print $cgi->br,"\n";
	print $cgi->div({-style=>"width: 800px; word-wrap: break-word;"}, $output);
	print $cgi->br,"\n";
}

print $cgi->hr,"\n";
print $cgi->start_form,"\n";

print $cgi->strong("Input your sequence.");
print $cgi->br, "\n";
print $cgi->strong("Non-ACGTU characters will not be translated, but will be reversed.");
print $cgi->br, "\n";
print $cgi->textarea({-style=>"height: 200px; width: 300px;"}, -name=>"input");
print $cgi->br, "\n";
print $cgi->submit(-name=>"submit_sequence", -label=>"Submit");

print $cgi->end_form;

print $cgi->end_html;  #end of program

The issue in question is that submitting the data in the textarea will make the program not print the "Input:" "Output:" condition. However, if I use textfield instead, it works perfectly well, with or without a submit button.

I suspect that this is because there is no data being passed after submitting the textarea form. Why? Doesn't it work exactly the same as textfield?

# ? Dec 16, 2013 03:38

Mithaldu: Sep 25, 2007; Let's cuddle.

Pollyanna posted:

Sorry about the double post, but I've got another issue.

Oh god, did they give you that godawful BioPerl programmer book to learn from?

For the full effect, I strongly recommend you start by reading Modern Perl (an evening), and then Ovid's Beginning Perl (a week). Links to both free and paid versions are on http://perl-tutorial.org

As for your code, I recommend reading the CGI docs again if you really wish to continue using it (bad idea). It seems to me that you got the syntax of the parameters passed for style wrong, since other calls in the docs format it like this:

code:

-style=>{'src'=>'/styles/style1.css'},

# ? Dec 16, 2013 19:26

Pollyanna: Mar 5, 2005; Milk's on them.

Mithaldu posted:

Oh god, did they give you that godawful BioPerl programmer book to learn from?

Nope. No book at all. We've been learning off of our professor's lecture notes with maybe a link to some CPAN docs, which has been...lacking, to say the least. I didn't even know that Bioperl had a book associated with it :gonk:

(Any good books/tutorials for doing bioinfo in Perl would help, tho!)

I didn't really wanna bring this up, but this is an assignment. CGI is required.

I've been working through Modern Perl, though at a snail's pace. Beginning Perl is a new one, I'll check it out.

# ? Dec 16, 2013 21:12

syphon: Jan 1, 2001

CGI isn't bad at all if you use it with a template system like Template::Toolkit or HTML::Template. I suppose that's outside the scope of your assignment though, so it probably doesn't help you at all.

# ? Dec 16, 2013 21:49

toadee: Aug 16, 2003; North American Turtle Boy Love Association

HTML::Template with CGI really can do quite a bit and makes things so much neater and easier, thanks to keeping the HTML and perl code separate.

For the current issue though, I would try printing what is returned from $cgi->param with no argument to see what the textarea's stuff is getting passed in as.

Perhaps like:

Perl code:

use Data::Dumper;

....

my @paramslist = $cgi->param;
print Dumper \@paramslist;

# ? Dec 16, 2013 22:19

Mithaldu: Sep 25, 2007; Let's cuddle.

Pollyanna posted:

Nope. No book at all. We've been learning off of our professor's lecture notes with maybe a link to some CPAN docs, which has been...lacking, to say the least. I didn't even know that Bioperl had a book associated with it (Any good books/tutorials for doing bioinfo in Perl would help, tho!)

I didn't really wanna bring this up, but this is an assignment. CGI is required

I've been working through Modern Perl, though at a snail's pace. Beginning Perl is a new one, I'll check it out.

I feel for you.

Right now the main thing I can tell you about getting help and learning is: get on irc.perl.org and sit in #perl-help (Perl help) and #pdl (Perl scientists). Use a shell or an IRC bouncer so you don't annoy the living piss out of people by dropping out every five minutes when your cellphone loses signal; then proceed to ask way too many questions and get really good.

That said, if you can skip the whole rat's nest of actually generating the HTML with CGI, you'll spare yourself a ton of grief. (HTML::Template isn't entirely painful, but if you can, I'd recommend the current cream of the crop, Text::Xslate.) If you are required to generate it with CGI, at least try not to print as you go: separate the workflow so that you first figure out all the data you want on the page, then generate the HTML into a single scalar, then print that.
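A rough sketch of that separation in plain Perl, written without CGI.pm so it runs anywhere. html_escape is a hypothetical helper made up for this example, and in the real script $input would come from $cgi->param("input") rather than being hard-coded.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that matter in HTML text.
# & must be replaced first so the later entities aren't double-escaped.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Step 1: work out all the data first.
my $input  = "AGCTTTA";
my $output = reverse $input;
$output =~ tr/AGCTUagctu/TCGAAtcgaa/;

# Step 2: build the whole page body into one scalar.
my $html = '';
$html .= "<p>Input:</p>\n";
$html .= '<div>' . html_escape($input) . "</div>\n";
$html .= "<p>Output:</p>\n";
$html .= '<div>' . html_escape($output) . "</div>\n";

# Step 3: print once, at the very end.
print "Content-Type: text/html\n\n", $html;
```

Keeping the printing to one place also makes it easy to swap the string-building step for a template engine later.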

Did my guess about the style parameter help?

# ? Dec 16, 2013 22:54

Pollyanna: Mar 5, 2005; Milk's on them.

Mithaldu posted:

As for your code, i recommend reading the CGI docs again if you really wish to continue using it (bad idea). It seems to me that you got the syntax of the parameters passed for style wrong, since other calls in the docs format it like this:
code:
-style=>{'src'=>'/styles/style1.css'},

That's for referencing a .css file, I think. The {-style=>"height: 200px; width: 300px;"} format is for inline CSS (yeah, I know).

toadee posted:

For the current issue though, I would try printing what is returned from $cgi->param with no argument to see what the textarea's stuff is getting passed in as.

With a textfield, I get $VAR1 = [ 'input', 'submit_sequence' ];, and it works, so let's look for that.

With a textarea...

code:

$VAR1 = [ 'submit_sequence' ];

Okay, so the input parameter is not getting passed for some reason. Is it because of the way I wrote it? If I try print $cgi->textarea({-style=>"height: 200px; width: 300px;", -name=>"input"});...

code:

$VAR1 = [ 'input', 'submit_sequence' ];

There we go! Looks like Perl just shits its pants over the braces. It works now, thankfully. So, yes, the style parameter was the problem. (The way I originally wrote it is how it was written in the lecture notes, soooo...not my fault~)

As for the other approaches to generating HTML, I have some experience with HTML templating, so I'll definitely look into Template and Xslate. Thanks, guys!

# ? Dec 17, 2013 01:12

toadee: Aug 16, 2003; North American Turtle Boy Love Association

When in doubt, print it out

A huge portion of my debugging issues are solved when I just print out every variable involved and then go look for why something is getting something it isn't supposed to.

# ? Dec 17, 2013 01:21

Ninja Rope: Oct 22, 2005; Wee.

Pollyanna posted:

Okay, so the input parameter is not getting passed for some reason. Is it because of the way I wrote it? If I try print $cgi->textarea({-style=>"height: 200px; width: 300px;", -name=>"input"});...
code:
$VAR1 = [ 'input', 'submit_sequence' ]; 
There we go! Looks like Perl just shits its pants over the braces.

The (curly) braces make a big difference. They create a new anonymous hash containing the keys/values inside the braces, and a reference to that hash is what gets passed to textarea(). CGI.pm does accept a single hash reference of named parameters, but when the first argument is a hash reference it reads the parameters from that hash and ignores the rest, so the -name outside the braces was silently dropped. Without a name, the browser never submits the textarea's contents. CGI is a little confusing about which methods want hash references and which don't, so it helps to check the perldoc often.

# ? Dec 17, 2013 05:18

EVGA Longoria: Dec 25, 2005; Let's go exploring!

Pollyanna posted:

Okay, so the input parameter is not getting passed for some reason. Is it because of the way I wrote it? If I try print $cgi->textarea({-style=>"height: 200px; width: 300px;", -name=>"input"});...

There we go! Looks like Perl just shits its pants over the braces.

Just to expand on this a bit, since it's one of the biggest things with parameters in Perl:

Whenever you call a function or a method, everything you pass to it gets passed as one big list. So, for example:

$cgi->textarea(-style => "height: 200px; width: 300px;", -name => "input");

This is the same as:

my @list = (-style => "height: 200px; width: 300px;", -name => "input");
$cgi->textarea(@list);

The => is called the fat comma, and it makes this the same as:

my @list = ("-style", "height: 200px; width: 300px;", "-name", "input");
$cgi->textarea(@list);

A hash can be built from any list with an even number of elements: the 1st, 3rd, 5th... entries become keys and the 2nd, 4th, 6th... become values. Since the arguments arrive as a list, the function can load them straight into a hash. Generally, the code for a function expecting this will look something like:

code:

sub myFunction {
  my %args = (@_);
}
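A small runnable sketch of that pattern. describe_textarea is a made-up function for this example, not part of CGI.pm. It shows that the fat-comma call and the plain-comma call pass exactly the same list, and that putting defaults before @_ lets the caller override them.

```perl
use strict;
use warnings;

# Hypothetical function: treats its flat argument list as key/value pairs.
sub describe_textarea {
    my %args = (-width => 300, @_);   # later pairs override the default
    return join ', ', map { $_ . '=' . $args{$_} } sort keys %args;
}

# Both calls pass the same four-element list.
my $r1 = describe_textarea(-name => 'input', -width => 400);
my $r2 = describe_textarea('-name', 'input', '-width', 400);
print "$r1\n$r2\n";
```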

When you add those {}s around, you're creating an anonymous hash in memory (called anonymous because it has no name) and then passing a reference to it (called a reference because you're passing a link back to the original anonymous hash, and not a copy). Hash references are scalars (single items), so you get the whole thing in one chunk:

code:

sub myFunction {
  my $args = shift;
}

This loads the reference into the $args variable, and you then reach the hash through it, for example $args->{-name}.

The big difference here is that the first example copies the arguments into a brand new hash. If you changed the values inside myFunction, they would only stay changed within myFunction; once you return, the changes are lost. Passing a reference means every change you make goes to the original hash: if myFunction changed the -name field, the caller's hash would be changed too.
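To make the difference concrete, here is a small sketch with two hypothetical subs (change_copy and change_ref, names invented for this example). One copies the flat list into a new hash; the other receives a reference and modifies the caller's hash.

```perl
use strict;
use warnings;

# Receives a flat list and copies it into a fresh hash.
sub change_copy {
    my %args = @_;
    $args{-name} = 'changed';    # only the local copy changes
    return;
}

# Receives a single hash reference.
sub change_ref {
    my $args = shift;
    $args->{-name} = 'changed';  # dereference with ->; caller's hash changes
    return;
}

my %params = (-name => 'input', -style => 'width: 300px;');

change_copy(%params);
my $after_copy = $params{-name};
print "after copy: $after_copy\n";

change_ref(\%params);
my $after_ref = $params{-name};
print "after ref:  $after_ref\n";
```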

# ? Dec 23, 2013 13:02

JayTee: May 24, 2004

I've downloaded a bioinformatics script written in French that I'm trying to make work having never used Perl before this week (but knowing a bit of Python) and not knowing French. I suspect the problem I'm having is due to the authors not including everything that it requires as it's not finding some file, but I'm not sure. Here's the edited relevant chunk of code, it fails on the final line:

Perl code:

my $paff = # a directory on my computer
my $paff_hm = # a directory on my computer
my $fichier_modifie = $paff_HM."/modified_input_trad"; 
my $firstline = 1;
open (NOUVEAU_FICHIER, ">".$fichier_modifie);
while (my $ligneP = <FICH_PROT> ) {
	# A bunch of stuff gets written to the file
}

close NOUVEAU_FICHIER;
close FICH_PROT;

system("chmod 777 $fichier_modifie");
my $profilHMM;
open(PROFILS, "ls ".$paff."/HMMSearch_web/profilsHMM-Tases/transp*.hmm |") or die "Fatal Error when trying to access HMM profils directory." ;


#Je lance la commande hmmpfam (lit séquence par séquence dans le seqfile et cherche similitude avec un motif donné) :
# I run the command hmmpfam (bed frame by frame in seqfile and looks similar to a given pattern)

chdir ($paff."/man/man1");

# deleted some scalars that don't get called before it crashes...

foreach $profilHMM (<PROFILS>) {
	#print $fichier_modifie;
	$profilHMM =~ s/\s//g;
	my $cmd = "hmmpfam $profilHMM $fichier_modifie"; #cmd = commande
	print $cmd."\n"; #This doesn't print anything in the console, I don't know if it should
 
	# doesn't work
	open(HMMPFAM, "hmmpfam -E 1E-3 $profilHMM $fichier_modifie |") or die ("Fatal Error when trying to execute hmmpfam: $!") ;

Here's the error message:
hmmpfam /.../profilsHMM-Tases/transp110.hmm /.../modified_input_trad
Fatal Error when trying to execute hmmpfam: No such file or directory at genome_HMM_search_E1_1.pl line 302, <PROFILS> line 27.

hmmpfam is a module in Bioperl (which is installed), but that's the first mention of it in the script. The other two files mentioned in the error message exist in the locations it's looking for them in.

Any clues as to what's going wrong here would be appreciated.

# ? Feb 13, 2014 18:03

toadee: Aug 16, 2003; North American Turtle Boy Love Association

JayTee posted:

Any clues as to what's going wrong here would be appreciated.

Perl code:

my $cmd = "hmmpfam $profilHMM $fichier_modifie"; #cmd = commande
	print $cmd."\n"; #This doesn't print anything in the console, I don't know if it should

Is printing this part: hmmpfam /.../profilsHMM-Tases/transp110.hmm /.../modified_input_trad

However, it's not actually running that command string, instead it's running:

code:

open(HMMPFAM, "hmmpfam -E 1E-3 $profilHMM $fichier_modifie |")

Which is resulting in that error. What it's doing is trying to run the hmmpfam program on the computer this script is running on. hmmpfam is an HMMER (version 2) command-line program rather than a Perl module, so it has to be installed and findable on your PATH. So, if you drop to a command prompt and type in hmmpfam -E 1E-3 /.../profilsHMM-Tases/transp110.hmm /.../modified_input_trad, that doesn't run, for whatever reason. When you fix that, this part of this script should execute.
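A sketch for checking this from Perl itself, assuming the program you need is HMMER 2's hmmpfam. find_in_path and run_hmmpfam are hypothetical helpers written for this example. run_hmmpfam uses the three-argument list form of a piped open (Unix-like systems), which starts the program directly without a shell.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: full path of an executable found on PATH, or undef.
sub find_in_path {
    my ($prog) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $prog);
        return $candidate if -f $candidate && -x _;
    }
    return;    # undef in scalar context: not found
}

# Hypothetical helper: list-form piped open, so no shell is involved and
# unusual characters in file names can't be misinterpreted.
sub run_hmmpfam {
    my ($profile, $seqfile) = @_;
    open(my $fh, '-|', 'hmmpfam', '-E', '1E-3', $profile, $seqfile)
        or die "Fatal Error when trying to execute hmmpfam: $!";
    my @output = <$fh>;
    close($fh) or warn "hmmpfam exited with status $?\n";
    return @output;
}

if (my $path = find_in_path('hmmpfam')) {
    print "hmmpfam found at $path\n";
} else {
    print "hmmpfam is not on PATH; install HMMER 2 or fix PATH first\n";
}
```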

# ? Feb 13, 2014 18:30

JayTee: May 24, 2004

drat, I was hoping it wasn't that. Time to email the authors. Thanks for the info!

# ? Feb 13, 2014 18:46

RICHUNCLEPENNYBAGS: Dec 21, 2010

JayTee posted:

#Je lance la commande hmmpfam (lit séquence par séquence dans le seqfile et cherche similitude avec un motif donné) :
# I run the command hmmpfam (bed frame by frame in seqfile and looks similar to a given pattern)

Might not help, but I'm assuming you used automated translation here... "lit" does mean "bed", but here it means "read."

# ? Feb 14, 2014 13:52

uG: Apr 23, 2003; by Ralp

I'm wondering why this Web::Scraper/XPath won't list more than one subforum. There are multiple 'a' elements with 'forum' in their class name inside the div element with class 'subforums' :shobon:

Full code:

code:

use v5.10;
use Web::Scraper;
use Data::Dumper;

my $html = q[
<tr class="forum forum_1">
 <td class="icon">
    <a href="forumdisplay.php?forumid=1"><img src="http://fi.somethingawful.com/forumicons/gbs.gif" title="1227755 replies in 8843 threads" alt=""></a>
 </td>
 <td class="title">
    <a class="forum" href="forumdisplay.php?forumid=1" title="General discussion about anything, including current events; no flamewars or work-unfriendly crap allowed.">GBS 1.4</a>
    <div class="subforums"><b>SUBFORUMS:</b> <a class="forum_155" href="forumdisplay.php?forumid=155">SA's Front Page Discussion</a>, <a class="forum_214" href="forumdisplay.php?forumid=214">E/N Bullshit</a></div>
 </td>
 <td class="moderators"><a href="member.php?action=getinfo&amp;userid=43050">Inspector_666</a>, <a href="member.php?action=getinfo&amp;userid=45450">Downtown Abey</a>, <a href="member.php?action=getinfo&amp;userid=49334">Corn Thongs</a>, <a href="member.php?action=getinfo&amp;userid=140500">Spanish Manlove</a></td>
</tr>
];

my $index_scraper = scraper {
    process '//tr[contains(@class, "forum")]/td[@class="title"]', 
        'forums[]' => scraper {
            # Main Forum
            process '//a[@class="forum"]', url => '@href', name => 'TEXT';
            # Sub Forums
            process '//div[@class="subforums"]', 
                'subforums[]' => scraper {
                    process '//a[contains(@class, "forum")]', url => '@href', name => 'TEXT';
                }; 
        };
};


say Dumper( $index_scraper->scrape($html) );


1;


__END__

Where is E/N?

$VAR1 = {
          'forums' => [
                        {
                          'subforums' => [
                                           {
                                             'name' => 'SA\'s Front Page Discussion',
                                             'url' => 'forumdisplay.php?forumid=155'
                                           }
                                         ],
                          'url' => 'forumdisplay.php?forumid=1',
                          'name' => 'GBS 1.4'
                        }
                      ]
        };

# ? Feb 28, 2014 17:45

Mithaldu: Sep 25, 2007; Let's cuddle.

Due to how sparse the documentation is, I can't give you a solution, but I can say two things:

1. Your subforums scraper is definitely the wrong path to take. Look at the synopsis and it shows it's grabbing all "li.status" elements into the array "tweets", then uses the scraper to get attributes for the tweets from the elements inside each li. Possibly also look at https://metacpan.org/source/MIYAGAWA/Web-Scraper-0.37/eg to get more examples.

2. Get on irc.perl.org and ask miyagawa in #plack; after all, he wrote the thing.

(3. strict, warnings)

# ? Feb 28, 2014 19:52

uG: Apr 23, 2003; by Ralp

I'm not sure if this is the best way to do it or not, but after redoing the XPath bits I was able to extract all the subforums. I guess the part that confuses me is

code:

process '//div[@class="subforums"]//a[contains(@class, "forum")]', 'subforums[]' =>

works, when in my head that reads as finding

code:

<div class='subforums'><a class='forum_1'>X</a></div>
<div class='subforums'><a class='forum_2'>Y</a></div>

and not

code:

<div class='subforums'>
    <a class='forum_1'>X</a>
    <a class='forum_2'>Y</a>
</div>

Working code: https://gist.github.com/anonymous/53032390768a65dfd9d9#file-gistfile1-pl-L38

# ? Feb 28, 2014 20:12

Mithaldu: Sep 25, 2007; Let's cuddle.

Good that you worked it out. The way you did it is actually what I thought you'd need to be doing.

# ? Feb 28, 2014 20:14

uG: Apr 23, 2003; by Ralp

And miyagawa modules usually handle your strict/warnings incantations :ssh:

# ? Feb 28, 2014 20:19

Mithaldu: Sep 25, 2007; Let's cuddle.

I've never seen them do that before, and Plack, Web::Scraper, Hash::MultiValue don't. Can you give any examples?

# ? Feb 28, 2014 21:08

uG: Apr 23, 2003; by Ralp

Mithaldu posted:

I've never seen them do that before, and Plack, Web::Scraper, Hash::MultiValue don't. Can you give any examples?

https://github.com/miyagawa/web-scraper/blob/master/lib/Web/Scraper.pm#L2-L3

https://github.com/plack/Plack/blob/master/lib/Plack.pm#L3-L4

https://github.com/miyagawa/Hash-MultiValue/blob/master/lib/Hash/MultiValue.pm#L3

If you use these modules it imports their 'use strict' and 'use warnings' I thought?

e: i see now that it doesn't, don't know why I thought that for so long

uG fucked around with this message at 21:47 on Feb 28, 2014

# ? Feb 28, 2014 21:44

Mithaldu: Sep 25, 2007; Let's cuddle.

As you found, these don't; they only enable strict and warnings inside their own files. Modules that export pragmas have to make these calls in their import method: https://metacpan.org/source/CHROMATIC/Modern-Perl-1.20140107/lib/Modern/Perl.pm#L33
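A minimal sketch of that technique, modelled loosely on what Modern::Perl does. My::Pragmas is a hypothetical module defined inline so the example fits in one file. Its import method calls strict->import and warnings->import, which apply to whatever code is being compiled when `use My::Pragmas` runs, not to the module itself.

```perl
# No "use strict" here on purpose: the pragmas come from My::Pragmas.
BEGIN {
    package My::Pragmas;    # hypothetical module, defined inline
    require strict;
    require warnings;

    sub import {
        # Runs at compile time during "use My::Pragmas"; these calls switch
        # on strict and warnings in the scope doing the "use".
        strict->import;
        warnings->import;
    }

    # Mark the module as loaded so "use" doesn't search @INC for a file.
    $INC{'My/Pragmas.pm'} = __FILE__;
}

use My::Pragmas;

# strict is now active here, so compiling an undeclared variable fails.
my $ok  = eval q{ $undeclared_variable = 1; 1 };
my $msg = $ok ? "strict is off\n" : "strict is on: $@";
print $msg;
```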

# ? Feb 28, 2014 22:18

uG: Apr 23, 2003; by Ralp

Need help with a regex...

Runnable example code: https://gist.github.com/anonymous/0d89484375909786e5b5
(iterates over each line and outputs [OK] or [ERROR])

code:

while( $play =~ m/\b(?:penalty|penalized)\b.*?(?<yards>$RE{num}{int}) yards?\b.*?(?:(?!penalty)|(?!penalized)|\Z)/gi ) {
    my $yards = $+{yards};
    $penalty_yards = (defined $penalty_yards)?($penalty_yards + $yards):$yards;      
}

The above code is not properly ignoring the line:
'Joe Shmoe rush for 10 yards, penalty MICHIGAN -420 yards declined'

The solution i'm looking for I thought would go something like:

code:

# note that additional (?!declined)
m/\b(?:penalty|penalized)\b.*?(?<yards>$RE{num}{int}) yards?\b.*?(?!declined).*?(?:(?!,.*?penalty)|(?!,.*?penalized)|\Z)/gi

but this still matches $RE{num}{int} as -420 instead of flat out ignoring it as a match.

It may be worth mentioning that (?:(?!,.*?penalty)|(?!,.*?penalized)|\Z) is to handle a line of text that has more than 1 penalty (like: 'Southern Utah penalized -5 yards, Michigan penalized 11 yards'). ,.*? is supposed to use ", TEAM NAME penalty" as an 'end of this penalty' token. I can't just split them apart on a comma because sometimes it will insert a player's name into the play like: 'PENALTY HAWAII intentional grounding (Brennan, C) 0 yards to the Hawaii19, PENALTY HAWAII illegal block 10 yards' and it would split on "Brennan, C"

# ? Apr 29, 2014 20:30

Extortionist: Aug 31, 2001; Leave the gun. Take the cannoli.

uG posted:

code:
while( $play =~ m/\b(?:penalty|penalized)\b.*?(?<yards>$RE{num}{int}) yards?\b.*?(?:(?!penalty)|(?!penalized)|\Z)/gi ) {
    my $yards = $+{yards};
    $penalty_yards = (defined $penalty_yards)?($penalty_yards + $yards):$yards;      
}
The above code is not properly ignoring the line:
'Joe Shmoe rush for 10 yards, penalty MICHIGAN -420 yards declined'

Instead of trying to wrestle that regex into doing what you want, you might try to break it into smaller pieces. Something like:

Perl code:

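while ($play =~ m/penal(?:ty|ized)(.*?)(?:penal(?:ty|ized)|\Z)/gi){
    my $penalty_phrase = $1;
    next if ($penalty_phrase =~ /declined/i);
    # only add when this phrase actually contains a yardage; after a failed
    # match, $1 would still hold the capture from the outer match
    $penalty_yards += $1 if $penalty_phrase =~ m/($RE{num}{int}) yards/;
}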

Extortionist fucked around with this message at 23:01 on Apr 29, 2014

# ? Apr 29, 2014 22:51

uG: Apr 23, 2003; by Ralp

Extortionist posted:

Instead of trying to wrestle that regex into doing what you want, you might try to break it into smaller pieces. Something like:

I tried that after getting frustrated with this but it doesn't iterate over the matches like I want. The code you posted lacks negative lookahead so it will take "penalty smu 10 yards blah blah, penalty msu 5 yards", then the first match will be "penalty smu 10 yards blah blah, penalty", leaving "msu 5 yards" not to get iterated over. Adding negative lookahead leaves the (.*?) (aka $1 aka $penalty_phrase) to always capture nothing.

# ? Apr 29, 2014 23:45

Extortionist: Aug 31, 2001; Leave the gun. Take the cannoli.

uG posted:

I tried that after getting frustrated with this but it doesn't iterate over the matches like I want. The code you posted lacks negative lookahead so it will take "penalty smu 10 yards blah blah, penalty msu 5 yards", then the first match will be "penalty smu 10 yards blah blah, penalty", leaving "msu 5 yards" not to get iterated over. Adding negative lookahead leaves the (.*?) (aka $1 aka $penalty_phrase) to always capture nothing.

Oops, yeah, it'll do that. You'd want to use a positive look-ahead instead:

Perl code:

while ($play =~ m/penal(?:ty|ized)(.*?)(?=penal(?:ty|ized)|\Z)/gi){
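For reference, here is a self-contained sketch of the whole loop with the look-ahead in place. It wraps the loop in a hypothetical penalty_yards sub, swaps Regexp::Common's $RE{num}{int} for a plain [-+]?\d+ so it runs without that module, and only adds yardage when the inner match succeeds, because a failed match leaves $1 holding the previous capture.

```perl
use strict;
use warnings;

# Sketch: sum the yardage of every non-declined penalty in one play string.
# Assumption: a plain signed-integer pattern stands in for Regexp::Common's
# $RE{num}{int}, so this runs without that module installed.
sub penalty_yards {
    my ($play) = @_;
    my $total = 0;

    # The positive lookahead (?=...) stops each phrase just before the next
    # "penalty"/"penalized" without consuming it, so the next pass of the
    # /g loop starts there instead of skipping it.
    while ($play =~ m/penal(?:ty|ized)(.*?)(?=penal(?:ty|ized)|\Z)/gi) {
        my $phrase = $1;
        next if $phrase =~ /declined/i;

        # Add only when this phrase really contains "<n> yards"; after a failed
        # match $1 would still hold the previous capture.
        $total += $1 if $phrase =~ m/([-+]?\d+) yards/;
    }
    return $total;
}

print penalty_yards('penalty smu 10 yards blah blah, penalty msu 5 yards'), "\n";   # 15
print penalty_yards('Joe Shmoe rush for 10 yards, penalty MICHIGAN -420 yards declined'), "\n";   # 0
```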

# ? Apr 30, 2014 00:01

Olesh: Aug 4, 2008; Why did the circus close?

A long, chilling list of animal rights violations.

Hey, I've got a pair of questions:

I have a custom test that I've written in Perl for load testing our company's web-based app. Because my boss originally coded the app, which is obtuse and arcane, and because the requirements placed on the test are equally obtuse and arcane (I have to synchronize each instance so they all run as simultaneously as possible, including the final submission to the server), I can't use any existing software or frameworks. I have a Perl script that simulates a browser and performs the exact sequence of GETs, POSTs, and redirects needed to test the app.

The question, then, is what exactly am I looking for when load testing the server? The server is a Windows Server 2008 machine running IIS7. The goal my boss handed down is to find out what it takes to "break the server", but he himself doesn't actually know what that means so I can't ask him for any more clarification. What counters would I be tracking in Perfmon to determine where the bottleneck is, besides the standard CPU/memory?

As a secondary question, I'm trying to separate STDOUT and STDERR output into two separate logfiles - I want regular program output to go to one, and errors to go to the other logfile. This is what I've got, stolen from elsewhere on the internet:

Perl code:

open OUTPUT, '>', $logfile or die "Can't create filehandle: $!";
select OUTPUT; $|= 1;
open STDERR, '>', $errlog or die "Can't create error filehandle: $!";

Now, $logfile gets written to just fine. However, $errlog always stays empty. My code is still in progress and there are still issues I'm running into (for example, I discovered that WWW::Mechanize will apparently throw an exception and the script will end if it fails to GET a page for any reason) but none of these errors are showing up in the logfile designated by $errlog, nor are they showing up in $logfile.

What's going on here? Does STDERR only contain things I explicitly output to it, i.e. print STDERR "This is an error.\n";? Since I'm running hundreds of separate instances across multiple geographically distributed VMs, I can't see any individual script's output. Is my best bet for debugging basically to throw a print STDERR "Some debug message" after almost every line of code?

# ? Apr 30, 2014 09:08

uG: Apr 23, 2003; by Ralp

Olesh posted:

I have a custom test that I've written in Perl for load testing our company's web-based app. Because my boss originally coded the app, which is obtuse and arcane, and because the requirements placed on the test are equally obtuse and arcane (I have to synchronize each instance so they all run as simultaneously as possible, including the final submission to the server), I can't use any existing software or frameworks. I have a Perl script that simulates a browser and performs the exact sequence of GETs, POSTs, and redirects needed to test the app.

The question, then, is what exactly am I looking for when load testing the server? The server is a Windows Server 2008 machine running IIS7. The goal my boss handed down is to find out what it takes to "break the server", but he himself doesn't actually know what that means so I can't ask him for any more clarification. What counters would I be tracking in Perfmon to determine where the bottleneck is, besides the standard CPU/memory?

I'm not sure I completely understand your 'synchronize each instance' requirement... do you mean you run a single-threaded Perl script from all of your server instances at the same time? Because what you should really be doing to hammer a server is using threads or an async module like AnyEvent (plus LWP::Protocol::AnyEvent::http to make Mechanize async-friendly) to fire off hundreds of connections from each server at the same time. From there you could log the response time, make sure each response is what you expect, etc., and look for where, say, the response time falls apart or you start getting response codes you don't expect.

Olesh posted:

As a secondary question, I'm trying to separate STDOUT and STDERR output into two separate logfiles - I want regular program output to go to one, and errors to go to the other logfile. This is what I've got, stolen from elsewhere on the internet:
Perl code:
open OUTPUT, '>', $logfile or die "Can't create filehandle: $!";
select OUTPUT; $|= 1;
open STDERR, '>', $errlog or die "Can't create error filehandle: $!";
Now, $logfile gets written to just fine. However, $errlog always stays empty.

Try IO::Handle (a core Perl module). Here is a snippet from PerlMonks:

Perl code:

use strict;
use warnings;
use IO::Handle;

open INPUT,  '<', "input.txt"  or die $!;
open OUTPUT, '>', "output.txt" or die $!;
open ERROR,  '>', "error.txt"  or die $!;

 STDIN->fdopen( \*INPUT,  'r' ) or die $!;
STDOUT->fdopen( \*OUTPUT, 'w' ) or die $!;
STDERR->fdopen( \*ERROR,  'w' ) or die $!;

# prints to output.txt:
print "Hello Output File\n";

# reads everything from input.txt and prints it to output.txt:
print <>;

# prints to error.txt:
warn "Hello Error File\n";
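On the question of what ends up in STDERR: warn and uncaught die messages are written to STDERR, so once STDERR is reopened to a file they land there too. Below is a minimal sketch with hypothetical file names. It also turns on autoflush so lines show up in the files immediately, and logs an exception caught by eval explicitly, since a trapped die prints nothing on its own.

```perl
use strict;
use warnings;
use IO::Handle;    # core module; adds autoflush() to filehandles

# Hypothetical log file names for this sketch.
my $logfile = 'run.log';
my $errlog  = 'error.log';

open my $out, '>', $logfile or die "Can't create $logfile: $!";
open STDERR, '>', $errlog   or die "Can't redirect STDERR: $!";
$out->autoflush(1);
STDERR->autoflush(1);

print {$out} "normal program output\n";

# warn() writes to STDERR, so this lands in $errlog.
warn "something looks odd\n";

# A die() trapped by eval is not printed anywhere, so log $@ explicitly.
# An uncaught die() would also go to STDERR (and end the script).
eval { die "simulated request failure\n"; 1 }
    or print STDERR "caught: $@";

close $out;
```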

Olesh posted:

My code is still in progress and there are still issues I'm running into (for example, I discovered that WWW::Mechanize will apparently throw an exception and the script will end if it fails to GET a page for any reason) but none of these errors are showing up in the logfile designated by $errlog, nor are they showing up in $logfile.

You can create the object with WWW::Mechanize->new( autocheck => 0 ) and handle the errors yourself (check $mech->success), or use Try::Tiny (or an eval block if you can't use CPAN).
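A minimal sketch of the plain-eval route, which needs no CPAN modules. fetch_page is a hypothetical stand-in that dies on failure the way Mechanize does with autocheck enabled; in the real script the eval would wrap the $mech->get(...) call instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Mechanize request: like $mech->get with
# autocheck on, it dies when the request fails.
sub fetch_page {
    my ($url) = @_;
    die "GET $url failed\n" if $url =~ /bad/;
    return "content of $url";
}

my @errors;
for my $url ('http://example.com/ok', 'http://example.com/bad') {
    my $content;
    # eval traps the die; the trailing 1 marks success even if the
    # block's last value happens to be false.
    my $ok = eval { $content = fetch_page($url); 1 };
    unless ($ok) {
        push @errors, $@;    # keep the message and move on to the next URL
        next;
    }
    print "fetched: $content\n";
}
print STDERR "error: $_" for @errors;
```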

Olesh posted:

Is my best bet for debugging basically to throw a print STDERR "Some debug message" after almost every line of code?

Again, if you can use CPAN, I would suggest Log::Log4perl (use Log::Log4perl qw( :easy ); for simple logging like ERROR 'some message'; DEBUG 'some message';). This lets you send each message level (trace, debug, info, warn, error, fatal) to a different output and adds meta info like the date/time it's written.

# ? Apr 30, 2014 20:23

Olesh: Aug 4, 2008; Why did the circus close?

A long, chilling list of animal rights violations.

uG posted:

I'm not sure I completely understand your 'synchronize each instance' requirement... do you mean you run a single-threaded Perl script from all of your server instances at the same time? Because what you should really be doing to hammer a server is using threads or an async module like AnyEvent (plus LWP::Protocol::AnyEvent::http to make Mechanize async-friendly) to fire off hundreds of connections from each server at the same time. From there you could log the response time, make sure each response is what you expect, etc., and look for where, say, the response time falls apart or you start getting response codes you don't expect.

Currently, when I'm running the test, I have anywhere between 1 and 40 VMs running, and each VM is triggered via a PHP script to run X copies of the perl script. Each perl script is a separate instance, with a total number of instances being equal to (# of VMs) * (X copies per VM).

As far as the requirements go, I don't understand them entirely myself. My boss has this idea that clients are going to want to know what happens if, as users go through the application, the stars align and 10,000 users all click the submit button at the exact same instant. Thus, for the load testing, since I don't have a machine capable of running 20,000 instances, I need some way to verify that each separate VM is synced up and that requests reach the server at the same instant (or as close to it as I can manage).
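One simple way to line up separately launched instances is to hand every instance the same wall-clock start time and have each one sleep until then. A minimal sketch using the core Time::HiRes module; wait_until and the command-line argument are hypothetical, and it assumes the VMs' clocks are kept in sync (e.g. via NTP), since any clock skew shows up directly as request skew.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);    # core; time() and sleep() with sub-second resolution

# Block until the given epoch time (seconds, fractions allowed).
# Returns how long it slept, or 0 if that moment has already passed.
sub wait_until {
    my ($start_at) = @_;
    my $delay = $start_at - time();
    return 0 if $delay <= 0;
    sleep($delay);
    return $delay;
}

# Hypothetical: the launching script passes the shared start time as the
# first argument; default to half a second from now for a quick local run.
my $start_at = @ARGV ? $ARGV[0] : time() + 0.5;
wait_until($start_at);
# ... the synchronized GET/POST would go here ...
printf "fired at %.3f, target was %.3f\n", time(), $start_at;
```

Each instance would call wait_until again before every request that has to hit the server at the same moment, with the controller handing out the next shared timestamp.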

He is also insistent that each instance has to be as close to a real user as possible, i.e., they have to go through the app from start to finish, hitting every page and submitting using the data generated by the app, which necessitates the labyrinthine sequence of GETs and POSTs (51 in total). Successfully navigating the app also generates a database entry, which is sort of the final arbiter of whether or not any errors occurred. All while ensuring that each request hits the server as near-simultaneously as possible.

My ultimate goal in this is to simulate a variable number of users, find out at what number of users errors occur, identify the bottleneck (whether it's CPU, memory, etc.), and be able to use the data to guide future expansion with regard to load balancing when that becomes necessary.

The idea of event-driven programming, as in AnyEvent, is almost completely foreign to me. All of the work I've done so far has been PHP/JavaScript based; this is my first foray into Perl ever, and the only time I've ever used an event for anything was when I needed top.postMessage() in JavaScript to get around the normal restrictions on cross-domain scripting. My boss, as far as I'm aware, has never even heard of it; all of his code is 100% blocking code.

After looking into AnyEvent::http, I'm not sure it's suitable. Getting from the start of the app to the end of it requires continuity, which is why WWW::Mechanize's automatic cookie handling was so helpful. I'm having trouble visualizing how a single script could spawn a variable number X of sub-processes, each of which asynchronously handles its own log-writing, keeps to a tightly synchronized schedule of page requests based on the system time, and maintains the continuity it needs, short of actually forking the process, which doesn't seem significantly different from simply running multiple instances. At the very least, I don't know how I would dynamically create and maintain an arbitrary number of user agents for the continuity I require.

To give you a more precise example of what I mean, I know that in javascript I can generate an arbitrary number of variables and access them later by storing them in an object, without knowing ahead of time how many variables I will need.

JavaScript code:

$i = $object.length;
$object[$i] = {};   // create the entry before setting its properties
$object[$i]["foo"] = "foo";
$object[$i]["bar"] = "bar";
$object[$i]["baz"] = new Baz("baz");
updateObjectCount();

This should be possible in Perl, of course, but I don't know how. A quick google search suggests that I should be able to use a hash for this.

Perl code:

use strict;
use warnings;

my @array;
my $arrayLength;
my $i;

for ($i=0; $i<10; $i++) {
	push @array, {};
	$arrayLength = scalar @array;   # number of elements in @array
	$arrayLength--;                 # index of the hash that was just pushed
	$array[$arrayLength]{foo} = "foo_".$i;
	$array[$arrayLength]{bar} = "bar_".$i;
	$array[$arrayLength]{baz} = ["baz_".$i, "baz1_".$i, "baz2_".$i, "baz3_".$i];
}

print $array[0]{foo}."\n"; #should print "foo_0"
print $array[3]{bar}."\n"; #should print "bar_3"
print $array[5]{baz}[3]."\n"; #should print "baz3_5"

This appears to work as intended; am I getting the general idea with regard to how to handle dynamic variable assignment in Perl?
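An array of hash references is the usual Perl structure for this. A slightly more idiomatic sketch (same names as the example above) pushes an anonymous hash built in one step, so no index bookkeeping is needed; scalar(@array) gives the element count and $#array the last index.

```perl
use strict;
use warnings;

my @array;

for my $i (0 .. 9) {
    # Push an anonymous hash reference; no manual index tracking needed.
    push @array, {
        foo => "foo_$i",
        bar => "bar_$i",
        baz => [ "baz_$i", "baz1_$i", "baz2_$i", "baz3_$i" ],
    };
}

print "count: ", scalar(@array), ", last index: $#array\n";   # count: 10, last index: 9
print $array[0]{foo}, "\n";       # foo_0
print $array[3]{bar}, "\n";       # bar_3
print $array[5]{baz}[3], "\n";    # baz3_5
```

The same shape could hold per-user state for the load test, e.g. one element per simulated user holding that user's agent object and data, though that use is an assumption beyond this example.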

uG posted:

Try IO::Handle (core perl module). Here is a bit from perlmonks:
(excised)

I'll give that a shot.

uG posted:

You can create the object with WWW::Mechanize->new( autocheck => 0 ) and handle the errors yourself (check $mech->success), or use Try::Tiny (or an eval block if you can't use CPAN).

Again, if you can use CPAN, I would suggest Log::Log4perl (use Log::Log4perl qw( :easy ); for simple logging like ERROR 'some message'; DEBUG 'some message'). This lets you send each message level (trace, debug, info, warn, error, fatal) to a different output and adds meta info like the date/time it's written.

I appreciate the suggestions! I can use CPAN; I'm just not very Linux-literate, so installing a new module means going back to my written instructions, vs. using the Perl Package Manager that comes with ActiveState Perl.

# ? Apr 30, 2014 23:33
