Anaconda Rifle
Mar 23, 2007

Yam Slacker

tef posted:

code:
%bar = reverse %bar;

What do Perl newbies think this does?

Rohaq
Aug 11, 2006

Centripetal Horse posted:

You worked that out without running it or Googling? Very nicely done! The "..|" stumped many people, including some experienced Perl Monks, when I used it as my Perl signature (back when such a thing was common... is it still?) Mostly, it was the capitalizing of the "A" in Ace that they couldn't get a grip on without running the code.
I had to look up \u on Perldocs, but I recognised it as a special escape character, that was about it. The ..| didn't faze me though ;)

Centripetal Horse posted:

I love Perl. I especially love regular expressions in Perl.
Damned skippy. You can do it in almost any other language too, but Perl's built-in regex operators make it all so easy. Being able to do mass replacements with regex in a single one-liner is something I've not seen in other languages either.

I hear a lot of bad mouthing when it comes to Perl, and I can understand why; Perl lets you write some loving abysmal code, but it doesn't mean that you have to code loving abysmally.

In other news, I am becoming known as The Regex King in my workplace. If people want it parsed, transformed, extracted, or migrated from one system to another, they seem to be coming to me.

Anaconda Rifle
Mar 23, 2007

Yam Slacker

Rohaq posted:

Damned skippy. You can do it in almost any other language too, but Perl's built-in regex operators make it all so easy. Being able to do mass replacements with regex in a single one-liner is something I've not seen in other languages either.

Last I checked, Ruby allows you to do it too, but it's not The Ruby Way.

Rohaq posted:

In other news, I am becoming known as The Regex King in my workplace. If people want it parsed, transformed, extracted, or migrated from one system to another, they seem to be coming to me.

Same here. I still have people coming to me from previous jobs for help. If they would just put in a few hours to learn and then practice every now and then, they would see that regexes aren't magical or difficult.

Rohaq
Aug 11, 2006

Anaconda Rifle posted:

Last I checked, Ruby allows you to do it too, but it's not The Ruby Way.
Ah, Ruby isn't a language I've looked into picking up. I am loving Python though, if only for its knack to make me feel utterly retarded when I write out masses of code, then end up reducing it into about 6 lines.

Anaconda Rifle posted:

Same here. I still have people coming to me from previous jobs for help. If they would just put in a few hours to learn and then practice every now and then, they would see that regexes aren't magical or difficult.
They're not difficult to begin with, but I've found that no matter how well I think I know them, there's always something new to pick up.

My manager was looking at a regex I was using to extract info out of a log file with comma separated fields, and asked why I was using "([^,]+)," instead of "(.+?)," as a matching group. I explained that matching everything that wasn't a "," character greedily up until a "," was far more efficient than matching every character non-greedily up until the first instance of ",", and stepped him through the process in Regex Coach* to show it. Apparently he's been using Perl and regex for years and didn't realise this.
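
To make the comparison concrete, here is a minimal sketch with made-up sample data (the field values are hypothetical, and it assumes there are no empty fields). Both patterns pull out the same fields here; the difference is roughly in how the engine gets there, since the negated class can consume a whole field in one run, while the lazy dot has to check for a following comma after every character.

```perl
use strict;
use warnings;

# Made-up log line with comma-terminated fields.
my $line = 'alpha,beta,gamma,';

# Greedy negated class: grab everything that is not a comma, then the comma.
my @negated = $line =~ /([^,]+),/g;

# Lazy dot: extend one character at a time until a comma follows.
my @lazy = $line =~ /(.+?),/g;

print "negated: @negated\n";
print "lazy:    @lazy\n";
```

On well-formed input like this the captures are identical, so the choice mostly comes down to how much work the engine does per field.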

* Speaking of which, Regex Coach is by far my favourite regex tester, and is some seriously awesome software. It's not been updated in a long time though, and is missing some of the more recent features of the PCRE engine, like named groups, etc. Does anybody know of any other decent PCRE compatible regex checkers with the same level of functionality?

Rohaq fucked around with this message at 10:45 on Oct 10, 2012

Anaconda Rifle
Mar 23, 2007

Yam Slacker

Rohaq posted:

My manager was looking at a regex I was using to extract info out of a log file with comma separated fields, and asked why I was using "([^,]+)," instead of "(.+?)," as a matching group. I explained that matching everything that wasn't a "," character greedily up until a "," was far more efficient than matching every character non-greedily up until the first instance of ",", and stepped him through the process in Regex Coach* to show it. Apparently he's been using Perl and regex for years and didn't realise this.

It's not just an efficiency thing. /"(.+?)",/ and /"([^"]+)",/ can get different results. It's subtle, but extremely important.

I changed your example intentionally to make a point.

uG
Apr 23, 2003

by Ralp
Question on module naming...

I have a module, xxx::xxx::damerau. I want to make an automaton in it so I can feed it a list and ultimately not have to actually run the algorithm on every item in the list/tree.

My question is: should I include this automaton in the module itself as a new method, or should I create a new module, xxx::xxx::damerau::automaton, even though I'm pretty sure I won't be able to use the exact code in xxx::xxx::damerau in the automaton?

tef
May 30, 2004

-> some l-system crap ->

het posted:

Oh, I did but I was just doing one-liners on the command line, that's funny

I learned about sort in a void context from a crazed intercal programmer.

leedo
Nov 28, 2000

Anaconda Rifle posted:

It's not just an efficiency thing. /"(.+?)",/ and /"([^"]+)",/ can get different results. It's subtle, but extremely important.

I changed your example intentionally to make a point.

[^"] matches newlines by default whereas . doesn't? Took me a while of staring to even come up with a guess.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

leedo posted:

[^"] matches newlines by default whereas . doesn't? Took me a while of staring to even come up with a guess.

The regex /"(.+?)",/ matches the string '""",' but the regex /"([^"]+)",/ does not.

Anaconda Rifle
Mar 23, 2007

Yam Slacker
Run /"(.+?)",/ and /"([^"]+)",/ against the following string:

a"b"c"d",

The first will return "b"c"d", and the second returns "d",.
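
For anyone who wants to try it, a minimal sketch of that comparison (the variable names are just for illustration):

```perl
use strict;
use warnings;

my $str = 'a"b"c"d",';
my ($lazy_match, $lazy_cap, $neg_match, $neg_cap);

# Lazy dot: starts at the first quote and stretches until a quote-comma follows.
if ($str =~ /"(.+?)",/) {
    ($lazy_match, $lazy_cap) = ($&, $1);
}

# Negated class: cannot cross a quote, so only the last quoted chunk fits.
if ($str =~ /"([^"]+)",/) {
    ($neg_match, $neg_cap) = ($&, $1);
}

print "lazy:    $lazy_match (captured $lazy_cap)\n";
print "negated: $neg_match (captured $neg_cap)\n";
```

The lazy version is only lazy about where it stops, not about where it starts, which is why it still begins at the first quote.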

Rohaq
Aug 11, 2006
I should have probably said; my regex contained no quote marks, I just put them in quote marks in my post.

But to my other question: Anybody got any favourite regex testers in here?

Sebbe
Feb 29, 2004

Rohaq posted:

But to my other question: Anybody got any favourite regex testers in here?
I'm quite fond of RegexBuddy. I've used it so many times for debugging regexes, and it supports a ton of different regex implementations.

The Gripper
Sep 14, 2004
i am winner
I don't find myself testing regexps often enough to install anything, so I just use http://rubular.com/

homercles
Feb 14, 2010

If only there was a perl module that processed text files formatted as a csv. And if only there was an xs version of it so it ran quickly.

I wonder what that module would be called? All right all right it's a log file with comma-joined fields not a CSV but still...

homercles fucked around with this message at 21:16 on Oct 21, 2012

Nevergirls
Jul 4, 2004

It's not right living this way, not letting others know what's true and what's false.

Rohaq posted:

I should have probably said; my regex contained no quote marks, I just put them in quote marks in my post.

But to my other question: Anybody got any favourite regex testers in here?

Regexp::Debugger, like any good DCONWAY module, is insane. It's brand new and I've only used it a couple times when my lovely one-liner isn't working (via rxrx). It allows you to step through what the engine is doing, not just get a match/non-match.

Here's a screencast from YAPC::NA:
https://www.youtube.com/watch?v=zcSFIUiMgAs

Rohaq
Aug 11, 2006

homercles posted:

If only there was a perl module that processed text files formatted as a csv. And if only there was an xs version of it so it ran quickly.

I wonder what that module would be called? All right all right it's a log file with comma-joined fields not a CSV but still...
Yes yes, /Text::CSV(?:_XS)?/ would parse out comma separated values, but I also added support for multiple log formats as an input, and used named groups to pull out the relevant fields, so let's say I have the following:

STDIN:
code:
blah1, blah2, thingiwant1, blah3, thingiwant2, thingiwant3
thingiwant1, thingiwant2, thingiwant3
I can do the following for much cleaner looking code than checking for the number of columns present etc:

code:
my $re_longline_match = '^(?:[^,]+, ){2}(?<group1>[^,]+), [^,]+, (?<group2>[^,]+), (?<group3>.+)$';
my $re_shortline_match = '^(?<group1>[^,]+), (?<group2>[^,]+), (?<group3>[^,]+)$';
while (<STDIN>) {
  if ( $_ =~ /$re_longline_match/ || $_ =~ /$re_shortline_match/ ) {
    print 'Group 1: ', $+{group1}, "\n";
    print 'Group 2: ', $+{group2}, "\n";
    print 'Group 3: ', $+{group3}, "\n\n";
  }
}
Interestingly, the regex above in $re_longline_match fails under Regexp::Debugger's rxrx, but works in a script.

Sebbe posted:

I'm quite fond of RegexBuddy. I've used it so many times for debugging regexes, and it supports a ton of different regex implementations.
I'd love to try it, but I'm not paying money for a tool where I can't test its functionality first, money back guarantee or not.

Rohaq fucked around with this message at 03:20 on Oct 22, 2012

Ninja Rope
Oct 22, 2005

Wee.
I like The Regex Coach but the newer versions are Windows only. :(

Edit: It's free.

Rohaq
Aug 11, 2006

Ninja Rope posted:

I like The Regex Coach but the newer versions are Windows only. :(

Edit: It's free.
Yep, that's my favourite too, but it's not been updated in ages, and lacks features like named groups, etc.

It's also donationware, and free for personal/non-commercial use only, so trying to get it approved at work is a bitch.

'I'd like this software for my job.'
'How much does it cost?'
'Err, however much you like? $5? $15? I don't know.'

The Gripper
Sep 14, 2004
i am winner

Rohaq posted:

Yep, that's my favourite too, but it's not been updated in ages, and lacks features like named groups, etc.

It's also donationware, and free for personal/non-commercial use only, so trying to get it approved at work is a bitch.

'I'd like this software for my job.'
'How much does it cost?'
'Err, however much you like? $5? $15? I don't know.'
It can be worse than an annoyance, depending on policy. A joint I worked at wouldn't approve donationware use because they couldn't account for the cost without an actual invoice, a donation receipt wasn't enough.

Rohaq
Aug 11, 2006

The Gripper posted:

It can be worse than an annoyance, depending on policy. A joint I worked at wouldn't approve donationware use because they couldn't account for the cost without an actual invoice, a donation receipt wasn't enough.
It's a shame too, because it's really loving useful software, even when you just need to extract some info into something useful.

'Yeah, we need you to map data 1 against data 2, all the mappings are in an Excel table.'

*Copies table, paste into regex coach*
*Tick 'm' for multiline*
Regex: /^([^\t]+)\t(.+)$/
Replace with: '\1' => '\2',
*Paste hash layout into Perl script, surround with my %map_blah = ( <map> );*
code:
if ( exists $map_blah{$extracted_data} ) {
  $extracted_data = $map_blah{$extracted_data};
} else {
  warn "No mapping for blah: '$extracted_data'.\n";
}
Much faster than manually typing that poo poo out.
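
If the table is already sitting in a string or file, the same regex can build the hash directly instead of generating source to paste. A minimal sketch, assuming two tab-separated columns; the sample rows and the $extracted_data value are made up:

```perl
use strict;
use warnings;

# Made-up stand-in for the table copied out of Excel (tab-separated).
my $table = "apple\tfruit\ncarrot\tvegetable\n";

# Same pattern as above; /m lets ^ and $ match at every line, /g walks all rows.
my %map_blah;
while ($table =~ /^([^\t]+)\t(.+)$/mg) {
    $map_blah{$1} = $2;
}

my $extracted_data = 'carrot';
if ( exists $map_blah{$extracted_data} ) {
    $extracted_data = $map_blah{$extracted_data};
} else {
    warn "No mapping for blah: '$extracted_data'.\n";
}
print "$extracted_data\n";
```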

Rohaq fucked around with this message at 10:43 on Oct 23, 2012

uG
Apr 23, 2003

by Ralp
I'm trying to get into tests like a good programmer, but I'm hung up trying to figure out how to pass a hash reference, an array, or anything other than a single scalar using Test::Base :(

code:
=== test matching
--- input
four
--- expected
0

qntm
Jun 17, 2009
A hash reference is a single scalar. Can you show some example code?

uG
Apr 23, 2003

by Ralp
Basically I have a function that takes a scalar (which returns a scalar) or an array reference (which returns a hash reference). Here is a test that tests passing a scalar: https://github.com/ugexe/Text--Levenshtein--Damerau/blob/master/t/02_dld.t

I'm trying to write a similar test for when an array is passed and a hash ref is returned. Here is code that doesn't work, but hopefully explains what I'm trying to do:
code:
use Test::Base;
use Text::Levenshtein::Damerau;

plan tests => 1 * blocks;
my $tld = Text::Levenshtein::Damerau->new('four');

run {
	my @input = $block->input;
	# print $input[0] prints four

	my @expected = $block->expected;
	# print $expected[0] prints 0

	my $hashref_of_both = ...# Might be pushing it here due to ordering?
	# print $hashref_of_both->{'four'}; prints 0
	# print $hashref_of_both->{'fuor'}; prints 1
	# print $hashref_of_both->{'fourteen'}; prints 4

	is( $tld->dld({ list => \@input }), $hashref_of_both )
}

__END__

=== test matching
--- input
four,fuor,fourteen	# See below comment
--- expected
0,1,4	# This will be passed as a scalar naturally, 
	# but I want to pass it as a list

The only thing I can think of is to just split the strings and construct things myself in the run block, but I thought maybe Test::Base had a better way.

edit: As much as I like Test::Base, I just wrote the tests with Test::More instead.

code:
use Test::More tests => 8;
use Text::Levenshtein::Damerau;

my $tld = Text::Levenshtein::Damerau->new('four');

#Test scalar argument
cmp_ok( $tld->dld('four'),	'==', 0, 'test helper matching');
cmp_ok( $tld->dld('for'), 	'==', 1, 'test helper insertion');
cmp_ok( $tld->dld('fourth'),'==', 2, 'test helper deletion');
cmp_ok( $tld->dld('fuor'), 	'==', 1, 'test helper transposition');
cmp_ok( $tld->dld('fxxr'), 	'==', 2, 'test helper substitution');
cmp_ok( $tld->dld('FOuR'), 	'==', 3, 'test helper case');
cmp_ok( $tld->dld(''), 	'==', 4, 'test helper empty');


#Test array reference argument
my @list = ('four','fourty','fourteen','');
is_deeply($tld->dld({ list => \@list }), { four => 0, fourty => 2, fourteen => 4, '' => 4 }, 'test dld(\@array_ref)');

uG fucked around with this message at 01:18 on Oct 25, 2012

SaxMaverick
Jun 9, 2005

The stuff of nightmares
Alright, I have held back asking this question because it was due last Friday, but after searching forever and no help from a douchebag TA, I have to know how I can make this work.

code:
+#!/usr/bin/perl
+use warnings;
+use strict;
+
+my $file_in = $ARGV[0];
+my $key;
+my $direction = "Forward";
+my $rdirection = "Reverse";
+my %fasta_hash ;
+
+open ( FASTA_IN , "<" , $file_in) || die $!;
+open ( OUT , ">", "output.txt") || die $!;
+## parse fasta file into a hash ##
+while ( my $line = <FASTA_IN>){
+    chomp $line;
+	if ($line =~/^>/){
+		$key = $line;
+		}
+
+	else{
+	$fasta_hash{$key} .= uc($line);	
+	chomp $line;
+		}
+    }
+
+close (FASTA_IN);
+my @scaffolds = keys %fasta_hash;
+
+foreach (@scaffolds){
+    get_ORFs_of_210_nt($_,$fasta_hash{$_},$direction);				
+	get_rORFs_of_210_nt($_,$fasta_hash{$_},$rdirection);
+	}
+	
+exit;
+ 
+sub get_ORFs_of_210_nt{
+	my $header = $_[0];
+	my $seq = $_[1];
+	my $dir = $_[2];
+	my @fstarts;
+	my @fstops;
+	my $codon;
+	my @frame1starts;
+	my @frame2starts;
+	my @frame3starts;
+	my @frame1stops;
+	my @frame2stops;
+	my @frame3stops;
+	my $length1;
+	my $start1;
+	my $stop1;
+	my $length2;
+	my $start2;
+	my $stop2;
+	my $length3;
+	my $start3;
+	my $stop3;
+	
+	while ($seq =~/ATG/ig){
+        push (@fstarts , pos ($seq)-3);
+	}
+    while ($seq =~/TAG|TAA|N|TGA/ig){
+        push (@fstops , pos ($seq));								
+    }   
+    while (@fstarts){
+        my $codon = shift @fstarts;
+        if ($codon%3 == 0){												
+            push @frame1starts, $codon;
+            }
+        elsif ($codon%3==1){
+            push @frame2starts, $codon;
+            }
+        elsif ($codon%3==2){
+            push @frame3starts, $codon;
+            }
+        else {
+            die "Illegal Modulus Return";
+            }
+        }  
+	while (@fstops){
+		my $codon = shift @fstops;
+		if ($codon%3 == 0){
+            push @frame1stops, $codon-1;
+            }
+        elsif ($codon%3==1){
+            push @frame2stops, $codon-1;
+            }
+        elsif ($codon%3==2){
+            push @frame3stops, $codon-1;
+            }
+        else {
+            die "Illegal Modulus Return";
+            }
+        }
#####This is the problem section ##########
+	foreach (@frame1starts){
+		$start1 = shift(@frame1starts);
+		$stop1 = shift (@frame1stops);
+		$length1 = ($stop1-$start1)+1;
+		if ($start1 < $stop1){
+			if(($length1) >= 210){
+				print OUT "Frame 1:  ",$header," Direction: ",$dir,"\n";
+				my $substring = substr($seq,$start1,$length1);
+				print OUT "   ",$substring."\n";
+				$start1 = shift(@frame1starts);
+                               $stop1 = shift(@frame1stops);
+			}
+			elsif(($length1) < 210 && ($length1) > 0){
+				$stop1 = shift(@frame1stops);
+			}
+		}
+		else{
+			$stop1=shift(@frame1stops);
+		}
+	}
######## Below this is the same procedure as above, for next frame #######
+	foreach (@frame2starts){
+		$start2 = shift(@frame2starts);
+		$stop2 = shift (@frame2stops);
+		$length2 = ($stop2-$start2)+1;
+		if ($start2 < $stop2){
+			if(($length2) >= 210){
+				print OUT "Frame 2:  ",$header," Direction: ",$dir,"\n";
+				my $substring = substr($seq,$start2,$length2);
+				print OUT "   ",$substring."\n";
+				$stop2 = shift(@frame2stops);
+			}
+			elsif(($length2) < 210 && ($length2) > 0){
+				$stop2 = shift(@frame2stops);
+			}
+		}
+		else{
+			$stop2=shift(@frame2stops);
+		}
+	}
+	foreach (@frame3starts){
+		$start3 = shift(@frame3starts);
+		$stop3 = shift (@frame3stops);
+		$length3 = ($stop3-$start3)+1;
+		if ($start3 < $stop3){
+			if(($length3) >= 210){
+				print OUT "Frame 3:  ",$header," Direction: ",$dir,"\n";
+				my $substring = substr($seq,$start3,$length3);
+				print OUT "   ",$substring."\n";
+				$stop3 = shift(@frame3stops);
+			}
+			elsif(($length3) < 210 && ($length3) > 0){
+				$stop3 = shift(@frame3stops);
+			}
+		}
+		else{
+			$stop3=shift(@frame3stops);
+		}
+	}
+}
+
########this is the 2nd sub called, almost exactly identical except #######
########the sequence read in is reversed and translated to the complement #######

+sub get_rORFs_of_210_nt{
+	my $rheader = $_[0];
+	my $rseq = $_[1];
+	my $rdir = $_[2];
+	my @rfstarts;
+	my @rfstops;
+	my $rcodon;
+	my @rframe1starts;
+	my @rframe2starts;
+	my @rframe3starts;
+	my @rframe1stops;
+	my @rframe2stops;
+	my @rframe3stops;
+	my $rlength1;
+	my $rstart1;
+	my $rstop1;
+	my $rlength2;
+	my $rstart2;
+	my $rstop2;
+	my $rlength3;
+	my $rstart3;
+	my $rstop3;
+	
+	$rseq = reverse($rseq);
+	$rseq = ~tr/ATGCatgc/TACGtagc/;
#####the rest is the same as the first sub######
The subroutine in question is supposed to function as such:
  • The starts and stops for each frame are put into corresponding arrays
  • The first start and stop needs to be shifted into a scalar of some sort - this is the first spot I have trouble with
  • If the start is before the stop and at least 210 base pairs away, then it is printed along with the substring - this works fine.
  • BOTH the start and stop are shifted, and the loop should start over
  • If the start is prior to the stop, but less than 210 away, only the stop should shift, then start loop again
  • If the start is after the stop, shift the stop and re-loop as before
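
The steps above can be sketched with two index counters instead of shifting, so nothing is consumed while it is being examined. This is only an illustration of the rules as listed, not the assignment's reference answer; pair_orfs and the sample positions are hypothetical, and the length is stop - start + 1 as in the script.

```perl
use strict;
use warnings;

# Hypothetical helper: pairs start and stop positions following the rules listed above.
sub pair_orfs {
    my ($starts, $stops, $min_len) = @_;
    my @orfs;
    my ($i, $j) = (0, 0);    # current index into @$starts and @$stops

    while ($i < @$starts && $j < @$stops) {
        my ($start, $stop) = ($starts->[$i], $stops->[$j]);
        my $length = $stop - $start + 1;

        if ($start < $stop && $length >= $min_len) {
            # Long enough: record the pair and advance both lists.
            push @orfs, [$start, $stop];
            $i++;
            $j++;
        }
        else {
            # Too short, or the stop sits before the start: advance the stop only.
            $j++;
        }
    }
    return @orfs;
}

# Made-up positions for one reading frame.
my @orfs = pair_orfs([0, 300], [9, 239, 600], 210);
print "start $_->[0] stop $_->[1]\n" for @orfs;
```

Because the counters only move forward, a start position of 0 is handled like any other value and the loop always terminates.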

I'm getting different errors no matter which way I try. I still am awful with handling uninitialized variables - when I get the code to actually work, it is only due to minutes of scrolling uninitialized-value warning messages. Also, I don't think I can get the arrays to loop correctly and shift - should I foreach the stops, or while for stops?

Don't need an answer or anything like that, but everyone I have talked to has been useless. Just some hints would be amazing.

Rohaq
Aug 11, 2006
Did you copy that out of a unified diff or something? What's with all the pluses?

Other than that, please don't dick around with "#####the rest is the same as the first sub######" when trying to post a code example, because using 'the same as the first sub' after stripping those pluses out results in the following:

code:
Global symbol "$seq" requires explicit package name at wtf.pl line 186.
Global symbol "@fstarts" requires explicit package name at wtf.pl line 187.
Global symbol "$seq" requires explicit package name at wtf.pl line 187.
Global symbol "$seq" requires explicit package name at wtf.pl line 189.
Global symbol "@fstops" requires explicit package name at wtf.pl line 190.
Global symbol "$seq" requires explicit package name at wtf.pl line 190.
Global symbol "@fstarts" requires explicit package name at wtf.pl line 192.
Global symbol "@fstarts" requires explicit package name at wtf.pl line 193.
Global symbol "@frame1starts" requires explicit package name at wtf.pl line 195.

Global symbol "@frame2starts" requires explicit package name at wtf.pl line 198.

Global symbol "@frame3starts" requires explicit package name at wtf.pl line 201.

Global symbol "@fstops" requires explicit package name at wtf.pl line 207.
Global symbol "@fstops" requires explicit package name at wtf.pl line 208.
Global symbol "@frame1stops" requires explicit package name at wtf.pl line 210.
Global symbol "@frame2stops" requires explicit package name at wtf.pl line 213.
Global symbol "@frame3stops" requires explicit package name at wtf.pl line 216.
Global symbol "@frame1starts" requires explicit package name at wtf.pl line 223.

Global symbol "$start1" requires explicit package name at wtf.pl line 224.
Global symbol "@frame1starts" requires explicit package name at wtf.pl line 224.

Global symbol "$stop1" requires explicit package name at wtf.pl line 225.
Global symbol "@frame1stops" requires explicit package name at wtf.pl line 225.
Global symbol "$length1" requires explicit package name at wtf.pl line 226.
Global symbol "$stop1" requires explicit package name at wtf.pl line 226.
Global symbol "$start1" requires explicit package name at wtf.pl line 226.
Global symbol "$start1" requires explicit package name at wtf.pl line 227.
Global symbol "$stop1" requires explicit package name at wtf.pl line 227.
Global symbol "$length1" requires explicit package name at wtf.pl line 228.
Global symbol "$header" requires explicit package name at wtf.pl line 229.
Global symbol "$dir" requires explicit package name at wtf.pl line 229.
Global symbol "$seq" requires explicit package name at wtf.pl line 230.
Global symbol "$start1" requires explicit package name at wtf.pl line 230.
Global symbol "$length1" requires explicit package name at wtf.pl line 230.
Global symbol "$start1" requires explicit package name at wtf.pl line 232.
Global symbol "@frame1starts" requires explicit package name at wtf.pl line 232.

Global symbol "$stop1" requires explicit package name at wtf.pl line 233.
Global symbol "@frame1stops" requires explicit package name at wtf.pl line 233.
Global symbol "$length1" requires explicit package name at wtf.pl line 235.
Global symbol "$length1" requires explicit package name at wtf.pl line 235.
Global symbol "$stop1" requires explicit package name at wtf.pl line 236.
Global symbol "@frame1stops" requires explicit package name at wtf.pl line 236.
Global symbol "$stop1" requires explicit package name at wtf.pl line 240.
Global symbol "@frame1stops" requires explicit package name at wtf.pl line 240.
Global symbol "@frame2starts" requires explicit package name at wtf.pl line 244.

Global symbol "$start2" requires explicit package name at wtf.pl line 245.
Global symbol "@frame2starts" requires explicit package name at wtf.pl line 245.

Global symbol "$stop2" requires explicit package name at wtf.pl line 246.
Global symbol "@frame2stops" requires explicit package name at wtf.pl line 246.
Global symbol "$length2" requires explicit package name at wtf.pl line 247.
Global symbol "$stop2" requires explicit package name at wtf.pl line 247.
Global symbol "$start2" requires explicit package name at wtf.pl line 247.
Global symbol "$start2" requires explicit package name at wtf.pl line 248.
Global symbol "$stop2" requires explicit package name at wtf.pl line 248.
Global symbol "$length2" requires explicit package name at wtf.pl line 249.
Global symbol "$header" requires explicit package name at wtf.pl line 250.
Global symbol "$dir" requires explicit package name at wtf.pl line 250.
Global symbol "$seq" requires explicit package name at wtf.pl line 251.
Global symbol "$start2" requires explicit package name at wtf.pl line 251.
Global symbol "$length2" requires explicit package name at wtf.pl line 251.
Global symbol "$stop2" requires explicit package name at wtf.pl line 253.
Global symbol "@frame2stops" requires explicit package name at wtf.pl line 253.
Global symbol "$length2" requires explicit package name at wtf.pl line 255.
Global symbol "$length2" requires explicit package name at wtf.pl line 255.
Global symbol "$stop2" requires explicit package name at wtf.pl line 256.
Global symbol "@frame2stops" requires explicit package name at wtf.pl line 256.
Global symbol "$stop2" requires explicit package name at wtf.pl line 260.
Global symbol "@frame2stops" requires explicit package name at wtf.pl line 260.
Global symbol "@frame3starts" requires explicit package name at wtf.pl line 263.

Global symbol "$start3" requires explicit package name at wtf.pl line 264.
Global symbol "@frame3starts" requires explicit package name at wtf.pl line 264.

Global symbol "$stop3" requires explicit package name at wtf.pl line 265.
Global symbol "@frame3stops" requires explicit package name at wtf.pl line 265.
Global symbol "$length3" requires explicit package name at wtf.pl line 266.
Global symbol "$stop3" requires explicit package name at wtf.pl line 266.
Global symbol "$start3" requires explicit package name at wtf.pl line 266.
Global symbol "$start3" requires explicit package name at wtf.pl line 267.
Global symbol "$stop3" requires explicit package name at wtf.pl line 267.
Global symbol "$length3" requires explicit package name at wtf.pl line 268.
Global symbol "$header" requires explicit package name at wtf.pl line 269.
Global symbol "$dir" requires explicit package name at wtf.pl line 269.
Global symbol "$seq" requires explicit package name at wtf.pl line 270.
Global symbol "$start3" requires explicit package name at wtf.pl line 270.
Global symbol "$length3" requires explicit package name at wtf.pl line 270.
Global symbol "$stop3" requires explicit package name at wtf.pl line 272.
Global symbol "@frame3stops" requires explicit package name at wtf.pl line 272.
Global symbol "$length3" requires explicit package name at wtf.pl line 274.
Global symbol "$length3" requires explicit package name at wtf.pl line 274.
Global symbol "$stop3" requires explicit package name at wtf.pl line 275.
Global symbol "@frame3stops" requires explicit package name at wtf.pl line 275.
Global symbol "$stop3" requires explicit package name at wtf.pl line 279.
Global symbol "@frame3stops" requires explicit package name at wtf.pl line 279.
wtf.pl had compilation errors.
Because the second sub doesn't have the same variable names, and I'm not going to do a selective find and replace to fix it.

Give us something we don't have to hack together to get running, and maybe we can offer some advice?

uG
Apr 23, 2003

by Ralp
The first thing I'd do is make it so you aren't shifting a loop's array multiple times in a single iteration.

Also, the foreach loop you are using walks the array by index rather than working on a copy, so your shifts are going to affect the loop, just not like you think: it will skip elements and finish early. I imagine you want a while loop.

uG fucked around with this message at 01:24 on Oct 30, 2012

The Gripper
Sep 14, 2004
i am winner
Shifting values off the array you're iterating over isn't a good idea:
Perl code:
my @x = (1,2,3,4,5,6);

foreach (@x) {
    my $y = shift @x;
    print $y."\n";
}
#Output:
#1
#2
#3
The foreach will keep track of the index it's up to and if the array has been shortened by shift it will end (current index > length of array).

I don't actually know/can't tell what the intended result is of those loops because of the brokenness of them, but if you're hellbent on consuming the array you could use:
Perl code:
my @x = (1,2,3,4,5,6);
my @y = (7,8,9,10,11,12);

while (my $start1 = shift @x) {
    my $stop1 = shift @y;
    print "start: $start1 end: $stop1\n";
}
#Output:
#start: 1 end: 7
#start: 2 end: 8
#start: 3 end: 9
#start: 4 end: 10
#start: 5 end: 11
#start: 6 end: 12
but even then I don't know why you're shifting off @frame2stops a second time so I can't offer any advice past that. Maybe it will just work?
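
One caveat with a loop condition like the one above: 0 is false in Perl, so a start position of 0 coming off the array would end the loop early. A minimal sketch that tests the array sizes instead, with made-up positions:

```perl
use strict;
use warnings;

my @starts = (0, 12, 30);
my @stops  = (9, 27, 45);
my @pairs;

# Loop on the array sizes, so a value of 0 is handled like any other.
while (@starts && @stops) {
    my $start = shift @starts;
    my $stop  = shift @stops;
    push @pairs, "start: $start end: $stop";
}

print "$_\n" for @pairs;
```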

The Gripper fucked around with this message at 01:52 on Oct 30, 2012

SaxMaverick
Jun 9, 2005

The stuff of nightmares
I apologize for the bad code and how I formatted it, maybe let me break it down a little better, then I will fix up my complete code. Is there a better way to post it than pasting it all here?

Rohaq
Aug 11, 2006
PasteBin or similar, plus you can include syntax highlighting.

SaxMaverick
Jun 9, 2005

The stuff of nightmares
Thanks. As you can tell this is bioinformatics being taught by biology trained people, so we're learning syntax/regex as its needed, instead of incrementally. It's not majorly important in the long run, but since the other section of the course is in FORTRAN (which I spent four weeks on this summer), I went ahead and got a good book on perl to teach myself instead, because basically, anything is better than fortran.

The Gripper
Sep 14, 2004
i am winner
Does anyone actually use Fortran unless they're forced to, anymore?

Also I just noticed that your tr/x/y/ line is probably wrong unless you want lowercase "gc" to be untouched.
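
For reference, a minimal sketch of a reverse complement that handles both cases. The sub name is hypothetical. Note the binding operator is written =~ with no space; with a space it becomes an assignment of a bitwise-negated tr result, and tr then runs on $_ rather than on the sequence.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script.
sub reverse_complement {
    my ($seq) = @_;
    my $rc = reverse $seq;           # scalar context: reverses the string
    $rc =~ tr/ATGCatgc/TACGtacg/;    # =~ binds tr to $rc; note "tacg", not "tagc"
    return $rc;
}

print reverse_complement('ATGCatgc'), "\n";
```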

SaxMaverick
Jun 9, 2005

The stuff of nightmares

The Gripper posted:

Does anyone actually use Fortran unless they're forced to, anymore?

Also I just noticed that your tr/x/y/ line is probably wrong unless you want lowercase "gc" to be untouched.

It's actually one of the most powerful methods for large scale genomic selection in animal/plant breeding (I'm in animal genetics, but I'm more of a molecular guy). Taking 50,000 genetic markers, and using matrix algebra to calculate breeding values and make selection decisions. Let me tell you about the thrill of waiting five hours to invert a 1,000x50,000 matrix.

Roseo
Jun 1, 2000
Forum Veteran

SaxMaverick posted:

It's actually one of the most powerful methods for large scale genomic selection in animal/plant breeding (I'm in animal genetics, but I'm more of a molecular guy). Taking 50,000 genetic markers, and using matrix algebra to calculate breeding values and make selection decisions. Let me tell you about the thrill of waiting five hours to invert a 1,000x50,000 matrix.

General comments, nothing specific to fixing your code.

code:
open ( my $fasta_in , "<" , $file_in) || die $!;
You can open a file with a lexical (scalar) filehandle, which gives it a scope. You don't need to close it explicitly, for example; Perl does so when the handle goes out of scope. You are using three-arg open, though, and that's good.
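
A minimal sketch of what I mean. The sub name and the sample file are made up, and the temp file only exists so the example runs on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a tiny made-up FASTA file so the example is self-contained.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} ">scaffold1\nacgt\nttga\n";
close $tmp_fh or die "close failed: $!";

sub count_sequence_lines {
    my ($file) = @_;
    open(my $fasta_in, '<', $file) or die "Cannot open $file: $!";
    my $count = 0;
    while (my $line = <$fasta_in>) {
        $count++ unless $line =~ /^>/;    # skip header lines
    }
    return $count;    # $fasta_in goes out of scope here and is closed automatically
}

print count_sequence_lines($path), "\n";
```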

code:
if ($line =~/^>/){
		$key = $line;
		}

$fasta_hash{$key} .= uc($line);
...
my @scaffolds = keys %fasta_hash;

foreach (@scaffolds){  get_ORFs_of_210_nt($_,$fasta_hash{$_},$direction) }
What does %fasta_hash mean? Does it contain scaffolds? It's implicitly a hash, you say so with the sigil. I know it's the fasta file, but you've done some processing on it. Would something like the below make more sense?


code:
if ($line =~/^>/){
		$scaffold = $line;	# $scaffold must be declared outside the read loop so it survives between lines
		}

$scaffolds{$scaffold} .= uc($line);
...

for my $scaffold (keys %scaffolds){  get_ORFs_of_210_nt($scaffold,$scaffolds{$scaffold},$direction) }
Predeclaring variables is something I personally hate, as it leads to scope issues that bite you in the rear end. Declare them as you need them.

code:
for my $thingy (qw/foo bar baz/) {
  my $value = process_thingy($thingy);
  if ($value) {
    my $result = do_stuff_with_thingy($thingy);
  }
}
code:
	my @frame1starts;
	my @frame2starts;
	my @frame3starts;
	my @frame1stops;
	my @frame2stops;
	my @frame3stops;
	my $length1;
	my $start1;
	my $stop1;
	my $length2;
	my $start2;
	my $stop2;
	my $length3;
	my $start3;
	my $stop3;
This could probably be better handled with a data structure.
code:
my @framestops = ( [ qw/12 23 34/ ],
                   [ qw/32 43 54/ ],
                   [ qw/13 24 36/ ],
                 );

for my $stop (@{$framestops[0]}) {
  ... ## do stuff with @frame1stops -- 12, 23, 34.
}
code:
foreach (@frame1starts){
		$start1 = shift(@frame1starts);
		$stop1 = shift (@frame1stops);
}
This is what I mean by scope issues. Every time through, if you don't assign to start1 or stop1, it'll use the previous values. Is this intended? You could do:

code:
foreach (@frame1starts){
		my $start = shift(@frame1starts);
		my $stop = shift (@frame1stops);
}
It's implicitly a 'start1' or 'stop1' because it's implicitly in frame1starts.


You're doing the same thing three times with three sets of global variables, differing only by the frame. What about something like:

code:
my @frame1starts = ( { start => 1,
                       stop  => 10,
                     },
                   );

for my $frame (@frame1starts) {
  process_start(1, $frame);
}
for my $frame (@frame2starts) {
  process_start(2, $frame);
}
for my $frame (@frame3starts) {
  process_start(3, $frame);
}

sub process_start {
  my ($frame_no, $frame) = @_;
  my $length = $frame->{stop} - $frame->{start} + 1;
  if ($frame->{start} < $frame->{stop}){
    if($length >= 210){
      print $stdout "Frame $frame_no:  ";
      ... #other stuff... I really can't figure out what you're trying to do
    }
  }
}
That's aside from the oddity of iterating over an array while shifting from it inside the loop. That's really odd, as mentioned by others.
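
One way to avoid the shift-inside-foreach problem is to walk both arrays by index. This is only a sketch with made-up coordinates; the 210 nt cutoff comes from your sub name, and the @orfs array is my own invention:

```perl
use strict;
use warnings;

# Made-up start/stop coordinates for one reading frame.
my @starts = (1, 50, 300);
my @stops  = (40, 290, 900);

my @orfs;
for my $i (0 .. $#starts) {
    # Read the pair by index; neither array is modified during the loop.
    my ($start, $stop) = ($starts[$i], $stops[$i]);
    next unless defined $stop && $start < $stop;

    my $length = $stop - $start + 1;
    push @orfs, { start => $start, stop => $stop, length => $length }
        if $length >= 210;
}

print "$_->{start}..$_->{stop} ($_->{length} nt)\n" for @orfs;
```

Because nothing is removed from @starts while the loop runs, every start/stop pair gets looked at exactly once.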


Have you looked into BioPerl?

Ninja Rope
Oct 22, 2005

Wee.

Roseo posted:

You can open a file with a scalar filehandle, which gives it implicit scope. You don't need to close it, for example; it'll do so when it goes out of scope. You are using three-arg open, though, and that's good.

Doesn't it get closed when Perl GC's the variable? Which doesn't necessarily have to happen right as it goes out of scope, it could happen later (or possibly not until the program exits?). In practice it will close the filehandle pretty much right after it goes out of scope but I don't believe that's a guarantee. If you need the handle closed it's best to close it explicitly.

Roseo
Jun 1, 2000
Forum Veteran

Ninja Rope posted:

Doesn't it get closed when Perl GC's the variable? Which doesn't necessarily have to happen right as it goes out of scope, it could happen later (or possibly not until the program exits?). In practice it will close the filehandle pretty much right after it goes out of scope but I don't believe that's a guarantee. If you need the handle closed it's best to close it explicitly.

If you have circular references, yes, in which case you can use Scalar::Util's weaken. It's still better than bareword filehandles. Generally, it'll do the right thing.

welcome to hell
Jun 9, 2006
Perl uses reference counting for cleaning up variables. In the absence of circular or external references, variables are guaranteed to be cleaned up immediately when they go out of scope.

When a data structure or object has a (non-weakened) circular reference, it isn't guaranteed when it will be destroyed. In practice though, this will only happen if the references are manually broken, or on program exit. The documentation states that a more complete garbage collection strategy could be implemented, but there hasn't been any movement on that I'm aware of.
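
Here's a small demo of the difference, as a sketch only. The Node class and the @destroyed log are made up for illustration; weaken is the real function from Scalar::Util.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my @destroyed;   # names of objects whose DESTROY has run so far

{
    package Node;   # hypothetical class, only for this demo
    sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub DESTROY { my ($self) = @_; push @destroyed, $self->{name} }
}

{
    my $plain = Node->new('plain');
}   # last reference gone, so DESTROY runs right here

{
    my $parent = Node->new('cycle_parent');
    my $child  = Node->new('cycle_child');
    $parent->{child} = $child;
    $child->{parent} = $parent;   # strong cycle: neither count reaches zero
}

{
    my $parent = Node->new('weak_parent');
    my $child  = Node->new('weak_child');
    $parent->{child} = $child;
    $child->{parent} = $parent;
    weaken($child->{parent});     # the back-reference no longer counts
}

print "destroyed so far: @destroyed\n";
```

At the print, the plain object and both weakened nodes have been destroyed, while the two nodes in the strong cycle hang around until the program exits.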

Ninja Rope
Oct 22, 2005

Wee.

Haarg posted:

variables are guaranteed to be cleaned up immediately when they go out of scope

Not that I don't believe you, but can you show me where it's documented? I'd like to know more.

uG
Apr 23, 2003

by Ralp
I'm having some XS troubles. First, let me present the code:
http://pastebin.com/SeMGSuA8

If you strip out the cxs_edistance function and the Perl headers, it will compile and return the correct result. This comes as a surprise to me because my Perl tests have been failing and I thought my C was bad.

As you can see, I hard coded the same values in the function main as I did to the end of cxs_edistance. This was based on a guess that the data was getting passed in wrong from my wrapper function, but proved that the problem lies elsewhere.

The error is a segmentation fault, and it happens the first time it hits line 58. I don't mean to make this into a 'debug my C code' post inside the Perl thread, especially since the C code appears to work on its own. Just kinda wondering where I should look next, since I've apparently been tweaking C code for no reason.

perl Makefile.PL and make test always fail (it never gives a reason; I believe the testing module craps out entirely on XS segfaults?), and a forced install (of this code as a module) always results in a segfault for whatever script calls the exported function.

homercles
Feb 14, 2010

What happens if you declare your methods static? You're polluting the global namespace for no reason.

homercles
Feb 14, 2010

Ninja Rope posted:

Not that I don't believe you, but can you show me where it's documented? I'd like to know more.
There are some good quotes in the 5.14.* documentation. They were removed in 5.16; I don't know why, and I don't much care either.

URLs of interest are:
http://perldoc.perl.org/5.14.2/perlref.html

perlref 5.14.2 posted:

Hard references are smart--they keep track of reference counts for you, automatically freeing the thing referred to when its reference count goes to zero. (Reference counts for values in self-referential or cyclic data structures may not go to zero without a little help; see Two-Phased Garbage Collection in perlobj for a detailed explanation.) If that thing happens to be an object, the object is destructed. See perlobj for more about objects. (In a sense, everything in Perl is an object, but we usually reserve the word for references to objects that have been officially "blessed" into a class package.)

http://perldoc.perl.org/5.14.2/perlobj.html#Two-Phased-Garbage-Collection

That last link doesn't state the GC rules in English but it does explicitly enumerate them.
