- Anaconda Rifle
- Mar 23, 2007
-
-
Yam Slacker
|
code:%bar = reverse %bar;
What do Perl newbies think this does?
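For anyone following along, here is a minimal sketch of what that line does, assuming the hash has unique values (the sample data is made up for illustration). In list context the hash flattens to a key/value list, `reverse` flips that list end to end, and assigning it back turns each old value into a key.

```perl
use strict;
use warnings;

# Hypothetical sample data; any hash with unique values behaves the same way.
my %bar = ( apple => 'red', banana => 'yellow' );

# In list context %bar flattens to (key, value, key, value, ...).
# reverse() flips the whole list, so each pair comes back as (value, key).
%bar = reverse %bar;

for my $key ( sort keys %bar ) {
    print "$key => $bar{$key}\n";
}
```

If two keys share a value, only one of them survives the inversion, so this is only a safe idiom when the values are known to be unique.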
|
Oct 9, 2012 20:10
|
|
- Rohaq
- Aug 11, 2006
-
|
You worked that out without running it or Googling? Very nicely done! The "..|" stumped many people, including some experienced Perl Monks, when I used it as my Perl signature (back when such a thing was common... is it still?) Mostly, it was the capitalizing of the "A" in Ace that they couldn't get a grip on without running the code.
I had to look up \u on Perldocs, but I recognised it as a special escape character, that was about it. The ..| didn't faze me though.
I love Perl. I especially love regular expressions in Perl.
Damned skippy. You can do it in almost any other language too, but Perl's built-in regex operators make it all so easy. Being able to do mass replacements with regex in a single one-liner is something I've not seen in other languages either.
I hear a lot of bad mouthing when it comes to Perl, and I can understand why; Perl lets you write some loving abysmal code, but it doesn't mean that you have to code loving abysmally.
In other news, I am becoming known as The Regex King in my workplace. If people want it parsed, transformed, extracted, or migrated from one system to another, they seem to be coming to me.
|
Oct 9, 2012 20:46
|
|
- Anaconda Rifle
- Mar 23, 2007
-
-
Yam Slacker
|
Damned skippy. You can do it in almost any other language too, but Perl's built-in regex operators make it all so easy. Being able to do mass replacements with regex in a single one-liner is something I've not seen in other languages either.
Last I checked, Ruby allows you to do it too, but it's not The Ruby Way.
In other news, I am becoming known as The Regex King in my workplace. If people want it parsed, transformed, extracted, or migrated from one system to another, they seem to be coming to me.
Same here. I still have people coming to me from previous jobs for help. If they would just put in a few hours to learn and then practice every now and then, they would see that regexes aren't magical or difficult.
|
Oct 9, 2012 20:50
|
|
- Rohaq
- Aug 11, 2006
-
|
Last I checked, Ruby allows you to do it too, but it's not The Ruby Way.
Ah, Ruby isn't a language I've looked into picking up. I am loving Python though, if only for its knack to make me feel utterly retarded when I write out masses of code, then end up reducing it into about 6 lines.
Same here. I still have people coming to me from previous jobs for help. If they would just put in a few hours to learn and then practice every now and then, they would see that regexes aren't magical or difficult.
They're not difficult to begin with, but I've found that no matter how well I think I know them, there's always something new to pick up.
My manager was looking at a regex I was using to extract info out of a log file with comma separated fields, and asked why I was using "([^,]+)," instead of "(.+?)," as a matching group. I explained that matching everything that wasn't a "," character greedily up until a "," was far more efficient than matching every character non-greedily up until the first instance of ",", and stepped him through the process in Regex Coach* to show it. Apparently he's been using Perl and regex for years and didn't realise this.
* Speaking of which, Regex Coach is by far my favourite regex tester, and is some seriously awesome software. It's not been updated in a long time though, and is missing some of the more recent features of the PCRE engine, like named groups, etc. Does anybody know of any other decent PCRE compatible regex checkers with the same level of functionality?
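A rough way to check the efficiency claim without a regex stepper is the core Benchmark module. This is only a sketch: the log line is made up, and how big the gap is (or whether there is one at all) depends on the perl version and the input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up log line standing in for the comma-separated log format.
my $line = join( ',', map { "field$_" x 5 } 1 .. 8 ) . ",\n";

my $negated = qr/([^,]+),/;    # greedy run of non-commas, then the comma
my $lazy    = qr/(.+?),/;      # grow one character at a time, testing for the comma

# Both patterns should pull the same fields out of this particular line.
my @from_negated = $line =~ /$negated/g;
my @from_lazy    = $line =~ /$lazy/g;
print "same captures: ", ( "@from_negated" eq "@from_lazy" ? "yes" : "no" ), "\n";

# Compare throughput; the numbers vary by machine and perl version.
cmpthese( 100_000, {
    negated_class => sub { my @f = $line =~ /$negated/g },
    lazy_dot      => sub { my @f = $line =~ /$lazy/g },
} );
```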
Rohaq fucked around with this message at 10:45 on Oct 10, 2012
|
Oct 9, 2012 21:09
|
|
- Anaconda Rifle
- Mar 23, 2007
-
-
Yam Slacker
|
My manager was looking at a regex I was using to extract info out of a log file with comma separated fields, and asked why I was using "([^,]+)," instead of "(.+?)," as a matching group. I explained that matching everything that wasn't a "," character greedily up until a "," was far more efficient than matching every character non-greedily up until the first instance of ",", and stepped him through the process in Regex Coach* to show it. Apparently he's been using Perl and regex for years and didn't realise this.
It's not just an efficiency thing. /"(.+?)",/ and /"([^"]+)",/ can get different results. It's subtle, but extremely important.
I changed your example intentionally to make a point.
|
Oct 9, 2012 21:14
|
|
- uG
- Apr 23, 2003
-
by Ralp
|
Question on module naming...
I have a module, xxx::xxx::damerau. I want to make an automaton in it so I can feed it a list and ultimately not have to actually run the algorithm on every item in the list/tree.
My question is: should I include this automaton in the module itself as a new method, or should I create a new module xxx::xxx::damerau::automaton, even though I'm pretty sure I won't be able to use the exact code in xxx::xxx::damerau in the automaton?
|
Oct 9, 2012 22:21
|
|
- tef
- May 30, 2004
-
-> some l-system crap ->
|
Oh, I did but I was just doing one-liners on the command line, that's funny
I learned about sort in a void context from a crazed intercal programmer.
|
Oct 10, 2012 01:45
|
|
- leedo
- Nov 28, 2000
-
|
It's not just an efficiency thing. /"(.+?)",/ and /"([^"]+)",/ can get different results. It's subtle, but extremely important.
I changed your example intentionally to make a point.
[^"] matches newlines by default whereas . doesn't? Took me a while of staring to even come up with a guess.
|
Oct 20, 2012 04:57
|
|
- ShoulderDaemon
- Oct 9, 2003
-
support goon fund
-
Taco Defender
|
[^"] matches newlines by default whereas . doesn't? Took me a while of staring to even come up with a guess.
The regex /"(.+?)",/ matches the string '""",' but the regex /"([^"]+)",/ does not.
|
Oct 20, 2012 06:06
|
|
- Anaconda Rifle
- Mar 23, 2007
-
-
Yam Slacker
|
Run /"(.+?)",/ and /"([^"]+)",/ against the following string:
a"b"c"d",
The first will return "b"c"d", and the second returns "d",.
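To see it for yourself, here is a small sketch that runs both patterns against the two test strings from this exchange. The helper name `first_match` is made up for the example.

```perl
use strict;
use warnings;

my $lazy    = qr/"(.+?)",/;
my $negated = qr/"([^"]+)",/;

# Hypothetical helper: returns the whole matched text, or undef on no match.
sub first_match {
    my ( $re, $string ) = @_;
    return $string =~ $re ? $& : undef;
}

for my $string ( 'a"b"c"d",', '""",' ) {
    for my $case ( [ lazy => $lazy ], [ negated => $negated ] ) {
        my ( $name, $re ) = @$case;
        my $match = first_match( $re, $string );
        print "$name against $string gives ",
            ( defined $match ? $match : 'no match' ), "\n";
    }
}
```

The lazy version starts at the first quote and keeps expanding until it finally sees `",`, so it can swallow embedded quotes. The negated class cannot cross a quote, so it has to give up and retry from a later starting point.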
|
Oct 20, 2012 11:43
|
|
- Rohaq
- Aug 11, 2006
-
|
I should have probably said; my regex contained no quote marks, I just put them in quote marks in my post.
But to my other question: Anybody got any favourite regex testers in here?
|
Oct 21, 2012 03:28
|
|
- Sebbe
- Feb 29, 2004
-
|
But to my other question: Anybody got any favourite regex testers in here?
I'm quite fond of RegexBuddy. I've used it so many times for debugging regexes, and it supports a ton of different regex implementations.
|
Oct 21, 2012 20:48
|
|
- The Gripper
- Sep 14, 2004
-
i am winner
|
I don't find myself testing regexps often enough to install anything, so I just use http://rubular.com/
|
Oct 21, 2012 20:51
|
|
- homercles
- Feb 14, 2010
-
|
If only there was a perl module that processed text files formatted as a csv. And if only there was an xs version of it so it ran quickly.
I wonder what that module would be called? All right all right it's a log file with comma-joined fields not a CSV but still...
homercles fucked around with this message at 21:16 on Oct 21, 2012
|
Oct 21, 2012 21:13
|
|
- Nevergirls
- Jul 4, 2004
-
It's not right living this way, not letting others know what's true and what's false.
|
I should have probably said; my regex contained no quote marks, I just put them in quote marks in my post.
But to my other question: Anybody got any favourite regex testers in here?
Regexp::Debugger, like any good DCONWAY module, is insane. It's brand new and I've only used it a couple times when my lovely one-liner isn't working (via rxrx). It allows you to step through what the engine is doing, not just get a match/non-match.
Here's a screencast from YAPC::NA:
https://www.youtube.com/watch?v=zcSFIUiMgAs
|
Oct 22, 2012 01:13
|
|
- Rohaq
- Aug 11, 2006
-
|
If only there was a perl module that processed text files formatted as a csv. And if only there was an xs version of it so it ran quickly.
I wonder what that module would be called? All right all right it's a log file with comma-joined fields not a CSV but still...
Yes yes, /Text::CSV(?:_XS)?/ would parse out comma separated values, but I also added support for multiple log formats as an input, and used named groups to pull out the relevant fields, so let's say I have the following:
STDIN:
code:blah1, blah2, thingiwant1, blah3, thingiwant2, thingiwant3
thingiwant1, thingiwant2, thingiwant3
I can do the following for much cleaner looking code than checking for the number of columns present etc:
code:my $re_longline_match = '^(?:[^,]+, ){2}(?<group1>[^,]+), [^,]+, (?<group2>[^,]+), (?<group3>.+)$';
my $re_shortline_match = '^(?<group1>[^,]+), (?<group2>[^,]+), (?<group3>[^,]+)$';
while (<STDIN>) {
    if ( $_ =~ /$re_longline_match/ || $_ =~ /$re_shortline_match/ ) {
        print 'Group 1: ', $+{group1}, "\n";
        print 'Group 2: ', $+{group2}, "\n";
        print 'Group 3: ', $+{group3}, "\n\n";
    }
}
Interestingly, the regex above in $re_longline_match fails under Regexp::Debugger's rxrx, but works in a script.
I'm quite fond of RegexBuddy. I've used it so many times for debugging regexes, and it supports a ton of different regex implementations.
I'd love to try it, but I'm not paying money for a tool where I can't test its functionality first, money back guarantee or not.
Rohaq fucked around with this message at 03:20 on Oct 22, 2012
|
Oct 22, 2012 02:31
|
|
- Rohaq
- Aug 11, 2006
-
|
I like The Regex Coach but the newer versions are Windows only.
Edit: It's free.
Yep, that's my favourite too, but it's not been updated in ages, and lacks features like named groups, etc.
It's also donationware, and free for personal/non-commercial use only, so trying to get it approved at work is a bitch.
'I'd like this software for my job.'
'How much does it cost?'
'Err, however much you like? $5? $15? I don't know.'
|
Oct 23, 2012 08:06
|
|
- The Gripper
- Sep 14, 2004
-
i am winner
|
Yep, that's my favourite too, but it's not been updated in ages, and lacks features like named groups, etc.
It's also donationware, and free for personal/non-commercial use only, so trying to get it approved at work is a bitch.
'I'd like this software for my job.'
'How much does it cost?'
'Err, however much you like? $5? $15? I don't know.'
It can be worse than an annoyance, depending on policy. A joint I worked at wouldn't approve donationware use because they couldn't account for the cost without an actual invoice, a donation receipt wasn't enough.
|
Oct 23, 2012 08:18
|
|
- Rohaq
- Aug 11, 2006
-
|
It can be worse than an annoyance, depending on policy. A joint I worked at wouldn't approve donationware use because they couldn't account for the cost without an actual invoice, a donation receipt wasn't enough.
It's a shame too, because it's really loving useful software, even when you just need to extract some info into something useful.
'Yeah, we need you to map data 1 against data 2, all the mappings are in an Excel table.'
*Copies table, paste into regex coach*
*Tick 'm' for multiline*
Regex: /^([^\t]+)\t(.+)$/
Replace with: '\1' => '\2',
*Paste hash layout into Perl script, surround with my %map_blah = ( <map> );*
code:if ( exists $map_blah{$extracted_data} ) {
    $extracted_data = $map_blah{$extracted_data};
} else {
    warn "No mapping for blah: '$extracted_data'.\n";
}
Much faster than manually typing that poo poo out.
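The same transformation can be done without leaving Perl. A minimal sketch, assuming the table arrives as tab-separated text (the sample rows and the `$table` / `$hash_body` names are made up):

```perl
use strict;
use warnings;

# Hypothetical two-column table as it might be pasted from a spreadsheet:
# columns separated by a tab, one mapping per line.
my $table = "alpha\tone\nbeta\ttwo\ngamma\tthree\n";

# Same pattern and replacement as in the steps above. /m makes ^ and $
# match at every line, /g repeats the substitution for each line.
# ${1} and ${2} are braced so the quote after them is never read as part
# of a variable name.
( my $hash_body = $table ) =~ s/^([^\t]+)\t(.+)$/'${1}' => '${2}',/mg;

print "my %map_blah = (\n$hash_body);\n";
```

Values that themselves contain a single quote would need escaping first; this sketch does not handle that.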
Rohaq fucked around with this message at 10:43 on Oct 23, 2012
|
Oct 23, 2012 08:29
|
|
- qntm
- Jun 17, 2009
-
|
A hash reference is a single scalar. Can you show some example code?
|
Oct 24, 2012 11:00
|
|
- uG
- Apr 23, 2003
-
by Ralp
|
Basically I have a function that takes a scalar (which returns a scalar) or an array reference (which returns a hash reference). Here is a test that tests passing a scalar: https://github.com/ugexe/Text--Levenshtein--Damerau/blob/master/t/02_dld.t
I'm trying to write a similar test for when an array is passed and a hash ref is returned. Here is code that doesn't work, but hopefully explains what I'm trying to do:
code:use Test::Base;
use Text::Levenshtein::Damerau;
plan tests => 1 * blocks;
my $tld = Text::Levenshtein::Damerau->new('four');
run {
my @input = $block->input;
# print $input[0] prints four
my @expected = $block->expected;
# print $expected[0] prints 0
my $hashref_of_both = ...# Might be pushing it here due to ordering?
# print $hashref_of_both->{'four'}; prints 0
# print $hashref_of_both->{'fuor'}; prints 1
# print $hashref_of_both->{'fourteen'}; prints 4
is( $tld->dld({ list => \@input }), $hashref_of_both )
}
__END__
=== test matching
--- input
four,fuor,fourteen # See below comment
--- expected
0,1,4 # This will be passed as a scalar naturally,
# but I want to pass it as a list
The only thing I can think of is to just split the strings and construct things myself in the run block, but I thought maybe Test::Base had a better way.
edit: As much as I like Test::Base, I just wrote the tests with Test::More instead.
code:use Test::More tests => 8;
use Text::Levenshtein::Damerau;
my $tld = Text::Levenshtein::Damerau->new('four');
#Test scalar argument
cmp_ok( $tld->dld('four'), '==', 0, 'test helper matching');
cmp_ok( $tld->dld('for'), '==', 1, 'test helper insertion');
cmp_ok( $tld->dld('fourth'),'==', 2, 'test helper deletion');
cmp_ok( $tld->dld('fuor'), '==', 1, 'test helper transposition');
cmp_ok( $tld->dld('fxxr'), '==', 2, 'test helper substitution');
cmp_ok( $tld->dld('FOuR'), '==', 3, 'test helper case');
cmp_ok( $tld->dld(''), '==', 4, 'test helper empty');
#Test array reference argument
my @list = ('four','fourty','fourteen','');
is_deeply($tld->dld({ list => \@list }), { four => 0, fourty => 2, fourteen => 4, '' => 4 }, 'test dld(\@array_ref)');
uG fucked around with this message at 01:18 on Oct 25, 2012
|
Oct 24, 2012 19:56
|
|
- SaxMaverick
- Jun 9, 2005
-
The stuff of nightmares
|
Alright, I have held back asking this question because it was due last Friday, but after searching forever and no help from a douchebag TA, I have to know how I can make this work.
code:+#!/usr/bin/perl
+use warnings;
+use strict;
+
+my $file_in = $ARGV[0];
+my $key;
+my $direction = "Forward";
+my $rdirection = "Reverse";
+my %fasta_hash ;
+
+open ( FASTA_IN , "<" , $file_in) || die $!;
+open ( OUT , ">", "output.txt") || die $!;
+## parse fasta file into a hash ##
+while ( my $line = <FASTA_IN>){
+ chomp $line;
+ if ($line =~/^>/){
+ $key = $line;
+ }
+
+ else{
+ $fasta_hash{$key} .= uc($line);
+ chomp $line;
+ }
+ }
+
+close (FASTA_IN);
+my @scaffolds = keys %fasta_hash;
+
+foreach (@scaffolds){
+ get_ORFs_of_210_nt($_,$fasta_hash{$_},$direction);
+ get_rORFs_of_210_nt($_,$fasta_hash{$_},$rdirection);
+ }
+
+exit;
+
+sub get_ORFs_of_210_nt{
+ my $header = $_[0];
+ my $seq = $_[1];
+ my $dir = $_[2];
+ my @fstarts;
+ my @fstops;
+ my $codon;
+ my @frame1starts;
+ my @frame2starts;
+ my @frame3starts;
+ my @frame1stops;
+ my @frame2stops;
+ my @frame3stops;
+ my $length1;
+ my $start1;
+ my $stop1;
+ my $length2;
+ my $start2;
+ my $stop2;
+ my $length3;
+ my $start3;
+ my $stop3;
+
+ while ($seq =~/ATG/ig){
+ push (@fstarts , pos ($seq)-3);
+ }
+ while ($seq =~/TAG|TAA|N|TGA/ig){
+ push (@fstops , pos ($seq));
+ }
+ while (@fstarts){
+ my $codon = shift @fstarts;
+ if ($codon%3 == 0){
+ push @frame1starts, $codon;
+ }
+ elsif ($codon%3==1){
+ push @frame2starts, $codon;
+ }
+ elsif ($codon%3==2){
+ push @frame3starts, $codon;
+ }
+ else {
+ die "Illegal Modulus Return";
+ }
+ }
+ while (@fstops){
+ my $codon = shift @fstops;
+ if ($codon%3 == 0){
+ push @frame1stops, $codon-1;
+ }
+ elsif ($codon%3==1){
+ push @frame2stops, $codon-1;
+ }
+ elsif ($codon%3==2){
+ push @frame3stops, $codon-1;
+ }
+ else {
+ die "Illegal Modulus Return";
+ }
+ }
#####This is the problem section ##########
+ foreach (@frame1starts){
+ $start1 = shift(@frame1starts);
+ $stop1 = shift (@frame1stops);
+ $length1 = ($stop1-$start1)+1;
+ if ($start1 < $stop1){
+ if(($length1) >= 210){
+ print OUT "Frame 1: ",$header," Direction: ",$dir,"\n";
+ my $substring = substr($seq,$start1,$length1);
+ print OUT " ",$substring."\n";
+ $start1 = shift(@frame1starts);
+ $stop1 = shift(@frame1stops);
+ }
+ elsif(($length1) < 210 && ($length1) > 0){
+ $stop1 = shift(@frame1stops);
+ }
+ }
+ else{
+ $stop1=shift(@frame1stops);
+ }
+ }
######## Below this is the same procedure as above, for next frame #######
+ foreach (@frame2starts){
+ $start2 = shift(@frame2starts);
+ $stop2 = shift (@frame2stops);
+ $length2 = ($stop2-$start2)+1;
+ if ($start2 < $stop2){
+ if(($length2) >= 210){
+ print OUT "Frame 2: ",$header," Direction: ",$dir,"\n";
+ my $substring = substr($seq,$start2,$length2);
+ print OUT " ",$substring."\n";
+ $stop2 = shift(@frame2stops);
+ }
+ elsif(($length2) < 210 && ($length2) > 0){
+ $stop2 = shift(@frame2stops);
+ }
+ }
+ else{
+ $stop2=shift(@frame2stops);
+ }
+ }
+ foreach (@frame3starts){
+ $start3 = shift(@frame3starts);
+ $stop3 = shift (@frame3stops);
+ $length3 = ($stop3-$start3)+1;
+ if ($start3 < $stop3){
+ if(($length3) >= 210){
+ print OUT "Frame 3: ",$header," Direction: ",$dir,"\n";
+ my $substring = substr($seq,$start3,$length3);
+ print OUT " ",$substring."\n";
+ $stop3 = shift(@frame3stops);
+ }
+ elsif(($length3) < 210 && ($length3) > 0){
+ $stop3 = shift(@frame3stops);
+ }
+ }
+ else{
+ $stop3=shift(@frame3stops);
+ }
+ }
+}
+
########this is the 2nd sub called, almost exactly identical except #######
########the sequence read in is reversed and translated to the complement #######
+sub get_rORFs_of_210_nt{
+ my $rheader = $_[0];
+ my $rseq = $_[1];
+ my $rdir = $_[2];
+ my @rfstarts;
+ my @rfstops;
+ my $rcodon;
+ my @rframe1starts;
+ my @rframe2starts;
+ my @rframe3starts;
+ my @rframe1stops;
+ my @rframe2stops;
+ my @rframe3stops;
+ my $rlength1;
+ my $rstart1;
+ my $rstop1;
+ my $rlength2;
+ my $rstart2;
+ my $rstop2;
+ my $rlength3;
+ my $rstart3;
+ my $rstop3;
+
+ $rseq = reverse($rseq);
+ $rseq = ~tr/ATGCatgc/TACGtagc/;
#####the rest is the same as the first sub######
The subroutine in question is supposed to function as such:
- The starts and stops for each frame are put into corresponding arrays
- The first start and stop needs to be shifted into a scalar of some sort - this is the first spot I have trouble with
- If the start is before the stop and at least 210 base pairs away, then it is printed along with the substring - this works fine.
- BOTH the start and stop are shifted, and the loop should start over
- If the start is prior to the stop, but less than 210 away, only the stop should shift, then start loop again
- If the start is after the stop, shift the stop and re-loop as before
I'm getting different errors no matter which way I try. I still am awful with handling uninitialized variables - when I get the code to actually work, it is only after minutes of scrolling through uninitialized-value warning messages. Also, I don't think I can get the arrays to loop correctly and shift - should I foreach the stops, or while for stops?
Don't need an answer or anything like that, but everyone I have talked to has been useless. Just some hints would be amazing.
|
Oct 29, 2012 21:58
|
|
- Rohaq
- Aug 11, 2006
-
|
Did you copy that out of a unified diff or something? What's with all the pluses?
Other than that, please don't dick around with "#####the rest is the same as the first sub######" when trying to post a code example, because using 'the same as the first sub' after stripping those pluses out results in the following:
code:Global symbol "$seq" requires explicit package name at wtf.pl line 186.
Global symbol "@fstarts" requires explicit package name at wtf.pl line 187.
Global symbol "$seq" requires explicit package name at wtf.pl line 187.
Global symbol "$seq" requires explicit package name at wtf.pl line 189.
Global symbol "@fstops" requires explicit package name at wtf.pl line 190.
Global symbol "$seq" requires explicit package name at wtf.pl line 190.
Global symbol "@fstarts" requires explicit package name at wtf.pl line 192.
Global symbol "@fstarts" requires explicit package name at wtf.pl line 193.
Global symbol "@frame1starts" requires explicit package name at wtf.pl line 195.
Global symbol "@frame2starts" requires explicit package name at wtf.pl line 198.
Global symbol "@frame3starts" requires explicit package name at wtf.pl line 201.
Global symbol "@fstops" requires explicit package name at wtf.pl line 207.
Global symbol "@fstops" requires explicit package name at wtf.pl line 208.
Global symbol "@frame1stops" requires explicit package name at wtf.pl line 210.
Global symbol "@frame2stops" requires explicit package name at wtf.pl line 213.
Global symbol "@frame3stops" requires explicit package name at wtf.pl line 216.
Global symbol "@frame1starts" requires explicit package name at wtf.pl line 223.
Global symbol "$start1" requires explicit package name at wtf.pl line 224.
Global symbol "@frame1starts" requires explicit package name at wtf.pl line 224.
Global symbol "$stop1" requires explicit package name at wtf.pl line 225.
Global symbol "@frame1stops" requires explicit package name at wtf.pl line 225.
Global symbol "$length1" requires explicit package name at wtf.pl line 226.
Global symbol "$stop1" requires explicit package name at wtf.pl line 226.
Global symbol "$start1" requires explicit package name at wtf.pl line 226.
Global symbol "$start1" requires explicit package name at wtf.pl line 227.
Global symbol "$stop1" requires explicit package name at wtf.pl line 227.
Global symbol "$length1" requires explicit package name at wtf.pl line 228.
Global symbol "$header" requires explicit package name at wtf.pl line 229.
Global symbol "$dir" requires explicit package name at wtf.pl line 229.
Global symbol "$seq" requires explicit package name at wtf.pl line 230.
Global symbol "$start1" requires explicit package name at wtf.pl line 230.
Global symbol "$length1" requires explicit package name at wtf.pl line 230.
Global symbol "$start1" requires explicit package name at wtf.pl line 232.
Global symbol "@frame1starts" requires explicit package name at wtf.pl line 232.
Global symbol "$stop1" requires explicit package name at wtf.pl line 233.
Global symbol "@frame1stops" requires explicit package name at wtf.pl line 233.
Global symbol "$length1" requires explicit package name at wtf.pl line 235.
Global symbol "$length1" requires explicit package name at wtf.pl line 235.
Global symbol "$stop1" requires explicit package name at wtf.pl line 236.
Global symbol "@frame1stops" requires explicit package name at wtf.pl line 236.
Global symbol "$stop1" requires explicit package name at wtf.pl line 240.
Global symbol "@frame1stops" requires explicit package name at wtf.pl line 240.
Global symbol "@frame2starts" requires explicit package name at wtf.pl line 244.
Global symbol "$start2" requires explicit package name at wtf.pl line 245.
Global symbol "@frame2starts" requires explicit package name at wtf.pl line 245.
Global symbol "$stop2" requires explicit package name at wtf.pl line 246.
Global symbol "@frame2stops" requires explicit package name at wtf.pl line 246.
Global symbol "$length2" requires explicit package name at wtf.pl line 247.
Global symbol "$stop2" requires explicit package name at wtf.pl line 247.
Global symbol "$start2" requires explicit package name at wtf.pl line 247.
Global symbol "$start2" requires explicit package name at wtf.pl line 248.
Global symbol "$stop2" requires explicit package name at wtf.pl line 248.
Global symbol "$length2" requires explicit package name at wtf.pl line 249.
Global symbol "$header" requires explicit package name at wtf.pl line 250.
Global symbol "$dir" requires explicit package name at wtf.pl line 250.
Global symbol "$seq" requires explicit package name at wtf.pl line 251.
Global symbol "$start2" requires explicit package name at wtf.pl line 251.
Global symbol "$length2" requires explicit package name at wtf.pl line 251.
Global symbol "$stop2" requires explicit package name at wtf.pl line 253.
Global symbol "@frame2stops" requires explicit package name at wtf.pl line 253.
Global symbol "$length2" requires explicit package name at wtf.pl line 255.
Global symbol "$length2" requires explicit package name at wtf.pl line 255.
Global symbol "$stop2" requires explicit package name at wtf.pl line 256.
Global symbol "@frame2stops" requires explicit package name at wtf.pl line 256.
Global symbol "$stop2" requires explicit package name at wtf.pl line 260.
Global symbol "@frame2stops" requires explicit package name at wtf.pl line 260.
Global symbol "@frame3starts" requires explicit package name at wtf.pl line 263.
Global symbol "$start3" requires explicit package name at wtf.pl line 264.
Global symbol "@frame3starts" requires explicit package name at wtf.pl line 264.
Global symbol "$stop3" requires explicit package name at wtf.pl line 265.
Global symbol "@frame3stops" requires explicit package name at wtf.pl line 265.
Global symbol "$length3" requires explicit package name at wtf.pl line 266.
Global symbol "$stop3" requires explicit package name at wtf.pl line 266.
Global symbol "$start3" requires explicit package name at wtf.pl line 266.
Global symbol "$start3" requires explicit package name at wtf.pl line 267.
Global symbol "$stop3" requires explicit package name at wtf.pl line 267.
Global symbol "$length3" requires explicit package name at wtf.pl line 268.
Global symbol "$header" requires explicit package name at wtf.pl line 269.
Global symbol "$dir" requires explicit package name at wtf.pl line 269.
Global symbol "$seq" requires explicit package name at wtf.pl line 270.
Global symbol "$start3" requires explicit package name at wtf.pl line 270.
Global symbol "$length3" requires explicit package name at wtf.pl line 270.
Global symbol "$stop3" requires explicit package name at wtf.pl line 272.
Global symbol "@frame3stops" requires explicit package name at wtf.pl line 272.
Global symbol "$length3" requires explicit package name at wtf.pl line 274.
Global symbol "$length3" requires explicit package name at wtf.pl line 274.
Global symbol "$stop3" requires explicit package name at wtf.pl line 275.
Global symbol "@frame3stops" requires explicit package name at wtf.pl line 275.
Global symbol "$stop3" requires explicit package name at wtf.pl line 279.
Global symbol "@frame3stops" requires explicit package name at wtf.pl line 279.
wtf.pl had compilation errors.
Because the second sub doesn't have the same variable names, and I'm not going to do a selective find and replace to fix it.
Give us something we don't have to hack together to get running, and maybe we can offer some advice?
|
Oct 29, 2012 22:09
|
|
- uG
- Apr 23, 2003
-
by Ralp
|
The first thing I'd do is make it so you aren't shifting a loop's array multiple times in a single iteration.
Also, the foreach loop you are using is making a copy of the array in memory, so I don't think your shifts are going to affect the loop like you think. I imagine you want a while loop.
uG fucked around with this message at 01:24 on Oct 30, 2012
|
Oct 30, 2012 01:19
|
|
- The Gripper
- Sep 14, 2004
-
i am winner
|
Shifting values off the array you're iterating over isn't a good idea:
Perl code:my @x = (1,2,3,4,5,6);
foreach (@x) {
my $y = shift @x;
print $y."\n";
}
#Output:
#1
#2
#3
The foreach will keep track of the index it's up to and if the array has been shortened by shift it will end (current index > length of array).
I don't actually know/can't tell what the intended result is of those loops because of the brokenness of them, but if you're hellbent on consuming the array you could use:
Perl code:my @x = (1,2,3,4,5,6);
my @y = (7,8,9,10,11,12);
while (my $start1 = shift @x) {
my $stop1 = shift @y;
print "start: $start1 end: $stop1\n";
}
#Output:
#start: 1 end: 7
#start: 2 end: 8
#start: 3 end: 9
#start: 4 end: 10
#start: 5 end: 11
#start: 6 end: 12
but even then I don't know why you're shifting off @frame2stops a second time so I can't offer any advice past that. Maybe it will just work?
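For what it's worth, here is one reading of the rules as they were listed in the question, written so that nothing is shifted inside a foreach. It is only a sketch: `pair_orfs` and the sample coordinates are made up, and it implements the stated rules literally rather than claiming they are the right biology.

```perl
use strict;
use warnings;

# pair_orfs is a hypothetical helper; it applies the pairing rules to one frame.
sub pair_orfs {
    my ( $starts, $stops, $min_len ) = @_;
    my @starts = @$starts;    # work on copies so the caller's arrays survive
    my @stops  = @$stops;
    my @orfs;

    # Loop on "is anything left?" rather than on the shifted value,
    # so a start at position 0 is not mistaken for the end of the data.
    while ( @starts && @stops ) {
        my ( $start, $stop ) = ( $starts[0], $stops[0] );
        my $length = $stop - $start + 1;

        if ( $start < $stop && $length >= $min_len ) {
            push @orfs, [ $start, $stop ];
            shift @starts;    # long enough: consume both
            shift @stops;
        }
        else {
            shift @stops;     # too short, or stop precedes start: try the next stop
        }
    }
    return @orfs;
}

my @orfs = pair_orfs( [ 0, 300 ], [ 5, 250, 280, 600 ], 210 );
print "start $_->[0] stop $_->[1]\n" for @orfs;
```

Each pass through the loop removes at least one stop, so it always terminates, and every shift happens in exactly one place per branch.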
The Gripper fucked around with this message at 01:52 on Oct 30, 2012
|
Oct 30, 2012 01:49
|
|
- SaxMaverick
- Jun 9, 2005
-
The stuff of nightmares
|
I apologize for the bad code and how I formatted it, maybe let me break it down a little better, then I will fix up my complete code. Is there a better way to post it than pasting it all here?
|
Oct 30, 2012 16:43
|
|
- Rohaq
- Aug 11, 2006
-
|
PasteBin or similar, plus you can include syntax highlighting.
|
Oct 30, 2012 22:28
|
|
- SaxMaverick
- Jun 9, 2005
-
The stuff of nightmares
|
Thanks. As you can tell this is bioinformatics being taught by biology trained people, so we're learning syntax/regex as it's needed, instead of incrementally. It's not majorly important in the long run, but since the other section of the course is in FORTRAN (which I spent four weeks on this summer), I went ahead and got a good book on perl to teach myself instead, because basically, anything is better than fortran.
|
Oct 31, 2012 02:31
|
|
- The Gripper
- Sep 14, 2004
-
i am winner
|
Does anyone actually use Fortran unless they're forced to, anymore?
Also I just noticed that your tr/x/y/ line is probably wrong unless you want lowercase "gc" to be untouched.
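A sketch of what that step presumably intends, with a made-up helper name. Two things differ from the posted line: the replacement list ends in `tacg` so lowercase g and c are swapped too, and the operator is `=~` with no space. Written as `= ~tr/.../.../`, Perl runs the tr against `$_` and assigns the bitwise complement of its count to the variable.

```perl
use strict;
use warnings;

# Hypothetical helper name; returns the reverse complement of a DNA string.
sub reverse_complement {
    my ($seq) = @_;
    my $rc = reverse $seq;           # scalar assignment: reverses the characters
    $rc =~ tr/ATGCatgc/TACGtacg/;    # =~ binds tr to $rc; every base has a partner
    return $rc;
}

print reverse_complement('ATGCatgc'), "\n";    # prints gcatGCAT
```

Since the posted script already upper-cases every sequence line with uc(), the lowercase half only matters if that changes.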
|
Oct 31, 2012 04:56
|
|
- SaxMaverick
- Jun 9, 2005
-
The stuff of nightmares
|
Does anyone actually use Fortran unless they're forced to, anymore?
Also I just noticed that your tr/x/y/ line is probably wrong unless you want lowercase "gc" to be untouched.
It's actually one of the most powerful methods for large scale genomic selection in animal/plant breeding (I'm in animal genetics, but I'm more of a molecular guy). Taking 50,000 genetic markers, and using matrix algebra to calculate breeding values and make selection decisions. Let me tell you about the thrill of waiting five hours to invert a 1,000x50,000 matrix.
|
Oct 31, 2012 13:51
|
|
- Roseo
- Jun 1, 2000
-
Forum Veteran
|
It's actually one of the most powerful methods for large scale genomic selection in animal/plant breeding (I'm in animal genetics, but I'm more of a molecular guy). Taking 50,000 genetic markers, and using matrix algebra to calculate breeding values and make selection decisions. Let me tell you about the thrill of waiting five hours to invert a 1,000x50,000 matrix.
General comments, nothing specific to fixing your code.
code:open ( my $fasta_in , "<" , $file_in) || die $!;
You can open a file with a scalar filehandle, which gives it implicit scope. You don't need to close it, for example; it'll do so when it goes out of scope. Though you are using three arg open, that's good.
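A small sketch of that behaviour, assuming Perl 5's reference-counted handles and using a throwaway path from the core File::Temp module (the sample sequence is made up):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway file for the demonstration; removed when the program exits.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
close $tmp_fh;

{
    open( my $out, '>', $path ) or die "Cannot write $path: $!";
    print {$out} "ATGAAATAG\n";
}    # $out goes away here; with no other references, perl flushes and closes it

# The data is on disk even though close() was never called on $out.
open( my $in, '<', $path ) or die "Cannot read $path: $!";
my $line = <$in>;
close $in;

print $line;
```

The implicit close does not report errors, so an explicit close with a checked return value is still worth having when a failed write matters.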
code:if ($line =~/^>/){
$key = $line;
}
$fasta_hash{$key} .= uc($line);
...
my @scaffolds = keys %fasta_hash;
foreach (@scaffolds){ get_ORFs_of_210_nt($_,$fasta_hash{$_},$direction) }
What does %fasta_hash mean? Does it contain scaffolds? It's implicitly a hash, you say so with the sigil. I know it's the fasta file, but you've done some processing on it. Would something like the below make more sense?
code:if ($line =~/^>/){
    $scaffold = $line;    # $scaffold is declared once, before the read loop
}
$scaffolds{$scaffold} .= uc($line);
...
for my $scaffold (keys %scaffolds){ get_ORFs_of_210_nt($scaffold,$scaffolds{$scaffold},$direction) }
Predeclaring variables is something I personally hate, as it leads to scope issues that bite you in the rear end. Declare them as you need them.
code:for my $thingy (qw/foo bar baz/) {
my $value = process_thingy($thingy);
if ($value) {
my $result = do_stuff_with_thingy($thingy);
}
}
code: my @frame1starts;
my @frame2starts;
my @frame3starts;
my @frame1stops;
my @frame2stops;
my @frame3stops;
my $length1;
my $start1;
my $stop1;
my $length2;
my $start2;
my $stop2;
my $length3;
my $start3;
my $stop3;
This could probably be better handled with a data structure.
code:my @framestops = ( [ qw/12 23 34/ ],
                   [ qw/32 43 54/ ],
                   [ qw/13 24 36/ ],
                 );
for my $stop (@{$framestops[0]}) {
    ... ## do stuff with the frame 1 stops -- 12, 23, 34.
}
code:foreach (@frame1starts){
$start1 = shift(@frame1starts);
$stop1 = shift (@frame1stops);
}
This is what I mean by scope issues. Every time through, if you don't assign to $start1 or $stop1, it'll use the previous values. Is this intended? You could do:
code:foreach (@frame1starts){
my $start = shift(@frame1starts);
my $stop = shift (@frame1stops);
}
It's implicitly a 'start1' or 'stop1' because it's implicitly in frame1starts.
You're doing the same thing three times with three sets of global variables, differing only by the frame. What about something like:
code:my @frame1starts = ( { start => 1,
                       stop => 10,
                     }
                   );
for my $frame (@frame1starts) {
process_start(1, $frame);
}
for my $frame (@frame2starts) {
process_start(2, $frame);
}
for my $frame (@frame3starts) {
process_start(3, $frame);
}
sub process_start {
my ($frame_no, $frame) = @_;
my $length = $frame->{stop} - $frame->{start} + 1;
if ($frame->{start} < $frame->{stop}){
if($length >= 210){
print $stdout "Frame $frame_no: ";
... #other stuff... I really can't figure out what you're trying to do
}
}
}
That's aside from the oddity of iterating over an array while shifting from it inside the loop, which is really odd, as mentioned by others.
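One way to sidestep the shift-inside-foreach problem, sketched with made-up coordinates: walk both arrays by index so neither is modified while the loop is running (perlsyn warns that foreach gets confused if you add or remove elements of the array it is looping over). The positions below are placeholders; only the 210 nt cutoff comes from the code under discussion.

```perl
use strict;
use warnings;

# Illustrative start/stop positions; not real ORF coordinates.
my @starts = (1, 300, 900);
my @stops  = (250, 400, 1200);

my @long_orfs;
for my $i (0 .. $#starts) {
    my $length = $stops[$i] - $starts[$i] + 1;
    next if $length < 210;              # keep ORFs of at least 210 nt
    push @long_orfs, "$starts[$i]-$stops[$i] ($length nt)";
}

print "$_\n" for @long_orfs;
```

Because $length is declared inside the loop, a stale value from the previous pass can never leak into the next one, which is the scope issue mentioned earlier.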
Have you looked into BioPerl?
|
#
?
Nov 1, 2012 01:57
|
|
- Ninja Rope
- Oct 22, 2005
-
Wee.
|
You can open a file with a scalar filehandle, which gives it implicit scope. You don't need to close it, for example; it'll do so when it goes out of scope. Though you are using three arg open, that's good.
Doesn't it get closed when Perl GC's the variable? Which doesn't necessarily have to happen right as it goes out of scope, it could happen later (or possibly not until the program exits?). In practice it will close the filehandle pretty much right after it goes out of scope but I don't believe that's a guarantee. If you need the handle closed it's best to close it explicitly.
|
#
?
Nov 1, 2012 06:40
|
|
- Roseo
- Jun 1, 2000
-
Forum Veteran
|
Doesn't it get closed when Perl GC's the variable? Which doesn't necessarily have to happen right as it goes out of scope, it could happen later (or possibly not until the program exits?). In practice it will close the filehandle pretty much right after it goes out of scope but I don't believe that's a guarantee. If you need the handle closed it's best to close it explicitly.
If you have circular references, yes, in which case you can use Scalar::Util's weaken. It's still better than bareword filehandles. Generally, it'll do the right thing.
|
#
?
Nov 1, 2012 13:41
|
|
- welcome to hell
- Jun 9, 2006
-
|
Perl uses reference counting for cleaning up variables. In the absence of circular or external references, variables are guaranteed to be cleaned up immediately when they go out of scope.
When a data structure or object has a (non-weakened) circular reference, it isn't guaranteed when it will be destroyed. In practice though, this will only happen if the references are manually broken, or on program exit. The documentation states that a more complete garbage collection strategy could be implemented, but there hasn't been any movement on that I'm aware of.
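A small sketch of both cases, assuming a hypothetical Tracker class whose only job is to log its own destruction (the class and the log array are invented for illustration, and the package block syntax needs Perl 5.14 or later). The first block has no cycle; the second builds a cycle and then weakens one link with Scalar::Util's weaken so that reference counting can still clean it up at scope exit:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my @log;

# Hypothetical class: records when each instance is destroyed.
package Tracker {
    sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub DESTROY { my ($self) = @_; push @log, "destroyed $self->{name}" }
}

{
    my $plain = Tracker->new('plain');
}   # last reference gone here, so DESTROY runs before the next statement
push @log, 'after plain scope';

{
    my $parent = Tracker->new('parent');
    my $child  = Tracker->new('child');
    $parent->{peer} = $child;
    $child->{peer}  = $parent;    # strong cycle: would outlive this block
    weaken($child->{peer});       # make one link weak so the counts can reach zero
}
push @log, 'after cycle scope';

print "$_\n" for @log;
```

Commenting out the weaken call should move the two cycle destructions to program exit, which is the behaviour described above for non-weakened circular references.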
|
#
?
Nov 1, 2012 20:41
|
|
- Ninja Rope
- Oct 22, 2005
-
Wee.
|
variables are guaranteed to be cleaned up immediately when they go out of scope
Not that I don't believe you, but can you show me where it's documented? I'd like to know more.
|
#
?
Nov 1, 2012 23:29
|
|
- uG
- Apr 23, 2003
-
by Ralp
|
I'm having some XS troubles. First, let me present the code:
http://pastebin.com/SeMGSuA8
If you strip out the cxs_edistance function and the Perl headers, it will compile and return the correct result. This comes as a surprise to me because my Perl tests have been failing and I thought my C was bad.
As you can see, I hard coded the same values in the function main as I did to the end of cxs_edistance. This was based on a guess that the data was getting passed in wrong from my wrapper function, but proved that the problem lies elsewhere.
The error is a segmentation fault, and it happens the first time it hits line 58. I don't mean to make this into a 'debug my C code' post inside the Perl thread, especially since the C code appears to work on its own. Just kinda wondering where I should look next, since I've apparently been tweaking C code for no reason.
perl Makefile.PL and make test always fail (it never gives a reason; I believe the testing module craps out entirely on XS segfaults?), and a forced install (of this code as a module) always results in a segfault for whatever script calls the exported function.
|
#
?
Nov 2, 2012 17:59
|
|
- homercles
- Feb 14, 2010
-
|
What happens if you declare your methods static? You're polluting the global namespace for no reason.
|
#
?
Nov 3, 2012 04:58
|
|
- homercles
- Feb 14, 2010
-
|
Not that I don't believe you, but can you show me where it's documented? I'd like to know more.
There are some good quotes in the 5.14.* documentation. They have been removed in 5.16; I don't know why, and I don't much care either.
URLs of interest are:
http://perldoc.perl.org/5.14.2/perlref.html
perlref 5.14.2 posted:
Hard references are smart--they keep track of reference counts for you, automatically freeing the thing referred to when its reference count goes to zero. (Reference counts for values in self-referential or cyclic data structures may not go to zero without a little help; see Two-Phased Garbage Collection in perlobj for a detailed explanation.) If that thing happens to be an object, the object is destructed. See perlobj for more about objects. (In a sense, everything in Perl is an object, but we usually reserve the word for references to objects that have been officially "blessed" into a class package.)
http://perldoc.perl.org/5.14.2/perlobj.html#Two-Phased-Garbage-Collection
That last link doesn't state the GC rules in English but it does explicitly enumerate them.
|
#
?
Nov 3, 2012 05:25
|
|