  • Locked thread
Pollyanna
Mar 5, 2005

Milk's on them.


Lexical Unit posted:

Python code:
In [1]: a = ['a', 'b', 'c', 0, 'd', 'e', 'f', 0, 'g', 'h', 'i']

In [2]: b = [a for a in a if a != 0]

In [3]: b
Out[3]: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
Like that?

Mmm, more like:

Python code:
[u'\n', <h1>The Song</h1>, u'\n', <hr/>,
u'\n', <h3 align="center">The oo song</h3>, u'\n', <br/>, <br/>, u'\n', <blockquote>
<i>It's gonna be cold<br/>
	There may even be oo</i>
</blockquote>, u'\n', <p>
</p>, <hr/>, u'\n', <a href="index.html"><i>&lt;&lt;&lt;</i></a>, u' |\n',
<a href="mailto:oo@alcyone.com"><i>oo@alcyone.com</i></a>, u'\n']
and what I need is the string between both <hr/> tags. In your example, I'm looking for [d, e, f].

Or maybe there's an easier way to do this, does anyone know Beautiful Soup :confused:

Dominoes
Sep 20, 2007

accipter posted:

Instead of telling the GUI to update, emit a signal from the model/data class. On the GUI side of things, connect to the signal of the model/data class and update the GUI as needed.
Thanks, that's going to work. I didn't consider defining the signal in a module other than the primary one. I'm leaving the connect() part in the primary module, and defining and calling the signal in the child module.

edit: Got it working. It appears that the signal needs to be created in a QObject class in the child module to avoid unbound signal issues. It needs to be referenced from an instance of this class.

Dominoes fucked around with this message at 03:09 on Aug 29, 2013

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Pollyanna posted:

Say I have a list like [a, b, c, 0, d, e, f, 0, g, h, i]. How do I extract all elements between instances of 0?

Python code:
In [2]: a = ['a', 'b', 'c', 0, 'd', 'e', 'f', 0, 'g', 'h', 'i']

In [3]: a[a.index(0) + 1 : a[a.index(0) + 1:].index(0)+a.index(0)+1]
Out[3]: ['d', 'e', 'f']
Don't do this, as it's inefficient, but I just felt like doing a one liner.

Ok, not a one-liner...though it could be. I converted the 0's to strings because I see in your example with real data they are strings.
Python code:
In [16]: a = ['a', 'b', 'c', '0', 'd', 'e', 'f', '0', 'g', 'h', 'i']

In [17]: astr=''.join(a)

In [18]: astr
Out[18]: 'abc0def0ghi'

In [19]: astr[astr.index('0') + 1:astr.rindex('0')]
Out[19]: 'def'
TBH, I think you should be able to do this straight with BS.

Thermopyle fucked around with this message at 01:06 on Aug 29, 2013

FoiledAgain
May 6, 2007

Pollyanna posted:

Say I have a list like [a, b, c, 0, d, e, f, 0, g, h, i]. How do I extract all elements between instances of 0?

Kinda ugly but:

code:
marker = 0
iterable = ['a', 'b', 'c', 0, 'd', 'e', 'f', 0, 'g', 'h', 'i']
keep = list()
found_marker = False
for item in iterable:
    if item == marker:
        if not found_marker:
            found_marker = True
        else:
            break
    else:
        if found_marker:
            keep.append(item)

print(keep)
# ['d', 'e', 'f']

Pollyanna
Mar 5, 2005

Milk's on them.


I think BS works by taking the contents of a set of tags, and I can't make it take two <hr/> tags as input...I think. Maybe I'm approaching this wrong...there has to be an easier way to do this, I just need to find out how :bang:

Maybe if I can get it to output everything in <body> tags that is not between <a> or <h1> or anything...

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Thermopyle posted:

Python code:
In [2]: a = ['a', 'b', 'c', 0, 'd', 'e', 'f', 0, 'g', 'h', 'i']

In [3]: a[a.index(0) + 1 : a[a.index(0) + 1:].index(0)+a.index(0)+1]
Out[3]: ['d', 'e', 'f']
Don't do this, as it's inefficient, but I just felt like doing a one liner.

code:
a[a.index(0) + 1 : -(list(reversed(a)).index(0) + 1)]
e:

code:
a[a.index(0) + 1 : -(a[::-1].index(0) + 1)]

Hammerite fucked around with this message at 01:50 on Aug 29, 2013

suffix
Jul 27, 2013

Wheeee!

Pollyanna posted:

and what I need is the string between both <hr/> tags. In your example, I'm looking for [d, e, f].

Or maybe there's an easier way to do this, does anyone know Beautiful Soup :confused:

Start at the first <hr/> tag and keep going until you hit the other?
e.g.
Python code:
import itertools

def element_is_not_hr(element):
    return element.name != 'hr'

important_stuff = itertools.takewhile(element_is_not_hr, document.hr.next_siblings)

breaks
May 12, 2001

FoiledAgain posted:

Supposing the first element fails the test, the program still has to access that list 5001 time before realizing that, right? It appends 5000 elements to the list first, passes that list to all(), checks 1 element and returns False. Worst case, I have to append 5000 times and then check 5000 times.

If I do this with a generator, I can avoid this problem. I think. Suppose the first item fails the test. all((test(x) for x in X)) would create one generator object, not a list of 5000 items, pass that generator to all(),and then return false after only checking 1 thing. Worst case it only checks 5000 things.

Am I understanding correctly?

Yes, you have got the idea. If you use a list comprehension it must determine the value of every element in the list before it does anything with it. A generator determines its values as it goes, so there is no need to figure it all out in advance, which is advantageous if figuring it out takes a while.

To put it another way this: [test(x) for x in whatever]

Is effectively (and perhaps literally, I forget) the same thing as this: list(test(x) for x in whatever)

You can also see there that you do not need extra parens to create a generator expression in cases where they are immediately inside other parens, you can just do all(test(x) for x in X).

keanu
Jul 27, 2013

Pollyanna posted:

Say I have a list like [a, b, c, 0, d, e, f, 0, g, h, i]. How do I extract all elements between instances of 0?

Python code:
>>> a = ['a', 'b', 'c', 0, 'd', 'e', 'f', 0, 'g', 'h', 'i']
>>> b = a[a.index(0)+1:]
>>> c = b[:b.index(0)]
>>> c
['d', 'e', 'f']
>>> 
This only works if you know that there are two 0 elements in the list, of course.
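
If the list might have fewer than two markers, one option is to catch the ValueError that index() raises. A minimal sketch, assuming an empty list is an acceptable answer in that case; between_markers is a made-up helper name:

```python
def between_markers(items, marker):
    """Return the items between the first two occurrences of marker.

    Returns an empty list if marker occurs fewer than two times.
    """
    try:
        start = items.index(marker) + 1
        # Search for the second marker only after the first one.
        stop = items.index(marker, start)
    except ValueError:
        return []
    return items[start:stop]

a = ['a', 'b', 'c', 0, 'd', 'e', 'f', 0, 'g', 'h', 'i']
middle = between_markers(a, 0)
only_one = between_markers(['a', 0, 'b'], 0)
```

Here middle is ['d', 'e', 'f'] and only_one is [], since the second list contains a single 0.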

keanu fucked around with this message at 03:34 on Aug 29, 2013

Pollyanna
Mar 5, 2005

Milk's on them.


I'll start trying to get the text directly from the elements for now, before I try and tackle a string.

suffix posted:

Start at the first <hr/> tag and keep going until you hit the other?
e.g.
Python code:
def element_is_not_hr(element):
    return element.name != 'hr'

important_stuff = itertools.takewhile(element_is_not_hr, document.hr.next_siblings)

Thing is that I don't think BS takes "document.hr" as valid, at least not when I've tried to do so.

I know that contents outputs a list, so I tried using this on that list. Unfortunately, it gives me this:

Python code:
<itertools.takewhile object at 0x10163e560>
What is this?

edit: I think takewhile does something different than what I need, it looks like the way it's written right now it takes everything that is NOT <hr/>, which is not quite what I need. I need it to not take anything until it FINDS an <hr/>, and then stop once it sees an <hr/> again... lemme mull over this some more.

edit2: i swear i'm missing something really obvious here
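
For reference, the "take nothing until the first marker, then stop at the next one" idea can be sketched with itertools alone. This assumes plain strings standing in for the parsed tags; the between helper and the sample list are made up for illustration and are not Beautiful Soup objects:

```python
from itertools import dropwhile, islice, takewhile

def between(items, is_marker):
    """Lazily yield the items between the first two markers."""
    # Skip everything before the first marker...
    from_first_marker = dropwhile(lambda x: not is_marker(x), items)
    # ...step over the marker itself...
    after_first_marker = islice(from_first_marker, 1, None)
    # ...then take items until the next marker shows up.
    return takewhile(lambda x: not is_marker(x), after_first_marker)

contents = ['\n', 'h1', '\n', 'hr', '\n', 'h3', 'br', 'blockquote', 'hr', '\n', 'a']
lazy = between(contents, lambda x: x == 'hr')
print(lazy)            # an <itertools.takewhile object ...>, nothing computed yet
result = list(lazy)    # list() (or a for loop) actually pulls the values out
print(result)
```

The takewhile object is not an error, just a lazy iterator. If there is only one marker, this version returns everything after it, which may or may not be what you want.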

Pollyanna fucked around with this message at 05:39 on Aug 29, 2013

FoiledAgain
May 6, 2007

Pollyanna posted:

edit: I think takewhile does something different than what I need, it looks like the way it's written right now it takes everything that is NOT <hr/>, which is not quite what I need. I need it to not take anything until it FINDS an <hr/>, and then stop once it sees an <hr/> again... lemme mull over this some more.

Out of curiosity, why didn't my solution earlier work? If you have a single symbol that's "bookending" your desired elements within the list, it should do.

edit: oh I see below now it's actually a way more complicated issue, not just finding a sequence of things in a list.

breaks posted:

Yes, you have got the idea. If you use a list comprehension it must determine the value of every element in the list before it does anything with it. A generator determines its values as it goes, so there is no need to figure it all out in advance, which is advantageous if figuring it out takes a while.

Oh good. Thank you!

FoiledAgain fucked around with this message at 08:02 on Aug 29, 2013

piratepilates
Mar 28, 2004

So I will learn to live with it. Because I can live with it. I can live with it.



Pollyanna posted:

I'll start trying to get the text directly from the elements for now, before I try and tackle a string.


Thing is that I don't think BS takes "document.hr" as valid, at least not when I've tried to do so.

I know that contents outputs a list, so I tried using this on that list. Unfortunately, it gives me this:

Python code:
<itertools.takewhile object at 0x10163e560>
What is this?

edit: I think takewhile does something different than what I need, it looks like the way it's written right now it takes everything that is NOT <hr/>, which is not quite what I need. I need it to not take anything until it FINDS an <hr/>, and then stop once it sees an <hr/> again... lemme mull over this some more.

edit2: i swear i'm missing something really obvious here

I know the background of what you're doing here since I read that other thread.

That site you're trying to parse is kinda shittily written in that it doesn't do a good job of using the HTML DOM -- it has more than one h1 element (allowed but not recommended, the different types of headers are supposed to be sublevels starting with an h1 at the top), doesn't use anything like class to describe what each tag is supposed to represent, and separates content only by <hr/> tags instead of nested tags.

Beautiful Soup lets you move sideways between tags using the .next_sibling and .previous_sibling attributes of a tag. There are also .next_siblings and .previous_siblings, which are iterators over all the elements that come after/before it. (An iterator is just an object whose purpose is to move through a collection, like an iterator for a list would typically just go through each element one by one. That thing you posted in angle brackets is just Python's default way of printing an object that doesn't define its own string representation, so it prints a way of identifying what the object is instead of what it contains.)

If the site was laid out better you could just do something like:

code:
for element in main_content.next_siblings:
    print(element)
But it isn't laid out that well, so you'll have to do something like find the first <hr/> tag and then keep following .next_sibling until you hit another <hr/> tag, like so:

code:
start_tag = soup.find("hr")
element = start_tag.next_sibling
# Walk forward one sibling at a time; stop at the end or at the next <hr/>.
while element is not None and getattr(element, "name", None) != "hr":
    print(element)
    element = element.next_sibling
But it's probably worth putting in the effort to test that to make sure it works, since I didn't.

edit: also itertools.takewhile is the better choice of thing to use than whatever I just came up with.

edit: itertools.takewhile (and everything else in itertools I'm guessing) returns an iterator object instead of a collection of the things it iterates over. To get the next element from an iterator you use next(iterator). Typically there would also be a method called something like hasNext() so you know when the iterator has run out, but Python doesn't have that; it raises a StopIteration exception that you have to catch instead.

So it's something like this I guess:

code:
import itertools

important_stuff = itertools.takewhile(lambda element: element.name != "hr", document.hr.next_siblings)

try:
    while True:
        element = next(important_stuff)
        print(element)
except StopIteration:
    print("Iteration finished.")
edit: oh yeah since you're just going through the iterator forwards and straight through you can just use a for loop

code:
important_stuff = itertools.takewhile(lambda element: element.name != "hr", document.hr.next_siblings)

for element in important_stuff:
    print(element)

piratepilates fucked around with this message at 07:40 on Aug 29, 2013

SirPablo
May 1, 2004

Pillbug
So I'm working on extracting data (1600) from a tarball with 86,000 files and clocking in at 2.4GB. I've got a list of the stations I want to extract but the process is s l o w. Here is my code, am I missing a quicker way of doing this? I think the slow part is the line with extractfile.

code:
# Loop through all stations
for n in stations[:,0]:
    try:
        
        # Extract from tarball
        print 'Starting ',n
        f = T.extractfile('ghcnd_all/%s.dly'%n).readlines()

        # Find where TMAX is
        rows = [i for i, l in enumerate(f) if 'TMAX' in l]

        # Set some vars for ingesting into array
        flags = [11, 4, 2, 4, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1,
                 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1,
                 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1,
                 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1,
                 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1,
                 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1,
                 5, 1, 1, 1, 5, 1, 1, 1]
        cols = [1, 2, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56,
                60, 64, 68, 72, 76, 80, 84, 88, 92, 96, 100, 104, 108, 112, 116, 120, 124]

        # Now load the entire file into a structured array
        d = np.genfromtxt(f, delimiter=flags, usecols=cols)
        
        # Strip out only the TMAX lines
        d = np.array(d[rows], dtype=int)
        
        # Save array
        np.save(n, d)
        print 'Finished', n
    
    except: print 'Error with ',n

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug

SirPablo posted:

am I missing a quicker way of doing this?

Yes. Tar files don't provide random access to their contents; to get at a certain file you have to read the tar file up to its position. This can be optimized by seeking from one tar entry to the next, but you're still taking O(n^2) time to do something that can be done in O(n). You should iterate over the TarInfo entries with TarFile.next() and extract each entry if it matches your criteria.
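
A rough sketch of the single-pass idea, written for Python 3. It builds a tiny in-memory archive so it can run on its own; the member names A, B and C and the extract_wanted helper are made up, and only the ghcnd_all/<station>.dly layout comes from the code above. Iterating over the TarFile object yields the same TarInfo entries that repeated TarFile.next() calls would:

```python
import io
import tarfile

def extract_wanted(tar, wanted_names):
    """Read the archive once, keeping only members whose name is wanted."""
    wanted = set(wanted_names)   # set lookups stay cheap for 1600 names
    found = {}
    for member in tar:           # yields TarInfo objects in archive order
        if member.isfile() and member.name in wanted:
            found[member.name] = tar.extractfile(member).read()
    return found

# Build a small stand-in archive in memory (hypothetical contents).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("ghcnd_all/A.dly", "ghcnd_all/B.dly", "ghcnd_all/C.dly"):
        data = name.encode()
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    files = extract_wanted(tar, ["ghcnd_all/A.dly", "ghcnd_all/C.dly"])

print(sorted(files))
```

The per-station processing would then go inside the loop (or run over the returned dict), so the archive is only walked once.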

Nippashish
Nov 2, 2005

Let me see you dance!
I can't tell what proportion of the lines have TMAX in them, but if you're ignoring most of them it will probably be a lot faster to select those lines outside of python with grep or similar and create files that have only the lines you want for python. The speed difference between grepping 2.4GB of text and looping line by line over 2.4GB of text in python is quite large. This might help even if you are using most of the lines, since it would avoid doing two passes through each file in python.

SirPablo
May 1, 2004

Pillbug

Lysidas posted:

Yes. Tar files don't provide random access to their contents; to get at a certain file you have to read the tar file up to its position. This can be optimized by seeking from one tar entry to the next, but you're still taking O(n^2) time to do something that can be done in O(n). You should iterate over the TarInfo entries with TarFile.next() and extract each entry if it matches your criteria.

So sounds like right now I'm iterating over the whole drat thing each time which isn't necessary. Just go through it once and check each entry in the tar against my list of files I want - find a match and suck it out. Thanks!

Nippashish posted:

I can't tell what proportion of the lines have TMAX in them, but if you're ignoring most of them it will probably be a lot faster to select those lines outside of python with grep or similar and create files that have only the lines you want for python. The speed difference between grepping 2.4GB of text and looping line by line over 2.4GB of text in python is quite large. This might help even if you are using most of the lines, since it would avoid doing two passes through each file in python.

Probably not the stumbling block since that is all in the numpy arrays and they are quite speedy. Issue is probably what was identified by Lysidas, but I'll find out once I make some changes. Thanks for the suggestion.

Philip Rivers
Mar 15, 2010

So is there a goon-consensus preferred introduction to Python, or are any of them in the OP good? I'm in a situation where I need to learn the basics of Python as quick as I can, but I don't have any access to formal instruction. I'm self motivated, but I have attention issues, so if the reading is too slow, I'll get bored, and if it's too fast, I'll probably just lose focus.

I really wish the CS department at my school wasn't so impacted. I'd just take the intro to Python class if I could, but that's not really an option. :(

QuarkJets
Sep 8, 2008

Philip Rivers posted:

So is there a goon-consensus preferred introduction to Python, or are any of them in the OP good? I'm in a situation where I need to learn the basics of Python as quick as I can, but I don't have any access to formal instruction. I'm self motivated, but I have attention issues, so if the reading is too slow, I'll get bored, and if it's too fast, I'll probably just lose focus.

I really wish the CS department at my school wasn't so impacted. I'd just take the intro to Python class if I could, but that's not really an option. :(

I don't think that there's a goon-consensus on any of these, pretty much anything in the OP is going to be okay. The MIT class looks good

keanu
Jul 27, 2013

Philip Rivers posted:

So is there a goon-consensus preferred introduction to Python, or are any of them in the OP good? I'm in a situation where I need to learn the basics of Python as quick as I can, but I don't have any access to formal instruction. I'm self motivated, but I have attention issues, so if the reading is too slow, I'll get bored, and if it's too fast, I'll probably just lose focus.

I really wish the CS department at my school wasn't so impacted. I'd just take the intro to Python class if I could, but that's not really an option. :(

If you know how to program already then Dive Into Python might work for you.

digitalcamo
Jul 11, 2013
Had a question about functions. Codecademy has me do program challenges and sometimes it will say make an argument called something_one that takes something as an argument. Other times it will ask to take something as input. I keep getting confused about the two. Could someone explain the difference between arguments and input. I thought arguments and input were what you put in (), like in something_one(input/argument).

keanu
Jul 27, 2013

digitalcamo posted:

Had a question about functions. Codecademy has me do program challenges and sometimes it will say make an argument called something_one that takes something as an argument. Other times it will ask to take something as input. I keep getting confused about the two. Could someone explain the difference between arguments and input. I thought arguments and input were what you put in (), like in something_one(input/argument).

In this particular context I think the terms are interchangeable. You can say that a function "has inputs x and y and output z" or "takes arguments x and y and returns z" and it means the same thing.
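
A tiny illustration, with a made-up function body since the actual exercise isn't shown:

```python
def something_one(something):    # "something" is the parameter in the definition
    """Return the input doubled; the doubling is arbitrary, just for illustration."""
    return something * 2

result = something_one(21)       # 21 is the argument, i.e. the function's input
print(result)
```

Whether you call 21 the argument or the input, it is the value that goes inside the parentheses when the function is called.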

FoiledAgain
May 6, 2007

digitalcamo posted:

Had a question about functions. Codecademy has me do program challenges and sometimes it will say make an argument called something_one that takes something as an argument. Other times it will ask to take something as input. I keep getting confused about the two. Could someone explain the difference between arguments and input. I thought arguments and input were what you put in (), like in something_one(input/argument).

I think it's just perspective. Arguments are "part of" the function; inputs come from outside the function. A function that takes a number as an argument could have an input of 3 or 7 or 12009 or even a string like 'hello' (in which case it should probably raise an error).

Ireland Sucks
May 16, 2004

Is it possible to read from a file on Windows while another process is writing to it? I'm trying to process logfile updates as fast as they are written but whenever I read some data the writer process complains about the file being used by another process and dies.

e:did it by checking for a lock and then reading. race conditions ahoy!

Ireland Sucks fucked around with this message at 09:14 on Aug 30, 2013

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Ireland Sucks posted:

Is it possible to read from a file on Windows while another process is writing to it? I'm trying to process logfile updates as fast as they are written but whenever I read some data the writer process complains about the file being used by another process and dies.

e:did it by checking for a lock and then reading. race conditions ahoy!

In Python it's generally recommended to deal with things like this using exceptions. So instead of

code:
if file exists and is not being used:
   do things with file
else:
   complain to user or whatever
you do

code:
try:
   do things with file
except IOError:
   complain to user or whatever
This is often summarised by the saying "it is easier to ask forgiveness than permission" (EAFP).
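
As runnable code it might look something like this (Python 3 sketch; read_log and the path are made up for illustration, and this does not by itself solve the Windows sharing problem):

```python
import os
import tempfile

def read_log(path):
    """Try to read the file; report failure instead of checking first."""
    try:
        with open(path) as f:
            return f.read()
    except IOError as exc:   # IOError is an alias of OSError in Python 3
        print("could not read %s: %s" % (path, exc))
        return None

# A path that should not exist, so the except branch runs.
missing = read_log(os.path.join(tempfile.gettempdir(), "no-such-dir-xyz", "app.log"))
```

The point is that there is no gap between a check and the read in which another process could grab or remove the file; the read either works or raises.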

Alligator
Jun 10, 2009

LOCK AND LOAF

keanu posted:

If you know how to program already then Dive Into Python might work for you.
there might not be a goon consensus on what intro material to use but I'm pretty sure there's a consensus that no-one should use dive into python. the 2.x version is getting on a decade old now and hasn't been updated in about as long.

the MIT intro stuff (especially the video lectures) are great, and the python standard documentation is generally (not always) very readable once you have the basics down.

Dominoes
Sep 20, 2007

Scipy question for anyone familiar with Find_peaks_CWT or Scipy signal analysis.

I'm writing an instrument tuner. I'm trying a technique that uses a hybrid of FFT and calculating peak widths.

1: FFT precision seems to be equal to sample rate/size. For example, (44100hz rate / 1024 sample size) = precision of 43hz. With this in mind, a large sample size is required for acceptable precision, especially at low frequencies due to the logarithmic scaling of pitch. Ie to get 1hz precision, you need 1 second of data, meaning slow, choppy updates.

2: I tried a basic windowing method involving a long sample that adds new chunks to the end and slices chunk off at the beginning. This doesn't solve the problem: It just replaces slow analysis/update rates with lagged response. I don't think a smarter windowing method would solve this.

3: Calculating the width between peaks, then dividing the sample rate by it gives a precise result. (Ie: 44100hz / 100 sample peak distance) = 441hz signal. However, it requires a smooth signal without harmonics, is quirky, and I don't fully understand how the Scipy peak finding algorithm works. The docs are barebones.

4: I'm trying a hybrid approach. Use a low-sample size FFT (fast, imprecise), take the maximum FFT response, and feed that into a peak-finding algorithm. Set a low-pass filter on the original (amplitudes) signal based on the FFT-calculated freq to make the signal smooth and easy for the peak finder to understand. Set expected peak width based on the fft-calculated freq. Average however many peak widths are in the sample, and throw out outliers.
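
The arithmetic in points 1 and 3 can be written out as a small sketch (Python 3; the function names are made up, the numbers are the ones from the points above):

```python
def fft_bin_width(sample_rate, n_samples):
    """Spacing in Hz between adjacent FFT bins."""
    return sample_rate / n_samples

def freq_from_peak_spacing(sample_rate, samples_between_peaks):
    """Frequency in Hz of a signal whose peaks are this many samples apart."""
    return sample_rate / samples_between_peaks

def seconds_needed(bin_width_hz, sample_rate=44100):
    """Length of audio (in seconds) needed for a given FFT bin width."""
    n_samples = sample_rate / bin_width_hz
    return n_samples / sample_rate

bin_width = fft_bin_width(44100, 1024)           # roughly 43 Hz
peak_freq = freq_from_peak_spacing(44100, 100)   # 441 Hz
one_hz_window = seconds_needed(1.0)              # 1 second

print(bin_width, peak_freq, one_hz_window)
```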

Example raw and low-passed signal of a A-440hz piano sample. You could calculate a frequency by taking average width between peaks of the low-passed signal, or by taking the total distance between start and end peaks, and dividing it by the number of peaks.



Main question: How should I set the peak widths? The input requires an array. I'd expect it to take min and max value, but instead it seems to want many values; examples online show using a range. I can figure out a range of peak values that may work: Ie: the width could be 1/4 of the fft-calculated freq. How should I set up the array? I can narrow down the expected peak width pretty closely given I have a nearby frequency, and no harmonics. Should the width array just be one value? Can anyone explain how to use the other, optional parameters?

Dominoes fucked around with this message at 16:10 on Aug 30, 2013

onionradish
Jul 6, 2006

That's spicy.
A helpful resource that should probably be added to the OP is the pyvideo.org website, which archives presentations given at various Python conferences.

Some speakers and presentations are better than others, but there are gems of useful, practical information on unit tests and web frameworks (Flask to Django), standard and other modules (Requests, Pygame, etc.), details on core capabilities like iterators/generators, astronomy and other specialized topics, and so on, usually with links to sample code.

accipter
Sep 12, 2003

Dominoes posted:

Scipy question for anyone familiar with Find_peaks_CWT or Scipy signal analysis.

I'm writing an instrument tuner. I'm trying a technique that uses a hybrid of FFT and calculating peak widths.

1: FFT precision seems to be equal to sample rate/size. For example, (44100hz rate / 1024 sample size) = precision of 43hz. With this in mind, a large sample size is required for acceptable precision, especially at low frequencies due to the logarithmic scaling of pitch. Ie to get 1hz precision, you need 1 second of data, meaning slow, choppy updates.

2: I tried a basic windowing method involving a long sample that adds new chunks to the end and slices chunk off at the beginning. This doesn't solve the problem: It just replaces slow analysis/update rates with lagged response. I don't think a smarter windowing method would solve this.

3: Calculating the width between peaks, then dividing the sample rate by it gives a precise result. (Ie: 44100hz / 100 sample peak distance) = 441hz signal. However, it requires a smooth signal without harmonics, is quirky, and I don't fully understand how the Scipy peak finding algorithm works. The docs are barebones.

4: I'm trying a hybrid approach. Use a low-sample size FFT (fast, imprecise), take the maximum FFT response, and feed that into a peak-finding algorithm. Set a low-pass filter based on the FFT-calculated freq to make the signal smooth and easy for the peak finder to understand. Set expected peak width based on the fft-calculated freq. Average however many peak widths are in the sample, and throw out outliers.

Example raw and low-passed signal of a A-440hz piano sample. You could calculate a frequency by taking average width between peaks of the low-passed signal, or by taking the total distance between start and end peaks, and dividing it by the number of peaks.



Main question: How should I set the peak widths? The input requires an array. I'd expect it to take min and max value, but instead it seems to want many values; examples online show using a range. I can figure out a range of peak values that may work: Ie: the width could be 1/4 of the fft-calculated freq. How should I set up the array? I can narrow down the expected peak width pretty closely given I have a nearby frequency, and no harmonics. Should the width array just be one value? Can anyone explain how to use the other, optional parameters?

Regarding the Fourier transform. The time step (dt) of the signal defines the maximum frequency (1/(2 dt)) -- which is called the Nyquist frequency. The duration of signal defines the lowest frequency.

Since you mentioned that you are doing an instrument tuner, it sounds like you want to find the dominant frequency of the signal. If you plot the Fourier Amplitude spectrum (FAS), the dominant frequency of the signal corresponds to the peak in the FAS. So rather than trying to find peaks in the time series, take the FFT, smooth the FAS, and find the peak.

Dominoes
Sep 20, 2007

accipter posted:

Regarding the Fourier transform. The time step (dt) of the signal defines the maximum frequency (1/(2 dt)) -- which is called the Nyquist frequency. The duration of signal defines the lowest frequency.

Since you mentioned that you are doing an instrument tuner, it sounds like you want to find the dominant frequency of the signal. If you plot the Fourier Amplitude spectrum (FAS), the dominant frequency of the signal corresponds to the peak in the FAS. So rather than trying to find peaks in the time series, take the FFT, smooth the FAS, and find the peak.
Here's a similar example with FFT of a 440hz piano sound plotted:


If I understand correctly, what you're describing is the method I use to get the reference FFT frequency I use to set the peak finder's parameters. The problem I'm encountering with FFT is that the peak in the subplot3 is 430.7hz instead of 440hz, due to its 44100hz/1024=43hz precision. The only output it can give is in increments of 43hz, with this sample rate and size. The low pass filter that generates the subplot2 signal is based on this 430hz reference freq.

The peak finder gives 100.22 sample average peak spacing on the subplot2 signal in the chart. Dividing 44100hz (sample rate) by this gives 440.02hz. Unfortunately, the peak finder is wonky and doesn't always act as expected. I think I can fix this by changing its parameters, which I don't know how to do.
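
The 430.7hz effect can be reproduced without the piano sample. This is only a sketch: it uses a pure synthetic 440hz sine and a slow DFT written straight from the definition (standard library only) rather than scipy, and it scans just the first 40 bins to keep it quick:

```python
import cmath
import math

RATE = 44100      # samples per second, as above
N = 1024          # samples per window, as above
TONE_HZ = 440.0   # synthetic test tone (an assumption, not the piano sample)

samples = [math.sin(2 * math.pi * TONE_HZ * n / RATE) for n in range(N)]

def dft_magnitude(signal, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    n_total = len(signal)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_total)
                   for n, x in enumerate(signal)))

# Bins 0..39 cover 0 Hz up to about 1680 Hz, enough for this tone.
magnitudes = [dft_magnitude(samples, k) for k in range(40)]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * RATE / N

print(peak_bin, peak_hz)
```

The strongest bin is number 10, which sits at about 430.66hz, because 440hz falls between bin 10 and bin 11 (about 473.7hz) and bin 10 is the closer one.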

Dominoes fucked around with this message at 17:21 on Aug 30, 2013

fritz
Jul 26, 2003

Dominoes posted:

Scipy question for anyone familiar with Find_peaks_CWT or Scipy signal analysis.


My first thought is I would not look for peaks in the Fourier spectrum directly. Check a book on statistical signal processing, there are other ways of estimating sinusoids in noise that should be useful here (I realize there are harmonics, but maybe your pre-filtering and bandpass stages will be enough?)

Modern Pragmatist
Aug 20, 2008

Dominoes posted:

Here's a similar example with FFT of a 440hz piano sound plotted:


If I understand correctly, what you're describing is the method I use to get the reference FFT frequency I use to set the peak finder's parameters. The problem I'm encountering with FFT is that the peak in the subplot3 is 430.7hz instead of 440hz, due to its 44100hz/1024=43hz precision. The only output it can give is in increments of 43hz, with this sample rate and size. The low pass filter that generates the subplot2 signal is based on this 430hz reference freq.

The peak finder gives 100.22 sample average peak spacing on the subplot2 signal in the chart. Dividing 44100hz (sample rate) by this gives 440.02hz. Unfortunately, the peak finder is wonky and doesn't always act as expected. I think I can fix this by changing its parameters, which I don't know how to do.

Why not upsample the original signal so that you have more than 1024 samples in your window and therefore better frequency resolution?

Also be careful where your 0 point is in your power spectrum. Zero should be the DC point and therefore be the largest peak in the spectrum. The first point in your spectrum is not the DC point but the one just outside of it.

Modern Pragmatist fucked around with this message at 17:49 on Aug 30, 2013

Dominoes
Sep 20, 2007

Modern Pragmatist posted:

Why not upsample the original signal so that you have more than 1024 samples in your window and therefore better frequency resolution?
That sounds like point 2 in my original post: It causes the response to be delayed when the frequency changes. 1 cent precision is ideal. cents = 1200 * math.log2(measured_freq/note_freq) Feeding 441 and 440 in as the measured and note freqs gives about 4 cents, so 1hz near 440hz = 4 cents, i.e. 1 cent = 0.25hz. Given the precision formula I observed, you'd need a 4-second sample to get 1 cent precision at A440 using FFTs alone. Longer at lower freqs, shorter at higher.
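The arithmetic, worked through as a rough sketch. It assumes the "precision formula" is the usual FFT bin width, sample_rate / window_size (equivalently 1 / window length in seconds); hz_per_cent is just a helper name made up for this example.

```python
import math

def cents(measured_freq, note_freq):
    """Pitch difference in cents between a measured and a reference frequency."""
    return 1200 * math.log2(measured_freq / note_freq)

def hz_per_cent(note_freq):
    """Width of one cent in Hz just above note_freq (hypothetical helper)."""
    return note_freq * (2 ** (1 / 1200) - 1)

sample_rate = 44100
window = 1024

# Assumed precision formula: FFT bin width = sample_rate / window size
bin_width = sample_rate / window

one_hz_in_cents = cents(441, 440)   # how many cents a 1 Hz error is at A440
one_cent_hz = hz_per_cent(440)      # how many Hz one cent is at A440
seconds_needed = 1 / one_cent_hz    # window length whose bin width is one cent

print(round(bin_width, 2), round(one_hz_in_cents, 2),
      round(one_cent_hz, 3), round(seconds_needed, 2))
```

So a 1024-sample window gives bins about 43hz wide, and getting the bin width down to one cent at A440 takes a window of roughly 4 seconds.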

quote:

Also be careful where your 0 point is in your power spectrum. Zero should be the DC point and therefore be the largest peak in the spectrum. The first point in your spectrum is not the DC point but the one just outside of it.
Thanks - I'll read about and fix that. It's currently the scipy default, with the negative side removed.

Dominoes fucked around with this message at 18:02 on Aug 30, 2013

QuarkJets
Sep 8, 2008

Dominoes posted:

That sounds like point 2 in my original post: It causes the response to be delayed when the frequency changes. 1 cent precision is ideal. cents = 1200 * math.log2(measured_freq/note_freq) Feeding 441 and 440 in as the measured and note freqs gives about 4 cents, so 1hz near 440hz = 4 cents, i.e. 1 cent = 0.25hz. Given the precision formula I observed, you'd need a 4-second sample to get 1 cent precision at A440 using FFTs alone. Longer at lower freqs, shorter at higher.

Thanks - I'll read about and fix that. It's currently the scipy default, with the negative side removed.

In point 2 of your original post you said that you used a basic windowing method. Upsampling is not the same thing as windowing

Try using scipy.signal.resample

evensevenone
May 12, 2001
Glass is a solid.
Upsampling won't work. The data isn't there. A 440hz sine sampled at 44khz for 1000 samples won't actually look any different than a 430hz signal, at least not in a way that can lead you to discriminate the two.

Casyl
Feb 19, 2012

Dominoes posted:

Main question: How should I set the peaks width? The input requires an array. I'd expect it to take min and max value, but instead it seems to want many values; examples online show using a range. I can figure out a range of peak values that may work, i.e. the width could be 1/4 of the fft-calculated freq. How should I set up the array? I can narrow down the expected peak width pretty closely given I have a nearby frequency, and no harmonics. Should the width array just be one value? Can anyone explain how to use the other, optional parameters?

I've messed around with scipy's implementation of cwt a bit, but it's a tough subject to get around so I didn't make all that much progress, but I'll share what I've figured out. scipy.signal.cwt takes your input vector and does the wavelet transform using each width, returning an array where each element is the smoothed vector at one width. So if your widths were [1, 2, 3, 4, 5] then your returned cwt array would have 5 entries, the first being the wavelet transform of the input vector using width = 1, the second being the cwt with width = 2, and so on. From what I can gather, when you use signal.find_peaks_cwt, the algorithm somehow looks at all of these wavelet transforms to determine if a peak is there or not (through "ridges" and whatever else it talks about), and returns an array of peak positions (indices into the input vector, not the values at the peaks). The find_peaks_cwt page references this paper that goes into more depth about how it picks the peaks, but it's really dense. If you can get through it, it may help you get a sense of what the other parameters do.

tzec
Oct 6, 2003

HAMMERTIME!

evensevenone posted:

Upsampling won't work. The data isn't there. A 440hz sine sampled at 44khz for 1000 samples won't actually look any different than a 430hz signal, at least not in a way that can lead you to discriminate the two.

That isn't true; there's plenty to discriminate them (as a quick test, you can hear the difference with your ears even on a 1000 sample burst). Upsampling is a very inefficient way to improve things, though.

Dominoes, if your signals are going to be clean like your examples, you might try simply bandpassing the signal and then counting the zero crossings (no need to use a peak finder). You can do interpolation to get sub-sample accuracy if you need to. If you want, you can try estimating the bandpass center frequency from the DFT peak.
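Here's a minimal sketch of the zero-crossing idea with linear interpolation for sub-sample accuracy. It runs on a clean synthetic 440hz sine, so the bandpass step is skipped entirely; zero_crossing_freq and the test signal are made up for this example, not taken from anyone's actual code.

```python
import math

def zero_crossing_freq(samples, sample_rate):
    """Estimate frequency from the spacing of rising zero crossings."""
    crossings = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0.0 <= cur:
            # Linear interpolation: fractional sample index where the
            # straight line between prev and cur hits zero
            crossings.append(i - 1 + (-prev) / (cur - prev))
    if len(crossings) < 2:
        raise ValueError("need at least two rising zero crossings")
    # Average period in samples between the first and last crossing
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period

# Assumed test signal: pure 440hz sine, 1024 samples at 44100hz
sample_rate = 44100
signal = [math.sin(2 * math.pi * 440.0 * i / sample_rate + 0.3)
          for i in range(1024)]

estimate = zero_crossing_freq(signal, sample_rate)
print(estimate)
```

On a clean sine this lands far closer to 440hz than the 43hz FFT bins allow with the same 1024 samples. Real recordings with noise or strong harmonics will produce extra crossings, which is what the bandpass step is for.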

Or you could use the autocorrelation (with a sliding window), which is a standard way of estimating the fundamental (e.g. see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.1676&rep=rep1&type=pdf for a mathematical overview).

Alternatively, this paper http://courses.physics.illinois.edu/phys406/NSF_REU_Reports/2005_reu/Real-Time_Time-Domain_Pitch_Tracking_Using_Wavelets.pdf shows a simple to implement and efficient wavelet pitch tracking algorithm (there's a C implementation at http://www.schmittmachine.com/dywapitchtrack.html )

fritz
Jul 26, 2003

Go read something like this: http://www.wpi.edu/Pubs/ETD/Available/etd-042511-144644/unrestricted/YLiao.pdf starting at about page 13. Or this: http://www.mathworks.com/help/signal/ref/rootmusic.html. Or do an autocorrelation like tzec suggested. Are you insisting on the fourier spectrum for non-technical reasons?

Dominoes
Sep 20, 2007

Casyl posted:

I've messed around with scipy's implementation of cwt a bit, but it's a tough subject to get around so I didn't make all that much progress, but I'll share what I've figured out. scipy.signal.cwt takes your input vector and does the wavelet transform using each width, returning an array where each element is the smoothed vector at one width. So if your widths were [1, 2, 3, 4, 5] then your returned cwt array would have 5 entries, the first being the wavelet transform of the input vector using width = 1, the second being the cwt with width = 2, and so on. From what I can gather, when you use signal.find_peaks_cwt, the algorithm somehow looks at all of these wavelet transforms to determine if a peak is there or not (through "ridges" and whatever else it talks about), and returns an array of peak positions (indices into the input vector, not the values at the peaks). The find_peaks_cwt page references this paper that goes into more depth about how it picks the peaks, but it's really dense. If you can get through it, it may help you get a sense of what the other parameters do.

tzec posted:

That isn't true; there's plenty to discriminate them (as a quick test, you can hear the difference with your ears even on a 1000 sample burst). Upsampling is a very inefficient way to improve things, though.

Dominoes, if your signals are going to be clean like your examples, you might try simply bandpassing the signal and then counting the zero crossings (no need to use a peak finder). You can do interpolation to get sub-sample accuracy if you need to. If you want, you can try estimating the bandpass center frequency from the DFT peak.

Or you could use the autocorrelation (with a sliding window), which is a standard way of estimating the fundamental (e.g. see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.1676&rep=rep1&type=pdf for a mathematical overview).

Alternatively, this paper http://courses.physics.illinois.edu/phys406/NSF_REU_Reports/2005_reu/Real-Time_Time-Domain_Pitch_Tracking_Using_Wavelets.pdf shows a simple to implement and efficient wavelet pitch tracking algorithm (there's a C implementation at http://www.schmittmachine.com/dywapitchtrack.html )

fritz posted:

Go read something like this: http://www.wpi.edu/Pubs/ETD/Available/etd-042511-144644/unrestricted/YLiao.pdf starting at about page 13. Or this: http://www.mathworks.com/help/signal/ref/rootmusic.html. Or do an autocorrelation like tzec suggested. Are you insisting on the fourier spectrum for non-technical reasons?
Thanks a lot - I'll try these. Fritz - I'm not attached to Fourier spectra, it was just the first thing I found that seemed reasonable.

edit: Upsampling/FFT (and multiplying the fft freq accordingly) produced the same precision problems as the original signal's FFT when I upsampled with scipy.

Dominoes fucked around with this message at 01:00 on Aug 31, 2013

BigRedDot
Mar 6, 2008

Hammerite posted:

In Python it's generally recommended to deal with things like this using exceptions. So instead of

code:
if file exists and is not being used:
   do things with file
else:
   complain to user or whatever
you do

code:
try:
   do things with file
except IOError:
   complain to user or whatever
This is often summarised by the saying "it is easier to ask forgiveness than permission" (EAFP).

Whether you should "look before you leap" or "ask for forgiveness" depends on how often you expect the "fail" case to occur. For instance, if you put a try-except in a loop that fails even 10% of the time, you can incur a performance penalty upwards of 30%.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

BigRedDot posted:

Whether you should "look before you leap" or "ask for forgiveness" depends on how often you expect the "fail" case to occur. For instance, if you put a try-except in a loop that fails even 10% of the time, you can incur a performance penalty upwards of 30%.
I think it's important to note that even if you "look before you leap" you still have to handle the possible exception. There's a race condition, where something can happen in between "looking" and "leaping", which if not handled will lead to extremely subtle and hard-to-reproduce bugs.

So the two correct versions are:
code:
try:
  do things
except IOError:
  handle error
and
code:
if can safely use file:
  try:
    do things
  except IOError:
    handle error
else:
  handle error
The second version is still useful sometimes, as BigRedDot points out, as a performance consideration or to choose a most-likely-to-succeed plan. But correct code, in general, must always handle errors using exceptions, because it is impossible to detect error conditions before actually attempting whatever action.
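For reference, a runnable sketch of the two versions above, assuming "do things" means reading a text file and "handle error" means returning None. read_eafp and read_lbyl are names made up for this example.

```python
import os
import tempfile

def read_eafp(path):
    """Version 1: just try it, and handle failure through the exception."""
    try:
        with open(path) as f:
            return f.read()
    except IOError:  # IOError is an alias of OSError on Python 3
        return None

def read_lbyl(path):
    """Version 2: cheap check first, but still guard the actual open()."""
    if os.path.isfile(path):
        try:
            with open(path) as f:
                return f.read()
        except IOError:
            # The file vanished or became unreadable between check and open
            return None
    else:
        return None

# Demo: one file that exists and one path that does not
fd, existing = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("hello")
missing = existing + ".does-not-exist"

results = [read_eafp(existing), read_lbyl(existing),
           read_eafp(missing), read_lbyl(missing)]
os.remove(existing)
print(results)  # ['hello', 'hello', None, None]
```

Note that read_lbyl still needs the inner try/except: the isfile() check only saves the cost of raising an exception in the common failure case, it doesn't close the race.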

ShoulderDaemon fucked around with this message at 04:32 on Aug 31, 2013
