|
Lexical Unit posted:
Mmm, more like: Python code:
Or maybe there's an easier way to do this, does anyone know Beautiful Soup
|
# ? Aug 29, 2013 00:21 |
|
|
accipter posted:Instead of telling the GUI to update, emit a signal from the model/data class. On the GUI side of things, connect to the signal of the model/data class and update the GUI as needed. edit: Got it working. It appears that the signal needs to be created in a QObject class in the child module to avoid unbound signal issues, and it needs to be referenced from an instance of this class. Dominoes fucked around with this message at 03:09 on Aug 29, 2013 |
# ? Aug 29, 2013 00:37 |
|
Pollyanna posted:Say I have a list like [a, b, c, 0, d, e, f, 0, g, h, i]. How do I extract all elements between instances of 0? Python code:
Ok, not a one-liner...though it could be. I converted the 0's to strings because I see in your example with real data they are strings. Python code:
Thermopyle fucked around with this message at 01:06 on Aug 29, 2013 |
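For reference, a minimal sketch of one way to split a list on a marker value. It is not necessarily what was posted above; the helper name split_on and the sample data are made up for illustration, and the marker is the string "0" as in the real data mentioned.

```python
def split_on(items, sentinel):
    """Split items into sublists at every occurrence of sentinel."""
    groups = [[]]
    for item in items:
        if item == sentinel:
            groups.append([])        # a marker starts a new group
        else:
            groups[-1].append(item)  # anything else joins the current group
    return groups


data = ["a", "b", "c", "0", "d", "e", "f", "0", "g", "h", "i"]
groups = split_on(data, "0")
# Keep only the runs that have a marker on both sides.
between = groups[1:-1]
```

Dropping the first and last group is what turns "split on the marker" into "between instances of the marker".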
# ? Aug 29, 2013 00:59 |
|
Pollyanna posted:Say I have a list like [a, b, c, 0, d, e, f, 0, g, h, i]. How do I extract all elements between instances of 0? Kinda ugly but: code:
|
# ? Aug 29, 2013 01:12 |
|
I think BS works by taking the contents of a set of tags, and I can't make it take two <hr/> tags as input...I think. Maybe I'm approaching this wrong...there has to be an easier way to do this, I just need to find out how. Maybe if I can get it to output everything in <body> tags that is not between <a> or <h1> or anything...
|
# ? Aug 29, 2013 01:14 |
|
Thermopyle posted:
code:
code:
Hammerite fucked around with this message at 01:50 on Aug 29, 2013 |
# ? Aug 29, 2013 01:47 |
|
Pollyanna posted:and what I need is the string between both <hr/> tags. In your example, I'm looking for [d, e, f]. Start at the first <hr/> tag and keep going until you hit the other? e.g. Python code:
|
# ? Aug 29, 2013 02:22 |
|
FoiledAgain posted:Supposing the first element fails the test, the program still has to access that list 5001 times before realizing that, right? It appends 5000 elements to the list first, passes that list to all(), checks 1 element and returns False. Worst case, I have to append 5000 times and then check 5000 times. Yes, you have got the idea. If you use a list comprehension it must determine the value of every element in the list before it does anything with it. A generator determines its values as it goes, so there is no need to figure it all out in advance, which is advantageous if figuring it out takes a while. To put it another way, this: [test(x) for x in whatever] is effectively (and perhaps literally, I forget) the same thing as this: list(test(x) for x in whatever). You can also see there that you do not need extra parens to create a generator expression when it is immediately inside other parens, so you can just do all(test(x) for x in X).
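A small sketch of the difference, assuming a made-up predicate is_positive that records every value it is asked about. The first element fails, so the generator version should stop after a single call while the list version runs all 5000.

```python
seen = []


def is_positive(x):
    seen.append(x)  # record every value the predicate is called with
    return x > 0


values = [-1] + list(range(1, 5000))  # 5000 items, the first one fails

seen.clear()
all([is_positive(x) for x in values])  # the whole list is built before all() runs
list_calls = len(seen)

seen.clear()
all(is_positive(x) for x in values)    # all() pulls one value, sees False, stops
gen_calls = len(seen)
```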
|
# ? Aug 29, 2013 03:22 |
|
Pollyanna posted:Say I have a list like [a, b, c, 0, d, e, f, 0, g, h, i]. How do I extract all elements between instances of 0? Python code:
keanu fucked around with this message at 03:34 on Aug 29, 2013 |
# ? Aug 29, 2013 03:32 |
|
I'll start trying to get the text directly from the elements for now, before I try and tackle a string. suffix posted:Start at the first <hr/> tag and keep going until you hit the other? Thing is that I don't think BS takes "document.hr" as valid, at least not when I've tried to do so. I know that contents outputs a list, so I tried using this on that list. Unfortunately, it gives me this: Python code:
edit: I think takewhile does something different than what I need, it looks like the way it's written right now it takes everything that is NOT <hr/>, which is not quite what I need. I need it to not take anything until it FINDS an <hr/>, and then stop once it sees an <hr/> again... lemme mull over this some more. edit2: i swear i'm missing something really obvious here Pollyanna fucked around with this message at 05:39 on Aug 29, 2013 |
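A rough sketch of the "skip until the first marker, then take until the next one" idea using only itertools. The function name between_markers and the list of plain strings are stand-ins for illustration; real Beautiful Soup elements would need a different is_marker test.

```python
from itertools import dropwhile, islice, takewhile


def between_markers(items, is_marker):
    """Return the items after the first marker and before the next one."""
    # Skip everything that comes before the first marker.
    rest = dropwhile(lambda item: not is_marker(item), items)
    # Step past the opening marker itself.
    rest = islice(rest, 1, None)
    # Collect items until the closing marker shows up.
    return list(takewhile(lambda item: not is_marker(item), rest))


parts = ["title", "<hr/>", "d", "e", "f", "<hr/>", "footer"]
result = between_markers(parts, lambda item: item == "<hr/>")
```

Note that if there is no closing marker this returns everything after the first one, so it may need an extra check depending on the page.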
# ? Aug 29, 2013 05:35 |
|
Pollyanna posted:edit: I think takewhile does something different than what I need, it looks like the way it's written right now it takes everything that is NOT <hr/>, which is not quite what I need. I need it to not take anything until it FINDS an <hr/>, and then stop once it sees an <hr/> again... lemme mull over this some more. Out of curiosity, why didn't my solution earlier work? If you have a single symbol that's "bookending" your desired elements within the list, it should do. edit: oh I see below now it's actually a way more complicated issue, not just finding a sequence of things in a list. breaks posted:Yes, you have got the idea. If you use a list comprehension it must determine the value of every element in the list before it does anything with it. A generator determines its values as it goes, so there is no need to figure it all out in advance, which is advantageous if figuring it out takes a while. Oh good. Thank you! FoiledAgain fucked around with this message at 08:02 on Aug 29, 2013 |
# ? Aug 29, 2013 06:34 |
|
Pollyanna posted:I'll start trying to get the text directly from the elements for now, before I try and tackle a string. I know the background of what you're doing here since I read that other thread. That site you're trying to parse is kinda shittily written in that it doesn't do a good job of using the HTML DOM -- it has more than one h1 element (allowed but not recommended, the different types of headers are supposed to be sublevels starting with an h1 at the top), doesn't use anything like class to describe what each tag is supposed to represent, and separates content only by <hr/> tags instead of nested tags. Beautiful Soup lets you move sideways from a tag with its .next_sibling and .previous_sibling attributes. There are also .next_siblings and .previous_siblings, which are iterators over everything that comes after/before the tag at the same level. An iterator is just an object whose purpose is to move through a collection, like an iterator for a list would typically just go through each element one by one. That output you posted in angle brackets is just Python's default way of printing an object that doesn't define its own __repr__ (Python's equivalent of a toString method), so it prints a way of identifying what the object is instead of what the object represents. If the site was laid out better you could just do something like: code:
code:
edit: also itertools.takewhile is the better choice of thing to use than whatever I just came up with. edit: itertools.takewhile (and everything else in itertools I'm guessing) returns an iterator object instead of a collection of the things it iterates on, to get the next element in an iterator you use "iterator.next()", typically there would also be a method for an iterator called something like hasNext() so you know when the iterator can't iterate no more but python doesn't seem to have that? And instead just raises a StopIteration exception you have to catch? So it's something like this I guess: code:
code:
piratepilates fucked around with this message at 07:40 on Aug 29, 2013 |
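A short sketch of how iterators end in Python, assuming a toy takewhile over a list of numbers. There is no hasNext(); the usual options are a for loop (which handles StopIteration for you) or the built-in next() with a default value.

```python
from itertools import takewhile

numbers = takewhile(lambda n: n < 3, [0, 1, 2, 3, 4])

first = next(numbers)   # built-in next(); works on Python 2.6+ and 3
collected = [first]
for n in numbers:       # the for loop stops cleanly when the iterator runs out
    collected.append(n)

# With a default, next() returns it instead of raising StopIteration.
exhausted = next(numbers, None)
```

If all the values are needed at once, list(takewhile(...)) avoids calling next() by hand at all.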
# ? Aug 29, 2013 07:06 |
|
So I'm working on extracting data (1600) from a tarball with 86,000 files and clocking in at 2.4GB. I've got a list of the stations I want to extract but the process is s l o w. Here is my code, am I missing a quicker way of doing this? I think the slow part is the line with extractfile. code:
|
# ? Aug 29, 2013 18:15 |
|
SirPablo posted:am I missing a quicker way of doing this? Yes. Tar files don't provide random access to their contents; to get at a certain file you have to read the tar file up to its position. This can be optimized by seeking from one tar entry to the next, but you're still taking O(n^2) time to do something that can be done in O(n). You should iterate over the TarInfo entries with TarFile.next() and extract each entry if it matches your criteria.
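A sketch of the single-pass approach, assuming the wanted names are kept in a set. The function name extract_wanted and the ".dly" file names are invented for the demo, and the archive is built in memory so the example runs anywhere. Iterating over the TarFile object walks the entries in order, the same as calling TarFile.next() repeatedly.

```python
import io
import tarfile


def extract_wanted(fileobj, wanted):
    """Read the archive once, keeping only members whose name is in wanted."""
    found = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:  # sequential walk, never rewinds the archive
            if member.isfile() and member.name in wanted:
                found[member.name] = tar.extractfile(member).read()
    return found


# Build a tiny in-memory archive to stand in for the real tarball.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ("a.dly", "b.dly", "c.dly"):
        payload = name.encode("ascii")
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
buffer.seek(0)

found = extract_wanted(buffer, {"a.dly", "c.dly"})
```

Using a set for the wanted names keeps each membership check cheap, which matters with 86,000 entries.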
|
# ? Aug 29, 2013 18:28 |
|
I can't tell what proportion of the lines have TMAX in them, but if you're ignoring most of them it will probably be a lot faster to select those lines outside of python with grep or similar and create files that have only the lines you want for python. The speed difference between grepping 2.4GB of text and looping line by line over 2.4GB of text in python is quite large. This might help even if you are using most of the lines, since it would avoid doing two passes through each file in python.
|
# ? Aug 29, 2013 18:45 |
|
Lysidas posted:Yes. Tar files don't provide random access to their contents; to get at a certain file you have to read the tar file up to its position. This can be optimized by seeking from one tar entry to the next, but you're still taking O(n^2) time to do something that can be done in O(n). You should iterate over the TarInfo entries with TarFile.next() and extract each entry if it matches your criteria. So sounds like right now I'm iterating over the whole drat thing each time which isn't necessary. Just go through it once and check each entry in the tar against my list of files I want - find a match and suck it out. Thanks! Nippashish posted:I can't tell what proportion of the lines have TMAX in them, but if you're ignoring most of them it will probably be a lot faster to select those lines outside of python with grep or similar and create files that have only the lines you want for python. The speed difference between grepping 2.4GB of text and looping line by line over 2.4GB of text in python is quite large. This might help even if you are using most of the lines, since it would avoid doing two passes through each file in python. Probably not the stumbling block since that is all in the numpy arrays and they are quite speedy. Issue is probably what was identified by Lysidas, but I'll find out once I make some changes. Thanks for the suggestion.
|
# ? Aug 29, 2013 19:36 |
|
So is there a goon-consensus preferred introduction to Python, or are any of them in the OP good? I'm in a situation where I need to learn the basics of Python as quick as I can, but I don't have any access to formal instruction. I'm self motivated, but I have attention issues, so if the reading is too slow, I'll get bored, and if it's too fast, I'll probably just lose focus. I really wish the CS department at my school wasn't so impacted. I'd just take the intro to Python class if I could, but that's not really an option.
|
# ? Aug 29, 2013 21:30 |
|
Philip Rivers posted:So is there a goon-consensus preferred introduction to Python, or are any of them in the OP good? I'm in a situation where I need to learn the basics of Python as quick as I can, but I don't have any access to formal instruction. I'm self motivated, but I have attention issues, so if the reading is too slow, I'll get bored, and if it's too fast, I'll probably just lose focus. I don't think that there's a goon-consensus on any of these, pretty much anything in the OP is going to be okay. The MIT class looks good
|
# ? Aug 29, 2013 21:45 |
|
Philip Rivers posted:So is there a goon-consensus preferred introduction to Python, or are any of them in the OP good? I'm in a situation where I need to learn the basics of Python as quick as I can, but I don't have any access to formal instruction. I'm self motivated, but I have attention issues, so if the reading is too slow, I'll get bored, and if it's too fast, I'll probably just lose focus. If you know how to program already then Dive Into Python might work for you.
|
# ? Aug 30, 2013 03:35 |
|
Had a question about functions. Codecademy has me do program challenges and sometimes it will say make an argument called something_one that takes something as an argument. Other times it will ask to take something as input. I keep getting confused about the two. Could someone explain the difference between arguments and input. I thought arguments and input were what you put in (), like in something_one(input/argument).
|
# ? Aug 30, 2013 03:45 |
|
digitalcamo posted:Had a question about functions. Codecademy has me do program challenges and sometimes it will say make an argument called something_one that takes something as an argument. Other times it will ask to take something as input. I keep getting confused about the two. Could someone explain the difference between arguments and input. I thought arguments and input were what you put in (), like in something_one(input/argument). In this particular context I think the terms are interchangeable. You can say that a function "has inputs x and y and output z" or "takes arguments x and y and returns z" and it means the same thing.
|
# ? Aug 30, 2013 04:00 |
|
digitalcamo posted:Had a question about functions. Codecademy has me do program challenges and sometimes it will say make an argument called something_one that takes something as an argument. Other times it will ask to take something as input. I keep getting confused about the two. Could someone explain the difference between arguments and input. I thought arguments and input were what you put in (), like in something_one(input/argument). I think it's just perspective. Arguments are "part of" the function; inputs come from outside the function. A function that takes a number as an argument could have an input of 3 or 7 or 12009 or even a string like 'hello' (in which case it should probably raise an error).
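To make that concrete, a tiny sketch with a made-up function called double. The name in the def line is the argument; whatever value gets passed when calling it is the input.

```python
def double(number):  # "number" is the argument the function is written around
    if not isinstance(number, (int, float)):
        raise TypeError("expected a number, got %r" % (number,))
    return number * 2


result = double(3)  # 3 is the input for this particular call
```

Calling double('hello') raises the TypeError instead of quietly returning 'hellohello'.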
|
# ? Aug 30, 2013 05:51 |
|
Is it possible to read from a file on Windows while another process is writing to it? I'm trying to process logfile updates as fast as they are written but whenever I read some data the writer process complains about the file being used by another process and dies. e:did it by checking for a lock and then reading. race conditions ahoy! Ireland Sucks fucked around with this message at 09:14 on Aug 30, 2013 |
# ? Aug 30, 2013 07:30 |
|
Ireland Sucks posted:Is it possible to read from a file on Windows while another process is writing to it? I'm trying to process logfile updates as fast as they are written but whenever I read some data the writer process complains about the file being used by another process and dies. In Python it's generally recommended to deal with things like this using exceptions. So instead of code:
code:
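A minimal sketch of the exception-based approach for the reader side, assuming a hypothetical helper called read_when_available. It only covers the reader: whether the writer tolerates another process having the file open depends on how the writer opens it, which this cannot change.

```python
import time


def read_when_available(path, attempts=5, delay=0.1):
    """Try to read path, retrying a few times if the OS refuses access."""
    for _ in range(attempts):
        try:
            with open(path, "r") as handle:
                return handle.read()
        except OSError:
            # Locked, missing, or otherwise unavailable: wait and try again.
            time.sleep(delay)
    return None  # gave up after the last attempt
```

Compared with checking for a lock first, there is no gap between the check and the open for another process to sneak into.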
|
# ? Aug 30, 2013 10:59 |
|
keanu posted:If you know how to program already then Dive Into Python might work for you. The MIT intro stuff (especially the video lectures) is great, and the Python standard documentation is generally (not always) very readable once you have the basics down.
|
# ? Aug 30, 2013 14:55 |
|
Scipy question for anyone familiar with Find_peaks_CWT or Scipy signal analysis. I'm writing an instrument tuner. I'm trying a technique that uses a hybrid of FFT and calculating peak widths.
1: FFT precision seems to be equal to sample rate / sample size. For example, (44100hz rate / 1024 sample size) = precision of 43hz. With this in mind, a large sample size is required for acceptable precision, especially at low frequencies due to the logarithmic scaling of pitch. Ie to get 1hz precision, you need 1 second of data, meaning slow, choppy updates.
2: I tried a basic windowing method involving a long sample that adds new chunks to the end and slices chunks off at the beginning. This doesn't solve the problem: it just replaces slow analysis/update rates with lagged response. I don't think a smarter windowing method would solve this.
3: Calculating the width between peaks, then dividing the sample rate by it gives a precise result. (Ie: 44100hz / 100 sample peak distance = 441hz signal.) However, it requires a smooth signal without harmonics, is quirky, and I don't fully understand how the Scipy peak finding algorithm works. The docs are barebones.
4: I'm trying a hybrid approach. Use a low-sample size FFT (fast, imprecise), take the maximum FFT response, and feed that into a peak-finding algorithm. Set a low-pass filter on the original (amplitudes) signal based on the FFT-calculated freq to make the signal smooth and easy for the peak finder to understand. Set expected peak width based on the fft-calculated freq. Average however many peak widths are in the sample, and throw out outliers.
Example raw and low-passed signal of an A-440hz piano sample. You could calculate a frequency by taking average width between peaks of the low-passed signal, or by taking the total distance between start and end peaks, and dividing it by the number of peaks.
Main question: How should I set the peaks width? The input requires an array. I'd expect it to take min and max value, but instead it seems to want many values; examples online show using a range. I can figure out a range of peak values that may work: Ie: the width could be 1/4 of the fft-calculated freq. How should I set up the array? I can narrow down the expected peak width pretty closely given I have a nearby frequency, and no harmonics. Should the width array just be one value? Can anyone explain how to use the other, optional parameters? Dominoes fucked around with this message at 16:10 on Aug 30, 2013 |
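The arithmetic in points 1 and 3 can be sketched in plain Python. The function names here are made up for illustration; the numbers are the ones from the post.

```python
import math


def fft_resolution(sample_rate, n_samples):
    """Spacing between FFT bins in Hz."""
    return sample_rate / float(n_samples)


def samples_for_resolution(sample_rate, resolution_hz):
    """Smallest number of samples that gives bins no wider than resolution_hz."""
    return int(math.ceil(sample_rate / float(resolution_hz)))


def freq_from_peak_spacing(sample_rate, spacing_in_samples):
    """Frequency implied by the distance between successive peaks."""
    return sample_rate / float(spacing_in_samples)


coarse = fft_resolution(44100, 1024)          # about 43 Hz per bin
one_hz = samples_for_resolution(44100, 1.0)   # a full second of audio
pitch = freq_from_peak_spacing(44100, 100)    # 100-sample spacing
```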
# ? Aug 30, 2013 15:33 |
|
A helpful resource that should probably be added to the OP is the pyvideo.org website, which archives presentations given at various Python conferences. Some speakers and presentations are better than others, but there are gems of useful, practical information on unit tests and web frameworks (Flask to Django), standard and other modules (Requests, Pygame, etc.), details on core capabilities like iterators/generators, astronomy and other specialized topics, and so on, usually with links to sample code.
|
# ? Aug 30, 2013 15:39 |
|
Dominoes posted:Scipy question for anyone familiar with Find_peaks_CWT or Scipy signal analysis. Regarding the Fourier transform. The time step (dt) of the signal defines the maximum frequency (1/(2 dt)) -- which is called the Nyquist frequency. The duration of the signal defines the lowest frequency. Since you mentioned that you are doing an instrument tuner, it sounds like you want to find the dominant frequency of the signal. If you plot the Fourier Amplitude spectrum (FAS), the dominant frequency of the signal corresponds to the peak in the FAS. So rather than trying to find peaks in the time series, take the FFT, smooth the FAS, and find the peak.
|
# ? Aug 30, 2013 15:48 |
|
accipter posted:Regarding the Fourier transform. The time step (dt) of the signal defines the maximum frequency (1/(2 dt)) -- which is called the Nyquist frequency. The duration of signal defines the lowest frequency. If I understand correctly, what you're describing is the method I use to get the reference FFT frequency I use to set the peak finder's parameters. The problem I'm encountering with FFT is that the peak in the subplot3 is 430.7hz instead of 440hz, due to its 44100hz/1024=43hz precision. The only output it can give is in increments of 43hz, with this sample rate and size. The low pass filter that generates the subplot2 signal is based on this 430hz reference freq. The peak finder gives 100.22 sample average peak spacing on the subplot2 signal in the chart. Dividing 44100hz (sample rate) by this gives 440.02hz. Unfortunately, the peak finder is wonky and doesn't always act as expected. I think I can fix this by changing its parameters, which I don't know how to do. Dominoes fucked around with this message at 17:21 on Aug 30, 2013 |
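A quick sketch of why the FFT peak lands at about 430.7hz, assuming the strongest bin is simply the one nearest the true pitch. The helper name nearest_bin_frequency is hypothetical.

```python
def nearest_bin_frequency(freq_hz, sample_rate, n_samples):
    """Return the index and center frequency of the FFT bin nearest freq_hz."""
    bin_width = sample_rate / float(n_samples)
    index = int(round(freq_hz / bin_width))
    return index, index * bin_width


index, center = nearest_bin_frequency(440.0, 44100, 1024)
# Neighbouring bins sit one bin width either side, so nothing closer to 440 exists.
next_center = (index + 1) * 44100 / 1024.0
```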
# ? Aug 30, 2013 16:27 |
|
Dominoes posted:Scipy question for anyone familiar with Find_peaks_CWT or Scipy signal analysis. My first thought is I would not look for peaks in the Fourier spectrum directly. Check a book on statistical signal processing, there are other ways of estimating sinusoids in noise that should be useful here (I realize there are harmonics, but maybe your pre-filtering and bandpass stages will be enough?)
|
# ? Aug 30, 2013 16:34 |
|
Dominoes posted:Here's a similar example with FFT of a 440hz piano sound plotted: Why not upsample the original signal so that you have more than 1024 samples in your window and therefore better frequency resolution? Also be careful where your 0 point is in your power spectrum. Zero should be the DC point and therefore be the largest peak in the spectrum. The first point in your spectrum is not the DC point but the one just outside of it. Modern Pragmatist fucked around with this message at 17:49 on Aug 30, 2013 |
# ? Aug 30, 2013 17:47 |
|
Modern Pragmatist posted:Why not upsample the original signal so that you have more than 1024 samples in your window and therefore better frequency resolution? quote:Also be careful where your 0 point is in your power spectrum. Zero should be the DC point and therefore be the largest peak in the spectrum. The first point in your spectrum is not the DC point but the one just outside of it. Dominoes fucked around with this message at 18:02 on Aug 30, 2013 |
# ? Aug 30, 2013 17:51 |
|
Dominoes posted:That sounds like point 2 in my original post: It causes the response to be delayed when the frequency changes. 1 cent precision is ideal. cents = 1200 * math.log2(measured_freq/note_freq) Feeding 441 and 440 in as the measured and note freqs gives about 3.9 cents, so 1 cent near 440hz is roughly 0.25hz. Given the precision formula I observed, you'd need a 4-second sample to get 1 cent precision at A440 using FFTs alone. Longer at lower freqs, shorter at higher. In point 2 of your original post you said that you used a basic windowing method. Upsampling is not the same thing as windowing. Try using scipy.signal.resample
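The cents formula quoted above, wrapped in a function, plus the inverse question of how many Hz one cent is at a given note. Both function names are made up for this sketch.

```python
import math


def cents_off(measured_freq, note_freq):
    """Distance from note_freq to measured_freq in cents (100 per semitone)."""
    return 1200 * math.log2(measured_freq / note_freq)


def hz_per_cent(note_freq):
    """Width of one cent in Hz just above note_freq."""
    return note_freq * (2 ** (1 / 1200.0) - 1)


offset = cents_off(441.0, 440.0)  # 1 Hz sharp of A440, in cents
width = hz_per_cent(440.0)        # size of one cent at A440, in Hz
```

One cent at A440 comes out near a quarter of a hertz, which is where the 4-second figure comes from if FFT bin spacing is the only source of precision.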
|
# ? Aug 30, 2013 18:19 |
|
Upsampling won't work. The data isn't there. A 440hz sine sampled at 44khz for 1000 samples won't actually look any different than a 430hz signal, at least not in a way that can lead you to discriminate the two.
|
# ? Aug 30, 2013 18:55 |
|
Dominoes posted:Main question: How should I set the peaks width? The input requires an array. I'd expect it to take min and max value, but instead it seems to want many values; examples online show using a range. I can figure out a range of peak values that may work: Ie: the width could be 1/4 of the fft-calculated freq. How should I set up the array? I can narrow down the expected peak width pretty closely given I have a nearby frequency, and no harmonics. Should the width array just be one value? Can anyone explain how to use the other, optional parameters? I've messed around with scipy's implementation of cwt a bit, but it's a tough subject to get around so I didn't make all that much progress, but I'll share what I've figured out. scipy.signal.cwt takes your input vector and does the wavelet transform using each width, returning an array where each element is the smoothed vector at one width. So if your widths were [1, 2, 3, 4, 5] then your returned cwt array would have 5 entries, the first being the wavelet transform of the input vector using width = 1, the second being the cwt with width = 2, and so on. From what I can gather, when you use signal.find_peaks_cwt, the algorithm somehow looks at all of these wavelet transforms to determine if a peak is there or not (through "ridges" and whatever else it talks about), and returns an array of peak indices. The find_peaks_cwt page references this paper that goes into more depth about how it picks the peaks, but it's really dense. If you can get through it it may help you get a sense of what the other parameters do.
|
# ? Aug 30, 2013 18:58 |
|
evensevenone posted:Up sampling wont work. The data isn't there. A 440hz sine sampled at 44khz for 1000 samples won't actually look any different than a 430hz signal, at least not in a way that can lead you to discriminate the two. That isn't true; there's plenty to discriminate them (as a quick test, you can hear the difference with your ears even on a 1000 sample burst). Upsampling is a very inefficient way to improve things, though. Dominoes, if your signals are going to be clean like your examples, you might try simply bandpassing the signal and then counting the zero crossings (no need to use a peak finder). You can do interpolation to get sub-sample accuracy if you need to. If you want, you can try estimating the bandpass center frequency from the DFT peak. Or you could use the autocorrelation (with a sliding window), which is a standard way of estimating the fundamental (e.g. see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.1676&rep=rep1&type=pdf for a mathematical overview). Alternatively, this paper http://courses.physics.illinois.edu/phys406/NSF_REU_Reports/2005_reu/Real-Time_Time-Domain_Pitch_Tracking_Using_Wavelets.pdf shows a simple to implement and efficient wavelet pitch tracking algorithm (there's a C implementation at http://www.schmittmachine.com/dywapitchtrack.html )
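A rough sketch of the zero-crossing idea with linear interpolation, assuming an already clean signal (a synthetic sine stands in for the bandpassed audio, and the bandpass step itself is left out). The function name zero_crossing_frequency is made up.

```python
import math


def zero_crossing_frequency(samples, sample_rate):
    """Estimate frequency from interpolated upward zero crossings."""
    crossings = []
    for i in range(len(samples) - 1):
        prev, cur = samples[i], samples[i + 1]
        if prev < 0 <= cur:
            # Linear interpolation gives the crossing position between samples.
            crossings.append(i + (-prev) / (cur - prev))
    if len(crossings) < 2:
        return None  # not enough full periods to measure
    periods = len(crossings) - 1
    return sample_rate * periods / (crossings[-1] - crossings[0])


def sine(freq_hz, sample_rate, n_samples, phase=0.3):
    """Synthetic test tone; the phase offset avoids starting exactly on zero."""
    step = 2 * math.pi * freq_hz / sample_rate
    return [math.sin(step * n + phase) for n in range(n_samples)]


estimate_440 = zero_crossing_frequency(sine(440.0, 44100, 1024), 44100)
estimate_430 = zero_crossing_frequency(sine(430.0, 44100, 1024), 44100)
```

On a pure sine, 1024 samples are enough to tell 440hz from 430hz this way. Real instrument audio with harmonics and noise will be messier, which is why the filtering step matters.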
|
# ? Aug 30, 2013 19:30 |
|
Go read something like this: http://www.wpi.edu/Pubs/ETD/Available/etd-042511-144644/unrestricted/YLiao.pdf starting at about page 13. Or this: http://www.mathworks.com/help/signal/ref/rootmusic.html. Or do an autocorrelation like tzec suggested. Are you insisting on the fourier spectrum for non-technical reasons?
|
# ? Aug 30, 2013 19:36 |
|
Casyl posted:I've messed around with scipy's implementation of cwt a bit, but it's a tough subject to get around so I didn't make all that much progress, but I'll share what I've figured out. scipy.signal.cwt takes your input vector and does the wavelet transform using each width, returning an array where each element is the smoothed vector at one width. So if your widths were [1, 2, 3, 4, 5] then your returned cwt array would have 5 entries, the first being the wavelet transform of the input vector using width = 1, the second being the cwt with width = 2, and so on. From what I can gather, when you use signal.find_peaks_cwt, the algorithm somehow looks at all of these wavelet transforms to determine if a peak is there or not (through "ridges" and whatever else it talks about), and returns an array of peak indices. The find_peaks_cwt page references this paper that goes into more depth about how it picks the peaks, but it's really dense. If you can get through it it may help you get a sense of what the other parameters do. tzec posted:That isn't true; there's plenty to discriminate them (as a quick test, you can hear the difference with your ears even on a 1000 sample burst). Upsampling is a very inefficient way to improve things, though. fritz posted:Go read something like this: http://www.wpi.edu/Pubs/ETD/Available/etd-042511-144644/unrestricted/YLiao.pdf starting at about page 13. Or this: http://www.mathworks.com/help/signal/ref/rootmusic.html. Or do an autocorrelation like tzec suggested. Are you insisting on the fourier spectrum for non-technical reasons? edit: Upsampling/FFT (and multiplying the fft freq accordingly) produced the same precision problems as the original signal's FFT, when I used scipy's resample. Dominoes fucked around with this message at 01:00 on Aug 31, 2013 |
# ? Aug 31, 2013 00:27 |
|
Hammerite posted:In Python it's generally recommended to deal with things like this using exceptions. So instead of Whether you should "look before you leap" or "ask for forgiveness" depends on how often you expect the "fail" case to occur. For instance if you put a try-except in a loop that will fail regularly even 10% of the time you can incur a performance penalty upwards of 30%.
|
# ? Aug 31, 2013 04:06 |
|
|
BigRedDot posted:Whether you should "look before you leap" or "ask for forgiveness" depends on how often you expect the "fail" case to occur. For instance if you put a try-except in a loop that will fail regularly even 10% of the time you can incur a performance penalty upwards of 30%. So the two correct versions are: code:
code:
ShoulderDaemon fucked around with this message at 04:32 on Aug 31, 2013 |
# ? Aug 31, 2013 04:29 |