|
BigRedDot posted:A few weeks ago my company put on PyData NYC, a conference dedicated to data analytics with python. Authors and contributors of numpy, scipy, pandas, pytables, ipython, and other projects all gave great talks to several hundred attendees. Today all the talks were made available on Vimeo, for anyone interested!
|
# ? Nov 11, 2012 21:27 |
|
|
# ? May 9, 2024 11:01 |
|
Suspicious Dish posted:You can use Gio to do the copy directly, which will use the credentials from the keyring. Perfect. Thanks! edit: not quite perfect. Your method signature is missing the second argument for a progress_callback. According to this documentation you can pass None for that argument. But when I pass None I get: code:
Thermopyle fucked around with this message at 21:57 on Nov 11, 2012 |
# ? Nov 11, 2012 21:45 |
|
There are seven arguments in the C version of the function. One is the GError out pointer, which is transformed into a Python exception. The other is the progress_callback_data, which is used by most bindings internally to store a copy of the current scope. But what I forgot is that PyGObject exposes that argument, and stores the scope and the argument passed in. It exposes that argument for compatibility reasons, I think. You should always pass None for it.
|
# ? Nov 11, 2012 22:13 |
|
Suspicious Dish posted:There are seven arguments in the C version of the function. One is the GError out pointer, which is transformed into a Python exception. The other is the progress_callback_data, which is used by most bindings internally to store a copy to the current scope. But what I forgot is that PyGObject exposes that argument, and stores the scope and the argument passed in. It exposes that argument for compatibility reasons, I think. You should always pass None for it. Ok, gotcha. What's the deal with this documentation, then? http://www.pygtk.org/docs/pygobject/class-giofile.html#method-giofile--copy Is it just plain wrong, or is there more current documentation that I can't find?
|
# ? Nov 11, 2012 22:18 |
|
That's for PyGTK, which is dead. It's certainly been confusing a lot of people, so I'll see if I can have someone who maintains the PyGTK website help people find more accurate documentation. The thing is that we don't have generated documentation for Python yet, but we're working on it, I swear. I guess the best resource right now is the gir file in /usr/share/gir-1.0/Gio.gir. PyGObject reads from this to give you the bindings at runtime.
|
# ? Nov 11, 2012 22:22 |
|
Suspicious Dish posted:That's for PyGTK, which is dead. It's certainly been confusing a lot of people, so I'll see if I can have someone who maintains the PyGTK website help people find more accurate documentation. The thing is that we don't have generated documentation for Python yet, but we're working on it, I swear. Ok, understood. Thanks for all the info, I got it working just like I needed.
|
# ? Nov 11, 2012 23:26 |
|
BigRedDot posted:A few weeks ago my company put on PyData NYC, a conference dedicated to data analytics with python. Authors and contributors of numpy, scipy, pandas, pytables, ipython, and other projects all gave great talks to several hundred attendees. Today all the talks were made available on Vimeo, for anyone interested! I've got to echo the thanks for this. I've been reading a book about interfacing Python with R as well as messing with time series data in MongoDB, so this has been a great series to watch.
|
# ? Nov 12, 2012 10:55 |
|
Stupid question that I'm too dumb to figure out on my own. I'm trying to pick up the basics of page scraping with Python, and Google has to screw everything up by having their stock price ref id be different on every stock. If I'm looking for this string: <td class=price><span id=ref_12441984_l> why doesn't this work? re.search(b'class=price><span id=(.*?)>', content)
|
# ? Nov 13, 2012 23:11 |
|
Pudgygiant posted:Stupid question that I'm too dumb to figure out on my own. I'm trying to pick up the basics of page scraping with Python, and Google has to screw everything up by having their stock price ref id be different on every stock. If I'm looking for this string: This isn't exactly answering your question, but parsing html using regexes is just a really bad idea. Use something like scrapy, or a parsing library like pyquery or lxml.html. (see here for why regexes are bad in this case)
|
# ? Nov 13, 2012 23:19 |
|
Sailor_Spoon posted:(see here for why regexes are bad in this case) I've read that so many times that I knew in my bones what you were linking to without even looking at the url, and I still clicked it and read it again because it still makes me laugh.
|
# ? Nov 13, 2012 23:45 |
|
I can't even put into words how dumb that page makes me feel. Probably because I'm too dumb to know the words.
|
# ? Nov 13, 2012 23:51 |
|
Pudgygiant posted:I can't even put into words how dumb that page makes me feel. Probably because I'm too dumb to know the words. To address your actual issue, because as bad as regexes are for html, we've all done it, I get the following: Python code:
|
# ? Nov 14, 2012 00:09 |
|
That's exactly what I want, and drat if it doesn't work fine doing it that way. Now I have to figure out why it doesn't when it's crawling the whole page. Thanks for looking at it, dude.
|
# ? Nov 14, 2012 00:13 |
|
Pudgygiant posted:Now I have to figure out why it doesn't when it's crawling the whole page
|
# ? Nov 14, 2012 08:34 |
|
I'm glad you liked the videos of the talks. We are doing another PyData on the west coast in March during the PyCon Sprints, I'll post those when they are available, too!
|
# ? Nov 14, 2012 10:28 |
|
Pudgygiant posted:That's exactly what I want, and drat if it doesn't work fine doing it that way. Now I have to figure out why it doesn't when it's crawling the whole page Just a guess, but is it possible that, since the regex is using ".*", it's being too greedy? How about this: Python code:
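The suggested pattern didn't survive the scrape, but the greedy/non-greedy difference it alludes to can be sketched like this (the HTML fragment is made up to match the shape described in the question):

```python
import re

# Made-up HTML fragment in the shape described in the question.
html = b'<td class=price><span id=ref_12441984_l>645.01</span></td>'

# Greedy (.*) matches as much as possible, up to the LAST '>' on the line.
greedy = re.search(rb'class=price><span id=(.*)>', html)
# Non-greedy (.*?) stops at the FIRST '>', capturing just the id.
lazy = re.search(rb'class=price><span id=(.*?)>', html)

print(greedy.group(1))  # b'ref_12441984_l>645.01</span></td'
print(lazy.group(1))    # b'ref_12441984_l'
```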
|
# ? Nov 14, 2012 23:53 |
|
Wait guys, hold on, give me a few minutes to get my square peg.
|
# ? Nov 14, 2012 23:58 |
|
If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes.
|
# ? Nov 15, 2012 00:00 |
|
It has been stated before, but by god don't parse HTML with regex, ever. Use lxml and be done with it. Your regex is going to break at some point and there is simply no escaping that fact. In the simplest cases of the two earlier examples, what happens if the id - just to prove a point - is "test<script>alert('lol');</script>"? The regexes would've broken. So. Don't use regex. Use lxml.
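For illustration, here's a minimal sketch of doing it with a real parser, using the standard library's html.parser as a stand-in for lxml (which may not be installed everywhere); with lxml itself, something along the lines of `lxml.html.fromstring(html).xpath('//td[@class="price"]/span/@id')` would do the same job:

```python
from html.parser import HTMLParser

# Minimal stdlib sketch: pull the span id out of the price cell.
# The HTML fragment is made up to match the question.
class PriceSpanFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_price_td = False
        self.ref_id = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'td' and attrs.get('class') == 'price':
            self.in_price_td = True
        elif tag == 'span' and self.in_price_td and self.ref_id is None:
            self.ref_id = attrs.get('id')

html = '<td class=price><span id=ref_12441984_l>645.01</span></td>'
parser = PriceSpanFinder()
parser.feed(html)
print(parser.ref_id)  # ref_12441984_l
```

Unlike a regex, the parser doesn't care about quoting, attribute order, or stray `>` characters inside attribute values.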
|
# ? Nov 15, 2012 00:12 |
|
Sailor_Spoon posted:This isn't exactly answering your question, but parsing html using regexes is just a really bad idea. Use something like scrapy, or a parsing library like pyquery or lxml.html. I laugh so loving hard every single time I see that page, even though I know it's coming.
|
# ? Nov 15, 2012 02:18 |
|
Being able to use Python like that in Pig is tastic
|
# ? Nov 15, 2012 04:53 |
|
So I'm trying to get better at unit testing, and one thing that gives me a hard time is dealing with external dependencies. For instance, I want to test a piece of code that relies on using Popen, but I want to mock up Popen itself so that I'm only testing the function. From what I can tell, patch is supposed to let me do that. I've been trying to get it to work, but it still seems to call the original function. Class that I want to test: Python code:
Python code:
code:
|
# ? Nov 16, 2012 03:33 |
|
Thern posted:Testing woes Looking at it, my guess is you should change @patch("subprocess.Popen") to @patch("popen"). I'm not familiar with this testing framework though so that could be entirely off base.
|
# ? Nov 16, 2012 08:12 |
|
Either change MyClass to use subprocess.Popen, or change your @patch decorator to patch myclass.Popen (where 'myclass' is whatever namespace MyClass is in). Otherwise MyClass will still reference the 'real' Popen, and not the patched version. See http://www.voidspace.org.uk/python/mock/patch.html#where-to-patch
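A small self-contained sketch of that "where to patch" rule (the function and command here are invented; patch replaces the name where it's looked up, which is why `from subprocess import Popen` inside a module must be patched as `mymodule.Popen`, while code that calls `subprocess.Popen` is patched at `subprocess.Popen`):

```python
import subprocess
from unittest import mock

def run_listing():
    # Looks Popen up through the module attribute at call time,
    # so patching subprocess.Popen is the right target here.
    proc = subprocess.Popen(['ls'], stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out

with mock.patch('subprocess.Popen') as fake_popen:
    # fake_popen.return_value is the mock "process" instance.
    fake_popen.return_value.communicate.return_value = (b'fake output', b'')
    result = run_listing()

print(result)             # b'fake output'
print(fake_popen.called)  # True
```

If run_listing had instead done `from subprocess import Popen` at the top of its own module, this patch would be ignored, exactly as described above.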
|
# ? Nov 16, 2012 14:59 |
|
Changing MyClass to use subprocess.Popen instead of Popen did the trick for me. I always feel bad when I get stuck on namespace issues. Thanks a lot for your help guys.
|
# ? Nov 16, 2012 15:44 |
|
I find myself doing this kind of thing a lot: code:
|
# ? Nov 17, 2012 02:52 |
|
FoiledAgain posted:Basically, I have a list of objects passed to me (I don't always know anything about them or how many are in the list, but I do know that they all have a .name attribute) and I need to turn it into a dictionary where the keys are the name attribute and the values are the objects themselves. Is there some dictionary method I could use to make this a single line? (It's not that I need a one-liner, I'm just a low-intermediate programmer wondering about new ways to do things.) dict() will accept a sequence of (key, value) tuples so you can do something like d = dict((i.name, i) for i in getInventory()).
|
# ? Nov 17, 2012 03:01 |
|
Nippashish posted:dict() will accept a sequence of (key, value) tuples so you can do something like d = dict((i.name, i) for i in getInventory()). This would work as well: d = {i.name: i for i in getInventory()}
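Both spellings build the same mapping; here's a runnable toy version (Item and getInventory are invented stand-ins for the inventory objects in the question):

```python
# Invented stand-ins for the inventory objects in the question.
class Item:
    def __init__(self, name):
        self.name = name

def getInventory():
    return [Item('sword'), Item('shield'), Item('potion')]

# dict() over a generator of (key, value) pairs:
d1 = dict((i.name, i) for i in getInventory())

# Dict comprehension (Python 2.7+):
d2 = {i.name: i for i in getInventory()}

print(sorted(d1))  # ['potion', 'shield', 'sword']
```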
|
# ? Nov 17, 2012 03:05 |
|
edit: Never mind, I basically duplicated Nippashish's answer.
Analytic Engine fucked around with this message at 03:19 on Nov 17, 2012 |
# ? Nov 17, 2012 03:16 |
|
Nippashish posted:dict() will accept a sequence of (key, value) tuples so you can do something like d = dict((i.name, i) for i in getInventory()). Oh that's easy. Thanks! Thanks to accipter too. I tried dict.fromkeys([x.name for x in inventory], inventory) but that gave me the unexpected result that the entire inventory was the value for each .name key.
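That fromkeys result is actually documented behavior: the second argument is a single value assigned to every key, not a parallel sequence of values. A quick sketch (the names are invented):

```python
inventory = ['sword', 'shield']

# dict.fromkeys assigns the SAME value object to every key, so passing
# the whole inventory makes it the value for each name.
d = dict.fromkeys(inventory, inventory)
print(d)  # {'sword': ['sword', 'shield'], 'shield': ['sword', 'shield']}
```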
|
# ? Nov 17, 2012 03:22 |
|
accipter posted:This would work as well: It's also faster (probably trivially)
|
# ? Nov 17, 2012 07:38 |
|
Bunny Cuddlin posted:It's also faster (probably trivially) I only understood about 1/8 of that, but the take-away message seemed to be that you should use d={} instead of d=dict(). I'm slightly disappointed by this, because it's my habit to use dict() for new dictionaries. But that doesn't look like the best practice. Am I similarly wasting time using list(), int(), str()? That said, I'm happy to learn about dictionary comprehensions, which I somehow didn't know about.
|
# ? Nov 17, 2012 08:40 |
|
FoiledAgain posted:I only understood about 1/8 of that, but the take-away message seemed to be that you should use d={} instead of d=dict(). I'm slightly disappointed by this, because it's my habit to use dict() for new dictionaries. But that doesn't look like the best practice. Am I similarly wasting time using list(), int(), str()? That said, I'm happy to learn about dictionary comprehensions, which I somehow didn't know about. If I read correctly, the literal syntax is 1.6 to 6 times faster, depending on how many entries are in the dict you're constructing. But the speedup is just in the dict construction. Unless you're creating a bunch of them in an inner loop it almost assuredly won't be a noticeable slowdown, and even then it might not be. In this case there's no real reason not to use the literal syntax (that I'm aware of), but as with all optimizations like this, if you're worrying about it before you profile you're wasting your time.
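If you'd rather measure it on your own interpreter than trust someone else's numbers, timeit makes the comparison easy (no particular ratio claimed here; it varies by Python version and machine):

```python
import timeit

# Time a million empty-dict constructions each way; the absolute
# numbers depend entirely on the machine and interpreter.
literal = timeit.timeit('d = {}', number=1_000_000)
constructor = timeit.timeit('d = dict()', number=1_000_000)
print(f'{{}}: {literal:.3f}s   dict(): {constructor:.3f}s')
```

The literal wins because `dict()` is a name lookup plus a function call, while `{}` compiles to a single bytecode op.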
|
# ? Nov 17, 2012 09:01 |
|
FoiledAgain posted:I only understood about 1/8 of that, but the take-away message seemed to be that you should use d={} instead of d=dict(). I'm slightly disappointed by this, because it's my habit to use dict() for new dictionaries. But that doesn't look like the best practice. Am I similarly wasting time using list(), int(), str()? That said, I'm happy to learn about dictionary comprehensions, which I somehow didn't know about. I don't think many people use the parameterless dict() list() int() str() constructors, rather than just using the empty type versions of them.
|
# ? Nov 17, 2012 09:17 |
|
FoiledAgain posted:I only understood about 1/8 of that, but the take-away message seemed to be that you should use d={} instead of d=dict(). I'm slightly disappointed by this, because it's my habit to use dict() for new dictionaries. But that doesn't look like the best practice. Am I similarly wasting time using list(), int(), str()? That said, I'm happy to learn about dictionary comprehensions, which I somehow didn't know about. If you ever get to the point where you're swapping {}s with dict()s to squeeze the last 1% out of your program, chances are you should be rewriting some parts in a C module or something.
|
# ? Nov 17, 2012 11:02 |
|
Ok, this is the ugly rear end amateur hour script I have after switching to Yahoo finance. So much easier to pull down a csv than it is to scrape.code:
|
# ? Nov 17, 2012 18:49 |
|
Pudgygiant posted:Ok, this is the ugly rear end amateur hour script I have after switching to Yahoo finance. So much easier to pull down a csv than it is to scrape. I don't really know what this is doing; you appear to be loading a CSV as a normal text file and parsing it manually, then using csv to write out a CSV instead of using it for input as well. You are also doing a whole lot of string replacements on whatever file you are pulling from Yahoo Finance; are those CSV files also? You are doing a whole lot of things manually that Python should be able to do for you.
|
# ? Nov 17, 2012 19:13 |
|
Yeah, it pulls down a couple different CSVs from Yahoo, cleans them up, and consolidates the information for every stock on NASDAQ. This is just determining market cap, because the option for market cap gives it in K, M, or B so it was easier to do the math. I probably reinvented the wheel on some of it, but doing a bunch of string replaces seemed easier than learning a new API when I'm on the baby's first project phase of learning Python, and I'm leasing an 8-core Xeon server with 16GB of RAM for other unrelated things, so efficiency isn't exactly key.
|
# ? Nov 17, 2012 20:31 |
|
It's not just efficiency, it's correctness and readability. The csv module is dead simple for input as well as output.
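As a sketch of what that looks like end to end (the ticker rows are made up, and StringIO stands in for the real downloaded files):

```python
import csv
import io

# Made-up CSV data in roughly the shape Yahoo Finance returned;
# io.StringIO stands in for open file handles.
raw = 'Symbol,Price,MarketCap\nAAPL,547.06,514.2B\nGOOG,647.18,212.9B\n'

# Reading: DictReader gives one dict per row, keyed by the header.
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0]['Symbol'])  # AAPL

# Writing: pick out just the columns we care about.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['Symbol', 'Price'])
for row in rows:
    writer.writerow([row['Symbol'], row['Price']])
print(out.getvalue().splitlines()[1])  # AAPL,547.06
```

Quoting, embedded commas, and header handling all come for free, which is where hand-rolled split/replace parsing usually breaks.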
|
# ? Nov 17, 2012 20:56 |
|
|
Pudgygiant posted:On a scale of 1-10, how terrible is it? Keeping in mind that I'm less than functionally retarded when it comes to programming. Here's some stuff that I noticed while trying to figure your script out.
This is actually not too terrible, in my opinion, and it would not take much to make it reasonably "natural" Python - the problems are mostly to do with not taking advantage of some stuff that's already in the language and library. Splitting the code into several functions would also help a lot.
|
# ? Nov 18, 2012 00:45 |