|
BigRedDot posted:A few weeks ago my company put on PyData NYC, a conference dedicated to data analytics with python. Authors and contributors of numpy, scipy, pandas, pytables, ipython, and other projects all gave great talks to several hundred attendees. Today all the talks were made available on Vimeo, for anyone interested!
|
# ? Nov 11, 2012 21:27 |
|
|
# ? May 9, 2024 11:01 |
|
Suspicious Dish posted:You can use Gio to do the copy directly, which will use the credentials from the keyring. Perfect. Thanks! edit: not quite perfect. Your method signature is missing the second argument for a progress_callback. According to this documentation you can pass None for that argument. But when I pass None I get: code:
Thermopyle fucked around with this message at 21:57 on Nov 11, 2012 |
# ? Nov 11, 2012 21:45 |
|
There are seven arguments in the C version of the function. One is the GError out pointer, which is transformed into a Python exception. The other is the progress_callback_data, which is used by most bindings internally to store a copy of the current scope. But what I forgot is that PyGObject exposes that argument, and stores the scope and the argument passed in. It exposes that argument for compatibility reasons, I think. You should always pass None for it.
|
# ? Nov 11, 2012 22:13 |
|
Suspicious Dish posted:There are seven arguments in the C version of the function. One is the GError out pointer, which is transformed into a Python exception. The other is the progress_callback_data, which is used by most bindings internally to store a copy to the current scope. But what I forgot is that PyGObject exposes that argument, and stores the scope and the argument passed in. It exposes that argument for compatibility reasons, I think. You should always pass None for it. Ok, gotcha. What's the deal with this documentation, then? http://www.pygtk.org/docs/pygobject/class-giofile.html#method-giofile--copy Is it just plain wrong, or is there more current documentation that I can't find?
|
# ? Nov 11, 2012 22:18 |
|
That's for PyGTK, which is dead. It's certainly been confusing a lot of people, so I'll see if I can have someone who maintains the PyGTK website help people find more accurate documentation. The thing is that we don't have generated documentation for Python yet, but we're working on it, I swear. I guess the best resource right now is the gir file in /usr/share/gir-1.0/Gio.gir. PyGObject reads from this to give you the bindings at runtime.
|
# ? Nov 11, 2012 22:22 |
|
Suspicious Dish posted:That's for PyGTK, which is dead. It's certainly been confusing a lot of people, so I'll see if I can have someone who maintains the PyGTK website help people find more accurate documentation. The thing is that we don't have generated documentation for Python yet, but we're working on it, I swear. Ok, understood. Thanks for all the info, I got it working just like I needed.
|
# ? Nov 11, 2012 23:26 |
|
BigRedDot posted:A few weeks ago my company put on PyData NYC, a conference dedicated to data analytics with python. Authors and contributors of numpy, scipy, pandas, pytables, ipython, and other projects all gave great talks to several hundred attendees. Today all the talks were made available on Vimeo, for anyone interested! I've got to echo the thanks for this. I've been reading a book about interfacing Python with R as well as messing with time series data in MongoDB, so this has been a great series to watch.
|
# ? Nov 12, 2012 10:55 |
|
Stupid question that I'm too dumb to figure out on my own. I'm trying to pick up the basics of page scraping with Python, and Google has to screw everything up by having their stock price ref id be different on every stock. If I'm looking for this string: <td class=price><span id=ref_12441984_l> why doesn't this work? re.search(b'class=price><span id=(.*?)>', content)
|
# ? Nov 13, 2012 23:11 |
|
Pudgygiant posted:Stupid question that I'm too dumb to figure out on my own. I'm trying to pick up the basics of page scraping with Python, and Google has to screw everything up by having their stock price ref id be different on every stock. If I'm looking for this string: This isn't exactly answering your question, but parsing html using regexes is just a really bad idea. Use something like scrapy, or a parsing library like pyquery or lxml.html. (see here for why regexes are bad in this case)
|
# ? Nov 13, 2012 23:19 |
|
Sailor_Spoon posted:(see here for why regexes are bad in this case) I've read that so many times that I knew in my bones what you were linking to without even looking at the url, and I still clicked it and read it again because it still makes me laugh.
|
# ? Nov 13, 2012 23:45 |
|
I can't even put into words how dumb that page makes me feel. Probably because I'm too dumb to know the words.
|
# ? Nov 13, 2012 23:51 |
|
Pudgygiant posted:I can't even put into words how dumb that page makes me feel. Probably because I'm too dumb to know the words. To address your actual issue, because as bad as regexes are for html, we've all done it, I get the following: Python code:
|
# ? Nov 14, 2012 00:09 |
|
That's exactly what I want, and drat if it doesn't work fine doing it that way. Now I have to figure out why it doesn't when it's crawling the whole page. Thanks for looking at it, dude.
|
# ? Nov 14, 2012 00:13 |
|
Pudgygiant posted:Now I have to figure out why it doesn't when it's crawling the whole page
|
# ? Nov 14, 2012 08:34 |
|
I'm glad you liked the videos of the talks. We are doing another PyData on the west coast in March during the PyCon Sprints, I'll post those when they are available, too!
|
# ? Nov 14, 2012 10:28 |
|
Pudgygiant posted:That's exactly what I want, and drat if it doesn't work fine doing it that way. Now I have to figure out why it doesn't when it's crawling the whole page Just a guess, but is it possible that, since the regex is using ".*", it's being too greedy? How about this: Python code:
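The suggested pattern didn't survive the scrape, but the greedy/non-greedy difference it alludes to can be sketched like this (the HTML fragment is made up to match the shape described in the question):

```python
import re

# Made-up HTML fragment in the shape described in the question.
html = b'<td class=price><span id=ref_12441984_l>645.01</span></td>'

# Greedy (.*) matches as much as possible, up to the LAST '>' on the line.
greedy = re.search(rb'class=price><span id=(.*)>', html)
# Non-greedy (.*?) stops at the FIRST '>', capturing just the id.
lazy = re.search(rb'class=price><span id=(.*?)>', html)

print(greedy.group(1))  # b'ref_12441984_l>645.01</span></td'
print(lazy.group(1))    # b'ref_12441984_l'
```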
|
# ? Nov 14, 2012 23:53 |
|
Wait guys, hold on, give me a few minutes to get my square peg.
|
# ? Nov 14, 2012 23:58 |
|
If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes.
|
# ? Nov 15, 2012 00:00 |
|
It has been stated before, but by god don't parse HTML with regex, ever. Use lxml and be done with it. Your regex is going to break at some point and there is simply no escaping that fact. In the simplest cases of the two earlier examples, what happens if the id - just to prove a point - is "test<script>alert('lol');</script>"? The regexes would've broken. So. Don't use regex. Use lxml.
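For illustration, here's a minimal sketch of doing it with a real parser, using the standard library's html.parser as a stand-in for lxml (which may not be installed everywhere); with lxml itself, something along the lines of `lxml.html.fromstring(html).xpath('//td[@class="price"]/span/@id')` would do the same job:

```python
from html.parser import HTMLParser

# Minimal stdlib sketch: pull the span id out of the price cell.
# The HTML fragment is made up to match the question.
class PriceSpanFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_price_td = False
        self.ref_id = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'td' and attrs.get('class') == 'price':
            self.in_price_td = True
        elif tag == 'span' and self.in_price_td and self.ref_id is None:
            self.ref_id = attrs.get('id')

html = '<td class=price><span id=ref_12441984_l>645.01</span></td>'
parser = PriceSpanFinder()
parser.feed(html)
print(parser.ref_id)  # ref_12441984_l
```

Unlike a regex, the parser doesn't care about quoting, attribute order, or stray `>` characters inside attribute values.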
|
# ? Nov 15, 2012 00:12 |
|
Sailor_Spoon posted:This isn't exactly answering your question, but parsing html using regexes is just a really bad idea. Use something like scrapy, or a parsing library like pyquery or lxml.html. I laugh so loving hard every single time I see that page, even though I know it's coming.
|
# ? Nov 15, 2012 02:18 |
|
Being able to use Python like that in Pig is tastic
|
# ? Nov 15, 2012 04:53 |
|
So I'm trying to get better at unit testing, and one thing that gives me a hard time is dealing with external dependencies. For instance, I want to test a piece of code that relies on using Popen, but I want to mock up Popen itself so that I'm only testing the function. From what I can tell, patch is supposed to let me do that. I've been trying to get it to work, but it still seems to call the original function. Class that I want to test: Python code:
Python code:
code:
|
# ? Nov 16, 2012 03:33 |
|
Thern posted:Testing woes Looking at it, my guess is you should change @patch("subprocess.Popen") to @patch("popen"). I'm not familiar with this testing framework though so that could be entirely off base.
|
# ? Nov 16, 2012 08:12 |
|
Either change MyClass to use subprocess.Popen, or change your @patch decorator to patch myclass.Popen (where 'myclass' is whatever namespace MyClass is in). Otherwise MyClass will still reference the 'real' Popen, and not the patched version. See http://www.voidspace.org.uk/python/mock/patch.html#where-to-patch
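A small self-contained sketch of that "where to patch" rule (the function and command here are invented; patch replaces the name where it's looked up, which is why `from subprocess import Popen` inside a module must be patched as `mymodule.Popen`, while code that calls `subprocess.Popen` is patched at `subprocess.Popen`):

```python
import subprocess
from unittest import mock

def run_listing():
    # Looks Popen up through the module attribute at call time,
    # so patching subprocess.Popen is the right target here.
    proc = subprocess.Popen(['ls'], stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out

with mock.patch('subprocess.Popen') as fake_popen:
    # fake_popen.return_value is the mock "process" instance.
    fake_popen.return_value.communicate.return_value = (b'fake output', b'')
    result = run_listing()

print(result)             # b'fake output'
print(fake_popen.called)  # True
```

If run_listing had instead done `from subprocess import Popen` at the top of its own module, this patch would be ignored, exactly as described above.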
|
# ? Nov 16, 2012 14:59 |
|
Changing MyClass to use subprocess.Popen instead of Popen did the trick for me. I always feel bad when I get stuck on namespace issues. Thanks a lot for your help guys.
|
# ? Nov 16, 2012 15:44 |
|
I find myself doing this kind of thing a lot: code:
|
# ? Nov 17, 2012 02:52 |
|
FoiledAgain posted:Basically, I have a list of objects passed to me (I don't always know anything about them or how many are in the list, but I do know that they all have a .name attribute) and I need to turn it into a dictionary where the keys are the name attribute and the values are the objects themselves. Is there some dictionary method I could use to make this a single line? (It's not that I need a one-liner, I'm just a low-intermediate programmer wondering about new ways to do things.) dict() will accept a sequence of (key, value) tuples so you can do something like d = dict((i.name, i) for i in getInventory()).
|
# ? Nov 17, 2012 03:01 |
|
Nippashish posted:dict() will accept a sequence of (key, value) tuples so you can do something like d = dict((i.name, i) for i in getInventory()). This would work as well: d = {i.name: i for i in getInventory()}
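Both spellings build the same mapping; here's a runnable toy version (Item and getInventory are invented stand-ins for the inventory objects in the question):

```python
# Invented stand-ins for the inventory objects in the question.
class Item:
    def __init__(self, name):
        self.name = name

def getInventory():
    return [Item('sword'), Item('shield'), Item('potion')]

# dict() over a generator of (key, value) pairs:
d1 = dict((i.name, i) for i in getInventory())

# Dict comprehension (Python 2.7+):
d2 = {i.name: i for i in getInventory()}

print(sorted(d1))  # ['potion', 'shield', 'sword']
```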
|
# ? Nov 17, 2012 03:05 |
|
edit: Never mind, I basically duplicated Nippashish's answer.
Analytic Engine fucked around with this message at 03:19 on Nov 17, 2012 |
# ? Nov 17, 2012 03:16 |
|
Nippashish posted:dict() will accept a sequence of (key, value) tuples so you can do something like d = dict((i.name, i) for i in getInventory()). Oh that's easy. Thanks! Thanks to accipter too. I tried dict.fromkeys([x.name for x in inventory], inventory) but that gave me the unexpected result that the entire inventory was the value for each .name key.
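That fromkeys result is actually documented behavior: the second argument is a single value assigned to every key, not a parallel sequence of values. A quick sketch (the names are invented):

```python
inventory = ['sword', 'shield']

# dict.fromkeys assigns the SAME value object to every key, so passing
# the whole inventory makes it the value for each name.
d = dict.fromkeys(inventory, inventory)
print(d)  # {'sword': ['sword', 'shield'], 'shield': ['sword', 'shield']}
```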
|
# ? Nov 17, 2012 03:22 |
|
accipter posted:This would work as well: It's also faster (probably trivially)
|
# ? Nov 17, 2012 07:38 |
|
Bunny Cuddlin posted:It's also faster (probably trivially) I only understood about 1/8 of that, but the take-away message seemed to be that you should use d={} instead of d=dict(). I'm slightly disappointed by this, because it's my habit to use dict() for new dictionaries. But that doesn't look like the best practice. Am I similarly wasting time using list(), int(), str()? That said, I'm happy to learn about dictionary comprehensions, which I somehow didn't know about.
|
# ? Nov 17, 2012 08:40 |
|
FoiledAgain posted:I only understood about 1/8 of that, but the take-away message seemed to be that you should use d={} instead of d=dict(). I'm slightly disappointed by this, because it's my habit to use dict() for new dictionaries. But that doesn't look like the best practice. Am I similarly wasting time using list(), int(), str()? That said, I'm happy to learn about dictionary comprehensions, which I somehow didn't know about. If I read correctly, the literal syntax is 1.6 to 6 times faster, depending on how many entries are in the dict you're constructing. But the speedup is just in the dict construction. Unless you're creating a bunch of them in an inner loop it almost assuredly won't be a noticeable slowdown, and even then it might not be. In this case there's no real reason not to use the literal syntax (that I'm aware of), but as with all optimizations like this, if you're worrying about it before you profile you're wasting your time.
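If you'd rather measure it on your own interpreter than trust someone else's numbers, timeit makes the comparison easy (no particular ratio claimed here; it varies by Python version and machine):

```python
import timeit

# Time a million empty-dict constructions each way; the absolute
# numbers depend entirely on the machine and interpreter.
literal = timeit.timeit('d = {}', number=1_000_000)
constructor = timeit.timeit('d = dict()', number=1_000_000)
print(f'{{}}: {literal:.3f}s   dict(): {constructor:.3f}s')
```

The literal wins because `dict()` is a name lookup plus a function call, while `{}` compiles to a single bytecode op.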
|
# ? Nov 17, 2012 09:01 |
|
FoiledAgain posted:I only understood about 1/8 of that, but the take-away message seemed to be that you should use d={} instead of d=dict(). I'm slightly disappointed by this, because it's my habit to use dict() for new dictionaries. But that doesn't look like the best practice. Am I similarly wasting time using list(), int(), str()? That said, I'm happy to learn about dictionary comprehensions, which I somehow didn't know about. I don't think many people use the parameterless dict() list() int() str() constructors, rather than just using the empty type versions of them.
|
# ? Nov 17, 2012 09:17 |
|
FoiledAgain posted:I only understood about 1/8 of that, but the take-away message seemed to be that you should use d={} instead of d=dict(). I'm slightly disappointed by this, because it's my habit to use dict() for new dictionaries. But that doesn't look like the best practice. Am I similarly wasting time using list(), int(), str()? That said, I'm happy to learn about dictionary comprehensions, which I somehow didn't know about. If you ever get to the point where you're swapping {}s with dict()s to squeeze the last 1% out of your program, chances are you should be rewriting some parts in a C module or something.
|
# ? Nov 17, 2012 11:02 |
|
Ok, this is the ugly rear end amateur hour script I have after switching to Yahoo finance. So much easier to pull down a csv than it is to scrape.code:
|
# ? Nov 17, 2012 18:49 |
|
Pudgygiant posted:Ok, this is the ugly rear end amateur hour script I have after switching to Yahoo finance. So much easier to pull down a csv than it is to scrape. I don't really know what this is doing; you appear to be loading a CSV as a normal text file and parsing it manually, then using csv to write out a CSV instead of using it for input as well. You are also doing a whole lot of string replacements on whatever file you are pulling from Yahoo Finance; are those CSV files also? You are doing a whole lot of things manually that Python should be able to do for you.
|
# ? Nov 17, 2012 19:13 |
|
Yeah, it pulls down a couple different CSVs from Yahoo, cleans them up, and consolidates the information for every stock on NASDAQ. This is just determining market cap, because the option for market cap gives it in K, M, or B so it was easier to do the math. I probably reinvented the wheel on some of it, but doing a bunch of string replaces seemed easier than learning a new API when I'm on the baby's first project phase of learning Python, and I'm leasing an 8-core Xeon server with 16GB of RAM for other unrelated things, so efficiency isn't exactly key.
|
# ? Nov 17, 2012 20:31 |
|
It's not just efficiency, it's correctness and readability. The csv module is dead simple for input as well as output.
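As a sketch of what that looks like end to end (the ticker rows are made up, and StringIO stands in for the real downloaded files):

```python
import csv
import io

# Made-up CSV data in roughly the shape Yahoo Finance returned;
# io.StringIO stands in for open file handles.
raw = 'Symbol,Price,MarketCap\nAAPL,547.06,514.2B\nGOOG,647.18,212.9B\n'

# Reading: DictReader gives one dict per row, keyed by the header.
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0]['Symbol'])  # AAPL

# Writing: pick out just the columns we care about.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['Symbol', 'Price'])
for row in rows:
    writer.writerow([row['Symbol'], row['Price']])
print(out.getvalue().splitlines()[1])  # AAPL,547.06
```

Quoting, embedded commas, and header handling all come for free, which is where hand-rolled split/replace parsing usually breaks.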
|
# ? Nov 17, 2012 20:56 |
|
|
Pudgygiant posted:On a scale of 1-10, how terrible is it? Keeping in mind that I'm less than functionally retarded when it comes to programming. Here's some stuff that I noticed while trying to figure your script out.
This is actually not too terrible, in my opinion, and it would not take much to make it reasonably "natural" Python - the problems are mostly to do with not taking advantage of some stuff that's already in the language and library. Splitting the code into several functions would also help a lot.
|
# ? Nov 18, 2012 00:45 |