Python information and short questions megathread.

tef: May 30, 2004; -> some l-system crap ->

Bonus posted:

So almost a double improvement by using generators.

Now with less lambda:

code:

for i in xrange(1, 10000):
    if i == sum(j for j in xrange(1, i) if i % j == 0):
        print i

Edit: We could always try running it in IronPython.

# ¿ Nov 8, 2008 18:13

tef: May 30, 2004; -> some l-system crap ->

If you're still using BeautifulSoup, try lxml instead. It parses HTML and XML quickly, and you can run XPath queries over the result.

# ¿ Nov 15, 2008 20:40

tef: May 30, 2004; -> some l-system crap ->

king_kilr posted:

I'm not talking about the BeautifulSoup stuff.

It was merely an aside

- I found lxml nicer to use.

Habnabit posted:

Sure, except lxml can segfault with exceptionally malformed HTML.

Can you elaborate on this? We've migrated a scraper library from BeautifulSoup to lxml and it has been pretty solid so far, but it would not be good to have to run the transition in reverse.

# ¿ Nov 16, 2008 04:44

tef: May 30, 2004; -> some l-system crap ->

Habnabit posted:

If subpackage/foo.py imports subpackage/bar.py as 'import bar' then it won't work. __future__ imports also only affect the module that they're imported in. Try adding 'from __future__ import absolute_import' to foo.py and doing it.

Didn't he say he was doing from subpackage import bar in subpackage/foo.py?

Also you should be able to do from . import bar I think?

# ¿ Nov 20, 2008 15:04

tef: May 30, 2004; -> some l-system crap ->

Kire posted:

Typing out the examples and making windows, buttons, status bars, is a fun diversion from trying to figure out classes and modules.

The turtle module is fun too :3:

# ¿ Nov 20, 2008 15:35

tef: May 30, 2004; -> some l-system crap ->

Kire posted:

I haven't gotten to Turtle mode yet, what's that? Part of Tkinter?

Here: http://www.python.org/doc/2.5.2/lib/module-turtle.html

The print change was documented in a pep:

http://www.python.org/dev/peps/pep-3105/

And so is the new with .. as statement:

http://www.python.org/dev/peps/pep-0343/

tef fucked around with this message at 17:27 on Nov 20, 2008

# ¿ Nov 20, 2008 17:18

tef: May 30, 2004; -> some l-system crap ->

Mashi posted:

code:

from __future__ import print_function

Who needs imports when you can do __import__("sys").stdout.write("foo") :c00l:

# ¿ Nov 22, 2008 14:19

tef: May 30, 2004; -> some l-system crap ->

lxml also works pretty well, but I did have to wrap it to make it more pleasant to deal with (namespaces).

It supports all of XPath, and you can even get XPath with regexes working.

(elementtree api is not my friend)

In fact, here is an lxml wrapper I've been using*** - it also has a BeautifulSoup compatibility method:

http://secretvolcanobase.org/~tef/lxml/

*** A very cut down version - I actually use a slightly different api, and a lot more html processing shortcuts (like extracting form values)

tef fucked around with this message at 12:10 on Nov 27, 2008

# ¿ Nov 27, 2008 10:10

tef: May 30, 2004; -> some l-system crap ->

outlier posted:

Which parts don't you like? I ask because I've *coff* wrapped elementtree myself to compensate for some deficiencies.

Hard to inherit from or extend without composition, and having to handle the namespaces by hand.

For example: at work, we get it to throw exceptions when an xpath doesn't match - dying early has helped us find a number of issues in the screen scrapers.
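A rough sketch of the "die early" idea, not the actual wrapper from work. It uses the stdlib ElementTree (which only supports a subset of XPath) so it runs without lxml installed, and the names NoMatch and find_all_or_die are made up for illustration. Written for Python 3.

```python
import xml.etree.ElementTree as ET


class NoMatch(LookupError):
    """Raised when a path expression matches nothing (hypothetical name)."""


def find_all_or_die(node, path):
    # Hypothetical helper: like node.findall(path), but an empty
    # result is treated as an error instead of being passed along.
    matches = node.findall(path)
    if not matches:
        raise NoMatch("no match for %r under <%s>" % (path, node.tag))
    return matches


page = ET.fromstring("<html><body><p class='price'>0.99</p></body></html>")

# The path matches, so we get the elements back as usual.
prices = [p.text for p in find_all_or_die(page, "./body/p[@class='price']")]

# Pretend the site changed its layout: the scraper fails here, loudly,
# instead of quietly returning an empty list.
try:
    find_all_or_die(page, "./body/div")
    scraper_broke_loudly = False
except NoMatch:
    scraper_broke_loudly = True
```

The point of the design is that a layout change shows up as an exception at the line that made the wrong assumption, rather than as missing data three steps later.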

# ¿ Nov 27, 2008 18:21

tef: May 30, 2004; -> some l-system crap ->

chemosh6969 posted:

I'm interested in learning how to use python to search for and download movie info from places like imdb.com and allmovie.com.

Where's the best place to go to start learning?

python Screenscraper reporting in :toot:

Downloading stuff from internet:

urllib2 is pretty good, but pycurl is a little bit more featured (but more awkward to use). I would use the former first unless you *really* need transparent gzip decompression or https proxy support.

Parsing html:

BeautifulSoup does the job, but it is slow and it doesn't like XML all that well either. ElementTree can do XML parsing, but I personally find the api clunky (though usable). Don't use regexes. lxml uses the ElementTree api and supports XPath over both HTML and XML. It's quite a nice package with a cruddy api.

If you know XPath already, use lxml, it's good enough - otherwise use BeautifulSoup.

# ¿ Nov 29, 2008 12:23

tef: May 30, 2004; -> some l-system crap ->

Generate SVG if you want vector output, but you may find that the Python Imaging Library (PIL) does enough for your needs.

# ¿ Dec 12, 2008 04:51

tef: May 30, 2004; -> some l-system crap ->

nbv4 posted:

a "can't compare date and datetime objects" on the last line. Whats going on here?

date and datetime are separate types: date doesn't include a time, and datetime does.

helpful, eh?

from the python docs

quote:

In other words, date1 < date2 if and only if date1.toordinal() < date2.toordinal(). In order to stop comparison from falling back to the default scheme of comparing object addresses, date comparison normally raises TypeError if the other comparand isn’t also a date object. However, NotImplemented is returned instead if the other comparand has a timetuple() attribute. This hook gives other kinds of date objects a chance at implementing mixed-type comparison. If not, when a date object is compared to an object of a different type, TypeError is raised unless the comparison is == or !=. The latter cases return False or True, respectively.

I would assume that the reason you can't compare date and datetimes, is that the start time of a date can vary from culture to culture.
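A small sketch of the error and two ways around it, assuming you either only care about the day or are happy to treat a bare date as midnight. The values are made up, and this is Python 3 syntax.

```python
from datetime import date, datetime, time

d = date(2008, 12, 25)
dt = datetime(2008, 12, 25, 22, 10)

# Ordering a date against a datetime raises TypeError.
try:
    d < dt
    mixed_comparison_failed = False
except TypeError:
    mixed_comparison_failed = True

# Option 1: throw away the time and compare dates.
same_day = (dt.date() == d)

# Option 2: promote the date to a datetime at midnight and compare datetimes.
midnight = datetime.combine(d, time.min)
is_later = dt > midnight
```

Which option is right depends on what "a date" means in your program, which is sort of the point of the error.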

# ¿ Dec 25, 2008 22:10

tef: May 30, 2004; -> some l-system crap ->

The python tutorial is pretty good, and the euler problems are also pretty good to make sure you have the fundamentals sorted.

A classic exercise is to re-write unix utilities like grep, wc, cat, tail, rev and sort in python.
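To give an idea of the size of these exercises, here is a minimal sketch of the counting part of wc. It works on a string rather than a file, counts characters instead of bytes, and ignores all the flags - those are left as the exercise. Python 3 syntax.

```python
def wc(text):
    """Return (lines, words, characters), roughly like wc with no flags."""
    lines = text.count("\n")    # wc counts newline characters
    words = len(text.split())   # whitespace-separated tokens
    chars = len(text)           # characters; the real wc counts bytes by default
    return lines, words, chars


counts = wc("hello world\nthis is python\n")
```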

# ¿ Jan 5, 2009 20:22

tef: May 30, 2004; -> some l-system crap ->

I would highly recommend using lxml, I have found it a joy to use compared to elementtree

http://codespeak.net/lxml/parsing.html

quote:

>>> tree = etree.parse("doc/test.xml")

# ¿ Jan 6, 2009 17:37

tef: May 30, 2004; -> some l-system crap ->

You can use a tuple as a hash key.
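For instance (a made-up example, Python 3 syntax): tuples are hashable as long as their contents are, so they work as dict keys, while lists don't.

```python
# Coordinates as keys: a sparse grid stored in a plain dict.
grid = {}
grid[(0, 0)] = "start"
grid[(2, 3)] = "treasure"

found = grid[(2, 3)]
missing = (9, 9) in grid

# A list is mutable and therefore unhashable, so it can't be a key.
try:
    grid[[2, 3]] = "nope"
    list_key_rejected = False
except TypeError:
    list_key_rejected = True
```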

# ¿ Jan 7, 2009 00:27

tef: May 30, 2004; -> some l-system crap ->

IntoTheNihil posted:

Is there anything wrong with learning Python in 3.0 as opposed to 2.6?

Yes, none of the libraries or third party things will support it yet. The leap to 3.0 from 2.6 isn't really that much - it exists mainly to allow backwards incompatible changes.

(Personally, at work, we're still using 2.5 at the moment).

3.0 isn't going to happen overnight, and I imagine most people won't switch until 3.1 at least - there are still changes that need to be refined, and the performance of 3.0 is slightly worse than 2.6 at the moment.

Learn the version of python you're most likely to encounter, but you can do future imports to bring most of the parts of 3.0 you want in.

# ¿ Jan 7, 2009 00:47

tef: May 30, 2004; -> some l-system crap ->

lxml does validation

# ¿ Jan 27, 2009 17:47

tef: May 30, 2004; -> some l-system crap ->

supster posted:

I'm sorry for asking probably previously-answered questions that no one likes to reanswer again... but:

I think we need to put this in the first post.

quote:

(1) If I'm starting with Python, should I jump into 3.0 or 2.6? How common is it to not be able to find a 3.0 library for a common task that exists for 2.6?

Python 2.6 and Python 3.0 are not really that different, and learning both is not significantly harder than learning one. You can install both of them, too.

Still, it would be best to learn from 2.6 - there will be more documentation and libraries available.

Even so it would be best to do with an aim towards python 3 - avoiding deprecated features (use the -3 warning flag)

Python 3 is mainly a release to break backwards compatibility, many changes have made it into 2.6 also.

Python 3.1 will be out soon with the io library re-written in c

# ¿ Jan 29, 2009 10:12

tef: May 30, 2004; -> some l-system crap ->

code:

out = {}
for word in wlist:
    first_letter = word[0]
    if first_letter in out:
        out[first_letter].append(word)
    else:
        out[first_letter] = [word]
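The same grouping can also be sketched with collections.defaultdict, which creates the empty list for you the first time a key is seen. The contents of wlist here are made up; Python 3 syntax.

```python
from collections import defaultdict

wlist = ["apple", "avocado", "banana", "cherry", "cranberry"]  # hypothetical input

out = defaultdict(list)
for word in wlist:
    # Missing keys start out as [], so there is no if/else needed.
    out[word[0]].append(word)

grouped = dict(out)  # plain dict, just for printing or comparing
```

Note this still assumes there are no empty strings in the list, same as the loop above.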

# ¿ Feb 10, 2009 06:25

tef: May 30, 2004; -> some l-system crap ->

Has anyone used cinpy: http://www.cs.tut.fi/~ask/cinpy/

# ¿ Feb 12, 2009 04:49

tef: May 30, 2004; -> some l-system crap ->

politicorific posted:

code:

def check():
    a = "reader"
    if a[-1:]=="r":
        a=a[0:-1]
        a=a+"R"

code:

>>> transform = { 'a' : 'B' }
>>> def check(foo):
...     if foo[-1] in transform:
...             foo = foo[0:-1]+transform[foo[-1]]
...     return foo
... 
>>> print check("toota")
tootB
>>>

# ¿ Feb 12, 2009 08:14

tef: May 30, 2004; -> some l-system crap ->

There are two types of functions at work here: plain functions and generators.

A generator is a function that can hand back a series of results, one at a time, instead of a single return value:

code:

>>> def expr(foo):
...     for bar in foo:
...             yield bar
... 
>>> for thing in expr("hello"):
...     print thing
... 
h
e
l
l
o

Here expr is a generator - it uses the yield keyword. yield means 'return, but come back here' in python.

code:

>>> expr
<function expr at 0xb7d6c41c>
>>> expr("toot")
<generator object at 0xb7d77d6c>

Notice how if we call expr normally, we get a generator object back rather than a value. This is why we use the for statement above, to go through all the values.

We can even step through the generator manually:

code:

>>> g = expr("toot")
>>> g.next()
't'
>>> g.next()
'o'

But normally we use for foo in generator() to iterate over the results.
There are a bunch of other cool things about generators
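One of those cool things, sketched here in Python 3 (where g.next() is spelled next(g)): generators are lazy, so they can be infinite and can be chained together, and only as many values get computed as you actually ask for. The function names are made up.

```python
from itertools import islice


def naturals():
    # An infinite generator: 1, 2, 3, ...
    n = 1
    while True:
        yield n
        n += 1


def evens(numbers):
    # A generator that filters another iterable, one item at a time.
    for n in numbers:
        if n % 2 == 0:
            yield n


# Nothing runs until values are requested; islice stops after five.
first_five = list(islice(evens(naturals()), 5))

# Stepping by hand, Python 3 style.
g = naturals()
first = next(g)
second = next(g)
```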

# ¿ Feb 12, 2009 20:54

tef: May 30, 2004; -> some l-system crap ->

ATLbeer posted:

I had a similar thought a while ago (and I wasn't the first one, there are modules out there that people have written already for it) to pickle python functions and send them over the wire with parameters to "workers" to have a distributed work farm in processing large amounts of information.

http://discoproject.org/ ?

quote:

Disco is an open-source implementation of the Map-Reduce framework for distributed computing. As the original framework, Disco supports parallel computations over large data sets on unreliable cluster of computers.

The Disco core is written in Erlang, a functional language that is designed for building robust fault-tolerant distributed applications. Users of Disco typically write jobs in Python, which makes it possible to express even complex algorithms or data processing tasks often only in tens of lines of code. This means that you can quickly write scripts to process massive amounts of data.

# ¿ Feb 24, 2009 02:24

tef: May 30, 2004; -> some l-system crap ->

Python variables are references, not values. You want to copy the variable before adding it to the list in this instance. Can you post some code?

Also: http://docs.python.org/library/copy.html

Here is an example of references - both b and a refer to the *same* list. = does not copy the value.

code:

>>> a= []
>>> a.append(1)
>>> a
[1]
>>> b= a
>>> b
[1]
>>> b.append(2)
>>> a
[1, 2]
>>>

Example 2:

code:

>>> a= []; c=[]
>>> c.append(a)
>>> c.append(a)
>>> a.append(1)
>>> c
[[1], [1]]


>>> import copy
>>> a= []; c=[]
>>> c.append(copy.copy(a))
>>> c.append(copy.copy(a))
>>> a.append(1)
>>> c
[[], []]
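One caveat worth a sketch: copy.copy is a shallow copy, so it copies the outer list but the things inside are still shared references. For nested structures copy.deepcopy copies all the way down. Made-up values, Python 3 syntax.

```python
import copy

inner = [1]
original = [inner]

shallow = copy.copy(original)      # new outer list, same inner list
deep = copy.deepcopy(original)     # new outer list and new inner list

inner.append(2)

shares_inner = shallow[0] is inner  # still the very same object
```

After the append, shallow sees the change and deep does not.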

tef fucked around with this message at 06:10 on Feb 27, 2009

# ¿ Feb 27, 2009 06:07

tef: May 30, 2004; -> some l-system crap ->

I assume listEntry is some sort of object?

You are adding the same object over and over to the list in ranges. Something like this is what you want to do :

code:

for line in file:
     listEntry = NewSomeKindOfObject()
     listEntry.name = line[ searchPos_a + 9 : searchPos_b ]
     ranges.append(listEntry)

Here is another example for you:

Here we are changing the contents of the list in each loop, and adding that same list to the output:

code:

>>> a = [0]; o = []
>>> for x in range(1,10):
...     a[0] = x
...     o.append(a)
... 
>>> o
[[9], [9], [9], [9], [9], [9], [9], [9], [9]]

Here we are creating a *new* list and adding that to the output:

code:

>>> o = []
>>> for x in range(1,10):
...     a = [x] 
...     o.append(a)
... 
>>> o
[[1], [2], [3], [4], [5], [6], [7], [8], [9]]

>>>

# ¿ Feb 27, 2009 06:18

tef: May 30, 2004; -> some l-system crap ->

Zombywuf posted:

Can someone please explain to me WTF is going on here:

I thought I was the only one there who understood python unicode :v:

# ¿ Mar 6, 2009 21:23

tef: May 30, 2004; -> some l-system crap ->

Zombywuf posted:

IT'S TWO THOUSAND AND loving NINE PEOPLE! UNICODE IS A SOLVED PROBLEM.

Just like Soap!

# ¿ Mar 6, 2009 21:53

tef: May 30, 2004; -> some l-system crap ->

Yikes.

Most platforms have an existing package system (for example dpkg, rpm and the tools built on top of them), and it's a pity to try and implement a system on top of it.

# ¿ Mar 7, 2009 03:33

tef: May 30, 2004; -> some l-system crap ->

I used pyodbc last time instead of pymssql, with a little more success.

http://code.google.com/p/pyodbc/

I dunno maybe there's still some working examples from the last time.

# ¿ Mar 13, 2009 12:24

tef: May 30, 2004; -> some l-system crap ->

Does it have to be in python? There are good messaging systems that python can use.

For example: http://www.rabbitmq.com/

# ¿ Mar 14, 2009 00:19

tef: May 30, 2004; -> some l-system crap ->

Can anyone explain this?

code:

>>> def foo(x):
...     return not(x)+1
... 
>>> foo(1)
False
>>> 
>>> def foo(x):
...     return (not(x))+1
... 
>>> foo(1)
1
>>>

A little look at the bytecode is a little more revealing:

code:

>>> import dis
>>> dis.dis(foo)    # first definition: not(x)+1
  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (1)
              6 BINARY_ADD          
              7 UNARY_NOT           
              8 RETURN_VALUE    

>>> dis.dis(foo)    # second definition: (not(x))+1
  2           0 LOAD_FAST                0 (x)
              3 UNARY_NOT           
              4 LOAD_CONST               1 (1)
              7 BINARY_ADD          
              8 RETURN_VALUE

It seems Python is parsing not(x)+1 as not(x+1).

# ¿ Mar 21, 2009 01:17

tef: May 30, 2004; -> some l-system crap ->

Ooooor I should be writing "not x" rather than "not(x)" - not is an operator, not a function.

tef fucked around with this message at 01:27 on Mar 21, 2009

# ¿ Mar 21, 2009 01:18

tef: May 30, 2004; -> some l-system crap ->

not(x) is parsed as not (x)

so 1.and(2) is 1.0 and (2)
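If you don't feel like reading bytecode, the ast module shows the same thing. This is a sketch in Python 3; it only checks the node types, since the exact dump format varies between versions.

```python
import ast

# Parse the expression and look at the shape of the tree.
tree = ast.parse("not(x)+1", mode="eval").body
outer = type(tree).__name__           # the outermost operation
inner = type(tree.operand).__name__   # what "not" is applied to

# "not" binds more loosely than "+", so the parentheses around x
# are just grouping, not a function call.
x = 1
value_loose = not(x)+1       # same as not (x + 1)
value_tight = (not x) + 1    # False + 1
```

So the outer node is the not and the addition sits underneath it, which matches the first disassembly.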

# ¿ Mar 21, 2009 15:14

tef: May 30, 2004; -> some l-system crap ->

The pycon videos are up:

http://pycon.blip.tv/file/1947354/

# ¿ Apr 3, 2009 12:51

tef: May 30, 2004; -> some l-system crap ->

I know this will sound facile, but have you looked at the standard documentation for python?

http://docs.python.org/index.html

It has the complete language reference:

http://docs.python.org/reference/index.html

And the complete library reference:

http://docs.python.org/library/index.html

# ¿ Apr 17, 2009 19:38

tef: May 30, 2004; -> some l-system crap ->

Use the urlfetch api for google app engine:

http://code.google.com/appengine/docs/python/urlfetch/overview.html

urllib2 is awful.

# ¿ May 2, 2009 04:32

tef: May 30, 2004; -> some l-system crap ->

Well if it is disabled in the api, I imagine what you're trying to do won't work on google app engine at all.

The local api will just be wrappers around the standard library, but the implementations at google have been modified for the sandbox.

Edit: What you are doing is trying to circumvent the security policy by using an unsupported api. Which is why it is not working, and will never work.

# ¿ May 2, 2009 06:29

tef: May 30, 2004; -> some l-system crap ->

The Evan posted:

But it kept giving me errors about concatenating strings and ints and such.

code:

>>> i = 3
>>> print "i is %d" % i
i is 3
>>> i = 3.0
>>> print "i is %f" % i
i is 3.000000
>>> i = "three"
>>> print "i is %s" % i
i is three
>>> a,b = 1,2
>>> print "a is %d, b is %d" % (a,b)
a is 1, b is 2
>>>
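For completeness, a sketch of where the concatenation error comes from and a couple of other ways around it. This is Python 3 syntax, and str.format is an addition of mine rather than something from the question.

```python
i = 3

# "+" only joins str to str; mixing in an int raises TypeError.
try:
    message = "i is " + i
except TypeError:
    message = "i is " + str(i)   # convert explicitly instead

a, b = 1, 2
with_percent = "a is %d, b is %d" % (a, b)      # same style as above
with_format = "a is {}, b is {}".format(a, b)   # str.format, 2.6 and later
```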

# ¿ May 5, 2009 12:39

tef: May 30, 2004; -> some l-system crap ->

Sylink posted:

Can someone explain parsing with elementtree to me, it makes no sense and the effbot stuff seems to all be about making xml, I just want to get information.

Like if I have an element <title>, how do I get the actual content between the tags?

gently caress element tree, learn xpath - it's a standard simple query language for xml documents that is supported on a number of platforms, and it makes xml significantly less painful.

http://codespeak.net/lxml/xpathxslt.html

http://en.wikipedia.org/wiki/XPath_1.0

If you have

code:

<products>
  <product>
    <title lang="en">foo</title>
    <price>0.99</price>
  </product>
</products>

You can access all of the titles using /products/product/title, and attributes with /products/product/title/@lang, as well as indexing the entries: /products/product[1]/title.

It's fairly terse, readable and expressive - and a lot simpler to use than things like dom or sax or object models.
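To answer the "how do I get the content between the tags" part directly, here is a sketch using the stdlib ElementTree, which understands a limited subset of XPath (relative paths and simple predicates, but not /@lang, so the attribute is read with .get). With lxml you would call .xpath() with the full expressions above instead. Python 3 syntax.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<products>
  <product>
    <title lang="en">foo</title>
    <price>0.99</price>
  </product>
</products>
""")

# doc is the <products> element, so paths are relative to it.
titles = [t.text for t in doc.findall("./product/title")]        # text between the tags
langs = [t.get("lang") for t in doc.findall("./product/title")]  # attribute values
first_title = doc.find("./product[1]/title").text                # positions count from 1
```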

# ¿ May 9, 2009 19:59

tef: May 30, 2004; -> some l-system crap ->

Icesler posted:

I have never even glanced at a programming language before. So I have decided that I am going to learn python. So far I have just done some really simple basic stuff like the 'Hello World!' and made a number guessing game.

Is it safe to say that it ramps up in difficulty rather quickly?

If you take on a hard project, yes.

Pick easy projects that are fairly simple to complete.

A standard and simple task is to re-implement classic unix utilities like sort or grep.

Personally, I would go and look at the turtle module and draw fractals.

# ¿ May 20, 2009 18:39
