Bonus posted: So almost a double improvement by using generators.

Now with less lambda:
code:

Nov 8, 2008 18:13

If you're still using BeautifulSoup, try lxml instead. It parses HTML and XML quickly, and you can run XPath queries over the result.

Nov 15, 2008 20:40

king_kilr posted: I'm not talking about the BeautifulSoup stuff. It was merely an aside; I found lxml nicer to use.

Habnabit posted: Sure, except lxml can segfault with exceptionally malformed HTML.

Can you elaborate on this? We've migrated a scraper library from BeautifulSoup to lxml and it has been pretty solid so far, but I'd rather not have to run the transition in reverse.

Nov 16, 2008 04:44

Habnabit posted: If subpackage/foo.py imports subpackage/bar.py as 'import bar' then it won't work. __future__ imports also only affect the module that they're imported in. Try adding 'from __future__ import absolute_import' to foo.py and doing it.

Didn't he say 'from subpackage import bar' in subpackage/foo.py? Also, you should be able to do 'from . import bar', I think.
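A minimal Python 3 sketch of the layout being discussed, assuming a package called subpackage that contains foo.py and bar.py (built in a temporary directory here so it runs anywhere; build_demo_package is a hypothetical helper). In Python 3 absolute imports are the default, so a bare 'import bar' inside foo.py would not find its sibling, while 'from . import bar' does:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root):
    """Write a throwaway subpackage/ containing bar.py and foo.py under root."""
    pkg = root / "subpackage"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "bar.py").write_text("VALUE = 'hello from bar'\n")
    # Explicit relative import: "." means "the package foo.py lives in".
    # The absolute spelling "from subpackage import bar" would also work.
    (pkg / "foo.py").write_text("from . import bar\nRESULT = bar.VALUE\n")


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # the directory appeared after startup
    try:
        foo = importlib.import_module("subpackage.foo")
        result = foo.RESULT
    finally:
        sys.path.remove(tmp)

print(result)
```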

Nov 20, 2008 15:04

Kire posted: Typing out the examples and making windows, buttons and status bars is a fun diversion from trying to figure out classes and modules.

The turtle module is fun too.

Nov 20, 2008 15:35

Kire posted: I haven't gotten to Turtle mode yet, what's that? Part of Tkinter?

Here (it's built on top of Tkinter): http://www.python.org/doc/2.5.2/lib/module-turtle.html

The print change was documented in a PEP: http://www.python.org/dev/peps/pep-3105/ And so is the new with ... as statement: http://www.python.org/dev/peps/pep-0343/

tef fucked around with this message at 17:27 on Nov 20, 2008

Nov 20, 2008 17:18

Mashi posted:
Who needs imports when you can do __import__("sys").stdout.write("foo")
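For the record, when you really do need to import a module by a name known only at runtime, importlib.import_module is the readable, documented route; __import__ is the lower-level hook the import statement itself uses. A small sketch:

```python
import importlib
import sys

# importlib.import_module returns the module object for a dotted name.
sys_module = importlib.import_module("sys")
sys_module.stdout.write("foo\n")

# Modules are cached in sys.modules, so every route hands back the same object.
same = sys_module is __import__("sys") is sys
```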

Nov 22, 2008 14:19

lxml also works pretty well, but I did have to wrap it to make it more pleasant to deal with (namespaces). It supports all of XPath, and you can even get XPath with regexes working. (The ElementTree API is not my friend.) In fact, here is an lxml wrapper I've been using***; it also has a BeautifulSoup compatibility method: http://secretvolcanobase.org/~tef/lxml/

*** A very cut-down version. I actually use a slightly different API, and a lot more HTML processing shortcuts (like extracting form values).

tef fucked around with this message at 12:10 on Nov 27, 2008
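To show the namespace chore concretely, here is a sketch using the standard library's xml.etree.ElementTree rather than lxml (lxml's xpath() method accepts a similar namespaces= mapping). The Atom feed is made-up sample data:

```python
import xml.etree.ElementTree as ET

FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

root = ET.fromstring(FEED)

# The document uses a default namespace, so a plain path silently finds nothing.
plain = root.findall("entry/title")

# Spelling the namespace out by hand ("Clark notation") works but is noisy.
ATOM = "{http://www.w3.org/2005/Atom}"
clark = [t.text for t in root.findall(ATOM + "entry/" + ATOM + "title")]

# A prefix map keeps paths readable; the prefix "a" is an arbitrary local choice.
NS = {"a": "http://www.w3.org/2005/Atom"}
titles = [t.text for t in root.findall("a:entry/a:title", NS)]
```

The silent empty result for the plain path is exactly the kind of thing a thin wrapper can paper over.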

Nov 27, 2008 10:10

outlier posted: Which parts don't you like? I ask because I've *coff* wrapped ElementTree myself to compensate for some deficiencies.

It's hard to inherit from or extend without composition, and you have to handle the namespaces by hand. For example:

At work, we get our wrapper to throw exceptions when an XPath doesn't match; dying early has helped us find a number of issues in the screen scrapers.
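A minimal sketch of the "die early" idea, built on the standard library's ElementTree rather than our actual lxml wrapper; NoMatch and find_all_or_die are hypothetical names, and the HTML snippet is made up:

```python
import xml.etree.ElementTree as ET


class NoMatch(LookupError):
    """Raised when a path expression selects nothing."""


def find_all_or_die(element, path, namespaces=None):
    """Like Element.findall(), but raise NoMatch instead of returning []."""
    matches = element.findall(path, namespaces)
    if not matches:
        raise NoMatch("no match for %r under <%s>" % (path, element.tag))
    return matches


page = ET.fromstring("<html><body><p class='price'>10</p></body></html>")
prices = [p.text for p in find_all_or_die(page, ".//p[@class='price']")]

try:
    # Simulate the site changing its markup: the scraper now fails loudly.
    find_all_or_die(page, ".//span[@class='price']")
except NoMatch as exc:
    error = str(exc)
```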

Nov 27, 2008 18:21

chemosh6969 posted: I'm interested in learning how to use python to search for and download movie info from places like imdb.com and allmovie.com.

Python screenscraper reporting in.

Downloading stuff from the internet: urllib2 is pretty good, but pycurl is a little more featureful (but more awkward to use). I would use the former first unless you *really* need transparent gzip decompression or HTTPS proxy support.

Parsing HTML: BeautifulSoup does the job, but it is slow and it doesn't like XML all that well either. ElementTree can do XML parsing, but I personally find the API clunky, if usable. Don't use regexes. lxml uses the ElementTree API and supports XPath over HTML and XML; it's quite a nice package with a cruddy API. If you already know XPath, use lxml, it's good enough; otherwise use BeautifulSoup.
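urllib2 became urllib.request in Python 3. Here is a minimal sketch of what "not transparent" means in practice: you can ask the server for gzip, but you have to decompress it yourself. fetch and decode_body are hypothetical helper names, and fetch needs network access, so only decode_body is exercised directly:

```python
import gzip
import urllib.request


def decode_body(body, content_encoding):
    """Undo gzip compression if the server applied it."""
    if content_encoding == "gzip":
        return gzip.decompress(body)
    return body


def fetch(url):
    """Download url, asking for gzip and decompressing by hand."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        return decode_body(response.read(), response.headers.get("Content-Encoding"))
```

Typical use would be something like html = fetch("http://example.com/"), with the result handed to your HTML parser of choice.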

Nov 29, 2008 12:23

Generate SVG if you want vector output, but you may find that the Python Imaging Library (PIL) does enough for your needs.
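SVG is just XML text, so for simple charts plain string formatting is enough. A minimal sketch, assuming non-negative integer values no taller than the canvas; bar_chart_svg is a hypothetical helper:

```python
def bar_chart_svg(values, bar_width=20, height=100):
    """Return a minimal SVG document with one vertical bar per value."""
    bars = [
        '<rect x="%d" y="%d" width="%d" height="%d" fill="steelblue"/>'
        % (i * bar_width, height - v, bar_width - 2, v)
        for i, v in enumerate(values)
    ]
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d">'
        % (bar_width * len(values), height)
        + "".join(bars)
        + "</svg>"
    )


svg_text = bar_chart_svg([30, 80, 55])
```

Write svg_text to a file ending in .svg and any browser will render it. For anything with user-supplied text in it, building the tree with ElementTree instead would take care of escaping.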

Dec 12, 2008 04:51

nbv4 posted: a "can't compare date and datetime objects" on the last line. What's going on here?

date and datetime are separate types: date doesn't include a time, and datetime does. Helpful, eh? From the Python docs:

quote: In other words, date1 < date2 if and only if date1.toordinal() < date2.toordinal(). In order to stop comparison from falling back to the default scheme of comparing object addresses, date comparison normally raises TypeError if the other comparand isn’t also a date object. However, NotImplemented is returned instead if the other comparand has a timetuple() attribute. This hook gives other kinds of date objects a chance at implementing mixed-type comparison. If not, when a date object is compared to an object of a different type, TypeError is raised unless the comparison is == or !=. The latter cases return False or True, respectively.

I would assume the reason you can't compare dates and datetimes is that the start time of a date can vary from culture to culture.
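In Python 3 the same rule holds: ordering a date against a datetime raises TypeError. A small sketch with made-up values showing the error and two common ways around it, either converting the datetime down with .date() or promoting the date to midnight with datetime.combine:

```python
from datetime import date, datetime

today = date(2008, 12, 25)
now = datetime(2008, 12, 25, 22, 10)

# Ordering a date against a datetime raises TypeError.
try:
    today < now
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Fix 1: drop the time part of the datetime and compare dates.
same_day = now.date() == today

# Fix 2: promote the date to a datetime at midnight and compare datetimes.
start_of_day = datetime.combine(today, datetime.min.time())
before = start_of_day < now
```

Which fix is right depends on the question you're asking: "same calendar day?" wants the first, "did this happen after the day started?" wants the second.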

Dec 25, 2008 22:10

The Python tutorial is pretty good, and the Project Euler problems are also good for making sure you have the fundamentals sorted. A classic exercise is to rewrite Unix utilities like grep, wc, cat, tail, rev and sort in Python.
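As an example of the Unix-utility exercise, here is a minimal sketch of the counting core of wc, assuming UTF-8 text and no command-line flags (the function name wc is just for illustration):

```python
def wc(text):
    """Count (lines, words, bytes) the way plain `wc` does for UTF-8 text."""
    lines = text.count("\n")          # wc counts newline characters
    words = len(text.split())         # runs of non-whitespace
    size = len(text.encode("utf-8"))  # bytes, not characters
    return lines, words, size
```

Hooking it up to a terminal is one more line: print(*wc(sys.stdin.read())). Extending it to take -l/-w/-c flags is a good next step.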

Jan 5, 2009 20:22

I would highly recommend using lxml; I have found it a joy to use compared to ElementTree: http://codespeak.net/lxml/parsing.html

quote:
>>> tree = etree.parse("doc/test.xml")

Jan 6, 2009 17:37

You can use a tuple as a dictionary (hash) key, as long as everything inside the tuple is hashable.
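A quick sketch with made-up board coordinates; tuples work as keys because they are immutable and hashable, while lists are not:

```python
# Tuples are immutable and hashable, so they can be dictionary keys.
board = {}
board[(0, 0)] = "rook"
board[3, 4] = "queen"  # the parentheses are optional inside a subscript

# Lists are mutable and unhashable, so using one as a key raises TypeError.
try:
    board[[1, 2]] = "pawn"
    list_key_ok = True
except TypeError:
    list_key_ok = False
```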

Jan 7, 2009 00:27

IntoTheNihil posted: Is there anything wrong with learning Python in 3.0 as opposed to 2.6?

Yes: none of the libraries or third-party packages support it yet. The leap from 2.6 to 3.0 isn't really that big; it exists only to allow backwards-incompatible changes. (Personally, at work, we're still using 2.5 at the moment.)

3.0 isn't going to happen overnight, and I imagine most people won't switch until 3.1 at least: there are still changes that need to be refined, and the performance of 3.0 is slightly worse than 2.6 at the moment. Learn the version of Python you're most likely to encounter; you can use __future__ imports to bring in most of the parts of 3.0 you want.

Jan 7, 2009 00:47

lxml does validation (DTD, XML Schema and RelaxNG).

Jan 27, 2009 17:47

supster posted: I'm sorry for asking probably previously-answered questions that no one likes to reanswer again... but:

I think we need to put this in the first post.

quote: (1) If I'm starting with Python, should I jump into 3.0 or 2.6? How common is it to not be able to find a 3.0 library for a common task that exists for 2.6?

Python 2.6 and Python 3.0 are not really that different, and learning both is not significantly harder than learning one. You can install both of them, too. Still, it would be best to learn 2.6: there will be more documentation and libraries available. Even so, learn it with an eye towards Python 3 and avoid deprecated features (run python with the -3 flag to get warnings about them).

Python 3 is mainly a release to break backwards compatibility; many of its changes have made it into 2.6 as well. Python 3.1 will be out soon, with the io library rewritten in C.

Jan 29, 2009 10:12

code:

Feb 10, 2009 06:25

Has anyone used cinpy? http://www.cs.tut.fi/~ask/cinpy/

Feb 12, 2009 04:49

politicorific posted:
code:

Feb 12, 2009 08:14

There are two types of functions at work here: plain functions and generators. A generator function uses yield to hand back a series of results, producing the next one each time you ask for it:
code:
code:
We can even step through the generator manually:
code:
There are a bunch of other cool things about generators.
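A small illustrative generator (countdown is a hypothetical example, not the code from the thread), consumed first by list() and then driven by hand with next():

```python
def countdown(n):
    """Generator function: each yield hands back one value, then pauses."""
    while n > 0:
        yield n
        n -= 1


# list() (like a for loop) pulls values until the generator is exhausted.
values = list(countdown(3))

# Driving a fresh generator by hand with next():
gen = countdown(2)
first = next(gen)
second = next(gen)

# One more next() raises StopIteration, which is how a for loop knows to stop.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
```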

Feb 12, 2009 20:54

ATLbeer posted: I had a similar thought a while ago (and I wasn't the first one, there are modules out there that people have written already for it) to pickle python functions and send them over the wire with parameters to "workers" to have a distributed work farm in processing large amounts of information.

http://discoproject.org/ ?

quote: Disco is an open-source implementation of the Map-Reduce framework for distributed computing. As the original framework, Disco supports parallel computations over large data sets on unreliable cluster of computers.
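This is not Disco's API; it's a single-process sketch of the map/reduce shape that Disco distributes across machines, using made-up documents and hypothetical function names map_words and reduce_counts:

```python
from collections import Counter
from functools import reduce


def map_words(document):
    """Map step: emit a partial word count for one document."""
    return Counter(document.lower().split())


def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


documents = ["the quick brown fox", "the lazy dog", "The end"]
partials = map(map_words, documents)  # each worker would run this part
totals = reduce(reduce_counts, partials, Counter())
```

Swapping the built-in map for multiprocessing.Pool().map would spread the map step over local cores. Note that pickle serializes functions by their importable name rather than by their code, which is why worker functions like these need to live at module level.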

Feb 24, 2009 02:24

Python variables are references, not values. In this instance you want to copy the object before adding it to the list. Can you post some code? Also see: http://docs.python.org/library/copy.html

Here is an example of references: both a and b refer to the *same* list, because = does not copy the value.
code:
code:
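A short sketch of the same idea with the copy module: assignment shares the object, copy.copy makes a new outer container, and copy.deepcopy also copies whatever is nested inside:

```python
import copy

a = [1, 2, 3]
b = a            # b is a second name for the same list object
b.append(4)      # so a sees the change too

c = copy.copy(a)  # shallow copy: a new list holding the same elements
c.append(5)       # only c changes

nested = [[1]]
shallow = copy.copy(nested)
deep = copy.deepcopy(nested)
shallow[0].append(2)  # the inner list is shared with nested
deep[0].append(3)     # the deep copy has its own inner list
```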
tef fucked around with this message at 06:10 on Feb 27, 2009

Feb 27, 2009 06:07

I assume listEntry is some sort of object? You are adding the same object to the list in ranges over and over. Something like this is what you want to do:
code:
Here we are changing the contents of the same list on each loop, and adding that one list to the output:
code:
code:
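A sketch of both versions, using a plain list as a stand-in for listEntry (the names buggy, fixed and entry are hypothetical):

```python
# The bug: one list object is created once, overwritten in place on every
# pass, and appended repeatedly, so every slot refers to the same object.
buggy = []
entry = []
for start in range(0, 30, 10):
    entry[:] = [start, start + 9]  # overwrite the contents in place
    buggy.append(entry)

# The fix: build a fresh object on each pass through the loop.
fixed = []
for start in range(0, 30, 10):
    entry = [start, start + 9]  # a new list every iteration
    fixed.append(entry)
```

After the first loop every element of buggy is the last range, because there was only ever one list.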

Feb 27, 2009 06:18

Zombywuf posted: Can someone please explain to me WTF is going on here:

I thought I was the only one there who understood Python unicode.

Mar 6, 2009 21:23

Zombywuf posted: IT'S TWO THOUSAND AND loving NINE PEOPLE! UNICODE IS A SOLVED PROBLEM.

Just like SOAP!

Mar 6, 2009 21:53

Yikes. Most platforms already have a package system (for example dpkg, rpm and the tools built on top of them), and it's a pity to implement yet another system on top of it.

Mar 7, 2009 03:33

I used pyodbc last time instead of pymssql, with a little more success: http://code.google.com/p/pyodbc/ I dunno, maybe there are still some working examples from last time.

Mar 13, 2009 12:24

Does it have to be in python? There are good messaging systems that python can use. For example: http://www.rabbitmq.com/

Mar 14, 2009 00:19

Can anyone explain this?
code:
code:

Mar 21, 2009 01:17

Ooooor I should be using 'not x', and not 'not(x)'.
tef fucked around with this message at 01:27 on Mar 21, 2009

Mar 21, 2009 01:18

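not(x) is parsed as the not operator applied to (x), so 1.and(2) is parsed as 1.0 and (2): the "1." is the float literal 1.0, followed by the keyword and.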
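You can confirm this with the ast module: not(x) and not x produce the same tree, a unary Not applied to a name. The sketch spells the second case as "1.0 and (2)" because recent Python 3 versions may warn about a number glued directly to a keyword:

```python
import ast

# `not` is an operator, not a function: the parentheses only group (x).
call_style = ast.parse("not(x)", mode="eval").body
plain_style = ast.parse("not x", mode="eval").body
same_tree = ast.dump(call_style) == ast.dump(plain_style)

# "1.and(2)" tokenizes as the float 1.0, the keyword and, then (2).
bool_op = ast.parse("1.0 and (2)", mode="eval").body

# `and` returns its last operand when every operand is truthy.
value = 1.0 and (2)
```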

Mar 21, 2009 15:14

The pycon videos are up: http://pycon.blip.tv/file/1947354/

Apr 3, 2009 12:51

I know this will sound facile, but have you looked at the standard documentation for python? http://docs.python.org/index.html It has the complete language reference: http://docs.python.org/reference/index.html And the complete library reference: http://docs.python.org/library/index.html

Apr 17, 2009 19:38

Use the urlfetch api for google app engine: http://code.google.com/appengine/docs/python/urlfetch/overview.html urllib2 is awful.

May 2, 2009 04:32

Well, if it is disabled in the API, I imagine what you're trying to do won't work on Google App Engine at all. The local API is presumably just a set of wrappers around the standard library, but the implementations at Google have been modified for the sandbox. Edit: What you are doing is trying to circumvent the security policy by using an unsupported API, which is why it is not working, and it will never work.

May 2, 2009 06:29

The Evan posted: But it kept giving me errors about concatenating strings and ints and such.

code:
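The usual cause is adding a str and an int: Python won't convert between them implicitly, so you have to. A small sketch with a made-up count showing the error and three ways to build the string:

```python
count = 3

try:
    "Total: " + count  # str + int raises TypeError
    implicit_ok = True
except TypeError:
    implicit_ok = False

explicit = "Total: " + str(count)  # convert the int yourself
percent = "Total: %d" % count      # %-formatting, works in 2.x and 3.x
fstring = f"Total: {count}"        # f-strings, Python 3.6+
```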

May 5, 2009 12:39

Sylink posted: Can someone explain parsing with elementtree to me? It makes no sense, and the effbot stuff seems to all be about making XML; I just want to get information.

gently caress element tree, learn XPath: it's a simple, standard query language for XML documents that is supported on a number of platforms, and it makes XML significantly less painful. http://codespeak.net/lxml/xpathxslt.html http://en.wikipedia.org/wiki/XPath_1.0

If you have
code:
It's fairly terse, readable and expressive, and a lot simpler to use than things like DOM or SAX or object models.
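If lxml isn't installed, the standard library's ElementTree understands a limited subset of XPath through findall(), which is enough for many "just get the information" jobs. A sketch on a made-up document; full XPath 1.0, such as comparisons like //book[@year > 2005], needs lxml's xpath() instead:

```python
import xml.etree.ElementTree as ET

XML = """
<library>
  <book year="2008"><title>Book A</title></book>
  <book year="2003"><title>Book B</title></book>
</library>
"""

doc = ET.fromstring(XML.strip())

# Every <title> that is a child of a <book> directly under the root.
titles = [t.text for t in doc.findall("./book/title")]

# Attribute predicate: books anywhere below the root whose year is "2008".
from_2008 = [b.findtext("title") for b in doc.findall(".//book[@year='2008']")]
```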

May 9, 2009 19:59

Icesler posted: I have never even glanced at a programming language before. So I have decided that I am going to learn Python. So far I have just done some really simple basic stuff like the 'Hello World!' and made a number guessing game.

If you take on a hard project, yes. Pick easy projects that are fairly simple to complete. A standard and simple task is to reimplement classic Unix utilities like sort or grep. Personally, I would go and look at the turtle module and draw fractals.
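As a taste of the grep exercise, here is a minimal sketch of its core using the re module; grep is a hypothetical function name, and invert=True mimics grep -v:

```python
import re


def grep(pattern, lines, invert=False):
    """Yield lines matching pattern (or not matching, when invert=True)."""
    regex = re.compile(pattern)
    for line in lines:
        if bool(regex.search(line)) != invert:
            yield line
```

Because it accepts any iterable of lines, you can pass it an open file or sys.stdin directly, for example: for line in grep("error", open("log.txt")): print(line, end="").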

May 20, 2009 18:39