|
BeefofAges posted:Yeah, that's what I said to do, but the guy who asked me was like "I'm doing this in a lot of places, so it would be nice to make it compact". Heh. So use shorter variable names. Or use a semicolon to put it on one physical line.
|
# ? Mar 2, 2010 21:53 |
|
Python tries to avoid this kind of stuff because of the whole readability thing and that there should be only one way to write something. I guess you could hack something together by accessing the global dictionary, but blehh.
|
# ? Mar 2, 2010 21:54 |
|
tef posted:
Scaevolus posted:It works fine for me. I'll try it again in the next day or so. I was using lxml.html and I was getting a SerialisationError when I tried to do a toString() on anything. If I tried to mess with specifying a unicode encoding, it gave me a Unicode error on some random character saying the ordinal was out of range(128). This was using 2.2.6 lxml and 2.6.2 active python on windows XP.
|
# ? Mar 2, 2010 22:05 |
|
BeefofAges posted:Yeah, that's what I said to do, but the guy who asked me was like "I'm doing this in a lot of places, so it would be nice to make it compact". Heh. Doing something in a lot of places eh? Repeating yourself? If only there was a function? make_string_and_append("make", "append") Haha... but no, clarity is always more important than cleverness. Just set the string and append.
|
# ? Mar 2, 2010 23:37 |
|
Lurchington posted:I'll try it again in the next day or so. I was using lxml.html and I was getting a SerialisationError when I tried to do a toString() on anything. If I tried to mess with specifying a unicode encoding, it gave me a Unicode error on some random character saying the ordinal was out of range(128). try etree.tounicode(foo) ? can you pastebin a small example showing the error?
|
# ? Mar 3, 2010 01:44 |
|
Dijkstracula posted:I tried Beautiful Soup this afternoon but it broke -- that is, threw some exception deep in its bowels -- on a reasonably trivial page. You could try html5lib, which uses more-or-less the same handling for lovely pages that modern browsers do. Your test page is currently timing out for me, but html5lib works with like a million possible parser implementations and there's no way your page is so hosed up that it'll fail to parse.
|
# ? Mar 3, 2010 04:30 |
|
tef posted:try etree.tounicode(foo) ? can you pastebin a small example showing the error? I could be misunderstanding, but I'm not using (I assume you mean the etree that comes bundled with python) etree, this is with the lxml install from http://codespeak.net/lxml/installation.html#ms-windows If etree does html I'll use it for sure. edit: something must be hosed in my install or I'm unfortunately glossing over important details. I've only used xml parts of etree before and I had thought I was relatively comfortable with the api. I tried to make it as barebones as possible: http://www.pastebin.org/100314 code:
code:
Lurchington fucked around with this message at 05:31 on Mar 3, 2010 |
# ? Mar 3, 2010 05:17 |
|
Lurchington posted:I could be misunderstanding, but I'm not using (I assume you mean the etree that comes bundled with python) etree, this is with the lxml install from http://codespeak.net/lxml/installation.html#ms-windows Just, you know, sticking my neck out here but I'm guessing it can't load the page for whatever reason. Also "Pyhton Interpreter"?
|
# ? Mar 3, 2010 10:25 |
|
Lurchington posted:I could be misunderstanding, but I'm not using (I assume you mean the etree that comes bundled with python) etree, this is with the lxml install from http://codespeak.net/lxml/installation.html#ms-windows as in lxml.etree.tounicode(...) quote:edit: something must be hosed in my install or I'm unfortunately glossing over important details. I've only used xml parts of etree before and I had thought I was relatively comfortable with the api. try page = lxml.etree.parse("http://cocks/", lxml.etree.HTMLParser()) instead. your example doesn't work on my installation (no html module in lxml...). (also, lxml is good for parsing xml, but you might want to try something like curl or urllib2 for fetching it.)
|
# ? Mar 3, 2010 13:35 |
|
Jonnty posted:Also "Pyhton Interpreter"? PyScripter is definitely my favorite Windows IDE, but there's one or two idiosyncrasies there. That's what's printed with an external run specified (alt+F9). And it certainly could be that the link isn't loading (in my original implementation, I did use urllib2 to get the url, which was successfully opened), but I was just trying to repeat Scaevolus's successful attempt with the same syntax. I'll try the different parse option and the tounicode parts this afternoon, thanks for the suggestions.
|
# ? Mar 3, 2010 13:46 |
|
e: nvm, this is serious overkill
the wizards beard fucked around with this message at 17:25 on Mar 3, 2010 |
# ? Mar 3, 2010 16:46 |
|
Lurchington posted:PyScripter is definitely my favorite windows IDE, but there's one of two idiosyncracies there. That's what's printed with an external run specified (alt+F9) For future reference, though, do make an attempt to understand error messages - don't just go 'welp, program guts are all over my screen, time to panic'. This: code:
|
# ? Mar 3, 2010 17:50 |
|
The point I was making, was that I used the same code as previous poster, but mine errored. While lxml clearly couldn't open the link, and had a relatively clear error message, I don't think it's a stretch to consider that something else was the actual root cause. If you simply wanted an opportunity to say "don't freak out and actually read error messages," fine, I'm right there with you. Lurchington fucked around with this message at 18:12 on Mar 3, 2010 |
# ? Mar 3, 2010 18:10 |
|
Alright, I'm fine to not talk about this anymore since I gave up on the original project and I don't want to derail the thread too bad, but here's where I'm at:Scaevolus posted:It works fine for me. ok, I'm at a separate computer that has lxml 2.2.2 installed and got: code:
(original form used by Scaev) code:
code:
and to answer tef: using code:
code:
Lurchington fucked around with this message at 18:54 on Mar 3, 2010 |
# ? Mar 3, 2010 18:42 |
|
Okay, here's something dead simple I can't find an elegant way to do. I want to set a slice of a list to a single constant. My naive guess: list[0:15] = 100 No dice. Is there a way to do this without a loop or constructing a list of the same constant repeated N times for the sole purpose of pairing to the list slice? EDIT: In case it's not clear, I'm thinking of something analogous to memset() in C.
|
# ? Mar 3, 2010 19:30 |
|
Stabby McDamage posted:Okay, here's something dead simple I can't find an elegant way to do. code:
code:
MaberMK fucked around with this message at 19:43 on Mar 3, 2010 |
# ? Mar 3, 2010 19:39 |
|
MaberMK posted:
Or just list[x:y] = [value] * (y-x)
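A quick sketch of that, written for Python 3 rather than the 2.x used in the thread. The names data, x, y and value are placeholders for illustration only. It also shows why the naive attempt fails: the right-hand side of a slice assignment has to be an iterable.

```python
data = list(range(20))
x, y, value = 0, 15, 100

# A bare int on the right-hand side is rejected: slice assignment needs an iterable.
try:
    data[x:y] = value
except TypeError as exc:
    print("naive attempt failed:", exc)

# Build a temporary list with exactly as many items as the slice, then assign it.
data[x:y] = [value] * (y - x)

print(data)
```

The temporary list has to contain exactly y - x items, otherwise the assignment grows or shrinks the list instead of overwriting in place.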
|
# ? Mar 3, 2010 20:58 |
|
No Safe Word posted:Or just list[x:y] = [value] * (y-x) hurrr, color me retarded. Stabby, do it this way.
|
# ? Mar 3, 2010 22:22 |
|
No Safe Word posted:Or just list[x:y] = [value] * (y-x) Isn't that going to create a scratch list that's y-x entries long? Or is there some magic there I don't see? It doesn't matter for what I was doing, but it seems really inefficient for large values of y-x. I ended up doing: for i in range(x,y): a[i] = value Anyway, here's my next question. Multiple constructors: how do I make them elegant? Right now I have my __init__ be the root case that the user will never directly call. Then I have various @classmethod-decorated functions that call the constructor, then diddle with the new object before finally returning it. Is this the right way to do it?
|
# ? Mar 4, 2010 00:01 |
|
Stabby McDamage posted:Isn't that going to create a scratch list that's y-x entries long? Or is there some magic there I don't see? It doesn't matter for what I was doing, but it seems really inefficient for large values of y-x. If you're worried about efficiency, why are you using a list?
|
# ? Mar 4, 2010 00:10 |
|
I just tested it, and it does make a y-x list and then throw it away. What's worse is that my for loop uses even more memory and runs even slower! I think the range() in my for loop is literally making a list in memory -- I thought it was supposed to be a generator? To reproduce: code:
|
# ? Mar 4, 2010 00:12 |
|
Avenging Dentist posted:If you're worried about efficiency, why are you using a list? What else other than a list would I use for array-like functionality in Python? Or are you asking "if you care about large arrays, why are you writing Python?"
|
# ? Mar 4, 2010 00:13 |
|
Stabby McDamage posted:What else other than a list would I use for array-like functionality in Python? NumPy. Which incidentally does what you tried to do in the beginning. Also in Python 2.x, range returns a list. You want xrange.
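For what it's worth, here is a sketch of the loop version in Python 3, where the built-in range is already lazy the way xrange is in 2.x. The helper name fill is made up for this example, and the size comparison assumes CPython.

```python
import sys


def fill(seq, start, stop, value):
    """Set seq[start:stop] to value in place, without building a scratch list."""
    # In Python 3, range yields indices one at a time (like xrange in 2.x),
    # so no list of stop - start integers is materialized.
    for i in range(start, stop):
        seq[i] = value


a = [0] * 10
fill(a, 2, 6, 7)
print(a)

# The lazy range object itself stays the same size however many values it spans.
print(sys.getsizeof(range(10)), sys.getsizeof(range(10 ** 9)))
```

On Python 2.x the same loop would use xrange in place of range to get this behavior.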
|
# ? Mar 4, 2010 00:24 |
|
Won't replacing elements in a loop cause python to traverse the list y-x times, pretty much negating any benefit of not creating a scratch list?
|
# ? Mar 4, 2010 00:33 |
|
taqueso posted:Won't replacing elements in a loop cause python to traverse the list y-x times, pretty much negating any benefit of not creating a scratch list? What?
|
# ? Mar 4, 2010 00:44 |
|
Avenging Dentist posted:What? Maybe cpython optimizes this, but I would think code:
1. traverse the list to get to element a[x] 2. replace a[x] with value 3. increment x and go to 1 But code:
I am not a python expert by any stretch of the imagination and I would love to find out that python is smart enough to remember the previous element in the for loop.
|
# ? Mar 4, 2010 00:49 |
|
What are you talking about? Why would Python need to traverse a list to get to an integer offset in an array?
|
# ? Mar 4, 2010 00:50 |
|
Avenging Dentist posted:What are you talking about? Why would Python need to traverse a list to get to an integer offset in an array? You are right. For some reason I thought that a linked list is used to represent the list datatype, but as you say it is an array and an element can be found in constant time.
|
# ? Mar 4, 2010 00:54 |
|
Why the hell would anyone implement the [] operator if the underlying implementation was a linked list!
|
# ? Mar 4, 2010 01:09 |
|
Lurchington posted:Alright, I'm fine to not talk about this anymore since I gave up on the original project and I don't want to derail the thread too bad, but here's where I'm at: code:
|
# ? Mar 4, 2010 01:41 |
|
Lurchington, I think whatever activepython or pyshell fanciness you are using is making this harder for you by obfuscating the error messages a little. I assume what should be UnicodeDecodeError exceptions are getting translated into those serialization errors by whatever pyshell or activepython stuff you are using. It's very important to keep in mind that a Windows console will give you Unicode encoding errors if you try to print characters outside its code page, because the native character encoding of the Windows command interpreter is a legacy code page, not Unicode (and Python 2 falls back to ASCII when it can't detect the output encoding)! That's why you are getting that error doing what tef suggested.
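A minimal sketch of the failure and one way around it for debug output, written for Python 3 (the thread is on 2.6, where the details differ). The sample string is made up, and the ASCII codec stands in for whatever limited encoding the console or IDE pipe is actually using.

```python
text = "na\u00efve caf\u00e9 \u2603"

# Encoding straight to ASCII fails on the first character outside the 0-127 range.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)

# An error handler trades fidelity for something that can always be printed.
safe = text.encode("ascii", "backslashreplace").decode("ascii")
print(safe)
```

Escaping like this is only meant for debug prints; the real data should stay as Unicode and be written out with an explicit encoding such as UTF-8.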
|
# ? Mar 4, 2010 02:35 |
|
tripwire posted:Lurchington, I think whatever activepython or pyshell fanciness you are using is making this harder for you by obfuscating the error messages a little. I will attest to this. Unicode always makes everything more complicated, but it's all soluble if you remember this when dealing with Unicode. I'm on a project now that requires parsing a bunch of Unicode XML, and I've had more trouble with my debug reporting than the real problem solving.
|
# ? Mar 4, 2010 02:50 |
|
ErIog posted:I will attest to this. Unicode always makes everything more complicated, but it's all soluble if you remember this when dealing with Unicode. I'm on a project now that requires parsing a bunch of Unicode XML, and I've had more trouble with my debug reporting than the real problem solving. Agreed. I'm writing some stuff now that takes a bunch of unicode from web services and debugging on windows is bullshit because of the ASCII console.
|
# ? Mar 4, 2010 03:14 |
|
tripwire posted:Lurchington, I think whatever activepython or pyshell fanciness you are using is making this harder for you by obfuscating the error messages a little. That's likely. I have a Mac and a Linux test box around here that I can use, but all my Windows machines are using ActivePython. Scaevolus posted:
just using the url "failed to load external entity" but using urllib2.urlopen on the url with the explicit encoding does seem to provide good results. Upon inspection, there's like 80 of these lines: code:
|
# ? Mar 4, 2010 03:50 |
|
Avenging Dentist posted:NumPy. Which incidentally does what you tried to do in the beginning. I'll keep that in mind for the future. I assume Numpy is written in C on the backend? Regarding xrange(): why does range() exist given the existence of xrange()? Are there situations where the generator nature of it would be a problem? Also, there's something else I'm curious about. Why did my for loop take so much more memory than the list itself? It seems like range(0,90e6) should be a bit smaller than [1] * 100e6, but instead my for loop ate all the RAM available (~1.5GB) and started thrashing. Weird.
|
# ? Mar 4, 2010 05:40 |
|
Stabby McDamage posted:Also, there's something else I'm curious about. Why did my for loop take so much more memory than the list itself? It seems like range(0,90e6) should be a bit smaller than [1] * 100e6, but instead my for loop ate all the RAM available (~1.5GB) and started thrashing. Weird. Because [x]*N creates a list with N references to x (i.e. x[i] is x[j] == True for all i,j in [0,N)), whereas range(N) creates a list with N distinct integers, all* of which are heap-allocated as separate objects. It should be obvious why the former takes less space than the latter. Basically a Python list is an array of pointers to PyObjects, so for an N-element list on a 32-bit system, there are 4*N bytes of data taken up by that (plus 12 bytes for the refcount, pointer to type object, and length). Each integer object in Python takes 12 bytes of data (refcount, pointer to type object, and value), so a list of N distinct Python ints costs 16*N+12 bytes of data, excluding malloc overhead. A list of N identical Python ints costs 4*N+24 bytes. * However, integers in the range [-5, 256] are cached by Python and don't involve an allocation. Avenging Dentist fucked around with this message at 06:44 on Mar 4, 2010 |
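The arithmetic above can be sketched as follows. The byte counts are the 32-bit CPython 2.x estimates from this post, not measurements, and the function names are invented for the example. The last few lines show the sharing that [x] * N relies on.

```python
def shared_list_bytes(n):
    # n 4-byte pointers + 12-byte list header + one shared 12-byte int object
    return 4 * n + 24


def distinct_list_bytes(n):
    # n 4-byte pointers + n separate 12-byte int objects + 12-byte list header
    return 16 * n + 12


print(shared_list_bytes(100_000_000))    # the [1] * 100e6 case
print(distinct_list_bytes(90_000_000))   # the range(0, 90e6) case

# [x] * N stores N references to the same object, not N copies.
marker = object()
repeated = [marker] * 5
print(all(item is marker for item in repeated))
```

By these estimates the repeated list needs roughly 0.4 GB while the range list needs roughly 1.44 GB before malloc overhead, which lines up with the reported ~1.5 GB.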
# ? Mar 4, 2010 06:38 |
|
Stabby McDamage posted:Right now I have my __init__ be the root case that the user will never directly call. Then I have various @classmethod-decorated functions that call the constructor, then diddle with the new object before finally returning it. Is this the right way to do it? I'm not a Python expert, but that's exactly how I do it.
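A minimal sketch of that pattern, in Python 3. The Record class and its from_pairs and from_line methods are hypothetical names invented for this example, not anything from the thread.

```python
class Record:
    def __init__(self, fields):
        # Root constructor: the alternate constructors below all funnel into it.
        self.fields = dict(fields)
        self.source = None

    @classmethod
    def from_pairs(cls, pairs):
        # Using cls instead of Record means subclasses get their own type back.
        return cls(pairs)

    @classmethod
    def from_line(cls, line, sep="="):
        # Build the object through __init__, then adjust it before returning.
        obj = cls(part.split(sep, 1) for part in line.split())
        obj.source = line
        return obj


r1 = Record.from_pairs([("x", 1)])
r2 = Record.from_line("a=1 b=2")
print(r1.fields, r2.fields, r2.source)
```

Calling cls(...) rather than hard-coding the class name is the main design choice here: it keeps the alternate constructors working unchanged for subclasses.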
|
# ? Mar 4, 2010 07:07 |
|
Avenging Dentist posted:Because [x]*N creates a list with N references to x (i.e. x[i] is x[j] == True for all i,j in [0,N)), whereas range(N) creates a list with N distinct integers, all* of which are heap-allocated as separate objects. It should be obvious why the former takes less space than the latter. Individual integers are heap-allocated full python objects, with a refcount and everything? Gross. I mean, I see why you'd want to implement it that way, but drat.
|
# ? Mar 4, 2010 15:32 |
|
Stabby McDamage posted:I assume Numpy is written in C on the backend?
|
# ? Mar 4, 2010 17:17 |
|
I ran into this bug yesterday. code:
Scaevolus fucked around with this message at 00:54 on Mar 6, 2010 |
# ? Mar 6, 2010 00:34 |