|
LuckySevens posted:I'm using urllib2 to download a rar file, but it's coming up as corrupt despite being the correct size. Is there something special I need to do? Are you writing it to a file in binary mode or ascii?
|
# ? Jan 3, 2010 11:56 |
|
tripwire posted:Are you writing it to a file in binary mode or ascii? I struggled for like an hour with this exact problem yesterday before I remembered to use "wb" instead of "w". The sad thing is, it seems like I forget to do it every time I write something that downloads a binary file.
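A minimal sketch of the fix tripwire describes, with the download and the binary-mode write split apart (the URL and filename would be whatever you're actually fetching; the import fallback is there because urllib2 is Python 2 only):

```python
# The key point from the thread: open the output file with "wb".
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2, as in the thread

def save_binary(data, path):
    # "wb" writes the bytes exactly as received; plain "w" is text
    # mode, which can translate newlines and corrupt a .rar file.
    with open(path, "wb") as f:
        f.write(data)

def download(url, path):
    # read the raw response bytes, then save them untouched
    save_binary(urlopen(url).read(), path)
```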
|
# ? Jan 3, 2010 18:10 |
|
tripwire posted:Are you writing it to a file in binary mode or ascii? Binary, using wb mode.
|
# ? Jan 3, 2010 23:27 |
|
LuckySevens posted:Binary, using wb mode. Huh. Well that was the first thing I would have checked. Maybe try making a really really tiny rar file of a few lines of text, and then check it bit for bit with what ends up getting saved and see where it's differing.
|
# ? Jan 4, 2010 00:17 |
|
yeah good idea, i'll give that a whirl
|
# ? Jan 4, 2010 00:45 |
|
Anyone have any idea what this string is?
code: ~ë├TδÜW∟▒HäáM├ö˛µďH
code:
|
# ? Jan 6, 2010 05:47 |
|
Thermopyle posted:Anyone have any idea what this string is? Looks like ~ë├TδÜW∟▒HäáM├ö˛µďH to me. If you are on linux or osx, try starting up an interactive session and fool around with encoding it to or from some kind of unicode (utf-8 for example). Both unicode and standard string data types in python have an 'encode' method for encoding to different character encodings. This may be more difficult on windows as the command prompt doesn't work properly with unicode. tripwire fucked around with this message at 05:56 on Jan 6, 2010 |
# ? Jan 6, 2010 05:53 |
|
Figured it out. The first was an sha1 digest, the second was an sha1 hexdigest of the same data.
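For anyone hitting the same mystery string: the relationship between the two forms is easy to see in hashlib (the input bytes here are just an arbitrary example):

```python
import hashlib
import binascii

h = hashlib.sha1(b"some data")
raw = h.digest()       # 20 raw bytes - prints as line noise in a console
hexed = h.hexdigest()  # the same 20 bytes spelled out as 40 hex characters

# hexdigest() is simply the hex encoding of digest()
assert binascii.hexlify(raw).decode() == hexed
```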
|
# ? Jan 6, 2010 06:21 |
|
An incredibly open-ended question: I'm looking at porting some old software of mine (specialised scientific data analysis) to Python. I'd like to take advantage of multi-core machines, and the code is readily de-composable into independent, parallel parts. So I've been doing my reading about the various options, and gotten to the point of confusion from too much - and sometimes contradictory - information. Multiprocessing, pyprocessing, pprocess, Axon, eventlets, Twisted, Stackless, "there's no point because of the GIL", "the GIL is not a problem", "the GIL is not a problem if you do it right", greenlets, Diesel, Fibra ...
So, is there a good high-level overview of the choices for concurrency in Python and/or maybe a strong recommendation? To give you an idea of my priorities and the limitations I have to work in, let's assume that:
* Faster execution is the point
* I'd rather use an extension to standard Python than a specialised distribution (e.g. Stackless)
* I want to get my work done, rather than spend all my hours in and become an expert on a monster framework (e.g. Twisted)
* I'm starting with multicore machines, with maybe an eye towards more powerful hardware later
* The Actor model is kind of cool and a humane way to handle concurrency
* Something that plays well with normal Python libraries is essential
* The problems that I'm working on are better suited to high-level coarse concurrency (i.e. at "the top" of the program) rather than lots of small threads "at the bottom".
|
# ? Jan 6, 2010 22:44 |
|
outlier posted:* Faster execution is the point Then why use Python? Most scientific applications that use Python use it to script already-fast (and sometimes already-parallel) tools/libraries. Speed is not Python's priority. Much like you wouldn't write an AAA game entirely in Python, you wouldn't write computationally-intensive scientific code entirely in it either. Bust out a C compiler and a decent MPI library. And if you have a use case for Python scripting, make a Python extension module for it.
|
# ? Jan 6, 2010 22:49 |
|
You should really take a look at the newest ipython: http://ipython.scipy.org/doc/stable/html/overview.html#interactive-parallel-computing Edit: which also integrates with mpi: http://ipython.scipy.org/doc/stable/html/parallel/parallel_mpi.html
|
# ? Jan 6, 2010 22:49 |
|
BigRedDot posted:Edit: which also integrates with mpi: Merely having access to MPI doesn't give you a lot if the underlying code being run on each node isn't very fast. Granted, with NumPy, you can probably approach sensible speeds provided you're only doing common linear algebraic operations. But when speed is essential, the best way to make Python run faster is to do the minimum amount of work in Python itself. Also, I just noticed that you're working on multicore machines (currently) instead of parallel machines, so for that you'd really want to use something like OpenMP instead. Obviously, this extends to MPI as well if each node is a multicore machine* (use MPI for inter-node parallelism and OpenMP for intra-node parallelism). * Technically, a shared-memory machine Avenging Dentist fucked around with this message at 23:00 on Jan 6, 2010 |
# ? Jan 6, 2010 22:57 |
|
I'm not really disagreeing, if numpy is not sufficient for the actual numerics then you'll need to wrap your own C or Fortran routines. I only point out ipython because it allows the scatter/gather, etc operations to be handled in python in the driver script.
|
# ? Jan 6, 2010 23:05 |
|
outlier posted:Multiprocessing, pyprocessing, pprocess, Axon, eventlets, Twisted, Stackless, "there's no point because of the GIL", "the GIL is not a problem", "the GIL is not a problem if you do it right", greenlets, Diesel, Fibra ... Use multiprocessing. It's in the standard library, and works well. Managers within it allow you to share objects between individual machines/on the network. The GIL is a hindrance for compute-bound operations that aren't doing I/O and aren't contained in a C extension that releases the GIL. If you are I/O-heavy, you will see a speedup with threads, although contention over the GIL prevents it from being as fast as it could be.
m0nk3yz fucked around with this message at 15:13 on Jan 7, 2010 |
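A minimal sketch of the multiprocessing suggestion above; `busy` is a made-up stand-in for a real compute-bound analysis step:

```python
from multiprocessing import Pool

def busy(n):
    # stand-in for a CPU-bound step; each call runs in a separate
    # worker process, so the GIL is never contended
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    pool = Pool(processes=4)                # e.g. one worker per core
    results = pool.map(busy, [100000] * 8)  # fan out, gather in order
    pool.close()
    pool.join()
    assert results == [busy(100000)] * 8
```

This is the coarse, "top of the program" concurrency described in the question: hand whole independent chunks to `pool.map` and collect the results.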
# ? Jan 7, 2010 02:24 |
|
BigRedDot posted:[ipython stuff] Good stuff - I noticed ages ago that ipython was taking on MPI etc., but hadn't realised that they'd actually done it. That'll be awesomely useful. m0nk3yz posted:.. words ... Double awesome - exactly what I was looking for. There seems to be a lot of dubious or confused information out there, but this has made things a lot clearer. nonathlon fucked around with this message at 21:19 on Jan 7, 2010 |
# ? Jan 7, 2010 20:56 |
|
outlier posted:Good stuff - I noticed ages ago that ipython was taking on MPI etc., but hadn't realised that they'd actually done it. That'll be awesomely useful. Note that I'm opinionated; I've spent a lot of time looking at this stuff. I'm also biased in that I'm the maintainer of multiprocessing.
|
# ? Jan 7, 2010 22:19 |
|
m0nk3yz posted:and it's GPL, which means I wouldn't go near it. I see this sentiment a lot in the python community. Why do so many people dislike the GPL?
|
# ? Jan 7, 2010 23:22 |
|
nbv4 posted:I see this sentiment a lot in the python community. Why do so many people dislike the GPL? For me, because it's championed by a certified nutjob who eats his own toenail clippings. That, and it's not actually a free software license - it's a "free as long as you share it too" license. While I subscribe to the share-and-share-alike attitude (and I think everyone else should too), forcing developers to release their derivative works doesn't exactly scream freedom.
|
# ? Jan 7, 2010 23:26 |
|
nbv4 posted:I see this sentiment a lot in the python community. Why do so many people dislike the GPL? I don't like it because its behavior is entirely undefined in a dynamic language like Python. That's why I don't use GPL code; I release BSD code because I believe the most important freedom is the freedom to do something I might not like. king_kilr fucked around with this message at 05:12 on Jan 8, 2010
# ? Jan 8, 2010 02:04 |
|
nbv4 posted:I see this sentiment a lot in the python community. Why do so many people dislike the GPL? Because usage of a library which is GPL (e.g. import) "more than likely" invokes the "your app must now be GPL" clause. And: quote:While I subscribe to the share and share-alike attitude (and I think everyone else should too), forcing developers to release their derivative works doesn't exactly scream freedom. If I import a GPL library, mine is now GPL, which means a consumer of mine must be GPL, and so on. I find the GPL fundamentally more restrictive than X11/Apache/BSD/etc, so I choose to use those. I also write software, much of which is proprietary, for my job(s). Touching GPL code is philosophically, and business-wise, undesirable. king_kilr posted:I don't like it because it's behavior is entirely undefined in a dynamic language like Python. That's why I don't use GPL code, I don't release BSD code because I believe the most important freedom is the freedom to do something I might not like. I don't think it's really undefined - imports trigger its sharing clauses. Saying it really doesn't, and arguing language semantics, smells like it violates the spirit of the GPL. It hasn't been tested in court, though. And I think you meant you release BSD software, right?
|
# ? Jan 8, 2010 02:08 |
|
m0nk3yz posted:I don't think it's really undefined - imports trigger its sharing clauses. Saying it really doesn't, and arguing language semantics, smells like it violates the spirit of the GPL. It hasn't been tested in court, though. And I think you meant you release BSD software, right? Yep, all my code is BSD. And yeah, your choices for the semantics are "undefined, but violating the spirit" and "my poo poo is now GPL too", awesome.
|
# ? Jan 8, 2010 02:36 |
|
king_kilr posted:Yep, all my code is BSD. And yeah, your choices for the semantics are "undefined, but violating the spirit" and "my poo poo is now GPL too", awesome. Then fix the "I don't release BSD code" in your last post
|
# ? Jan 8, 2010 02:42 |
|
m0nk3yz posted:Because usage of a library which is GPL (e.g. import) "more than likely" invokes the "your app must now be GPL" clause. But doesn't the GPL mainly prevent people forking the software and making that fork proprietary? I don't really see why importing a library falls into that category. I can understand if you modified the library, and then imported it, but otherwise I don't see it as a "spiritual violation". Then again I'm not really a software licensing expert sooooo...
|
# ? Jan 8, 2010 02:43 |
|
nbv4 posted:But doesn't the GPL mainly prevent people forking the software and making that fork proprietary? I don't really see why importing a library falls into that category. I can understand if you modified the library, and then imported it, but otherwise I don't see it as a "spiritual violation". Then again I'm not really a software licensing expert sooooo... Not exactly. If I write library X, and in order for it to offer its full functionality it depends on, and imports, Y - then X must now be GPL'ed, forcing me either to drop the functionality that needs Y, or to change my license. See this discussion about readline: http://clisp.cvs.sourceforge.net/*checkout*/clisp/clisp/doc/Why-CLISP-is-under-GPL For a great read on this, see: http://jacobian.org/writing/gpl-questions/
|
# ? Jan 8, 2010 03:55 |
|
nbv4 posted:But doesn't the GPL mainly prevent people forking the software and making that fork proprietary? I don't really see why importing a library falls into that category. I can understand if you modified the library, and then imported it, but otherwise I don't see it as a "spiritual violation". Then again I'm not really a software licensing expert sooooo... What you have described is the LGPL, which is the reason most libraries use it.
|
# ? Jan 8, 2010 09:02 |
|
nbv4 posted:But doesn't the GPL mainly prevent people forking the software and making that fork proprietary? I don't really see why importing a library falls into that category. I can understand if you modified the library, and then imported it, but otherwise I don't see it as a "spiritual violation". Then again I'm not really a software licensing expert sooooo... No. The whole point of the GPL is to force any derivative works to be and remain free software. If you don't want those semantics for the code you write, don't use the GPL. The LGPL, on the other hand, has the semantics you want (the code itself must remain free software, preventing proprietary forks, but things that link it don't have to be). If you don't care about proprietary forks, a more permissive license (BSD, MIT, X11, Apache etc) is for you.
|
# ? Jan 8, 2010 09:12 |
|
MaberMK posted:forcing developers to release their derivative works doesn't exactly scream freedom. Good job the GPL doesn't do that then.
|
# ? Jan 8, 2010 17:12 |
|
Zombywuf posted:Good job the GPL doesn't do that then. Sorry, I got in a hurry and misspoke. Forcing developers to release the source code to any derivative work they distribute doesn't exactly scream freedom.
|
# ? Jan 8, 2010 17:28 |
|
I've got an object that caches writes to a structured data file. My object has a force_update method that does the expensive writing to the file (it's a file used by a third-party application that I have to terminate before writing to and then restart when I'm done writing). Out of curiosity...is there a way to make sure the force_update method is called before the script terminates? I can of course make sure that I have obj.force_update() at the end of my script, but I'm just thinking about convenience and catching early terminations.
|
# ? Jan 9, 2010 20:41 |
|
Thermopyle posted:I've got an object that caches writes to a structured data file. My object has a force_update method that does the expensive writing to the file (it's a file used by a third-party application that I have to terminate before writing to and then restart when I'm done writing). Sounds a bit tricky. You could put a call to force_update in the destructor of your object, so it's triggered when the object is cleaned up / garbage collected at script termination. But Python terminates sessions in - what seems to me - an awfully messy way. You can't rely on anything still existing, I guess because the order of cleanup is unknown. Possibly someone can suggest a neater way that involves catching the termination signal. EDIT: vvv You learn something every day. nonathlon fucked around with this message at 00:22 on Jan 10, 2010 |
# ? Jan 9, 2010 22:03 |
|
import atexit
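That is, something along these lines - `CachedWriter` here is a made-up stand-in for the caching object Thermopyle describes:

```python
import atexit

class CachedWriter(object):
    # hypothetical stand-in for an object that caches writes to a file
    def __init__(self):
        self.flushed = False
        # run force_update at normal interpreter exit; note atexit does
        # NOT fire on os._exit() or on an unhandled signal like SIGKILL
        atexit.register(self.force_update)

    def force_update(self):
        if not self.flushed:
            self.flushed = True  # the expensive file write would go here
```

You can still call `force_update()` explicitly at the end of the script; the atexit hook just catches the cases where you forget or the script bails early with an exception.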
|
# ? Jan 9, 2010 23:36 |
|
You could also just use RAII via the with statement, which has the added benefit of doing what you'd expect even if your object doesn't last until termination of the script. (Of course, the fact that Python requires an extra keyword to safely implement RAII is a bit embarrassing, but oh well.)
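A sketch of that approach with a made-up writer class: `__exit__` runs deterministically when the `with` block ends, even if an exception is raised inside it.

```python
class ManagedWriter(object):
    # hypothetical stand-in for an object that must flush on the way out
    def __init__(self):
        self.flushed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.flushed = True   # the expensive write happens exactly here
        return False          # don't swallow exceptions

# usage:
# with ManagedWriter() as w:
#     ...do work...
# w.flushed is True on the way out, whether or not the body raised
```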
|
# ? Jan 10, 2010 00:32 |
|
Avenging Dentist posted:You could also just use RAII via the with statement, which has the added benefit of doing what you'd expect even if your object doesn't last until termination of the script. (Of course, the fact that Python requires an extra keyword to safely implement RAII is a bit embarrassing, but oh well.) Huh? C++-style RAII, just using destructors, is perfectly possible; the issue is that in systems with sensible garbage collection (read: generational GC, like PyPy or Jython, and NOT CPython) the actual destruction time is undefined.
|
# ? Jan 10, 2010 01:02 |
|
Out of curiosity, does Jython suffer from the GIL contention performance problems when using multithreaded code and a multicore computer? If you've never seen it visualized, check this out: http://www.dabeaz.com/blog/2010/01/python-gil-visualized.html It's quite dramatic how much performance is sacrificed on the altar of the GIL.
|
# ? Jan 10, 2010 01:06 |
|
tripwire posted:Out of curiosity, does Jython suffer from the GIL contention performance problems when using multithreaded code and a multicore computer? Jython doesn't have a GIL. And it's not performance that is sacrificed by the GIL (much less an altar), at least not any more than the 50% hit in performance all threads take in any of the safe-threading implementations.
|
# ? Jan 10, 2010 01:57 |
|
king_kilr posted:Huh? C++ style RAII, just using desctuctors is perfectly possible, the issue is in systems with sensible garbage collection (read: generational GC, like PyPy or Jython, and NOT CPython) the actual destruction time is undefined. That's a pretty huge issue!
|
# ? Jan 10, 2010 03:07 |
|
Avenging Dentist posted:That's a pretty huge issue! Yeah, and that's how you get a fast garbage collector, hence the introduction of the with statement. There's no way to have deterministic destruction with one of these systems. Of course you could always have ref counting like CPython does, but that doesn't particularly scale, due to the need for locking on every refcount update (this doesn't consider in-between systems like IBM's Recycler).
|
# ? Jan 10, 2010 03:54 |
|
king_kilr posted:Yeah, and that's how you get a fast garbage collector, hence the introduction of the with statement. There's no way to have deterministic destruction with one of these systems. RAII and non-deterministic garbage collection are not mutually exclusive, though they often get conflated because language designers decide that the way to avoid forcing the programmer to manage memory is to make everything garbage-collected. All this is the usual generic-programmer* talking point raging about value types vs. reference types, though. * "Generic" in the Stepanov sense
|
# ? Jan 10, 2010 04:16 |
|
I'm using binary numbers to help me build out all the possible permutations of an arbitrary set of boolean variables. Is there a way to manipulate binary numbers in Python without using string operations? I'd like to be able to ask for the value of a digit at some position in the number. It seems both unnatural and unnecessary to treat binary numbers like strings. Here's what I'm doing right now for an input s. code:
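The string-free way to read "digit" i of a number is a shift and a mask:

```python
def bit(n, i):
    # shift bit i down to position 0, then mask off everything else
    return (n >> i) & 1

# e.g. 6 is 0b110, so its bits from least- to most-significant
# are 0, 1, 1
```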
|
# ? Jan 11, 2010 23:15 |
|
|
Sock on a Fish posted:I'm using binary numbers to help me build out all the possible permutations of an arbitrary set of boolean variables. Is there a way to to manipulate binary numbers in Python without using string operations? I'd like to be able to ask for the value of a digit at some position in the number. What kind of permutations are you trying to build up? Itertools might be more suitable here. Depending on exactly what you want, you could use combinations with the product function, or permutations which is a little more specialized.
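For boolean assignments specifically, itertools.product does it in one line, with no binary arithmetic or string slicing at all:

```python
from itertools import product

# every assignment of 3 boolean variables: 2**3 = 8 tuples,
# from (False, False, False) up to (True, True, True)
combos = list(product([False, True], repeat=3))
```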
|
# ? Jan 11, 2010 23:22 |