  • Locked thread
tripwire
Nov 19, 2004

        ghost flow

LuckySevens posted:

I'm using urllib2 to download a rar file, but it's coming up as corrupt despite being the correct size. Is there something special I need to do?

Are you writing it to a file in binary mode or ascii?

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

tripwire posted:

Are you writing it to a file in binary mode or ascii?

I struggled for like an hour with this exact problem yesterday before I remembered to use "wb" instead of "w".

The sad thing is, it seems like I forget to do it every time I write something that downloads a binary file.
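For anyone hitting the same thing, here is a minimal sketch of the "wb" point. It assumes Python 3 (where urllib2 became urllib.request), and the function name save_binary is made up for illustration. It copies any binary file-like object to disk in chunks, with the file opened in binary mode so no bytes get translated along the way.

```python
import shutil


def save_binary(source, path, chunk_size=64 * 1024):
    """Copy a binary file-like object to `path` without altering any bytes."""
    # "wb" opens the file in binary mode, so no newline translation happens.
    with open(path, "wb") as out:
        shutil.copyfileobj(source, out, chunk_size)
```

With a real download this would look roughly like `with urllib.request.urlopen(url) as resp: save_binary(resp, "archive.rar")`, where url is whatever you are fetching.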

LuckySevens
Feb 16, 2004

fear not failure, fear only the limitations of our dreams

tripwire posted:

Are you writing it to a file in binary mode or ascii?

Binary, using wb mode.

tripwire
Nov 19, 2004

        ghost flow

LuckySevens posted:

Binary, using wb mode.

Huh. Well that was the first thing I would have checked.

Maybe try making a really, really tiny rar file of a few lines of text, and then check it bit for bit against what ends up getting saved and see where it differs.
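A rough sketch of that comparison, assuming both the original and the downloaded copy are on disk. The helper name first_difference is hypothetical; it works at byte rather than bit granularity, which is usually enough to spot where a download went wrong.

```python
def first_difference(path_a, path_b, chunk_size=4096):
    """Return the offset of the first differing byte, or None if identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                # Walk the mismatching chunk to find the exact byte.
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # No mismatch in the shared part: one file is a prefix
                # of the other, so they differ where the shorter one ends.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None  # both files ended with no difference
            offset += len(chunk_a)
```

An offset near the start points at a mangled header; an offset at the very end suggests a truncated or padded file.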

LuckySevens
Feb 16, 2004

fear not failure, fear only the limitations of our dreams

yeah good idea, i'll give that a whirl

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Anyone have any idea what this string is?

code:
b~\x89\xc3T\xeb\x9aW\x1c\xb1H\x84\xa0M\xc3\x94\xfd\xe6\x8bH
I think it's supposed to represent this:

code:
627E89C354EB9A571CB14884A04DC394FDE68B48
If it does, how do I convert from the first to second?

tripwire
Nov 19, 2004

        ghost flow

Thermopyle posted:

Anyone have any idea what this string is?

code:
b~\x89\xc3T\xeb\x9aW\x1c\xb1H\x84\xa0M\xc3\x94\xfd\xe6\x8bH
I think it's supposed to represent this:

code:
627E89C354EB9A571CB14884A04DC394FDE68B48
If it does, how do I convert from the first to second?

Looks like ~ë├TδÜW∟▒HäáM├ö˛µďH to me.

If you are on linux or osx try starting up an interactive session and fool around with encoding it to or from some kind of unicode (utf-8 for example). both unicode and standard string data types in python have an 'encode' method for encoding to different character encodings. This may be more difficult on windows as the command prompt doesn't work properly with unicode.

tripwire fucked around with this message at 05:56 on Jan 6, 2010

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Figured it out.

The first was an sha1 digest, the second was an sha1 hexdigest of the same data.
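For reference, a small sketch of the conversion in Python 3, using the bytes from the question. In the Python 2 of this thread the same job is done by binascii.hexlify. The "example data" input is only a placeholder, since the original data isn't shown.

```python
import hashlib

raw = b"b~\x89\xc3T\xeb\x9aW\x1c\xb1H\x84\xa0M\xc3\x94\xfd\xe6\x8bH"

# bytes.hex() renders each byte as two hex characters.
as_hex = raw.hex().upper()

# And back again, from hex text to raw bytes.
round_trip = bytes.fromhex(as_hex)

# For any input, hexdigest() is just digest() rendered as hex.
h = hashlib.sha1(b"example data")  # placeholder input, not the original data
same = h.digest().hex() == h.hexdigest()
```

Here as_hex comes out as 627E89C354EB9A571CB14884A04DC394FDE68B48, matching the second string in the question.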

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...
An incredibly open-ended question: I'm looking at porting some old software of mine (specialised scientific data analysis) to Python. I'd like to take advantage of multi-core machines, and the code is readily de-composable into independent, parallel parts. So I've been doing my reading about the various options, and gotten to the point of confusion from too much - and sometimes contradictory - information.

Multiprocessing, pyprocessing, pprocess, Axon, eventlets, Twisted, Stackless, "there's no point because of the GIL", "the GIL is not a problem", "the GIL is not a problem if you do it right", greenlets, Diesel, Fibra ...

So, is there a good high level overview of the choices for concurrency in Python and/or maybe a strong recommendation?

To give you an idea of my priorities and the limitations I have to work in, let's assume that:

* Faster execution is the point
* I'd rather use an extension to standard Python than a specialised distribution (e.g. Stackless)
* I want to get my work done, rather than spend all my hours in and become an expert on a monster framework (e.g. Twisted)
* I'm starting with multicore machines, with maybe an eye towards more powerful hardware later
* The Actor model is kind of cool and a humane way to handle concurrency
* Something that plays well with normal Python libraries is essential
* The problems that I'm working on are better suited to high-level coarse concurrency (i.e. at "the top" of the program) rather than lots of small threads "at the bottom".

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

outlier posted:

* Faster execution is the point

Then why use Python? Most scientific applications that use Python use it to script already-fast (and sometimes already-parallel) tools/libraries. Speed is not Python's priority. Much like you wouldn't write a AAA game entirely in Python, you wouldn't write computationally-intensive scientific code entirely in it either.

Bust out a C compiler and a decent MPI library. And if you have a use case for Python scripting, make a Python extension module for it.

BigRedDot
Mar 6, 2008

You should really take a look at the newest ipython:

http://ipython.scipy.org/doc/stable/html/overview.html#interactive-parallel-computing

Edit: which also integrates with mpi:

http://ipython.scipy.org/doc/stable/html/parallel/parallel_mpi.html

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

BigRedDot posted:

Edit: which also integrates with mpi:

http://ipython.scipy.org/doc/stable/html/parallel/parallel_mpi.html

Merely having access to MPI doesn't give you a lot if the underlying code being run on each node isn't very fast. Granted, with NumPy, you can probably approach sensible speeds provided you're only doing common linear algebraic operations. But when speed is essential, the best way to make Python run faster is to do the minimum amount of work in Python itself.

Also, I just noticed that you're working on multicore machines (currently) instead of parallel machines, so for that you'd really want to use something like OpenMP instead. Obviously, this extends to MPI as well if each node is a multicore machine* (use MPI for inter-node parallelism and OpenMP for intra-node parallelism).

* Technically, a shared-memory machine

Avenging Dentist fucked around with this message at 23:00 on Jan 6, 2010

BigRedDot
Mar 6, 2008

I'm not really disagreeing, if numpy is not sufficient for the actual numerics then you'll need to wrap your own C or Fortran routines. I only point out ipython because it allows the scatter/gather, etc operations to be handled in python in the driver script.

m0nk3yz
Mar 13, 2002

Behold the power of cheese!

outlier posted:

Multiprocessing, pyprocessing, pprocess, Axon, eventlets, Twisted, Stackless, "there's no point because of the GIL", "the GIL is not a problem", "the GIL is not a problem if you do it right", greenlets, Diesel, Fibra ...

Use multiprocessing. It's in the standard library, and works well. Managers within it allow you to share objects between individual machines/on the network.

The GIL is a hindrance for compute-bound operations, meaning code that is neither doing I/O nor running inside a C extension that releases the GIL. If you are I/O heavy, you will see a speedup with threads, although contention over the GIL prevents it from being as fast as it could be.

  • pyprocessing: multiprocessing is pyprocessing, but with a pile of bug fixes. Use multiprocessing.
  • pprocess: Works well - pprocess and multiprocessing are a lot alike, different APIs though.
  • Axon: Kamaelia - good abstraction(s), its event loop is sort of slow, and it's GPL, which means I wouldn't go near it.
  • eventlets: Mainly geared at async network work
  • stackless: Cool, wouldn't touch it - it's essentially a fork of python.
  • greenlet: Spin off of stackless, it's OK. Don't know how fast it goes. 0.2 - and I don't know if it's actually maintained.
  • Twisted: Again, my feeling is that this is largely a networking stack.
  • Diesel: Too new, don't know much about it/speed. Async.
  • Fibra: I don't know if Simon is still working on this.
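To make the multiprocessing recommendation concrete, here is a minimal sketch of coarse "top of the program" parallelism with a worker pool. The function names and the sum-of-squares workload are placeholders rather than anything from outlier's code, and the with-statement form of Pool assumes Python 3.

```python
import multiprocessing


def analyse_chunk(chunk):
    """Stand-in for one independent, CPU-heavy unit of work (hypothetical)."""
    return sum(x * x for x in chunk)


def split_into_chunks(data, n_chunks):
    """Split a list into at most n_chunks roughly equal contiguous pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_parallel(data, workers=None):
    """Farm the chunks out to a pool of worker processes and combine results."""
    workers = workers or multiprocessing.cpu_count()
    chunks = split_into_chunks(data, workers)
    with multiprocessing.Pool(processes=workers) as pool:
        # Each chunk is pickled, sent to a worker process, and run there,
        # so the GIL of the parent process is not a bottleneck.
        partial_results = pool.map(analyse_chunk, chunks)
    return sum(partial_results)
```

Call run_parallel(...) from under an `if __name__ == "__main__":` guard in your script; multiprocessing needs that on platforms where workers start by importing the main module. Because arguments and results are pickled between processes, this pays off when each chunk is a decent amount of work.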

m0nk3yz fucked around with this message at 15:13 on Jan 7, 2010

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

BigRedDot posted:

[ipython stuff]

Good stuff - I noticed ages ago that ipython was taking on MPI etc., but hadn't realised that they'd actually done it. That'll be awesomely useful.


m0nk3yz posted:

.. words ...

Double awesome - exactly what I was looking for. There seems to be a lot of dubious or confused information out there, but this has made things a lot clearer.

nonathlon fucked around with this message at 21:19 on Jan 7, 2010

m0nk3yz
Mar 13, 2002

Behold the power of cheese!

outlier posted:

Good stuff - I noticed ages ago that ipython was taking on MPI etc., but hadn't realised that they'd actually done it. That'll be awesomely useful.


Double awesome - exactly what I was looking for. There seems to be a lot of dubious or confused information out there, but this has made things a lot clearer.

Note that I'm opinionated; I've spent a lot of time looking at this stuff. I'm also biased in that I'm the maintainer of multiprocessing.

nbv4
Aug 21, 2002

by Duchess Gummybuns

m0nk3yz posted:

and it's GPL, which means I wouldn't go near it.

I see this sentiment a lot in the python community. Why do so many people dislike the GPL?

MaberMK
Feb 1, 2008

BFFs

nbv4 posted:

I see this sentiment a lot in the python community. Why do so many people dislike the GPL?

For me, because it's championed by a certified nutjob who eats his own toenail clippings. That and it's not actually a free software license. It's a "free as long as you share it too" license. While I subscribe to the share and share-alike attitude (and I think everyone else should too), forcing developers to release their derivative works doesn't exactly scream freedom.

king_kilr
May 25, 2007

nbv4 posted:

I see this sentiment a lot in the python community. Why do so many people dislike the GPL?

I don't like it because its behavior is entirely undefined in a dynamic language like Python. That's why I don't use GPL code; I release BSD code because I believe the most important freedom is the freedom to do something I might not like.

king_kilr fucked around with this message at 05:12 on Jan 8, 2010

m0nk3yz
Mar 13, 2002

Behold the power of cheese!

nbv4 posted:

I see this sentiment a lot in the python community. Why do so many people dislike the GPL?

Because usage of a GPL library (e.g. import) "more than likely" invokes the "your app must now be GPL" clause. And:

quote:

While I subscribe to the share and share-alike attitude (and I think everyone else should too), forcing developers to release their derivative works doesn't exactly scream freedom.

If I import a GPL library, mine is now GPL, which means a consumer must be GPL, and so on. I find the GPL fundamentally more restrictive than X11/Apache/BSD/etc, so I choose to use those. I also write software, much of which is proprietary, for my job(s). Touching GPL code is philosophically, and business-wise, undesirable.

king_kilr posted:

I don't like it because its behavior is entirely undefined in a dynamic language like Python. That's why I don't use GPL code, I don't release BSD code because I believe the most important freedom is the freedom to do something I might not like.

I don't think it's really undefined - imports trigger its sharing clauses. Saying it really doesn't, and arguing language semantics, smells like it violates the spirit of the GPL. It hasn't been tested in court though. And I think you meant you release BSD software, right?

king_kilr
May 25, 2007

m0nk3yz posted:

I don't think it's really undefined - imports trigger its sharing clauses. Saying it really doesn't, and arguing language semantics, smells like it violates the spirit of the GPL. It hasn't been tested in court though. And I think you meant you release BSD software, right?

Yep, all my code is BSD. And yeah, your choices for the semantics are "undefined, but violating the spirit" and "my poo poo is now GPL too", awesome.

m0nk3yz
Mar 13, 2002

Behold the power of cheese!

king_kilr posted:

Yep, all my code is BSD. And yeah, your choices for the semantics are "undefined, but violating the spirit" and "my poo poo is now GPL too", awesome.

Then fix the "I don't release BSD code" in your last post :)

nbv4
Aug 21, 2002

by Duchess Gummybuns

m0nk3yz posted:

Because usage of a GPL library (e.g. import) "more than likely" invokes the "your app must now be GPL" clause. And:


If I import a GPL library, mine is now GPL, which means a consumer must be GPL, and so on. I find the GPL fundamentally more restrictive than X11/Apache/BSD/etc, so I choose to use those. I also write software, much of which is proprietary, for my job(s). Touching GPL code is philosophically, and business-wise, undesirable.


I don't think it's really undefined - imports trigger its sharing clauses. Saying it really doesn't, and arguing language semantics, smells like it violates the spirit of the GPL. It hasn't been tested in court though. And I think you meant you release BSD software, right?
But doesn't the GPL mainly prevent people forking the software and making that fork proprietary? I don't really see why importing a library falls into that category. I can understand if you modified the library, and then imported it, but otherwise I don't see it as a "spiritual violation". Then again I'm not really a software licensing expert sooooo...

m0nk3yz
Mar 13, 2002

Behold the power of cheese!

nbv4 posted:

But doesn't the GPL mainly prevent people forking the software and making that fork proprietary? I don't really see why importing a library falls into that category. I can understand if you modified the library, and then imported it, but otherwise I don't see it as a "spiritual violation". Then again I'm not really a software licensing expert sooooo...

Not exactly. If I write library X, and in order to offer its full functionality it depends on and imports GPL'ed library Y, then X must now be GPL'ed too, forcing me either to drop the functionality that needs Y or to change my license. See this discussion about readline: http://clisp.cvs.sourceforge.net/*checkout*/clisp/clisp/doc/Why-CLISP-is-under-GPL

For a great read on this, see: http://jacobian.org/writing/gpl-questions/

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe

nbv4 posted:

But doesn't the GPL mainly prevent people forking the software and making that fork proprietary? I don't really see why importing a library falls into that category. I can understand if you modified the library, and then imported it, but otherwise I don't see it as a "spiritual violation". Then again I'm not really a software licensing expert sooooo...

What you have described is the LGPL and is the reason most libraries use it.

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

nbv4 posted:

But doesn't the GPL mainly prevent people forking the software and making that fork proprietary? I don't really see why importing a library falls into that category. I can understand if you modified the library, and then imported it, but otherwise I don't see it as a "spiritual violation". Then again I'm not really a software licensing expert sooooo...

No. The whole point of the GPL is to force any derivative works to be and remain free software. If you don't want those semantics for the code you write, don't use the GPL. The LGPL, on the other hand, has the semantics you want (the code itself must remain free software, preventing proprietary forks, but things that link it don't have to be). If you don't care about proprietary forks, a more permissive license (BSD, MIT, X11, Apache etc) is for you.

Zombywuf
Mar 29, 2008

MaberMK posted:

forcing developers to release their derivative works doesn't exactly scream freedom.

Good job the GPL doesn't do that then.

MaberMK
Feb 1, 2008

BFFs

Zombywuf posted:

Good job the GPL doesn't do that then.

Sorry, I got in a hurry and misspoke.

Forcing developers to release the source code to any derivative work they distribute doesn't exactly scream freedom.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

I've got an object that caches writes to a structured data file. My object has a force_update method that does the expensive writing to the file (it's a file used by a third-party application that I have to terminate before writing to and then restart when I'm done writing).

Out of curiosity...is there a way to make sure the force_update method is called before the script terminates? I can of course make sure that I have obj.force_update() at the end of my script, but I'm just thinking about convenience and catching early terminations.

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

Thermopyle posted:

I've got an object that caches writes to a structured data file. My object has a force_update method that does the expensive writing to the file (it's a file used by a third-party application that I have to terminate before writing to and then restart when I'm done writing).

Out of curiosity...is there a way to make sure the force_update method is called before the script terminates? I can of course make sure that I have obj.force_update() at the end of my script, but I'm just thinking about convenience and catching early terminations.

Sounds a bit tricky. You could put a call to force_update in the destructor of your object, so it's triggered when the object is cleaned up / garbage collected at script termination. But Python terminates sessions in - what seems to me - an awfully messy way. You can't rely on anything still existing, I guess because the order of cleanup is unknown. Possibly someone can suggest a neater way that involves catching the termination signal.

EDIT: vvv You learn something every day.

nonathlon fucked around with this message at 00:22 on Jan 10, 2010

Lonely Wolf
Jan 20, 2003

Will hawk false idols for heaps and heaps of dough.
import atexit
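A minimal sketch of how that could look. CachedWriter is a made-up stand-in for Thermopyle's object, and the expensive write is reduced to a counter so the example stays self-contained.

```python
import atexit


class CachedWriter:
    """Hypothetical stand-in for the caching object described above."""

    def __init__(self):
        self.pending = []
        self.flush_count = 0

    def write(self, record):
        self.pending.append(record)  # cheap: only remembered in memory

    def force_update(self):
        if not self.pending:
            return  # nothing to do, so skip the expensive step
        # The real version would stop the other application, write the
        # file, and restart it here.
        self.flush_count += 1
        self.pending.clear()


obj = CachedWriter()
# Run force_update when the interpreter exits normally (including after an
# unhandled exception or sys.exit), but not on os._exit or a fatal signal.
atexit.register(obj.force_update)
```

Making force_update a no-op when nothing is pending means an explicit call at the end of the script and the atexit call don't both pay for the expensive write.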

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh
You could also just use RAII via the with statement, which has the added benefit of doing what you'd expect even if your object doesn't last until termination of the script. (Of course, the fact that Python requires an extra keyword to safely implement RAII is a bit embarrassing, but oh well.)
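As a sketch of the with-statement approach: the class below is a hypothetical version of the caching object with `__enter__` and `__exit__` added, and the expensive write is replaced by moving records into a list.

```python
class CachedWriter:
    """Hypothetical caching object that flushes when its with block ends."""

    def __init__(self):
        self.pending = []
        self.flushed = []

    def write(self, record):
        self.pending.append(record)

    def force_update(self):
        # Placeholder for the expensive write described in the question.
        self.flushed.extend(self.pending)
        self.pending.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.force_update()
        return False  # do not swallow exceptions raised inside the block


with CachedWriter() as writer:
    writer.write("first")
    writer.write("second")
# Leaving the block, normally or via an exception, has called force_update().
```

Unlike a destructor, `__exit__` runs at a known point, so the flush happens while everything it depends on still exists.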

king_kilr
May 25, 2007

Avenging Dentist posted:

You could also just use RAII via the with statement, which has the added benefit of doing what you'd expect even if your object doesn't last until termination of the script. (Of course, the fact that Python requires an extra keyword to safely implement RAII is a bit embarrassing, but oh well.)

Huh? C++-style RAII, just using destructors, is perfectly possible; the issue is that in systems with sensible garbage collection (read: generational GC, like PyPy or Jython, and NOT CPython) the actual destruction time is undefined.

tripwire
Nov 19, 2004

        ghost flow
Out of curiosity, does Jython suffer from the GIL contention performance problems when using multithreaded code and a multicore computer?
If you've never seen it visualized, check this out: http://www.dabeaz.com/blog/2010/01/python-gil-visualized.html
It's quite dramatic how much performance is sacrificed on the altar of the GIL.

king_kilr
May 25, 2007

tripwire posted:

Out of curiosity, does Jython suffer from the GIL contention performance problems when using multithreaded code and a multicore computer?
If you've never seen it visualized, check this out: http://www.dabeaz.com/blog/2010/01/python-gil-visualized.html
It's quite dramatic how much performance is sacrificed on the altar of the GIL.

Jython doesn't have a GIL. And it's not performance that is sacrificed by the GIL (much less an altar), at least not any more than the 50% hit in performance all threads take in any of the safe-threading implementations.

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

king_kilr posted:

Huh? C++-style RAII, just using destructors, is perfectly possible; the issue is that in systems with sensible garbage collection (read: generational GC, like PyPy or Jython, and NOT CPython) the actual destruction time is undefined.

That's a pretty huge issue!

king_kilr
May 25, 2007

Avenging Dentist posted:

That's a pretty huge issue!

Yeah, and that's how you get a fast garbage collector, hence the introduction of the with statement. There's no way to have deterministic destruction with one of these systems. Of course you could always have ref counting like CPython does, but that doesn't particularly scale due to the need for locking on every refcount update (this doesn't consider in-between systems like IBM's recycler).

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

king_kilr posted:

Yeah, and that's how you get a fast garbage collector, hence the introduction of the with statement. There's no way to have deterministic destruction with one of these systems.

RAII and non-deterministic garbage collection are not mutually exclusive, though they often get conflated because language designers decide that the way to avoid forcing the programmer to manage memory is to make everything garbage-collected. All this is the usual generic-programmer* talking point raging about value types vs. reference types, though.

* "Generic" in the Stepanov sense

Sock on a Fish
Jul 17, 2004

What if that thing I said?
I'm using binary numbers to help me build out all the possible permutations of an arbitrary set of boolean variables. Is there a way to manipulate binary numbers in Python without using string operations? I'd like to be able to ask for the value of a digit at some position in the number.

It seems both unnatural and unnecessary to treat binary numbers like strings.

Here's what I'm doing right now for an input s.

code:
	p = []
	s_len = len(s)
	poss_matrix = []
	for x in range(2**s_len):
		poss_list = [None for y in range(s_len)]
		bits = bin(x)
		bits = bits[2:]
		while len(bits) < s_len:
			bits = '0%s' % bits
		for r in range(len(poss_list)):
			if int(bits[r]):
				poss_list[r] = True
			else:
				poss_list[r] = False
		poss_matrix.append(poss_list)

tripwire
Nov 19, 2004

        ghost flow

Sock on a Fish posted:

I'm using binary numbers to help me build out all the possible permutations of an arbitrary set of boolean variables. Is there a way to manipulate binary numbers in Python without using string operations? I'd like to be able to ask for the value of a digit at some position in the number.

It seems both unnatural and unnecessary to treat binary numbers like strings.

Here's what I'm doing right now for an input s.

code:
	p = []
	s_len = len(s)
	poss_matrix = []
	for x in range(2**s_len):
		poss_list = [None for y in range(s_len)]
		bits = bin(x)
		bits = bits[2:]
		while len(bits) < s_len:
			bits = '0%s' % bits
		for r in range(len(poss_list)):
			if int(bits[r]):
				poss_list[r] = True
			else:
				poss_list[r] = False
		poss_matrix.append(poss_list)

What kind of permutations are you trying to build up? Itertools might be more suitable here.
Depending on exactly what you want, you could use combinations with the product function, or permutations which is a little more specialized.
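A short sketch of both routes, assuming what you want is every True/False assignment for n variables (n would be len(s) in the code above). The function names are made up for illustration. The first reads individual bits with shift and mask, no strings involved; the second gets the same rows straight from itertools.product.

```python
from itertools import product


def bit_at(number, position):
    """Return bit `position` of `number` (0 = least significant) as a bool."""
    return bool((number >> position) & 1)


def truth_table_bits(n):
    """All 2**n rows, most significant bit first, using only integer ops."""
    return [
        [bit_at(x, n - 1 - r) for r in range(n)]
        for x in range(2 ** n)
    ]


def truth_table_product(n):
    """The same rows, generated by itertools.product."""
    return [list(row) for row in product([False, True], repeat=n)]
```

Both return rows in the same order as the posted loop, so either should drop in as a replacement for building poss_matrix.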
