Lurchington
Jan 2, 2003

Forums Dragoon

MelonWheels posted:

I'm new to programming, so this is a general advice question. My first language, which I picked up a few months ago, was python, and I quite like it. However, in some instances it seems to run slowly, so I'm learning C now. What I wonder is: is it worth mixing the two languages with the Python/C API (or Cython) in order to get more speed, or will I just create some putrid pile of crap? Do people write C extensions just for quicker code execution, or am I missing the point?

I think it'd be a fair statement that if you're new to programming, it's unlikely you're going to be tackling the type of problems that you'd generally look to C extensions to solve. C extensions are written for speed, to get around the GIL, or in some cases because the developer is more comfortable in C.

And I'd be curious to hear what kinds of situations you're finding python too slow for. Often the speed increases come with familiarity and expertise in approaching the problem, rather than from going "welp, better write it in a fast language now!"

Dijkstracula posted:

So...as a relatively hardcore Perl dork, I have to say that I'm rather enjoying Python these days.

That said:

The first page of the megathread mentions some alternative APIs that offer threads that span more than one CPU. Are they still the choices today, or has something else/better come along? I am sadly hitting 101% CPU utilization with my current implementation

If you're talking about m0nk3yz's post about the processing module, then that's still the solution for using more than one CPU (although not by having threads span CPUs, as you describe; it uses separate processes). However, it's in the standard library now as multiprocessing.

Here's m0nk3yz's talk from PyCon 09 that I used to get familiar with multiprocessing: http://us.pycon.org/2009/conference/schedule/event/31/
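
For anyone skimming, here's a minimal sketch of the idea (the worker function and the numbers are made up): each Process is a whole separate interpreter with its own GIL, so CPU-bound work really does land on separate cores.

code:
from multiprocessing import Process, Queue

def worker(n, out):
	# stand-in CPU-bound work
	out.put(sum(i * i for i in xrange(n)))

if __name__ == '__main__':
	out = Queue()
	procs = [Process(target=worker, args=(10 ** 6, out)) for _ in range(4)]
	for p in procs:
		p.start()
	for p in procs:
		p.join()
	print [out.get() for _ in procs]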


tef
May 30, 2004

-> some l-system crap ->

Dijkstracula posted:

So...as a relatively hardcore Perl dork, I have to say that I'm rather enjoying Python these days. :)

That said:

The first page of the megathread mentions some alternative APIs that offer threads that span more than one CPU. Are they still the choices today, or has something else/better come along? I am sadly hitting 101% CPU utilization with my current implementation :(
eventlets?

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

Sorry, I misspoke - what I meant to say was to have multiple threads span multiple CPUs (which I guess boils down to more than one interpreter process?)

Thanks, I'll look at both multiprocessing and eventlets.

MelonWheels
May 24, 2004
The ending of Max Payne 2 made me cry.

Lurchington posted:

I think it'd be a fair statement that if you're new to programming, it's unlikely you're going to be tackling the type of problems that you'd generally look to C extensions to solve. C extensions are written for speed, to get around the GIL, or in some cases because the developer is more comfortable in C.

And I'd be curious to hear what kinds of situations you're finding python too slow for. Often the speed increases come with familiarity and expertise in approaching the problem, rather than from going "welp, better write it in a fast language now!"

I'm playing around with a roguelike game. I noticed that Nethack has a neat GUI version, so I want to see if I can change my interface into something like that. I'm still learning wxPython, but using GIFs with Tkinter makes it incredibly slow. Since wxWidgets is written in C++, I'm wondering if I can skip the Python layer with an extension to make things faster. I got the idea here: http://www.linuxjournal.com/article/3776 (end of the third paragraph after the bullet list)

Threep
Apr 1, 2006

It's kind of a long story.

MelonWheels posted:

I'm playing around with a roguelike game. I noticed that Nethack has a neat GUI version, so I want to see if I can change my interface into something like that. I'm still learning wxPython, but using GIFs with Tkinter makes it incredibly slow. Since wxWidgets is written in C++, I'm wondering if I can skip the Python layer with an extension to make things faster. I got the idea here: http://www.linuxjournal.com/article/3776 (end of the third paragraph after the bullet list)
Have a look at PyGame instead. Making games in a GUI toolkit is a bad idea, because GUI toolkits are rarely built for graphics performance, even for something as simple as a tile engine.
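
For a sense of scale, a bare-bones PyGame tile loop looks something like this (just a sketch; the tile image and sizes are made up):

code:
import pygame

pygame.init()
screen = pygame.display.set_mode((640, 480))
tile = pygame.image.load('floor.gif').convert()  # convert() speeds up blits
TILE = 32

running = True
while running:
	for event in pygame.event.get():
		if event.type == pygame.QUIT:
			running = False
	for y in range(480 // TILE):
		for x in range(640 // TILE):
			screen.blit(tile, (x * TILE, y * TILE))
	pygame.display.flip()
pygame.quit()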

MelonWheels
May 24, 2004
The ending of Max Payne 2 made me cry.
Yeah, I knew that cramming GIFs into Tkinter was a bad idea, but it was too appealing not to try. I prefer complicating things, which must be pretty selfish. I think PyGame looks too easy.

I was wondering why, if C is faster than Python, people don't always run their modules through Cython and use those. I don't have code that I need to speed up besides the GUI.

edit: Or I guess I should say that I wondered why good Python programmers don't always write C extensions. From Lurchington's answer it seems that it's rarely really needed, which makes sense. Either way I'm still going to learn C. It's a pretty language, and that Nethack source looks fun.

MelonWheels fucked around with this message at 22:51 on Sep 9, 2010

BigRedDot
Mar 6, 2008

MelonWheels posted:

C. It's a pretty language
Come again? I'm used to it after almost 20 years, but I'd never call C's "declaration follows use" syntax experiment anything other than beastly.

Threep
Apr 1, 2006

It's kind of a long story.

MelonWheels posted:

Yeah, I knew that cramming GIFs into Tkinter was a bad idea, but it was too appealing not to try. I prefer complicating things, which must be pretty selfish. I think PyGame looks too easy.
In that case, build your tile engine by compositing the GIFs with PIL; then you just have one big image for Tkinter to render.
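
Roughly like this (a sketch; filenames and sizes are made up, and 2010-era PIL imports as plain Image):

code:
import Image  # with newer Pillow: from PIL import Image

TILE = 32
tile = Image.open('floor.gif')
board = Image.new('RGB', (640, 480))
for y in range(480 // TILE):
	for x in range(640 // TILE):
		board.paste(tile, (x * TILE, y * TILE))

# hand Tkinter the one composited image, e.g.:
# photo = ImageTk.PhotoImage(board); canvas.create_image(0, 0, image=photo)
board.save('frame.png')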

MelonWheels
May 24, 2004
The ending of Max Payne 2 made me cry.

BigRedDot posted:

Come again? I'm used to it after almost 20 years, but I'd never call C's "declaration follows use" syntax experiment anything other than beastly.

I meant it in a strictly visual way, of course. I just started reading Kernighan and Ritchie two days ago, and I don't know what you're referring to. I googled "declaration follows use" and I still don't get it. Oh boy.

Threep posted:

In that case, build your tile engine by compositing the GIFs with PIL; then you just have one big image for Tkinter to render.

Hey I hadn't thought of that at all. I'm going to try it. Thanks. :)

Lurchington
Jan 2, 2003

Forums Dragoon

MelonWheels posted:

edit: Or I guess I should say that I wondered why good Python programmers don't always write C extensions.

I dunno, I'll take a stab.

Seems to violate some of the core principles of python: "Explicit is better than implicit" and "readability counts."

Real talk: no one writes python to get the best theoretical performance for a given problem. People write python because it often lets you do more in less time, with more maintainable code.

If everyone puts everything in C extensions, that forces the next person coming in (probably to clean up your mess) to dig a level deeper, into some serious boilerplate and obfuscation, for the most minor of performance gains.

You seem excited about programming, and that's admirable, but take to heart a completely true axiom: code has to be maintained far longer than it takes to do the initial development. Even if it's just your personal project, you're going to run into moments where you have to reconstruct what the hell you were thinking, if you're not careful about writing your code to be easily understood.

BeefofAges
Jun 5, 2004

Cry 'Havoc!', and let slip the cows of war.

This year's Google AI Contest just started, and they support Python (as well as other languages) - http://ai-contest.com/index.php

This year's game isn't purely computational, so I think C++ won't have the huge advantage it would normally have.

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang



I have a number of methods that will be called with option dicts. To verify that the methods receive the correct parameters, I've written a decorator, but it seems very convoluted. Is there a prettier way to write this?

code:
def required_opts(required_opts):
	def inner_func(func):
		def wrapper(options):
			if set(options.keys()) != required_opts:
				print 'missing options: %s' % (required_opts - set(options.keys()))
			else:
				func(options)
		return wrapper
	return inner_func

@required_opts({'test'})
def doTest(options):
	print 'options were OK: %s' % options

# .. more methods with different required_opts
Output:
code:
>>> doTest({'test': 'works'})
options were OK: {'test': 'works'}

>>> doTest({'abc': 'def'})
missing options: set(['test'])
Edit: The reason I'm not putting the options directly into each method's signature is that which method gets called is determined at runtime, and the options are parsed from a pickled dict.

Carthag Tuek fucked around with this message at 10:39 on Sep 10, 2010

Lurchington
Jan 2, 2003

Forums Dragoon
Since you're just using keys, I'm not sure you need to pass in a set versus a list of strings. Also, casting to a set seems like it'd work, but here's a different take:

pre:
def required_opts(*required_opts):
	def inner_func(func):
		def wrapper(options):
			missing_options = [k for k in options.keys()
			                   if not k in required_opts]
			if missing_options:
				print 'missing options: %s' % missing_options
			else:
				func(options)
		return wrapper
	return inner_func

@required_opts('test')
def doTest(options):
	print 'options were OK: %s' % options
if you didn't need quite the same print-out, assert not [k for k in options.keys() if not k in required_opts] isn't a bad option

Lurchington fucked around with this message at 14:01 on Sep 10, 2010

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang



Thanks for taking a look. You mean [k for k in required_opts if not k in options.keys()] though, right? :)

I was mostly wondering if there was a better way to do a decorator like this than the nested mess I have.

Lurchington
Jan 2, 2003

Forums Dragoon
what you said :o

And yeah, it's ugly, but passing arguments to a decorator means you're going to be nesting more than you'd probably like to.
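
The nesting doesn't go away, but functools.wraps at least keeps the wrapped function's name and docstring intact, which helps when debugging. A sketch along the lines of the code above:

code:
import functools

def required_opts(*required):
	def decorator(func):
		@functools.wraps(func)  # preserves func.__name__ and func.__doc__
		def wrapper(options):
			missing = [k for k in required if k not in options]
			if missing:
				print 'missing options: %s' % missing
			else:
				return func(options)
		return wrapper
	return decorator

@required_opts('test')
def doTest(options):
	print 'options were OK: %s' % options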

Jonnty
Aug 2, 2007

The enemy has become a flaming star!

Are you 100% sure you need to be doing argument checking like that in the first place? There's a reason python doesn't include anything like that by default.

Ferg
May 6, 2007

Lipstick Apathy

Jonnty posted:

Are you 100% sure you need to be doing argument checking like that in the first place? There's a reason python doesn't include anything like that by default.

I've written decorators like that before to validate RESTful API methods. They can be a helpful shortcut.

Jonnty
Aug 2, 2007

The enemy has become a flaming star!

Ferg posted:

I've written decorators like that before to validate RESTful API methods. They can be a helpful shortcut.

Oh, decorators are absolutely the thing to use - the issue is just whether you actually need to do any checking in the first place.

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang



Jonnty posted:

Oh, decorators are absolutely the thing to use - the issue is just whether you actually need to do any checking in the first place.

Well, I guess I could in theory skip the pre-validation and use something like try/except KeyError instead. It just seems cleaner to not even enter the method if the args are missing, and to avoid any potential cleanup if it fails.

defmacro
Sep 27, 2005
cacio e ping pong
I'm trying to optimize some python code and I'm looking for suggestions. I read the PythonPerformanceTips page and profiled my code with cProfile, yielding the following output:

>>> stats.strip_dirs().sort_stats('time').print_stats(20)
Sun Sep 12 01:22:46 2010 cprofile.cprof

10667781418 function calls (10260364358 primitive calls) in 15649.788 CPU seconds

Ordered by: internal time
List reduced from 638 to 20 due to restriction <20>

ncalls tottime percall cumtime percall filename:lineno(function)
435802858 1898.209 0.000 2736.195 0.000 dpkt.py:124(unpack)
729076090 1531.070 0.000 1531.070 0.000 __init__.py:1230(getEffectiveLevel)
202923848 1246.041 0.000 1246.041 0.000 {method 'read' of 'file' objects}
425020751 1086.238 0.000 2767.425 0.000 pcap.py:82(add)
27147179 855.587 0.000 982.274 0.000 http.py:143(unpack)
2842864932 714.493 0.000 714.493 0.000 {setattr}
1 696.640 696.640 15648.728 15648.728 pcap.py:411(parse)
102588057/101419862 643.599 0.000 2778.338 0.000 ip.py:52(unpack)
447198263/206803310 599.504 0.000 5478.157 0.000 dpkt.py:59(__init__)
729076090 591.775 0.000 2122.845 0.000 __init__.py:1244(isEnabledFor)
101419851 494.426 0.000 3202.260 0.000 pcap.py:243(ipparse)
101461932 462.968 0.000 1519.800 0.000 pcap.py:132(__iter__)
649254330 453.720 0.000 2346.365 0.000 __init__.py:1034(debug)
159266909 402.603 0.000 974.618 0.000 pcap.py:91(incby)
101419862 334.745 0.000 3289.783 0.000 ethernet.py:42(_unpack_data)
70711316 311.558 0.000 3882.430 0.000 pcap.py:374(tcpparse)
202665495 278.194 0.000 923.807 0.000 pcap.py:229(dnstoip)
101419879 277.263 0.000 4065.621 0.000 ethernet.py:60(unpack)
205545887 258.319 0.000 258.319 0.000 {_socket.inet_ntoa}
463500302 240.309 0.000 240.309 0.000 {_struct.unpack}


I seem to be incurring a LOT of overhead in logging (the __init__.py entries, which are Python's logging module) and would like advice on the best approach to minimizing it. I've seen suggestions to try Psyco, but I'm on x64, which it unfortunately doesn't support; would PyPy offer similar potential speedups? This post mentions python's logging has pretty nasty overhead and I should just use syslog instead; anyone else have similar findings?

tripwire
Nov 19, 2004

        ghost flow
If waiting on logging I/O is hurting your otherwise CPU-bound application, couldn't you throw all of the I/O-related work onto a separate thread so the logging happens asynchronously? I know the GIL destroys multithreaded performance in many circumstances, but I think this may be one case where it wouldn't hurt much at all.

I'd be very surprised if Psyco helped, because it just does just-in-time specialization; that's not likely to help if your app isn't mostly CPU-bound.
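
Something along these lines, for instance (a sketch; the None sentinel for shutdown is just one common convention):

code:
import sys
import threading
import Queue

log_q = Queue.Queue()

def log_writer():
	# drain messages in the background so the hot path never blocks on I/O
	while True:
		msg = log_q.get()
		if msg is None:  # sentinel: shut down
			break
		sys.stderr.write(msg + '\n')

t = threading.Thread(target=log_writer)
t.daemon = True
t.start()

log_q.put('something happened')  # cheap from the hot path
log_q.put(None)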

Stabby McDamage
Dec 11, 2005

Doctor Rope

tripwire posted:

If waiting on logging I/O is hurting your otherwise CPU-bound application, couldn't you throw all of the I/O-related work onto a separate thread so the logging happens asynchronously? I know the GIL destroys multithreaded performance in many circumstances, but I think this may be one case where it wouldn't hurt much at all.

I'd be very surprised if Psyco helped, because it just does just-in-time specialization; that's not likely to help if your app isn't mostly CPU-bound.

I'd confirm that it's actually I/O-bound before doing that -- run it with "time" or just watch "top". If it's I/O-bound, then "time" will show a large system time, as in:

IO bound:
code:
$ time dd if=/dev/zero of=crap bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 4.11754 s, 127 MB/s

real    0m4.182s
user    0m0.010s
sys     0m4.090s  << This number is almost as high as the real (wall) time
CPU bound:
code:
$ time perl -e 'for(1..10000000){}'

real    0m1.621s
user    0m1.610s  << This number is the bulk of the real (wall) time
sys     0m0.010s
Similarly, "top" will show an IO-bound task as using mostly "IO Wait" or "System" time instead of "user", e.g.:

IO bound:
code:
Cpu1  :  0.0%us, 47.1%sy,  0.0%ni, 49.0%id,  0.0%wa,  2.0%hi,  2.0%si,  0.0%st
CPU bound:
code:
Cpu0  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

tef
May 30, 2004

-> some l-system crap ->

defmacro posted:

I'm trying to optimize some python code and I'm looking for suggestions. I read the PythonPerformanceTips page and profiled my code with cProfile, yielding the following output:
This post mentions python's logging has pretty nasty overhead and I should just use syslog instead; anyone else have similar findings?

We did profiling at work and found that 'logging' was adding a noticeable startup time. We replaced it with a call to stderr.write and never looked back. We didn't need anything more from logging than that.

tripwire
Nov 19, 2004

        ghost flow
If the logger is in fact eating up lots of CPU cycles, the syslog module might reduce that, but it pins you to Unix, so that's not a great option for everyone.

It seems the logging module incurs a bunch of overhead every time it reaches a log message, regardless of whether it even has to print it. As far as I can see, there's no built-in module for high-performance logging in python, but you can toggle it off for a performance build in a gross, hackish way like this guy: http://dound.com/2010/02/python-logging-performance/
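
The least hackish form of the toggle is a single logging.disable() call at startup, which makes disabled calls bail out before any handler or formatter work happens (MYAPP_DEBUG is a made-up flag):

code:
import logging
import os

if not os.environ.get('MYAPP_DEBUG'):
	# every logging call at CRITICAL and below now returns almost immediately
	logging.disable(logging.CRITICAL)

logging.debug('this now short-circuits before any formatting happens')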

Jo
Jan 24, 2005

:allears:
Soiled Meat
Why, when loading images with PIL or scipy.misc.imread, do I get arrays of less than width*height? I suspect it has to do with the image compression, yes? How do I decompress the image into an array of the right length?

EDIT: D'oh! img.resize -> img = img.resize
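
For anyone else who hits this, the gotcha is that PIL transforms return new images rather than modifying in place (filename made up):

code:
import Image  # 2010-era PIL

img = Image.open('sprite.gif')
img = img.resize((32, 32))  # resize() returns a new Image; img itself is untouched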

Jo fucked around with this message at 07:25 on Sep 13, 2010

Aredna
Mar 17, 2007
Nap Ghost
I haven't used Python since I took a class when 2.1 was the current hotness, but I'm looking to pick it up again for a toy project, and I figure I'll use 3.1 since that's where things are headed.

It's going to have a basic GUI with just a few images and buttons (open, save, custom functions).

What GUI framework/toolkit should I look at using? I see a lot of tutorials in the OP, but nothing specifically referencing GUIs. Any recommendations for a quick start guide to get me going?

xPanda
Feb 6, 2003

Was that me or the door?

Aredna posted:

I haven't used Python since I took a class when 2.1 was the current hotness, but I'm looking to pick it up again for a toy project, and I figure I'll use 3.1 since that's where things are headed.

It's going to have a basic GUI with just a few images and buttons (open, save, custom functions).

What GUI framework/toolkit should I look at using? I see a lot of tutorials in the OP, but nothing specifically referencing GUIs. Any recommendations for a quick start guide to get me going?

Though I've never done any GUI programming in Python, my understanding is that Tkinter is part of the standard distribution and so is always available, but that PyQt is the bee's knees. So: Tkinter if things are going to be simple, PyQt if you're making something nice.

defmacro
Sep 27, 2005
cacio e ping pong

Stabby McDamage posted:

CPU vs. IO bound

It's CPU-bound; ~78% of the time is spent in user. I'm extracting features from (lots of) pcap files to perform clustering on afterwards. This seems like a pretty reasonable candidate for threadpool-style worker concurrency, but I keep hearing poo poo about the GIL in python. How should I go about doing this? Would I want to fork off worker processes (so they each have a distinct GIL)?

MaberMK
Feb 1, 2008

BFFs

defmacro posted:

It's CPU-bound; ~78% of the time is spent in user. I'm extracting features from (lots of) pcap files to perform clustering on afterwards. This seems like a pretty reasonable candidate for threadpool-style worker concurrency, but I keep hearing poo poo about the GIL in python. How should I go about doing this? Would I want to fork off worker processes (so they each have a distinct GIL)?

Check out Stackless.

m0nk3yz
Mar 13, 2002

Behold the power of cheese!

defmacro posted:

It's CPU-bound; ~78% of the time is spent in user. I'm extracting features from (lots of) pcap files to perform clustering on afterwards. This seems like a pretty reasonable candidate for threadpool-style worker concurrency, but I keep hearing poo poo about the GIL in python. How should I go about doing this? Would I want to fork off worker processes (so they each have a distinct GIL)?

Check out multiprocessing. It's built in, might work for you.
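
The shape of it would be something like this (a sketch; extract_features and the filenames are stand-ins for your own code):

code:
from multiprocessing import Pool

def extract_features(path):
	# parse one pcap file, return its feature vector
	return (path, [])

if __name__ == '__main__':
	pool = Pool()  # defaults to one worker process per core
	results = pool.map(extract_features, ['a.pcap', 'b.pcap'])
	pool.close()
	pool.join()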

MaberMK
Feb 1, 2008

BFFs

m0nk3yz posted:

Check out multiprocessing. It's built in, might work for you.

This is a better suggestion than mine, try this first.

Stabby McDamage
Dec 11, 2005

Doctor Rope

defmacro posted:

It's CPU-bound; ~78% of the time is spent in user. I'm extracting features from (lots of) pcap files to perform clustering on afterwards. This seems like a pretty reasonable candidate for threadpool-style worker concurrency, but I keep hearing poo poo about the GIL in python. How should I go about doing this? Would I want to fork off worker processes (so they each have a distinct GIL)?

That's interesting... where is the other 22% spent? System, wait, or idle? I would expect a CPU-bound process to peg the CPU at 100% very easily.

Out of curiosity, is there any interdependence between the data, or can separate processes work on it independently? If the latter, you could probably make this a multi-node computation if you need to. I wonder... is there a Python package to support that? Something for roll-your-own, lightweight, Folding@Home-style parallelism without all the complexity?
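
For what it's worth, the closest thing I know of in the standard library is multiprocessing's remote managers, which let workers on other machines pull jobs from a shared queue (a sketch; the address and authkey are made up):

code:
import Queue
from multiprocessing.managers import BaseManager

job_q = Queue.Queue()

class JobManager(BaseManager):
	pass

JobManager.register('get_jobs', callable=lambda: job_q)

if __name__ == '__main__':
	mgr = JobManager(address=('', 50000), authkey='secret')
	server = mgr.get_server()
	# workers elsewhere: register('get_jobs'), connect(), then get_jobs().get()
	server.serve_forever()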

Benji the Blade
Jun 22, 2004
Plate of shrimp.

MaberMK posted:

Check out Stackless.

How does Stackless help with a (mostly) CPU-bound problem when Stackless has the same GIL that cpython has?

A non-blocking IO library or multithreading might be able to help utilize the CPU better, but Stackless isn't even really required for that.

MaberMK
Feb 1, 2008

BFFs

Benji the Blade posted:

How does Stackless help with a (mostly) CPU-bound problem when Stackless has the same GIL that cpython has?

It doesn't, I had just blindly assumed that tasklets would be good for parallelizing the problem when in fact they'd all be bound up in a single CPU thread anyway. So never mind.

defmacro
Sep 27, 2005
cacio e ping pong

tef posted:

We did profiling at work and found that 'logging' was adding a noticeable startup time. We replaced it with a call to stderr.write and never looked back. We didn't need anything more from logging than that.

I miss the multiple levels, but the speed makes up for it.

m0nk3yz posted:

Check out multiprocessing. It's built in, might work for you.

Exactly what I was looking for. Thanks!

defmacro fucked around with this message at 06:34 on Sep 15, 2010

Sylink
Apr 17, 2004

PyGame vs. Pyglet: is there a preference? I'd like to try my hand at a simple 2D game. All the OpenGL in Pyglet seems faster but more complicated.

Lurchington
Jan 2, 2003

Forums Dragoon
Alright, my office is moving towards Multi-Mechanize for functional/load testing, away from Grinder, and I was wondering if anyone has done something similar; if so, what were your thoughts or overall impressions of both/either?

I'm more familiar with Grinder so far. It's Jython-based, but it kind of has its own API and way of doing things separate from Jython proper (equivalent to Python 2.5.2) or Java. As a result it's been slightly annoying to get up to speed on, especially since I don't have a lot of Java experience.

Multi-Mechanize certainly suits my python sensibilities better, but I did have some trouble getting matplotlib installed on my Mac: PyPI points to a much older package, and it's difficult to compile from source since the Mac version of freetype2 isn't right. I ended up installing it via MacPorts (py26-matplotlib) and it worked fine with the built-in test project.

BannedNewbie
Apr 22, 2003

HOW ARE YOU? -> YOSHI?
FINE, THANK YOU. -> YOSHI.
I have a script that interfaces with a DLL using ctypes, and one of the functions I need to call from the DLL takes a FILE* as one of its arguments. How in the world can I pass a FILE* into the DLL?

I've been struggling with this for a couple of days and have tried various things, most recently PyFile_AsFile, all with no luck.

king_kilr
May 25, 2007
Something like what's done here, perhaps: http://svn.python.org/projects/ctypes/trunk/ctypeslib/ctypeslib/contrib/pythonhdr.py ? The C standard says to treat FILE* as opaque, so you can do the same, I guess.
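
One approach that sidesteps PyFile_AsFile entirely: let the C runtime create the FILE* itself via fopen, and pass it through ctypes as an opaque void pointer. A sketch, with made-up library and function names (and a Linux-specific libc name):

code:
import ctypes

libc = ctypes.CDLL('libc.so.6')  # msvcrt on Windows, libc.dylib on OS X
libc.fopen.restype = ctypes.c_void_p  # FILE* is opaque, so a void* will do
libc.fclose.argtypes = [ctypes.c_void_p]

fp = libc.fopen('data.bin', 'rb')

mylib = ctypes.CDLL('./mylib.so')
mylib.takes_file.argtypes = [ctypes.c_void_p]
mylib.takes_file(fp)

libc.fclose(fp)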


Lurchington
Jan 2, 2003

Forums Dragoon
Alright, I'd like to take a minute to say screw urllib2 in its stupid face for overwriting my accurately filled-in header and inserting whatever the gently caress it wants.

I had something I was trying to upload with the data argument of urlopen, and my Content-Length kept coming across as "-1".

However, httplib, with its connection/request model, allowed me to POST what I wanted, with an accurate header.

Perhaps this is due to Pylons, but still, I needed somewhere to bitch. Thanks.
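
For reference, the httplib version that worked looked roughly like this (a sketch; the host, path, and payload are made up):

code:
import httplib

body = open('payload.bin', 'rb').read()
conn = httplib.HTTPConnection('example.com')
conn.request('POST', '/upload', body, {
	'Content-Type': 'application/octet-stream',
	'Content-Length': str(len(body)),
})
resp = conn.getresponse()
print resp.status, resp.reason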
