Lurchington
Jan 2, 2003

Forums Dragoon

MelonWheels posted:

I'm new to programming, so this is a general advice question. My first language, which I picked up a few months ago, was python, and I quite like it. However, in some instances it seems to run slowly, so I'm learning C now. What I wonder is: is it worth mixing the two languages with the Python/C API (or Cython) in order to get more speed, or will I just create some putrid pile of crap? Do people write C extensions just for quicker code execution, or am I missing the point?

I think it'd be a fair statement that if you're new to programming, it's unlikely you're going to be tackling the type of problems that you'd generally look to C extensions to solve. C extensions are written for speed, to get around the GIL, or in some cases because the developer is more comfortable in C.

And I'd be curious to hear what kinds of situations you're finding python too slow for. Often the speed increases come with familiarity and expertise in approaching the problem, rather than from going "welp, better write it in a fast language now!"

Dijkstracula posted:

So...as a relatively hardcore Perl dork, I have to say that I'm rather enjoying Python these days.

That said:

The first page of the megathread mentions some alternative APIs that offer threads that span more than one CPU. Are they still the choices today, or has something else/better come along? I am sadly hitting 101% CPU utilization with my current implementation

If you're talking about m0nk3yz's post about the processing module, then that's still the solution for using more than one CPU (although not by having threads span CPUs, as you describe; it uses separate processes). However, it's in the standard library now as multiprocessing.

Here's m0nk3yz's talk from PyCon 09 that I used to get familiar with multiprocessing: http://us.pycon.org/2009/conference/schedule/event/31/
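
For anyone skimming, here's a minimal sketch of the idea (the worker function and the numbers are made up): each Process is a whole separate interpreter with its own GIL, so CPU-bound work really does land on separate cores.

code:
from multiprocessing import Process, Queue

def worker(n, out):
	# stand-in CPU-bound work
	out.put(sum(i * i for i in xrange(n)))

if __name__ == '__main__':
	out = Queue()
	procs = [Process(target=worker, args=(10 ** 6, out)) for _ in range(4)]
	for p in procs:
		p.start()
	for p in procs:
		p.join()
	print [out.get() for _ in procs]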


tef
May 30, 2004

-> some l-system crap ->

Dijkstracula posted:

So...as a relatively hardcore Perl dork, I have to say that I'm rather enjoying Python these days. :)

That said:

The first page of the megathread mentions some alternative APIs that offer threads that span more than one CPU. Are they still the choices today, or has something else/better come along? I am sadly hitting 101% CPU utilization with my current implementation :(
eventlets?

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

Sorry, I misspoke - what I meant to say was to have multiple threads span multiple CPUs (which I guess boils down to more than one interpreter process?)

Thanks, I'll look at both multiprocessing and eventlets.

MelonWheels
May 24, 2004
The ending of Max Payne 2 made me cry.

Lurchington posted:

I think it'd be a fair statement that if you're new to programming, it's unlikely you're going to be tackling the type of problems that you'd generally look to C extensions to solve. C extensions are written for speed, to get around the GIL, or in some cases because the developer is more comfortable in C.

And I'd be curious to hear what kinds of situations you're finding python too slow for. Often the speed increases come with familiarity and expertise in approaching the problem, rather than from going "welp, better write it in a fast language now!"

I'm playing around with a roguelike game. I noticed that Nethack has a neat GUI version, so I want to see if I can change my interface into something like that. I'm still learning wxPython, but using GIFs with Tkinter makes it incredibly slow. Since wxWidgets is written in C++, I'm wondering if I can skip the Python layer with an extension to make things faster. I got the idea here: http://www.linuxjournal.com/article/3776 (end of the third paragraph after the bullet list)

Threep
Apr 1, 2006

It's kind of a long story.

MelonWheels posted:

I'm playing around with a roguelike game. I noticed that Nethack has a neat GUI version, so I want to see if I can change my interface into something like that. I'm still learning wxPython, but using GIFs with Tkinter makes it incredibly slow. Since wxWidgets is written in C++, I'm wondering if I can skip the Python layer with an extension to make things faster. I got the idea here: http://www.linuxjournal.com/article/3776 (end of the third paragraph after the bullet list)
Have a look at PyGame instead. Making games in a GUI toolkit is a bad idea, because GUI toolkits are rarely built for graphics performance, even for something as simple as a tile engine.
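
For a sense of scale, a bare-bones PyGame tile loop looks something like this (just a sketch; the tile image and sizes are made up):

code:
import pygame

pygame.init()
screen = pygame.display.set_mode((640, 480))
tile = pygame.image.load('floor.gif').convert()  # convert() speeds up blits
TILE = 32

running = True
while running:
	for event in pygame.event.get():
		if event.type == pygame.QUIT:
			running = False
	for y in range(480 // TILE):
		for x in range(640 // TILE):
			screen.blit(tile, (x * TILE, y * TILE))
	pygame.display.flip()
pygame.quit()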

MelonWheels
May 24, 2004
The ending of Max Payne 2 made me cry.
Yeah, I knew that cramming GIFs into Tkinter was a bad idea, but it was too appealing not to try. I prefer complicating things, which must be pretty selfish. I think PyGame looks too easy.

I was wondering why, if C is faster than Python, people don't always run their modules through Cython and use those. I don't have code that I need to speed up besides the GUI.

edit: Or I guess I should say that I wondered why good Python programmers don't always write C extensions. From Lurchington's answer it seems that it's rarely really needed, which makes sense. Either way I'm still going to learn C. It's a pretty language, and that Nethack source looks fun.

MelonWheels fucked around with this message at 22:51 on Sep 9, 2010

BigRedDot
Mar 6, 2008

MelonWheels posted:

C. It's a pretty language
Come again? I'm used to it after almost 20 years, but I'd never call C's "declaration follows use" syntax experiment anything other than beastly.

Threep
Apr 1, 2006

It's kind of a long story.

MelonWheels posted:

Yeah, I knew that cramming GIFs into Tkinter was a bad idea, but it was too appealing not to try. I prefer complicating things, which must be pretty selfish. I think PyGame looks too easy.
In that case, build your tile engine by compositing the GIFs with PIL; then you just have one big image for Tkinter to render.
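
Roughly like this (a sketch; filenames and sizes are made up, and 2010-era PIL imports as plain Image):

code:
import Image  # with newer Pillow: from PIL import Image

TILE = 32
tile = Image.open('floor.gif')
board = Image.new('RGB', (640, 480))
for y in range(480 // TILE):
	for x in range(640 // TILE):
		board.paste(tile, (x * TILE, y * TILE))

# hand Tkinter the one composited image, e.g.:
# photo = ImageTk.PhotoImage(board); canvas.create_image(0, 0, image=photo)
board.save('frame.png')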

MelonWheels
May 24, 2004
The ending of Max Payne 2 made me cry.

BigRedDot posted:

Come again? I'm used to it after almost 20 years, but I'd never call C's "declaration follows use" syntax experiment anything other than beastly.

I meant it in a strictly visual way, of course. I just started reading Kernighan and Ritchie two days ago, and I don't know what you're referring to. I googled "declaration follows use" and I still don't get it. Oh boy.

Threep posted:

In that case, build your tile engine by compositing the GIFs with PIL; then you just have one big image for Tkinter to render.

Hey I hadn't thought of that at all. I'm going to try it. Thanks. :)

Lurchington
Jan 2, 2003

Forums Dragoon

MelonWheels posted:

edit: Or I guess I should say that I wondered why good Python programmers don't always write C extensions.

I dunno, I'll take a stab.

Seems to violate some of the core principles of python: "Explicit is better than implicit" and "readability counts."

Real talk: no one writes python to get the best theoretical performance for a given problem. People write python because it often lets you do more in less time, with more maintainable code.

If everyone puts everything in C extensions, that forces the next person coming in (probably to clean up your mess) to dig a level deeper, into some serious boilerplate and obfuscation, for the most minor of performance gains.

You seem excited about programming, and that's admirable, but take to heart a completely true axiom: code has to be maintained far longer than it takes to do the initial development. Even if it's just your personal project, you're going to run into moments where you have to reconstruct what the hell you were thinking, if you're not careful about writing your code to be easily understood.

BeefofAges
Jun 5, 2004

Cry 'Havoc!', and let slip the cows of war.

This year's Google AI Contest just started, and they support Python (as well as other languages) - http://ai-contest.com/index.php

This year's game isn't purely computational, so I think C++ won't have the huge advantage it would normally have.

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang



I have a number of methods that will be called with option dicts. To verify that the methods receive the correct parameters, I've written a decorator, but it seems very convoluted. Is there a prettier way to write this?

code:
def required_opts(required_opts):
	def inner_func(func):
		def wrapper(options):
			if set(options.keys()) != required_opts:
				print 'missing options: %s' % (required_opts - set(options.keys()))
			else:
				func(options)
		return wrapper
	return inner_func

@required_opts({'test'})
def doTest(options):
	print 'options were OK: %s' % options

# .. more methods with different required_opts
Output:
code:
>>> doTest({'test': 'works'})
options were OK: {'test': 'works'}

>>> doTest({'abc': 'def'})
missing options: set(['test'])
Edit: The reason I'm not putting the options directly into each method's signature is that which method gets called is determined at runtime, and the options are parsed from a pickled dict.

Carthag Tuek fucked around with this message at 10:39 on Sep 10, 2010

Lurchington
Jan 2, 2003

Forums Dragoon
Since you're just using keys, I'm not sure you need to pass in a set versus a list of strings. Also, casting to a set seems like it'd work, but here's a different take:

pre:
def required_opts(*required_opts):
	def inner_func(func):
		def wrapper(options):
			missing_options = [k for k in options.keys()
			                   if not k in required_opts]
			if missing_options:
				print 'missing options: %s' % missing_options
			else:
				func(options)
		return wrapper
	return inner_func

@required_opts('test')
def doTest(options):
	print 'options were OK: %s' % options
if you didn't need quite the same print-out, assert not [k for k in options.keys() if not k in required_opts] isn't a bad option

Lurchington fucked around with this message at 14:01 on Sep 10, 2010

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang



Thanks for taking a look. You mean [k for k in required_opts if not k in options.keys()] though, right? :)

I was mostly wondering if there was a better way to do a decorator like this than the nested mess I have.

Lurchington
Jan 2, 2003

Forums Dragoon
what you said :o

And yeah, it's ugly, but passing arguments to a decorator means you're going to be nesting more than you'd probably like to.
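
The nesting doesn't go away, but functools.wraps at least keeps the wrapped function's name and docstring intact, which helps when debugging. A sketch along the lines of the code above:

code:
import functools

def required_opts(*required):
	def decorator(func):
		@functools.wraps(func)  # preserves func.__name__ and func.__doc__
		def wrapper(options):
			missing = [k for k in required if k not in options]
			if missing:
				print 'missing options: %s' % missing
			else:
				return func(options)
		return wrapper
	return decorator

@required_opts('test')
def doTest(options):
	print 'options were OK: %s' % options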

Jonnty
Aug 2, 2007

The enemy has become a flaming star!

Are you 100% sure you need to be doing argument checking like that in the first place? There's a reason python doesn't include anything like that by default.

Ferg
May 6, 2007

Lipstick Apathy

Jonnty posted:

Are you 100% sure you need to be doing argument checking like that in the first place? There's a reason python doesn't include anything like that by default.

I've written decorators like that before to validate RESTful API methods. They can be a helpful shortcut.

Jonnty
Aug 2, 2007

The enemy has become a flaming star!

Ferg posted:

I've written decorators like that before to validate RESTful API methods. They can be a helpful shortcut.

Oh, decorators are absolutely the thing to use - the issue is just whether you actually need to do any checking in the first place.

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang



Jonnty posted:

Oh, decorators are absolutely the thing to use - the issue is just whether you actually need to do any checking in the first place.

Well, I guess I could in theory skip the pre-validation and use something like try/except KeyError instead. It just seems cleaner to not even enter the method if the args are missing, and to avoid any potential cleanup if it fails.

defmacro
Sep 27, 2005
cacio e ping pong
I'm trying to optimize some python code and I'm looking for suggestions. I read the PythonPerformanceTips page and profiled my code with cProfile, yielding the following output:

>>> stats.strip_dirs().sort_stats('time').print_stats(20)
Sun Sep 12 01:22:46 2010 cprofile.cprof

10667781418 function calls (10260364358 primitive calls) in 15649.788 CPU seconds

Ordered by: internal time
List reduced from 638 to 20 due to restriction <20>

ncalls tottime percall cumtime percall filename:lineno(function)
435802858 1898.209 0.000 2736.195 0.000 dpkt.py:124(unpack)
729076090 1531.070 0.000 1531.070 0.000 __init__.py:1230(getEffectiveLevel)
202923848 1246.041 0.000 1246.041 0.000 {method 'read' of 'file' objects}
425020751 1086.238 0.000 2767.425 0.000 pcap.py:82(add)
27147179 855.587 0.000 982.274 0.000 http.py:143(unpack)
2842864932 714.493 0.000 714.493 0.000 {setattr}
1 696.640 696.640 15648.728 15648.728 pcap.py:411(parse)
102588057/101419862 643.599 0.000 2778.338 0.000 ip.py:52(unpack)
447198263/206803310 599.504 0.000 5478.157 0.000 dpkt.py:59(__init__)
729076090 591.775 0.000 2122.845 0.000 __init__.py:1244(isEnabledFor)
101419851 494.426 0.000 3202.260 0.000 pcap.py:243(ipparse)
101461932 462.968 0.000 1519.800 0.000 pcap.py:132(__iter__)
649254330 453.720 0.000 2346.365 0.000 __init__.py:1034(debug)
159266909 402.603 0.000 974.618 0.000 pcap.py:91(incby)
101419862 334.745 0.000 3289.783 0.000 ethernet.py:42(_unpack_data)
70711316 311.558 0.000 3882.430 0.000 pcap.py:374(tcpparse)
202665495 278.194 0.000 923.807 0.000 pcap.py:229(dnstoip)
101419879 277.263 0.000 4065.621 0.000 ethernet.py:60(unpack)
205545887 258.319 0.000 258.319 0.000 {_socket.inet_ntoa}
463500302 240.309 0.000 240.309 0.000 {_struct.unpack}


I seem to be incurring a LOT of overhead in logging (the __init__.py entries, which are Python's logging module) and would like advice on the best approach to minimizing it. I've seen suggestions to try Psyco, but I'm on x64, which it unfortunately doesn't support; would PyPy offer similar potential speedups? This post mentions python's logging has pretty nasty overhead and I should just use syslog instead; anyone else have similar findings?

tripwire
Nov 19, 2004

        ghost flow
If waiting on logging I/O is hurting your otherwise CPU-bound application, couldn't you throw all of the I/O-related work onto a separate thread so the logging happens asynchronously? I know the GIL destroys multithreaded performance in many circumstances, but I think this may be one case where it wouldn't hurt much at all.

I'd be very surprised if Psyco helped, because it just does just-in-time specialization; that's not likely to help if your app isn't mostly CPU-bound.
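
Something along these lines, for instance (a sketch; the None sentinel for shutdown is just one common convention):

code:
import sys
import threading
import Queue

log_q = Queue.Queue()

def log_writer():
	# drain messages in the background so the hot path never blocks on I/O
	while True:
		msg = log_q.get()
		if msg is None:  # sentinel: shut down
			break
		sys.stderr.write(msg + '\n')

t = threading.Thread(target=log_writer)
t.daemon = True
t.start()

log_q.put('something happened')  # cheap from the hot path
log_q.put(None)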

Stabby McDamage
Dec 11, 2005

Doctor Rope

tripwire posted:

If waiting on logging I/O is hurting your otherwise CPU-bound application, couldn't you throw all of the I/O-related work onto a separate thread so the logging happens asynchronously? I know the GIL destroys multithreaded performance in many circumstances, but I think this may be one case where it wouldn't hurt much at all.

I'd be very surprised if Psyco helped, because it just does just-in-time specialization; that's not likely to help if your app isn't mostly CPU-bound.

I'd confirm that it's actually I/O-bound before doing that -- run it with "time" or just watch "top". If it's I/O-bound, then "time" will show a large system time, as in:

IO bound:
code:
$ time dd if=/dev/zero of=crap bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 4.11754 s, 127 MB/s

real    0m4.182s
user    0m0.010s
sys     0m4.090s  << This number is almost as high as the real (wall) time
CPU bound:
code:
$ time perl -e 'for(1..10000000){}'

real    0m1.621s
user    0m1.610s  << This number is the bulk of the real (wall) time
sys     0m0.010s
Similarly, "top" will show an IO-bound task as using mostly "IO Wait" or "System" time instead of "user", e.g.:

IO bound:
code:
Cpu1  :  0.0%us, 47.1%sy,  0.0%ni, 49.0%id,  0.0%wa,  2.0%hi,  2.0%si,  0.0%st
CPU bound:
code:
Cpu0  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

tef
May 30, 2004

-> some l-system crap ->

defmacro posted:

I'm trying to optimize some python code and I'm looking for suggestions. I read the PythonPerformanceTips page and profiled my code with cProfile, yielding the following output:
This post mentions python's logging has pretty nasty overhead and I should just use syslog instead; anyone else have similar findings?

We did profiling at work and found that 'logging' was adding a noticeable startup time. We replaced it with a call to stderr.write and never looked back. We didn't need anything more from logging than that.

tripwire
Nov 19, 2004

        ghost flow
If the logger is in fact eating up lots of CPU cycles, the syslog module might reduce that, but it pins you to Unix, so that's not a great option for everyone.

It seems the logging module incurs a bunch of overhead every time it reaches a log message, regardless of whether it even has to print it. As far as I can see, there's no built-in module for high-performance logging in python, but you can toggle it off for a performance build in a gross, hackish way like this guy: http://dound.com/2010/02/python-logging-performance/
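
The least hackish form of the toggle is a single logging.disable() call at startup, which makes disabled calls bail out before any handler or formatter work happens (MYAPP_DEBUG is a made-up flag):

code:
import logging
import os

if not os.environ.get('MYAPP_DEBUG'):
	# every logging call at CRITICAL and below now returns almost immediately
	logging.disable(logging.CRITICAL)

logging.debug('this now short-circuits before any formatting happens')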

Jo
Jan 24, 2005

:allears:
Soiled Meat
Why, when loading images with PIL or scipy.misc.imread, do I get arrays of less than width*height? I suspect it has to do with the image compression, yes? How do I decompress the image into an array of the right length?

EDIT: D'oh! img.resize -> img = img.resize
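
For anyone else who hits this, the gotcha is that PIL transforms return new images rather than modifying in place (filename made up):

code:
import Image  # 2010-era PIL

img = Image.open('sprite.gif')
img = img.resize((32, 32))  # resize() returns a new Image; img itself is untouched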

Jo fucked around with this message at 07:25 on Sep 13, 2010

Aredna
Mar 17, 2007
Nap Ghost
I haven't used Python since I took a class when 2.1 was the current hotness, but I'm looking to pick it up again for a toy project, and I figure I'll use 3.1 since that's where things are headed.

It's going to have a basic GUI with just a few images and buttons (open, save, custom functions).

What GUI framework/toolkit should I look at using? I see a lot of tutorials in the OP, but nothing specifically referencing GUIs. Any recommendations for a quick start guide to get me going?

xPanda
Feb 6, 2003

Was that me or the door?

Aredna posted:

I haven't used Python since I took a class when 2.1 was the current hotness, but I'm looking to pick it up again for a toy project, and I figure I'll use 3.1 since that's where things are headed.

It's going to have a basic GUI with just a few images and buttons (open, save, custom functions).

What GUI framework/toolkit should I look at using? I see a lot of tutorials in the OP, but nothing specifically referencing GUIs. Any recommendations for a quick start guide to get me going?

Though I've never done any GUI programming in Python, my understanding is that Tkinter is part of the standard distribution and so is always available, but that PyQt is the bee's knees. So: Tkinter if things are going to be simple, PyQt if you're making something nice.

defmacro
Sep 27, 2005
cacio e ping pong

Stabby McDamage posted:

CPU vs. IO bound

It's CPU-bound; ~78% of the time is spent in user. I'm extracting features from (lots of) pcap files to perform clustering on afterwards. This seems like a pretty reasonable candidate for threadpool-style worker concurrency, but I keep hearing poo poo about the GIL in python. How should I go about doing this? Would I want to fork off worker processes (so they each have a distinct GIL)?

MaberMK
Feb 1, 2008

BFFs

defmacro posted:

It's CPU-bound; ~78% of the time is spent in user. I'm extracting features from (lots of) pcap files to perform clustering on afterwards. This seems like a pretty reasonable candidate for threadpool-style worker concurrency, but I keep hearing poo poo about the GIL in python. How should I go about doing this? Would I want to fork off worker processes (so they each have a distinct GIL)?

Check out Stackless.

m0nk3yz
Mar 13, 2002

Behold the power of cheese!

defmacro posted:

It's CPU-bound; ~78% of the time is spent in user. I'm extracting features from (lots of) pcap files to perform clustering on afterwards. This seems like a pretty reasonable candidate for threadpool-style worker concurrency, but I keep hearing poo poo about the GIL in python. How should I go about doing this? Would I want to fork off worker processes (so they each have a distinct GIL)?

Check out multiprocessing. It's built in, might work for you.
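
The shape of it would be something like this (a sketch; extract_features and the filenames are stand-ins for your own code):

code:
from multiprocessing import Pool

def extract_features(path):
	# parse one pcap file, return its feature vector
	return (path, [])

if __name__ == '__main__':
	pool = Pool()  # defaults to one worker process per core
	results = pool.map(extract_features, ['a.pcap', 'b.pcap'])
	pool.close()
	pool.join()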

MaberMK
Feb 1, 2008

BFFs

m0nk3yz posted:

Check out multiprocessing. It's built in, might work for you.

This is a better suggestion than mine, try this first.

Stabby McDamage
Dec 11, 2005

Doctor Rope

defmacro posted:

It's CPU-bound; ~78% of the time is spent in user. I'm extracting features from (lots of) pcap files to perform clustering on afterwards. This seems like a pretty reasonable candidate for threadpool-style worker concurrency, but I keep hearing poo poo about the GIL in python. How should I go about doing this? Would I want to fork off worker processes (so they each have a distinct GIL)?

That's interesting... where is the other 22% spent? System, wait, or idle? I would expect a CPU-bound process to peg the CPU at 100% very easily.

Out of curiosity, is there any interdependence between the data, or can separate processes work on it independently? If the latter, you could probably make this a multi-node computation if you need to. I wonder... is there a Python package to support that? Something for roll-your-own, lightweight, Folding@Home-style parallelism without all the complexity?
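
For what it's worth, the closest thing I know of in the standard library is multiprocessing's remote managers, which let workers on other machines pull jobs from a shared queue (a sketch; the address and authkey are made up):

code:
import Queue
from multiprocessing.managers import BaseManager

job_q = Queue.Queue()

class JobManager(BaseManager):
	pass

JobManager.register('get_jobs', callable=lambda: job_q)

if __name__ == '__main__':
	mgr = JobManager(address=('', 50000), authkey='secret')
	server = mgr.get_server()
	# workers elsewhere: register('get_jobs'), connect(), then get_jobs().get()
	server.serve_forever()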

Benji the Blade
Jun 22, 2004
Plate of shrimp.

MaberMK posted:

Check out Stackless.

How does Stackless help with a (mostly) CPU-bound problem when Stackless has the same GIL that cpython has?

A non-blocking IO library or multithreading might be able to help utilize the CPU better, but Stackless isn't even really required for that.

MaberMK
Feb 1, 2008

BFFs

Benji the Blade posted:

How does Stackless help with a (mostly) CPU-bound problem when Stackless has the same GIL that cpython has?

It doesn't, I had just blindly assumed that tasklets would be good for parallelizing the problem when in fact they'd all be bound up in a single CPU thread anyway. So never mind.

defmacro
Sep 27, 2005
cacio e ping pong

tef posted:

We did profiling at work and found that 'logging' was adding a noticeable startup time. We replaced it with a call to stderr.write and never looked back. We didn't need anything more from logging than that.

I miss the multiple levels, but the speed makes up for it.

m0nk3yz posted:

Check out multiprocessing. It's built in, might work for you.

Exactly what I was looking for. Thanks!

defmacro fucked around with this message at 06:34 on Sep 15, 2010

Sylink
Apr 17, 2004

PyGame vs. Pyglet: is there a preference? I'd like to try my hand at a simple 2D game. All the OpenGL in Pyglet seems faster but more complicated.

Lurchington
Jan 2, 2003

Forums Dragoon
Alright, my office is moving towards Multi-Mechanize for functional/load testing, away from Grinder, and I was wondering if anyone has done something similar; if so, what were your thoughts or overall impressions of both/either?

I'm more familiar with Grinder so far. It's Jython-based, but it kind of has its own API and way of doing things separate from Jython proper (equivalent to Python 2.5.2) or Java. As a result it's been slightly annoying to get up to speed on, especially since I don't have a lot of Java experience.

Multi-Mechanize certainly suits my python sensibilities better, but I did have some trouble getting matplotlib installed on my Mac: PyPI points to a much older package, and it's difficult to compile from source since the Mac version of freetype2 isn't right. I ended up installing it via MacPorts (py26-matplotlib) and it worked fine with the built-in test project.

BannedNewbie
Apr 22, 2003

HOW ARE YOU? -> YOSHI?
FINE, THANK YOU. -> YOSHI.
I have a script that interfaces with a DLL using ctypes, and one of the functions I need to call from the DLL takes a FILE* as one of its arguments. How in the world can I pass a FILE* into the DLL?

I've been struggling with this for a couple of days and have tried various things, most recently PyFile_AsFile, all with no luck.

king_kilr
May 25, 2007
Something like what's done here, perhaps: http://svn.python.org/projects/ctypes/trunk/ctypeslib/ctypeslib/contrib/pythonhdr.py ? The C standard says to treat FILE* as opaque, so you can do the same, I guess.
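
One approach that sidesteps PyFile_AsFile entirely: let the C runtime create the FILE* itself via fopen, and pass it through ctypes as an opaque void pointer. A sketch, with made-up library and function names (and a Linux-specific libc name):

code:
import ctypes

libc = ctypes.CDLL('libc.so.6')  # msvcrt on Windows, libc.dylib on OS X
libc.fopen.restype = ctypes.c_void_p  # FILE* is opaque, so a void* will do
libc.fclose.argtypes = [ctypes.c_void_p]

fp = libc.fopen('data.bin', 'rb')

mylib = ctypes.CDLL('./mylib.so')
mylib.takes_file.argtypes = [ctypes.c_void_p]
mylib.takes_file(fp)

libc.fclose(fp)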


Lurchington
Jan 2, 2003

Forums Dragoon
Alright, I'd like to take a minute to say screw urllib2 in its stupid face for overwriting my accurately filled-in header and inserting whatever the gently caress it wants.

I had something I was trying to upload with the data argument of urlopen, and my Content-Length kept coming across as "-1".

However, httplib, with its connection/request model, allowed me to POST what I wanted, with an accurate header.

Perhaps this is due to Pylons, but still, I needed somewhere to bitch. Thanks.
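
For reference, the httplib version that worked looked roughly like this (a sketch; the host, path, and payload are made up):

code:
import httplib

body = open('payload.bin', 'rb').read()
conn = httplib.HTTPConnection('example.com')
conn.request('POST', '/upload', body, {
	'Content-Type': 'application/octet-stream',
	'Content-Length': str(len(body)),
})
resp = conn.getresponse()
print resp.status, resp.reason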
