Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

To be fair, other scripting languages have pretty lovely threads too fork while fork , yeahhhhhhhhhhhhhhhhhhhh


UberJumper
May 20, 2007
woop

quote:

<lots of responses about python ide>

Well, I decided to just try all the other ones I could find :eng101: that met my requirements. I don't really care for vi/Emacs bindings and such; I'm basically trying to find an IDE that does debugging and autocompletion, and that I can actually afford.


[WingIDE Professional (Trial)]
Pros:
- Worked pretty much flawlessly, everything felt extremely professional and well done
Cons:
- Slow/Choppy for no apparent reason
- Debugger dies on me (well, it comes back if I wait long enough, ~3 minutes or so): the program pauses, then the whole debug panel basically becomes unresponsive. I'm guessing it's trying to inspect all the elements in the current frame and choking on the number of objects.
- For certain modules in my site-packages it just refuses to generate IntelliSense.

[Netbeans]
Pros:
- It's nice that it basically follows the PEP 8 style guide when generating IntelliSense suggestions.
- Looks pretty and has a pretty clean interface (I'm digging it).
Cons:
- Slow, crashes, does random things, refuses to hit certain breakpoints for no loving reason.
- gently caress it.

[PyCharm]
Pros:
- Free right now (until beta is over)
- Looks good
- Intellisense worked perfectly
- pretty light memory footprint
- Lots of little things that are great ideas
Cons:
- It generated 5 GB of cache files :suicide:
- Some of its random context-sensitive IntelliSense nags the living gently caress out of me; even after turning it off, it still nags me.
- Other little gripes.
- Debugger hangs randomly.

But right now I think I'm basically going to stick with PyCharm; I've had the best experience with it so far.

hink
Feb 19, 2006
nvm

hink fucked around with this message at 01:27 on May 12, 2010

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

hink posted:

The 2tuples way seems to be about 10000% slower.

That would make sense for a data set 100x as large. http://codepad.org/MHuJZsSL

tef
May 30, 2004

-> some l-system crap ->
I don't think the code does what you think it does.

code:
# map() over two sequences pairs elements up, like zip()
list2 = [x for x in xrange(5)]
list3 = [y for y in xrange(5)]

output = map(lambda x, y: x * y, list2, list3)

print output  # [0, 1, 4, 9, 16] -- five pairwise products

# a nested comprehension is the Cartesian product: 25 pairs, not 5
list1 = [(x, y) for x in xrange(5) for y in xrange(5)]

print map(lambda (x, y): x * y, list1)  # 25 products
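If a pairwise zip is what was intended, that's a one-liner, and it gives 5 pairs instead of 25:

code:
# pairwise pairing, equivalent to the two-sequence map() above
list1 = zip(list2, list3)               # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
print map(lambda (x, y): x * y, list1)  # [0, 1, 4, 9, 16]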

hink
Feb 19, 2006
hmm I thought it was a zip, nevermind! I had two people look at that and neither of them picked up on that.

hink fucked around with this message at 01:30 on May 12, 2010

Stabby McDamage
Dec 11, 2005

Doctor Rope

Wow, I just lost a huge amount of respect for Python. Nothing you can do will ever allow parallelism with threads...that's nuts. Worse, all that work in that slide deck is just focused on making the overhead of threads closer to single-core performance! Nothing about actually taking advantage of multicore to achieve any kind of speedup!

tef
May 30, 2004

-> some l-system crap ->

Stabby McDamage posted:

Wow, I just lost a huge amount of respect for Python. Nothing you can do will ever allow parallelism with threads...that's nuts. Worse, all that work in that slide deck is just focused on making the overhead of threads closer to single-core performance! Nothing about actually taking advantage of multicore to achieve any kind of speedup!

long live actors.

(and multiprocessing)

m0nk3yz
Mar 13, 2002

Behold the power of cheese!

Stabby McDamage posted:

Wow, I just lost a huge amount of respect for Python. Nothing you can do will ever allow parallelism with threads...that's nuts. Worse, all that work in that slide deck is just focused on making the overhead of threads closer to single-core performance! Nothing about actually taking advantage of multicore to achieve any kind of speedup!

Bullshit. Threads in Python still work fine for most (not all) I/O-bound workloads, despite David's tests. I use the crap out of them for parallelism all the time. Yes, they're "fundamentally" broken due to GIL contention, but most I/O-bound apps using threads will see performance increases. I might be the maintainer of multiprocessing, but I still use threads in most of my code.

If you notice a problem in your heavily threaded app, go async/coroutine (eventlet, gevent, etc.), use multiprocessing (go me, "whoo") or Parallel Python, or do a bare fork(), which is what multiprocessing uses.
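To make the I/O-bound case concrete, here's a minimal sketch (the URL is just a placeholder): each thread spends nearly all of its time blocked in the socket, where CPython drops the GIL, so the fetches overlap instead of running one after another.

code:
import threading
import urllib2

def fetch(url):
    # the GIL is released while this blocks on network I/O
    urllib2.urlopen(url).read()

urls = ['http://example.com/'] * 4
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()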

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

m0nk3yz posted:

Bullshit. Threads in Python still work fine for most (not all) I/O-bound workloads, despite David's tests. I use the crap out of them for parallelism all the time. Yes, they're "fundamentally" broken due to GIL contention, but most I/O-bound apps using threads will see performance increases.

It's worth noting that this isn't "real" parallelism, though. Because there's a global restriction on at-most-one-python-thread-executing-at-a-time, the only speedups you can possibly gain are the result of effectively changing to nonblocking I/O. For people who genuinely aren't I/O bound, there is literally no mechanism by which python's threads could give a speedup. This has nothing to do with overhead in taking the GIL, it's just a simple consequence of the fact that there is a lock at all; you could reduce the GIL overhead to zero, and there would still be zero speedup from threads on CPU-bound computation.

In light of this, it's really silly that python "threads" use kernel threads at all; most other languages that use this sort of pseudo-parallelism do so by simply doing nonblocking I/O behind the scenes, and implementing the "threads" in a single kernel thread, avoiding any locking overhead altogether. The GIL is one of those things that really just shouldn't exist if you're purporting to give users access to kernel threads.
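You can see this with a toy benchmark (numbers vary by machine, the shape doesn't): the same CPU-bound work split across two threads is no faster than doing it serially, because only one thread can hold the GIL at a time.

code:
import threading
import time

def spin(n):
    # pure CPU work: never blocks, so the GIL is never released for long
    while n > 0:
        n -= 1

N = 10000000

t0 = time.time()
spin(N)
spin(N)
print 'serial:   %.2fs' % (time.time() - t0)

t0 = time.time()
threads = [threading.Thread(target=spin, args=(N,)) for _ in xrange(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print 'threaded: %.2fs' % (time.time() - t0)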

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Not that it's ever been a problem for me, but out of curiosity...are there plans to address this "issue" in future versions of Python?

edit: I wanted to add that I have no problem with I/O-constrained threading. Eventlet works great!

Thermopyle fucked around with this message at 03:57 on May 12, 2010

Scaevolus
Apr 16, 2007

Thermopyle posted:

Not that it's ever been a problem for me, but out of curiosity...are there plans to address this "issue" in future versions of Python?
Unladen Swallow had plans to remove the GIL, but they're not so optimistic anymore.

http://code.google.com/p/unladen-swallow/wiki/ProjectPlan#Global_Interpreter_Lock

Python is a pretty bad choice for performance, so parallelizing CPU-bound tasks in it seems silly to me.

Stabby McDamage
Dec 11, 2005

Doctor Rope

m0nk3yz posted:

Bullshit. Threads in Python still work fine for most (not all) I/O-bound workloads, despite David's tests. I use the crap out of them for parallelism all the time. Yes, they're "fundamentally" broken due to GIL contention, but most I/O-bound apps using threads will see performance increases. I might be the maintainer of multiprocessing, but I still use threads in most of my code.

If you notice a problem in your heavily threaded app, go async/coroutine (eventlet, gevent, etc.), use multiprocessing (go me, "whoo") or Parallel Python, or do a bare fork(), which is what multiprocessing uses.

I like that multiprocessing basically tries to hide the fact that it's not threads, but I have to wonder at the overhead of using heavyweight processes in this way. Without having proper CPython threads to compare against, we'll never know.

The weird thing is that Jython and IronPython don't have this limitation.
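The API mirroring really is close to drop-in, though. A rough sketch (the worker function and pool size are made up for illustration): the same map-shaped work, spread over four real processes, each with its own interpreter and its own GIL.

code:
from multiprocessing import Pool

def square(x):
    # workers must be module-level functions so they can be pickled
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)
    print pool.map(square, range(10))  # [0, 1, 4, 9, ..., 81]
    pool.close()
    pool.join()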

king_kilr
May 25, 2007

Scaevolus posted:

Unladen Swallow had plans to remove the GIL, but they're not so optimistic anymore.

http://code.google.com/p/unladen-swallow/wiki/ProjectPlan#Global_Interpreter_Lock

Python is a pretty bad choice for performance, so parallelizing CPU-bound tasks in it seems silly to me.

No they don't. It's been decided that it's out of scope for the work they're doing (and the current priority is merging into py3k anyway).


The GIL is a hard problem, really hard. It's pretty much categorically impossible to get a) true shared-memory concurrency and b) acceptable single-threaded performance while c) not breaking every single C extension.

Seriously, it's hard.

unixbeard
Dec 29, 2004

Scaevolus posted:

Python is a pretty bad choice for performance, so parallelizing CPU-bound tasks in it seems silly to me.

I don't like this argument. Yes you probably won't reach for python when speed is your biggest concern, but that doesn't mean it's ok to leave parts of the implementation fundamentally broken. Although it's not a high performance language, you still want to be able to make it go as fast as possible.

king_kilr
May 25, 2007

unixbeard posted:

I don't like this argument. Yes you probably won't reach for python when speed is your biggest concern, but that doesn't mean it's ok to leave parts of the implementation fundamentally broken. Although it's not a high performance language, you still want to be able to make it go as fast as possible.

It's a completely inane argument for several reasons:

a) The fact that the Python VM is slow isn't proof of anything: the data structures are extremely well optimized, and using native-code libraries for various tasks can result in excellent performance.

b) With 2-core machines, parallelizing for speed makes no sense, but as we move to 8 cores, 16 cores, and beyond, it will make more sense. C may be 10x faster than Python, but it doesn't do much better for anything besides the most algorithmic code.

c) Building with multiprocessing is nice in that you're already assuming shared-nothing; if you need to distribute across multiple machines, it's not a lot of extra code, compared to doing shared-memory concurrency.

Scaevolus
Apr 16, 2007

king_kilr posted:

No they don't. It's been decided that it's out of scope for the work they're doing (and the current priority is merging into py3k anyway).

The GIL is a hard problem, really hard. It's pretty much categorically impossible to get a) true shared-memory concurrency and b) acceptable single-threaded performance while c) not breaking every single C extension.

Seriously, it's hard.

Unladen Swallow posted:

Accordingly, we are no longer as optimistic about our chances of removing the GIL completely. We now favor a more incremental approach improving the shortcomings

I appreciate that it's a difficult problem. If it weren't, all the clever people working on it would have solved it by now.

unixbeard posted:

I don't like this argument. Yes you probably won't reach for python when speed is your biggest concern, but that doesn't mean it's ok to leave parts of the implementation fundamentally broken. Although it's not a high performance language, you still want to be able to make it go as fast as possible.
I agree; the GIL is Python's biggest mistake. It's acceptable only because performance has never been a major goal of the language's development.

The main benefit I see in removing the GIL is the ability to run multiple threads of pure C code simultaneously.

king_kilr posted:

a) The fact that the Python VM is slow isn't proof of anything: the data structures are extremely well optimized, and using native-code libraries for various tasks can result in excellent performance.
"Python is fast, if you don't actually use it for anything that needs to be fast."

I agree. Python is good for writing glue code or frontends. Doing actual performance-intensive work in the language itself is a bad idea.

quote:

b) With 2-core machines, parallelizing for speed makes no sense, but as we move to 8 cores, 16 cores, and beyond, it will make more sense. C may be 10x faster than Python, but it doesn't do much better for anything besides the most algorithmic code.
C is consistently around 50x faster than pure Python for almost any CPU-bound task.

You may dislike them, but the Computer Language Shootout benchmarks display the gap quite well. Python comes closest in the benchmarks where (1) it uses the same bignum library as C to perform the calculations, (2) it uses a function written in C, or (3) it uses a regex library written in C.

tef
May 30, 2004

-> some l-system crap ->

m0nk3yz posted:

Bullshit. Threads in Python still work fine


Just as long as you don't use signals, like hitting ^C.

Stabby McDamage
Dec 11, 2005

Doctor Rope

king_kilr posted:

The GIL is a hard problem, really hard. It's pretty much categorically impossible to get a) true shared-memory concurrency and b) acceptable single-threaded performance while c) not breaking every single C extension.

Seriously, it's hard.

It may be technically hard to go from where CPython is now to a GIL-free model, but it's not theoretically hard. Most higher-level languages (Java, Perl, Ruby) aren't burdened with anything like a GIL. Further, Jython and IronPython show that there's nothing intrinsic to the Python language that precludes threaded parallelism.

The problem here is purely a technical one: removing the GIL without breaking everything else. If you sat down to write CPython today, you wouldn't introduce a GIL to begin with, as it's obvious that multicore is the way forward. That wasn't clear a decade ago, so the current design made sense.

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe
On that point why didn't they break it with Python 3? I doubt it would have hurt its adoption any more than it's currently suffering.

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh
I was under the impression that 3.1 or 3.2 fixed 90% of the GIL issues. Not that I care because why would you use Python for something that needs threads anyway?

tef
May 30, 2004

-> some l-system crap ->

MEAT TREAT posted:

On that point why didn't they break it with Python 3? I doubt it would have hurt its adoption any more than it's currently suffering.

The GIL is a feature of CPython, not Python. The threading model is a feature of Python.

Removing the GIL essentially means ripping the skeleton out of the interpreter and replacing it wholesale. Refcounting also needs to be taken out and shot. And no one really wants to rewrite all of the Python C extensions, or force new extensions to use fine-grained locking (joy).

If you were sitting down to write it today, you might be more likely to use an actor model (shared-nothing concurrency) over threads, but I think that's the fashion today :3:
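A toy version of that actor shape, using multiprocessing's queues (the message protocol here is made up for illustration): the worker owns its state outright and talks to the outside world only through messages.

code:
from multiprocessing import Process, Queue

def counter(inbox, outbox):
    # all state lives inside this process; nothing is shared
    total = 0
    for msg in iter(inbox.get, 'stop'):
        total += msg
        outbox.put(total)

if __name__ == '__main__':
    inbox, outbox = Queue(), Queue()
    p = Process(target=counter, args=(inbox, outbox))
    p.start()
    for n in [1, 2, 3]:
        inbox.put(n)
        print outbox.get()  # 1, then 3, then 6
    inbox.put('stop')
    p.join()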

tef
May 30, 2004

-> some l-system crap ->

Avenging Dentist posted:

I was under the impression that 3.1 or 3.2 fixed 90% of the GIL issues. Not that I care because why would you use Python for something that needs threads anyway?

Now code that uses threads is unlikely to take that much longer than single threaded code :v:

Stabby McDamage
Dec 11, 2005

Doctor Rope
A while ago, someone asked for possible improvements to the standard library. I think I have one.
code:
# order preserving removal of duplicates, with a possible function to determine equivalence classes
def unique(seq, idfun=None): 
    if idfun is None: idfun = lambda x: x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result
I was surprised there wasn't a built-in list function for this, so I looked around and found this. The simplest solution, of course, is to use a set. However, that has two problems: (1) it doesn't preserve ordering, and (2) it doesn't allow a user-specified function to determine equivalence. The above function solves both issues.

Is this general and useful enough to make it into the standard library?

Stabby McDamage fucked around with this message at 17:43 on May 12, 2010

king_kilr
May 25, 2007

Stabby McDamage posted:

It may be technically hard to go from where CPython is now to a GIL-free model, but it's not theoretically hard. Most higher-level languages (Java, Perl, Ruby) aren't burdened with anything like a GIL. Further, Jython and IronPython show that there's nothing intrinsic to the Python language that precludes threaded parallelism.

The problem here is purely a technical one: removing the GIL without breaking everything else. If you sat down to write CPython today, you wouldn't introduce a GIL to begin with, as it's obvious that multicore is the way forward. That wasn't clear a decade ago, so the current design made sense.

a) Ruby (MRI) has a GIL, as does Perl AFAIK

b) Jython and IronPython both are slower than CPython for single threaded code.

c) Jython doesn't support C extensions, IronPython sort of does

king_kilr
May 25, 2007

Stabby McDamage posted:

A while ago, someone asked for possible improvements to the standard library. I think I have one.
code:
# order preserving removal of duplicates, with a possible function to determine equivalence classes
def unique(seq, idfun=None): 
    if idfun is None: idfun = lambda x: x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result
I was surprised there wasn't a built-in list function for this, so I looked around and found this. The simplest solution, of course, is to use a set. However, that has two problems: (1) it doesn't preserve ordering, and (2) it doesn't allow a user-specified function to determine equivalence. The above function solves both issues.

Is this general and useful enough to make it into the standard library?

Probably not. Also, there's still no reason to use a dict in your version; just use a set.

Jonnty
Aug 2, 2007

The enemy has become a flaming star!

Stabby McDamage posted:

A while ago, someone asked for possible improvements to the standard library. I think I have one.
code:
# order preserving removal of duplicates, with a possible function to determine equivalence classes
def unique(seq, idfun=None): 
    if idfun is None: idfun = lambda x: x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result
I was surprised there wasn't a built-in list function for this, so I looked around and found this. The simplest solution, of course, is to use a set. However, that has two problems: (1) it doesn't preserve ordering, and (2) it doesn't allow a user-specified function to determine equivalence. The above function solves both issues.

Is this general and useful enough to make it into the standard library?

That's a bit redundant, isn't it?

code:
# order preserving removal of duplicates, with a possible function to determine equivalence classes
def unique(seq, idfun=None): 
    if idfun is None: idfun = lambda x: x
    for item in seq:
        marker = idfun(item)
        if marker in seen: continue
        seen.append(marker)
    return seen
e: also what situation do you envisage where something like sorted(set(seq)) won't do the trick?

Jonnty fucked around with this message at 20:42 on May 12, 2010

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

Jonnty posted:

e: also what situation do you envisage where something like sorted(set(seq)) won't do the trick?

Probably the one where you care about performance?

Jonnty
Aug 2, 2007

The enemy has become a flaming star!

Avenging Dentist posted:

Probably the one where you care about performance?

python isn't about performance, or so you always say

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

Jonnty posted:

python isn't about performance, or so you always say

Yeah, worrying about asymptotic complexity of algorithms is exactly the same thing as caring about one tiny subset of parallelism.

Stabby McDamage
Dec 11, 2005

Doctor Rope

king_kilr posted:

a) Ruby (MRI) has a GIL, as does Perl AFAIK

Nope, at least on Perl.

$ perl perl-thread-test.pl
Did 10000000 work.
Single thread did 10000000 work in 2.1 seconds (4775506 work/s)

Did 5000000 work.
Did 5000000 work.
Two threads did 10000000 work in 1.2 seconds (8682403 work/s)


king_kilr posted:

b) Jython and IronPython both are slower than CPython for single threaded code.
This isn't an argument that a GIL-free Python must be slower, merely that the Pythons built on managed runtimes (the JVM, the CLR) happen to be slower.

king_kilr posted:

c) Jython doesn't support C extensions, IronPython sort of does

That is true, and I can see the GIL issue running headlong into C extension support. Still, you can imagine a solution that wraps unapproved C extensions in a lock while allowing the bulk of the interpreter to run with full parallelism. My point is that there's no intrinsic reason for CPython to be completely serial.

ChiralCondensate
Nov 13, 2007

what is that man doing to his colour palette?
Grimey Drawer

Jonnty posted:

e: also what situation do you envisage where something like sorted(set(seq)) won't do the trick?
Sorting is not "preserving order".

Jonnty
Aug 2, 2007

The enemy has become a flaming star!

ChiralCondensate posted:

Sorting is not "preserving order".

But I'm trying to envisage a situation where you have duplicates in a list where the order would be important and not natural. The point is, if there's barely any use for it, it doesn't belong in the standard library.

Stabby McDamage
Dec 11, 2005

Doctor Rope

Jonnty posted:

That's a bit redundant, isn't it?

code:
# order preserving removal of duplicates, with a possible function to determine equivalence classes
def unique(seq, idfun=None): 
    if idfun is None: idfun = lambda x: x
    for item in seq:
        marker = idfun(item)
        if marker in seen: continue
        seen.append(marker)
    return seen
e: also what situation do you envisage where something like sorted(set(seq)) won't do the trick?

That doesn't do what the function I gave does. First, it evaluates "marker in seen" N times, and I assume 'seen' is a list (it's never initialized), so each check is a linear search of up to size N. That makes it an O(N^2) algorithm for duplicate elimination... that's not good.

Second, it doesn't do the same thing. idfun() is a transformation that gives the equivalence class; the default is the identity, but if the user provides one, as I had to, you lose the data itself!

For example, I had a large number of objects that I wanted to unify based on their signature() method, so I did:

my_unique_objects = unique(my_objects, lambda x: x.signature())

With your method, I'd just have a list of signatures instead of a list of objects.

Stabby McDamage
Dec 11, 2005

Doctor Rope

Jonnty posted:

But I'm trying to envisage a situation where you have duplicates in a list where the order would be important and not natural. The point is, if there's barely any use for it, it doesn't belong in the standard library.

You've spent roughly 30 seconds considering the possibilities, and I don't think you fully understand the proposal.

The natural order thing isn't the main goal -- it's just a useful side-effect.

The main goal is the ability to specify the equivalence function, just like you can specify the key function in sort operations.

EDIT:

king_kilr posted:

Probably not. Also there's still no reason to use a dict in your version, just use a set.

I'm looking at the set docs, and I don't see how you implement the given algorithm while keeping the idfun() functionality.

Stabby McDamage fucked around with this message at 21:14 on May 12, 2010

Jonnty
Aug 2, 2007

The enemy has become a flaming star!

Psssh, fair enough.

king_kilr
May 25, 2007

Stabby McDamage posted:

I'm looking at the set docs, and I don't see how you implement the given algorithm while keeping the idfun() functionality.

Seriously?

code:
def uniquify(seq, func=lambda x: x):
    seen = set()
    res = []
    for obj in seq:
        key = func(obj)
        if key in seen:
            continue
        res.append(obj)
        seen.add(key)
    return res
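Same call shape as the original, e.g. with the hypothetical signature() method from earlier:

code:
# keys go into the seen set; the objects themselves come back, in order
my_unique_objects = uniquify(my_objects, lambda x: x.signature())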

Stabby McDamage
Dec 11, 2005

Doctor Rope

king_kilr posted:

Seriously?

code:
def uniquify(seq, func=lambda x: x):
    seen = set()
    res = []
    for obj in seq:
        key = func(obj)
        if key in seen:
            continue
        res.append(obj)
        seen.add(key)
    return res

Oh, I misunderstood. I thought you meant to replace both seen and res with just one set. Yeah, that's much better.

huge sesh
Jun 9, 2008

Is there any way to get a sort of virtual file descriptor that doesn't actually correspond to a file? I want to capture stdout and stderr of a subprocess.check_call(), but without actually writing to a file and then reading it back.


b0lt
Apr 29, 2005

huge sesh posted:

Is there any way to get a sort of virtual file descriptor that doesn't actually correspond to a file? I want to capture stdout and stderr of a subprocess.check_call(), but without actually writing to a file and then reading it back.

StringIO
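(Worth noting: a StringIO only captures writes made inside your own process; a child process needs a real file descriptor. subprocess will hand you an in-memory pipe instead, so nothing touches disk. A minimal sketch, with a placeholder command:)

code:
import subprocess

# PIPE gives the child real fds; communicate() reads them back as strings
p = subprocess.Popen(['ls', '-l'],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
if p.returncode != 0:
    # mimic check_call's error behavior by hand
    raise subprocess.CalledProcessError(p.returncode, 'ls -l')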
