Python information and short questions megathread.

Stoatbringer: Sep 15, 2004; naw, you love it you little ho-bot

I've been using Python for about six months now, and I really like it.

Apart from the indentation, which I consider to be dangerous nonsense which will only end in tears. I don't mind using it, but the old-school part of my brain is always screaming "One slip of the auto-formatter, or accidentally deleting a tab will break everything and nobody will ever know why! Oh woe, woe unto the poor sod who has to maintain this code in five years time!" :bahgawd:

# ? Jun 18, 2011 00:46

brosmike: Jun 26, 2009

Stoatbringer posted:

Apart from the indentation, which I consider to be dangerous nonsense which will only end in tears. I don't mind using it, but the old-school part of my brain is always screaming "One slip of the auto-formatter, or accidentally deleting a tab will break everything and nobody will ever know why! Oh woe, woe unto the poor sod who has to maintain this code in five years time!"

How exactly is that any different from the risk that you (or your auto-formatter) could accidentally delete a }?

# ? Jun 18, 2011 01:14

dis astranagant: Dec 14, 2006

brosmike posted:

How exactly is that any different from the risk that you (or your auto-formatter) could accidentally delete a }?

If nothing else, most any text editor or ide worth a drat can tell you how your parens/braces/brackets match up. And many languages that use them will throw an error if you have a stray one.

# ? Jun 18, 2011 01:42

brosmike: Jun 26, 2009

dis astranagant posted:

If nothing else, most any text editor or ide worth a drat can tell you how your parens/braces/brackets match up.

I don't see how this is better than using indentation, whereupon any human eye worth a drat can tell how your code blocks match up.

dis astranagant posted:

And many languages that use them will throw an error if you have a stray one.

I think this is pretty much a neutral trade-off; it's true that a deleted tab in a python script is more likely to result in an error not caught til runtime than a deleted brace in a C program, but giving whitespace semantic meaning also allows you to eliminate errors from things like missing semicolons that mark statement endings. (Those would often be caught as syntax errors, but then, so would most instances of deleting a tab in a python script)
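A minimal sketch of that trade-off, assuming Python 3 (the thread itself predates it) and a made-up toy loop. Dedenting one line of a longer block still leaves a valid program, so nothing complains until the numbers come out wrong; dedenting the only line of a block is rejected before anything runs.

```python
# Hypothetical example: the same loop with and without one level of indentation.
intended = """
total = 0
for n in [1, 2, 3]:
    doubled = n * 2
    total += doubled
"""

# Same code after "accidentally deleting a tab" on the last line.
slipped = """
total = 0
for n in [1, 2, 3]:
    doubled = n * 2
total += doubled
"""

# Here the loop body is a single line, so dedenting it leaves the loop empty.
broken = """
for n in [1, 2, 3]:
print(n)
"""


def run(source):
    """Execute source in a fresh namespace and return its 'total'."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"]


def compiles(source):
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(run(intended))     # the loop adds every doubled value
print(run(slipped))      # only the last doubled value is added
print(compiles(broken))  # rejected at compile time
```

The first two programs both run without complaint and simply disagree on the total, which is the "not caught til runtime" case; the third is the "caught as a syntax error" case.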

# ? Jun 18, 2011 01:54

Detetsu: Jan 14, 2006; Your loyal assistant Dr. Meowgon is all over this one.

Is there an easy answer for getting SSL running on a Win 7 python installation?

# ? Jun 18, 2011 02:49

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. -Bertrand Russell

Stoatbringer posted:

I've been using Python for about six months now, and I really like it.

Apart from the indentation, which I consider to be dangerous nonsense which will only end in tears. I don't mind using it, but the old-school part of my brain is always screaming "One slip of the auto-formatter, or accidentally deleting a tab will break everything and nobody will ever know why! Oh woe, woe unto the poor sod who has to maintain this code in five years time!"

The biggest thing that should make you feel better about this is that it's not a problem.

Billions of lines of code in real applications are a testament to that.

# ? Jun 18, 2011 04:29

Stabby McDamage: Dec 11, 2005; Doctor Rope

brosmike posted:

I don't see how this is better than using indentation, whereupon any human eye worth a drat can tell how your code blocks match up.

The human eye can't spot ALL indentation problems, e.g.

Yeah, it's a dumb one -- spaces and tabs at the same time, and you shouldn't do that, blah blah blah, but it still is an invisible mistake.

That said, it's one that almost never occurs in practice. I actually had to play with the example a bit just to make it be an error.
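For reference, a sketch of the kind of invisible mistake meant here, assuming Python 3 (which is stricter about this than the Python 2 used elsewhere in the thread) and a made-up two-line block: one line is indented with a tab, the next with eight spaces. In an editor that draws tabs eight columns wide the two lines look identical, but Python 3 refuses to guess and raises TabError.

```python
# Hypothetical source: line 2 is indented with one tab, line 3 with eight spaces.
source = "if True:\n\tx = 1\n        y = 2\n"


def check(src):
    """Return the name of the exception raised when compiling src, or 'ok'."""
    try:
        compile(src, "<mixed>", "exec")
        return "ok"
    except TabError:  # must come first: TabError subclasses IndentationError
        return "TabError"
    except IndentationError:
        return "IndentationError"


result = check(source)
print(result)
```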

# ? Jun 18, 2011 04:47

spankweasel: Jan 4, 2006

#!/usr/bin/python -tt solves the tabs/spaces problem

# ? Jun 18, 2011 05:09

TOO SCSI FOR MY CAT: Oct 12, 2008; this is what happens when you take UI design away from engineers and give it to a bunch of hipster art student "designers"

Well, yeah, it's hard to spot mistakes if you're using a broken editor. If your editor didn't render { or }, programming in C'd be awfully hard too.

Every decent editor can render tabs. Go find the preference, turn it on, and be happy.

# ? Jun 18, 2011 07:58

Stabby McDamage: Dec 11, 2005; Doctor Rope

Janin posted:

Well, yeah, it's hard to spot mistakes if you're using a broken editor. If your editor didn't render { or }, programming in C'd be awfully hard too.

Every decent editor can render tabs. Go find the preference, turn it on, and be happy.

Wow, that was super smug, even for a post about text editors. I was just trying to show that spacing errors can exist, but they're largely pathological.

What do you mean "render tabs"? You mean one that displays some kind of glyph for them? I've never needed a feature like that, because again, the error I showed never actually comes up in practice.

# ? Jun 18, 2011 12:55

German Joey: Dec 18, 2004

Stabby McDamage posted:

Wow, that was super smug, even for a post about text editors. I was just trying to show that spacing errors can exist, but they're largely pathological.

What do you mean "render tabs"? You mean one that displays some kind of glyph for them? I've never needed a feature like that, because again, the error I showed never actually comes up in practice.

Oh, well, good thing you decided to make a big deal out of something that doesn't exist then!

# ? Jun 18, 2011 15:42

chemosh6969: Jul 3, 2004; cat /dev/null > /etc/professionalism; I am in fact a massive asswagon. Do not let me touch computer.

Janin posted:

Well, yeah, it's hard to spot mistakes if you're using a broken editor. If your editor didn't render { or }, programming in C'd be awfully hard too.

Every decent editor can render tabs. Go find the preference, turn it on, and be happy.

The only time I have dumb poo poo like that happen is when I load a file that was done in one editor, that uses spaces, and then load it in one that by default uses tabs.

Then any decent IDE has a switch that fixes the poo poo.

# ? Jun 18, 2011 17:31

MaberMK: Feb 1, 2008; BFFs

The solution to this problem is to use soft tabs and code in vim

# ? Jun 18, 2011 17:50

BeefofAges: Jun 5, 2004; Cry 'Havoc!', and let slip the cows of war.

This is the dumbest argument.

# ? Jun 18, 2011 17:57

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. -Bertrand Russell

BeefofAges posted:

This is the dumbest argument.

No it's not.

This is the dumbest argument!

# ? Jun 18, 2011 22:19

BeefofAges: Jun 5, 2004; Cry 'Havoc!', and let slip the cows of war.

Thermopyle posted:

No it's not.

This is the dumbest argument!

Nuh uh!

# ? Jun 19, 2011 10:13

Opinion Haver: Apr 9, 2007

I'm using xml.etree to do some XML parsing, and I came across some rather curious behavior:

code:

for node in xml:
    if node.find("invalid-data"): print "hi"

produces no output, but

code:

for node in xml:
    if node.find("invalid-data") != None: print "hi"

does. What's going on here?

# ? Jun 20, 2011 20:18

Jonnty: Aug 2, 2007; The enemy has become a flaming star!

yaoi prophet posted:

I'm using xml.etree to do some XML parsing, and I came across some rather curious behavior:
code:
for node in xml:
    if node.find("invalid-data"): print "hi"
produces no output, but
code:
for node in xml:
    if node.find("invalid-data") != None: print "hi"
does. What's going on here?

Print the output of find() directly. It might be illegally returning False or something.

# ? Jun 20, 2011 20:36

Opinion Haver: Apr 9, 2007

Oh, apparently elements with no subelements test as false, and since the invalid-data elements have no subelements, I have to use '!= None' or 'is not None'. That's... really kind of a stupid design decision.

# ? Jun 20, 2011 22:11

No Safe Word: Feb 26, 2005

Also, it's more "pythonic" to write "is not None" than "!= None". None is a singleton, so there's only ever one None and an identity check is all you need.
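A small sketch of the behavior discussed above, assuming Python 3 and a made-up XML snippet. An Element acts like a sequence of its children, so a childless element has length 0; that is why the bare truth test was misleading. The ElementTree documentation recommends an explicit "is not None" (or len()) test instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one <node> holds an empty <invalid-data/> element.
root = ET.fromstring(
    "<root>"
    "<node><invalid-data/></node>"
    "<node><ok/></node>"
    "</root>"
)

flagged = []
for node in root:
    found = node.find("invalid-data")
    # find() returns None when nothing matches, so compare by identity.
    if found is not None:
        flagged.append(found)

print(len(flagged))     # number of nodes that contain <invalid-data/>
print(len(flagged[0]))  # number of children of the matched element
```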

# ? Jun 21, 2011 03:45

freezepops: Aug 21, 2007; witty title not included; Fun Shoe

I have some code that uses the distance formula, and currently it takes ~20seconds to finish a job. This code:

code:

    def distance3D(self,x,y,z,x1,y1,z1):
        dist = math.sqrt((x-x1)*(x-x1) + (y-y1)*(y-y1) + (z-z1)*(z-z1))
        return dist

Is faster than this code, which takes ~30seconds:

code:

    def distance3D(self,x,y,z,x1,y1,z1):
        dist = (x-x1)*(x-x1) + (y-y1)*(y-y1) + (z-z1)*(z-z1)
        return dist

Why would adding square root make my code faster?

# ? Jun 22, 2011 03:36

dis astranagant: Dec 14, 2006

Are you using that distance as a loop counter or something? sqrt(big pile of numbers) is a very different thing from (big pile of numbers)

# ? Jun 22, 2011 03:48

freezepops: Aug 21, 2007; witty title not included; Fun Shoe

It's looping through an image, so it's run ~3x on each pixel in the image, but the values are discarded after comparison, and the max value would be 255^2*3.

# ? Jun 22, 2011 03:54

tripwire: Nov 19, 2004; ghost flow

freezepops posted:

It's looping through an image, so it's run ~3x on each pixel in the image, but the values are discarded after comparison, and the max value would be 255^2*3.

Well you're wrong about it being faster when you add an extra function call to math.sqrt at least.

code:

setup_trailer = '''
import random, math
width,height = 512,512
values = range(256)
pixels = [ tuple( random.choice(values) for _ in xrange(3)) 
    for _ in xrange(width*height) ]
'''


distance_1_setup = '''
def distance(x,y,z,x1,y1,z1):
    dist = math.sqrt((x-x1)*(x-x1) + (y-y1)*(y-y1) + (z-z1)*(z-z1))
    return dist

''' + setup_trailer

distance_2_setup = '''
def distance(x,y,z,x1,y1,z1):
    dist = (x-x1)*(x-x1) + (y-y1)*(y-y1) + (z-z1)*(z-z1)
    return dist

''' + setup_trailer

distance_3_setup = '''
def distance(x,y,z,x1,y1,z1):
    return (
        (x1-x)**2 +
        (y1-y)**2 +
        (z1-z)**2 )

''' + setup_trailer



statement = '''
index = 0
for x in xrange(width):
    for y in xrange(height):
        pixel = pixels[index]
        distance(pixel[0],pixel[1],pixel[2],127,127,127)
        index += 1
'''

import timeit

print timeit.timeit(statement,distance_1_setup,number=20)
print timeit.timeit(statement,distance_2_setup,number=20)
print timeit.timeit(statement,distance_3_setup,number=20)

Output:
9.2920000553131104
5.871999979019165
5.0909998416900635

tripwire fucked around with this message at 04:32 on Jun 22, 2011

# ? Jun 22, 2011 04:28

German Joey: Dec 18, 2004

maybe it would be faster to cache the function call?

# ? Jun 22, 2011 05:03

Computer viking: May 30, 2011; Now with less breakage.

German Joey posted:

maybe it would be faster to cache the function call?

And on an even more brute-force methodological level, I'm sure a dash of Cython would speed that up a lot.

More to the point, if it's genuinely faster when wrapped in a sqrt, the most obvious guess is that the compiler produces better code in that case, either because you trigger some heuristic, or because it's able to infer more useful info (e.g. about types). Just out of curiosity, does it change anything time-wise if you replace the sqrt() with e.g. float()?

Computer viking fucked around with this message at 19:04 on Jun 22, 2011

# ? Jun 22, 2011 18:07

FoiledAgain: May 6, 2007

I'm not familiar with the use of triple quotes in tripwire's post. How does that work? Is this because timeit needs strings? (I've never used timeit so I have no idea.) Or is this some other convention? I'm getting an IndentationError when I copy the code, so I apparently don't understand this.

# ? Jun 22, 2011 19:15

Computer viking: May 30, 2011; Now with less breakage.

FoiledAgain posted:

I'm not familiar with the use of triple quotes in tripwire's post. How does that work? Is this because timeit needs strings? (I've never used timeit so I have no idea.) Or is this some other convention? I'm getting an IndentationError when I copy the code, so I apparently don't understand this.

Looks like timeit wants code strings, yes. Triple quotes can contain newlines and single quotes, so they're useful for things like that.
As for indent errors, it worked for me, though I copied each block on its own. I also got the same results, so it seems to have been a fluke...
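A stripped-down sketch of the same idea, assuming Python 3; the distance function and the sample arguments are only illustrative. Both arguments to timeit.timeit are ordinary strings of source code, and triple quotes are just a convenient way to write a string that spans several lines. Because the string contents are compiled as code, every line inside has to start at the indentation level it would have in a normal file, so extra leading whitespace picked up while pasting is one plausible source of an IndentationError.

```python
import timeit

# Setup code runs once; triple quotes let the string span several lines.
setup = '''
import math

def distance(x, y, z, x1, y1, z1):
    return math.sqrt((x - x1) ** 2 + (y - y1) ** 2 + (z - z1) ** 2)
'''

# The statement is what actually gets timed, repeated `number` times.
statement = '''
distance(0, 0, 0, 3, 4, 0)
'''

elapsed = timeit.timeit(statement, setup, number=10000)
print(elapsed)  # seconds for all 10000 calls; varies by machine
```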

edit:
Just for the record, Cython is indeed a good bit faster. Using the definitions above, I get this:
Distance 1: 4.63275718689
Distance 2: 3.84116697311
Distance 3: 3.87355780602

pre:

>>> import pyximport; pyximport.install()
>>> distance_4_setup="from distance_test import distance;" + setup_trailer
>>> print timeit.timeit(statement,distance_4_setup,number=20)
1.72920703888

And "distance_test.pyx" is this:

pre:

def distance(int x, int y, int z, int x1, int y1, int z1):
    dist = (x-x1)*(x-x1) + (y-y1)*(y-y1) + (z-z1)*(z-z1)
    return dist

Computer viking fucked around with this message at 19:34 on Jun 22, 2011

# ? Jun 22, 2011 19:24

Unknownmass: Nov 3, 2007

I am new to python and getting back into programming after a few years. My question is I have a tab-delimited text file with indices and then grouping of data. What would be the best way to import these, and rank them? Also if possible even import them as separate groups. I have been trying to use numpy but have not had too much luck so far. Thanks

# ? Jun 24, 2011 00:21

Computer viking: May 30, 2011; Now with less breakage.

Unknownmass posted:

I am new to python and getting back into programming after a few years. My question is I have a tab-delimited text file with indices and then grouping of data. What would be the best way to import these, and rank them? Also if possible even import them as separate groups. I have been trying to use numpy but have not had too much luck so far. Thanks

I don't quite get the structure here, could you elaborate?

# ? Jun 24, 2011 01:50

brosmike: Jun 26, 2009

Unknownmass posted:

I am new to python and getting back into programming after a few years. My question is I have a tab-delimited text file with indices and then grouping of data. What would be the best way to import these, and rank them? Also if possible even import them as separate groups. I have been trying to use numpy but have not had too much luck so far. Thanks

What you describe is a bit vague, but probably pretty easy to do (you probably don't need to bother with numpy for the importing). Can you give us an actual example of the format you're trying to read? Telling us what you mean by "separate groups" of data, as well as how you want to rank the data, would help us help you.

# ? Jun 24, 2011 02:59

Unknownmass: Nov 3, 2007

Sorry for being vague. The file is an excel matrix that has been exported as a tab-delimited text file, of 30 points split into 3 groups (points 1-10 group 1, 11-20 group 2 and 21-30 group 3). Each of the 30 points has a distance to each other ie:

A B C D
A 0 2 5 10
B 2 0 4 8
C 5 4 0 3
D 10 8 3 0

That is the general structure but with 30 points and the first column and row are the index points (in the example A,B,C,D). What I am trying to do is import all the points and then rank them in some manner like largest to smallest. Hope this helps. I will post what I have so far soon, but it's likely to be ugly as I'm just returning to coding. Thanks for the help.

# ? Jun 24, 2011 04:17

BeefofAges: Jun 5, 2004; Cry 'Havoc!', and let slip the cows of war.

There are modules that let you read Excel files directly through Python, you know.

# ? Jun 24, 2011 05:22

FoiledAgain: May 6, 2007

Unknownmass posted:

That is the general structure but with 30 points and the first column and row are the index points (in the example A,B,C,D). What I am trying to do is import all the points and then rank them in some manner like largest to smallest. Hope this helps. I will post what I have so far soon, but it's likely to be ugly as I'm just returning to coding. Thanks for the help.

Is this the kind of thing you want to do?

code:

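with open(your_file_name_here) as infile:
    next(infile)  # skip the header row of point names
    for line in infile:
        # drop the row label, then convert so '10' sorts after '2'
        values = [int(v) for v in line.split('\t')[1:]]
        values.sort(reverse=True)  # largest to smallest
        print(values)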

# ? Jun 24, 2011 06:26

Lurchington: Jan 2, 2003; Forums Dragoon

BeefofAges posted:

There are modules that let you read Excel files directly through Python, you know.

specifically, http://pypi.python.org/pypi/xlrd

it's pretty nice

# ? Jun 24, 2011 15:49

Computer viking: May 30, 2011; Now with less breakage.

Unknownmass posted:

Sorry for being vague. The file is an excel matrix that has been exported as a tab-delimited text file, of 30 points split into 3 groups (points 1-10 group 1, 11-20 group 2 and 21-30 group 3). Each of the 30 points has a distance to each other ie:
pre:
     A  B  C  D
   A 0  2  5  10
   B 2  0  4  8
   C 5  4  0  3
   D 10 8  3  0
That is the general structure but with 30 points and the first column and row are the index points (in the example A,B,C,D). What I am trying to do is import all the points and then rank them in some manner like largest to smallest. Hope this helps. I will post what I have so far soon, but it's likely to be ugly as I'm just returning to coding. Thanks for the help.

If I get it right, that's a distance matrix for all the points. What do you count as the value of a single point? Something derived from the distances, or do you have a separate table for that? Also, are the points grouped just by the external knowledge that the first ten are group 1 and so on, or is this encoded somehow?

BTW, the [ pre] tag is useful for fixed-width text.

# ? Jun 24, 2011 16:47

Unknownmass: Nov 3, 2007

Computer viking posted:

If I get it right, that's a distance matrix for all the points. What do you count as the value of a single point? Something derived from the distances, or do you have a separate table for that? Also, are the points grouped just by the external knowledge that the first ten are group 1 and so on, or is this encoded somehow?

BTW, the [ pre] tag is useful for fixed-width text.

Yes it is a distance matrix. The files I'm currently working with are just the distance values, and not the points. The points are just grouped by the external knowledge and have to be separated out. Thanks for everyone's help.

# ? Jun 24, 2011 18:30

Computer viking: May 30, 2011; Now with less breakage.

Unknownmass posted:

Yes it is a distance matrix. The files I'm currently working with are just the distance values, and not the points. The points are just grouped by the external knowledge and have to be separated out. Thanks for everyone's help.

Right, which still doesn't answer what you're sorting the points by. :)

Anyway. To read a distance matrix like that into a numpy array, you can do something like this:

pre:

import numpy as np

infile = open("distance.txt", "r")
header = infile.readline().strip().split("\t")
header_len = len(header)
data = []
rownames = []
for line in infile:
	parts = line.strip().split("\t")
	if len(parts) < header_len:
		break
	rownames.append(parts[0])
	values = [int(p) for p in parts[1:] ]
	data.append(values)

data_array = np.array(data)

That leaves you with the header and the row names (should be identical), a list of lines ("data"), and a numpy array of the same numbers ("data_array"). It's possible to compact this down to just a few lines by nesting two list comprehensions, but ... let's not go there.

I'm still not sure what you're sorting by, so I'll use the sum of distances as a placeholder. To do this, you basically want to sort a list of (key, value) pairs on the value. A neat way is to use operator.itemgetter to create a "get the second element" function, and give that to "sorted". (Remember that we count from 0.)

pre:

import operator
sum_of_distances = map(sum,data_array)
name_with_distance = zip(header,sum_of_distances)
nwd_sorted = sorted(name_with_distance, key=operator.itemgetter(1))

(Of course, you could just swap the order of the arguments in the zip function, to put the value first ... but that wouldn't let me talk about itemgetter.)

Oh, and output:

pre:

>>> header
['a', 'b', 'c', 'd', 'e']
>>> data_array
array([[0, 1, 1, 5, 4],
       [1, 0, 2, 5, 4],
       [1, 2, 0, 3, 3],
       [5, 5, 3, 0, 5],
       [4, 4, 3, 5, 0]])
>>> name_with_distance
[('a', 11), ('b', 12), ('c', 9), ('d', 18), ('e', 16)]
>>> nwd_sorted
[('c', 9), ('a', 11), ('b', 12), ('e', 16), ('d', 18)]

As for the groups, uhm. You can get group N by grabbing header[N*10:(N+1)*10] and data_array[N*10:(N+1)*10, N*10:(N+1)*10], then work with those (slices are from-and-including:to-but-not-including).
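To make the slicing concrete, here is a sketch in plain Python (assuming Python 3, nested lists instead of a numpy array, and a made-up 6-point matrix with groups of 2 rather than 10). The list comprehension plays the role of the two-dimensional numpy slice: it takes a range of rows, then the same range of columns inside each row. Note that N counts from 0 here, so N=0 is the first group.

```python
# Hypothetical 6-point distance matrix, split into 3 groups of 2 to keep it short.
header = ["a", "b", "c", "d", "e", "f"]
data = [
    [0, 1, 4, 5, 8, 9],
    [1, 0, 3, 6, 7, 9],
    [4, 3, 0, 2, 6, 7],
    [5, 6, 2, 0, 4, 5],
    [8, 7, 6, 4, 0, 1],
    [9, 9, 7, 5, 1, 0],
]
GROUP_SIZE = 2  # 10 in the original question


def group(n):
    """Return (names, sub-matrix) for group n, counting groups from 0."""
    lo, hi = n * GROUP_SIZE, (n + 1) * GROUP_SIZE
    names = header[lo:hi]
    # Slice the rows, then the same range of columns inside each row.
    block = [row[lo:hi] for row in data[lo:hi]]
    return names, block


names, block = group(1)
print(names)
print(block)
```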

Computer viking fucked around with this message at 20:17 on Jun 24, 2011

# ? Jun 24, 2011 19:53

Clandestine!: Jul 17, 2010

Here with more stupid questions. I finished a text handling program, which was painfully easy to do (it counted the words and sentences in a file, nothing too crazy). I, however, have been stumped for the past hour on another text handling program which should've been even easier: one using a dictionary to count all of the words in a text file and display the output in alphabetical order.

code:

word_counts = {}
word_items = {}

myfile = open('gettysburg.txt', 'r')

for words in myfile:
    word_items[words] = word_counts.items()
    word_items.sort()
    
print word_items

(I'm using a text file of the Gettysburg address, in case anyone cares :v:)

I have a feeling that this is FAR too simple and I am doing things badly; as well, I'm getting an attribute error that says that the dictionary object has no sort function. For reference: I'm using the online guide "How to Think Like a Computer Scientist" now; it's been pretty good to me so far, but it's providing no hints right now.

# ? Jun 26, 2011 16:53

tripwire: Nov 19, 2004; ghost flow

First: a plain dictionary is not the most convenient tool for counting. There is a collections.defaultdict class, however, which you can use for counting (and if you are using 2.7 or higher, there's even a specialized collections.Counter class).

What do you think this line does?

code:

word_items[words] = word_counts.items()

You never update the contents of the dictionary "word_counts", but you query it a whole lot. Also, calling sort() on a dictionary is a non sequitur: dictionaries are only mappings between keys and values; there is no concept of sorting in a dictionary. If you want though, you can ask the dictionary for its keys, and throw those into the "sorted" function if you want the keys in sorted order.
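As a quick illustration of both points, here is a sketch assuming Python 2.7 or later (written in Python 3 syntax) and a made-up one-line sample string instead of a file. collections.Counter does the tallying, and sorted() turns the keys into an alphabetical list, since the mapping itself has no sort() method.

```python
from collections import Counter

# Hypothetical stand-in for the contents of the text file.
text = "that we here highly resolve that these dead shall not have died in vain"

counts = Counter(text.split())   # maps each word to how often it occurs

print(hasattr(counts, "sort"))   # checks whether the mapping has a sort() method
for word in sorted(counts):      # sorted() returns the keys as an ordered list
    print(word, counts[word])
```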

I don't know what your textfile looks like, but I just googled for a text file of the address and got this:
http://morphadorner.northwestern.edu/morphadorner/techtalk/sentenceandtokenoffsets/gettysburg.txt

I think this is what you are trying to do:

code:

from collections import defaultdict
word_tally = defaultdict(int)
#a default dict takes a factory function as an argument. whenever you lookup a key
#which isn't in the dictionary, it uses that factory function to make a value for
#the key. In this case, int() is used to return the integer value 0.

with open('gettysburg.txt', 'r') as myfile:
    for line in myfile:
        for word in line.split():  # split() with no argument skips runs of whitespace
            word_tally[word] += 1

for word in sorted( word_tally.keys() ):
    print word, word_tally[word]

Do you understand what this code is doing?

tripwire fucked around with this message at 17:48 on Jun 26, 2011

# ? Jun 26, 2011 17:02
