Python information and short questions megathread.

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Python information and short questions megathread.


Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

gmq posted:

Python code:
movie['genres'] = [str(s) for s in mq.genres]
did the trick. I think json.dumps didn't like the unescaped single quotes inside each item in the list.

Thanks though!

What? No. It didn't like the custom class that it didn't know how to serialize. The quote you saw was just the repr.
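To make that concrete, here is a minimal sketch written for Python 3. `Genre` is a made-up stand-in for whatever class `mq.genres` actually holds; the point is only that `json.dumps` raises a `TypeError` for objects it has no encoder for, and that converting them to strings first is what fixed it.

```python
import json


class Genre:
    """Hypothetical stand-in for the library's genre class."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

    def __repr__(self):
        # The repr is what shows up when you print a list of these objects.
        return "<Genre '%s'>" % self.name


genres = [Genre("Drama"), Genre("Sci-Fi")]

try:
    json.dumps({"genres": genres})
    failed = False
except TypeError:
    # json only knows about dict, list, str, numbers, bool and None.
    failed = True

# Converting each object to a plain string first gives json something it
# can serialize, which is why the list comprehension worked.
encoded = json.dumps({"genres": [str(g) for g in genres]})
print(failed, encoded)
```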

# ? May 25, 2012 17:44

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

This may (is) a silly question, but I decided to start playing around with dictionaries today after reading y'all talking about them. So I did a small example:

code:

In [217]: classroom={"Name":[], "Grade":[]}

In [218]: classroom["Name"].append("Alex")

In [219]: classroom["Grade"].append(86.5)

In [220]: classroom["Name"].append("Anne")

In [221]: classroom["Grade"].append(90.0)

In [222]: classroom
Out[222]: {'Grade': [86.5, 90.0], 'Name': ['Alex', 'Anne']}

In [223]: classroom["Grade"][ classroom["Name"].index("Anne") ]
Out[223]: 90.0

My question is in regard to line 223. Is there an easier way to pull Anne's grade out in this example? Alternatively, I'd also kind of prefer the dictionary to look something more like:

{{"Name":"Alex", "Grade"=86.5},
{"Name":"Anne", "Grade"=90.0}}

Or something similar, so that I could eventually query just a name and get the grade (and eventually all pertinent info). I could just run a line similar to 223 for all the other entries, but I have to assume there is an easier way to do this? Thanks, and sorry if this is just the dumbest question. :/

# ? May 25, 2012 19:55

Lysidas: Jul 26, 2002; John Diefenbaker is a madman who thinks he's John Diefenbaker.; Pillbug

JetsGuy posted:

There are a lot of ways to approach this, but I'd probably do something like the following:

code:

In [1]: class Student:
   ...:     def __init__(self, name, grade):
   ...:         self.name = name
   ...:         self.grade = grade
   ...:     def __repr__(self):
   ...:         return '{}: {}'.format(self.name, self.grade)
   ...:     

In [2]: classroom = {}

In [3]: classroom['Alex'] = Student('Alex', 86.5)

In [4]: classroom['Anne'] = Student('Anne', 90.0)

In [5]: classroom
Out[5]: {'Alex': Alex: 86.5, 'Anne': Anne: 90.0}

In [6]: classroom['Anne'].grade
Out[6]: 90.0

(maybe stealth) EDIT: Note that this is not what the __repr__ method is supposed to do ideally; I overrode it just to make pprinting classroom more informative.

Lysidas fucked around with this message at 20:13 on May 25, 2012

# ? May 25, 2012 20:10

vikingstrike: Sep 23, 2007; whats happening, captain

^^Kind of beaten. His way works too.

I think it would be a little easier to do something like:

Python code:

>>> students = {}
>>> students["Alex"] = {'test1':85}
>>> students["Anne"] = {'test1':90}
>>> students
{'Anne': {'test1': 90}, 'Alex': {'test1': 85}}
>>> students["Alex"]["test1"]
85
>>> students["Anne"]["test1"]
90

# ? May 25, 2012 20:12

Lysidas: Jul 26, 2002; John Diefenbaker is a madman who thinks he's John Diefenbaker.; Pillbug

I was going by JetsGuy's "and eventually all pertinent info" comment -- you can make the Student object as complicated as you want:

code:

In [1]: class Student:
   ...:     def __init__(self, name):
   ...:         self.name = name
   ...:         self.grades = []
   ...:     def avg(self):
   ...:         return sum(self.grades) / len(self.grades)
   ...:     def is_passing(self):
   ...:         return self.avg() > 60
   ...:     def __repr__(self):
   ...:         return '{}: avg {}, {}'.format(self.name, self.avg(),
   ...:             'passing' if self.is_passing() else 'failing')
   ...:     

In [2]: classroom = {}

In [3]: classroom['Alex'] = Student('Alex')

In [4]: classroom['Alex'].grades.append(86.5)

In [5]: classroom['Alex'].grades.append(49.0)

In [6]: classroom['Alex'].grades.append(34.0)

In [7]: classroom['Anne'] = Student('Anne')

In [8]: classroom['Anne'].grades.append(90.0)

In [9]: classroom['Anne'].grades.append(100.0)

In [10]: classroom['Anne'].grades.append(94.5)

In [11]: classroom['Alex'].grades
Out[11]: [86.5, 49.0, 34.0]

In [12]: classroom['Anne'].is_passing()
Out[12]: True

In [13]: classroom
Out[13]:
{'Alex': Alex: avg 56.5, failing,
 'Anne': Anne: avg 94.83333333333333, passing}

# ? May 25, 2012 20:34

Emacs Headroom: Aug 2, 2003

JetsGuy posted:

This may (is) a silly question, but I decided to start playing around with dictionaries today after reading y'all talking about them. So I did a small example:

I would organize things a bit differently. For a set of things with members, I tend to use a list of dictionaries (unless they need associated methods, in which case I use classes).

It's easy to use list comprehensions on lists of dictionaries to get SQL-like behavior:

code:

In [1]: students = [{'name': 'Alex', 'Grade': 86.5}, {'name': 'Anne', 'Grade': 90.0}]

In [2]: print [s['Grade'] for s in students if s['name'] == 'Anne']
[90.0]

edit: it's also dead-simple to convert this to json or sqlite or whatever

edit2: if 99% of the time you're looking up info for a student by name, then you should do what the others said and make a big dictionary of students where the key is their name. If you need to look them up by other entries often though (like by their grades or their classroom etc.) then a list of dicts is better I think

Emacs Headroom fucked around with this message at 20:39 on May 25, 2012

# ? May 25, 2012 20:35

Reformed Pissboy: Nov 6, 2003

Well I got hella beaten on code examples, but the big takeaway should be that you need to consider how you want your data to be stored (your "desired behavior" example was so close!). You need some kind of container to hold an individual student's information (after all, it's the student that has a Name or a Grade, not a classroom), THEN you can add them to a larger classroom container (so you can say "get the Grades from everybody in the classroom" etc.).

edit: heck, more code never hurt anyone, this was the example I was going to use (also shows how you can pretty easily make a dictionary almost any way you want)

Python code:

# Using dictionaries only
>>> student_list = [ {"Name":"Alex", "Grade":86.5}, 
{"Name":"Anne", "Grade":90.0}, 
{"Name":"some idiot", "Grade":0} ]
>>> classroom = dict()
>>> for student in student_list:
...     classroom[ student["Name"] ] = student
...
>>> classroom
{'Anne': {'Grade': 90.0, 'Name': 'Anne'}, 
'Alex': {'Grade': 86.5, 'Name': 'Alex'}, 
'some idiot': {'Grade': 0, 'Name': 'some idiot'}}
>>> classroom["Alex"]
{'Grade': 86.5, 'Name': 'Alex'}
>>> classroom["Alex"]["Grade"]
86.5
>>> for student_name in classroom:
...     student = classroom[student_name]
...     print student_name, "has a", student["Grade"]
...
Anne has a 90.0
Alex has a 86.5
some idiot has a 0

# Using classes
>>> class Student:
...     def __init__(self, name, grade=0):
...             self.name = name
...             self.grade = grade
...     def __str__(self):
...             return "%s, grade: %s" % (self.name, self.grade)
...     def __repr__(self):
...             return "<Student %s>" % (self.name,)
...
>>> Alex = Student("Alex", grade=86.5)
>>> Anne = Student("Anne", 90.0)
>>> Dunce = Student("some idiot")
>>> student_list = [Alex, Anne, Dunce]
>>> another_class = dict()
>>> for student in student_list:
...     another_class[student.name] = student
...
>>> print another_class
{'Anne': <Student Anne>, 'Alex': <Student Alex>, 'some idiot': <Student some idiot>}
>>> another_class["Anne"]
<Student Anne>
>>> another_class["Anne"].grade
90.0
>>> for student_name in another_class:
...     student_data = another_class[student_name]
...     print student_data
...
Anne, grade: 90.0
Alex, grade: 86.5
some idiot, grade: 0

Reformed Pissboy fucked around with this message at 20:54 on May 25, 2012

# ? May 25, 2012 20:42

Lysidas: Jul 26, 2002; John Diefenbaker is a madman who thinks he's John Diefenbaker.; Pillbug

Ridgely_Fan posted:

edit2: if 99% of the time you're looking up info for a student by name, then you should do what the others said and make a big dictionary of students where the key is their name. If you need to look them up by other entries often though (like by their grades or their classroom etc.) then a list of dicts is better I think

This is turning into a serious bikeshedding discussion, but even if you're going to look up by other keys reasonably often, you lose almost nothing by storing a dict of dicts instead of a list of dicts. You get constant-time lookup with what is hopefully your most commonly-used key, and you can still do a lazy linear search over all values by replacing students in your list comprehension with classroom.values().
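A small sketch of what that looks like, in Python 3 syntax and with the sample data from earlier in the thread (the 90-point cutoff is just an invented example query):

```python
classroom = {
    "Alex": {"Name": "Alex", "Grade": 86.5},
    "Anne": {"Name": "Anne", "Grade": 90.0},
}

# Constant-time lookup by the primary key (the name).
anne_grade = classroom["Anne"]["Grade"]

# Linear search by any other field: the same shape as the list-of-dicts
# comprehension, just iterating over the dict's values.
high_scorers = [s["Name"] for s in classroom.values() if s["Grade"] >= 90]

print(anne_grade, high_scorers)
```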

# ? May 25, 2012 21:09

Emacs Headroom: Aug 2, 2003

To add to the bikeshedding, I wonder which uses more memory in practice:

1) Making a dict of dicts
2) Making a list of dicts, and a separate dict mapping unique keys (like names) to indices in the list

I believe the performance should be identical between them for all cases (as long as you're always appending to the list of dicts rather than inserting in the middle)

# ? May 25, 2012 21:27

IAmKale: Jun 7, 2007; やらないか; Fun Shoe

Is pickling commonly used in Python? I've only heard about it once in the few tutorials I've been through so I have no idea how often it's used to store serialized information.

# ? May 25, 2012 21:36

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Karthe posted:

Is pickling commonly used in Python? I've only heard about it once in the few tutorials I've been through so I have no idea how often it's used to store serialized information.

Common? Yes, somewhat. Useful and worthwhile? No. It has several large pitfalls, so if you can easily serialize data to a more agnostic format (like JSON), do.
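One pitfall that the pickle documentation itself warns about is that unpickling data from an untrusted source can execute arbitrary code. For plain lists and dicts a JSON round trip is about as short; a minimal sketch in Python 3, with made-up student data:

```python
import json

students = {
    "Alex": {"grades": [86.5, 49.0, 34.0]},
    "Anne": {"grades": [90.0, 100.0, 94.5]},
}

# Serialize to human-readable text...
text = json.dumps(students, indent=2, sort_keys=True)

# ...and load it back into ordinary dicts and lists.
restored = json.loads(text)
print(text)
```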

# ? May 25, 2012 21:39

Plorkyeran: Mar 22, 2007; To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Ridgely_Fan posted:

To add to the bikeshedding, I wonder which uses more memory in practice:

1) Making a dict of dicts
2) Making a list of dicts, and a separate dict mapping unique keys (like names) to indices in the list

I believe the performance should be identical between them for all cases (as long as you're always appending to the list of dicts rather than inserting in the middle)

How could the second possibly use less memory?

# ? May 25, 2012 23:52

Maluco Marinero: Jan 18, 2001; Damn that's a
fine elephant.

Suspicious Dish posted:

Common? Yes, somewhat. Useful and worthwhile? No. It has several large pitfalls, so if you can easily serialize data to a more agnostic format (like JSON), do.

Yeah, I just sort of avoided investigating pickling entirely for some custom fields in Django myself. Just the sound of it feels unstable when in most instances you could get the job done with a list or dictionary and JSON encoding. Plain text formats feel so much more stable, especially cause you can read what the hell is going on if there's problems.

# ? May 26, 2012 00:20

lunar detritus: May 6, 2009

While talking about lists and dictionaries.

I have dictionaries inside dictionaries inside a list. It looks somewhat like this but with a lot more keys:

code:

[{
  "3658": {
    "tmdbid": 20181, 
    "imdbid": "tt0970462", 
  }
},
{
  "2768": {
    "tmdbid": 1845, 
    "imdbid": "tt0841044",
  }
},
{
  "3537": {
    "tmdbid": "Unavailable"
  }
}]

Is there a way to sort this list using the value inside the "tmdbid" keys? There's some duplicate data with the same "tmdbid" that should be different movies and sorting the list would make it infinitely easier to check which movies need to be corrected.

# ? May 26, 2012 02:41

Emacs Headroom: Aug 2, 2003

Plorkyeran posted:

How could the second possibly use less memory?

Python dicts are hash tables instead of rb trees I believe, and hash tables have to pre-allocate quite a bit of space to reduce collision probability.

I would be surprised if dicts didn't use more memory than equivalent arrays for this reason (but it would be a small percentage of total memory as the space would only need to be allocated for references). Though having the second dict for looking up indices would completely invalidate the advantage, so nevermind.
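A rough way to check this on CPython is `sys.getsizeof`, which reports the size of the container itself (the table of references), not the objects it points to. The names below are made up and the exact byte counts vary by version and platform, so treat this as a sketch rather than a benchmark:

```python
import sys

n = 1000
names = ["student%d" % i for i in range(n)]

as_list = list(names)            # array of references
as_dict = dict.fromkeys(names)   # hash table keyed by the same strings

list_bytes = sys.getsizeof(as_list)
dict_bytes = sys.getsizeof(as_dict)
print("list: %d bytes, dict: %d bytes" % (list_bytes, dict_bytes))
```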

# ? May 26, 2012 03:48

Modern Pragmatist: Aug 20, 2008

gmq posted:

While talking about lists and dictionaries.

I have dictionaries inside dictionaries inside a list. It looks somewhat like this but with a lot more keys:
code:
[{
  "3658": {
    "tmdbid": 20181, 
    "imdbid": "tt0970462", 
  }
},
{
  "2768": {
    "tmdbid": 1845, 
    "imdbid": "tt0841044",
  }
},
{
  "3537": {
    "tmdbid": "Unavailable"
  }
}]
Is there a way to sort this list using the value inside the "tmdbid" keys? There's some duplicate data with the same "tmdbid" that should be different movies and sorting the list would make it infinitely easier to check which movies need to be corrected.

This should work.

Python code:

def sorter(item):
    return item.values()[0]['tmdbid']

# Where A is your nested dataset
S = sorted(A,key=sorter)

# ? May 26, 2012 03:50

Emacs Headroom: Aug 2, 2003

gmq posted:

Is there a way to sort this list using the value inside the "tmdbid" keys? There's some duplicate data with the same "tmdbid" that should be different movies and sorting the list would make it infinitely easier to check which movies need to be corrected.

If 'a' is your array:

Python code:

sorted(a, key=lambda x: x.values()[0]['tmdbid'])
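Both versions above rely on Python 2 behaviour: `dict.values()` returning a list, and ints sorting against strings like "Unavailable". On Python 3 the first is a non-indexable view and the second raises a `TypeError`. A hedged sketch of one way to do the same sort there, using the sample data from the question (`tmdb_key` is a name invented for this example):

```python
movies = [
    {"3658": {"tmdbid": 20181, "imdbid": "tt0970462"}},
    {"2768": {"tmdbid": 1845, "imdbid": "tt0841044"}},
    {"3537": {"tmdbid": "Unavailable"}},
]


def tmdb_key(item):
    # Each outer dict has exactly one entry; grab its value.
    inner = next(iter(item.values()))
    tmdbid = inner["tmdbid"]
    # Sort numeric ids first (by value), then everything else (by text),
    # so ints and strings are never compared with each other.
    if isinstance(tmdbid, int):
        return (0, tmdbid, "")
    return (1, 0, str(tmdbid))


ordered = sorted(movies, key=tmdb_key)
ordered_ids = [next(iter(m)) for m in ordered]
print(ordered_ids)
```

Since equal keys end up next to each other after sorting, duplicated `tmdbid` values are easy to spot by scanning adjacent entries.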

# ? May 26, 2012 03:53

lunar detritus: May 6, 2009

Ridgely_Fan posted:

If 'a' is your array:
Python code:
sorted(a, key=lambda x: x.values()[0]['tmdbid'])

It works, it works. :woop:

Thanks!
Now to read about lambda.

# ? May 26, 2012 04:15

IAmKale: Jun 7, 2007; やらないか; Fun Shoe

I'm trying to use Tkinter to design a GUI but I'm having trouble figuring out how to center multiple lines of text on a button. Some guides say to use justify=CENTER when defining the button, but apparently it's no good in the most recent version of it. Any ideas?

# ? May 29, 2012 23:54

FoiledAgain: May 6, 2007

I have a text file with about 10,000 words in it, many of which are duplicates (I have no idea how many). I want a list of the unique words in there. I can think of two ways to do this, but I'm curious which one is faster/more efficient. (Or is there a better different way?) Intuitively the second one seems faster, but I don't know how Python makes a set out of a list.

code:

#OPTION 1
unique_list = []
for item in corpus:
    if item not in unique_list:
        unique_list.append(item)

#OPTION 2
unique_list = []
for item in corpus:
    unique_list.append(item)

unique_list = list(set(unique_list))

# ? May 31, 2012 05:41

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Uh, why not just

code:

unique_items = set(corpus)

The set constructor can take any iterable (if you can iterate over it with a for loop, it's an iterable), so there's no need to construct a temporary list.

I'm also not sure why you want a list back.

# ? May 31, 2012 05:47

FoiledAgain: May 6, 2007

Suspicious Dish posted:

Uh, why not just
code:
unique_items = set(corpus)
The set constructor can take any iterable (if you can iterate over it with a for loop, it's an iterable), so there's no need to construct a temporary list.

I'm also not sure why you want a list back.

Because I simplified the problem in the interests of a short post. Although you're right, I don't strictly need a list. What I'm actually doing is:

code:


for line in corpus.readlines():
    words = line.split(sep)
    for word in words:
        sylls = syllabify(word)
        for syl in sylls:
            if syl not in unique_sylls:
                unique_sylls.append(syl)

I'm still simplifying, but I can't directly do what you're suggesting, because I have to do a bunch of other stuff first. When constructing that unique_sylls list at the end, is it better to do what I wrote here, or to just cram every syllable I find into a list, then turn it into a set?

# ? May 31, 2012 07:36

peepsalot: Apr 24, 2007; PEEP THIS... BITCH!

You don't seem to grasp that the whole purpose of a set is to store a unique collection of elements. Repeatedly using "in" on a list is incredibly less efficient than just using a set for what it was made for.

At least do something like this:

code:

unique_sylls = set()
for line in corpus.readlines():
    words = line.split(sep)
    for word in words:
        sylls = syllabify(word)
        for syl in sylls:
            unique_sylls.add(syl)
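For a rough feel of the difference, here is a small timing sketch in Python 3. The corpus is invented (10,000 items drawn from 1,000 distinct values) and the numbers will vary by machine, so it only illustrates the trend:

```python
import timeit


def unique_with_list(items):
    seen = []
    for item in items:
        if item not in seen:      # scans the list on every check
            seen.append(item)
    return seen


def unique_with_set(items):
    seen = set()
    for item in items:
        seen.add(item)            # hash lookup, duplicates are ignored
    return seen


# Made-up data: 10,000 "words" drawn from 1,000 distinct values.
corpus = ["w%d" % (i % 1000) for i in range(10000)]

list_time = timeit.timeit(lambda: unique_with_list(corpus), number=1)
set_time = timeit.timeit(lambda: unique_with_set(corpus), number=1)
print("list: %.4fs  set: %.4fs" % (list_time, set_time))
```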

# ? May 31, 2012 07:48

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

So, quick nits: don't use readlines, iterating over a file directly will do what you want. I'd probably write it like this:

Python code:

def get_syllables(corpus):
    """
    Get all the unique syllables in corpus.
    """
    syllables = set()
    for line in corpus:
        for word in line.split():
            syllables.update(syllablize(word))
    return syllables

# ? May 31, 2012 07:49

FoiledAgain: May 6, 2007

peepsalot posted:

You don't seem to grasp that the whole purpose of a set is to store a unique collection of elements. Repeatedly using "in" on a list is incredibly less efficient than just using a set for what it was made for.

I know what a set is for, I was just being an idiot about how to use them in Python. Your suggestion is great. Thanks!

# ? May 31, 2012 08:19

nonathlon: Jul 9, 2004; And yet, somehow, now it's my fault ...

In brief: what templating solution should I use?

Long version: I've got a project that generates a scaffold of folders and files (XML & Python code), basically to be filled in and used as a plugin for another program. Having written the code to generate this stuff, I realize that I should have used a templating solution. So, which one? Somewhere back in the thread, someone dissed Cheetah, which from my limited experience seemed to be fine and easy to get the hang of. (But then, I struggled with TAL and Zope Page templates for years.) Recommendations or criteria I should consider?

# ? May 31, 2012 16:05

No Safe Word: Feb 26, 2005

outlier posted:

In brief: what templating solution should I use?

Long version: I've got a project that generates a scaffold of folders and files (XML & Python code), basically to be filled in and used as a plugin for another program. Having written the code to generate this stuff, I realize that I should have used a templating solution. So, which one? Somewhere back in the thread, someone dissed Cheetah, which from my limited experience seemed to be fine and easy to get the hang of. (But then, I struggled with TAL and Zope Page templates for years.) Recommendations or criteria I should consider?

Other notable options which I have no opinion on are:

Genshi
Jinja2
Mako

and Django has a templating engine as well though I'm not sure how easy it would be to use without the rest of Django (which is a pretty full featured web framework)

# ? May 31, 2012 16:11

TOO SCSI FOR MY CAT: Oct 12, 2008; this is what happens when you take UI design away from engineers and give it to a bunch of hipster art student "designers"

No Safe Word posted:

Other notable options which I have no opinion on are:

Genshi
Jinja2
Mako

and Django has a templating engine as well though I'm not sure how easy it would be to use without the rest of Django (which is a pretty full featured web framework)

Jinja is essentially a standalone version of Django's template system.

Genshi is far and away the best at generating XML, since it represents the template as a sequence of SAX events, but its plain-text templates are merely OK.

Mako is sort of a modernized PSP, which takes the PHP <% embedded code %> model and applies it to Python.

Cheetah is designed to be "fast", but in practice this isn't a useful goal because template rendering is almost never a bottleneck, and if it is then no template engine written in Python will be suitable. When I used Cheetah a few years ago, it was rather awkward to "do things right".

---

If you need a lot of fancy plain-text templates, and only limited XML templating, use Jinja. If you need basic plain-text but want to have fancy XML (xpath, xinclude, matching), use Genshi. If you need fancy text *and* XML, then use Jinja for text and Genshi for XML.
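None of the engines above ship with Python, so the sketch below swaps in the standard library's `string.Template` purely to show the shape of template-driven scaffolding. The file names, placeholders and the `render_scaffold` helper are all invented for the example; a real project would put Jinja or Genshi templates in their place.

```python
from string import Template

# Hypothetical scaffold: relative path template -> file body template.
SCAFFOLD = {
    "$plugin/__init__.py": '"""$plugin plugin by $author."""\n',
    "$plugin/plugin.xml": '<plugin name="$plugin" author="$author"/>\n',
}


def render_scaffold(scaffold, **values):
    """Return {rendered_path: rendered_contents} without touching disk."""
    rendered = {}
    for path_tmpl, body_tmpl in scaffold.items():
        path = Template(path_tmpl).substitute(values)
        rendered[path] = Template(body_tmpl).substitute(values)
    return rendered


files = render_scaffold(SCAFFOLD, plugin="demo", author="outlier")
for path in sorted(files):
    print(path)
```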

# ? May 31, 2012 16:30

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

Suspicious Dish posted:

So, quick nits: don't use readlines, iterating over a file directly will do what you want. I'd probably write it like this:
Python code:
def get_syllables(corpus):
    """
    Get all the unique syllables in corpus.
    """
    syllables = set()
    for line in corpus:
        for word in line.split():
            syllables.update(syllablize(word))
    return syllables

I have no doubt yours is better, but when I was very new to Python (and programming) every time I wanted to get uniques from a list with duplicates I'd do something like:

Python code:

list_of_uniques = dict.fromkeys(words).keys()

I was impressed with my self-perceived cleverness when I thought of that. So much so, I just kind of do that without thinking whenever I need uniques nowadays.
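For what it's worth, on Python 3.7 and later dicts keep insertion order, so the same trick also preserves the order in which words were first seen, which a set does not promise. A small sketch with made-up words:

```python
words = ["spam", "eggs", "spam", "ham", "eggs"]

# dict keys are unique, and since Python 3.7 dicts keep insertion order,
# so this drops duplicates while keeping first-seen order.
list_of_uniques = list(dict.fromkeys(words))

# A set gives the same members but makes no promise about order.
set_of_uniques = set(words)

print(list_of_uniques)
```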

# ? May 31, 2012 16:46

good jovi: Dec 11, 2000; 'm pro-dickgirl, and I VOTE!

Thermopyle posted:

Python code:

list_of_uniques = dict.fromkeys(words).keys()

When I first started out and wanted to check for a substring, I would do this:

Python code:

if haystack.find(needle) != -1:
    do_something()

I bet there are a few of these still lurking around certain codebases today.
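For anyone reading along, the idiomatic spelling is the `in` operator, which does a substring test on strings. A tiny sketch with made-up values:

```python
haystack = "the quick brown fox"
needle = "brown"

# Old habit: compare the index returned by find() against -1.
found_with_find = haystack.find(needle) != -1

# Idiomatic: `in` performs the same substring check.
found_with_in = needle in haystack

print(found_with_find, found_with_in)
```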

# ? May 31, 2012 16:56

Captain Capacitor: Jan 21, 2008; The code you say?

outlier posted:

In brief: what templating solution should I use?

Long version: I've got a project that generates a scaffold of folders and files (XML & Python code), basically to be filled in and used as a plugin for another program. Having written the code to generate this stuff, I realize that I should have used a templating solution. So, which one? Somewhere back in the thread, someone dissed Cheetah, which from my limited experience seemed to be fine and easy to get the hang of. (But then, I struggled with TAL and Zope Page templates for years.) Recommendations or criteria I should consider?

On top of the other suggestions in the thread, Paste scripts and templates are really popular for more filesystem-oriented templating.

# ? May 31, 2012 16:57

deimos: Nov 30, 2006; Forget it man this bat is whack, it's got poobrain!

Suspicious Dish posted:

So, quick nits: don't use readlines, iterating over a file directly will do what you want. I'd probably write it like this:
Python code:
def get_syllables(corpus):
    """
    Get all the unique syllables in corpus.
    """
    syllables = set()
    for line in corpus:
        for word in line.split():
            syllables.update(syllablize(word))
    return syllables

A small optimization, early, I know, but good to keep in mind for when you want to start scaling this poo poo to millions of lines:

Python code:

def get_syllables(corpus):
    """
    Get all the unique syllables in corpus.
    """
    syllables = set()
    syllables_update = syllables.update
    for line in corpus:
        for word in line.split():
            syllables_update(syllablize(word))
    return syllables

Also, for some workloads using a dict's keys is sometimes faster for uniqueness than turning things into a set, so this could be faster (while using more memory):

Python code:

def get_syllables(corpus):
    """
    Get all the unique syllables in corpus.
    """
    syllables = {}
    syllables_update = syllables.update
    for line in corpus:
        for word in line.split():
            syllables_update(dict.fromkeys(syllablize(word)))
    return syllables.keys()

e: Goddamnit you clever bastard:

Thermopyle posted:

I have no doubt yours is better, but when I was very new to Python (and programming) every time I wanted to get uniques from a list with duplicates I'd do something like:
Python code:
list_of_uniques = dict.fromkeys(words).keys()
I was impressed with my self-perceived cleverness when I thought of that. So much so, I just kind of do that without thinking whenever I need uniques nowadays.

deimos fucked around with this message at 17:04 on May 31, 2012

# ? May 31, 2012 17:00

Emacs Headroom: Aug 2, 2003

deimos posted:

Also, for some workloads using a dict's keys is sometimes faster for uniqueness than turning things into a set, so this could be faster (while using more memory):

Why is this? Aren't they both based on hash tables?

# ? May 31, 2012 17:07

Rocko Bonaparte: Mar 12, 2002; Every day is Friday!

Have any of you had to write your own variant on readline? I'm trying to use Stackless Python, and I want something that can read stdin, or later some other stream, without blocking on it. Personally I want to be able to make a green thread version of the Python REPL but it's really the input reading that is the killer. I need to be able to poll input and yield if there aren't any extra characters.

I started to work on it and got the basics so I know it is tenable, but there are a ton of corner cases. Consider backspace and delete, as well as the up arrow or down arrow. Also tab completion. With the cmd module it's easy enough to make something react to these keys but I have to be sure to write code to react to them.

So far I have only found system-specific ways to poll the keyboard and they make me sad. Does anybody have anything here?

# ? May 31, 2012 17:30

deimos: Nov 30, 2006; Forget it man this bat is whack, it's got poobrain!

Ridgely_Fan posted:

Why is this? Aren't they both based on hash tables?

I haven't tested this thoroughly (definitely not recently, and I ended up going with a set at the time anyways because it seemed clearer to me), but if I remember correctly when I looked at the C code I think the only big difference I noticed was in how they dealt with resizing.

One of the people that have actually touched the core code might be able to give more details (or call bullshit on me since I only did a naive test for the performance of this, so I may be wrong).

# ? May 31, 2012 17:40

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

deimos posted:

A small optimization, early, I know, but good to keep in mind for when you want to start scaling this poo poo to millions of lines:
Python code:
def get_syllables(corpus):
    """
    Get all the unique syllables in corpus.
    """
    syllables = set()
    syllables_update = syllables.update
    for line in corpus:
        for word in line.split():
            syllables_update(syllablize(word))
    return syllables

What is wrong with you? If you need something to go fast, use Cython or PyPy. Don't do something that will turn a LOAD_ATTR (set is a builtin class, so it has __slots__, which means that it's just doing a map lookup on an interned string to a slot location) into a LOAD_FAST.

deimos posted:

Also, for some workloads using a dict's keys is sometimes faster for uniqueness than turning things into a set

Nope. A set is pretty much a dictionary without an associated value.

# ? May 31, 2012 20:00

FoiledAgain: May 6, 2007

I had a pretty stupid question, so I'm glad it's generated some actually interesting discussion.

# ? May 31, 2012 20:41

deimos: Nov 30, 2006; Forget it man this bat is whack, it's got poobrain!

Suspicious Dish posted:

What is wrong with you? If you need something to go fast, use Cython or PyPy. Don't do something that will turn a LOAD_ATTR (set is a builtin class, so it has __slots__, which means that it's just doing a map lookup on an interned string to a slot location) into a LOAD_FAST.

Turning something that does a LOAD_FAST and a LOAD_ATTR into something that does just a LOAD_FAST.

e: PyPy optimizes the LOAD_ATTR to a LOAD_METHOD which is much faster (doing just the single LOAD_FAST is still very slightly faster, but definitely goes into the "not worth it" realm).

deimos fucked around with this message at 22:23 on May 31, 2012

# ? May 31, 2012 21:58

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Yeah, don't hand-optimize bytecode like that. If you want speed, use PyPy/Cython instead. Also, profile before doing that blindly, thinking it's going to make a difference.
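A minimal sketch of what "profile first" could look like with `timeit`, in Python 3. The `syllablize` here is a placeholder that just chops words into two-character chunks, and the corpus is invented, so the timings say nothing about the real workload; the point is only to measure both variants before deciding whether the hoisting is worth it.

```python
import timeit


def syllablize(word):
    # Placeholder splitter (an assumption, not the real function):
    # chop the word into two-character chunks.
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def plain(corpus):
    syllables = set()
    for line in corpus:
        for word in line.split():
            syllables.update(syllablize(word))
    return syllables


def hoisted(corpus):
    syllables = set()
    syllables_update = syllables.update   # cache the bound method
    for line in corpus:
        for word in line.split():
            syllables_update(syllablize(word))
    return syllables


corpus = ["the quick brown fox jumps over the lazy dog"] * 2000

for fn in (plain, hoisted):
    seconds = timeit.timeit(lambda: fn(corpus), number=5)
    print("%-8s %.3fs" % (fn.__name__, seconds))
```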

# ? May 31, 2012 22:40

Kungfoomasta: Apr 26, 2007; No user servicable parts inside.

I'm very new to Python. I'm writing a web application that will allow users to upload a file, and then allow that file to be downloaded one single time before being deleted from the server. I've got everything working except for the file deletion portion. So far, the only way I know how to present the file to the user for download is to generate a link that they can click. What I want is some way to indicate to the application that the file has finished being downloaded, since I don't want to delete the file while it's still being downloaded. I've looked at a bunch of examples of progress bars and things using urllib and urllib2, but they all appear to be intended for use in command line applications for downloading files. I'm actually trying to serve up the files with my application. So my question is - is there a way to indicate when a file download has completed?

Edit: ok scratch all that - I'm able to trigger the download now through the script, which will let me control when the delete occurs. What's happening now though is that my downloads are being mangled. If I create and upload a PNG, for example, the file is the correct size on the server, and if I open the file in Gimp on the server it displays, but when I download the file through my script, I'm unable to open the file and view the image. This is what I'm using for the download:

code:

def download_file():
	getfile = open(src,'rb')
	buffer = getfile.read()
	
	print "Content-Type:application/x-download\nContent-Disposition:attachment;filename=%s\nContent-Length:%s\n\n" %    (os.path.split(src)[-1], len(buffer))
	print buffer

download_file()

where src is the file including path. I thought that including the "rb" in open(src,'rb') meant to read it in as binary, but maybe I'm doing it wrong - does anyone know why this doesn't work?
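One likely culprit is `print` itself: it appends a newline to whatever it writes. The header string already ends in "\n\n", so the extra newline lands at the start of the body, and `print buffer` adds another at the end, which no longer matches the Content-Length. In Python 2, writing with `sys.stdout.write(...)` instead of `print` avoids the extra newlines. Below is a hedged sketch of the same idea in Python 3, where the binary stream is `sys.stdout.buffer`; `send_file` is a name invented for the example, and the demo uses an in-memory stream and a throwaway file rather than a real CGI environment.

```python
import io
import os
import tempfile


def send_file(src, out):
    """Write CGI headers and the raw bytes of src to the binary stream out."""
    with open(src, "rb") as getfile:
        body = getfile.read()

    headers = (
        "Content-Type: application/x-download\r\n"
        "Content-Disposition: attachment; filename=%s\r\n"
        "Content-Length: %d\r\n"
        "\r\n"                      # exactly one blank line ends the headers
        % (os.path.basename(src), len(body))
    )
    out.write(headers.encode("ascii"))
    out.write(body)                 # raw bytes, no trailing newline added
    out.flush()


# In a real CGI script on Python 3 this would be:
#     import sys
#     send_file(src, sys.stdout.buffer)

# Demonstration with an in-memory stream and a throwaway file.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"\x89PNG\r\n\x1a\n fake image bytes")
demo_out = io.BytesIO()
send_file(tmp.name, demo_out)
os.remove(tmp.name)

head, _, payload = demo_out.getvalue().partition(b"\r\n\r\n")
print(head.decode("ascii"))
```

Deleting the file after `send_file` returns would then happen once the bytes have been handed to the web server, which is as close as a CGI script gets to knowing the download finished.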

Kungfoomasta fucked around with this message at 17:03 on Jun 1, 2012

# ? Jun 1, 2012 15:45
