king salmon
Oct 30, 2011

by Cowcaster
Been doing some code golf type stuff and I've found myself using this pattern a lot:

Python code:

def iterate(f, x, condition):
	while condition(x):
		yield x
		x = f(x)

Is there anything in itertools that can duplicate this (in one line)?

king salmon fucked around with this message at 05:15 on Jul 25, 2014

ShadowHawk
Jun 25, 2000

CERTIFIED PRE OWNED TESLA OWNER
Python code:
>>> not all("men")
False
Thanks Python!

ShadowHawk
Jun 25, 2000

CERTIFIED PRE OWNED TESLA OWNER

king salmon posted:

Been doing some code golf type stuff and I've found myself using this pattern a lot:

Python code:

def iterate(f, x, condition):
	while condition(x):
		yield x
		x = f(x)

Is there anything in itertools that can duplicate this (in one line)?

Python code:
#!/usr/bin/env python3
import itertools
def iterate(f, x, condition):
    while condition(x):
        yield x
        x = f(x)

gen1 = iterate(lambda x: x**2, 2, lambda x: x <= 1000)
print(list(gen1)) # [2, 4, 16, 256]


gen2 = itertools.takewhile(lambda x: x <= 1000, itertools.accumulate(itertools.cycle((2,)), func=lambda x, y: x**2))
print(list(gen2)) # [2, 4, 16, 256]

Drunk Badger
Aug 27, 2012

Trained Drinking Badger
A Faithful Companion

Grimey Drawer
Are there any python IDEs that would allow me to debug step through a thread? I'm trying to do this with visual studio and the python plugin, but I can move to another IDE if it will work.

Right now I'm just stepping into the thread start, and then it just goes off on its own without stopping at breakpoints.

accipter
Sep 12, 2003

Drunk Badger posted:

Are there any python IDEs that would allow me to debug step through a thread? I'm trying to do this with visual studio and the python plugin, but I can move to another IDE if it will work.

Right now I'm just stepping into the thread start, and then it just goes off on its own without stopping at breakpoints.

I use the community version of PyCharm, which apparently supports debugging multiple threads. However, I have no experience doing so.

Drunk Badger
Aug 27, 2012

Trained Drinking Badger
A Faithful Companion

Grimey Drawer

accipter posted:

I use the community version of PyCharm, which apparently supports debugging multiple threads. However, I have no experience doing so.

I'll give it a try and report back if I can get it to work - thanks!

the
Jul 18, 2004

by Cowcaster
As I've posted before, I'm using Beatbox to make queries to a Salesforce database. These are being returned as dicts. Normally I've been grabbing the information from them and storing it in a list, but the query I'm doing now has some complicated results that I think should be saved to a dictionary. The results basically look like this:

code:
Breed        Adoption Info
             Name         Adopted
Collie       Sparky       True

Maltese      Rex          False
             Jenny        False
             Spike        True

Boxer

Bulldog      Wiggles      False
So a dict entry for this would look like:

code:
{'Breed': 'Collie', 'Adoption Info': [{'Name': 'Sparky', 'Adopted': True}]}
My questions are:

1. Is the best way to store all the information from this query to just append it to a blank dictionary that I create? I need to do stuff with it later (eliminate certain entries meeting a criteria).

2. How the hell do I just append dictionary entries to a blank dict? I tried dict.append(row) for row in query, but that's not working.

SurgicalOntologist
Jun 17, 2004

It sounds like you want a list of dictionaries. Although, this would be another case where using pandas would really help you. It has methods to create a DataFrame directly from database queries also.

Actually, you might want a dictionary with the breed as the key and the adoption info as the value. So it would look like
Python code:
{'Collie': [{'Name': 'Sparky', 'Adopted': True}]}
Furthermore, it might be easier to make adoption info a tuple (or to be real fancy, look up namedtuples).
Python code:
{'Collie': [('Sparky', True)]}
If you want to perfect the standard library approach instead of going with pandas, you might also consider using a defaultdict, which lets you avoid having to initialize the list for each breed.
Python code:
from collections import defaultdict, namedtuple

AdoptionInfo = namedtuple('AdoptionInfo', ('name', 'adopted'))

breeds = defaultdict(list)

for breed, adoption_info in query_result:
    breeds[breed].append(AdoptionInfo(adoption_info['Name'], adoption_info['Adopted']))

SurgicalOntologist fucked around with this message at 19:43 on Jul 25, 2014

vikingstrike
Sep 23, 2007

whats happening, captain
This is exactly the stuff that pandas and DataFrames are great for. You don't need to worry about dictionaries or whatever. If, for example, you need all of the "Collies", then you just use logical indexes and grab the information you need.
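For illustration, a minimal sketch of that kind of logical (boolean) indexing, assuming the query results have been flattened into one row per adoption record (the column names just mirror the example above):
Python code:
import pandas as pd

# Hypothetical flattened query results: one row per (Breed, Name, Adopted) record.
df = pd.DataFrame([
    {'Breed': 'Collie', 'Name': 'Sparky', 'Adopted': True},
    {'Breed': 'Maltese', 'Name': 'Rex', 'Adopted': False},
    {'Breed': 'Maltese', 'Name': 'Jenny', 'Adopted': False},
    {'Breed': 'Bulldog', 'Name': 'Wiggles', 'Adopted': False},
])

# Logical indexing: keep only the Collie rows.
collies = df[df['Breed'] == 'Collie']
print(collies)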

the
Jul 18, 2004

by Cowcaster

SurgicalOntologist posted:

It sounds like you want a list of dictionaries. Although, this would be another case where using pandas would really help you. It has methods to create a DataFrame directly from database queries also.

The module grabs queries 500 entries at a time. There are 8862. How do I store/append them to the dataframe during this process?

And, if you can answer this, you will make my life:

Is there a way, using Pandas, that I could return (in this example) every Breed that has exactly one Adoption Info attached to it (eliminating Maltese and Boxer from that list)?

SurgicalOntologist
Jun 17, 2004

the posted:

The module grabs queries 500 entries at a time. There are 8862. How do I store/append them to the dataframe during this process?

http://pandas.pydata.org/pandas-docs/stable/merging.html

the posted:

Is there a way, using Pandas, that I could return (in this example) every Breed that has exactly one Adoption Info attached to it (eliminating Maltese and Boxer from that list)?

Get the data in pandas and play with it. Maybe you can figure this out yourself? You've already posted an example of using groupby and count in pandas. That's step 1. That leaves two steps. 2: Find the breeds with count equal to one (hint: == 1). 3: Selectively index in the DataFrame (hint: http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing).

vikingstrike
Sep 23, 2007

whats happening, captain

the posted:

The module grabs queries 500 entries at a time. There are 8862. How do I store/append them to the dataframe during this process?

And, if you can answer this, you will make my life:

Is there a way, using Pandas, that I could return (in this example) every Breed that has exactly one Adoption Info attached to it (eliminating Maltese and Boxer from that list)?

Do you even look at the docs? Here's how you can concatenate DataFrames together: http://pandas.pydata.org/pandas-docs/stable/merging.html

For your second issue, you want to think about combining what you asked previously about getting group sizes and then using logical indexing: http://pandas.pydata.org/pandas-docs/stable/indexing.html
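For the sake of a concrete sketch (the batch helper and column names here are assumptions, not from the actual Beatbox code), the two pieces might look roughly like this:
Python code:
import pandas as pd

# Assume fetch_batches() yields each 500-record batch as a list of flat dicts (hypothetical helper).
frames = [pd.DataFrame(batch) for batch in fetch_batches()]
df = pd.concat(frames, ignore_index=True)  # one DataFrame for all 8862 rows

# Breeds with exactly one adoption record: group sizes, then boolean indexing.
counts = df.groupby('Breed').size()
single_breeds = counts[counts == 1].index
result = df[df['Breed'].isin(single_breeds)]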

the
Jul 18, 2004

by Cowcaster

SurgicalOntologist posted:

Get the data in pandas and play with it. Maybe you can figure this out yourself? You've already posted an example of using groupby and count in pandas. That's step 1. That leaves two steps. 2: Find the breeds with count equal to one (hint: == 1). 3: Selectively index in the DataFrame (hint: http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing).

Thanks for this. I have the information in a Dataframe now, and I'm going to look at that. I was basically just looking for a "Yes, that can be done with Pandas," because my initial "hard way" strategy would have been doing a bunch of complicated nested loops to check each entry for duplicates and eliminate or keep it if it met the criteria I was looking for.

SurgicalOntologist
Jun 17, 2004

the posted:

Thanks for this. I have the information in a Dataframe now, and I'm going to look at that. I was basically just looking for a "Yes, that can be done with Pandas," because my initial "hard way" strategy would have been doing a bunch of complicated nested loops to check each entry for duplicates and eliminate or keep it if it met the criteria I was looking for.

For the record though, this would also be very easy using the standard library approach:
Python code:
singular_breeds = {breed: dogs for breed, dogs in breeds.items() if len(dogs) == 1}
Also if you have a question that starts "is it possible to..." you should just assume the answer is yes and move on to the associated "how" question.

SurgicalOntologist
Jun 17, 2004

I thought I'd share: I'm in the process of breaking up a huge project I'm using to run motor-control experiments (interactive 2D graphics with interactions via motion-capture equipment). I'm taking each piece, adding the documentation and tests I should have written from the beginning, and releasing them separately. I did the first one yesterday:
https://github.com/hsharrison/pyglet2d

Whenever games come up in this thread, pygame is always mentioned. IMO, pyglet is a much nicer alternative. It uses callbacks instead of an event loop, which is not only convenient but also more flexible. It also doesn't have a website that looks like it's from 1995.
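As a rough idea of the callback style being described (a minimal sketch, not taken from pyglet2d itself):
Python code:
import pyglet

window = pyglet.window.Window()
label = pyglet.text.Label('hello', x=window.width // 2, y=window.height // 2)

# Rather than writing your own loop, you register callbacks for window events.
@window.event
def on_draw():
    window.clear()
    label.draw()

pyglet.app.run()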

The downside for us, and others I've heard, is that it doesn't have 2D primitives like rectangles. It does provide an interface for OpenGL calls, though. So, I've written a small module that acts as an interface between pyglet and a geometry library (Polygon3). So not only can you draw shapes, but you can also do complicated unions and intersects and whatnot to make complex shapes.

The next package will be for streaming position data from motion-capture equipment (a Python interface to the C library VRPN). The final piece will be some projective geometry calibration procedures that will turn any screen into a touchscreen, provided you have a 3D position feed for the user's hand.

ShadowHawk
Jun 25, 2000

CERTIFIED PRE OWNED TESLA OWNER
What's the most Pythonic way to write this sort of thing?
Python code:
gen = some_global_generator()

def foo():
    try:
        x = next(gen)
        return do_something(x)
    except StopIteration:
        return do_something_else()

Symbolic Butt
Mar 22, 2009

(_!_)
Buglord
I'd do something like this I guess:

Python code:
def foo():
    for x in gen:
        yield do_something(x)
    yield do_something_else()

ShadowHawk
Jun 25, 2000

CERTIFIED PRE OWNED TESLA OWNER

Symbolic Butt posted:

I'd do something like this I guess:

Python code:
def foo():
    for x in gen:
        yield do_something(x)
    yield do_something_else()
Thank you, that's indeed equivalent and very pythonic, though change those yields to returns to match the semantics of what I had.
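For reference, that change spelled out (same behaviour as the original try/except version, assuming do_something itself never raises StopIteration):
Python code:
def foo():
    for x in gen:
        return do_something(x)   # runs at most once, on the next item from gen
    return do_something_else()   # only reached once gen is exhausted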

QuarkJets
Sep 8, 2008

ShadowHawk posted:

Thank you, that's indeed equivalent and very pythonic, though change those yields to returns to match the semantics of what I had.

It kind of looks like you want a generator, though, so using yield would make sense

Symbolic Butt
Mar 22, 2009

(_!_)
Buglord
:doh:

Yeah, I was thinking it'd look good as a generator but then changed my mind halfway and forgot to use returns instead.

ShadowHawk
Jun 25, 2000

CERTIFIED PRE OWNED TESLA OWNER

QuarkJets posted:

It kind of looks like you want a generator, though, so using yield would make sense
Code is here if you wanna take a look: https://github.com/YokoZar/wine-model

In this case multiple functions are referencing the same generator in an unpredictable order, and it's important that there be only one such state.


I initially wrote it like 6 years ago and have been cleaning it up for the past few days. It's kind of amazing how much simpler my code can read now. I'm really loving Python (and Python3 in particular).

tef
May 30, 2004

-> some l-system crap ->

ShadowHawk posted:

What's the most Pythonic way to write this sort of thing?
Python code:
gen = some_global_generator()

def foo():
    try:
        x = next(gen)
        return do_something(x)
    except StopIteration:
        return do_something_else()

You could also do something like try: x = next(...) except StopIteration: ... else: return do_something(...), just in case do_something's __call__ method throws a spurious exception that the except would otherwise swallow.
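Spelled out, that suggestion might look something like this (a sketch, not the actual code):
Python code:
def foo():
    try:
        x = next(gen)
    except StopIteration:
        return do_something_else()
    else:
        # Only runs if next() succeeded, so an exception raised inside
        # do_something() can't be mistaken for the generator being exhausted.
        return do_something(x)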

(Unrelated rant about python's exception shadowing behaviour)

Symbolic Butt posted:

Python code:
def foo():
    for x in gen:
        yield do_something(x)
    yield do_something_else()
(Yeah, I was thinking it'd look good as a generator but then changed my mind halfway and forgot to use returns instead.)


On the whole I'd do a double take at a for with a return statement. I'd assume something was up. It's certainly the least clunky way to work on the first item of a generator.

The first may seem like more lines but it's plenty obvious what it does. I often see "What's the most pythonic way to do X?" with some simple, working code attached. The responses vary between missed idioms, and code golf. The most pythonic way to write code is writing simple obvious code without sneaky effects or tricks.

Alternative methods can include things like foo = itertools.islice(gen, 1) to get a one-element generator. Your code will probably end up looking clunky, but you can do something like

code:
x = tuple(itertools.islice(gen, 1))

if x:
    x = x[0]
    return do_something(x)
else:
    return do_something_else()

tef
May 30, 2004

-> some l-system crap ->

ShadowHawk posted:

Code is here if you wanna take a look: https://github.com/YokoZar/wine-model

In this case multiple functions are referencing the same generator in an unpredictable order, and it's important that there be only one such state.

Your question struck me as a bit of an X-Y problem, i.e "I'm using generators awkwardly, how do I mitigate it?". Generators are about streams of values, if you're only wanting the first item, something's a bit up.

quote:

I initially wrote it like 6 years ago and have been cleaning it up for the past few days. It's kind of amazing how much simpler my code can read now. I'm really loving Python (and Python3 in particular).

Re: "multiple functions are referencing the same generator in an unpredictable order". Sharing things is painful! What your code seems to have is a series of shared data structures which contain the state of your simulation, a bunch of generators over this state, and then a bunch of accessors to return these generators one by one. The problem I can imagine is that if you switch generator mid way through, bad things can happen.

There are lots of different ways to model it, but a hacky but useful way is to model it as a Project object. When you construct it, you tell it how many users, bugs, and features you want, and how to link them. This will contain the globals, and let you simulate multiple projects in your code.

Then this project object can be told to work on a given bug for a while, possibly updating the object in place. Along with methods which return the bugs by different sort orders. You then have "strategies" which basically get passed a Project, look at the bugs and tell it to work on one.

code:
p = Project(users=..., features=...)
while not p.complete():
    some_strategy(p)
    print(p.report())

def some_strategy(p):
    bug = random.choice(p.all_bugs())
    p.work_on(bug)
This is a very imperative way to model it, but since you're doing numerical simulation, eh, who cares.

Symbolic Butt
Mar 22, 2009

(_!_)
Buglord

tef posted:

The first may seem like more lines but it's plenty obvious what it does. I often see "What's the most pythonic way to do X?" with some simple, working code attached. The responses vary between missed idioms, and code golf. The most pythonic way to write code is writing simple obvious code without sneaky effects or tricks.

At least I didn't claim it was "pythonic" or anything. :v:

My thought process was that calling next is usually awkward compared to just looping over the iterator but you're right... I don't think I ever used a return in a for loop this way before, it just doesn't come up naturally. Mixing yields and returns is a hell of a thing.

tef
May 30, 2004

-> some l-system crap ->
(I'd probably do a for-return but wrap it up in a function called first() which returned an item or None, or leave a little comment to say "this is intentional")
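Something along these lines, presumably (a minimal sketch of that helper):
Python code:
def first(iterable, default=None):
    # Return the first item of `iterable`, or `default` if it's empty.
    for item in iterable:
        return item
    return default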

SurgicalOntologist
Jun 17, 2004

What's the proper way to package a Python library that requires a C library/application, that doesn't have a friendly installer? Let me know if this would be better in the Linux thread.

My process is:
1. Run sed a bunch of times on like 4 different Makefiles to make it work right.
2. Run make on these 4 different Makefiles in the proper order.
3. Move the whole thing to /opt.
4. Make a symbolic link to /usr/local/bin.
5. Run make install on the last Makefile, as far as I can tell this manually copies some files to /path/to/pythonlib/lib-dynload.

Basically I'd like to package this upstream package in a friendlier way, whether that's as a deb or on PyPi or whatever. I'm not sure how to set this up. I'm thinking steps 1-4 as a .deb. Then the PyPi package would just be a thin wrapper, that does what? Checks in /opt for the right directory and runs make install? Does this make sense or is there a better way?

It's not for widespread redistribution, but the users are not very Linux-savvy and I need it to be easy and not error-prone or I'll be playing tech support for a decade.

ShadowHawk
Jun 25, 2000

CERTIFIED PRE OWNED TESLA OWNER

SurgicalOntologist posted:

What's the proper way to package a Python library that requires a C library/application, that doesn't have a friendly installer? Let me know if this would be better in the Linux thread.

My process is:
1. Run sed a bunch of times on like 4 different Makefiles to make it work right.
2. Run make on these 4 different Makefiles in the proper order.
3. Move the whole thing to /opt.
4. Make a symbolic link to /usr/local/bin.
5. Run make install on the last Makefile, as far as I can tell this manually copies some files to /path/to/pythonlib/lib-dynload.

Basically I'd like to package this upstream package in a friendlier way, whether that's as a deb or on PyPi or whatever. I'm not sure how to set this up. I'm thinking steps 1-4 as a .deb. Then the PyPi package would just be a thin wrapper, that does what? Checks in /opt for the right directory and runs make install? Does this make sense or is there a better way?

It's not for widespread redistribution, but the users are not very Linux-savvy and I need it to be easy and not error-prone or I'll be playing tech support for a decade.
Are you familiar with the basics of debian packaging?

If you are, the modern way to make .debs out of Python libraries is Pybuild for debhelper.

Is it the python or the C library that doesn't have the proper installer? Which one are you grabbing from elsewhere?

ShadowHawk
Jun 25, 2000

CERTIFIED PRE OWNED TESLA OWNER
Thank you guys.

Symbolic Butt posted:

My thought process was that calling next is usually awkward compared to just looping over the iterator but you're right... I don't think I ever used a return in a for loop this way before, it just doesn't come up naturally. Mixing yields and returns is a hell of a thing.

tef posted:

Your question struck me as a bit of an X-Y problem, i.e "I'm using generators awkwardly, how do I mitigate it?". Generators are about streams of values, if you're only wanting the first item, something's a bit up.
Generators are the only way I know of to get a "function" that can preserve its local variables. The generators in this case do a significant amount of setup computation, which gets preserved. I suppose the alternative would be to do that setup computation as a separate function and then pass the result in every time as a parameter (or as another global variable), but that seems a bit wonky too. Maybe not so much when it becomes a class.
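Roughly the pattern being described, with made-up names (expensive_setup, pick_bug, and state are placeholders, not from wine-model):
Python code:
def bug_stream(state):
    priorities = expensive_setup(state)   # heavy setup, done once and preserved
    while state.open_bugs:
        # Shared state has to be re-checked on every next() call,
        # since other code may have changed it in the meantime.
        yield pick_bug(state, priorities)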

quote:

Re: "multiple functions are referencing the same generator in an unpredictable order". Sharing things is painful! What your code seems to have is a series of shared data structures which contain the state of your simulation, a bunch of generators over this state, and then a bunch of accessors to return these generators one by one. The problem I can imagine is that if you switch generator mid way through, bad things can happen.
Your description is completely right. In this case the generators need to be careful to recheck their state each time they're called. It's probably a bit unusual for a generator to yield two different things given the state of the program changing between next() calls. That is a bit awkward, but on the other hand it saves having to have a completely separate data structure do the computation and pass it into whatever replaces the "next" function.

quote:

There are lots of different ways to model it, but a hacky but useful way to model it as a Project object. When you construct it, you tell it how many users, bugs, features you want, and how to link them. This will contain the globals, and let you simulate multiple projects in your code.

Then this project object can be told to work on a given bug for a while, possibly updating the object in place. Along with methods which return the bugs by different sort orders. You then have "strategies" which basically get passed a Project, look at the bugs and tell it to work on one.

code:
p = Project(users=..., features=...)
while not p.complete():
    some_strategy(p)
    print(p.report())

def some_strategy(p):
    bug = random.choice(p.all_bugs())
    p.work_on(bug)
This is a very imperative way to model it, but since you're doing numerical simulation, eh, who cares.
This is a good idea as it would easily allow, e.g., two projects with identical setup to be compared side-by-side as we test competing strategies. I don't think that's incompatible with the current generator design though; they would just move to be within the scope of the project object.

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug

ShadowHawk posted:

Generators are the only way I know of to get a "function" that can preserve its local variables. The generators in this case do a significant amount of setup computation, which gets preserved. I suppose the alternative would be to do that setup computation as a separate function and then pass the result in every time as a parameter (or as another global variable), but that seems a bit wonky too. Maybe not so much when it becomes a class.

I would definitely structure this as a class whose instances are callable; "combining data and behavior" is one of the fundamental principles of object-oriented programming.

Very basic example:
Python code:
In [1]: class Test:
   ...:     def __init__(self):
   ...:         self.count = 0
   ...:     def __call__(self, i):
   ...:         self.count += 1
   ...:         return i + self.count
   ...:     

In [2]: t = Test()

In [3]: t(4)
Out[3]: 5

In [4]: t(3)
Out[4]: 5

In [5]: t(2)
Out[5]: 5

In [6]: t(10)
Out[6]: 14

SurgicalOntologist
Jun 17, 2004

ShadowHawk posted:

Are you familiar with the basics of debian packaging?

If you are, the modern way to make .debs out of Python libraries is Pybuild for debhelper.

Is it the python or the C library that doesn't have the proper installer? Which one are you grabbing from elsewhere?

I am not familiar with debian packaging but I just started reading up on it for this project.

There are really three pieces. The C library (/application) comes with a low-level Python interface, and I've written a high-level Python interface that relies on the low-level interface.

It's the C library and its low-level Python interface that don't have a proper installer and instead have you edit 4 Makefiles (and I'm also wgetting a zip from upstream). The first 3 Makefiles and the symbolic link into /usr/local/bin are for the C library; the fourth (the one that requires a make install) is for the included Python interface.

Afterward I can install my high-level interface the usual way since it's pure Python. Does it make sense to package the upstream stuff as a .deb and my own package just the usual Python way?

One potential issue with using a .deb is the python parts will need to be installed into separate virtualenvs on the same machine. So maybe the C library as a .deb, and have it install a deploy command that simply runs make install? (I've edited the Makefile to have it install the low-level Python interface to the current virtualenv)

SurgicalOntologist fucked around with this message at 20:37 on Jul 28, 2014

ShadowHawk
Jun 25, 2000

CERTIFIED PRE OWNED TESLA OWNER

SurgicalOntologist posted:

I am not familiar with debian packaging but I just started reading up on it for this project.

There are really three pieces. The C library (/application) comes with a low-level Python interface, and I've written a high-level Python interface that relies on the low-level interface.

It's the C library and its low-level Python interface that don't have a proper installer and instead have you edit 4 Makefiles (and I'm also wgetting a zip from upstream). The first 3 Makefiles and the symbolic link into /usr/local/bin are for the C library; the fourth (the one that requires a make install) is for the included Python interface.

Afterward I can install my high-level interface the usual way since it's pure Python. Does it make sense to package the upstream stuff as a .deb and my own package just the usual Python way?

One potential issue with using a .deb is the python parts will need to be installed into separate virtualenvs on the same machine. So maybe the C library as a .deb, and have it install a deploy command that simply runs make install? (I've edited the Makefile to have it install the low-level Python interface to the current virtualenv)
If you're worried about support issues I'd definitely go with at least a .deb cause that way it can be removed (or upgraded!) cleanly.

Do these python virtualenvs sometimes not want the low-level python library? If it was just set as a proper system library it would be available by default with no need for a deployment script run each time, right?

SurgicalOntologist
Jun 17, 2004

ShadowHawk posted:

If you're worried about support issues I'd definitely go with at least a .deb cause that way it can be removed (or upgraded!) cleanly.

Do these python virtualenvs sometimes not want the low-level python library? If it was just set as a proper system library it would be available by default with no need for a deployment script run each time, right?

No, unless the use-system-site-packages flag is used when creating the virtualenv, which is not a good idea for us.

Anyways it's not desirable to have the same system package in every environment because if/when it gets upgraded we will inevitably start having the situation where different environments want a different version, which is the whole reason to use environments.

Which actually, my plan wouldn't support either. Each env would have a specific version of the make install part but a system-wide version of the C application part (in /opt and the symlink). Ugh. I think I'll package the entire upstream bundle as a conda package then, that should work.

Nippashish
Nov 2, 2005

Let me see you dance!

SurgicalOntologist posted:

No, unless the use-system-site-packages flag is used when creating the virtualenv, which is not a good idea for us.

Anyways it's not desirable to have the same system package in every environment because if/when it gets upgraded we will inevitably start having the situation where different environments want a different version, which is the whole reason to use environments.

Which actually, my plan wouldn't support either. Each env would have a specific version of the make install part but a system-wide version of the C application part (in /opt and the symlink). Ugh. I think I'll package the entire upstream bundle as a conda package then, that should work.

I usually solve this by having an env folder in my project root that I install software into. I end up with a structure like my_project/env/lib and my_project/env/bin and whatever else the installers create. It's usually pretty easy to coerce software to install like this, although sometimes you need to fiddle around a bit if you need to install multiple things that depend on each other (e.g. telling the linker for project B to look in my_project/env/lib to find lib A). I also have a bash script in the project root that rebuilds everything in env from scratch, so if something gets messed up in my environment I can just nuke the whole thing and start over.

My setup sounds more or less like what you were planning on doing, except I don't ever put things into system folders.

SurgicalOntologist
Jun 17, 2004

Yeah, and that's pretty much what conda does too with its environments. So I just have to tweak the directories and it should be fine. I might try to package it on binstar or say gently caress it and just distribute a shell script that puts everything in the right place.

Literally Elvis
Oct 21, 2013

I was hoping I could get some feedback on this hyper-simple script: https://github.com/LiterallyElvis/Porktrack/blob/master/scripts/youtube_get.py

It was originally part of what I used to get data for a goofy website I made, but I decided it might be useful as an individual script for somebody else, so I segmented it enough that you could import it in your own project and use it there. I've scanned it using PEP8, added a header, and added docstrings to my function declarations because I've sort of seen that used elsewhere, and I thought it would be a good idea to add it to my stuff even though it may be much simpler.

Zarithas
Jun 18, 2008

Literally Elvis posted:

I was hoping I could get some feedback on this hyper-simple script: https://github.com/LiterallyElvis/Porktrack/blob/master/scripts/youtube_get.py

It was originally part of what I used to get data for a goofy website I made, but I decided it might be useful as an individual script for somebody else, so I segmented it enough that you could import it in your own project and use it there. I've scanned it using PEP8, added a header, and added docstrings to my function declarations because I've sort of seen that used elsewhere, and I thought it would be a good idea to add it to my stuff even though it may be much simpler.

A few things:

  • Prefer snake_case over camelCase
  • Be consistent with your use of single or double quotation marks. No one really cares which you use, but you should pick one and stick with it. The only exception is docstrings for functions; most agree that those look nicer when they're always double quotation marks.
  • Most of the time, developers and libraries only care about __version__, and rarely, __author__. You can omit the rest.
  • Prefer using at most 2 line breaks after different blocks of code. You use 3 after your first function. It's debatable whether 3 line breaks are okay to separate very distinct sections, like between imports and your first function. I do it sometimes but usually stick with 2.
  • Use the requests library, it will not only handle UTF-8 for you but is superior to the standard lib's HTTP modules in every way. You can change retrieveLink's (or preferably retrieve_link's) body to "return requests.get(url).text"
  • Don't feel bad about chaining function calls or method calls, as long as the line still remains 79 characters or less.
  • If you want to strip all whitespace characters (spaces, tabs, newlines, etc.) before and after a string, use " string ".strip() instead of .replace(). If you really want to remove newlines from anywhere inside a string, you have it right. Not sure which you intended.
  • Prefer string formatting ("%s" or "{}".format()) over concatenation. In your case since you're just concatenating some things at the end, it doesn't really matter either way.
  • Prefer str.index() over str.find(). str.index() raises an exception when it can't find something and str.find() just returns -1. You generally want your functions to throw exceptions when they fail.
  • Prefer regex over multiple string finds and indices. You can replace all your lines that build up video_id with a single re.search() call, for example. Regex is overkill in many cases but this is a good case to use it when compared to str.find()/str.index(). In a case like this it's best not to use substring finding + indexing at all.
  • When trying to parse HTML, you usually want to use a full HTML parser. The better alternative to substring finding or regexes in this case is to use an HTML parser. I recommend BeautifulSoup with lxml, or just plain lxml.

Other than that the code is good. Python is really big on style nitpicking and "The Right Way to Do It" though, so it's good to learn and practice some of these idioms. Also, all of the libraries I mention are third party ones. You can find them on Google/GitHub/PyPI (almost always all 3).

Here's an example version using regular expressions (aka regex, accessible in the re module) to find the video ID string and to strip away all the crap after it that you don't need:

http://bpaste.net/show/523966/
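The gist of that regex approach, roughly (the URL format and the 11-character ID pattern here are assumptions, not taken from the actual script):
Python code:
import re

def get_video_id(url):
    """Pull the video ID out of a YouTube watch URL, or return None."""
    match = re.search(r"v=([\w-]{11})", url)
    return match.group(1) if match else None

print(get_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=related"))
# dQw4w9WgXcQ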

Zarithas fucked around with this message at 00:39 on Jul 30, 2014

EAT THE EGGS RICOLA
May 29, 2008

Zarithas posted:

  • Prefer string formatting ("%s" or "{}".format()) over concatenation. In your case since you're just concatenating some things at the end, it doesn't really matter either way.
I've seen lots of more recent stuff use "{}".format() over "%s", is that just the area I live in/the github stuff I work with?

Zarithas
Jun 18, 2008

EAT THE EGGS RICOLA posted:

I've seen lots of more recent stuff use "{}".format() over "%s", is that just the area I live in/the github stuff I work with?
Python 3 plans to eventually deprecate "%s", and Guido recommends str.format() over the old syntax. It actually may be technically deprecated already, I can't remember. But it hasn't been removed from Python 3 yet.

Personally I still use the old syntax a lot, and .format only in cases where I can use it to write less code. For example, if I want to repeat a variable:

code:
animal = "cat"

print "My %s is a %s" % (animal, animal)

# Or

print "My {0} is a {0}.format(animal)
It also provides a nicer syntax when you want to use a dict containing your format parameters.

code:
d = {"color": "red", "age": 25}

print "My favorite color is {color} and I am {age} years old".format(**d)
There are a few other reasons to prefer it as well. You can find some examples here: http://stackoverflow.com/questions/5082452/python-string-formatting-vs-format

Generally speaking, if something in Python 3 has replaced or will eventually replace a construct in Python 2.x, then even 2.x devs will begin using it too in favor of the old ones. At least when it's possible for them to do so. The only exception to that would probably be print as a function instead of a statement.

Zarithas fucked around with this message at 00:40 on Jul 30, 2014

Literally Elvis
Oct 21, 2013

Zarithas posted:

A few things:
  • Prefer snake_case over camelCase
  • Be consistent with your use of single or double quotation marks. No one really cares which you use, but you should pick one and stick with it. The only exception is docstrings for functions; most agree that those look nicer when they're always double quotation marks.
  • Most of the time, developers and libraries only care about __version__, and rarely, __author__. You can omit the rest.
  • Don't feel bad about chaining function calls or method calls, as long as the line still remains 79 characters or less.
  • Prefer string formatting ("%s" or "{}".format()) over concatenation. In your case since you're just concatenating some things at the end, it doesn't really matter either way.
  • Prefer str.index() over str.find(). str.index() raises an exception when it can't find something and str.find() just returns -1. You generally want your functions to throw exceptions when they fail.

I'm cool with all these. Didn't notice the inconsistent quotations thing, so I've changed that and dumped the headers. I've also implemented the .format() thing. I'll look into the others later.

Zarithas posted:

  • Prefer using at most 2 line breaks after different blocks of code. You use 3 after your first function. It's debatable whether 3 line breaks are okay to separate very distinct sections, like between imports and your first function. I do it sometimes but usually stick with 2.
  • Use the requests library, it will not only handle UTF-8 for you but is superior to the standard lib's HTTP modules in every way. You can change retrieveLink's (or preferably retrieve_link's) body to "return requests.get(url).text"
  • If you want to strip all whitespace characters (spaces, tabs, newlines, etc.) before and after a string, use " string ".strip() instead of .replace(). If you really want to remove newlines from anywhere inside a string, you have it right. Not sure which you intended.
  • When trying to parse HTML, you usually want to use a full HTML parser. The better alternative to substring finding or regexes in this case is to use an HTML parser. I recommend BeautifulSoup with lxml, or just plain lxml.

  • I used two blank lines before functions because I thought that's what the PEP8 spec said I should do. I've seen that style used elsewhere, and while I agree with you that two new lines might not be necessary, I thought it'd be better to be PEP8 compliant than stubborn. (Looking up the blank line section led me to realize/eliminate blank lines within my functions, though, so that's good.)
  • Admittedly my use of urllib2 is vestigial from when I was first learning how to do things in this language. This whole project is sort of my first, and I've been more or less reiterating through it to fix things or make them more pythonic or whatever. I'll look into requests though, it definitely seems easier.
  • I feel like I'm using replace correctly here. I need to replace "&" with "and", spaces with plus signs, and eliminate newlines (which you said replace is fine for).
  • I didn't think of using a full HTML parser because I only need a very small portion of the HTML. I do use the HTMLParser package in another portion of the file, so I'll look into the things you suggested.

Zarithas posted:

  • Prefer regex over multiple string finds and indices. You can replace all your lines that build up video_id with a single re.search() call, for example. Regex is overkill in many cases but this is a good case to use it when compared to str.find()/str.index(). In a case like this it's best not to use substring finding + indexing at all.
But regex is...

Zarithas posted:

Here's an example version using regular expressions (aka regex, accessible in the re module) to find the video ID string and to strip away all the crap after it that you don't need:

http://bpaste.net/show/523966/
okay, that's pretty cool.

FoiledAgain
May 6, 2007

Zarithas posted:

Generally speaking, if something in Python 3 has replaced or will eventually replace a construct in Python 2.x, then even 2.x devs will begin using it too in favor of the old ones. At least when it's possible for them to do so. The only exception to that would probably be print as a function instead of a statement.

From 2.6 onward, you can do from __future__ import print_function, which is suggested in the python documentation.
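For example:
Python code:
from __future__ import print_function  # available since Python 2.6

# print is now a function, so the Python 3 call syntax works in Python 2 as well.
print("spam", "eggs", sep=", ")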
