Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

Nippashish posted:

Do you think having sum in the standard library was a mistake?

In some ways, yes. I'd expect sum to act like:

Python code:
def my_sum(iterable, start=0):
    result = start
    for i in iterable:
        result += i
    return result
But it doesn't. Again, this is tricky edge case behavior sort of stuff:

Python code:
list_of_ints = [[i] for i in xrange(5)]
empty = []
print sum(list_of_ints, empty), empty
# [0, 1, 2, 3, 4], []

print my_sum(list_of_ints, empty), empty
# [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]
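For anyone on Python 3 (range instead of xrange, print as a function), here is the same demonstration. I've renamed list_of_ints to list_of_lists because each element is a one-item list; otherwise it's the code above. The difference comes from the builtin sum() combining values with plain +, which builds new lists, while my_sum's += extends the start list in place.

```python
def my_sum(iterable, start=0):
    # Accumulates with +=, which mutates `start` in place when it is a list.
    result = start
    for item in iterable:
        result += item
    return result


list_of_lists = [[i] for i in range(5)]

empty = []
total = sum(list_of_lists, empty)  # builtin sum uses +, so `empty` is untouched
print(total, empty)                # [0, 1, 2, 3, 4] []

empty = []
total = my_sum(list_of_lists, empty)  # += extends `empty` itself
print(total, empty)                   # [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
```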


Nippashish
Nov 2, 2005

Let me see you dance!
I'm not buying this edge case argument. Literally every function ever written has edge cases.

BeefofAges posted:

Just write a library that spins up a hadoop cluster to do your prod operation via mapreduce, then make it part of the standard library.

I think this is the optimal solution.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

Nippashish posted:

I'm not buying this edge case argument. Literally every function ever written has edge cases.

If you write a*b yourself, you can define what edge cases there are, as opposed to having to compromise on the same edge cases for everybody.

BeefofAges
Jun 5, 2004

Cry 'Havoc!', and let slip the cows of war.

Suspicious Dish posted:

If you write a*b yourself, you can define what edge cases there are, as opposed to having to compromise on the same edge cases for everybody.

That's a silly argument. Going by that logic, we might as well not have a standard library at all. Everyone can just implement their own without having to compromise.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
To me, prod and sum are such simple things that there isn't a benefit in making them builtins. I'm in favor of builtins that are non-trivial to implement and have clear meanings.

Nippashish
Nov 2, 2005

Let me see you dance!

Suspicious Dish posted:

To me, prod and sum are such simple things that there isn't a benefit in making them builtins. I'm in favor of builtins that are non-trivial to implement and have clear meanings.

Ugh, you are the reason C++ added concurrency to the standard library but didn't add semaphores, and also the reason why boost.ublas doesn't have determinants or inverses. Python is supposed to be batteries included.

Dominoes
Sep 20, 2007

Hey dudes, how do you deserialize a timezone-aware datetime object? PyYAML saves them correctly in ISO format but, oddly, drops the timezone info when loading. Using str(my_datetime_object) makes a correct ISO string, but the datetime module has no clean way to convert it back to a datetime object (strftime has no ISO-compatible timezone format).

dateutil.parser.parse does something weird that's still not right:

code:
In [113]: x
Out[113]: datetime.datetime(2014, 2, 15, 21, 58, 25, 866385, tzinfo=<DstTzInfo 'Europe/Athens' EET+2:00:00 STD>)

In [114]: str(x)
Out[114]: '2014-02-15 21:58:25.866385+02:00'

In [115]: dateutil.parser.parse(str(x))
Out[115]: datetime.datetime(2014, 2, 15, 21, 58, 25, 866385, tzinfo=tzoffset(None, 7200))
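Worth noting: dateutil's result, tzoffset(None, 7200), is the same instant with the same +02:00 offset. What's lost is the zone name, because the ISO string never contained "Europe/Athens" in the first place. On Python 3.7+ the standard library can parse that string itself with datetime.fromisoformat. The sketch below uses a fixed +02:00 offset as a stand-in for the Athens zone; if you need DST-aware zone rules back after loading, you'd have to store the zone name separately and re-attach it (with pytz, or zoneinfo on 3.9+).

```python
from datetime import datetime, timedelta, timezone

# A fixed +02:00 offset stands in for Europe/Athens: the ISO string can
# only ever carry the offset, never the zone name.
original = datetime(2014, 2, 15, 21, 58, 25, 866385,
                    tzinfo=timezone(timedelta(hours=2)))

text = str(original)                    # '2014-02-15 21:58:25.866385+02:00'
parsed = datetime.fromisoformat(text)   # Python 3.7+; accepts the space separator

print(parsed.utcoffset())   # 2:00:00
print(parsed == original)   # True
```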

Crosscontaminant
Jan 18, 2007

Nippashish posted:

Python is supposed to be batteries included.
That's not necessarily a good thing. PyPI is mature, and in a lot of cases it makes for much better batteries than the standard library. "Batteries included" is hollow when you need to know which ones are good.

a dog from hell
Oct 18, 2009

by zen death robot
I'm trying to teach myself Python. Right now I'm doing the Pig Latin lesson in codecademy but I'm doing it in Python 3. Python is not printing a vowel when I input a vowel and I don't know why. The code is probably ugly but I'm just following along with the lesson *shrug*

Sorry for being a noob:

code:
print("Welcome to the English to Pig Latin translator")
original = input("Please enter an English word")
if len(original) > 0:
    print(original.lower())
    if first[0] == ("a", "e", "i", "o", "u"):
        print("Vowel.")
    else:
        print("Consonant.")
else:
    print("Not a valid word.")
if original.isalpha():
    print(original.lower())
else:
    print("Not a valid word.")

a dog from hell fucked around with this message at 18:51 on Feb 16, 2014

Crosscontaminant
Jan 18, 2007

Use code tags for code, they don't obliterate your indentation.

The error is probably here:
Python code:
if first[0] == ("a", "e", "i", "o", "u"):
You want the in operator.

e: why do quote tags require quotes around the optional attributes while code tags require no quotes?

Crosscontaminant fucked around with this message at 18:35 on Feb 16, 2014

Lyon
Apr 17, 2003

Splurgerwitzl posted:

I'm trying to teach myself Python. Right now I'm doing the Pig Latin lesson in codecademy but I'm doing it in Python 3. Python is not printing a vowel when I input a vowel and I don't know why. The code is probably ugly but I'm just following along with the lesson *shrug*

Sorry for being a noob:

code:
print("Welcome to the English to Pig Latin translator")
original = input("Please enter an English word")
if len(original) > 0:
    print(original.lower())
    if first[0] == ("a", "e", "i", "o", "u"):
        print("Vowel.")
    else:
        print("Consonant.")
else:
    print("Not a valid word.")
if original.isalpha():
    print(original.lower())
else:
    print("Not a valid word.")


Also first isn't defined. It should be original[0] which will give you the first letter of the string the user types in. So variable name and the in operator should fix you up.

I imagine originally there was a variable named first that was set equal to original[0] and then you performed your if check on the variable first. Or maybe it was just a transcription error!

Lyon fucked around with this message at 19:39 on Feb 16, 2014
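Putting both fixes together (define first from original[0], then test membership with in instead of ==), a corrected version of the lesson's check might look like this. classify and run are hypothetical helpers I've added so the logic can be tested without typing input; the lesson itself just runs the statements at the top level.

```python
VOWELS = "aeiou"


def classify(word):
    # Hypothetical helper wrapping the lesson's checks.
    if len(word) > 0 and word.isalpha():
        first = word[0].lower()   # define `first` before using it
        if first in VOWELS:       # membership test, not ==
            return "Vowel."
        return "Consonant."
    return "Not a valid word."


def run():
    # Interactive version of the lesson; call run() at the bottom of the script.
    print("Welcome to the English to Pig Latin translator")
    original = input("Please enter an English word: ")
    print(original.lower())
    print(classify(original))
```

Using a string for VOWELS works because first is a single character; the tuple ("a", "e", "i", "o", "u") from the original works just as well with in.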

QuarkJets
Sep 8, 2008

Crosscontaminant posted:

That's not necessarily a good thing. PyPI is mature, and in a lot of cases it makes for much better batteries than the standard library. "Batteries included" is hollow when you need to know which ones are good.

No matter where you look, you're going to find some 3rd party functions that do certain things better than a language's standard library. That doesn't make the standard library superfluous or bad in some way, and you don't "need to know" about the better 3rd party functions. By your logic we may as well not have a standard library at all.

I haven't seen any convincing arguments for the standard library not having a prod() capability like sum().

a dog from hell
Oct 18, 2009

by zen death robot

Lyon posted:

Also first isn't defined. It should be original[0] which will give you the first letter of the string the user types in. So variable name and the in operator should fix you up.

I imagine originally there was a variable named first that was set equal to original[0] and then you performed your if check on the variable first. Or maybe it was just a transcription error!
Yah that was my error before copying it into my post. I tried to change a couple things and forgot to revert it back.

Daynab
Aug 5, 2008

Lysidas posted:

The faulthandler module is designed for exactly this purpose -- you can enable it and give a file handle as the file argument, and it'll dump a traceback if the interpreter crashes.

I've found it very useful in debugging extension modules written in C, since the traceback actually includes function calls in native code and it's far more helpful than "Segmentation fault (core dumped)" as the only output of the program.

Sorry for the late reply. I tried this, but it seems like the only thing that gets outputted is something like:

Fatal Python error: Segmentation fault

Current thread 0x00001c24:

I have no idea what that even means or how I would figure out where it's from.
Also the pointer(?) changes every time it seems.

Daynab fucked around with this message at 00:30 on Feb 17, 2014
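For reference, a minimal faulthandler setup looks something like this (the log path is a placeholder). The hex number after "Current thread" is the thread identifier, not a pointer, so it changing between runs is expected. If no frames follow it, one possible reading is that the crashing thread had no Python frames at that moment; passing all_threads=True at least shows what the other Python threads were doing.

```python
import faulthandler

# Hypothetical log path. Keep the file object open for the whole run:
# faulthandler writes straight to its file descriptor when the crash
# happens, so a closed file means no report.
crash_log = open("crash.log", "w")

# all_threads=True dumps the Python stack of every thread, not just the
# one that received the fatal signal.
faulthandler.enable(file=crash_log, all_threads=True)

# ... import and run the code that segfaults here ...

# To check the setup without crashing, request a dump on demand; it writes
# the traceback part in the same format a crash report uses.
faulthandler.dump_traceback(file=crash_log, all_threads=True)
```

The same handler can also be switched on without touching the code, via python -X faulthandler script.py or the PYTHONFAULTHANDLER environment variable (those write to stderr).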

John DiFool
Aug 28, 2013

Is there a good Python library for quickly and easily creating an object that will be primarily stored in a database?

I guess what I'm looking for is an ORM, and I've found a list here: https://wiki.python.org/moin/HigherLevelDatabaseProgramming

any recommendations on which one I should try out?

Feel free to educate me further, my database knowledge is not great by any means, but I understand the basics at least.

Lurchington
Jan 2, 2003

Forums Dragoon
Based on that list, SQLAlchemy is the most commonly used and best documented.

If you're about to start a new web server application, Django provides a very good ORM as well, and Django with its built-in ORM is probably the best choice there: it's relatively simple and very well documented (SQLAlchemy's docs are better than average, but definitely not in the same tier).

My Rhythmic Crotch
Jan 13, 2011

Peewee is very quick to learn. Probably not as powerful as SQLAlchemy.
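To see what these libraries automate, here is a hand-rolled sketch using only the standard library's sqlite3 module: one class per table, a save that INSERTs and records the generated primary key, and a query that turns rows back into objects. The Book model and its columns are made up for illustration, and this is not how SQLAlchemy or Peewee work internally; they generate this kind of SQL for you, plus relationships, schema creation and query building.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Book:
    # Hypothetical model: the class maps to a table, fields map to columns.
    title: str
    author: str
    id: Optional[int] = None


def create_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS book ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, author TEXT NOT NULL)"
    )


def save(conn, book):
    # INSERT the row and copy the generated primary key back onto the object.
    cur = conn.execute(
        "INSERT INTO book (title, author) VALUES (?, ?)", (book.title, book.author)
    )
    book.id = cur.lastrowid
    return book


def find_by_author(conn, author):
    # SELECT rows and turn each one back into a Book instance.
    rows = conn.execute(
        "SELECT title, author, id FROM book WHERE author = ?", (author,)
    )
    return [Book(*row) for row in rows]


conn = sqlite3.connect(":memory:")
create_table(conn)
save(conn, Book("Dune", "Frank Herbert"))
print(find_by_author(conn, "Frank Herbert"))
```

With a file-backed database you would also need conn.commit() after writes.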

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
e: nm, Elixir is crazy old and not maintained anymore!

Crosscontaminant
Jan 18, 2007

QuarkJets posted:

No matter where you look, you're going to find some 3rd party functions that do certain things better than a language's standard library. That doesn't make the standard library superfluous or bad in some way
I agree. What makes the standard library bad is the fact that the standard library is (at least in parts) bad. Aside from warts like urllib, httplib and asyncore, there are more than a few good modules with a separate existence on PyPI which could be trivially removed from the standard library entirely and replaced with a single documentation page saying "use this module, it's best-of-breed", saving everyone the try: import json; except ImportError: import simplejson as json dance. If you have to put argparse in your requirements.txt file anyway, what's the point of it being added to the standard library?
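The "dance" in question, for anyone who hasn't seen it: json has been in the standard library since Python 2.6, and code that still supports older interpreters falls back to the simplejson package from PyPI, which exposes the same dumps/loads interface.

```python
try:
    import json  # standard library since Python 2.6
except ImportError:
    import simplejson as json  # PyPI package with the same API, for older Pythons

print(json.dumps({"a": [1, 2]}))
```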

QuarkJets posted:

I haven't seen any convincing arguments for the standard library not having a prod() capability like sum()
For what it's worth I agree - either have both of them or have neither (since they're trivially implemented in terms of reduce, as are all and any).
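The reduce point is easy to see in code. These are illustrative sketches, not how the builtins are implemented: sum, any and all are written in C, and the real any/all stop as soon as the answer is known, while these versions always consume the whole iterable.

```python
from functools import reduce
import operator


def reduce_sum(iterable, start=0):
    # Same shape as the builtin sum(): fold with +, starting from `start`.
    return reduce(operator.add, iterable, start)


def reduce_prod(iterable, start=1):
    # The missing counterpart: fold with *, starting from 1.
    return reduce(operator.mul, iterable, start)


def reduce_all(iterable):
    # No short-circuit: keeps iterating even after a falsy item.
    return reduce(lambda acc, x: acc and bool(x), iterable, True)


def reduce_any(iterable):
    # No short-circuit: keeps iterating even after a truthy item.
    return reduce(lambda acc, x: acc or bool(x), iterable, False)
```

For what it's worth, Python 3.8 later added math.prod, so on newer versions the builtin-library question is settled.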

SurgicalOntologist
Jun 17, 2004

Crosscontaminant posted:

you have to put argparse in your requirements.txt file anyway

Wait, what? Why?


In other news, I have some questions about testing. My library is coming along nicely; all that's left to do before I upload it to PyPI is testing and documentation. I've got most things tested, but there are two things I can't figure out:
1. The command-line-interface. Should I use subprocess to call my own __main__ script? It seems like there should be a better way.
2. The pickling/unpickling interface. The big issue here is that I use a terrible hack to pickle functions passed as attributes, which is to figure out where they live from sys.argv[0] and then upon unpickling re-import them from there. The problem is that when I'm testing, sys.argv[0] is py.test instead of the script where my functions are defined. The only solution I can think of is to write a separate test script and call it with subprocess so it's as if it was called from the command-line, but again, it feels like there should be a better way.

(The two are really tied together, since to test the CLI I'd want to pickle my data beforehand then unpickle it after to run all the asserts)

Crosscontaminant
Jan 18, 2007

SurgicalOntologist posted:

Wait, what? Why?
Python 2 backcompat.

SurgicalOntologist posted:

1. The command-line-interface. Should I use subprocess to call my own __main__ script? It seems like there should be a better way.
Your __main__.py should do nothing other than importing and calling your entry point, which you can then unit test.

Crosscontaminant fucked around with this message at 18:36 on Feb 17, 2014

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
2.7 has argparse and there's really no reason to support anything older than that in anything new.

SurgicalOntologist
Jun 17, 2004

Crosscontaminant posted:

Your __main__.py should do nothing other than importing and calling your entry point, which you can then unit test.

Hmm. Well, my entry point is a function called main in __main__.py, which calls functions from other modules (all tested) depending on the command-line arguments. What I want to test is the argument parsing. So I can import and call main no problem, but wouldn't I have to set sys.argv?

Ninja edit: Figured it out:

Python code:
if __name__ == '__main__':
    main(*sys.argv)
allows me to test main with different arguments.
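A common alternative with argparse is to give main an optional argv parameter and hand it straight to parse_args: with None, argparse reads sys.argv[1:] itself, so the real command line is unaffected and tests can pass a list. The program name and options below are hypothetical.

```python
import argparse


def build_parser():
    # Hypothetical CLI: one data file plus an optional integer flag.
    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument("experiment_file")
    parser.add_argument("--n-participants", type=int, default=1)
    return parser


def main(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:].
    args = build_parser().parse_args(argv)
    # ... dispatch to the (already unit-tested) library functions here ...
    return args
```

The script still ends with the usual if __name__ == '__main__': main() guard. Bad arguments make argparse raise SystemExit (status 2), which pytest can catch with pytest.raises(SystemExit).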

Lurchington
Jan 2, 2003

Forums Dragoon

Plorkyeran posted:

2.7 has argparse and there's really no reason to support anything older than that in anything new.

Since 2.6 is on Travis it's pretty cheap to test against. I'd suggest just conditionally adding argparse in the setup.py file if the python version is 2.6 or earlier.
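A sketch of the conditional requirement. argparse joined the standard library in Python 2.7 and 3.2, so only 2.6 and older, or 3.0/3.1, need the PyPI backport; the package name in the comment is a placeholder.

```python
import sys


def install_requires(version_info=sys.version_info):
    # Only interpreters without a bundled argparse get the backport.
    version = tuple(version_info[:2])
    requires = []
    if version < (2, 7) or (3, 0) <= version < (3, 2):
        requires.append("argparse")
    return requires


# In setup.py this would feed the usual call, e.g.
#     setup(name="mypackage", ..., install_requires=install_requires())
print(install_requires())
```

One caveat with computing this in setup.py: it's evaluated wherever setup.py runs, so a wheel built once bakes in the build machine's answer. Newer setuptools/pip also accept PEP 508 environment markers, e.g. 'argparse; python_version < "2.7"', which are evaluated at install time instead.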

Edit: reading the last couple of posts, people aren't talking about this, but anyway, this post is a reminder that it's totally ok to have different requirements across different versions of Python.

Lurchington fucked around with this message at 05:45 on Feb 18, 2014

Dominoes
Sep 20, 2007

Hey, so I have a webapp that does some things with the gmaps API. I.e., you input some parameters, and it generates lines, polygons, text etc. on the map. I made a library of equations dealing with lat/lons, making shapes etc. in Python. I want to make it so users can modify the shapes after the map generation. Is there a way to call back my Python functions from the web page/JavaScript, or do I have to duplicate them in JavaScript?
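One option, sketched with only the standard library: expose the Python functions over HTTP and have the page's JavaScript request them (e.g. with fetch) and read JSON back. haversine_km is a hypothetical stand-in for one of the existing lat/lon helpers, and the query parameter names are made up; a real app would more likely put this behind whatever web framework already serves the page.

```python
import json
import math
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    # Hypothetical helper: great-circle distance on a spherical Earth.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def app(environ, start_response):
    # Minimal WSGI app: GET /?lat1=..&lon1=..&lat2=..&lon2=.. returns JSON.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        args = [float(query[k][0]) for k in ("lat1", "lon1", "lat2", "lon2")]
    except (KeyError, ValueError):
        status, payload = "400 Bad Request", {"error": "need lat1, lon1, lat2, lon2"}
    else:
        status, payload = "200 OK", {"km": haversine_km(*args)}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def serve(port=8000):
    # Development server only; call serve() to run it.
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

The trade-off is a network round trip per call, so for shapes the user drags around interactively, duplicating a few small helpers in JavaScript may still feel snappier. If the page is served from a different origin than this endpoint, the response will also need CORS headers.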

SurgicalOntologist
Jun 17, 2004

This might be something only a core developer can answer, but maybe someone has some insight here.

In trying to figure out a better way to test my "pickling functions passed as arguments hack", I discovered that there's a way to find a module a function came from that's, well, less of a hack:

Python code:
import inspect
import os

def get_reference(func):
    return func.__name__, os.path.basename(inspect.getsourcefile(func))[:-3]  # drop ".py"
Curious, I decided to look at the actual source code in pickle:

Python code:
def whichmodule(func, funcname):
    """Figure out the module in which a function occurs.

    Search sys.modules for the module.
    Cache in classmap.
    Return a module name.
    If the function cannot be found, return "__main__".
    """
    # Python functions should always get an __module__ from their globals.
    mod = getattr(func, "__module__", None)
    if mod is not None:
        return mod
    if func in classmap:
        return classmap[func]

    for name, module in list(sys.modules.items()):
        if module is None:
            continue # skip dummy package entries
        if name != '__main__' and getattr(module, funcname, None) is func:
            break
    else:
        name = '__main__'
    classmap[func] = name
    return name
So what I'm wondering is... why default to __main__ since that will be impossible to unpickle from anywhere else? Why not use inspect.getsourcefile? As far as I can tell, the only use case that would break is if you were unpickling from the same file you were pickling in. But in that case, why pickle at all...
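For comparison, here is a sketch of storing a "module:qualified.name" string and resolving it again with importlib, which is roughly the information pickle records for a function. The helper names are mine. Note that func.__module__ is '__main__' for anything defined in the script you ran, so this fails from another entry point for exactly the reason asked about; the getsourcefile trick swaps '__main__' for the file's name, which only works if that file's directory is importable later.

```python
import importlib
import json.decoder  # used only for the demonstration below


def to_reference(func):
    # Store where the function can be imported from, not the function itself.
    return "{}:{}".format(func.__module__, func.__qualname__)


def from_reference(ref):
    # Re-import the module, then walk a dotted qualname like "Class.method".
    module_name, _, qualname = ref.partition(":")
    obj = importlib.import_module(module_name)
    for attr in qualname.split("."):
        obj = getattr(obj, attr)
    return obj


ref = to_reference(json.decoder.JSONDecoder.decode)
print(ref)                                                      # json.decoder:JSONDecoder.decode
print(from_reference(ref) is json.decoder.JSONDecoder.decode)   # True
```

Lambdas and nested functions have "<lambda>" or "<locals>" in their qualname, so they can't be resolved this way either.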

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
The answer is "pickle is bad don't use it"

SurgicalOntologist
Jun 17, 2004

Right, we've established that. Some of the reasons it's bad seem to stem from legitimate difficulties in implementing its features or ambiguity in determining the desired behavior. I'm wondering if this is one of those cases, or if there really is a simple solution here that hasn't been implemented yet.

Is your reply just a snarky way of saying it's the latter? Or is pickle so bad that it's a waste of brain cycles to discuss it and I'm doing the world a disservice by even mentioning it?

SurgicalOntologist fucked around with this message at 07:55 on Feb 18, 2014

Houston Rockets
Apr 15, 2006

What are you serializing?

BeefofAges
Jun 5, 2004

Cry 'Havoc!', and let slip the cows of war.

This could be a case of the XY Problem.

https://meta.stackoverflow.com/questions/66377/what-is-the-xy-problem

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

SurgicalOntologist posted:

Right, we've established that. Some of the reasons it's bad seem to stem from legitimate difficulties in implementing its features or ambiguity in determining the desired behavior. I'm wondering if this is one of those cases, or if there really is a simple solution here that hasn't been implemented yet.

Is your reply just a snarky way of saying it's the latter? Or is pickle so bad that it's a waste of brain cycles to discuss it and I'm doing the world a disservice by even mentioning it?

The answer is that pickle has an overly broad goal, which is "save stuff". People want to save stuff, and they want to reload it later. Whoever wrote pickle decided that it should be easy and quick to use, and also try to save as much stuff as possible. They also decided that those goals trump the goal of being robust and stable, so it's not.

pickle doesn't have schemas, and has no tools to upgrade your old dumps when your code is changed.

So, whoever wrote pickle decided that falling back to __main__ was the best way to save stuff, I'd imagine so that the simple case of python foo.py works fine, and so does interactively using pickle at the console. Easy and quick to use, not robust.

Even if this was a poor decision and there was a better way to do it without sacrificing easy and quick saves, it's a stable API. We can't change it without breaking old code.

Meanwhile, the rest of the world have learned that unreliable data isn't really useful data at all, and we stopped using pickle, no matter how quick and easy it was.

The pizza place a few blocks down from me is extremely cheap and has super fast delivery, but the one time I tried to order a pizza, the cheese had some mold on it. It does not matter how cheap and fast the pizza is if what I'm getting out of it is moldy pizza.

Lurchington
Jan 2, 2003

Forums Dragoon

SurgicalOntologist posted:

In trying to figure out a better way to test my "pickling functions passed as arguments hack", I discovered that there's a way to find a module a function came from that's, well, less of a hack:

If you're looking for an alternative serialization format, I like YAML for things more complex than JSON (though I haven't used it for what you're doing). It supports tagging sections with Python types, which is its way of serializing complicated stuff.

http://pyyaml.org/wiki/PyYAMLDocumentation#YAMLtagsandPythontypes
Check the "Complex Python Tags" section for the things you can tag.

SurgicalOntologist
Jun 17, 2004


:doh: The answer to my question was "backwards compatibility reasons". Of course.

But okay, I'll play ball with you all and maybe pickle isn't the best solution for me. I have a container class that holds a recursive tree structure of instances of another class. Callbacks are attached to nodes, such that in a depth-first search you can have a callback run when a node is hit for the first time (i.e., on the way down) and a different callback run when it is exited (on the way up).

The basic operation is:
1. Create the tree, attach callbacks, and serialize it.
2. Later, load it, run part of the tree starting at a specified node, and save it again.
3. Repeat 2 a bunch of times.
4. Get the return values of all the callbacks into a DataFrame for analysis.

In addition to facilitating the callbacks, the tree also facilitates a mapping that describes each Node. This is done with ChainMaps, such that each Node has its own key-value pairs but also the key-value pairs of all its parents. These key-value pairs are the keyword arguments to the callbacks, and the callbacks return dicts whose key-value pairs are stored in the ChainMap as well.

It's Python 3.3-heavy, in addition to ChainMaps there's a good deal of recursive yield froms.
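For readers trying to picture the structure, here is a rough sketch of the two ideas described: each node's mapping is a ChainMap layered over its ancestors' dicts, and a depth-first generator walk fires one callback on the way down and another on the way up, storing returned dicts in the node's own layer. Node, add_child, walk, on_enter and on_exit are hypothetical names, not the library's actual API.

```python
from collections import ChainMap


class Node:
    """Hypothetical tree node; not the library's actual class."""

    def __init__(self, parent_context=None, **values):
        # This node's own dict is the first layer; the ancestors' dicts
        # follow (shared, not copied), so later parent updates stay visible.
        base = parent_context if parent_context is not None else ChainMap()
        self.context = base.new_child(values)
        self.children = []

    def add_child(self, **values):
        child = Node(self.context, **values)
        self.children.append(child)
        return child

    def walk(self, on_enter=None, on_exit=None):
        # Depth-first generator: on_enter on the way down, on_exit on the
        # way up; returned dicts are written into this node's own layer.
        if on_enter is not None:
            self.context.update(on_enter(**self.context) or {})
        yield self
        for child in self.children:
            yield from child.walk(on_enter, on_exit)
        if on_exit is not None:
            self.context.update(on_exit(**self.context) or {})


session = Node(name="session", participant=1)
block = session.add_child(name="block")
trial = block.add_child(name="trial", congruent=True)


def on_exit(name, **context):
    return {"finished": name}


visited = [node.context["name"] for node in session.walk(on_exit=on_exit)]
print(visited)                       # ['session', 'block', 'trial']
print(trial.context["participant"])  # 1 (inherited from the session node)
```

Because walk is a generator, the callbacks only run as the caller iterates, which is what makes "run part of the tree, stop, save" possible.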

The only thing pickle gets hung up on is the callbacks, and three-line __getstate__ and __setstate__ methods, based on the function I posted above, handle it fine.

That's a description of the implementation; for a description from the user's perspective, see the GitHub repo.

So... is there a better way than pickle? Maybe, but it would probably require a lot more code. And no matter what I'd still have to save references to the callbacks which would probably require the exact same call to inspect.getsourcefile. I'd be happy to be wrong though, since this is being used "in production" imminently.

SurgicalOntologist fucked around with this message at 17:46 on Feb 18, 2014

Dren
Jan 5, 2001

Pillbug
The thing that is weird to me is that you serialize the functions. As you've found out, pickle doesn't serialize functions especially well. Aside from pickle, I don't know of a serialization format that allows serialization of functions. That's not to say that none exist; I just can't think of one for the languages I use.

I think the answer to your problem is to change the way users interface with your library so that they do not serialize functions. Once you move away from serializing functions you could continue to use pickle or you could switch to JSON or YAML or whatever floats your boat.

SurgicalOntologist
Jun 17, 2004

I think you already know this, but to be clear, I just serialize a reference to the functions, then import them back upon loading.

I've racked my brain about this for weeks now and I can't think of a way to avoid saving the function reference. The callbacks are really the whole point, you might say it's a library for repeatedly calling a function while systematically varying the inputs.

The only alternatives I can think of are to use hardcoded names, such as always looking for the same filename with tagged functions in it (like fabfile.py in fabric). But that would be pretty much the exact same loading process, except grabbing the function's module from a library constant or class attribute instead of from inspecting the function itself. Not to mention making things more error-prone for the user, forcing them to segregate projects across directories and such.

Is there something I'm not thinking of? The callbacks need to be user defined, and need to be run across multiple sessions of the Python interpreter. There's no way around that.

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...
Register your functions, store them in a dictionary with the information you need to reconstruct the callbacks. When you serialize, output this dictionary, and store a reference into this dictionary in place of the call-back?
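A minimal sketch of that registry idea, with hypothetical names throughout: a decorator files each callback under a stable string key, and only the key goes into the saved state (JSON here, but pickle would work the same way).

```python
import json

CALLBACKS = {}  # hypothetical module-level registry: name -> callable


def register(name):
    # Decorator that files a callback under a stable, serializable name.
    def decorator(func):
        CALLBACKS[name] = func
        return func
    return decorator


@register("present_stimulus")
def present_stimulus(congruent=False, display_time=0.1):
    # Placeholder callback body for the demonstration.
    return {"congruent": congruent, "display_time": display_time}


def dump_state(callback_name, data):
    # Only the registry key is written out, never the function object.
    return json.dumps({"callback": callback_name, "data": data})


def load_state(text):
    state = json.loads(text)
    return CALLBACKS[state["callback"]], state["data"]
```

The catch is the same one discussed above: the module that defines and registers the callbacks still has to be imported before load_state runs, so the registry moves the "where does this code live" question rather than removing it.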

SurgicalOntologist
Jun 17, 2004

That's exactly what I'm doing. Or am I misunderstanding... all I'm actually serializing is the information I need to reconstruct the callbacks. But people seem to be suggesting that I should be able to avoid even having to do that.

(Rereading, it wasn't clear in the last post. When I said I just serialize a "reference" to the functions, I meant a string telling me where to find it)

SurgicalOntologist fucked around with this message at 01:35 on Feb 19, 2014

Dren
Jan 5, 2001

Pillbug
You could sort of hard code the callback. The user would then supply a file named a certain thing with a certain function in it. Then your code runs that function.

e.g. callback.py
Python code:
def present_stimulus(session_data, experiment_data, congruent=False, display_time=0.1, **context):
    # The interesting part goes here.
    # Let's imagine a stimulus is presented, and a response is collected.
    return {'reaction_time': rt, 'correct': response == answer}

def experiment_callback(session_data, experiment_data, **kwargs):
    return present_stimulus(session_data, experiment_data, **kwargs)
(been a while since I wrote stuff with **kwargs in it so check that I did that right).

and your sample experiment code:
Python code:
from experimentator.api import within_subjects_experiment
from experimentator.order import Shuffle

if __name__ == '__main__':
    independent_variables = {'congruent': [False, True],
                             'display_time': [0.1, 0.55, 1]}
    distractor_experiment = within_subjects_experiment(independent_variables,
                                                       n_participants=20,
                                                       ordering=Shuffle(10),
                                                       experiment_file='distractor.dat')
    # Don't set the callback any more; inside within_subjects_experiment, import the callback module and set it up yourself.
    distractor_experiment.save()

Nippashish
Nov 2, 2005

Let me see you dance!
I usually solve this by making the callback a class with a __call__ method, i.e.
code:
class MyCallback(object):
    def __call__(self, whatever, bro):
        return bro
which pickle handles better than raw functions (i.e. at all). This is a bit more annoying for the user though.


andyf
May 18, 2008

happy car is happy

Hi all,

Having what seems like a unicode problem, but I so far haven't figured out how to fix it. Not sure if it's a limitation of the functions used or just a decode/encode issue.

So I have a small piece of code to go load a web page, pull the <title> line and return it. The existing way uses urllib / lxml(etree), and that's what's causing me problems. Some pages return a messy result. I've since tested a second way using requests that seems to work great, but it means junking the existing urllib setup and some stuff used to handle that. What I'd ideally like is to carry on with urllib if at all possible (I'm writing a plugin for a fork of Scaev's skybot and he's got a lot of wrapping / handling stuff for making it easy to use urllib).

I'm using a Steam URL to test, as it has stuff like a TM and copyright symbol in it.

urllib method:

Python code:
inp = 'http://store.steampowered.com/app/264803/'

request_url = get_html(inp)
titleget = request_url.xpath('//title/text()')[0]
titleuni = unicode(titleget.strip())
print titleuni

: Assassin&#12287;s Creed® IV Black Flag&#12258; &#12287; Freedom Cry on Steam
The get_html function chains some stuff together which I don't fully understand, since I'm new to Python. ( https://github.com/rmmh/skybot/blob/master/plugins/util/http.py#L33 )


requests method:
Python code:
r = requests.get(inp)
responsetext = r.text
tree = etree.HTML(responsetext)
urltitle = tree.xpath('//title/text()')[0]
print urltitle

: Assassin’s Creed® IV Black Flag™ – Freedom Cry on Steam
Both titleuni and urltitle seem to come out as type 'lxml.etree._ElementUnicodeResult', so I'm not sure what's going on. Is requests reading the headers of the page and correctly setting it to utf-8, whereas urllib isn't? Is there a way to force an encoding type somewhere in the method? I've tried a bit of .decode('utf8','ignore') / .encode('utf8','ignore') while trying to return the output, but it doesn't seem to do anything. I guess I need that to happen somewhere during the original handling of the page, but I've no idea how to do that.

edit - well poo poo, a buddy knocked up a quick page with some unicode in it and that returns fine for both methods. This looks like it might be specific to the trademark / registered trademark symbols.

andyf fucked around with this message at 15:17 on Feb 20, 2014
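One plausible reading of the symptoms: the page bytes are UTF-8 but get decoded with some other charset before or during parsing, whereas requests decodes r.text using the charset from the response's Content-Type header. A Python 3 stdlib sketch of doing that step explicitly: read the charset from the header, decode, then parse. It uses html.parser instead of lxml to stay dependency-free, and the header values in the test are assumptions, not what Steam actually sends. With lxml, feeding it the already-decoded text (or, I believe, an etree.HTMLParser with an explicit encoding) should have the same effect.

```python
from email.message import Message
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    # Collects the text inside the first <title> element.
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def title_from_bytes(raw, content_type, default="utf-8"):
    # Take the charset from the Content-Type header and decode *before* parsing.
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset() or default
    parser = TitleParser()
    parser.feed(raw.decode(charset, errors="replace"))
    parser.close()
    return (parser.title or "").strip()
```

Decoding the same UTF-8 bytes as ISO-8859-1 turns the curly apostrophe into a run of junk characters, which is the same family of symptom as the mangled title above.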
