Cingulate
Oct 23, 2012

by Fluffdaddy
Yep. Thanks.


Emacs Headroom
Aug 2, 2003

1000101 posted:

I think I'm going to go ahead with what you're suggesting. To make sure I understand you right, dump the actual json somewhere in the module path (say in a templates directory); when I need it, read it in as a dict then just set the values I need like I was setting them in a dictionary. Then return the final JSON just like before.

Yeah that's what I would do. I think it'll be a little more readable and more maintainable.
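
A minimal sketch of that approach (the template filename, function name and keys here are made up):

Python code:
import json
import os

# Keep the full JSON as a template file somewhere in the module path,
# read it in as a plain dict, set only the values that change, and
# serialize it back out just like before.
TEMPLATE_PATH = os.path.join(os.path.dirname(__file__), "templates", "payload.json")

def build_payload(name, count):
    with open(TEMPLATE_PATH) as f:
        payload = json.load(f)
    payload["name"] = name
    payload["count"] = count
    return json.dumps(payload)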

xpander
Sep 2, 2004
I'm working on a small toy app, and while I've mostly used Django, it occurred to me that this might be a great time to check out Flask (mostly API-driven, not much in the way of user-backed database stuff). In reading through their documentation, specifically in regards to Python 3, they recommend sticking to 2.x for the time being, due to some packages still needing work for 3 compatibility (Flask and Werkzeug are mentioned). I doubt this will be a big deal for the project I have in mind, but I'm just curious if this is still a thing in 2015, or if it's mostly solved and the docs just haven't been updated (they're stamped with 2013). Any experiences or ruminations are welcome!

Dominoes
Sep 20, 2007

xpander posted:

I'm working on a small toy app, and while I've mostly used Django, it occurred to me that this might be a great time to check out Flask (mostly API-driven, not much in the way of user-backed database stuff). In reading through their documentation, specifically in regards to Python 3, they recommend sticking to 2.x for the time being, due to some packages still needing work for 3 compatibility (Flask and Werkzeug are mentioned). I doubt this will be a big deal for the project I have in mind, but I'm just curious if this is still a thing in 2015, or if it's mostly solved and the docs just haven't been updated (they're stamped with 2013). Any experiences or ruminations are welcome!
Flask is now fully functional with Python 3.

Ghost of Reagan Past
Oct 7, 2003

rock and roll fun
So presently an app I'm spit-polishing uses a big pile of functions to do some basic processing work; the main Flask app takes the website data, passes it through to an interface function written to unify all the other code, which then passes the data off to some functions for building the right data structures (there are three different data structures I use for different purposes) and then doing some processing, finally spitting out the result for Flask to display. I'm wondering if abstracting some of this into something more object-oriented would be fruitful, since the biggest issue is that the functions end up requiring between 4-7 arguments to do all the work and the code can be somewhat ugly when that's all floating around (the functions are as separate as they can be at this point).

There are three ways I think this could be done to clean things up: one, throw all the parameters into a class and let the functions pull what they need out. Two, throw all the data structures into a class and pass that around. Three, build an all-encompassing class, instantiate. I don't like the third option that much, since it seems mostly pointless (everything would need to be called the exact same way, except now as an object! WOW!). The second is better, but there are three data structures I use and a single class for them seems gratuitous and dangerous. Finally, the first: I like it, but since different functions need different arguments, the only thing it would do is move the location of the ugliness. Now, there's something to be said for cleaning up the ugliness a bit, but since I don't have any real idea whether object-oriented design would be remotely useful in this case at all, I'm curious if this sounds like a good idea, or if it's just overcomplicating things.

Basically, I'm asking a more general question about designing Python applications: should I be wary of throwing around objects when it's not clear to me (an amateur) that there's a very good reason (I'm not sure slightly deuglifying code is a very good reason, but I could be wrong!)? And what are some ways to tell that there's a good reason for throwing around objects (beyond the obvious cases like representing objects and their properties)?

Ghost of Reagan Past fucked around with this message at 02:17 on Aug 21, 2015

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Ghost of Reagan Past posted:

So presently an app I'm spit-polishing uses a big pile of functions to do some basic processing work; the main Flask app takes the website data, passes it through to an interface function written to unify all the other code, which then passes the data off to some functions for building the right data structures (there are three different data structures I use for different purposes) and then doing some processing, finally spitting out the result for Flask to display. I'm wondering if abstracting some of this into something more object-oriented would be fruitful, since the biggest issue is that the functions end up requiring between 4-7 arguments to do all the work and the code can be somewhat ugly when that's all floating around (the functions are as separate as they can be at this point).

There are three ways I think this could be done to clean things up: one, throw all the parameters into a class and let the functions pull what they need out. Two, throw all the data structures into a class and pass that around. Three, build an all-encompassing class, instantiate. I don't like the third option that much, since it seems mostly pointless (everything would need to be called the exact same way, except now as an object! WOW!). The second is better, but there are three data structures I use and a single class for them seems gratuitous and dangerous. Finally, the first: I like it, but since different functions need different arguments, the only thing it would do is move the location of the ugliness. Now, there's something to be said for cleaning up the ugliness a bit, but since I don't have any real idea whether object-oriented design would be remotely useful in this case at all, I'm curious if this sounds like a good idea, or if it's just overcomplicating things.

Basically, I'm asking a more general question about designing Python applications: should I be wary of throwing around objects when it's not clear to me (an amateur) that there's a very good reason (I'm not sure slightly deuglifying code is a very good reason, but I could be wrong!)? And what are some ways to tell that there's a good reason for throwing around objects (beyond the obvious cases like representing objects and their properties)?

If you're not sure how best to start arranging things into classes, one thing you could do is this - you said that a lot of these functions take several parameters. Look at the parameter lists of your various functions. Are there any "groups" of parameters that occur together a lot? If so, that might be a clue that those parameters form the data members of a class.
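
For instance (all names invented), if several functions keep taking the same trio of arguments, that trio is a candidate for a class:

Python code:
from collections import namedtuple

# Suppose parse(), validate() and render() all took (source, encoding, strict);
# the recurring group becomes one small object instead of three loose arguments.
Options = namedtuple("Options", ["source", "encoding", "strict"])

def parse(opts):
    print("parsing %s as %s (strict=%s)" % (opts.source, opts.encoding, opts.strict))

parse(Options(source="input.txt", encoding="utf-8", strict=True))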

Ghost of Reagan Past
Oct 7, 2003

rock and roll fun

Hammerite posted:

If you're not sure how best to start arranging things into classes, one thing you could do is this - you said that a lot of these functions take several parameters. Look at the parameter lists of your various functions. Are there any "groups" of parameters that occur together a lot? If so, that might be a clue that those parameters form the data members of a class.
This is sort of what I ended up doing. I just init'ed a bunch of variables for the parameters I needed, then added two methods for adding and checking some variables (and in one case doing some necessary work to it), and just set the rest to the right values. It didn't really clean up the code much, since I still gotta, at some point, access the variables, but it kind of centralized some of the stuff, which is nicer.

salisbury shake
Dec 27, 2011

Cingulate posted:


I don't assume I could somehow have something like
code:
things, stuff = [expensive_computation(thing) for thing in things]
?

You can also delay your expensive computation to the last minute with a generator expression

Python code:
tuple_gen = (expensive_computation(thing) for thing in things)
And then do what Kickbama suggested

quote:

things, stuff = zip(*tuple_gen)
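
Put together, with a dummy expensive_computation standing in for the real one, that looks like:

Python code:
def expensive_computation(thing):
    # stand-in for the real work; returns a pair per input item
    return thing * 2, thing * 3

things = [1, 2, 3]
tuple_gen = (expensive_computation(thing) for thing in things)  # nothing runs yet
things, stuff = zip(*tuple_gen)  # the work happens here, split into two tuples
# things == (2, 4, 6), stuff == (3, 6, 9)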

Hughmoris
Apr 21, 2007
Let's go to the abyss!
I realize this is a vague question but as a beginner Python hobbyist, should I be trying to learn IPython? Most of what I've read about it has been positive but I've been doing most of my coding in the default IDLE and Vim. I'm curious if I should try to add IPython to my tool set.

accipter
Sep 12, 2003

Hughmoris posted:

I realize this is a vague question but as a beginner Python hobbyist, should I be trying to learn IPython? Most of what I've read about it has been positive but I've been doing most of my coding in the default IDLE and Vim. I'm curious if I should try to add IPython to my tool set.

I typically use two different setups when I'm programming Python. The first is PyCharm -- this is really an amazing IDE and has a plugin for vim movement. The other is Vim with an IPython console. I don't use all of the IPython features, but it is really handy for exploring objects and accessing the docstrings (? and ??).
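
For example, in an IPython session, one ? prints the signature and docstring, and ?? also shows the source where available:

code:
In [1]: import json

In [2]: json.dumps?

In [3]: json.dumps??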

salisbury shake
Dec 27, 2011

Hughmoris posted:

I realize this is a vague question but as a beginner Python hobbyist, should I be trying to learn IPython? Most of what I've read about it has been positive but I've been doing most of my coding in the default IDLE and Vim. I'm curious if I should try to add IPython to my tool set.

I found the ipython console to be invaluable when learning or doing much of anything in Python.

Dominoes
Sep 20, 2007

The Qtconsole version of Ipython's especially useful.

Cingulate
Oct 23, 2012

by Fluffdaddy
... and if you're doing data science, you should proceed straight to the iPython Notebook.

Dominoes
Sep 20, 2007

I recommend the first few videos in this YouTube series for learning Jupyter (IPython Notebook), along with their accompanying notebooks, which are linked in the video descriptions.

Badly Jester
Apr 9, 2010


Bitches!

Cingulate posted:

... and if you're doing data science, you should proceed straight to the iPython Notebook Jupyter.

Jupyter replaces iPython Notebook in the new version.

Cingulate
Oct 23, 2012

by Fluffdaddy
Given a pandas series of length 1, I can do this
code:
list(the_series)[0]
to get just the value. However, surely there is a more pythonic way?

For what it's worth, the series originated from a DataFrame from which I have selected one specific cell.

DarthRoblox
Nov 25, 2007
*rolls ankle* *gains 15lbs* *apologizes to TFLC* *rolls ankle*...
How are you getting a series back from a single dataframe cell? Selecting a cell by index or iloc will return the value of that cell - are you doing something else?

Cingulate
Oct 23, 2012

by Fluffdaddy

Viking_Helmet posted:

How are you getting a series back from a single dataframe cell? Selecting a cell by index or iloc will return the value of that cell - are you doing something else?
In this case, I've first extracted a single row of the data frame to do various things with it. Then I select one column of that single-row frame. Though I guess you're pointing out the solution - using .loc or .iloc in the first place.
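e.g. something along these lines (the frame here is just a stand-in):

Python code:
import pandas as pd

df = pd.DataFrame({"colname": [42.0]}, index=["row_label"])
the_series = df["colname"]                # a length-1 Series, as above

print(df.loc["row_label", "colname"])     # scalar straight from the frame
print(the_series.iloc[0])                 # positional access on the Series
print(the_series.item())                  # also works for a single-element Series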

This has come up a few times now though.

The actual code in this case has completely changed, however; it looks like this now and is still really bad and slow:

code:
keys = ("date", "type", "thing")

outs = []
for title in df.title.unique():
    d_ = df[df["title"] == title]
    if len(d_.type.unique()) > 1:
        for name in d_["name"]:
            d__ = d_[d_.name == name]
            d = d__.loc[:,keys]
            other_date = (d_[d_["type"] == "foo"]["date"].mean()
                            if "bar" in d__["type"] else 
                            d_[d_["type"] == "bar"]["date"].mean())
            d["date_diff"] = list(d__["date"] - other_date)[0]
            outs.append(d)
df2 = pd.concat(outs)

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

When you're using a library that provides objects whose data comes from an online bank (personal checking/savings), which of the following designs is most appealing to you?

1. Getters

The object has methods like get_foo() which do an HTTP request(s) each time you call them.

2. Lazy properties

The object has properties like obj.foo which do an HTTP request(s) the first time you access them and then always return that value.

3. Regular properties

Same as lazy properties except do the HTTP request(s) every time you access them.

4. Configurable caching, lazy properties

Per object configurable time period that causes lazy properties to do HTTP requests the first time you access them, and then return that value every time for up to X period of time and then do an HTTP request again.

5. Something else?

--------

The HTTP requests are pretty slow. This bank's online services have always been slow as poo poo.

I can pretty easily provide them all, but I usually prefer opinionated libraries. Anyway, this isn't for anything super serious, I'm mostly just experimenting with API designs. However, assume this library is written to be used far and wide.
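
To make options 1 and 2 concrete, roughly (the class, URL and field names here are placeholders, not the real API):

Python code:
class Account(object):
    def __init__(self, session, account_id):
        self._session = session          # e.g. a requests.Session with auth set up
        self._account_id = account_id
        self._balance = None

    # Option 1: a getter that hits the bank on every call
    def get_balance(self):
        resp = self._session.get("https://example-bank.invalid/accounts/%s" % self._account_id)
        return resp.json()["balance"]

    # Option 2: a lazy property that fetches once, then always returns that value
    @property
    def balance(self):
        if self._balance is None:
            self._balance = self.get_balance()
        return self._balance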

Stringent
Dec 22, 2004


image text goes here

Thermopyle posted:

When you're using a library that provides objects whose data comes from an online bank (personal checking/savings), which of the following designs is most appealing to you?

1. Getters

The object has methods like get_foo() which do an HTTP request(s) each time you call them.

2. Lazy properties

The object has properties like obj.foo which do an HTTP request(s) the first time you access them and then always return that value.

3. Regular properties

Same as lazy properties except do the HTTP request(s) every time you access them.

4. Configurable caching, lazy properties

Per object configurable time period that causes lazy properties to do HTTP requests the first time you access them, and then return that value every time for up to X period of time and then do an HTTP request again.

5. Something else?

--------

The HTTP requests are pretty slow. This bank's online services have always been slow as poo poo.

I can pretty easily provide them all, but I usually prefer opinionated libraries. Anyway, this isn't for anything super serious, I'm mostly just experimenting with API designs. However, assume this library is written to be used far and wide.

Of the options you listed I think I'd like getters with configurable caching defaulting to none, but no suggestions of my own.

brosmike
Jun 26, 2009
When I write code against your library, I want it to be trivial for a maintainer to identify things like "is this operation safe for my UI thread", "when did this information come from", and "is it safe to assume that the value I got in the outer function scope is the same as the one two layers deeper". This makes me prefer that most of the data be in uninteresting (possibly immutable) data bags, and that the operations doing network activity that output the uninteresting models be distinct functions with searchable names. This also makes it less likely that I will have to do unnecessary wrapping to make my code against your model sanely testable.
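
In code, that preference looks something like this (names and URL invented):

Python code:
import datetime
from collections import namedtuple

# An uninteresting, immutable data bag: no behaviour, no hidden I/O.
Balance = namedtuple("Balance", ["account_id", "amount", "fetched_at"])

def fetch_balance(session, account_id):
    # The only place network activity happens, and the name says so.
    resp = session.get("https://example-bank.invalid/accounts/%s" % account_id)
    return Balance(account_id=account_id,
                   amount=resp.json()["balance"],
                   fetched_at=datetime.datetime.utcnow())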

Rime
Nov 2, 2011

by Games Forum
So this launches a window which displays the correct @ character, but the window then immediately freezes up and must be terminated. Any idea why? Windows 10, Python 2.7.

code:
def handle_keys():
  key = libtcod.console_check_for_keypress()
  if key.vk == libtcod.KEY_ENTER and key.lalt:
    libtcod.console_set_fullscreen(not libtcod.console_is_fullscreen)

  elif key.vk == libtcod.KEY_ESCAPE:
    return True #exit game

  global playerx, playery

  #movementkeys
  if libtcod.console_is_key_pressed(libtcod.KEY_UP):
    player -= 1

  elif libtcod.console_is_key_pressed(libtcod.KEY_DOWN):
    player += 1

  elif libtcod.console_is_key_pressed(libtcod.KEY_LEFT):
    player -= 1

  elif libtcod.console_is_key_pressed(libtcod.KEY_RIGHT):
    player += 1



while not libtcod.console_is_window_closed():
  libtcod.console_set_default_foreground(0, libtcod.white)
  libtcod.console_put_char(0,1,1,'@',libtcod.BKGND_NONE)
  libtcod.console_flush()

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Rime posted:

So this launches a window which displays the correct @ character, but the window then immediately freezes up and must be terminated. Any idea why? Windows 10, Python 2.7.

code:
#snip

You probably need to call libtcod.console_wait_for_keypress(True) at the end of the loop. Right now it looks like it's just drawing an @ over and over again as quickly as it can.

Look at https://kooneiform.wordpress.com/2009/03/29/241/ for an example
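
Roughly, keeping the posted handle_keys() as-is, that suggestion would look like this (old libtcodpy API):

Python code:
while not libtcod.console_is_window_closed():
  libtcod.console_set_default_foreground(0, libtcod.white)
  libtcod.console_put_char(0, 1, 1, '@', libtcod.BKGND_NONE)
  libtcod.console_flush()
  libtcod.console_wait_for_keypress(True)  # block here instead of spinning
  if handle_keys():
    break  # Escape pressed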

Rime
Nov 2, 2011

by Games Forum
Yup, that's the ticket. Dur dur dur. :v:

MrMoo
Sep 14, 2000

I wrote a webapp that dumps out a big piece of JSON for historical queries on market data, for example Facebook (NASDAQ:FB). The predominant client app is written in R and lives on OpenCPU.org, so the data set is savagely optimized to parse as an R data frame. Today I looked at NumPy and Pandas in Python, and whilst the latter was OK-ish, with the former I have no idea.

R code:
library(RJSONIO)
library(RCurl)

op <- options(digits.secs=3)
list <- fromJSON(getURL("http://miru.hk/tmp/FB.O.json"), nullValue = NA)
list$timeseries[[1]] <- strptime(list$timeseries[[1]], "%Y-%m-%dT%H:%M:%OSZ", tz="GMT")
df <- data.frame(list$timeseries)
names(df) <- list$fields
print(df)
Produces the following output (truncated):
code:
$ Rscript tas.R | head -n 10
Loading required package: methods
Loading required package: bitops
                  datetime   ASK ASKSIZE   BID BIDSIZE GV4_TEXT IRGCOND IRGXID
1  2015-06-15 20:00:00.031 80.71    3600 80.70    9700     <NA>    <NA>   <NA>
2  2015-06-15 20:00:00.265 80.73     200 80.71    8700     <NA>    <NA>   <NA>
3  2015-06-15 20:00:01.000 80.75     700 80.70     400     @ TW    <NA>    ADF
4  2015-06-15 20:00:03.000 80.75     700 80.70     400     @ TI     ODT    ADF
5  2015-06-15 20:00:04.000 80.75     700 80.70     400       T      132    ADF
6  2015-06-15 20:00:05.000 80.75     700 80.68     100       T      132    ADF
7  2015-06-15 20:00:28.000 80.84     400 80.67    1000     @ TI     ODT    ADF
8  2015-06-15 20:00:31.000 80.84     400 80.67    1000     @ TW    <NA>    ADF
9  2015-06-15 20:01:17.000 80.81     700 80.76     700       T      132    PSE
...
Python code:
#!/usr/bin/python
import urllib
import json
import pandas as pd

result = json.load(urllib.urlopen("http://miru.hk/tmp/FB.O.json"))
df = pd.DataFrame(result["timeseries"], index=result["fields"])
print df
Produces the following output (truncated):
code:
                                  0                         1   \
datetime    2015-06-15T20:00:00.031Z  2015-06-15T20:00:00.265Z   
ASK                            80.71                     80.73   
ASKSIZE                         3600                       200   
BID                             80.7                     80.71   
BIDSIZE                         9700                      8700   
GV4_TEXT                        None                      None
...

[20 rows x 80 columns]
>>> df.transpose()
                    datetime    ASK ASKSIZE    BID BIDSIZE GV4_TEXT IRGCOND  \
0   2015-06-15T20:00:00.031Z  80.71    3600   80.7    9700     None    None   
1   2015-06-15T20:00:00.265Z  80.73     200  80.71    8700     None    None   
2   2015-06-15T20:00:01.000Z  80.75     700   80.7     400     @ TW    None   
3   2015-06-15T20:00:03.000Z  80.75     700   80.7     400     @ TI     ODT
...
Python code:
#!/usr/bin/python
import urllib
import json
import numpy as np

result = json.load(urllib.urlopen("http://miru.hk/tmp/FB.O.json"))
print np.array(result["timeseries"][0], dtype='datetime64')
print np.fromiter(result["timeseries"][1], np.float)
Produces the following output (truncated):
code:
['2015-06-15T16:00:00.031-0400' '2015-06-15T16:00:00.265-0400'
 '2015-06-15T16:00:01.000-0400' '2015-06-15T16:00:03.000-0400'
 '2015-06-15T16:00:04.000-0400' '2015-06-15T16:00:05.000-0400'
...
[ 80.71  80.73  80.75  80.75  80.75  80.75  80.84  80.84  80.81  80.8   80.8
  80.76  80.73  80.76  80.76  80.76  80.76  80.75  80.75  80.75  80.75
  80.75  80.75  80.75  80.75  80.75  80.75  80.75  80.75  80.74  80.74
...
My questions are:

1) Why does the data frame in Pandas look like it is transposed? I have to call df.transpose() to view it in the same form as R.
2) Is it possible to build a multi-format-friendly table in NumPy without hard-coding the columns, and if so, how?

MrMoo fucked around with this message at 23:47 on Aug 26, 2015

accipter
Sep 12, 2003

MrMoo posted:

My questions are:

1) Why does the data frame in Pandas look like it is transposed? I have to call df.transpose() to view it in the same form as R.
2) Is it possible to build a multi-format-friendly table in NumPy without hard-coding the columns, and if so, how?

Look at the time series data. It is provided in columns, not rows. You can provide it to pandas in rows by the following:
Python code:
df = pd.DataFrame.from_items(zip(result['fields'], result['timeseries']))
If you want to use a NumPy table use a recarray, but I think pandas.DataFrames are easier to use.

Dominoes
Sep 20, 2007

-

Dominoes fucked around with this message at 00:53 on Aug 27, 2015

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Lots of people posted:

Learn IPython

Thanks for the advice/tips. I'll take a look at the videos posted.

MrMoo
Sep 14, 2000

accipter posted:

Look at the time series data. It is provided in columns, not rows. You can provide it to pandas in rows by the following:
Python code:
df = pd.DataFrame.from_items(zip(result['fields'], result['timeseries']))
If you want to use a NumPy table use a recarray, but I think pandas.DataFrames are easier to use.

Thanks, looks like I have the NumPy version with this:

Python code:
#!/usr/bin/python
import urllib
import json
import numpy as np

result = json.load(urllib.urlopen("http://miru.hk/tmp/FB.O.json"))
r = np.core.records.fromarrays(result['timeseries'], names = result['fields'])
On an interactive console,
Python code:
>>> r[1]
(u'2015-06-15T20:00:00.265Z', 80.730000000000004, 200, 80.709999999999994, 8700, 
  None, None, None, 0.0, u'@6 X', -1.0058, u'\xde', 72000265, 1757748, u'15 JUN 2015', 
  80.709999999999994, 1185258, u'NAS', 0.0, 80.495999999999995)
>>> r.ASK
array([ 80.71,  80.73,  80.75,  80.75,  80.75,  80.75,  80.84,  80.84,
        80.81,  80.8 ,  80.8 ,  80.76,  80.73,  80.76,  80.76,  80.76,
        80.76,  80.75,  80.75,  80.75,  80.75,  80.75,  80.75,  80.75,
        80.75,  80.75,  80.75,  80.75,  80.75,  80.74,  80.74,  80.73,
        80.73,  80.73,  80.73,  80.73,  80.74,  80.74,  80.74,  80.75,
        80.75,  80.75,  80.75,  80.75,  80.74,  80.74,  80.72,  80.72,
        80.72,  80.72,  80.7 ,  80.7 ,  80.7 ,  80.73,  80.73,  80.73,
        80.72,  80.72,  80.72,  80.72,  80.71,  80.7 ,  80.7 ,  80.7 ,
        80.7 ,  80.72,  80.72,  80.7 ,  80.72,  80.72,  80.71,  80.71,
        80.71,  80.72,  80.72,  80.72,  80.72,  80.72,  80.72,  80.72])
Looks like the timestamp needs something, but the docs imply rather dubious date and time support. The following attempts conversion but fails to pick up the "Z" in the timestamp for UTC and forces the local time zone.

Python code:
>>> np.asarray(r.datetime, dtype='datetime64')
array(['2015-06-15T16:00:00.031-0400', '2015-06-15T16:00:00.265-0400',
       '2015-06-15T16:00:01.000-0400', '2015-06-15T16:00:03.000-0400',
       '2015-06-15T16:00:04.000-0400', '2015-06-15T16:00:05.000-0400',
...
       '2015-06-15T19:44:51.000-0400', '2015-06-15T19:54:51.000-0400'], dtype='datetime64[ms]')

MrMoo fucked around with this message at 02:16 on Aug 27, 2015

Nippashish
Nov 2, 2005

Let me see you dance!
If you have tables with columns of heterogeneous type, or your columns have names, or you have missing values, or really if your data is anything except a big contiguous block of floating point numbers, then you should put it in a pandas DataFrame.

You can try to use numpy for things that are not big contiguous blocks of floating point numbers, and there is even kind of support for doing that (as you have found), but literally no one actually does that in real life.

MrMoo
Sep 14, 2000

Pandas has more success with ISO 8601; no doubt I will have to fix the '15 JUN 2015' field server-side.

Python code:
>>> df.datetime = df.datetime.astype("datetime64")
>>> df[:3]
                 datetime    ASK  ASKSIZE    BID  BIDSIZE GV4_TEXT IRGCOND 
0 2015-06-15 20:00:00.031  80.71     3600  80.70     9700     None    None   
1 2015-06-15 20:00:00.265  80.73      200  80.71     8700     None    None   
2 2015-06-15 20:00:01.000  80.75      700  80.70      400     @ TW    None

DarthRoblox
Nov 25, 2007
*rolls ankle* *gains 15lbs* *apologizes to TFLC* *rolls ankle*...

MrMoo posted:

Pandas has more success with ISO 8601; no doubt I will have to fix the '15 JUN 2015' field server-side.

Python code:
>>> df.datetime = df.datetime.astype("datetime64")
>>> df[:3]
                 datetime    ASK  ASKSIZE    BID  BIDSIZE GV4_TEXT IRGCOND 
0 2015-06-15 20:00:00.031  80.71     3600  80.70     9700     None    None   
1 2015-06-15 20:00:00.265  80.73      200  80.71     8700     None    None   
2 2015-06-15 20:00:01.000  80.75      700  80.70      400     @ TW    None

Use pandas.to_datetime(), it's good at parsing pretty much whatever I've tried to throw at it:

Python code:
>>> weird_date_formats = pd.DataFrame([['2015.1.1 10:00:00 PM'],
                                       ['JAN 1 2015 22:00:00'],
                                       ['1/1/2015 22:00:00Z'],
                                       ['1 January 2015 10:00:00 PM']], columns=[['dates']])
>>> pd.to_datetime(weird_date_formats.dates)

0   2015-01-01 22:00:00
1   2015-01-01 22:00:00
2   2015-01-01 22:00:00
3   2015-01-01 22:00:00
Name: dates, dtype: datetime64[ns]

Stringent
Dec 22, 2004


image text goes here
So I just ran across the genesis of this in 2.7, and I figured I'd post something about it. If this is super well known by everyone just scroll on by, but I wasn't aware of it, and it led to a careless omission in a coworker's code causing a nasty little bug.

Python code:
def buttes(butte, donge=None):
    print donge

buttes('butte', donge='donge')
# prints donge

buttes('butte', 'donge')
# prints donge

buttes('butte', 'fudge', donge='donge')
# TypeError: buttes() got multiple values for keyword argument 'donge'
We had to duplicate a method in another class and rather than
Python code:
def buttes(butte, donge=None):
the method signature became
Python code:
def buttes(butte, donge):
which just passed its arguments straight through to the original buttes method:
Python code:
def buttes(butte, donge):
    original.buttes(butte, donge)
Not gonna go into how this caused problems elsewhere, but it did. So if you're like me and didn't know about this, beware.

That said, do any of you know why this works this way? Is it just so you can assign defaults to positional arguments or is there another reason? Is it just me that sees implicit conversion of a positional argument to a keyword argument as kinda weird?

Plasmafountain
Jun 17, 2008

I've got an idea for a thing that involves a lot of things that I've never used before and thought I would ask for some help.

I'm working at a company that uses a CFD program that largely operates as a black box, not giving any information until the run is finished. After poking through the files that the program generates while it is running, I found that it stores a lot of data in a bunch of Notepad-readable files delimited by column, with text headers and values for various properties of the flow. I think this is ripe for a program/script that reads each file and plots each parameter (or a selection of them) as they are generated, using Matplotlib. This way we can see early on if the simulation is worth continuing, instead of having to wait until the end.

1) The files are not csv files, but .in and .out. Will a module like csvwrite happily take them anyway if they're in a clearly delimited format?

2) What else could I do to read this file?

3) I'm used to working with arrays of numbers in numpy but these files will have a couple of header rows (column number, Parameter name, units) and then a list of floats in scientific notation. Is it ok to use text and floats in an array or am I better off using a different way of storing these?

4) The CFD program updates these files continuously - how would I make my script/program check the files for updates? I think it's a pretty simple thing to check the files every ten seconds or so and generate new images by wrapping the entire thing in a timed loop, but computationally, instead of re-reading and re-plotting (number of files) x 15 variables x 40000 iterations every ten seconds, it's probably easier to simply append the new values to the arrays (or other structure) where they are stored.

Any pointers?

Cingulate
Oct 23, 2012

by Fluffdaddy

Zero Gravitas posted:

I've got an idea for a thing that involves a lot of things that I've never used before and thought I would ask for some help.

I'm working at a company that uses a CFD program that largely operates as a black box, not giving any information until the run is finished. After poking through the files that the program generates while it is running, I found that it stores a lot of data in a bunch of Notepad-readable files delimited by column, with text headers and values for various properties of the flow. I think this is ripe for a program/script that reads each file and plots each parameter (or a selection of them) as they are generated, using Matplotlib. This way we can see early on if the simulation is worth continuing, instead of having to wait until the end.

1) The files are not csv files, but .in and .out. Will a module like csvwrite happily take them anyway if they're in a clearly delimited format?

2) What else could I do to read this file?

3) I'm used to working with arrays of numbers in numpy but these files will have a couple of header rows (column number, Parameter name, units) and then a list of floats in scientific notation. Is it ok to use text and floats in an array or am I better off using a different way of storing these?

4) The CFD program updates these files continuously - how would I make my script/program check the files for updates? I think it's a pretty simple thing to check the files every ten seconds or so and generate new images by wrapping the entire thing in a timed loop, but computationally, instead of re-reading and re-plotting (number of files) x 15 variables x 40000 iterations every ten seconds, it's probably easier to simply append the new values to the arrays (or other structure) where they are stored.

Any pointers?
I'm really a one-trick pony, but: you probably want to use Pandas. It has a good read_csv method and comfortable plotting functions.

There's also an append method.
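
A rough sketch of that kind of polling loop, with the file name, delimiter and header layout being guesses about your format:

Python code:
import time
import pandas as pd
import matplotlib.pyplot as plt

MONITOR_FILE = "run01.out"   # hypothetical solver output file

while True:
    # whitespace-delimited columns after a couple of header rows
    df = pd.read_csv(MONITOR_FILE, delim_whitespace=True, skiprows=2)
    df.plot(x=df.columns[0])          # iteration/time on the x axis
    plt.savefig("monitor.png")
    plt.close("all")
    # re-reads the whole file each pass; appending only the new rows is an optimization
    time.sleep(10)                    # poll every ten seconds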

Dominoes
Sep 20, 2007

Do y'all know if PyCharm has a shortcut for unicode character entry similar to IPython's QtConsole and Jupyter? I.e. backslash + name + Tab.

tef
May 30, 2004

-> some l-system crap ->

Ghost of Reagan Past posted:

I'm wondering if abstracting some of this into something more object-oriented would be fruitful, since the biggest issue is that the functions end up requiring between 4-7 arguments to do all the work and the code can be somewhat ugly when that's all floating around (the functions are as separate as they can be at this point).

If you have more than 6 you've probably missed one (Kernighan or Ritchie, probably (maybe Plauger))

quote:

There are three ways I think this could be done to clean things up: one, throw all the parameters into a class and let the functions pull what they need out. Two, throw all the data structures into a class and pass that around. Three, build an all-encompassing class, instantiate. I don't like the third option that much, since it seems mostly pointless (everything would need to be called the exact same way, except now as an object! WOW!). The second is better, but there are three data structures I use and a single class for them seems gratuitous and dangerous. Finally, the first: I like it, but since different functions need different arguments, the only thing it would do is move the location of the ugliness. Now, there's something to be said for cleaning up the ugliness a bit, but since I don't have any real idea whether object-oriented design would be remotely useful in this case at all, I'm curious if this sounds like a good idea, or if it's just overcomplicating things.

It depends how these functions compose and interact. You may find it easier to take option three: one big rear end class with the state and methods, and a simpler interface on top for what the program does, or at least a simpler api to call and interact with it. From there you may find it easier to slowly clump things together.

quote:

Basically, I'm asking a more general question about designing Python applications: should I be wary of throwing around objects when it's not clear to me (an amateur) that there's a very good reason (I'm not sure slightly deuglifying code is a very good reason, but I could be wrong!)? And what are some ways to tell that there's a good reason for throwing around objects (beyond the obvious cases like representing objects and their properties)?

The reason other languages (Java, Ruby especially) use objects all the goddam time is because they have no other choice. Instead of a function do_thing we have a tyranny of ThingDoer objects with a do_thing method. Don't feel you have to use classes to get things done in python, but there are some good reasons to use classes:

To hide a decision that you might want to change in future, to hide a difficult mechanism (eg requests vs urllib2)

To spread out the moving parts, so they don't touch each other (but this requires care, it's still easy to have all the methods in different files but yet touching all the same datastructures at runtime)

To pull out some feature into stand alone code.
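
A tiny sketch of that first point, hiding the requests-vs-urllib2 decision behind one seam (the class name is made up):

Python code:
class PageFetcher(object):
    """Callers only ever see fetch(); which HTTP library does the work is a
    decision hidden in one place and easy to change later."""

    def fetch(self, url):
        try:
            import requests
            return requests.get(url).text
        except ImportError:
            import urllib2
            return urllib2.urlopen(url).read()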

The code shouldn't try to be more elegant than the problem it solves. Many problems are ugly, clunky, and clumsy, and it's not unheard of for some numerical computation methods to be pages upon pages of work (see also automation scripting). Sometimes code is best left ugly and rudimentary rather than stylish with a veneer of elegance.

keanu
Jul 27, 2013

Thermopyle posted:

When you're using a library that provides objects whose data comes from an online bank (personal checking/savings), which of the following designs is most appealing to you?

A get method. If you want to implement caching, give the caller a way to clear the cache.

I feel like people discover properties and then want to use them for everything. I know that's exactly what I did when I discovered them (I did the same with decorators). They're definitely neat and very useful in certain situations but they're not a replacement for methods. In my opinion, accessing a property should not have any side effects (i.e. no web requests or cache updating) and the value returned shouldn't change unless the underlying object changes (which, I suppose, falls under the no effects rule). Ideally, you should be able to get away with treating properties like attributes.

If I'm using your library I don't want to be checking the docs every few minutes to make sure the attribute I'm accessing isn't actually a property that's going to go do some expensive IO under the hood. I also don't want the behavior/cost of the operation to change so significantly depending on the state of the underlying object. "Explicit is better than implicit" and all that jazz.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

keanu posted:

A get method. If you want to implement caching, give the caller a way to clear the cache.

I feel like people discover properties and then want to use them for everything. I know that's exactly what I did when I discovered them (I did the same with decorators). They're definitely neat and very useful in certain situations but they're not a replacement for methods. In my opinion, accessing a property should not have any side effects (i.e. no web requests or cache updating) and the value returned shouldn't change unless the underlying object changes (which, I suppose, falls under the no effects rule). Ideally, you should be able to get away with treating properties like attributes.

If I'm using your library I don't want to be checking the docs every few minutes to make sure the attribute I'm accessing isn't actually a property that's going to go do some expensive IO under the hood. I also don't want the behavior/cost of the operation to change so significantly depending on the state of the underlying object. "Explicit is better than implicit" and all that jazz.

Yep. The person I'm working with keeps wanting to make everything a caching, lazy property like I described.

What we compromised on is a base API which implements get_* methods, and then provide some helper classes to allow end users to make their own caching, lazy properties if they want to use them that way.
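
In spirit, something like this (all names hypothetical):

Python code:
class cached_property(object):
    # Non-data descriptor: the wrapped method runs once, its result is stored
    # on the instance, and later lookups never reach the descriptor again.
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        value = obj.__dict__[self.func.__name__] = self.func(obj)
        return value


class Account(object):
    def get_balance(self):
        # base API: explicit method, does the slow HTTP request on every call
        return self._fetch("balance")    # hypothetical HTTP helper

    @cached_property
    def balance(self):
        # opt-in lazy, cached flavour, built on top of the explicit getter
        return self.get_balance()
So acct.get_balance() always goes to the bank, while acct.balance only does on first access.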


BigRedDot
Mar 6, 2008

Stringent posted:

code:
buttes('butte', 'fudge', donge='donge')
# TypeError: buttes() got multiple values for keyword argument 'donge'
That said, do any of you know why this works this way? Is it just so you can assign defaults to positional arguments or is there another reason? Is it just me that sees implicit conversion of a positional argument to a keyword argument as kinda weird?

You have a misunderstanding of function arguments. There is nothing at all about a function definition that makes an argument positional or keyword. There are only arguments, some or all of which may also have default values. The positional/keyword distinction is solely, only, entirely determined by how you pass arguments when you call the function:
code:
# nothing here is positional or keyword yet
def foo(bar, baz):
    pass

foo(1, 2)         # bar, baz supplied as positional args

foo(1, baz=2)     # bar positional, baz keyword

foo(bar=1, baz=2) # bar, baz supplied as keyword args 
Given that understanding, in this case:
code:
buttes('butte', 'fudge', donge='donge')
# TypeError: buttes() got multiple values for keyword argument 'donge'
What other reasonable behaviour is possible?

Edit: FWIW I think this is probably a common confusion, given that the spelling of default arguments in function definitions is basically the same as the spelling of passing keyword arguments in function invocations. But they are two separate, different things.

BigRedDot fucked around with this message at 22:16 on Aug 29, 2015
