SurgicalOntologist
Jun 17, 2004

code:
df.groupby(df.index.dayofyear)
?

SirPablo
May 1, 2004

Pillbug

SurgicalOntologist posted:

code:
df.groupby(df.index.dayofyear)
?

Damnit that does work. Thanks for pointing out my idiocy. Suppose I can take the output values and make a new Dataframe with them.

What about restricting data to a specific window of days in the year? I know I can slice easily by year-mo (for example, df['1990-01':'1999-12']), but can you slice by mo-dd? It'd be nice to get all the data between, say, June 15 and Sep 30, like df['06-15':'09-30'] but for all years. Is that possible?

Cingulate
Oct 23, 2012

by Fluffdaddy

SirPablo posted:

Damnit that does work. Thanks for pointing out my idiocy. Suppose I can take the output values and make a new Dataframe with them.

What about restricting data to a specific window of days in the year? I know I can slice easily by year-mo (for example, df['1990-01':'1999-12']), but can you slice by mo-dd? It'd be nice to get all the data between, say, June 15 and Sep 30, like df['06-15':'09-30'] but for all years. Is that possible?
https://stackoverflow.com/a/16179190 ?
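
For the month-day window question, here is a minimal sketch, assuming the frame has a DatetimeIndex. The sample frame and the column name `value` are made up for illustration. It builds a boolean mask from the index's `month` and `day` attributes, so the same June 15 to Sep 30 window is picked out of every year:

```python
import pandas as pd

# Hypothetical sample data: one value per day over three years.
index = pd.date_range("1990-01-01", "1992-12-31", freq="D")
df = pd.DataFrame({"value": range(len(index))}, index=index)

# Encode each date as month * 100 + day (June 15 -> 615, Sep 30 -> 930)
# so the window becomes a plain numeric comparison that ignores the year.
month_day = df.index.month * 100 + df.index.day
window = df[(month_day >= 615) & (month_day <= 930)]
```

A window that wraps around New Year (say Nov 15 to Feb 15) would need the two comparisons joined with `|` instead of `&`.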

tractor fanatic
Sep 9, 2005

Pillbug
I have a list of lists of words, and a dict of word=>int. For each list, I'm trying to find min(dict[word] for word in list if word in dict). Is there a way to do this in numpy or some other high performance library?

SurgicalOntologist
Jun 17, 2004

I think with pandas you could do

Python code:
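import pandas as pd

word_values = pd.Series(values_dict).sort_index()
for words in word_lists:
    # reindex() yields NaN for words missing from the dict, and min() skips NaN
    list_min = word_values.reindex(words).min()
    ...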
Not sure how much more performant it would be, if any.
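
For comparison, a plain-Python baseline for the same lookup, as a sketch only. The function name `min_known_value` and the sample data are made up. Dict lookups are already hash-based, so this may be hard to beat without measuring first:

```python
def min_known_value(words, values):
    """Return the smallest values[word] over the words present in values, or None."""
    return min((values[w] for w in words if w in values), default=None)


# Hypothetical sample data.
values = {"apple": 3, "pear": 7, "fig": 1}
word_lists = [["apple", "pear", "kiwi"], ["kiwi"], ["fig", "apple"]]

minimums = [min_known_value(words, values) for words in word_lists]
```

The `default=None` argument covers lists where none of the words are in the dict, which would otherwise raise ValueError.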

huhu
Feb 24, 2006
Edit: nevermind.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb
How do I avoid having import statements below my logging config? My IDE tells me all the import statements should be at the top, but I need to set the logging config before some of these libraries are loaded. This is for a little CLI python app.

code:
import logging
import logging.config

logging.config.dictConfig({
    'blah': 'blah'
})

import whateverlib

# rest of my app

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

fletcher posted:

How do I avoid having import statements below my logging config? My IDE tells me all the import statements should be at the top, but I need to set the logging config before some of these libraries are loaded. This is for a little CLI python app.

code:
import logging
import logging.config

logging.config.dictConfig({
    'blah': 'blah'
})

import whateverlib

# rest of my app

Your IDE doesn't always know where import statements should go. Style guides are just guides.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

Thermopyle posted:

Your IDE doesn't always know where import statements should go. Style guides are just guides.

Ah ok, I figured maybe there was some __init__.py thing I was supposed to be doing or something. Thanks for the tip!

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Say i have a routine that searches for and gets a thing. You can search for a thing via a regex, via an int, or via a tuple. Which of the following do you prefer?

one
Python code:
def find_thing(filter):
    # code that type checks 'filter' and does the right thing.
    ...
two
Python code:
def find_thing_regex(regex): ...
def find_thing_int(an_int): ...
def find_thing_tuple(a_tuple): ...
three
Python code:
def find_thing(regex=None, an_int=None, a_tuple=None): ...

Ghost of Reagan Past
Oct 7, 2003

rock and roll fun
One.

Otherwise, users of the function will have to do type-checking themselves and call the right function or pass in the right kwargs, when you could do that burdensome work for them and reduce the possibility of error/proliferation of ways of doing said checking.

Simple is better than complex.

VikingofRock
Aug 24, 2008




Thermopyle posted:

Say i have a routine that searches for and gets a thing. You can search for a thing via a regex, via an int, or via a tuple. Which of the following do you prefer?

one
Python code:
def find_thing(filter):
    # code that type checks 'filter' and does the right thing.
    ...
two
Python code:
def find_thing_regex(regex): ...
def find_thing_int(an_int): ...
def find_thing_tuple(a_tuple): ...
three
Python code:
def find_thing(regex=None, an_int=None, a_tuple=None): ...

Python code:
def find_thing(predicate):

where predicate is a function/callable which takes as input elements of the space you are searching over and which returns true if a match is found.

If you don't want to do that though, I think the first function you posted has the cleanest interface so I would prefer that.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

I don't know why I didn't even think of using a predicate, since most of my time is spent doing JS nowadays and that's a very common pattern there. Thanks for reminding me.

Now I accept a predicate, or build a predicate for you if you provide the other types I mentioned above.
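
A rough sketch of what "accept a predicate, or build one" could look like. The helper name `to_predicate`, the sample list, and the matching rules (regex search on strings, equality for ints and tuples) are all assumptions, since the thread never says what matching on an int or tuple actually means:

```python
import re


def to_predicate(criterion):
    """Turn a callable, compiled regex, int, or tuple into a predicate."""
    if callable(criterion):
        return criterion
    if isinstance(criterion, re.Pattern):
        # Assumed rule: a regex matches string elements it can be found in.
        return lambda item: isinstance(item, str) and criterion.search(item) is not None
    if isinstance(criterion, (int, tuple)):
        # Assumed rule: ints and tuples match by equality.
        return lambda item: item == criterion
    raise TypeError(f"unsupported criterion: {criterion!r}")


def find_thing(things, criterion):
    """Return the first element matching the criterion, or None."""
    predicate = to_predicate(criterion)
    return next((thing for thing in things if predicate(thing)), None)


things = ["spam", 42, (1, 2), "eggs"]
```

Callers who need something fancier pass their own function, e.g. `find_thing(things, lambda t: isinstance(t, str))`, and the type checks stay in one small helper.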

Eela6
May 25, 2007
Shredded Hen

Thermopyle posted:

Say i have a routine that searches for and gets a thing. You can search for a thing via a regex, via an int, or via a tuple. Which of the following do you prefer?

one
Python code:
def find_thing(filter):
    # code that type checks 'filter' and does the right thing.
    ...
two
Python code:
def find_thing_regex(regex): ...
def find_thing_int(an_int): ...
def find_thing_tuple(a_tuple): ...
three
Python code:
def find_thing(regex=None, an_int=None, a_tuple=None): ...

I would combine the first two approaches. The first approach is easiest for the user, the second is easiest for the developer.
code:
import re

def find_thing(filter):
    # The _find_thing_* helpers stand for the type-specific searches.
    if isinstance(filter, re.Pattern):  # compiled regex (re.Pattern needs Python 3.7+)
        return _find_thing_regex(filter)
    elif isinstance(filter, int):
        return _find_thing_int(filter)
    elif isinstance(filter, tuple):
        return _find_thing_tuple(filter)
    else:
        raise TypeError(f'filter {filter!r} should be a regex, int, or tuple, but is a {type(filter)}')


Ed: A predicate is a fine approach.

VikingofRock
Aug 24, 2008




Thermopyle posted:

I don't know why I didn't even think of using a predicate, since most of my time is spent doing JS nowadays and that's a very common pattern there. Thanks for reminding me.

Now I accept a predicate, or build a predicate for you if you provide the other types I mentioned above.

Nice, that sounds like an excellent, convenient interface.

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Might be a bit complicated if you're using predicate functions, but singledispatch is an option too

Python code:
from functools import singledispatch
import re

@singledispatch
def find_thing(arg):
    print('finding a whatever this is! Or not!')

@find_thing.register(int)
def _(arg):
    print('finding an int!')

@find_thing.register(tuple)
def _(arg):
    print("it's a tuple!")

@find_thing.register(re.Pattern)  # re._pattern_type on Python < 3.7
def _(arg):
    print("it's a regex! You monster")

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Well gently caress, how did I not remember single dispatch?

I'm happy enough with what I've got going now, but I certainly agree with PEP 443, which introduces single dispatch, where it says:

quote:

In addition, it is currently a common anti-pattern for Python code to inspect the types of received arguments, in order to decide what to do with the objects.

For example, code may wish to accept either an object of some type, or a sequence of objects of that type. Currently, the "obvious way" to do this is by type inspection, but this is brittle and closed to extension.

That's exactly the reason I always get the heebie-jeebies when I see a function body starting out with a bunch of isinstance checks.

Thermopyle fucked around with this message at 19:02 on Aug 6, 2017

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

There's another pattern I've seen (forget the name) where you have a list of handlers and you just fire your arguments at them until one of them goes 'yeah I've got this'. That way all the type checking goes on in the handler classes and only for the types they handle, and adding more cases just means writing a handler and adding it to your list

Feels a bit Java but I saw it on Python examples sooo :shrug:
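
That description sounds like what is usually called chain of responsibility. A minimal sketch under that assumption; the handler class names and the strings they return are invented for illustration:

```python
import re


class IntHandler:
    def handle(self, arg):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(arg, int) and not isinstance(arg, bool):
            return f"int search for {arg}"
        return None  # not mine, let the next handler try


class TupleHandler:
    def handle(self, arg):
        if isinstance(arg, tuple):
            return f"tuple search for {arg}"
        return None


class RegexHandler:
    def handle(self, arg):
        if isinstance(arg, re.Pattern):
            return f"regex search for {arg.pattern}"
        return None


# Supporting a new case means writing a handler and appending it here.
HANDLERS = [IntHandler(), TupleHandler(), RegexHandler()]


def find_thing(arg):
    """Offer arg to each handler in turn and return the first result."""
    for handler in HANDLERS:
        result = handler.handle(arg)
        if result is not None:
            return result
    raise TypeError(f"no handler accepts {arg!r}")
```

Using None as the "not mine" signal is a simplification; it only works while None is never a legitimate result.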

Eela6
May 25, 2007
Shredded Hen
I completely forgot about singledispatch. I don't think I've ever used it, but maybe I should have.

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...
A Flask question, maybe just a style one.

I'm tidying up a Flask web-service and was just going to stick in the footer:

- The current number of records ("Currently storing 323,000 foos!")
- A link to the latest news item (which are also stored in the flask db)

But it's not kosher to stick this sort of logic in footer template. And it's another two queries for each page. And there's no need to call (and recall) these two queries for every page, as they don't have to be precisely up to date.

So what's the neatest way to do this?

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

outlier posted:

A Flask question, maybe just a style one.

I'm tidying up a Flask web-service and was just going to stick in the footer:

- The current number of records ("Currently storing 323,000 foos!")
- A link to the latest news item (which are also stored in the flask db)

But it's not kosher to stick this sort of logic in footer template. And it's another two queries for each page. And there's no need to call (and recall) these two queries for every page, as they don't have to be precisely up to date.

So what's the neatest way to do this?

Cache the database results somewhere (memcache). Update the cached values when a new news item is posted, and every X period of time depending on how accurate you want the number of records to be.

accipter
Sep 12, 2003

Eela6 posted:

I completely forgot about singledispatch. I don't think I've ever used it, but maybe I should have.

I feel like I need to review functools to re-learn all of its cool features.

Tigren
Oct 3, 2003

outlier posted:

A Flask question, maybe just a style one.

I'm tidying up a Flask web-service and was just going to stick in the footer:

- The current number of records ("Currently storing 323,000 foos!")
- A link to the latest news item (which are also stored in the flask db)

But it's not kosher to stick this sort of logic in footer template. And it's another two queries for each page. And there's no need to call (and recall) these two queries for every page, as they don't have to be precisely up to date.

So what's the neatest way to do this?

Like Thermopyle said, you want to use a cache for this. Depending on what kind of traffic your Flask app will see, you can use either memcached or Werkzeug's built-in SimpleCache object.
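
To make the caching idea concrete, here is a stdlib-only sketch of a time-based cache. It is not the memcached or SimpleCache API; the class `TTLCache`, the key name, and the fake `count_foos` query are all made up for illustration:

```python
import time


class TTLCache:
    """Tiny in-process cache: recompute a value once it is older than ttl seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh, skip the query
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key, e.g. right after a new news item is posted."""
        self._store.pop(key, None)


# Hypothetical stand-in for the real database query.
calls = []


def count_foos():
    calls.append(1)
    return 323000


cache = TTLCache(ttl=300)
first = cache.get_or_compute("foo_count", count_foos)
second = cache.get_or_compute("foo_count", count_foos)  # served from the cache
```

In Flask, the cached values could then be handed to every template from a function registered with `app.context_processor`, which keeps the query logic out of the footer template. A per-process dict like this only helps a single worker; with several workers a shared store such as memcached is the better fit.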

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

Tigren posted:

Like Thermopyle said, you want to use a cache for this. Depending on what kind of traffic your Flask app will see, you can use either memcached or Werkzeug's built-in SimpleCache object.

Noice. Thanks both.

Jose
Jul 24, 2007

Adrian Chiles is a broadcaster and writer
i've started playing around with python for work mostly so i can make charts that can be tweeted out quickly or whatever. All i really want to do is import csv's so i can start using matplotlib stuff with them but finding an easy guide of doing this is really annoying me. I've spent a while playing with anaconda and i can import the csv using a couple of methods but they've all added stuff to the output. I'm just after either a decent guide using unicodecsv which seems to have replaced the csv package or the short bit of code that will get me an array or whatever where the first row of the csv is saved as column headers so I can stick things onto an x and y axis.

I only started doing the data camp courses that were free on friday so far

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

Jose posted:

i've started playing around with python for work mostly so i can make charts that can be tweeted out quickly or whatever. All i really want to do is import csv's so i can start using matplotlib stuff with them but finding an easy guide of doing this is really annoying me. I've spent a while playing with anaconda and i can import the csv using a couple of methods but they've all added stuff to the output. I'm just after either a decent guide using unicodecsv which seems to have replaced the csv package or the short bit of code that will get me an array or whatever where the first row of the csv is saved as column headers so I can stick things onto an x and y axis.

I only started doing the data camp courses that were free on friday so far

You really want csv.DictReader out of the standard library for this. If you're stuck on Python 2 and there's unicode in your CSVs, unicodecsv has a DictReader equivalent; on Python 3 the standard csv module reads unicode text fine on its own.

Cingulate
Oct 23, 2012

by Fluffdaddy
If it's tabular data, just use pandas read_csv (and make plots in seaboard).

Eela6
May 25, 2007
Shredded Hen

Jose posted:

i've started playing around with python for work mostly so i can make charts that can be tweeted out quickly or whatever. All i really want to do is import csv's so i can start using matplotlib stuff with them but finding an easy guide of doing this is really annoying me. I've spent a while playing with anaconda and i can import the csv using a couple of methods but they've all added stuff to the output. I'm just after either a decent guide using unicodecsv which seems to have replaced the csv package or the short bit of code that will get me an array or whatever where the first row of the csv is saved as column headers so I can stick things onto an x and y axis.

I only started doing the data camp courses that were free on friday so far

The csv module in the python standard library will work just fine for you. Here's an example to get you started.

Suppose we have a CSV of wedding guests that contains their names, their relationship to the bride and groom, and their food preferences.

Here's our CSV:
code:
name,relationship,food_preferences
aaron,friend,kosher
amy,family,
steven,friend,vegetarian
It's saved as guests.csv

We want to process the list to find out which guests need what food. We can do so as follows:

IN:
Python code:
import csv

needs_kosher, needs_vegetarian = [], []
# newline='' is what the csv docs recommend when opening a file for the reader
with open('guests.csv', newline='') as guests_file:
    for guest in csv.DictReader(guests_file):
        # vegetarian food is always kosher, so check for vegetarian first
        if 'vegetarian' in guest['food_preferences']:
            needs_vegetarian.append(guest['name'])
        elif 'kosher' in guest['food_preferences']:
            needs_kosher.append(guest['name'])

print(f'needs_kosher: {needs_kosher}')
print(f'needs_vegetarian: {needs_vegetarian}')
OUT:
code:
needs_kosher: ['aaron']
needs_vegetarian: ['steven']

Nippashish
Nov 2, 2005

Let me see you dance!

Cingulate posted:

If it's tabular data, just use pandas read_csv (and make plots in seaboard).

Seriously, do this. Do not even consider the csv reader in the stdlib. Pandas is infinitely superior.

a witch
Jan 12, 2017

I don't think you need to drag along a bunch of fortran libraries (via numpy and scipy) to parse csv files.

Nippashish
Nov 2, 2005

Let me see you dance!
You should though, especially since the next step is "make charts".

Cingulate
Oct 23, 2012

by Fluffdaddy
Situations where I can see the hand-crafted option make sense:
- you're analysing data and tweeting on an embedded system running python 2.6 on 128MB RAM
- you want to Learn Python the Hard and Manly Way

Situations where I'd prefer the Pandas option:
- you want to get things done and learn the tools you're actually gonna use in real situations

Anything I forgot? :v:

Jose, in case "seaboard" confused you, I apologise: install anaconda and go here.

Eela6
May 25, 2007
Shredded Hen

Nippashish posted:

Seriously, do this. Do not even consider the csv reader in the stdlib. Pandas is infinitely superior.

I don't feel that way. I don't particularly like pandas, and I prefer using the stdlib where possible. You have to consider the audience of your code, too. Jose is very new to python; the last thing he needs is a thousand different APIs to understand.

Cingulate posted:

Situations where I can see the hand-crafted option make sense:
- you're analysing data and tweeting on an embedded system running python 2.6 on 128MB RAM
- you want to Learn Python the Hard and Manly Way

I am no fan of masochism. I just don't like solutions that are 'well, first install this dependency' for simple problems. I think you should use the level of tool that's appropriate for your problem.

If I'm using CSVs, I'm probably going to be doing it by hand. If I have a big dataset that requires the big guns,

1. I'm going to use xarray , not pandas
2. Why the hell am I using CSVs?

Eela6 fucked around with this message at 20:24 on Aug 14, 2017

Cingulate
Oct 23, 2012

by Fluffdaddy

Eela6 posted:

I don't feel that way. I don't particularly like pandas, and I prefer using the stdlib where possible. You have to consider the audience of your code, too. Jose is very new to python; the last thing he needs is a thousand different APIs to understand.
But if you work with data and want to plot it, Pandas is the API you'll be learning in the end anyways. For data stuff, Pandas is more standardlib than the standardlib!

Nippashish
Nov 2, 2005

Let me see you dance!

Eela6 posted:

I don't feel that way. I don't particularly like pandas, and I prefer using the stdlib where possible. You have to consider the audience of your code, too. Jose is very new to python; the last thing he needs is a thousand different APIs to understand.

The audience is someone who wants to load data from csvs to produce charts. Someone who is already using matplotlib and is enrolled in something called "data camp".

Pandas is a cornerstone of the python data science toolchain. Avoiding it is pretty much the data science equivalent of rolling your own web server.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Probably, most people who need to read csv's don't need pandas, but it certainly seems like this person could use it.

vikingstrike
Sep 23, 2007

whats happening, captain
I also bet he's going to need to do manipulation/cleaning of the data before plotting which will be much easier in pandas, but sure reinvent the wheel.

Cingulate
Oct 23, 2012

by Fluffdaddy

Eela6 posted:

I don't feel that way. I don't particularly like pandas, and I prefer using the stdlib where possible. You have to consider the audience of your code, too. Jose is very new to python; the last thing he needs is a thousand different APIs to understand.


I am no fan of masochism. I just don't like solutions that are 'well, first install this dependency' for simple problems. I think you should use the level of tool that's appropriate for your problem.

If I'm using CSVs, I'm probably going to be doing it by hand. If I have a big dataset that requires the big guns,

1. I'm going to use xarray , not pandas
2. Why the hell am I using CSVs?
The recommendation for using Python for data sciency stuff is to install Anaconda, which carries pandas.

(You're probably using CSVs/pandas because 1. the party/platform you got the data from uses CSVs/stuff pandas can easily parse.)

I don't think this is about "big guns". It's just so much more handy to use pandas and the standard scientific Python stack even for tiny data.

It seems like 1/3rd of questions ITT are solved by SurgicalOntologist posting a pandas one-liner.

Eela6
May 25, 2007
Shredded Hen

Thermopyle posted:

Probably, most people who need to read csv's don't need pandas, but it certainly seems like this person could use it.

Fair enough.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Speaking of Pandas, I run into trouble when I need to create additional columns that are filled based on other column criteria. For example, if I have a CSV of:
code:
name,party_size,ticket_price
john,3,$14
sarah,1,$20
phil,6,$11
After I read that into Pandas, I then want to add two more columns. First column "More_Than_One" is Y/N based on party size being greater than 1. Next column is "Total_Cost" which is party_size * ticket_price.

How would I do something like that?
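
One way this could look, as a sketch rather than the only answer. It assumes the prices always arrive as strings with a leading "$", and it reads the sample rows from an in-memory string instead of a file:

```python
import io

import pandas as pd

csv_text = """name,party_size,ticket_price
john,3,$14
sarah,1,$20
phil,6,$11
"""

df = pd.read_csv(io.StringIO(csv_text))

# Strip the leading "$" so the price can be used in arithmetic.
price = df["ticket_price"].str.lstrip("$").astype(float)

# Column-wise comparison gives True/False; map those onto "Y"/"N" labels.
df["More_Than_One"] = (df["party_size"] > 1).map({True: "Y", False: "N"})

# Column-wise multiplication, no explicit loop over rows needed.
df["Total_Cost"] = df["party_size"] * price
```

The general pattern is that an expression over whole columns produces a new column, which can be assigned under a new name.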
