Python

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Python

SurgicalOntologist: Jun 17, 2004

code:

df.groupby(df.index.dayofyear)

# ? Aug 2, 2017 05:49

SirPablo: May 1, 2004; Pillbug

SurgicalOntologist posted:

code:

df.groupby(df.index.dayofyear)

Damnit that does work. Thanks for pointing out my idiocy. Suppose I can take the output values and make a new Dataframe with them.
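
For reference, a minimal sketch of turning that groupby output into a new DataFrame might look like the following. The `value` column and the date range are invented for illustration, and `mean()` is only one possible aggregation.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two non-leap years of daily values.
idx = pd.date_range("2001-01-01", "2002-12-31", freq="D")
df = pd.DataFrame({"value": np.arange(len(idx), dtype=float)}, index=idx)

# One row per day of year (1..365), averaged across all years.
by_day = df.groupby(df.index.dayofyear).mean()

# The aggregation result is already a DataFrame; name its index for clarity.
by_day.index.name = "dayofyear"
```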

What about restricting data to a specific window of days in the year? I know I can slice easily by year-mo (for example, df['1990-01':'1999-12']), but can you slice by mo-dd? It'd be nice to get all the data between, say, June 15 and Sep 30, like df['06-15':'09-30'] but for all years. Is that possible?

# ? Aug 2, 2017 08:24

Cingulate: Oct 23, 2012; by Fluffdaddy

SirPablo posted:

Damnit that does work. Thanks for pointing out my idiocy. Suppose I can take the output values and make a new Dataframe with them.

What about restricting data to a specific window of days in the year? I know I can slice easily by year-mo (for example, df['1990-01':'1999-12']), but can you slice by mo-dd? It'd be nice to get all the data between, say, June 15 and Sep 30, like df['06-15':'09-30'] but for all years. Is that possible?

https://stackoverflow.com/a/16179190 ?
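
One possible approach, sketched here on the assumption that `df` has a DatetimeIndex (the data below is invented), is to build a boolean mask from the index's month and day instead of slicing by string:

```python
import numpy as np
import pandas as pd

# Hypothetical data: three years of daily values.
idx = pd.date_range("1990-01-01", "1992-12-31", freq="D")
df = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)

# month * 100 + day gives a sortable integer, e.g. 615 for June 15.
mmdd = df.index.month * 100 + df.index.day

# Keep June 15 through September 30 of every year.
window = df[(mmdd >= 615) & (mmdd <= 930)]
```

For a window that wraps around the new year (say November 15 to February 15), the two comparisons would be combined with `|` instead of `&`.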

# ? Aug 2, 2017 17:25

tractor fanatic: Sep 9, 2005; Pillbug

I have a list of lists of words, and a dict of word=>int. For each list, I'm trying to find min(dict[word] for word in list if word in dict). Is there a way to do this in numpy or some other high performance library?

# ? Aug 3, 2017 04:20

SurgicalOntologist: Jun 17, 2004

I think with pandas you could do

Python code:

word_values = pd.Series(values_dict).sort_index()
for words in word_lists:
    list_min = word_values[words].min()
    ...

Not sure how much more performant it would be, if any.
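
One caveat: recent pandas versions raise a KeyError when a list of labels contains words that are not in the index. A hedged variant that tolerates unknown words uses `reindex`, shown here next to a plain-Python baseline. The dictionary and word lists are made up for illustration.

```python
import pandas as pd

# Hypothetical inputs.
values_dict = {"apple": 3, "pear": 1, "fig": 7}
word_lists = [["apple", "fig", "kiwi"], ["pear", "apple"], ["kiwi"]]

word_values = pd.Series(values_dict)

# reindex() yields NaN for unknown words, and min() skips NaN by default.
pandas_mins = [word_values.reindex(words).min() for words in word_lists]

# Plain-Python baseline; default=None covers lists with no known words.
python_mins = [
    min((values_dict[w] for w in words if w in values_dict), default=None)
    for words in word_lists
]
```

Timing both on real data is the only way to know which is faster; for short lists the plain-Python version may well win.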

# ? Aug 3, 2017 04:42

huhu: Feb 24, 2006

Edit: nevermind.

# ? Aug 3, 2017 17:36

fletcher: Jun 27, 2003; ken park is my favorite movie; Cybernetic Crumb

How do I avoid having import statements below my logging config? My IDE tells me all the import statements should be at the top, but I need to set the logging config before some of these libraries are loaded. This is for a little CLI python app.

code:

import logging
import logging.config

logging.config.dictConfig({
    'blah': 'blah'
})

import whateverlib

# rest of my app

# ? Aug 4, 2017 00:26

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

fletcher posted:

How do I avoid having import statements below my logging config? My IDE tells me all the import statements should be at the top, but I need to set the logging config before some of these libraries are loaded. This is for a little CLI python app.
code:
import logging
import logging.config

logging.config.dictConfig({
    'blah': 'blah'
})

import whateverlib

# rest of my app

Your IDE doesn't always know where import statements should go. Style guides are just guides.
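
If the warning is bothersome anyway, one possible arrangement (a sketch, not the only option) is to configure logging inside a function and defer the sensitive import until after it has run. Here `json` is only a stand-in for `whateverlib`, and the dictConfig contents are a minimal made-up example.

```python
import logging
import logging.config


def configure_logging():
    # Minimal, made-up config; a real app would supply its own dict.
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "handlers": {"console": {"class": "logging.StreamHandler"}},
        "root": {"handlers": ["console"], "level": "INFO"},
    })


def main():
    configure_logging()
    # Deferred import: the library loads only after logging is configured.
    import json as whateverlib  # stand-in for the real library
    logging.getLogger(__name__).info("loaded %s", whateverlib.__name__)
    return whateverlib
```

Alternatively, the late import can stay at module level with a linter suppression; flake8, for example, reports it as E402 and honours `# noqa: E402` on that line.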

# ? Aug 4, 2017 00:28

fletcher: Jun 27, 2003; ken park is my favorite movie; Cybernetic Crumb

Thermopyle posted:

Your IDE doesn't always know where import statements should go. Style guides are just guides.

Ah ok, I figured maybe there was some __init__.py thing I was supposed to be doing or something. Thanks for the tip!

# ? Aug 4, 2017 00:32

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

Say I have a routine that searches for and gets a thing. You can search for a thing via a regex, via an int, or via a tuple. Which of the following do you prefer?

one

Python code:

 
def find_thing(filter):
  # code that type checks 'filter' and does the right thing.

two

Python code:

 
def find_thing_regex(regex):
def find_thing_int(an_int):
def find_thing_tuple(a_tuple):

three

Python code:

def find_thing(regex=None, an_int=None, a_tuple=None):

# ? Aug 5, 2017 19:00

Ghost of Reagan Past: Oct 7, 2003; rock and roll fun

One.

Otherwise, users of the function will have to do type-checking themselves and call the right function or pass in the right kwargs, when you could do that burdensome work for them and reduce the possibility of error/proliferation of ways of doing said checking.

Simple is better than complex.

# ? Aug 5, 2017 19:23

VikingofRock: Aug 24, 2008

Thermopyle posted:

Say I have a routine that searches for and gets a thing. You can search for a thing via a regex, via an int, or via a tuple. Which of the following do you prefer?

one
Python code:
 
def find_thing(filter):
  # code that type checks 'filter' and does the right thing.
two
Python code:
 
def find_thing_regex(regex):
def find_thing_int(an_int):
def find_thing_tuple(a_tuple):
three
Python code:
def find_thing(regex=None, an_int=None, a_tuple=None):

Python code:

def find_thing(predicate):

where predicate is a function/callable which takes as input elements of the space you are searching over and which returns true if a match is found.

If you don't want to do that though, I think the first function you posted has the cleanest interface so I would prefer that.

# ? Aug 5, 2017 19:28

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

I don't know why I didn't even think of using a predicate, since most of my time is spent doing JS nowadays and that's a very common pattern there. Thanks for reminding me.

Now I accept a predicate, or build a predicate for you if you provide the other types I mentioned above.
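
A rough sketch of that kind of interface, assuming Python 3.8+ for `re.Pattern` and with `make_predicate`, `items`, and the matching rules all invented for illustration, might look like:

```python
import re


def make_predicate(spec):
    """Turn a callable, compiled regex, int, or tuple into a predicate."""
    if callable(spec):
        return spec
    if isinstance(spec, re.Pattern):
        return lambda item: isinstance(item, str) and spec.search(item) is not None
    if isinstance(spec, (int, tuple)):
        return lambda item: item == spec
    raise TypeError(f"cannot build a predicate from {type(spec).__name__}")


def find_thing(items, spec):
    predicate = make_predicate(spec)
    # First matching item, or None when nothing matches.
    return next((item for item in items if predicate(item)), None)


things = ["alpha", 42, (1, 2), "beta"]
```

The type checks still exist, but they are confined to one small helper, and callers who need something unusual can pass their own callable.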

# ? Aug 6, 2017 00:16

Eela6: May 25, 2007; Shredded Hen

Thermopyle posted:

Say I have a routine that searches for and gets a thing. You can search for a thing via a regex, via an int, or via a tuple. Which of the following do you prefer?

one
Python code:
 
def find_thing(filter):
  # code that type checks 'filter' and does the right thing.
two
Python code:
 
def find_thing_regex(regex):
def find_thing_int(an_int):
def find_thing_tuple(a_tuple):
three
Python code:
def find_thing(regex=None, an_int=None, a_tuple=None):

I would combine the first two approaches. The first approach is easiest for the user, the second is easiest for the developer.

code:

def find_thing(filter):
    # is_regex and the _find_thing_* helpers are assumed to be defined elsewhere.
    if is_regex(filter):
        return _find_thing_regex(filter)
    elif isinstance(filter, int):
        return _find_thing_int(filter)
    elif isinstance(filter, tuple):
        return _find_thing_tuple(filter)
    else:
        raise TypeError(f'filter {filter!r} should be a regex, int, or tuple, but is a {type(filter)}')

Ed: A predicate is a fine approach.

# ? Aug 6, 2017 03:53

VikingofRock: Aug 24, 2008

Thermopyle posted:

I don't know why I didn't even think of using a predicate, since most of my time is spent doing JS nowadays and that's a very common pattern there. Thanks for reminding me.

Now I accept a predicate, or build a predicate for you if you provide the other types I mentioned above.

Nice, that sounds like an excellent, convenient interface.

# ? Aug 6, 2017 06:19

baka kaba: Jul 19, 2003; PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Might be a bit complicated if you're using predicate functions, but singledispatch is an option too

Python code:

from functools import singledispatch
import re

@singledispatch
def find_thing(arg):
    print('finding a whatever this is! Or not!')

@find_thing.register(int)
def _(arg):
    print('finding an int!')

@find_thing.register(tuple)
def _(arg):
    print("it's a tuple!")

@find_thing.register(re.Pattern)  # re._pattern_type on Python 3.6 and earlier
def _(arg):
    print("it's a regex! You monster")

# ? Aug 6, 2017 16:12

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

Well gently caress, how did I not remember single dispatch?

I'm happy enough with what I've got going now, but I certainly agree with PEP 443, which introduces single dispatch, where it says:

quote:

In addition, it is currently a common anti-pattern for Python code to inspect the types of received arguments, in order to decide what to do with the objects.

For example, code may wish to accept either an object of some type, or a sequence of objects of that type. Currently, the "obvious way" to do this is by type inspection, but this is brittle and closed to extension.

That's exactly the reason I always get the heebie-jeebies when I see a function body starting out with a bunch of isinstance checks.
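
To illustrate the "closed to extension" point, here is a small sketch (the `describe` function and its return strings are invented) of how a singledispatch function can gain a new case later without its original body being edited:

```python
from functools import singledispatch


@singledispatch
def describe(arg):
    # Fallback for types nobody has registered.
    return "something else"


@describe.register(int)
def _(arg):
    return "an int"


# Later, possibly in another module, a new case is added
# without touching describe() itself.
@describe.register(tuple)
def _(arg):
    return "a tuple"
```

Dispatch follows the class hierarchy, so a `bool` argument lands on the `int` implementation, much as it would pass an `isinstance(arg, int)` check.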

Thermopyle fucked around with this message at 19:02 on Aug 6, 2017

# ? Aug 6, 2017 18:59

baka kaba: Jul 19, 2003; PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

There's another pattern I've seen (forget the name) where you have a list of handlers and you just fire your arguments at them until one of them goes 'yeah I've got this'. That way all the type checking goes on in the handler classes and only for the types they handle, and adding more cases just means writing a handler and adding it to your list

Feels a bit Java but I saw it on Python examples sooo :shrug:
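
This sounds like what is usually called chain of responsibility. A bare-bones sketch, with handler names and return values made up for illustration, could be:

```python
class IntHandler:
    def can_handle(self, arg):
        return isinstance(arg, int)

    def handle(self, arg):
        return f"int {arg}"


class TupleHandler:
    def can_handle(self, arg):
        return isinstance(arg, tuple)

    def handle(self, arg):
        return f"tuple of {len(arg)}"


# Adding a case means writing a handler and appending it here.
HANDLERS = [IntHandler(), TupleHandler()]


def find_thing(arg):
    # Offer the argument to each handler until one accepts it.
    for handler in HANDLERS:
        if handler.can_handle(arg):
            return handler.handle(arg)
    raise TypeError(f"no handler for {type(arg).__name__}")
```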

# ? Aug 6, 2017 19:14

Eela6: May 25, 2007; Shredded Hen

I completely forgot about singledispatch. I don't think I've ever used it, but maybe I should have.

# ? Aug 6, 2017 21:15

nonathlon: Jul 9, 2004; And yet, somehow, now it's my fault ...

A Flask question, maybe just a style one.

I'm tidying up a Flask web-service and was just going to stick in the footer:

- The current number of records ("Currently storing 323,000 foos!")
- A link to the latest news item (which are also stored in the flask db)

But it's not kosher to stick this sort of logic in footer template. And it's another two queries for each page. And there's no need to call (and recall) these two queries for every page, as they don't have to be precisely up to date.

So what's the neatest way to do this?

# ? Aug 7, 2017 15:34

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

outlier posted:

A Flask question, maybe just a style one.

I'm tidying up a Flask web-service and was just going to stick in the footer:

- The current number of records ("Currently storing 323,000 foos!")
- A link to the latest news item (which are also stored in the flask db)

But it's not kosher to stick this sort of logic in footer template. And it's another two queries for each page. And there's no need to call (and recall) these two queries for every page, as they don't have to be precisely up to date.

So what's the neatest way to do this?

Cache the database results somewhere (memcache). Update the cached values when a new news item is posted and every X period of time, depending on how accurate you want the number of records to be.
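
As a rough illustration of the idea without any external service, here is an in-process sketch with time-based expiry plus explicit invalidation. `TTLCache`, `count_foos`, and the 300-second lifetime are all assumptions for the example, not part of Flask or memcached.

```python
import time


class TTLCache:
    """Tiny in-process cache; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = compute()  # e.g. the expensive database query
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key):
        # Call this when a new news item is posted.
        self._entries.pop(key, None)


footer_cache = TTLCache(ttl_seconds=300)


def count_foos():
    # Hypothetical stand-in for the real record-count query.
    return 323000


record_count = footer_cache.get("record_count", count_foos)
```

A cache like this lives in one process only, so an app served by several worker processes would hold one copy per worker; a shared store such as memcached avoids that.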

# ? Aug 7, 2017 16:02

accipter: Sep 12, 2003

Eela6 posted:

I completely forgot about singledispatch. I don't think I've ever used it, but maybe I should have.

I feel like I need to review functools to re-learn all of its cool features.

# ? Aug 7, 2017 16:05

Tigren: Oct 3, 2003

outlier posted:

A Flask question, maybe just a style one.

I'm tidying up a Flask web-service and was just going to stick in the footer:

- The current number of records ("Currently storing 323,000 foos!")
- A link to the latest news item (which are also stored in the flask db)

But it's not kosher to stick this sort of logic in footer template. And it's another two queries for each page. And there's no need to call (and recall) these two queries for every page, as they don't have to be precisely up to date.

So what's the neatest way to do this?

Like Thermopyle said, you want to use a cache for this. Depending on what kind of traffic your Flask app will see, you can either use memcached or Werkzeug has a built in SimpleCache object.

# ? Aug 7, 2017 16:17

nonathlon: Jul 9, 2004; And yet, somehow, now it's my fault ...

Tigren posted:

Like Thermopyle said, you want to use a cache for this. Depending on what kind of traffic your Flask app will see, you can either use memcached or Werkzeug has a built in SimpleCache object.

Noice. Thanks both.

# ? Aug 7, 2017 17:25

Jose: Jul 24, 2007; Adrian Chiles is a broadcaster and writer

I've started playing around with Python for work, mostly so I can make charts that can be tweeted out quickly or whatever. All I really want to do is import CSVs so I can start using matplotlib with them, but finding an easy guide to doing this is really annoying me. I've spent a while playing with Anaconda and I can import the CSV using a couple of methods, but they've all added stuff to the output. I'm just after either a decent guide to using unicodecsv, which seems to have replaced the csv package, or the short bit of code that will get me an array or whatever where the first row of the CSV is saved as column headers so I can stick things onto an x and y axis.

I only started doing the data camp courses that were free on friday so far

# ? Aug 14, 2017 15:37

nonathlon: Jul 9, 2004; And yet, somehow, now it's my fault ...

Jose posted:

I've started playing around with Python for work, mostly so I can make charts that can be tweeted out quickly or whatever. All I really want to do is import CSVs so I can start using matplotlib with them, but finding an easy guide to doing this is really annoying me. I've spent a while playing with Anaconda and I can import the CSV using a couple of methods, but they've all added stuff to the output. I'm just after either a decent guide to using unicodecsv, which seems to have replaced the csv package, or the short bit of code that will get me an array or whatever where the first row of the CSV is saved as column headers so I can stick things onto an x and y axis.

I only started doing the data camp courses that were free on friday so far

You really want csv.DictReader out of the standard library for this. Unless there's unicode in your CSVs in which case unicodecsv should have a DictReader equivalent.

# ? Aug 14, 2017 15:55

Cingulate: Oct 23, 2012; by Fluffdaddy

If it's tabular data, just use pandas read_csv (and make plots in seaboard).
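
A minimal sketch of what that looks like, with an invented two-column CSV read from a string so the example is self-contained (a real file path would go in the same place):

```python
import io

import pandas as pd

# Hypothetical CSV contents; the first row becomes the column headers.
csv_text = "year,visitors\n2014,120\n2015,150\n2016,180\n"

df = pd.read_csv(io.StringIO(csv_text))

# Columns are addressed by header name, ready to use as x and y.
x = df["year"]
y = df["visitors"]
```

From there, `x` and `y` can be handed straight to a plotting call such as matplotlib's `plt.plot(x, y)`.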

# ? Aug 14, 2017 16:08

Eela6: May 25, 2007; Shredded Hen

Jose posted:

I've started playing around with Python for work, mostly so I can make charts that can be tweeted out quickly or whatever. All I really want to do is import CSVs so I can start using matplotlib with them, but finding an easy guide to doing this is really annoying me. I've spent a while playing with Anaconda and I can import the CSV using a couple of methods, but they've all added stuff to the output. I'm just after either a decent guide to using unicodecsv, which seems to have replaced the csv package, or the short bit of code that will get me an array or whatever where the first row of the CSV is saved as column headers so I can stick things onto an x and y axis.

I only started doing the data camp courses that were free on friday so far

The csv module in the python standard library will work just fine for you. Here's an example to get you started.

Suppose we have a CSV of wedding guests, that contain their name, their relationship to the bride and groom, and their food preferences.

Here's our CSV:

code:

name,relationship,food_preferences
aaron,friend,kosher
amy,family,
steven,friend,vegetarian

It's saved as guests.csv

We want to process the list to find out which guests need what food. We can do so as follows:

IN:

Python code:

import csv

needs_kosher, needs_vegetarian = [], []
with open('guests.csv') as guests_file:
    for guest in csv.DictReader(guests_file):
        if 'vegetarian' in guest['food_preferences']:
            needs_vegetarian.append(guest['name'])
        elif 'kosher' in guest['food_preferences']:
            needs_kosher.append(guest['name'])
        # vegetarian food is always kosher


print(f'needs_kosher: {needs_kosher}')
print(f'needs_vegetarian: {needs_vegetarian}')

OUT:

code:

needs_kosher: ['aaron']
needs_vegetarian: ['steven']

# ? Aug 14, 2017 17:39

Nippashish: Nov 2, 2005; Let me see you dance!

Cingulate posted:

If it's tabular data, just use pandas read_csv (and make plots in seaboard).

Seriously, do this. Do not even consider the csv reader in the stdlib. Pandas is infinitely superior.

# ? Aug 14, 2017 19:53

a witch: Jan 12, 2017

I don't think you need to drag along a bunch of fortran libraries (via numpy and scipy) to parse csv files.

# ? Aug 14, 2017 19:57

Nippashish: Nov 2, 2005; Let me see you dance!

You should though, especially since the next step is "make charts".

# ? Aug 14, 2017 19:58

Cingulate: Oct 23, 2012; by Fluffdaddy

Situations where I can see the hand-crafted option make sense:
- you're analysing data and tweeting on an embedded system running python 2.6 on 128MB RAM
- you want to Learn Python the Hard and Manly Way

Situations where I'd prefer the Pandas option:
- you want to get things done and learn the tools you're actually gonna use in real situations

Anything I forgot? :v:

Jose, in case "seaboard" confused you, I apologise: install anaconda and go here.

# ? Aug 14, 2017 20:11

Eela6: May 25, 2007; Shredded Hen

Nippashish posted:

Seriously, do this. Do not even consider the csv reader in the stdlib. Pandas is infinitely superior.

I don't feel that way. I don't particularly like pandas, and I prefer using the stdlib where possible. You have to consider the audience of your code, too. Jose is very new to python; the last thing he needs is a thousand different APIs to understand.

Cingulate posted:

Situations where I can see the hand-crafted option make sense:
- you're analysing data and tweeting on an embedded system running python 2.6 on 128MB RAM
- you want to Learn Python the Hard and Manly Way

I am no fan of masochism. I just don't like solutions that are 'well, first install this dependency' for simple problems. I think you should use the level of tool that's appropriate for your problem.

If I'm using CSVs, I'm probably going to be doing it by hand. If I have a big dataset that requires the big guns,

1. I'm going to use xarray , not pandas
2. Why the hell am I using CSVs?

Eela6 fucked around with this message at 20:24 on Aug 14, 2017

# ? Aug 14, 2017 20:19

Cingulate: Oct 23, 2012; by Fluffdaddy

Eela6 posted:

I don't feel that way. I don't particularly like pandas, and I prefer using the stdlib where possible. You have to consider the audience of your code, too. Jose is very new to python; the last thing he needs is a thousand different APIs to understand.

But if you work with data and want to plot it, Pandas is the API you'll be learning in the end anyways. For data stuff, Pandas is more standardlib than the standardlib!

# ? Aug 14, 2017 20:24

Nippashish: Nov 2, 2005; Let me see you dance!

Eela6 posted:

I don't feel that way. I don't particularly like pandas, and I prefer using the stdlib where possible. You have to consider the audience of your code, too. Jose is very new to python; the last thing he needs is a thousand different APIs to understand.

The audience is someone who wants to load data from csvs to produce charts. Someone who is already using matplotlib and is enrolled in something called "data camp".

Pandas is a cornerstone of the python data science toolchain. Avoiding it is pretty much the data science equivalent of rolling your own web server.

# ? Aug 14, 2017 20:34

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

Probably, most people who need to read csv's don't need pandas, but it certainly seems like this person could use it.

# ? Aug 14, 2017 20:38

vikingstrike: Sep 23, 2007; whats happening, captain

I also bet he's going to need to do manipulation/cleaning of the data before plotting which will be much easier in pandas, but sure reinvent the wheel.

# ? Aug 14, 2017 20:57

Cingulate: Oct 23, 2012; by Fluffdaddy

Eela6 posted:

I don't feel that way. I don't particularly like pandas, and I prefer using the stdlib where possible. You have to consider the audience of your code, too. Jose is very new to python; the last thing he needs is a thousand different APIs to understand.

I am no fan of masochism. I just don't like solutions that are 'well, first install this dependency' for simple problems. I think you should use the level of tool that's appropriate for your problem.

If I'm using CSVs, I'm probably going to be doing it by hand. If I have a big dataset that requires the big guns,

1. I'm going to use xarray , not pandas
2. Why the hell am I using CSVs?

The recommendation for using Python for data sciency stuff is to install Anaconda, which carries pandas.

(You're probably using CSVs/pandas because 1. the party/platform you got the data from uses CSVs/stuff pandas can easily parse.)

I don't think this is about "big guns". It's just so much more handy to use pandas and the standard scientific Python stack even for tiny data.

It seems like 1/3rd of questions ITT are solved by SurgicalOntologist posting a pandas one-liner.

# ? Aug 14, 2017 22:05

Eela6: May 25, 2007; Shredded Hen

Thermopyle posted:

Probably, most people who need to read csv's don't need pandas, but it certainly seems like this person could use it.

Fair enough.

# ? Aug 14, 2017 22:16

Hughmoris: Apr 21, 2007; Let's go to the abyss!

Speaking of Pandas, I run into trouble when I need to create additional columns that are filled based on other column criteria. For example, if I have a CSV of:

code:

name,party_size,ticket_price
john,3,$14
sarah,1,$20
phil,6,$11

After I read that into Pandas, I then want to add two more columns. First column "More_Than_One" is Y/N based on party size being greater than 1. Next column is "Total_Cost" which is party_size * ticket_price.

How would I do something like that?
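
For what it's worth, one way this could be approached (a sketch only, assuming the prices always carry a leading `$` and reading the sample rows from a string) is with vectorised column operations:

```python
import io

import numpy as np
import pandas as pd

csv_text = (
    "name,party_size,ticket_price\n"
    "john,3,$14\n"
    "sarah,1,$20\n"
    "phil,6,$11\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Strip the dollar sign so the price can be used in arithmetic.
price = df["ticket_price"].str.lstrip("$").astype(float)

# Y/N flag based on party size, then the total per party.
df["More_Than_One"] = np.where(df["party_size"] > 1, "Y", "N")
df["Total_Cost"] = df["party_size"] * price
```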

# ? Aug 15, 2017 02:46
