Ice For My Nuts
May 22, 2012
Does anyone have any examples of using two separate dictionaries as data structures to fetch from and then run through some method?

I'm just starting out and I'd like to actually grasp how to utilize a data structure with variables that I can feed into a method.

Dominoes
Sep 20, 2007

Ice For My Nuts posted:

Does anyone have any examples of using two separate dictionaries as data structures to fetch from and then run through some method?

I'm just starting out and I'd like to actually grasp how to utilize a data structure with variables that I can feed into a method.
More specific, please.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Yeah, that's way too general of a question, not far removed from "how do I write a program?".

Sometimes those of us with more experience have a really hard time grasping what it's like to be someone without all the experience. I can grasp on some intellectual level that not having the right knowledge makes it hard to ask the right questions, but only in my most lucid, introspective moments do I feel like I can remember what it's really like...and I feel like programming is particularly unique in this area.

floppo
Aug 24, 2005

Ice For My Nuts posted:

Does anyone have any examples of using two separate dictionaries as data structures to fetch from and then run through some method?

I'm just starting out and I'd like to actually grasp how to utilize a data structure with variables that I can feed into a method.

How about something like this: you have two partitions of a set G (of elements a_1, a_2, ..., a_n), call them Y and Z. We store this information in two dicts, Y and Z, where the keys are elements of G and the values identify which subset of the partition each element is in.

Here is an example, where we have fruits and veggies of different colors. We have a fruits/veggies partition and a red/green partition. We want to know how often the two partitions classify our foods the same way (i.e., how well a color classification can proxy for knowing what is a fruit or a veggie).


code:
import itertools

G = ['apple', 'red_pepper', 'strawberry', 'celery', 'broccoli', 'tomato', 'raspberry']

# red/green partition
Y = {'apple': 'red', 'red_pepper': 'red', 'strawberry': 'red',
     'celery': 'green', 'broccoli': 'green', 'tomato': 'red',
     'raspberry': 'red'}

# fruit/veggie partition
Z = {'apple': 'fruit', 'red_pepper': 'veggie', 'strawberry': 'fruit',
     'celery': 'veggie', 'broccoli': 'veggie', 'tomato': 'fruit',
     'raspberry': 'fruit'}

# create a counter that will track how many matches we have
matches = 0

# enumerate the pairs of foods
food_pairs = list(itertools.combinations(G, 2))

# now we access the dictionaries
for pair in food_pairs:
    if Y[pair[0]] == Y[pair[1]] and Z[pair[0]] == Z[pair[1]]:
        matches += 1

print(matches)
This will print the number of times our two classifiers categorize a pair of foods the same way. There are three other possibilities for a pair of foods:
1) matching in fruit/veggie but mismatching in red/green (e.g. (red_pepper, celery))
2) matching in red/green but mismatching in fruit/veggie (e.g. (red_pepper, strawberry))
3) mismatching in both (e.g. (strawberry, celery))

You could write a function that takes a list of items and two dictionaries classifying them in different ways and outputs these four counts, which can be used to build a measure of the distance between the two partitions (e.g. the Jaccard index). As the others mentioned, the question is kind of vague, but perhaps this is what you are looking for.
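For concreteness, here is a minimal sketch of such a function, assuming every item appears as a key in both dictionaries (the function and variable names are just illustrative):

Python code:
import itertools

def partition_agreement(items, part_a, part_b):
    """Count, over all pairs of items, how the two partitions agree."""
    both_match = a_only = b_only = neither = 0
    for x, y in itertools.combinations(items, 2):
        same_a = part_a[x] == part_a[y]
        same_b = part_b[x] == part_b[y]
        if same_a and same_b:
            both_match += 1
        elif same_a:
            a_only += 1
        elif same_b:
            b_only += 1
        else:
            neither += 1
    return both_match, a_only, b_only, neither

# e.g. partition_agreement(G, Y, Z) with the list and dicts above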

Ice For My Nuts
May 22, 2012
floppo hit the nail on the head, thank you.

Bozart
Oct 28, 2006

Give me the finger.
If that is the kind of thing you are interested in, then a pandas DataFrame is essentially a relation (a generalization of a set), which is well suited to that kind of problem.

politicorific
Sep 15, 2007
Hi, I'm trying to grab some air quality data using beautiful soup from this page, but I'm not sure what I'm doing wrong.

Visit this page; I'm trying to get the column under "list" and the AQI - the biggest rectangular box on the page.

http://aqicn.org/city/newyork

code:
from bs4 import BeautifulSoup
import urllib.request, urllib.parse, urllib.error

r="http://aqicn.org/city/newyork/"
page = urllib.request.urlopen(r).read()
soup = BeautifulSoup(page)

print(soup.find_all(id="cur_pm25"))
print(soup.find_all(id="cur_pm10"))
print(soup.find_all(id="cur_o3"))
print(soup.find_all(id="cur_no2"))
print(soup.find_all(id="cur_co"))
print(soup.find_all(id="cur_uvi"))
print(soup.find_all(id="cur_t"))
print(soup.find_all(id="cur_p"))
print(soup.find_all(id="cur_h"))
print(soup.find_all(id="aqivalue"))
The output does not match what's in my browser. I think I'm supposed to have "td#cur_XX.tdcur"

accipter
Sep 12, 2003

politicorific posted:

Hi, I'm trying to grab some air quality data using beautiful soup from this page, but I'm not sure what I'm doing wrong.

Visit this page; I'm trying to get the column under "list" and the AQI - the biggest rectangular box on the page.

http://aqicn.org/city/newyork

code:
from bs4 import BeautifulSoup
import urllib.request, urllib.parse, urllib.error

r="http://aqicn.org/city/newyork/"
page = urllib.request.urlopen(r).read()
soup = BeautifulSoup(page)

print(soup.find_all(id="cur_pm25"))
print(soup.find_all(id="cur_pm10"))
print(soup.find_all(id="cur_o3"))
print(soup.find_all(id="cur_no2"))
print(soup.find_all(id="cur_co"))
print(soup.find_all(id="cur_uvi"))
print(soup.find_all(id="cur_t"))
print(soup.find_all(id="cur_p"))
print(soup.find_all(id="cur_h"))
print(soup.find_all(id="aqivalue"))
The output does not match what's in my browser. I think I'm supposed to have "td#cur_XX.tdcur"

First, the website you are looking at is getting the data from here, which has a nice feature to download a report of data. So that might solve your problem right away.

Second, I prefer to scrape with lxml.html rather than BeautifulSoup. I know this doesn't answer your question, but the following should help.

Python code:
import pprint
import lxml.html
import requests

url = 'http://aqicn.org/city/newyork'
r = requests.get(url)

root = lxml.html.fromstring(r.content)

# Find the table containing all of the information. To get this XPath
# I inspect an element in Chrome and then copy the XPath via the context 
# menu.
e_table = root.xpath('//*[@id="citydivmain"]/div/div/div/table[3]')[0]

# Now we want to get all of the rows that have current values
rows = e_table.xpath('.//td[@class="tdcur"]')

def clean_id(s):
    return s.split('_')[-1]

def to_value(s):
    try:
        return int(s)
    except ValueError:
        return None

data = {clean_id(r.get('id')): to_value(r.text) for r in rows}

# Add the AQI value
data['aqi'] = to_value(root.xpath('//div[@class="aqivalue"]/text()')[0])

pprint.pprint(data)
Result:
code:
{'aqi': 74,
 'co': 11,
 'd': None,
 'h': None,
 'no2': 58,
 'o3': 1,
 'p': None,
 'pm25': 74,
 't': None,
 'w': None}
Edit: Note that selecting the table and then rows within the table isn't strictly needed for this website. The rows could have been selected directly from the root with rows = root.xpath('//td[@class="tdcur"]'); however, I thought it would be good to show a relative path (notice the leading . in the XPath) for HTML that is not as nicely formatted.

accipter fucked around with this message at 17:04 on Jan 28, 2016

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



When you're searching for an ID like cur_pm25, you should just use find, because there shouldn't be more than one element with that ID on the page. You also seem to be confusing classes and IDs: aqivalue is a class, so there can be more than one, but you can do
Python code:
find(id='citydivouter').find(attrs={"class": "aqivalue"})
or
Python code:
select('#citydivouter .aqivalue')[0]
to get the big one.

E: forgot class was a keyword :downs:

Munkeymon fucked around with this message at 17:11 on Jan 28, 2016

politicorific
Sep 15, 2007
Awesome. Thank you both. That really is a bit more complicated than I expected, but the Beautiful Soup documentation I found was very basic, with only simple examples. I knew the AQI was a different form of data, but I'm still unsure of the difference.

I believe my Visual Studio version is broken, so pip won't install lxml on my desktop, but my Raspberry Pi should do just fine.

Risket
Apr 3, 2004
Lipstick Apathy

politicorific posted:

I believe my Visual Studio version is broken, so pip won't install lxml on my desktop, but my Raspberry Pi should do just fine.
I had problems like this too. Downloading WinPython (http://winpython.github.io/) solved that for me because it has a ton of packages already installed and ready to run. Lots of people have recommended Anaconda too, but I like WinPython because it's portable.

vikingstrike
Sep 23, 2007

whats happening, captain
FWIW, Anaconda on Windows has been pretty hassle-free for me and comes with a bunch of stuff preinstalled (well, at least everything I use regularly).

SurgicalOntologist
Jun 17, 2004

And I don't know how it comes packaged on Windows, but it's portable in the sense that it doesn't need admin permissions and is entirely self-contained in a single folder.

politicorific
Sep 15, 2007
Jesus christ...

So I tried New York, but if I put in Taipei... no dice.

url = 'http://aqicn.org/taiwan/songshan'

Also, Anaconda/Spyder is really finicky about lxml... half the time it cannot find "import lxml.html"

floppo
Aug 24, 2005

Bozart posted:

If that is the kind of thing you are interested in, then pandas dataframe is a generalization of sets called a relation, well suited to that kind of problem.

Good point!

that leads me to a vague/soft question of my own: what are some rules of thumb for using dictionaries vs dataframes? To me dataframes are ideal for data analysis, while dictionaries help me reorganize data - indeed I often use them to populate dataframe columns from messy data. Are there things that should absolutely not be done with one or the other?

accipter
Sep 12, 2003

politicorific posted:

Jesus christ...

So I tried New York, but if I put in Taipei... no dice.

url = 'http://aqicn.org/taiwan/songshan'

Also, Anaconda/Spyder is really finicky about lxml... half the time it cannot find "import lxml.html"

The proper URL is: http://aqicn.org/city/taiwan/songshan

I am not sure what to tell you about lxml. I install it via conda and have never had any issues.

vikingstrike
Sep 23, 2007

whats happening, captain

politicorific posted:

Jesus christ...

So I tried New York, but if I put in Taipei... no dice.

url = 'http://aqicn.org/taiwan/songshan'

Also, Anaconda/Spyder is really finicky about lxml... half the time it cannot find "import lxml.html"

I've never had any issues like this with Anaconda before. Do you have multiple python installations on the same machine?

SurgicalOntologist
Jun 17, 2004

floppo posted:

that leads me to a vague/soft question of my own: what are some rules of thumb for using dictionaries vs dataframes? To me dataframes are ideal for data analysis, while dictionaries help me reorganize data - indeed I often use them to populate dataframe columns from messy data. Are there things that should absolutely not be done with one or the other?

I would say use dictionaries until you need some specific capability of pandas, like joins, fancy indexing, dtype management, vectorized column operations, etc.

Matthew Rocklin of Continuum has given some excellent talks on functional programming in python emphasizing the advantages of sticking with core data structures for data analysis. E.g.: https://www.youtube.com/watch?v=PpBK4zIaFLE
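For example, a couple of the pandas capabilities mentioned above (joins and vectorized column operations), shown here as a small sketch with made-up data:

Python code:
import pandas as pd

prices = pd.DataFrame({'name': ['apple', 'celery'], 'price': [1.2, 0.8]})
kinds = pd.DataFrame({'name': ['apple', 'celery'], 'kind': ['fruit', 'veggie']})

# join on a shared key
merged = prices.merge(kinds, on='name')

# vectorized column operation
merged['price_with_tax'] = merged['price'] * 1.07

print(merged)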

floppo
Aug 24, 2005

SurgicalOntologist posted:

I would say use dictionaries until you need some specific capability of pandas, like joins, fancy indexing, dtype management, vectorized column operations, etc.

Matthew Rocklin of Continuum has given some excellent talks on functional programming in python emphasizing the advantages of sticking with core data structures for data analysis. E.g.: https://www.youtube.com/watch?v=PpBK4zIaFLE

very cool

Dominoes
Sep 20, 2007

floppo posted:

Good point!

that leads me to a vague/soft question of my own: what are some rules of thumb for using dictionaries vs dataframes? To me dataframes are ideal for data analysis, while dictionaries help me reorganize data - indeed I often use them to populate dataframe columns from messy data. Are there things that should absolutely not be done with one or the other?
It depends on use case, and a more direct comparison might be dataframes compared to numpy arrays.

Keep in mind that calculations using dataframes are very slow compared to other data types like dicts/lists/tuples/iterators/arrays.

Nippashish
Nov 2, 2005

Let me see you dance!

floppo posted:

Good point!

that leads me to a vague/soft question of my own: what are some rules of thumb for using dictionaries vs dataframes? To me dataframes are ideal for data analysis, while dictionaries help me reorganize data - indeed I often use them to populate dataframe columns from messy data. Are there things that should absolutely not be done with one or the other?

I use pandas for anything that looks like a collection of records. If you find yourself building a list of dicts, or a dict of lists then you should probably be building a DataFrame instead.
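Both shapes go straight into the DataFrame constructor; a minimal sketch with made-up records:

Python code:
import pandas as pd

# a list of dicts: one dict per record
records = [{'name': 'apple', 'color': 'red', 'kind': 'fruit'},
           {'name': 'celery', 'color': 'green', 'kind': 'veggie'}]
df1 = pd.DataFrame(records)

# a dict of lists: one list per column
columns = {'name': ['apple', 'celery'],
           'color': ['red', 'green'],
           'kind': ['fruit', 'veggie']}
df2 = pd.DataFrame(columns)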

Cingulate
Oct 23, 2012

by Fluffdaddy
Assuming the lists have the same length, or the dicts have the same keys.

Bozart
Oct 28, 2006

Give me the finger.

Cingulate posted:

Assuming the lists have the same length, or the dicts have the same keys.

We're talking hypotheticals here, but if you have a limited number of heterogeneous keys you could probably use several dataframes instead; it would depend on the application.

onionradish
Jul 6, 2006

That's spicy.
What's the view on Python ORMs for general use?

Up to now, I've been using direct sqlite queries in my scripts, but recently I started playing with the lightweight peewee ORM as an alternative to manually building SQL queries and joins. The syntax seems cleaner, and for the lightweight sqlite DB stuff I'm doing, there's not a significant performance hit. I don't need the cross-DB or advanced capabilities that an ORM like SQLAlchemy offers.

I like the cleaner syntax and the explicit definition of DB models that match the actual database, but I'm wondering whether the tradeoffs in dependencies or other factors will be a problem later.
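For reference, a rough sketch of what that looks like in peewee (the table and field names here are invented, not from any real schema):

Python code:
from peewee import SqliteDatabase, Model, CharField, IntegerField

db = SqliteDatabase('example.db')

class Food(Model):
    name = CharField()
    color = CharField()
    rating = IntegerField(default=0)

    class Meta:
        database = db

db.connect()
db.create_tables([Food])

Food.create(name='apple', color='red', rating=5)

# roughly equivalent to SELECT * FROM food WHERE color = 'red'
for food in Food.select().where(Food.color == 'red'):
    print(food.name, food.rating)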

SurgicalOntologist
Jun 17, 2004

I'm usually the "use pandas" guy but there are reasons to stick with core data structures. Check the YouTube link I posted. Pandas has some warts, so I don't think "use it if your data fits" is good advice. Functional, lazy dict operations can result in much more readable code in many situations, and not necessarily with a loss of efficiency.
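As a small illustration of that style, a hedged sketch using toolz and plain records (the field names are made up):

Python code:
from toolz import groupby, valmap

records = [{'name': 'apple', 'kind': 'fruit', 'price': 1.2},
           {'name': 'celery', 'kind': 'veggie', 'price': 0.8},
           {'name': 'tomato', 'kind': 'fruit', 'price': 1.0}]

# group records by kind, then reduce each group to an average price
by_kind = groupby(lambda r: r['kind'], records)
avg_price = valmap(lambda rs: sum(r['price'] for r in rs) / len(rs), by_kind)

print(avg_price)  # {'fruit': 1.1, 'veggie': 0.8}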

Dominoes
Sep 20, 2007

onionradish posted:

What's the view on Python ORMs for general use?

Up to now, I've been using direct sqlite queries in my scripts, but recently I started playing with the lightweight peewee ORM as an alternative to manually building SQL queries and joins. The syntax seems cleaner, and for the lightweight sqlite DB stuff I'm doing, there's not a significant performance hit. I don't need the cross-DB or advanced capabilities that an ORM like SQLAlchemy offers.

I like the cleaner syntax and the explicit definition of DB models that match the actual database, but I'm wondering whether the tradeoffs in dependencies or other factors will be a problem later.
I wish Django's ORM was available as a standalone package. Cleaner syntax and better documentation than SQLAlchemy.

Dominoes fucked around with this message at 21:29 on Jan 29, 2016

Risket
Apr 3, 2004
Lipstick Apathy

onionradish posted:

What's the view on Python ORMs for general use?

Up to now, I've been using direct sqlite queries in my scripts, but recently I started playing with the lightweight peewee ORM as an alternative to manually building SQL queries and joins. The syntax seems cleaner, and for the lightweight sqlite DB stuff I'm doing, there's not a significant performance hit. I don't need the cross-DB or advanced capabilities that an ORM like SQLAlchemy offers.

I like the cleaner syntax and the explicit definition of DB models that match the actual database, but I'm wondering whether the tradeoffs in dependencies or other factors will be a problem later.
I'm pretty new to Python and programming in general, but I've been messing around with PonyORM (http://www.ponyorm.com), and it really seems to make database operations easy. Of course, I'm messing with maybe 10000 records max, so your mileage may vary.
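For anyone curious, a rough sketch of what Pony code looks like (the entity and field names are invented):

Python code:
from pony.orm import Database, Required, db_session, select

db = Database()

class Food(db.Entity):
    name = Required(str)
    color = Required(str)

db.bind(provider='sqlite', filename=':memory:')
db.generate_mapping(create_tables=True)

with db_session:
    Food(name='apple', color='red')
    Food(name='celery', color='green')
    # Pony translates the generator expression into SQL
    red_foods = select(f for f in Food if f.color == 'red')[:]
    print([f.name for f in red_foods])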

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

IIRC, Peewee is pretty good, though it's been a couple years since I used it.

Full Battle Rattle
Aug 29, 2009

As long as the times refuse to change, we're going to make a hell of a racket.
I've been coding for months, but I can finally make a white dot move around a maze

Dominoes
Sep 20, 2007

SurgicalOntologist posted:

Matthew Rocklin of Continuum has given some excellent talks on functional programming in python emphasizing the advantages of sticking with core data structures for data analysis. E.g.: https://www.youtube.com/watch?v=PpBK4zIaFLE
Thanks; going to try out toolz/cytoolz.

hooah
Feb 6, 2006
WTF?
I'm TA-ing a newbie Python course, and one student tried to do a = random.randint(0,99) + random.randint(0,99), but that throws an error saying int isn't iterable. I understand what iterating is, but I don't get what's trying to get iterated here. Could someone clear that up for me?

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Works 4 me, did you see the error being thrown?

EAT THE EGGS RICOLA
May 29, 2008

I use dictionaries for almost everything. I work on a huge ridiculous ten year old python project that is used by lots of governments and giant corporations around the world, so code readability is more important than almost anything else.

hooah
Feb 6, 2006
WTF?

baka kaba posted:

Works 4 me, did you see the error being thrown?

I wrote the code wrong. For some reason, the student was doing a = sum(random.randint(0,99) + random.randint(0,99)), so I'm assuming that sum uses an iterator, correct?

BannedNewbie
Apr 22, 2003

HOW ARE YOU? -> YOSHI?
FINE, THANK YOU. -> YOSHI.

hooah posted:

I wrote the code wrong. For some reason, the student was doing a = sum(random.randint(0,99) + random.randint(0,99)), so I'm assuming that sum uses an iterator, correct?

It should be a comma instead of a plus sign.

hooah
Feb 6, 2006
WTF?

BannedNewbie posted:

It should be a comma instead of a plus sign.

Oh ffs I can't believe I missed that.

ArcticZombie
Sep 15, 2010
The problem is that the sum function expects an iterable, e.g. a list of ints or floats. You're giving it a single int when using a plus sign, and two arguments when using a comma. Just playing around with the function, it seems to accept two arguments: the first must be an iterable, and the second is a start value.
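A quick sketch to make the cases concrete:

Python code:
import random

a = random.randint(0, 99)
b = random.randint(0, 99)

# sum(a + b)   -> TypeError: 'int' object is not iterable (a + b is a single int)
# sum(a, b)    -> same TypeError: the first argument still isn't an iterable
total = sum([a, b])   # works: pass the numbers as an iterable
total = a + b         # or just add them directly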

Cingulate
Oct 23, 2012

by Fluffdaddy
Why does this one:
code:
sum(['Ehgi', 'Hu'])
return
code:
TypeError: unsupported operand type(s) for +: 'int' and 'str'
I originally wanted to ask: why can't I sum a list of strings, considering 1. strings are True, 2. True is 1?
(I mean, I get why it'd be silly for that to work, but I'm not sure what it does under the hood - is it that non-empty strings being True only holds in specific contexts, such as an `if` calling the string's __bool__ method?)

But now I'm looking at this one and I don't know what int my iPython is talking about.

Space Kablooey
May 6, 2009


sum assumes that the start value is the integer 0.

You should use ''.join(['Ehgi', 'Hu']) for concatenating a list of strings, though.

Python code:
>>> sum(['Ehgi', 'Hu'], '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]
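The start argument is what's behind that default 0, which is why the list-of-strings call blows up on int + str; it does work with other types, e.g.:

Python code:
# default start is 0, so sum(['Ehgi', 'Hu']) tries 0 + 'Ehgi' -> TypeError

# a non-int start works, e.g. flattening a list of lists
print(sum([[1, 2], [3]], []))   # [1, 2, 3]

# strings are special-cased to push you toward str.join
print(''.join(['Ehgi', 'Hu']))  # 'EhgiHu'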

Space Kablooey fucked around with this message at 21:03 on Feb 1, 2016

Cingulate
Oct 23, 2012

by Fluffdaddy
Yup, it wasn't a realistic question - I assumed from the beginning that sum(list_of_str) would fail; I just wanted to know why, under the hood, it failed. Thanks for the answer too!
