Ice For My Nuts
May 22, 2012
Does anyone have any examples of using two separate dictionaries as data structures to fetch from and then run through some method?

I'm just starting out and I'd like to actually grasp how to utilize a data structure with variables that I can feed into a method.

Dominoes
Sep 20, 2007

Ice For My Nuts posted:

Does anyone have any examples of using two separate dictionaries as data structures to fetch from and then run through some method?

I'm just starting out and I'd like to actually grasp how to utilize a data structure with variables that I can feed into a method.
More specific, please.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Yeah, that's way too general of a question, not far removed from "how do I write a program?".

Sometimes those of us with more experience have a really hard time grasping what it's like to be someone without all the experience. I can grasp on some intellectual level that not having the right knowledge makes it hard to ask the right questions, but only in my most lucid, introspective moments do I feel like I can remember what it's really like...and I feel like programming is particularly unique in this area.

floppo
Aug 24, 2005

Ice For My Nuts posted:

Does anyone have any examples of using two separate dictionaries as data structures to fetch from and then run through some method?

I'm just starting out and I'd like to actually grasp how to utilize a data structure with variables that I can feed into a method.

How about something like this: you have two partitions of a set G (of elements a_1, a_2, ..., a_n), call them Y and Z. We store this information in two dicts, Y and Z, where the keys are elements of G and the values identify which subset of the partition each element is in.

Here is an example, where we have fruits and veggies of different colors. We have a fruits/veggies partition and a red/green partition. We want to know how often the two partitions classify our foods the same way (i.e., how well a color classification can proxy for knowing what is a fruit or a veggie).


code:
import itertools

G = ['apple', 'red_pepper', 'strawberry', 'celery', 'broccoli', 'tomato', 'raspberry']

# red/green partition
Y = {'apple': 'red', 'red_pepper': 'red', 'strawberry': 'red',
     'celery': 'green', 'broccoli': 'green', 'tomato': 'red',
     'raspberry': 'red'}

# fruit/veggie partition
Z = {'apple': 'fruit', 'red_pepper': 'veggie', 'strawberry': 'fruit',
     'celery': 'veggie', 'broccoli': 'veggie', 'tomato': 'fruit',
     'raspberry': 'fruit'}

# create a counter that will track how many matches we have
matches = 0

# enumerate the pairs of foods
food_pairs = list(itertools.combinations(G, 2))

# now we access the dictionaries
for pair in food_pairs:
    if Y[pair[0]] == Y[pair[1]] and Z[pair[0]] == Z[pair[1]]:
        matches += 1

print(matches)
This will print the number of times our two classifiers categorize a pair of foods the same way. There are three other possibilities for a pair of foods:
1) matching in fruit/veggie but mismatching in red/green (e.g. (red_pepper, celery))
2) matching in red/green but mismatching in fruit/veggie (e.g. (red_pepper, strawberry))
3) mismatching in both (e.g. (strawberry, celery))

You could write a function that takes a list of items and two dictionaries classifying them in different ways and outputs these four counts, which can be used to build a measure of the distance between the two partitions (e.g. the Jaccard index). As the others mentioned, the question is kind of vague, but perhaps this is what you are looking for.
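For concreteness, here is a minimal sketch of such a function, assuming every item appears as a key in both dictionaries (the function and variable names are just illustrative):

Python code:
import itertools

def partition_agreement(items, part_a, part_b):
    """Count, over all pairs of items, how the two partitions agree."""
    both_match = a_only = b_only = neither = 0
    for x, y in itertools.combinations(items, 2):
        same_a = part_a[x] == part_a[y]
        same_b = part_b[x] == part_b[y]
        if same_a and same_b:
            both_match += 1
        elif same_a:
            a_only += 1
        elif same_b:
            b_only += 1
        else:
            neither += 1
    return both_match, a_only, b_only, neither

# e.g. partition_agreement(G, Y, Z) with the list and dicts above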

Ice For My Nuts
May 22, 2012
floppo hit the nail on the head, thank you.

Bozart
Oct 28, 2006

Give me the finger.
If that is the kind of thing you are interested in, then a pandas DataFrame is essentially a relation (a generalization of a set), which is well suited to that kind of problem.

politicorific
Sep 15, 2007
Hi, I'm trying to grab some air quality data using beautiful soup from this page, but I'm not sure what I'm doing wrong.

Visit this page; I'm trying to get the column under "list" and the AQI - the biggest rectangular box on the page.

http://aqicn.org/city/newyork

code:
from bs4 import BeautifulSoup
import urllib.request, urllib.parse, urllib.error

r="http://aqicn.org/city/newyork/"
page = urllib.request.urlopen(r).read()
soup = BeautifulSoup(page)

print(soup.find_all(id="cur_pm25"))
print(soup.find_all(id="cur_pm10"))
print(soup.find_all(id="cur_o3"))
print(soup.find_all(id="cur_no2"))
print(soup.find_all(id="cur_co"))
print(soup.find_all(id="cur_uvi"))
print(soup.find_all(id="cur_t"))
print(soup.find_all(id="cur_p"))
print(soup.find_all(id="cur_h"))
print(soup.find_all(id="aqivalue"))
The output does not match what's in my browser. I think I'm supposed to have "td#cur_XX.tdcur"

accipter
Sep 12, 2003

politicorific posted:

Hi, I'm trying to grab some air quality data using beautiful soup from this page, but I'm not sure what I'm doing wrong.

Visit this page; I'm trying to get the column under "list" and the AQI - the biggest rectangular box on the page.

http://aqicn.org/city/newyork

code:
from bs4 import BeautifulSoup
import urllib.request, urllib.parse, urllib.error

r="http://aqicn.org/city/newyork/"
page = urllib.request.urlopen(r).read()
soup = BeautifulSoup(page)

print(soup.find_all(id="cur_pm25"))
print(soup.find_all(id="cur_pm10"))
print(soup.find_all(id="cur_o3"))
print(soup.find_all(id="cur_no2"))
print(soup.find_all(id="cur_co"))
print(soup.find_all(id="cur_uvi"))
print(soup.find_all(id="cur_t"))
print(soup.find_all(id="cur_p"))
print(soup.find_all(id="cur_h"))
print(soup.find_all(id="aqivalue"))
The output does not match what's in my browser. I think I'm supposed to have "td#cur_XX.tdcur"

First, the website you are looking at is getting the data from here, which has a nice feature to download a report of data. So that might solve your problem right away.

Second, I prefer to scrape with lxml.html rather than BeautifulSoup. I know this doesn't answer your question, but the following should help.

Python code:
import pprint
import lxml.html
import requests

url = 'http://aqicn.org/city/newyork'
r = requests.get(url)

root = lxml.html.fromstring(r.content)

# Find the table containing all of the information. To get this XPath
# I inspect an element in Chrome and then copy the XPath via the context 
# menu.
e_table = root.xpath('//*[@id="citydivmain"]/div/div/div/table[3]')[0]

# Now we want to get all of the rows that have current values
rows = e_table.xpath('.//td[@class="tdcur"]')

def clean_id(s):
    return s.split('_')[-1]

def to_value(s):
    try:
        return int(s)
    except ValueError:
        return None

data = {clean_id(r.get('id')): to_value(r.text) for r in rows}

# Add the AQI value
data['aqi'] = to_value(root.xpath('//div[@class="aqivalue"]/text()')[0])

pprint.pprint(data)
Result:
code:
{'aqi': 74,
 'co': 11,
 'd': None,
 'h': None,
 'no2': 58,
 'o3': 1,
 'p': None,
 'pm25': 74,
 't': None,
 'w': None}
Edit: Note that selecting the table and then rows within the table isn't strictly needed for this website. The rows could have been selected directly from the root with rows = root.xpath('//td[@class="tdcur"]'); however, I thought it would be good to show a relative path (notice the leading . in the XPath) for HTML that is not as nicely formatted.

accipter fucked around with this message at 17:04 on Jan 28, 2016

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



When you're searching for an ID like cur_pm25, you should just use find, because there shouldn't be more than one element with that ID on the page. You also seem to be confusing classes and IDs: aqivalue is a class, so there can be more than one, but you can do
Python code:
find(id='citydivouter').find(attrs={"class": "aqivalue"})
or
Python code:
select('#citydivouter .aqivalue')[0]
to get the big one.

E: forgot class was a keyword :downs:

Munkeymon fucked around with this message at 17:11 on Jan 28, 2016

politicorific
Sep 15, 2007
Awesome. Thank you both. That really is a bit more complicated than I expected, but the Beautiful Soup documentation I found was very basic, with only simple examples. I knew the AQI was a different form of data, but I'm still unsure of the difference.

I believe my Visual Studio version is broken, so pip won't install lxml on my desktop, but my Raspberry Pi should do just fine.

Risket
Apr 3, 2004
Lipstick Apathy

politicorific posted:

I believe my Visual Studio version is broken, so pip won't install lxml on my desktop, but my Raspberry Pi should do just fine.
I had problems like this too. Downloading WinPython (http://winpython.github.io/) solved that for me because it has a ton of packages already installed and ready to run. Lots of people have recommended Anaconda too, but I like WinPython because it's portable.

vikingstrike
Sep 23, 2007

whats happening, captain
FWIW, Anaconda on Windows has been pretty hassle-free for me and comes with a bunch of stuff preinstalled (well, at least everything I use regularly).

SurgicalOntologist
Jun 17, 2004

And I don't know how it comes packaged on Windows, but it's portable in the sense that it doesn't need admin permissions and is entirely self-contained in a single folder.

politicorific
Sep 15, 2007
Jesus christ...

So I tried New York, but if I put in Taipei... no dice.

url = 'http://aqicn.org/taiwan/songshan'

Also, Anaconda/Spyder is really finicky about lxml... half the time it cannot find "import lxml.html"

floppo
Aug 24, 2005

Bozart posted:

If that is the kind of thing you are interested in, then pandas dataframe is a generalization of sets called a relation, well suited to that kind of problem.

Good point!

that leads me to a vague/soft question of my own: what are some rules of thumb for using dictionaries vs dataframes? To me dataframes are ideal for data analysis, while dictionaries help me reorganize data - indeed I often use them to populate dataframe columns from messy data. Are there things that should absolutely not be done with one or the other?

accipter
Sep 12, 2003

politicorific posted:

Jesus christ...

So I tried New York, but if I put in Taipei... no dice.

url = 'http://aqicn.org/taiwan/songshan'

Also, Anaconda/Spyder is really finicky about lxml... half the time it cannot find "import lxml.html"

The proper URL is: http://aqicn.org/city/taiwan/songshan

I am not sure what to tell you about lxml. I install it via conda and have never had any issues.

vikingstrike
Sep 23, 2007

whats happening, captain

politicorific posted:

Jesus christ...

So I tried New York, but if I put in Taipei... no dice.

url = 'http://aqicn.org/taiwan/songshan'

Also, Anaconda/Spyder is really finicky about lxml... half the time it cannot find "import lxml.html"

I've never had any issues like this with Anaconda before. Do you have multiple python installations on the same machine?

SurgicalOntologist
Jun 17, 2004

floppo posted:

that leads me to a vague/soft question of my own: what are some rules of thumb for using dictionaries vs dataframes? To me dataframes are ideal for data analysis, while dictionaries help me reorganize data - indeed I often use them to populate dataframe columns from messy data. Are there things that should absolutely not be done with one or the other?

I would say use dictionaries until you need some specific capability of pandas, like joins, fancy indexing, dtype management, vectorized column operations, etc.

Matthew Rocklin of Continuum has given some excellent talks on functional programming in python emphasizing the advantages of sticking with core data structures for data analysis. E.g.: https://www.youtube.com/watch?v=PpBK4zIaFLE
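For example, a couple of the pandas capabilities mentioned above (joins and vectorized column operations), shown here as a small sketch with made-up data:

Python code:
import pandas as pd

prices = pd.DataFrame({'name': ['apple', 'celery'], 'price': [1.2, 0.8]})
kinds = pd.DataFrame({'name': ['apple', 'celery'], 'kind': ['fruit', 'veggie']})

# join on a shared key
merged = prices.merge(kinds, on='name')

# vectorized column operation
merged['price_with_tax'] = merged['price'] * 1.07

print(merged)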

floppo
Aug 24, 2005

SurgicalOntologist posted:

I would say use dictionaries until you need some specific capability of pandas, like joins, fancy indexing, dtype management, vectorized column operations, etc.

Matthew Rocklin of Continuum has given some excellent talks on functional programming in python emphasizing the advantages of sticking with core data structures for data analysis. E.g.: https://www.youtube.com/watch?v=PpBK4zIaFLE

very cool

Dominoes
Sep 20, 2007

floppo posted:

Good point!

that leads me to a vague/soft question of my own: what are some rules of thumb for using dictionaries vs dataframes? To me dataframes are ideal for data analysis, while dictionaries help me reorganize data - indeed I often use them to populate dataframe columns from messy data. Are there things that should absolutely not be done with one or the other?
It depends on use case, and a more direct comparison might be dataframes compared to numpy arrays.

Keep in mind that calculations using dataframes are very slow compared to other data types like dicts/lists/tuples/iterators/arrays.

Nippashish
Nov 2, 2005

Let me see you dance!

floppo posted:

Good point!

that leads me to a vague/soft question of my own: what are some rules of thumb for using dictionaries vs dataframes? To me dataframes are ideal for data analysis, while dictionaries help me reorganize data - indeed I often use them to populate dataframe columns from messy data. Are there things that should absolutely not be done with one or the other?

I use pandas for anything that looks like a collection of records. If you find yourself building a list of dicts, or a dict of lists then you should probably be building a DataFrame instead.
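Both shapes go straight into the DataFrame constructor; a minimal sketch with made-up records:

Python code:
import pandas as pd

# a list of dicts: one dict per record
records = [{'name': 'apple', 'color': 'red', 'kind': 'fruit'},
           {'name': 'celery', 'color': 'green', 'kind': 'veggie'}]
df1 = pd.DataFrame(records)

# a dict of lists: one list per column
columns = {'name': ['apple', 'celery'],
           'color': ['red', 'green'],
           'kind': ['fruit', 'veggie']}
df2 = pd.DataFrame(columns)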

Cingulate
Oct 23, 2012

by Fluffdaddy
Assuming the lists have the same length, or the dicts have the same keys.

Bozart
Oct 28, 2006

Give me the finger.

Cingulate posted:

Assuming the lists have the same length, or the dicts have the same keys.

We're talking hypotheticals here, but if you have a limited number of heterogeneous keys you could probably use several dataframes instead; it would depend on the application.

onionradish
Jul 6, 2006

That's spicy.
What's the view on Python ORMs for general use?

Up to now, I've been using direct sqlite queries in my scripts, but recently I started playing with the lightweight peewee ORM as an alternative to manually building SQL queries and joins. The syntax seems cleaner, and for the lightweight sqlite DB stuff I'm doing, there's not a significant performance hit. I don't need the cross-DB or advanced capabilities that an ORM like SQLAlchemy offers.

I like the cleaner syntax and the explicit definition of DB models that match the actual database, but I'm wondering whether the tradeoffs in dependencies or other factors will be a problem later.
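For reference, a rough sketch of what that looks like in peewee (the table and field names here are invented, not from any real schema):

Python code:
from peewee import SqliteDatabase, Model, CharField, IntegerField

db = SqliteDatabase('example.db')

class Food(Model):
    name = CharField()
    color = CharField()
    rating = IntegerField(default=0)

    class Meta:
        database = db

db.connect()
db.create_tables([Food])

Food.create(name='apple', color='red', rating=5)

# roughly equivalent to SELECT * FROM food WHERE color = 'red'
for food in Food.select().where(Food.color == 'red'):
    print(food.name, food.rating)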

SurgicalOntologist
Jun 17, 2004

I'm usually the "use pandas" guy but there are reasons to stick with core data structures. Check the YouTube link I posted. Pandas has some warts, so I don't think "use it if your data fits" is good advice. Functional, lazy dict operations can result in much more readable code in many situations, and not necessarily with a loss of efficiency.
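As a small illustration of that style, a hedged sketch using toolz and plain records (the field names are made up):

Python code:
from toolz import groupby, valmap

records = [{'name': 'apple', 'kind': 'fruit', 'price': 1.2},
           {'name': 'celery', 'kind': 'veggie', 'price': 0.8},
           {'name': 'tomato', 'kind': 'fruit', 'price': 1.0}]

# group records by kind, then reduce each group to an average price
by_kind = groupby(lambda r: r['kind'], records)
avg_price = valmap(lambda rs: sum(r['price'] for r in rs) / len(rs), by_kind)

print(avg_price)  # {'fruit': 1.1, 'veggie': 0.8}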

Dominoes
Sep 20, 2007

onionradish posted:

What's the view on Python ORMs for general use?

Up to now, I've been using direct sqlite queries in my scripts, but recently I started playing with the lightweight peewee ORM as an alternative to manually building SQL queries and joins. The syntax seems cleaner, and for the lightweight sqlite DB stuff I'm doing, there's not a significant performance hit. I don't need the cross-DB or advanced capabilities that an ORM like SQLAlchemy offers.

I like the cleaner syntax and the explicit definition of DB models that match the actual database, but I'm wondering whether the tradeoffs in dependencies or other factors will be a problem later.
I wish Django's ORM was available as a standalone package. Cleaner syntax and better documentation than SQLAlchemy.

Dominoes fucked around with this message at 21:29 on Jan 29, 2016

Risket
Apr 3, 2004
Lipstick Apathy

onionradish posted:

What's the view on Python ORMs for general use?

Up to now, I've been using direct sqlite queries in my scripts, but recently I started playing with the lightweight peewee ORM as an alternative to manually building SQL queries and joins. The syntax seems cleaner, and for the lightweight sqlite DB stuff I'm doing, there's not a significant performance hit. I don't need the cross-DB or advanced capabilities that an ORM like SQLAlchemy offers.

I like the cleaner syntax and the explicit definition of DB models that match the actual database, but I'm wondering whether the tradeoffs in dependencies or other factors will be a problem later.
I'm pretty new to Python and programming in general, but I've been messing around with PonyORM (http://www.ponyorm.com), and it really seems to make database operations easy. Of course, I'm messing with maybe 10000 records max, so your mileage may vary.
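For anyone curious, a rough sketch of what Pony code looks like (the entity and field names are invented):

Python code:
from pony.orm import Database, Required, db_session, select

db = Database()

class Food(db.Entity):
    name = Required(str)
    color = Required(str)

db.bind(provider='sqlite', filename=':memory:')
db.generate_mapping(create_tables=True)

with db_session:
    Food(name='apple', color='red')
    Food(name='celery', color='green')
    # Pony translates the generator expression into SQL
    red_foods = select(f for f in Food if f.color == 'red')[:]
    print([f.name for f in red_foods])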

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

IIRC, Peewee is pretty good, though it's been a couple years since I used it.

Full Battle Rattle
Aug 29, 2009

As long as the times refuse to change, we're going to make a hell of a racket.
I've been coding for months, but I can finally make a white dot move around a maze

Dominoes
Sep 20, 2007

SurgicalOntologist posted:

Matthew Rocklin of Continuum has given some excellent talks on functional programming in python emphasizing the advantages of sticking with core data structures for data analysis. E.g.: https://www.youtube.com/watch?v=PpBK4zIaFLE
Thanks; going to try out toolz/cytoolz.

hooah
Feb 6, 2006
WTF?
I'm TA-ing a newbie Python course, and one student tried to do a = random.randint(0,99) + random.randint(0,99), but that throws an error saying int isn't iterable. I understand what iterating is, but I don't get what's trying to get iterated here. Could someone clear that up for me?

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Works 4 me, did you see the error being thrown?

EAT THE EGGS RICOLA
May 29, 2008

I use dictionaries for almost everything. I work on a huge ridiculous ten year old python project that is used by lots of governments and giant corporations around the world, so code readability is more important than almost anything else.

hooah
Feb 6, 2006
WTF?

baka kaba posted:

Works 4 me, did you see the error being thrown?

I wrote the code wrong. For some reason, the student was doing a = sum(random.randint(0,99) + random.randint(0,99)), so I'm assuming that sum uses an iterator, correct?

BannedNewbie
Apr 22, 2003

HOW ARE YOU? -> YOSHI?
FINE, THANK YOU. -> YOSHI.

hooah posted:

I wrote the code wrong. For some reason, the student was doing a = sum(random.randint(0,99) + random.randint(0,99)), so I'm assuming that sum uses an iterator, correct?

It should be a comma instead of a plus sign.

hooah
Feb 6, 2006
WTF?

BannedNewbie posted:

It should be a comma instead of a plus sign.

Oh ffs I can't believe I missed that.

ArcticZombie
Sep 15, 2010
The problem is that the sum function expects an iterable, e.g. a list of ints or floats. You're giving it a single int when using a plus sign, and two arguments when using a comma. Just playing around with the function, it seems to accept two arguments: the first must be an iterable, and the second is a start value.
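A quick sketch to make the cases concrete:

Python code:
import random

a = random.randint(0, 99)
b = random.randint(0, 99)

# sum(a + b)   -> TypeError: 'int' object is not iterable (a + b is a single int)
# sum(a, b)    -> same TypeError: the first argument still isn't an iterable
total = sum([a, b])   # works: pass the numbers as an iterable
total = a + b         # or just add them directly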

Cingulate
Oct 23, 2012

by Fluffdaddy
Why does this one:
code:
sum(['Ehgi', 'Hu'])
return
code:
TypeError: unsupported operand type(s) for +: 'int' and 'str'
I originally wanted to ask: why can't I sum a list of strings, considering 1. strings are True, 2. True is 1?
(I mean, I get why it'd be silly for that to work, but I'm not sure what it does under the hood - is it that non-empty strings being True only holds in specific contexts, such as an `if` calling the string's __bool__ method?)

But now I'm looking at this one and I don't know what int my iPython is talking about.

Space Kablooey
May 6, 2009


sum assumes that the start value is the integer 0.

You should use ''.join(['Ehgi', 'Hu']) for concatenating a list of strings, though.

Python code:
>>> sum(['Ehgi', 'Hu'], '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]
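The start argument is what's behind that default 0, which is why the list-of-strings call blows up on int + str; it does work with other types, e.g.:

Python code:
# default start is 0, so sum(['Ehgi', 'Hu']) tries 0 + 'Ehgi' -> TypeError

# a non-int start works, e.g. flattening a list of lists
print(sum([[1, 2], [3]], []))   # [1, 2, 3]

# strings are special-cased to push you toward str.join
print(''.join(['Ehgi', 'Hu']))  # 'EhgiHu'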

Space Kablooey fucked around with this message at 21:03 on Feb 1, 2016

Cingulate
Oct 23, 2012

by Fluffdaddy
Yup, it wasn't a realistic question - I assumed from the beginning that sum(list_of_str) would fail; I just wanted to know why, under the hood, it failed. Thanks for the answer too!
