|
Does anyone have any examples of using two separate dictionaries as data structures to fetch from and then run through some method? I'm just starting out and I'd like to actually grasp how to utilize a data structure with variables that I can feed into a method.
|
# ? Jan 26, 2016 09:17 |
|
Ice For My Nuts posted:Does anyone have any examples of using two separate dictionaries as data structures to fetch from and then run through some method?
|
# ? Jan 26, 2016 11:42 |
|
Yeah, that's way too general of a question, not far removed from "how do I write a program?". Sometimes those of us with more experience have a really hard time grasping what it's like to be someone without all the experience. I can grasp on some intellectual level that not having the right knowledge makes it hard to ask the right questions, but only in my most lucid, introspective moments do I feel like I can remember what it's really like...and I feel like programming is particularly unique in this area.
|
# ? Jan 26, 2016 15:16 |
|
Ice For My Nuts posted:Does anyone have any examples of using two separate dictionaries as data structures to fetch from and then run through some method? How about something like this: You have two partitions of a set G (of elements a_1, a_2, ..., a_n), call them Y and Z. We store this information in two dicts, Ydict and Zdict, where the keys are elements of G and the values identify which subset of the partition each element is in.
Here is an example, where we have fruits and veggies of different colors. We have a fruits/veggies partition and a red/green partition. We want to know how often the two partitions classify our food in the same way (i.e. how well a color classification can proxy for knowing what is a fruit or veggie).
Any pair of items either matches in both partitions or is:
1) matching in fruit/veggie and mismatching in red/green (e.g. (red_pepper, celery))
2) matching in red/green and mismatching in fruit/veggie (e.g. (red_pepper, strawberry))
3) mismatching in both (e.g. (strawberry, celery))
You could write a function that takes a list of items and two dictionaries classifying them in different ways, and outputs these four numbers, which can be used to generate a measure of how similar the two partitions are (e.g. the Jaccard index). As the others mentioned, the question is kind of vague, but perhaps this is what you are looking for.
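The pair-counting idea above can be sketched roughly as follows. This is only an illustration: the item names, the dict names `kind` and `color`, and the helper `pair_counts` are made up for the example, and the Jaccard index shown is the pair-counting variant (pairs grouped together in both partitions, divided by pairs grouped together in at least one).

```python
from itertools import combinations

# Hypothetical data: each dict maps an item to its block in one partition.
kind = {"strawberry": "fruit", "cherry": "fruit", "apple": "fruit",
        "red_pepper": "veggie", "celery": "veggie"}
color = {"strawberry": "red", "cherry": "red", "apple": "green",
         "red_pepper": "red", "celery": "green"}


def pair_counts(items, ydict, zdict):
    """Count item pairs by whether each partition puts them in the same block."""
    counts = {"both": 0, "y_only": 0, "z_only": 0, "neither": 0}
    for a, b in combinations(items, 2):
        same_y = ydict[a] == ydict[b]
        same_z = zdict[a] == zdict[b]
        if same_y and same_z:
            counts["both"] += 1
        elif same_y:
            counts["y_only"] += 1
        elif same_z:
            counts["z_only"] += 1
        else:
            counts["neither"] += 1
    return counts


counts = pair_counts(list(kind), kind, color)
# Pair-counting Jaccard index: 1.0 means the partitions group items identically.
together_somewhere = counts["both"] + counts["y_only"] + counts["z_only"]
jaccard = counts["both"] / together_somewhere if together_somewhere else 1.0
print(counts, jaccard)
```

Both dicts are only ever read by key, so anything that maps an item to a label would work in their place.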
|
# ? Jan 27, 2016 11:42 |
|
floppo hit the nail on the head, thank you.
|
# ? Jan 27, 2016 21:35 |
|
If that is the kind of thing you are interested in, then a pandas DataFrame models what is called a relation (roughly, a set of rows that all share the same columns), which is well suited to that kind of problem.
|
# ? Jan 28, 2016 02:52 |
|
Hi, I'm trying to grab some air quality data using beautiful soup from this page, but I'm not sure what I'm doing wrong. Visit this page, I'm trying to get the column under "list" and the AQI - the biggest rectangular box on the page. http://aqicn.org/city/newyork code:
|
# ? Jan 28, 2016 13:56 |
|
politicorific posted:Hi, I'm trying to grab some air quality data using beautiful soup from this page, but I'm not sure what I'm doing wrong. First, the website you are looking at is getting the data from here, which has a nice feature to download a report of data. So that might solve your problem right away. Second, I prefer to scrape with lxml.html rather than BeautifulSoup. I know this doesn't answer your question, but the following should help. Python code:
code:
accipter fucked around with this message at 17:04 on Jan 28, 2016 |
# ? Jan 28, 2016 16:59 |
|
When you're searching for an ID like cur_pm25, you should just use find, because there shouldn't be more than one element with that ID on the page. You also seem to be confusing classes and IDs: aqivalue is a class, so there can be more than one, but you can do Python code:
Python code:
E: forgot class was a keyword Munkeymon fucked around with this message at 17:11 on Jan 28, 2016 |
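To illustrate the ID-versus-class distinction without depending on any third-party package, here is a rough sketch using the standard library's html.parser in place of Beautiful Soup. The HTML snippet is invented for the example and is not the real aqicn.org markup; only the names cur_pm25 and aqivalue come from the posts above. In Beautiful Soup itself the equivalent lookups would be along the lines of `soup.find(id="cur_pm25")` and `soup.find_all(class_="aqivalue")`, with the trailing underscore because `class` is a reserved word in Python.

```python
from html.parser import HTMLParser

# Hypothetical markup, NOT the real page: one element with a unique id,
# and two elements sharing a class.
SAMPLE_HTML = """
<div class="aqivalue">55</div>
<table>
  <tr><td id="cur_pm25">42</td></tr>
  <tr><td class="aqivalue">17</td></tr>
</table>
"""


class IdClassCollector(HTMLParser):
    """Collect the text of the element with a given id and of all elements with a given class."""

    def __init__(self, wanted_id, wanted_class):
        super().__init__()
        self.wanted_id = wanted_id
        self.wanted_class = wanted_class
        self.id_text = None      # an id should match at most one element
        self.class_texts = []    # a class may match many elements
        self._pending = None     # (matched_id, matched_class) for the tag just opened

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        by_id = attr_map.get("id") == self.wanted_id
        by_class = self.wanted_class in (attr_map.get("class") or "").split()
        self._pending = (by_id, by_class) if (by_id or by_class) else None

    def handle_data(self, data):
        if self._pending is None:
            return
        by_id, by_class = self._pending
        text = data.strip()
        if by_id:
            self.id_text = text
        if by_class:
            self.class_texts.append(text)
        self._pending = None

    def handle_endtag(self, tag):
        # An element with no text must not capture a later element's text.
        self._pending = None


collector = IdClassCollector("cur_pm25", "aqivalue")
collector.feed(SAMPLE_HTML)
collector.close()
print(collector.id_text, collector.class_texts)
```

The ID lookup yields a single value while the class lookup yields a list, which is the same split as find versus find_all.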
# ? Jan 28, 2016 17:09 |
|
Awesome. Thank you both. That really is a bit more complicated than I expected, but the Beautiful Soup documentation I found was very basic, with only simple examples. I knew the AQI was a different form of data, but I'm still unsure of the difference. I believe my Visual Studio version is broken, so pip won't install lxml on my desktop, but my Raspberry Pi should do just fine.
|
# ? Jan 28, 2016 19:42 |
|
politicorific posted:I believe my Visual Studio version is broken, so pip won't install lxml on my desktop, but my Raspberry Pi should do just fine.
|
# ? Jan 29, 2016 00:30 |
|
Fwiw, Anaconda on Windows has been pretty hassle free for me and comes with a bunch of stuff pre installed (well at least everything I usually use regularly).
|
# ? Jan 29, 2016 01:07 |
|
And I don't know how it comes packaged on Windows, but it's portable in the sense that it doesn't need admin permissions and is entirely self-contained in a single folder.
|
# ? Jan 29, 2016 01:24 |
|
Jesus christ... So I tried New York, but if I put in Taipei... no dice. url = 'http://aqicn.org/taiwan/songshan' Also, Anaconda/Spyder is really finicky about lxml... half the time "import lxml.html" fails because the module cannot be found.
|
# ? Jan 29, 2016 10:00 |
|
Bozart posted:If that is the kind of thing you are interested in, then a pandas DataFrame models what is called a relation (roughly, a set of rows that all share the same columns), which is well suited to that kind of problem. Good point! That leads me to a vague/soft question of my own: what are some rules of thumb for using dictionaries vs dataframes? To me dataframes are ideal for data analysis, while dictionaries help me reorganize data - indeed I often use them to populate dataframe columns from messy data. Are there things that should absolutely not be done with one or the other?
|
# ? Jan 29, 2016 12:24 |
|
politicorific posted:Jesus christ... The proper URL is: http://aqicn.org/city/taiwan/songshan I am not sure what to tell you about lxml. I install it via conda and have never had any issues.
|
# ? Jan 29, 2016 15:03 |
|
politicorific posted:Jesus christ... I've never had any issues like this with Anaconda before. Do you have multiple python installations on the same machine?
|
# ? Jan 29, 2016 15:42 |
|
floppo posted:that leads me to a vague/soft question of my own: what are some rules of thumb for using dictionaries vs dataframes? To me dataframes are ideal for data analysis, while dictionaries help me reorganize data - indeed I often use them to populate dataframe columns from messy data. Are there things that should absolutely not be done with one or the other? I would say use dictionaries until you need some specific capability of pandas, like joins, fancy indexing, dtype management, vectorized column operations, etc. Matthew Rocklin of Continuum has given some excellent talks on functional programming in python emphasizing the advantages of sticking with core data structures for data analysis. E.g.: https://www.youtube.com/watch?v=PpBK4zIaFLE
|
# ? Jan 29, 2016 16:08 |
|
SurgicalOntologist posted:I would say use dictionaries until you need some specific capability of pandas, like joins, fancy indexing, dtype management, vectorized column operations, etc. very cool
|
# ? Jan 29, 2016 16:56 |
|
floppo posted:Good point! Keep in mind, calculations using dataframes are very slow compared to other data types like dicts/lists/tuples/iterators/arrays.
|
# ? Jan 29, 2016 19:20 |
|
floppo posted:Good point! I use pandas for anything that looks like a collection of records. If you find yourself building a list of dicts, or a dict of lists then you should probably be building a DataFrame instead.
|
# ? Jan 29, 2016 19:48 |
|
Assuming the lists have the same length, or the dicts have the same keys.
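The "same keys" and "same length" conditions can be made concrete with plain Python. The sketch below converts between a list of dicts and a dict of lists and refuses to convert when the shape is not table-like. The function names and the sample values are invented for illustration.

```python
# Made-up sample records; the values are not real measurements.
records = [
    {"city": "newyork", "aqi": 42},
    {"city": "songshan", "aqi": 58},
]


def records_to_columns(records):
    """List of dicts -> dict of lists. Every record must have the same keys."""
    if not records:
        return {}
    keys = list(records[0])
    for record in records:
        if set(record) != set(keys):
            raise ValueError("records do not share the same keys")
    return {key: [record[key] for record in records] for key in keys}


def columns_to_records(columns):
    """Dict of lists -> list of dicts. Every list must have the same length."""
    if len({len(values) for values in columns.values()}) > 1:
        raise ValueError("columns do not have the same length")
    return [dict(zip(columns, row)) for row in zip(*columns.values())]


columns = records_to_columns(records)
round_trip = columns_to_records(columns)
print(columns)
```

When either check fails, the data is not really one table, which is the case where several tables or plain dicts may be the better fit.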
|
# ? Jan 29, 2016 19:55 |
|
Cingulate posted:Assuming the lists have the same length, or the dicts have the same keys. We're talking hypotheticals here but in the case that you have a limited number of heterogeneous keys you would probably be able to use several dataframes instead, but it would depend on the application.
|
# ? Jan 29, 2016 20:52 |
|
What's the view on Python ORMs for general use? Up to now, I've been using direct sqlite queries in my scripts, but I recently started playing with the lightweight peewee ORM as an alternative to manually building SQL queries and joins. The syntax seems cleaner, and for the lightweight sqlite DB stuff I'm doing, there's no significant performance hit. I don't need the cross-DB or advanced capabilities that an ORM like SQLAlchemy offers. I like the cleaner syntax and the explicit definition of DB models that match the actual database, but I'm wondering if the tradeoffs in dependencies or other factors will be a problem later.
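For reference, the "direct sqlite queries" baseline being compared against looks roughly like this with the standard library's sqlite3 module. The author/book schema and the helper `titles_by_author` are hypothetical, chosen only to show a hand-written join. An ORM would replace the SQL strings with model classes and query methods.

```python
import sqlite3

# In-memory database with a hypothetical two-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE book (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(id)
    );
""")
conn.executemany("INSERT INTO author (id, name) VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO book (title, author_id) VALUES (?, ?)",
                 [("First", 1), ("Second", 1), ("Third", 2)])
conn.commit()


def titles_by_author(conn, name):
    """Hand-written join; the ? placeholder keeps the value out of the SQL text."""
    rows = conn.execute(
        "SELECT book.title FROM book "
        "JOIN author ON author.id = book.author_id "
        "WHERE author.name = ? ORDER BY book.title",
        (name,),
    ).fetchall()
    return [title for (title,) in rows]


print(titles_by_author(conn, "Ann"))
```

The upside of this style is that it has no dependencies beyond the standard library. The downside is that the schema lives in strings rather than in Python objects.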
|
# ? Jan 29, 2016 21:05 |
|
I'm usually the "use pandas" guy but there are reasons to stick with core data structures. Check the YouTube link I posted. Pandas has some warts, so I don't think "use it if your data fits" is good advice. Functional, lazy dict operations can result in much more readable code in many situations, and not necessarily with a loss of efficiency.
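As a rough idea of what lazy dict operations can look like with only core data structures, here is a small filter-then-aggregate pipeline. The readings and the helper names `only` and `mean_by` are invented for the example and are not taken from the linked talk.

```python
from collections import defaultdict

# Made-up readings; each record is a plain dict.
readings = [
    {"city": "newyork", "pollutant": "pm25", "value": 40},
    {"city": "newyork", "pollutant": "pm25", "value": 50},
    {"city": "songshan", "pollutant": "pm25", "value": 60},
    {"city": "songshan", "pollutant": "o3", "value": 10},
]


def only(records, **conditions):
    """Lazily yield the records whose fields equal the given values."""
    return (r for r in records
            if all(r.get(k) == v for k, v in conditions.items()))


def mean_by(records, key, field):
    """Group records by `key` and average `field` in a single pass."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for record in records:
        acc = totals[record[key]]
        acc[0] += record[field]
        acc[1] += 1
    return {k: total / count for k, (total, count) in totals.items()}


# Nothing is filtered until mean_by starts consuming the generator.
pm25_means = mean_by(only(readings, pollutant="pm25"), "city", "value")
print(pm25_means)
```

Because `only` returns a generator, the same pipeline works on records streamed from a file without building an intermediate list.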
|
# ? Jan 29, 2016 21:12 |
|
onionradish posted:What's the view on Python ORMs for general use? Dominoes fucked around with this message at 21:29 on Jan 29, 2016 |
# ? Jan 29, 2016 21:25 |
|
onionradish posted:What's the view on Python ORMs for general use?
|
# ? Jan 29, 2016 23:15 |
|
IIRC, Peewee is pretty good, though it's been a couple years since I used it.
|
# ? Jan 29, 2016 23:30 |
|
I've been coding for months, but I can finally make a white dot move around a maze
|
# ? Jan 30, 2016 03:49 |
|
SurgicalOntologist posted:Matthew Rocklin of Continuum has given some excellent talks on functional programming in python emphasizing the advantages of sticking with core data structures for data analysis. E.g.: https://www.youtube.com/watch?v=PpBK4zIaFLE
|
# ? Jan 30, 2016 12:27 |
|
I'm TA-ing a newbie Python course, and one student tried to do a = random.randint(0,99) + random.randint(0,99), but that throws an error saying int isn't iterable. I understand what iterating is, but I don't get what's trying to get iterated here. Could someone clear that up for me?
|
# ? Feb 1, 2016 14:02 |
|
Works 4 me, did you see the error being thrown?
|
# ? Feb 1, 2016 14:19 |
|
I use dictionaries for almost everything. I work on a huge ridiculous ten year old python project that is used by lots of governments and giant corporations around the world, so code readability is more important than almost anything else.
|
# ? Feb 1, 2016 14:26 |
|
baka kaba posted:Works 4 me, did you see the error being thrown? I wrote the code wrong. For some reason, the student was doing a = sum(random.randint(0,99) + random.randint(0,99)), so I'm assuming that sum uses an iterator, correct?
|
# ? Feb 1, 2016 14:51 |
|
hooah posted:I wrote the code wrong. For some reason, the student was doing a = sum(random.randint(0,99) + random.randint(0,99)), so I'm assuming that sum uses an iterator, correct? It should be a comma instead of a plus sign.
|
# ? Feb 1, 2016 14:55 |
|
BannedNewbie posted:It should be a comma instead of a plus sign. Oh ffs I can't believe I missed that.
|
# ? Feb 1, 2016 14:59 |
|
The problem is that the sum function expects an iterable, e.g. a list of ints or floats. You're giving it a single int when using a plus sign and 2 arguments when using a comma. Just playing around with the function, it seems to accept up to 2 arguments: the first must be an iterable, and the second is an optional start value that defaults to 0.
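A quick way to see this is to try each call form and note which ones raise. The helper `raises_type_error` is just a convenience written for this sketch.

```python
import random

a = random.randint(0, 99)
b = random.randint(0, 99)


def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


# The student's version: a + b is a single int, and an int is not iterable.
plus_fails = raises_type_error(sum, a + b)

# Swapping the plus for a comma passes a as the "iterable", so it fails too.
comma_fails = raises_type_error(sum, a, b)

# Wrapping the values in a list (or tuple) gives sum something to iterate over.
total = sum([a, b])

# The optional second argument is the start value added to the items.
with_start = sum([1, 2], 10)

print(plus_fails, comma_fails, total, with_start)
```

For just two numbers, plain `a + b` does the job without sum at all.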
|
# ? Feb 1, 2016 20:27 |
|
Why does this one:code:
code:
(I mean, I get why it'd be silly for that to work, but I'm not sure what it does under the hood - is it that non-empty strings being True is only true in specific contexts such as e.g. an `if` calling the string's __bool__ method?) But now I'm looking at this one and don't know what int my iPython is talking about.
|
# ? Feb 1, 2016 20:55 |
|
sum assumes that the start value is the integer 0. You should use ''.join(['Ehgi', 'Hu']) for concatenating a list of strings, though. Python code:
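To make the source of the mystery int visible, the sketch below captures the error messages instead of letting them propagate. The helper `error_message` is made up for the example, and the exact message wording may vary between Python versions.

```python
words = ["Ehgi", "Hu"]


def error_message(func, *args):
    """Return the TypeError message raised by func(*args), or None if it succeeds."""
    try:
        func(*args)
    except TypeError as exc:
        return str(exc)
    return None


# With no start value, sum begins with the int 0, so the first step is 0 + "Ehgi".
default_start_error = error_message(sum, words)

# Passing "" as the start value is rejected too; sum refuses to concatenate strings.
str_start_error = error_message(sum, words, "")

# str.join is the intended tool for concatenating a list of strings.
joined = "".join(words)

print(default_start_error)
print(str_start_error)
print(joined)
```

So the int in the traceback comes from the default start value rather than from anything in the list.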
Space Kablooey fucked around with this message at 21:03 on Feb 1, 2016 |
# ? Feb 1, 2016 20:58 |
|
Yup, it wasn't a realistic question - I assumed from the beginning that sum(list_of_str) would fail, I just wanted to know why, under-the-hood, it failed. Thanks for the answer too!
|
# ? Feb 1, 2016 21:05 |