|
I'm an utter neophyte, not just with Python but with programming in general, so please bear with me. I'm working through a beginner-oriented book and the author is going through lists and sorting. At first he shows how to display a list in reverse order: code:
code:
code:
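For anyone following along, here's a minimal sketch of the usual ways to show a list in reverse order. It isn't the book's code, and the list and variable names are made up for illustration.

```python
# Hypothetical example data; not taken from the book.
cars = ["bmw", "audi", "toyota", "subaru"]

# reversed() returns an iterator over the items in reverse order
# and leaves the original list untouched.
backwards = list(reversed(cars))
print(backwards)

# Slicing with a step of -1 also builds a new, reversed list.
print(cars[::-1])

# list.reverse() flips the list in place and returns None.
cars.reverse()
print(cars)

# sorted(..., reverse=True) sorts into a new list in descending order,
# which is not the same thing as reversing the current order.
descending = sorted(cars, reverse=True)
print(descending)
```

The main thing to keep straight is which calls change the list itself (reverse, sort) and which hand back a new one (reversed, slicing, sorted).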
|
# ¿ Mar 12, 2017 02:54 |
|
|
# ¿ May 5, 2024 02:44 |
|
Thanks very much, guys! This is good stuff to work with.
|
# ¿ Mar 12, 2017 05:57 |
|
A few questions:
1) What are some recommended books for python with big data/data analytics? (does not have to be specific to pyspark)
2) I'm still kind of new to programming and there's a lot of stuff to take in/remember. I assume that people usually don't just code things from scratch - do you keep cheat sheets around or what?
|
# ¿ Jun 9, 2017 00:29 |
|
Excellent, thanks for the answers!
|
# ¿ Jun 9, 2017 00:58 |
|
Well, I'm still working my way through this book and I'm going to follow it up with this one, so I think I'll have my hands full for a while. I'm also trying to learn Apache Spark, as if that weren't enough.
|
# ¿ Jun 9, 2017 01:32 |
|
I'm looking for a python tutor and not really sure how to go about it. I'm taking a data science course and my lack of proficiency with python is my biggest weakness. So obviously I want to focus on data science/data analysis concepts, but also not-quite-so-directly related things like web scraping and working with APIs. I've read "Python Crash Course" and "Automate the Boring Stuff with Python" so I'm familiar with the basics, but I tend to struggle with coming up with code on my own, or analyzing existing scripts. I live in Toronto, but this seems like the kind of thing that could be done via skype or discord or whatever, I guess(?). The catch is that I'm unemployed and receiving employment insurance, so I don't have a lot of cash to throw around. I'll try to work out something reasonable, regardless.
|
# ¿ Aug 22, 2017 03:12 |
|
accipter posted: Have you tried the IRC #python channel on Freenode?
I haven't, but I'll take a look into it, thanks!
|
# ¿ Aug 22, 2017 15:59 |
|
So my attempts to find a tutor were not very fruitful. So maybe I could get a bit of direction instead. I'm in a data science / data engineering course, but the level of python expertise required is higher than I initially thought. I've read "Python Crash Course" and am working my way through "Automate the Boring Stuff with Python" so I know what lists and dictionaries are, etc., but I draw a blank when trying to come up with my own code and analyzing scripts is tricky - like here's an example of a script that I need to modify for a project. I eventually figured it out, but it was pretty challenging. So anyways, books kinda take longer than I'd like, is there maybe an online course that's fairly decent? It doesn't have to be free. I think I just need to start doing more exercises, but I also still need training wheels somewhat. Mod Edit: Somebody fucked around with this message at 22:21 on Sep 7, 2017 |
# ¿ Sep 7, 2017 20:29 |
|
KernelSlanders posted: It kind of sounds like you're confusing "I need to learn programming" with "I need to learn Python". Have you learned to program in any other languages previously? If not, there aren't really any shortcuts. You need to learn basic programming (you can do it in Python). Ideas like scope, abstraction, control flow, algorithmic complexity don't just show up in a python tutorial. You'll need to study a bit and practice a lot.
Yes, this is my first programming language (except for a bit of BASIC in the 80's, haha). I kind of suspected that the things you mentioned were lacking in my approach. I agree that a lot of practice is probably needed, so any suggestions are welcome. Thanks everyone for the recommendations so far!
|
# ¿ Sep 8, 2017 02:47 |
|
There are no specific examples at the moment. I'm just in that awkward phase of having read about dictionaries, strings, lists, etc. and now having to go out and do stuff with them. And what I need to do is specific to data science / data engineering. Since my initial post I've seen sites that have python exercises, so that might be the right direction to go in. There actually is a project that I'm working on currently, but the main issues that I'm facing have to do mostly with the API that the script is interacting with and not python per se.
|
# ¿ Sep 8, 2017 05:13 |
|
So I've got this script: https://pastebin.com/GLkh6z0T
If I use it to run "python3 scriptname.py > output.csv" it will pull a list of all the "tech" groups in Toronto from meetup.com's API. So far, so good. However, I want to narrow the list down to all data-related groups: data science, data analytics, data engineering, etc. I figured the best way to do this would be to filter the results with a regex, maybe something like this:
meetup_f = main.filter(lambda line: re.search(r'([Dd]ata\s\w+)', line))
However, from what I'm told, this won't quite work since "main" is just printing the results instead of returning any values. I'm not good enough at python to figure this out on my own. Is there any way (in dumb-person language) to get these specific results before it shoots everything out to csv?
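One way to sketch the idea: have the function return the group names as a list instead of printing them, then filter that list with the regex before anything gets written out. The get_groups() function and its sample names are stand-ins I made up, since the real script and API response aren't shown here.

```python
import re


def get_groups():
    """Hypothetical stand-in for the part of the script that calls the API.

    It returns the group names instead of printing them.
    """
    return [
        "Toronto Data Science Meetup",
        "Toronto Python Hackers",
        "Big data analytics TO",
        "Toronto JavaScript",
    ]


# Match "Data" or "data" followed by whitespace and another word.
data_pattern = re.compile(r"[Dd]ata\s\w+")


def filter_data_groups(names):
    """Keep only the names that contain the pattern."""
    return [name for name in names if data_pattern.search(name)]


if __name__ == "__main__":
    # Printing only the filtered names keeps "> output.csv" working as before.
    for name in filter_data_groups(get_groups()):
        print(name)
```

Because the filtering happens on a returned list, the same function could later be pointed at rows read from a large csv instead of an API call.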
|
# ¿ Sep 20, 2017 18:05 |
|
Thanks for the suggestion. If I do "python meetup_data.py > meetup_data.csv", I just get a blank file. I think what you might be trying to do is pull "data" groups from the API, but the API only has a category for "tech." It's necessary - at least, as far as I can tell - to further filter the results from there. If I try doing "python3 meetup_data.py > meetup_data.csv", then I get the error:
File "meetup_data.py", line 32
continue
^
TabError: inconsistent use of tabs and spaces in indentation
Which is strange, since the formatting of the indentation seems to be consistent: https://pastebin.com/aHvT5GfM
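A TabError usually means some lines are indented with tabs and others with spaces, which can look identical in an editor. Here's a rough sketch for spotting them; the function name is my own invention, and it only looks at each line's leading whitespace.

```python
def find_indentation_kinds(source):
    """Return (line_number, kind) for every indented line in source.

    kind is "tabs", "spaces", or "mixed", based only on leading whitespace.
    """
    report = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Everything before the first non-space, non-tab character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if not indent:
            continue
        if "\t" in indent and " " in indent:
            kind = "mixed"
        elif "\t" in indent:
            kind = "tabs"
        else:
            kind = "spaces"
        report.append((number, kind))
    return report


# Made-up sample: line 2 uses four spaces, line 3 uses a tab.
sample = "for x in range(3):\n    print(x)\n\tcontinue\n"
for number, kind in find_indentation_kinds(sample):
    print(number, kind)
```

Python also ships a tabnanny module for this, so "python3 -m tabnanny meetup_data.py" is worth a try before writing anything by hand.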
|
# ¿ Sep 20, 2017 18:54 |
|
Thermopyle posted: Well you just need to find the part of the data structure the api returns that you want to filter on. I don't know what the API data structure looks like, but you want whatever part that says "data science", "data analytics", "data engineering", etc.
Unfortunately the API isn't very robust, so there's no way within it to narrow down the results. That's why I want to see if there's a way to do it directly through python. Of course I could just do a filter on the results within Excel, but this is for a data science project and they want a more direct way to get the desired results (since I might need to someday do the same thing on a gargantuan csv). I appreciate the assistance, though! What's interesting is that the API has a buttload of options if you're searching within a specific group, like "Toronto Python Hackers" or something. But for other stuff (like categories), not so much.
|
# ¿ Sep 20, 2017 19:57 |
|
Excellent, thanks! I'll give that a try when I get home.
|
# ¿ Sep 20, 2017 20:21 |
|
I'm working through an exercise from a book: https://pastebin.com/zMweG525 It's supposed to be using regex queries to look for phone numbers and email addresses, but it's giving me syntax errors on line 25 and I can't figure out why. I checked if maybe "text" was a reserved word in Python or something, but that doesn't seem to be the case. Besides, I get the same error if I substitute "txt" or even "floof." The "pyperclip" module is installed and I've used it successfully before. What am I missing?
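This isn't the book's code, just a minimal sketch of what this kind of exercise does: compile one regex for phone numbers and one for email addresses, then pull every match out of a block of text. Both patterns are simplified assumptions (North American style numbers, basic addresses), and the sample text stands in for whatever pyperclip.paste() would return.

```python
import re

# Simplified pattern: optional area code, then 3 digits and 4 digits,
# separated by a space, dot, or hyphen.
phone_regex = re.compile(r"(?:\(?\d{3}\)?[\s.-])?\d{3}[\s.-]\d{4}")

# Simplified pattern: name@domain.tld
email_regex = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Made-up text; in the exercise this would come from the clipboard.
text = "Call 416-555-0123 or 555-0199, or write to info@example.com."

# findall returns whole matches here because neither pattern has a capturing group.
phones = phone_regex.findall(text)
emails = email_regex.findall(text)
print(phones)
print(emails)
```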
|
# ¿ Sep 30, 2017 18:59 |
|
Woops you're right, thanks. That was obvious, that'll teach me for focusing too much on line 25.
|
# ¿ Sep 30, 2017 19:19 |
|
Yeah I've been using Geany, which isn't super robust (or maybe it can be configured to be). Maybe I'll just start doing everything in jupyter.
|
# ¿ Sep 30, 2017 19:40 |
|
Portland Sucks posted: I've been writing a custom ETL tool at work in python and I've got most of it all broken down into individual scripts. All of the steps along the way are jobs that should be able to execute independently, but can also be triggered by a successful end condition of one before it in the pipeline.
Doesn't Luigi already do this? It's kind of old, but ETL stuff is pretty much a staple of data engineering.
|
# ¿ Oct 6, 2017 21:55 |
|
I'm in the same boat, but it doesn't seem like there are any books that are good for transitioning someone from beginner -> writing your own code. I think it's necessary to just start diving in and start doing projects - there are websites that have exercises with solutions, so I'm going to try doing those. Having said all that, I decided to get "Learn Python the Hard Way," but so far the title seems to be a bit misleading since the author handholds the reader through some very newbie stuff. Guess I'll see how it goes! Seventh Arrow fucked around with this message at 22:11 on Oct 21, 2017 |
# ¿ Oct 21, 2017 22:08 |
|
Thermopyle posted: The only way I ever progressed from being a person who had read some programming books to someone who was making money doing cool stuff was to do projects that mattered to me....that solved problems I had. Struggling through those leveled me up more than I ever could have done by reading more books.
Where did you look for projects?
|
# ¿ Oct 21, 2017 22:24 |
|
I can't speak for Hughmoris, but for myself I'm looking to get into data engineering so anything oriented towards ETL, batch, and data warehousing stuff is what I'm looking for. Still, I will look into the Flask and Django stuff, since it's probably not a good idea to be fussy while still learning - thanks for the links!
|
# ¿ Oct 22, 2017 01:54 |
|
I'm trying to comprehend numpy arrays - I have an assignment and the first question is to create a random vector of size 20 and sort it in descending order. This is what I came up with:code:
array([[-0.94139218, -0.70652483, -0.67840897, -0.67044282, -0.62539388, -0.61770677, -0.58816414, -0.46556941, -0.44944398, -0.4487512 , -0.43776743, -0.41519608, -0.39534896, -0.34280607, -0.23698099, -0.0829909 , -0.05634266, -0.05450404, -0.04979055, -0.02429839]])
I'm not sure about the parameters used for 'np,' though - in this case ((1,20)). So I think '1' means that it has one dimension, so it's a vector. The '20' seems to be the total size of the array. Then I see many arrays that have a third number, but I'm not sure what it means...the tutorials that I've seen so far refuse to stoop to my level. Can anyone elucidate?
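Here's a minimal sketch of the first question, assuming "random vector" means 20 values drawn with np.random.random. The argument is the array's shape: 20 or (20,) gives a 1-D vector, (1, 20) gives a 2-D array with 1 row and 20 columns, and a third number such as (2, 3, 4) adds a third dimension.

```python
import numpy as np

# A 1-D vector of 20 random floats in [0, 1).
vector = np.random.random(20)

# np.sort returns an ascending copy; reversing it gives descending order.
descending = np.sort(vector)[::-1]
print(descending)

# The shape tuple lists the length of each dimension.
one_d = np.random.random(20)           # 1-D: 20 elements
two_d = np.random.random((1, 20))      # 2-D: 1 row, 20 columns
three_d = np.random.random((2, 3, 4))  # 3-D: 2 blocks of 3 rows x 4 columns
print(one_d.shape, two_d.shape, three_d.shape)
```

The doubled square brackets in the output above are the giveaway that (1, 20) produced a 2-D array rather than a plain vector.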
|
# ¿ Nov 24, 2017 02:04 |
|
Ok great, thanks! I did know about the "np" thing but thanks anyhoo.
So in the next question it says to create a 5x5 array with 6's on the borders and 0's on the inside. So I guess I would use ((5,5)) for the tuple, yes? I'll have to look into arranging the numbers in such a specific fashion though.
edit: wait, the ((5,5)) doesn't seem right
Also, is that a real quote from Duck Dunn?
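For what it's worth, a sketch of one common way to do the border question, assuming np.full and slicing are allowed in the assignment. The doubled parentheses are just the function call's own parentheses wrapped around the shape tuple (5, 5).

```python
import numpy as np

# Start with a 5x5 array filled with 6s; (5, 5) is the shape tuple.
grid = np.full((5, 5), 6)

# Select everything except the first and last row and column,
# and set that inner 3x3 block to 0.
grid[1:-1, 1:-1] = 0
print(grid)
```

Filling the whole array first and then overwriting the inside avoids having to address the four border edges separately.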
|
# ¿ Nov 24, 2017 02:26 |
|
I have a comma-separated spreadsheet with a bunch of information about condos in my city. Most importantly, it has the latitude and longitude of these places. I want to be able to output these coordinates onto google maps but I'm not sure how to go about doing this. I looked at this link but none of the APIs seem to quite provide what I'm looking for (at least, not with python). Any suggestions?
|
# ¿ Dec 30, 2017 22:26 |
|
Hughmoris posted: Maybe something like this? https://github.com/vgm64/gmplot
That looks good, thanks! I will look into it.
|
# ¿ Dec 30, 2017 22:34 |
|
Hughmoris posted: Maybe something like this? https://github.com/vgm64/gmplot
So this seems to work pretty well. It seems like the basis of this is using "gmap = gmplot.GoogleMapPlotter(43.66548, -79.3875, 16)" to store the lat & long and then "gmap.draw("filename.html")" to put it onto an actual map. So I have two challenges:
Any hints on how I can do this?
|
# ¿ Jan 2, 2018 03:13 |
|
accipter posted: Are you trying to create a map? Or load custom points on a Google Map? If you need to use Google, then I would create your own map (https://www.google.com/maps/d/), and then upload the CSV.
I need to put markers on a map, but since this is for a data science python course I need to find a ~*pythonic*~ way of doing it. The module that was linked to earlier was good, but I need to find my way around some of the details.
|
# ¿ Jan 2, 2018 15:08 |
|
A big thank you to the goons who had map suggestions. I've been trying out folium, but I'm trying to find a way to get all the values into one map. From what I can glean from the documentation, folium uses a format like this:code:
1) Generate a sufficient number of lines with the content: folium.Marker([x, y]).add_to(map_1)
2) Fill in x and y with the lat/long values from the spreadsheet
I'm not sure how to do this. I've been able to read the data from the spreadsheet: code:
|
# ¿ Jan 10, 2018 03:13 |
|
Cingulate posted: I don't understand what this means.
What I'm trying to say is that each folium line can't keep reading the first row over and over again. The first folium line needs to use row 1, the second one needs to use row 2, and so on.
|
# ¿ Jan 10, 2018 15:17 |
|
Cingulate posted: Although ideally, you'd vectorise that.
Sorry for the confusion. Maybe I can make it clearer: I need these lines "folium.Marker([x, y]..." populating the python script so they can put markers on the folium map. Except there are thousands of rows in the latitude/longitude csv, so I'm not going to write each folium line by hand. So instead I need a way to get python to generate a bunch of "folium.Marker([x, y]..." lines, but also fill in the latitude/longitude information. Is that a bit better? In the meantime, I'll take a look at your and Jose Cuervo's suggestions - thanks!
edit: of course, loading that much data into folium at once is another issue, but one thing at a time...
Seventh Arrow fucked around with this message at 16:50 on Jan 10, 2018 |
# ¿ Jan 10, 2018 16:47 |
|
Cingulate posted: Seventh Arrow, what's throwing me off is you keep writing you want to "generate lines". But what you do want is to have Python go through the data and use the values, not literally create these lines of code, right?
Yes, I think so. Maybe a better way to put it is that I want folium to put a marker on the map for every lat/long coordinate in the csv. Whatever python voodoo it takes to do that is irrelevant to me (unless it actually involves sacrificing chickens on an altar).
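In other words, one folium.Marker line inside a loop does the job of thousands written by hand. A minimal sketch, assuming the csv has "Latitude" and "Longitude" columns; the sample rows, the function names, and the "condos.html" file name are made up. The folium import is kept inside build_map so the csv-reading part runs even without folium installed.

```python
import csv
import io


def read_coordinates(csv_file):
    """Return a list of (latitude, longitude) floats from an open CSV file.

    Rows with a missing or non-numeric coordinate are skipped.
    """
    coordinates = []
    for row in csv.DictReader(csv_file):
        try:
            coordinates.append((float(row["Latitude"]), float(row["Longitude"])))
        except (KeyError, TypeError, ValueError):
            continue
    return coordinates


def build_map(coordinates, output_path="condos.html"):
    """Put one marker on a folium map for each coordinate pair."""
    import folium  # third-party: pip install folium

    # Centre the map on the first coordinate pair.
    condo_map = folium.Map(location=list(coordinates[0]), zoom_start=13)
    for latitude, longitude in coordinates:
        # This single line runs once per row of data.
        folium.Marker([latitude, longitude]).add_to(condo_map)
    condo_map.save(output_path)
    return condo_map


# Made-up sample standing in for the real spreadsheet.
sample_csv = io.StringIO(
    "Price,Latitude,Longitude\n"
    "1500,43.66548,-79.3875\n"
    "2100,,\n"
    "900,43.6426,-79.3871\n"
)
coordinates = read_coordinates(sample_csv)
print(coordinates)
```

With the real file, something like "with open(path, newline='') as f: build_map(read_coordinates(f))" would write the map to an HTML file.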
|
# ¿ Jan 10, 2018 18:30 |
|
Ok, thank you. What happened to the "shift(-1)"? Is that no longer necessary?
|
# ¿ Jan 10, 2018 18:43 |
|
The code cops are busting me with an "invalid syntax" error for the crime of trying to combine comparison operations in an if statement (I think):code:
Maybe the problem is trying to combine arithmetic operations on a single line? If so, I guess I could do it like this: code:
|
# ¿ Jan 22, 2018 04:36 |
|
Linear Zoetrope posted: However, your second version is preferred. Python (and most any programming language) will only evaluate the conditionals in order from first to last, so if it's testing if pr <= 2000 it's already executed and proven that it's not <= 999.
I kind of suspected that was the case. Thanks!
|
# ¿ Jan 22, 2018 04:54 |
|
Boris Galerkin posted: I’m not sure how “correct” it is but I see and use this all the time. If pr <= 999 then the condition is met and nothing else will evaluate. If pr was not <= 999 then it moves onto the next condition, which is now implicitly “999 < pr <= 2000.”
Yes, they have a section devoted to this in Python Crash Course but I lent it to a friend; still, I had a feeling this would be the case but I wasn't sure. The next trick will be to see if it fits with the rest of the code.
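To spell that out, here's a small sketch of both styles side by side. The 999 and 2000 cut-offs come from the discussion above; the function names and colour names are placeholders I picked.

```python
def price_colour(pr):
    """Map a price to a colour using an ordered if/elif chain."""
    if pr <= 999:
        return "green"
    elif pr <= 2000:
        # Only reached when pr > 999, so this branch means 999 < pr <= 2000.
        return "orange"
    else:
        return "red"


def price_colour_explicit(pr):
    """Same buckets, with the range written out as a chained comparison."""
    if pr <= 999:
        return "green"
    elif 999 < pr <= 2000:
        return "orange"
    else:
        return "red"


for pr in (500, 999, 1000, 2000, 2001):
    print(pr, price_colour(pr), price_colour_explicit(pr))
```

Both functions give the same answer for every price; the first just leans on the order of the tests instead of restating the lower bound.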
|
# ¿ Jan 22, 2018 14:23 |
|
Ok I'm back and it seems like my if/elif/else loop is not quite getting along with the rest of the code. I started off trying to pull the latitude and longitude in a spreadsheet and put them as markers on a folium map. With some help from you guys, it worked with the following: code:
code:
code:
df.dtypes
Post_id float64
Price float64
Bedroom float64
Bathroom float64
Sqft float64
Latitude float64
Longitude float64
Description object
Latlng object
Postal_code object
dtype: object
|
# ¿ Jan 22, 2018 23:45 |
|
The if/elif stuff isn't in the loop already? I thought that the if/elif stuff was the loop.
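For anyone else tripped up by this: an if/elif chain runs once wherever it sits, and it's the for statement that repeats things. A small sketch with made-up prices and colours, showing the chain indented under the for line versus sitting after it.

```python
prices = [850, 1500, 2600]  # made-up values

# The if/elif chain is indented under the for line, so it runs once per price.
colours = []
for pr in prices:
    if pr <= 999:
        colour = "green"
    elif pr <= 2000:
        colour = "orange"
    else:
        colour = "red"
    colours.append(colour)
print(colours)

# Here the chain sits after the loop, so it runs only once,
# using whatever value pr had on the last pass.
for pr in prices:
    pass
if pr <= 999:
    last_colour = "green"
elif pr <= 2000:
    last_colour = "orange"
else:
    last_colour = "red"
print(last_colour)
```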
|
# ¿ Jan 23, 2018 00:29 |
|
Ah, I see what you mean. Thank you. I will take a swing at #1 and see how it goes.
|
# ¿ Jan 23, 2018 00:37 |
|
Ok I think I'm almost there. I've managed to twist its arm enough that it will post the map marker with a colour...once. So I think I'm in the ballpark, but I get a map with only one marker. Here's what I have so far:code:
A) The "dropna" and "iterrows" are working their magic with the "position" variable so that when the process gets around to "pr", almost all of the rows have been dropped. If this is true, then I don't know enough about python to properly address it.
B) It could be something folium-specific, since jupyter displays the following:
<folium.map.Marker at 0x7efe9fd758d0>
Also, the "class color" thing is something I got off of Stack Overflow. If I don't put it in there, it says that "color is not defined."
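A sketch of how the pieces could fit together, assuming the "Price", "Latitude" and "Longitude" columns from the dtypes listing; the sample frame, cut-off colours, function names, and output file name are made up. The point is that the colour lookup and the marker both live inside the same loop, so each remaining row gets its own marker. The folium import is kept inside draw so the pandas part runs on its own.

```python
import pandas as pd


def price_colour(pr):
    """Illustrative price buckets; the colour names are placeholders."""
    if pr <= 999:
        return "green"
    elif pr <= 2000:
        return "orange"
    return "red"


# Made-up frame standing in for the condo spreadsheet.
df = pd.DataFrame(
    {
        "Price": [850.0, 1500.0, None, 2600.0],
        "Latitude": [43.66, 43.65, 43.64, None],
        "Longitude": [-79.39, -79.38, -79.40, -79.37],
    }
)

# Drop incomplete rows once, up front, then loop over what is left.
clean = df.dropna(subset=["Price", "Latitude", "Longitude"])

markers = []
for _, row in clean.iterrows():
    # Everything indented here runs once per remaining row.
    location = [row["Latitude"], row["Longitude"]]
    markers.append((location, price_colour(row["Price"])))
print(markers)


def draw(markers, output_path="condos.html"):
    """Add one coloured folium marker per (location, colour) pair."""
    import folium  # third-party: pip install folium

    condo_map = folium.Map(location=markers[0][0], zoom_start=13)
    for location, colour in markers:
        folium.Marker(location, icon=folium.Icon(color=colour)).add_to(condo_map)
    condo_map.save(output_path)
    return condo_map
```

The "<folium.map.Marker at ...>" line is most likely just Jupyter echoing the return value of the last add_to call rather than an error, since add_to hands back the marker it was called on.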
|
# ¿ Jan 24, 2018 01:18 |
|
I remember telling myself to put that line back, too. Woops! Thanks for the help.
|
# ¿ Jan 24, 2018 02:45 |