Seventh Arrow
Jan 26, 2005

I'm an utter neophyte, not just with python, but with programming in general so please bear with me. I'm working through a beginner-oriented book and he's going through lists and sorting. At first he shows how to display a list in reverse order:

code:
cars = ['bmw', 'audi', 'toyota', 'subaru']
cars.sort(reverse=True)
print(cars)
Next, he shows how to display the list in sorted order without affecting the original order:

code:
cars = ['bmw', 'audi', 'toyota', 'subaru']

print("Here is the original list:")
print(cars)

print("\nHere is the sorted list:")
print(sorted(cars))

print("\nHere is the original list again:")
print(cars)
So finally, the point: he says that the sorted() function can also accept a reverse=True argument to display the list in reverse alphabetical order. He doesn't say how to do this, though, and the following exercise requires it. If I do

code:
cars = ['bmw', 'audi', 'toyota', 'subaru']
print(sorted(reverse=True))
print(cars)
it doesn't like it. None of the variations that I try work either. Any suggestions? I tried googling it but a lot of the results seemed overly complex and didn't really answer the question.
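
For reference, the attempt above fails because sorted() is never told which list to sort: the list has to be the first argument, and reverse=True comes after it. A minimal sketch using the same cars list:

```python
cars = ['bmw', 'audi', 'toyota', 'subaru']

# sorted() takes the list first; reverse=True is an optional keyword argument.
reverse_sorted = sorted(cars, reverse=True)

print(reverse_sorted)  # ['toyota', 'subaru', 'bmw', 'audi']
print(cars)            # the original list keeps its order
```

Unlike cars.sort(reverse=True), this returns a new list and leaves cars untouched.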

Seventh Arrow
Jan 26, 2005

Thanks very much, guys! This is good stuff to work with.

Seventh Arrow
Jan 26, 2005

A few questions:

1) What are some recommended books for python with big data/data analytics? (does not have to be specific to pyspark)

2) I'm still kind of new to programming and there's a lot of stuff to take in/remember. I assume that people usually don't just code things from scratch - do you keep cheat sheets around or what?

Seventh Arrow
Jan 26, 2005

Excellent, thanks for the answers!

Seventh Arrow
Jan 26, 2005

Well I'm still working my way through this book and I'm going to follow it up with this one, so I think I'll have my hands busy for a while. I'm also trying to learn Apache Spark, as if that weren't enough :toot:

Seventh Arrow
Jan 26, 2005

I'm looking for a python tutor and not really sure how to go about it. I'm taking a data science course and my lack of proficiency with python is my biggest weakness. So obviously I want to focus on data science/data analysis concepts, but also not-quite-so-directly related things like web scraping and working with APIs.

I've read "Python Crash Course" and "Automate the Boring Stuff with Python" so I'm familiar with the basics but I tend to struggle with coming up with code on my own, or analyzing existing scripts.

I live in Toronto, but this seems like the kind of thing that could be done via skype or discord or whatever. I guess(?)

The catch is that I'm unemployed and receiving employment insurance, so I don't have a lot of cash to throw around. I'll try to work out something reasonable, regardless.

Seventh Arrow
Jan 26, 2005

accipter posted:

Have you tried the IRC #python channel on Freenode?

I haven't but I'll take a look into it, thanks!

Seventh Arrow
Jan 26, 2005

So my attempts to find a tutor were not very fruitful. So maybe I could get a bit of direction instead.

I'm in a data science / data engineering course, but the level of python expertise required is higher than I initially thought.

I've read "Python Crash Course" and am working my way through "Automate the Boring Stuff with Python" so I know what lists and dictionaries are, etc., but I draw a blank when trying to come up with my own code and analyzing scripts is tricky - like here's an example of a script that I need to modify for a project. I eventually figured it out, but it was pretty challenging.

So anyways, books kinda take longer than I'd like, is there maybe an online course that's fairly decent? It doesn't have to be free. I think I just need to start doing more exercises, but I also still need training wheels somewhat.

Mod Edit: :iiam:

Somebody fucked around with this message at 22:21 on Sep 7, 2017

Seventh Arrow
Jan 26, 2005

KernelSlanders posted:

It kind of sounds like you're confusing "I need to learn programming" with "I need to learn Python". Have you learned to program in any other languages previously? If not, there aren't really any shortcuts. You need to learn basic programming (you can do it in Python). Ideas like scope, abstraction, control flow, algorithmic complexity don't just show up in a python tutorial. You'll need to study a bit and practice a lot.

Yes, this is my first programming language (except for a bit of BASIC in the 80's, haha). I kind of suspected that the things you mentioned were lacking in my approach. I agree that a lot of practice is probably needed, so any suggestions are welcome. Thanks everyone for the recommendations so far!

Seventh Arrow
Jan 26, 2005

There are no specific examples at the moment. I'm just in that awkward phase of having read about dictionaries, strings, lists, etc. and now having to go out and do stuff with them. And what I need to do is specific to data science / data engineering. Since my initial post I've seen sites that have python exercises, so that might be the right direction to go in.

There actually is a project that I'm working on currently, but the main issues that I'm facing have to do mostly with the API that the script is interacting with and not python per se.

Seventh Arrow
Jan 26, 2005

So I've got this script: https://pastebin.com/GLkh6z0T

If I use it to run "python3 scriptname.py > output.csv" it will pull a list of all the "tech" groups in Toronto from meetup.com's API. So far, so good. However, I want to narrow the list down to all data-related groups: data science, data analytics, data engineering, etc. I figured the best way to do this would be to filter the results with a regex, maybe something like this:

meetup_f = main.filter(lambda line: re.search(r'([Dd]ata\s\w+)', line))

However, from what I'm told, this won't quite work since "main" is just printing the results instead of returning any values. I'm not good enough at python to figure this out on my own. Is there any way (in dumb-person language) to get these specific results before it shoots everything out to csv?
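
One possible shape for this, as a rough sketch only: have the script collect the group names in a list and return them, filter that list with the regex, and write the CSV last. The pastebin script isn't reproduced here, so the function names and the sample data below are hypothetical stand-ins, not the actual meetup.com code.

```python
import csv
import re
import sys

# Matches "data" or "Data" followed by whitespace and a word, e.g. "Data Science".
DATA_PATTERN = re.compile(r'[Dd]ata\s\w+')


def filter_data_groups(group_names):
    """Keep only the names that mention a data-related topic."""
    return [name for name in group_names if DATA_PATTERN.search(name)]


def write_csv(group_names, out=sys.stdout):
    """Write one group name per row to the given file-like object."""
    writer = csv.writer(out)
    for name in group_names:
        writer.writerow([name])


# Hypothetical sample standing in for what the API call would return.
sample = ['Toronto Data Science Meetup', 'Toronto Python Hackers', 'Big data analytics TO']
write_csv(filter_data_groups(sample))
```

Because the filtering happens before anything is printed, "python3 scriptname.py > output.csv" would still work the same way, just with fewer rows.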

Seventh Arrow
Jan 26, 2005

Thanks for the suggestion. If I do "python meetup_data.py > meetup_data.csv", I just get a blank file. I think what you might be trying to do is pull "data" groups from the API, but the API only has a category for "tech." It's necessary - at least, as far as I can tell - to further filter the results from there.

If I try doing "python3 meetup_data.py > meetup_data.csv", then I get the error:

File "meetup_data.py", line 32
continue
^
TabError: inconsistent use of tabs and spaces in indentation

Which is strange, since the formatting of the indentation seems to be consistent: https://pastebin.com/aHvT5GfM
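
A note on the TabError: Python 3 raises it when tabs and spaces are mixed inconsistently in indentation, and many editors and paste sites display both the same way, so the code can look consistent while still being mixed. A small sketch for finding the offending lines; the function name and the sample snippet are made up for illustration:

```python
def find_tab_indented_lines(source):
    """Return the 1-based numbers of lines whose indentation contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace is everything before the first non-blank character.
        indent = line[:len(line) - len(line.lstrip(' \t'))]
        if '\t' in indent:
            flagged.append(number)
    return flagged


# Hypothetical snippet: line 3 is indented with tabs, the rest with spaces.
snippet = "for x in items:\n    if x:\n\t\tcontinue\n"
print(find_tab_indented_lines(snippet))  # [3]
```

The standard library also ships a checker for this, run as "python -m tabnanny meetup_data.py". Converting every tab to four spaces in the editor is the usual fix.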

Seventh Arrow
Jan 26, 2005

Thermopyle posted:

Well you just need to find the part of the data structure the api returns that you want to filter on. I don't know what the API data structure looks like, but you want whatever part that says "data science", "data analytics", "data engineering", etc.

Unfortunately the API isn't very robust, so there's no way within it to narrow down the results. That's why I want to see if there's a way to do it directly through python. Of course I could just do a filter on the results within excel, but this is for a data science project and they want a more direct way to get the desired results (since I might need to someday do the same thing on a gargantuan csv). I appreciate the assistance, though!

What's interesting is that the API has a buttload of options if you're searching within a specific group, like "Toronto Python Hackers" or something. But for other stuff (like categories), not so much.

Seventh Arrow
Jan 26, 2005

Excellent, thanks! I'll give that a try when I get home.

Seventh Arrow
Jan 26, 2005

I'm working through an exercise from a book:

https://pastebin.com/zMweG525

It's supposed to be using regex queries to look for phone numbers and email addresses, but it's giving me syntax errors on line 25 and I can't figure out why.

I checked if maybe "text" was a reserved word in Python or something, but that doesn't seem to be the case. Besides, I get the same error if I substitute "txt" or even "floof."

The "pyperclip" module is installed and I've used it successfully before. What am I missing?

Seventh Arrow
Jan 26, 2005

Woops you're right, thanks. That was obvious, that'll teach me for focusing too much on line 25.

Seventh Arrow
Jan 26, 2005

Yeah I've been using Geany, which isn't super robust (or maybe it can be configured to be). Maybe I'll just start doing everything in jupyter.

Seventh Arrow
Jan 26, 2005

Portland Sucks posted:

I've been writing a custom ETL tool at work in python and I've got most of it all broken down into individual scripts. All of the steps along the way are jobs that should be able to execute independently, but can also be triggered by a successful end condition of one before it in the pipeline.

I'm looking to write a process manager that can monitor these scripts, trigger them when they need to be, and give me the ability to manually start and stop them when I want to. This is running on a Windows Server which seems to limit some of the available libraries already out there.

Am I trying to reinvent the wheel? I can't really find anything that meets these requirements already, but it seems like it should exist.

Doesn't Luigi already do this? It's kind of old, but ETL stuff is pretty much a staple of data engineering.

Seventh Arrow
Jan 26, 2005

I'm in the same boat, but it doesn't seem like there are any books that are good for transitioning someone from beginner -> writing your own code. I think it's necessary to just start diving in and start doing projects - there are websites that have exercises with solutions, so I'm going to try doing those.

Having said all that, I decided to get "Learn Python the Hard Way," but so far the title seems to be a bit misleading since the author handholds the reader through some very newbie stuff. Guess I'll see how it goes!

Seventh Arrow fucked around with this message at 22:11 on Oct 21, 2017

Seventh Arrow
Jan 26, 2005

Thermopyle posted:

The only way I ever progressed from being a person who had read some programming books to someone who was making money doing cool stuff was to do projects that mattered to me....that solved problems I had. Struggling through those leveled me up more than I ever could have done by reading more books.

Where did you look for projects?

Seventh Arrow
Jan 26, 2005

I can't speak for Hughmoris, but for myself I'm looking to get into data engineering so anything oriented towards ETL, batch, and data warehousing stuff is what I'm looking for. Still, I will look into the Flask and Django stuff, since it's probably not a good idea to be fussy while still learning - thanks for the links!

Seventh Arrow
Jan 26, 2005

I'm trying to comprehend numpy arrays - I have an assignment and the first question is to create a random vector of size 20 and sort it in descending order. This is what I came up with:

code:
 
import numpy as np
a = np.random.random((1,20))
np.sort(-a)
I get the following result, so I think it works:

array([[-0.94139218, -0.70652483, -0.67840897, -0.67044282, -0.62539388,
-0.61770677, -0.58816414, -0.46556941, -0.44944398, -0.4487512 ,
-0.43776743, -0.41519608, -0.39534896, -0.34280607, -0.23698099,
-0.0829909 , -0.05634266, -0.05450404, -0.04979055, -0.02429839]])

I'm not sure about the parameters used for 'np,' though - in this case ((1,20)). So I think '1' means that it has one dimension, so it's a vector. The '20' seems to be the total size of the array. Then I see many arrays that have a third number, but I'm not sure what it means...the tutorials that I've seen so far refuse to stoop to my level. Can anyone elucidate?
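
A short sketch of how the shape tuple works, for comparison. The tuple has one entry per dimension, so (1, 20) is a 2-D array with 1 row and 20 columns, and a third number would add a third dimension. Note also that np.sort(-a) returns the negated values (hence the minus signs in the output above); reversing an ascending sort keeps the original values. The variable names here are only for illustration.

```python
import numpy as np

# The tuple passed in is the shape: one entry per dimension.
a = np.random.random((1, 20))    # 2-D: 1 row, 20 columns
v = np.random.random(20)         # 1-D vector with 20 elements
c = np.random.random((2, 3, 4))  # 3-D: 2 blocks of 3 rows and 4 columns

print(a.shape, v.shape, c.shape)  # (1, 20) (20,) (2, 3, 4)

# np.sort sorts ascending; reversing the result gives descending order
# while keeping the original (positive) values.
descending = np.sort(v)[::-1]
print(descending)
```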

Seventh Arrow
Jan 26, 2005

Ok great, thanks! I did know about the "np" thing but thanks anyhoo :) So in the next question it says to create a 5x5 array with 6's on the borders and 0's on the inside. So I guess I would use ((5,5)) for the tuple, yes? I'll have to look into arranging the numbers in such a specific fashion though.

edit: wait, the ((5,5)) doesn't seem right

Also, is that a real quote from Duck Dunn?
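
For the 5x5 border exercise, one way it could go, sketched under the assumption that plain integers are wanted: (5, 5) is the right shape, but np.random.random would fill it with random values, so the sketch starts from an array of 6's and zeroes the interior with a slice.

```python
import numpy as np

# Start with a 5x5 array filled with 6, then zero out the interior.
border = np.full((5, 5), 6)
border[1:-1, 1:-1] = 0  # rows 1-3 and columns 1-3

print(border)
```

The slice 1:-1 means "everything except the first and last index", which is what leaves the outer ring of 6's in place.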

Seventh Arrow
Jan 26, 2005

I have a comma-separated spreadsheet with a bunch of information about condos in my city; most importantly, it has the latitude and longitude of these places. I want to be able to output these coordinates onto google maps but I'm not sure how to go about doing this. I looked at this link but none of the APIs seem to quite provide what I'm looking for (at least, not with python). Any suggestions?

Seventh Arrow
Jan 26, 2005


That looks good, thanks! I will look into it.

Seventh Arrow
Jan 26, 2005


So this seems to work pretty well. It seems like the basis of this is using "gmap = gmplot.GoogleMapPlotter(43.66548, -79.3875, 16)" to store the lat & long and then "gmap.draw("filename.html")" to put it onto an actual map.

So I have two challenges:
  1. I need to funnel a bunch of latitudes & longitudes from the spreadsheet into gmplot.GoogleMapPlotter, and
  2. I need to arrange it so that I can put a bunch of these coordinates into one html file instead of a bunch of separate files

Any hints on how I can do this?
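
A rough sketch of how both challenges might be handled together: read every coordinate into two lists first, then plot them all on a single plotter and call draw() once. The column names "Latitude" and "Longitude", the sample rows, and the helper names are assumptions; the plotting part assumes gmplot's scatter method and is not run here.

```python
import csv
import io


def read_coordinates(csv_file):
    """Return two parallel lists (latitudes, longitudes) from a CSV file object."""
    lats, lngs = [], []
    for row in csv.DictReader(csv_file):
        # Skip rows where either coordinate is missing.
        if row['Latitude'] and row['Longitude']:
            lats.append(float(row['Latitude']))
            lngs.append(float(row['Longitude']))
    return lats, lngs


def draw_map(lats, lngs, out_path):
    """Plot every coordinate on one map and write a single HTML file."""
    import gmplot  # third-party; imported here so the parsing above works without it
    gmap = gmplot.GoogleMapPlotter(43.66548, -79.3875, 16)
    gmap.scatter(lats, lngs, color='red', size=40, marker=False)
    gmap.draw(out_path)


# Hypothetical two-row sample in place of the real spreadsheet.
sample = io.StringIO("Latitude,Longitude\n43.66548,-79.3875\n43.65107,-79.347015\n")
lats, lngs = read_coordinates(sample)
print(lats, lngs)
```

With a real file, the sample would be replaced by something like open('condos.csv', newline=''), followed by draw_map(lats, lngs, 'filename.html').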

Seventh Arrow
Jan 26, 2005

accipter posted:

Are you trying to create a map? Or load custom points on a Google Map? If you need to use Google, then I would create your own map (https://www.google.com/maps/d/), and then upload the CSV.

I need to put markers on a map, but since this is for a data science python course I need to find a ~*pythonic*~ way of doing it. The module that was linked to earlier was good, but I need to find my way around some of the details.

Seventh Arrow
Jan 26, 2005

A big thank you to the goons who had map suggestions. I've been trying out folium, but I'm trying to find a way to get all the values into one map. From what I can glean from the documentation, folium uses a format like this:

code:
map_1 = folium.Map(location=[45.372, -121.6972],
                  zoom_start=12,
                  tiles='Stamen Terrain')
folium.Marker([45.3288, -121.6625], popup='Mt. Hood Meadows').add_to(map_1)
folium.Marker([45.3311, -121.7113], popup='Timberline Lodge').add_to(map_1)
map_1
As mentioned, I have a csv file with the latitude and longitude and it actually even has a field with both values in the one cell. So as far as I can tell I need to do two things:

1) Generate a sufficient amount of lines with the content: folium.Marker([x, y]).add_to(map_1)

2) Fill in x and y with the lat/long values from the spreadsheet

I'm not sure how to do this. I've been able to read the data from the spreadsheet:

code:
import pandas as pd
import folium

df_raw = pd.read_excel('df_condo_v9_t1.xlsx', sheet_name=0, header=0)

df_raw.shape

df_raw.dtypes

df_lat = df_raw['Latlng']

df_lat.head()
But I'm not really sure what to do next. I think that the folium lines can be formatted "folium.Marker([df_lat]).add_to(map_1)" but even that's not so straightforward because each line needs to take the value from the next row in the spreadsheet. Any suggestions would be appreciated.

Seventh Arrow
Jan 26, 2005

Cingulate posted:

I don't understand what this means.

What I'm trying to say is that each folium line can't keep reading the first row over and over again. The first folium line needs to use row 1, the second one needs to use row 2, and so on.

Seventh Arrow
Jan 26, 2005

Cingulate posted:

Although ideally, you'd vectorise that.

Sorry if I'm totally missing your point.

Sorry for the confusion. Maybe I can make it clearer: I need these lines "folium.Marker([x, y]..." populating the python script so they can put markers on the folium map. Except there's thousands of rows in the latitude/longitude csv, so I'm not going to write each folium line by hand.

So instead I need a way to get python to generate a bunch of "folium.Marker([x, y]..." lines, but also fill in the latitude/longitude information. Is that a bit better?

In the meantime, I'll take a look at your and Jose Cuervo's suggestions - thanks!


edit: of course, loading that much data into folium at once is another issue, but one thing at a time...

Seventh Arrow fucked around with this message at 16:50 on Jan 10, 2018

Seventh Arrow
Jan 26, 2005

Cingulate posted:

Seventh Arrow, what's throwing me off is you keep writing you want to "generate lines". But what you do want is to have Python go through the data and use the values, not literally create these lines of code, right?

Yes, I think so. Maybe a better way to put it is that I want folium to put a marker on the map for every lat/long coordinate in the csv. Whatever python voodoo it takes to do that is irrelevant to me (unless it actually involves sacrificing chickens on an altar).

Seventh Arrow
Jan 26, 2005

Ok, thank you. What happened to the "shift(-1)"? Is that no longer necessary?

Seventh Arrow
Jan 26, 2005

The code cops are busting me with an "invalid syntax" error for the crime of trying to combine comparison operations in an if statement (I think):

code:
pr = [df["price"]]
cl = color
if pr <= 999:
    cl = 'green'
elif pr > 1000 and < 2000:
    cl = 'yellow'
elif pr > 2000 and < 3000:
    cl = 'orange'
else:
    cl = 'red'
The results from my googling efforts all had bad indentation of some sort, but I don't believe that's the case here - although I could be wrong.

Maybe the problem is trying to combine arithmetic operations on a single line? If so, I guess I could do it like this:

code:
pr = [df["price"]]
cl = color
if pr <= 999:
    cl = 'green'
elif pr <= 2000:
    cl = 'yellow'
elif pr <= 3000:
    cl = 'orange'
else:
    cl = 'red'
But this seems strange...if python sees that "pr" is less than 999 and also less than 2000, won't the universe collapse on itself or something?
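
To illustrate both points: the syntax error in the first version comes from "and < 2000" having nothing on its left side, since each comparison needs its own left-hand value. And an if/elif chain stops at the first branch that matches, so the later tests only run when the earlier ones failed. A small sketch, with the function name price_color chosen just for this example:

```python
def price_color(pr):
    """Map a single price to a marker colour using an if/elif chain."""
    if pr <= 999:
        return 'green'
    elif pr <= 2000:      # only reached when pr > 999
        return 'yellow'
    elif pr <= 3000:      # only reached when pr > 2000
        return 'orange'
    else:
        return 'red'


# Each comparison needs its own left-hand side; Python also allows chaining:
pr = 1500
print(1000 < pr < 2000)  # True, same as (1000 < pr) and (pr < 2000)
print(price_color(500), price_color(1500), price_color(2500), price_color(5000))
# green yellow orange red
```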

Seventh Arrow
Jan 26, 2005

Linear Zoetrope posted:

However, your second version is preferred. Python (and most any programming language) will only evaluate the conditionals in order from first to last, so if it's testing if pr <= 2000 it's already executed and proven that it's not <= 999

I kind of suspected that was the case. Thanks!

Seventh Arrow
Jan 26, 2005

Boris Galerkin posted:

I’m not sure how “correct” it is but I see and use this all the time. If pr <= 999 then the condition is met and nothing else will evaluate. If pr was not <= 999 then it moves onto the next condition, which is now implicitly “999 < pr <= 2000”.

Yes, they have a section devoted to this in Python Crash Course but I lent it to a friend; still, I had a feeling this would be the case but I wasn't sure. The next trick will be to see if it fits with the rest of the code.

Seventh Arrow
Jan 26, 2005

Ok I'm back and it seems like my if/elif/else loop is not quite getting along with the rest of the code.

I started off trying to pull the latitude and longitude in a spreadsheet and put them as markers on a folium map. With some help from you guys, it worked with the following:

code:
import pandas as pd
import folium

df = pd.read_excel('df_condo_v9_t1.xlsx', sheet_name=0, header=0)

df.shape

df.dtypes

map_center = [df["Latitude"].mean(), df["Longitude"].mean()]

map_1 = folium.Map(location=map_center, tiles="Stamen Terrain", zoom_start=12)

for i, row in df[["Latitude", "Longitude"]].dropna().iterrows():
    position = (row["Latitude"], row["Longitude"])
    folium.Marker(position).add_to(map_1)

map_1
This worked A-OK. Then I decided to get a little fancy and make the markers different colours based on the price, so $0 - $1000 would have a green marker, $1000 - $2000 would have a yellow marker, $2000 - $3000 would have an orange marker, and $3000 and above would have a red marker. So I added some variables and my if loop to get this:

code:
import pandas as pd
import folium

df = pd.read_excel('df_condo_v9_t1.xlsx', sheet_name=0, header=0)

df.shape

df.dtypes

map_center = [df["Latitude"].mean(), df["Longitude"].mean()]

map_1 = folium.Map(location=map_center, tiles="Stamen Terrain", zoom_start=12)


pr = [df["Price"]]
cl = color
if pr <= 999:
    cl = 'green'
elif pr < 2000:
    cl = 'yellow'
elif pr < 3000:
    cl = 'orange'
else:
    cl = 'red'


for i, row in df[["Latitude", "Longitude"]].dropna().iterrows():
    position = (row["Latitude"], row["Longitude"])
    folium.Marker(position), icon=folium.Icon(color='cl').add_to(map_1)

map_1
When I try to run this in jupyter, I get:

code:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-48-e9dbacbdd4bf> in <module>()
      1 pr = [df["Price"]]
      2 cl = color
----> 3 if pr < 999:
      4     cl = 'green'
      5 elif pr < 2000:

TypeError: unorderable types: list() < int()
Which is strange because at first it was telling me "color is not defined." Anyways, it might be because it sees the price field as float64? I'm not sure.

df.dtypes

Post_id float64
Price float64
Bedroom float64
Bathroom float64
Sqft float64
Latitude float64
Longitude float64
Description object
Latlng object
Postal_code object
dtype: object
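
The float64 dtype is probably not the culprit. The error message points at the list: [df["Price"]] wraps the whole column in a list, and a list cannot be compared to a number. The comparison has to happen one price at a time. A minimal sketch of the difference, using a plain Python list as a stand-in for the Price column (the sample prices are invented):

```python
def price_color(pr):
    """Map a single price to a marker colour."""
    if pr <= 999:
        return 'green'
    elif pr < 2000:
        return 'yellow'
    elif pr < 3000:
        return 'orange'
    else:
        return 'red'


# Hypothetical prices standing in for the df["Price"] column.
prices = [850.0, 1500.0, 2400.0, 3600.0]

# Comparing the whole container to a number is what raises the TypeError.
try:
    [prices] < 999
except TypeError as err:
    print('TypeError:', err)

# Comparing one value at a time works.
colors = [price_color(p) for p in prices]
print(colors)  # ['green', 'yellow', 'orange', 'red']
```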

Seventh Arrow
Jan 26, 2005

The if/elif stuff isn't in the loop already? I thought that the if/elif stuff was the loop :psyduck:

Seventh Arrow
Jan 26, 2005

Ah, I see what you mean. Thank you. I will take a swing at #1 and see how it goes.

Seventh Arrow
Jan 26, 2005

Ok I think I'm almost there. I've managed to twist its arm enough that it will post the map marker with a colour...once. So I think I'm in the ballpark, but I get a map with only one marker. Here's what I have so far:

code:
import pandas as pd
import folium

df = pd.read_excel('df_condo_v9_t1.xlsx', sheet_name=0, header=0)

df.shape

df.dtypes

map_center = [df["Latitude"].mean(), df["Longitude"].mean()]

map_1 = folium.Map(location=map_center, tiles="Stamen Terrain", zoom_start=12)

class color:
    def out(self):
        print("successful.")

for i, row in df[["Latitude", "Longitude", "Price"]].dropna().iterrows():
    position = (row["Latitude"], row["Longitude"])
    pr = (row["Price"])
    cl = color
    if pr < 999:
        cl = 'green'
    elif pr < 2000:
        cl = 'yellow'
    elif pr < 3000:
        cl = 'orange'
    else:
        cl = 'red'
folium.Marker(position, icon=folium.Icon(color=cl)).add_to(map_1)

map_1
I think this could be due to one of two things:

A) The "dropna" and "iterrows" are working their magic with the "position" variable so that when the process gets around to "pr", almost all of the rows have been dropped. If this is true, then I don't know enough about python to properly address it.
B) It could be something folium-specific, since jupyter displays the following: <folium.map.Marker at 0x7efe9fd758d0>

Also, the "class color" thing is something I got off of Stack Overflow. If I don't put it in there, it says that "color is not defined."
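
A likely third explanation, offered as a guess from the code as posted: the folium.Marker line is not indented, so it sits outside the for loop and runs only once, after the loop has finished, with the values from the last row. The sketch below shows the effect of indentation using plain Python, with a list standing in for the map and invented sample rows; the helper names are hypothetical.

```python
def price_color(pr):
    """Map a single price to a marker colour."""
    if pr < 999:
        return 'green'
    elif pr < 2000:
        return 'yellow'
    elif pr < 3000:
        return 'orange'
    else:
        return 'red'


def build_markers(rows):
    """Return one (position, colour) pair per row."""
    markers = []
    for row in rows:
        position = (row['Latitude'], row['Longitude'])
        cl = price_color(row['Price'])
        # Indented, so it runs once per row rather than once after the loop.
        # With folium this is where the Marker(...).add_to(map_1) call would go.
        markers.append((position, cl))
    return markers


# Hypothetical rows standing in for the spreadsheet.
rows = [
    {'Latitude': 43.66, 'Longitude': -79.38, 'Price': 900.0},
    {'Latitude': 43.65, 'Longitude': -79.35, 'Price': 1800.0},
    {'Latitude': 43.70, 'Longitude': -79.40, 'Price': 3500.0},
]
markers = build_markers(rows)
print(len(markers))  # 3
```

Since cl always gets a value from the if/elif/else, the "cl = color" line and the "class color" workaround should not be needed at all.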

Seventh Arrow
Jan 26, 2005

:doh: I remember telling myself to put that line back, too. Woops! Thanks for the help.
