FingersMaloy
Dec 23, 2004

Fuck! That's Delicious.
quote not edit

SurgicalOntologist
Jun 17, 2004

Nope, you shouldn't need to figure out the html variable. response.text in your second block is the same as html.read() in your first. In other words, scrapy gets the html for you and passes it in as response.

That is, if I'm understanding the scrapy API correctly based on what you posted.

FingersMaloy
Dec 23, 2004

Fuck! That's Delicious.
Yeah so I imagined this would work:

Python code:
import scrapy
from bs4 import BeautifulSoup
import csv

class CraigslistSpider(scrapy.Spider):
	name = "craig"
	allowed_domain = "cleveland.craigslist.org/"
	start_urls = ["https://cleveland.craigslist.org/search/apa"]

	def parse(self, response):
		soup = BeautifulSoup(response.text, 'lxml')
		ad_title = soup.title
		for date_and_time in soup.findAll(class_="timeago"):
			date_posted = date_and_time.get("datetime")
		body = soup.find(id="postingbody")
		mapaddress = soup.find(class_="mapaddress")
		for apt in soup.findAll(id="map"):
			lat = apt.get("data-latitude")
			lon = apt.get("data-longitude")
		csvFile = open("test.csv", 'a')
		try:
			writer = csv.writer(csvFile)
			writer.writerow((ad_title, date_posted, body, mapaddress, lat, lon))
		finally:
			csvFile.close()
It's throwing up a syntax error on line 24 ("finally:") right now that I can't resolve, but is it right to have the CSV pieces in the parse method?

Also, I know I'm going to need to write some exceptions into that method later, once I know this will work.

It ran without error, but didn't write to the CSV file! Getting close.

FingersMaloy fucked around with this message at 21:48 on Mar 23, 2017

Space Kablooey
May 6, 2009


It's throwing a syntax error because it's missing the except clause.

By the way, you can rewrite any file opening operation as:

Python code:
with open('filename') as file:
    # ... Do things with the file
#... Program goes on
Doing it this way makes Python close the file automagically.

Eela6
May 25, 2007
Shredded Hen
As a small note, this part of your code:

Python code:
csvFile = open("test.csv", 'a')
try:
    writer = csv.writer(csvFile)
    writer.writerow((ad_title, date_posted, body, mapaddress, lat, lon))
finally:
    csvFile.close()
Is best expressed in idiomatic Python with a context manager ('with' statement), as follows:
Python code:
with open('test.csv', 'a') as file:
	writer = csv.writer(file)
	writer.writerow((ad_title, date_posted, body, mapaddress, lat, lon))
The file will automatically be closed when the interpreter exits the scope of the with block for any reason, including raising an error.
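
If you want to convince yourself of that last point, here's a minimal sketch. The scratch path and the deliberate ZeroDivisionError are stand-ins I made up for the demo; they have nothing to do with the actual scraper.

```python
import csv
import os
import tempfile

# Hypothetical scratch file so the demo doesn't touch a real test.csv.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

try:
    with open(path, "a", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(("title", "2017-03-23"))
        1 / 0  # deliberately raise inside the with block
except ZeroDivisionError:
    pass

# The with block closed the file on the way out, even though it raised.
print(file.closed)
```

The row written before the error still ends up on disk, because closing the file flushes it.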

edit: EFB

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

You've got a few for loops in there where you repeatedly assign the same variables, so they'll end up with whatever the last value was. If you're trying to write multiple records to the CSV file, you need to write one inside the loop each time around (or store them in say a list and then write the whole lot at the end)

SurgicalOntologist
Jun 17, 2004

Yeah, you probably want to (a) decide if you want to write a new CSV for each page that is scraped, or just append to a single file; and (b) call writerow more than once (i.e. put it inside a loop).
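
For option (b), a rough sketch of the "collect rows, then write them all" approach could look like this. The listings data is invented so the example runs on its own, and io.StringIO stands in for the real file; in the spider you'd build each row from the soup and open test.csv instead.

```python
import csv
import io

# Hypothetical stand-in for values pulled out of several scraped pages.
listings = [
    {"title": "2BR near campus", "date_posted": "2017-03-23T10:15", "lat": "41.50", "lon": "-81.69"},
    {"title": "Studio downtown", "date_posted": "2017-03-23T11:02", "lat": "41.49", "lon": "-81.70"},
]

rows = []
for listing in listings:
    # Build one complete record per listing instead of overwriting variables.
    rows.append((listing["title"], listing["date_posted"], listing["lat"], listing["lon"]))

# io.StringIO keeps the sketch self-contained; for a real file use
# open("test.csv", "a", newline="") in a with block.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)  # one call writes every collected row

print(buffer.getvalue())
```

Calling writer.writerow inside the loop works just as well; the list only matters if you want to sort or filter the rows before writing.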

FingersMaloy
Dec 23, 2004

Fuck! That's Delicious.
Thanks everyone! This is very helpful. I've reworked the CSV part to:

Python code:
with open("test.csv", 'a') as file:
	writer = csv.writer(file)
	writer.writerow((ad_title, date_posted, body, mapaddress, lat, lon))
That's a lot tidier.

baka kaba posted:

You've got a few for loops in there where you repeatedly assign the same variables, so they'll end up with whatever the last value was. If you're trying to write multiple records to the CSV file, you need to write one inside the loop each time around (or store them in say a list and then write the whole lot at the end)

I think you mean these two sections:
Python code:
		
for date_and_time in soup.findAll(class_="timeago"):
	date_posted = date_and_time.get("datetime")

for apt in soup.findAll(id="map"):
	lat = apt.get("data-latitude")
	lon = apt.get("data-longitude")
Without going back and finding the full bit of HTML, I'm trying to pull "datetime", "data-longitude", and "data-latitude" from the HTML, but I couldn't figure out how to make it happen without pulling the whole tag which has several variables and then breaking it out.

I'm going to work on making the whole thing create a list and then write the list to the CSV, if I can figure that out. Thanks again all!

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug

HardDiskD posted:

It's throwing a syntax error because it's missing the except clause.

By the way, you can rewrite any file opening operation as:

Python code:
with open('filename') as file:
    # ... Do things with the file
#... Program goes on
Doing it this way makes Python close the file automagically.

except clauses are not required if there's a finally.

code:
In [1]: try:
   ...:     print(undefined_name)
   ...: finally:
   ...:     print('finally')
   ...:     
finally
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-3f894de109e3> in <module>()
      1 try:
----> 2     print(undefined_name)
      3 finally:
      4     print('finally')
      5 

NameError: name 'undefined_name' is not defined

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

FingersMaloy posted:



I think you mean these two sections:
Python code:
		
for date_and_time in soup.findAll(class_="timeago"):
	date_posted = date_and_time.get("datetime")

for apt in soup.findAll(id="map"):
	lat = apt.get("data-latitude")
	lon = apt.get("data-longitude")
Without going back and finding the full bit of HTML, I'm trying to pull "datetime", "data-longitude", and "data-latitude" from the HTML, but I couldn't figure out how to make it happen without pulling the whole tag which has several variables and then breaking it out.

I'm going to work on making the whole thing create a list and then write the list to the CSV, if I can figure that out. Thanks again all!

What I mean is you're doing a for loop, which assumes you're (possibly) working with multiple items. So in your first bit of code, you call that method that finds all the elements with a "timeago" class. Then the for loop iterates over each of those elements in turn, tries to get the value of its "datetime" attribute, and assigns that to your date_posted variable

So if you have multiple "timeago"-class elements, you'll rewrite the value of date_posted each time, and in the end it will be set to whatever the last element said, and you'll lose the previous values. If that's what you want to do, there are better ways to go about getting the last element of a sequence. If you're only expecting one "timeago" element on the page, so you'll only write to your variable once, you don't need the for loop! (plus your intent is clearer)

Python code:
# using the 'find' method which returns only the first matching element
date_and_time = soup.find(class_="timeago")
# if it wasn't found date_and_time will be None, so check before calling a method on it!
if date_and_time:
	date_posted = date_and_time.get("datetime")
Also you're doing the right thing pulling out tags and then extracting the different attributes you need, it's just what you gotta do. There are some nice tricks to help you find certain tags though, so if you ever find yourself getting a tag just so you can find another tag within it, you can probably pull out the one you want with a single find call
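
As a sketch of that last trick, assuming BeautifulSoup 4 is installed: the HTML below is made up for the example (the real Craigslist markup may well differ), and select_one with a CSS selector is just one way to do it.

```python
from bs4 import BeautifulSoup

# Made-up snippet; the real page structure is an assumption.
html = """
<section class="userbody">
  <div id="map" data-latitude="41.4993" data-longitude="-81.6944"></div>
  <p class="postinginfo">posted:
    <time class="timeago" datetime="2017-03-23T21:48:00-0400">x</time>
  </p>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# Two steps: grab the outer tag, then search inside it.
info = soup.find(class_="postinginfo")
stamp_two_step = info.find("time") if info else None

# One step: a CSS selector describes the nesting directly.
stamp = soup.select_one("p.postinginfo time.timeago")
date_posted = stamp.get("datetime") if stamp else None

# The same None check applies to the map tag.
map_tag = soup.select_one("#map")
lat = map_tag.get("data-latitude") if map_tag else None
lon = map_tag.get("data-longitude") if map_tag else None

print(date_posted, lat, lon)
```

Both approaches land on the same tag; the selector version just says what you want in one line.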

baka kaba fucked around with this message at 10:45 on Mar 24, 2017

Space Kablooey
May 6, 2009


Lysidas posted:

except clauses are not required if there's a finally.

Neat. I didn't know about that.

Space Kablooey fucked around with this message at 13:45 on Mar 24, 2017

death cob for cutie
Dec 30, 2006

dwarves won't delve no more
too much splatting down on Zot:4
What're my best options for platform-independent MP3 and FLAC playback in Python? I'm writing a media player to serve as a reference implementation for a new metadata format I'm writing, and most of the stuff that handles MP3 or FLAC either a. only works on Windows or b. is bundled in with a bunch of poo poo focused around game development that I'd rather not distribute with my program (like pygame). Right now I'm settled on pyglet (smaller and simpler than pygame at least) but it would be really keen if I just had a teeny-tiny package (or two, for each format) that did nothing but play back media, modulate the volume, etc.

Eela6
May 25, 2007
Shredded Hen
^^ No idea, but I would love to know the answer.

I gave my second talk at San Diego Python last night! It went really well.

Who in the thread is going to PyCon this year? It might be fun to have a goon Python dev lunch.

Tigren
Oct 3, 2003
Pissing me off today: back porting scripts to support RHEL5. Just, like, basic poo poo is missing from python 2.4 that I rely on all the time. Set comprehensions, any(), conditional expressions, argparse! We've still got about 20% of our infrastructure running on this 10 year old OS.

At least it's Friday. :guinness:

QuarkJets
Sep 8, 2008

Some stuff you may be able to pull out of __future__; according to the docs List Comprehensions were in 2.4, so presumably you can just run a list comprehension and then convert to a set
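
Something along these lines, as a rough sketch. The host names and the any_ helper are invented for illustration, and I haven't run this on an actual 2.4 interpreter, so treat the compatibility as an assumption; it sticks to syntax that also runs on Python 3.

```python
# Sample data, made up for the example.
hosts = ["web01", "web02", "db01", "web01"]

# Instead of a set comprehension: list comprehension, then convert.
prefixes = set([h.rstrip("0123456789") for h in hosts])


# Instead of the built-in any(): a small hand-rolled helper.
def any_(iterable):
    for item in iterable:
        if item:
            return True
    return False


has_db = any_([h.startswith("db") for h in hosts])

# Instead of a conditional expression: a plain if/else.
if has_db:
    label = "yes"
else:
    label = "no"

print(sorted(prefixes), label)
```

argparse has no equally small replacement; the older optparse module in the standard library is the usual fallback.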

Jose Cuervo
Aug 25, 2004

QuarkJets posted:

Try using a comma instead of a colon in the typehinting

Forgot to say that this is what the problem was. Thanks!

tricksnake
Nov 16, 2012
Hi I am a new python user just now working my way through Learn Python The Hard Way

I'm wondering what a good framework for python is? Especially for beginners if that's a thing. I have 2.x and 3.x installed but I'm reading in a lot of places that 3.x is the way to go so I'm gonna need a framework that's for 3.x work.

edit: Crap... is Atom text editor what I'm looking for? It says to download it but that's just a text editor. I'm guessing all I need is that at this point in my quest for programming skills.

tricksnake fucked around with this message at 00:51 on Mar 28, 2017

death cob for cutie
Dec 30, 2006

dwarves won't delve no more
too much splatting down on Zot:4
All you need is a text editor at this point. Don't worry about getting fancy too quick. (I followed the same book and I and several pros I know recommend it to a lot of people, it's a very good intro) Shaw suggesting Atom is a new one, but I just checked and I guess he's updated it since I last looked at the starting chapters.

When you say "framework" I assume you mean an IDE - definitely overkill for you right now (unless you're used to using IDEs with other programming languages - but if you've been programming before, you probably already have a preferred text editor installed)

tricksnake
Nov 16, 2012

Epsilon Plus posted:

All you need is a text editor at this point. Don't worry about getting fancy too quick. (I followed the same book and I and several pros I know recommend it to a lot of people, it's a very good intro) Shaw suggesting Atom is a new one, but I just checked and I guess he's updated it since I last looked at the starting chapters.

When you say "framework" I assume you mean an IDE - definitely overkill for you right now (unless you're used to using IDEs with other programming languages - but if you've been programming before, you probably already have a preferred text editor installed)

Ok thanks dude I really appreciate it. I'll just use Atom for now.

Eela6
May 25, 2007
Shredded Hen

tricksnake posted:

Hi I am a new python user just now working my way through Learn Python The Hard Way

I'm wondering what a good framework for python is? Especially for beginners if that's a thing. I have 2.x and 3.x installed but I'm reading in a lot of places that 3.x is the way to go so I'm gonna need a framework that's for 3.x work.

edit: Crap... is Atom text editor what I'm looking for? It says to download it but that's just a text editor. I'm guessing all I need is that at this point in my quest for programming skills.

I would recommend downloading Anaconda. It comes with a lightweight IDE, Spyder, that I find excellent for beginner-to-intermediate programmers. In addition, Anaconda includes many of the most useful packages and dependencies 'in-box', some of which can be frustrating to install otherwise. Spyder has decent autocomplete and will warn you about syntax errors, and it has a very easy REPL workflow.

Other good choices for IDEs are PyCharm and Visual Studio Code, but you might find them a little heavyweight for what you're doing right now.

Good luck with Python! I hope you enjoy it

Dominoes
Sep 20, 2007

I'm mostly repeating what Eela and Epsilon said:

It doesn't matter what tools you use at this point. Atom and Visual Studio Code are simple, good text editors.

Spyder's an IDE that's easy to use; you could try that as well: 'pip install spyder' in a terminal. You might like it because it has a console built in, and buttons to run your code. Don't mess with PyCharm yet; it can be overwhelming while you're learning.

Installing Anaconda will make installing third-party packages easier.

tricksnake
Nov 16, 2012
Yea I'll just stick with Atom until i start getting into the heavy duty stuff. Thanks for the recommendation I'll check out Anaconda.

a witch
Jan 12, 2017

What's the current hotness for type annotations and checking? Mypy or something else?

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

a witch posted:

What's the current hotness for type annotations and checking? Mypy or something else?

mypy
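
For anyone who hasn't tried it, here's a minimal sketch of what annotated code looks like. The function and its names are made up for the example; the annotations do nothing at runtime, and the idea is that you run a checker such as mypy over the file separately.

```python
from typing import List, Optional


def first_long_word(words: List[str], min_length: int = 5) -> Optional[str]:
    """Return the first word with at least min_length characters, or None."""
    for word in words:
        if len(word) >= min_length:
            return word
    return None


result = first_long_word(["spam", "eggs", "python"])
print(result)

# A type checker should reject a call like the one below, because
# min_length is annotated as int but a str is passed:
# first_long_word(["spam"], min_length="5")
```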

vikingstrike
Sep 23, 2007

whats happening, captain

Does mypy have support for pandas objects? So I could set the return/input type to be pd.Series or pd.DataFrame?

huhu
Feb 24, 2006
Would this be the best way to log errors on a script that I'm running?

code:
try:
    # Threading here for 3 different functions
except Exception as e:     
    logf = open("log.txt", "a")
    logf.write("Error: {} \n".format(str(e)))
    logf.close()
finally:
    pass

vikingstrike
Sep 23, 2007

whats happening, captain

huhu posted:

Would this be the best way to log errors on a script that I'm running?

code:
try:
    # Threading here for 3 different functions
except Exception as e:     
    logf = open("log.txt", "a")
    logf.write("Error: {} \n".format(str(e)))
    logf.close()
finally:
    pass

Check out the logging module that comes in the standard library. It offers a cleaner interface for this type of stuff.
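
A minimal sketch of what that could look like. The logger name, the temp-directory log path, and the run_workers function are all placeholders I made up; swap in your own threading code and log location.

```python
import logging
import os
import tempfile

# Hypothetical log location; the original script wrote to log.txt.
log_path = os.path.join(tempfile.mkdtemp(), "log.txt")

logger = logging.getLogger("worker_script")  # name chosen for this example
logger.setLevel(logging.INFO)
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)


def run_workers():
    """Stand-in for the threaded work; fails on purpose."""
    raise ValueError("worker 2 blew up")


try:
    run_workers()
except Exception:
    # logger.exception logs at ERROR level and appends the traceback.
    logger.exception("Error while running workers")

handler.close()
```

You get timestamps and the full stack trace for free, and the handler takes care of opening and writing the file.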

Eela6
May 25, 2007
Shredded Hen

huhu posted:

Would this be the best way to log errors on a script that I'm running?

code:
try:
    # Threading here for 3 different functions
except Exception as e:     
    logf = open("log.txt", "a")
    logf.write("Error: {} \n".format(str(e)))
    logf.close()
finally:
    pass

Python has a standard logging module.

However, if you're just doing something very basic, this is the 'pythonic' way to do what you're asking.

Python code:
try:
    #threading
except Exception as err:
    with open('log.txt', 'a') as log:
        print(f'Error: {err}', file=log)
This is syntactically equivalent to the following:
Python code:
try:
    #threading
except Exception as err:
    logf = open('log.txt', 'a')
    logf.write('Error: {}\n'.format(str(err)))
finally:
    logf.close()
If you are going to use try/except, it's important to have the closure of the file be in the finally clause; the way you have it will leave the file open if an error occurs during logf.write(). A context manager ('with') is a cleaner way to handle file IO and the current preferred idiom.

huhu
Feb 24, 2006
Awesome thanks for the quick replies!

Symbolic Butt
Mar 22, 2009

(_!_)
Buglord

Eela6 posted:

This is syntactically equivalent to the following:

Just a small correction, you actually meant this:
Python code:
try:
    #threading
except Exception as err:
    logf = open('log.txt', 'a')
    try:
        logf.write('Error: {}\n'.format(str(err)))
    finally:
        logf.close()

Eela6
May 25, 2007
Shredded Hen
Indeed I did. Thank you.

Edit: this is actually a great example of why to use 'with' over manual open/close: even with experience it's easy to gently caress it up doing it manually.

Eela6 fucked around with this message at 21:41 on Mar 28, 2017

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

vikingstrike posted:

Does mypy have support for pandas objects? So I could set the return/input type to be pd.Series or pd.DataFrame?

https://github.com/pandas-dev/pandas/issues/14468

pubic void nullo
May 17, 2002


huhu posted:

Would this be the best way to log errors on a script that I'm running?

code:
try:
    # Threading here for 3 different functions
except Exception as e:     
    logf = open("log.txt", "a")
    logf.write("Error: {} \n".format(str(e)))
    logf.close()
finally:
    pass

I know this is already answered, but here's an entire blog post on the topic of logging exceptions, with some helpful examples for Python 2 and 3. It includes how to log the stack trace. https://realpython.com/blog/python/the-most-diabolical-python-antipattern/

Baby Babbeh
Aug 2, 2005

It's hard to soar with the eagles when you work with Turkeys!!



Thanks for inadvertently bringing f-strings to my attention. I always use str.format() but that seems like a better way to do that...

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Baby Babbeh posted:

Thanks for inadvertently bringing f-strings to my attention. I always use str.format() but that seems like a better way to do that...

They're the best.

It's weird that .format() has always bothered me as much as it has, but dang it sucks.
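
Quick side-by-side for anyone who hasn't seen them yet (Python 3.6+ only; the variable names are just for the example):

```python
err = ValueError("bad input")
attempt = 3

# str.format: the values are listed away from where they appear.
old_style = "Error on attempt {}: {}".format(attempt, err)

# f-string: the expressions sit right inside the braces.
new_style = f"Error on attempt {attempt}: {err}"

# Format specs and arbitrary expressions work as well.
ratio = 2 / 3
summary = f"{ratio:.1%} done, {attempt + 1} attempts max"

print(old_style == new_style)
print(summary)
```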

huhu
Feb 24, 2006
Curious if anyone would be interested in a Flask thread? I'm starting a new job working full time with it and would love a place to discuss it more.

Eela6
May 25, 2007
Shredded Hen
Why not? I'd love to know more (well, anything) about Flask.

Jose Cuervo
Aug 25, 2004

Eela6 posted:

Why not? I'd love to know more (well, anything) about Flask.

Me too.

Space Kablooey
May 6, 2009


nvm

Space Kablooey fucked around with this message at 15:32 on Mar 30, 2017

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

The Django thread barely gets enough traffic to justify its existence. You may be better off just talking Flask in here?
