Eela6
May 25, 2007
Shredded Hen

QuarkJets posted:

Probably execution is supposed to stop at that point, once the error is raised? Definitely throw a raise at the end of that except, if so

A catch of Exception (or worse yet, BaseException) without a re-raise is a huge red flag.
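
The pattern looks roughly like this (a minimal sketch; do_something() is just a stand-in for whatever might actually fail):

Python code:
import logging

def do_something():
    raise RuntimeError("placeholder failure")

try:
    do_something()
except Exception:
    logging.exception("something went wrong")
    raise  # re-raise so the failure isn't silently swallowed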

QuarkJets
Sep 8, 2008

Eela6 posted:

A catch of Exception (or worse yet, BaseException) without a re-raise is a huge red flag.

That's often true, but there are circumstances where it can be okay, such as in a process that's supposed to live forever.

e: Which I'd caveat with "you should log the stack trace yourself in that case"
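
A sketch of what that looks like in practice, with a hypothetical do_work() the process calls in a loop:

Python code:
import logging
import time

def do_work():
    ...  # whatever the long-lived process actually does each iteration

def run_forever():
    while True:
        try:
            do_work()
        except Exception:
            # keep the process alive, but make sure the full stack trace ends up in the log
            logging.exception("do_work() failed; carrying on")
        time.sleep(1)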

QuarkJets fucked around with this message at 23:55 on May 9, 2017

ButtWolf
Dec 30, 2004

by Jeffrey of YOSPOS
Hey, hopefully easy question for y'all: I wrote a webscrapey thing last year w BeautifulSoup. I changed variable names to get the new data (new <tr> headings). Nothing works now. I'm out of practice and wasn't good to begin with. Can anyone help? BTW, it worked perfectly last year.

code:
import sys
import time
from bs4 import BeautifulSoup
from urllib2 import urlopen  # for Python 3: from urllib.request import urlopen

url_test = open('txt_files/url_list.txt', 'r')

f = open('txt_files/scraped_final.txt', 'w')

for line in url_test:

#html_doc = 'http://www.basketball-reference.com/players/a/arizatr01.html?lid=carousel_player'
	html_doc = line

	soup = BeautifulSoup(urlopen(html_doc), "html.parser")  # parser name belongs to BeautifulSoup, not urlopen

	name_tag = soup.find("h1")
	player_name = name_tag.string

	per36_line_2016 = soup.find("tr", id="per_minute.2016")
	per36_line_2017 = soup.find("tr", id="per_minute.2017")
	adv_line_2016 = soup.find("tr", id="advanced.2016")
	adv_line_2017 = soup.find("tr", id="advanced.2017")
	per_game_line_2016 = soup.find("tr", id="per_game.2016")
	per_game_line_2017 = soup.find("tr", id="per_game.2017")
	f.write("Name: " + player_name + "\n")


	for string in per36_line_2016:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in per36_line_2017:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in adv_line_2016:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in adv_line_2017:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in per_game_line_2016:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in per_game_line_2017:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")


	print player_name
	time.sleep(1.0)
It gives me NoneType errors. I changed the for loops to just write the rows straight; of course that didn't strip all of the HTML tags, but also only two of the six were written, the others were None. I'm not sure what has changed. The site looks like it's the same. Any ideas?

ButtWolf fucked around with this message at 16:08 on May 10, 2017

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

ButtWolf posted:

Hey, hopefully easy question for y'all: I wrote a webscrapey thing last year w BeautifulSoup. I changed variable names to get the new data (new <tr> headings). Nothing works now. I'm out of practice and wasn't good to begin with. Can anyone help? BTW, it worked perfectly last year.

code:
import sys
import time
from bs4 import BeautifulSoup
from urllib2 import urlopen  # for Python 3: from urllib.request import urlopen

url_test = open('txt_files/url_list.txt', 'r')

f = open('txt_files/scraped_final.txt', 'w')

for line in url_test:

#html_doc = 'http://www.basketball-reference.com/players/a/arizatr01.html?lid=carousel_player'
	html_doc = line

	soup = BeautifulSoup(urlopen(html_doc), "html.parser")  # parser name belongs to BeautifulSoup, not urlopen

#This works do not touch
	name_tag = soup.find("h1")
	player_name = name_tag.string
#This works do not touch
	per36_line_2016 = soup.find("tr", id="per_minute.2016")
	per36_line_2017 = soup.find("tr", id="per_minute.2017")
	adv_line_2016 = soup.find("tr", id="advanced.2016")
	adv_line_2017 = soup.find("tr", id="advanced.2017")
	per_game_line_2016 = soup.find("tr", id="per_game.2016")
	per_game_line_2017 = soup.find("tr", id="per_game.2017")
	f.write("Name: " + player_name + "\n")


	for string in per36_line_2016:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in per36_line_2017:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in adv_line_2016:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in adv_line_2017:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in per_game_line_2016:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in per_game_line_2017:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")

#This works do not touch
	print player_name
	time.sleep(1.0)
It gives me NoneType errors. I changed the for loops to just write the rows straight; of course that didn't strip all of the HTML tags, but also only two of the six were written, the others were None. I'm not sure what has changed. The site looks like it's the same. Any ideas?

You should give us the actual error stack traces so we don't have to go through your code line by line to figure out what's what.

ButtWolf
Dec 30, 2004

by Jeffrey of YOSPOS

Thermopyle posted:

You should give us the actual error stack traces so we don't have to go through your code line by line to figure out what's what.
I'm not good at posting either.
code:
Traceback (most recent call last):
  File "C:\Users\Tony\Python\NBA\scrape_build.py", line 31, in <module>
    for string in per36_line_2016:
TypeError: 'NoneType' object is not iterable
code:
#This works do not touch
This was me talking to myself. Not for you guys.

ButtWolf fucked around with this message at 16:28 on May 10, 2017

Dex
May 26, 2006

Quintuple x!!!

Would not escrow again.

VERY MISLEADING!
looks like the site you're trying to read might have been updated, then. "'NoneType' object is not iterable" there is telling you that "per36_line_2016" is None, which means the line:

per36_line_2016 = soup.find("tr", id="per_minute.2016")

isn't finding anything, whereas you're expecting a list (or some kind of iterable, at least) of things.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



The document that comes from the server has those tables in giant comment blocks. I guess you could iterate over every comment block, check it for that per_minute text, load that as a document and then use find on the result, but 1) yikes, that's complex and 2) it's fragile. They're already rendering their stuff in JS, so you might want to change your approach now; that way, if they switch their front-end to a more, let's say, sane approach, your scraper won't break immediately.

ButtWolf
Dec 30, 2004

by Jeffrey of YOSPOS

Dex posted:

looks like the site you're trying to read might have been updated, then. "'NoneType' object is not iterable" there is telling you that "per36_line_2016" is None, which means the line:

per36_line_2016 = soup.find("tr", id="per_minute.2016")

isn't finding anything, whereas you're expecting a list (or some kind of iterable, at least) of things.

That's what I thought, but it's there. Does it not inherently grab everything inside all <td> in the <tr>? Either it used to, or they changed the site.
code:
<tr id="per_game.2017" class="full_table rowSum" data-row="5"><th scope="row" class="left " data-stat="season">...</td></tr>

ButtWolf fucked around with this message at 16:47 on May 10, 2017

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

That doesn't have the id you're looking for though? Like Munkeymon says the <tr> elements you're looking for have been commented out

Go to the page, hit f12 to open your browser's web developer tools, and do a search on the source with ctrl+f or whatever. You can search by CSS selector, so look for #per_minute.2016 and it won't find anything, because there's no element with that id. Then try it without the # (so you're just searching for plain text) and you'll see where it's gone

ButtWolf
Dec 30, 2004

by Jeffrey of YOSPOS

baka kaba posted:

That doesn't have the id you're looking for though? Like Munkeymon says the <tr> elements you're looking for have been commented out

Go to the page, hit f12 to open your browser's web developer tools, and do a search on the source with ctrl+f or whatever. You can search by CSS selector, so look for #per_minute.2016 and it won't find anything, because there's no element with that id. Then try it without the # (so you're just searching for plain text) and you'll see where it's gone

It's not commented out when I inspect it. Looks like regular old code. I see the same thing commented out, but then the actual code.


I click on View Source, ctrl+f, per_game.2016. It's there.

edit: but the per_minute ones are commented out! That's the confusion between us. Only the per_games are there.

ButtWolf fucked around with this message at 17:33 on May 10, 2017

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



ButtWolf posted:

It's not commented out when I inspect it. Looks like regular old code. I see the same thing commented out, but then the actual code.


The inspector shows you the state of the page after the JavaScript runs. If you have your script dump what it gets from urlopen to a file on disk, you can search it and see what comes over the wire before the JS gets a chance to do anything (or go into dev tools' settings and disable JavaScript). That's all requests can get you to work with.
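
Something like this, for example (a quick sketch; the output filename is arbitrary):

Python code:
from urllib.request import urlopen  # Python 2: from urllib2 import urlopen

url = 'http://www.basketball-reference.com/players/a/arizatr01.html?lid=carousel_player'
raw = urlopen(url).read()

# save exactly what came over the wire, before any JavaScript has had a chance to run
with open('page_source.html', 'wb') as fh:
    fh.write(raw)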

e: yeah, you found it.

ButtWolf
Dec 30, 2004

by Jeffrey of YOSPOS

Munkeymon posted:

The inspector shows you the state of the page after the JavaScript runs. If you have your script dump what it gets from urlopen to a file on disk, you can search it and see what comes over the wire before the JS gets a chance to do anything (or go into dev tools' settings and disable JavaScript). That's all requests can get you to work with.

e: yeah, you found it.

Yeah this completely killed my project. Looking at other sites to grab from, but making the url list looks awful on ESPN. Thanks for your help everyone.
I would imagine this was done so it can't be scraped anymore.

edit: are you saying I can grab the file and save to hard drive, so I can do what I need to offline?

ButtWolf fucked around with this message at 17:45 on May 10, 2017

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



ButtWolf posted:

Yeah this completely killed my project. Looking at other sites to grab from, but making the url list looks awful on ESPN. Thanks for your help everyone.
I would imagine this was done so it can't be scraped anymore.

edit: are you saying I can grab the file and save to hard drive, so I can do what I need to offline?

Yeah, you can just save the result of calling read on the object urlopen returns.

You can also, as I mentioned earlier, pick out the comment blocks, find the one with the string you want and load it as a document that you can use find on.
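
Roughly along these lines - a sketch using bs4's Comment type, with the id taken from your script (find_in_comments is just a name for the helper, and it's untested against the live page):

Python code:
from urllib.request import urlopen  # Python 2: from urllib2 import urlopen
from bs4 import BeautifulSoup, Comment

def find_in_comments(soup, tag_name, tag_id):
    # the tables are shipped inside HTML comments and only un-commented by JS,
    # so search each comment block, parse it as its own document, and look in there
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if tag_id in comment:
            hidden = BeautifulSoup(comment, "html.parser")
            found = hidden.find(tag_name, id=tag_id)
            if found is not None:
                return found
    return None

url = 'http://www.basketball-reference.com/players/a/arizatr01.html?lid=carousel_player'
soup = BeautifulSoup(urlopen(url).read(), "html.parser")
per36_line_2016 = find_in_comments(soup, "tr", "per_minute.2016")
If you wrap it in a function like that, the rest of your script can stay the same - just swap each soup.find(...) call for find_in_comments(...).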

Dex
May 26, 2006

Quintuple x!!!

Would not escrow again.

VERY MISLEADING!

ButtWolf posted:

Yeah this completely killed my project. Looking at other sites to grab from, but making the url list looks awful on ESPN. Thanks for your help everyone.
I would imagine this was done so it can't be scraped anymore.

edit: are you saying I can grab the file and save to hard drive, so I can do what I need to offline?

how often and where does this thing run? if js is populating the table you want, you could use selenium to open an actual browser window, and pass the .page_source to beautifulsoup:

Python code:
from bs4 import BeautifulSoup
from selenium.webdriver import Chrome
browser = Chrome()
browser.get('http://www.basketball-reference.com/players/a/arizatr01.html?lid=carousel_player')
html = browser.page_source
browser.quit()
soup = BeautifulSoup(html, "html.parser")
per36_line_2016 = soup.find("tr", id="per_minute.2016") # This has 29 things in it. I don't know what the things are but I think they're what you want.
slow as balls in comparison to a direct request, but maybe that's ok for your use case.

ButtWolf
Dec 30, 2004

by Jeffrey of YOSPOS
I think I'm just gonna do it by hand... New method is above my skill level. Thanks.

Dex
May 26, 2006

Quintuple x!!!

Would not escrow again.

VERY MISLEADING!

ButtWolf posted:

I think I'm just gonna do it by hand... New method is above my skill level. Thanks.

the thing i posted actually pops chrome up in front of you, opens that page, copies the page source to the html var, then closes the window - there's nothing too complicated at work there, even if it looks brand new and insane. just read the selenium install docs, you need to drop the geckodriver executable somewhere on your path so your script knows how to actually talk to chrome

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

ButtWolf posted:

It's not commented out when I inspect it. Looks like regular old code. I see the same thing commented out, but then the actual code.


I click on View Source, ctrl+f, per_game.2016. It's there.

edit: but the per_minute ones are commented out! That's the confusion between us. Only the per_games are there.

Oh sorry, my bad, Chrome doesn't seem to like searching for an id selector with a . in the name :shobon:

But yeah, they're doing some silly stuff with the HTML; your only real option is to work around it. Don't worry, the same thing happens to me whenever I see some F# 'type providers are awesome!' article, try to use it to scrape a site, and yep... they did something to make the easy way impossible

Munkeymon's comments idea is probably easiest - grab all the comments, iterate over them looking for the id bit, and when you find the comment with it, make it a document and then select the element. Stick that in a function and you can just change that one line in your script to call the function instead. Selenium's pretty straightforward once you get the webdriver installed, its API is a lot like BeautifulSoup, and honestly it's worth learning - this won't be the last site you run into that needs JavaScript to render the page

breaks
May 12, 2001

You need chromedriver for chrome, not geckodriver. I think you know this but just to clarify for the guy.

It isn't difficult to set up, just drop chromedriver in your working directory or path somewhere, pip install selenium and you should be good to go. If you have trouble try downgrading to selenium 3.0.2.

I'd recommend against geckodriver/Firefox with selenium for the time being unless you really need Firefox specifically; it's a half-working mess right now, as Mozilla is, to frame it positively, leading the charge on the transition to the W3C WebDriver standard.
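
If it ends up somewhere that isn't on your PATH, you can also point selenium at the executable directly - a quick sketch (the path is just an example, use wherever you actually saved chromedriver):

Python code:
from selenium.webdriver import Chrome

# assumption: chromedriver.exe was saved next to the script rather than put on PATH
browser = Chrome(executable_path=r'C:\Users\Tony\Python\NBA\chromedriver.exe')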

ButtWolf
Dec 30, 2004

by Jeffrey of YOSPOS
I hate messing with stuff I don't understand.
code:
Traceback (most recent call last):
  File "C:\Users\Tony\Python\NBA\scrape_build.py", line 35, in <module>
    f.write(string.encode('ascii', 'ignore') + " ")
  File "C:\Python27\lib\site-packages\bs4\element.py", line 1055, in encode
    u = self.decode(indent_level, encoding, formatter)
  File "C:\Python27\lib\site-packages\bs4\element.py", line 1119, in decode
    indent_space = (' ' * (indent_level - 1))
TypeError: unsupported operand type(s) for -: 'str' and 'int'
It brought up the first page, then crashed.

ButtWolf fucked around with this message at 18:49 on May 10, 2017

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
I'm just learning about anaconda/conda and it seems kind of great. So I have a few questions about it:

1. How do I pre-download all of the necessary files to create a conda environment? I need to "deploy" on a system that has no internet access, so if I could just upload a tarball with all the packages already downloaded, that would be great.

2. One of the environments I want to set up uses two libraries that need to be compiled. There are conda build scripts that do this so that's great, but package #2 requires package #1 to build.

So I was thinking of doing something like this:

code:

# build package #1
conda build package1/

# create conda env with package1 to build package2
conda create -n foo --use-local package1
source activate foo
conda build package2/
source deactivate foo

# create new env with both packages
conda create -n bar --use-local package1 package2

Is there a better way to do this?

QuarkJets
Sep 8, 2008

Miniconda should have everything that you need to create a conda environment. If you have an internet-connected system with similar hardware, then you could just fully build your environment there and then move the whole anaconda directory to your not-connected system, saving you some time if anything goes wrong.

You should be able to just build both of the packages, in the right order, without creating multiple environments. Create the environment, build package1, build package2

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

QuarkJets posted:

Miniconda should have everything that you need to create a conda environment. If you have an internet-connected system with similar hardware, then you could just fully build your environment there and then move the whole anaconda directory to your not-connected system, saving you some time if anything goes wrong.

You should be able to just build both of the packages, in the right order, without creating multiple environments. Create the environment, build package1, build package2

To the first point, on my laptop anaconda was installed to $HOME/anaconda3. I can just upload this entire folder to the other machine and add it to the PATH?

QuarkJets
Sep 8, 2008

Boris Galerkin posted:

To the first point, on my laptop anaconda was installed to $HOME/anaconda3. I can just upload this entire folder to the other machine and add it to the PATH?

Yup, the anaconda directory is portable; it can be moved to a different similar-enough platform and it will still work. The only caveat is that some of the files in anaconda3/bin will use $HOME/anaconda3/bin/python in their shebangs, so to get everything working perfectly you either need to A) install to $HOME/anaconda3 on the remote machine or B) install locally to the same path that you intend to use on the destination machine. Or I guess you can write a script to modify all of the shebangs to point to your final destination path.
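
That last option could be as simple as something like this (a rough sketch - the two prefixes are placeholders for wherever anaconda3 was built and where it ends up on the destination):

Python code:
from pathlib import Path

OLD_PREFIX = '/home/me/anaconda3'  # placeholder: prefix used when the env was built
NEW_PREFIX = '/opt/anaconda3'      # placeholder: prefix on the destination machine

for script in (Path(NEW_PREFIX) / 'bin').iterdir():
    if not script.is_file():
        continue
    try:
        text = script.read_text()
    except UnicodeDecodeError:
        continue  # compiled binaries don't have shebangs; skip them
    if text.startswith('#!') and OLD_PREFIX in text.split('\n', 1)[0]:
        first_line, rest = text.split('\n', 1)
        script.write_text(first_line.replace(OLD_PREFIX, NEW_PREFIX) + '\n' + rest)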

Eela6
May 25, 2007
Shredded Hen
Hi Python goons! PyCon starts next week. I will be in Portland, OR for it. Anyone else going?

underage at the vape shop
May 11, 2011

by Cyrano4747

breaks posted:

So basically what's happening there is that tkinter is only going to paint the GUI at some point in its event loop, but because that's one continuous section of code the event loop doesn't get a chance to execute until it's finished, leading to the behavior you see.

It's been way too long since I've used tkinter to offer a good solution but what you probably shouldn't do is try to force a paint in that code. This is a common situation and tkinter probably offers some Proper Way of allowing you to break that up so that the event loop gets to execute often enough to see your updates, which hopefully doesn't involve threading.

What do I google to try and find the proper way?

E: I got it! Tk.update()
Tkinter can suck my nuts, it's like they designed it to be a pain.
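
For reference, the "Proper Way" breaks was hinting at is usually after() - schedule your work in small slices so mainloop keeps running in between. A minimal sketch:

Python code:
import tkinter as tk  # Python 2: import Tkinter as tk

root = tk.Tk()
label = tk.Label(root, text="working... 0")
label.pack()

progress = {"step": 0}

def do_one_step():
    # do one small slice of work, update the widget, then hand control
    # back to the event loop and ask it to call us again in 50 ms
    progress["step"] += 1
    label.config(text="working... %d" % progress["step"])
    if progress["step"] < 100:
        root.after(50, do_one_step)

root.after(50, do_one_step)
root.mainloop()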

underage at the vape shop fucked around with this message at 11:03 on May 15, 2017

Proteus Jones
Feb 28, 2013



underage at the vape shop posted:

Tkinter can suck my nuts, it's like they designed it to be a pain.

You speak truth.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

This is a nice library that I use fairly often. backoff.

When working with external APIs over the network you have to handle the stuff that happens when you're dealing with networks and other network devices. This library helps with that. Amazon has a good post on their architecture blog about the best algorithms for choosing a time to wait between retrying failed requests if you want to read about it.

Anyway, it's pretty simple to use...you just decorate the function that might fail and the decorator retries the function until it's successful.

Simple example that backs off each request by an exponential amount of time:
Python code:
@backoff.on_exception(backoff.expo,
                      requests.exceptions.RequestException,
                      max_tries=8)
def get_url(url):
    return requests.get(url)
Complex example using multiple decorators to catch different types of stuff:
Python code:
@backoff.on_predicate(backoff.fibo, max_value=13)
@backoff.on_exception(backoff.expo,
                      requests.exceptions.HTTPError,
                      max_tries=4)
@backoff.on_exception(backoff.expo,
                      requests.exceptions.Timeout,
                      max_tries=8)
def poll_for_message(queue):
    return queue.get()
Anyway, I thought I'd share this as beginners often get into python wanting to scrape websites or download some sort of data. This is a good thing to implement if you're doing that!

There's another, more popular, library called Retrying that you can take a look at as well. The reason I don't use it is that it doesn't support the recommended algorithm in that Amazon blog post, but you might be interested in looking at it as well.

FoiledAgain
May 6, 2007

I'm using cx_freeze for a project and I'm running into problems. The project is a GUI made with PyQt5, and I can't get images (.png) to display when I build a .dmg for Mac. For each image, cx_freeze is spitting out an error that says "not a Mach-O file". I'm not a regular Mac user (I'm borrowing someone else's computer to build the dmg), so I don't really understand what this means, or how to fix the problem. Any suggestions?

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Thermopyle posted:

This is a nice library that I use fairly often. backoff.

When working with external APIs over the network you have to handle the stuff that happens when you're dealing with networks and other network devices. This library helps with that. Amazon has a good post on their architecture blog about the best algorithms for choosing a time to wait between retrying failed requests if you want to read about it.

Anyway, it's pretty simple to use...you just decorate the function that might fail and the decorator retries the function until it's successful.

Simple example that backs off each request by an exponential amount of time:
Python code:
@backoff.on_exception(backoff.expo,
                      requests.exceptions.RequestException,
                      max_tries=8)
def get_url(url):
    return requests.get(url)
Complex example using multiple decorators to catch different types of stuff:
Python code:
@backoff.on_predicate(backoff.fibo, max_value=13)
@backoff.on_exception(backoff.expo,
                      requests.exceptions.HTTPError,
                      max_tries=4)
@backoff.on_exception(backoff.expo,
                      requests.exceptions.Timeout,
                      max_tries=8)
def poll_for_message(queue):
    return queue.get()
Anyway, I thought I'd share this as beginners often get into python wanting to scrape websites or download some sort of data. This is a good thing to implement if you're doing that!

There's another, more popular, library called Retrying that you can take a look at as well. The reason I don't use it is that it doesn't support the recommended algorithm in that Amazon blog post, but you might be interested in looking at it as well.

oh nice i use retrying but this looks to be more configurable at runtime

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

Thermopyle posted:

This is a nice library that I use fairly often. backoff.

That looks like a very handy library that I had never heard of, thank you for sharing!

huhu
Feb 24, 2006
code:
temp/
	target1/
		/x
			file1.txt
			file2.txt
		/y
			file1.txt
			file2.txt
	target2/
		...
code:
rootPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'temp')
targets = [target for target in os.listdir(rootPath) if os.path.isdir(os.path.join(rootPath, target))]
for target in targets:
    xPath = os.path.join(rootPath, target, "x")
    xFile = os.listdir(xPath)[0]
    with open(os.path.join(xPath, xFile), "r") as f:
        for line in f:
            print(line) # Whatever commands I'd actually need to do with this line here. 
Is there a cleaner way to not have so many os.path.join() or would that require knowing the operating system I'm working with?

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug

huhu posted:

code:
rootPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'temp')
targets = [target for target in os.listdir(rootPath) if os.path.isdir(os.path.join(rootPath, target))]
for target in targets:
    xPath = os.path.join(rootPath, target, "x")
    xFile = os.listdir(xPath)[0]
    with open(os.path.join(xPath, xFile), "r") as f:
        for line in f:
            print(line) # Whatever commands I'd actually need to do with this line here. 
Is there a cleaner way to not have so many os.path.join() or would that require knowing the operating system I'm working with?

1. Use pathlib:
Python code:
from pathlib import Path

rootPath = Path(__file__).parent / 'temp'
subdirs = [child for child in rootPath.iterdir() if child.is_dir()]
for subdir in subdirs:
    xPath = subdir / "x"
    xFile = next(iter(xPath.iterdir()))
    with open(xFile, "r") as f:
        for line in f:
            print(line) # Whatever commands I'd actually need to do with this line here. 
2. Do you really just want to process the first file in each 'x' subdirectory? Be aware that os.listdir() doesn't guarantee any ordering, so [0] isn't necessarily the alphabetically first file.
3. To find any file in any subdir, use os.walk and use fnmatch to check whether you're interested in it based on the filename.
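
Point 3 might look something like this (a sketch; '*.txt' stands in for whatever filename pattern you actually care about):

Python code:
import os
from fnmatch import fnmatch

rootPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'temp')
for dirpath, dirnames, filenames in os.walk(rootPath):
    for name in filenames:
        if fnmatch(name, '*.txt'):
            print(os.path.join(dirpath, name))  # or open() it and process it here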

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

huhu posted:

code:

temp/
	target1/
		/x
			file1.txt
			file2.txt
		/y
			file1.txt
			file2.txt
	target2/
		...

code:

rootPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'temp')
targets = [target for target in os.listdir(rootPath) if os.path.isdir(os.path.join(rootPath, target))]
for target in targets:
    xPath = os.path.join(rootPath, target, "x")
    xFile = os.listdir(xPath)[0]
    with open(os.path.join(xPath, xFile), "r") as f:
        for line in f:
            print(line) # Whatever commands I'd actually need to do with this line here. 

Is there a cleaner way to not have so many os.path.join() or would that require knowing the operating system I'm working with?

Import pathlib

Or just use '/' since Windows can normalize it just fine (except for the leading r'\\')

QuarkJets
Sep 8, 2008

huhu posted:

code:
snip
code:
snip
Is there a cleaner way to not have so many os.path.join() or would that require knowing the operating system I'm working with?

The cleanest way is to import join from os.path, then you can just call join instead of os.path.join. You could use glob and/or walk to help generalize

Python code:
from os.path import abspath, basename, dirname, join
from os import walk

rootPath = join(dirname(abspath(__file__)), 'temp')
for rootdir, dirnames, filenames in walk(rootPath):
    if basename(rootdir) == 'x':  # we're inside one of the target*/x directories
        for name in filenames:
            with open(join(rootdir, name), 'r') as fi:
                for line in fi:
                    print line

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

PyCon 2017 videos are up.

Tigren
Oct 3, 2003

Awesome, thanks for the heads up. I really wish my company would send me to PyCon, or at least not make me use vacation days to go. I look forward to the videos every year and often go back and find valuable talks from years past.

I watched Tim Head's talk on MicroPython, which I've tried out before and really enjoyed. The talk made me want to sink my teeth into microcontrollers again. I've been meaning to do something with a bird house camera.

I'm also excited to see a few talks on async, which seems like something I should become more familiar with in TYOOL 2017. Kelsey Hightower's Kubernetes for Pythonistas also looks like a great talk.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
What's the canonical way to include/make available a binary file in my Python package that can be used by unit tests and user tutorials? I could drop it into a "resources" folder at the project root but then to access it I need to use a lot of relative paths unless there's some kind of shortcut I'm missing?

FAT32 SHAMER
Aug 16, 2012



Dumb question: using flask, how would I push POST to my IP address so I don't have to deal with registering a domain to push to? I have already set up Dynu since my friend's IP is dynamic, so now I would assume I just need to forward the UDP ports (80 and 110? idk web poo poo is hard) and launch the flask app

it's just powering a small web hook catcher that lights up a sign via a raspberry pi so we don't really care if it gets owned by some script kiddy

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

funny Star Wars parody posted:

Dumb question: using flask, how would I push POST to my IP address so I don't have to deal with registering a domain to push to? I have already set up Dynu since my friend's IP is dynamic, so now I would assume I just need to forward the UDP ports (80 and 110? idk web poo poo is hard) and launch the flask app

it's just powering a small web hook catcher that lights up a sign via a raspberry pi so we don't really care if it gets owned by some script kiddy

Not really sure I fully understand your request, but here are a couple of comments:
  • HTTP would be TCP, not UDP. What do you need UDP for?
  • It might not matter if your Raspberry Pi gets pwnd but what about other computers or devices on your network? If you poke a hole through the firewall for the pi and somebody is able to get to the pi, they might be able to get to other stuff on your network.
  • If you run a server on :80 or :443 it is likely against the ToS for your ISP and they might send you a nastygram or cut off your access

FAT32 SHAMER
Aug 16, 2012



fletcher posted:

Not really sure I fully understand your request, but here are a couple of comments:
  • HTTP would be TCP, not UDP. What do you need UDP for?
  • It might not matter if your Raspberry Pi gets pwnd but what about other computers or devices on your network? If you poke a hole through the firewall for the pi and somebody is able to get to the pi, they might be able to get to other stuff on your network.
  • If you run a server on :80 or :443 it is likely against the ToS for your ISP and they might send you a nastygram or cut off your access

Welp that covers that! Thanks for the input :)
