Eela6
May 25, 2007
Shredded Hen

QuarkJets posted:

Probably execution is supposed to stop at that point, once the error is raised? Definitely throw a raise at the end of that except, if so

A catch of Exception (or worse yet, BaseException) without a re-raise is a huge red flag.
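
The pattern looks roughly like this (a minimal sketch; do_something() is just a stand-in for whatever might actually fail):

Python code:
import logging

def do_something():
    raise RuntimeError("placeholder failure")

try:
    do_something()
except Exception:
    logging.exception("something went wrong")
    raise  # re-raise so the failure isn't silently swallowed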

QuarkJets
Sep 8, 2008

Eela6 posted:

A catch of Exception (or worse yet, BaseException) without a re-raise is a huge red flag.

That's often true, but there are circumstances where it can be okay, such as in a process that's supposed to live forever.

e: Which I'd caveat with "you should log the stack trace yourself in that case"
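
A sketch of what that looks like in practice, with a hypothetical do_work() the process calls in a loop:

Python code:
import logging
import time

def do_work():
    ...  # whatever the long-lived process actually does each iteration

def run_forever():
    while True:
        try:
            do_work()
        except Exception:
            # keep the process alive, but make sure the full stack trace ends up in the log
            logging.exception("do_work() failed; carrying on")
        time.sleep(1)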

QuarkJets fucked around with this message at 23:55 on May 9, 2017

ButtWolf
Dec 30, 2004

by Jeffrey of YOSPOS
Hey, hopefully easy question for y'all: I wrote a webscrapey thing last year w BeautifulSoup. I changed variable names to get the new data (new <tr> headings). Nothing works now. I'm out of practice and wasn't good to begin with. Can anyone help? BTW, it worked perfectly last year.

code:
import sys
import time
from bs4 import BeautifulSoup
from urllib2 import urlopen  # for Python 3: from urllib.request import urlopen

url_test = open('txt_files/url_list.txt', 'r')

f = open('txt_files/scraped_final.txt', 'w')

for line in url_test:

#html_doc = 'http://www.basketball-reference.com/players/a/arizatr01.html?lid=carousel_player'
	html_doc = line

	soup = BeautifulSoup(urlopen(html_doc), "html.parser")  # parser name belongs to BeautifulSoup, not urlopen

	name_tag = soup.find("h1")
	player_name = name_tag.string

	per36_line_2016 = soup.find("tr", id="per_minute.2016")
	per36_line_2017 = soup.find("tr", id="per_minute.2017")
	adv_line_2016 = soup.find("tr", id="advanced.2016")
	adv_line_2017 = soup.find("tr", id="advanced.2017")
	per_game_line_2016 = soup.find("tr", id="per_game.2016")
	per_game_line_2017 = soup.find("tr", id="per_game.2017")
	f.write("Name: " + player_name + "\n")


	for string in per36_line_2016:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in per36_line_2017:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in adv_line_2016:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in adv_line_2017:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in per_game_line_2016:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in per_game_line_2017:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")


	print player_name
	time.sleep(1.0)
It gives me NoneType errors. I changed the for loops to just write the rows straight; of course that didn't strip all of the HTML tags, but also only two of the six were written, the others were None. I'm not sure what has changed. The site looks like it's the same. Any ideas?

ButtWolf fucked around with this message at 16:08 on May 10, 2017

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

ButtWolf posted:

Hey, hopefully easy question for y'all: I wrote a webscrapey thing last year w BeautifulSoup. I changed variable names to get the new data (new <tr> headings). Nothing works now. I'm out of practice and wasn't good to begin with. Can anyone help? BTW, it worked perfectly last year.

code:
import sys
import time
from bs4 import BeautifulSoup
from urllib2 import urlopen  # for Python 3: from urllib.request import urlopen

url_test = open('txt_files/url_list.txt', 'r')

f = open('txt_files/scraped_final.txt', 'w')

for line in url_test:

#html_doc = 'http://www.basketball-reference.com/players/a/arizatr01.html?lid=carousel_player'
	html_doc = line

	soup = BeautifulSoup(urlopen(html_doc), "html.parser")  # parser name belongs to BeautifulSoup, not urlopen

#This works do not touch
	name_tag = soup.find("h1")
	player_name = name_tag.string
#This works do not touch
	per36_line_2016 = soup.find("tr", id="per_minute.2016")
	per36_line_2017 = soup.find("tr", id="per_minute.2017")
	adv_line_2016 = soup.find("tr", id="advanced.2016")
	adv_line_2017 = soup.find("tr", id="advanced.2017")
	per_game_line_2016 = soup.find("tr", id="per_game.2016")
	per_game_line_2017 = soup.find("tr", id="per_game.2017")
	f.write("Name: " + player_name + "\n")


	for string in per36_line_2016:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in per36_line_2017:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in adv_line_2016:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in adv_line_2017:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in per_game_line_2016:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")
	for string in per_game_line_2017:
		f.write(string.encode('ascii', 'ignore') + " ")
	f.write("\n")

#This works do not touch
	print player_name
	time.sleep(1.0)
It gives me NoneType errors. I changed the for loops to just write the rows straight; of course that didn't strip all of the HTML tags, but also only two of the six were written, the others were None. I'm not sure what has changed. The site looks like it's the same. Any ideas?

You should give us the actual error stack traces so we don't have to go through your code line by line to figure out what's what.

ButtWolf
Dec 30, 2004

by Jeffrey of YOSPOS

Thermopyle posted:

You should give us the actual error stack traces so we don't have to go through your code line by line to figure out what's what.
I'm not good at posting either.
code:
Traceback (most recent call last):
  File "C:\Users\Tony\Python\NBA\scrape_build.py", line 31, in <module>
    for string in per36_line_2016:
TypeError: 'NoneType' object is not iterable
code:
#This works do not touch
This was me talking to myself. Not for you guys.

ButtWolf fucked around with this message at 16:28 on May 10, 2017

Dex
May 26, 2006

Quintuple x!!!

Would not escrow again.

VERY MISLEADING!
looks like the site you're trying to read might have been updated, then. "'NoneType' object is not iterable" there is telling you that "per36_line_2016" is None, which means the line:

per36_line_2016 = soup.find("tr", id="per_minute.2016")

isn't finding anything, whereas you're expecting a list (or some kind of iterable, at least) of things.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



The document that comes from the server has those tables in giant comment blocks. I guess you could iterate over every comment block, check it for that per_minute text, load that as a document and then use find on the result, but 1) yikes, that's complex and 2) it's fragile. They're already rendering their stuff in JS, so you might want to change your approach now; that way, if they switch their front-end to a more, let's say, sane approach, your scraper won't break immediately.

ButtWolf
Dec 30, 2004

by Jeffrey of YOSPOS

Dex posted:

looks like the site you're trying to read might have been updated, then. "'NoneType' object is not iterable" there is telling you that "per36_line_2016" is None, which means the line:

per36_line_2016 = soup.find("tr", id="per_minute.2016")

isn't finding anything, whereas you're expecting a list (or some kind of iterable, at least) of things.

That's what I thought, but it's there. Does it not inherently grab everything inside all <td> in the <tr>? Either it used to, or they changed the site.
code:
<tr id="per_game.2017" class="full_table rowSum" data-row="5"><th scope="row" class="left " data-stat="season">...</td></tr>

ButtWolf fucked around with this message at 16:47 on May 10, 2017

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

That doesn't have the id you're looking for though? Like Munkeymon says the <tr> elements you're looking for have been commented out

Go to the page, hit f12 to open your browser's web developer tools, and do a search on the source with ctrl+f or whatever. You can search by CSS selector, so look for #per_minute.2016 and it won't find anything, because there's no element with that id. Then try it without the # (so you're just searching for plain text) and you'll see where it's gone

ButtWolf
Dec 30, 2004

by Jeffrey of YOSPOS

baka kaba posted:

That doesn't have the id you're looking for though? Like Munkeymon says the <tr> elements you're looking for have been commented out

Go to the page, hit f12 to open your browser's web developer tools, and do a search on the source with ctrl+f or whatever. You can search by CSS selector, so look for #per_minute.2016 and it won't find anything, because there's no element with that id. Then try it without the # (so you're just searching for plain text) and you'll see where it's gone

It's not commented out when I inspect it. Looks like regular old code. I see the same thing commented out, but then the actual code.


I click on View Source, ctrl+f, per_game.2016. It's there.

edit: but the per_minute ones are commented out! That's the confusion between us. Only the per_games are there.

ButtWolf fucked around with this message at 17:33 on May 10, 2017

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



ButtWolf posted:

It's not commented out when I inspect it. Looks like regular old code. I see the same thing commented out, but then the actual code.


The inspector shows you the state of the page after the JavaScript runs. If you have your script dump what it gets from urlopen to a file on disk, you can search it and see what comes over the wire before the JS gets a chance to do anything (or go into dev tools' settings and disable JavaScript). That's all requests can get you to work with.
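
Something like this, for example (a quick sketch; the output filename is arbitrary):

Python code:
from urllib.request import urlopen  # Python 2: from urllib2 import urlopen

url = 'http://www.basketball-reference.com/players/a/arizatr01.html?lid=carousel_player'
raw = urlopen(url).read()

# save exactly what came over the wire, before any JavaScript has had a chance to run
with open('page_source.html', 'wb') as fh:
    fh.write(raw)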

e: yeah, you found it.

ButtWolf
Dec 30, 2004

by Jeffrey of YOSPOS

Munkeymon posted:

The inspector shows you the state of the page after the JavaScript runs. If you have your script dump what it gets from urlopen to a file on disk, you can search it and see what comes over the wire before the JS gets a chance to do anything (or go into dev tools' settings and disable JavaScript). That's all requests can get you to work with.

e: yeah, you found it.

Yeah this completely killed my project. Looking at other sites to grab from, but making the url list looks awful on ESPN. Thanks for your help everyone.
I would imagine this was done so it can't be scraped anymore.

edit: are you saying I can grab the file and save to hard drive, so I can do what I need to offline?

ButtWolf fucked around with this message at 17:45 on May 10, 2017

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



ButtWolf posted:

Yeah this completely killed my project. Looking at other sites to grab from, but making the url list looks awful on ESPN. Thanks for your help everyone.
I would imagine this was done so it can't be scraped anymore.

edit: are you saying I can grab the file and save to hard drive, so I can do what I need to offline?

Yeah, you can just save the result of calling read on the object urlopen returns.

You can also, as I mentioned earlier, pick out the comment blocks, find the one with the string you want and load it as a document that you can use find on.
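
Roughly along these lines - a sketch using bs4's Comment type, with the id taken from your script (find_in_comments is just a name for the helper, and it's untested against the live page):

Python code:
from urllib.request import urlopen  # Python 2: from urllib2 import urlopen
from bs4 import BeautifulSoup, Comment

def find_in_comments(soup, tag_name, tag_id):
    # the tables are shipped inside HTML comments and only un-commented by JS,
    # so search each comment block, parse it as its own document, and look in there
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if tag_id in comment:
            hidden = BeautifulSoup(comment, "html.parser")
            found = hidden.find(tag_name, id=tag_id)
            if found is not None:
                return found
    return None

url = 'http://www.basketball-reference.com/players/a/arizatr01.html?lid=carousel_player'
soup = BeautifulSoup(urlopen(url).read(), "html.parser")
per36_line_2016 = find_in_comments(soup, "tr", "per_minute.2016")
If you wrap it in a function like that, the rest of your script can stay the same - just swap each soup.find(...) call for find_in_comments(...).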

Dex
May 26, 2006

Quintuple x!!!

Would not escrow again.

VERY MISLEADING!

ButtWolf posted:

Yeah this completely killed my project. Looking at other sites to grab from, but making the url list looks awful on ESPN. Thanks for your help everyone.
I would imagine this was done so it can't be scraped anymore.

edit: are you saying I can grab the file and save to hard drive, so I can do what I need to offline?

how often and where does this thing run? if js is populating the table you want, you could use selenium to open an actual browser window, and pass the .page_source to beautifulsoup:

Python code:
from bs4 import BeautifulSoup
from selenium.webdriver import Chrome
browser = Chrome()
browser.get('http://www.basketball-reference.com/players/a/arizatr01.html?lid=carousel_player')
html = browser.page_source
browser.quit()
soup = BeautifulSoup(html, "html.parser")
per36_line_2016 = soup.find("tr", id="per_minute.2016") # This has 29 things in it. I don't know what the things are but I think they're what you want.
slow as balls in comparison to a direct request, but maybe that's ok for your use case.

ButtWolf
Dec 30, 2004

by Jeffrey of YOSPOS
I think I'm just gonna do it by hand... New method is above my skill level. Thanks.

Dex
May 26, 2006

Quintuple x!!!

Would not escrow again.

VERY MISLEADING!

ButtWolf posted:

I think I'm just gonna do it by hand... New method is above my skill level. Thanks.

the thing i posted actually pops chrome up in front of you, opens that page, copies the page source to the html var, then closes the window - there's nothing too complicated at work there, even if it looks brand new and insane. just read the selenium install docs, you need to drop the geckodriver executable somewhere on your path so your script knows how to actually talk to chrome

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

ButtWolf posted:

It's not commented out when I inspect it. Looks like regular old code. I see the same thing commented out, but then the actual code.


I click on View Source, ctrl+f, per_game.2016. It's there.

edit: but the per_minute ones are commented out! That's the confusion between us. Only the per_games are there.

Oh sorry, my bad, Chrome doesn't seem to like searching for an id selector with a . in the name :shobon:

But yeah, they're doing some silly stuff with the HTML; your only real option is to work around it. Don't worry, the same thing happens to me whenever I see some F# 'type providers are awesome!' article, try to use it to scrape a site, and yep... they did something to make the easy way impossible

Munkeymon's comments idea is probably easiest - grab all the comments, iterate over them looking for the id bit, and when you find the comment with it, make it a document and then select the element. Stick that in a function and you can just change that one line in your script to call the function instead. Selenium's pretty straightforward once you get the webdriver installed, its API is a lot like BeautifulSoup, and honestly it's worth learning - this won't be the last site you run into that needs JavaScript to render the page

breaks
May 12, 2001

You need chromedriver for chrome, not geckodriver. I think you know this but just to clarify for the guy.

It isn't difficult to set up, just drop chromedriver in your working directory or path somewhere, pip install selenium and you should be good to go. If you have trouble try downgrading to selenium 3.0.2.

I'd recommend against geckodriver/Firefox with selenium for the time being unless you really need Firefox specifically; it's a half-working mess right now, as Mozilla is, to frame it positively, leading the charge on the transition to the W3C WebDriver standard.
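
If it ends up somewhere that isn't on your PATH, you can also point selenium at the executable directly - a quick sketch (the path is just an example, use wherever you actually saved chromedriver):

Python code:
from selenium.webdriver import Chrome

# assumption: chromedriver.exe was saved next to the script rather than put on PATH
browser = Chrome(executable_path=r'C:\Users\Tony\Python\NBA\chromedriver.exe')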

ButtWolf
Dec 30, 2004

by Jeffrey of YOSPOS
I hate messing with stuff I don't understand.
code:
Traceback (most recent call last):
  File "C:\Users\Tony\Python\NBA\scrape_build.py", line 35, in <module>
    f.write(string.encode('ascii', 'ignore') + " ")
  File "C:\Python27\lib\site-packages\bs4\element.py", line 1055, in encode
    u = self.decode(indent_level, encoding, formatter)
  File "C:\Python27\lib\site-packages\bs4\element.py", line 1119, in decode
    indent_space = (' ' * (indent_level - 1))
TypeError: unsupported operand type(s) for -: 'str' and 'int'
It brought up the first page, then crashed.

ButtWolf fucked around with this message at 18:49 on May 10, 2017

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
I'm just learning about anaconda/conda and it seems kind of great. So I have a few questions about it:

1. How do I pre-download all of the necessary files to create a conda environment? I need to "deploy" on a system that has no internet access, so if I could just upload a tarball with all the packages already downloaded, that would be great.

2. One of the environments I want to set up uses two libraries that need to be compiled. There are conda build scripts that do this so that's great, but package #2 requires package #1 to build.

So I was thinking of doing something like this:

code:

# build package #1
conda build package1/

# create conda env with package1 to build package2
conda create -n foo --use-local package1
source activate foo
conda build package2/
source deactivate foo

# create new env with both packages
conda create -n bar --use-local package1 package2

Is there a better way to do this?

QuarkJets
Sep 8, 2008

Miniconda should have everything that you need to create a conda environment. If you have an internet-connected system with similar hardware, then you could just fully build your environment there and then move the whole anaconda directory to your not-connected system, saving you some time if anything goes wrong.

You should be able to just build both of the packages, in the right order, without creating multiple environments. Create the environment, build package1, build package2

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

QuarkJets posted:

Miniconda should have everything that you need to create a conda environment. If you have an internet-connected system with similar hardware, then you could just fully build your environment there and then move the whole anaconda directory to your not-connected system, saving you some time if anything goes wrong.

You should be able to just build both of the packages, in the right order, without creating multiple environments. Create the environment, build package1, build package2

To the first point, on my laptop anaconda was installed to $HOME/anaconda3. I can just upload this entire folder to the other machine and add it to the PATH?

QuarkJets
Sep 8, 2008

Boris Galerkin posted:

To the first point, on my laptop anaconda was installed to $HOME/anaconda3. I can just upload this entire folder to the other machine and add it to the PATH?

Yup, the anaconda directory is portable; it can be moved to a different similar-enough platform and it will still work. The only caveat is that some of the files in anaconda3/bin will use $HOME/anaconda3/bin/python in their shebangs, so to get everything working perfectly you either need to A) install to $HOME/anaconda3 on the remote machine or B) install locally to the same path that you intend to use on the destination machine. Or I guess you can write a script to modify all of the shebangs to point to your final destination path.
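
That last option could be as simple as something like this (a rough sketch - the two prefixes are placeholders for wherever anaconda3 was built and where it ends up on the destination):

Python code:
from pathlib import Path

OLD_PREFIX = '/home/me/anaconda3'  # placeholder: prefix used when the env was built
NEW_PREFIX = '/opt/anaconda3'      # placeholder: prefix on the destination machine

for script in (Path(NEW_PREFIX) / 'bin').iterdir():
    if not script.is_file():
        continue
    try:
        text = script.read_text()
    except UnicodeDecodeError:
        continue  # compiled binaries don't have shebangs; skip them
    if text.startswith('#!') and OLD_PREFIX in text.split('\n', 1)[0]:
        first_line, rest = text.split('\n', 1)
        script.write_text(first_line.replace(OLD_PREFIX, NEW_PREFIX) + '\n' + rest)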

Eela6
May 25, 2007
Shredded Hen
Hi Python goons! PyCon starts next week. I will be in Portland, OR for it. Anyone else going?

underage at the vape shop
May 11, 2011

by Cyrano4747

breaks posted:

So basically what's happening there is that tkinter is only going to paint the GUI at some point in its event loop, but because that's one continuous section of code the event loop doesn't get a chance to execute until it's finished, leading to the behavior you see.

It's been way too long since I've used tkinter to offer a good solution but what you probably shouldn't do is try to force a paint in that code. This is a common situation and tkinter probably offers some Proper Way of allowing you to break that up so that the event loop gets to execute often enough to see your updates, which hopefully doesn't involve threading.

What do I google to try and find the proper way?

E: I got it! Tk.update()
Tkinter can suck my nuts, it's like they designed it to be a pain.
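
For reference, the "Proper Way" breaks was hinting at is usually after() - schedule your work in small slices so mainloop keeps running in between. A minimal sketch:

Python code:
import tkinter as tk  # Python 2: import Tkinter as tk

root = tk.Tk()
label = tk.Label(root, text="working... 0")
label.pack()

progress = {"step": 0}

def do_one_step():
    # do one small slice of work, update the widget, then hand control
    # back to the event loop and ask it to call us again in 50 ms
    progress["step"] += 1
    label.config(text="working... %d" % progress["step"])
    if progress["step"] < 100:
        root.after(50, do_one_step)

root.after(50, do_one_step)
root.mainloop()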

underage at the vape shop fucked around with this message at 11:03 on May 15, 2017

Proteus Jones
Feb 28, 2013



underage at the vape shop posted:

Tkinter can suck my nuts, it's like they designed it to be a pain.

You speak truth.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

This is a nice library that I use fairly often. backoff.

When working with external APIs over the network you have to handle the stuff that happens when you're dealing with networks and other network devices. This library helps with that. Amazon has a good post on their architecture blog about the best algorithms for choosing a time to wait between retrying failed requests if you want to read about it.

Anyway, it's pretty simple to use...you just decorate the function that might fail and the decorator retries the function until it's successful.

Simple example that backs off each request by an exponential amount of time:
Python code:
@backoff.on_exception(backoff.expo,
                      requests.exceptions.RequestException,
                      max_tries=8)
def get_url(url):
    return requests.get(url)
Complex example using multiple decorators to catch different types of stuff:
Python code:
@backoff.on_predicate(backoff.fibo, max_value=13)
@backoff.on_exception(backoff.expo,
                      requests.exceptions.HTTPError,
                      max_tries=4)
@backoff.on_exception(backoff.expo,
                      requests.exceptions.Timeout,
                      max_tries=8)
def poll_for_message(queue):
    return queue.get()
Anyway, I thought I'd share this as beginners often get into python wanting to scrape websites or download some sort of data. This is a good thing to implement if you're doing that!

There's another, more popular, library called Retrying that you can take a look at as well. The reason I don't use it is that it doesn't support the recommended algorithm in that Amazon blog post, but you might be interested in looking at it as well.

FoiledAgain
May 6, 2007

I'm using cx_freeze for a project and I'm running into problems. The project is a GUI made with PyQt5, and I can't get images (.png) to display when I build a .dmg for Mac. For each image, cx_freeze is spitting out an error that says "not a Mach-O file". I'm not a regular Mac user (I'm borrowing someone else's computer to build the dmg), so I don't really understand what this means, or how to fix the problem. Any suggestions?

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Thermopyle posted:

This is a nice library that I use fairly often. backoff.

When working with external APIs over the network you have to handle the stuff that happens when you're dealing with networks and other network devices. This library helps with that. Amazon has a good post on their architecture blog about the best algorithms for choosing a time to wait between retrying failed requests if you want to read about it.

Anyway, it's pretty simple to use...you just decorate the function that might fail and the decorator retries the function until it's successful.

Simple example that backs off each request by an exponential amount of time:
Python code:
@backoff.on_exception(backoff.expo,
                      requests.exceptions.RequestException,
                      max_tries=8)
def get_url(url):
    return requests.get(url)
Complex example using multiple decorators to catch different types of stuff:
Python code:
@backoff.on_predicate(backoff.fibo, max_value=13)
@backoff.on_exception(backoff.expo,
                      requests.exceptions.HTTPError,
                      max_tries=4)
@backoff.on_exception(backoff.expo,
                      requests.exceptions.Timeout,
                      max_tries=8)
def poll_for_message(queue):
    return queue.get()
Anyway, I thought I'd share this as beginners often get into python wanting to scrape websites or download some sort of data. This is a good thing to implement if you're doing that!

There's another, more popular, library called Retrying that you can take a look at as well. The reason I don't use it is that it doesn't support the recommended algorithm in that Amazon blog post, but you might be interested in looking at it as well.

oh nice i use retrying but this looks to be more configurable at runtime

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

Thermopyle posted:

This is a nice library that I use fairly often. backoff.

That looks like a very handy library that I had never heard of, thank you for sharing!

huhu
Feb 24, 2006
code:
temp/
	target1/
		/x
			file1.txt
			file2.txt
		/y
			file1.txt
			file2.txt
	target2/
		...
code:
rootPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'temp')
targets = [target for target in os.listdir(rootPath) if os.path.isdir(os.path.join(rootPath, target))]
for target in targets:
    xPath = os.path.join(rootPath, target, "x")
    xFile = os.listdir(xPath)[0]
    with open(os.path.join(xPath, xFile), "r") as f:
        for line in f:
            print(line) # Whatever commands I'd actually need to do with this line here. 
Is there a cleaner way to not have so many os.path.join() or would that require knowing the operating system I'm working with?

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug

huhu posted:

code:
rootPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'temp')
targets = [target for target in os.listdir(rootPath) if os.path.isdir(os.path.join(rootPath, target))]
for target in targets:
    xPath = os.path.join(rootPath, target, "x")
    xFile = os.listdir(xPath)[0]
    with open(os.path.join(xPath, xFile), "r") as f:
        for line in f:
            print(line) # Whatever commands I'd actually need to do with this line here. 
Is there a cleaner way to not have so many os.path.join() or would that require knowing the operating system I'm working with?

1. Use pathlib:
Python code:
from pathlib import Path

rootPath = Path(__file__).parent / 'temp'
subdirs = [child for child in rootPath.iterdir() if child.is_dir()]
for subdir in subdirs:
    xPath = subdir / "x"
    xFile = next(iter(xPath.iterdir()))
    with open(xFile, "r") as f:
        for line in f:
            print(line) # Whatever commands I'd actually need to do with this line here. 
2. Do you really just want to process the first file in each 'x' subdirectory? Be aware that os.listdir() doesn't guarantee any ordering, so [0] isn't necessarily the alphabetically first file.
3. To find any file in any subdir, use os.walk and use fnmatch to check whether you're interested in it based on the filename.
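
Point 3 might look something like this (a sketch; '*.txt' stands in for whatever filename pattern you actually care about):

Python code:
import os
from fnmatch import fnmatch

rootPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'temp')
for dirpath, dirnames, filenames in os.walk(rootPath):
    for name in filenames:
        if fnmatch(name, '*.txt'):
            print(os.path.join(dirpath, name))  # or open() it and process it here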

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

huhu posted:

code:

temp/
	target1/
		/x
			file1.txt
			file2.txt
		/y
			file1.txt
			file2.txt
	target2/
		...

code:

rootPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'temp')
targets = [target for target in os.listdir(rootPath) if os.path.isdir(os.path.join(rootPath, target))]
for target in targets:
    xPath = os.path.join(rootPath, target, "x")
    xFile = os.listdir(xPath)[0]
    with open(os.path.join(xPath, xFile), "r") as f:
        for line in f:
            print(line) # Whatever commands I'd actually need to do with this line here. 

Is there a cleaner way to not have so many os.path.join() or would that require knowing the operating system I'm working with?

Import pathlib

Or just use '/' since Windows can normalize it just fine (except for the leading r'\\')

QuarkJets
Sep 8, 2008

huhu posted:

code:
snip
code:
snip
Is there a cleaner way to not have so many os.path.join() or would that require knowing the operating system I'm working with?

The cleanest way is to import join from os.path, then you can just call join instead of os.path.join. You could use glob and/or walk to help generalize

Python code:
from os.path import abspath, basename, dirname, join
from os import walk

rootPath = join(dirname(abspath(__file__)), 'temp')
for rootdir, dirnames, filenames in walk(rootPath):
    if basename(rootdir) == 'x':  # we're inside one of the target*/x directories
        for name in filenames:
            with open(join(rootdir, name), 'r') as fi:
                for line in fi:
                    print line

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

PyCon 2017 videos are up.

Tigren
Oct 3, 2003

Awesome, thanks for the heads up. I really wish my company would send me to PyCon, or at least not make me use vacation days to go. I look forward to the videos every year and often go back and find valuable talks from years past.

I watched Tim Head's talk on MicroPython, which I've tried out before and really enjoyed. The talk made me want to sink my teeth into microcontrollers again. I've been meaning to do something with a bird house camera.

I'm also excited to see a few talks on async, which seems like something I should become more familiar with in TYOOL 2017. Kelsey Hightower's Kubernetes for Pythonistas also looks like a great talk.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
What's the canonical way to include/make available a binary file in my Python package that can be used by unit tests and user tutorials? I could drop it into a "resources" folder at the project root but then to access it I need to use a lot of relative paths unless there's some kind of shortcut I'm missing?

FAT32 SHAMER
Aug 16, 2012



Dumb question: using flask, how would I push POST to my IP address so I don't have to deal with registering a domain to push to? I have already set up Dynu since my friend's IP is dynamic, so now I would assume I just need to forward the UDP ports (80 and 110? idk web poo poo is hard) and launch the flask app

it's just powering a small web hook catcher that lights up a sign via a raspberry pi so we don't really care if it gets owned by some script kiddy

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

funny Star Wars parody posted:

Dumb question: using flask, how would I push POST to my IP address so I don't have to deal with registering a domain to push to? I have already set up Dynu since my friend's IP is dynamic, so now I would assume I just need to forward the UDP ports (80 and 110? idk web poo poo is hard) and launch the flask app

it's just powering a small web hook catcher that lights up a sign via a raspberry pi so we don't really care if it gets owned by some script kiddy

Not really sure I fully understand your request, but here are a couple of comments:
  • HTTP would be TCP, not UDP. What do you need UDP for?
  • It might not matter if your Raspberry Pi gets pwnd but what about other computers or devices on your network? If you poke a hole through the firewall for the pi and somebody is able to get to the pi, they might be able to get to other stuff on your network.
  • If you run a server on :80 or :443 it is likely against the ToS for your ISP and they might send you a nastygram or cut off your access

FAT32 SHAMER
Aug 16, 2012



fletcher posted:

Not really sure I fully understand your request, but here are a couple of comments:
  • HTTP would be TCP, not UDP. What do you need UDP for?
  • It might not matter if your Raspberry Pi gets pwnd but what about other computers or devices on your network? If you poke a hole through the firewall for the pi and somebody is able to get to the pi, they might be able to get to other stuff on your network.
  • If you run a server on :80 or :443 it is likely against the ToS for your ISP and they might send you a nastygram or cut off your access

Welp that covers that! Thanks for the input :)
