QuarkJets posted:Probably execution is supposed to stop at that point, once the error is raised? Definitely throw a raise at the end of that except, if so. A catch of Exception (or worse yet, BaseException) without a re-raise is a huge red flag.
|
|
# ? May 9, 2017 23:09 |
|
|
# ? May 16, 2024 17:28 |
|
Eela6 posted:A catch of Exception (or worse yet, BaseException) without a re-raise is a huge red flag. That's often true, but there are circumstances where it can be okay, such as in a process that's supposed to live forever. e: Which I'd caveat with "you should log the stack trace yourself in that case" QuarkJets fucked around with this message at 23:55 on May 9, 2017 |
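For the "lives forever" case, a minimal sketch of what that can look like. The names run_worker, get_job and handle_job are made up for illustration, and returning None as a stop signal is only an assumption to keep the example finite. The one real point is logger.exception, which logs the message together with the current stack trace.

```python
import logging

logger = logging.getLogger("worker")


def run_worker(get_job, handle_job):
    """Process jobs until get_job() returns None, surviving per-job failures."""
    while True:
        job = get_job()
        if job is None:  # hypothetical shutdown signal for this sketch
            break
        try:
            handle_job(job)
        except Exception:
            # No re-raise on purpose: one bad job must not kill the process.
            # logger.exception records the message plus the full stack trace.
            logger.exception("job %r failed; continuing", job)
```

Catching Exception rather than BaseException still lets KeyboardInterrupt and SystemExit through, so the process can be stopped normally.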
# ? May 9, 2017 23:43 |
|
Hey, hopefully easy question for y'all: I wrote a webscrapey thing last year w/ BeautifulSoup. I changed variable names to get the new data (new <tr> headings). Nothing works now. I'm out of practice and wasn't good to begin with. Can anyone help? BTW, it worked perfectly last year. code:
ButtWolf fucked around with this message at 16:08 on May 10, 2017 |
# ? May 10, 2017 16:02 |
|
ButtWolf posted:Hey, hopefully easy question for y'all: I wrote a webscrapey thing last year w/ BeautifulSoup. I changed variable names to get the new data (new <tr> headings). Nothing works now. I'm out of practice and wasn't good to begin with. Can anyone help? BTW, it worked perfectly last year. You should give us the actual error stacktraces so we don't have to go through your code line by line to figure out what's what.
|
# ? May 10, 2017 16:04 |
|
Thermopyle posted:You should give us the actual error stacktraces so we don't have to go through your code line by line to figure out what's what. code:
code:
ButtWolf fucked around with this message at 16:28 on May 10, 2017 |
# ? May 10, 2017 16:09 |
|
looks like the site you're trying to read might have been updated, then. "'NoneType' object is not iterable" there is telling you that "per36_line_2016" is None, which means the line: per36_line_2016 = soup.find("tr", id="per_minute.2016") isn't finding anything whereas you're expecting a list (or some kind of iterable, at least) of things.
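To make that error concrete without needing the real page, here's a small sketch. The helper require_found is hypothetical (not part of BeautifulSoup); it just turns a silent None into an error that says what was missing, which is easier to debug than the TypeError that shows up later in the loop.

```python
def require_found(element, description):
    """Return element unchanged, or fail loudly if the lookup came back empty."""
    if element is None:
        raise LookupError(f"{description} was not found in the downloaded HTML")
    return element


# Looping over None reproduces the message from the traceback.
try:
    for cell in None:
        pass
except TypeError as exc:
    message = str(exc)

# The guarded version reports which lookup failed instead.
try:
    require_found(None, "tr#per_minute.2016")
except LookupError as exc:
    guarded_message = str(exc)
```

Wrapping the result of a find call in something like this points straight at the lookup that came back empty.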
|
# ? May 10, 2017 16:36 |
|
The document that comes from the server has those tables in giant comment blocks. I guess you could iterate over every comment block, check it for that per_minute text, load that as a document and then use find on the result, but 1) yikes that's complex and 2) it's fragile: they're already rendering their stuff in JS so you might want to change your approach now so if they switch their front-end to a more, let's say, sane approach your scraper won't break immediately.
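A rough sketch of the comment-block idea, using only the standard library's html.parser instead of BeautifulSoup so it runs anywhere. CommentCollector, find_commented_fragment and the sample markup are all made up for illustration; the real page's structure is an assumption here.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every <!-- ... --> block in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


def find_commented_fragment(html, needle):
    """Return the first comment body containing needle, or None."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    for comment in collector.comments:
        if needle in comment:
            return comment
    return None


# Invented stand-in for the page: one live table, one commented-out table.
sample = """
<div id="all_per_game"><table><tr id="per_game.2016"><td>1</td></tr></table></div>
<div id="all_per_minute">
<!--
<table><tr id="per_minute.2016"><td>36</td></tr></table>
-->
</div>
"""
fragment = find_commented_fragment(sample, 'id="per_minute.2016"')
```

The returned fragment is plain HTML text, so it can be handed to whatever parser the rest of the script already uses and searched for the row as before.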
|
# ? May 10, 2017 16:39 |
|
Dex posted:looks like the site you're trying to read might have been updated, then. "'NoneType' object is not iterable" there is telling you that "per36_line_2016" is None, which means the line: That's what I thought, but it's there. Does it not inherently grab everything inside all <td> in the <tr>? Either it used to, or they changed the site. code:
ButtWolf fucked around with this message at 16:47 on May 10, 2017 |
# ? May 10, 2017 16:43 |
|
That doesn't have the id you're looking for though? Like Munkeymon says, the <tr> elements you're looking for have been commented out. Go to the page, hit F12 to open your browser's web developer tools, and do a search on the source with Ctrl+F or whatever. You can search by CSS selector, so look for #per_minute.2016 and it won't find anything, because there's no element with that id. Then try it without the # (so you're just searching for plain text) and you'll see where it's gone.
|
# ? May 10, 2017 17:14 |
|
baka kaba posted:That doesn't have the id you're looking for though? Like Munkeymon says, the <tr> elements you're looking for have been commented out. It's not commented out when I inspect it. Looks like regular old code. I see the same thing commented out, but then the actual code. I click on View Source, Ctrl+F, per_game.2016. It's there. edit: but the per_minute ones are commented out! That's the confusion between us. Only the per_game ones are there. ButtWolf fucked around with this message at 17:33 on May 10, 2017 |
# ? May 10, 2017 17:21 |
|
ButtWolf posted:It's not commented out when I inspect it. Looks like regular old code. I see the same thing commented out, but then the actual code. The inspector shows you the state of the page after the JavaScript runs. If you have your script dump the file it gets from urlopen to a file on disk, you can search it and see what comes over the wire before the JS gets a chance to do anything (or go into dev tools' settings and disable JavaScript). That's all requests can get you to work with. e: yeah, you found it.
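A minimal sketch of the "dump it to disk" step, assuming Python 3's urllib.request. The function name save_raw_response is made up; url and path are whatever you pass in.

```python
from urllib.request import urlopen


def save_raw_response(url, path):
    """Write the bytes exactly as the server sent them and return the count."""
    with urlopen(url) as response:
        raw = response.read()
    # Binary mode, so nothing is decoded or altered on the way to disk.
    with open(path, "wb") as out:
        out.write(raw)
    return len(raw)
```

Opening the saved file in a text editor and searching for per_minute shows what the scraper actually receives, before any JavaScript has touched it.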
|
# ? May 10, 2017 17:35 |
|
Munkeymon posted:The inspector shows you the state of the page after the JavaScript runs. If you have your script dump the file it gets from urlopen to a file on disk, you can search it and see what comes over the wire before the JS gets a chance to do anything (or go into dev tools' settings and disable JavaScript). That's all requests can get you to work with. Yeah this completely killed my project. Looking at other sites to grab from, but making the url list looks awful on ESPN. Thanks for your help everyone. I would imagine this was done so it can't be scraped anymore. edit: are you saying I can grab the file and save to hard drive, so I can do what I need to offline? ButtWolf fucked around with this message at 17:45 on May 10, 2017 |
# ? May 10, 2017 17:37 |
|
ButtWolf posted:Yeah this completely killed my project. Looking at other sites to grab from, but making the url list looks awful on ESPN. Thanks for your help everyone. Yeah, you can just save the result of calling read on the object urlopen returns. You can also, as I mentioned earlier, pick out the comment blocks, find the one with the string you want and load it as a document that you can use find on.
|
# ? May 10, 2017 17:58 |
|
ButtWolf posted:Yeah this completely killed my project. Looking at other sites to grab from, but making the url list looks awful on ESPN. Thanks for your help everyone. how often and where does this thing run? if js is populating the table you want, you could use selenium to open an actual browser window, and pass the .page_source to beautifulsoup: Python code:
|
# ? May 10, 2017 18:11 |
|
I think I'm just gonna do it by hand... New method is above my skill level. Thanks.
|
# ? May 10, 2017 18:14 |
|
ButtWolf posted:I think I'm just gonna do it by hand... New method is above my skill level. Thanks. the thing i posted actually pops chrome up in front of you, opens that page, copies the page source to the html var, then closes the window - there's nothing too complicated at work there, even if it looks brand new and insane. just read the selenium install docs, you need to drop the geckodriver executable somewhere on your path so your script knows how to actually talk to chrome
|
# ? May 10, 2017 18:18 |
|
ButtWolf posted:It's not commented out when I inspect it. Looks like regular old code. I see the same thing commented out, but then the actual code. Oh sorry, my bad, Chrome doesn't seem to like searching for an id selector with a . in the name. But yeah, they're doing some silly stuff with the html; your only option really is to get around it. Don't worry, same thing happens to me whenever I see some F# 'type providers are awesome!' article, so I try and use it to scrape a site and yep... they did something to make the easy way impossible. Munkeymon's comments idea is probably easiest: grab all the comments, iterate over them looking for the id bit, and when you find the comment with that, make it a document and then select the element. Stick that in a function and you can just change that one line in your script to call the function instead. Selenium's pretty straightforward once you get the webdriver installed, its API is a lot like BeautifulSoup, and honestly it's worth learning; this won't be the last site you run into that needs JavaScript to render the page.
|
# ? May 10, 2017 18:37 |
|
You need chromedriver for chrome, not geckodriver. I think you know this but just to clarify for the guy. It isn't difficult to set up, just drop chromedriver in your working directory or path somewhere, pip install selenium and you should be good to go. If you have trouble try downgrading to selenium 3.0.2. I'd recommend against geckodriver/Firefox with selenium for the time being unless you really need Firefox specifically, it's a half working mess right now as Mozilla is, to frame it positively, leading the charge on the transition to W3C webdriver standard.
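For reference, a hedged sketch of the Chrome route once selenium and chromedriver are installed. fetch_rendered_html and the make_driver parameter are names invented here; make_driver only exists so the function can be tried with a stand-in object when no browser is available.

```python
def fetch_rendered_html(url, make_driver=None):
    """Load url in a real browser and return the HTML after JavaScript has run."""
    if make_driver is None:
        # Imported lazily so the function can be exercised without selenium.
        from selenium import webdriver
        make_driver = webdriver.Chrome  # needs chromedriver available on PATH
    driver = make_driver()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser window, even on errors
```

The returned string can be passed to BeautifulSoup the same way as the urlopen result. If the table is filled in some time after the page loads, page_source may be read too early, so a wait might be needed; that depends on the site.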
|
# ? May 10, 2017 18:41 |
|
I hate messing with stuff I don't understand.code:
ButtWolf fucked around with this message at 18:49 on May 10, 2017 |
# ? May 10, 2017 18:46 |
|
I'm just learning about anaconda/conda and it seems kind of great. So I have a few questions about it: 1. How do I pre-download all of the necessary files to create a conda environment? I need to "deploy" on a system that has no internet access so if I could just upload a tarball with all the packages already downloaded that would be great. 2. For one of the environments I want to set up it uses two libraries that need to be compiled. There are conda build scripts that do this so that's great, but package #2 requires package #1 to build. So I was thinking of doing something like this: code:
Is there a better way to do this?
|
# ? May 14, 2017 10:08 |
|
Miniconda should have everything that you need to create a conda environment. If you have an internet-connected system with similar hardware, then you could just fully build your environment there and then move the whole anaconda directory to your not-connected system, saving you some time if anything goes wrong. You should be able to just build both of the packages, in the right order, without creating multiple environments: create the environment, build package1, then build package2.
|
# ? May 14, 2017 10:14 |
|
QuarkJets posted:Miniconda should have everything that you need to create a conda environment. If you have an internet-connected system with similar hardware, then you could just fully build your environment there and then move the whole anaconda directory to your not-connected system, saving you some time if anything goes wrong. To the first point, on my laptop anaconda was installed to $HOME/anaconda3. I can just upload this entire folder up to the other machine and add it to the PATH?
|
# ? May 14, 2017 10:18 |
|
Boris Galerkin posted:To the first point, on my laptop anaconda was installed to $HOME/anaconda3. I can just upload this entire folder up to the other machine and add it to the PATH? Yup, the anaconda directory is portable, it can be moved to a different similar-enough platform and it will still work. The only caveat is that some of the files in anaconda3/bin will use $HOME/anaconda3/bin/python in their shebangs, so to get everything working perfectly you either need to A) install to $HOME/anaconda3 on the remote machine or B) install locally to the same path that you intend to use on the destination machine. Or I guess you can write a script to modify all of the shebangs to point to your final destination path
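A sketch of the shebang-rewriting script, in case option A or B isn't possible. rewrite_shebangs and the example prefixes are made up; it only touches regular files whose first line starts with "#!" followed by the old prefix, and leaves symlinks and everything else alone. Try it on a copy of the directory first.

```python
import os


def rewrite_shebangs(bin_dir, old_prefix, new_prefix):
    """Point '#!<old_prefix>...' lines in bin_dir at new_prefix; return changed names."""
    marker = b"#!" + old_prefix.encode()
    changed = []
    for name in sorted(os.listdir(bin_dir)):
        path = os.path.join(bin_dir, name)
        if os.path.islink(path) or not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            first_line = handle.readline()
            rest = handle.read()
        if not first_line.startswith(marker):
            continue  # compiled binaries and unrelated scripts are left alone
        new_first = b"#!" + new_prefix.encode() + first_line[len(marker):]
        # Rewriting in place keeps the file's existing permission bits.
        with open(path, "wb") as handle:
            handle.write(new_first + rest)
        changed.append(name)
    return changed
```

Something like rewrite_shebangs("/opt/anaconda3/bin", "/home/me/anaconda3", "/opt/anaconda3") would be the call after moving the directory, with both paths adjusted to the real locations.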
|
# ? May 14, 2017 10:46 |
Hi Python goons! PyCon starts next week. I will be in Portland, OR for it. Anyone else going?
|
|
# ? May 14, 2017 16:30 |
|
breaks posted:So basically what's happening there is that tkinter is only going to paint the GUI at some point in its event loop, but because that's one continuous section of code the event loop doesn't get a chance to execute until it's finished, leading to the behavior you see. What do I google to try and find the proper way? E: I got it! Tk.update() Tkinter can suck my nuts, it's like they designed it to be a pain. underage at the vape shop fucked around with this message at 11:03 on May 15, 2017 |
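For anyone landing here later, a tiny sketch of the update() fix described above. run_steps is a made-up helper; root is assumed to be a tkinter.Tk instance (or any widget, since they all have update()).

```python
def run_steps(root, steps):
    """Run each callable in steps, letting the GUI repaint after every one."""
    results = []
    for step in steps:
        results.append(step())
        # Process pending events (including redraws) before the next step,
        # so the window does not stay frozen until the whole loop finishes.
        root.update()
    return results
```

With a real window that would be root = tkinter.Tk() followed by run_steps(root, [step_one, step_two]). Scheduling each step with root.after() is the other common approach and keeps the event loop in charge.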
# ? May 15, 2017 10:53 |
|
underage at the vape shop posted:Tkinter can suck my nuts, it's like they designed it to be a pain. You speak truth.
|
# ? May 15, 2017 14:22 |
|
This is a nice library that I use fairly often: backoff. When working with external APIs over the network you have to handle the stuff that happens when you're dealing with networks and other network devices. This library helps with that. Amazon has a good post on their architecture blog about the best algorithms for choosing a time to wait between retrying failed requests if you want to read about it. Anyway, it's pretty simple to use...you just decorate the function that might fail and the decorator retries the function until it's successful. Simple example that backs off each request by an exponential amount of time: Python code:
Python code:
There's another, more popular, library called Retrying that you can take a look at as well. The reason I don't use it is that it doesn't support the recommended algorithm in that Amazon blog post, but you might be interested in looking at it as well.
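To show the idea without installing anything, here's a standard-library sketch of a retry decorator with exponential backoff and random ("full") jitter, which is my understanding of the approach that blog post favors. This is not the backoff library's code or API; retry_with_backoff and all of its parameters are invented for illustration.

```python
import functools
import random
import time


def retry_with_backoff(exceptions, max_tries=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Retry the decorated function on `exceptions`, sleeping with full jitter."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_tries):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_tries - 1:
                        raise  # out of attempts: let the caller see the error
                    # Full jitter: a random delay between zero and the capped
                    # exponential, so many clients do not retry in lockstep.
                    sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
        return wrapper
    return decorator
```

Usage is the same shape as any decorator: put @retry_with_backoff(ConnectionError) above the function that makes the network call. The sleep parameter is only there so the delays can be swapped out when testing.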
|
# ? May 15, 2017 20:55 |
|
I'm using cx_freeze for a project and I'm running into problems. The project is a GUI made with PyQt5, and I can't get images (.png) to display when I build a .dmg for Mac. For each image, cx_freeze is spitting out an error that says "not a Mach-O file". I'm not a regular Mac user (I'm borrowing someone else's computer to build the dmg), so I don't really understand what this means, or how to fix the problem. Any suggestions?
|
# ? May 15, 2017 21:15 |
|
Thermopyle posted:This is a nice library that I use fairly often. backoff. oh nice i use retrying but this looks to be more configurable at runtime
|
# ? May 15, 2017 23:09 |
Thermopyle posted:This is a nice library that I use fairly often. backoff. That looks like a very handy library that I had never heard of, thank you for sharing!
|
|
# ? May 15, 2017 23:53 |
|
code:
code:
|
# ? May 17, 2017 21:07 |
|
huhu posted:
1. Use pathlib: Python code:
3. To find any file in any subdir, use os.walk and use fnmatch to check whether you're interested in it based on the filename.
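A short sketch of the os.walk plus fnmatch suggestion. find_files is a made-up name, and the pattern is an ordinary shell-style glob such as "*.csv".

```python
import fnmatch
import os


def find_files(root, pattern):
    """Return sorted paths under root whose file name matches the glob pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the names that match the pattern.
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

os.walk visits every subdirectory, so this finds matches at any depth under root.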
|
# ? May 17, 2017 22:03 |
|
huhu posted:
Import pathlib. Or just use '/' since Windows can normalize it just fine (except for the leading r'\\').
|
# ? May 17, 2017 22:07 |
|
huhu posted:
The cleanest way is to import join from os.path, then you can just call join instead of os.path.join. You could use glob and/or walk to help generalize Python code:
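One way the join plus glob combination might look, as a sketch. find_by_extension is a hypothetical helper; recursive globbing with "**" needs Python 3.5 or newer.

```python
import glob
from os.path import join


def find_by_extension(root, extension):
    """Return sorted paths of every '*.<extension>' file anywhere under root."""
    # '**' matches any number of nested directories when recursive=True.
    return sorted(glob.glob(join(root, "**", "*." + extension), recursive=True))
```

Importing join directly keeps the call short, and the pattern covers files in root itself as well as in subdirectories.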
|
# ? May 17, 2017 23:06 |
|
PyCon 2017 videos are up.
|
# ? May 21, 2017 23:31 |
|
Awesome, thanks for the heads up. I really wish my company would send me to PyCon, or at least not make me use vacation days to go. I look forward to the videos every year and often go back and find valuable talks from years past. I watched Tim Head's talk on MicroPython, which I've tried out before and really enjoyed. The talk made me want to sink my teeth into microcontrollers again. I've been meaning to do something with a bird house camera. I'm also excited to see a few talks on async, which seems like something I should become more familiar with in TYOOL 2017. Kelsey Hightower's Kubernetes for Pythonistas also looks like a great talk.
|
# ? May 22, 2017 05:58 |
|
What's the canonical way to include/make available a binary file in my Python package that can be used by unit tests and user tutorials? I could drop it into a "resources" folder at the project root but then to access it I need to use a lot of relative paths unless there's some kind of shortcut I'm missing?
|
# ? May 22, 2017 07:48 |
|
Dumb question: using flask, how would I push POST to my IP address so I don't have to deal with registering a domain to push to? I have already set up Dynu since my friend's IP is dynamic, so now I would assume I just need to forward the UDP ports (80 and 110? idk web poo poo is hard) and launch the flask app it's just powering a small web hook catcher that lights up a sign via a raspberry pi so we don't really care if it gets owned by some script kiddy
|
# ? May 23, 2017 00:34 |
funny Star Wars parody posted:Dumb question: using flask, how would I push POST to my IP address so I don't have to deal with registering a domain to push to? I have already set up Dynu since my friend's IP is dynamic, so now I would assume I just need to forward the UDP ports (80 and 110? idk web poo poo is hard) and launch the flask app Not really sure I fully understand your request, but here's a couple of comments:
|
|
# ? May 23, 2017 01:02 |
|
fletcher posted:Not really sure I fully understand your request, but here's a couple of comments: Welp that covers that! Thanks for the input
|
# ? May 23, 2017 01:09 |