You can even drop the list brackets within any(), since most functions that take an iterable also accept a generator expression in 3.x.
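A quick illustration with a made-up list of words. Both forms give the same answer. The bracket-free one just skips building the intermediate list, and any() stops at the first truthy item:

```python
# Made-up data for illustration.
words = ["spam", "eggs", "ham"]

# List comprehension: builds the whole list first, then any() scans it.
has_long_list = any([len(w) > 3 for w in words])

# Generator expression: no brackets; any() pulls items lazily and stops
# at the first truthy result ("spam"), so the rest are never evaluated.
has_long_gen = any(len(w) > 3 for w in words)

# The same bracket-free form works with other iterable-consuming functions.
total_chars = sum(len(w) for w in words)
joined = ", ".join(w.upper() for w in words)

print(has_long_list, has_long_gen, total_chars, joined)
```

One gotcha: you can only drop the parentheses when the generator is the sole argument. Something like sorted((w for w in words), key=len) still needs its own pair.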

# Apr 7, 2017 22:19

# May 2, 2024 04:57

As a Python user on Windows, I'd put Christoph Gohlke in the same "I hope nothing ever happens to him" category. His "Unofficial Windows Binaries for Python" site has been a project lifesaver on more than one occasion.

# Apr 18, 2017 17:32

laxbro posted: Newbie question: I'm trying to build a web scraper with Beautiful Soup that pulls table rows off of the roster pages for a variety of college sports teams. I've got it working with regular HTML pages, but it doesn't seem to work with what appear to be Angular pages. Some quick googling makes it seem like I will need to use a Python library like Selenium to virtually load the page in order to scrape the HTML tables on the page. Would a workaround be to first use Beautiful Soup, but if a table row returns as None, then call a function to try scraping the page using something like Selenium? Or should I just try to scrape all of the pages with Selenium or a similar library?

I also typically separate the HTML collection from the parsing, saving the HTML to disk so I can fine-tune the parsing without hitting the server. And if the site layout changes in the future, the captured HTML is available so I can update the parsing rules.

baka kaba posted: Also Selenium was a real pain last time I used it, they updated something so the Firefox webdriver wasn't included anymore and everything broke. There might be a better scraper out there?

Setting up Selenium now involves an extra step. It isn't hard, but it seems to be really poorly described. Essentially, you need the geckodriver middleware in addition to Selenium, and you either specify the path to that driver when initializing the webdriver or just add it to your PATH. I've always just added it to PATH, but the other way is supposed to be just as simple.
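A minimal sketch of that collect-then-parse split, using only the stdlib. The cache directory name, the SHA-256 filenames, and the fetch_html() helper are my own assumptions. The fetch parameter is there so you could swap in requests, or a Selenium-rendered page for the Angular sites, without touching the caching logic:

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def fetch_html(url):
    """Download a page with the stdlib (hypothetical default fetcher)."""
    with urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def cached_html(url, cache_dir="html_cache", fetch=fetch_html):
    """Return the page's HTML, downloading it only if no saved copy exists."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Hash the URL to get a stable, filesystem-safe filename.
    path = cache / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html
```

Your Beautiful Soup parsing then works from the string cached_html() returns, so you can rerun it as often as you like without re-downloading. Delete a file (or the whole folder) to force a fresh copy.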
onionradish fucked around with this message at 15:40 on Apr 23, 2017

# Apr 23, 2017 15:21

underage at the vape shop posted: So I'm doing some RSS stuff for uni. How come sometimes the formatting of quotes gets changed and sometimes it doesn't?

If you're allowed to use third-party libraries, use feedparser any time you have to deal with RSS. It handles all the complexity of the RSS variants, including namespaces, and gives you a standardized dictionary. If not, you'll need to detect which escaping method has been used and use the stdlib to unescape the entity-escaped ones, like the second type.
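In Python 3, the stdlib function for this is html.unescape(). A small sketch with made-up feed titles, one with literal quotes and one entity-escaped:

```python
import html

# Made-up <title> values as they might arrive from two different feeds.
literal = 'He said "hello"'
escaped = "He said &quot;hello&quot; &amp; &#39;goodbye&#39;"

# html.unescape() converts named (&quot;, &amp;) and numeric (&#39;)
# character references; text without any passes through unchanged.
print(html.unescape(literal))  # He said "hello"
print(html.unescape(escaped))  # He said "hello" & 'goodbye'
```

Watch for double-escaped feeds (e.g. &amp;quot;): a single html.unescape() pass turns that into &quot;, so you'd need a second pass.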

# May 8, 2017 14:27

I'm trying to set up testing for a class that used to be a namedtuple. Previous tests could simply compare equality of the namedtuples.

I've added a method to dump out public attributes as a dict and compare against that, which works, but I'm not sure that's the right solution. Is there a better or "best practices" way to set up the class or my tests to compare the values that matter? I could write separate tests for each attribute, but that seems worse and would needlessly increase the number of tests.
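Here's roughly the before/after, with hypothetical PlayerTuple/Player classes standing in for the real ones. namedtuples get field-by-field equality for free, while a plain class falls back to identity unless it defines __eq__. The public_attribs() helper is my guess at the dict-dump approach, skipping underscore-prefixed attributes:

```python
from collections import namedtuple

# Hypothetical stand-ins for the real namedtuple and class.
PlayerTuple = namedtuple("PlayerTuple", "name number")


class Player:
    def __init__(self, name, number):
        self.name = name
        self.number = number
        self._cache = {}  # internal state that tests shouldn't compare

    def public_attribs(self):
        """Return non-underscore instance attributes as a dict."""
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}


# namedtuples compare field by field, like plain tuples.
print(PlayerTuple("Smith", 7) == PlayerTuple("Smith", 7))  # True

# A plain class without __eq__ falls back to identity, so this is False.
print(Player("Smith", 7) == Player("Smith", 7))  # False

# Comparing the dumped attributes compares only the values that matter.
print(Player("Smith", 7).public_attribs() == {"name": "Smith", "number": 7})  # True
```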

# Jul 21, 2017 20:33

I'd considered __eq__, and actually used it during the "upgrade" from the namedtuple to the class. Then, once the class's __init__ changed, that broke and I needed to rethink. (I've been looking at the code for too long, so I'm likely not thinking clearly.) Is something like this a reasonable implementation to enable comparing the attributes? It works, but does it "smell"?
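One way to write that kind of __eq__, as a sketch: it reuses the _public_attribs name that comes up later in the thread (assumed here to be a method), returns NotImplemented for unrelated types, and adds a __repr__ so a failing assertEqual() shows which values differ. Player and its fields are hypothetical:

```python
class Player:
    """Hypothetical class; only the _public_attribs name comes from the thread."""

    def __init__(self, name, number, team=None):
        self.name = name
        self.number = number
        self.team = team
        self._cache = {}  # private: ignored by comparisons

    def _public_attribs(self):
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}

    def __eq__(self, other):
        if not isinstance(other, Player):
            return NotImplemented  # let Python try the other operand, then identity
        return self._public_attribs() == other._public_attribs()

    # Defining __eq__ without __hash__ sets __hash__ to None, so instances
    # become unhashable; that's usually what you want for mutable objects.

    def __repr__(self):
        # A readable repr makes failed assertEqual() messages much clearer.
        fields = ", ".join(f"{k}={v!r}" for k, v in self._public_attribs().items())
        return f"{type(self).__name__}({fields})"
```

In Python 3, != automatically uses the inverse of this __eq__, so there's no need to write __ne__ as well.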
onionradish fucked around with this message at 21:55 on Jul 21, 2017

# Jul 21, 2017 21:35

Ooo... I like the contract version! Thanks!

EDIT: Why return NotImplemented vs. raise NotImplementedError()?

onionradish fucked around with this message at 22:44 on Jul 21, 2017
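The difference shows up as soon as the other operand is something your class doesn't know about. Here's a sketch with two hypothetical classes. Returning NotImplemented lets Python try the other side's __eq__ and, for ==, fall back to an identity check. Raising NotImplementedError just blows up the comparison (that exception is really meant for abstract methods a subclass must override):

```python
class Returns:
    """Hypothetical: signals "can't compare" by returning NotImplemented."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Returns):
            return NotImplemented  # Python tries other.__eq__, then identity
        return self.value == other.value


class Raises:
    """Hypothetical: signals "can't compare" by raising NotImplementedError."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Raises):
            raise NotImplementedError("can't compare")  # propagates to the caller
        return self.value == other.value


print(Returns(1) == Returns(1))  # True
print(Returns(1) == "one")       # False, via the identity fallback
print("one" == Returns(1))       # False, same fallback from the other side

try:
    Raises(1) == "one"
except NotImplementedError as exc:
    print("raised:", exc)        # raised: can't compare
```

That's also why the raising version breaks innocent-looking code like `Raises(1) in [None, Raises(1)]`, since `in` compares against each element with ==.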
# Jul 21, 2017 22:41

Eela6 posted: As far as NotImplemented vs NotImplementedError goes, read the docs at https://docs.python.org/3/library/constants.html or Ch. 13 of Fluent Python: Operator Overloading.

EDIT: A follow-up... in your "go crazy" example, you omit the hasattr(other, "_public_attribs") test inside __eq__. Is it not necessary?

onionradish fucked around with this message at 23:24 on Jul 21, 2017
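Whether the guard matters depends on how __eq__ gets at the other object's attributes. This is a generic sketch, not a reconstruction of the "go crazy" example, with three hypothetical classes: a hasattr() check, an EAFP try/except AttributeError, and no guard at all. Either guard makes `obj == 42` return False. Without one, it raises AttributeError:

```python
class Base:
    """Hypothetical base with the _public_attribs helper named in the thread."""

    def __init__(self, name, number):
        self.name = name
        self.number = number

    def _public_attribs(self):
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}


class WithHasattr(Base):
    def __eq__(self, other):
        # Look before you leap: bail out if other lacks the helper.
        if not hasattr(other, "_public_attribs"):
            return NotImplemented
        return self._public_attribs() == other._public_attribs()


class WithTry(Base):
    def __eq__(self, other):
        # EAFP: just call it and treat a missing attribute as "can't compare".
        try:
            theirs = other._public_attribs()
        except AttributeError:
            return NotImplemented
        return self._public_attribs() == theirs


class Unguarded(Base):
    def __eq__(self, other):
        # No guard: comparing with an unrelated type raises AttributeError.
        return self._public_attribs() == other._public_attribs()
```

The try/except version has one subtle catch: it would also swallow an AttributeError raised inside other._public_attribs() itself, which can hide a real bug.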
# Jul 21, 2017 23:00

Eela6 posted: Check this out.

My melon is officially twisted. I'm leaving the hasattr comparison in for explicitness, but I'm digging how slick that is.

Eela6 posted: Python is cool.

onionradish fucked around with this message at 23:59 on Jul 21, 2017
# Jul 21, 2017 23:44

Seventh Arrow posted: So my attempts to find a tutor were not very fruitful. So maybe I could get a bit of direction instead.

I'm not a member, so I can't speak to how interactions go, but it might be worth considering, since you're likely to find others with a data science background who can answer questions, critique code, or point you toward relevant resources they found useful. There are some learning resources on their GitHub page, including some video tutorials.

Mod Edit: Somebody fucked around with this message at 22:21 on Sep 7, 2017
# Sep 7, 2017 20:53

I want to add multithreading to a basic web scraper I've been tasked with. I have a list of URLs to spread across threads, but I don't want to hit the same host simultaneously. Given a list of URLs, some from the same host and some from different hosts, what's the best way to set up thread Queue()s or some other URL pool so that threads can download simultaneously as long as they're hitting different hosts? This seems like something simple that would be in the stdlib's collections or itertools, but I'm not seeing it. If it's actually a tricky issue, that's fine, and I'll work on a solution; I just don't want to reinvent the wheel.
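One stdlib-only way to sketch it, assuming you're fine with one worker handling all of a given host's URLs. Bucket the URLs by host with urllib.parse.urlsplit(), then hand each bucket to a concurrent.futures.ThreadPoolExecutor task that downloads its URLs one after another. Different hosts run in parallel, but the same host never does. The fetch() helper and the max_workers default are assumptions:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit
from urllib.request import urlopen


def fetch(url):
    """Hypothetical downloader; swap in requests or your own logic."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def group_by_host(urls):
    """Bucket URLs by host, keeping the original order within each host."""
    buckets = defaultdict(list)
    for url in urls:
        buckets[urlsplit(url).netloc.lower()].append(url)
    return buckets


def crawl_host(urls, fetch):
    """Download one host's URLs sequentially, so that host never sees parallel hits."""
    return {url: fetch(url) for url in urls}


def crawl(urls, fetch=fetch, max_workers=8):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per host: different hosts proceed concurrently.
        futures = [pool.submit(crawl_host, host_urls, fetch)
                   for host_urls in group_by_host(urls).values()]
        for future in futures:
            results.update(future.result())
    return results
```

The trade-off is that a host with many URLs becomes the long pole, since only one thread ever works on it. netloc includes any port, so example.com:8080 counts as a separate host here.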

# Dec 1, 2017 21:26

Speaking of OAuth, is there an easy way or a recommended module that will let a Windows PC respond to the callback and get a token after passing the initial key to an arbitrary service? I just want to authorize access for a simple personal script on my home PC. The OAuth hassle of setting up a public server, versus just using a simple authentication key, is pushing a bunch of "I could automate that" projects to "UGH; maybe some other day..."
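For services that allow a localhost redirect, the usual trick for desktop scripts is a throwaway web server on 127.0.0.1 that catches exactly one redirect. Your browser follows the callback, not the service, so nothing needs to be public. Here's a stdlib sketch. The port, the timeout, and whether a given service accepts a loopback redirect URL are all assumptions:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit


def wait_for_callback(port=8765, timeout=300):
    """Serve one request on 127.0.0.1:port and return its query parameters."""
    captured = {}

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Keep the first value of each query parameter (e.g. code, state).
            query = parse_qs(urlsplit(self.path).query)
            captured.update({k: v[0] for k, v in query.items()})
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(b"Authorized; you can close this tab.")

        def log_message(self, *args):
            pass  # keep the console quiet

    with HTTPServer(("127.0.0.1", port), Handler) as server:
        server.timeout = timeout
        server.handle_request()  # blocks until one request arrives or timeout
    return captured
```

Register something like http://127.0.0.1:8765/callback as the redirect URL, open the service's authorization page in your browser, and wait_for_callback() returns whatever parameters (typically code and state) the service appended.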

# Nov 22, 2023 18:38

Sorry for the ambiguity of my OAuth question earlier. Specifically, I want to access the Pocket API from my Windows desktop to pull all the "read or file this later" bookmarks I've dropped in there from my phone while traveling.

The part I'm scratching my head about is Step 2 onward of the Pocket authentication process: getting a request token with a redirect_uri, since the calling app is a Python script on my home Windows PC and isn't at a publicly accessible address. That redirect_uri gets used in a couple of places further along in the process.

One of the Pocket API packages I found on PyPI suggests sending your callback to some random person's website in Germany. Another does its own redirect to an obfuscated/shortened goo.gl link. Running authorization through either of those sounds like a terrible idea that gives credentials for complete access to your account to some unknown entity.

I'll look at Zapier; I hadn't considered that as a possibility. If I'm overcomplicating this or misunderstanding, I'd greatly appreciate advice. All of the other APIs I've worked with only need a simple "consumer key" passed along with the request, so the OAuth stuff is new, and I've been putting off learning how to interact with it since it's just home hobby stuff.
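As I understand Pocket's v3 docs, the redirect_uri is only where your browser lands after you click Authorize. Pocket's servers never have to reach your PC, so a localhost address should work. Here's a sketch of that flow with the stdlib. I took the endpoint URLs, the X-Accept header, and the code/access_token response keys from my reading of those docs, so double-check them. The default redirect address is an arbitrary choice:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoints per Pocket's v3 authentication docs; verify before use.
REQUEST_URL = "https://getpocket.com/v3/oauth/request"
AUTHORIZE_PAGE = "https://getpocket.com/auth/authorize"
ACCESS_URL = "https://getpocket.com/v3/oauth/authorize"
HEADERS = {"Content-Type": "application/json; charset=UTF-8",
           "X-Accept": "application/json"}


def build_post(url, payload):
    """Build (but don't send) a JSON POST request."""
    return Request(url, data=json.dumps(payload).encode("utf-8"),
                   headers=HEADERS, method="POST")


def authorize_page_url(request_token, redirect_uri):
    """URL to open in your own browser to approve the app."""
    query = urlencode({"request_token": request_token, "redirect_uri": redirect_uri})
    return AUTHORIZE_PAGE + "?" + query


def get_access_token(consumer_key, redirect_uri="http://127.0.0.1:8765/"):
    # Exchange the consumer key for a short-lived request token ("code").
    req = build_post(REQUEST_URL, {"consumer_key": consumer_key,
                                   "redirect_uri": redirect_uri})
    with urlopen(req) as resp:
        code = json.load(resp)["code"]
    # The browser visits this page and is then sent on to redirect_uri.
    print("Open this URL, approve access, then press Enter:")
    print(authorize_page_url(code, redirect_uri))
    input()
    # Trade the approved request token for the access token.
    req = build_post(ACCESS_URL, {"consumer_key": consumer_key, "code": code})
    with urlopen(req) as resp:
        return json.load(resp)["access_token"]
```

Call get_access_token() with your consumer key once, save the token somewhere safe, and reuse it for later API calls.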

# Nov 24, 2023 16:29