|
SporkOfTruth posted:if you're not a bit careful, you can fall into the trap of forgetting the yield order. yes it seems very strange to ever be expected to remember the order, and accepting an argument to control the iteration order seems even weirder! I'd probably do something like yielding a namedtuple of (time, agent, source, outputs) or whatever so the user is free to later sort them however they'd like.
|
# ? Dec 12, 2022 19:17 |
|
|
# ? May 28, 2024 14:53 |
|
Hey all, been a while since i've interacted with Python but have a syntax question. I have a selenium linked in scraper as follows: code:
Typically in JavaScript I could do something like: code:
WOndering if someone can help me know what syntax I can use TO do that. I don't want to use a try -> catch because I still want it to scrape what it can without crashing. Also if anyone knows Selenium, why does my ".//" not work
|
# ? Dec 12, 2022 20:35 |
|
Just eyeballing it, looks like the part I highlighted in this line is your issue: profilePic = profileDiv.find_element(By.XPATH, ".//img[contains(@class, 'presence-entity__image')]").get_attribute('src') if profileDiv.find_element(By.XPATH, ".//img[contains(@class, 'presence-entity__image')]").get_attribute('src') else “” The problem is that if it doesn’t find profilePic, you are still trying to call the get_attribute method on it. Which will crash. Is your error message something like “NoneType has no method get_attribute”? Anyway, cut that method call and see if it works.
|
# ? Dec 12, 2022 20:56 |
|
That definitely helped, now I have it on this line annoyingly:code:
code:
edit: This is the error I get code:
worms butthole guy fucked around with this message at 21:40 on Dec 12, 2022 |
# ? Dec 12, 2022 21:04 |
|
C2C - 2.0 posted:Anyone else using Pycharm on a Macbook? I'm doing the 100 Days of Python off of Udemy and am working thru the course in Pycharm. I've got a directory set up that has individual folders for each day. Typically each morning I go into Finder & just create a new folder (say, "Day 35"). Then I open Pycharm, select "New Project" and then navigate to that day's folder. Inevitably, when I click "Okay" on the pop-up dialog where you choose the interpretor & a few other things, it returns another pop-up telling me that the folder isn't empty & I just select "Create Project From Source". Then the project loads and the empty Icon file opens in the editor. Is there any way to ignore this file or otherwise keep it from opening immediately upon each new project creation? I can't reproduce this on pycharm pro on an m1 mac. Is your folder on a remote volume, or a volume formatted as non-hfs or apfs volume? Because in that case osx will create a .DS_Store file in the folder which will mean pycharm sees it as non-empty. As a workaround, why not just have pycharm create the folder? If you give it the foldername on that screen it will create it.
|
# ? Dec 12, 2022 21:47 |
|
QuarkJets posted:The ordering in this example is somewhat arbitrary, so you could just make the time loop the innermost loop. I fudged the generator slightly. In the original use-case context (names still fudged to protect the innocent), it's actually an argumentless method of a class where:
Zoracle Zed posted:yes it seems very strange to ever be expected to remember the order, and accepting an argument to control the iteration order seems even weirder! I'd probably do something like yielding a namedtuple of (time, agent, source, outputs) or whatever so the user is free to later sort them however they'd like. Controlling iteration order "makes sense" if you're an engineer who wants to plot/observe all of the simulated data before you run an experiment on it, and you're used to 3D arrays in Matlab where this would be possible. (I was told at one point "why isn't this a Numpy array".) Otherwise, yeah, it's a weird ask! Yielding more information, however -- that's much more appealing. The only trick would be refactoring how the downstream data processors accept the output of the generator since, as of now, all they need is timestamps and data. (Strictly speaking, the data objects that come out are tagged with the properties of the generating source, but not the agent, so the answer might be figuring out where that metadata goes and a friendlier pattern for accessing it.
|
# ? Dec 12, 2022 22:36 |
Zephirus posted:I can't reproduce this on pycharm pro on an m1 mac. Is your folder on a remote volume, or a volume formatted as non-hfs or apfs volume? Because in that case osx will create a .DS_Store file in the folder which will mean pycharm sees it as non-empty. It's on the internal Mac HD; after reading your response, I think the issue is with an app called Manilla that allows for folder colorization in Finder (I'm blind as a bat & using tags on files/folders isn't as easy for me to see). I use the app to visually sort larger directories if I'm not jumping around in iTerm. I'll try your suggestion; the behavior wasn't a deal-breaker but rather a small, annoying issue. Thanks!
|
|
# ? Dec 12, 2022 23:55 |
|
SporkOfTruth posted:Controlling iteration order "makes sense" if you're an engineer who wants to plot/observe all of the simulated data before you run an experiment on it, and you're used to 3D arrays in Matlab where this would be possible. (I was told at one point "why isn't this a Numpy array".) Otherwise, yeah, it's a weird ask! In that case even better then a numpy array (where you still need to remember the axis order) is an xarray array, with labeled axes.
|
# ? Dec 15, 2022 14:58 |
|
Lets say I have a pandas dataframe with a date column and I want to iterate over all the unique dates in that column. Is Python code:
Python code:
|
# ? Dec 20, 2022 04:40 |
|
Here's a weird question that's...probably incredibly low-level and I'm overthinking it. I'm thinking of playing around with A Side Project like you do (A cloud save service for retro emulator save files), and part of it would be a client side application to handle basic scanning and uploading/downloading. While I've been working with Python for years, I haven't done any Python GUI stuff at all, and I'm realizing I don't really know where to start with something if I want to end up with a program that:
Are there any recommended tutorials or resources for going about something like this? Alternately, should I stop before I go too much further and just make this part in C# instead? I'd still be out of my depth but I know C# is a more common language for making Windows apps.
|
# ? Dec 20, 2022 05:10 |
|
Jose Cuervo posted:Lets say I have a pandas dataframe with a date column and I want to iterate over all the unique dates in that column. That's correct
|
# ? Dec 20, 2022 05:19 |
|
Falcon2001 posted:Here's a weird question that's...probably incredibly low-level and I'm overthinking it. Qt is probably what you should use, it's a widely used standard for cross-platform GUI development that has a lot of power built in. You could use pyqt6 or pyside6, the main difference is in licensing so pay attention to that.
C# is a suitable alternative, and might be flat out better if you don't need to rely on anything from the Python ecosystem and have Windows as your main target. It really depends. But that doesn't necessarily solve any of the issues here, either way this is a bit of a project QuarkJets fucked around with this message at 05:55 on Dec 20, 2022 |
# ? Dec 20, 2022 05:49 |
|
QuarkJets posted:Qt is probably what you should use, it's a widely used standard for cross-platform GUI development that has a lot of power built in. You could use pyqt6 or pyside6, the main difference is in licensing so pay attention to that. Thanks! That's a lot of good info. I'll go start mucking around with QT and see what my options are; although probably I'll need to write the daemon/service first and then figure out how to interact with it, and then from there it's GUI time.
|
# ? Dec 20, 2022 06:41 |
|
Sorry in advance if this is not the right thread. I am making scatter plots in python using matplot lib, and I would like to have each data point occupy one single pixel that has one color. I am willing to deal with data points occupying multiple pixels, but then they still must be one color. I am currently getting poo poo like this: from code like this: code:
|
# ? Dec 21, 2022 02:44 |
|
What are the values in "color"?
|
# ? Dec 21, 2022 03:13 |
|
QuarkJets posted:What are the values in "color"? Here are the first 5 entries: -0.732114 2.02783 0.215997 1.63187 0.782867 Edit: I am basically trying to add color to this, where x and y are the data, and t is the color array. theHUNGERian fucked around with this message at 03:50 on Dec 21, 2022 |
# ? Dec 21, 2022 03:29 |
|
[quote="QuarkJets" post="528299545"] oh yeah and point out to your manager that postgres is also free [/quote. And less likely to make the inhouse database guy poo poo the bed in anger* *who me? I'm a nice greybeard. Who occasionally flips the lid in.... polite disagreement... when new hire tries to roll out a shitful mysql database filled with sql interpolations and other dangerous fauna.
|
# ? Dec 21, 2022 07:28 |
|
theHUNGERian posted:Sorry in advance if this is not the right thread. I think that there's something happening in whatever backend is being chosen for savefig; that's going to take your plot (which is displayed in inches with some default dpi) and then try to convert it to pixels (using the new dpi that you're providing), and something funny may happen along the way to give you the result you're seeing. You might try using a custom figure with a carefully-chosen DPI and figsize
|
# ? Dec 21, 2022 11:24 |
|
theHUNGERian posted:Edit: I am basically trying to add color to this, where x and y are the data, and t is the color array. check out PIL (Pillow), the python image library. you can go from numpy array to color image there without dealing with the hack of a scatterplot
|
# ? Dec 21, 2022 14:00 |
|
Thanks!
|
# ? Dec 21, 2022 19:27 |
|
Hello, I'm trying to scrape forum threads as a personal project, so far I got this to get information from each post in a thread a) User b) Date c) The text code:
But it's a bit slow. Any suggestions on how it could be improved?
|
# ? Dec 21, 2022 23:12 |
|
Mr. Nemo posted:Hello, I'm trying to scrape forum threads as a personal project, so far I got this to get information from each post in a thread Have you measured how much of the slowness is making the request, waiting for a response vs your parsing code? If it's your code that's slow, which would be a little surprising, one immediate thought is that len(text.split()) is a really slow way to count words.
|
# ? Dec 21, 2022 23:31 |
|
Twerk from Home posted:Have you measured how much of the slowness is making the request, waiting for a response vs your parsing code? If it's your code that's slow, which would be a little surprising, one immediate thought is that len(text.split()) is a really slow way to count words. I have no idea how to do that to be honest, how do you suggest I go at it? I took a 6 class python course and this is my final project (including some space filler plots and other stuff). I guess I could run it twice for 100 pages with and without that word counter line and see if there's a noticeable difference. Also the way I'm getting the name and dates is VERY manual, because I haven't had time to really dive into the HTML stuff beyond finding a "large" tag, that may be adding some time?
|
# ? Dec 21, 2022 23:35 |
|
Mr. Nemo posted:I have no idea how to do that to be honest, how do you suggest I go at it? I took a 6 class python course and this is my final project (including some space filler plots and other stuff). I guess I could run it twice for 100 pages with and without that word counter line and see if there's a noticeable difference. The simplest way to test if the request is slow would be to download the html for a page into a file and read from that, and time how long it takes to do all this while reading the html from a file vs actually making the request. If you want to make the whole thing faster regardless of why it's slow, just run lots of python processes, but don't be surprised when your IP gets blocked from hitting SA because you made a bunch of automated requests. If you want to actually measure why the whole thing is slow, break out a profiler. cProfile and snakeviz would be a decent place to start, but if you fire up scalene it's nicer. https://github.com/plasma-umass/scalene Edit: my suggestion to fire up a lot of python processes assumes you're on a recent computer with 8+ cores. Twerk from Home fucked around with this message at 23:45 on Dec 21, 2022 |
# ? Dec 21, 2022 23:41 |
|
You repeat the same element selection query for every post ( pageTemp.find_all("td",class_="postdate") and the like). It would probably speed things up a little to run those before the posts loop and then access the arrays from there. Besides that, how long is it taking? It could just be the requests taking most of the time, which could only be improved by parallelizing which I doubt the admins would appreciate for a scraper.
|
# ? Dec 22, 2022 01:00 |
|
necrotic posted:You repeat the same element selection query for every post ( pageTemp.find_all("td",class_="postdate") and the like). It would probably speed things up a little to run those before the posts loop and then access the arrays from there. I think this may be a big part of it OP, basically inside the loop you're running these huge queries and then grabbing a specific element; moving those queries out of the loop will probably help
|
# ? Dec 22, 2022 03:03 |
|
eXXon posted:Am I wrong for wanting to implement immutable objects as frozen dataclasses with cached properties? For example: Can I ask what is the code and syntax of the x and y variables I have used python for a bit but haven’t seen this with the semi colon?
|
# ? Dec 22, 2022 04:23 |
|
Those are type hints (and those are colons).
|
# ? Dec 22, 2022 04:38 |
|
timbro posted:Can I ask what is the code and syntax of the x and y variables I have used python for a bit but haven’t seen this with the semi colon? What you're seeing is the declaration of a dataclass with typehints
|
# ? Dec 22, 2022 04:48 |
|
OMFG GDAL has to be the worst thing ever invented. Its absolutely unnegotiable as a requirement to any serious Geospatial work. (Or in fact any Geospatial work if you wanna use python. But in this case, serious geospatial work, a front end api to a Geoserver install.) There is absolutely no binary compatibility between even dot versions. The Api is unstable. The Python versions MUST be matched to the exact build version of the library. The Docker containers featuring it seem to either stop at an ancient version of python or an ancient version of GDAL. Package managers dont understand the need for the OS and Python versions to be precisely aligned. Docker builds take about 10-15 minutes per cycle. And usually explode. I cant afford a lot of alcohol right now thanks to the Taxman loving me this year. Also Pipenv requires a modern version of setuptools which isnt compatible with the version of libgdal thats installed in almost any of the available docker pages for python. This is misery! duck monster fucked around with this message at 09:38 on Dec 22, 2022 |
# ? Dec 22, 2022 08:52 |
|
Any chance you're using conda environments? It looks like the gdal package in the conda-forge channel is being kept up to date, so someone else has already worked out the build issues for a bunch of platforms and python versions: https://anaconda.org/conda-forge/gdal I'm seeing python 3.8-3.11 for the latest release, at least
|
# ? Dec 22, 2022 10:22 |
|
Speaking of conda, I was checking it out and wanted to fool around with spacy transformers with hunspell and i just ran into so many issues, like conda didn't have all the packages so I had to install some through pip through conda, then hunspell install fails because it can't find the C++ build tools (although I'm pretty sure I installed them, it worked through normal pip iirc). I think a big part of the problem is trying to do it in windows, I'm going to give it another go on Linux tonight, but the missing conda packages didn't inspire a huge amount of confidence.. is there a significant gap between pip and conda for supported packages? Or am I just jumping in head first and missing some rudimentary conda concepts
|
# ? Dec 22, 2022 14:44 |
|
theHUNGERian posted:Sorry in advance if this is not the right thread. Yeah ok, I am an idiot. I made two plots, one in black and white, another one in color, and since the black and white plot looked fine, I didn't include its code in my OP. Turns out that placing code:
|
# ? Dec 22, 2022 20:28 |
|
boofhead posted:Speaking of conda, I was checking it out and wanted to fool around with spacy transformers with hunspell and i just ran into so many issues, like conda didn't have all the packages so I had to install some through pip through conda, then hunspell install fails because it can't find the C++ build tools (although I'm pretty sure I installed them, it worked through normal pip iirc). I think a big part of the problem is trying to do it in windows, I'm going to give it another go on Linux tonight, but the missing conda packages didn't inspire a huge amount of confidence.. is there a significant gap between pip and conda for supported packages? Or am I just jumping in head first and missing some rudimentary conda concepts I have never encountered an anaconda or conda-forge package that failed to install dependencies, but I don't usually use Windows. What channel are you using? It wouldn't surprise me if this was some bit of Microsoft fuckery that requires extra steps, since you mentioned needing a Windows compiler I only use pip as a source of last resort, for really obscure stuff that has no conda package. That step takes place after installing everything else I need with mamba. You shouldn't need pip for conda dependencies, if that's happening then something has gone deeply wrong. Oh, and if you use pip you need to pay attention to what it's doing - pip will happily try and replace packages with a different version, if it decides that's required, but that might break any of your conda packages If you use 3rd party channels then you're in the wild west, like pypi basically. I prefer conda-forge for most things, since that's exclusively open source packages and tends to update a lot faster than the anaconda channel.
|
# ? Dec 22, 2022 21:35 |
|
QuarkJets posted:I have never encountered an anaconda or conda-forge package that failed to install dependencies, but I don't usually use Windows. What channel are you using? It wouldn't surprise me if this was some bit of Microsoft fuckery that requires extra steps, since you mentioned needing a Windows compiler To follow up on this, I find almost all dependency problems go away when you use conda-forge (and maybe some high-quality channels like bioconda if applicable), remove the Anaconda "defaults" repo, and set channel_priority = strict. As a side bonus everything is nicely open-source and appropriately licensed. And yes, always do Pip steps last and if you must run conda/mamba again in an env after running Pip, don't. Just delete the environment and build it again in the right order, this is what envs are for.
|
# ? Dec 25, 2022 14:06 |
|
QuarkJets posted:I have never encountered an anaconda or conda-forge package that failed to install dependencies, but I don't usually use Windows. What channel are you using? It wouldn't surprise me if this was some bit of Microsoft fuckery that requires extra steps, since you mentioned needing a Windows compiler I havent really got any experience with conda. I've always been a bit suspicious of it as fragmenting the main package system. But thats just a dumb bias that I really ought get over. Re version, needs to be 3.10+ as I kinda went ham on the Union types with FastAPI (I've always been an advocate of Union types after using Crystallangs absolutely brilliant union based type safety system.) I ended up solving it by just building my own version of gdal hard matched to the version of python gdal and got up a docker container that ran. Turns out the secret was loving off PipEnv. Again, great idea, frustratingly broken implementation.
|
# ? Dec 25, 2022 18:33 |
|
duck monster posted:I havent really got any experience with conda. I've always been a bit suspicious of it as fragmenting the main package system. But thats just a dumb bias that I really ought get over. Pip, conda, and docker provide distinct levels of virtualization, from most lightweight to least. Conda provides Python packages, but it also provides non-Python dependencies; that second part is what makes conda useful and distinct from pip. It would not be accurate to say that conda fragments the package system. It is a separate tool for managing your non-Python dependencies. I've seen a lot of docker recipes that use mamba to create a conda environment just because a lot of base containers are pretty lightweight (by design) and OS repositories are pretty limiting (also by design)
|
# ? Dec 25, 2022 22:33 |
|
As a basic example, CUDA (nvidia gpu processing) requires the CUDA toolkit to be installed. The CUDA toolkit is not a python package, you cannot install it with pip. You can manage this dependency yourself, taking care to install the specific version of the cuda toolkit that is required by the python packages that you want to use and you can update the toolkit yourself whenever you want to update your python packages. This is fine, but it would be nice if we didn't have to manage that external dependency ourselves. If you use conda instead all of this happens automatically, you just install the package you want and the correct version of the cuda toolkit is installed too. This extends to all kinds of dependencies. On a system that doesn't have libtiff installed? No problem, there's a conda package for that. Need the Intel MKL? No problem. Need libhdf5? No problem. Etc
|
# ? Dec 25, 2022 23:37 |
|
^ That's precisely my use case, the stuff I want to play around with requires CUDA and a bunch of other non-python dependencies, and I think what was happening (even on linux, once I swapped away from windows) was the standard conda, plus conda-forge, was struggling with figuring it all out and that was resulting in some weird bugs and then just hanging / absurdly install times where it just looks like it's crashed. I found some stuff that suggested mamba, installed the solver library for that, and everything went smoothly with mamba doing the package dependency calculating stuff I'm definitely going to use this more in future, pip winds up doing some bizarre things so I'm happy to have finally stumbled into what looks like a better way of doing things. All in all I'm just happy everything is installed and I can finally start playing around with it
|
# ? Dec 26, 2022 00:11 |
|
|
# ? May 28, 2024 14:53 |
|
boofhead posted:^ That's precisely my use case, the stuff I want to play around with requires CUDA and a bunch of other non-python dependencies, and I think what was happening (even on linux, once I swapped away from windows) was the standard conda, plus conda-forge, was struggling with figuring it all out and that was resulting in some weird bugs and then just hanging / absurdly install times where it just looks like it's crashed. I found some stuff that suggested mamba, installed the solver library for that, and everything went smoothly with mamba doing the package dependency calculating stuff Yeah mamba is simply better, it is a lot more optimized
|
# ? Dec 26, 2022 00:54 |