Zoracle Zed
Jul 10, 2001

SporkOfTruth posted:

if you're not a bit careful, you can fall into the trap of forgetting the yield order.

yes it seems very strange to ever be expected to remember the order, and accepting an argument to control the iteration order seems even weirder! I'd probably do something like yielding a namedtuple of (time, agent, source, outputs) or whatever so the user is free to later sort them however they'd like.


worms butthole guy
Jan 29, 2021

by Fluffdaddy
Hey all, it's been a while since I've interacted with Python, but I have a syntax question.

I have a Selenium LinkedIn scraper as follows:

code:
elements = driver.find_elements(By.XPATH, "//*[contains(@class, 'search-results__no-cluster-container')]//div[contains(@class, 'artdeco-card')]")

for card in elements:
    print(card)
    profileDiv = card.find_element(By.XPATH, ".//div[contains(@class, 'update-components-actor__container')]")
    profilePic = profileDiv.find_element(By.XPATH, ".//img[contains(@class, 'presence-entity__image')]").get_attribute('src') if profileDiv.find_element(By.XPATH, ".//img[contains(@class, 'presence-entity__image')]").get_attribute('src') else ""
    text = card.find_element(By.XPATH, ".//div[contains(@class, 'feed-shared-update-v2__description-wrapper')]//following::span[1]").text
The problem I'm having is that if it can't find a profilePic (not sure why it can't find it, since everything on LinkedIn has one, but whatever) it crashes the program.

Typically in JavaScript I could do something like:

code:
profilePic = profileDiv.find_element(By.XPATH, ".//img[contains(@class, 'presence-entity__image')]").get_attribute('src') ? profileDiv.find_element(By.XPATH, ".//img[contains(@class, 'presence-entity__image')]").get_attribute('src') : null
and it ignores it, however it seems like Python doesn't operate this way.

Wondering if someone can help me figure out what syntax I can use to do that. I don't want to use a try/except because I still want it to scrape what it can without crashing.

Also if anyone knows Selenium, why does my ".//" not work :qq:

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Just eyeballing it, looks like the part I highlighted in this line is your issue:

profilePic = profileDiv.find_element(By.XPATH, ".//img[contains(@class, 'presence-entity__image')]").get_attribute('src') if profileDiv.find_element(By.XPATH, ".//img[contains(@class, 'presence-entity__image')]").get_attribute('src') else ""

The problem is that if it doesn't find profilePic, you are still trying to call the get_attribute method on it, which will crash. Is your error message something like "NoneType has no method get_attribute"?

Anyway, cut that method call and see if it works.

worms butthole guy
Jan 29, 2021

by Fluffdaddy
That definitely helped; annoyingly, now I have it on this line:

code:
profileDiv = card.find_element(By.XPATH, ".//div[contains(@class, 'update-components-actor__container')]") if card.find_element(By.XPATH, ".//div[contains(@class, 'update-components-actor__container')]") else ""
removing the line just makes:

code:
profilePic = card.find_element(By.XPATH, ".//img[contains(@class, 'update-components-actor__avatar-image')]").get_attribute('src') if card.find_element(By.XPATH, ".//img[contains(@class, 'update-components-actor__avatar-image')]") else ""
break again :stare:

edit: This is the error I get

code:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//img[contains(@class, 'update-components-actor__avatar-image')]"}
  (Session info: chrome=108.0.5359.99)
Stacktrace:
EDIT: Figured it out, it was a Selenium problem. You have to use "find_elements", which returns a list (empty, and therefore falsy, when nothing matches) instead of raising.

worms butthole guy fucked around with this message at 21:40 on Dec 12, 2022
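For posterity, the reason that works can be sketched without a browser; FakeCard below is a made-up stand-in for a Selenium WebElement:

```python
# find_elements returns a list, which is empty -- and therefore falsy --
# when nothing matches, instead of raising NoSuchElementException.
# FakeCard is a hypothetical stand-in for a real WebElement.
class FakeCard:
    def find_elements(self, by, selector):
        # Pretend the profile image is missing from this card
        return []

card = FakeCard()
matches = card.find_elements("xpath", ".//img[contains(@class, 'presence-entity__image')]")
# Only call get_attribute when the list is non-empty
profile_pic = matches[0].get_attribute("src") if matches else ""
```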

Zephirus
May 18, 2004

BRRRR......CHK

C2C - 2.0 posted:

Anyone else using Pycharm on a Macbook? I'm doing the 100 Days of Python off of Udemy and am working thru the course in Pycharm. I've got a directory set up that has individual folders for each day. Typically each morning I go into Finder & just create a new folder (say, "Day 35"). Then I open Pycharm, select "New Project" and then navigate to that day's folder. Inevitably, when I click "Okay" on the pop-up dialog where you choose the interpreter & a few other things, it returns another pop-up telling me that the folder isn't empty & I just select "Create Project From Source". Then the project loads and the empty Icon file opens in the editor. Is there any way to ignore this file or otherwise keep it from opening immediately upon each new project creation?

I can't reproduce this on PyCharm Pro on an M1 Mac. Is your folder on a remote volume, or a volume not formatted as HFS+ or APFS? In that case macOS will create a .DS_Store file in the folder, which means PyCharm sees it as non-empty.

As a workaround, why not just have PyCharm create the folder? If you give it the folder name on that screen, it will create it.

SporkOfTruth
Sep 1, 2006

this kid walked up to me and was like man schmitty your stache is ghetto and I was like whatever man your 3b look like a dishrag.

he was like damn.

QuarkJets posted:

The ordering in this example is somewhat arbitrary, so you could just make the time loop the innermost loop.

If you wanted to give anyone the freedom of either style then you could split up the generator. There's no reason that this generator needs to accept "inputs" as an input argument, it can accept "input" instead. But don't name a variable "input" as that's a built-in

I fudged the generator slightly. In the original use-case context (names still fudged to protect the innocent), it's actually an argumentless method of a class where:
  • "inputs" and "agents" are both properties of the class
  • "inputs" is really an instance of a separate data ingestion class with its own generator (so we can accept, say, fast line-by-line reads or streaming data)
  • "agents" is a typing.Sequence of Agent class instances
  • Probably the key bit for why re-ordering is tough and why my mind didn't immediately think about splitting it up: there's actually a preceding double loop that advances the state of the agents before the yield loops. It's akin to:
    code:
    for agent in self.agents:
        agent.orient(time)
        for source in agent.sources:
            source.prepare(time)
    
    where "orient" and "prepare" are those state-advancing methods. Again, these have to be done before we yield anything. Not sure how we'd split this up (though you're right, this is coupled a little too tightly and the use pattern repeats across the code base).

Zoracle Zed posted:

yes it seems very strange to ever be expected to remember the order, and accepting an argument to control the iteration order seems even weirder! I'd probably do something like yielding a namedtuple of (time, agent, source, outputs) or whatever so the user is free to later sort them however they'd like.

Controlling iteration order "makes sense" if you're an engineer who wants to plot/observe all of the simulated data before you run an experiment on it, and you're used to 3D arrays in Matlab where this would be possible. (I was told at one point "why isn't this a Numpy array".) Otherwise, yeah, it's a weird ask!

Yielding more information, however -- that's much more appealing. The only trick would be refactoring how the downstream data processors accept the output of the generator since, as of now, all they need is timestamps and data. (Strictly speaking, the data objects that come out are tagged with the properties of the generating source, but not the agent, so the answer might be figuring out where that metadata goes and a friendlier pattern for accessing it.)
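One shape the "yield a namedtuple" idea could take (every name here is hypothetical, sketched from the loop structure discussed above): yield fully labeled records and let downstream code sort or regroup them however it likes.

```python
# Yield fully-labeled records; downstream code regroups them freely.
# Sample, generate, and the agent/source names are all illustrative.
from collections import namedtuple

Sample = namedtuple("Sample", ["time", "agent", "source", "outputs"])

def generate():
    for time in (0, 1):
        for agent in ("a1", "a2"):
            for source in ("s1",):
                yield Sample(time, agent, source, outputs=[time])

samples = list(generate())
# Regroup agent-first instead of time-first, after the fact
by_agent = sorted(samples, key=lambda s: (s.agent, s.time))
```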

C2C - 2.0
May 14, 2006

Dubs In The Key Of Life


Lipstick Apathy

Zephirus posted:

I can't reproduce this on PyCharm Pro on an M1 Mac. Is your folder on a remote volume, or a volume not formatted as HFS+ or APFS? In that case macOS will create a .DS_Store file in the folder, which means PyCharm sees it as non-empty.

As a workaround, why not just have PyCharm create the folder? If you give it the folder name on that screen, it will create it.

It's on the internal Mac HD; after reading your response, I think the issue is with an app called Manilla that allows for folder colorization in Finder (I'm blind as a bat & using tags on files/folders isn't as easy for me to see). I use the app to visually sort larger directories if I'm not jumping around in iTerm.

I'll try your suggestion; the behavior wasn't a deal-breaker but rather a small, annoying issue. Thanks!

Zoracle Zed
Jul 10, 2001

SporkOfTruth posted:

Controlling iteration order "makes sense" if you're an engineer who wants to plot/observe all of the simulated data before you run an experiment on it, and you're used to 3D arrays in Matlab where this would be possible. (I was told at one point "why isn't this a Numpy array".) Otherwise, yeah, it's a weird ask!

In that case, even better than a numpy array (where you still need to remember the axis order) is an xarray array, with labeled axes.

Jose Cuervo
Aug 25, 2004
Let's say I have a pandas dataframe with a date column and I want to iterate over all the unique dates in that column.

Is
Python code:
for date in df['Dates'].unique():
    pass
equivalent to

Python code:
unique_dates = df['Dates'].unique()

for date in unique_dates:
    pass
As in, in the first bit of code the "df['Dates'].unique()" only gets evaluated once, right?

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Here's a weird question that's...probably incredibly low-level and I'm overthinking it.

I'm thinking of playing around with A Side Project like you do (A cloud save service for retro emulator save files), and part of it would be a client side application to handle basic scanning and uploading/downloading. While I've been working with Python for years, I haven't done any Python GUI stuff at all, and I'm realizing I don't really know where to start with something if I want to end up with a program that:

  • Can be packaged (PyInstaller?)
  • Runs in the background with an icon in the taskbar (?)
  • Has a basic GUI that modifies the behavior of a background process.
  • Possibly able to work and be packaged for both Linux and Windows (and maybe OSX)?

Are there any recommended tutorials or resources for going about something like this? Alternatively, should I stop before I go too much further and just make this part in C# instead? I'd still be out of my depth but I know C# is a more common language for making Windows apps.

QuarkJets
Sep 8, 2008

Jose Cuervo posted:

Let's say I have a pandas dataframe with a date column and I want to iterate over all the unique dates in that column.

Is
Python code:
for date in df['Dates'].unique():
    pass
equivalent to

Python code:
unique_dates = df['Dates'].unique()

for date in unique_dates:
    pass
As in, in the first bit of code the "df['Dates'].unique()" only gets evaluated once, right?

That's correct, the iterable expression in a for loop is evaluated once, before the loop starts.
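A quick way to convince yourself (get_unique_dates here is a hypothetical stand-in for df['Dates'].unique()):

```python
# The iterable expression in a for statement is evaluated exactly once,
# up front -- not once per iteration. get_unique_dates stands in for
# the df['Dates'].unique() call from the question.
calls = 0

def get_unique_dates():
    global calls
    calls += 1
    return ["2022-12-19", "2022-12-20", "2022-12-21"]

for date in get_unique_dates():
    pass
```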

QuarkJets
Sep 8, 2008

Falcon2001 posted:

Here's a weird question that's...probably incredibly low-level and I'm overthinking it.

I'm thinking of playing around with A Side Project like you do (A cloud save service for retro emulator save files), and part of it would be a client side application to handle basic scanning and uploading/downloading. While I've been working with Python for years, I haven't done any Python GUI stuff at all, and I'm realizing I don't really know where to start with something if I want to end up with a program that:

  • Can be packaged (PyInstaller?)
  • Runs in the background with an icon in the taskbar (?)
  • Has a basic GUI that modifies the behavior of a background process.
  • Possibly able to work and be packaged for both Linux and Windows (and maybe OSX)?

Are there any recommended tutorials or resources for going about something like this? Alternatively, should I stop before I go too much further and just make this part in C# instead? I'd still be out of my depth but I know C# is a more common language for making Windows apps.

Qt is probably what you should use, it's a widely used standard for cross-platform GUI development that has a lot of power built in. You could use pyqt6 or pyside6, the main difference is in licensing so pay attention to that.

  • PySide6 applications work right out of the box with PyInstaller, and it's also trivial to publish via a conda channel or pypi if you want people to have alternatives. In fact, the 4th bullet point implies to me that you should at least publish to pypi so that it's easy to create installable packages for other platforms (more on that later)
  • It really depends on what you want this background process to do, there are a million different patterns for doing something like this and a good solution is going to depend on what the background process is doing. You can get way into the weeds here setting up a bespoke shared memory interface or you could just be updating a flat file with a few configurable parameters that the background process occasionally polls. You should think of this bullet as being completely unrelated to the GUI - it shouldn't really matter whether you communicate with your background process via a GUI or a CLI, what matters is that you have designed and set up some clearly-defined interfaces and that your background process does what it's supposed to do even when you're not communicating with it. Developing and deploying an application that runs in the background of a system is a whole separate topic that can range from "somewhat trivial" to "insanely advanced", depending on what it does and what platforms it needs to run on. I don't think there's going to be a canned solution here that works everywhere
  • Qt applications tend to look great and are highly configurable, by default they adapt to whatever style the system is already using (e.g. Linux vs Windows vs Mac, the window will look pretty native to that platform unless you go out of your way to change it).
  • Since Qt exists on all of these platforms already, you really just need to set up some infrastructure to start doing cross-platform deployment. Unfortunately this does mean actually installing your application in an instance of each of these environments and then packaging your application on that target; a team might set up automated build pipelines to do this, that's a little bit of pain early to save a lot of pain later assuming you want this application to be something you can update and release again later. You should use docker images for this so that you can make the build and deployment process easily reproducible

C# is a suitable alternative, and might be flat out better if you don't need to rely on anything from the Python ecosystem and have Windows as your main target. It really depends. But that doesn't necessarily solve any of the issues here, either way this is a bit of a project
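The flat-file idea in the second bullet could be sketched like this (every name, path, and setting below is made up for illustration): the GUI writes JSON config atomically, and the background worker re-reads it whenever the file's mtime changes.

```python
# Hypothetical config-file handoff between a GUI and a background worker.
import json
import os
import tempfile

config_path = os.path.join(tempfile.gettempdir(), "demo_worker_config.json")

def write_config(settings):
    # Write to a temp file, then rename, so the poller never sees
    # a half-written config
    tmp = config_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(settings, f)
    os.replace(tmp, config_path)

class ConfigPoller:
    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.settings = {}

    def poll(self):
        # Re-read only when the file has actually changed on disk
        mtime = os.path.getmtime(self.path)
        if mtime != self.mtime:
            self.mtime = mtime
            with open(self.path) as f:
                self.settings = json.load(f)
        return self.settings

write_config({"scan_interval_s": 30, "upload_enabled": True})
settings = ConfigPoller(config_path).poll()
```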

QuarkJets fucked around with this message at 05:55 on Dec 20, 2022

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

QuarkJets posted:

Qt is probably what you should use, it's a widely used standard for cross-platform GUI development that has a lot of power built in. You could use pyqt6 or pyside6, the main difference is in licensing so pay attention to that.

  • PySide6 applications work right out of the box with PyInstaller, and it'd also trivial to publish via a conda channel or pypi if you want people to have alternatives. In fact, the 4th bullet point implies to me that you should at least publish to pypi so that it's easy to create installable packages for other platforms (more on that later)
  • It really depends on what you want this background process to do, there are a million different patterns for doing something like this and a good solution is going to depend on what the background process is doing. You can get way into the weeds here setting up a bespoke shared memory interface or you could just be updating a flat file with a few configurable parameters that the background process occasionally polls. You should think of this bullet as being completely unrelated to the GUI - it shouldn't really matter whether you communicate with your background process via a GUI or a CLI, what matters is that you have designed and set up some clearly-defined interfaces and that your background process does what it's supposed to do even when you're not communicating with it. Developing and deploying an application that runs in the background of a system is a whole separate topic that can range from "somewhat trivial" to "insanely advanced", depending on what it does and what platforms it needs to run on. I don't think there's going to be a canned solution here that works everywhere
  • Qt applications tend to look great and are highly configurable, by default they adapt to whatever style the system is already using (e.g. Linux vs Windows vs Mac, the window will look pretty native to that platform unless you go out of your way to change it).
  • Since Qt exists on all of these platforms already, you really just need to set up some infrastructure to start doing cross-platform deployment. Unfortunately this does mean actually installing your application in an instance of each of these environments and then packaging your application on that target; a team might set up automated build pipelines to do this, that's a little bit of pain early to save a lot of pain later assuming you want this application to be something you can update and release again later. You should use docker images for this so that you can make the build and deployment process easily reproducible

C# is a suitable alternative, and might be flat out better if you don't need to rely on anything from the Python ecosystem and have Windows as your main target. It really depends. But that doesn't necessarily solve any of the issues here, either way this is a bit of a project

Thanks! That's a lot of good info. I'll go start mucking around with QT and see what my options are; although probably I'll need to write the daemon/service first and then figure out how to interact with it, and then from there it's GUI time.

theHUNGERian
Feb 23, 2006

Sorry in advance if this is not the right thread.

I am making scatter plots in Python using matplotlib, and I would like to have each data point occupy one single pixel that has one color. I am willing to deal with data points occupying multiple pixels, but then they still must be one color. I am currently getting poo poo like this:


from code like this:
code:
import matplotlib.pyplot as plt

x = [x_Data]
y = [y_Data]
color = [some_Array]

plt.scatter(x, y, c=color, cmap='hsv', marker = '.', s = (72/600)**2, lw=0)
plt.axis('off')
plt.savefig('LOL_goons.png', dpi=2400)
Playing with marker type and 's' only changes the size of the marker, but one pixel of the resulting marker is always black. How do I make all pixels of the marker the same color?

QuarkJets
Sep 8, 2008

What are the values in "color"?

theHUNGERian
Feb 23, 2006

QuarkJets posted:

What are the values in "color"?

Here are the first 5 entries:
-0.732114
2.02783
0.215997
1.63187
0.782867

Edit: I am basically trying to add color to this, where x and y are the data, and t is the color array.

theHUNGERian fucked around with this message at 03:50 on Dec 21, 2022

duck monster
Dec 15, 2004

QuarkJets posted:

oh yeah and point out to your manager that postgres is also free

And less likely to make the inhouse database guy poo poo the bed in anger*

*who me? I'm a nice greybeard. Who occasionally flips the lid in.... polite disagreement... when new hire tries to roll out a shitful mysql database filled with sql interpolations and other dangerous fauna.

QuarkJets
Sep 8, 2008

theHUNGERian posted:

Sorry in advance if this is not the right thread.

I am making scatter plots in Python using matplotlib, and I would like to have each data point occupy one single pixel that has one color. I am willing to deal with data points occupying multiple pixels, but then they still must be one color. I am currently getting poo poo like this:


from code like this:
code:
import matplotlib.pyplot as plt

x = [x_Data]
y = [y_Data]
color = [some_Array]

plt.scatter(x, y, c=color, cmap='hsv', marker = '.', s = (72/600)**2, lw=0)
plt.axis('off')
plt.savefig('LOL_goons.png', dpi=2400)
Playing with marker type and 's' only changes the size of the marker, but one pixel of the resulting marker is always black. How do I make all pixels of the marker the same color?

I think that there's something happening in whatever backend is being chosen for savefig; that's going to take your plot (which is displayed in inches with some default dpi) and then try to convert it to pixels (using the new dpi that you're providing), and something funny may happen along the way to give you the result you're seeing. You might try using a custom figure with a carefully-chosen DPI and figsize
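That custom-figure idea might look something like this (the Agg backend and all the specific numbers are illustrative, not from the original post): choose figsize and dpi together so the canvas is an exact pixel size, and size the marker relative to that dpi.

```python
# Sketch: pick figsize * dpi to get an exact pixel canvas, and size
# the marker in points relative to the same dpi (72 points = 1 inch).
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y, t = rng.random(500), rng.random(500), rng.random(500)

dpi = 100
fig = plt.figure(figsize=(600 / dpi, 600 / dpi), dpi=dpi)  # 600x600 px canvas
# s is in points^2, so (72 / dpi)**2 is roughly one pixel at this dpi
plt.scatter(x, y, c=t, cmap="hsv", marker=".", s=(72 / dpi) ** 2, lw=0)
plt.axis("off")
plt.savefig("scatter_pixels.png", dpi=dpi)
plt.close(fig)
```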

pmchem
Jan 22, 2010


theHUNGERian posted:

Edit: I am basically trying to add color to this, where x and y are the data, and t is the color array.

check out PIL (Pillow), the python image library. you can go from numpy array to color image there without dealing with the hack of a scatterplot

theHUNGERian
Feb 23, 2006

Thanks!

Mr. Nemo
Feb 4, 2016

I wish I had a sister like my big strong Daddy :(
Hello, I'm trying to scrape forum threads as a personal project, so far I got this to get information from each post in a thread

a) User
b) Date
c) The text

code:
PythonThread="https://forums.somethingawful.com/showthread.php?threadid=3812541&userid=0&perpage=40&pagenumber="
PostsDictionary={"User_Name":[],"Date":[],"Post_Content":[],"Post_Words":[]}
postsperpage=42 #We know there are 40 user posts in each page, and 2 ads which are counted as posts
for i in range(100): #the 100 is just a place holder
    j=i+1
    print("Scraping page "+str(j))
    Response=requests.get(PythonThread+str(j))
    pageTemp=BeautifulSoup(Response.text)
    for k in range(postsperpage):
        username=str(pageTemp.find_all("td",class_="userinfo")[k].find("dt"))
        usernameclean=username[username.find(">")+1:username.find("<",1)]
        if usernameclean!="Adbot": #name of the user that posts the ads previously mentioned
            PostsDictionary["User_Name"].append(usernameclean)
            date=str(pageTemp.find_all("td",class_="postdate")[k])
            dateofpost=pd.to_datetime(str(date[date.rfind("a>")+3:date.rfind("a>")+21]))
            PostsDictionary["Date"].append(dateofpost)
            text=str(pageTemp.find_all("td",class_="postbody")[k])
            text=cleanhtml(text)
            text=text.replace("\n","")
            PostsDictionary["Post_Content"].append(text)
            PostsDictionary["Post_Words"].append(len(text.split()))
It works, and I end up with a dictionary I can then transform into a Dataframe to analyze individual user data, create a .doc with the thread, etc.

But it's a bit slow. Any suggestions on how it could be improved?

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Mr. Nemo posted:

Hello, I'm trying to scrape forum threads as a personal project, so far I got this to get information from each post in a thread

a) User
b) Date
c) The text

code:
PythonThread="https://forums.somethingawful.com/showthread.php?threadid=3812541&userid=0&perpage=40&pagenumber="
PostsDictionary={"User_Name":[],"Date":[],"Post_Content":[],"Post_Words":[]}
postsperpage=42 #We know there are 40 user posts in each page, and 2 ads which are counted as posts
for i in range(100): #the 100 is just a place holder
    j=i+1
    print("Scraping page "+str(j))
    Response=requests.get(PythonThread+str(j))
    pageTemp=BeautifulSoup(Response.text)
    for k in range(postsperpage):
        username=str(pageTemp.find_all("td",class_="userinfo")[k].find("dt"))
        usernameclean=username[username.find(">")+1:username.find("<",1)]
        if usernameclean!="Adbot": #name of the user that posts the ads previously mentioned
            PostsDictionary["User_Name"].append(usernameclean)
            date=str(pageTemp.find_all("td",class_="postdate")[k])
            dateofpost=pd.to_datetime(str(date[date.rfind("a>")+3:date.rfind("a>")+21]))
            PostsDictionary["Date"].append(dateofpost)
            text=str(pageTemp.find_all("td",class_="postbody")[k])
            text=cleanhtml(text)
            text=text.replace("\n","")
            PostsDictionary["Post_Content"].append(text)
            PostsDictionary["Post_Words"].append(len(text.split()))
It works, and I end up with a dictionary I can then transform into a Dataframe to analyze individual user data, create a .doc with the thread, etc.

But it's a bit slow. Any suggestions on how it could be improved?

Have you measured how much of the slowness is making the request, waiting for a response vs your parsing code? If it's your code that's slow, which would be a little surprising, one immediate thought is that len(text.split()) is a really slow way to count words.

Mr. Nemo
Feb 4, 2016

I wish I had a sister like my big strong Daddy :(

Twerk from Home posted:

Have you measured how much of the slowness is making the request, waiting for a response vs your parsing code? If it's your code that's slow, which would be a little surprising, one immediate thought is that len(text.split()) is a really slow way to count words.

I have no idea how to do that, to be honest; how do you suggest I go about it? I took a six-class Python course and this is my final project (including some space-filler plots and other stuff). I guess I could run it twice for 100 pages, with and without that word-counter line, and see if there's a noticeable difference.

Also, the way I'm getting the names and dates is VERY manual, because I haven't had time to really dive into the HTML stuff beyond finding a "large" tag; could that be adding some time?

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Mr. Nemo posted:

I have no idea how to do that, to be honest; how do you suggest I go about it? I took a six-class Python course and this is my final project (including some space-filler plots and other stuff). I guess I could run it twice for 100 pages, with and without that word-counter line, and see if there's a noticeable difference.

Also, the way I'm getting the names and dates is VERY manual, because I haven't had time to really dive into the HTML stuff beyond finding a "large" tag; could that be adding some time?

The simplest way to test if the request is slow would be to download the html for a page into a file and read from that, and time how long it takes to do all this while reading the html from a file vs actually making the request. If you want to make the whole thing faster regardless of why it's slow, just run lots of python processes, but don't be surprised when your IP gets blocked from hitting SA because you made a bunch of automated requests.

If you want to actually measure why the whole thing is slow, break out a profiler. cProfile and snakeviz would be a decent place to start, but if you fire up scalene it's nicer.

https://github.com/plasma-umass/scalene

Edit: my suggestion to fire up a lot of python processes assumes you're on a recent computer with 8+ cores.

Twerk from Home fucked around with this message at 23:45 on Dec 21, 2022
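For a concrete starting point with cProfile, here's minimal usage with pstats (parse_pages below is a made-up stand-in workload; point the profiler at the real scraping function instead):

```python
# Profile a function with cProfile, then print the top entries
# sorted by cumulative time. parse_pages is a stand-in workload.
import cProfile
import io
import pstats

def parse_pages():
    # Pretend to "parse" 100 pages of fake post text
    total = 0
    for _ in range(100):
        total += sum(len(word) for word in ("some post text " * 200).split())
    return total

profiler = cProfile.Profile()
profiler.enable()
result = parse_pages()
profiler.disable()

# Capture the report as a string; sort by cumulative time, top 5 rows
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
```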

necrotic
Aug 2, 2005
I owe my brother big time for this!
You repeat the same element selection query for every post (pageTemp.find_all("td", class_="postdate") and the like). It would probably speed things up a little to run those before the posts loop and then access the arrays from there.

Besides that, how long is it taking? It could just be the requests taking most of the time, which could only be improved by parallelizing, and I doubt the admins would appreciate that from a scraper.
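The suggested restructuring, in miniature (expensive_query is a hypothetical stand-in for the pageTemp.find_all calls):

```python
# Hoist the per-page queries out of the per-post loop: run each query
# once per page, then index into the saved lists inside the loop.
QUERY_CALLS = 0

def expensive_query(page, selector):
    # Stand-in for pageTemp.find_all("td", class_=selector)
    global QUERY_CALLS
    QUERY_CALLS += 1
    return [f"{selector}-{k}" for k in range(len(page))]

page = list(range(42))  # 42 "posts" on the page

# Query once per page instead of once per post:
userinfo_cells = expensive_query(page, "userinfo")
postdate_cells = expensive_query(page, "postdate")

rows = []
for k in range(len(page)):
    rows.append((userinfo_cells[k], postdate_cells[k]))
```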

QuarkJets
Sep 8, 2008

necrotic posted:

You repeat the same element selection query for every post ( pageTemp.find_all("td",class_="postdate") and the like). It would probably speed things up a little to run those before the posts loop and then access the arrays from there.

I think this may be a big part of it OP, basically inside the loop you're running these huge queries and then grabbing a specific element; moving those queries out of the loop will probably help

timbro
Jul 18, 2004

eXXon posted:

Am I wrong for wanting to implement immutable objects as frozen dataclasses with cached properties? For example:

Python code:
from dataclasses import dataclass
from functools import cached_property

@dataclass(frozen=True)
class Product:
	x: int = 1
	y: int = 2

	@cached_property
	def xy(self):
		return self.x * self.y

Can I ask what the code and syntax of the x and y variables is? I have used Python for a bit but haven't seen this with the semi colon.

marshalljim
Mar 6, 2013

yospos
Those are type hints (and those are colons).

QuarkJets
Sep 8, 2008

timbro posted:

Can I ask what is the code and syntax of the x and y variables I have used python for a bit but haven’t seen this with the semi colon?

What you're seeing is the declaration of a dataclass with typehints
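To make that concrete, here's the quoted Product example again with comments on what each annotation does (the extra instance at the end is mine, for illustration):

```python
# "x: int = 1" is a type-annotated class attribute; @dataclass turns
# each annotated attribute into an __init__ parameter with that default.
from dataclasses import dataclass
from functools import cached_property

@dataclass(frozen=True)
class Product:
    x: int = 1  # annotation says x is an int; 1 is its default value
    y: int = 2

    @cached_property
    def xy(self):
        # computed on first access, then cached on the instance
        return self.x * self.y

p = Product(x=3)  # y falls back to its default of 2
```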

duck monster
Dec 15, 2004

OMFG, GDAL has to be the worst thing ever invented.

It's absolutely non-negotiable as a requirement for any serious geospatial work.
(Or in fact any geospatial work if you wanna use Python. But in this case, serious geospatial work: a front-end API to a Geoserver install.)
There is absolutely no binary compatibility between even dot versions.
The API is unstable.
The Python versions MUST be matched to the exact build version of the library.
The Docker containers featuring it seem to either stop at an ancient version of Python or an ancient version of GDAL.
Package managers don't understand the need for the OS and Python versions to be precisely aligned.
Docker builds take about 10-15 minutes per cycle.
And usually explode.
I can't afford a lot of alcohol right now thanks to the Taxman loving me this year.
Also Pipenv requires a modern version of setuptools which isn't compatible with the version of libgdal that's installed in almost any of the available Docker images for Python.

This is misery!

duck monster fucked around with this message at 09:38 on Dec 22, 2022

QuarkJets
Sep 8, 2008

Any chance you're using conda environments? It looks like the gdal package in the conda-forge channel is being kept up to date, so someone else has already worked out the build issues for a bunch of platforms and python versions: https://anaconda.org/conda-forge/gdal

I'm seeing python 3.8-3.11 for the latest release, at least
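If conda is on the table, a minimal environment.yml along these lines (the version pin is illustrative and untested) lets conda-forge handle the GDAL/Python alignment for you:

```yaml
# Hypothetical environment spec; versions are illustrative
name: geo
channels:
  - conda-forge
dependencies:
  - python=3.11
  - gdal
```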

boofhead
Feb 18, 2021

Speaking of conda, I was checking it out and wanted to fool around with spacy transformers with hunspell, and I just ran into so many issues. Conda didn't have all the packages, so I had to install some through pip from within conda; then the hunspell install fails because it can't find the C++ build tools (although I'm pretty sure I installed them, and it worked through normal pip, IIRC). I think a big part of the problem is trying to do it on Windows. I'm going to give it another go on Linux tonight, but the missing conda packages didn't inspire a huge amount of confidence. Is there a significant gap between pip and conda for supported packages? Or am I just jumping in head first and missing some rudimentary conda concepts?

theHUNGERian
Feb 23, 2006

theHUNGERian posted:

Sorry in advance if this is not the right thread.

I am making scatter plots in python using matplot lib, and I would like to have each data point occupy one single pixel that has one color. I am willing to deal with data points occupying multiple pixels, but then they still must be one color. I am currently getting poo poo like this:


from code like this:
code:
import matplotlib.pyplot as plt

x = [x_Data]
y = [y_Data]
color = [some_Array]

plt.scatter(x, y, c=color, cmap='hsv', marker='.', s=(72/600)**2, lw=0)
plt.axis('off')
plt.savefig('LOL_goons.png', dpi=2400)
Playing with marker type and 's' only changes the size of the marker, but one pixel of the resulting marker is always black. How do I make all pixels of the marker the same color?

Yeah ok, I am an idiot. I made two plots, one in black and white, another one in color, and since the black and white plot looked fine, I didn't include its code in my OP.

Turns out that placing
code:
plt.figure().clear()
between the plots removed the black pixels from the color plot.
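For anyone else who hits this, a minimal self-contained repro of the fix (synthetic data, filenames and dpi are just examples, not the original code):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y, color = rng.random(500), rng.random(500), rng.random(500)

# First plot: black and white
plt.scatter(x, y, color="black", marker=".", s=(72 / 600) ** 2, lw=0)
plt.axis("off")
plt.savefig("bw.png", dpi=300)

# Without this, the colored scatter below is drawn into the SAME figure,
# on top of the black markers that are already there
plt.figure().clear()

# Second plot: colored, now starting from an empty figure
plt.scatter(x, y, c=color, cmap="hsv", marker=".", s=(72 / 600) ** 2, lw=0)
plt.axis("off")
plt.savefig("color.png", dpi=300)
```

`plt.figure()` creates a fresh figure and makes it current, so the second scatter no longer shares axes with the first.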

QuarkJets
Sep 8, 2008

boofhead posted:

Speaking of conda, I was checking it out and wanted to fool around with spacy transformers with hunspell and i just ran into so many issues, like conda didn't have all the packages so I had to install some through pip through conda, then hunspell install fails because it can't find the C++ build tools (although I'm pretty sure I installed them, it worked through normal pip iirc). I think a big part of the problem is trying to do it in windows, I'm going to give it another go on Linux tonight, but the missing conda packages didn't inspire a huge amount of confidence.. is there a significant gap between pip and conda for supported packages? Or am I just jumping in head first and missing some rudimentary conda concepts

I have never encountered an anaconda or conda-forge package that failed to install dependencies, but I don't usually use Windows. What channel are you using? It wouldn't surprise me if this was some bit of Microsoft fuckery that requires extra steps, since you mentioned needing a Windows compiler

I only use pip as a source of last resort, for really obscure stuff that has no conda package. That step takes place after installing everything else I need with mamba. You shouldn't need pip for conda dependencies, if that's happening then something has gone deeply wrong. Oh, and if you use pip you need to pay attention to what it's doing - pip will happily try and replace packages with a different version, if it decides that's required, but that might break any of your conda packages

If you use 3rd party channels then you're in the wild west, like pypi basically. I prefer conda-forge for most things, since that's exclusively open source packages and tends to update a lot faster than the anaconda channel.
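In practice the "pip last" ordering looks something like this (env name and package names are only illustrative):

```shell
mamba create -n myenv -c conda-forge python=3.11 numpy libtiff
conda activate myenv
# only after everything conda-installable is in place:
pip install some-obscure-pypi-only-package
```

If pip starts announcing that it wants to uninstall/replace something mamba put there, stop and rethink rather than letting it.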

OnceIWasAnOstrich
Jul 22, 2006

QuarkJets posted:

I have never encountered an anaconda or conda-forge package that failed to install dependencies, but I don't usually use Windows. What channel are you using? It wouldn't surprise me if this was some bit of Microsoft fuckery that requires extra steps, since you mentioned needing a Windows compiler

I only use pip as a source of last resort, for really obscure stuff that has no conda package. That step takes place after installing everything else I need with mamba. You shouldn't need pip for conda dependencies, if that's happening then something has gone deeply wrong. Oh, and if you use pip you need to pay attention to what it's doing - pip will happily try and replace packages with a different version, if it decides that's required, but that might break any of your conda packages

If you use 3rd party channels then you're in the wild west, like pypi basically. I prefer conda-forge for most things, since that's exclusively open source packages and tends to update a lot faster than the anaconda channel.

To follow up on this, I find almost all dependency problems go away when you use conda-forge (and maybe some high-quality channels like bioconda if applicable), remove the Anaconda "defaults" repo, and set channel_priority = strict. As a side bonus everything is nicely open-source and appropriately licensed.

And yes, always do Pip steps last and if you must run conda/mamba again in an env after running Pip, don't. Just delete the environment and build it again in the right order, this is what envs are for.
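That setup is only a few lines of `~/.condarc` (the bioconda line only if you actually need it):

```yaml
channels:
  - conda-forge
  - bioconda
channel_priority: strict
```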

duck monster
Dec 15, 2004

QuarkJets posted:

I have never encountered an anaconda or conda-forge package that failed to install dependencies, but I don't usually use Windows. What channel are you using? It wouldn't surprise me if this was some bit of Microsoft fuckery that requires extra steps, since you mentioned needing a Windows compiler

I only use pip as a source of last resort, for really obscure stuff that has no conda package. That step takes place after installing everything else I need with mamba. You shouldn't need pip for conda dependencies, if that's happening then something has gone deeply wrong. Oh, and if you use pip you need to pay attention to what it's doing - pip will happily try and replace packages with a different version, if it decides that's required, but that might break any of your conda packages

If you use 3rd party channels then you're in the wild west, like pypi basically. I prefer conda-forge for most things, since that's exclusively open source packages and tends to update a lot faster than the anaconda channel.

I haven't really got any experience with conda. I've always been a bit suspicious of it as fragmenting the main package system. But that's just a dumb bias that I really ought to get over.

Re version, it needs to be 3.10+ as I kinda went ham on the union types with FastAPI (I've always been an advocate of union types after using Crystal's absolutely brilliant union-based type safety system). I ended up solving it by building my own version of gdal hard-matched to the Python gdal bindings and getting a docker container up that ran. Turns out the secret was loving off Pipenv. Again, great idea, frustratingly broken implementation.

QuarkJets
Sep 8, 2008

duck monster posted:

I havent really got any experience with conda. I've always been a bit suspicious of it as fragmenting the main package system. But thats just a dumb bias that I really ought get over.

Re version, needs to be 3.10+ as I kinda went ham on the Union types with FastAPI (I've always been an advocate of Union types after using Crystallangs absolutely brilliant union based type safety system.) I ended up solving it by just building my own version of gdal hard matched to the version of python gdal and got up a docker container that ran. Turns out the secret was loving off PipEnv. Again, great idea, frustratingly broken implementation.

Pip, conda, and docker provide distinct levels of virtualization, from most lightweight to least. Conda provides Python packages, but it also provides non-Python dependencies; that second part is what makes conda useful and distinct from pip.

It would not be accurate to say that conda fragments the package system. It is a separate tool for managing your non-Python dependencies. I've seen a lot of docker recipes that use mamba to create a conda environment just because a lot of base containers are pretty lightweight (by design) and OS repositories are pretty limiting (also by design)
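A typical shape for those recipes, just as a sketch (the micromamba base image is real, but the tag, file names, and package list here are placeholders):

```dockerfile
# Illustrative only; pin your own tag and packages in env.yml
FROM mambaorg/micromamba:latest
COPY env.yml /tmp/env.yml
RUN micromamba install -y -n base -f /tmp/env.yml && \
    micromamba clean --all --yes
COPY app/ /app/
CMD ["python", "/app/main.py"]
```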

QuarkJets
Sep 8, 2008

As a basic example, CUDA (nvidia gpu processing) requires the CUDA toolkit to be installed. The CUDA toolkit is not a python package, you cannot install it with pip.

You can manage this dependency yourself, taking care to install the specific version of the cuda toolkit that is required by the python packages that you want to use and you can update the toolkit yourself whenever you want to update your python packages. This is fine, but it would be nice if we didn't have to manage that external dependency ourselves.

If you use conda instead all of this happens automatically, you just install the package you want and the correct version of the cuda toolkit is installed too.

This extends to all kinds of dependencies. On a system that doesn't have libtiff installed? No problem, there's a conda package for that. Need the Intel MKL? No problem. Need libhdf5? No problem. Etc
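Concretely, with cupy as the example package (exact dependency names vary by channel and release):

```shell
# installing a CUDA-dependent package from conda-forge pulls in a
# compatible CUDA toolkit automatically; pip can't do this part
mamba install -c conda-forge cupy
```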

boofhead
Feb 18, 2021

^ That's precisely my use case, the stuff I want to play around with requires CUDA and a bunch of other non-python dependencies, and I think what was happening (even on linux, once I swapped away from windows) was that standard conda, plus conda-forge, was struggling to figure it all out, which resulted in some weird bugs and then just hanging / absurdly long install times where it looks like it's crashed. I found some stuff that suggested mamba, installed the solver library for that, and everything went smoothly with mamba doing the dependency resolution

I'm definitely going to use this more in future, pip winds up doing some bizarre things so I'm happy to have finally stumbled into what looks like a better way of doing things. All in all I'm just happy everything is installed and I can finally start playing around with it
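For anyone else stuck on the hanging solver: the solver library boofhead mentions can also be wired into plain conda without switching tools entirely (per the conda docs; needs a reasonably recent conda):

```shell
conda install -n base conda-libmamba-solver
conda config --set solver libmamba
```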


QuarkJets
Sep 8, 2008

boofhead posted:

^ That's precisely my use case, the stuff I want to play around with requires CUDA and a bunch of other non-python dependencies, and I think what was happening (even on linux, once I swapped away from windows) was the standard conda, plus conda-forge, was struggling with figuring it all out and that was resulting in some weird bugs and then just hanging / absurdly install times where it just looks like it's crashed. I found some stuff that suggested mamba, installed the solver library for that, and everything went smoothly with mamba doing the package dependency calculating stuff

I'm definitely going to use this more in future, pip winds up doing some bizarre things so I'm happy to have finally stumbled into what looks like a better way of doing things. All in all I'm just happy everything is installed and I can finally start playing around with it

Yeah mamba is simply better, it is a lot more optimized
