Dominoes
Sep 20, 2007

Hey bros. I'm trying to partially implement PEP 582. I'd like to direct python to make virtualenvs etc, but it's murky how I'd do that. I'm going to assume there's an alias available, but how do I check what it is? Ie it could be `python`, `python3`, `python3.7` etc.

One check will be forcing the user to specify a range of versions in a `TOML` file, and checking that `python[3] --version` matches it.

edit: Further RFI: I've got the script mostly working, in that it automates operating from a venv in a way that's smoother than Pipenv and Poetry (major caveat: no locking/dependency deconfliction), but it uses its packages in an in-project-folder venv, i.e. the usual `venv/a few directories/site-packages`, instead of a top-level `__pypackages__`. Do y'all know of any ways around this, so the venv will point to a diff directory?

Also, note that I'm making this as a binary file (coded in Rust), to avoid the headache of ensuring the right python install (2? 3? 3.7? user? sudo?) is executing the script, which can be an issue on Linux, esp for new users, and bit me recently with pipenv.

Dominoes fucked around with this message at 13:59 on Jul 15, 2019


Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Dominoes posted:

Hey bros. I'm trying to partially implement PEP 582. I'd like to direct python to make virtualenvs etc, but it's murky how I'd do that. I'm going to assume there's an alias available, but how do I check what it is? Ie it could be `python`, `python3`, `python3.7` etc.

One check will be forcing the user to specify a range of versions in a `TOML` file, and checking that `python[3] --version` matches it.

edit: Further RFI: I've got the script mostly working, in that it automates operating from a venv in a way that's smoother than Pipenv and Poetry (major caveat: no locking/dependency deconfliction), but it uses its packages in an in-project-folder venv, i.e. the usual `venv/a few directories/site-packages`, instead of a top-level `__pypackages__`. Do y'all know of any ways around this, so the venv will point to a diff directory?

Also, note that I'm making this as a binary file (coded in Rust), to avoid the headache of ensuring the right python install (2? 3? 3.7? user? sudo?) is executing the script, which can be an issue on Linux, esp for new users, and bit me recently with pipenv.


Since people can have multiple interpreters, you've got to have it so that you either ask the user or have a (required?) argument to see which python interpreter they want to use. Avoid using the alias because that could be anything. You have to use an absolute path to the interpreter.

IIRC, some of these types of tools have a ton of checks looking for various places people could have interpreters installed. That is a pretty hairy situation though since sometimes you might have the same python version installed in more than one place.

I think the best practice would be to just ask the user to provide the path to the interpreter.
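
A rough sketch of the "probe the PATH, then confirm with the user" approach in Python (the candidate names and version range are just examples; note that `--version` went to stderr on Python 2):

Python code:
import shutil
import subprocess

def find_interpreters():
    """Probe common interpreter names on PATH and report what each one says it is."""
    candidates = ["python", "python2", "python3"] + [f"python3.{minor}" for minor in range(4, 13)]
    found = {}
    for name in candidates:
        path = shutil.which(name)
        if path is None:
            continue
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        found[path] = (result.stdout or result.stderr).strip()
    return found

if __name__ == "__main__":
    for path, version in find_interpreters().items():
        print(f"{version}: {path}")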

Dominoes
Sep 20, 2007

Thermopyle posted:

IIRC, some of these types of tools have a ton of checks looking for various places people could have interpreters installed. That is a pretty hairy situation though since sometimes you might have the same python version installed in more than one place.

I think the best practice would be to just ask the user to provide the path to the interpreter.
I think this may be the best soln: Try to find interpreters, then ask. It won't be elegant, but hopefully this will be hidden from the user. The current situation is hosed.

Master_Odin
Apr 15, 2010

My spear never misses its mark...

ladies
You can pretty easily just iterate through possible python versions available on the path?

In bash for Linux with "future-proofing":
code:
if [ -x "$(command -v "python")" ]; then
  echo "Found python"
fi
for MAJOR in 2 3 4; do
  if [ -x "$(command -v "python${MAJOR}")" ]; then
    echo "Found python${MAJOR}"
  fi
  for MINOR in {0..20}; do
    if [ -x "$(command -v "python${MAJOR}.${MINOR}")" ]; then
      echo "Found python${MAJOR}.${MINOR}"
    fi
  done
done
Would probably expand this to get the actual version from each alias (and potentially follow symlinks, but probably not) before asking the user.

I believe pipenv does something similar when you specify --three or whatever when creating an environment. You could also test for tools like pyenv if you wanted to get really fancy for people who install versions of python but don't add them to their path, but trying to deal with users who have python installed but not on the path is a pointlessly hard endeavor.

For the second point, you could potentially use the
code:
--target
option on pip to specify the __pypackages__ directory and then just append that directory to PYTHONPATH when you activate the environment?
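
A minimal sketch of that idea (the package name is arbitrary, and the `__pypackages__` layout loosely follows the PEP 582 draft):

Python code:
import subprocess
import sys
from pathlib import Path

# Project-local package directory, loosely following the PEP 582 draft layout
pkg_dir = Path("__pypackages__") / f"{sys.version_info.major}.{sys.version_info.minor}" / "lib"

# Install into that directory instead of a venv's site-packages
subprocess.run(
    [sys.executable, "-m", "pip", "install", "--target", str(pkg_dir), "requests"],
    check=True,
)

# At run time, make it importable (the equivalent of prepending it to PYTHONPATH)
sys.path.insert(0, str(pkg_dir))
import requests  # now resolved from __pypackages__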

Master_Odin fucked around with this message at 17:17 on Jul 15, 2019

Dominoes
Sep 20, 2007

Master_Odin posted:

For the second point, you could potentially use the
code:
--target
option on pip to specify the __pypackages__ directory and then just append that directory to PYTHONPATH when you activate the environment?
Nailed it; working.

unpacked robinhood
Feb 18, 2013

by Fluffdaddy
Are those statements equivalent:

Python code:
# df1: pandas.DataFrame
# df2: pandas.DataFrame

df1["extra_column"] = df2["interesting_column"]
# and
df1["extra_column"] = df2["interesting_column"].to_list()
My quick toy example points to yes but I'd rather be sure.

e: vvv
Thanks !

unpacked robinhood fucked around with this message at 21:02 on Jul 18, 2019

cinci zoo sniper
Mar 15, 2013




unpacked robinhood posted:

Are those statements equivalent:

Python code:
# df1: pandas.DataFrame
# df2: pandas.DataFrame

df1["extra_column"] = df2["interesting_column"]
# and
df1["extra_column"] = df2["interesting_column"].to_list()
My quick toy example points to yes but I'd rather be sure.

I’m not quite certain, but I wouldn’t be surprised to learn that given matching but differently ordered indices in df1 and df2 you will get different insert orders into df1.extra_column.

SurgicalOntologist
Jun 17, 2004

Yeah those could be different depending on the indices. With tolist you'll lose that information.
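
A tiny made-up example of the difference:

Python code:
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
df2 = pd.DataFrame({"interesting_column": ["x", "y", "z"]}, index=[2, 1, 0])

# Assigning the Series aligns on the index: df1's row 0 gets df2.loc[0], i.e. "z"
df1["aligned"] = df2["interesting_column"]
# to_list() drops the index, so values are taken positionally: "x", "y", "z"
df1["positional"] = df2["interesting_column"].to_list()
print(df1)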

The Fool
Oct 16, 2003


I'm merging a couple csv's with pandas and am having issues with numbers getting rounded up when using pandas.read_csv.

For example: 1904.9999 becomes 1905

Any suggestions? I spent a couple hours on google trying to figure this out and nothing I've tried has worked.

cinci zoo sniper
Mar 15, 2013




The Fool posted:

I'm merging a couple csv's with pandas and am having issues with numbers getting rounded up when using pandas.read_csv.

For example: 1904.9999 becomes 1905

Any suggestions? I spent a couple hours on google trying to figure this out and nothing I've tried has worked.

low_memory=False, float_precision="high"; and fill out the dtype argument.
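
Something like this (the path and column name are just placeholders):

Python code:
import pandas as pd

df = pd.read_csv(
    "merged_input.csv",                # placeholder path
    low_memory=False,
    float_precision="high",            # or "round_trip"
    dtype={"some_column": "float64"},  # spell out dtypes for the columns that matter
)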

The Fool
Oct 16, 2003


float_precision on its own wasn't working, but adding the other two arguments did.

Thanks.

Master_Odin
Apr 15, 2010

My spear never misses its mark...

ladies
I guess Kenneth Reitz is getting out of the game? https://github.com/not-kennethreitz/team/issues/21

I sincerely hope that requests falls into the PSF organization and that they take over governance. It'll also be interesting to see if we do end up with another event-stream event.

xtal
Jan 9, 2011

by Fluffdaddy

Master_Odin posted:

I guess Kenneth Reitz is getting out of the game? https://github.com/not-kennethreitz/team/issues/21

I sincerely hope that requests falls into the PSF organization and that they take over governance. It'll also be interesting to see if we do end up with another event-stream event.

Oh poo poo, what's the drama this time?

mr_package
Jun 13, 2000
Just noticed you can add arbitrary attributes to a dataclass, e.g. you can just do my_dataclass.asdf = "asdf" and then my_dataclass.asdf returns "asdf" should you ever need it. Printing my_dataclass doesn't show it, because the field isn't added to __repr__, but you can print(my_dataclass.asdf).

Is this a side effect of dataclass design or is it a feature we are intended to use?

edit: this might be normal; I think I've seen it before with other class types, but I've never used it?

necrotic
Aug 2, 2005
I owe my brother big time for this!
You can do that on an instance of any class.

https://docs.python.org/3/tutorial/classes.html#odds-and-ends
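
For example, with a plain class (it only breaks down if the class defines __slots__):

Python code:
class Plain:
    pass

obj = Plain()
obj.asdf = "asdf"    # ad-hoc attribute, stored in obj.__dict__
print(obj.asdf)      # asdf
print(obj.__dict__)  # {'asdf': 'asdf'}

class Slotted:
    __slots__ = ("x",)

# Slotted().asdf = "nope"  # would raise AttributeError: __slots__ blocks ad-hoc attributes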

KICK BAMA KICK
Mar 2, 2009

Think this is a design question with some Python specific aspects: I've got some code that periodically queries an API, processes any new data it finds, and sends that off to another API. This all works fine as a console app I just run in a screen on a $5 Nanode. What I want to do is build a web interface for controlling it (start/stop the polling, check the status if it's currently processing some input, which can take a few minutes) and viewing the incoming data and processed results. It'd be just for my benefit -- no one else would ever interact with this -- but it's an excuse to learn web stuff and refactor the spaghetti I currently have.

So I started learning Django; worked out the models and views, parsed my logs to pull in all of the work that's already been done into a database, so far so good. What I'm stuck on is, where does the code that does the work go and how does my Django code interact with it? I'm not even sure what to ask more specifically -- like when I navigate to the page that has the "start the thing" button, what should the corresponding View do when I click it? Am I spawning a thread, a new process? Where does the reference to whatever you'd need to stop the polling (or just check that it's still alive) live? How should the polling/processing code communicate that it's got input (and create an IncomingThing in the database) or that it's finished processing (and create a corresponding ResultThing)?

I get that this is broad and I'm dumb so if you just want to point me in the direction of some concepts (or some existing code that does this kind of thing) even that would help, I was at a loss for what to Google. Thanks!

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

KICK BAMA KICK posted:

Think this is a design question with some Python specific aspects: I've got some code that periodically queries an API, processes any new data it finds, and sends that off to another API. This all works fine as a console app I just run in a screen on a $5 Nanode. What I want to do is build a web interface for controlling it (start/stop the polling, check the status if it's currently processing some input, which can take a few minutes) and viewing the incoming data and processed results. It'd be just for my benefit -- no one else would ever interact with this -- but it's an excuse to learn web stuff and refactor the spaghetti I currently have.

So I started learning Django; worked out the models and views, parsed my logs to pull in all of the work that's already been done into a database, so far so good. What I'm stuck on is, where does the code that does the work go and how does my Django code interact with it? I'm not even sure what to ask more specifically -- like when I navigate to the page that has the "start the thing" button, what should the corresponding View do when I click it? Am I spawning a thread, a new process? Where does the reference to whatever you'd need to stop the polling (or just check that it's still alive) live? How should the polling/processing code communicate that it's got input (and create an IncomingThing in the database) or that it's finished processing (and create a corresponding ResultThing)?

I get that this is broad and I'm dumb so if you just want to point me in the direction of some concepts (or some existing code that does this kind of thing) even that would help, I was at a loss for what to Google. Thanks!

What you're looking for is called a task queue.

The canonical solution is called Celery. However, it's very configuration-heavy because it's Enterprise Grade.

95% of the time I prefer python-rq. It's simple with fine docs.

Behind the scenes the basic idea works like this:

You have a server like Redis running. Your webserver python code puts a message into Redis. Your task queue python code sees that message and runs the tasks you've configured to run when such a message appears.
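
A bare-bones python-rq sketch of that flow (module/function names are made up; assumes a local Redis):

Python code:
# tasks.py -- the slow work, in an importable module so the worker can find it
import time

def poll_and_process(source_url):
    time.sleep(60)  # stand-in for the slow polling/processing
    return f"processed {source_url}"


# web.py -- enqueue the work instead of doing it inside the request
from redis import Redis
from rq import Queue

from tasks import poll_and_process

queue = Queue(connection=Redis())
job = queue.enqueue(poll_and_process, "https://example.com/api")
print(job.id)  # hang on to this to check on the job later

# in a separate shell, a worker picks jobs off the queue:
#   $ rq worker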

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Master_Odin posted:

I guess Kenneth Reitz is getting out of the game? https://github.com/not-kennethreitz/team/issues/21

I sincerely hope that requests falls into the PSF organization and that they take over governance. It'll also be interesting to see if we do end up with another event-stream event.

I hope someone hires him.

Looks like PSF is taking over all of KR's stuff.

Also, in reading that issue thread I found out that PSF now administers Black!

Eventually, the black repo will be moving out of the python repo and into the psf repo.

Master_Odin
Apr 15, 2010

My spear never misses its mark...

ladies

Thermopyle posted:

I hope someone hires him.

Looks like PSF is taking over all of KR's stuff.

Also, in reading that issue thread I found out that PSF now administers Black!

Eventually, the black repo will be moving out of the python repo and into the psf repo.
He's already given away quite a few of them; the remaining real high-profile ones are just requests, records, and clint. But yeah, it's cool that the PSF is taking over some of the projects that are indispensable to the community, as I trust them not to ghost a repo.

KICK BAMA KICK
Mar 2, 2009

Thermopyle posted:

What you're looking for is called a task queue.
Thanks! Lmao I actually knew about these (I remember looking up Huey when someone mentioned it here a few weeks ago) and had thought about using it to schedule processing some old data during downtime when new stuff wouldn't be coming in. Somehow never occurred to me to use it for the main loop itself.

Dominoes
Sep 20, 2007

mr_package posted:

Just noticed you can add arbitrary attributes to a dataclass, e.g. you can just do my_dataclass.asdf = "asdf" and then my_dataclass.asdf returns "asdf" should you ever need it. Printing my_dataclass doesn't show it, because the field isn't added to __repr__, but you can print(my_dataclass.asdf).

Is this a side effect of dataclass design or is it a feature we are intended to use?

edit: this might be normal; I think I've seen it before with other class types, but I've never used it?
Sounds like a trap.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Thermopyle posted:

What you're looking for is called a task queue.

The canonical solution is called Celery. However, it's very configuration-heavy because it's Enterprise Grade.

95% of the time I prefer python-rq. It's simple with fine docs.

Behind the scenes the basic idea works like this:

You have a server like Redis running. Your webserver python code puts a message into Redis. Your task queue python code sees that message and runs the tasks you've configured to run when such a message appears.

Me and another dev just made babbys first deployed web app and this is exactly what we did for a function that takes about 3 minutes to run. As a pro tip on the RQ/Redis/Flask/Dash combo, it doesn't play nice on Windows 10. The worker.py file we had to use to grab poo poo out of the queue didn't work, so stuff just stacked up in Redis.

The front end "web" worker times out after 30s on Heroku, so we also had to figure out how to not use a while loop to ask the queue if our jobs were done yet. Still kinda working on that last bit but it was confusing for me for a while.
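
One common pattern for that last bit (just a sketch; the route and names are made up): have the page poll a lightweight status endpoint that asks rq about the job by id, instead of blocking inside the web request:

Python code:
from flask import Flask, jsonify
from redis import Redis
from rq.job import Job

app = Flask(__name__)
redis_conn = Redis()

@app.route("/jobs/<job_id>")
def job_status(job_id):
    # The browser hits this every few seconds until the job is done
    job = Job.fetch(job_id, connection=redis_conn)
    return jsonify(
        status=job.get_status(),  # queued / started / finished / failed
        result=job.result if job.is_finished else None,
    )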

CarForumPoster fucked around with this message at 02:58 on Jul 19, 2019

SurgicalOntologist
Jun 17, 2004

mr_package posted:

Just noticed you can add arbitrary attributes to a dataclass, e.g. you can just do my_dataclass.asdf = "asdf" and then my_dataclass.asdf returns "asdf" should you ever need it. Printing my_dataclass doesn't show it, because the field isn't added to __repr__, but you can print(my_dataclass.asdf).

Is this a side effect of dataclass design or is it a feature we are intended to use?

edit: this might be normal; I think I've seen it before with other class types, but I've never used it?

You can do this with most objects, even functions. The whole "consenting adults" thing: Python won't stop you from doing something stupid.

Dominoes
Sep 20, 2007

Trivia: In old versions of Python, running with the `--version` arg outputs to stderr.

Q: The location inside a venv for executables and custom scripts (eg python, ipython, pip etc) is `bin` on Ubuntu, and `Scripts` on Windows. Are there any other names it could have?

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

CarForumPoster posted:

Still kinda working on that last bit but it was confusing for me for a while.

This reminds me of when I was first getting into web stuff...it was very confusing and nebulous and magical for a long time to me. Not that this is you, but it reminds me to post something I try to post every once in awhile to help the next person in my position from years ago:

The whole idea that your whole program runs from beginning to completion for every request to the webserver, along with the consequences of that, just took a long time to sink in.

Django or Flask or Whatever has all sorts of fancy trappings to make structuring your software easier, but it all boils down behind the scenes to a single function that a webserver calls. The function takes data from the request like query params, headers, and POST data as arguments and returns a thing containing a string describing the response.

However long this function takes mostly determines how fast your application is.

I always tell people having a hard time figuring out whats going on to try writing a toy HTTP server, it's not terribly hard, there's lots of tutorials, and it really helps you grok wtf is going on.
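
The "single function" here is essentially the WSGI interface; a toy version using only the standard library:

Python code:
from wsgiref.simple_server import make_server

def application(environ, start_response):
    # Called once per request: inspect the request, return the response body
    body = f"You asked for {environ['PATH_INFO']}".encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]

if __name__ == "__main__":
    with make_server("", 8000, application) as server:
        server.serve_forever()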

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Thermopyle posted:

This reminds me of when I was first getting into web stuff...it was very confusing and nebulous and magical for a long time to me.

This is me

mr_package
Jun 13, 2000
If I write a dataclass where one of the fields is a dictionary of other dataclass objects, you end up addressing them in a mixed format (dots and brackets, e.g. my_dataclass.fruits['apple'].weight). Is there a good way to nest dictionaries without mixing addressing schemes like this? Am I just plain Doing It Wrong by making a dataclass turducken, or is it ok and normal to work with them in this way?

I've been trying to think of a way to use a frozen dataclass as the key to the dictionary, but so far I don't see a way forward. In some ways the best solution is to use an unordered collection, but that means iteration, and for a large enough number of fruits that will probably become too slow.

QuarkJets
Sep 8, 2008

mr_package posted:

If I write a dataclass where one of the fields is a dictionary of other dataclass objects you end up addressing them in a mixed format (dots and brackets e.g. my_dataclass.fruits['apple'].weight). Is there a good way to nest dictionaries without mixing addressing schemes like this? Am I just plain Doing It Wrong by making a dataclass turducken or it's ok and normal to work with them in this way?

I've been trying to think of a way using a frozen dataclass as the key to the dictionary but so far do not see a way forward. In some ways best solution is to use unordered collection but that means iteration and for a large enough value of fruits that will probably become too slow.

Since you already have a wrapper class, just give it various attr methods that pass through to the dictionary it possesses. E.g. The parent's __getattr__ could return whatever is returned by the dictionary's get() method. That should let you invoke things like my_dataclass.apple.weight

mr_package
Jun 13, 2000

QuarkJets posted:

Since you already have a wrapper class, just give it various attr methods that pass through to the dictionary it possesses. E.g. The parent's __getattr__ could return whatever is returned by the dictionary's get() method. That should let you invoke things like my_dataclass.apple.weight

Thank you, this works perfectly with just two simple lines:
code:
    def __getattr__(self, item):
        return self.fruits.get(item)
However, it breaks if I have two dictionary fields in the main class I want to use this with, say 'vegetables'. Is there a way to override the __getattr__ method on each field? Then I could have my_dataclass.fruits.apple.weight or my_dataclass.vegetables.carrots.price, that kind of thing. I don't know if there's a simple way for each to override their own, but I did make it work by creating a wrapper class in between.

So basically:
code:
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict

@dataclass
class FruitsAndVeggies:
    fruits: FruitWrapper
    vegetables: VegetableWrapper  # VegetableWrapper would mirror FruitWrapper

@dataclass
class FruitWrapper:
    fruits: Dict[str, Fruit]

    def __getattr__(self, item):
        return self.fruits.get(item)

@dataclass
class Fruit:
    name: str
    weight: float
    price: float
I'm not sure this is the most elegant code I've ever seen, but it does seem to work. I'm thinking there's an approach that uses inheritance to do the same thing--- or any other simpler/better way? Perhaps calling it a wrapper is wrong; I should call it FruitsCollection, and then it seems to make more sense and look less ugly.

QuarkJets
Sep 8, 2008

That looks like it should work as-is;

Python code:
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Fruit:
    name: str
    weight: float
    price: float

@dataclass
class FruitWrapper:
    fruits: Dict[str, Fruit] = field(default_factory=dict)

    def __getattr__(self, item):
        return self.fruits.get(item)

@dataclass
class VegetableWrapper:
    vegetables: Dict[str, "Vegetable"] = field(default_factory=dict)  # Vegetable class omitted for brevity

    def __getattr__(self, item):
        return self.vegetables.get(item)

@dataclass
class FruitsAndVeggies:
    fruits: FruitWrapper = field(default_factory=FruitWrapper)
    vegetables: VegetableWrapper = field(default_factory=VegetableWrapper)

fav = FruitsAndVeggies()
fav.fruits.fruits["apple"] = Fruit("apple", 1.0, 0.5)  # goes into the wrapped dict
print(fav.fruits.apple)  # dot access, resolved through __getattr__

VegetableWrapper and FruitWrapper are each basically wrapping a dict (you could also subclass dict instead), and they're just attributes inside another wrapper class. So you should be able to access everything with just dot-syntax.

QuarkJets fucked around with this message at 01:32 on Jul 21, 2019

KICK BAMA KICK
Mar 2, 2009

No setting in PyCharm (have sprung for Professional) to run all my tests before committing? Or is that a thing you configure in Git itself or something?

xtal
Jan 9, 2011

by Fluffdaddy

KICK BAMA KICK posted:

No setting in PyCharm (have sprung for Professional) to run all my tests before committing? Or is that a thing you configure in Git itself or something?

I don't know about PyCharm but you could do that with a git pre-commit hook. I wouldn't advise it though because you should be committing as often as possible and it'll introduce friction to that. What you should do is test every branch before merging it into master.
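
If you do want the pre-commit route anyway, the hook is just an executable file at .git/hooks/pre-commit that exits non-zero to abort the commit; a minimal sketch (written in Python here, but any executable works):

Python code:
#!/usr/bin/env python3
# .git/hooks/pre-commit -- remember to chmod +x it
import subprocess
import sys

result = subprocess.run([sys.executable, "-m", "pytest", "-q"])
sys.exit(result.returncode)  # non-zero exit aborts the commit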

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
Just make it a pre-merge hook or CI

Dominoes
Sep 20, 2007

Hey bros. It looks like the reason Pipenv and Poetry take a while to lock is that there's no way to pull the deps of packages without downloading them. #1: WTF. #2: Any reason I couldn't make a database, put it online, impl a JSON API, and have it cache this automatically? Would need to download every new release of every package once, but then should be GTG. Am I missing anything?

For ref, the warehouse API is p good, but is missing this.

edit: related. Perfect's not feasible, but we can do better.

Expect an early release within a week.

Dominoes fucked around with this message at 16:03 on Jul 22, 2019

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Dominoes posted:

Hey bros. It looks like the reason Pipenv and Poetry take a while to lock is that there's no way to pull the deps of packages without downloading them. #1: WTF. #2: Any reason I couldn't make a database, put it online, impl a JSON API, and have it cache this automatically? Would need to download every new release of every package once, but then should be GTG. Am I missing anything?

For ref, the warehouse API is p good, but is missing this.

edit: related. Perfect's not feasible, but we can do better.

Expect an early release within a week.

I haven't looked into it in detail, but I thought they downloaded them because they hashed the actual content of the downloads to ensure a matching download.

Dominoes
Sep 20, 2007

Thermopyle posted:

I haven't looked into it in detail, but I thought they downloaded them because they hashed the actual content of the downloads to ensure a matching download.
They do, and that's part of the slowness, but it's orthogonal to this. Sorry I can't cite it atm, but I recently read a GitHub issue where sdispater cited downloading to resolve deps.

You could otherwise resolve deps in a normal way, download only what you need, then hash as a final-step QC; I don't think that's what they're doing. Pipenv is especially bad about this.

Dominoes
Sep 20, 2007

Basic API

The first time it queries a package/specific version, it downloads the package to the server, and pulls the Metadata (Newer dist-info/wheel format; older egg-info not yet supported). This is sort of like what Poetry/Pipenv do locally. Then it caches it, and returns the cached results on future hits.

The first time you hit a package/version combo it'll take a while, but will be faster for future hits.
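
For reference, pulling deps out of a downloaded wheel just means reading the Requires-Dist lines from its .dist-info/METADATA; a standard-library-only sketch (the filename is only an example):

Python code:
import zipfile
from email.parser import Parser

def requires_dist(wheel_path):
    """Return the Requires-Dist lines from a wheel's .dist-info/METADATA."""
    with zipfile.ZipFile(wheel_path) as wheel:
        meta_name = next(n for n in wheel.namelist() if n.endswith(".dist-info/METADATA"))
        metadata = Parser().parsestr(wheel.read(meta_name).decode("utf-8"))
    return metadata.get_all("Requires-Dist") or []

print(requires_dist("requests-2.22.0-py2.py3-none-any.whl"))  # example filename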

Dominoes fucked around with this message at 19:43 on Jul 22, 2019

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

KICK BAMA KICK posted:

Thanks! Lmao I actually knew about these (I remember looking up Huey when someone mentioned it here a few weeks ago) and had thought about using it to schedule processing some old data during downtime when new stuff wouldn't be coming in. Somehow never occurred to me to use it for the main loop itself.

I just stumbled across this today:

Understand How Celery Works by Building a Clone

unpacked robinhood
Feb 18, 2013

by Fluffdaddy
I have a small percentage of files out of a batch that don't parse well when opened with the default open(..).
I've managed to get around this by checking the encoding on each file with filemagic, and setting the encoding parameter accordingly.

Does it feel bad-practice-y?


CarForumPoster
Jun 26, 2013

⚡POWER⚡
In case anyone else had this problem...I started to type

quote:

Is there a better way than a file I git-ignore to store secrets like API keys and whatnot? Like maybe an AWS service that can only be accessed by whitelisted IPs?

I'd like to outsource some web scraping tasks that are behind logins, but I don't want to give all of our API keys and passwords to every developer. I can certainly change them after, but it might be a few months before the projects are done.


But then I googled like a good boy, and there is one, and it works fine through boto3:
https://aws.amazon.com/blogs/aws/aws-secrets-manager-store-distribute-and-rotate-credentials-securely/
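
For anyone following along, the boto3 call is roughly this (region, secret name, and the JSON layout are placeholders):

Python code:
import json

import boto3

client = boto3.client("secretsmanager", region_name="us-east-1")            # placeholder region
response = client.get_secret_value(SecretId="prod/scraper/credentials")     # placeholder secret name
secrets = json.loads(response["SecretString"])  # assumes the secret was stored as JSON key/value pairs
print(secrets.keys())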

  • Reply