Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

Dominoes posted:

I apologize for the spam - would anyone mind skimming this readme, especially the parts comparing to existing projects? I'm attempting to demonstrate why I decided to build this, and how it improves on existing tools, without sounding snarky or insulting to those tools. I think it's OK, this is difficult to self-assess.

I think it's fine too, at least once you remove those xkcd links, come on.

quote:

A more direct summary here, from anecdotes: I'm still not sure how to make Poetry work with Python 3 on Ubuntu. Pipenv's dep resolution is unusably slow and installation instructions confusing. Pyenv's docs, including installation-instructions aren't user-friendly, and to install Py with it requires the user's comp to build from source. Pip+Venv provides no proper dependency-resolution, and is tedius. Conda provides a nice experience, but there are many packages I use that aren't on Conda.

Can't quite parse this - what about Poetry are you trying to make work? The thing where you try and scrape its pyproject.toml entries?

Also I remember from that discord link that you were talking with PSF about this, and probably the single thing that makes me most uncomfortable about the prospect of using this project is that it requires you, specifically, to maintain its backend architecture. There's a link to your github embedded in the source for downloading python binaries, etc. Are you trying to get PSF to handle the managing binaries part? And possibly the serverside dependency listing that's currently on a personal heroku?


duck monster
Dec 15, 2004

Whats wrong with good old fashioned virtualenv(-wrapper) and pip?

Those things are old, and pretty close to flawless in my opinion. With the added advantage that it isolates system pythons from local pythons without doing an end run around the os distros package manager

Pyenv always seemed to me like an attempt at replicating RVM, and RVM was a mistake. I just dont get what these things bring to the table. Whats the use case?

duck monster fucked around with this message at 15:13 on Sep 28, 2019

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

duck monster posted:

Whats wrong with good old fashioned virtualenv(-wrapper) and pip?

Those things are old, and pretty close to flawless in my opinion. With the added advantage that it isolates system pythons from local pythons without doing an end run around the os distros package manager

Pyenv always seemed to me like an attempt at replicating RVM, and RVM was a mistake. I just dont get what these things bring to the table. Whats the use case?

I mean, the link on the last post of the last page (here it is again: https://github.com/David-OConnor/pyflow) to the readme has pretty explicit content about it

Dominoes posted:

I apologize for the spam - would anyone mind skimming this readme, especially the parts comparing to existing projects? I'm attempting to demonstrate why I decided to build this, and how it improves on existing tools, without sounding snarky or insulting to those tools. I think it's OK, this is difficult to self-assess.

A more direct summary here, from anecdotes: I'm still not sure how to make Poetry work with Python 3 on Ubuntu. Pipenv's dep resolution is unusably slow and installation instructions confusing. Pyenv's docs, including installation-instructions aren't user-friendly, and to install Py with it requires the user's comp to build from source. Pip+Venv provides no proper dependency-resolution, and is tedius. Conda provides a nice experience, but there are many packages I use that aren't on Conda.

Oh one more thing about this - it would be very cool to have a way to disable the patching for multiple-dependency-version support. It has a valid use case for using pyflow to manage trees that will only ever be used with pyflow, but the instant you want to publish a package to pypi, or use your code without pyflow, it won't work (as you mention in the readme) and to me it's much better to see those errors early. Having multiple-version support would be nice but IMO it's not really something you can have supported in only one, or a subset, of tools.

crazysim
May 23, 2004
I AM SOOOOO GAY

mr_package posted:

I write code on MacOS and deploy to Win/Mac/Linux. If I use venv to create a virtual environment locally on my MBP, doesn't it still necessitate massaging the target system quite a bit? The Python binary venv creates on my system isn't going to just copy over to Linux or Windows, right? So... I need to run venv on the target system first and then copy the virtual environment over? Or I have to run something else that does the deployment (installs whatever is in requirements.txt?) as opposed to just copying files? Because if it's RH/CentOS it's not going to have python 3 by default, and even the latest RHEL is shipping Python 3.6. So if I'm on 3.7.4 and my virtual environment is same, I need to take some steps to prepare the server to support 3.7.4 virtual environment, yeah?

There must be some simple/good workflow that doesn't require I install a bunch of modules everywhere I want to deploy a small script? I know this comes up all the time I'm just looking for the "one-- and only one-- right way" to do this. I'd love for it to be venv since it's built-in to the STL; I don't want to have to do a bunch of manual steps on every system before I can run my stuff.

Edit: looks like pip / requirements.txt is probably the simplest option? But still have to get the version of Python I need on each system.
https://stackoverflow.com/questions/48876711/how-to-deploy-flask-virtualenv-into-production

Edit2: also, this: https://www.scylladb.com/2019/02/14/the-complex-path-for-a-simple-portable-python-interpreter-or-snakes-on-a-data-plane/

I'm surprised how much work it seems to be to solve this problem, still.

PyInstaller? You still need to build the application on the target platform though. Maybe you can set up a CI/CD system to do this and make it less painful. GitHub Actions 2.0/Azure Pipelines has turn-key Windows/Mac/Linux runners for what it's worth.

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I want to build a pretty Flask/Dash dashboard for CRUD purposes to replace manual operations in a Google Sheet/Jupyter Notebook. I currently load this Gsheet as a DataFrame in a Jupyter Notebook then run a butt-load of python functions on rows of the DataFrame. (E.g. a pywin32 function to print a page with specific printer settings.) I need to dumb this down for my sales team to take over this biz process, so they can click a button to run a function without seeing code.

What's the easiest way to make my CRUD app load a db table or rows from a table as a dataframe?
Does Pandas make this trivial in some way? I know 0 SQL but will learn if needed.

EDIT: The Dash app will be run on localhost. The database will need to be shared across a team, similar to a GSheet. Multiple users may be interacting with it at a given time.
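For the "load a db table or rows as a dataframe" part, pandas does make this close to trivial via `read_sql`/`to_sql`. A minimal sketch, using the stdlib sqlite3 driver and a made-up `jobs` table (any real deployment shared across a team would want a server database and a proper connection string instead):

```python
import sqlite3

import pandas as pd

# Made-up example table standing in for the Gsheet data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER, customer TEXT, printed INTEGER)")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)",
                 [(1, "acme", 0), (2, "globex", 1)])
conn.commit()

# Whole table as a DataFrame:
df = pd.read_sql("SELECT * FROM jobs", conn)

# Or push the filter down to SQL and only pull the rows you need:
pending = pd.read_sql("SELECT * FROM jobs WHERE printed = 0", conn)

# Writing edits back is the mirror operation:
df.to_sql("jobs", conn, if_exists="replace", index=False)
```

The SQL needed for this is about one `SELECT` statement's worth, so the "I know 0 SQL" barrier is small.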

Tayter Swift
Nov 18, 2002

Pillbug

Deadite posted:

I don’t know if this is the right place to ask, but I am trying to read a large (750mb) csv file into a pandas dataframe, and it seems to be taking an unreasonably long time. I am limiting the columns to only 8 columns with the usecols option, but the read_csv method is still taking 6 minutes to read the file into python.

I haven’t been using python for very long and I’m coming from a SAS programming background. In SAS this file loads in a few seconds, so I feel like I am screwing something up for this to take so long. I originally tried the read_sas method to load the original 1.5 gb dataset, but I had a memory error and had to convert the file to csv to get around that. The file only has 170k rows.

Oh awesome, I'm in almost exactly the same situation you're in, except my datasets can reach 40GB of SAS. I've pared them down to maybe 11-15GB each, but I still have to convert to csv because I can't figure out how to make read_sas use chunks. I'll have to keep you in mind and bounce ideas with you.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Tayter Swift posted:

Oh awesome, I'm in almost exactly the same situation you're in, except my datasets can reach 40GB of SAS. I've pared them down to maybe 11-15GB each, but I still have to convert to csv because I can't figure out how to make read_sas use chunks. I'll have to keep you in mind and bounce ideas with you.

If you're loading them multiple times, pd.read_pickle() is really flexible and WAY WAY faster to load than a CSV. Almost as fast as a feather but I've had issues with feathers before where pickles "just work"
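The pickle round trip is easy to sketch; the win is that pickle stores dtypes and structure as-is, while CSV has to be re-parsed and type-inferred on every load (toy data, made-up column names):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": range(1000), "b": ["x"] * 1000})

with tempfile.TemporaryDirectory() as tmp:
    pkl = os.path.join(tmp, "data.pkl")
    csv = os.path.join(tmp, "data.csv")

    df.to_pickle(pkl)            # binary dump: dtypes and index preserved
    df.to_csv(csv, index=False)  # text dump: everything becomes strings

    from_pickle = pd.read_pickle(pkl)  # no parsing or dtype inference
    from_csv = pd.read_csv(csv)        # re-parses text, re-infers dtypes
```

The usual caveat applies: pickles are Python-only and not safe to load from untrusted sources.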

Dominoes
Sep 20, 2007

Phobeste posted:

Can't quite parse this - what about Poetry are you trying to make work? The thing where you try and scrape its pyproject.toml entries?
I mean poetry itself - if I follow the instructions on the website, it produces the error: [RuntimeError] The current Python version (2.7.16) is not supported by the project (^3.7) Please activate a compatible Python version.
Note that I have 3 versions of Python installed. (Sys 2, Sys 3, Updated 3). Dealing with this isn't in the Poetry docs or guide. You can find it a number of times in issues; from what I gather, it's using the `PATH` `python`, and expects you to use `pyenv` to modify this. I could probably sort through this, but this is not user-friendly, and if I were new to Py, I'd be put off.

Phobeste posted:

Oh one more thing about this - it would be very cool to have a way to disable the patching for multiple-dependency-version support. It has a valid use case for using pyflow to manage trees that will only ever be used with pyflow, but the instant you want to publish a package to pypi, or use your code without pyflow, it won't work (as you mention in the readme) and to me it's much better to see those errors early. Having multiple-version support would be nice but IMO it's not really something you can have supported in only one, or a subset, of tools.
Good idea. It currently displays a warning. Will change to a prompt to let the user abort.

Dominoes fucked around with this message at 21:26 on Sep 28, 2019

Cyril Sneer
Aug 8, 2004

Life would be simple in the forest except for Cyril Sneer. And his life would be simple except for The Raccoons.
I'm trying to figure out to implement a sort of callback functionality:
code:

def somefunc():
    print('do something here')

class Test():
    def __init__(self, Handler=None):
        self.handler = Handler

    def perform(self):
        print('run the function')
        self.handler()


And running this as...

code:
test = Test(Handler = somefunc)
test.perform()
do something here
It works as advertised, yay. The problem is I want to be able to pass an object into somefunc so it can actually do something useful. So something like,

code:
def somefunc(someobj):
    someobj.do_something()
But I'm really struggling with how to set this up. I can't just call test = Test( Handler = somefunc(MyObj) ), because this ends up executing the function and supplying its return value. As well I don't really know what to do with the perform function in the Test class. If I include arguments in the call - self.handler(stuff_in_here), then it expects there to be stuff_in_here, which doesn't exist in the function scope.


Just so this doesn't sound like an XY problem let me add some detail. I'm writing a class that performs some streaming activity and provides connect, start, and stop methods. I want the user to be able to specify a function to be called once the connection is established.
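One common way to set this up, reusing the names from the post: bind the argument when you *register* the handler, not when you call it, via `functools.partial` (or a lambda). Then `perform()` can still invoke the handler with no arguments. `MyObj` here is a made-up stand-in for the streaming object:

```python
import functools

def somefunc(someobj):
    someobj.do_something()

class Test():
    def __init__(self, Handler=None):
        self.handler = Handler

    def perform(self):
        # e.g. called once the connection is established
        if self.handler is not None:
            self.handler()

class MyObj:
    """Hypothetical object the callback should act on."""
    def do_something(self):
        self.called = True

obj = MyObj()

# partial wraps somefunc + obj into a zero-argument callable;
# a lambda works the same way: Test(Handler=lambda: somefunc(obj))
test = Test(Handler=functools.partial(somefunc, obj))
test.perform()
```

This avoids the `Test(Handler=somefunc(MyObj))` trap, because `partial` defers the call instead of executing it immediately.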

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

Dominoes posted:

I mean poetry itself - if I follow the instructions on the website, it produces the error: [RuntimeError] The current Python version (2.7.16) is not supported by the project (^3.7) Please activate a compatible Python version.
Note that I have 3 versions of Python installed. (Sys 2, Sys 3, Updated 3). Dealing with this isn't in the Poetry docs or guide. You can find it a number of times in issues; from what I gather, it's using the `PATH` `python`, and expects you to use `pyenv` to modify this. I could probably sort through this, but this is not user-friendly, and if I were new to Py, I'd be put off.

Good idea. It currently displays a warning. Will change to a prompt to let the user abort.

PATH is an environment variable that tells the computer which directories to check in when you run any command. So in a way, everything uses PATH. The system will go down the list of paths and check if the command is in there and run it if so. It’s a Linux/Unix thing.

You got that error because your path is set up so that 2.7 got found first. Whenever you activate a virtualenv/conda environment the activation script just prepends things to your PATH so that the new 3.7 version is found first.
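A toy model of that lookup order - not the real resolver (`shutil.which` does the actual search), just to show why prepending a directory changes which `python` wins:

```python
def which_first(name, path_dirs, available):
    """Toy model of PATH lookup: return the first directory, in PATH
    order, that holds `name`. `available` maps dir -> commands there."""
    for d in path_dirs:
        if name in available.get(d, set()):
            return f"{d}/{name}"
    return None

# Pretend /usr/bin holds the system python 2.7 and /opt/py37/bin a 3.7
available = {"/usr/bin": {"python"}, "/opt/py37/bin": {"python"}}

# Plain shell: /usr/bin is earlier in PATH, so 2.7 is what runs
assert which_first("python", ["/usr/bin", "/opt/py37/bin"], available) \
    == "/usr/bin/python"

# After `activate` prepends the env's bin dir, its python is found first
assert which_first("python", ["/opt/py37/bin", "/usr/bin"], available) \
    == "/opt/py37/bin/python"
```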

Hollow Talk
Feb 2, 2014

CarForumPoster posted:

If you're loading them multiple times, pd.read_pickle() is really flexible and WAY WAY faster to load than a CSV. Almost as fast as a feather but I've had issues with feathers before where pickles "just work"

Multi-GB pickle files are not a good idea in my experience, as they take inordinate amounts of memory to load.

QuarkJets
Sep 8, 2008

Suggestions:

"tedius" should be "tedious"

The "A thoroughly biased feature table" is listing "Slow" as a feature

quote:

This avoids complications, especially for new users. It's common for Python-based CLI tools to not run properly when installed from pip due to the PATH or user directories not being configured in the expected way. Pipenv’s installation instructions are confusing, and may result in it not working correctly.

I challenge the accuracy of this statement. Pip is attached to a specific version of Python; other versions of Python require their own pip. If you install something via pip, you're installing it for that version of Python. If you install something with Pip, but that something needs a different version of Python (or any pip-installed package) than what's currently in your environment, then that's a packager issue, not a pip issue; pyflow will suffer from this same problem, as people sometimes mispackage things (e.g. specifying you need python 3.4 when your script uses python 3.6 features).

QuarkJets fucked around with this message at 23:47 on Sep 28, 2019

duck monster
Dec 15, 2004

Phobeste posted:

I mean, the link on the last post of the last page (here it is again: https://github.com/David-OConnor/pyflow) to the readme has pretty explicit content about it

quote:

Pip+Venv provides no proper dependency-resolution

What? Yes it does. I install a package into a virtualenv. If it also has a dependency, it gets installed too. Wheres the problem here?

duck monster fucked around with this message at 03:23 on Sep 29, 2019

Dominoes
Sep 20, 2007

QuarkJets posted:

"tedius" should be "tedious"
Fixed.

quote:

I challenge the accuracy of this statement. Pip is attached to a specific version of Python; other versions of Python require their own pip. If you install something via pip, you're installing it for that version of Python. If you install something with Pip, but that something needs a different version of Python (or any pip-installed package) than what's currently in your environment, then that's a packager issue, not a pip issue; pyflow will suffer from this same problem, as people sometimes mispackage things (e.g. specifying you need python 3.4 when your script uses python 3.6 features).
While handling this may be out of Pip's scope, the use of Python interpreters to run command-line tools (or more generally, standalone programs) can produce unexpected results when more than one interpreter is installed, or if it stores dependencies in different locations for different permission levels - both of which are easy to encounter on Linux. Pyflow won't encounter this issue, as it's a standalone executable. It will suffer from it insofar as it doesn't solve this problem for CLI tools installed with it. Do you think it should include something like this? I haven't tried Pipx, but understand that it exists to address this. Poetry includes a way to install itself using `curl` and vendorized dependencies designed to avoid this, but I haven't looked into how they did it. I'm addressing the wording now, since I meant that line to refer to pyflow as a tool itself, not other tools installed with it.

Boris Galerkin posted:

You got that error because your path is set up so that 2.7 got found first. Whenever you activate a virtualenv/conda environment the activation script just prepends things to your PATH so that the new 3.7 version is found first.
Didn't know conda and venv used it under the hood too; appreciate the info.

duck monster posted:

What? Yes it does. I install a package into a virtualenv. If it also has a dependency, it gets installed too. Wheres the problem here?
My use of the word proper was ambiguous - Pip will install things based on the order `pip install` is run, with no regard to the requirements as a whole.

Dominoes fucked around with this message at 10:50 on Sep 29, 2019

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
I think part of the problem with all of these different packaging solutions and with yours finding a niche is the success Python has had among people whose job description isn't "writing software in python". It's a problem that usually only system scripting languages like sh or powershell have. That means that you have to consider two environments, or maybe two workflows basically entirely separately:

- The developer environment/workflow. This is people who write Python as their primary job. They produce a piece of code in Python and they are committed to either a) defining which interpreters it will work under for people in workflow 2 to use and letting people in workflow 2 worry about external compatibilities; or b) producing a packaged application that (usually) bundles an interpreter
- The user workflow. This is people who don't write Python as their primary job and don't produce packaged output but use Python a lot. This is everybody who uses jupyter, most people who use conda, most people who use py(xy) or spyder or whatever. This is also, ironically enough, things like Mac OSX. They manually handle interpreter installs and expect to not have to think about them. They also expect to consume packaged outputs from the first workflow.

There really hasn't been a tool that successfully handles both workflows. Probably pyenv/tox combined with first pipenv and now poetry is the best developer workflow, which requires managing multiple interpreter installations, automating tests, automating dependencies, and primarily focusing on the package you're developing. venv/virtualenvwrapper and plain pip or conda are the thing that works for the second workflow. Your tool seems targeted at the first workflow, and that's fine! Because pipenv is slow as dogshit and it's annoying to have to jump through hoops with path shims to make sure your dev environment doesn't stomp on your system python or whatever. So when people critique it they should keep in mind what it's aimed at.

Also I don't think this is you (let me know if it is) but somebody else on the python discourse, who mostly works with nix, is doing a similar import rewriter: https://github.com/nix-community/nixpkgs-pytools/#python-rewrite-imports https://discuss.python.org/t/allowing-multiple-versions-of-same-python-package-in-pythonpath/2219/6

duck monster
Dec 15, 2004

edit: gently caress it. hipsters poison everything. deleted

CarForumPoster
Jun 26, 2013

⚡POWER⚡

duck monster posted:

edit: gently caress it. hipsters poison everything. deleted

What?

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

duck monster posted:

edit: gently caress it. hipsters poison everything. deleted

Thanks Obama

QuarkJets
Sep 8, 2008

conda and jupyter existing primarily for people who do not write python is a... interesting take

Solumin
Jan 11, 2013

QuarkJets posted:

conda and jupyter existing primarily for people who do not write python is a... interesting take

It's absolutely true for jupyter, from what I understand. The notebooks are great for data science stuff, and data scientists are not necessarily experienced developers.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Solumin posted:

It's absolutely true for jupyter, from what I understand. The notebooks are great for data science stuff, and data scientists are not necessarily experienced developers.

Jupyter is great for conveying ideas as/in code clearly to other people. Data science or not. :colbert:

Conda just plain rocks. There are so many things that are a bitch and a half to pip install on Windows, usually due to binaries/PATH issues, that exist on conda in one form or another and "just work", particularly tensorflow/CUDA and selenium.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

QuarkJets posted:

conda and jupyter existing primarily for people who do not write python is a... interesting take

I don't mean it like "for simpletons who could not possibly be programmers", I mean it's much rarer that they're used by people whose goal is to ship python code (except for people wanting to ship python code for people to use in jupyter). It's a much stronger point about jupyter than conda though, I'll admit

NinpoEspiritoSanto
Oct 22, 2013




CarForumPoster posted:

Jupyter is great for conveying ideas as/in code clearly to other people. Data science or not. :colbert:

I'd be interested in examples of this - I've always thought Jupyter looked cool but never found anything to use it for.

Tayter Swift
Nov 18, 2002

Pillbug

CarForumPoster posted:

If you're loading them multiple times, pd.read_pickle() is really flexible and WAY WAY faster to load than a CSV. Almost as fast as a feather but I've had issues with feathers before where pickles "just work"

Hm, the [url=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_pickle.html]docs[/url] for read_pickle don't really show much flexibility. Ideally I'd like something where I could read something in either in chunks or based on a filter, but pickle looks like it reads in the whole thing.

Suppose I could do sqlalchemy, but while I know SQL I've always found Python implementations to be annoyingly verbose. HDF5 looks interesting but I couldn't wrap my head around it the last time I looked.

QuarkJets
Sep 8, 2008

Solumin posted:

It's absolutely true for jupyter, from what I understand. The notebooks are great for data science stuff, and data scientists are not necessarily experienced developers.

Notebooks are used in many fields for many things, including development of applications for end-users. Netflix has a huge codebase of notebooks that are used for a lot more than just data science. I personally wouldn't write user-facing software using notebooks, because I prefer pycharm for that, but it's a thing that happens.

And then there's the implication that people who write software cannot also perform data science or vice versa, which is also a weird thing to suggest. In some contexts those roles aren't even distinguishable

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

IME, conda and jupyter are used by scientists. Professional programmers don't seem to use either that much.

pmchem
Jan 22, 2010


You can be both a scientist and a professional programmer. Not everyone programs to make websites or mobile apps.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

pmchem posted:

You can be both a scientist and a professional programmer. Not everyone programs to make websites or mobile apps.

I guess I erred in thinking it obvious that I meant professional programmers who are not scientists.

Solumin
Jan 11, 2013

QuarkJets posted:

Notebooks are used in many fields for many things, including development of applications for end-users. Netflix has a huge codebase of notebooks that are used for a lot more than just data science. I personally wouldn't write user-facing software using notebooks, because I prefer pycharm for that, but it's a thing that happens.

And then there's the implication that people who write software cannot also perform data science or vice versa, which is also a weird thing to suggest. In some contexts those roles aren't even distinguishable

I hadn't heard about the Netflix notebooks before, that's really interesting!

I definitely did not mean to imply that.

vikingstrike
Sep 23, 2007

whats happening, captain

Tayter Swift posted:

Hm, the [url=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_pickle.html]docs[/url] for read_pickle don't really show much flexibility. Ideally I'd like something where I could read something in either in chunks or based on a filter, but pickle looks like it reads in the whole thing.

Suppose I could do sqlalchemy, but while I know SQL I've always found Python implementations to be annoyingly verbose. HDF5 looks interesting but I couldn't wrap my head around it the last time I looked.

You’re going to need something row based to have chunking supported out of the gate more than likely. HDF might be a format to consider. Pandas has a built in function, supports compression, chunk size, etc. If you decide to go the SQL route, pd.read_sql_query() is really straight forward to use.
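The chunked variant is worth seeing once: passing `chunksize` to `pd.read_sql_query` turns the result into an iterator of DataFrames, so the whole table never has to fit in memory at once. A minimal sketch with an in-memory sqlite table (table and column names made up):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (id INTEGER, val REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)",
                 [(i, float(i)) for i in range(10)])
conn.commit()

# Each iteration yields a DataFrame of at most `chunksize` rows
total = 0
for chunk in pd.read_sql_query("SELECT * FROM obs", conn, chunksize=4):
    total += len(chunk)
```

`read_csv` and `read_hdf` take the same `chunksize`/iterator approach, so the processing loop stays identical whichever format you land on.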

Dominoes
Sep 20, 2007

Phobeste posted:

I think part of the problem with all of these different packaging solutions and with yours finding a niche is the success Python has had among people whose job description isn't "writing software in python". It's a problem that usually only system scripting languages like sh or powershell have. That means that you have to consider two environments, or maybe two workflows basically entirely separately:

- The developer environment/workflow. This is people who write Python as their primary job. They produce a piece of code in Python and they are committed to either a) defining which interpreters it will work under for people in workflow 2 to use and letting people in workflow 2 worry about external compatibilities; or b) producing a packaged application that (usually) bundles an interpreter
- The user workflow. This is people who don't write Python as their primary job and don't produce packaged output but use Python a lot. This is everybody who uses jupyter, most people who use conda, most people who use py(xy) or spyder or whatever. This is also, ironically enough, things like Mac OSX. They manually handle interpreter installs and expect to not have to think about them. They also expect to consume packaged outputs from the first workflow.

There really hasn't been a tool that successfully handles both workflows. Probably pyenv/tox combined with first pipenv and now poetry is the best developer workflow, which requires managing multiple interpreter installations, automating tests, automating dependencies, and primarily focusing on the package you're developing. venv/virtualenvwrapper and plain pip or conda are the thing that works for the second workflow. Your tool seems targeted at the first workflow, and that's fine! Because pipenv is slow as dogshit and it's annoying to have to jump through hoops with path shims to make sure your dev environment doesn't stomp on your system python or whatever. So when people critique it they should keep in mind what it's aimed at.

Also I don't think this is you (let me know if it is) but somebody else on the python discourse, who mostly works with nix, is doing a similar import rewriter: https://github.com/nix-community/nixpkgs-pytools/#python-rewrite-imports https://discuss.python.org/t/allowing-multiple-versions-of-same-python-package-in-pythonpath/2219/6

That's a useful categorization. I need to do some thinking on what the intended audience is, and how this fits. Currently, the intended audience is based on inconveniences I've encountered myself. I don't fit neatly into either of those categories, but suspect that enough people do that they're useful. I agree that the normal project-based functionality falls more neatly into your first category. The one-off-script functionality uses many of the same implementation details as the project code, but I think its use is quite different; more along your second cat. I don't have any professional/team dev experience, so may be missing the big picture.

Anecdotes, from my workflow:
- I have a single script file for trying to implement an integration or matrix algo for following along with a class or textbook. Do I make a venv? Just install numpy and ipython on the sys Py? The latter usually wins, even though some people may not like it. Excellent use-case for having a global Conda installation where you dump all tools in, without breaking the system.

- I'm working on several reasonably-complex Py projects concurrently. PyCharm handles venv use nicely after some setup, but without it, I find the normal venv workflow a pain. I'm surprised to hear so many people like it, when (Without PyCharm's auto-activation) it feels like a mess of console commands, typing long paths I've memorized etc each time I want to reboot / re-open my project / switch to a new project. Pipenv and Poetry tackle this, but both introduce their own hassles on-par with the ones I described. (ie Pipenv being slow, failing to resolve deps, and sometimes not installing correctly, and Poetry not working with Py3 without diving into its Github issues, installing Pyenv etc)

- Installing a tool, and running into installation instructions like this. Ie not supporting a major operating system? Pointing to another tool for Mac? Starting the instructions for Linux with "Clone the repo, manually modify the path and add-to-the-shell (?)"? Use a separate project that exists just to install it? Maybe not-that-difficult, but also not-that-user-friendly.

Dominoes fucked around with this message at 06:52 on Sep 30, 2019

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

Dominoes posted:

- I have a single script file for trying to implement an integration or matrix algo for following along with a class or textbook. Do I make a venv? Just install numpy and ipython on the sys Py? The latter usually wins, even though some people may not like it. Excellent use-case for having a global Conda installation where you dump all tools in, without breaking the system.

My workflow for this:

code:
conda create -n physics101 python=3.7 numpy scipy matplotlib jupyter
conda activate physics101
And as for this:

Dominoes posted:

- [...] and Poetry not working with Py3 without diving into its Github issues, installing Pyenv etc)

code:
nemo:~ $ poetry --version
Poetry 0.12.17
nemo:~ $ python --version
Python 3.6.8 :: Anaconda, Inc.
?

Dominoes posted:

- Installing a tool, and running into installation instructions like this. Ie not supporting a major operating system? Pointing to another tool for Mac? Starting the instructions for Linux with "Clone the repo, manually modify the path and add-to-the-shell (?)"? Use a separate project that exists just to install it? Maybe not-that-difficult, but also not-that-user-friendly.

Homebrew is a package manager for macOS. Think of it like an app store, like Steam. You don't need to use Steam; you can download and install the games on your own. But using Steam lets you update and uninstall all the games from one place. Same concept with Homebrew.

As far as the Linux instructions go, yeah, I could see how it looks daunting, and it could probably be simplified a bit or better explained. But those instructions aren't really complex or out of the ordinary from a Linux point of view. This is probably why they point Mac users to Homebrew: someone else has already figured out all the steps needed to get pyenv installed and running, so you'd just need to type something like brew install pyenv and be done with it.

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I have a list block_list of 4000 blocked domain names. I need to check each string in a dataframe column inspection_list["string"] of 10M strings to see whether the string is a blocked domain name. If so, mark it True, else False.

There's gotta be a faster or more parallelizable way than:
code:
for i, r in inspection_list.iterrows():
    if r["string"] in block_list:
        inspection_list.at[i, "blocked"] = True
    else:
        inspection_list.at[i, "blocked"] = False
Suggestions?

vikingstrike
Sep 23, 2007

whats happening, captain
Use .isin()

CarForumPoster
Jun 26, 2013

⚡POWER⚡

This is very fast. On a 500K row subset:

code:
%%time
check["blocked"]=check["domain"].isin(blocked["domain"])
Wall time: 48.4 ms

code:
%%time
check["blocked"]=check["domain"].apply(lambda x: True if x in blocked["domain"] else False)
Wall time: 6.49 s

Solumin
Jan 11, 2013
A fun solution would be a Bloom filter of the blocked list, then checking any hits against the actual blocked list. IIRC this is how Chrome implements checking for sites that have malware.
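For the curious, here's a toy sketch of the idea (this is not Chrome's actual implementation, and the class and method names are made up for illustration): a Bloom filter answers "definitely not in the set" or "probably in the set", so a negative lets you skip the expensive exact lookup entirely.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions per item over a fixed bit array."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k positions by salting the item with the hash index
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False -> definitely absent; True -> probably present (confirm
        # against the real blocked list to rule out false positives)
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
for domain in ("evil.example", "malware.test"):
    bf.add(domain)
```

With a million bits and a few thousand entries, false positives are rare, so the exact check only runs for the handful of "probably" hits.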

Cyril Sneer
Aug 8, 2004

Life would be simple in the forest except for Cyril Sneer. And his life would be simple except for The Raccoons.
Hmmm, no help on my callback question? (It's okay, I figured it out.)

Oh well, I have another question. I'm struggling with laying out the proper design pattern to implement something. Basically, I want the user to be able to queue up a series of tasks (in any particular order) but have them execute sequentially. However, I want said sequential execution to take place in a thread that won't block my main program.

So kind of like if I hired an assistant - I give him or her my to-do list (the chores can only be done one at a time) and meanwhile, I'm free to do whatever.

Now to add a twist - the tasks are actually all async coroutines from an external module that I'm using.

Thanks goons!

mr_package
Jun 13, 2000
I used rq for this type of thing (long uploads) in a Flask web app on the recommendation of someone in this thread. Otherwise users would get an HTTP layer timeout in the browser waiting for the upload to finish.

https://python-rq.org/

I didn't consider the docs that great, but after a while working with it, it was OK. You need to do things 'their way' a little bit. It's been very set-and-forget for a long time now; I haven't had to log in and do anything on the server where the workers are running. HOWEVER, there's a weirdness with getting failed jobs or removing them from the queue: it doesn't clean itself up, and I've had to run redis commands directly in order to remove them. There's something weird where you need a certain id but can't get it from rq directly, because after x minutes the reference is removed while the thing still lives somewhere else - sorry, it's been over a year since I've had to deal with that, so I don't remember the details. The short of it is that getting the info on failed jobs was not as easy as the rest of it. (And it may be improved in some updates.)

Anyway the easiest fix was actually to install the web-based management UI that ships alongside it and remove stuff from the failed queue manually that way.

The other well known task queue library is Celery.

mr_package fucked around with this message at 17:40 on Oct 3, 2019

QuarkJets
Sep 8, 2008

Cyril Sneer posted:

Hmmm, no help on my callback question? (It's okay, I figured it out.)

Oh well, I have another question. I'm struggling with laying out the proper design pattern to implement something. Basically, I want the user to be able to queue up a series of tasks (in any particular order) but have them execute sequentially. However, I want said sequential execution to take place in a thread that won't block my main program.

So kind of like if I hired an assistant - I give him or her my to-do list (the chores can only be done one at a time) and meanwhile, I'm free to do whatever.

Now to add a twist - the tasks are actually all async coroutines from an external module that I'm using.

Thanks goons!

Create a queue
https://docs.python.org/3/library/queue.html#module-queue

Create a thread
https://docs.python.org/3/library/threading.html#module-threading

Have the thread run a while loop that checks for tasks in the queue


KICK BAMA KICK
Mar 2, 2009

Reread some code I wrote before finishing Obey the Testing Goat (again, strong recommend; not sure anything has ever helped me as much as that) and realized I had actually rolled my own extremely stupid version of unittest.mock to fake an external API: return some fake data, capture the arguments used, the whole nine yards.

Kinda addictive going back through my code and mocking out every last thing to create proper unit tests, though I definitely see the concerns raised about mocks tying you to a particular implementation - sometimes to the point of tests almost looking like tautologous restatements of the code being tested.
