|
Dominoes posted:I apologize for the spam - would anyone mind skimming this readme, especially the parts comparing to existing projects? I'm attempting to demonstrate why I decided to build this, and how it improves on existing tools, without sounding snarky or insulting to those tools. I think it's OK, this is difficult to self-assess. I think it's fine too, at least once you remove those xkcd links, come on. quote:A more direct summary here, from anecdotes: I'm still not sure how to make Poetry work with Python 3 on Ubuntu. Pipenv's dep resolution is unusably slow and installation instructions confusing. Pyenv's docs, including installation-instructions aren't user-friendly, and to install Py with it requires the user's comp to build from source. Pip+Venv provides no proper dependency-resolution, and is tedius. Conda provides a nice experience, but there are many packages I use that aren't on Conda. Can't quite parse this - what about Poetry are you trying to make work? The thing where you try and scrape its pyproject.toml entries? Also I remember from that discord link that you were talking with PSF about this, and probably the single thing that makes me most uncomfortable about the prospect of using this project is that it requires you, specifically, to maintain its backend architecture. There's a link to your github embedded in the source for downloading python binaries, etc. Are you trying to get PSF to handle the managing binaries part? And possibly the serverside dependency listing that's currently on a personal heroku?
|
# ? Sep 28, 2019 14:42 |
|
|
# ? May 27, 2024 03:20 |
|
What's wrong with good old fashioned virtualenv(-wrapper) and pip? Those things are old, and pretty close to flawless in my opinion. With the added advantage that it isolates system Pythons from local Pythons without doing an end run around the OS distro's package manager. Pyenv always seemed to me like an attempt at replicating RVM, and RVM was a mistake. I just don't get what these things bring to the table. What's the use case? duck monster fucked around with this message at 15:13 on Sep 28, 2019 |
# ? Sep 28, 2019 15:09 |
|
duck monster posted:What's wrong with good old fashioned virtualenv(-wrapper) and pip? I mean, the link on the last post of the last page (here it is again: https://github.com/David-OConnor/pyflow) to the readme has pretty explicit content about it Dominoes posted:I apologize for the spam - would anyone mind skimming this readme, especially the parts comparing to existing projects? I'm attempting to demonstrate why I decided to build this, and how it improves on existing tools, without sounding snarky or insulting to those tools. I think it's OK, this is difficult to self-assess. Oh one more thing about this - it would be very cool to have a way to disable the patching for multiple-dependency-version support. It has a valid use case for using pyflow to manage trees that will only ever be used with pyflow, but the instant you want to publish a package to pypi, or use your code without pyflow, it won't work (as you mention in the readme) and to me it's much better to see those errors early. Having multiple-version support would be nice but IMO it's not really something you can have supported in only one, or a subset, of tools.
|
# ? Sep 28, 2019 15:20 |
|
mr_package posted:I write code on MacOS and deploy to Win/Mac/Linux. If I use venv to create a virtual environment locally on my MBP, doesn't it still necessitate massaging the target system quite a bit? The Python binary venv creates on my system isn't going to just copy over to Linux or Windows, right? So... I need to run venv on the target system first and then copy the virtual environment over? Or I have to run something else that does the deployment (installs whatever is in requirements.txt?) as opposed to just copying files? Because if it's RH/CentOS it's not going to have python 3 by default, and even the latest RHEL is shipping Python 3.6. So if I'm on 3.7.4 and my virtual environment is same, I need to take some steps to prepare the server to support a 3.7.4 virtual environment, yeah? PyInstaller? You still need to build the application on the target platform, though. Maybe you can set up a CI/CD system to do this and make it less painful. GitHub Actions 2.0/Azure Pipelines have turn-key Windows/Mac/Linux runners for what it is worth.
|
# ? Sep 28, 2019 15:30 |
|
I want to build a pretty Flask/Dash dashboard for CRUD purposes to replace manual operations in a Google Sheet/Jupyter Notebook. I currently load this Gsheet as a DataFrame in a Jupyter Notebook, then run a butt-load of python functions on rows of the DataFrame. (E.g. a pywin32 function to print a page with specific printer settings.) I need to dumb this down for my sales team to take over this biz process, where they can click a button to run a function without seeing code. What's the easiest way to make my CRUD app load a db table or rows from a table as a dataframe? Does Pandas make this trivial in some way? I know 0 SQL but will learn if needed. EDIT: The Dash app will be run on localhost. The database will need to be shared across a team, similar to a GSheet. Multiple users may be interacting with it at a given time.
|
# ? Sep 28, 2019 17:55 |
|
Deadite posted:I don’t know if this is the right place to ask, but I am trying to read a large (750mb) csv file into a pandas dataframe, and it seems to be taking an unreasonably long time. I am limiting the columns to only 8 columns with the usecols option, but the read_csv method is still taking 6 minutes to read the file into python. Oh awesome, I'm in almost exactly the same situation you're in, except my datasets can reach 40GB of SAS. I've pared them down to maybe 11-15GB each, but I still have to convert to csv because I can't figure out how to make read_sas use chunks. I'll have to keep you in mind and bounce ideas with you.
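For reference, the chunked-read pattern looks something like this (toy in-memory data standing in for the big file; with a real file you'd pass its path):

```python
import io

import pandas as pd

# Small in-memory stand-in for a large CSV; a real call would take a file path.
csv_data = io.StringIO(
    "a,b,c\n" + "\n".join(f"{i},{i * 2},{i * 3}" for i in range(1000)))

# chunksize makes read_csv return an iterator of DataFrames instead of
# loading everything at once; usecols limits parsing to the needed columns.
total_rows = 0
for chunk in pd.read_csv(csv_data, usecols=["a", "b"], chunksize=250):
    total_rows += len(chunk)  # do per-chunk processing here

print(total_rows)  # 1000
```

I believe read_sas takes a chunksize argument too in recent pandas, though I haven't tried it on files your size.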
|
# ? Sep 28, 2019 18:15 |
|
Tayter Swift posted:Oh awesome, I'm in almost exactly the same situation you're in, except my datasets can reach 40GB of SAS. I've pared them down to maybe 11-15GB each, but I still have to convert to csv because I can't figure out how to make read_sas use chunks. I'll have to keep you in mind and bounce ideas with you. If you're loading them multiple times, pd.read_pickle() is really flexible and WAY WAY faster to load than a CSV. Almost as fast as a feather but I've had issues with feathers before where pickles "just work"
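The round trip is just a couple of calls; a minimal sketch (toy frame, throwaway temp file):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"x": range(100), "y": [i * 0.5 for i in range(100)]})

# Write once; later loads skip CSV parsing entirely. Note pickle (like
# feather) loads the whole frame at once, so you need the memory for it.
path = os.path.join(tempfile.mkdtemp(), "frame.pkl")
df.to_pickle(path)

df2 = pd.read_pickle(path)
print(df.equals(df2))  # True
```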
|
# ? Sep 28, 2019 18:53 |
|
Phobeste posted:Can't quite parse this - what about Poetry are you trying to make work? The thing where you try and scrape its pyproject.toml entries? Note that I have 3 versions of Python installed. (Sys 2, Sys 3, Updated 3). Dealing with this isn't in the Poetry docs or guide. You can find it a number of times in issues. From what I gather, it's using the `PATH` `python`, and expects you to use `pyenv` to modify this. I could probably sort through this, but this is not user-friendly, and if I were new to Py, I would be put off. Phobeste posted:Oh one more thing about this - it would be very cool to have a way to disable the patching for multiple-dependency-version support. It has a valid use case for using pyflow to manage trees that will only ever be used with pyflow, but the instant you want to publish a package to pypi, or use your code without pyflow, it won't work (as you mention in the readme) and to me it's much better to see those errors early. Having multiple-version support would be nice but IMO it's not really something you can have supported in only one, or a subset, of tools. Dominoes fucked around with this message at 21:26 on Sep 28, 2019 |
# ? Sep 28, 2019 21:24 |
|
I'm trying to figure out how to implement a sort of callback functionality: code:
And running this as... code:
code:
Just so this doesn't sound like an XY problem let me add some detail. I'm writing a class that performs some streaming activity and provides connect, start, and stop methods. I want the user to be able to specify a function to be called once the connection is established.
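Roughly what I'm after, in toy form (class and method names are made up, not the real module):

```python
class Streamer:
    """Toy version of a streaming class that notifies on connect."""

    def __init__(self, on_connect=None):
        self.on_connect = on_connect  # user-supplied callback, may be None
        self.connected = False

    def connect(self):
        # ...real connection work would go here...
        self.connected = True
        if self.on_connect is not None:
            self.on_connect()  # fire the user's callback

    def start(self):
        pass  # begin streaming

    def stop(self):
        pass  # halt streaming


events = []
s = Streamer(on_connect=lambda: events.append("connected"))
s.connect()
print(events)  # ['connected']
```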
|
# ? Sep 28, 2019 21:36 |
|
Dominoes posted:I mean poetry itself - if I follow the instructions on the website, it produces the error: [RuntimeError]The current Python version (2.7.16) is not supported by the project (^3.7) Please activate a compatible Python version. PATH is an environment variable that tells the computer which directories to check in when you run any command. So in a way, everything uses PATH. The system will go down the list of paths and check if the command is in there and run it if so. It’s a Linux/Unix thing. You got that error because your path is set up so that 2.7 got found first. Whenever you activate a virtualenv/conda environment the activation script just prepends things to your PATH so that the new 3.7 version is found first.
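You can watch the same lookup from Python, e.g. (a small sketch):

```python
import os
import shutil

# PATH is a list of directories separated by os.pathsep (":" on
# Linux/macOS, ";" on Windows), searched left to right; first match wins.
dirs = os.environ["PATH"].split(os.pathsep)
print(len(dirs) >= 1)  # True

# shutil.which performs the same left-to-right search the shell does, so
# it reports which interpreter a bare command would resolve to.
print(shutil.which("python3") or shutil.which("python"))
```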
|
# ? Sep 28, 2019 22:04 |
|
CarForumPoster posted:If you're loading them multiple times, pd.read_pickle() is really flexible and WAY WAY faster to load than a CSV. Almost as fast as a feather but I've had issues with feathers before where pickles "just work" Multi-GB pickle files are not a good idea in my experience, as they take inordinate amounts of memory to load.
|
# ? Sep 28, 2019 22:20 |
|
Suggestions: "tedius" should be "tedious" The "A thoroughly biased feature table" is listing "Slow" as a feature quote:This avoids complications, especially for new users. It's common for Python-based CLI tools to not run properly when installed from pip due to the PATH or user directories not being configured in the expected way. Pipenv’s installation instructions are confusing, and may result in it not working correctly. I challenge the accuracy of this statement. Pip is attached to a specific version of Python; other versions of Python require their own pip. If you install something via pip, you're installing it for that version of Python. If you install something with Pip, but that something needs a different version of Python (or any pip-installed package) than what's currently in your environment, then that's a packager issue, not a pip issue; pyflow will suffer from this same problem, as people sometimes mispackage things (e.g. specifying you need python 3.4 when your script uses python 3.6 features). QuarkJets fucked around with this message at 23:47 on Sep 28, 2019 |
# ? Sep 28, 2019 23:44 |
|
Phobeste posted:I mean, the link on the last post of the last page (here it is again: https://github.com/David-OConnor/pyflow) to the readme has pretty explicit content about it quote:Pip+Venv provides no proper dependency-resolution What? Yes it does. I install a package into a virtualenv. If it also has a dependency, it gets installed too. Where's the problem here? duck monster fucked around with this message at 03:23 on Sep 29, 2019 |
# ? Sep 29, 2019 03:00 |
|
QuarkJets posted:"tedius" should be "tedious" quote:I challenge the accuracy of this statement. Pip is attached to a specific version of Python; other versions of Python require their own pip. If you install something via pip, you're installing it for that version of Python. If you install something with Pip, but that something needs a different version of Python (or any pip-installed package) than what's currently in your environment, then that's a packager issue, not a pip issue; pyflow will suffer from this same problem, as people sometimes mispackage things (e.g. specifying you need python 3.4 when your script uses python 3.6 features). Boris Galerkin posted:You got that error because your path is set up so that 2.7 got found first. Whenever you activate a virtualenv/conda environment the activation script just prepends things to your PATH so that the new 3.7 version is found first. duck monster posted:What? Yes it does. I install a package into a virtualenv. If it also has a dependency, it gets installed too. Wheres the problem here? Dominoes fucked around with this message at 10:50 on Sep 29, 2019 |
# ? Sep 29, 2019 10:03 |
|
I think part of the problem with all of these different packaging solutions and with yours finding a niche is the success Python has had among people whose job description isn't "writing software in python". It's a problem that usually only system scripting languages like sh or powershell have. That means that you have to consider two environments, or maybe two workflows basically entirely separately:

- The developer environment/workflow. This is people who write Python as their primary job. They produce a piece of code in Python and they are committed to either a) defining which interpreters it will work under for people in workflow 2 to use and letting people in workflow 2 worry about external compatibilities; or b) producing a packaged application that (usually) bundles an interpreter
- The user workflow. This is people who don't write Python as their primary job and don't produce packaged output but use Python a lot. This is everybody who uses jupyter, most people who use conda, most people who use py(xy) or spyder or whatever. This is also, ironically enough, things like Mac OSX. They manually handle interpreter installs and expect to not have to think about them. They also expect to consume packaged outputs from the first workflow.

There really hasn't been a tool that successfully handles both workflows. Probably pyenv/tox combined with first pipenv and now poetry is the best developer workflow, which requires managing multiple interpreter installations, automating tests, automating dependencies, and primarily focusing on the package you're developing. venv/virtualenvwrapper and plain pip or conda are the thing that works for the second workflow. Your tool seems targeted at the first workflow, and that's fine! Because pipenv is slow as dogshit and it's annoying to have to jump through hoops with path shims to make sure your dev environment doesn't stomp on your system python or whatever. So when people critique it they should keep in mind what it's aimed at.
Also I don't think this is you (let me know if it is) but somebody else on the python discourse, who mostly works with nix, is doing a similar import rewriter: https://github.com/nix-community/nixpkgs-pytools/#python-rewrite-imports https://discuss.python.org/t/allowing-multiple-versions-of-same-python-package-in-pythonpath/2219/6
|
# ? Sep 29, 2019 13:52 |
|
edit: gently caress it. hipsters poison everything. deleted
|
# ? Sep 29, 2019 14:00 |
|
duck monster posted:edit: gently caress it. hipsters poison everything. deleted What?
|
# ? Sep 29, 2019 14:07 |
|
duck monster posted:edit: gently caress it. hipsters poison everything. deleted Thanks Obama
|
# ? Sep 29, 2019 14:35 |
|
conda and jupyter existing primarily for people who do not write python is a... interesting take
|
# ? Sep 29, 2019 23:24 |
|
QuarkJets posted:conda and jupyter existing primarily for people who do not write python is a... interesting take It's absolutely true for jupyter, from what I understand. The notebooks are great for data science stuff, and data scientists are not necessarily experienced developers.
|
# ? Sep 30, 2019 00:03 |
|
Solumin posted:It's absolutely true for jupyter, from what I understand. The notebooks are great for data science stuff, and data scientists are not necessarily experienced developers. Jupyter is great for conveying ideas as/in code clearly to other people. Data science or not. Conda just plain rocks. There are so many things that are a bitch and a half to pip install on Windows, usually due to binaries/PATH issues, that exist on conda in one form or another and "just work", particularly tensorflow/CUDA and selenium.
|
# ? Sep 30, 2019 00:18 |
|
QuarkJets posted:conda and jupyter existing primarily for people who do not write python is a... interesting take I don't mean it like "for simpletons who could not possibly be programmers", I mean it's much rarer that they're used by people whose goal is to ship python code (except for people wanting to ship python code for people to use in jupyter). It's a much stronger point about jupyter than conda though I'll admit
|
# ? Sep 30, 2019 00:57 |
CarForumPoster posted:Jupyter is great for conveying ideas as/in code clearly to other people. Data science or not. I'd be interested in examples of this. I've always thought Jupyter looked cool but never found anything to use it for.
|
|
# ? Sep 30, 2019 01:19 |
|
CarForumPoster posted:If you're loading them multiple times, pd.read_pickle() is really flexible and WAY WAY faster to load than a CSV. Almost as fast as a feather but I've had issues with feathers before where pickles "just work" Hm, the [url=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_pickle.html]docs[/url] for read_pickle don't really show much flexibility. Ideally I'd like something where I could read things in either chunks or based on a filter, but pickle looks like it reads in the whole thing. Suppose I could do sqlalchemy, but while I know SQL I've always found Python implementations to be annoyingly verbose. HDF5 looks interesting but I couldn't wrap my head around it the last time I looked.
|
# ? Sep 30, 2019 01:34 |
|
Solumin posted:It's absolutely true for jupyter, from what I understand. The notebooks are great for data science stuff, and data scientists are not necessarily experienced developers. Notebooks are used in many fields for many things, including development of applications for end-users. Netflix has a huge codebase of notebooks that are used for a lot more than just data science. I personally wouldn't write user-facing software using notebooks, because I prefer pycharm for that, but it's a thing that happens. And then there's the implication that people who write software cannot also perform data science or vice versa, which is also a weird thing to suggest. In some contexts those roles aren't even distinguishable
|
# ? Sep 30, 2019 01:48 |
|
IME, conda and jupyter are used by scientists. Professional programmers don't seem to use either that much.
|
# ? Sep 30, 2019 01:50 |
|
You can be both a scientist and a professional programmer. Not everyone programs to make websites or mobile apps.
|
# ? Sep 30, 2019 01:58 |
|
pmchem posted:You can be both a scientist and a professional programmer. Not everyone programs to make websites or mobile apps. I guess I erred in thinking it obvious that I meant professional programmers who are not scientists.
|
# ? Sep 30, 2019 02:17 |
|
QuarkJets posted:Notebooks are used in many fields for many things, including development of applications for end-users. Netflix has a huge codebase of notebooks that are used for a lot more than just data science. I personally wouldn't write user-facing software using notebooks, because I prefer pycharm for that, but it's a thing that happens. I hadn't heard about the Netflix notebooks before, that's really interesting! I definitely did not mean to imply that.
|
# ? Sep 30, 2019 02:18 |
|
Tayter Swift posted:Hm, the [urlhttps://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_pickle.html]docs[/url] for read_pickle doesn't really show much flexibility. Ideally I'd like something where I could read something in either in chunks or based on a filter, but pickle looks like it reads in the whole thing. You’re going to need something row based to have chunking supported out of the gate more than likely. HDF might be a format to consider. Pandas has a built in function, supports compression, chunk size, etc. If you decide to go the SQL route, pd.read_sql_query() is really straight forward to use.
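A sketch of the chunked-SQL version, using a tiny in-memory sqlite table as a stand-in for the real database:

```python
import sqlite3

import pandas as pd

# Tiny in-memory table standing in for a multi-GB database.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"id": range(10), "val": range(10)}).to_sql("t", conn, index=False)

# chunksize turns the result into an iterator of DataFrames, and the WHERE
# clause filters in the database, so the full table never sits in memory.
sizes = [len(chunk)
         for chunk in pd.read_sql_query(
             "SELECT id, val FROM t WHERE val >= 4", conn, chunksize=3)]
print(sizes)  # [3, 3]
conn.close()
```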
|
# ? Sep 30, 2019 04:10 |
|
Phobeste posted:I think part of the problem with all of these different packaging solutions and with yours finding a niche is the success Python has had among people whose job description isn't "writing software in python". It's a problem that usually only system scripting languages like sh or powershell have. That means that you have to consider two environments, or maybe two workflows basically entirely separately: That's a useful categorization. I need to do some thinking on what the intended audience is, and how this fits. Currently, the intended audience is based on inconveniences I've encountered myself. I don't fit neatly into either of those categories, but suspect that enough people do that they're useful. I agree that the normal project-based functionality falls more neatly into your first category. The one-off-script functionality uses much of the same implementation details as the project code, but I think its use is quite different; more along your second category. I don't have any professional/team dev experience, so may be missing the big picture. Anecdotes, from my workflow:

- I have a single script file for trying to implement an integration or matrix algo for following along with a class or textbook. Do I make a venv? Just install numpy and ipython on the sys Py? The latter usually wins, even though some people may not like it. Excellent use-case for having a global Conda installation where you dump all tools in, without breaking the system.
- I'm working on several reasonably-complex Py projects concurrently. PyCharm handles venv use nicely after some setup, but without it, I find the normal venv workflow a pain. I'm surprised to hear so many people like it, when (without PyCharm's auto-activation) it feels like a mess of console commands, typing long paths I've memorized, etc. each time I want to reboot / re-open my project / switch to a new project. Pipenv and Poetry tackle this, but both introduce their own hassles on par with the ones I described (i.e. Pipenv being slow, failing to resolve deps, and sometimes not installing correctly, and Poetry not working with Py3 without diving into its Github issues, installing Pyenv, etc.).
- Installing a tool, and running into installation instructions like this. I.e. not supporting a major operating system? Pointing to another tool for Mac? Starting the instructions for Linux with "Clone the repo, manually modify the path and add-to-the-shell (?)"? Use a separate project that exists just to install it? Maybe not-that-difficult, but also not-that-user-friendly.

Dominoes fucked around with this message at 06:52 on Sep 30, 2019 |
# ? Sep 30, 2019 06:20 |
|
Dominoes posted:- I have a single script file for trying to implement an integration or matrix algo for following along with a class or textbook. Do I make a venv? Just install numpy and ipython on the sys Py? The latter usually wins, even though some people may not like it. Excellent use-case for having a global Conda installation where you dump all tools in, without breaking the system. My workflow for this: code:
Dominoes posted:- [...] and Poetry not working with Py3 without diving into its Github issues, installing Pyenv etc) code:
Dominoes posted:- Installing a tool, and running into installation instructions like this. Ie not supporting a major operating system? Pointing to another tool for Mac? Starting the instructions for Linux with "Clone the repo, manually modify the path and add-to-the-shell (?)"? Use a separate project that exists just to install it? Maybe not-that-difficult, but also not-that-user-friendly. Homebrew is a package manager for macOS. Think of it like an app store, like Steam. You don't need to use Steam, you can download and install the games on your own. But using Steam lets you update and uninstall all the games from one place. Same concept with Homebrew. As far as the Linux instructions go, yeah I could see how it looks daunting and it could probably be simplified a bit or better explained. But those instructions aren't really complex or out of the ordinary from a Linux point of view. This is probably why they point Mac users to homebrew, because someone else has already figured out all the steps needed to get pyenv installed and running so you'd just need to type something like brew install pyenv and be done with it.
|
# ? Sep 30, 2019 12:40 |
|
I have a list block_list of 4000 blocked domain names. I need to check each string in a dataframe inspection_list["string"] of 10M strings to see if the string is a blocked domain name. If so, mark it True, else False. There's gotta be a faster or parallelizable way better than: code:
|
# ? Oct 1, 2019 03:14 |
|
Use .isin()
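i.e. something like this (column names assumed from your post, tiny data for illustration):

```python
import pandas as pd

block_list = ["bad.com", "evil.net", "spam.org"]
inspection_list = pd.DataFrame(
    {"string": ["good.com", "evil.net", "fine.io", "spam.org"]})

# .isin hashes block_list once, then does a single vectorized membership
# pass over the column instead of a Python-level loop per row.
inspection_list["blocked"] = inspection_list["string"].isin(block_list)
print(inspection_list["blocked"].tolist())  # [False, True, False, True]
```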
|
# ? Oct 1, 2019 03:27 |
|
vikingstrike posted:Use .isin() This is very fast. On a 500K row subset: code:
code:
|
# ? Oct 1, 2019 04:52 |
|
A fun solution would be a bloom filter of the blocked list, then confirming any hits against the actual blocked list. Iirc this is how chrome implements checking for sites that have malware.
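A toy version, just to show the idea (the size and hash count are arbitrary here; a real one would tune them to the expected item count):

```python
import hashlib

class BloomFilter:
    """Toy bloom filter: can give false positives, never false negatives."""

    def __init__(self, size=8192, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted sha256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for domain in ["bad.com", "evil.net"]:
    bf.add(domain)

print(bf.might_contain("bad.com"))  # True: no false negatives
# A miss here rules the domain out cheaply; a hit would still need
# confirming against the real block_list, since false positives happen.
print(bf.might_contain("good.com"))
```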
|
# ? Oct 1, 2019 07:49 |
|
Hmmm, no help on my callback question? (It's okay, I figured it out) Oh well, I have another question. I'm struggling with laying out the proper design pattern to implement something. Basically, I want the user to be able to queue up a series of tasks (in any particular order) but have them execute sequentially. However, I want said sequential execution to take place in a thread that won't block my main program. So kind of like if I hired an assistant - I give him or her my to-do list (the chores can only be done one at a time) and meanwhile, I'm free to do whatever. Now to add a twist - the tasks are actually all async coroutines from an external module that I'm using. Thanks goons!
|
# ? Oct 3, 2019 03:26 |
|
I used rq for this type of thing (long uploads) in a Flask web app on the recommendation of someone in this thread. Otherwise users would get an HTTP layer timeout in the browser waiting for the upload to finish. https://python-rq.org/ I didn't consider the docs that great but after a while working with it it was ok. You need to kind of do things 'their way' a little bit. But it's been very set-and-forget for a long time now, I haven't had to log in and do anything on the server where the workers are running. HOWEVER there's a weirdness with getting failed jobs or removing them from the queue, it doesn't clean itself up and I have had to go in and do redis commands directly in order to remove them. There's something weird where you need a certain id but can't get it from rq directly because after x minutes the reference is removed but the thing still lives somewhere else, sorry it's been over a year since I've had to deal with that so I don't remember the details. But the short of it is getting the info on failed jobs was not as easy as the rest of it was. (And it may be improved in some updates). Anyway the easiest fix was actually to install the web-based management UI that ships alongside it and remove stuff from the failed queue manually that way. The other well known task queue library is Celery. mr_package fucked around with this message at 17:40 on Oct 3, 2019 |
# ? Oct 3, 2019 17:31 |
|
Cyril Sneer posted:Hmmm, no help on my callback question? (Its okay, I figured it out) Create a queue https://docs.python.org/3/library/queue.html#module-queue Create a thread https://docs.python.org/3/library/threading.html#module-threading Have the thread run a while loop that checks for tasks in the queue
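Putting those pieces together with the async twist, something like this (a sketch; asyncio.run gives the worker its own event loop per task):

```python
import asyncio
import queue
import threading

task_queue = queue.Queue()
results = []
STOP = object()  # sentinel telling the worker to shut down

async def chore(name):
    # Stand-in for the external module's coroutines.
    await asyncio.sleep(0.01)
    results.append(name)

def worker():
    # Runs one coroutine at a time, so queued tasks execute sequentially
    # while the main thread stays free to do other work.
    while True:
        coro = task_queue.get()
        if coro is STOP:
            break
        asyncio.run(coro)
        task_queue.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

for name in ["wash", "dry", "fold"]:
    task_queue.put(chore(name))  # queue up coroutine objects
task_queue.put(STOP)
t.join()

print(results)  # ['wash', 'dry', 'fold']
```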
|
# ? Oct 4, 2019 23:55 |
|
|
|
Reread some code I wrote before finishing Obey the Testing Goat (again, strong recommend, not sure anything has ever helped me as much as that) and realized I had actually rolled my own extremely stupid version of unittest.mock to fake an external API, return some fake data, capture the arguments used, the whole nine yards. Kinda addicting going back through my code and mocking out every last thing to create proper unit tests, though I definitely see the concerns raised about mocks tying you to a particular implementation, sometimes to the point of tests almost looking like tautologous restatements of the code being tested.
|
# ? Oct 8, 2019 02:02 |