|
Jabor posted:You seem to be expecting the chatbot to actually know things and I'm really not sure why you have that expectation? So I can be smug with an emotionless robot, obviously. Anyways, I'm sure it will murder me for my impertinence after the skynet protocol becomes active.
|
# ? May 18, 2023 04:48 |
|
|
# ? May 28, 2024 14:32 |
|
Seventh Arrow posted:So I can be smug with an emotionless robot, obviously. I miss fishmech, too
|
# ? May 18, 2023 05:28 |
|
Data Graham posted:Someone in another thread mentioned You'll enjoy this, then: Python code:
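The snippet itself didn't paste, but given the later reply about string interning it presumably played with `is` on equal strings. A sketch of that kind of surprise — note this is CPython implementation behaviour, not a language guarantee:

```python
import sys

a = "python"
b = "python"
print(a is b)  # True in CPython: identical literals in one module are interned

c = "".join(["py", "thon"])
print(c == a)  # True: equal by value
print(c is a)  # False: strings built at runtime aren't interned automatically

print(sys.intern(c) is a)  # True: explicit interning returns the shared object
```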
Precambrian Video Games fucked around with this message at 14:19 on May 18, 2023 |
# ? May 18, 2023 14:13 |
|
Something I've enjoyed in PowerShell is something called Parameter Sets. Basically, you can have a whole mess of parameters for a function, and you can define them such that what's required or optional or even allowed is dependent on the other parameters you've specified. As a somewhat simple example, if you were writing a function that added a user to a group, you would need a username, and then either a group name or a group id. You could configure this easily in PowerShell, such that the interpreter would handle the case where someone passed in a username and nothing else, or passed in both a group name and a group id.

Is there anything like that in Python? I know I could make both group_name and group_id optional, and then in the code raise a ValueError if both group_name and group_id are given, or if neither is given, but that's a lot of boilerplate code. I know with typing you can import overload so that you can at least define those cases for type hinting, but nothing at run time will actually enforce that.

For my case I'm working with this godawful Grouper API, and there are some API endpoints that will do a ton of things based on what parameters are sent in, but I'd like to do some sanity checking beforehand, because the errors returned from the API are often not clear. Also, the API will often do stupid things, so passing in "invalid" parameters in a certain combination might not cause any errors but might cause unexpected results.

Also, as I'm thinking to myself: maybe I write one mega function that takes everything as an optional input as a private function, but then write simpler functions that use the private function with more limited parameters, so it's much harder for someone to get themselves in trouble.
|
# ? May 18, 2023 16:10 |
|
I think what you want is a class for each Parameter Set that you can easily convert to a dict of kwargs to pass to external APIs. You can accomplish that in a lot of ways using any of the large number of Python packages for configuration (or by rolling your own if you're restricted in what dependencies you can add). I recommend pydantic dataclasses: you get static type checking, and pydantic will also do runtime validation, on top of (almost) all of the features of standard library dataclasses. You can also nest classes if needed, although you may need to put in a bit of effort to flatten the dict if you want to pass them as kwargs to another function.
Precambrian Video Games fucked around with this message at 16:28 on May 18, 2023 |
# ? May 18, 2023 16:25 |
|
I was going to suggest dataclasses too. What's the difference between a dataclass and a pydantic dataclass?
|
# ? May 18, 2023 21:41 |
|
A Pydantic dataclass runs validations on the data before just returning a regular dataclass. So would the expectation be that the person consuming my functions would be generating the correct dataclass and then passing that into a function? Or I'm using a dataclass under the hood somehow?
|
# ? May 19, 2023 02:47 |
|
eXXon posted:You'll enjoy this, then: I’m guessing the original is due to string interning and the rest are compile time optimisations?
|
# ? May 19, 2023 10:31 |
|
QuarkJets posted:I was going to suggest dataclasses too. What's the difference between a dataclass and a pydantic dataclass?

Like I said, pydantic is a library for configuration that makes runtime type checking and validation much easier. However, it's built around a BaseModel class that I find rather clunky to use and overkill for most use cases. The pydantic dataclass is a drop-in replacement for standard library dataclasses that adds most of the features of a BaseModel and keeps almost every other dataclass behaviour the same (there are a few quirks with some of the newest standard library dataclass features, like slots).

FISHMANPET posted:A Pydantic dataclass runs validations on the data before just returning a regular dataclass.

I think dataclasses are fairly transparent for users (especially if you use Fields) and largely leave complexity to developers, so yes, I would define functions that take a dataclass and pass on the dict-ified version to the Grouper API. An example: Python code:
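A minimal sketch of the pattern, shown here with standard-library dataclasses (pydantic's `dataclass` decorator is a drop-in replacement that adds runtime type validation on top of this). The class name, function name, and payload shape are all assumptions, not the real Grouper API:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class GroupAssignment:
    username: str
    group_name: Optional[str] = None
    group_id: Optional[int] = None

    def __post_init__(self):
        # Enforce "exactly one of group_name / group_id" at construction time
        if (self.group_name is None) == (self.group_id is None):
            raise ValueError("provide exactly one of group_name or group_id")

def add_user_to_group(assignment: GroupAssignment) -> dict:
    # Hypothetical Grouper call: pass the dict-ified dataclass as the payload
    return {k: v for k, v in asdict(assignment).items() if v is not None}

print(add_user_to_group(GroupAssignment("alice", group_name="admins")))
# {'username': 'alice', 'group_name': 'admins'}
```

The consumer builds the dataclass (which validates itself), and the function only has to worry about forwarding it.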
DoctorTristan posted:I’m guessing the original is due to string interning and the rest are compile time optimisations? I have no idea.
|
# ? May 20, 2023 04:36 |
|
I have a sqlite database with a table with the following columns: SID (the subject ID that the data belongs to), dt_index (contains the date and time that the blood glucose value was collected), and bg_mg_per_dL (the blood glucose value). I now have a list of tuples where each tuple contains a SID and a start date and time indicating that I want to retrieve the blood glucose values for that SID starting at that start date and time and for the next 6 hours. Is the best (fastest?) way to achieve this to use a for loop? The length of the list of tuples can be anywhere from 5 entries to 1000. Python code:
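A sketch of the kind of loop in question. The table name `bg_readings` is an assumption, and it assumes dt_index stores ISO-8601 text timestamps (which sort chronologically as strings):

```python
import sqlite3

def fetch_bg_window(conn, sid, start, hours=6):
    # One parameterized query per (SID, start) tuple; sqlite's datetime()
    # modifier computes the end of the 6-hour window
    return conn.execute(
        "SELECT dt_index, bg_mg_per_dL FROM bg_readings "
        "WHERE SID = ? AND dt_index >= ? AND dt_index < datetime(?, ?)",
        (sid, start, start, f"+{hours} hours"),
    ).fetchall()

# results = [fetch_bg_window(conn, sid, start) for sid, start in tuples]
```

An index on (SID, dt_index) would keep each lookup fast even at 1000 tuples.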
Jose Cuervo fucked around with this message at 21:26 on May 25, 2023 |
# ? May 25, 2023 21:24 |
|
Jose Cuervo posted:I have a sqlite database with a table with the following columns: SID (the subject ID that the data belongs to), dt_index (contains the date and time that the blood glucose value was collected), and bg_mg_per_dL (the blood glucose value). That's a reasonable approach. You could turn the big statement inside the loop into a function and replace the for loop with a list comprehension; that probably wouldn't be any faster, but I think it'd be better organized.
|
# ? May 25, 2023 22:22 |
|
I think I've already come in here with similar questions, so I apologize in advance if there's some repetition. I'm a data analyst who mainly works in GUIs like Tableau, PowerBI etc., but I'm interested in upskilling in Python and dev-work in general around data analysis (also "data science", but only the simple applications of ML, to be realistic: my day-to-day responsibilities aren't in building models, and I don't have a CS, heavy stats, or engineering background).

I want to learn how to do more of my work in a "devops" or software-developer way, both for my own general upskilling/interest and because I really enjoy the consistency and reproducibility of working with the same set of tools that aren't based on GUIs (I love the concept of TDD and design patterns, and interestingly they feel less daunting than having to tackle some giant bloated Tableau dashboard, because templating in BI tools absolutely blows).

My work will pay for some online learning, so I really want to pick the best platform for me, but I've been having trouble finding something that is both super interactive (i.e. has some sort of "fake" web IDE that simulates what actually doing the skills would "feel" like) while not being so abstracted that it's completely mad-libs / fill-in-the-blanks (DataCamp often feels like this), and that's still based around actual projects rather than trying to teach you Python through rote memorization. Yes, I understand I need to just "write python", but my ADHD brain doesn't work with just "pushing through" something; I have to be actually doing something tangible for my focus to stay on enough that I'm actually writing code. I'd much prefer to learn how to build a Python package end-to-end, and at first only understand like 20% of the Python techniques themselves, so that when I'm applying something it's actually being done in the context of a tangible end goal.
The whole "import foo and baz" poo poo is absolutely useless to me because I'll never remember that unless I've applied it and built some sort of memory of what this was doing in the context of a problem. I get frustrated seeing the kind of examples that should only be in documentation, not an actual tutorial. poo poo like this: code:
If the examples were stupid but still "practical" they would work 1000% better, like: "John is writing an app for a lemonade stand and needs to write a function that groups lemonade drinks by brand. He has a csv with all the brands and all the drinks; here is what he would do..."

It's also tricky to find "realistic" material for using python in a low-level "applied" sort of way. I'm not a web developer, so while I don't need to learn how to build an entire app, I still don't want all the code I'm writing to just be scratch notebooks filled with pandas queries, especially since what I'm asked to do is often very repeatable with little work to parameterize, or could at least be made in such a way that other colleagues could re-use it.

Anyways, bit of a long rant; sorry, had to just get it out on screen here. If anyone knows of anything that's near the combo of what I'm talking about (data-oriented, interactive/simulated environment, and project/contextually focused) it would be much appreciated by my stupid overthinking ADHD brain.
|
# ? May 28, 2023 17:12 |
I have two Python scripts running independently that both access a shared resource in a library. I want to lock the library when one script is using it, but I cannot seem to get this to work using a multiprocessing.Lock, as that appears to only work on processes spawned from the same parent process. The lock appears to have no effect. What other options do I have?
|
|
# ? May 29, 2023 02:50 |
|
Popete posted:I have two Python scripts running independently that both access a shared resource in a library. I want to lock the library when one script is using it but I cannot seem to get this to work using a multiprocessing.Lock as that appears to only work on processes spawned from the same parent process. The lock appears to have no effect. A queue?
|
# ? May 29, 2023 03:29 |
|
Popete posted:I have two Python scripts running independently that both access a shared resource in a library. I want to lock the library when one script is using it but I cannot seem to get this to work using a multiprocessing.Lock as that appears to only work on processes spawned from the same parent process. The lock appears to have no effect.

Once the library is loaded into memory the first time, there's not really anything you can do to directly stop the other process from accessing it, so that's out. You can use a file-based mutex, but that's sloppy and can be error-prone if you don't set it up just right.

You could have one script control the other; then you can lock access, wait for the child process to finish doing something before you give it more work, etc.

What I think you should do is define a real-deal python package, reimplement these scripts as functions with a primary entry point and everything, and then define some kind of main() or __main__.py that handles the execution of the "scripts" (which are just modules in your package now).
|
# ? May 29, 2023 03:42 |
|
Popete posted:I have two Python scripts running independently that both access a shared resource in a library. I want to lock the library when one script is using it but I cannot seem to get this to work using a multiprocessing.Lock as that appears to only work on processes spawned from the same parent process. The lock appears to have no effect.

If you are on Linux, named semaphores have a mechanism for informing another process that the last semaphore owner terminated without unlocking it. On Windows, named mutexes do (named semaphores do not). Those can (1) wait until the other process is done using it and (2) tell you the other process died while using it.

Both of these create an object in a kernel namespace that will exist as long as at least one handle to it is open from some process. Opening the same name from another process gives it a handle to the same kernel object.

There are probably no straightforward python standard library wrappers for these; the standard library generally tries to be platform independent, and this is inherently platform-specific code.
|
# ? May 29, 2023 08:51 |
Yes, ideally if one process dies while holding the lock I would be able to notice and recover the lock. To be more explicit: one process is a daemon running in the background that accesses devices over the I2C bus every 2 seconds; another process can be run by a user at any time and accesses the same devices over I2C. There is a rare chance the two collide and their I2C transactions overlap each other (it requires a write and then a read for this transaction to complete, and that's where the problem can occur).

I started to write my own file lock solution where I create a file under /tmp (Linux) and write the string "locked" or "unlocked" to the file, which would get checked when one process wanted to take the lock. Then I remembered I needed a way to cause one program to spin on the lock but wasn't sure how to do that.
|
|
# ? May 29, 2023 16:37 |
|
Popete posted:I started to write my own file lock solution where I create a file under /tmp (Linux) and wrote the string "locked" or "unlocked" to the file which would get checked when one process wanted to take the lock. Then I remembered I needed a way to cause one program to spin on the lock but wasn't sure how to do that. Maybe select? https://docs.python.org/3/library/select.html (But as others said external files can get a bit gnarly, probs not best overall solution)
|
# ? May 29, 2023 17:02 |
|
Popete posted:Yes, ideally if one process dies while holding the lock I would be able to notice and recover the lock. To be more explicit: one process is a daemon running in the background that accesses devices over the I2C bus every 2 seconds; another process can be run by a user at any time and accesses the same devices over I2C. There is a rare chance the two collide and their I2C transactions overlap each other (it requires a write and then a read for this transaction to complete, and that's where the problem can occur).

To confirm, when you say write then read, do you mean (A) an I2C write, then time passes with the bus idle, then a read, or (B) a write, a repeated start, then a read? They aren't electrically the same. The kernel driver for your I2C controller should be providing a way to do (B) and will do it atomically. (B) is common for things like "write a register address, then read data from it", and typically chips won't accept (A) for doing that operation anyway. If you're using i2c-dev via the smbus2 python wrapper, it's i2c_rdwr().

If you were doing the file locking version, you'd typically use advisory file locks (the python standard library has a wrapper in fcntl.flock()) on the dummy file, not write actual content into the files. Trying to acquire a lock while someone holds an exclusive lock will block, and any held locks will be freed by the kernel when the holding process terminates for any reason. When you acquire the lock, it won't tell you if the last owner died while holding it (and it was released by the kernel instead), but it sounds like you probably don't care.

(also, since you mentioned you were using multiprocessing and linux, be aware that you need to call multiprocessing.set_start_method("spawn"). The default start method violates POSIX and will cause random crashes/deadlocks depending on the internal implementation of other libraries and what they happen to be doing at the moment of the fork.)

Foxfire_ fucked around with this message at 20:02 on May 29, 2023 |
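A minimal sketch of that advisory-lock pattern (Linux/Unix only, since it uses fcntl; the lock-file path and helper name are assumptions):

```python
import fcntl

LOCKFILE = "/tmp/i2c-bus.lock"  # assumed path; any dummy file works

def with_bus_lock(fn, *args, **kwargs):
    # Block until we hold an exclusive advisory lock, run the bus
    # transaction, then release. The kernel drops the lock automatically
    # if this process dies while holding it.
    with open(LOCKFILE, "w") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            return fn(*args, **kwargs)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Both the daemon and the user-run script would wrap their write-then-read sequence in with_bus_lock so the two transactions can't interleave.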
# ? May 29, 2023 19:59 |
I'm using smbus version 1.1 which does not appear to have the i2c_rdwr option in this older version. Here is what the I2C write/read section looks like.code:
That would be 2 separate bus transactions, and I don't believe it would be atomic (which I would like). So what happens is one program is writing/reading values to the chip, and the other program comes in and does the same, stomping over the first program's writes/reads and causing incorrect values to be read out.
|
|
# ? May 29, 2023 20:47 |
|
For SMBus/PMBus, i2c_rdwr wouldn't help you anyway. Each of those read commands is doing a repeated start internally and then a stop because that's what those specifications require. fcntl.flock() on a file is what I'd do. Either some dummy temporary or the device file for the bus.
|
# ? May 30, 2023 01:54 |
I actually tried fcntl.flock already, but I had implemented it in the library calling the smbus functions itself, and it did not work as I expected. This time I implemented the flock calls outside the library and surrounded the calls to the library from each separate script, and that appears to be working. Not sure what the difference would be; it's also entirely possible I screwed something up previously.
|
|
# ? May 30, 2023 04:26 |
|
Oysters Autobio posted:I think I've already come in here with similar questions, so I apologize in advance if there's some repetition.

I don't have good advice for you, but I will say that if you find pandas confusing, especially its indexing system, you are not alone. I loathe .loc and .iloc and really most things about pandas; multiindexing just seems to make it even worse. I can't suggest an alternative other than that astropy has (in my opinion) a more lightweight and intuitive table system plus a great module for physical units, but there's little reason for non-physicists to use it. There are plain old numpy structured arrays, I guess.

To answer your question with another question, though: have you looked into R's tidyverse? I never particularly understood it either, but its proponents seem to really like it*. I gave up on R a while ago because I think it's a terrible programming language, and I gather Python has better support and interoperability with ML-focused packages, but you can use both, in principle. I'd actually be interested in seeing a recent and comprehensive comparison of the two (here's just one example from half-assed searching).

* you said you don't have a heavy stats or CS background. I feel like R and pandas were both designed for and by users in the social sciences and biology, who have a greater need for categorical variables (AKA factors in R) and are more used to querying databases with SQL than to doing object-oriented programming.
|
# ? May 30, 2023 06:35 |
|
As far as I can tell, polars is a better pandas in every way, including the API.
|
# ? May 30, 2023 11:59 |
|
IIRC polars doesn't have all the functions pandas does, though I'm not enough of an expert in either to go into detail. But yeah I've almost totally switched over to polars. The differences in ergonomics, memory usage, and especially speed vs pandas are just plain unfair.
|
# ? May 30, 2023 13:05 |
|
Yeah, I do have one project that uses pandas to collect the results of sql queries and output them to html or excel, no math. I’m not planning to port that over anytime soon.
|
# ? May 31, 2023 00:30 |
|
I'm trying to replicate the behaviour of the Linux command 'date', but the timezone has me baffled: Python code:
|
# ? May 31, 2023 13:11 |
|
bolind posted:I've read up on matters, and learned a bit about timezone-aware and timezone-naive timestamps, which kinda makes sense, but how hard can it be to figure out which time zone the OS thinks it's in? datetime.now() is a naive object; you want datetime.now().astimezone() code:
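A quick sketch of the difference:

```python
from datetime import datetime

naive = datetime.now()
aware = datetime.now().astimezone()  # attaches the OS's local timezone

print(naive.tzinfo)  # None: nothing to format a %Z with
print(aware.tzinfo)  # the local zone the OS reports
print(aware.strftime("%a %b %d %H:%M:%S %Z %Y"))  # close to `date` output
```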
|
# ? May 31, 2023 14:29 |
|
bolind posted:I'm trying to replicate the behaviour of the Linux command 'date', but the timezone has me baffled: ^^ or the above Try datetime.datetime.now().astimezone().tzinfo
|
# ? May 31, 2023 14:31 |
|
Thank you both of you!
|
# ? May 31, 2023 18:24 |
|
My team is working on a Click-based CLI suite; part of the requirements is that we have a way to bundle up all IO for logging to a remote object of some kind (ticket right now, but somewhere more sane in the future); I looked into how to capture those but all the options seemed to have weird downsides and was getting into footgun territory, so instead I expanded on the number of functions we have that wrap native Click functions to do this. Yes, I implemented my own logger instead of using the native one, just so I had more control over formatting/etc. Requirements: Log all user input and stdout, as well as (in the future) additional logged info. Python code:
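A hedged sketch of the wrapper shape being described, with builtins standing in for the click calls (the names `echo` and `get_confirmation`, and the logger setup, are assumptions):

```python
import logging

logger = logging.getLogger("cli_io")

def echo(message: str = "", **kwargs) -> None:
    # Drop-in for click.echo: log the line, then emit it to the terminal
    logger.info(message)
    print(message, **kwargs)

def get_confirmation(prompt: str) -> bool:
    # Log both the prompt and the user's answer for the remote IO bundle
    logger.info("PROMPT: %s", prompt)
    answer = input(f"{prompt} [y/n]: ")
    logger.info("USER: %s", answer)
    return answer.strip().lower().startswith("y")
```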
I'd like to enforce the usage of these functions over the native Click ones, ideally through a test of some kind, since if someone bypasses them (using the native calls, or even .input() or print()) the IO won't get logged. However, there are a few problems with the most obvious approaches:

1. We have a bunch of people working on this with independent testing setups, and most of them aren't using pytest. Cutting them all over as part of this work is probably out of scope, but could be a last resort. If we did that, I could possibly set up some weird Mocks to track usage and just confirm that the number of times the native click function was called equals the number of times the wrapper function was called.

2. We don't want to block calling these functions at all, and there's other native click stuff that still needs doing that isn't part of these IO functions, so I can't just look for 'import click' and raise a problem.

My current plans are either to have a test that does a literal string search through the modules looking for references to these, which is uh...smelly, or to just shrug and go 'welp, that's gotta be part of your manual integration testing on new features' and risk this not being implemented. (or just...ctrl+f through the codebase every so often.)

saintonan posted:^^ or the above Try datetime.datetime.now().astimezone().tzinfo

I really wish there was just a datetime.datetime.now_with_tz() function in the standard library to return a timezone-aware current datetime, because god it's irritating to have to remember how to do this every time it comes up, and more and more stuff expects you to be using TZ-aware dts.

Falcon2001 fucked around with this message at 21:46 on Jun 2, 2023 |
# ? Jun 2, 2023 19:31 |
|
Wasn't one of the requirements to capture stdout? I'm not sure that this is a great way to meet that requirement: this solution only echoes stdout in a few specific places, and it's got the adoption problem that you pointed out. You're basically creating a new interface that sits between the developers and click; are you planning to reimplement every click interface that prints to stdout? What if someone needs to call a 3rd party library that also writes to stdout, are you going to have to write new interfaces for those modules too?

I think it'd be a lot more foolproof if you wrapped the main entry point of your application(s) with something that truly captures and redirects `sys.stdout`. Then no one needs to update their code, and people can robustly add new print, echo, secho, etc. statements wherever they want. This is also easier to test, because only the application entrypoint(s) need to be tested to make sure the stdout capture hook is in place.

I don't often deal with text-based user input, so I'm less sure of how to capture that reliably. If you're wrapping 3rd party functions then I tend to believe that creating drop-in replacements is the way to go, to minimize refactor work. Instead of naming that function `get_confirmation` I'd name it `confirm` and stick it in a module named something like "click_logged.py". The new function's signature should use *args and **kwargs, potentially just grabbing the first argument so that it can be echoed: `confirm(text, *args, **kwargs)`. A better programmer than me could probably direct you to some crazy decorator solution that lets you monkeypatch the click stdin methods with your wrapped version.

"I implemented my own logger instead of using the native one" gives me pause, but if you're certain that the built-in logging module doesn't have what you need then I'm not going to say it's a bad idea.
If it was my group and I was reading that comment in a PR I'd say "prove it, prove that you need this" but you don't gotta do poo poo here lmao go with god
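A minimal sketch of that entry-point capture: a small tee object assigned to sys.stdout so every print/echo lands in both places (the class name and log path are assumptions):

```python
import sys

class Tee:
    """File-like object that duplicates writes to two streams."""
    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
    def write(self, data):
        self.primary.write(data)
        self.secondary.write(data)
    def flush(self):
        self.primary.flush()
        self.secondary.flush()

# At the application entry point, before any click code runs:
# log = open("session.log", "a")
# sys.stdout = Tee(sys.stdout, log)
```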
|
# ? Jun 3, 2023 06:06 |
|
what is the go-to python distributable binary compiler in 2023? Three years ago reddit said: "PyInstaller, Nuitka and cx_Freeze"

Specifically I'd like to build a distributable binary for this app https://github.com/bes-dev/stable_diffusion.openvino which is a fork of stable diffusion that has CPU support, removing the need to install cuda or even have a GPU, which improves accessibility at the cost of render times taking several minutes instead of under 30 seconds. Because it has these fairly large hurdles:

Install Python <= 3.9.0
Set up and update PIP to the highest version
Install OpenVINO™ Development Tools 2022.3.0 release with PyPI

Which are a pretty high bar to clear for the casual user
|
# ? Jun 3, 2023 07:18 |
|
QuarkJets posted:Wasn't one of the requirements to capture stdout? I'm not sure that this is a great way to meet that requirement - this solution only echoes stdout in a few specific places, and it's got the adoption problem that you pointed out. You're basically creating a new interface that sits between the developers and click, are you planning to reimplement every click interface that prints to stdout? What if someone needs to call a 3rd party library that also writes to stdout, are you going to have to write new interfaces for those modules too? Yeah I looked into the whole redirecting sys.stdout/etc path and basically it can cause weird fuckery with some libraries (potentially including Click) because it breaks some TTY stuff? Just reading through some Stackoverflow posts and it sounded like it was full of weird gotchas involving setting environment variables and stuff, which just sounded like a bunch of pain in the rear end especially for users of the program, which is more important to avoid than some minor confusion in the dev side. It's possible there is a nice way to do it, but I wasn't able to find anything. FWIW: Click only has a pretty small number of actual IO options, so it's not like we're talking hundreds of functions; there's around five or so, and one of them handily replaces print, so really there's not a ton of things we need to wrap. There's also the whole wrapping stdin as well as stdout part, which apparently also causes OTHER weird issues. Programming is hilarious because sometimes you just go 'well that sounds super simple' like I did when I started on this task and now I'm halfway through reading the docs for pkgutil and a bunch of other reasonably arcane python libraries I shouldn't be touching and preparing to do string finds on entire code files. 
I'll dig around a bit more and see if there's something I can at least prototype out, but most of the stuff I found completely captures or redirects stdout instead of teeing it, and the best solution most people put was 'just run it with tee jeez' and that's not really relevant to this scenario. Edit: Look at this poo poo: https://stackoverflow.com/questions/616645/how-to-duplicate-sys-stdout-to-a-log-file this isn't even the full thing I'm trying to do and it's still basically a bunch of arcane nightmares including literally subprocessing tee.
|
# ? Jun 3, 2023 08:47 |
|
Hadlock posted:what is the go-to python distributable binary compiler in 2023? three years ago reddit said: Try pyinstaller and see. Then try the others if it doesn’t work.
|
# ? Jun 3, 2023 11:26 |
|
Falcon2001 posted:
There are a couple of answers to that question that build on a very succinct Tee class, it looks like it's exactly what you need. Have you tried that? QuarkJets fucked around with this message at 20:00 on Jun 3, 2023 |
# ? Jun 3, 2023 19:58 |
|
If I have to keep track of the distances between 1500 points (i.e, the distance between each point and 1499 others), is the best way to structure that, say in a sqlite db, just a table with 1500 records that are 1499 int arrays?
|
# ? Jun 3, 2023 20:46 |
haruspicy posted:If I have to keep track of the distances between 1500 points (i.e, the distance between each point and 1499 others), is the best way to structure that, say in a sqlite db, just a table with 1500 records that are 1499 int arrays? Why not a table with 1500 * 1499 rows with point1, point2, distance?
|
|
# ? Jun 3, 2023 21:41 |
|
haruspicy posted:If I have to keep track of the distances between 1500 points (i.e., the distance between each point and 1499 others), is the best way to structure that, say in a sqlite db, just a table with 1500 records that are 1499 int arrays?

Do you really need all 1500 × 1499 / 2 distances at every time step? That's a weird requirement even for physics simulation.

What if you defined a table with 2 integers and 1 float:
1. The id of the first point
2. The id of the second point
3. The distance between the points
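A toy sketch of that schema with three points, storing each unordered pair once via itertools.combinations (with 1500 points that's 1500 × 1499 / 2 = 1,124,250 rows; the table and column names are assumptions):

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE distances (
    point_a INTEGER,
    point_b INTEGER,
    distance REAL,
    PRIMARY KEY (point_a, point_b))""")

# Toy data: point id -> (x, y)
points = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (0.0, 1.0)}

rows = [
    (a, b, ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5)
    for (a, (xa, ya)), (b, (xb, yb)) in itertools.combinations(points.items(), 2)
]
conn.executemany("INSERT INTO distances VALUES (?, ?, ?)", rows)

print(conn.execute(
    "SELECT distance FROM distances WHERE point_a = 1 AND point_b = 2"
).fetchone()[0])
# 5.0
```

The composite primary key also gives you the index you'd want for lookups by pair.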
|
# ? Jun 3, 2023 21:46 |
|
|
|
QuarkJets posted:There are a couple of answers to that question that build on a very succinct Tee class, it looks like it's exactly what you need. Have you tried that? Now that it's not as late and I'm not as angry, I think you're right. I got caught up in the whole 'won't work for embedded C libraries' and kind of forgot none of my proposed solutions would do that either so it wasn't really worth worrying about.
|
# ? Jun 4, 2023 05:55 |