Seventh Arrow
Jan 26, 2005

Jabor posted:

You seem to be expecting the chatbot to actually know things and I'm really not sure why you have that expectation?

So I can be smug with an emotionless robot, obviously.

Anyways, I'm sure it will murder me for my impertinence after the skynet protocol becomes active.

QuarkJets
Sep 8, 2008

Seventh Arrow posted:

So I can be smug with an emotionless robot, obviously.

Anyways, I'm sure it will murder me for my impertinence after the skynet protocol becomes active.

I miss fishmech, too

Precambrian Video Games
Aug 19, 2002



Data Graham posted:

Someone in another thread mentioned

Python code:
a = "wtf"
b = "wtf"
a is b
> True

a = "wtf!"
b = "wtf!"
a is b
> False
wtf?

You'll enjoy this, then:

Python code:
a = "abc!"; b = "abc!"; a is b
> True
... and this:
Python code:
if True:
    a = "wtf!"
    b = "wtf!"
    a is b
> True
... but what about this?
Python code:
if True:
    a = "wtf!"
    b = "wtf!"
a is b
> True in script/ipython, SyntaxError in base python repl?!

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
Something I've enjoyed in PowerShell is something called Parameter Sets. Basically, you can have a whole mess of parameters for a function, and you can define them such that what's required, optional, or even allowed depends on the other parameters you've specified. As a somewhat simple example, if you were writing a function that added a user to a group, you would need a username, and then either a group name or a group id. You could configure this easily in PowerShell, such that the interpreter would handle the case where someone passed in a username and nothing else, or passed in both a group name and a group id.

Is there anything like that in Python? I know I could make both group_name and group_id optional, and then in the code raise a ValueError if both group_name and group_id are given, or if neither is given, but that's a lot of boilerplate code. I know with typing you can import overload so that you can at least define those cases for type hinting, but nothing at run time will actually enforce that.

For my case I'm working with this godawful Grouper API and there are some API endpoints that will do a ton of things based on what parameters are sent in, but I'd like to do some sanity checking beforehand because the errors returned from the API are often not clear. Also, the API will often do stupid things, so passing in "invalid" parameters in a certain combination might not cause any errors but might cause unexpected results. As I'm thinking to myself: maybe I write one mega function that takes everything as an optional input as a private function, but then write simpler public functions that use the private one with more limited parameters, so it's much harder for someone to get themselves in trouble.
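
For reference, the ValueError version described above is only a few lines per function; a minimal sketch (add_user_to_group and its parameter names are illustrative, not the real Grouper API):

Python code:
from typing import Optional

def add_user_to_group(username: str,
                      group_name: Optional[str] = None,
                      group_id: Optional[int] = None) -> None:
    # exactly one of group_name / group_id must be supplied
    if (group_name is None) == (group_id is None):
        raise ValueError("pass exactly one of group_name or group_id")
    ...  # the actual Grouper call would go here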

Precambrian Video Games
Aug 19, 2002



I think what you want is a class for each Parameter Set that you can easily convert to a dict of kwargs to pass to external APIs. You can accomplish that in a lot of ways using any of the large number of Python packages for configuration (or by rolling your own if you're restricted in what dependencies you can add). I recommend pydantic dataclasses. You can do static type checking, and pydantic will also do runtime validation on top of (almost) all of the features of standard library dataclasses. You can also nest classes if needed, although you may need to put in a bit of effort to flatten the dict if you want to pass it as kwargs to another function.

QuarkJets
Sep 8, 2008

I was going to suggest dataclasses too. What's the difference between a dataclass and a pydantic dataclass?

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
A Pydantic dataclass runs validations on the data before just returning a regular dataclass.

So would the expectation be that the person consuming my functions would be generating the correct dataclass and then passing that into a function? Or I'm using a dataclass under the hood somehow?

DoctorTristan
Mar 11, 2006

I would look up into your lifeless eyes and wave, like this. Can you and your associates arrange that for me, Mr. Morden?

eXXon posted:

You'll enjoy this, then:

Python code:
a = "abc!"; b = "abc!"; a is b
> True
... and this:
Python code:
if True:
    a = "wtf!"
    b = "wtf!"
    a is b
> True
... but what about this?
Python code:
if True:
    a = "wtf!"
    b = "wtf!"
a is b
> True in script/ipython, SyntaxError in base python repl?!

I’m guessing the original is due to string interning and the rest are compile time optimisations?
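
For what it's worth, you can take the compile-time guesswork out of it with sys.intern, which guarantees that two equal strings share one object no matter how they were compiled:

Python code:
import sys

a = sys.intern("wtf!")
b = sys.intern("wtf!")
print(a is b)  # True: both names point at the single interned copy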

Precambrian Video Games
Aug 19, 2002



QuarkJets posted:

I was going to suggest dataclasses too. What's the difference between a dataclass and a pydantic dataclass?

Like I said, pydantic is a library for configuration that makes runtime type checking and validation much easier. However, it's built around a BaseModel class that I find rather clunky to use and overkill for most use cases. The pydantic dataclass is a drop-in replacement for standard library dataclasses that adds most of the features of a BaseModel and keeps almost every other dataclass behaviour the same (there are a few quirks with some of the newest standard library dataclass features like slots).

FISHMANPET posted:

A Pydantic dataclass runs validations on the data before just returning a regular dataclass.

So would the expectation be that the person consuming my functions would be generating the correct dataclass and then passing that into a function? Or I'm using a dataclass under the hood somehow?

I think dataclasses are fairly transparent for users (especially if you use Fields) and largely leave complexity to developers, so yes, I would define functions that take a dataclass and pass on the dict-ified version to the Grouper API. An example:

Python code:
from dataclasses import asdict

from pydantic import Field, validator
from pydantic.dataclasses import dataclass

class UserConfig:
    validate_assignment = True

@dataclass(kw_only=True, config=UserConfig)
class User:
    username: str = Field("", title="username")
    group: int = Field(0, title="group number", ge=0)

    @validator('username')
    def name_must_not_be_empty(cls, v):
        if not v:
            raise ValueError('must not be empty')
        return v

try:
    user = User(username=None)
except Exception as error:
    print("Caught:\n", error)

user = User(username="jdoe", group=5)

try:
    user.username = ""
except Exception as error:
    print("Caught:\n", error)

try:
    user.group = -5
except Exception as error:
    print("Caught:\n", error)

print(user)
print(asdict(user))
If you want the class to be immutable, you can drop the validate_assignment config and set frozen=True in the dataclass decorator instead. For ensuring that only one of group id/name is specified, you can use a root validator, and you would probably have to strip out None values from the return of asdict before passing it onwards.
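
A minimal sketch of that root validator, in the same pydantic v1 style as above (GroupRef and its field names are illustrative, not from the Grouper API):

Python code:
from typing import Optional

from pydantic import root_validator
from pydantic.dataclasses import dataclass

@dataclass
class GroupRef:
    group_name: Optional[str] = None
    group_id: Optional[int] = None

    @root_validator
    def exactly_one_of_name_or_id(cls, values):
        # reject both-set and neither-set
        if (values.get('group_name') is None) == (values.get('group_id') is None):
            raise ValueError('specify exactly one of group_name or group_id')
        return values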

DoctorTristan posted:

I’m guessing the original is due to string interning and the rest are compile time optimisations?

I have no idea.

Jose Cuervo
Aug 25, 2004
I have a sqlite database with a table with the following columns: SID (the subject ID that the data belongs to), dt_index (contains the date and time that the blood glucose value was collected), and bg_mg_per_dL (the blood glucose value).

I now have a list of tuples where each tuple contains a SID and a start date and time indicating that I want to retrieve the blood glucose values for that SID starting at that start date and time and for the next 6 hours.

Is the best (fastest?) way to achieve this to use a for loop? The length of the list of tuples can be anywhere from 5 entries to 1000.

Python code:
forecasts = []
for SID, start_dt in the_matches:
    forecasts.append(c.execute(
        """SELECT dt_index, bg_mg_per_dL FROM blood_glucose
           WHERE SID == :SID
             AND datetime(dt_index) >= datetime(:date_time)
             AND datetime(dt_index) < datetime(:date_time, '+' || :hrs || ' hours')""",
        {'SID': SID, 'date_time': start_dt, 'hrs': 6}).fetchall())

QuarkJets
Sep 8, 2008

Jose Cuervo posted:

I have a sqlite database with a table with the following columns: SID (the subject ID that the data belongs to), dt_index (contains the date and time that the blood glucose value was collected), and bg_mg_per_dL (the blood glucose value).

I now have a list of tuples where each tuple contains a SID and a start date and time indicating that I want to retrieve the blood glucose values for that SID starting at that start date and time and for the next 6 hours.

Is the best (fastest?) way to achieve this to use a for loop? The length of the list of tuples can be anywhere from 5 entries to 1000.

Python code:
forecasts = []
for SID, start_dt in the_matches:
    forecasts.append(c.execute(
        """SELECT dt_index, bg_mg_per_dL FROM blood_glucose
           WHERE SID == :SID
             AND datetime(dt_index) >= datetime(:date_time)
             AND datetime(dt_index) < datetime(:date_time, '+' || :hrs || ' hours')""",
        {'SID': SID, 'date_time': start_dt, 'hrs': 6}).fetchall())

That's a reasonable approach. You could turn the big statement inside the loop into a function and replace the for loop with a list comprehension; that probably wouldn't be any faster, but I think it'd be better organized
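
A minimal sketch of that refactor (same query as above, just pulled into a helper):

Python code:
def fetch_forecast(c, sid, start_dt, hrs=6):
    # one subject's readings in [start_dt, start_dt + hrs hours)
    return c.execute(
        """SELECT dt_index, bg_mg_per_dL FROM blood_glucose
           WHERE SID == :SID
             AND datetime(dt_index) >= datetime(:date_time)
             AND datetime(dt_index) < datetime(:date_time, '+' || :hrs || ' hours')""",
        {'SID': sid, 'date_time': start_dt, 'hrs': hrs}).fetchall()

forecasts = [fetch_forecast(c, sid, start_dt) for sid, start_dt in the_matches]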

Oysters Autobio
Mar 13, 2017
I think I've already come in here with similar questions, so I apologize in advance if there's some repetition.

I'm a data analyst who mainly works in GUIs like Tableau, PowerBI, etc., but I'm interested in upskilling in python and dev-work in general around data analysis (also "data science", though realistically only simple applications of ML, since my day-to-day responsibilities aren't in building models; I also don't have a CS, heavy stats, or engineering background).

I want to learn how to do more of my work in a "devops" or software-developer way, both for my own general upskilling/interest and because I really enjoy the consistency and reproducibility of working with the same set of tools rather than GUIs (I love the concept of TDD and design patterns, and interestingly they feel less daunting than having to tackle some giant bloated Tableau dashboard, because templating in BI tools absolutely blows).

My work will pay for some online learning, so I really want to pick the best platform for me, but I've been having trouble finding something that is both super interactive (i.e. has some sort of "fake" in-browser IDE that simulates what actually doing the work would "feel" like) without being so abstracted that it's completely mad-libs / fill-in-the-blanks (DataCamp often feels like this), and that's still based around actual projects rather than trying to teach you python through rote memorization (yes, I understand I need to just "write python", but my ADHD brain doesn't work by just "pushing through" something; I have to be doing something tangible for my focus to stay on enough that I'm actually writing code).

I'd much prefer to learn how to build a python package end-to-end, even if at first I only understand like 20% of the Python techniques I'm applying, so that whatever I'm doing happens in the context of a tangible end goal. The whole "import foo and baz" poo poo is absolutely useless to me because I'll never remember it unless I've applied it and built some sort of memory of what it was doing in the context of a problem.


I get frustrated seeing the kind of examples that should only be in documentation, not an actual tutorial. poo poo like this:

code:
In [132]: columns = pd.Categorical(
   .....:     ["One", "One", "Two"], categories=["One", "Two", "Three"], ordered=True
   .....: )
   .....: 

In [133]: df = pd.DataFrame(
   .....:     data=[[1, 2, 3], [4, 5, 6]],
   .....:     columns=pd.MultiIndex.from_arrays([["A", "B", "B"], columns]),
   .....: )
   .....: 

In [134]: df.groupby(axis=1, level=1).sum()
Out[134]: 
   One  Two  Three
0    3    3      0
1    9    6      0
is frustrating because, yeah, I understand what it's doing, but since the example isn't contextualized I'll never remember it or learn to apply it in real life (this btw is a fake example since it's pulled from the pandas docs, but I have seen tutorials written like this).

If the examples were stupid but still "practical" they would work 1000% better like "John is writing an app for a lemonade stand and needs to write a function that groups lemonade drinks by brands. He has a csv with all the brands and all the drinks, here is what he would do..."

It's also tricky to find "realistic" material for using python in a low-level "applied" sort of way. I'm not a web developer, so while I don't need to learn how to build an entire app, I still don't want all the code I'm writing to be scratch notebooks filled with pandas queries, especially since what I'm asked to do is often very repeatable and could, without much work, be parameterized or at least made reusable by other colleagues.

Anyways, bit of a long rant, sorry, I had to just get it out on screen here. If anyone knows of anything that's near the combo of what I'm talking about (data-oriented, interactive/simulated environment, and project/contextually focused), it would be much appreciated by my stupid overthinking ADHD brain.

Popete
Oct 6, 2009

This will make sure you don't suggest to the KDz
That he should grow greens instead of crushing on MCs

Grimey Drawer
I have two Python scripts running independently that both access a shared resource in a library. I want to lock the library when one script is using it, but I cannot seem to get this to work using a multiprocessing.Lock, as that appears to only work for processes spawned from the same parent process. The lock appears to have no effect.

What other options do I have?

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Popete posted:

I have two Python scripts running independently that both access a shared resource in a library. I want to lock the library when one script is using it, but I cannot seem to get this to work using a multiprocessing.Lock, as that appears to only work for processes spawned from the same parent process. The lock appears to have no effect.

What other options do I have?

A queue?

QuarkJets
Sep 8, 2008

Popete posted:

I have two Python scripts running independently that both access a shared resource in a library. I want to lock the library when one script is using it, but I cannot seem to get this to work using a multiprocessing.Lock, as that appears to only work for processes spawned from the same parent process. The lock appears to have no effect.

What other options do I have?

Once the library is loaded into memory the first time, there's not really anything you can do to directly stop the other process from accessing it later, so that's out.

You can use a file-based mutex, but that's sloppy and can be error-prone if you don't set it up just right

You could have one script control the other, then you can lock access, wait around for the child process to finish doing something before you give it more work, etc.

What I think you should do is define a real-deal python package, reimplement these scripts as functions with a primary entry point and everything, and then define some kind of main() or __main__.py that handles the execution of the "scripts" (which are just modules in your package now).
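
A rough sketch of that layout (mypackage and the module/subcommand names are all placeholders):

Python code:
# mypackage/__main__.py -- single entry point for both former scripts
import sys

from mypackage import daemon, usertool  # hypothetical modules


def main() -> int:
    # dispatch to what used to be the two standalone scripts
    if sys.argv[1:2] == ["daemon"]:
        return daemon.main()
    return usertool.main(sys.argv[1:])

if __name__ == "__main__":
    raise SystemExit(main())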

Foxfire_
Nov 8, 2010

Popete posted:

I have two Python scripts running independently that both access a shared resource in a library. I want to lock the library when one script is using it, but I cannot seem to get this to work using a multiprocessing.Lock, as that appears to only work for processes spawned from the same parent process. The lock appears to have no effect.

What other options do I have?
What do you want to happen when one process has the lock and then crashes or is killed? Is whatever underlying thing the library does consistent and recoverable?

If you are on Linux, named semaphores have a mechanism for informing another process that the last semaphore owner terminated without unlocking it. On Windows, named mutexes do (named semaphores do not). Those can (1) wait until the other process is done using it and (2) tell you the other process died while using it.

Both of these will be creating an object in a kernel namespace that will exist as long as at least one handle is open to it from some process. Opening the same name from another process will give it a handle to the same kernel object.

There are probably no straightforward Python standard library wrappers for these; the standard library generally tries to be platform-independent, and this is inherently platform-specific code.
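
On Linux, one option is the third-party posix_ipc package; a rough sketch (an assumption on my part, not stdlib, and this simple form does not give you the owner-died notification described above):

Python code:
import posix_ipc

# both processes open the same kernel object by name
sem = posix_ipc.Semaphore("/i2c-bus-lock", posix_ipc.O_CREAT, initial_value=1)
sem.acquire()  # blocks while the other process holds it
try:
    ...  # touch the shared resource
finally:
    sem.release()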

Popete
Oct 6, 2009

This will make sure you don't suggest to the KDz
That he should grow greens instead of crushing on MCs

Grimey Drawer
Yes, ideally if one process dies while holding the lock I would be able to notice and recover it. To be more explicit: one process is a daemon running in the background that accesses devices over the I2C bus every 2 seconds; another process can be run by a user at any time and accesses the same devices over I2C. There is a rare chance the two collide and their I2C transactions overlap (it requires a write and then a read for this transaction to complete, and that's where the problem can occur).

I started to write my own file lock solution where I create a file under /tmp (Linux) and write the string "locked" or "unlocked" to it, which gets checked when one process wants to take the lock. Then I realized I needed a way to make one program spin on the lock, and I wasn't sure how to do that.

jaete
Jun 21, 2009


Nap Ghost

Popete posted:

I started to write my own file lock solution where I create a file under /tmp (Linux) and write the string "locked" or "unlocked" to it, which gets checked when one process wants to take the lock. Then I realized I needed a way to make one program spin on the lock, and I wasn't sure how to do that.

Maybe select? https://docs.python.org/3/library/select.html

(But as others said external files can get a bit gnarly, probs not best overall solution)

Foxfire_
Nov 8, 2010

Popete posted:

Yes, ideally if one process dies while holding the lock I would be able to notice and recover it. To be more explicit: one process is a daemon running in the background that accesses devices over the I2C bus every 2 seconds; another process can be run by a user at any time and accesses the same devices over I2C. There is a rare chance the two collide and their I2C transactions overlap (it requires a write and then a read for this transaction to complete, and that's where the problem can occur).

I started to write my own file lock solution where I create a file under /tmp (Linux) and write the string "locked" or "unlocked" to it, which gets checked when one process wants to take the lock. Then I realized I needed a way to make one program spin on the lock, and I wasn't sure how to do that.

To confirm, when you say write then read, do you mean (A) I2C write, then time passes with the bus idle, then a read, or (B) a write, a repeated start, then a read? They aren't electrically the same. The kernel driver for your I2C controller should be providing a way to do (B) and will do it atomically. (B) is common for things like "Write a register address, then read data from it" and typically chips won't accept (A) for doing that operation anyway. If you're using i2c-dev via smbus2 python wrapper, it's i2c_rdwr()

If you were doing the file locking version, you'd typically use advisory file locks (the python standard library has a wrapper in fcntl.flock()) on the dummy file, rather than writing actual content into it. Trying to acquire a lock while someone has an exclusive lock will block, and any held locks will be freed by the kernel when the holding process terminates for any reason. When you acquire the lock, it won't tell you if the last owner died while holding it (and it was released by the kernel instead), but it sounds like you probably don't care.
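
A minimal sketch of that pattern (the /tmp path is illustrative):

Python code:
import fcntl

with open("/tmp/i2c-bus.lock", "w") as lockfile:
    fcntl.flock(lockfile, fcntl.LOCK_EX)  # blocks until the lock is free
    try:
        ...  # do the write + read transaction here
    finally:
        fcntl.flock(lockfile, fcntl.LOCK_UN)
# the kernel also releases the lock if the process dies while holding it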

(also since you mentioned you were using multiprocessing and linux, be aware that you need to call multiprocessing.set_start_method("spawn"). The default start method violates POSIX and will cause random crashes/deadlocks depending on the internal implementation of other libraries and what they happen to be doing at the moment of the fork)

Popete
Oct 6, 2009

This will make sure you don't suggest to the KDz
That he should grow greens instead of crushing on MCs

Grimey Drawer
I'm using smbus version 1.1, which does not appear to have the i2c_rdwr option. Here is what the I2C write/read section looks like.

code:
from smbus import SMBus

bus = SMBus(0)
...
# select the rail by writing the page register, then read VOUT for that page
bus.write_byte_data(chip_addr, self.conf.UCD90124_PAGE_REG, self.conf.voltRails[rail][self.conf.index])
vout = bus.read_word_data(chip_addr, self.conf.UCD90124_VOUT_REG)
vout_mode = bus.read_byte_data(chip_addr, self.conf.UCD90124_VOUT_MODE_REG)
...
bus.close()
The write byte sets the page register on the chip, and then you read from VOUT_REG to get the voltage of the rail selected by the PAGE register.

That would be 2 separate bus transactions, and I don't believe it would be atomic (which I would like). So what happens is one program is writing/reading values to the chip, and the other program comes in and does the same, stomping over the first program's writes/reads and causing incorrect values to be read out.

Foxfire_
Nov 8, 2010

For SMBus/PMBus, i2c_rdwr wouldn't help you anyway. Each of those read commands is doing a repeated start internally and then a stop because that's what those specifications require.

fcntl.flock() on a file is what I'd do. Either some dummy temporary or the device file for the bus.

Popete
Oct 6, 2009

This will make sure you don't suggest to the KDz
That he should grow greens instead of crushing on MCs

Grimey Drawer
I actually tried fcntl.flock already, but I had implemented it inside the library calling the smbus functions and it did not work as I expected. This time I implemented the flock calls outside the library, surrounding the calls to the library from each separate script, and that appears to be working. Not sure what the difference would be; it's also entirely possible I screwed something up previously.

Precambrian Video Games
Aug 19, 2002



Oysters Autobio posted:

I think I've already come in here with similar questions, so I apologize in advance if there's some repetition.

I'm a data analyst who mainly works in GUIs like Tableau, PowerBI, etc., but I'm interested in upskilling in python and dev-work in general around data analysis (also "data science", though realistically only simple applications of ML, since my day-to-day responsibilities aren't in building models; I also don't have a CS, heavy stats, or engineering background).

I don't have good advice for you, but I will say that if you find pandas confusing, especially its indexing system, you are not alone. I loathe .loc and .iloc and really most things about pandas, and multiindexing just seems to make it even worse. I can't suggest an alternative, other than that astropy has (in my opinion) a more lightweight and intuitive table system plus a great module for physical units, but there's little reason for non-physicists to use it. There are plain old numpy structured arrays, I guess.

To answer your question with another question, though, have you looked into R's tidyverse? I never particularly understood it either but its proponents seem to really like it*. I gave up on R a while ago because I think it's a terrible programming language and I gather Python has better support and interoperability with ML-focused packages, but you can use both, in principle. I'd actually be interested in seeing a recent and comprehensive comparison of the two (here's just one example from half-assed searching).

* you said you don't have a heavy stats or CS background. I feel like R and pandas were both designed for and by users in social sciences and biology, who have a greater need for categorical variables (AKA factors in R) and are more used to querying databases with SQL than doing object-oriented programming.

LightRailTycoon
Mar 24, 2017
As far as I can tell, polars is a better pandas in every way, including the API.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
IIRC polars doesn't have all the functions pandas does, though I'm not enough of an expert in either to go into detail.

But yeah I've almost totally switched over to polars. The differences in ergonomics, memory usage, and especially speed vs pandas are just plain unfair.

LightRailTycoon
Mar 24, 2017
Yeah, I do have one project that uses pandas to collect the results of sql queries and output them to html or excel, no math. I’m not planning to port that over anytime soon.

bolind
Jun 19, 2005



Pillbug
I'm trying to replicate the behaviour of the Linux command 'date', but the timezone has me baffled:

Python code:
Python 3.6.8 (default, Jan 25 2023, 08:28:52) 
[GCC 8.5.0 20210514 (Red Hat 8.5.0-18)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import datetime
>>> print(datetime.datetime.now().strftime("%Z"))

>>> print(datetime.datetime.now().strftime("%z"))

>>> quit()
I've read up on matters, and learned a bit about timezone-aware and timezone-naive timestamps, which kinda makes sense, but how hard can it be to figure out which time zone the OS thinks it's in?

wolrah
May 8, 2006
what?

bolind posted:

I've read up on matters, and learned a bit about timezone-aware and timezone-naive timestamps, which kinda makes sense, but how hard can it be to figure out which time zone the OS thinks it's in?

datetime.now() is a naive object, you want datetime.now().astimezone()

code:
>>> print(datetime.datetime.now().astimezone().strftime("%z"))
-0400
>>> print(datetime.datetime.now().astimezone().strftime("%Z"))
Eastern Daylight Time
>>>

saintonan
Dec 7, 2009

Fields of glory shine eternal

bolind posted:

I'm trying to replicate the behaviour of the Linux command 'date', but the timezone has me baffled:

Python code:
Python 3.6.8 (default, Jan 25 2023, 08:28:52) 
[GCC 8.5.0 20210514 (Red Hat 8.5.0-18)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import datetime
>>> print(datetime.datetime.now().strftime("%Z"))

>>> print(datetime.datetime.now().strftime("%z"))

>>> quit()
I've read up on matters, and learned a bit about timezone-aware and timezone-naive timestamps, which kinda makes sense, but how hard can it be to figure out which time zone the OS thinks it's in?

^^ or the above

Try datetime.datetime.now().astimezone().tzinfo

bolind
Jun 19, 2005



Pillbug
Thank you both of you!

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
My team is working on a Click-based CLI suite; part of the requirements is that we have a way to bundle up all IO for logging to a remote object of some kind (a ticket right now, but somewhere more sane in the future). I looked into how to capture that IO, but all the options seemed to have weird downsides and were getting into footgun territory, so instead I expanded the set of functions we have that wrap native Click functions to do this. Yes, I implemented my own logger instead of using the native one, just so I had more control over formatting etc.

Requirements: Log all user input and stdout, as well as (in the future) additional logged info.

Python code:
import click

# getlogger(), OUTPUT and INPUT come from our logging module (not shown)
def show_user_processing_msg(msg: str) -> None:
    click.secho(msg, fg='green')
    getlogger().add_log(msg, OUTPUT)

def get_confirmation(confirm_prompt: str, **kwargs) -> bool:
    getlogger().add_log(confirm_prompt, OUTPUT)
    response = click.confirm(confirm_prompt, **kwargs)
    getlogger().add_log(response, INPUT)
    return response
(there's more there in terms of implementation, and it works for now, this is just for illustrative purposes)

I'd like to enforce the usage of these functions over the native Click ones, ideally through a test of some kind, since if someone uses the native ones (or even input() or print()) the IO won't get logged. However, there are a few problems with the most obvious approaches:

1. We have a bunch of people working on this with independent testing setups, and most of them aren't using pytest. Cutting them all over to pytest as part of this work is probably out of scope, but could be a last resort. If we did that, I could possibly set up some weird Mocks to track usage and just confirm that the number of times the native Click function was called equals the number of times the wrapper function was called.

2. We can't block calling the native functions outright, and there's other native Click stuff that still needs doing that isn't part of these IO functions, so I can't just look for 'import click' and raise a problem.

My current plans are either to have a test that does a literal string search through the modules looking for references to these, which is uh... smelly, or to just shrug and go 'welp, that's gotta be part of your manual integration testing on new features' and risk this not being implemented (or just... ctrl+F through the codebase every so often).

saintonan posted:

^^ or the above

Try datetime.datetime.now().astimezone().tzinfo

I really wish there were just a datetime.datetime.now_with_tz() function in the standard library to return a timezone-aware current datetime, because god it's irritating to have to remember how to do this every time it comes up, and more and more stuff expects you to be using TZ-aware dts
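
In the meantime it's a two-line helper; now_with_tz is the wished-for name from above, not a real stdlib function:

Python code:
import datetime

def now_with_tz() -> datetime.datetime:
    # timezone-aware "now" in the local zone, per the astimezone() trick above
    return datetime.datetime.now().astimezone()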

QuarkJets
Sep 8, 2008

Wasn't one of the requirements to capture stdout? I'm not sure that this is a great way to meet that requirement - this solution only echoes stdout in a few specific places, and it's got the adoption problem that you pointed out. You're basically creating a new interface that sits between the developers and Click. Are you planning to reimplement every Click interface that prints to stdout? What if someone needs to call a 3rd-party library that also writes to stdout - are you going to have to write new interfaces for those modules too?

I think it'd be a lot more foolproof if you wrapped the main entry point of your application(s) with something that truly captures and redirects `sys.stdout`. Then no one needs to update their code, and people can robustly add new print, echo, secho, etc. statements wherever they want. This is also easier to test because only the application entrypoint(s) need to be tested to make sure that it has the stdout capture hook.
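
A minimal sketch of that kind of tee (the session.log path is a placeholder):

Python code:
import sys

class Tee:
    # write-through wrapper: output still reaches the real stream,
    # and a copy goes to the log file
    def __init__(self, stream, logfile):
        self.stream = stream
        self.logfile = logfile

    def write(self, data):
        self.stream.write(data)
        self.logfile.write(data)

    def flush(self):
        self.stream.flush()
        self.logfile.flush()

sys.stdout = Tee(sys.stdout, open("session.log", "a"))
print("this goes to the terminal and the log")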

I don't often deal with text-based user input so I'm less sure of how to capture that reliably. If you're wrapping 3rd party functions then I tend to believe that creating drop-in replacements is the way to go, to minimize refactor work. Instead of naming that function `get_confirmation` I'd name it `confirm` and stick it in a module that's named something like "click_logged.py". The new function's signature should be using *args and **kwargs, potentially just grabbing the first argument so that it can be echoed: `confirm(text, *args, **kwargs)`.

A better programmer than me could probably direct you to some crazy decorator solution that lets you monkeypatch the click stdin methods with your wrapped version
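
Something in this direction might work; a sketch that reuses getlogger/OUTPUT/INPUT from the wrapper snippet above (untested):

Python code:
import functools

import click

_real_confirm = click.confirm

@functools.wraps(_real_confirm)
def _logged_confirm(text, *args, **kwargs):
    getlogger().add_log(text, OUTPUT)  # from the wrapper module above
    response = _real_confirm(text, *args, **kwargs)
    getlogger().add_log(response, INPUT)
    return response

click.confirm = _logged_confirm  # every click.confirm() caller now gets logging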

"I implemented my own logger instead of using the native one" gives me pause, but if you're certain that the built-in logging module doesn't have what you need then I'm not going to say it's a bad idea. If it was my group and I was reading that comment in a PR I'd say "prove it, prove that you need this" but you don't gotta do poo poo here lmao go with god

Hadlock
Nov 9, 2004

what is the go-to python distributable binary compiler in 2023? three years ago reddit said:

quote:

PyInstaller, Nuitka and cx_Freeze

Specifically I'd like to build a distributable binary for this app

https://github.com/bes-dev/stable_diffusion.openvino which is a fork of stable diffusion with CPU support, removing the need to install CUDA or even have a GPU; that improves accessibility at the cost of render times of several minutes instead of under 30 seconds

Because it has these fairly large hurdles:

Install
Python <= 3.9.0
Set up and update PIP to the highest version
Install OpenVINO™ Development Tools 2022.3.0 release with PyPI

Which are a pretty high bar to clear for the casual user

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

QuarkJets posted:

Wasn't one of the requirements to capture stdout? I'm not sure that this is a great way to meet that requirement - this solution only echoes stdout in a few specific places, and it's got the adoption problem that you pointed out. You're basically creating a new interface that sits between the developers and Click. Are you planning to reimplement every Click interface that prints to stdout? What if someone needs to call a 3rd-party library that also writes to stdout - are you going to have to write new interfaces for those modules too?

I think it'd be a lot more foolproof if you wrapped the main entry point of your application(s) with something that truly captures and redirects `sys.stdout`. Then no one needs to update their code, and people can robustly add new print, echo, secho, etc. statements wherever they want. This is also easier to test because only the application entrypoint(s) need to be tested to make sure that it has the stdout capture hook.

I don't often deal with text-based user input so I'm less sure of how to capture that reliably. If you're wrapping 3rd party functions then I tend to believe that creating drop-in replacements is the way to go, to minimize refactor work. Instead of naming that function `get_confirmation` I'd name it `confirm` and stick it in a module that's named something like "click_logged.py". The new function's signature should be using *args and **kwargs, potentially just grabbing the first argument so that it can be echoed: `confirm(text, *args, **kwargs)`.

A better programmer than me could probably direct you to some crazy decorator solution that lets you monkeypatch the click stdin methods with your wrapped version

"I implemented my own logger instead of using the native one" gives me pause, but if you're certain that the built-in logging module doesn't have what you need then I'm not going to say it's a bad idea. If it was my group and I was reading that comment in a PR I'd say "prove it, prove that you need this" but you don't gotta do poo poo here lmao go with god

Yeah, I looked into the whole redirecting-sys.stdout path, and basically it can cause weird fuckery with some libraries (potentially including Click) because it breaks some TTY stuff? Just reading through some Stack Overflow posts, it sounded like it was full of weird gotchas involving setting environment variables and such, which sounded like a bunch of pain in the rear end, especially for users of the program, which is more important to avoid than some minor confusion on the dev side. It's possible there is a nice way to do it, but I wasn't able to find anything.

FWIW: Click only has a pretty small number of actual IO options, so it's not like we're talking hundreds of functions; there's around five or so, and one of them handily replaces print, so really there's not a ton of things we need to wrap.

There's also the whole wrapping-stdin-as-well-as-stdout part, which apparently also causes OTHER weird issues. Programming is hilarious because sometimes you just go 'well, that sounds super simple', like I did when I started on this task, and now I'm halfway through reading the docs for pkgutil and a bunch of other reasonably arcane python libraries I shouldn't be touching, preparing to do string finds on entire code files.

I'll dig around a bit more and see if there's something I can at least prototype, but most of the stuff I found completely captures or redirects stdout instead of teeing it, and the best solution most people offered was 'just run it with tee, jeez', which isn't really relevant to this scenario.

Edit: Look at this poo poo: https://stackoverflow.com/questions/616645/how-to-duplicate-sys-stdout-to-a-log-file this isn't even the full thing I'm trying to do and it's still basically a bunch of arcane nightmares including literally subprocessing tee.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Hadlock posted:

what is the go-to python distributable binary compiler in 2023? three years ago reddit said:

Specifically I'd like to build a distributable binary for this app

https://github.com/bes-dev/stable_diffusion.openvino which is a fork of stable diffusion with CPU support, removing the need to install CUDA or even have a GPU; that improves accessibility at the cost of render times of several minutes instead of under 30 seconds

Because it has these fairly large hurdles:

Install
Python <= 3.9
Set up and update PIP to the highest version
Install OpenVINO™ Development Tools 2022.3.0 release with PyPI

Which are a pretty high bar to clear for the casual user

Try pyinstaller and see. Then try the others if it doesn’t work.
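
For reference, the basic PyInstaller flow is only two commands (app.py is a placeholder; a dependency tree as heavy as OpenVINO's will likely need extra flags such as --collect-all for packages PyInstaller can't detect automatically):

code:
pip install pyinstaller
pyinstaller --onefile app.py   # bundles script + interpreter into dist/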

QuarkJets
Sep 8, 2008

Falcon2001 posted:


Edit: Look at this poo poo: https://stackoverflow.com/questions/616645/how-to-duplicate-sys-stdout-to-a-log-file this isn't even the full thing I'm trying to do and it's still basically a bunch of arcane nightmares including literally subprocessing tee.

There are a couple of answers to that question that build on a very succinct Tee class, it looks like it's exactly what you need. Have you tried that?

haruspicy
Feb 10, 2023
If I have to keep track of the distances between 1500 points (i.e., the distance between each point and the 1499 others), is the best way to structure that, say in a sqlite db, just a table with 1500 records that are 1499-element int arrays?

a foolish pianist
May 6, 2007

(bi)cyclic mutation

haruspicy posted:

If I have to keep track of the distances between 1500 points (i.e., the distance between each point and the 1499 others), is the best way to structure that, say in a sqlite db, just a table with 1500 records that are 1499-element int arrays?

Why not a table with 1500 * 1499 rows with point1, point2, distance?

QuarkJets
Sep 8, 2008

haruspicy posted:

If I have to keep track of the distances between 1500 points (i.e., the distance between each point and the 1499 others), is the best way to structure that, say in a sqlite db, just a table with 1500 records that are 1499-element int arrays?

Do you really need all 1500*1499/2 pairwise distances at every time step? That's a weird requirement even for a physics simulation

What if you defined a table that was 2 integers and 1 float
1. The id of the first point
2. The id of the second point
3. The distance between the points
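
A sketch of that schema via the stdlib sqlite3 module (table and column names are illustrative):

Python code:
import sqlite3

conn = sqlite3.connect("points.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS distance (
        point_a INTEGER NOT NULL,
        point_b INTEGER NOT NULL,
        dist    REAL    NOT NULL,
        PRIMARY KEY (point_a, point_b)
    )
""")
# store each unordered pair once, e.g. always with point_a < point_b
conn.execute("INSERT INTO distance VALUES (?, ?, ?)", (1, 2, 4.25))
conn.commit()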

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

QuarkJets posted:

There are a couple of answers to that question that build on a very succinct Tee class, it looks like it's exactly what you need. Have you tried that?

Now that it's not as late and I'm not as angry, I think you're right. I got caught up in the whole 'won't work for embedded C libraries' problem and kind of forgot that none of my proposed solutions would handle that either, so it wasn't really worth worrying about.
