LightRailTycoon
Mar 24, 2017

Chin Strap posted:

Used all the time in Pandas too

And standard library string methods

QuarkJets
Sep 8, 2008

Falcon2001 posted:

I'll be honest, Python imports are kind of a murky soup to me, despite working with it a bunch. I would also be interested if anyone had a video/blog post or something that explained it without getting into the deep minutiae of import logic, simply the 'if you run from X you pull in Y' level stuff.

Python imports have some deep complexity but the hand-wavey basics are easy and will apply to like 99.9% of cases.

Files ending in ".py" are modules. Directories containing a file named "__init__.py" are modules too. In a Python session, sys.path is simply a list of directories, and you can import any module found in one of them. Modules can be nested: "import x.y.z" maps onto the directory structure x/y/z.py or x/y/z/, so long as each directory along the way has an __init__.py in it. You can use "from" to pull part of a dotted import in under a shorter name, or to pull specific objects out of a module. "from" can extract a module, but it can also extract other objects (classes, functions, etc.) defined in that module. These are all equivalent:

Python code:
import os.path
print(os.path.sep)
Python code:
from os import path
print(path.sep)
Python code:
from os.path import sep
print(sep)
In these examples, "os" is a module, "path" is a module that is inside of "os", and "sep" is a string inside of the "path" module. For the sake of simplicity, let's say this is a directory "os" with a file "path.py", and inside of that file is the string "sep". Since sep is not a module, "import os.path.sep" fails. And "from os.path import sep" succeeds because "from" lets us specify other objects to import from a module.

You can use relative imports for importing things from one module into another; the syntax mirrors a Linux filesystem path: "from ..buttz import fartz" looks for a module named "buttz" one package level up ("." is the current package, ".." is its parent, just like in a path) and tries to import an object named "fartz" from it.

Usually when people have import problems it's because they forgot to include an "__init__.py" somewhere, they accidentally created a circular import (e.g. two modules both try to import from each other), or they are trying to import from a module that isn't in sys.path. You should not be modifying sys.path yourself unless you have a good reason. "I just want this bit of code to work for the next few minutes" is a good reason. If this is code you'll be working on for a while and sharing with other people, what you probably really want is a package that can be installed with pip, which takes like 2 minutes once you know how to do it.
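To make the layout rules concrete, here's a self-contained sketch that builds a toy package on disk and then exercises both a dotted import and a relative import (all the names, pkg/sub/buttz/fartz, are made up for the demo):

Python code:
```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "pkg", "sub"))

# the __init__.py files are what make these directories importable packages
open(os.path.join(root, "pkg", "__init__.py"), "w").close()
open(os.path.join(root, "pkg", "sub", "__init__.py"), "w").close()

# pkg/buttz.py defines an ordinary object
with open(os.path.join(root, "pkg", "buttz.py"), "w") as f:
    f.write("fartz = 'hello'\n")

# pkg/sub/mod.py does a relative import: ".." is the parent package (pkg)
with open(os.path.join(root, "pkg", "sub", "mod.py"), "w") as f:
    f.write("from ..buttz import fartz\n")

sys.path.insert(0, root)  # pkg is now importable because root is on sys.path
from pkg.sub.mod import fartz

print(fartz)  # -> hello
```
Delete either __init__.py and the import dies, which is the usual failure mode described above.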

huhu
Feb 24, 2006
I feel like every definition I’ve had of imports paints them nicely. Then I ask one follow up question and it falls apart.

rowkey bilbao
Jul 24, 2023
A pydantic JSON serialization question. How do you deal with custom non-pydantic objects?
I want instances of the House class to be serialized to a simple fixed string; here's an example

Python code:
from pydantic import BaseModel, ConfigDict, GetCoreSchemaHandler
from pydantic_core import core_schema
from typing import Type, Any, Dict  

class House:
    humidity: int
    owner: str

    def __init__(self, value: str):
        self.house_name = value

    @classmethod
    def validate(cls, v=None, *args, **kwargs):
        return House(v)

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source: Type[Any], handler: GetCoreSchemaHandler
    ) -> core_schema.CoreSchema:
        return core_schema.general_plain_validator_function(cls.validate)

def test_serialize():
    class CreditorOverview(BaseModel):
        outputs: Dict[str, House]
        model_config = ConfigDict(
            json_encoders={
                House: lambda v: "*****",
            },
        )

    class FunkoCollection(BaseModel):
        outputs: Dict[str, House]

    creds1 = CreditorOverview(outputs={"secret": House("beach house")})
    # this works because of CreditorOverview.model_config.json_encoders
    print(creds1.model_dump_json())
    # this doesn't work ('Unable to serialize unknown type House')
    creds2 = FunkoCollection(outputs={"secret": House("money shack")})
    print(creds2.model_dump_json())

test_serialize()
Ideally I'd only be touching the serialization logic on the House class.
It's used as a member on a bunch of objects which get serialized too, and I'd rather not define a model_config.json_encoder on each one (also I'm not sure it would work)

I believe it can be accomplished by correctly writing House.__get_pydantic_core_schema__. I tried variants on this snippet too but couldn't get it right
Python code:
        return core_schema.no_info_after_validator_function(
            cls.validate,
            core_schema.str_schema(),
            serialization=core_schema.format_ser_schema("?????"),
        )
How do I make sure instances of House always get JSON serialized to "*****", without rounding up every object that references House, and with minimal changes to the base class logic?
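The closest shape I've found so far attaches the serializer inside __get_pydantic_core_schema__ itself, using what I believe are the intended pydantic_core helpers for this (no_info_plain_validator_function plus plain_serializer_function_ser_schema); a trimmed sketch, not verified against the full test suite:

Python code:
```python
from typing import Any, Dict, Type

from pydantic import BaseModel, GetCoreSchemaHandler
from pydantic_core import core_schema

class House:
    def __init__(self, value: str):
        self.house_name = value

    @classmethod
    def _validate(cls, v: Any) -> "House":
        # pass through existing House instances, wrap anything else
        return v if isinstance(v, House) else House(v)

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source: Type[Any], handler: GetCoreSchemaHandler
    ) -> core_schema.CoreSchema:
        # validation is permissive; JSON serialization always emits "*****"
        return core_schema.no_info_plain_validator_function(
            cls._validate,
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda v: "*****", when_used="json"
            ),
        )

class FunkoCollection(BaseModel):
    outputs: Dict[str, House]

creds = FunkoCollection(outputs={"secret": House("money shack")})
print(creds.model_dump_json())  # -> {"outputs":{"secret":"*****"}}
```
The appeal is that no model referencing House needs a json_encoders entry.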

rowkey bilbao fucked around with this message at 11:39 on Sep 26, 2023

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

QuarkJets posted:

Python imports have some deep complexity but the hand-wavey basics are easy and will apply to like 99.9% of cases.

Files ending in ".py" are modules. Directories containing a file named "__init__.py" are modules too. In a Python session, sys.path is simply a list of directories, and you can import any module found in one of them. Modules can be nested: "import x.y.z" maps onto the directory structure x/y/z.py or x/y/z/, so long as each directory along the way has an __init__.py in it. You can use "from" to pull part of a dotted import in under a shorter name, or to pull specific objects out of a module. "from" can extract a module, but it can also extract other objects (classes, functions, etc.) defined in that module. These are all equivalent:

Python code:
import os.path
print(os.path.sep)
Python code:
from os import path
print(path.sep)
Python code:
from os.path import sep
print(sep)
In these examples, "os" is a module, "path" is a module that is inside of "os", and "sep" is a string inside of the "path" module. For the sake of simplicity, let's say this is a directory "os" with a file "path.py", and inside of that file is the string "sep". Since sep is not a module, "import os.path.sep" fails. And "from os.path import sep" succeeds because "from" lets us specify other objects to import from a module.

You can use relative imports for importing things from one module into another; the syntax mirrors a Linux filesystem path: "from ..buttz import fartz" looks for a module named "buttz" one package level up ("." is the current package, ".." is its parent, just like in a path) and tries to import an object named "fartz" from it.

Usually when people have import problems it's because they forgot to include an "__init__.py" somewhere, they accidentally created a circular import (e.g. two modules both try to import from each other), or they are trying to import from a module that isn't in sys.path. You should not be modifying sys.path yourself unless you have a good reason. "I just want this bit of code to work for the next few minutes" is a good reason. If this is code you'll be working on for a while and sharing with other people, what you probably really want is a package that can be installed with pip, which takes like 2 minutes once you know how to do it.

This seems pretty reasonable. I think so many IDEs and tools quietly handle the "what directories are on your path" question that people just don't have to dig into this much. At work we use a custom sort of Python build system that pulls in dependencies; it can be pretty complex, but it's also something you don't touch directly, you just define and run.

Big Dick Cheney
Mar 30, 2007
Why can't I just import datetime? Why do I always have to do 'from datetime import datetime, timedelta', etc?

Jigsaw
Aug 14, 2008

Big Dick Cheney posted:

Why can't I just import datetime? Why do I always have to do 'from datetime import datetime, timedelta', etc?

You can. You just have to then use "datetime.datetime" and "datetime.timedelta" instead of "datetime" and "timedelta".
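Both spellings work; the module-qualified form just keeps the module name visible:

Python code:
```python
import datetime

# "datetime" here is the module; "datetime.datetime" is the class inside it
now = datetime.datetime(2023, 9, 27, 12, 0)
later = now + datetime.timedelta(days=1)
print(later)  # -> 2023-09-28 12:00:00
```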

Data Graham
Dec 28, 2009

📈📊🍪😋



To be clear, datetime is just confusing because inside the module "datetime" there is a class that is also called "datetime".

Which sucks when you're digging through a project that sometimes imports the top-level one, sometimes the second-level one. Just because whoever wrote it had no regard for future them (aka them)

Armitag3
Mar 15, 2020

Forget it Jake, it's cybertown.


from them import frustration

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
I don't have time to import time from time

nullfunction
Jan 24, 2005

Nap Ghost

Armitag3 posted:

from them import frustration as usual

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
I'm very upset that that is perfectly valid Python

QuarkJets
Sep 8, 2008

I especially hate package names that install a completely different module name. Pillow is the big offender for me, I hate that the module you import is actually named PIL. May as well call it PISS

WHERE MY HAT IS AT
Jan 7, 2011

QuarkJets posted:

I especially hate package names that install a completely different module name. Pillow is the big offender for me, I hate that the module you import is actually named PIL. May as well call it PISS

At least Pillow has the excuse of it being a fork of PIL so the import is for backwards compatibility/historical reasons. Beautiful Soup 4 with its dumb "install beautifulsoup4 (but not beautifulsoup, because that's beautifulsoup 3!) and then import bs4" can go to hell, though.

Data Graham
Dec 28, 2009

📈📊🍪😋



And things like that are mostly because of abandonware issues.

rowkey bilbao
Jul 24, 2023
Coming back to my previous question from another angle

I have a simple class
Python code:
class Gem:
    def __init__(self, value):
        self.value = value
It can be used and referenced from a pydantic BaseModel class

Python code:
from pydantic import BaseModel, ConfigDict
from pydantic.functional_serializers import PlainSerializer
from typing_extensions import Annotated

from typing import Type, Any, Dict, Union
class SimpleAsteroid(BaseModel):
    contains: Annotated[
        Gem,
        PlainSerializer(
            lambda x: f"{x.value[::-1]}", return_type=str, when_used="json"
        ),
    ]
    model_config = ConfigDict(arbitrary_types_allowed=True)
I can write
Python code:
m1 = SimpleAsteroid(contains=Gem("sploppa"))
print(m1.model_dump_json())
And it will work.

My problem is, the Gem model can be used in several places, for example all those instances are valid
Python code:
m2 = ComplicatedAsteroid(soil={"underground": Gem("sploppa")})
m3 = ComplicatedAsteroid(soil={"herb": "excellent"})
m4 = ComplicatedAsteroid(soil="rich")
This is doable by having
Python code:
class ComplicatedAsteroid(BaseModel):
    soil: Union[
        Dict[
            str, # Dict key type
            Union[ # Dict value types
                str,
                Annotated[
                    Gem,
                    PlainSerializer(
                        lambda x: f"{x.value[::-1]}", return_type=str, when_used="json"
                    ),
                ],
            ],
        ],
        str,
        None,
    ]
    model_config = ConfigDict(arbitrary_types_allowed=True)
but it gets nightmarish real quick. How can I keep the serialization logic as close to the Gem class as possible?

Can I somehow "make" Gem into an Annotation? This gets real murky for me.

e: full example

rowkey bilbao fucked around with this message at 14:10 on Sep 27, 2023

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
Why don't you just base Gem on one of the Pydantic objects?

rowkey bilbao
Jul 24, 2023
It's got a bunch of extra logic that I'd rather not touch, also it's not really my call.

Chin Strap
Nov 24, 2002

I failed my TFLC Toxx, but I no longer need a double chin strap :buddy:
Pillbug
You can do like
Python code:
Gem2 = Annotated[Gem, ...]
and use that? Pick a better name of course

Macichne Leainig
Jul 26, 2012

by VG

Data Graham posted:

And things like that are mostly because of abandonware issues.

I would say "just replace the abandonware" but I think we all know it's load-bearing abandonware at this point.

rowkey bilbao
Jul 24, 2023

Chin Strap posted:

You can do like
Python code:
Gem2 = Annotated[Gem, ...]
and use that? Pick a better name of course

Is this what you had in mind? I have a similar issue.

Python code:
from pydantic import BaseModel, ConfigDict
from pydantic.functional_serializers import PlainSerializer
from typing_extensions import Annotated
from typing import Any, Dict, Union

class GemBase:
    def __init__(self, value):
        self.value = value

Gem = Annotated[
    GemBase,
    PlainSerializer(lambda x: f"it is {x.value}", return_type=str, when_used="json"),
]

class SimpleAsteroid(BaseModel):
    contains: Gem
    model_config = ConfigDict(arbitrary_types_allowed=True)

class ComplicatedAsteroid(BaseModel):
    soil: Union[
        Dict[
            Any,
            Union[Dict, Gem],
        ], str
    ]
    model_config = ConfigDict(arbitrary_types_allowed=True)
 

m1 = SimpleAsteroid(contains=Gem("sploppa"))
m2 = ComplicatedAsteroid(soil={"underground": Gem("a diamond")})
m3 = ComplicatedAsteroid(soil={"underground": {"rocks": Gem("a spirulin")}})
 

print(m1.model_dump_json()) # OK
print(m2.model_dump_json()) # OK
print(m3.model_dump_json()) # NOK: 'Unable to serialize unknown type Gem'

rowkey bilbao fucked around with this message at 17:08 on Sep 27, 2023

Chin Strap
Nov 24, 2002

I failed my TFLC Toxx, but I no longer need a double chin strap :buddy:
Pillbug
A) You've got to give error output if you want help

B) This seems like a terrible class to work with. Can't you simplify things at all?

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams

Macichne Leainig posted:

I would say "just replace the abandonware" but I think we all know it's load-bearing abandonware at this point.

It's one of the big downsides of open source ecosystems with a centralized package repository. When someone abandons "pillow" then the community can pick it up, but the package "pillow" still belongs to that abandoned project. So everybody has to work around that limitation.

rowkey bilbao
Jul 24, 2023

Chin Strap posted:

A) You've got to give error output if you want help

B) This seems like a terrible class to work with. Can't you simplify things at all

I edited the error message in my previous post.

I'm poking around trying to find something that breaks the least amount of things.

I liked your idea but the way I implemented it changes the instance type from Gem to typing._AnnotatedAlias, which breaks down at some point in the test suite.

Is there a way to create a class that retains the initial characteristics while also containing the information pydantic needs to serialize it like I want?

What I haven't tried yet is to add a Gem__metadata__ member that replicates what I see when calling inspect.get_annotations on an annotated type.

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
I've never tried it myself, but can you use Gem and BaseModel as mixins to make a new class?

SurgicalOntologist
Jun 17, 2004

I suggest taking a step back and reviewing your whole type system, not just Gem. In particular, soil: it doesn't make sense to me that it can be so many different things (string, dict of strings, dict of gems). I mean, I can imagine the reasons. But I think if you design a proper Soil type, rather than trying to capture the different possibilities of soil with arrangements of builtin types, you will end up simplifying things a lot.

Edit: to be more specific, you don't really have a dict in the sense of mapping arbitrary keys to values. At least judging by your choice of "underground" in the example, that looks like a domain concept, not an arbitrary string.

From another angle, imagine you have your data model set up, and your next task is to implement a method that checks whether a certain gem is anywhere in the asteroid. That's going to be a nightmare with all the combinations you'll have to check.

SurgicalOntologist fucked around with this message at 21:20 on Sep 27, 2023

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Is there a simple way in Python to persist a data structure of some kind between runs, like a database? Like literally just saving a list of dictionaries to JSON and then saving it to a file would probably be fine and might be what I end up going with.

I have a one-off project to kick off a ticket campaign, so I generally want to run it for a few, then check on them, and then continue, maybe tweaking my script between runs. Mostly I want to make sure that I capture any state like ticket IDs/etc as the script goes through and then can say 'if ticket_id: continue' to skip double-making tickets.

QuarkJets
Sep 8, 2008

Falcon2001 posted:

Is there a simple way in Python to persist a data structure of some kind between runs, like a database? Like literally just saving a list of dictionaries to JSON and then saving it to a file would probably be fine and might be what I end up going with.

I have a one-off project to kick off a ticket campaign, so I generally want to run it for a few, then check on them, and then continue, maybe tweaking my script between runs. Mostly I want to make sure that I capture any state like ticket IDs/etc as the script goes through and then can say 'if ticket_id: continue' to skip double-making tickets.

do you want a database? sqlite is built in

json is good too and honestly much better if you don't need an actual database for some reason, but i hear from backend developers all the time "QuarkJets my dick can't get hard unless I'm working with a database in all of my projects" so maybe that's something you require. If you're looking for key membership among rows of data then it sounds like this may be you
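For the ticket case, the built-in sqlite3 version really is only a few lines (table and column names invented for the example; swap ":memory:" for a filename like "tickets.db" to persist between runs):

Python code:
```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tickets (ticket_id TEXT PRIMARY KEY, status TEXT)")
con.execute("INSERT INTO tickets VALUES (?, ?)", ("T-123", "open"))
con.commit()

# "have I already made this ticket?" becomes a simple lookup
row = con.execute(
    "SELECT status FROM tickets WHERE ticket_id = ?", ("T-123",)
).fetchone()
print(row)  # -> ('open',)
```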

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

QuarkJets posted:

do you want a database? sqlite is built in

json is good too and honestly much better if you don't need an actual database for some reason, but i hear from backend developers all the time "QuarkJets my dick can't get hard unless I'm working with a database in all of my projects" so maybe that's something you require. If you're looking for key membership among rows of data then it sounds like this may be you

I thank God every day I wake up and no longer am on call for databases loving up, so I think I'll just use a JSON file.

wolrah
May 8, 2006
what?

Falcon2001 posted:

I thank God every day I wake up and no longer am on call for databases loving up, so I think I'll just use a JSON file.
SQLite may be a database, but it doesn't have a lot of the things you'd usually run into problems with, so using it as a local storage format is something the developers actively promote: https://www.sqlite.org/appfileformat.html

Whatever device you're using to read this is almost certainly running multiple applications that use SQLite internally.

That said if there'd be a benefit to this file being manually editable with a plain text editor other options may be better.

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
I wouldn't think about this problem as how you save data to disk, but instead what form you want this data to take as you're using it (probably just a list or set of ticket IDs?). You could pickle that and write it to disk, or output it as JSON and write that. Don't overthink it.

Chin Strap
Nov 24, 2002

I failed my TFLC Toxx, but I no longer need a double chin strap :buddy:
Pillbug
To me it sounds like something you'd like to query. I've not used sqlite, but if it's really that low a bar to set up, it makes sense to me.

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
This is going to have 80 or fewer "rows", so honestly a pickle or a JSON file is probably perfectly fine. I'll take a look at sqlite though.

LightRailTycoon
Mar 24, 2017
On the subject of pickles, does anyone use shelve? It looks pretty nice.
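shelve wraps pickle in a dict-like on-disk store, which fits the ticket-state use case above. A minimal sketch (the temp path is made up for the demo):

Python code:
```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "state")  # made-up location

with shelve.open(path) as db:  # string keys, pickled values
    db["tickets"] = [{"ticket_id": "T-123", "status": "open"}]

with shelve.open(path) as db:  # reopen "on the next run" and read it back
    saved = db["tickets"]

print(saved[0]["ticket_id"])  # -> T-123
```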

Son of Thunderbeast
Sep 21, 2002
I wrote a script that ran beautifully for the first couple months, but in the past week or two it's gotten suddenly and significantly slower. I'm not sure if it's because of one of the updates I pushed but that's my current guess.

I'm running it now with cProfile to see what it can show me, but does anyone have some good beginner tips for troubleshooting performance issues?

It's kinda funny because I've been in support for years, and I have always hated and avoided troubleshooting performance issues, but now I can't avoid it

spiritual bypass
Feb 19, 2008

Grimey Drawer
Does it read through some data set? Has it grown over time?

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
I personally recommend PyInstrument as a profiler because it only shows you the places where your code spends substantial time, and its HTML output is nice. IIRC cProfile shows you every operation/call, many of which will be irrelevant to the program's speed.

Anyway, your question is pretty broad. Sometimes the issue comes from a bit of code that works fine when your dataset is small and not so fine when it isn't. I once had a small section of code consuming 25% of the program's runtime because it repeatedly called min() on a very large, growing set. min() is O(n) by itself, and it was running on every iteration. Tracking the minimum value through another means sped up that section over 500x, and that bit of code didn't even register as noteworthy in the profiler anymore. :shrug:
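The min() fix looks something like this (toy numbers, not the original code):

Python code:
```python
import random

vals = [random.random() for _ in range(5000)]

# Naive: recompute min() over the growing set on every iteration (O(n) each time)
seen = set()
naive_min = None
for v in vals:
    seen.add(v)
    naive_min = min(seen)

# Tracked: keep a running minimum instead (O(1) per iteration)
tracked_min = float("inf")
for v in vals:
    if v < tracked_min:
        tracked_min = v

print(naive_min == tracked_min)  # -> True
```
Same answer either way; the second version just doesn't rescan the whole set every time.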

Son of Thunderbeast
Sep 21, 2002

spiritual bypass posted:

Does it read through some data set? Has it grown over time?

It does read through sets of data, but the sets are always within a certain size range and the data the script operates on doesn't change much.

And yeah sorry for how vague and broad it is, but I'm a relative beginner, especially when it comes to troubleshooting a performance issue, so I'm just looking for broad and basic starting points.

I'll give pyinstrument a look, thanks!

Son of Thunderbeast fucked around with this message at 02:37 on Oct 1, 2023

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
If it's gotten significantly slower recently, the first thing I'd do is grab an older version of the script that predates the slowdown.

This does two things:
- Lets you confirm that the slowdown was caused by a code change (or tells you to look elsewhere if it turns out the old version has also mysteriously gotten slower!)
- Gives you a baseline for comparing profiles - if something takes a long but roughly equal time in both profiles, then it might be a good candidate for general optimization, but isn't actually the cause of the regression you're investigating. If something takes a long time with the new code but not with the old one then that's where you want to focus.

Jabor fucked around with this message at 02:24 on Oct 1, 2023

Son of Thunderbeast
Sep 21, 2002

Jabor posted:

If it's gotten significantly slower recently, the first thing I'd do is grab an older version of the script that predates the slowdown.
oh my god of course lol, thank you.

Between this and initial profiling tests, I think I found the issue--the slowdowns started around the time I added an extremely hacky dupe check to my already extremely hacky index ID generator, which gets called a lot.

I knew when I was writing this that it was hosed up, because I was tired and googling libraries and shoving parts in and twisting them until they fit and did the thing I wanted. It worked fine enough except about once every 250,000 calls or so, it would throw an exception because of a hash collision. So I threw the while loop in there as a quick duct tape job.

I'm going to work on fixing this, but if anyone has some tips I'm all ears.

code:
def generate_index(id: int, table_name: str):
    s = sql_module.Tools()
    existing_pids = set(s.get_columns(table=table_name, cols=["index"])["index"].tolist())
    p_int = random.randint(1, 65000)
    p_salt = p_int.to_bytes(16, "little")
    index = pyblake2.blake2b(data=bytes(id), salt=p_salt)
    while index in existing_pids:
        p_int = random.randint(1, 65000)
        p_salt = p_int.to_bytes(16, "little")
        index = pyblake2.blake2b(data=bytes(id), salt=p_salt)
    return index.hexdigest()
Also if this is notably and hilariously bad please let me know how so I can laugh too
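For comparison, here's the kind of thing I'm considering: if the index only needs to be unique rather than derived from id, the stdlib can do it collision-free with no table lookup or retry loop at all (a sketch, not wired to my schema):

Python code:
```python
import uuid

def generate_index() -> str:
    # uuid4 is 122 random bits; collisions are not a practical concern,
    # so there's no need to fetch existing IDs or loop until unique
    return uuid.uuid4().hex

idx = generate_index()
print(len(idx))  # -> 32
```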

Son of Thunderbeast fucked around with this message at 03:32 on Oct 1, 2023
