QuarkJets
Sep 8, 2008

FredMSloniker posted:

What I mean is, in (say) Lua, if I had a function like

Lua code:
local look_at_thing = function(s)
    return my_big_table[s]
end

local my_big_table = {["foo"] = "bar", ["baz"] = "bow"}

print(look_at_thing("foo"))

I'd get the error attempt to index a nil value (global 'my_big_table'). It works if I take off the local, but in Lua, I want to keep the scope of things as small as possible, yes? But Python has only the vaguest notions of scope, from what I understand (is there any way to keep an import from grabbing everything in a module?), so as long as my_big_table is defined before I call look_at_thing, I'm good?

Yeah Python scope is very broad, broader than most languages.

On imports: imports can be limited; this line will only import a specific function and a specific variable from "some_module.py":

Python code:
from some_module import some_function, some_variable
Some tutorials will recommend using "from some_module import *", which will import everything from "some_module.py". I don't like doing that, and most style guides advise against it.

quote:

so as long as my_big_table is defined before I call look_at_thing, I'm good?

That's right. So you've got to put that table definition somewhere near the bottom of your code (so that it can capture all of the functions that you want it to contain) but before you start actually executing anything that may need to rely on that table (e.g. before an if __name__ == "__main__": block, if you have one of those).
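
A minimal sketch of that ordering in Python (the function and table names here are just made up for illustration):

Python code:
def look_at_thing(name):
    # The lookup only runs when this function is called, so my_big_table
    # doesn't need to exist yet when the function is defined
    return my_big_table[name]

def handle_foo():
    return "foo handled"

def handle_bar():
    return "bar handled"

# Defined near the bottom so it can reference everything above it,
# but before anything that actually calls look_at_thing
my_big_table = {"foo": handle_foo, "bar": handle_bar}

if __name__ == "__main__":
    print(look_at_thing("foo")())  # prints "foo handled"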


FredMSloniker
Jan 2, 2008

Why, yes, I do like Kirby games.

QuarkJets posted:

Yeah Python scope is very broad, broader than most languages.

On imports: imports can be limited, this line will only import a specific function and a specific variable from "some_module.py":

Python code:
from some_module import some_function, some_variable

What I mean is, there's no way for me to define peasant.py so that only the get_state_complex function can be imported. But I guess that wasn't a concern when Python was being designed.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
If you name your functions starting with an _, that's a convention to signify stuff that shouldn't be called from outside the module. It also means that it won't be imported when someone does from your_module import *.

It's still technically possible to call it, but it'll look obviously fucky to any experienced Python programmer.
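
A quick sketch of the convention, reusing the peasant.py example from above (the helper name is made up):

Python code:
# peasant.py
__all__ = ["get_state_complex"]  # optional: explicitly controls what * imports

def get_state_complex():
    return "complex state"

def _secret_helper():
    # leading underscore: "please don't use this from outside the module";
    # even without __all__, names starting with _ are skipped by * imports
    return "internal detail"

# elsewhere:
from peasant import *
get_state_complex()       # fine
_secret_helper()          # NameError: * didn't import it

import peasant
peasant._secret_helper()  # still works, but looks obviously fucky in review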

Foxfire_
Nov 8, 2010

FredMSloniker posted:

Where did I mismatch?
I misread what you were trying to do; nesting f-strings didn't occur to me, especially mixed with the truthy string assignment. Implicit conversion of everything to a bool is something I think is bad, and I prefer languages where it isn't a thing (Rust) or where you at least get a warning for it (most C/C++ compilers) that you can then fix. The only benefit of it is marginally less typing, and it leads to hard-to-read code + subtle bugs (i.e. things like "" or None converting to false).

FredMSloniker posted:

why does it exist if I shouldn't use it?
Python historically grew out of a hobby project and it has lots of leftovers where stuff is the way it is because it was either easier to implement that way or came from aesthetic purity. The CPython interpreter originally did name -> object lookups using the same data structure that was used for a python-level dict, so it also exposed those dicts to the python-level code. It doesn't work that way anymore, but locals() and globals() still exist to make python dict copies of the symbol tables (and are still occasionally useful for things like debuggers or tracing code).
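
A trivial sketch of what those give you:

Python code:
x = 1

def show_names():
    y = 2
    print("x" in globals())  # True: globals() is the module-level namespace
    print(locals())          # {'y': 2}: names local to this call

show_names()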

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

Foxfire_ posted:

Implicit conversion of everything to a bool is something that I think is bad and I agree with languages where it is not a thing (rust), or you can get a warning for it (most C/C++ compilers) and then fix it. The only benefit of it is marginally less typing, and it leads to hard-to-read code + subtle bugs (i.e. things like "" or None converting to false).

I fully disagree with the 'hard to read code' part. Truthiness checks - as long as you understand that they're there - allow for much more concise, cleaner code, it just means you have to know certain assumptions. For example:

Python code:
my_list = thing_that_generates_a_list()
if not my_list:
    continue

# VS
my_list = thing_that_generates_a_list()
if len(my_list) == 0 or isinstance(my_list, None):
    continue
If you're coming from Rust, you're going to be confused by the above, because you'll go 'well of course my_list exists, you just generated it' but it's much simpler than the alternatives, and as long as you understand truthiness it's not that subtle of a bug and IMO understanding truthiness is not a particularly complex Python concept.

The biggest issue with it that I've run into is expecting the string "False" to evaluate to False, but that's a pretty edge case and I caught it immediately.

It's a possible footgun, but it gives you a lot of python-focused benefits in return; the other stuff like locals() is another category of Python traps, but in the opposite direction: Here's really powerful stuff you can do, but you probably don't want to.

Falcon2001 fucked around with this message at 18:48 on Oct 9, 2022

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Honestly, if I have one complaint about Python coming from C#, it's not truthiness checking, it's the lack of type hints / typing requirements. Some older codebases at work are huge and utterly devoid of type hints and it makes them really difficult to update. This is, of course, a pretty standard complaint and you should build this into your local coding reqs/etc, but yeah, I think it's a way bigger problem than the truthiness stuff.

Foxfire_
Nov 8, 2010

Falcon2001 posted:

it just means you have to know certain assumptions.
The problem is that people don't make consistent assumptions, either between people or for the same person in different contexts/points in time.

For example:
- the containers in collections all convert to False when empty
- the containers in queue all convert to True when empty
- a user-defined container class that didn't go out of its way to override __bool__() defaults to converting to True
- a standard library empty array converts to False (not that you should ever use that anyway)
- an unsized NumPy array converts to True
- an empty NumPy array converts to False with a DeprecationWarning because they plan to change it to be an error
- a length-1 NumPy array converts like its one member
- a length >1 NumPy array throws an error on conversion
- a file open for input at EOF converts to True. If you list()-ed it, that list would convert to False
- nan converts to True (JavaScript disagrees and I bet there are some people who would passionately argue that it should obviously be False)
- "0" is True (PHP and Perl disagree)
- document.all would be True if it existed in python (JavaScript is a strange, strange beast :))

Mixing None with converts-to-false is also prone to subtle data-dependent errors, because the conversion for strings is actually "True if and only if the string is both not None and not empty", but people frequently act like it's either a None test or an empty string test, depending on how they're thinking about their data at the time.

It also makes the and operator asymmetric, since (x and y) is not equivalent to (y and x):
False and None is False, but None and False is None
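
A few of those in code form (sticking to the cases that don't depend on library versions):

Python code:
import math

print(bool([]))        # False: empty built-in containers are falsy
print(bool("0"))       # True: any non-empty string is truthy, even "0"
print(bool(math.nan))  # True: nan is truthy

class Box:
    pass

print(bool(Box()))     # True: user classes default to truthy unless they define __bool__/__len__

# `and` returns one of its operands, so it isn't symmetric:
print(False and None)  # False
print(None and False)  # None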

There's a whole lot of subtleties, and not very much benefit to it.

Foxfire_ fucked around with this message at 20:08 on Oct 9, 2022

QuarkJets
Sep 8, 2008

Falcon2001 posted:

I fully disagree with the 'hard to read code' part. Truthiness checks - as long as you understand that they're there - allow for much more concise, cleaner code, it just means you have to know certain assumptions. For example:

Python code:
my_list = thing_that_generates_a_list()
if not my_list:
    continue

# VS
my_list = thing_that_generates_a_list()
if len(my_list) == 0 or isinstance(my_list, None):
    continue
If you're coming from Rust, you're going to be confused by the above, because you'll go 'well of course my_list exists, you just generated it' but it's much simpler than the alternatives, and as long as you understand truthiness it's not that subtle of a bug and IMO understanding truthiness is not a particularly complex Python concept.

The biggest issue with it that I've run into is expecting the string "False" to evaluate to False, but that's a pretty edge case and I caught it immediately.

It's a possible footgun, but it gives you a lot of python-focused benefits in return; the other stuff like locals() is another category of Python traps, but in the opposite direction: Here's really powerful stuff you can do, but you probably don't want to.

A few things:
1. It's more standard to use "my_list is None" / "my_list is not None" rather than isinstance; it's a little more concise too, and PEP 8 advises using is / is not for comparisons to None
2. thing_that_generates_a_list() shouldn't be able to return None, so the None check is superfluous
3. "Explicit is better than implicit." If you want to check if the returned container is empty, you should write code that actually does that instead of using an implicit truthiness conversion

Truthiness checks exchange explicitness for a tiny bit of convenience. I think it's better to be explicit and PEP8 agrees: "beware of writing `if x` when you really mean `if x is not None`". So while a truthiness check is correct code that can sometimes make things easier to read, it can also sometimes make code harder to read and can even be dangerous, as Foxfire_ is pointing out.
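
For the "is it empty" check in point 3, explicit looks something like this (reusing the hypothetical thing_that_generates_a_list from above):

Python code:
my_list = thing_that_generates_a_list()
if len(my_list) == 0:  # explicitly an emptiness check; also fails loudly if my_list is somehow None
    ...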

12 rats tied together
Sep 7, 2006

Truthy and falsey values in python are well understood at this stage of the language's life, I think.

Checking if something is not None is a code smell though. Python is an OOP language so avoid if statements, avoid type checks, especially avoid writing if statements that are actually type checks.

Isolate the thing that is conditionally None and provide a safe API for it, don't let it propagate throughout your codebase.

The place where truthy and falsey provide the most value is while loops, because the important behavior lives in one place and the truthy or falseyness of it is trivial to verify in the language's excellent suite of repl and ide tooling.

QuarkJets
Sep 8, 2008

12 rats tied together posted:

Checking if something is not None is a code smell though. Python is an OOP language so avoid if statements, avoid type checks, especially avoid writing if statements that are actually type checks.

OTOH, "is not None" is the best pattern to use for default keyword arguments that use a mutable type. But yeah, it can be a code smell

I agree that we should avoid writing code that can return None in addition to another type, and we should avoid propagating that kind of ambiguous state into our code from elsewhere
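
The mutable-default pattern mentioned above, as a sketch:

Python code:
# Don't do this: the default list is created once and shared across calls
def append_bad(item, items=[]):
    items.append(item)
    return items

# Do this: None as the sentinel, replaced inside the function
def append_good(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

print(append_bad(1), append_bad(2))    # [1, 2] [1, 2] - same list both times
print(append_good(1), append_good(2))  # [1] [2]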

TheJanitor
Apr 17, 2007
Ask me about being the strongest janitor since Roger Wilco
In Django, working with optional querysets can lead to exciting code with "if not queryset". There is an A+ bug lurking in the code below:
code:
def get_user(user_id, optional_queryset=None):
    if not optional_queryset:
        optional_queryset = User.objects
    return optional_queryset.filter(id=user_id).all()

my_locked_user_row = get_user(10, optional_queryset=User.objects.select_related('some_global_shared_thing').select_for_update())

In Django, __bool__ on a queryset actually executes SQL and returns whether any rows were found. So in the above example we are accidentally locking every single User row instead of just the one we wanted. On top of that, by using "select_related" we were also locking rows in the joined tables. This is a nice and easy way to ensure your production application is not actually running any requests in parallel, forcing every single request to run one by one while locking the same shared rows.
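
The fix is to compare against None explicitly, so a caller-supplied queryset is never evaluated (or locked) early; a sketch of the same hypothetical function:

code:
def get_user(user_id, optional_queryset=None):
    if optional_queryset is None:  # no SQL executed, no accidental locking
        optional_queryset = User.objects
    return optional_queryset.filter(id=user_id).all()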

QuarkJets
Sep 8, 2008

Holy smokes, that's one doozy of a bug

Let's say it again

:moonrio::moonrio::moonrio: Explicit is better than implicit :moonrio::moonrio::moonrio:

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Yeah, I'm happy to be wrong here, although I will say that my isinstance(None) was me just trying to remember it offhand without looking it up and not necessarily a statement of 'and THIS is what I call None checking'

That being said, is there some more reading on why None/null is a bad value and what the right patterns are? I've had some stuff recently that I probably won't be working on in the future, but one of the big things was a series of calls out to other services to try and look up data.

Some of these calls returned empty results, so I made the return type Optional[str]. If I got a response, I returned it; if I got an empty response, I just returned None, and so on back up the chain. This allowed me to opt out of some portions of the data processing, because much of the input data was human-generated and couldn't be trusted to return results 100% of the time.

12 rats tied together
Sep 7, 2006

Basically you want to get rid of the NoneTypes asap because they're, kind of by definition, the least useful type in the language. The simplest example I can think of is that "for x in response" where response is an empty list is valid code, but if response is None that's a runtime error.

What are you going to do with the optional string? Is there an appropriate default value for when the string isn't present? You can encode the answers to these questions permanently in your object model, which usually tends to simplify the code, improve readability and maintainability, and other good things.
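
That difference in code (trivial sketch):

Python code:
def process(response):
    for x in response:  # fine for an empty list: the loop body just never runs
        print(x)

process([])    # OK, does nothing
process(None)  # TypeError: 'NoneType' object is not iterable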

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

12 rats tied together posted:

Basically you want to get rid of the NoneTypes asap because they're, kind of by definition, the least useful type in the language. The simplest example I can think of is that "for x in response" where response is an empty list is valid code, but if response is None that's a runtime error.

What are you going to do with the optional string? Is there an appropriate default value for when the string isn't present? You can encode the answers to these questions permanently in your object model, which usually tends to simplify the code, improve readability and maintainability, and other good things.

I think that's the simple explanation, but is there a good place to read about the more complex one? I can see the shape of what y'all are getting at and I'd like to know more, but there's so many weird edge cases when I try and imagine it, that all end up just re-implementing patterns like QuarkJets/etc are talking about with guards and ifs.

Default attributes, for example, seem like a much bigger problem to me - like John Null https://www.wired.com/2015/11/null/ - and run the risk of this stuff showing up in the output data later on. Sure, I can just make some 'if attr_name is DEF_STRING' checks, but that feels like shuffling deck chairs on the Titanic rather than fixing the underlying problem.

Precambrian Video Games
Aug 19, 2002



12 rats tied together posted:

Checking if something is not None is a code smell though.

Huh? What is smelly about either of:
Python code:
if optional_arg is not None:
Python code:
if (x := some_dict.get(key)) is not None:

12 rats tied together
Sep 7, 2006

Python handles the John Null situation in two ways:

1) None isn't actually "null", it is "the singleton instance of NoneType" which is an in-language placeholder for "the concept of nothing", but it actually is something, and that something has a (restrictive) API. This is different from other languages where Null might actually be nothing, it might be a macro that is a placeholder for "the integer 0", etc.

2) Python's NoneType casts to string as "None", so a John None would not run into any of these problems presuming that the handler for their name was running Python. They would correctly be stored as first_name John, last_name None, display_name John None.

As for default arguments, because the design depends so much on the business domain, there aren't a lot of good generic resources that I'm aware of. A non trivial but still toy example I like to pull out is, pretend we're writing doordash, and we have PickupOrder vs DeliveryOrder, and we're confused about what ValidateAddress() is doing. Which address is validated, or is it both? and what happens if we don't have a delivery address?

You solve this problem with OOP by extracting the things that vary, giving them names, and injecting them later at runtime. For this example, we might eventually conclude that DeliveryOrder and PickupOrder are fake, that we fooled ourselves into this failing pattern with names and inheritance, and that what we actually have here is an instance of the class FoodOrder that always has an instance of an object of the type FulfillmentBehavior, which always has ValidateAddress() (or some other wrapper method - a safe and stable API for the Order objects to consume).

In our FulfillmentBehavior object model we can design away the NoneTypes and the optionals: PickupFulfillment never has a DeliveryDriver or a DeliveryAddress, but DeliveryFulfillment always does. Both objects can be talked-to by the parent FoodOrder in the same way, both objects always "do the right thing", nowhere in the code are we checking what anything's type is, or double checking that things aren't None. If something were None unexpectedly, we'd raise an error in the constructor, and we'd never have completed a FoodOrder object to the point where it could possibly fail to actuate its ValidateAddress method. Helpfully, at the failure call site we also have all of the information we need to reconstruct it, so we can reproduce this easily later while debugging.
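
A very rough sketch of that shape (all of the names are made up for the toy example, and check_address just stands in for real validation):

Python code:
def check_address(address):
    # placeholder for real validation logic
    if address is None or address == "":
        raise ValueError("missing address")

class PickupFulfillment:
    def __init__(self, store_address):
        self.store_address = store_address

    def validate_address(self):
        check_address(self.store_address)

class DeliveryFulfillment:
    def __init__(self, delivery_address, driver):
        if delivery_address is None or driver is None:
            # fail in the constructor, not somewhere deep in the order flow
            raise ValueError("delivery orders need an address and a driver")
        self.delivery_address = delivery_address
        self.driver = driver

    def validate_address(self):
        check_address(self.delivery_address)

class FoodOrder:
    def __init__(self, items, fulfillment):
        self.items = items
        self.fulfillment = fulfillment  # always present, never None

    def validate_address(self):
        # FoodOrder never checks which kind of fulfillment it has
        self.fulfillment.validate_address()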

eXXon posted:

Huh? What is smelly about either of:
Python code:
if optional_arg is not None:
Python code:
if (x := some_dict.get(key)) is not None:
It's a code smell to be adding these things to an established codebase without it also being part of a PR that also introduces a new object model and all of its construction logic. To be clear, None exists, and you have to handle it sometime, but the places where you handle it should be some of the least changing, most understood parts of your codebase.

dict.get also makes me nervous, always. I'd rather see "if key in some_dict".

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

12 rats tied together posted:

Python handles the John Null situation in two ways:

1) None isn't actually "null", it is "the singleton instance of NoneType" which is an in-language placeholder for "the concept of nothing", but it actually is something, and that something has a (restrictive) API. This is different from other languages where Null might actually be nothing, it might be a macro that is a placeholder for "the integer 0", etc.

2) Python's NoneType casts to string as "None", so a John None would not run into any of these problems presuming that the handler for their name was running Python. They would correctly be stored as first_name John, last_name None, display_name John None.

As for default arguments, because the design depends so much on the business domain, there aren't a lot of good generic resources that I'm aware of. A non trivial but still toy example I like to pull out is, pretend we're writing doordash, and we have PickupOrder vs DeliveryOrder, and we're confused about what ValidateAddress() is doing. Which address is validated, or is it both? and what happens if we don't have a delivery address?

You solve this problem with OOP by extracting the things that vary, giving them names, and injecting them later at runtime. For this example, we might eventually conclude that DeliveryOrder and PickupOrder are fake, that we fooled ourselves into this failing pattern with names and inheritance, and that what we actually have here is an instance of the class FoodOrder that always has an instance of an object of the type FulfillmentBehavior, which always has ValidateAddress() (or some other wrapper method - a safe and stable API for the Order objects to consume).

In our FulfillmentBehavior object model we can design away the NoneTypes and the optionals: PickupFulfillment never has a DeliveryDriver or a DeliveryAddress, but DeliveryFulfillment always does. Both objects can be talked-to by the parent FoodOrder in the same way, both objects always "do the right thing", nowhere in the code are we checking what anything's type is, or double checking that things aren't None. If something were None unexpectedly, we'd raise an error in the constructor, and we'd never have completed a FoodOrder object to the point where it could possibly fail to actuate its ValidateAddress method. Helpfully, at the failure call site we also have all of the information we need to reconstruct it, so we can reproduce this easily later while debugging.

It's a code smell to be adding these things to an established codebase without it also being part of a PR that also introduces a new object model and all of its construction logic. To be clear, None exists, and you have to handle it sometime, but the places where you handle it should be some of the least changing, most understood parts of your codebase.

dict.get also makes me nervous, always. I'd rather see "if key in some_dict".

Well, I'll take this all as things to dig into, as I still don't fully grok it. Maybe I'll throw a more detailed example in the chat later if I have time, if folks would like to help pick it apart. Again, I think I see the general point (the drive for predictability/etc), I'm just not sure how it interacts with specific examples, all of which, when I walk through them, come back around to behavior that feels basically identical, just with extra steps.

In any event, clearly I was wrong in the first place, so thanks for the explanation.

12 rats tied together
Sep 7, 2006

I mean, you didn't say anything wrong (that I remember anyway, I'm phone posting).

Handling nulls is hard in every language, and the tools that the language gives you to deal with them are important. As a strong/dynamic language Python doesn't have a lot to offer you, especially compared to Rust's compiler. You pretty much just have OOP design principles and, for some classes of null check, mypy (or similar).

If you post some examples later I will give you my $0.02 on them, you're ultimately the best person here to make the call though. Sometimes optional int is better than VerboseEncapsulatingTypeHierarchy.

QuarkJets
Sep 8, 2008

I don't know of any particular book or article that describes this because this is mostly derived from experience, but my thoughts are aligned with what 12 rats tied together is posting. Clean Code by Robert Martin is probably the most influential source for how I write software; I've recommended it itt before. It's actually about Java but most of its lessons apply to any modern language and to software development more generally. That and PEP 8 are pretty comprehensive for solid Python development.

Basically, you should have a well-defined internal data model. If you need to call external APIs that aren't consistent with your data model, then you can either redefine your model or transform incoming data to fit your model. Sometimes redefinition is unavoidable, but if you have confidence in your data model then transformation is all you really need. The functions that you write should have unambiguous input types and output types. Duck Typing is fine, so long as a common parent type is used; a function should not have to deal with an argument sometimes being an int, or sometimes a string, or sometimes None except for very specific and commonly used patterns (as mentioned earlier: None-replacement of default mutable types for keyword arguments is OK, but ideally you don't even need to do that)

In terms of writing clean code, the example function from earlier was named thing_that_generates_a_list(). This had drat well better return a list. If this is an external API function that can return None for some reason, then I want to try to wrap it with my own function that always returns a list; now I have 1 None check instead of possibly several, and it's localized to the edge of my codebase where it's clear that I'm transforming input data.
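
Something like this, as a sketch (thing_that_generates_a_list stands in for the external call):

Python code:
def thing_that_generates_a_list():
    # stand-in for an external API call that sometimes returns None
    return None

def fetch_items():
    """Wrapper that always returns a list, so callers never see None."""
    result = thing_that_generates_a_list()
    return [] if result is None else result

for item in fetch_items():  # safe everywhere: no None checks at call sites
    print(item)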

And as always, this isn't one size fits all. There are going to be edge cases where the data model just cannot be simple.

QuarkJets fucked around with this message at 23:24 on Oct 10, 2022

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Well I do own Clean Code and I'll give it another shot. Thanks again for diving into this; coming from a non-compsci background some coding concepts are very easy, and others are kind of difficult to grasp. Dependency Injection took me until I had to test some of my code to realize exactly why it's useful, for example, and then it clicked into place immediately.

Precambrian Video Games
Aug 19, 2002



12 rats tied together posted:

It's a code smell to be adding these things to an established codebase without it also being part of a PR that also introduces a new object model and all of its construction logic. To be clear, None exists, and you have to handle it sometime, but the places where you handle it should be some of the least changing, most understood parts of your codebase.

I still don't quite follow. It's necessary in most cases to set mutable-typed optional arguments to None whether they have a default or not, and if the sensible default is None (null) then it doesn't seem smelly at all:

Python code:
def func(arg_positional, arg_optional=None):
    # if arg_optional is None:
    #     arg_optional = some_default
    print(f"{arg_positional=}{arg_optional=}") # fine if you don't mind it printing None, but need to check otherwise
    if arg_optional is not None:
        arg_optional.do_something()

12 rats tied together posted:

dict.get also makes me nervous, always. I'd rather see "if key in some_dict".

Why does it make you nervous? It's less efficient to do key in dict followed by dict[key] if you actually need the value later on. It's also painful to write some comprehensions using dicts without walrus and get.

Precambrian Video Games fucked around with this message at 05:51 on Oct 11, 2022

QuarkJets
Sep 8, 2008

eXXon posted:

I still don't quite follow. It's necessary in most cases to set mutable-typed optional arguments to None whether they have a default or not, and if the sensible default is None (null) then it doesn't seem smelly at all:

Python code:
def func(arg_positional, arg_optional=None):
    # if arg_optional is None:
    #     arg_optional = some_default
    print(f"{arg_positional=}{arg_optional=}") # fine if you don't mind it printing None, but need to check otherwise
    if arg_optional is not None:
        arg_optional.do_something()
Why does it make you nervous? It's less efficient to do key in dict followed by dict[key] if you actually need the value later on. It's also painful to write some comprehensions using dicts without walrus and get.

Python code:
def func(arg_optional=None):
    if arg_optional is None:
        arg_optional = some_default
    arg_optional.do_something()
This is a fine standard pattern for mutable keyword arguments while preserving a consistent internal data model

Python code:
def func(arg_optional=None):
    if arg_optional is not None:
        arg_optional.do_something()
This function has fundamentally different behavior; it's the same as saying that there actually is no default value. This pattern makes no sense - the "optional" argument results in a no-op if not provided, therefore it should be a mandatory argument. If you're in a situation where you don't want to provide an argument, then the function will do nothing; so why call it?


eXXon posted:

Why does it make you nervous? It's less efficient to do key in dict followed by dict[key] if you actually need the value later on. It's also painful to write some comprehensions using dicts without walrus and get.

Personally, I find .get useful. Sometimes I want an exception to be raised if a key is missing (use square brackets with no conditional), sometimes I only want to do something if a key is present (square brackets with a conditional), sometimes I want to fetch a value but can fall back on a default value (.get). If I'm checking for an environment variable then I'm probably using .get; I don't see any advantage to using an if-not-present-then-replace statement for this, .get takes care of that cleanly and succinctly
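
For example (a sketch with made-up keys):

Python code:
import os

config = {"retries": 3}

retries = config["retries"]                   # KeyError if "retries" is missing: fail loudly
if "timeout" in config:                       # only act when the key is present
    print("timeout is", config["timeout"])
workers = config.get("workers", 1)            # fall back on a default
log_dir = os.environ.get("LOG_DIR", "/tmp")   # the classic .get use for environment variables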

Precambrian Video Games
Aug 19, 2002



QuarkJets posted:

This function has fundamentally different behavior; it's the same as saying that there actually is no default value. This pattern makes no sense - the "optional" argument results in a no-op if not provided, therefore it should be a mandatory argument. If you're in a situation where you don't want to provide an argument, then the function will do nothing; so why call it?

It's perfectly reasonable to have an optional argument with no default, for example:

Python code:
def func(arg_positional, arg_optional=None):
    arg_positional.do_something()
    if arg_optional is not None:
        arg_positional.do_something_else(arg_optional)
        arg_positional += arg_optional
Now you can argue that do_something_else should accept None, but you may not be able to change it to do so. In case of operators like + or * you can provide default values of arg_optional=0 or 1 (respectively) that typically result in no change to arg_positional, but you also might prefer to avoid the operation altogether if arg_positional is a large (say numpy) array and evaluating the conditional is less costly.

e: another scenario is if you want to pass arg_positional to arg_optional.

Precambrian Video Games fucked around with this message at 07:15 on Oct 11, 2022

QuarkJets
Sep 8, 2008

eXXon posted:

It's perfectly reasonable to have an optional argument with no default

It can be, but it was not reasonable in the example provided

quote:

for example:

Python code:
def func(arg_positional, arg_optional=None):
    arg_positional.do_something()
    if arg_optional is not None:
        arg_positional.do_something_else(arg_optional)
        arg_positional += arg_optional
Now you can argue that do_something_else should accept None, but you may not be able to change it to do so. In case of operators like + or * you can provide default values of arg_optional=0 or 1 (respectively) that typically result in no change to arg_positional, but you also might prefer to avoid the operation altogether if arg_positional is a large (say numpy) array and evaluating the conditional is less costly.

e: another scenario is if you want to pass arg_positional to arg_optional.

This function really does two different things: sometimes it does "something" and sometimes it does "something, then something else". To me, it seems like it'd make more sense as 2 functions (or maybe we don't need a function at all if we're just calling a method, and then we have 1 function where arg_optional is no longer optional). At a higher level, this is easier for a human to interpret (you don't have to worry about which of the logic branches a particular function call is using, you can be certain it's always following a specific one because there's only one). Testing this is a little more straightforward as well (you can test each function separately instead of having to parametrize one of the arguments).

I won't go as far as saying that a None default that dictates control logic like this is useless, but I think there are often better patterns that can be used

QuarkJets fucked around with this message at 16:14 on Oct 11, 2022

12 rats tied together
Sep 7, 2006

eXXon posted:

Why does it make you nervous? It's less efficient to do key in dict followed by dict[key] if you actually need the value later on. It's also painful to write some comprehensions using dicts without walrus and get.

Think QuarkJets did a good job tackling the other cases, but this one is just personal preference from me: any time I'm using a raw dict, I want the "raise if accessing missing key" behavior of the subscript notation. Submitting a PR with .get(key, default) on a raw dict is a yellow flag for me because: Why don't we know if the key will be in there? And, how are we certain that the default value is acceptable in all cases, if we don't even know what keys are in there? Why does this function own the default value of a key instead of the dict constructor?

Sometimes it's fine, or you're working in an ecosystem where it's idiomatic to do this kind of stuff. That's fine, I'm not going to try and change e.g. django documentation from this thread. Speaking generally, if this is code your team owns, IME dict.get([...], default) is a canary signal for "something is wrong here and will get worse later".

WRT efficiency, I have not yet worked in a python codebase where we can't afford the extra O(1) complexity of performing an explicit "k in d", but I could be convinced to approve such a PR in a codebase that needed it.

QuarkJets
Sep 8, 2008

What's the actual performance difference between value = a_dict.get(key, default) and value = default if key not in a_dict else a_dict[key]? I'm guessing it's going to be negligible enough that I wouldn't let it affect my code at all. I'm an optimization blow-hard but even for me that's beyond what I'd consider doing. There is surely a more effective way to optimize the performance of the software than doing this kind of replacement.

I think this one comes down to being a style choice, you should be doing whatever is conventional for your group (or if you get to set that standard, then whatever you like)

boofhead
Feb 18, 2021

e: whoops, ignore

Precambrian Video Games
Aug 19, 2002



QuarkJets posted:

What's the actual performance difference between value = a_dict.get(key, default) and value = default if key not in a_dict else a_dict[key]? I'm guessing it's going to be negligible enough that I wouldn't let it affect my code at all. I'm an optimization blow-hard but even for me that's beyond what I'd consider doing. There is surely a more effective way to optimize the performance of the software than doing this kind of replacement.

Probably not much unless you're doing billions of those calls, but I would still say the .get form is more readable and I'd prefer it in new code without bothering to go back and change existing calls.

nullfunction
Jan 24, 2005

Nap Ghost

QuarkJets posted:

What's the actual performance difference between value = a_dict.get(key, default) and value = default if key not in a_dict else a_dict[key]? I'm guessing it's going to be negligible enough that I wouldn't let it affect my code at all. I'm an optimization blow-hard but even for me that's beyond what I'd consider doing. There is surely a more effective way to optimize the performance of the software than doing this kind of replacement.

I think this one comes down to being a style choice, you should be doing whatever is conventional for your group (or if you get to set that standard, then whatever you like)

I was curious about this too.

code:
$ python3.10 -m timeit -r 10 -s "d = {'foo': 'bar'}" "value = d.get('foo', 'bar')"
10000000 loops, best of 10: 31.5 nsec per loop

$ python3.10 -m timeit -r 10 -s "d = {}" "value = d.get('foo', 'bar')"
10000000 loops, best of 10: 30.9 nsec per loop

$ python3.10 -m timeit -r 10 -s "d = {'foo': 'bar'}" "value = 'bar' if not 'foo' in d else d['foo']"
10000000 loops, best of 10: 27.1 nsec per loop

$ python3.10 -m timeit -r 10 -s "d = {}" "value = 'bar' if not 'foo' in d else d['foo']"
20000000 loops, best of 10: 18.8 nsec per loop

Precambrian Video Games
Aug 19, 2002



Interesting; apparently the function call overhead makes it virtually impossible for get to beat plain dict indexing. Someone tried to file a bug report about this in 2019.

QuarkJets
Sep 8, 2008

nullfunction posted:

I was curious about this too.

code:
$ python3.10 -m timeit -r 10 -s "d = {'foo': 'bar'}" "value = d.get('foo', 'bar')"
10000000 loops, best of 10: 31.5 nsec per loop

$ python3.10 -m timeit -r 10 -s "d = {}" "value = d.get('foo', 'bar')"
10000000 loops, best of 10: 30.9 nsec per loop

$ python3.10 -m timeit -r 10 -s "d = {'foo': 'bar'}" "value = 'bar' if not 'foo' in d else d['foo']"
10000000 loops, best of 10: 27.1 nsec per loop

$ python3.10 -m timeit -r 10 -s "d = {}" "value = 'bar' if not 'foo' in d else d['foo']"
20000000 loops, best of 10: 18.8 nsec per loop

Just goes to show how important it is to profile before trying to optimize

e:
Here's a neat trick, on my system this is actually a little faster than all of the above methods:

code:
$ python -m timeit -r 10 -s "d = {'foo': 'bar'}; get=d.get" "value = get('foo', 'bar')"
And this gives performance that's a little worse, but still better than calling d.get directly???

code:
$ python -m timeit -r 10 -s "d = {'foo': 'bar'}; get = dict.get" "value = get(d, 'foo', 'bar')"

QuarkJets fucked around with this message at 00:06 on Oct 12, 2022

nullfunction
Jan 24, 2005

Nap Ghost

QuarkJets posted:

And this gives performance that's a little worse, but still better than calling d.get directly???

Ultimately both are pulling work out of the loop, so yeah, I'd expect them to be faster than a call to d.get(). I do see that it's about 3ns faster (regardless of whether the dict is populated) than the operator approach's worst case on my machine, which surprised me a bit; still, the operator approach's best case wins handily, but

QuarkJets posted:

Just goes to show how important it is to profile before trying to optimize

is the real takeaway here.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
I've got two stupid Python questions today:

Does numpy fall back on a pure python implementation if something goes wrong with loading the native extensions? I got thrown a python 2 script today where someone said "This is insanely slow, we've been using it but it takes 4-8 days to run" and asked if I could speed it up. I ran 2to3 then reindent over it, then updated its dependencies from numpy 1.16 to 1.23.3. Then when I went to profile it and figure out why it was slow, it finished in less than a minute on the same dataset that had previously taken a week. I tossed it back to the user and they're all happy with the new, python 3 version, but I want to know why the gently caress did porting Python 2 to 3 and updating numpy make it 10,000x faster?!?!?!?! Yes, results are identical between the two. The only thing I can possibly think of is that it was using a broken version of numpy, but 1.16 isn't even that old.

Also, is there a better way to indicate direct vs transitive dependencies in a requirements.txt file? The project only has a handful of direct dependencies, but because i'm checking in a full pip freeze rather than just the scipy>=0.18,<0.19, numpy>=1.16,<1.17 that was there before, now I have all of the transitive dependencies in the requirements.txt as well as the direct ones.

Edit: and while I'm here, what's the best of the modern build system backends? Hatch/hatchling? Poetry-core? Which one do you like?

Twerk from Home fucked around with this message at 05:48 on Oct 12, 2022

QuarkJets
Sep 8, 2008

Twerk from Home posted:

I've got two stupid Python questions today:

Does numpy fall back on a pure python implementation if something goes wrong with loading the native extensions? I got thrown a python 2 script today where someone said "This is insanely slow, we've been using it but it takes 4-8 days to run" and asked if I could speed it up. I ran 2to3 then reindent over it, then updated its dependencies from numpy 1.16 to 1.23.3. Then when I went to profile it and figure out why it was slow, it finished in less than a minute on the same dataset that had previously taken a week. I tossed it back to the user and they're all happy with the new, python 3 version, but I want to know why the gently caress did porting Python 2 to 3 and updating numpy make it 10,000x faster?!?!?!?! Yes, results are identical between the two. The only thing I can possibly think of is that it was using a broken version of numpy, but 1.16 isn't even that old.

No, to my knowledge numpy has never had a pure python mode. It's more likely that the code itself benefitted from the conversion to Python3 - range, zip, map, and filter used to all return lists instead of iterables, dictionary keys were lists instead of views, etc. For instance that kind of difference in performance could be due to no longer having to dip into swap space for some poorly-written (for python2) blocks of code. There could also be some other cause (maybe the data is accessed over a network and some jank-rear end Python2-based reader was struggling with that for hard to understand reasons?). Hard to say for sure without the code itself, but no point in dwelling on it

quote:

Also, is there a better way to indicate direct vs transitive dependencies in a requirements.txt file? The project only has a handful of direct dependencies, but because i'm checking in a full pip freeze rather than just the scipy>=0.18,<0.19, numpy>=1.16,<1.17 that was there before, now I have all of the transitive dependencies in the requirements.txt as well as the direct ones.

Edit: and while I'm here, what's the best of the modern build system backends? Hatch/hatchling? Poetry-core? Which one do you like?

The install_requires block of setup.cfg is where you should be placing your direct dependencies; requirements.txt is for reproducing a full environment.
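
Roughly (a sketch; add version pins as needed):

code:
# setup.cfg
[options]
install_requires =
    numpy
    scipy
requirements.txt can then stay as the full pip freeze output for reproducing an exact environment.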

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
It looks like projects using pyproject.toml will specify that you should try to install wheels in the [build-system] block: https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/#fallback-behaviour

If I have a package that's using the legacy setup.py, is there a way to make it use wheels when installing? When I pip install my package, I'm seeing a whole lot of:
"Using legacy 'setup.py install' for $DEPENDENCY, since package 'wheel' is not installed."

I know that I could hack around this by installing wheel manually first in an environment before running pip install, but there must be a way to tell everyone's pip to try to use wheels by default because I've seen other packages do it.

QuarkJets
Sep 8, 2008

Twerk from Home posted:

It looks like projects using pyproject.toml will specify that you should try to install wheels in the [build-system] block: https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/#fallback-behaviour

If I have a package that's using the legacy setup.py, is there a way to make it use wheels when installing? When I pip install my package, I'm seeing a whole lot of:
"Using legacy 'setup.py install' for $DEPENDENCY, since package 'wheel' is not installed."

I know that I could hack around this by installing wheel manually first in an environment before running pip install, but there must be a way to tell everyone's pip to try to use wheels by default because I've seen other packages do it.

Why can't you upgrade the package to use pyproject.toml?

The deprecated keyword you're looking for is setup_requires, as in ```setup_requires=["wheel"]```. Instead of doing that, you should define a requirements.txt that sets up a build environment (call it requirements_build.txt or something) and create your environment based on that. Users shouldn't have to build your package, they should be installing the wheel that you published somewhere else so this is all cool and good

If you really can't transition to a pyproject.toml (why not???) then you should at least try to set yourself up with a setup.cfg instead of setup.py
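
For reference, the build-system half of a pyproject.toml for a setuptools-based package is tiny (sketch):

code:
# pyproject.toml
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
Your existing setup.cfg/setup.py metadata keeps working alongside it; this just tells pip what it needs to build (and that it can build wheels).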

QuarkJets fucked around with this message at 04:47 on Oct 13, 2022

Seventh Arrow
Jan 26, 2005

So at my new job, they have data analysts who are manually cleaning CSV files in Excel whenever they arrive. Obviously this is gross and it feels like they want me to automate the process.

Now I'm pretty familiar with cleaning data in pandas, but I also want to have some sort of interface - like maybe a webpage or UI with a "browse" button that they can click on and upload their filthy CSV. They could click a "start" button and then out the other end pops a minty-fresh CSV for their consumption.

But I'm wondering about this part of it - would it be possible to have a webpage in flask or django with the aforementioned "browse" button, or will this require a tkinter-type interface?

QuarkJets
Sep 8, 2008

Seventh Arrow posted:

So at my new job, they have data analysts who are manually cleaning CSV files in Excel whenever they arrive. Obviously this is gross and it feels like they want me to automate the process.

Now I'm pretty familiar with cleaning data in pandas, but I also want to have some sort of interface - like maybe a webpage or UI with a "browse" button that they can click on and upload their filthy CSV. They could click a "start" button and then out the other end pops a minty-fresh CSV for their consumption.

But I'm wondering about this part of it - would it be possible to have a webpage in flask or django with the aforementioned "browse" button, or will this require a tkinter-type interface?

Yeah, a webpage should be doable. Is this something you'd be able to host on an internal webserver? That'd be slick. Maybe you could use dropzone.js for a drag-and-drop interface, then have flask take those requests and run the cleaning process.

Since this is for users I think that a flask implementation is going to be a lot more intuitive for them than having a tkinter interface
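
A minimal sketch of the flask side (the route name and clean_csv are made up, it assumes Flask 2.x, and the actual cleaning would be your pandas code; the upload form just needs to POST the file to this route):

Python code:
import io

import pandas as pd
from flask import Flask, request, send_file

app = Flask(__name__)

def clean_csv(df):
    # placeholder for the real pandas cleaning logic
    return df.dropna()

@app.route("/clean", methods=["POST"])
def clean():
    uploaded = request.files["file"]  # the file picked with the "browse" button
    df = pd.read_csv(uploaded)
    cleaned = clean_csv(df)
    buf = io.BytesIO()
    cleaned.to_csv(buf, index=False)
    buf.seek(0)
    return send_file(buf, mimetype="text/csv",
                     as_attachment=True, download_name="cleaned.csv")

if __name__ == "__main__":
    app.run()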


Seventh Arrow
Jan 26, 2005

Ok great, thanks! This is going to either be an internal webserver or maybe some cloud solution...it's still at the "living rent-free in my head" stage.

edit: one thing that has been super helpful in my journey is hiring a python tutor on fiverr.
