Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

bigperm posted:

I like htmx a lot. Just ask for some html, put it somewhere (I guess it does more than that though). Can even use flask/django to send some jinja template partial. I made a website that generates 'poems' from a hilarious list of "1000 Most Common English Phrases" and used htmx for the infinite scroll and it was dead simple.

Out of curiosity do you have the source for that lying around? Would be interesting to see.


Falcon2001
Oct 10, 2004

Can't stress enough how much sanity type hints save in Python. Sure, they're not perfect, but at minimum they're a nice way to make your IDE way more effective.
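As a tiny illustration (hypothetical function, nothing from the thread): with a hint in place, the IDE and a type checker can flag a bad call before you ever run the code.

```python
def mean(values: list[float]) -> float:
    """With the hint, an IDE/type checker flags mean("oops") before runtime."""
    return sum(values) / len(values)

print(mean([1.0, 2.0, 3.0]))  # 2.0
```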

Falcon2001
Oct 10, 2004


Cyril Sneer posted:

~200,000 entries each containing a 256x256 image.

There are 3 steps I perform. First loop is a zero-check (np.all applied to each image); second loop is more complicated as I'm checking certain pixels ranges for certain features; then finally I resize everything to a new size (~80 x 80).

What I've been finding is that my loops start fast but slow to a nearly-unusable crawl toward the end and I'm not really sure why.

Assuming that you're not discussing a major overhaul that would take a long time to code, I'd do...both, and time them: https://docs.python.org/3/library/timeit.html.

Basically I suspect that you're trading memory utilization for speed - creating two new lists is probably going to use more memory (although not a ton, if my understanding of Python references is correct). On the other hand, reverse iterating and popping is probably not a lot of overhead since you're still basically O(n).

If your loops are crawling at the end I'd measure memory usage or figure out if something isn't getting cleared, or if you're doing some sort of extra iteration or something, like checking your good/bad lists each time you add something to them.
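One low-effort way to see where it degrades (a sketch; the chunk size and the lambda stand in for the real per-image work): print per-chunk timings so you can tell whether the slowdown tracks iteration count.

```python
import time

def run_with_timing(items, fn, chunk=10_000):
    """Apply fn to every item, printing the elapsed time of each chunk.

    If chunk times grow as you go, something is accumulating (memory,
    repeated list scans, etc.); if they stay flat, the slowdown is elsewhere.
    """
    results = []
    start = time.perf_counter()
    for i, item in enumerate(items, 1):
        results.append(fn(item))
        if i % chunk == 0:
            now = time.perf_counter()
            print(f"items {i - chunk + 1}-{i}: {now - start:.3f}s")
            start = now
    return results

doubled = run_with_timing(range(10), lambda x: x * 2, chunk=5)
```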

Falcon2001
Oct 10, 2004


CarForumPoster posted:

Good advice above, adding: doing experimentation like this in jupyter notebooks is my fav. You can just throw a %time at the top of the cell and see how long it took versus the other way you were considering. Just write prototype code for each, then delete the slow one. Once the cells work in sequence, make it a .py file.

IMO this method of developing poo poo is way better at catching gotchas/slow code/errors than writing things in a .py file and testing them over and over as a monolith.


...though doesnt work well for things like django/flask/fastapi webapps.

Yeah, and remember that VSCode basically works with Jupyter notebooks out of the box if you don't want to go through the whole install process/etc. It's very convenient for any data munching work.
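Outside a notebook, plain timeit gives you the same quick A/B comparison (toy example; the two snippets compared here are arbitrary):

```python
import timeit

# Compare two ways of building the same list of even numbers.
t_comp = timeit.timeit("[x for x in range(1000) if x % 2 == 0]", number=200)
t_filter = timeit.timeit("list(filter(lambda x: x % 2 == 0, range(1000)))", number=200)
print(f"comprehension: {t_comp:.4f}s  filter: {t_filter:.4f}s")
```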

Falcon2001
Oct 10, 2004


QuarkJets posted:

If you're not 100% certain about how something works, then you should use the internet to look up that information, or in a python terminal you can run "help(sorted)" or "help(list.sort)". Or you can write out a little toy script that tests some specific functionality that you're not sure about, if it's not as simple as "hey uhh I forgot exactly what this function does". I've been writing Python code for 20 years and I still use help on basic functions that I haven't used in awhile.

I keep an ipython interactive interpreter window open most of the time I'm coding for just this purpose. "Wait, can I merge two dictionaries together with |? Let me find out...yep!" etc.
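For the record, the REPL answer (Python 3.9+ added the | merge operator for dicts; + raises a TypeError):

```python
a = {"x": 1}
b = {"y": 2, "x": 3}
merged = a | b  # 3.9+; the right-hand dict wins on key collisions
print(merged)   # {'x': 3, 'y': 2}
```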

Falcon2001
Oct 10, 2004

Boy the fact that it's centered is throwing me off, but I don't think it significantly changes what others are saying.

If the bounds / column widths are consistent from document to document, you can just manually set them once and fuggedaboudit, then just cast to int for any of the numbers. For example:

code:
ex_str =   "1/4/22      CLARINDA      IA  197   724  163.17 145   777  159.34 194   820  157.51 239   879  158.71  26   743  153.85  0     0  0.00   0     0    0.00   26   894  144.97  827     667016     806.55     105965055.2    158.86"
head = int(ex_str[30:33])
If the column widths are inconsistent but the column order is, then you just need to write something to auto-determine the bounds at the beginning, probably by comparing the column header widths against the data widths. This gets trickier, but it's still just pattern matching.
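A sketch of that auto-detection idea, assuming the header row's labels line up with the start of each data column (the header and row strings here are made up, and real data wider than its header would need more care):

```python
import re

def column_slices(header):
    """Derive column slices from a header row by locating each label's start."""
    starts = [m.start() for m in re.finditer(r"\S+", header)]
    ends = starts[1:] + [None]
    return [slice(s, e) for s, e in zip(starts, ends)]

header = "DATE       CITY          ST  HEAD"
row    = "1/4/22     CLARINDA      IA  197 "
fields = [row[sl].strip() for sl in column_slices(header)]
print(fields)  # ['1/4/22', 'CLARINDA', 'IA', '197']
```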

Falcon2001
Oct 10, 2004

Oh hey, asciitable is probably exactly what they were asking for. https://pypi.org/project/asciitable/ for context.

edit: meant to quote, person above me suggested this.

Falcon2001
Oct 10, 2004


QuarkJets posted:

I have never, ever seen someone say that. Not even once. More "oh nice list comprehension!" Abandon any amateur hour goobers who are telling you to convert nested list comprehensions into for loops

I mean...I have seen that, a few times, but that's mostly because nested list comprehensions are a little harder to parse than a non-nested one.

Falcon2001
Oct 10, 2004

That style (the one without a set of internal brackets) is what I always considered a nested list comprehension, and I also found it very difficult to parse.

Falcon2001
Oct 10, 2004


Hed posted:

Ran into this one when I was running through test cases too quickly, and immediately assumed the function where I was building out the Decimal was wrong.

It turns out, calling Decimal() on the string was the right way, and in my tests I wrote it the way at the bottom! Always read your Expected vs. Actual to determine where the problem is :)

Python code:
>>> Decimal('920')
Decimal('920')
>>> Decimal('920') / Decimal(100)
Decimal('9.2')   # wtf it's right in the interpreter?
>>> Decimal(9.20)
Decimal('9.199999999999999289457264239899814128875732421875')     # oh god drat it

I look forward to fourteen pages about floating point number formatting weirdness.
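The one-line takeaway, for anyone skimming: construct Decimals from strings (or str() an existing float), never from float literals.

```python
from decimal import Decimal

print(Decimal('9.2'))     # 9.2 - string input captures the literal you wrote
print(Decimal(9.2))       # 9.19999999...921875 - the float's binary approximation
print(Decimal(str(9.2)))  # 9.2 - round-trip an existing float through str() first
```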

Falcon2001
Oct 10, 2004


QuarkJets posted:

I don't know of any decent class tutorials because every single one that I've ever seen deep throats the OOP cancer :shrug:

You gotta be more specific about wtf this is, because classes are deeply tied to OOP.

Edit: Missed that you were replying to someone, thought you were asking for a tutorial. My bad.

Falcon2001 fucked around with this message at 01:16 on Jul 17, 2022

Falcon2001
Oct 10, 2004


Seventh Arrow posted:

Lo and behold, the same printout with half the calories! Or something. Would there ever be a reason to use .format() or a string concat instead of f-strings?

Also, I feel like Codecademy kind of rushed this complex subject (classes, not f-strings) and I feel like I've been smacked with a frying pan. Can someone point me to a decent tutorial on classes?

Two things here - for starters, the biggest reason I've seen to use .format() is if you want to save a template and use it somewhere other than where it's being interpreted. For example, in a piece of software I maintain, we have some basic templates for the small amount of human readable outputs we do (emails/tickets/etc). These are stored in text files and have placeholders for {email_address} etc throughout.

But for most smaller stuff, f-strings are great and you should stick to them.

Second off, the whole classes thing. I really liked Automate The Boring Stuff as my intro to programming, so here's his chapter from another book on classes - I haven't read it, but in general these books are well recommended, so try it out: http://inventwithpython.com/beyond/chapter15.html

IMO: Classes are basically the blueprints for objects that form the backbone of Object Oriented Programming. A class defines details about the object, such as attributes (what is this object?) and methods (what can this object do?) - so for example, you have the classic animal OOP example:

code:
class Cat:
    height: int
    weight: int
    color: str
    name: str

    def __init__(self, height: int, weight: int, color: str, name: str):
        self.height = height
        self.weight = weight
        self.color = color
        self.name = name

    def purr(self):
        print(f"{self.name} purrs loudly.")
And then in practice:
code:
new_animal = Cat(100, 100, 'red', 'doofus')
new_animal.purr()
# doofus purrs loudly.
This is obviously a ridiculously small example, but basically everything in python is an object defined by a class. Lists, for example, are simply objects defined by the standard library to behave in certain ways.

QuarkJets above referred to the "OOP cancer", which, without speaking for them too closely, generally refers to how much of a tangled spaghetti mess poorly written OOP can become once you start getting into inheritance/etc. In general though, if you're writing Python, you need to understand OOP principles, because Python is inherently OOP by design and the majority of software written in it adheres to OOP practices. There are lots of footguns in OOP, mostly around maintainability of code, but there's also extremely robust discussion around avoiding those footguns and writing better code; all of that is probably past your current skill / knowledge level, so I wouldn't worry about it too much.

Falcon2001
Oct 10, 2004


Seventh Arrow posted:

Thanks! I will take a look at Al Sweigart's chapter - although later on in the Codecademy course they had a project that helped me get a better idea of how classes are used and what they're good for:

https://gist.github.com/jalcoding8/fcac42c43d378c6943274ece219bde0f

I could see how the Business could contain the Franchise and the Menu, and the Franchise could contain the Menu, and so on; and how they could also have functions that interact. I'm curious though - the order of stuff doesn't seem to matter for the most part. Like if the "Menu" class was before "Franchise", would the "calculate_bill" function (line 43) still work?

The order doesn't really matter (There's a BUT there around things like type hinting but it's pretty minimal) because you define your classes before you start actually executing code. Everything inside a class is a blueprint, so it's executed when you create a new object from that class, but the stuff starting on line 54 is actually code that's being executed. So if you tried executing that code before you defined the class (like say, move it to line 2) it will fail.

Edit: an interesting side note: every time you import a module, Python executes all the top-level code in it, which is why you sometimes see the weird if __name__ == "__main__" pattern in modules/scripts; it lets you write code that won't be executed on import, only when you run the file directly.
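A minimal sketch of that pattern (hypothetical module):

```python
print("this runs on every import of the module")

def main():
    print("this runs only when the file is executed directly")

if __name__ == "__main__":
    # True when run as 'python mymodule.py', False when imported.
    main()
```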

Falcon2001 fucked around with this message at 20:01 on Jul 17, 2022

Falcon2001
Oct 10, 2004


Apex Rogers posted:

You could probably do this with os.walk(). You would need to do something to canonicalize the file name, e.g. handle the “- Copy” and “(1)” suffixes, but once you have that you can keep a dictionary of file names, checking if the key exists each time (if so, it’s a duplicate). Duplicates can go into a separate list or whatever that you dump at the end. Obviously, you’ll want to keep the actual file name around to go into this list, rather than the canonicalized version.

Keep a dict[str, list[Path]] where the key is the canonicalized name, and the value is a list of all file paths associated with it.

Iterate over the whole directory tree with os.walk(), figure out each file's canonical name, then toss it in the dictionary. One final pass over the dict to keep only entries whose value list has more than one path, and boom, there you go.
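Putting those pieces together as a sketch (the canonicalization rules are guesses at the "- Copy"/"(1)" conventions, not anything definitive):

```python
import os
import re
from collections import defaultdict
from pathlib import Path

def canonical(name: str) -> str:
    """Strip ' - Copy' and ' (1)'-style suffixes from the stem, lowercase the result."""
    stem, ext = os.path.splitext(name)
    stem = re.sub(r"(?: - Copy| \(\d+\))+$", "", stem)
    return (stem + ext).lower()

def find_duplicates(root):
    """Map canonical name -> all paths sharing it; keep only groups of 2+."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            groups[canonical(fname)].append(Path(dirpath) / fname)
    return {k: v for k, v in groups.items() if len(v) > 1}
```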

Falcon2001
Oct 10, 2004

Speaking of general design pattern problems: I have some questions about Dependency Injection.

I'm reworking some of my code, and one area in particular has a lot of dependencies because it deals with a large number of various internal services. This is all functions, no classes in this module, for context.

My initial setup is setting up a dataclass that just holds those dependencies, initializing it at the entrypoint to this portion of the codebase with a factory function, and then passing that down as the calls flow through. This allow for dependency injection as I have a factory function that builds it the 'default' way, but I can also easily pass in appropriately configured mocks / test replacements for testing.

Is there any significant disadvantage to doing this?
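In code form, that setup might look something like this (all class and function names here are invented for the sketch; the real services are internal):

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the internal service clients.
class LdapClient:
    def lookup(self, alias: str) -> str:
        return f"user-for-{alias}"

class MailClient:
    def send(self, to: str, body: str) -> None:
        print(f"mail to {to}: {body}")

@dataclass
class Deps:
    ldap: LdapClient
    mail: MailClient

def default_deps() -> Deps:
    """Factory that builds the dependency bundle the 'default' way."""
    return Deps(ldap=LdapClient(), mail=MailClient())

def notify_owner(alias: str, deps: Deps) -> None:
    """Functions take the bundle; tests can pass one built from mocks instead."""
    deps.mail.send(deps.ldap.lookup(alias), "your service is paging")

notify_owner("oncall-team", default_deps())
```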

Falcon2001
Oct 10, 2004


QuarkJets posted:

The trouble that I usually see is people use dependency injection to just move an obscure environment variable from one place to another, passing it in with literally the same variable name without thinking about what their code actually does. On top of the more well-known advantages, another big advantage of using dependency injection is that you can write code that is easier to read and self-documenting. If you're refactoring anyway then you may as well write the best possible signatures

I think that the biggest pitfall here is that you are probably making changes that aren't backwards compatible, so you have to be carefully looking for broken code and controlling your version numbers and package dependencies.

The specific approach you're using is good and is the very common "put the arguments in a struct" mode of simplifying overly long function signatures. Be careful about adding too many parameters that aren't actually common to multiple services, e.g. your functions should only receive parameters that are relevant to them; otherwise this can create confusion later. Ask yourself, which of the parameters in this dataclass is actually used by this function?" The answer should be "all of them"

Thanks! In this case it's a reasonably shallow section of code that just happens to have a lot of weird API calls to make. Unfortunately, this also means that the dependency chain is...more complicated than most. For example, Function X calls API 1 for all services, then I filter based on a call to API 2, and then pass to a subfunction that calls APIs 1 and 3, but not 2. That subfunction in turn has a helper function that does call API 2.

This all probably means I didn't do a good enough job writing this up in the first place, but it does mean that while a function might not need a specific dependency, a helper function it relies on does. Luckily this madness is confined to this particular module and won't pollute the rest of the codebase.

Falcon2001
Oct 10, 2004

I will also say that if you just keep writing more code you can often achieve understanding through repetition. Honestly it's the way I got to the point where actual computer science theory started to stick: I did a ton of code katas on sites like CodeWars and Leetcode - I'd start with the ones with the highest completion rate, then try and brute force my way through it, googling anything I didn't understand.

This isn't perfect, but at least for me I really struggle with theory when I don't have a way to tie it to concrete examples.

Example: Dependency Injection seemed like the dumbest pattern in the world to me like 9 months ago. My only experiences with it had been the sort of 'magic pile of bullshit' added to big web framework startup sections, and it just seemed mystical to me. Why are you adding all these things here? Why not just use them where they're needed? Your code looks awful! Sure, I could read about why they were useful, but honestly it didn't click until:

I'm working on an escalation module for a service I own and part of that is mail; I have a mailer class that handles all my mail workflows, which are pretty minor. When I wrote this originally I just had singletons setup for the Mailer class, and no unit tests because my mentor at the time didn't think it was important. Now I'm coming back to my code after a month or so away from it and my immediate thought was 'well gently caress, how do I know this WORKS?'

'I guess I can do a dry-run of our stuff? Well poo poo, dry-runs don't process mail, on purpose, to avoid any accidental mailing.'

'Huh. I need like a 'send-mail-anyway-dry-run' option...But that seems like a weird edge case. What if I added mailing to the dry run, and then just had a neutered option? Well that'd be hard to do because I'd need to ohhhhh make a new mailer class with the send functionality nerfed and inject the dependency.'

The most obvious problem here is I should have been pushed to test my code when I originally wrote it, but now I know and will improve over time.
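In code form, the after-state of that realization might look like this (class and method names invented for the sketch):

```python
class Mailer:
    """Real mailer; in production this would talk to the mail service."""
    def send(self, to: str, subject: str) -> str:
        return f"SENT to={to} subject={subject}"

class DryRunMailer(Mailer):
    """Neutered mailer for dry runs and tests: records instead of sending."""
    def __init__(self):
        self.outbox = []

    def send(self, to: str, subject: str) -> str:
        self.outbox.append((to, subject))
        return f"DRY-RUN to={to} subject={subject}"

class Escalator:
    # The mailer is injected instead of being a module-level singleton,
    # so a dry run can swap in DryRunMailer without touching this class.
    def __init__(self, mailer: Mailer):
        self.mailer = mailer

    def escalate(self, owner: str) -> str:
        return self.mailer.send(owner, "escalation!")

dry = DryRunMailer()
Escalator(dry).escalate("owner@example.com")
print(dry.outbox)  # [('owner@example.com', 'escalation!')]
```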

Falcon2001
Oct 10, 2004

I have a weird question. I inherited a codebase that's got an interesting decision in it and I'm trying to decide whether to start reverting it.

Python code:
Area = Enum("Area", get_all_valid_areas_dict(), type=str)

# example return from get_all_valid_areas_dict():
{
    "ABC": "abc",
    "XYZ": "xyz"
}
Edit: get_all_valid_areas_dict comes from a library we ingest that defines area data, so building the list dynamically is the right thing to do.

I'm not a CS major so I might be missing something, but my understanding of enums is that they're used to define a finite set of things in code, not in your data. It seems like if we're building the list dynamically we should just store it as a list/set of valid strings?

My biggest problem with it is that because of this, you can create this enum in two ways:


Python code:
r = Area['XYZ']
r2 = Area('xyz')
r == r2
Which has led to a few bugs already due to misunderstanding whether we're working with upper- or lowercase

Is this as dumb as I think it is?
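For reference, a self-contained version of the two construction paths (a dict literal standing in for get_all_valid_areas_dict()):

```python
from enum import Enum

Area = Enum("Area", {"ABC": "abc", "XYZ": "xyz"}, type=str)

r = Area["XYZ"]   # lookup by member *name* (uppercase)
r2 = Area("xyz")  # lookup by *value* (lowercase)
print(r is r2)    # True - both paths return the same singleton member
```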

Falcon2001 fucked around with this message at 18:49 on Aug 20, 2022

Falcon2001
Oct 10, 2004

Thanks to both of you for context! I think the problem is that we don't ever refer to a single area in code; it's only used for validation of things like 'Ah, this input file has an area, let me see if it's a valid one'. The closest we get is being able to refer to a passed-in variable as an Area instead of just str.

Deffon posted:

The pitch for enums is that you provide more clarity to your domain and code by being more specific. An area e.g. is not something that you can do arbitrary string operations on, and the result of those operations are not necessarily areas themselves. People reading your code will understand that it doesn't represent e.g. an unverified freetext field that a user can write anything they want in.
It's also a "proof" of validation - if your function only works with valid areas, then the fact that the caller was able to supply an Area member means it's already validated. If you only use strings, then it's easy to send unvalidated areas to functions that expect validated ones, so every function that expects validated areas has to do an extra round of validation to help prevent mistakes.

Dynamically typed languages like Python don't reap the benefits from wrapper types (in this case wrapping string values) as much as statically typed languages do.
Unless you use type annotations you need a way to distinguish between parameters that are meant to follow the Area protocol (comparable with other Areas, has "value" and "name" properties) and the string protocol (has contains(), startswith() etc. methods).
It's easy to confuse what something is supposed to be without a naming convention like "blah_area" and "blah_area_str". Even so, the caller might send the wrong thing, especially if they don't use keyword arguments.

You should only convert enums from and to strings when communicating with the user and with apis outside your control that only understand strings.
In the rest of the code you should refer to hardcoded enum members like so "Area.XYZ".
That way it's harder to confuse whether you're working with enums or their string counterparts.

In short, if you are able to apply certain conventions then enums can greatly benefit your code.

This part makes me think it might be worth keeping this around, and just removing the case discrepancy between their names and values, so we always interact with them using the same casing, which would remove some of the weirdness from interacting with them.


QuarkJets posted:

That's a reasonable way to define an enum, it's why the functional interface exists

I think what you and some members of your team may not understand is that enum members are singletons. Area('xyz') is not creating a new Area instance, it's returning the member of Area that corresponds to value 'xyz'. Not only are those two variables equal, they're exactly the same object:
<more words>

Thanks! I didn't actually realize they were singletons, so that's handy to know.

QuarkJets posted:

Since an Enum is a class like any other, you can attach helpful methods to it that every member will then receive.
Python code:
def contains_x(self):
    """Test if the char 'x' is present."""
    return 'x' in self

Area.contains_x = contains_x

Area.XYZ.contains_x()  # True; the value contains "x"
Area['XYZ'].contains_x()  # True - remember, this is accessing XYZ by member name, but the evaluation is against the value
Area.ABC.contains_x()  # False

This part, however, I'm reasonably confident this gets a lot messier if we're defining the enum dynamically. I looked into using this before using the class-style declaration and it looks like you have to do a bunch of subclassing. https://stackoverflow.com/questions/43096541/a-more-pythonic-way-to-define-an-enum-with-dynamic-members I think is what I found before.

Overall, it sounds like using enums as a clarity/validation point isn't a terrible idea (a method that asks for an Area is clearer than one that asks for a str), but the fundamental problem we're running into is the casing confusion - since you can interact with the enum using either lower- or uppercase, and the difference between () and [] is subtle enough to escape notice sometimes.

I'll look into what we gain by standardizing on one case, and then doing .upper() or .lower() whenever we have to write out/etc and see if that helps.

Falcon2001
Oct 10, 2004


QuarkJets posted:

Your internal code should use Area members directly, e.g. Area.XYZ. Your external interfaces should convert input strings to Areas.

Ninja edit: I got most of the way through writing a big explanation of my service but I think I just am agreeing with you. I think there's just a bunch of old code lying around that does some odd things, so I'm trying to prioritize what I can bother rewriting...especially in a codebase with like 30% test coverage, which makes refactors a little scary. (I'm working on that last part, but thanks to a bunch of module variables and singleton patterns it's not an easy thing to fix.)

Falcon2001
Oct 10, 2004

Is there a simpler way to handle this interaction? I'm finding I'm doing this a reasonable amount.

Python code:
expected_var = object.possibly_null_attr if object.possibly_null_attr else default_value

Falcon2001
Oct 10, 2004


Data Graham posted:

Python code:
expected_var = object.possibly_null_attr or default_value
?

I didn't realize I could use "or" like that, thanks!


Armitag3 posted:

You can use the built in getattr, similar syntax to dict.get("key", default).

Python code:
expected_var = getattr(object, "possibly_null_attr", default_value)
That is, if you mean null to mean the attribute doesn't exist. If it exists and is None, then it'll return that None, in which case you can:
  • Test for all falsy, at which point Data Graham's suggestion will work;
  • Test only for None which will require you to do what you are doing (recommend you add is None to your if)

Yeah, for purposes of this, I mean nullable to be None, not nonexistent. Falsiness is the check I'm trying for, to be clearer.

Falcon2001 fucked around with this message at 19:47 on Aug 31, 2022

Falcon2001
Oct 10, 2004


12 rats tied together posted:

If you find that you end up sprinkling
Python code:
object.possibly_null_attr or default_value
all throughout your codebase like a magic salve, that's also probably an indication that you're missing some kind of GuaranteedObject class with a safe api so that your dependents can interact with it without needing to guard against the null propagating throughout the rest of your business logic

Yeah, I know this is moving into antipattern territory. This portion is entirely internal, and none of these objects exit to my dependencies. In fact, a big part of what I'm doing is trying to generate safe, fully formed objects to output by catching any places where things are null'd out during generation of those objects.

I think the proper way to do this would be with more exceptions and error handling instead of 'Optional[outputObject]', but this isn't something I'm shoving outside of this module at least.

Edit:

To expand on this, I guess I can just lay out what I'm doing and if I'm being an idiot, tell me.
Python code:
component_one = get_c1()
component_two = get_c2()
component_three = get_c3()

if not (component_one and component_two and component_three):
    log("Cannot make this object!")
    return None

merged_output_object = build_merged_output_object(component_one, component_two, component_three)

return merged_output_object
This is obviously pretty abstracted, but is a high level summary of it. Double edit: this doesn't even contain what I asked about before. I might be an idiot. Anyway, feel free to critique this too!

Falcon2001 fucked around with this message at 20:35 on Aug 31, 2022

Falcon2001
Oct 10, 2004


QuarkJets posted:

That kind of situation is what I was trying to address in my followup post; we can't control our external interfaces, but we can choose how we interact with them. If I have 10 calls to "get_x", then it's totally valid to write 10 "set to default if None" lines (one per call), but that's a pain in the rear end. And what happens if I also need to add exception handling each time that this external interface is used? That's a mess. It'd be better to write 1 function that contains None checking and replacement. This is easier to write, easier to read later, and less error prone.

If an external data structure is insanely defined then define your own, then use your sane data structure in your code.

I'm interested in what you're getting at, but I don't fully understand. If it helps provide some context, C1, C2, and C3 rely on different external service calls.

Basically I'm generating a schedule here so I'm patching data together from several data sources, but the output from one isn't necessarily a guaranteed input to another.

For example, here's a workflow:
- Grab the current oncall for a team - this tool just gives me an email alias, no indication of what it is
- Try looking it up in LDAP. If it's not a human this step will fail, and I return None for that result.
- Recursively build a chain from the first person up their management chain.

This is just like, the C2 portion, because there's other parts before that that also have weird opportunities to fail, because my use case for these is imperfect.

For now if any of those steps fail I want the construction to fail rather than create a half-formed or empty schedule, but I also know these steps WILL fail for some services/etc because of the imperfections, so none of this should crash the program, just gracefully generate no schedule for this service.
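That workflow, reduced to a runnable sketch (every function and name here is a hypothetical stand-in for the real service calls):

```python
from typing import Optional

# Hypothetical stand-ins for the real external services.
def get_oncall(service: str) -> str:
    """The oncall tool returns an email alias with no indication of what it is."""
    return {"svc-a": "alice", "svc-b": "robot-alias"}.get(service, "")

def ldap_lookup(alias: str) -> Optional[str]:
    """Returns None when the alias isn't a human."""
    return alias if alias in {"alice", "bob"} else None

def build_schedule(service: str) -> Optional[list]:
    person = ldap_lookup(get_oncall(service))
    if person is None:
        print(f"no schedule for {service}")  # graceful: log and skip, don't crash
        return None
    return [person]  # real code would walk the management chain from here

build_schedule("svc-a")  # -> ['alice']
build_schedule("svc-b")  # -> None; logs and moves on
```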

Falcon2001
Oct 10, 2004


punished milkman posted:

let’s say you’re working with a legacy codebase (python 3, untyped) that relies heavily on shuffling around various huge nested dictionaries across your functions. these dictionaries usually follow an internally consistent structure, but knowing what their contents are is a huge pain in the rear end. e.g i regularly encounter situations where i’m like okay i’ve got the “config” dictionary… how the hell do i dig out the s3 path for item x? these things are very opaque and you need to either a) refer to old code that gets the item you want or b) log out the contents of the whole dict and figure out how to dig through it. not ergonomic at all.

so the question is, how would you refactor this codebase to make the structure of the dictionaries more well known to the developers? typed dicts? dataclasses? just wondering what i can do to make developing less of a headache in the future (ideally without having to rewrite thousands of lines of legacy code)

This is my life right now and I absolutely hate it and am trying to fix it up. (I also don't have a handy test dictionary to refer to either, since this part of the codebase isn't tested).

I agree with QuarkJets that dataclasses aren't a silver bullet for this situation, but I do think they help a ton, just like FoxFire_ mentioned.

A thing to keep in mind though: refactoring away from dictionaries is pretty difficult because it's very easy to miss adding or removing elements somewhere along the chain if they're not well handled, and if you don't have tests that's gonna be a problem.
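If even that feels too invasive, a TypedDict is the lowest-touch stopgap: the runtime objects stay plain dicts, so nothing downstream breaks, but the IDE and type checker learn the shape (the keys here are invented examples):

```python
from typing import TypedDict

class JobConfig(TypedDict):
    s3_path: str
    retries: int

def get_s3_path(config: JobConfig) -> str:
    # The checker now knows the valid keys; config["s3_pth"] gets flagged.
    return config["s3_path"]

config: JobConfig = {"s3_path": "s3://bucket/item-x", "retries": 3}
print(get_s3_path(config))
```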

IMO here's what my plan is for doing this stuff, and what I've done for the little portions I've worked out so far: (Do one dictionary at a time, btw)

- First, the easiest answer to solve this in the short term is to (as QuarkJets mentioned), just go dig back until you find where the dictionary is instantiated and then open your favorite notetaking application and start jotting down what's in there. You can even probably throw a breakpoint in and print them in the debug console wherever you're working at.

- Now that you have a convenient schema for it, work through and make sure the dictionary isn't getting arbitrarily updated anywhere. IDE searches help a lot for this, but you can also just walk along the code manually if you need to. If the schema ever gets mutated, note where and how and what.

- Create a dataclass to replace your dictionary. As a general rule, make sure you're declaring all your attributes in the dataclass (so for example, if you don't instantiate with 'USER_ID' but add it later, still declare it in your dataclass class definition, and just set it to None or an appropriate default value on creation).

Python code:
from dataclasses import dataclass
from enum import Enum

class ButtColor(Enum):
    BLUE = "blue"
    RED = "red"

@dataclass
class Butts:
    """This is a Butt. It does ButtStuff."""
    butt_length: int
    butt_depth: int
    butt_color: ButtColor  # This is an Enum!
    butt_id: int = 0  # This is not known during creation, so we default to 0

new_butt = Butts(10, 10, ButtColor.BLUE)
Now, start swapping things out. If you don't have test cases, I sure hope you're good at manually rerunning your code to cover your bases, because this is tricky since you're moving stuff around a lot.

Once you're done with one dataclass, make sure to commit your changes before moving onto the next. I'm by no means an expert, but this process works out alright for me so far.

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

QuarkJets posted:

Also if at some point you find yourself needing a dictionary form of your dataclasses, they have a built-in method that does that. IIRC it's "asdict()"

Yep - and if you need special handling to serialize your data (for example, if you have classes that don't handily serialize), there's a dict_factory argument for it, that lets you pass in a function to handle the processing.
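For example, here's a rough sketch of dict_factory unwrapping an Enum field (reusing the Butt example from earlier, names made up): asdict hands your factory a list of (key, value) pairs for each dataclass it walks, so you can convert awkward types right there.

```python
from dataclasses import dataclass, asdict
from enum import Enum

class ButtColor(Enum):
    BLUE = "blue"

@dataclass
class Butt:
    butt_id: int
    butt_color: ButtColor

def enum_safe(pairs):
    # dict_factory receives a list of (key, value) pairs; unwrap any Enums
    return {k: (v.value if isinstance(v, Enum) else v) for k, v in pairs}

print(asdict(Butt(1, ButtColor.BLUE), dict_factory=enum_safe))
# {'butt_id': 1, 'butt_color': 'blue'}
```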

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

darkforce898 posted:

There is also this package that I have used before that could be useful

"marshmallow: simplified object serialization — marshmallow 3.18.0 documentation" https://marshmallow.readthedocs.io/en/stable/

I considered using Pydantic, which seems similar-ish, but I was waiting on them to fix some sort of vulnerability I saw on our internal repo mirror. I'll go back and look again, as I'm about to go do another round of this stuff.

Edit:

On the topic of documenting data structures, is that something that should live outside of your code base, or just be properly documented/annotated within it? This is assuming we're talking about something that isn't exposed to customers or somewhere else you'd need to make it available to outside folks.

Falcon2001 fucked around with this message at 07:57 on Sep 20, 2022

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

FredMSloniker posted:

I'm generating an SQLite database as a way of not having to store a very large dictionary in memory simultaneously. If I use isolation_level = None, it runs very slowly because of all the disk access. If I use commit() every five seconds or so, it runs much faster. But I'm wondering if there's a way to do it better. Is there some way to say something like:
code:
if size_of_transaction(con) > maximum_size_of_transaction:
    con.commit()

What package/etc are you using for this?

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

FredMSloniker posted:

I'm not sure what you mean. I'm using 3.10.1 for Windows, downloaded straight from their site, and the sqlite3 module. Is that what you wanted, or...?

I forgot there was a core library sqlite option, my bad.
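For your actual question, though: I don't know of a built-in way to measure a pending transaction's size in bytes, but one rough approach is to count pending statements and commit every N rows instead of on a timer. A sketch (the threshold is made up; tune it for your workload):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE kv (k TEXT, v TEXT)")

BATCH_SIZE = 1000  # arbitrary threshold, tune for your workload
pending = 0
for i in range(5000):
    con.execute("INSERT INTO kv VALUES (?, ?)", (str(i), "x"))
    pending += 1
    if pending >= BATCH_SIZE:
        con.commit()  # flush the batch to disk
        pending = 0
con.commit()  # flush the final partial batch
```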

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
I wonder if looking into how people build chess engines might be useful insight, since it sounds at least vaguely analogous.

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
I'm trying to improve testing in my code base, and we basically have the following workflow:

  • Collect data from multiple data sources
  • Merge, transform, filter
  • Produce a composite data structure, essentially a list of entities
  • Use the composite structure to perform the rest of the business logic.

There's a pretty clear delineation between steps 1-3 and step 4, which basically just requires the output of steps 1-3, so I'm working on a way to build a test composite data structure. It is itself a dataclass, but it has a lot of heavily nested objects, some inherited from a Java interface, and none with clear factories/etc.

If I can construct a test version of this, I can plug it directly into the rest of the business logic in step 4, so that's my current plan, but I'm at a bit of a loss as to how to approach this. Is it worth looking into Pydantic/etc.? If you had to do something like this, how would you go about it?
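For what it's worth, the shape I'm imagining is a plain factory function that builds a small, predictable instance of the composite, something like this (all the names here are invented stand-ins for the real nested types):

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-ins for the real, heavily nested types
@dataclass
class Entity:
    entity_id: int
    hostname: str
    tags: List[str] = field(default_factory=list)

@dataclass
class CompositeResult:
    source_name: str
    entities: List[Entity] = field(default_factory=list)

def make_test_composite(entity_count: int = 3) -> CompositeResult:
    """Factory producing a predictable composite for step-4 tests."""
    entities = [Entity(entity_id=i, hostname=f"host-{i}") for i in range(entity_count)]
    return CompositeResult(source_name="test", entities=entities)

composite = make_test_composite()
```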

Falcon2001 fucked around with this message at 23:36 on Sep 29, 2022

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

necrotic posted:

I bump any linter line length checks to 120. The 79 thing is insane imo.

My team does 100, but yeah, 79 is... antiquated. Just keep it reasonable so you're not cosplaying as a Java dev and you'll be fine.

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

FredMSloniker posted:

Python code:
# My 120-character ruler. 7890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890

This is very minor, but what program are you using to write your code? If it's VSCode, you can add a vertical ruler by modifying your preferences: https://stackoverflow.com/questions/29968499/vertical-rulers-in-visual-studio-code which is very handy since it's just a nice visual cue.
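If memory serves, the relevant setting is editor.rulers; e.g. something like this in your settings.json:

```json
{
    // Draw a vertical guide at column 100
    "editor.rulers": [100]
}
```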

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

FredMSloniker posted:

Notepad++. Really, I don't need the ruler now, since I adjusted the window width, but I haven't had a reason to take it out.

You'll have to convince me to learn Visual Studio Code. I'm not saying I can't be convinced, just that the screenshots aren't selling me.

Just to be clear: there's no right answer for how you want to write your code. If you want to use Notepad, go for it; I know professional software devs who exclusively use vim, which is basically a hyperpowered command-line program, and I know others who swear by highly specialized heavy IDEs like Visual Studio (not-Code).

(Minor side note: VS Code and Visual Studio are entirely different programs and only really share 'you can write code in here!' as a feature. The list of differences is probably bigger than the similarities.)

The biggest argument for VS Code over Notepad++ is that VS Code is highly extensible to do a lot of things, and is purpose built for being a code-aware text editor. It's not a full blown IDE like Visual Studio, it's basically a halfway point between Notepad++ and VS. (Also the extensions platform is much nicer to use than Notepad++ addons.)

Because of that, it bakes in some ideas that are quite handy for programming - here's a few basic ones:
  • Integrated terminal that you can bring up / hide quickly. (ctrl+`)
  • Integration with things like syntax checking / linting to see if your code stands up to best practices.
  • The concept of 'workspaces', which basically means 'here's a collection of folders your code cares about, and we don't have to care about anything higher.'
  • Code auto-suggest and documentation on highlight - for example, if I'm writing a python program and I import loads from json, when I type loads() it'll bring up the docstring from the library telling me what arguments it expects and any documentation built in.

Basically it's a lot of little improvements because it was built to write code first and foremost, instead of just being a great notepad replacement. (FWIW: I love Notepad++ and use it a ton.)

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

Foxfire_ posted:

Implicit conversion of everything to a bool is something that I think is bad and I agree with languages where it is not a thing (rust), or you can get a warning for it (most C/C++ compilers) and then fix it. The only benefit of it is marginally less typing, and it leads to hard-to-read code + subtle bugs (i.e. things like "" or None converting to false).

I fully disagree with the 'hard to read code' part. Truthiness checks - as long as you understand that they're there - allow for much more concise, cleaner code; it just means you have to know certain assumptions. For example:

Python code:
my_list = thing_that_generates_a_list()
if not my_list:
    continue

# VS
my_list = thing_that_generates_a_list()
if my_list is None or len(my_list) == 0:
    continue
If you're coming from Rust, you're going to be confused by the above, because you'll go 'well, of course my_list exists, you just generated it', but it's much simpler than the alternatives. As long as you understand truthiness it's not that subtle of a bug, and IMO understanding truthiness is not a particularly complex Python concept.

The biggest issue with it that I've run into is expecting the string "False" to evaluate to False, but that's a pretty edge case and I caught it immediately.

It's a possible footgun, but it gives you a lot of python-focused benefits in return; the other stuff like locals() is another category of Python traps, but in the opposite direction: Here's really powerful stuff you can do, but you probably don't want to.

Falcon2001 fucked around with this message at 18:48 on Oct 9, 2022

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Honestly, if I have one complaint about Python coming from C#, it's not truthiness checking, it's the lack of type hint / typing requirements. Some older codebases at work are huge and utterly devoid of type hints, and it makes them really difficult to update. This is, of course, a pretty standard complaint, and you should build this into your local coding reqs/etc., but yeah, I think it's a way bigger problem than the truthiness stuff.

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Yeah, I'm happy to be wrong here, although I'll say that my isinstance(None) was me trying to remember the syntax offhand without looking it up, not necessarily a statement of 'and THIS is how you check for None'.

That being said, is there some more reading on why None/null is a bad value and what the right patterns are? I've had some stuff recently I probably won't be working on in the future, but one of the big things was a series of calls out to other services to try and look up data.

Some of these calls returned empty results, so I made the return type Optional[str]. If I got a response, I returned it; if I got an empty response, I just returned None, and so on back up the chain. This allowed me to opt-out some portions of the data processing, because much of the input data was human-generated and couldn't be trusted to return results 100% of the time.
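Concretely, the shape was roughly this (the names and the fake response data are made up for illustration):

```python
from typing import Optional

# Stand-in for the remote service; an empty string models an empty response
fake_service_response = {"web-01": "team-a", "web-02": ""}

def lookup_owner(hostname: str) -> Optional[str]:
    """Return the owner for a host, or None if the lookup came back empty."""
    record = fake_service_response.get(hostname)
    return record or None  # collapse "" / missing into None

for host in ("web-01", "web-02", "web-99"):
    owner = lookup_owner(host)
    if owner is None:
        continue  # opt this host out of further processing
    print(host, owner)  # only prints: web-01 team-a
```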

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

12 rats tied together posted:

Basically you want to get rid of the NoneTypes asap because they're, kind of by definition, the least useful type in the language. The simplest example I can think of is that "for x in response" where response is an empty list is valid code, but if response is None that's a runtime error.

What are you going to do with the optional string? Is there an appropriate default value for when the string isn't present? You can encode the answers to these questions permanently in your object model, which usually tends to simplify the code, improve readability and maintainability, and other good things.

I think that's the simple explanation, but is there a good place to read about the more complex one? I can see the shape of what y'all are getting at and I'd like to know more, but when I try to imagine it there are so many weird edge cases, all of which end up just re-implementing the guard-and-if patterns QuarkJets/etc. are talking about.

Default attributes, for example, seem like a much bigger problem to me - like John Null https://www.wired.com/2015/11/null/ - and run the risk of this stuff showing up in the output data later on. Sure, I can just add some 'if attr_name == DEF_STRING' checks, but that feels like shuffling deck chairs on the Titanic rather than fixing the underlying problem.

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

12 rats tied together posted:

Python handles the John Null situation in two ways:

1) None isn't actually "null", it is "the singleton instance of NoneType" which is an in-language placeholder for "the concept of nothing", but it actually is something, and that something has a (restrictive) API. This is different from other languages where Null might actually be nothing, it might be a macro that is a placeholder for "the integer 0", etc.

2) Python's NoneType casts to string as "None", so a John None would not run into any of these problems presuming that the handler for their name was running Python. They would correctly be stored as first_name John, last_name None, display_name John None.

As for default arguments, because the design depends so much on the business domain, there aren't a lot of good generic resources that I'm aware of. A non-trivial but still toy example I like to pull out is: pretend we're writing DoorDash, and we have PickupOrder vs DeliveryOrder, and we're confused about what ValidateAddress() is doing. Which address is validated, or is it both? And what happens if we don't have a delivery address?

You solve this problem with OOP by extracting the things that vary, giving them names, and injecting them later at runtime. For this example, we might eventually conclude that DeliveryOrder and PickupOrder are fake, that we fooled ourselves into this failing pattern with names and inheritance, and that what we actually have here is an instance of the class FoodOrder that always has an instance of an object of the type FulfillmentBehavior, which always has ValidateAddress() (or some other wrapper method - a safe and stable API for the Order objects to consume).

In our FulfillmentBehavior object model we can design-away the NoneTypes and the optionals: PickupFulfillment never has a DeliveryDriver or a DeliveryAddress, but DeliveryFulfillment always does. Both objects can be talked-to by the parent FoodOrder in the same way, both objects always "do the right thing", nowhere in the code are we checking what the anything's type is, or double checking that things aren't None. If something were None unexpectedly, we'd raise an error in the constructor, and we'd never have completed a FoodOrder object to the point where it could possibly fail to actuate its ValidateAddress method. Helpfully, at the failure call site we also have all of the information we need to reconstruct it, so we can reproduce this easily later while debugging.

It's a code smell to be adding these things to an established codebase without it also being part of a PR that also introduces a new object model and all of its construction logic. To be clear, None exists, and you have to handle it sometime, but the places where you handle it should be some of the least changing, most understood parts of your codebase.

dict.get also makes me nervous, always. I'd rather see "if key in some_dict".

Well, I'll take this all as things to dig into, as I still don't fully grok it. Maybe I'll throw a more detailed example in the thread later, if I have time and folks would like to help pick it apart. Again, I think I see the general point (drive for predictability/etc.); I'm just not sure how it interacts with specific examples, all of which, when I walk through them, come back around to behavior that feels basically identical, just with extra steps.

In any event, clearly I was wrong in the first place, so thanks for the explanation.
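To check my understanding, here's my rough reading of the pattern in code (the class names come from the quoted post; everything else is my own invention):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class PickupFulfillment:
    store_address: str

    def validate_address(self) -> bool:
        # A pickup order only ever knows about the store
        return bool(self.store_address)

@dataclass
class DeliveryFulfillment:
    delivery_address: str
    driver: str

    def __post_init__(self):
        # Fail at construction time instead of deep in the order flow
        if not self.delivery_address or not self.driver:
            raise ValueError("delivery orders always need an address and a driver")

    def validate_address(self) -> bool:
        return True  # guaranteed valid by __post_init__

@dataclass
class FoodOrder:
    order_id: int
    fulfillment: Union[PickupFulfillment, DeliveryFulfillment]

    def validate(self) -> bool:
        # No isinstance or None checks: both behaviors expose the same API
        return self.fulfillment.validate_address()

order = FoodOrder(1, DeliveryFulfillment("123 Main St", "Sam"))
assert order.validate()
```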


Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Well I do own Clean Code and I'll give it another shot. Thanks again for diving into this; coming from a non-compsci background some coding concepts are very easy, and others are kind of difficult to grasp. Dependency Injection took me until I had to test some of my code to realize exactly why it's useful, for example, and then it clicked into place immediately.
