Python

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Python

«‹›230 »

Cyril Sneer: Aug 8, 2004; Life would be simple in the forest except for Cyril Sneer. And his life would be simple except for The Raccoons.

nullfunction posted:

Since you're offering it up for a roast, here are some things to consider:

Your error handling choices only look good on the surface. Yes, you've made an error message slightly more fancy by adding some text to it, yes, the functions will always raise errors of a consistent type. They also don't react in any meaningful way to handle the errors that might be raised or enrich any of the error messages with context that would be useful to an end user (or logging system). You could argue that they make the software worse because they swallow the stack trace that might contain something meaningful (because they don't raise from the base exception).

Do you have any good resources that discuss the proper way to deal with error handling?

# ? Apr 30, 2024 21:53

Adbot: ADBOT LOVES YOU

# ? May 4, 2024 12:24

QuarkJets: Sep 8, 2008

Fender posted:

There is some good feedback in here. Some of it I feel would be appropriate for a conversation. Like, why I check for None a lot. It's just a habit I learned in my last role where we parsed a lot of unreliable data sourced from scrapers. I would happily explain my position there.

One problem I see is that the None-checking is everywhere. Since you need to filter out empty strings anyway, it'd be better if you translated "None" to an empty string - then all of the code that's downstream from the file parsers doesn't have to do any None checking. Basically you have these 3 interface functions that are letting None values filter into the rest of your functions, so you need to handle the possibility of None everywhere, but if you handled the None values in the interface functions then the rest of your code would be a bit nicer. Another problem is that the None checking has become boilerplate; it'd be better to use some small utility functions to simplify the code. This also gets back to the issue I talked about where the street values have the possibility of raising an exception if any of them happen to be None; you did the none-checking for each of the street components in the XML function, but then you forgot that any of them could be None when you concatenated them together. If you had an interface function that returned only strings instead, the code looks nicer and this bug goes away.

For example:

Python code:

            name = ent.find("NAME").text.strip() if ent.find("NAME") is not None else None
            company = ent.find("COMPANY").text.strip() if ent.find("COMPANY") is not None else None
            street_1 = ent.find("STREET").text.strip() if ent.find("STREET") is not None else None
            street_2 = ent.find("STREET_2").text.strip() if ent.find("STREET_2") is not None else None
            street_3 = ent.find("STREET_3").text.strip() if ent.find("STREET_3") is not None else None
            street = street_1 + street_2 + street_3
            city = ent.find("CITY").text.strip() if ent.find("CITY") is not None else None
            county = ent.fine("COUNTY").text.strip() if ent.find("COUNTY") is not None else None
            state = ent.find("STATE").text.strip() if ent.find("STATE") is not None else None
            country = ent.find("COUNTRY").text.strip() if ent.find("COUNTRY") is not None else None
            postal_code = ent.find("POSTAL_CODE").text.strip() if ent.find("POSTAL_CODE") is not None else None0

            data = {
                "name": name,
                "organization" : company,
                "street": street,
                "city": city,
                "county" : county,
                "state": state,
                "zip": self._parse_zip(postal_code),
                "country": country,
            }

Wouldn't it be great to have a function that does most of this stuff?

Python code:

def parse_ent(ent, name):
    return x.text.strip() if (x := ent.find("NAME")) is not None else ""

def xml_ent_to_address(ent):
    data = {'name': parse_ent(ent, 'NAME'),
            'company': parse_ent(ent, 'COMPANY'),
            # ...
            'zip': parse_zip(parse_ent(ent, "POSTAL_CODE")),
            'country': parse_ent(ent, 'COUNTRY')
    }
    # street can have several components
    street_parts = (parse_ent(ent, x) for x in ('STREET', 'STREET_2', 'STREET_3'))
    data['street'] = ' '.join(x for x in street_parts if x)
    # And check this out: splat operators let you avoid having a funny "if data is not None" in this class
    return Address(**data)

quote:

And stuff like the Address class having that weird view dict method, would love to chat about that. That is like that bc while they never said it, the examples both showed the printed json ignoring any keys without values, so I went out of my way to provide that since all the example data had different fields present/missing depending on the format. I was really getting in my own head by hour number 4.

I got that it was necessary to filter keys without values; I was saying that the view_dict method could have been a lot more concise. If you'd used a dataclass:

Python code:

    def view_dict(self):
        # If we put the .strip() then we don't need to have it on every line of the __init__
        # This also means we don't need None-checking or len-checking in the __init__
        # This one-liner does it all: 
        return {name: value.strip() for name, value in asdict(self).items() if value}

As it stands now, the Address class is kind of pointless; only its view_dict method is useful, the rest of it provides no additional value. Its __init__ strips input strings, but the file parsers are already doing that. A class like this could be beneficial for verifying that you're using valid names in your dictionaries, but the class allows everything to be None, so the class doesn't actually provide that benefit. Any dictionaries that were implemented with a mistyped key would see those values get silently dropped on the floor as each Address instance auto-assigns those fields to None instead, quietly permitting buggy outcomes. Permitting a dictionary as an optional input argument that you then parse with .get() is to blame there; if you make the keyword arguments explicit and use splat operators while creating the instance instead of a loose dictionary interface then you'll be alerted right away if a bad key gets provided.

Python code:

# Too permissive:
class Address():
    def __init__(self, data=None, **kwargs) -> None:
        if data is None:
            data = kwargs
        self.street = data.get("street").strip() if data.get("street") else None
        self.city = data.get("city").strip() if data.get("city") else None
...


# Good:
class Address:
    def __init__(self, street='', city='', etc you get the idea):
        self.street = street
        self.city = city
...


# Better:
@dataclass
class Address:
    street: str = ""
    city: str = ""
...

# ? May 1, 2024 03:37

CarForumPoster: Jun 26, 2013; â¡POWERâ¡

Chillmatic posted:

I wrote a script to get all the xml files from the show in question, and then iterate through them using the intro marker data in those xml files to decide where to make the edits using ffmpeg, except now I'm running into losing the subtitles from the original files, and converting them is a whole thing because they're PGS format.

Comedy option: Send the edited files to OpenAIs whisper API and let them create new transcripts for you!

# ? May 1, 2024 04:14

The March Hare: Oct 15, 2006; _{Je r�ve d'un}
Wayne's World 3; Buglord

Chillmatic posted:

I wrote a script to get all the xml files from the show in question, and then iterate through them using the intro marker data in those xml files to decide where to make the edits using ffmpeg, except now I'm running into losing the subtitles from the original files, and converting them is a whole thing because they're PGS format.

You "simply" need to discover the correct invocation Obi-Wan. (I do not know it, sorry...)

# ? May 1, 2024 04:45

Chillmatic: Jul 25, 2003; always seeking to survive and flourish

I ended up doing the most hacky bullshit ever, but finally I will never have to listen to the Office theme song ever again. gently caress that piano. gently caress that doo doo rear end elementary school recorder instrument or whatever the gently caress it is. Boom bitch, bye.

# ? May 1, 2024 15:03

Fender: Oct 9, 2000; Mechanical Bunny Rabbits!; Dinosaur Gum

QuarkJets posted:

More stuff.

My entire thought process for that Address class was basically, "this data is crap, I'm just gonna pass None all the way to the constructor and not worry about it." Which sounds so obviously pointless after reading your advice. It's rough getting your code roasted, but drat if this isn't good poo poo.

# ? May 1, 2024 21:37

QuarkJets: Sep 8, 2008

Thank you! Yeah that data sucks for sure, what a pain. And irl someone is going to come to you with some stupid bullshit like "Oops between dates X and Y the last and first names are swapped and in the wrong columns, can you fix that" and depending on the request you have to know how to ride the line between "no that's an unreasonable request" and "this is important, I will help you with that".

# ? May 1, 2024 22:27

nullfunction: Jan 24, 2005; Nap Ghost

Cyril Sneer posted:

Do you have any good resources that discuss the proper way to deal with error handling?

Afraid not. My knowledge is built up over many years and many versions and there's no one place I can point to that has everything you would conceivably want to know. I can generalize a bit but you're going to need to get your hands dirty if you really want to understand.

Always catch the most specific type you can in any given situation. Overly-broad exception handling is definitely a smell and you can configure your linter to warn you about this if you're not already being warned. There are a couple of patterns you can use to help manage complex exception possibilities:

Python code:

try:
    might_raise()
except TypeError as data_format_error:
   ... # Nothing fancy here, but you can except multiple times for a single try
except (KeyError, ValueError) as data_validity_error:
    ... # Both KeyError and ValueError are handled with the same logic
except Exception as e:
    ... # This broad statement comes after the specific exceptions we really care about and provides a catch-all for anything unhandled above.
except: # DO NOT DO THIS DO NOT DO THIS
    ... # You have now swallowed KeyboardInterrupt so good luck stopping your code while this is running, idiot

If I catch an exception, generally I'm doing one of three things with it:

Python code:

# I want to add some logic but still get the original error, which 
# can be accomplished with a bare raise like QuarkJets pointed out.
try:
    might_raise()
except SomeException:
    logger.exception("Oh no!")
    raise

Python code:

# I want to wrap an existing exception in an exception type that I
# control so that I have more granular control elsewhere in my code
# or I expose predictable errors to my consumers
try:
    might_raise()
except SomeException as e:
    raise MyOtherExceptionType("context info") from e

Python code:

# I want to swallow an exception because I have a way to handle
# it in code, or it is not impactful to the outcome of my program.
try:
    might_raise()
except SomeNonBlockingException:
    # Handle the failure here, or don't.
    pass

# The above is fine but contextlib (stdlib) gives us an even nicer 
# way to disregard an exception with suppress as a context manager:
from contextlib import suppress

def nan_divider(a: int, b: int) -> float:
    """Returns NaN instead of ZeroDivisionError"""
    with suppress(ZeroDivisionError):
        return a / b
    return float("nan")

Yes, you can do other things with exceptions like raising an exception disconnected from the exceptions that caused it, I just don't find myself reaching for this very often.

It's a good practice to put the absolute minimum you can get away with inside each try block, often this is a single line of code. It's much nicer to reason about lots of try/except blocks that catch the same error from different calls than to have all the calls in a single try/except and not have a clear indication of which call raised the exception without digging through the exception's guts. If you intend to handle an exception from either of those calls in the same way and it makes no difference to you where that exception originated, feel free to use the more concise form.

Python code:

# Concise but ambiguous
try:
    might_raise_one()
    might_raise_two()
except TheSameException:
    ... # raised, but which one failed?

# Longer, but clear and easier to modify
try:
    might_raise_one()
except TheSameException:
    ...

try:
    might_raise_two()
except TheSameException:
    ...

Finally, handling exceptions in the right place is critical. I only handle exceptions when I know I have all the necessary context to put into that exception so that it's useful to whoever is receiving it. Exceptions sometimes happen at deeper levels of your software and need to be propagated up to the right place where you have enough meaningful info to be able to handle it, whether that's retrying a certain number of times, putting a placeholder in for some missing data, or logging an error message to be addressed later. This is the hardest bit to illustrate with examples, because the examples have to be nontrivial to really illustrate why it's important. This is left as an exercise to the reader -- you're almost certainly going to get this wrong a few times before you start getting it right.

Consider a script that munges CSV files from a list of provided directories and outputs the result of some calculation as a different CSV file. How many failure modes can you think of that might need to be handled differently?

What if we were provided an empty list of directories? What if we can't access one of those directories? What if we can't access any of the directories? What if we can't access a single file in one of the directories but can access the rest? What happens if a provided directory doesn't contain any CSV data? What happens if we load something that is named CSV but doesn't contain CSV data, just plain text or a blank file? What happens if it contains binary data? What happens if the CSV headers don't include the field(s) we're after? What happens if the data itself is not valid? What if we can't write the output file?

That's a lot of failures and I'm sure you could come up with many more in this vein! What's not great is that these are largely technical concerns which naturally lend themselves to error handling conversations. The other side of the coin is how the script fits into the larger picture. If your script is run nightly on a server using automation, your error handling story will be completely different than if Bob in Accounting has to run this script ad-hoc at the end of the month to come up with the numbers needed to close the books.

# ? May 2, 2024 01:38

Hed: Mar 31, 2004; Fun Shoe

I have a docker container that just runs a python script that does a TCP server I built. It's worked great for the last 3 years.

I was changing some logging in the code and in the meantime upgraded python and the underlying linux. Python 3.9 -> 3.12 is no problem but bullseye -> bookworm seems to break it.

Python code:

server = await asyncio.start_server(client_connected, host, port)
await server.serve_forever()

It launches fine but when I connect to it I get

pre:

[2024-05-02 01:23:14,025] - ERROR - Unhandled exception in client_connected_cb
handle_traceback: Handle created at (most recent call last):
  File "/usr/local/app/server.py", line 442, in <module>
    asyncio.run(main(bind_address, bind_port), debug=True)
  File "/usr/local/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
  File "/usr/local/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 674, in run_until_complete
    self.run_forever()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 641, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1979, in _run_once
    handle._run()
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
transport: <_SelectorSocketTransport fd=7 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/local/app/server.py", line 285, in client_connected
    fwd_reader, fwd_writer = await asyncio.open_connection(host_ip, port, family=AF_INET,
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/streams.py", line 48, in open_connection
    transport, _ = await loop.create_connection(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1080, in create_connection
    infos = await self._ensure_resolved(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1456, in _ensure_resolved
    return await loop.getaddrinfo(host, port, family=family, type=type,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 901, in getaddrinfo
    return await self.run_in_executor(
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 863, in run_in_executor
    executor.submit(func, *args), loop=self)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 179, in submit
    self._adjust_thread_count()
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 202, in _adjust_thread_count
    t.start()
  File "/usr/local/lib/python3.12/threading.py", line 992, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

Can't start new thread is weird.
It works fine if I run Python 3.12 in bullseye, just not bookworm, so not sure what changed in bookworm. Running ulimit and ulimit -u inside the container result in "unlimited" and the host has thousands of spare threads. Any idea what it could be?

Hed fucked around with this message at 13:55 on May 3, 2024

# ? May 2, 2024 02:35

necrotic: Aug 2, 2005; I owe my brother big time for this!

I found this thread on docker forums that mentions the �privileged flag with docker resolved it. https://forums.docker.com/t/runtimeerror-cant-start-new-thread/138142/4

Upgrading both the OS and Python at the same time can make it difficult to nail down exactly what side changed, but my hunch is something related to Linux and containers and security contexts.

There may be a more limited seccomp change you can make to allow threading without the privileged flag.

https://docs.docker.com/engine/security/seccomp/

# ? May 2, 2024 02:53

Adbot: ADBOT LOVES YOU

# ? May 4, 2024 12:24

Hed: Mar 31, 2004; Fun Shoe

Thanks, I�ll give that a shot. Great find.
And I posted in error, I was upgrading all the way from buster. So everything but the latest Linux release seems to work. I�ll report back.

# ? May 2, 2024 02:57

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Python

«‹›230 »