|
nullfunction posted:Since you're offering it up for a roast, here are some things to consider:

Do you have any good resources that discuss the proper way to deal with error handling?
|
# ? Apr 30, 2024 21:53 |
|
|
# ? Jun 5, 2024 07:04 |
|
Fender posted:There is some good feedback in here. Some of it I feel would be appropriate for a conversation. Like, why I check for None a lot. It's just a habit I learned in my last role where we parsed a lot of unreliable data sourced from scrapers. I would happily explain my position there.

One problem I see is that the None-checking is everywhere. Since you need to filter out empty strings anyway, it'd be better if you translated None to an empty string; then all of the code that's downstream from the file parsers doesn't have to do any None-checking. Basically you have these three interface functions that are letting None values filter into the rest of your functions, so you need to handle the possibility of None everywhere, but if you handled the None values in the interface functions then the rest of your code would be a bit nicer.

Another problem is that the None-checking has become boilerplate; it'd be better to use some small utility functions to simplify the code. This also gets back to the issue I talked about where the street values have the possibility of raising an exception if any of them happen to be None: you did the None-checking for each of the street components in the XML function, but then you forgot that any of them could be None when you concatenated them together. If you had an interface function that returned only strings instead, the code looks nicer and this bug goes away. For example: Python code:
Python code:
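(The snippet itself didn't survive the quoting, but the idea being described — collapse None into empty strings at the boundary — can be sketched like this; the function and field names are hypothetical, not the thread's original code.)

```python
def normalize(value):
    # collapse None into an empty string so downstream code only ever sees str
    return "" if value is None else value

def parse_record(raw):
    # hypothetical interface function: every field leaves here as a plain
    # string, so no downstream code needs a None check
    return {key: normalize(raw.get(key)) for key in ("name", "street", "city")}

record = parse_record({"name": "Ada", "street": None})
```

With this in place, concatenating street components can never blow up on a None.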
quote:And stuff like the Address class having that weird view dict method, would love to chat about that. That is like that bc while they never said it, the examples both showed the printed json ignoring any keys without values, so I went out of my way to provide that since all the example data had different fields present/missing depending on the format. I was really getting in my own head by hour number 4.

I got that it was necessary to filter keys without values; I was saying that the view_dict method could have been a lot more concise. If you'd used a dataclass: Python code:
Python code:
|
# ? May 1, 2024 03:37 |
|
Chillmatic posted:I wrote a script to get all the xml files from the show in question, and then iterate through them using the intro marker data in those xml files to decide where to make the edits using ffmpeg, except now I'm running into losing the subtitles from the original files, and converting them is a whole thing because they're PGS format.

Comedy option: Send the edited files to OpenAI's Whisper API and let them create new transcripts for you!
|
# ? May 1, 2024 04:14 |
|
Chillmatic posted:I wrote a script to get all the xml files from the show in question, and then iterate through them using the intro marker data in those xml files to decide where to make the edits using ffmpeg, except now I'm running into losing the subtitles from the original files, and converting them is a whole thing because they're PGS format. You "simply" need to discover the correct invocation Obi-Wan. (I do not know it, sorry...)
|
# ? May 1, 2024 04:45 |
|
I ended up doing the most hacky bullshit ever, but finally I will never have to listen to the Office theme song ever again. gently caress that piano. gently caress that doo doo rear end elementary school recorder instrument or whatever the gently caress it is. Boom bitch, bye.
|
# ? May 1, 2024 15:03 |
|
QuarkJets posted:More stuff. My entire thought process for that Address class was basically, "this data is crap, I'm just gonna pass None all the way to the constructor and not worry about it." Which sounds so obviously pointless after reading your advice. It's rough getting your code roasted, but drat if this isn't good poo poo.
|
# ? May 1, 2024 21:37 |
|
Thank you! Yeah that data sucks for sure, what a pain. And irl someone is going to come to you with some stupid bullshit like "Oops between dates X and Y the last and first names are swapped and in the wrong columns, can you fix that" and depending on the request you have to know how to ride the line between "no that's an unreasonable request" and "this is important, I will help you with that".
|
# ? May 1, 2024 22:27 |
|
Cyril Sneer posted:Do you have any good resources that discuss the proper way to deal with error handling?

Afraid not. My knowledge is built up over many years and many versions, and there's no one place I can point to that has everything you would conceivably want to know. I can generalize a bit, but you're going to need to get your hands dirty if you really want to understand.

Always catch the most specific type you can in any given situation. Overly-broad exception handling is definitely a smell, and you can configure your linter to warn you about this if you're not already being warned. There are a couple of patterns you can use to help manage complex exception possibilities: Python code:
Python code:
Python code:
Python code:
It's a good practice to put the absolute minimum you can get away with inside each try block; often this is a single line of code. It's much nicer to reason about lots of try/except blocks that catch the same error from different calls than to have all the calls in a single try/except with no clear indication of which call raised the exception short of digging through the exception's guts. If you intend to handle an exception from either of those calls in the same way and it makes no difference to you where that exception originated, feel free to use the more concise form. Python code:
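(The example blocks above were lost in quoting; here's a sketch of the two forms being contrasted, using hypothetical load_config helpers rather than the thread's original code.)

```python
import json

def load_config(path):
    # narrow form: each call gets its own try block, so it's obvious which
    # operation failed without digging through the traceback
    try:
        with open(path) as f:
            raw = f.read()
    except OSError as exc:
        raise RuntimeError(f"could not read {path}") from exc
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"{path} is not valid JSON") from exc

def load_config_concise(path):
    # concise form: fine when both failures are handled identically and the
    # origin of the exception doesn't matter to you
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError):
        return {}
```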
Consider a script that munges CSV files from a list of provided directories and outputs the result of some calculation as a different CSV file. How many failure modes can you think of that might need to be handled differently?

- What if we were provided an empty list of directories?
- What if we can't access one of those directories? What if we can't access any of the directories?
- What if we can't access a single file in one of the directories but can access the rest?
- What happens if a provided directory doesn't contain any CSV data?
- What happens if we load something that is named CSV but doesn't contain CSV data, just plain text or a blank file? What happens if it contains binary data?
- What happens if the CSV headers don't include the field(s) we're after?
- What happens if the data itself is not valid?
- What if we can't write the output file?

That's a lot of failures, and I'm sure you could come up with many more in this vein! Note that these are largely technical concerns, which naturally lend themselves to error handling conversations. The other side of the coin is how the script fits into the larger picture. If your script is run nightly on a server using automation, your error handling story will be completely different than if Bob in Accounting has to run this script ad-hoc at the end of the month to come up with the numbers needed to close the books.
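To make that concrete, here's a hedged sketch of a loader that handles a few of those failure modes distinctly (the names and the skip-vs-raise policy choices are illustrative, not prescriptive):

```python
import csv
from pathlib import Path

def gather_rows(directories, field):
    """Collect values of `field` from every CSV under the given directories."""
    if not directories:
        # an empty input list is a caller bug, not a recoverable condition
        raise ValueError("no input directories were provided")
    rows = []
    for directory in map(Path, directories):
        if not directory.is_dir():
            # one bad directory shouldn't sink the whole run
            print(f"skipping inaccessible directory: {directory}")
            continue
        for csv_path in sorted(directory.glob("*.csv")):
            try:
                with open(csv_path, newline="") as f:
                    reader = csv.DictReader(f)
                    if reader.fieldnames is None or field not in reader.fieldnames:
                        print(f"{csv_path}: missing field {field!r}")
                        continue
                    rows.extend(row[field] for row in reader)
            except OSError:
                print(f"could not read {csv_path}")
    return rows
```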
|
# ? May 2, 2024 01:38 |
|
I have a docker container that just runs a Python script for a TCP server I built. It's worked great for the last 3 years. I was changing some logging in the code and in the meantime upgraded Python and the underlying Linux. Python 3.9 -> 3.12 is no problem, but bullseye -> bookworm seems to break it. Python code:
pre:
[2024-05-02 01:23:14,025] - ERROR - Unhandled exception in client_connected_cb
handle_traceback: Handle created at (most recent call last):
  File "/usr/local/app/server.py", line 442, in <module>
    asyncio.run(main(bind_address, bind_port), debug=True)
  File "/usr/local/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
  File "/usr/local/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 674, in run_until_complete
    self.run_forever()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 641, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1979, in _run_once
    handle._run()
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
transport: <_SelectorSocketTransport fd=7 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/usr/local/app/server.py", line 285, in client_connected
    fwd_reader, fwd_writer = await asyncio.open_connection(host_ip, port, family=AF_INET,
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/streams.py", line 48, in open_connection
    transport, _ = await loop.create_connection(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1080, in create_connection
    infos = await self._ensure_resolved(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1456, in _ensure_resolved
    return await loop.getaddrinfo(host, port, family=family, type=type,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 901, in getaddrinfo
    return await self.run_in_executor(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 863, in run_in_executor
    executor.submit(func, *args), loop=self)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 179, in submit
    self._adjust_thread_count()
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 202, in _adjust_thread_count
    t.start()
  File "/usr/local/lib/python3.12/threading.py", line 992, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

It works fine if I run Python 3.12 in bullseye, just not bookworm, so I'm not sure what changed in bookworm. Running ulimit and ulimit -u inside the container both result in "unlimited", and the host has thousands of spare threads. Any idea what it could be?

Hed fucked around with this message at 13:55 on May 3, 2024 |
# ? May 2, 2024 02:35 |
|
I found this thread on the Docker forums that mentions the --privileged flag with docker resolved it: https://forums.docker.com/t/runtimeerror-cant-start-new-thread/138142/4 Upgrading both the OS and Python at the same time can make it difficult to nail down exactly which side changed, but my hunch is something related to Linux, containers, and security contexts. There may be a more limited seccomp change you can make to allow threading without the privileged flag: https://docs.docker.com/engine/security/seccomp/
|
# ? May 2, 2024 02:53 |
|
Thanks, I’ll give that a shot. Great find. And I posted in error, I was upgrading all the way from buster. So everything but the latest Linux release seems to work. I’ll report back.
|
# ? May 2, 2024 02:57 |
|
I have a SQLAlchemy model called Subject, and each Subject model has an attribute named 'heart_rate': Mapped[list['HeartRateValue']], where the HeartRateValue model stores the time_stamp and value of each heart rate value. I know if I have the ID of the subject I can use session.get(Subject, subject_id) to get the Subject object where the subject ID is subject_id. Is there a way with SQLAlchemy to then query the Subject object for the heart rate values which fall into a certain time interval (say start_date_time, end_date_time)?
|
# ? May 10, 2024 19:30 |
|
Jose Cuervo posted:I have a SQLAlchemy model called Subject, and each Subject model has an attribute named 'heart_rate': Mapped[list['HeartRateValue']], where the HeartRateValue model stores the time_stamp and value of each heart rate value. I know if I have the ID of the subject I can use Figured out how to do this: Python code:
|
# ? May 12, 2024 19:57 |
|
Anyone attending PyCon this weekend?
|
# ? May 17, 2024 02:24 |
|
I have lots of Python experience, but no pandas, and while I'm trying to learn polars I clearly don't get it. I'm trying to make column selections in one dataframe based on rows in another. I have a dataframe widgets:
pre:
shape: (3, 4)
┌───────┬───────┬──────┬──────────┐
│ group ┆ month ┆ year ┆ quantity │
│ ---   ┆ ---   ┆ ---  ┆ ---      │
│ u8    ┆ str   ┆ u16  ┆ i16      │
╞═══════╪═══════╪══════╪══════════╡
│ 1     ┆ 07    ┆ 2024 ┆ 520      │
│ 1     ┆ 09    ┆ 2024 ┆ 640      │
│ 1     ┆ 12    ┆ 2024 ┆ 108      │
└───────┴───────┴──────┴──────────┘
and a dataframe of prices:
pre:
shape: (13_992, 134)
┌─────────────┬─────────────┬────────────┐
│ 072024      ┆ 092024      ┆ 122024     │
│ ---         ┆ ---         ┆ ---        │
│ f64         ┆ f64         ┆ f64        │
╞═════════════╪═════════════╪════════════╡
│ null        ┆ null        ┆ null       │
│ 1947.050278 ┆ 1252.862918 ┆ 857.521461 │
│ 1917.11333  ┆ 1234.664637 ┆ 846.660942 │
│ 1917.11333  ┆ 1234.664637 ┆ 846.660942 │
│ 1917.11333  ┆ 1234.664637 ┆ 846.660942 │
│ …           ┆ …           ┆ …          │
│ null        ┆ null        ┆ null       │
│ null        ┆ null        ┆ null       │
│ null        ┆ null        ┆ null       │
│ null        ┆ null        ┆ null       │
│ null        ┆ null        ┆ null       │
└─────────────┴─────────────┴────────────┘
I have read the docs and the O'Reilly preview for "Python Polars: The Definitive Guide", but while I get the functions I'm having trouble getting in the mindset.
|
# ? May 21, 2024 04:01 |
|
Hed posted:I have lots of Python experience, but no pandas and while I'm trying to learn polars I clearly don't get it. I unfortunately don't know Polars at all, but I can confirm that you're not crazy for feeling like there's a mindset difference. Pandas took me a while to figure out how the hell I was supposed to do basic operations stuff; a lot of it appeared to be basically 'nah you just need to apply things across a whole series'/etc rather than more traditional for-looping.
|
# ? May 21, 2024 17:00 |
|
Are you familiar with SQL? Python libraries with DataFrames (like pandas, polars, or pyspark) use a lot of SQL idioms. You don't need a for loop because a select statement applies to all of your rows. Python code:
|
# ? May 21, 2024 17:28 |
|
BAD AT STUFF posted:Are you familiar with SQL? Python libraries with DataFrames (like pandas, polars, or pyspark) use a lot of SQL idioms. You don't need a for loop because a select statement applies to all of your rows. Do you have a recommended resource to learn some basics of SQL?
|
# ? May 21, 2024 23:46 |
|
I think the problem is your price table isn't in a tidy, "one observation per row" format. I'd unpivot it with a melt() function, convert the dates in both dataframes to a common datetime format, then join on that column. edit: this should get you started: code:
Tayter Swift fucked around with this message at 00:34 on May 22, 2024 |
# ? May 22, 2024 00:19 |
|
know what, I needed a break from the other polars code I've been hitting my head with all day:Python code:
code:
|
# ? May 22, 2024 01:03 |
|
Oh, good call. I was too focused in on the "(pl.col("072024") * 520)" portion of the question (and still misunderstood it) and missed the point. I should have wondered why the second table was mentioned...

Jose Cuervo posted:Do you have a recommended resource to learn some basics of SQL?

I don't have a specific recommendation, other than I would look at relational databases in general and not just SQL on its own. Maybe from a university that posts its stuff online, like MIT or Stanford.
|
# ? May 22, 2024 15:12 |
|
Tayter Swift posted:know what, I needed a break from the other polars code I've been hitting my head with all day:

Holy crap, thank you. I didn't realize "tidy" is the concept, so I found this and yes I agree I will do this: https://r4ds.had.co.nz/tidy-data.html. It makes sense. I guess the thought is that since the data is columnar, I shouldn't worry about the seemingly inefficient operations because they're actually optimized? I will probably play around to make sure I really understand it. Thanks for getting me on the right path.
|
# ? May 22, 2024 19:53 |
|
I can't read Rust code very well, but it looks like care is taken to not make it an O(n²) operation. But yeah, manipulating columns and rows like it's an Excel PivotTable is core to DataFrame functionality, whether it's pandas or polars.
|
# ? May 22, 2024 20:52 |
|
It's a great solution and I think I'm closer to understanding melt now. 2 minor style comments: - you can replace .with_columns(<stuff>.alias(name)) with .with_columns(name=<stuff>) - Instead of using .drop to remove the columns you don't want, use .select to keep the ones you do. I've heard that the polars devs encourage doing it that way, it makes what's in the dataframe clearer.
|
# ? May 23, 2024 00:21 |
|
I think the "x = foo" syntax showed up just after I picked up polars -- I prefer it but have a bad habit of still using .alias() sometimes. I'll die on the .drop() hill though, when I only want to drop a column or two. It's explicitly dropping the columns, while .select()ing everything but what I want is implicitly dropping (and can be difficult to state if you don't know what columns are in the DataFrame -- df.select(pl.all().exclude('foo')) is a bit much for my taste). Tayter Swift fucked around with this message at 03:44 on May 23, 2024 |
# ? May 23, 2024 03:41 |
|
Polars is definitely still changing pretty quickly in kinda major ways. I saw a talk recently where the guy was pretty clear that x= and .alias(x) are the same, but alias has become such an ingrained habit it's getting passed on long past its life. I can see the reasoning behind using drop, especially if you're just creating a temporary column for some math. I might be misremembering, but I thought at the talk the guy said the devs behind polars preferred select and drop was less efficient, but I should really test that before saying it too loudly.
|
# ? May 23, 2024 03:55 |
|
It definitely shouldn't make a difference if you use LazyFrames.
|
# ? May 23, 2024 04:49 |
|
Huh, looking at the issue tracker it looks like someone did experience a performance degradation when using drop() on LazyFrames "dozens to hundreds of times per second." Maybe to do with being more difficult to parallelize. Ritchie did some tweaking but it did not fix this person's problem, so he reverted to using select(). I dunno, that seems like a corner case to me and something to keep in mind, but not worth losing the readability of the code for most work.
|
# ? May 23, 2024 18:10 |
|
Sorry in advance but I am a loving moron when it comes to programming and as much as I think I understand Python when I follow tutorials, trying to do something in the real world leads to me holding my head in my hands. I'm trying to make 600 PDFs. They all have different first pages (covers), which are individually named PDFs in a folder (1.pdf, 2.pdf etc). I want to add the same PDF after each of these covers ('666-interior.pdf'). I'm trying to use PyPDF2 for this and I thought I knew what I was doing, but I do not. This code is what I got, but it's not incrementing the cover file name, so each cover is just 1.pdf and it is doubling up the interior each time, so on each iteration it's repeating the 666-interior.pdf. What am I doing wrong? I would really appreciate some help here. code:
|
# ? May 27, 2024 17:58 |
|
I'm not following your description of the bug and what output you actually get - do you still get 600 files? Or just 1 with 600 iterations of the content file? Do you get any specific error messages from python when you run it? I haven't used that tool before but in the meantime, you could try moving the definition of merger = PdfWriter() to the start of each loop, because you're closing it again at the end of every loop, and I'd assume it's not supposed to work that way. Also I imagine you need parentheses on close, like merger.close()
|
# ? May 27, 2024 18:20 |
|
boofhead posted:I'm not following your description of the bug and what output you actually get - do you still get 600 files? Or just 1 with 600 iterations of the content file? Do you get any specific error messages from python when you run it? Sorry - I should have been clearer. It's producing 600 files. Each one has the same cover (first file) which doesn't seem to be incrementing. Furthermore, it's doubling up the file size each time, adding another duplicate of the interior file. I don't think it's a bug so much as my own incompetence. I was just hoping someone would be able to point out the logic flow that's doing something wrong. And yes, I need parentheses on the end of that line.
|
# ? May 27, 2024 18:29 |
First thing is that you're .append()'ing in every iteration of the loop and it never gets reset, meaning it will just keep accumulating new pages every time it loops. That reset is what the instantiation of merger = PdfWriter() does, so that needs to be done fresh at the beginning of each iteration, i.e. within the loop, like boofhead said. See if that helps with the initial cover read issue too.
|
|
# ? May 27, 2024 18:43 |
|
If you want 600 files, shouldn't `merger = PdfWriter()` go inside of the for loop? If it's on the outside then you're appending to that same object for each iteration. The line "merger.close" doesn't do anything; "merger.close()" will call the close method. e: Looks like I have the same suggestions as boofhead, I should have read their post before replying! QuarkJets fucked around with this message at 18:48 on May 27, 2024 |
# ? May 27, 2024 18:43 |
|
As an aside, I'll suggest f-strings and integer 0-padding to make your file names neater:code:
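(The snippet was lost; the suggestion presumably amounted to something like this.)

```python
# zero-padding the counter keeps the output files sorting correctly
# in a file browser: 001.pdf, 002.pdf, ..., 600.pdf
for i in (1, 42, 600):
    output_name = f"{i:03}.pdf"
    print(output_name)
```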
|
# ? May 27, 2024 18:45 |
|
E: other people replied with the same stuff in the meantime Have you tried the changes I suggested? Looking at your code I imagine it's just ignoring merger.close and then adding the next cover page then contents to the same pdf file, so your 600th file probably looks like: Cover page 1 Contents Cover page 2 Contents ... Cover page 600 Contents Move the definition of merger inside the loop so it creates a new object each time, and close it properly, and see if that fixes it
|
# ? May 27, 2024 18:46 |
|
Ah that's great - thank you! Moving the merger = pdfWriter() into the loop seems to have done it. That and the merger.close() parentheses has worked a treat. I really, really appreciate the guidance. PS - boofhead, the interiors were repeating, but not the covers.
|
# ? May 27, 2024 18:54 |
|
I can’t remember what I replaced them with at the moment but both pypdf2 and 4 are old and unsupported at this point. I recall running into annoying stuff with them a lot and ditching them for a more modern package 1-2 years ago Edit: it was pymupdf CarForumPoster fucked around with this message at 20:55 on May 27, 2024 |
# ? May 27, 2024 19:29 |
|
Sorry for the incredibly basic question, but I'm trying to pick up Python again(no previous coding experience), and I'm completely lost on something I thought I understood. I'm doing the Cisco skillsforall basic python course and this section makes no sense to me: What comma inside the string? What does it mean by "no spaces inside the strings"? I don't understand how/why spaces are being added/removed in this example. I think this might just be a mistake in the lesson, but I don't want to move on if I'm fundamentally misunderstanding something.
|
# ? Jun 2, 2024 20:19 |
|
They just mean that the spaces separating the strings (i.e. the variables you pass to the print function) aren't strictly necessary in python and that they won't use them in future. It's just a style choice E.g. Python code:
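(The original example was lost, but the point looks something like this.)

```python
# These two calls are identical as far as Python is concerned; the space
# after each comma in the source code is purely a style choice.
print("My", "name", "is", "Python.")
print("My","name","is","Python.")
# print() itself inserts a single space between its arguments, so both
# lines print: My name is Python.
```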
But the spaces inside each string are important because otherwiseitwouldlooklikethis
|
# ? Jun 2, 2024 20:25 |
|
|
SHARTING BEAR posted:Sorry for the incredibly basic question, but I'm trying to pick up Python again(no previous coding experience), and I'm completely lost on something I thought I understood. I'm doing the Cisco skillsforall basic python course and this section makes no sense to me: You may want to do code:
mystes fucked around with this message at 20:46 on Jun 2, 2024 |
# ? Jun 2, 2024 20:42 |