|
Data Graham posted:Wait a minute, I was just using dataclasses for something similar and was going to mention something annoying about them, which is You can assign defaults in the dataclass definition, see my example
|
# ¿ Apr 26, 2024 22:42 |
|
|
# ¿ May 16, 2024 02:00 |
|
I don't like how the function code is all wrapped in large try/except blocks that raise new errors. That's not really error checking, it's error obfuscation; yes, you print the caught exception object, but it's hiding the stack trace for no good reason. If you absolutely felt like you had to add more context to exceptions, like if you wanted to use logging or send a notification to someone, then you could use exception chaining to raise the original exceptionPython code:
Don't run pip install commands in your notebook code. A comment that describes what's needed is fine.
|
# ¿ Apr 27, 2024 03:15 |
|
The commit message is "challenge completed"; one of the requirements is to write a concise commit message, but I think that's probably too concise. After looking it over a little I realized that the code isn't pep8-compliant, which is another requirement. A common tool to verify this is the standalone utility named "pycodestyle", which used to literally be called "pep8". An even better tool is "flake8", which combines pycodestyle and some other linters into one utility. Try installing flake8 (with pip) and running it on your script (`flake8 challenge.py`), it'll print out a list of line numbers with specific issues to fix. You can also run pycodestyle in the same way if you want to see only the pep8 problems. Linting is integrated directly into most modern IDEs but every one has its own default flavor of what's "right" and not all of them are pep8-compliant; lesson learned, run flake8 on your code before you commit it (you should look up pre-commit-hooks, during a git commit they can not only automatically fix a lot of common issues like unnecessary whitespace at the end of lines, you can also incorporate a flake8 pre-commit stage that will cause the commit to be rejected if flake8 has complaints; this enforces pep8 compliance on anything that you try to commit, it's very handy!) Those two issues (PEP8 compliance, vague commit message) + the result not being sorted by zip code is 3 failed requirements. Those are probably the big issues here. I have some other suggestions for your code that you can think about, because I like providing code review, but I don't think these matter as much: The `Address` class would be a lot more concise if it was a dataclass. Then the `view_dict` method reduces to 2 lines (one of which is the asdict function from the dataclasses module). As it stands, `Address` also has no documentation A class-based approach like this has fallen out of favor with a lot of Python developers. The methods in `Parser` could have all been functions. `file_path[-3:].lower()` is less effective at extracting file extensions than `os.path.splitext(file_path)[-1]`. Technically it satisfies the assignment, but this line is a brittle implementation that could require rework if it had to be extended to other file formats in the future. In a code review I'd reject this line. The rest of the implementation in `parse_file` looks good. A dictionary dispatch approach would be equally valid I'm not great at XML parsing. But when you extract the street components I think you're missing some logic to correctly handle street names. Here's your approach to concatenating these: Python code:
Python code:
XML code:
Python code:
- We use a walrus operator to only keep elements that have length > 0 - We join together the remaining elements with spaces Moving beyond the issues with start parsing in xml, I'm not enamored with the "x = y if y is not None else None" approach in general. 90% of this code is duplicate effort. I think this could be a lot cleaner with a looped approach. I'm imagining a dict comprehension but even a for loop could be cleaner, I think The postal code parser is a little too complicated and I think it doesn't work right with XML values. The Zip+4 values in the XML are formatted as "NNNNN - MMMM" but your regex is `r"^\d{5}-\d{4}$`. That lack of a space means that your parser assumes it's not a Zip+4 value, so it always returns the first 5 digits. This would work better: Python code:
In the tcv parser I think you were meant to infer that sometimes an Organization label is actually in the "last" column; it looks like you're parsing "Central Trading Company Ltd." as a last name for a person with no first name. If a middle name is "N/M/N" you should probably just not include that value. E.g. "no middle name" == "N/A" == don't bother keeping this vlaue Probably don't want to leave a "Hello, world!" docstring in your code
|
# ¿ Apr 29, 2024 08:08 |
|
Fender posted:There is some good feedback in here. Some of it I feel would be appropriate for a conversation. Like, why I check for None a lot. It's just a habit I learned in my last role where we parsed a lot of unreliable data sourced from scrapers. I would happily explain my position there. One problem I see is that the None-checking is everywhere. Since you need to filter out empty strings anyway, it'd be better if you translated "None" to an empty string - then all of the code that's downstream from the file parsers doesn't have to do any None checking. Basically you have these 3 interface functions that are letting None values filter into the rest of your functions, so you need to handle the possibility of None everywhere, but if you handled the None values in the interface functions then the rest of your code would be a bit nicer. Another problem is that the None checking has become boilerplate; it'd be better to use some small utility functions to simplify the code. This also gets back to the issue I talked about where the street values have the possibility of raising an exception if any of them happen to be None; you did the none-checking for each of the street components in the XML function, but then you forgot that any of them could be None when you concatenated them together. If you had an interface function that returned only strings instead, the code looks nicer and this bug goes away. For example: Python code:
Python code:
quote:And stuff like the Address class having that weird view dict method, would love to chat about that. That is like that bc while they never said it, the examples both showed the printed json ignoring any keys without values, so I went out of my way to provide that since all the example data had different fields present/missing depending on the format. I was really getting in my own head by hour number 4. I got that it was necessary to filter keys without values; I was saying that the view_dict method could have been a lot more concise. If you'd used a dataclass: Python code:
Python code:
|
# ¿ May 1, 2024 03:37 |
|
Thank you! Yeah that data sucks for sure, what a pain. And irl someone is going to come to you with some stupid bullshit like "Oops between dates X and Y the last and first names are swapped and in the wrong columns, can you fix that" and depending on the request you have to know how to ride the line between "no that's an unreasonable request" and "this is important, I will help you with that".
|
# ¿ May 1, 2024 22:27 |