|
Python is bad because its type system is based on pinkie promises and even mypy cannot save you. At least at my company, where once a week we have some dumb Sentry issue because someone has done code:
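To illustrate the kind of thing that slips through (a made-up example, not the snippet from the actual Sentry issue): values that come out of `json.loads` are typed as `Any`, so a checker has nothing to complain about, and the annotations are not enforced at runtime either. The function and field names here are hypothetical.

```python
import json


def total_cents(unit_price: int, quantity: int) -> int:
    # The annotations promise ints, but nothing checks them at runtime.
    return unit_price * quantity


# json.loads returns Any, so a type checker accepts whatever is inside it.
payload = json.loads('{"unit_price": "5"}')

print(total_cents(5, 3))                      # 15
print(total_cents(payload["unit_price"], 3))  # 555 (string repetition, no error)
```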
|
# ¿ Sep 3, 2023 01:57 |
|
|
# ¿ May 11, 2024 16:04 |
|
We’re pretty good about hinting in first party code but it all falls down as soon as you use a third party library for which there are no types. Fine for Flask/Django/whatever other popular library but just lol if I try to get anyone to write our own type packages for third party deps we use.
|
# ¿ Sep 3, 2023 02:26 |
|
Falcon2001 posted:
Familiar with Frozen dataclasses, but this solves a problem I'm not actually trying to fix. The issue isn't 'you can reassign/update attributes after creation', the problem is 'certain attributes make calls out to clients to populate data, and we need to ensure that doesn't happen.'

Assuming there's a naming convention for the underlying properties, you could make a class that just dynamically returns those when fetching a property. Something like: Python code:
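A minimal sketch of that idea, assuming the convention is that a lazy property `customer` is backed by a plain attribute `_customer`. The `Order` and `LocalOnly` classes and the `fetch_customer` client method are hypothetical stand-ins, not anyone's real code.

```python
class Order:
    """Hypothetical model whose property lazily calls out to a client."""

    def __init__(self, order_id, client):
        self.order_id = order_id
        self._client = client
        self._customer = None  # backing field for the lazy `customer` property

    @property
    def customer(self):
        if self._customer is None:
            self._customer = self._client.fetch_customer(self.order_id)
        return self._customer


class LocalOnly:
    """Read-only view that serves attributes from stored data only."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so `_wrapped`
        # itself resolves normally. Refusing private names avoids recursion.
        if name.startswith("_"):
            raise AttributeError(name)
        # vars() reads the instance dict directly, so properties defined on
        # the class (and the client calls inside them) are never triggered.
        stored = vars(self._wrapped)
        for key in (f"_{name}", name):
            if key in stored:
                return stored[key]
        raise AttributeError(name)
```

With this, `LocalOnly(order).customer` returns whatever is already stored in `_customer` (possibly `None`) and never touches the client.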
|
# ¿ Sep 14, 2023 04:22 |
|
Edit: ^^^ Dataclasses aren't immutable unless you do @dataclass(frozen=True). You're still relying on this global variable in create_dict, which you don't want, and your Parser (I'd probably call it Config or something to be more clear that it doesn't do any parsing itself) is tightly coupled to CLI args. Maybe today you only take CLI args to configure this, but someday it may come from env vars, or a config file, or a network call, etc. What you're aiming for is to separate "the config" and "where the config comes from" in most of the code, so it doesn't have to care. Python code:
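Roughly what that separation could look like, as a sketch. The field names, CLI flags, and `APP_*` environment variable names are all made up for illustration; only `create_dict` comes from the code being discussed, and its body here is a placeholder.

```python
import argparse
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Plain data. Knows nothing about where its values came from."""
    input_path: str
    sort_key: str = "name"
    descending: bool = False


def config_from_args(argv=None) -> Config:
    # One possible source: command-line arguments.
    parser = argparse.ArgumentParser()
    parser.add_argument("input_path")
    parser.add_argument("--sort-key", default="name")
    parser.add_argument("--descending", action="store_true")
    args = parser.parse_args(argv)
    return Config(args.input_path, args.sort_key, args.descending)


def config_from_env(environ=os.environ) -> Config:
    # Another possible source: environment variables (hypothetical names).
    return Config(
        input_path=environ["APP_INPUT_PATH"],
        sort_key=environ.get("APP_SORT_KEY", "name"),
        descending=environ.get("APP_DESCENDING", "") == "1",
    )


def create_dict(config: Config) -> dict:
    # Takes the config as a parameter instead of reading a global.
    return {"path": config.input_path, "sort": config.sort_key}
```

Only the two `config_from_*` functions know about argparse or the environment; everything else just receives a `Config`.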
Personally, I would avoid passing the entire config object to data_display since all it needs out of it is the sort config. Passing around a big bundle of state like that can make it hard to reason about what exactly a function relies on and has led me to some hair-pulling debugging sessions in the past. Or put another way, if a year from now you need to make some changes or fixes to data_display (or call it from some other part of the codebase), which of these function signatures tells you the most about how it works? Python code:
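As a sketch of the comparison (the `SortConfig` type, its fields, and the `data_display_opaque` name are assumptions for illustration; only `data_display` is from the code under discussion):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SortConfig:
    key: str = "name"
    descending: bool = False


@dataclass(frozen=True)
class Config:
    input_path: str
    sort: SortConfig


# Version 1: takes the whole bundle. The signature says nothing about
# which parts of the config the function actually depends on.
def data_display_opaque(rows: list, config: Config) -> list:
    return sorted(rows, key=lambda r: r[config.sort.key],
                  reverse=config.sort.descending)


# Version 2: takes only what it needs. The signature documents the dependency.
def data_display(rows: list, sort: SortConfig) -> list:
    return sorted(rows, key=lambda r: r[sort.key], reverse=sort.descending)
```

The second version can also be called from anywhere that has a `SortConfig`, without having to build a full `Config` first.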
WHERE MY HAT IS AT fucked around with this message at 16:45 on Sep 15, 2023 |
# ¿ Sep 15, 2023 16:42 |
|
I can only assume they're just too close to retirement to want to learn something new, even if it's demonstrably better. Not that people earlier in their careers can't be stubborn or cling to garbage, either. Just IME that's been where I've seen it.
|
# ¿ Sep 19, 2023 17:51 |
|
QuarkJets posted:
I especially hate package names that install a completely different module name. Pillow is the big offender for me, I hate that the module you import is actually named PIL. May as well call it PISS

At least Pillow has the excuse of it being a fork of PIL so the import is for backwards compatibility/historical reasons. Beautiful Soup 4 with its dumb "install beautifulsoup4 (but not beautifulsoup, because that's beautifulsoup 3!) and then import bs4" can go to hell, though.
|
# ¿ Sep 27, 2023 05:02 |
|
teen phone cutie posted:
i have a general question that i'm just now starting to look into, but would be great if someone could point me in the right direction:

If you want to execute this logic on each request so you can gather stats, what you're after is the before_request decorator. It doesn't pass any params, but you can import flask.request to have access to a request object to inspect.

As for the actual stat storage, you haven't said anything about what your backing data store is, but I'd probably do something like: for each request, extract the user ID out of the JWT and upsert a row in a DB table with the user ID and current timestamp + 15 minutes. Then, when you want to view active users, query the table for rows with a timestamp greater than current. Depending on space constraints, you'll likely want to clean this up periodically by dropping all rows with an expired timestamp.

Tracking users not logged in depends on how you want to use that data. Do you need to bucket it day by day? As a percentage of total requests? Just an absolute count? Harder to suggest an approach to that one without more details but maybe just a table of unique IPs + day of access would suffice.
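A rough sketch of the storage half of the logged-in tracking, using SQLite purely as a stand-in since the real data store is unknown. The table and function names are made up, and the upsert syntax needs SQLite 3.24 or newer. In a Flask app, `touch_user` would be called from a function registered with `before_request`, after pulling the user ID out of the JWT.

```python
import sqlite3
import time

WINDOW_SECONDS = 15 * 60  # a user counts as active for 15 minutes


def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS active_users ("
        "user_id TEXT PRIMARY KEY, expires_at REAL NOT NULL)"
    )


def touch_user(conn, user_id, now=None):
    """Upsert the user's row with an expiry 15 minutes from now."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT INTO active_users (user_id, expires_at) VALUES (?, ?) "
        "ON CONFLICT(user_id) DO UPDATE SET expires_at = excluded.expires_at",
        (user_id, now + WINDOW_SECONDS),
    )


def count_active(conn, now=None):
    """Count rows whose expiry is still in the future."""
    now = time.time() if now is None else now
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM active_users WHERE expires_at > ?", (now,)
    ).fetchone()
    return count


def purge_expired(conn, now=None):
    """Periodic cleanup: drop rows that have already expired."""
    now = time.time() if now is None else now
    conn.execute("DELETE FROM active_users WHERE expires_at <= ?", (now,))
```

Passing `now` explicitly is only there to make the functions easy to test; production calls can leave it out.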
|
# ¿ Nov 29, 2023 05:21 |
|
Curious to hear about other folks' solutions! I used a trie for part two so I could iterate over the string and check substring prefixes as I went along until I hit a valid word. However, I hadn't implemented a trie since I was in school, which was close to a decade ago now, so re-learning it all burned me out on puzzles and I haven't bothered doing part two of any of the other days.
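For anyone who hasn't touched one since school either, a small sketch of the approach (not my actual solution, and the word list in the usage note is just an example): build a trie from the valid words, then from each position in the string walk the trie one character at a time until a word ends or the prefix stops matching.

```python
END = ""  # marker key; never collides with a real one-character key


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word  # store the full word where it ends
    return root


def find_words(text, trie):
    """Return (start_index, word) for the first word found at each position."""
    found = []
    for start in range(len(text)):
        node = trie
        for ch in text[start:]:
            node = node.get(ch)
            if node is None:
                break  # no word has this prefix, so give up on this start
            if END in node:
                found.append((start, node[END]))
                break
    return found
```

For example, `find_words("eightwone", build_trie(["one", "two", "eight"]))` returns `[(0, "eight"), (4, "two"), (6, "one")]`, so overlapping words are all picked up.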
|
# ¿ Dec 12, 2023 18:18 |
|
If I can keep making this the ETL thread for a little longer, what would you all suggest for a totally greenfield project? I do some contracting for my wife’s company, and inherited a mishmash of make.com no-code projects, single-purpose API endpoints in Heroku, and Airtable automations which sync data to Google Sheets. They’re starting to hit a scale where this is a problem because things are unreliable, go out of sync, certain data exists only in some data stores, etc. The goal is to get them set up with a proper data store that can act as a single source of truth, and an actual ETL platform where I can write real code or use premade connectors to shift data around. I did a bit of digging and something like Airbyte + Databricks looks nice, but maybe it’s overkill for what they need? Think thousands of rows a day rather than millions, and they only want to be able to do dashboarding and ad-hoc querying in Tableau. Would I regret just doing a managed Airflow and a Postgres instance at this point? I don’t want to have to redo everything in a year or two.
|
# ¿ Dec 26, 2023 12:27 |
|
Thanks! Those both look interesting, leaning towards dagster just because mage doesn’t have a managed hosted setup and I want to minimize the time I spend on this. Say I have a pydantic model that represents my incoming data from a third party API (in this case a warehouse management system), what are folks using to actually write that to a raw table for transformation with dbt later? At work we use sqlalchemy for all our DB interactions but that seems heavy handed, especially if I’ve already got a list of models parsed from JSON or whatever. I could just hand roll a parameterized sql statement but surely there’s a library out there that will do what I need, right? Edit: looks like Dagster can do this natively with a data frame, never mind!
WHERE MY HAT IS AT fucked around with this message at 10:23 on Dec 27, 2023 |
# ¿ Dec 27, 2023 00:47 |
|
Yeah, they’re probably several years out from needing a full time engineer, if they ever get that far. So I’m it, and the less time spent on maintenance the better for both sides. I’ll start with Postgres since I have experience scaling that pretty far at work, and plan to move to BigQuery someday if they need it. A data engineering thread sounds like a good idea, I can work on an OP this week since I’m off on PTO and my nephews gave me the plague or something anyways.
|
# ¿ Dec 28, 2023 12:36 |
|
Poetry has its own resolver and doesn’t rely on pip; pipenv does just punt to pip under the hood and is even slower. The real problem is just that setup.py is a non-deterministic way of specifying dependencies, so you can’t have a truly “offline” package resolver: you have to actually install stuff to do lock file generation.
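To make the setup.py point concrete: it's an arbitrary Python script, so the dependency list can be computed at run time from whatever the machine looks like. A hypothetical sketch of the kind of logic that ends up feeding `install_requires` (the specific packages and conditions are just for illustration):

```python
import sys


def install_requires(platform=sys.platform, version=sys.version_info):
    """Dependency list that differs depending on where it is evaluated."""
    deps = ["requests"]
    if platform == "win32":
        deps.append("pywin32")
    if version < (3, 11):
        deps.append("tomli")
    return deps
```

A resolver can't know the answer for a given target without executing this, which is why static metadata is so much easier to lock against.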
|
# ¿ Jan 26, 2024 14:14 |
|
There's also Pex: https://pex.readthedocs.io/en/v2.1.163/ It requires that an interpreter matching your constraints be present on the system already, but otherwise, it includes all runtime dependencies and acts as a hermetic environment like a venv does.
|
# ¿ Feb 14, 2024 22:08 |
|
I don't think I'd call myself an "expert" so maybe someone will weigh in with a better solution, but using ffill seems like it would work: code:
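For anyone unfamiliar with forward filling: each missing value is replaced by the most recent non-missing value before it. This is a plain-Python sketch of that idea, not pandas itself and not the snippet for the original question; the `forward_fill` helper is made up, and it uses `None` as the missing marker where pandas would use NaN.

```python
def forward_fill(values, missing=None):
    """Replace each missing entry with the last non-missing one seen."""
    filled = []
    last = missing
    for value in values:
        if value is not missing:
            last = value
        filled.append(last)
    return filled


print(forward_fill([None, 1, None, None, 4, None]))
# [None, 1, 1, 1, 4, 4]
```

Leading gaps stay missing because there is nothing earlier to carry forward.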
|
# ¿ Feb 21, 2024 03:24 |
|
There's a burgeoning data engineering thread here that might get you a better answer, but I think the crossover of posters is high anyways: https://forums.somethingawful.com/showthread.php?threadid=4050611
|
# ¿ Feb 21, 2024 21:22 |