spiritual bypass
Feb 19, 2008

Grimey Drawer
A correlated subquery in BigQuery would drain your bank account instantly


Data Graham
Dec 28, 2009

📈📊🍪😋



PyCharm on my Mac has been randomly flickering something awful for the last few months, and recently got a whole lot worse.

I dug around and learned that there was an update that fixes it. Hooray!

So I'm just posting to let you all know that in case anyone's having that problem, update PyCharm.

Macichne Leainig
Jul 26, 2012

by VG
I use JetBrains Toolbox and I have to admit, it doesn't feel like it does a great job of notifying me about pending updates.

duck monster
Dec 15, 2004

Bemused Observer posted:

Oh yes, I've only recently started doing stuff with PostgreSQL; previously all my SQLing was done in BigQuery (where you have basically unlimited processing power), but I immediately realized that the query planner is cool and real and my friend (and also that I need to catch up on how query optimization actually works, at least on some level)

EXPLAIN <sql query> is your friend. You can kind of just grok how it's doing its thing from there.
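
You can even do it from Python if that's where you live; a quick sketch (psycopg2 and the orders table are just placeholders here):

Python code:
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
with conn.cursor() as cur:
    # EXPLAIN ANALYZE actually runs the query and reports real timings;
    # plain EXPLAIN only shows the planner's estimates.
    cur.execute("EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = %s", (42,))
    for (line,) in cur.fetchall():
        print(line)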

Macichne Leainig
Jul 26, 2012

by VG
https://discuss.python.org/t/a-steering-council-notice-about-pep-703-making-the-global-interpreter-lock-optional-in-cpython/30474

Sounds like the GIL is gonezo long term!

Oysters Autobio
Mar 13, 2017
I'm helping a data analysis team adopt python and jupyter notebooks for their work rather than Excel or Tableau or whatever.

The DAs aren't writing python for it to be deployed anywhere (i.e. not for ETL or ongoing pipelines); they're mainly using libraries like pandas for data cleaning and analysis. Their end products are reports, or identifying ETL needs they can prototype and then hand off to a data engineer to productionize into a datamart or whatever.

I want to introduce some standards for readability and maybe even convince them to use our in-house gitlab for version control.

But all the resources I've seen on these topics are understandably related to SWE best practices for maintainable codebases.

I don't want to go in heavy-handed by imposing all of these barriers, since the goal isn't to impose python or jupyter on them but rather to facilitate their introduction and the benefits that come from adopting the tools. That's the number one priority here, since everything is already set up for them with a jupyterhub instance (though this could use some love in terms of default configs).

So, does anyone know of any decent data science or data analysis oriented python code style books? Something aimed at people just using python as scripts, that still nudges them toward descriptive classes / variables / functions and other readability standards, but strips away the SWE stuff that's less important for people just writing jupyter notebooks.

Similarly any suggestions for tools or packages I should consider would be great.

Secondly, is it a bad idea to abstract different python packages and capabilities behind a monolithic custom library, as a set of utilities for them?

Like I posted earlier about using requests, would it be bad practice to create a python package like "utilitytoolkit" and then, when they ask for new use cases they can't quite figure out or think might be nice to simplify because they do it a lot, just publish new modules and add on to it?

Like say there's a really common API they use: I could wrap the requests GET call and the JSON-payload-to-pandas-dataframe transformation in a module, so all they need to do is call "utility.toolkit()"? I don't want to monkey around with APIs or microservices when everyone is just using python anyways.

Oysters Autobio fucked around with this message at 12:35 on Aug 1, 2023

The March Hare
Oct 15, 2006

Je rêve d'un
Wayne's World 3
Buglord
I would simply use black to autoformat for them, though I do not know how jupyter works so it may not be possible.

CompeAnansi
Feb 1, 2011

I respectfully decline
the invitation to join
your hallucination

Oysters Autobio posted:

I'm helping a data analysis team adopt python and jupyter notebooks for their work rather than Excel or Tableau or whatever.

What's wrong with Tableau for data analysts? I understand maybe wanting to be able to explore the data in python if you want, but visualizations generally look better in Tableau and they're more interactive. Seems like a confusing move unless leadership is trying to move them more into doing data science type work. If they're just using python for cleaning data, shouldn't that be the job of the data engineers?

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

The March Hare posted:

I would simply use black to autoformat for them, though I do not know how jupyter works so it may not be possible.

https://black.readthedocs.io/en/stable/getting_started.html#installation looks like it works just fine. I think the file format is just a big json document anyway.


Oysters Autobio posted:

I'm helping a data analysis team adopt python and jupyter notebooks for their work rather than Excel or Tableau or whatever.

The DAs aren't writing python for it to be deployed anywhere (i.e. not for ETL or ongoing pipelines); they're mainly using libraries like pandas for data cleaning and analysis. Their end products are reports, or identifying ETL needs they can prototype and then hand off to a data engineer to productionize into a datamart or whatever.

I want to introduce some standards for readability and maybe even convince them to use our in-house gitlab for version control.

But all the resources I've seen on these topics are understandably related to SWE best practices for maintainable codebases.

I don't want to go in heavy-handed by imposing all of these barriers, since the goal isn't to impose python or jupyter on them but rather to facilitate their introduction and the benefits that come from adopting the tools. That's the number one priority here, since everything is already set up for them with a jupyterhub instance (though this could use some love in terms of default configs).

So, does anyone know of any decent data science or data analysis oriented python code style books? Something aimed at people just using python as scripts, that still nudges them toward descriptive classes / variables / functions and other readability standards, but strips away the SWE stuff that's less important for people just writing jupyter notebooks.

Similarly any suggestions for tools or packages I should consider would be great.

Some notes from a guy who kind of walked the path you're trying to push people towards, and has pushed some newbs up the same hill:

Most good rules of computer science have reasonable explanations for why things are the way they are, so the best way I've found to sell them is to point at a recent example of where not following them hurt:

Why use source control? Remember that time you modified your report because boss says they wanted it different, and it broke and it took you forever to go back to the working version? Source control!
Why use dataclasses? Remember how much of a pain it is to constantly have to repull the column titles to remember what the keys are in this massive dictionary you imported? Dataclasses! (quick sketch after this list)
Why bother with readability? Remember when so and so was out sick and you had to pick up their work and you couldn't figure out wtf any of it meant and you were stuck working late to meet that deadline? LINTERS!
Why bother with unit testing? (Alright I dunno if this is a winnable fight) Remember when you had to update that report and the only way you could make sure it worked was to go through 2 hours of manual testing each time you made a small change?
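
Here's the quick sketch I mean for the dataclasses bullet (the CSV file and its column names are invented):

Python code:
import csv
from dataclasses import dataclass


@dataclass
class SensorReading:
    site: str
    temperature_c: float


with open("readings.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# dict version: what were the keys again? time to re-open the file...
rows[0]["temperature_c"]

# dataclass version: the fields are written down once and your IDE completes them
readings = [SensorReading(site=r["site"], temperature_c=float(r["temperature_c"])) for r in rows]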

The bigger thing I think is to avoid them solving known programming problems with terrible solutions, as secondary programmers are wont to do since they don't have exposure to good coding practices. However, most people are willing to learn; non-devs just generally need a more real-world, direct example of why something is a good idea, whereas devs have encountered enough of those things to kind of go along with best practices on principle a lot of the time.

quote:

Secondly, is it a bad idea to abstract different python packages and capabilities behind a monolithic custom library, as a set of utilities for them?

Like I posted earlier about using requests, would it be bad practice to create a python package like "utilitytoolkit" and then, when they ask for new use cases they can't quite figure out or think might be nice to simplify because they do it a lot, just publish new modules and add on to it?

Like say there's a really common API they use: I could wrap the requests GET call and the JSON-payload-to-pandas-dataframe transformation in a module, so all they need to do is call "utility.toolkit()"? I don't want to monkey around with APIs or microservices when everyone is just using python anyways.

I know this probably makes real software engineers cringe, but IMO: if you're not distributing it and this doesn't ever go out in a package or anything (and you don't have complex build dependencies/etc), I don't really see a major problem with starting that way. If it's useful, it's probably fine, but I'd still practice general separation of modules/etc within it, so you might have utilitytoolkit, but you should try and organize things somewhat - utilitytoolkit.clients, utilitytoolkit.text, etc.

I would also recommend naming it something a bit better than that; probably $TeamNameLib is more useful.

FWIW I work on a production service and we have a series of libraries that wrap and extend the basic CRUD clients we get for the services we interact with and it works great. We separate them by client, so you have like roboto.s3 and roboto.ddb and they share a namespace but are separate packages.
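
For the requests-to-DataFrame case specifically, the whole wrapper can be tiny. A rough sketch (the module layout, base URL, and endpoint are all made up):

Python code:
# utilitytoolkit/clients.py
from typing import Optional

import pandas as pd
import requests

BASE_URL = "https://internal-api.example.com"  # placeholder


def fetch_report(endpoint: str, params: Optional[dict] = None) -> pd.DataFrame:
    """GET a JSON payload from the team's common API and return it as a DataFrame."""
    response = requests.get(f"{BASE_URL}/{endpoint}", params=params, timeout=30)
    response.raise_for_status()  # fail loudly instead of returning junk
    return pd.DataFrame(response.json())

Then the analysts just call fetch_report("orders", {"since": "2023-01-01"}) and get a DataFrame back.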

QuarkJets
Sep 8, 2008

I've written a lot of data science code and have never regretted using basic software engineering practices as part of that workflow. It doesn't really take any extra work to initialize a git repository. If you need to write code to sanitize and analyze some bespoke data source, odds are you'll have to do it again and again, so you may as well write code that you can reuse and maybe even give to others; it's not like using functions is really any harder than writing everything as one big script

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

QuarkJets posted:

I've written a lot of data science code and have never regretted using basic software engineering practices as part of that workflow. It doesn't really take any extra work to initialize a git repository. If you need to write code to sanitize and analyze some bespoke data source, odds are you'll have to do it again and again, so you may as well write code that you can reuse and maybe even give to others; it's not like using functions is really any harder than writing everything as one big script

I don't think anyone's going to argue with you here; the point I was making is that if you're new to the field a lot of dev stuff seems kind of weird and mysterious because you don't have any background on why, and telling people to do stuff 'because it's not much more work' when they don't understand why they're doing it is basically not going to work.

Edit: also if you have no idea how it works Git is far from simple and you can absolutely gently caress your poo poo up easily.

Falcon2001 fucked around with this message at 18:09 on Aug 2, 2023

Generic Monk
Oct 31, 2011

Falcon2001 posted:


Edit: also if you have no idea how it works Git is far from simple and you can absolutely gently caress your poo poo up easily.

i’ve taught a few novices how to use git/written documentation for them. i think it’s best to give a very high level overview of first principles so they understand the very basics of what it’s doing (the repository is a sum of all your commits basically) and then be very clear what the immediate benefits are (not ever worrying about having to manually back code up or overwriting someone’s changes, everything you delete is easily accessible if you need it again, you can see exactly who made changes to code and when and those will have helpful messages attached etc etc)

don’t dumb it down to the point where it obfuscates the fundamentals but also don’t throw too much jargon in their face at one time; maybe wait until they’re comfortable with cloning and committing before introducing branching and merging for example

in my previous job when i arrived the person evangelising git had got everyone to have their own fork of the repo holding the data transformation scripts, and to do a 'pull' you would do a merge commit from the main repo into your fork. he'd read too far ahead and convinced himself the process needed to be more complex than it actually was. keep it simple and have clear documentation for the commonly followed steps.

most people loving hate using the CLI and prefer a GUI. people really seem to respond well to the git interface in VSCode - i can’t blame them, it doesn’t throw a lot of stuff at you but also doesn’t abstract the process to the point where you kind of lose track of what you’re doing, which i think GitHub desktop for instance does.

most of this falls out of the window when you’re dealing with merge conflicts though which are objectively unintuitive as gently caress. try to deal with these yourself at the start lmao

all that said, trust that most people who are even moderately engaged and computer literate will pick it up pretty quickly, given a GUI and not being overwhelmed with jargon. you will probably already know the people that will be a bit more difficult. one guy at my previous job insisted on submitting his code in the form of an MSSQL .bak file until the day i left. i left him alone since he had staked out his own ‘patch’ anyway

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
I know a moderate amount of info about git but I still use GitHub Desktop as my primary interface. It does obfuscate some of the workings of git, but the things it hides tend to be the parts of git that really suck anyway. I've found it meets the vast majority of needs, and certainly all the day-to-day needs. Even merge conflicts are not so bad, paired with vscode: when GitHub Desktop detects a merge conflict I open the file in vscode, vscode helps me fix it, I save the file, and GitHub Desktop is happy.

Oysters Autobio
Mar 13, 2017

CompeAnansi posted:

What's wrong with Tableau for data analysts? I understand maybe wanting to be able to explore the data in python if you want, but visualizations generally look better in Tableau and they're more interactive.

The problem I find with Tableau is that it's the worst software ever designed in terms of reusability. You can work on one component or dashboard sheet for days, but you can't just swap out the data for anything similar, so nothing is modular or reusable. That lack of modularity makes it only good if you decide to deploy and maintain a standing dashboard that serves a recurring need. Once you sink your time into it, don't ever expect to gain anything back; it's very frustrating to see old dashboards with what look like (from PDF printouts) really awesome visualizations, only to have them throw connection errors the moment you open them. That doesn't make it a good platform for quick visualizations for a one-off report or product.

Aside from those things, I am also just not a fan of the "developer experience", if you will. In addition to not being able to re-use components and hot-swap data, I hate how none of the UI is declarative. Instead of telling the GUI what you want to see, you have to randomly drop in esoteric combinations of dimensions and measures until you get the result you were looking for. Absolutely hate that design.

Finally, it's expensive, so we have limited licenses even for viewers, which means fewer people can use your products.

quote:

If they're just using python for cleaning data, shouldn't that be the job of the data engineers?

Hahahaha, yeah I ask myself that every day. No, our data engineers don't clean data, they just ETL (i guess without the 'T') into the data lake and "there you go, have fun". Most of our engineers are working on greenfield replacement projects and don't want to support our legacy self-rolled on-prem hadoop stuff, nor were any of them around when it was deployed.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Cross-posting from the SQL thread since Python is a bit more appropriate:

Anyone here work with Spark on a routine basis? If so, what's the nature of the work and how do you like it?

I'm studying up for the databricks Spark Developer cert but unfortunately don't have much excuse to use it in my current role.

QuarkJets
Sep 8, 2008

Hughmoris posted:

Cross-posting from the SQL thread since Python is a bit more appropriate:

Anyone here work with Spark on a routine basis? If so, what's the nature of the work and how do you like it?

I'm studying up for the databricks Spark Developer cert but unfortunately don't have much excuse to use it in my current role.

I work with Spark basically every day

Spark is kind of lovely and unintuitive with tons of badly named mystery switches, and having it makes my CVE analyzer light up like a christmas tree, but it's better than what came before. I'd rather use Dask but this is what my group was using before I joined it and things are too far along to switch now

Hughmoris
Apr 21, 2007
Let's go to the abyss!

QuarkJets posted:

I work with Spark basically every day

Spark is kind of lovely and unintuitive with tons of badly named mystery switches, and having it makes my CVE analyzer light up like a christmas tree, but it's better than what came before. I'd rather use Dask but this is what my group was using before I joined it and things are too far along to switch now

Awesome! Are you using homegrown clusters on premise, or something like Databricks/EMR/Synapse etc? Is your work more in the data analytics space, or engineering/ML?

QuarkJets
Sep 8, 2008

Hughmoris posted:

Awesome! Are you using homegrown clusters on premise, or something like Databricks/EMR/Synapse etc? Is your work more in the data analytics space, or engineering/ML?

Homegrown, engineering

Spark does a good job of doing what it says on the tin with resilient datasets; it's just got a lot of warts. Apache's focus seems to be on people doing bespoke data analysis every time rather than production systems

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
Had an opportunity to play around in Flask today to try something, after spending the last year in FastAPI and holy moly are these entirely different ecosystems.

oatmealraisin
Feb 26, 2023

FISHMANPET posted:

Had an opportunity to play around in Flask today to try something, after spending the last year in FastAPI and holy moly are these entirely different ecosystems.

What have you noticed? I moved to fastapi a while ago and honestly was just using Flask for small APIs, and beyond the endpoint niceties I don't really remember what I'm missing

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Here's a potentially weird meta-python question: Is there any easy utility for generating dataclasses / pydantic models /etc (any sort of Python data structure) from an existing document, like a JSON or CSV? Like the actual class definition.

Even if it could just scaffold it out and I had to take over from there, that would be very handy. I do some various scripting work that involves a lot of 'here's data in a structured format, do something with it' and I like working with objects instead of just a big ol' dict. I realize this isn't a massive time savings or anything, but it'd be nice to be like 'alright, take this CSV and give me something I can throw in a .py file so when I import and dump it I get IDE completion/etc'.

To be clear: I'm not talking about dynamically generating the model during runtime, I mean like something that spits out the actual model definition for me.

Edit: https://koxudaxi.github.io/datamodel-code-generator/ This looks like it might be what I want? Going to play around with it.

Falcon2001 fucked around with this message at 17:14 on Aug 4, 2023

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams

oatmealraisin posted:

What have you noticed? I moved to fastapi a while ago and honestly was just using Flask for small APIs, and beyond the endpoint niceties I don't really remember what I'm missing

I'm working on a GUI app with Flask, specifically trying to build some forms, so obviously it's a totally different case than writing an API in FastAPI. But there are so many batteries included in Flask, and batteries you can easily add in. I was fine in FastAPI using Pydantic BaseSettings to import my settings from a .env file, but Flask just does it for me (as long as I install python-dotenv). And I'm using forms so I can just bolt on Flask-WTF. And I thought it was annoying to have to hand-code my forms into the jinja template, but it turns out I can just install bootstrap-flask to do that for me. And etc etc.

I don't really think it's better or worse, it's just... different.

StumblyWumbly
Sep 12, 2007

Batmanticore!
After a lifetime of stashing stuff in dicts that are painful when I look at them 2 weeks later, I'm thinking of just going whole hog with dataclasses. I get into a lot of situations where I have 20 pieces of hardware, each with a sample rate, range, and enable/disable, or a block of data with a start time, end time, format, 10k samples, and a few other features.

Is there any drawback to using dataclasses more? It seems too good to be true.

Anything to consider in going with Pydantic vs dataclass?

nullfunction
Jan 24, 2005

Nap Ghost
Dataclasses are cool and good and if you're accustomed to just throwing everything in a dict it's easy to get started and see immediate benefits.

Pydantic is really helpful if you are getting your data from a less-than-trusted source and need to ensure that everything comes in exactly as you expect by using its validators. Naturally there's a cost to this extra processing, and whether or not this is acceptable will depend on your requirements and goals, but I can say that for all of my use cases the ergonomics of using Pydantic far outweighed the performance cost. I haven't had the opportunity to use the new 2.x series but I understand the core has been rewritten in Rust and is significantly faster than the 1.x series.

If your data is of a predictable format or you need to stick to the stdlib, dataclasses will suffice. If your data is uncertain and you don't mind consuming an extra dependency, it's hard not to recommend Pydantic.
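
To make the difference concrete, a quick sketch (the field names are invented, and note that pydantic coerces by default rather than strictly rejecting):

Python code:
from dataclasses import dataclass

from pydantic import BaseModel


@dataclass
class ChannelConfig:  # stdlib: stores whatever you hand it, no checking
    sample_rate_hz: float
    range_v: float
    enabled: bool


class ChannelConfigModel(BaseModel):  # pydantic: validates/coerces on construction
    sample_rate_hz: float
    range_v: float
    enabled: bool


ChannelConfig(sample_rate_hz="oops", range_v=5.0, enabled=True)       # no complaint
ChannelConfigModel(sample_rate_hz="2000", range_v=5.0, enabled=True)  # coerced to 2000.0
ChannelConfigModel(sample_rate_hz="oops", range_v=5.0, enabled=True)  # raises ValidationError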

QuarkJets
Sep 8, 2008

StumblyWumbly posted:

After a lifetime of stashing stuff in dicts that are painful when I look at them 2 weeks later, I'm thinking of just going whole hog with dataclasses. I get into a lot of situations where I have 20 pieces of hardware, each with a sample rate, range, and enable/disable, or a block of data with a start time, end time, format, 10k samples, and a few other features.

Is there any drawback to using dataclasses more? It seems too good to be true.

Anything to consider in going with Pydantic vs dataclass?

There's basically no downside to using dataclasses instead of dicts. If you need to go back and represent a dataclass as a dict there's a function in the dataclasses module that performs that conversion for you
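
That function is dataclasses.asdict; a tiny illustration (names made up):

Python code:
from dataclasses import asdict, dataclass


@dataclass
class Sample:
    value: float
    label: str


asdict(Sample(1.5, "calibration"))  # {'value': 1.5, 'label': 'calibration'}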

iirc dataclasses were inspired by Pydantic so they share a lot of features. You'd use Pydantic if you want the additional features that come with that package, such as data validation

StumblyWumbly
Sep 12, 2007

Batmanticore!
Exactly what I was hoping for, thanks!

lazerwolf
Dec 22, 2009

Orange and Black

Falcon2001 posted:

Here's a potentially weird meta-python question: Is there any easy utility for generating dataclasses / pydantic models /etc (any sort of Python data structure) from an existing document, like a JSON or CSV? Like the actual class definition.

Even if it could just scaffold it out and I had to take over from there, that would be very handy. I do some various scripting work that involves a lot of 'here's data in a structured format, do something with it' and I like working with objects instead of just a big ol' dict. I realize this isn't a massive time savings or anything, but it'd be nice to be like 'alright, take this CSV and give me something I can throw in a .py file so when I import and dump it I get IDE completion/etc'.

To be clear: I'm not talking about dynamically generating the model during runtime, I mean like something that spits out the actual model definition for me.

Edit: https://koxudaxi.github.io/datamodel-code-generator/ This looks like it might be what I want? Going to play around with it.

Have you tried just asking ChatGPT to generate the boilerplate code for you? Seems like something it should be able to do for you pretty easily.

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

QuarkJets posted:

There's basically no downside to using dataclasses instead of dicts. If you need to go back and represent a dataclass as a dict there's a function in the dataclasses module that performs that conversion for you

iirc dataclasses were inspired by Pydantic so they share a lot of features. You'd use Pydantic if you want the additional features that come with that package, such as data validation

I think the stdlib dataclasses originated from https://www.attrs.org/en/stable/ but I wouldn't be surprised if both were influences.

Other than that minor pedantry I agree with this statement 100%.

IMO: Dictionaries are best when they're dynamically generated maps of data or where you're using the key for accurate quick retrieval of something, not stand-ins for data structures. Something like:

Python code:
account_vals = {
	"account_id_1": "value",
	"account_id_2": "value",
}
Whereas this example should probably be a dataclass:
Python code:
account_data = {
	"account_name": "butts",
	"account_butt_size": "XXL",
}
Honestly sometimes I'll throw a Dict[str, Dataclass] together if I have a list of dataclasses that I want to reference directly by an ID, something like:
Python code:
acct_data_map = {
	"id_001": AccountDataMap(id="001", name="butts", account_butt_size="XXL"),
	"id_002": AccountDataMap(id="002", name="butts", account_butt_size="S"),
}
Mostly because when I can trust the data, it's nicer and technically faster (CPU speed tradeoff for memory) to do acct_data_map[id] instead of next(x for x in list_of_dataclasses if x.id == id).

lazerwolf posted:

Have you tried just asking ChatGPT to generate the boilerplate code for you? Seems like something it should be able to do for you pretty easily.

ehhhhhh it's work stuff and we're not supposed to put any data anywhere near ChatGPT. If it was for home use I bet it'd be great.

Edit: I forgot about the one thing that drives me a little crazy with dicts vs dataclasses, mostly in the json space:

I *really* wish the stdlib dataclass could serialize datetimes, because it's a super common thing and there's a very handy and consistent option for it with datetime.isoformat() and datetime.fromisoformat()

It's basically the only common thing I find myself having to sigh and write a factory for when I need to serialize/deserialize data, because the other stuff I run into is complex objects.
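
To be fair the factory is small; a minimal sketch of what I mean (the class and field names are invented):

Python code:
import json
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass
class Event:
    name: str
    created_at: datetime


def to_json(obj) -> str:
    # hand datetimes to isoformat(); anything else unexpected will fail loudly
    return json.dumps(asdict(obj), default=lambda o: o.isoformat())


to_json(Event("deploy", datetime(2023, 8, 5, 1, 49)))
# '{"name": "deploy", "created_at": "2023-08-05T01:49:00"}'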

Falcon2001 fucked around with this message at 01:49 on Aug 5, 2023

necrotic
Aug 2, 2005
I owe my brother big time for this!
Parsing dates deviates from the standard (and from what basically every other parser does by default). It also adds complexity, since the parser would have to look at every string field to see if it fits the pattern, and then parse it.

It's not that much work to define a function that loads the json and converts whatever fields you need to their specific formats (be it datetime, or even something more complicated specific to your needs).

Precambrian Video Games
Aug 19, 2002



Pydantic dataclasses are almost entirely compatible* drop-in replacements for stdlib dataclasses so there's not much reason to stick to stdlib unless you can't use pydantic at all for whatever reason.

*I think there are some subtle differences with how the decorator parameters like kw_only work, but maybe 3.11 fixed that?

Somewhat related, I keep getting frustrated that nobody seems to want to add an ordered set to stdlib. That should have been done when dicts became ordered by default. I can understand nobody wanting to take the time to actually implement it efficiently but the workaround to use a dict with None values is gross. There is an OrderedSet package, not sure how well it works but again it should at least be in collections or something.
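
For reference, the dict workaround in question looks something like this (it does keep first-seen order and drop duplicates, it's just ugly):

Python code:
items = ["b", "a", "b", "c", "a"]
ordered_unique = list(dict.fromkeys(items))  # ['b', 'a', 'c'], via a dict with None values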

By a similar token, working with dicts in pybind11 is a mess because the C++ STL doesn't have an insertion-ordered dict (std::map is sorted by key, not by insertion order). So passing Python dicts to C++ is hazardous and I keep forgetting this.

QuarkJets
Sep 8, 2008

Dataclasses don't parse anything at all, they do a good job of just holding data. I'd be pretty mad if I defined a dataclass that silently parsed a string into a datetime

Just perform that conversion as part of a class method that returns a new instance of the dataclass - that's what, 2 lines of code including the `def` statement?
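
Roughly like this, give or take the field names (which are invented):

Python code:
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    name: str
    created_at: datetime

    @classmethod
    def from_dict(cls, raw: dict) -> "Event":
        # conversion happens here, not silently inside the dataclass
        return cls(name=raw["name"],
                   created_at=datetime.fromisoformat(raw["created_at"]))


Event.from_dict({"name": "deploy", "created_at": "2023-08-05T01:49:00"})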

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Those are good points but I reserve my right to be opinionated and wrong. :smugbert:

At least it would be nice to have some built ins in the stdlib

Falcon2001 fucked around with this message at 04:35 on Aug 5, 2023

Hed
Mar 31, 2004

Fun Shoe

Falcon2001 posted:

Here's a potentially weird meta-python question: Is there any easy utility for generating dataclasses / pydantic models /etc (any sort of Python data structure) from an existing document, like a JSON or CSV? Like the actual class definition.

Even if it could just scaffold it out and I had to take over from there, that would be very handy. I do some various scripting work that involves a lot of 'here's data in a structured format, do something with it' and I like working with objects instead of just a big ol' dict. I realize this isn't a massive time savings or anything, but it'd be nice to be like 'alright, take this CSV and give me something I can throw in a .py file so when I import and dump it I get IDE completion/etc'.

To be clear: I'm not talking about dynamically generating the model during runtime, I mean like something that spits out the actual model definition for me.

Edit: https://koxudaxi.github.io/datamodel-code-generator/ This looks like it might be what I want? Going to play around with it.

I would love something like this that introspects and gives you a probable boilerplate typed model. If you find out something let us know!

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

Hed posted:

I would love something like this that introspects and gives you a probable boilerplate typed model. If you find out something let us know!

So I've only tested it with CSVs, which like...it works with strings! But in general this seems to do some of that? https://koxudaxi.github.io/datamodel-code-generator/ It worked great for CSVs, although I did open a discussion item to see if there's a way to get a mapping of the CSV column name to the dataclass field names so I can easily slam jam that CSV into the dataclass it makes.

I want to see how well it handles nested stuff or complex objects because that would be a lifesaver.

joebuddah
Jan 30, 2005
I've been tasked with getting input from
A. Devices that output data in US format #,###.###
B. Hand-keyed entries in either the US format or the EU format of #.####,###

Then formatting all of it to EU format.
I've got the number format part figured out.

Is there a better way to verify the hand-keyed EU-format values than using regex for each character index location?

LightRailTycoon
Mar 24, 2017

joebuddah posted:

I've been tasked with getting input from
A. Devices that output data in US format #,###.###
B. Hand-keyed entries in either the US format or the EU format of #.####,###

Then formatting all of it to EU format.
I've got the number format part figured out.

Is there a better way to verify the hand-keyed EU-format values than using regex for each character index location?

Is there always a decimal? Are there always three digits after it? If so, ignore the formatting.

nullfunction
Jan 24, 2005

Nap Ghost
Hopefully you've found the locale module in the stdlib to do your output formatting and just missed the fact that locale.atof() exists. The name isn't intuitive, but it parses strings according to your locale settings, which you can change using setlocale().

https://docs.python.org/3/library/locale.html
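
A quick sketch; the locale names are platform-dependent ("de_DE.UTF-8" style on Linux/macOS, different spellings on Windows) and the locale has to actually be installed on the box:

Python code:
import locale

locale.setlocale(locale.LC_NUMERIC, "de_DE.UTF-8")
locale.atof("1.234,567")  # 1234.567, parsed with EU separators

locale.setlocale(locale.LC_NUMERIC, "en_US.UTF-8")
locale.atof("1,234.567")  # 1234.567, parsed with US separators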

Son of Thunderbeast
Sep 21, 2002

joebuddah posted:

I've been tasked with getting input from
A. Devices that output data in US format #,###.###
B. Hand-keyed entries in either the US format or the EU format of #.####,###

Then formatting all of it to EU format.
I've got the number format part figured out.

Is there a better way to verify the hand-keyed EU-format values than using regex for each character index location?

Here you go

[0-9][,/.][0-9]{3}[,/.][0-9]{3}

:ocelot:

Cyril Sneer
Aug 8, 2004

Life would be simple in the forest except for Cyril Sneer. And his life would be simple except for The Raccoons.
Looking for advice on the following example code I've been playing around with:

code:
class aName:
    vals = []
    @classmethod
    def do_something (cls, x,y):
        output = x + y
        cls.vals.append( output )
        print(f'appended {output} in {cls}')
        
    @classmethod
    def get_vals(cls):
        return cls.vals
    
class Bob (aName):
    vals = []
    
    
class Sam (aName):
    vals = []
    @classmethod
    def do_something (cls, x,y):
        output = x * y
        cls.vals.append( output )
        print(f'appended {output} in {cls}')
    
    

Bob.do_something (1,3)
Bob.do_something (2,3)
Bob.do_something (3,3)

Sam.do_something(1,3)
Sam.do_something(2,3)
Sam.do_something(3,3)

Bob.get_vals() # returns [4, 5, 6]
Sam.get_vals() # returns [3, 6, 9]
It's easiest to understand if you start from the bottom, where you'll see my desired calling pattern. I want to create different "static" classes that can accumulate different results depending on which one is called. You'll see the Bob class does not override anything and so preserves the addition operation. In my Sam class, I've overridden the do_something function to perform multiplication instead of addition.

The above code does actually work the way I want it to; I just don't know if it's the best way to do it (or whether it might in fact be considered a bad way!).

My main gripe is the need to define that Bob class, which doesn't override any behaviour, and only serves to re-scope the class variable.


Precambrian Video Games
Aug 19, 2002



Cyril Sneer posted:

My main gripe is the need to define that Bob class, which doesn't override any behaviour, and only serves to re-scope the class variable.

You can make do_something an abstractmethod and move the implementation to Bob.

Unless you really desperately need these things to be quasi-singletons, I would not make do_something a classmethod. For one, it makes it difficult to have more than one around, and I don't see why users should be forbidden from having more. If you want a default set of Bob/Sam/whatever instances you can define them in a module.
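
A rough sketch of that instance-based version (the Accumulator base class name is just made up):

Python code:
from abc import ABC, abstractmethod


class Accumulator(ABC):
    def __init__(self):
        self.vals = []  # each instance gets its own accumulator

    @abstractmethod
    def combine(self, x, y):
        ...

    def do_something(self, x, y):
        output = self.combine(x, y)
        self.vals.append(output)
        print(f"appended {output} in {type(self).__name__}")

    def get_vals(self):
        return self.vals


class Bob(Accumulator):
    def combine(self, x, y):
        return x + y


class Sam(Accumulator):
    def combine(self, x, y):
        return x * y


bob, sam = Bob(), Sam()
for x in (1, 2, 3):
    bob.do_something(x, 3)
    sam.do_something(x, 3)

bob.get_vals()  # [4, 5, 6]
sam.get_vals()  # [3, 6, 9]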
