Oysters Autobio
Mar 13, 2017
Sometimes I want to go back to R and its tidyverse but everything at work is python so I know I don't have the same luxury.

Two questions here. First, does anyone have good resources for applying TDD to data analysis or ETL workflows? Specifically something that is dataframe-friendly. I love the idea of TDD, but every tutorial I've found on testing is focused on objects and other applications, so I'd rather not reinvent my own data-testing process if there's a really great and easy way to do it while staying in the dataframe abstraction / pandas-world.

Second question: is there a pandas dataframe-friendly wrapper someone has made for interacting with APIs, like the requests library but abstracted with some helper functions for iterating through columns? Or are people just using pd.apply() and pd.map() on a dataframe column they've written into a payload?

Still rather new with pandas, and I can see little footguns like accidentally using a method that processes a column without maintaining order or the key:value relationship with a unique identifier.

If there were a declarative, dataframe-oriented package that just let me do something like:

Python code:

url = "https://api.foo.baz.com"

processed_df = dataframerequests(df['column2'], append=True)

processed_df.head()

Where 'column2' is a column from a pandas dataframe that I want to POST to an API for processing (translation or whatever), and the append boolean tells it to append the results as a new column rather than replacing the original.

With requests I always feel uneasy or unsure of how I'm POSTing a dataframe column as a dict, then adding that results dict back as a new column.

Totally get why, from a dev perspective, all this 'everything as a REST API' microservices stuff makes sense, but it's been difficult for us dumbass data analysts who mainly work with SQL calls to a db to adapt.

cue Hank Hill meme:

"Do I look like I know what the hell a JSON is? I just want a table of a goddamn hotdog"

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

Falcon2001 posted:

Now that it's not as late and I'm not as angry, I think you're right. I got caught up in the whole 'won't work for embedded C libraries' and kind of forgot none of my proposed solutions would do that either so it wasn't really worth worrying about.

Well, this got interesting once I started actually implementing this.

I'm using this context manager class example (https://stackoverflow.com/a/24583265), and immediately realized I'm going to get into some wonky stuff. The basic problem is that sys.stdout and sys.stdin both implement a lot more methods than just read and write, especially for stuff like progress bars etc., which involve modifying the screen buffer directly, and the way that class is implemented could cause some weird errors if those methods are called.
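
For reference, the rough shape of that tee class (a minimal sketch of the SO approach; the class name and log handling here are mine, not the SO answer's):

Python code:
import sys

class TeeStdout:
    """Forward writes to the real stream and to a log file handle."""

    def __init__(self, stream, logfile):
        self._stream = stream
        self._log = logfile

    def write(self, data):
        self._stream.write(data)
        self._log.write(data)

    def flush(self):
        self._stream.flush()
        self._log.flush()

    def __getattr__(self, name):
        # everything else (isatty, fileno, buffer, ...) falls through to the
        # real stream - which is exactly where the wonky stuff can start
        return getattr(self._stream, name)

    def __enter__(self):
        sys.stdout = self
        return self

    def __exit__(self, *exc):
        sys.stdout = self._stream
        return False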

I could possibly do something like that, where I basically make my own copy of sys.stdout's code and tack my logger forking onto it, which is one of the solutions on SO, but doing it for all three turns into a lot of code, and on top of all that, that code has its own problems. For example, how do you interpret something like a buffer read (such as when writing a growing progress bar to stdout), or implement other TTY features? Do you just straight modify old code? What if someone calls something like a clear command? Dump the entire data array?

Those are all fully answerable questions, but it starts turning into a lot of work for edge cases, so I think I'll probably just double check with my team and stick with the original drop-in wrapper methods approach. We don't use any other libraries that do things like print stuff directly to the screen, and we're unlikely to with this project, so reworking the requirements to 'create functions that can capture IO' and recommend them for use seems like a reasonable path forward. We won't capture everything, but we'll capture enough to record what was shown to the customer and what they entered, as well as the parameters to the program, all of which should provide audit logging for review and investigation if someone runs into a problem.

Edit: I did implement this to test it out of curiosity, and it does technically work with Click, but it seems to break most of the nicer functionality in the library: it detects it's not dealing with a tty, so progress bars just disappear, and colors etc. also disappear (which again makes sense, since I'm just stripping data out). Obviously none of that is needed in the logs, but yeah, it would degrade our program in a number of small ways, just to avoid making sure people use a wrapper utility function.

necrotic posted:

For terminals those methods would write out control characters to achieve that behavior. Python's stdin itself isn't managing those specifics; the output terminal does, via specific characters that control the terminal.

I would expect those to all go through the write method, though I'm phone posting so can't confirm what stdin is an instance of. My expectation is that a write method is the core of what's needed.

And if that's the case, your tee class may be able to inherit from whatever base class stdin is an instance of, and just overriding the write method would capture all written text.

Then you have control characters to deal with, but any other interception method would have to as well.

Ah, I had completely forgotten about control characters (I don't do deep dives into CLI stuff very often). Looking into Click's code, it seems to do a lot of internal logic to work with various terminals, and so it seems to be catching one of those things to degrade gracefully (which is nicer than crashing!)

It seems I was overestimating, in my initial assessment, how simple it would be to 'just grab stdout'; control characters and everything else are things I'd have to parse in some way. But for output, all I care about is the original message we told the library to show to the user; I don't necessarily need to capture the exact bytes that were sent to stdout.

Another example is that click.confirm asks a user to confirm something. We care about the message the user is given (which we pass in as a string, which is easily forked), but Click coerces any number of responses into a boolean; I don't really care whether they answered 'Y' or 'y' or just hit enter or answered 'sure', so we just record that the user accepted the prompt by interpreting the boolean response.

Falcon2001 fucked around with this message at 02:06 on Jun 5, 2023

necrotic
Aug 2, 2005
I owe my brother big time for this!
For terminals those methods would write out control characters to achieve that behavior. Python's stdin itself isn't managing those specifics; the output terminal does, via specific characters that control the terminal.

I would expect those to all go through the write method, though I'm phone posting so can't confirm what stdin is an instance of. My expectation is that a write method is the core of what's needed.

And if that's the case, your tee class may be able to inherit from whatever base class stdin is an instance of, and just overriding the write method would capture all written text.

Then you have control characters to deal with, but any other interception method would have to as well.

DoctorTristan
Mar 11, 2006

I would look up into your lifeless eyes and wave, like this. Can you and your associates arrange that for me, Mr. Morden?

Oysters Autobio posted:

Sometimes I want to go back to R and its tidyverse but everything at work is python so I know I don't have the same luxury.

Two questions here. First, does anyone have good resources for applying TDD to data analysis or ETL workflows? Specifically something that is dataframe-friendly. I love the idea of TDD, but every tutorial I've found on testing is focused on objects and other applications, so I'd rather not reinvent my own data-testing process if there's a really great and easy way to do it while staying in the dataframe abstraction / pandas-world.

Second question: is there a pandas dataframe-friendly wrapper someone has made for interacting with APIs, like the requests library but abstracted with some helper functions for iterating through columns? Or are people just using pd.apply() and pd.map() on a dataframe column they've written into a payload?

Still rather new with pandas, and I can see little footguns like accidentally using a method that processes a column without maintaining order or the key:value relationship with a unique identifier.

If there were a declarative, dataframe-oriented package that just let me do something like:

Python code:
url = "https://api.foo.baz.com"

processed_df = dataframerequests(df['column2'], append=True)

processed_df.head()
Where 'column2' is a column from a pandas dataframe that I want to POST to an API for processing (translation or whatever), and the append boolean tells it to append the results as a new column rather than replacing the original.

With requests I always feel uneasy or unsure of how I'm POSTing a dataframe column as a dict, then adding that results dict back as a new column.

Totally get why, from a dev perspective, all this 'everything as a REST API' microservices stuff makes sense, but it's been difficult for us dumbass data analysts who mainly work with SQL calls to a db to adapt.

cue Hank Hill meme:

"Do I look like I know what the hell a JSON is? I just want a table of a goddamn hotdog"

I'm having a hard time parsing what you're actually trying to do here because this post is jumping around all over the place.

Test-driven development is about defining the desired behaviour of your code by creating tests before you begin writing the code. Determining exactly what should constitute a 'testable unit' is a skill in itself, but as a simple example if part of the 'T' in your ETL involves ingesting a pandas dataframe with a string column and converting that column to floats, you might create a test that passes in mock input data to (part of) your transformation code and succeeds if the column in the output has the correct type. Similarly you might want to verify how your code handles invalid inputs, so you might create further tests that pass in mock data that contains errors and succeed iff the error is handled correctly.

Applying this paradigm to ETL is in principle no different - you define the desired behaviour through creating tests for that behaviour. One complication though is that it can be (very) difficult to define useful testable units where you are dependent on external systems - the E and the L in ETL - and I for one usually don't bother.
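
A concrete sketch of that first example in pytest (parse_prices is a hypothetical transform; only the test structure matters here):

Python code:
import pandas as pd
import pytest

def parse_prices(df: pd.DataFrame) -> pd.DataFrame:
    # hypothetical 'T' step: parse a string column into floats
    out = df.copy()
    out["price"] = out["price"].astype(float)
    return out

def test_price_column_is_float():
    df = pd.DataFrame({"price": ["1.5", "2.0"]})
    assert parse_prices(df)["price"].dtype == float

def test_invalid_price_raises():
    df = pd.DataFrame({"price": ["1.5", "not a number"]})
    with pytest.raises(ValueError):
        parse_prices(df)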

AFAIK there is no generic package that saves you the work of doing

Python code:
import requests
import pandas as pd

url = "https://fart.butt.com"
response = requests.get(url)
df = pd.read_json(response.text)
It would be very difficult to create such an interface - the JSON structure of an HTTP response is very flexible and will not generally be convertible to a dataframe. You need to know the specific data structure returned by the API in order to parse it correctly.

Oysters Autobio posted:

cue Hank Hill meme:

"Do I look like I know what the hell a JSON is? I just want a table of a goddamn hotdog"

If you want to work with web APIs you need to know what JSON is.

dorkanoid
Dec 21, 2004

Most of Pandas' read_X() functions take URLs without complaining, by the way - and even gzipped files. However, for some structures of JSON/XML it can be hard to wrangle the reader to get the object you want.
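
For example (URL made up):

Python code:
import pandas as pd

# read_csv fetches the URL and transparently decompresses the .gz
df = pd.read_csv("https://example.com/data.csv.gz")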

Big Dick Cheney
Mar 30, 2007
json_normalize() works...sometimes
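
When it does work, it looks like this (toy records):

Python code:
import pandas as pd

records = [
    {"id": 1, "user": {"name": "Ann", "lang": "en"}},
    {"id": 2, "user": {"name": "Bo", "lang": "fr"}},
]
# flattens the nested dicts into dotted column names: id, user.name, user.lang
df = pd.json_normalize(records)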

Oysters Autobio
Mar 13, 2017

DoctorTristan posted:

I'm having a hard time parsing what you're actually trying to do here because this post is jumping around all over the place.

I'm sorry, you're completely right. Thanks for being so patient and kind with your response. Bit flustered, as you could tell.

quote:


Test-driven development is about defining the desired behaviour of your code by creating tests before you begin writing the code. Determining exactly what should constitute a 'testable unit' is a skill in itself, but as a simple example if part of the 'T' in your ETL involves ingesting a pandas dataframe with a string column and converting that column to floats, you might create a test that passes in mock input data to (part of) your transformation code and succeeds if the column in the output has the correct type. Similarly you might want to verify how your code handles invalid inputs, so you might create further tests that pass in mock data that contains errors and succeed iff the error is handled correctly.

Applying this paradigm to ETL is in principle no different - you define the desired behaviour through creating tests for that behaviour. One complication though is that it can be (very) difficult to define useful testable units where you are dependent on external systems - the E and the L in ETL - and I for one usually don't bother.


Thanks for the detail on this - your example is spot on for what I'm looking at. My biggest confusion has been how to define testable units when handling dataframes, but I didn't realize it could be as simple as what you said about testing a column. I just need to find some examples or tutorials of what this looks like in pandas or pyspark.

quote:

AFAIK there is no generic package that saves you the work of doing

Python code:
import requests
import pandas as pd

url = "https://fart.butt.com"
response = requests.get(url)
df = pd.read_json(response.text)
It would be very difficult to create such an interface - the JSON structure of an HTTP response is very flexible and will not generally be convertible to a dataframe. You need to know the specific data structure returned by the API in order to parse it correctly.

Sorry, let me specify what I mean here a bit better. Consider in my case that dataframes (and their rows and columns) are the only data structures that would ever be used with this web API.

I'd like to make or use an interface for a user who:

- has a dataframe
- selects a column
- column is iterated over with each row being POSTed to a web API
- results are appended as a new column in the same order so they match

Consider a scenario like a translation API, where I want to translate each row of data in a single column and have the translation appended as a new column with the rows matching. Because translations from ML-based models are influenced by all of the tokens passed through them, I can't send the entire column as a single JSON array; I need to send and return the data row by row.

quote:

If you want to work with web APIs you need to know what JSON is.

I don't *want* to work with web APIs, I'm *forced* to work with web APIs because most of the devs supporting our data analyst teams have never supported a data warehouse and only understand making web apps and microservices for everything.

I wouldn't be doing this myself in the first place if we had someone responsible for ETL and for making the data available in an analytically useful format alongside the other data I need to analyze it with.

I understand JSON a little myself, but I have other analyst colleagues who, like me, only know SQL and a little bit of python, so I figured hey, if I have to figure out how to do this, why not spend the effort making that knowledge a little more reproducible so that in the future it's super easy to do each time with a new dataframe.

Oysters Autobio fucked around with this message at 16:05 on Jun 10, 2023

QuarkJets
Sep 8, 2008

Oysters Autobio posted:

Sorry, let me specify what I mean here a bit better. Consider in my case that dataframes (and their rows and columns) are the only data structures that would ever be used with this web API.

I'd like to make or use an interface for a user who:

- has a dataframe
- selects a column
- column is iterated over with each row being POSTed to a web API
- results are appended as a new column in the same order so they match

Consider a scenario like a translation API, where I want to translate each row of data in a single column and have the translation appended as a new column with the rows matching. Because translations from ML-based models are influenced by all of the tokens passed through them, I can't send the entire column as a single JSON array; I need to send and return the data row by row.

I don't *want* to work with web APIs, I'm *forced* to work with web APIs because most of the devs supporting our data analyst teams have never supported a data warehouse and only understand making web apps and microservices for everything.

I wouldn't be doing this myself in the first place if we had someone responsible for ETL and for making the data available in an analytically useful format alongside the other data I need to analyze it with.

I understand JSON a little myself, but I have other analyst colleagues who, like me, only know SQL and a little bit of python, so I figured hey, if I have to figure out how to do this, why not spend the effort making that knowledge a little more reproducible so that in the future it's super easy to do each time with a new dataframe.

So you need a function that takes a row of data and posts it to a web API? Sounds like you've basically figured out what you need to do for your TDD implementation. I think what you're saying is:

1. Define a pandas dataframe
2. Iterate over rows
3. Translate each row into JSON. Basically, this is a dict that's {column1: value1, column2: value2, ...}?
4. Post that JSON to a web API

If you use dependency injection, then you can replace the real web API with a test fixture that verifies that the JSON messages match what's in the dataframe.

I'm not a pandas whiz so I don't know if there's already some library out there that makes this easier. Here's how I'd do it:

Python code:
import json

for row in df.itertuples():
    # each row is a namedtuple; _asdict() turns it into {column: value, ...}
    message = json.dumps(row._asdict())
    # post the message someplace, e.g. with requests.post
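
And a minimal sketch of that dependency-injection idea (post_rows and the list-based fake API are made-up names):

Python code:
import json

import pandas as pd

def post_rows(df, post):
    # `post` is injected: something requests.post-shaped in production, a fake in tests
    for row in df.itertuples(index=False):
        post(json.dumps(row._asdict()))

def test_posts_match_dataframe():
    df = pd.DataFrame({"text": ["hello", "world"]})
    sent = []
    post_rows(df, sent.append)  # the "web API" here is just a list
    assert [json.loads(m)["text"] for m in sent] == ["hello", "world"]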

a dingus
Mar 22, 2008

Rhetorical questions only
Fun Shoe
What do you do if a post fails? Can you retry without altering the state of the model? Will you have to start all over again? Sucks that you have to send it row by row. Our QA team did something similar to test a chat bot we were developing.

DoctorTristan
Mar 11, 2006

I would look up into your lifeless eyes and wave, like this. Can you and your associates arrange that for me, Mr. Morden?

I can't really add much to what the two posters above me have said - using a loop to iterate over the rows is how I'd do it as well. The only thing I'd add is that if you want to add the results back into the dataframe as a new column, don't do that during the loop itself - store them in a list, then add that as a new column once you've processed every row.
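
Something like this, that is (call_api and the column names are placeholders):

Python code:
results = []
for row in df.itertuples(index=False):
    results.append(call_api(row.column2))  # stand-in for the real POST

df["translated"] = results  # one assignment after the loop, order preserved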

Macichne Leainig
Jul 26, 2012

by VG
You can also define the new column on the dataframe ahead of time and then just do row.field = value, but now we're getting down into "how do you prefer to write your pandas".

However, I've always been doing df.iterrows for iteration, so it's nice to know about some other options.
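
A sketch of that in-loop variant with iterrows (call_api and the column names are placeholders again); one caveat: the assignment has to go through the dataframe itself, since iterrows hands you copies:

Python code:
df["translated"] = None  # define the new column ahead of time
for idx, row in df.iterrows():
    # row.field = value would be silently lost on the copy, so write via df.at
    df.at[idx, "translated"] = call_api(row["column2"])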

Macichne Leainig fucked around with this message at 16:31 on Jun 13, 2023

Oysters Autobio
Mar 13, 2017

Macichne Leainig posted:

You can also define the new column on the dataframe ahead of time and then just do like row.field = value as well, but now we're getting down into "how do you prefer to write your pandas"

Absolutely nothing wrong with sharing (at least to me that is, lol) advice on "how do you prefer to write your pandas". It's much appreciated.

The one part I'm not sure of is how to ensure or test that the new column I've created is "properly lined up" with the original input rows. Can I somehow assign some kind of primary key to these rows, so that when my new column is added I can quickly see "ok awesome, input row 43 from the original column lines up with output row 43"?

Does that make sense at all? Or am I being a bit overly concerned here? I'm still fairly new to Python coming from mainly a SQL background, so the thought of generating a list based off inputs from a column and then attaching the new outputs column and knowing 100% that they match is for some reason a big concern of mine.

Oysters Autobio fucked around with this message at 03:06 on Jun 15, 2023

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

Oysters Autobio posted:

Absolutely nothing wrong with sharing (at least to me that is, lol) advice on "how do you prefer to write your pandas". It's much appreciated.

The one part I'm not sure of is how to ensure or test that the new column I've created is "properly lined up" with the original input rows. Can I somehow assign some kind of primary key to these rows, so that when my new column is added I can quickly see "ok awesome, input row 43 from the original column lines up with output row 43"?

Does that make sense at all? Or am I being a bit overly concerned here? I'm still fairly new to Python coming from mainly a SQL background, so the thought of generating a list based off inputs from a column and then attaching the new outputs column and knowing 100% that they match is for some reason a big concern of mine.

Does the order matter for your target API? Most stuff I'm familiar with is dealing in JSON where you've just got a dictionary, so it doesn't matter if your new column is in the right 'place'.

If it does matter, then figure out where it needs to matter and test off that; for example, after you serialize it, check that the nth index is whatever you're expecting.
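
Something as simple as this, say (all names hypothetical):

Python code:
serialized = df["translated"].tolist()
assert serialized[42] == expected  # spot-check that the nth input maps to the nth output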

QuarkJets
Sep 8, 2008

Oysters Autobio posted:

Absolutely nothing wrong with sharing (at least to me that is, lol) advice on "how do you prefer to write your pandas". It's much appreciated.

The one part I'm not sure of is how to ensure or test that the new column I've created is "properly lined up" with the original input rows. Can I somehow assign some kind of primary key to these rows, so that when my new column is added I can quickly see "ok awesome, input row 43 from the original column lines up with output row 43"?

Does that make sense at all? Or am I being a bit overly concerned here? I'm still fairly new to Python coming from mainly a SQL background, so the thought of generating a list based off inputs from a column and then attaching the new outputs column and knowing 100% that they match is for some reason a big concern of mine.

You can access rows by index if you want to test that specific values are in specific places.

Is there a chance that you could be seeing the rows in random order instead of sequentially? If that's a concern, then just assign to the rows directly after allocating the new column.

DoctorTristan
Mar 11, 2006

I would look up into your lifeless eyes and wave, like this. Can you and your associates arrange that for me, Mr. Morden?

Oysters Autobio posted:

Absolutely nothing wrong with sharing (at least to me that is, lol) advice on "how do you prefer to write your pandas". It's much appreciated.

The one part I'm not sure of is how to ensure or test that the new column I've created is "properly lined up" with the original input rows. Can I somehow assign some kind of primary key to these rows, so that when my new column is added I can quickly see "ok awesome, input row 43 from the original column lines up with output row 43"?

Does that make sense at all? Or am I being a bit overly concerned here? I'm still fairly new to Python coming from mainly a SQL background, so the thought of generating a list based off inputs from a column and then attaching the new outputs column and knowing 100% that they match is for some reason a big concern of mine.

Sounds like something to write a unit test for.

Less flippantly, if you create the new column data by iterating over the rows in the data frame df and don't perform any operation on df that changes the row order, then the ordering of both will be consistent and it should be safe to assign the new column to the dataframe.

There’s no need to create a primary key column in a dataframe as pandas already does that for you (it’s called the index).
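
To illustrate with toy data:

Python code:
import pandas as pd

df = pd.DataFrame({"text": ["a", "b", "c"]})
print(df.index)  # RangeIndex(start=0, stop=3, step=1)

# assigning a Series aligns on that index, not on position
df["upper"] = pd.Series(["A", "B", "C"], index=df.index)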

Lastly I would warn that there are some downsides to using pandas in ETL pipelines; some of its convenience features around inferring data types can behave inconsistently even on very similar datasets, and the developers absolutely love introducing breaking changes. If you do want to use it then at least make sure you’ve pinned the version in whatever environment you’re running it in.

Windows 98
Nov 13, 2005

HTTP 400: Bad post
Hello Python thread. I haven't read all 207 pages, but I look forward to hanging in here with y'all in the future. I am a python developer / devops professional these days: 7 years' experience coding, only the most recent 2 in python. Feels like every day I learn something new about it. I don't really know anyone who codes other than coworkers, so it will be fun to have an intelligent conversation with some people who know what's going on.

I've been working on a cool little Flask framework lately. I got tired of side projects always needing a ton of legwork to get going, so I figured I could create a baseline package that has core framework functionality: config, encryption, seeding, migrations, permissions, etc. I've been doing some real nifty stuff with inheritance building a base model class. As long as you inherit it, your class gets out-of-the-box CRUD, controllers, routes, logging and some other fun stuff. When I have it a little further along I would love to show you guys the repo.

QuarkJets
Sep 8, 2008

Sounds good; this thread is cool, people are happy to give feedback or pats on the back or whatever you happen to want.

In other news I saw a real life python at the zoo and thought of you, thread

Average Lettuce
Oct 22, 2012


Windows 98 posted:

As long as you inherit it, your class gets out-of-the-box CRUD, controllers, routes, logging and some other fun stuff. When I have it a little further along I would love to show you guys the repo.

Out of curiosity: Any reason to not use Django + DRF?

Windows 98
Nov 13, 2005

HTTP 400: Bad post

Average Lettuce posted:

Out of curiosity: Any reason to not use Django + DRF?

I like how lightweight Flask is. It's nice to leverage the Werkzeug routes and SQLAlchemy. I am not really opposed to Django; I just haven't ever used it. I used to work a lot in Laravel when I was coding with PHP and was a big fan. I only switched to Python because of the job I got hired for a few years ago. We have our own ERP framework we use at work, so the actual pressing need to learn Flask or Django was never there. So all my Python experience outside of actual work has been side projects and stuff for fun. I picked Flask because it was recommended as the most lightweight and non-invasive framework, and that seems to be correct. I also like how there's a lot of little Flask-specific packages floating out there.

I probably should check out Django, you're right. There is a certain amount of joy I get working on reinventing the wheel a bit, and doing it my special way that I wrote myself.

Windows 98 fucked around with this message at 21:06 on Jun 21, 2023

Hed
Mar 31, 2004

Fun Shoe

Windows 98 posted:

I like how lightweight Flask is. It's nice to leverage the Werkzeug routes and SQLAlchemy. I am not really opposed to Django; I just haven't ever used it. I used to work a lot in Laravel when I was coding with PHP and was a big fan. I only switched to Python because of the job I got hired for a few years ago. We have our own ERP framework we use at work, so the actual pressing need to learn Flask or Django was never there. So all my Python experience outside of actual work has been side projects and stuff for fun. I picked Flask because it was recommended as the most lightweight and non-invasive framework, and that seems to be correct. I also like how there's a lot of little Flask-specific packages floating out there.

I probably should check out Django, you're right. There is a certain amount of joy I get working on reinventing the wheel a bit, and doing it my special way that I wrote myself.

Wait, there’s a Python ERP framework?

Windows 98
Nov 13, 2005

HTTP 400: Bad post

Hed posted:

Wait, there’s a Python ERP framework?

https://github.com/odoo/odoo

Hed
Mar 31, 2004

Fun Shoe
Holy crap. On one hand, an ERP where you don't have a vendor to yell at sounds psychotic, but that WMS doesn't seem too bad from the screenshots.


I might have to implement some light double-entry accounting patterns in my Django app soon so I’m looking around for some modules. Thanks for sharing.

oatmealraisin
Feb 26, 2023

Windows 98 posted:

I've been working on a cool little Flask framework lately.

Have you tried out FastAPI?

Windows 98
Nov 13, 2005

HTTP 400: Bad post

oatmealraisin posted:

Have you tried out FastAPI?

I have heard it mentioned but I have never actually looked into it. I am assuming it is also a lightweight framework like Flask? What advantages would it have if I swapped?

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
Flask is lightweight and lets you bolt lots of things onto it, which means it's more flexible, but some things are more repetitive. FastAPI is specifically for writing REST APIs, so there's less boilerplate involved, but it lacks some of Flask's flexibility.
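
To make the boilerplate point concrete, a complete FastAPI endpoint is roughly this (toy model):

Python code:
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Item(BaseModel):
    name: str
    price: float

@app.post("/items", response_model=Item)
def create_item(item: Item):
    # parsing, validation, and the interactive OpenAPI docs all come from the type hints
    return item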

A bit over a year ago I did a rewrite of an internal REST API tool we use from Groovy into Python. I'd done a bit with Flask at that point and so started to look at that, and Flask was able to do it, but I also spent a few days running through a FastAPI tutorial and decided to do my rewrite in that instead, as it was tailored to exactly what I was doing.

FastAPI (via Starlette) can do templating using something like Jinja2, but I don't have any direct experience with that to compare it to Flask, if that's what you're doing.

Windows 98
Nov 13, 2005

HTTP 400: Bad post

FISHMANPET posted:

Flask is lightweight and lets you bolt lots of things onto it, which means it's more flexible, but some things are more repetitive. FastAPI is specifically for writing Rest APIs, so there's less boilerplate involved, but it lacks some of Flask's flexibility.

But doctor... I already wrote CRUD RESTful APIs that auto generate as long as the model exists. I wrote a bunch of code to auto register blueprints and the API endpoints and then it is used as a mixin class on the base model class. It's actually really nifty. It completely eliminates the need for repetitive code. If you want a model you just inherit the Model class, and controllers, routes, CRUD, factories, seeders, all get auto generated. I also have SQLAlchemy decoupled from Flask and running raw on its own, so you can use the same framework for static non-Flask scripts as well as Discord bots. And any additional endpoints or modifications get solved with a little polymorphism to inject what you need into the routes/controllers/models + a super call.

That's what I meant by reinventing the wheel. It feels really nice to accomplish core framework functionality knowing I wrote it myself. If I had a very serious project I would maybe consider a more established framework. I just have a lot of fun doing it :)

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
Ease of development aside, the big difference between FastAPI and Flask is that FastAPI is asynchronous, and Flask is not. Which may or may not matter to you.

Strictly speaking, FastAPI and Flask probably aren't really "equivalent" tools, Flask is probably more equivalent to Starlette, which is the framework that FastAPI is built on. You've probably built something more equivalent to FastAPI on top of Flask.

This really sounds like a case where momentum is a key deciding factor. The calculation would be different if you were starting from nothing, and maybe if you'd built all that framework on top of FastAPI instead of Flask the final product would be "better" but I'm not sure it would be "better" enough to warrant throwing away what you've done and re-implement it in FastAPI.

Then again, I'll say I've had a blast developing in FastAPI, and maybe you'd have fun trying it out. It sounds like you've built out functionality (like auto-generation around models) that isn't in FastAPI, so maybe you'd have a blast porting that over to work with FastAPI.

I've structured my app so that all the data interaction happens outside the "routes" so I could, if I wanted to, pretty easily port that stuff to some other tool as well, so that's certainly still possible with FastAPI.

Macichne Leainig
Jul 26, 2012

by VG
Just gonna plug Litestar (formerly Starlite) here, it's a lighter ASGI framework built on top of Starlette that might be more palatable for some:

https://github.com/litestar-org/litestar

Windows 98
Nov 13, 2005

HTTP 400: Bad post

FISHMANPET posted:

I've structured my app so that all the data interaction happens outside the "routes" so I could, if I wanted to, pretty easily port that stuff to some other tool as well, so that's certainly still possible with FastAPI.

I have done that as well. The CRUD class and QueryFilter class are pretty tightly coupled, but other than that I can dump these classes in any project. That was a high priority for me: that the framework could work independent of Flask itself. The routes more or less exist as a point of entry, to grab the request data and pass it to a controller. Controllers are responsible for data validation and doing any kind of middleware stuff (permission checks). Models are responsible for doing the actual legwork, always operating on the assumption the data has been sanitized and properly formatted. Well, there are exception checks, but they aren't particularly in-depth. Just some bare exception capturing.

I also made it so you can pass data into these models in basically any form. You can pass keyword arguments, positional arguments, dictionaries (nested as well), lists, strings, all numeric types, poo poo basically anything other than an object (working on that!) and it will process it, at least related to CRUD. The GET/UPDATE/DELETE use the inputs as a form of filtering results (UPDATE has an additional positional argument for the actual data). If you pass no filters it just works against all records. The CREATE works both with raw instantiated classes like ModelClassName(**kwargs), or any of those previously mentioned input data types, and it also works with ModelClassName.create(**kwargs). So you can get real flexible with how you like to design.

Database version control is also agnostic and independent from Flask. Which is to say it doesn't rely on Flask-Migrate (so you can use the framework sans Flask) and uses Alembic raw. I was really on the fence about whipping up Werkzeug solo as well and abandoning Flask entirely in favor of my own stuff. But then after some evaluation I decided I don't really want to handle all the internal workings of the Blueprints and sessions and all that bullshit. Maybe one day I will start substituting pieces at a time to get off Flask, but it's not particularly feasible for a few parts of the functionality. I am not an expert on what's happening in the ORM layer of SQLAlchemy and the middleware of Flask, which may be doing a lot of stuff I wouldn't even think I need to implement.

I also built an entire console/file logging system. All the models and pieces of the application self-log automatically. Even custom classes and models. So when you are developing using it you don't need to worry about any of that; it will just handle it right out of the box. It's all neat and tidy too. All the external libraries I am leveraging get caught in the logging class and formatted to be uniform. That works for any new libraries you introduce as well. Just happens on its own.

I have a whole config system set up that's also mostly a bunch of mixin config classes, so you can maneuver through config in the app with ease. That being said, I do have some bash script stuff that you actually run the framework from (as opposed to a flask run or python3 run.py and so on). The bash does some stuff like auto-installing pip packages anywhere in the project. It also finds any .conf files and auto-sets them up as environment variables. All the configs that use these then have fallback default values inside the python. So the idea is that when you download the framework you can immediately just start developing packages and dump them in the Addons folder in the project directory. It will scan that poo poo for any conf or requirements files and get all that stuff set up when the framework boots. So you really truly do not have to touch any core code in the framework. Dump your poo poo in the addons folder and start doing fun things.

Essentially the request, controller, or raw direct action through the class can take any input and work with it. Everything is agnostic and framework-independent. Everything auto-generates but can also be customized. I also have been doing things really by the book: Jira project, ERDs, class diagrams, logic swim lanes, etc. Most of the process is modeled after how I handle things at work. At work I am kind of a dual role where I am the devops guy handling all the code migrations between environments, deployment, CICD stuff, and all that, but I also do a little development work on some of the more trivial tickets that I can sneak in between the other stuff. I don't exactly have a good reason for doing it other than that's just how I operate; otherwise it turns to spaghetti in a matter of days. I even have everything fully docstringed, commented, and type hinted in preparation for Sphinx and distribution of docs in case anyone else wants to use it one day. It's a very redundant project in the larger scope of things but it's a ton of fun to work on.

Windows 98 fucked around with this message at 20:57 on Jun 22, 2023

Data Graham
Dec 28, 2009

📈📊🍪😋



I mean, it isn't too terribly common for someone to come in posting "I have a cool independent project that I think solves things in a unique way and I'm thinking about putting it out there as an alternative to the established big players, y/n?" as opposed to "I am trying to solve X problem, ideas?"

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Yeah I'd be curious about taking a look if you have a public repo.

Windows 98
Nov 13, 2005

HTTP 400: Bad post
I do have a public repo. I will link it soon; I have a bunch of stuff I want to finish up so it's not quite as janky if someone downloads it and tries it out. Also it's under my github name, which is just my name, and I am not trying to dox myself, so I need to create an organization or alt repo or something. I normally work on it M-W-F from 6pm-10pm EST. I work with a friend/mentee who I am showing how to code. Maybe I can set up a stream so y'all can see it and watch it get worked on.

duck monster
Dec 15, 2004

FISHMANPET posted:

Ease of development aside, the big difference between FastAPI and Flask is that FastAPI is asynchronous, and Flask is not. Which may or may not matter to you.

Strictly speaking, FastAPI and Flask probably aren't really "equivalent" tools, Flask is probably more equivalent to Starlette, which is the framework that FastAPI is built on. You've probably built something more equivalent to FastAPI on top of Flask.

This really sounds like a case where momentum is a key deciding factor. The calculation would be different if you were starting from nothing, and maybe if you'd built all that framework on top of FastAPI instead of Flask the final product would be "better" but I'm not sure it would be "better" enough to warrant throwing away what you've done and re-implement it in FastAPI.

Then again, I'll say I've had a blast developing in FastAPI, and maybe you'd have fun trying it out. It sounds like you've built out functionality (like auto-generation around models) that isn't in FastAPI, so maybe you'd have a blast porting that over to work with FastAPI.

I've structured my app so that all the data interaction happens outside the "routes" so I could, if I wanted to, pretty easily port that stuff to some other tool as well, so that's certainly still possible with FastAPI.


Half the magic of FastAPI is the loving flawless integration with Pydantic: the magic that happens when you actually use the new Python type hinting stuff properly, and suddenly your APIs are self-documenting with OpenAPI. (Fun fact: there are library generators that can spit out a custom library to talk to your API in about 20 different languages, IF you set up your types correctly. Otherwise it'll be a total trashfire.)

That said, I still have a soft spot for Django/DRF. It's a little more boilerplatey, but when that puppy works, it works *real* well. Also: Django's ORM is loving stellar. But FastAPI is so fast to work with (if you have the patience to work with SQLAlchemy).

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
I've been using SQLModel, which is written by the same guy that made FastAPI. Basically it mixes Pydantic and SQLAlchemy together into a single object, so I can define a class that is both a Pydantic model, with all the typing and validation that entails, and a database table. Though, in practice, I've found it's not always as simple as it sounds. There are quite a few cases where the model you store in the database is not the model you want to present to the user, so you still end up writing two things: a SQLModel object that defines the database table, and a Pydantic BaseModel object that you present to the user.
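
For reference, the basic pattern (more or less the canonical example from the SQLModel docs):

Python code:
from typing import Optional
from sqlmodel import Field, SQLModel

class Hero(SQLModel, table=True):
    # one class doubles as the Pydantic model and the database table
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str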

neosloth
Sep 5, 2013

Professional Procrastinator
FastAPI is a joy to work with. I was able to generate a fully documented API from a dynamic dataclass definition and then generate a Clojure client library for it in a couple of lines of code. Super impressive stuff.

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
Continuing the Flask vs something else discussion: if you were to build a Python webapp where you were rendering and serving HTML with something like Jinja2, what would you recommend as a framework that supports async? Starlette seems like the closest equivalent to Flask as a "micro-framework". However, because of the nature of ASGI, something like FastAPI just adds onto Starlette, so you can use it to render and serve Jinja2 templates just as well as Starlette, with the added bonus of routes being defined in a way I'm already familiar with (since I'm familiar with FastAPI). Is there another framework built on Starlette that would be better suited to rendering and serving HTML?
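
For what it's worth, the documented FastAPI/Starlette templating pattern looks like this (route and template name made up; needs jinja2 installed):

Python code:
from fastapi import FastAPI, Request
from fastapi.templating import Jinja2Templates

app = FastAPI()
templates = Jinja2Templates(directory="templates")

@app.get("/hello/{name}")
def hello(request: Request, name: str):
    # Jinja2Templates comes from Starlette, so the same pattern works on bare Starlette
    return templates.TemplateResponse("hello.html", {"request": request, "name": name})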

Windows 98
Nov 13, 2005

HTTP 400: Bad post
These days Flask can support async as well:
https://testdriven.io/blog/flask-async/
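
A sketch of what that looks like (requires installing flask[async]; the sleep stands in for a real awaitable call):

Python code:
import asyncio

from flask import Flask

app = Flask(__name__)

@app.route("/data")
async def get_data():
    await asyncio.sleep(1)  # stand-in for awaiting an HTTP client, hardware call, etc.
    return {"status": "done"}  # Flask turns dicts into JSON responses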

Cyril Sneer
Aug 8, 2004

Life would be simple in the forest except for Cyril Sneer. And his life would be simple except for The Raccoons.
Hey guys. To what extent can these various web frameworks be used to control local hardware?

Let me set up the problem I'm trying to solve. I work in a facility where Technician Bob might want to interface with hardware X, and Technician Sam might want to interface with hardware Y. The way it works right now is someone like me physically accesses Bob's laptop and installs whatever Python Stuff (environment, scripts) is needed for interfacing with hardware X. Then, someone like me gets ahold of Sam's computer and installs whatever Python Stuff is needed for him to interface with hardware Y.

I was thinking it would be really cool if instead Bob, Sam, and whoever else could simply access an internal webapp that provided the necessary functionality. I know that you can control hardware via a web interface, but (and I'm going to bungle the phrasing here) where I've seen this, it's external hardware connected to a server, and the server provides a remote user access to the local hardware. What I'm thinking is a bit of an inversion of this: the user connects their laptop to the hardware, loads the appropriate site, and then that "remote" site enables control of the local hardware (all of this would be fully internal).

My coding background is primarily in DSP/algorithm development/embedded processing so this webapp stuff is all a bit foreign to me.

spiritual bypass
Feb 19, 2008

Grimey Drawer
Replace that laptop with something permanently attached to the machine that runs a web server. The next hoop is connecting that little server to the company network so people can hit the webapp.

Is it acceptable for any user on the network to control the machine? If not, you'll need to add authentication

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Cyril Sneer posted:

Hey guys. To what extent can these various web frameworks be used to control local hardware?

Let me set up the problem I'm trying to solve. I work in a facility where Technician Bob might want to interface with hardware X, and Technician Sam might want to interface with hardware Y. The way it works right now is someone like me physically accesses Bob's laptop and installs whatever Python Stuff (environment, scripts) is needed for interfacing with hardware X. Then, someone like me gets ahold of Sam's computer and installs whatever Python Stuff is needed for him to interface with hardware Y.

I was thinking it would be really cool if instead Bob, Sam, and whoever else could simply access an internal webapp that provided the necessary functionality. I know that you can control hardware via a web interface, but (and I'm going to bungle the phrasing here) where I've seen this, it's external hardware connected to a server, and the server provides a remote user access to the local hardware. What I'm thinking is a bit of an inversion of this: the user connects their laptop to the hardware, loads the appropriate site, and then that "remote" site enables control of the local hardware (all of this would be fully internal).

My coding background is primarily in DSP/algorithm development/embedded processing so this webapp stuff is all a bit foreign to me.

If Python code can run the hardware, you can very easily call whatever function you’ve already written.

Computer hardware is cheap compared to technician time so I’m assuming you’re putting a dedicated laptop on the hardware.

There's a gotcha to this if you're new, which is timeouts. You can make long-running calls, but they should return something before the timeout; typical timeouts are 10s-30s. Obvs then you can poll for status or push when the long-running action is complete.

Dash is basically tailor-made for this. It has some long-running process functionality built in.
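
The start-then-poll pattern described above, as a bare-bones Flask sketch (Dash's long-running callback support packages this up for you; everything here is illustrative, not production state management):

Python code:
import threading
import uuid

from flask import Flask, jsonify

app = Flask(__name__)
jobs = {}  # job_id -> status, in memory only

def run_hardware_action(job_id):
    # the long-running hardware call would go here
    jobs[job_id] = "done"

@app.route("/start", methods=["POST"])
def start():
    job_id = str(uuid.uuid4())
    jobs[job_id] = "running"
    threading.Thread(target=run_hardware_action, args=(job_id,)).start()
    return jsonify({"job_id": job_id})  # returns immediately, well under any timeout

@app.route("/status/<job_id>")
def status(job_id):
    return jsonify({"status": jobs.get(job_id, "unknown")})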

CarForumPoster fucked around with this message at 22:23 on Jul 5, 2023
