Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Speaking of comprehensions, is there a Pythonic way to populate a list with elements from an iterator until the list hits a certain size? Like, the naive implementation (with the desired list size of 20 in this example) is:
code:
good_stuff = []
for element in stuff_of_potential_interest:
    if len(good_stuff) >= 20:
        break
    if meets_filtering_criteria(element):
        good_stuff.append(element)
But I have some functions that have lots of layers of nesting and filtering and I’m trying to flatten them as much as possible. Having all these if statements with break gets old, and while loops (e.g., while len(good_stuff) < 20…) don’t help the nesting issue much.

I tried looking into some lesser-used functions in itertools but didn’t see anything that jumped out as being appropriate. :shrug:

Zugzwang fucked around with this message at 05:16 on Feb 15, 2023


i vomit kittens
Apr 25, 2019


Zugzwang posted:

Speaking of comprehensions, is there a Pythonic way to populate a list with elements from an iterator until the list hits a certain size? Like, the naive implementation (with the desired list size of 20 in this example) is:
code:
good_stuff = []
for element in stuff_of_potential_interest:
    if len(good_stuff) >= 20:
        break
    if meets_filtering_criteria(element):
        good_stuff.append(element)
But I have some functions that have lots of layers of nesting and filtering and I’m trying to flatten them as much as possible. Having all these if statements with break gets old, and while loops (e.g., while len(good_stuff) < 20…) don’t help the nesting issue much.

I tried looking into some lesser-used functions in itertools but didn’t see anything that jumped out as being appropriate. :shrug:

Is there a reason islice wouldn't work or did you just miss it when going through itertools?

StumblyWumbly
Sep 12, 2007

Batmanticore!

Soylent Majority posted:

Thats the basic poo poo I need - now I have something far less dumb:

Lists and iterators are absolutely massive and central to Python. When you're trying to do something in Python, first say "how can I use lists here", and you'll probably come up with a better solution.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

i vomit kittens posted:

Is there a reason islice wouldn't work or did you just miss it when going through itertools?
Hm. Maybe, if I build the filtering criteria into the iterator and turn it into a generator expression instead of a for/while loop. Thanks, will give that a shot. I’m trying to learn how more of these functional programming iteration constructs are used, and it still feels like dark magic.

Zugzwang fucked around with this message at 06:10 on Feb 15, 2023
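
A minimal sketch of the islice-plus-generator-expression idea discussed above. The predicate and the numeric source are hypothetical stand-ins chosen for illustration; only the names `meets_filtering_criteria`, `stuff_of_potential_interest`, and `good_stuff` come from the thread.

```python
from itertools import islice


def meets_filtering_criteria(element):
    # Hypothetical predicate, for illustration only: keep multiples of 3
    return element % 3 == 0


stuff_of_potential_interest = iter(range(1000))  # stand-in for any iterator

# The generator expression filters lazily; islice stops pulling from it
# after 20 matches, so the rest of the source is left unconsumed.
filtered = (x for x in stuff_of_potential_interest if meets_filtering_criteria(x))
good_stuff = list(islice(filtered, 20))
```

If fewer than 20 elements pass the filter, islice simply yields what it found and the list comes out shorter; nothing is raised.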

QuarkJets
Sep 8, 2008

Zugzwang posted:

Speaking of comprehensions, is there a Pythonic way to populate a list with elements from an iterator until the list hits a certain size? Like, the naive implementation (with the desired list size of 20 in this example) is:
code:
good_stuff = []
for element in stuff_of_potential_interest:
    if len(good_stuff) >= 20:
        break
    if meets_filtering_criteria(element):
        good_stuff.append(element)
But I have some functions that have lots of layers of nesting and filtering and I’m trying to flatten them as much as possible. Having all these if statements with break gets old, and while loops (e.g., while len(good_stuff) < 20…) don’t help the nesting issue much.

I tried looking into some lesser-used functions in itertools but didn’t see anything that jumped out as being appropriate. :shrug:

One pythonic way is to use a generator that yields only the elements that pass the filter, then call next() on it once per iteration of range(20) to limit the list size:

Python code:

def yield_good_stuff(stuff):
    for element in stuff:
        if meets_filtering_criteria(element):
            yield element
gen = yield_good_stuff(stuff)
[next(gen) for _ in range(20)]

You can condense this into a generator expression

Python code:

some_generator = (x for x in some_list if my_filter(x))
[next(some_generator) for _ in range(20)]

This raises StopIteration if there are fewer than 20 valid elements; you can put a counter inside the generator if you want to limit the number of yields instead

QuarkJets fucked around with this message at 13:48 on Feb 15, 2023
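
The "counter inside the generator" variant could look roughly like this. It is a sketch, not anyone's posted code: `take_filtered`, `predicate`, `limit`, and `is_even` are hypothetical names introduced for the example.

```python
def take_filtered(stuff, predicate, limit):
    """Yield at most `limit` elements of `stuff` that satisfy `predicate`."""
    if limit <= 0:
        return
    count = 0
    for element in stuff:
        if predicate(element):
            yield element
            count += 1
            if count >= limit:
                return  # stop before pulling another element from the source


def is_even(n):
    # Hypothetical filter used only for the demonstration below
    return n % 2 == 0


short = list(take_filtered(range(10), is_even, 20))   # source runs out first
capped = list(take_filtered(range(100), is_even, 3))  # limit reached first
```

Because the generator ends on its own, consuming it with list() never raises, whether the limit or the source runs out first.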

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

Zugzwang posted:

Speaking of comprehensions, is there a Pythonic way to populate a list with elements from an iterator until the list hits a certain size? Like, the naive implementation (with the desired list size of 20 in this example) is:
code:
good_stuff = []
for element in stuff_of_potential_interest:
    if len(good_stuff) >= 20:
        break
    if meets_filtering_criteria(element):
        good_stuff.append(element)
But I have some functions that have lots of layers of nesting and filtering and I’m trying to flatten them as much as possible. Having all these if statements with break gets old, and while loops (e.g., while len(good_stuff) < 20…) don’t help the nesting issue much.

I tried looking into some lesser-used functions in itertools but didn’t see anything that jumped out as being appropriate. :shrug:

yeah for this you should decompose things into smaller functions that take iterators and return iterables:
code:
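import itertools
from typing import Iterable, Iterator

# Thing stands in for whatever your element type is
def filter_good_stuff(potential_stuff: Iterable[Thing]) -> Iterator[Thing]:
    for element in potential_stuff:
        if meets_filtering_criteria(element):
            yield element


# islice is lazy: wrap it in list(...) if you need an actual list
good_stuff = itertools.islice(filter_good_stuff(stuff_of_potential_interest), 20)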
my usual go-to for iterators is when i need to change the number of things in a sequence. if i'm just altering them or something, i can do an outer iteration and function calls

Soylent Majority
Jul 13, 2020

Dune 2: Chicks At The Same Time

QuarkJets posted:

Yeah this is much better

Do you need num_vals, e.g. is this for an assignment that says you have to have a num_vals? I like Zugzwang's suggestion but maybe you're required to have that num_vals input for some reason :shrug:
The exercise specified that it accepts one input at first to give the number of values that would be coming, and each input was coming in individually, so I think I kinda had to have it this way idk.

QuarkJets posted:

Do you know about list comprehensions yet? List comprehensions own. Try this poo poo:
code:
num_vals = int(input())
numbers = [float(input()) for _ in range(num_vals)]
normalized = [number / max(numbers) for number in numbers]
for y in normalized:
    print(f'{y:.2f}')

witchcraft - I'm guessing that'll come a little later here in baby's first programming course, I'll keep an eye out for it since it seems a bit more intuitive.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Thanks for the iterator input, goons. That was exactly what I needed. Simpler, less verbose/nested code and with better results :buddy:

samcarsten
Sep 13, 2022

by vyelkin
Can anyone recommend a good tutorial on how to get Python to work with HTML? For my class's capstone project, we're making a self-hosted calendar app and we're using Python. I need to figure out how to link buttons and text boxes to Python variables.

The Fool
Oct 16, 2003


Flask probably

e: a link - https://dev.to/nagatodev/getting-started-with-flask-1kn1

CarForumPoster
Jun 26, 2013

⚡POWER⚡

samcarsten posted:

Can anyone recommend a good tutorial on how to get Python to work with HTML? For my classes capstone project, we're making a self hosted calandar app and we're using python. I need to figure out how to link buttons and text boxes to python variables.


There are way easier ways to do this, dash + dash-bootstrap-components being the first of them, where there's no HTML, it's all Python.

Django if you have requirements like user auth and whatnot, 'cause you can easily make HTML templates.

Flask is good if you want to make a geocities calendar site.

QuarkJets
Sep 8, 2008

Actually I would love to hear more about web hosting and python, I exclusively develop standalone applications but would like to try hosting some toy project that just lives on s3 or whatever

Fender
Oct 9, 2000
Mechanical Bunny Rabbits!
Dinosaur Gum

samcarsten posted:

Can anyone recommend a good tutorial on how to get Python to work with HTML? For my classes capstone project, we're making a self hosted calandar app and we're using python. I need to figure out how to link buttons and text boxes to python variables.

If you go with Django, I used to give this tutorial to students when I worked at a bootcamp https://tutorial.djangogirls.org/en/

lazerwolf
Dec 22, 2009

Orange and Black

QuarkJets posted:

Actually I would love to hear more about web hosting and python, I exclusively develop standalone applications but would like to try hosting some toy project that just lives on s3 or whatever

If it’s a static site, s3 hosting is fine. Anything that would require running a process would need some sort of server or lambda most likely. What’s your use case?

QuarkJets
Sep 8, 2008

lazerwolf posted:

If it’s a static site, s3 hosting is fine. Anything that would require running a process would need some sort of server or lambda most likely. What’s your use case?

:shrug:

I just want to throw something together using whatever is currently considered best practice. Maybe I could set up something static that just displays the last 10 pictures I dropped into a directory (e.g. I would generate html locally and then upload everything to s3, I guess?)

Macichne Leainig
Jul 26, 2012

by VG
Yeah anything that generates static HTML can just be dropped into an S3 bucket. Don't forget to check the website setting or w/e on S3
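
For the "last 10 pictures in a directory" idea, generating the page locally might look something like this sketch. The function name, the suffix list, and the bare-bones markup are assumptions made for illustration; uploading the result to S3 is not shown.

```python
from html import escape
from pathlib import Path

# Assumed set of image extensions; adjust to taste
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}


def build_gallery_html(image_dir, limit=10):
    """Return an HTML page showing the `limit` most recently modified images."""
    images = [p for p in Path(image_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES]
    images.sort(key=lambda p: p.stat().st_mtime, reverse=True)  # newest first
    tags = "\n".join(
        f'<img src="{escape(p.name)}" alt="{escape(p.name)}">' for p in images[:limit]
    )
    return f"<!DOCTYPE html>\n<html><body>\n{tags}\n</body></html>\n"
```

You would write the returned string to something like index.html and upload it next to the images; the page has to be regenerated whenever a new picture is added.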

CarForumPoster
Jun 26, 2013

⚡POWER⚡

QuarkJets posted:

:shrug:

I just want to throw something together using whatever is currently considered best practice. Maybe I could set up something static that just displays the last 10 pictures I dropped into a directory (e.g. I would generate html locally and then upload everything to s3, I guess?)

So far IME the two easiest ways to deploy Python stuff are Heroku (see my other posts ITT for an example with Dash) and AWS lambda.

For Lambda you can use AWS’ SAM CLI to just kinda set up everything for you, including a docker image and dummy project. I highly recommend using the docker image method because this lets you have projects up to 10 GB. With Heroku or regular (zip-based) AWS Lambda it is very easy to bump up against project size limits. Thus when it’s time to deploy your image recognition API, you find out that regular Lambda and Heroku are out.

A third does-it-for-you Python deployment method is Zappa, but because it uploads a zip file it has the same issue.

Edit: I may have misunderstood your question. If your goal is to generate static HTML, i.e. not using HTML templates drawing from a database or from static JSON, then you can use S3.

Your HTML template files would be in the docker image, but static files you’d probably want to host externally, e.g. in an S3 bucket, so that you can upload stuff.

CarForumPoster fucked around with this message at 14:35 on Feb 18, 2023

C2C - 2.0
May 14, 2006

Dubs In The Key Of Life


Lipstick Apathy
edit: nevermind. Just realized it's an issue with database entries.

C2C - 2.0 fucked around with this message at 19:35 on Feb 18, 2023

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

C2C - 2.0 posted:

edit: nevermind. Just realized it's an issue with database entries.

Your life will be easier if you normalize a tiny bit, and have a cities table and cafes table, then you can select cities by a city ID and not have to do string searching. At the very least, where you know exactly what city value you are searching for, you want to avoid using LIKE if you don't have to.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

C2C - 2.0 posted:

edit: nevermind. Just realized it's an issue with database entries.

If you have many repeats of the same city in a table and you want to prepopulate a list of cities, you can query for the unique city values (SELECT DISTINCT does this in SQL databases generally), cache them in a list that you update as appropriate, or dynamically get all of them and then use Python to just get the unique values.
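
A small sketch of the unique-cities query using the standard-library sqlite3 module. The table layout and sample rows are made up for the example. TRIM is included on the assumption that stray whitespace may have crept into some entries, which would otherwise show up as separate cities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE cafes (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO cafes (name, city) VALUES (?, ?)",
    [("Bean There", "Austin"), ("Drip", " Austin"), ("Grounds", "Boston")],
)

# DISTINCT collapses repeats; TRIM removes leading/trailing spaces first
query = "SELECT DISTINCT TRIM(city) AS city_name FROM cafes ORDER BY city_name"
cities = [row[0] for row in conn.execute(query)]
```

The resulting list can feed a dropdown directly, or be cached and refreshed whenever the table changes.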

C2C - 2.0
May 14, 2006

Dubs In The Key Of Life


Lipstick Apathy
I opened the .db in a viewer and noticed that a couple of times, when adding entries in my terminal via sqlite3, I added a space before the city names & that's why I was getting multiple entries. Once I corrected those, everything is working properly.

I'll do some more reading on databases; my guess is the suggestion to have 2 different tables would be considered "relational"?? I dunno. So if there was a 'cities.db' with a unique id per entry, I could somehow map that unique id to the city strings in 'cafes.db'?

QuarkJets
Sep 8, 2008

Sqlite lets you have multiple tables in a database, so you could just have 1 file.

You could have a Cafe table that has a City column, and you would designate that column as holding a Foreign Key in a Cities table. In most SQL engines, this will give you referential integrity: the City column is only allowed to contain entries that already exist in the Cities table, so if you try to use something else you'll just get an error instead.
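
Here is a rough sketch of the two-table layout in SQLite, with made-up table contents. One caveat worth knowing: SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been run on the connection, so the sketch does that first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute("CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
conn.execute(
    "CREATE TABLE cafes ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " city_id INTEGER NOT NULL REFERENCES cities(id))"
)
conn.execute("INSERT INTO cities (name) VALUES ('Austin')")  # gets id 1
conn.execute("INSERT INTO cafes (name, city_id) VALUES ('Bean There', 1)")

try:
    # No city has id 99, so the database refuses the row
    conn.execute("INSERT INTO cafes (name, city_id) VALUES ('Ghost Cafe', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Look cafes up by city through a join instead of string matching in cafes
cafes_in_austin = conn.execute(
    "SELECT cafes.name FROM cafes JOIN cities ON cafes.city_id = cities.id"
    " WHERE cities.name = ?",
    ("Austin",),
).fetchall()
```

Both tables live in the same database file, and the city name is stored exactly once, so a typo or stray space can only happen in one place.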

CarForumPoster
Jun 26, 2013

⚡POWER⚡

C2C - 2.0 posted:

I opened the .db in a viewer and noticed that a couple of times, when adding entries in my terminal via sqlite3, I added a space before the city names & that's why I was getting multiple entries. Once I corrected those, everything is working properly.

I'll do some more reading on databases; my guess is the suggestion to have 2 different tables would be considered "relational"?? I dunno. So if there was a 'cities.db' with a unique id per entry, I could somehow map that unique id to the city strings in 'cafes.db'?

Disclaimer: I am bad at DBs and I don't tend to access DBs in the way you are. (They're either built by migrations a la Django, wrapped in an API, or I am pulling specific rows by some ID/query)

Agree with below:

QuarkJets posted:

Sqlite lets you have multiple tables in a database, so you could just have 1 file.

You could have a Cafe table that has a City column, and you would designate that column as holding a Foreign Key in a Cities table. In most SQL engines, this will give you referential integrity: the City column is only allowed to contain entries that already exist in the Cities table, so if you try to use something else you'll just get an error instead.

Also a word about python, web apps and cyber security:

1) If dealing with user entered data in a web app you should ALWAYS assume it is written by a drunk or is trying to gently caress you over via HTML injection, SQL injection, etc. You should always assume misspellings, spaces, weird caps cases and, for cybersecurity reasons, that Unicode chars, <, /, >, ', and % will exist and are trying to push your poo poo in. Google something like: sanitize user input data python
2) I have seen IRL production apps that expose their payment processing API keys because of leaving debug=True.
EDIT:
3) If you're loving around with CORS, know what you're doing.
4) Use casefold() not upper() for the unicode reasons mentioned above.

CarForumPoster fucked around with this message at 20:58 on Feb 18, 2023

C2C - 2.0
May 14, 2006

Dubs In The Key Of Life


Lipstick Apathy

QuarkJets posted:

Sqlite lets you have multiple tables in a database, so you could just have 1 file.

You could have a Cafe table that has a City column, and you would designate that column as holding a Foreign Key in a Cities table. In most SQL engines, this will give you referential integrity: the City column is only allowed to contain entries that already exist in the Cities table, so if you try to use something else you'll just get an error instead.

To that end, would it be possible to create a table that stores URLs? I kinda' wanna' dress the app up a bit so that when a city is selected and all of the cafe results are returned, a link to the cafes' location on Google Maps is part of the output. I was reading earlier that I can simply add a VARCHAR column for URLs.


CarForumPoster posted:

Disclaimer: I am bad a DBs and I dont tend to access DBs in the way you are. (Theyre either built by migrations ala Django, wrapped in an API or I am pulling specific rows by some ID/query)

Agree with below:

Also a word about python, web apps and cyber security:

1) If dealing with user entered data in a web app you should ALWAYS assume it is written by a drunk or is trying to gently caress you over via HTML injection, SQL injection, etc. You should always assume misspellings, spaces, weird caps cases and, for cybersecurity reasons, that Unicode chars, <, /, >, ', and % will exist and are trying to push your poo poo in. Google something like: sanitize user input data python
2) I have seen IRL production apps that expose their payment processing API keys because of leaving debug=True.
EDIT:
3) If youre loving around with CORS, know what youre doing.
4) Use casefold() not upper() for the unicode reasons mentioned above.

Fortunately, there's no user input for this app other than the dropdown box selection. When we were getting deeper into Flask in the class, there was some discussion about the ability for attacks via input forms but it was cursory at best. I'll take your advice into consideration though as I do enjoy working with Flask and I plan on, at some point, delving into Django as well.

C2C - 2.0 fucked around with this message at 22:30 on Feb 18, 2023

QuarkJets
Sep 8, 2008

C2C - 2.0 posted:

To that end, would it be possible to create a table that stores URLs? I kinda' wanna' dress the app up a bit so that when a city is selected and all of the cafe results are returned, a link to the cafes' location on Google Maps is part of the output. I was reading earlier to simply add a table entry as VARCHAR for URLs.

Fortunately, there's no user input for this app other then the dropdown box selection. When we were getting deeper into Flask in the class, there was some discussion about the ability for attacks via input forms but it was cursory at best. I'll take your advice into consideration though as I do enjoy working with Flask and I plan on, at some point, delving into Django as well.

Yeah you can have a column of URLs; to the db that's no different than your city names and cafe names, basically

C2C - 2.0
May 14, 2006

Dubs In The Key Of Life


Lipstick Apathy

QuarkJets posted:

Yeah you can have a column of URLs; to the db that's no different than your city names and cafe names, basically

Thanks for all your help! I just peeked ahead and the next project is developing a web app with Flask that's essentially a to-do list that does require a bit of user input, so I guess I'll be reading up on those best practices that CarForumPoster suggested.

QuarkJets
Sep 8, 2008

CarForumPoster posted:

4) Use casefold() not upper() for the unicode reasons mentioned above.

Not ready to co-sign on this; can you provide an example where using casefold immunizes some code to a vulnerability to which lower or upper would be exposed? I know it's considered best practice to use casefold() for string comparisons in Python, but if I'm passing values to a database then I probably want to insert the lower().capitalize() form of the string, and then I definitely don't want to use casefold() for strings passed as part of db queries

nullfunction
Jan 24, 2005

Nap Ghost

QuarkJets posted:

:shrug:

I just want to throw something together using whatever is currently considered best practice. Maybe I could set up something static that just displays the last 10 pictures I dropped into a directory (e.g. I would generate html locally and then upload everything to s3, I guess?)

Best practice is sort of difficult to gauge here, because there are a dozen different ways to do something like this depending on the finer points of your requirements and what you're trying to optimize. Especially given the number of ways you can run a container on AWS alone, to say nothing of other techniques within AWS or other cloud providers.

As mentioned by a few others, if you just want a static site to display some images, S3 web hosting makes for a fantastic choice. Add a second bucket in a different region, replicate files there, throw CloudFront in with origin failover and it's suddenly globally-distributed and tolerant to a region failure -- pretty powerful for not a lot of work! There really isn't much Python involved here on the hosting side, other than maybe pushing files to S3 using boto3, generating your HTML file from a template, etc. It's nicely optimized for simplicity without sacrificing availability, and should be extremely cost-effective to run as well. It's also completely manual: if you want to add a new image to the 10 most recent, you have to regenerate your source files and reupload everything, because there's no compute associated with this method.

If you wanted a solution that leans on Python more heavily as the brains, I would use AWS Lambda for something like this. I don't think I would ship it as a container, I'd probably just live with using 3.9 until AWS finally gets their poo poo together with 3.10 and 3.11 runtimes since it's a very simple task and probably doesn't require a lot of maintenance. You could use a Lambda function to generate the HTML on the fly but I'd probably create the skeleton as a raw HTML file that lives in S3 and use htmx to call the function that asks S3 for files, determines the 10 most recent, and returns them as a chunk of HTML without having to write any JS on the frontend. If you had a suitable method of authenticating users, you could also use Lambda to return a signed URL that will let you upload files directly to the S3 bucket... no extra compute handling required to process a file upload, it's completely between the browser and S3 at that point!

To take this from a toy example into the real world, you'd probably want to think about:

- AuthN/AuthZ on your endpoint(s)
- Will end-users be rate-limited? Will results from the Lambda function need to be cached to keep from running into concurrency limitations if web traffic increases significantly? API Gateway can do rate limiting and caching for you but comes with a price and its own limitations around max request timeout. Adding HA means another API Gateway in another region with all the same backing Lambdas deployed there too.
- How often are we adding new images? Should we transition older images that don't need to be retrieved often to cheaper storage classes for archival, or should they be deleted after some time or falling out of the 10 most recent?
- Will image files stored in S3 be forced to conform to a certain naming spec, or will the original filename be used for storage? Is the S3 bucket allowed to grow unbounded? S3 doesn't have a way to query objects by their modified time, which means potentially lots of API calls to find the most recent objects if you don't prefix your objects with some specifier to help limit the number of queried objects. S3 bucket replication requires that object versioning be turned on, does your software correctly handle an object that has had several versions uploaded?
- If we need to maintain a record of historical images past the 10 most recent, should we store metadata about the objects in another place where we can do a faster lookup, then just reference the objects in S3? Is eventual consistency sufficient for this lookup? Lambda -> DynamoDB is a very common pattern, but is by no means the only option. If you want to use RDS, be sure you have a proxy between Lambda and your DB if you don't want to exhaust your connections when traffic spikes.
- As the number of moving parts involved grows, managing deployment complexity is going to be a bigger deal. Ideally you started building this out with your favorite infrastructure as code tool but if not, you'll definitely want to -- Terraform, CDK, CFN, doesn't really matter. I like CDK because I can stay in Python while defining my infrastructure, but it's not perfect (none of them are).

You could probably come up with a dozen more items to think about that aren't here, and to be clear, there are plenty of ways to accomplish the above in the hosting environment of your choice. You could opt to run the whole thing (object retrieval and storage too) in Django or Flask or FastAPI or whatever if you wanted the experience of building it yourself and don't mind a little undifferentiated heavy lifting.

QuarkJets
Sep 8, 2008

C2C - 2.0 posted:

Thanks for all your help! I just peeked ahead and the next project is developing a web app with Flask that's essentially a to-do list that does require a bit of user input, so I guess I'll be reading up on those best practices that CarForumPoster suggested.

The key is to limit the amount of control that users have over your database. It's tempting for inexperienced developers to pass user input directly into the database, and that's where you run into trouble. But virtually every Python database client provides a cursor object that lets you bind inputs as parameters to a query string, and the client keeps those values separate from the SQL text (or escapes them for you), which prevents injection.

The former, which would be vulnerable, looks like this:
Python code:
user_input = fetch_raw_user_input()
cursor.execute(f"select * from table_name where user = '{user_input}'")
This is tempting because it's a natural way for developers to interact with strings. At first glance it works perfectly; you can even write some naive unit tests for this and see everything behaving like it should. But a problem arises when the user input is "some_valid_user'; drop table table_name; --"

The parameter binding case looks more like this:
Python code:
user_input = fetch_raw_user_input()
cursor.execute("select * from table_name where user = ?", (user_input,))
The question mark is a placeholder that is picked up by the database client, causing the client to expect a tuple of values that are meant to replace any question marks (the length of the tuple is required to be equal to the number of question marks in the string). The values are bound as data (or escaped by the client), so they cannot change the structure of the query. Some developers pass in dictionaries that map parameter names to specific values, and some clients will let you use (or may even require) %s placeholders instead of question marks, but it's all the same principle: pass values to the cursor, do not insert user-provided input directly into the query

It's not necessary to do anything else to prevent sql injection, but you may want to help users who use slight variations of inputs. For instance if you want to query a column for the string "Red" but you also want to accept "red" or "RED" or "ReD" then you could modify the input prior to passing it to the query as a parameter:

Python code:
user_input = fetch_raw_user_input()
user_input = user_input.casefold()
cursor.execute("select * from table_name where color = ?", (user_input,))
In this example, it's assumed that the color column already contains all-lowercase contents. But this may be a bad assumption for something like a name. If your database contains capitalized entries then you may want to capitalize the user input instead

QuarkJets fucked around with this message at 00:14 on Feb 19, 2023
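
To see the difference in practice, here is a self-contained sketch using the standard-library sqlite3 module. The `accounts` table, its rows, and the malicious string are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (username TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)

# Input crafted so that, once pasted into the SQL, the WHERE clause is always true
malicious = "nobody' OR '1'='1"

# Vulnerable: the input becomes part of the SQL text and rewrites the condition
unsafe_rows = conn.execute(
    f"SELECT username FROM accounts WHERE username = '{malicious}'"
).fetchall()

# Safe: the input is bound as a value and only ever compared as a string
safe_rows = conn.execute(
    "SELECT username FROM accounts WHERE username = ?", (malicious,)
).fetchall()
```

The string-formatted query returns every account, while the parameterized one returns nothing, because no user is literally named `nobody' OR '1'='1`.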

CarForumPoster
Jun 26, 2013

⚡POWER⚡

QuarkJets posted:

Not ready to co-sign on this; can you provide an example where using casefold immunizes some code to a vulnerability to which lower or upper would be exposed? I know it's considered best practice to use casefold() for string comparisons in Python, but if I'm passing values to a database then I probably want to insert the lower().capitalize() form of the string, and then I definitely don't want to use casefold() for strings passed as part of db queries

code:
# unicode gives you a loving
"foo@m&#305;x.com".upper()
Out[3]: 'FOO@MIX.COM'
# i like good ol egale loving americans
"foo@mix.com".upper()
Out[4]: 'FOO@MIX.COM'
"foo@m&#305;x.com".upper() == "foo@mix.com".upper()
Out[5]: True
"foo@m&#305;x.com".casefold() == "foo@mix.com".casefold()
Out[6]: False
In this case, to meet the very strict definition of what you asked, you'd need to store the upper() in the db and then pass that in the query, but it's a bit of a strawman. Plenty of people either make the comparison in Python as an upper() or store the user input in the db but retrieve the upper() from the db in SQL. This is a real thing that has been exploited IRL, and if your login is admin@domain.com, you should register "domaın.com" just to eliminate this for $10/yr. I won't demonstrate both examples in an actual query because I feel like this case illustrates the point that this behavior is solved by casefold(), which is exactly as easy to write in code as upper(). I had to fix this not long ago since my domain name contains an i and I was making upper() comparisons in places.

EDIT: Apparently Something Awful escapes the ı character in code blocks. Fascinating.

Here's the same not in a code block:

"foo@mıx.com".upper()
Out[3]: 'FOO@MIX.COM'
"foo@mix.com".upper()
Out[4]: 'FOO@MIX.COM'
"foo@mıx.com".upper() == "foo@mix.com".upper()
Out[5]: True
"foo@mıx.com".casefold() == "foo@mix.com".casefold()
Out[6]: False
"foo@mıx.com".lower().capitalize() == "foo@mix.com".lower().capitalize()
Out[9]: False


EDIT2: BRB off to register "somethıngawful.com" so I can be lowtax@somethıngawful.com

CarForumPoster fucked around with this message at 01:00 on Feb 19, 2023

CarForumPoster
Jun 26, 2013

⚡POWER⚡
Here's an example that allegedly would make it through Django's pretty good screens (haven't tested it)

https://twitter.com/SonarSource/status/1469336105841352705

QuarkJets
Sep 8, 2008

CarForumPoster posted:

Here's an example that allegedly would make it through Django's pretty good screens (haven't tested it)

https://twitter.com/SonarSource/status/1469336105841352705

This example would still be vulnerable if it used casefold() instead of upper(): "sharterß@email.com".casefold() becomes "sharterss@email.com"

This vulnerability is solved through conversion consistency. If your database only stores ascii values then you'd be better off encoding your inputs as ascii and yelling at the user if an exception gets raised. Or if you want to always store values after passing them through casefold() (totally reasonable) then you had better make sure that you use casefold() everywhere else that a user can input values for queries or insertions against those fields:

Python code:
def fetch_user_input(converter=str.casefold):
    user_input = some_third_party_api()
    return converter(user_input)
But even this function isn't an idiot-proof solution; someone can come along and decide to not use the `fetch_user_input` function. But this does give you a simple hook for enforcing string conversion compliance in unit testing: with pytest you can easily have every unit test check that `fetch_user_input` was invoked the same number of times as `some_third_party_api`
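
One way to make the "convert consistently" policy concrete is a single normalizing function that every entry point calls. This is a sketch under an assumed policy (reject anything non-ASCII, then casefold); `normalize_email` is a hypothetical helper, and rejecting non-ASCII would also turn away legitimate internationalized addresses.

```python
def normalize_email(raw):
    """Casefold an address for storage or lookup, rejecting non-ASCII input."""
    candidate = raw.strip()
    if not candidate.isascii():
        # Assumed policy: refuse lookalike characters instead of converting them
        raise ValueError(f"non-ASCII characters are not accepted: {candidate!r}")
    return candidate.casefold()


clean = normalize_email("  Foo@Mix.com ")

try:
    normalize_email("foo@m\u0131x.com")  # dotless i, as in the example above
    lookalike_rejected = False
except ValueError:
    lookalike_rejected = True
```

Using the same function for both inserts and queries is what closes the gap; the exact conversion matters less than applying it everywhere.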

samcarsten
Sep 13, 2022

by vyelkin
Ok, conferred with my partners. We are using the starlette plugin to do HTML-Python interaction. Unfortunately, I cannot seem to find a good tutorial on my specific issue, which is pulling info from input text boxes upon button press.

spiritual bypass
Feb 19, 2008

Grimey Drawer
The browser probably submits an HTML form, creating an HTTP request that you need to respond to with a new document on the Python side

samcarsten
Sep 13, 2022

by vyelkin
Also, when using SQLite3 to create DB entries, do you need to have all the values enumerated or can you leave some blank? We have a primary key that should autoincrement so we don't want to overwrite.

samcarsten fucked around with this message at 03:19 on Feb 19, 2023

samcarsten
Sep 13, 2022

by vyelkin

cum jabbar posted:

The browser probably submits an HTML form, creating an HTTP request that you need to respond to with a new document on the Python side

right, but what is the format for linking text boxes to variables? I have the names of the boxes and the name of the variables, what is the procedure? The documentation doesn't explain itself well.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

QuarkJets posted:

This example would still be vulnerable if it used casefold() instead of upper(): "sharterß@email.com".casefold() becomes "sharterss@email.com"

This vulnerability is solved through conversion consistency. If your database only stores ASCII values then you'd be better off encoding your inputs as ASCII and yelling at the user if an exception gets raised. Or if you want to always store values after passing them through casefold() (totally reasonable) then you had better make sure that you use casefold() everywhere else that a user can input values for queries or insertions against those fields:

Python code:
def fetch_user_input(converter=str.casefold):
    user_input = some_third_party_api()
    return converter(user_input)
But even this function isn't an idiot-proof solution; someone can come along and decide to not use the `fetch_user_input` function. But this does give you a simple hook for enforcing string conversion compliance in unit testing: with pytest you can easily have every unit test check that `fetch_user_input` was invoked the same number of times as `some_third_party_api`

Yeah, agree with everything you said here. I didn't even really consider lower case collisions. One nuance though is that the big issue is domain name collisions in email addresses, since those allow a password reset. So ChevySS.com should worry about "admin@chevyß.com".casefold() == "admin@chevyss.com".casefold(). Way more domains have an i than an SS though.

And yeah, you'd potentially have to call the function all over the place. Django does a pretty good job of automatically cleaning that poo poo up, and if you let Django handle the db calls for you then you don't need to worry as much, though SQL injection exploits for Django are still found for certain unsanitized inputs that were overlooked. It's a belt and suspenders thing, but IMO it's good to go through your code thinking "what can users input" and assume they'll try to be fuckers and input HTML/char escapes/XSS poo poo.

EDIT: BTW neat site with list of these collisions:
https://gosecure.github.io/unicode-pentester-cheatsheet/

Double edit:

For clarification, the attack with admin@chevyß.com only works if the site compares the casefolded addresses but then sends the reset email to the raw user input instead of the casefolded (or stored) address. So sanitizing the input will always neutralize it, as worst case scenario you'll just send an unwanted reset email to the correct admin@chevyss.com
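
A rough sketch of that last point, assuming a made-up in-memory `USERS` mapping in place of a real user table and a hypothetical `reset_recipient` helper. The lookup uses the casefolded input, and the address that comes back is the stored one, never the raw input.

```python
from typing import Optional

# Hypothetical "user table": casefolded address -> address stored at signup.
USERS = {"admin@chevyss.com": "admin@chevyss.com"}


def reset_recipient(user_input: str) -> Optional[str]:
    """Return the address a reset email should go to, or None if no account matches."""
    key = user_input.casefold()
    # Return the stored address; the raw user_input is never used as a recipient.
    return USERS.get(key)
```

Here `reset_recipient("admin@chevyß.com")` still matches the account, but the email would go to the real admin@chevyss.com, so the attacker gets nothing.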

CarForumPoster fucked around with this message at 04:04 on Feb 19, 2023

QuarkJets
Sep 8, 2008

samcarsten posted:

Also, when using SQLite3 to create DB entries, do you need to have all the values enumerated or can you leave some blank? We have a primary key that should autoincrement so we don't want to overwrite.

You have to provide a value for every bound parameter. If you don't want to provide a new primary key when you insert a new row, which is the correct thing to do, then don't have that field be part of the insert statement
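
Something like this, using the standard library `sqlite3` module. The `users` table and its columns are invented for the example; the point is just that the insert statement names only the columns you are supplying.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT NOT NULL, "
    "email TEXT)"
)

# id is not listed, so SQLite assigns it. email is not listed either,
# so it falls back to its default, which is NULL here.
cur = conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
first_id = cur.lastrowid

cur = conn.execute(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    ("bob", "bob@example.com"),
)
second_id = cur.lastrowid

rows = conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
print(rows)
```

Every `?` in the statement still needs a value in the tuple; columns you leave out of the statement entirely are the ones that can stay "blank".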

samcarsten
Sep 13, 2022

by vyelkin

QuarkJets posted:

You have to provide a value for every bound parameter. If you don't want to provide a new primary key when you insert a new row, which is the correct thing to do, then don't have that field be part of the insert statement

ok, cool. now to just figure out how to use starlette.

samcarsten
Sep 13, 2022

by vyelkin
alright, so I was looking at things all wrong. Now, when given an HTTP POST request, how do I tease out which data is which? i.e. I have three separate text boxes, each with different information. I need them made into 3 separate variables. How do I split them using starlette?
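
For what it's worth, the link between a text box and a variable is the `name` attribute on each `<input>`: the browser sends the form as `name=value` pairs and the server looks values up by name. Below is a standard-library-only sketch of that decoding, with made-up field names (`title`, `author`, `year`) and a hand-written body standing in for a real request.

```python
from urllib.parse import parse_qs

# Roughly what a browser sends for a form with
# <input name="title">, <input name="author"> and <input name="year">.
body = "title=Dune&author=Frank+Herbert&year=1965"

# parse_qs returns a dict mapping each name to a list of submitted values.
fields = parse_qs(body)

title = fields["title"][0]
author = fields["author"][0]
year = int(fields["year"][0])  # form values always arrive as strings
```

In Starlette you normally don't parse the body yourself: inside an `async def` endpoint, `form = await request.form()` gives a dict-like object keyed by those same names, so `form["title"]` is the first text box. Form parsing in Starlette needs the `python-multipart` package installed.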
