Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

KICK BAMA KICK posted:

Do you guys have any tips for using Python to do insane amounts of crime
https://x.com/molly0xFFF/status/1710718416724595187?s=20

It's comforting to know that if I ever decide to turn to a life of crime other than y'know, working in big tech, that I won't have to learn javascript first.

TasogareNoKagi
Jul 11, 2013

Presto posted:

from crimes import financial_fraud

from sam import bankman as fraud

boofhead
Feb 18, 2021

pretend i remade that "the americans spent millions of dollars inventing a pen that would work in space. the soviets used a pencil" meme but for hedge funds committing large scale financial fraud

Data Graham
Dec 28, 2009

📈📊🍪😋



Falcon2001 posted:

It's comforting to know that if I ever decide to turn to a life of crime other than y'know, working in big tech, that I won't have to learn javascript first.

Want to write ethereum smart contracts? Vyper (python-like) is stillborn in favor of solidity, and brownie (python web3 wrapper) is abandoned in favor of Ape (python but broken)

What does this have to do with crime you ask? Oh nothing

Data Graham fucked around with this message at 14:25 on Oct 11, 2023

Macichne Leainig
Jul 26, 2012

by VG

Presto posted:

from crimes import financial_fraud

You fool, you should do

from not_crimes import totally_not_fraud

rowkey bilbao
Jul 24, 2023
while True: yield crime

Data Graham
Dec 28, 2009

📈📊🍪😋



Macichne Leainig posted:

You fool, you should do

from crimes import fraud as totally_not_fraud

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
Are there any modern libraries that are sort of in the same vein as Pydantic, but not harmful and bad from trying to smush every data serialization/deserialization/validation/normalization concern together in the same bathroom sink? Plain dataclasses are bad. Marshmallow is bad. Mashumaro looks promising, but I haven't gone too deep yet.

Vulture Culture fucked around with this message at 22:26 on Oct 10, 2023

monochromagic
Jun 17, 2023

Vulture Culture posted:

Are there any modern libraries that are sort of in the same vein as Pydantic, but not harmful and bad from trying to smush every data serialization/deserialization/validation/normalization concern together in the same bathroom sink? Plain dataclasses are bad. Marshmallow is bad. Mashumaro looks promising, but I haven't gone too deep yet.

Have a look at attrs. https://www.attrs.org/en/stable/why.html#data-classes

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

Vulture Culture posted:

Are there any modern libraries that are sort of in the same vein as Pydantic, but not harmful and bad from trying to smush every data serialization/deserialization/validation/normalization concern together in the same bathroom sink? Plain dataclasses are bad. Marshmallow is bad. Mashumaro looks promising, but I haven't gone too deep yet.

It might help if you explained more about the problem you have with those solutions.

abelwingnut
Dec 23, 2002


python novice, so this should be easy.

i have 25 simple python scripts that each run a distinct query on a database and export them to individual csv.

i then have another 'orchestration' python script that aims to run all those, then upload the files to s3.

i've got the upload working just fine, but i'm not sure how to run all of those python scripts from within the orchestration script. basically, i'd want the os module to run through the directory containing all of the scripts, put them in a list or something, then run each individually and sequentially. once done, it then moves on to the upload process.

so...how do i go about running all of those scripts like this? i can easily get the filepaths for each. is there a simple way to run them from within this orchestration script?

spiritual bypass
Feb 19, 2008

Grimey Drawer
Maybe take the code driving each script and put it into a function, then run each of those functions, sending the result to s3 (or wherever)?

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
You can structure each "script" so that it can also be imported. I'm not sure how they're structured now, but get each script into a form where everything runs from a single function call. Then at the end add if __name__ == "__main__": and have it call that single function. That way the file still runs if you execute it as a script, but you can also import that function from another file and execute it there.

Son of Thunderbeast
Sep 21, 2002
^^ e: that's a much better answer than mine haha

abelwingnut posted:

python novice, so this should be easy.

Also a novice, so take this with a grain of salt. I'm also open to critique of my suggestion

This is a situation where I'd probably start using classes, or at the very least refactoring everything into a function (that needs to be, anyway). Then you can import e.g. module_x.py, or even just specific functions that you need, and run them from the main script.

Oysters Autobio
Mar 13, 2017
Are there any decent regex generators for Python? I'm talking about a library that has functions for constructing common string searches in a "pythonic way" and outputs a raw regex pattern.

Terrible written-on-phone example:

Python code:

regex_gen("foofile20230507-001.jpeg", variable="20230507-001", static="foofile")

Similarly, any decent "reverse engineering" regex tools in python? Like a semi supervised ML-based library that lets you input a bunch of similar strings and you can train a model to generate a regex string to parse it?
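For the first half of the question, the hand-rolled version with the standard library is short. A minimal sketch, assuming the static prefix and the extension are known and the variable part is an 8-digit date plus a 3-digit counter; `build_pattern` is a hypothetical helper, not a function from any existing library:

```python
import re


def build_pattern(static: str, variable: str, suffix: str = "") -> re.Pattern:
    """Hypothetical helper: escape the literal parts, capture the variable part."""
    # re.escape makes characters like "." in the literal parts match literally.
    return re.compile(
        re.escape(static) + "(?P<variable>" + variable + ")" + re.escape(suffix)
    )


# Assumed shape of the variable part: YYYYMMDD-NNN
pattern = build_pattern("foofile", r"\d{8}-\d{3}", ".jpeg")

match = pattern.fullmatch("foofile20230507-001.jpeg")
if match:
    print(match.group("variable"))
print(pattern.pattern)  # the raw regex, ready to paste elsewhere
```

The caller still has to describe the variable part as a regex fragment, so this only takes care of the escaping and the capture group.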

Oysters Autobio fucked around with this message at 15:32 on Oct 12, 2023

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

abelwingnut posted:

python novice, so this should be easy.

i have 25 simple python scripts that each run a distinct query on a database and export them to individual csv.

i then have another 'orchestration' python script that aims to run all those, then upload the files to s3.

i've got the upload working just fine, but i'm not sure how to run all of those python scripts from within the orchestration script. basically, i'd want the os module to run through the directory containing all of the scripts, put them in a list or something, then run each individually and sequentially. once done, it then moves on to the upload process.

so...how do i go about running all of those scripts like this? i can easily get the filepaths for each. is there a simple way to run them from within this orchestration script?

The term you're looking for to figure this out is 'imports' - the simplest setup here is to put them all in a single folder, so that your orchestration script can import all of these other scripts directly. The more complex way to approach this is to get them added to your PYTHONPATH, but if you're brand new to things this might be complex or lead you to doing odd things, so probably just stick with 'get 'em all in a folder'.

If they're all in the same folder then you should next follow what this poster suggests:

Son of Thunderbeast posted:

^^ e: that's a much better answer than mine haha

Also a novice, so take this with a grain of salt. I'm also open to critique of my suggestion

This is a situation where I'd probably start using classes, or at the very least refactoring everything into a function (that needs to be, anyway). Then you can import e.g. module_x.py, or even just specific functions that you need, and run them from the main script.

Notably, I'd recommend sticking to functions if that's the main way you're doing it. If you can upload these (or at least one of them) to a github gist then it'd be a bit easier to recommend specifically how to go about refactoring it, but the absolute simplest thing to do would be something like this:

Python code:
# filename is butts_finder.py

# slap this part at the top right below any imports
def find_butts():
    # take the whole drat script and indent it by one level here
    ...

if __name__ == "__main__":
    find_butts()

The if name thing at the end basically means that if you import the file it won't automatically run everything in a way that's tricky to play around with later, but you can still run the script directly. https://www.freecodecamp.org/news/if-name-main-python-example/ probably explains it better.

Then from your orchestration script in the same directory:

Python code:
from butts_finder import find_butts

# this one is not as important to stick in a if name thing but it's a good habit

def main():
    find_butts()

if __name__ == "__main__":
    main()
Again, none of this is perfect, but it's a decent starting place without bogging down into file structure setups/etc.
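With 25 scripts, importing each one by hand gets tedious. Here is a minimal sketch of the walk-the-directory idea from the original question, using the standard library's `runpy`. The name `run_all_scripts` is made up for illustration, and it assumes every `.py` file in the folder is one of the export scripts:

```python
import runpy
from pathlib import Path


def run_all_scripts(directory: str) -> list[str]:
    """Run every .py file in `directory` one after another, in name order."""
    finished = []
    for script in sorted(Path(directory).glob("*.py")):
        # run_name="__main__" makes each script's `if __name__ == "__main__":`
        # block fire, just as if it had been started from the command line.
        runpy.run_path(str(script), run_name="__main__")
        finished.append(script.name)
    return finished
```

Call `run_all_scripts(...)` on the folder holding the export scripts right before the upload step. Everything runs in the same process, so an exception in one script stops the whole run; if the scripts need isolation from each other, `subprocess.run([sys.executable, path], check=True)` is the heavier alternative.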

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Falcon2001 posted:

It might help if you explained more about the problem you have with those solutions.
I'm mostly looking for inspo on how people are solving the problem of taking in data, based on some kind of protocol or standard, and shuttling it around the different intermediate domain representations in their programs/services. Some of the key problems that bug me day to day are:

  • What the standard says the data should look like and the normalized version of the data you want to work with often aren't the same thing
  • Schema validation and semantic validation are totally different things and should be handled at different layers
  • Serialization/deserialization shouldn't be tightly coupled to the transformations between those data representations
  • Going between those data representations should be consistent and rule-based instead of based on arbitrary, implicit, and often undocumented conversions
  • The above should still be DRY and easy as much as possible; it shouldn't need either re-declaration of those rules for every single field, nor ad-hoc reflection-driven handling of an entire model
  • The hardest part of managing data models is usually handling discriminated unions

A good example of where Pydantic falls down is on both implicit and explicit conversions. When you're interpreting JSON input with a defined representation, usually the standard will define things like a specific datetime format or a particular method of encoding binary data. It handles these kinds of use cases so badly that I'd rather not even try
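To make the layering in the list above concrete, here is a standard-library-only sketch; it is not a recommendation of any library. It assumes a made-up wire format where `created` is a `%Y%m%dT%H%M%SZ` string and `payload` is base64, and `WireEvent`, `Event`, `parse_wire`, `to_domain`, and `validate` are all hypothetical names. Decoding, conversion, and semantic checks each get their own function:

```python
import base64
import json
from dataclasses import dataclass
from datetime import datetime, timezone

WIRE_TIME_FORMAT = "%Y%m%dT%H%M%SZ"  # assumed format from the imaginary standard


@dataclass(frozen=True)
class WireEvent:
    """Mirrors the document exactly as the standard spells it."""
    created: str
    payload: str


@dataclass(frozen=True)
class Event:
    """Normalized form the rest of the program works with."""
    created: datetime
    payload: bytes


def parse_wire(raw: str) -> WireEvent:
    # Deserialization plus schema check only: right keys, right JSON types.
    data = json.loads(raw)
    if not all(isinstance(data.get(key), str) for key in ("created", "payload")):
        raise ValueError("schema error: 'created' and 'payload' must be strings")
    return WireEvent(created=data["created"], payload=data["payload"])


def to_domain(wire: WireEvent) -> Event:
    # Explicit, documented conversions; nothing is guessed from the field type.
    created = datetime.strptime(wire.created, WIRE_TIME_FORMAT).replace(tzinfo=timezone.utc)
    return Event(created=created, payload=base64.b64decode(wire.payload, validate=True))


def validate(event: Event) -> Event:
    # Semantic validation lives apart from the schema check above.
    if not event.payload:
        raise ValueError("semantic error: payload must not be empty")
    return event


event = validate(to_domain(parse_wire('{"created": "20231012T172600Z", "payload": "aGk="}')))
print(event)
```

This keeps each concern separate, but it does nothing for the DRY point: every field's conversion is still written out by hand.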

Vulture Culture fucked around with this message at 17:26 on Oct 12, 2023

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

Vulture Culture posted:

I'm mostly looking for inspo on how people are solving the problem of taking in data, based on some kind of protocol or standard, and shuttling it around the different intermediate domain representations in their programs/services. Some of the key problems that bug me day to day are:

  • What the standard says the data should look like and the normalized version of the data you want to work with often aren't the same thing
  • Schema validation and semantic validation are totally different things and should be handled at different layers
  • Serialization/deserialization shouldn't be tightly coupled to the transformations between those data representations
  • Going between those data representations should be consistent and rule-based instead of based on arbitrary, implicit, and often undocumented conversions
  • The above should still be DRY and easy as much as possible; it shouldn't need either re-declaration of those rules for every single field, nor ad-hoc reflection-driven handling of an entire model
  • The hardest part of managing data models is usually handling discriminated unions

A good example of where Pydantic falls down is on both implicit and explicit conversions. When you're interpreting JSON input with a defined representation, usually the standard will define things like a specific datetime format or a particular method of encoding binary data. It handles these kinds of use cases so badly that I'd rather not even try

Ahh okay. I wasn't sure from your original post what side of complexity you were irritated by. This all sounds pretty reasonable, but I don't have much in the way of useful answers for you based on this, since I haven't had to solve something this complex - all the web responses I work with have corresponding libraries for the most part so there's already a (decent, not great) solution.

abelwingnut
Dec 23, 2002


thank you for the help! i'll see what comes of all this and follow up if i need some additional pointers.

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
This is maybe more general than just Python, but figured I'd ask since implementation might be specific.

Some background: I'm working on a scheduling app for developing oncall rotations or other things. A simple view of this ties it to specific days, but as soon as you add timezones/etc that gets a lot more complicated, and this ties into something I've failed to figure out before.

What's the right pattern / name / etc for dealing with a series of things that occur during periods of time? For example, here's some basic pseudocode:

Python code:
[
    Shift(start='2023/01/15T00:00:00',end='2023/01/15T10:00:00',name='User1',role='Primary'),
    Shift(start='2023/01/15T00:00:00',end='2023/01/15T10:00:00',name='User2',role='Secondary'),
    Shift(start='2023/01/15T00:00:00',end='2023/01/15T10:00:00',name='User3',role='Manager'),
    Shift(start='2023/01/16T00:00:00',end='2023/01/16T10:00:00',name='User1',role='Primary'),
]
As a human you can understand 'oh yeah that needs to go in a timeline and the first three all overlap and the fourth one doesn't', but what data structure / etc would be appropriate for this? For the UI I need to be able to display a schedule in a month view/etc, but that's just an abstraction away from the actual schedule itself, which is a series of time periods.

StumblyWumbly
Sep 12, 2007

Batmanticore!
Gantt chart? Or you want something that covers stored values that should not overlap?

DoctorTristan
Mar 11, 2006

I would look up into your lifeless eyes and wave, like this. Can you and your associates arrange that for me, Mr. Morden?
The most obvious approach imo is to keep the shift+user data in database tables and have your app run queries against it. Is there any reason that wouldn’t work?

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

DoctorTristan posted:

The most obvious approach imo is to keep the shift+user data in database tables and have your app run queries against it. Is there any reason that wouldn’t work?

I should probably explain a bit more then: the goal isn't to run something like PagerDuty/etc, it's to be able to quickly generate and manipulate a schedule that can then be exported to various formats for import to real pager management systems (or just print it out, whatever works.)

Both at my current bigTech company and my previous one, we had systems for managing oncall schedules, but the actual 'build something to review' ended up being done in excel because the UI for the pager management system was terrible, so I wanted something more visually focused so you could quickly play around with things like 'oh what if we lengthened this rotation' or whatever.

Because of this, long-term storage is not really necessary or useful, so I'm hesitant to use a DB, unless there's some in-memory option that would be convenient. I also really haven't worked much with databases other than key/value ones like Dynamo.

Additionally, even a super long, complex schedule is likely going to have maybe hundreds of rows and not thousands or anything crazier, and I think many teams would have something on the range of dozens of entries.

StumblyWumbly posted:

Gantt chart? Or you want something that covers stored values that should not overlap?

I'm thinking of more like what data structure to use for access/storage - UI display is probably best done as a calendar view but that's an abstraction over the basic data set, at least in my view. I can probably look up how to render Gantt charts to get a feel for the approach though.

LightRailTycoon
Mar 24, 2017
It’s more than you need, but polars or pandas should cover it

DoctorTristan
Mar 11, 2006

I would look up into your lifeless eyes and wave, like this. Can you and your associates arrange that for me, Mr. Morden?
So it sounds like your data is a small collection of records with no general inter-record relationships other than ‘I want to consider these things together’. I’d just keep that as a bunch of dataclasses/namedtuples in a list unless/until you run into a reason not to (nothing you have said above seems like a reason not to).
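A minimal sketch of the dataclasses-in-a-list approach, reusing the shifts from the pseudocode earlier in the thread. It assumes naive datetimes and ISO 8601 strings (dashes instead of the slashes in the pseudocode); the `shift` constructor and the `overlaps` method are illustrative names:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Shift:
    start: datetime
    end: datetime
    name: str
    role: str

    def overlaps(self, other: "Shift") -> bool:
        # Half-open intervals [start, end): back-to-back shifts do not overlap.
        return self.start < other.end and other.start < self.end


def shift(start: str, end: str, name: str, role: str) -> Shift:
    """Hypothetical convenience constructor taking ISO 8601 strings."""
    return Shift(datetime.fromisoformat(start), datetime.fromisoformat(end), name, role)


schedule = [
    shift("2023-01-15T00:00:00", "2023-01-15T10:00:00", "User1", "Primary"),
    shift("2023-01-15T00:00:00", "2023-01-15T10:00:00", "User2", "Secondary"),
    shift("2023-01-15T00:00:00", "2023-01-15T10:00:00", "User3", "Manager"),
    shift("2023-01-16T00:00:00", "2023-01-16T10:00:00", "User1", "Primary"),
]

# Who overlaps with the first shift?
overlapping = [s.name for s in schedule[1:] if s.overlaps(schedule[0])]
print(overlapping)  # ['User2', 'User3']
```

At a few hundred rows a linear scan like this is plenty fast. Once time zones matter, the same comparison works with timezone-aware datetimes, as long as every shift is aware.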

One caveat to that is if you’re planning on using any of the numerous python dataviz frameworks then you’ll probably want to keep the data in a pandas dataframe instead since that’s what those frameworks expect.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Many languages support the concept of a RangeMap, which you can insert Ranges (essentially, pairs of a start value and an end value) into and then look up single values in. Would that be useful for what you're trying to do?
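Python's standard library has no RangeMap, but a toy one is easy to build on `bisect`. A sketch under the assumption that ranges within one map never overlap (for example, one map per role); the `RangeMap` class below is written for illustration and is not a library API:

```python
import bisect
from datetime import datetime


class RangeMap:
    """Toy range map: non-overlapping half-open ranges [start, end) -> value."""

    def __init__(self):
        self._starts = []   # sorted range starts
        self._entries = []  # (start, end, value), kept in the same order

    def insert(self, start, end, value):
        if not start < end:
            raise ValueError("range must have start < end")
        i = bisect.bisect_left(self._starts, start)
        # Reject anything that would overlap a neighbour on either side.
        if i > 0 and self._entries[i - 1][1] > start:
            raise ValueError("overlaps the previous range")
        if i < len(self._starts) and self._starts[i] < end:
            raise ValueError("overlaps the next range")
        self._starts.insert(i, start)
        self._entries.insert(i, (start, end, value))

    def get(self, point, default=None):
        # Find the last range starting at or before `point`, then check its end.
        i = bisect.bisect_right(self._starts, point) - 1
        if i >= 0 and point < self._entries[i][1]:
            return self._entries[i][2]
        return default


# The two Primary shifts from the pseudocode earlier in the thread.
primary = RangeMap()
primary.insert(datetime(2023, 1, 15, 0), datetime(2023, 1, 15, 10), "User1")
primary.insert(datetime(2023, 1, 16, 0), datetime(2023, 1, 16, 10), "User1")

print(primary.get(datetime(2023, 1, 15, 9, 30)))  # User1
print(primary.get(datetime(2023, 1, 15, 12)))     # None, nobody is Primary then
```

Lookups are O(log n), which is overkill for dozens of shifts, but the "who is on at time T" question reads cleanly.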

BUUNNI
Jun 23, 2023

by Pragmatica
Is there a good book or manual that goes over data analysis and visualization using python? I’m not that good at reading the official documentation for Pandas and NumPy and stuff yet.

Macichne Leainig
Jul 26, 2012

by VG
I find that pandas' Getting Started and User Guides are good starting points depending on the kind of data analysis task you are trying to perform.

The "Getting Started" stuff is much more fundamental like how do you load in a CSV, run summary statistics, and draw a plot? The User Guide covers more specific stuff like what if your data is categorical and not numerical, how can you encode it, etc.

What kind of walkthrough are you looking for specifically?

BUUNNI
Jun 23, 2023

by Pragmatica

Macichne Leainig posted:

I find that pandas' Getting Started and User Guides are good starting points depending on the kind of data analysis task you are trying to perform.

The "Getting Started" stuff is much more fundamental like how do you load in a CSV, run summary statistics, and draw a plot? The User Guide covers more specific stuff like what if your data is categorical and not numerical, how can you encode it, etc.

What kind of walkthrough are you looking for specifically?

Thanks for the suggestions! Right now I'm looking for help dealing with creating scatterplots using PyPlot (for instance, creating a plot that shows 'UV Index vs. Population Density in World's 20 Largest Cities' with the dots proportionate to city population), and handling dataframes using Pandas, including creating data columns, creating numeric variable from column data, etc...

ComradePyro
Oct 6, 2009

BUUNNI posted:

Thanks for the suggestions! Right now I'm looking for help dealing with creating scatterplots using PyPlot (for instance, creating a plot that shows 'UV Index vs. Population Density in World's 20 Largest Cities' with the dots proportionate to city population), and handling dataframes using Pandas, including creating data columns, creating numeric variable from column data, etc...

oh hey, you're me back in January. here's what got me started:

https://youtube.com/playlist?list=PL9oKUrtC4VP7ry0um1QOUUfJBXKnkf-dA&si=w5PNghLnrayuJC0R

that channel also has a numpy playlist. I can't claim to have really absorbed it all, but watching them a couple times helped a lot

Chin Strap
Nov 24, 2002

I failed my TFLC Toxx, but I no longer need a double chin strap :buddy:
Pillbug

BUUNNI posted:

Is there a good book or manual that goes over data analysis and visualization using python? I’m not that good at reading the official documentation for Pandas and NumPy and stuff yet.

https://store.metasnake.com/effective-pandas-book is the best book on Pandas I know of.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Wes McKinney's (creator of pandas) book is available for free on his site: https://wesmckinney.com/book/

Chin Strap
Nov 24, 2002

I failed my TFLC Toxx, but I no longer need a double chin strap :buddy:
Pillbug

Zugzwang posted:

Wes McKinney's (creator of pandas) book is available for free on his site: https://wesmckinney.com/book/

It isn't as good though

StumblyWumbly
Sep 12, 2007

Batmanticore!

BUUNNI posted:

Thanks for the suggestions! Right now I'm looking for help dealing with creating scatterplots using PyPlot (for instance, creating a plot that shows 'UV Index vs. Population Density in World's 20 Largest Cities' with the dots proportionate to city population), and handling dataframes using Pandas, including creating data columns, creating numeric variable from column data, etc...

I've found that Plotly makes better graphs more easily compared to Matplotlib.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

Chin Strap posted:

It isn't as good though
I might check it out, thanks.

StumblyWumbly posted:

I've found that Plotly makes better graphs more easily compared to Matplotlib.
Plotly is really good for a lot of things.

Matplotlib's strength is its weakness; it can do everything. But its API can be cumbersome because of that. I like Seaborn for making the most common types of plots. It's nice because it's a wrapper around Matplotlib, so you can still add extra fancy customizations to its Figure objects if you want to.
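For the scatterplot asked about earlier in the thread, a plain Matplotlib version is not too bad. A rough sketch with placeholder numbers (the cities and values below are invented for illustration, not real measurements), which also shows creating a numeric column from string data:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder numbers purely for illustration, not real measurements.
df = pd.DataFrame({
    "city": ["City A", "City B", "City C"],
    "population": ["37,000,000", "21,000,000", "9,000,000"],  # strings, as CSVs often have them
    "area_km2": [13500.0, 1500.0, 1600.0],
    "uv_index": [6.0, 11.0, 4.0],
})

# Numeric variable from column data: strip thousands separators, then convert.
df["population_num"] = pd.to_numeric(df["population"].str.replace(",", "", regex=False))
# Derived column.
df["density"] = df["population_num"] / df["area_km2"]

fig, ax = plt.subplots()
# `s` is the marker area in points^2, so scale population down to something readable.
ax.scatter(df["density"], df["uv_index"], s=df["population_num"] / 100_000, alpha=0.6)
ax.set_xlabel("Population density (people per km²)")
ax.set_ylabel("UV index")
ax.set_title("UV Index vs. Population Density")
# fig.savefig("uv_vs_density.png") or plt.show() to look at the result
```

The divisor for `s` is arbitrary; tune it until the largest dot is a sensible size.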

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Speaking of plots and datavis/etc, I'm curious how people would approach this problem:

I've been playing around with Advent of Code stuff, which has a lot of problems involving 2d maps, often on cartesian planes (so X/Y can be negative) - I've struggled quite a bit with storing/handling these. Here's one example (note that probably storing the actual full grid is not the right solution for this puzzle, but it's indicative of the style of problem): https://adventofcode.com/2022/day/15 - I'm trying to get some stuff in order for this year's advent.

I've used numpy ndarray, forcing the indexes to work by offsetting things, but that was super weird to work with and my code became very hard to read. I've recently moved over to using a pandas dataframe with a custom index (as seen here: https://stackoverflow.com/questions/53494616/how-to-create-a-matrix-with-negative-index-position) but that has its own weirdnesses.

You might say 'well just store the points in a list!' but there's a fair number of these problems where you're doing neighbor lookups regularly, and having the full grid populated is actually very useful; not to mention debugging/checking your work becomes a lot easier when you have a fully populated grid. I actually threw together my own ndarray -> image function using PIL to turn an ndarray into a graphical representation based on the integer value, but I'm increasingly thinking I'm doing it wrong.

So the question: What's the correct (or at least painless) way to store (and ideally visualize) a 2d cartesian grid where each coordinate can be an arbitrary data type? I'm not looking for code golf style solutions, as I'm using these problems as ways to explore concepts/etc, and so it's more useful to have something that I can dig through and visualize/debug/interact with than a perfect impenetrable one-line solution.

Chin Strap
Nov 24, 2002

I failed my TFLC Toxx, but I no longer need a double chin strap :buddy:
Pillbug
https://docs.xarray.dev/en/latest/internals/how-to-create-custom-index.html maybe?

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Zugzwang posted:

I might check it out, thanks.

Plotly is really good for a lot of things.

Matplotlib's strength is its weakness; it can do everything. But its API can be cumbersome because of that. I like Seaborn for making the most common types of plots. It's nice because it's a wrapper around Matplotlib, so you can still add extra fancy customizations to its Figure objects if you want to.

ChatGPT with GPT4 does a decent job of producing very tailored matplotlib code over several prompts, if you end up in that edge case where seaborn and plotly don’t let you customize quite how you want.

I’ve done this when making specifically shaped and annotated histograms, for example.

StumblyWumbly
Sep 12, 2007

Batmanticore!

Falcon2001 posted:

Speaking of plots and datavis/etc, I'm curious how people would approach this problem:

I've been playing around with Advent of Code stuff, which has a lot of problems involving 2d maps, often on cartesian planes (so X/Y can be negative) - I've struggled quite a bit with storing/handling these. Here's one example (note that probably storing the actual full grid is not the right solution for this puzzle, but it's indicative of the style of problem): https://adventofcode.com/2022/day/15 - I'm trying to get some stuff in order for this year's advent.

I've used numpy ndarray, forcing the indexes to work by offsetting things, but that was super weird to work with and my code became very hard to read. I've recently moved over to using a pandas dataframe with a custom index (as seen here: https://stackoverflow.com/questions/53494616/how-to-create-a-matrix-with-negative-index-position) but that has its own weirdnesses.

You might say 'well just store the points in a list!' but there's a fair number of these problems where you're doing neighbor lookups regularly, and having the full grid populated is actually very useful; not to mention debugging/checking your work becomes a lot easier when you have a fully populated grid. I actually threw together my own ndarray -> image function using PIL to turn an ndarray into a graphical representation based on the integer value, but I'm increasingly thinking I'm doing it wrong.

So the question: What's the correct (or at least painless) way to store (and ideally visualize) a 2d cartesian grid where each coordinate can be an arbitrary data type? I'm not looking for code golf style solutions, as I'm using these problems as ways to explore concepts/etc, and so it's more useful to have something that I can dig through and visualize/debug/interact with than a perfect impenetrable one-line solution.

Have you tried using DataFrame.iloc calls with Pandas? That can let you have weirdo indices and also check neighbors
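One thing worth keeping straight when trying this: `.loc` looks cells up by label, so it is the one that understands negative coordinates, while `.iloc` looks them up by position. A small sketch, assuming a 5x5 grid labelled from -2 to 2 on both axes; `neighbors` is a helper written for this example:

```python
import pandas as pd

# A 5x5 grid whose row/column labels run from -2 to 2 (y down the rows, x across).
coords = range(-2, 3)
grid = pd.DataFrame(".", index=coords, columns=coords)

grid.loc[-1, 2] = "#"    # label-based: y = -1, x = 2
print(grid.iloc[1, 4])   # position-based: second row, fifth column, the same cell


def neighbors(df, y, x):
    """Values of the up/down/left/right neighbours that exist on the grid."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    return [
        df.loc[y + dy, x + dx]
        for dy, dx in steps
        if (y + dy) in df.index and (x + dx) in df.columns
    ]


print(neighbors(grid, -1, 1))
print(grid)  # the whole frame prints as a labelled grid, handy for debugging
```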

Zoracle Zed
Jul 10, 2001

Falcon2001 posted:

So the question: What's the correct (or at least painless) way to store (and ideally visualize) a 2d cartesian grid where each coordinate can be an arbitrary data type? I'm not looking for code golf style solutions, as I'm using these problems as ways to explore concepts/etc, and so it's more useful to have something that I can dig through and visualize/debug/interact with than a perfect impenetrable one-line solution.

a dict or defaultdict with a tuple of ints as the key type would be my first try

eta: not great for visualization though
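A minimal sketch of the dict approach, with a small text renderer to take some of the sting out of the visualization problem. The `(x, y)` key order, the `"."` default, and the helper names `neighbors` and `render` are all choices made for this example:

```python
from collections import defaultdict

# Sparse grid: only cells that were written exist; everything else reads as ".".
grid = defaultdict(lambda: ".")
grid[(-2, 1)] = "S"   # keys are (x, y) tuples, negatives are fine
grid[(0, 0)] = "B"
grid[(1, -1)] = "#"


def neighbors(grid, x, y):
    """Four orthogonal neighbours; .get avoids creating entries as a side effect."""
    return [grid.get((x + dx, y + dy), ".") for dx, dy in ((0, -1), (0, 1), (-1, 0), (1, 0))]


def render(grid):
    """Plain-text dump of the bounding box, smallest y at the top."""
    xs = [x for x, _ in grid]
    ys = [y for _, y in grid]
    return "\n".join(
        "".join(str(grid.get((x, y), ".")) for x in range(min(xs), max(xs) + 1))
        for y in range(min(ys), max(ys) + 1)
    )


print(render(grid))
# ...#
# ..B.
# S...
```

Values can be any type, since `render` only calls `str()` on them. It assumes at least one cell has been set; an empty grid would make `min()` fail.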
