|
KICK BAMA KICK posted: Do you guys have any tips for using Python to do insane amounts of crime
It's comforting to know that if I ever decide to turn to a life of crime other than y'know, working in big tech, that I won't have to learn javascript first.
|
# ? Oct 8, 2023 06:06 |
|
Presto posted: from crimes import financial_fraud
from sam import bankman as fraud
|
# ? Oct 8, 2023 06:23 |
|
pretend i remade that "the americans spent millions of dollars inventing a pen that would work in space. the soviets used a pencil" meme but for hedge funds committing large scale financial fraud
|
# ? Oct 8, 2023 10:14 |
Falcon2001 posted: It's comforting to know that if I ever decide to turn to a life of crime other than y'know, working in big tech, that I won't have to learn javascript first.
Want to write ethereum smart contracts? Vyper (python-like) is stillborn in favor of solidity, and brownie (python web3 wrapper) is abandoned in favor of Ape (python but broken)
What does this have to do with crime you ask? Oh nothing
Data Graham fucked around with this message at 14:25 on Oct 11, 2023 |
|
# ? Oct 8, 2023 13:00 |
|
Presto posted: from crimes import financial_fraud
You fool, you should do
from not_crimes import totally_not_fraud
|
# ? Oct 8, 2023 15:19 |
|
while True: yield crime
|
# ? Oct 8, 2023 15:58 |
Macichne Leainig posted:You fool, you should do
|
|
# ? Oct 8, 2023 16:09 |
|
Are there any modern libraries that are sort of in the same vein as Pydantic, but not harmful and bad from trying to smush every data serialization/deserialization/validation/normalization concern together in the same bathroom sink? Plain dataclasses are bad. Marshmallow is bad. Mashumaro looks promising, but I haven't gone too deep yet.
Vulture Culture fucked around with this message at 22:26 on Oct 10, 2023 |
# ? Oct 10, 2023 22:21 |
|
Vulture Culture posted: Are there any modern libraries that are sort of in the same vein as Pydantic, but not harmful and bad from trying to smush every data serialization/deserialization/validation/normalization concern together in the same bathroom sink? Plain dataclasses are bad. Marshmallow is bad. Mashumaro looks promising, but I haven't gone too deep yet.
Have a look at attrs. https://www.attrs.org/en/stable/why.html#data-classes
|
# ? Oct 10, 2023 23:26 |
|
Vulture Culture posted: Are there any modern libraries that are sort of in the same vein as Pydantic, but not harmful and bad from trying to smush every data serialization/deserialization/validation/normalization concern together in the same bathroom sink? Plain dataclasses are bad. Marshmallow is bad. Mashumaro looks promising, but I haven't gone too deep yet.
It might help if you explained more about the problem you have with those solutions.
|
# ? Oct 11, 2023 19:39 |
|
python novice, so this should be easy. i have 25 simple python scripts that each run a distinct query on a database and export them to individual csv. i then have another 'orchestration' python script that aims to run all those, then upload the files to s3. i've got the upload working just fine, but i'm not sure how to run all of those python scripts from within the orchestration script. basically, i'd want the os module to run through the directory containing all of the scripts, put them in a list or something, then run each individually and sequentially. once done, it then moves on to the upload process. so...how do i go about running all of those scripts like this? i can easily get the filepaths for each. is there a simple way to run them from within this orchestration script?
|
# ? Oct 11, 2023 21:46 |
|
Maybe take the code driving each script and put it into a function, then run each of those functions, sending the result to s3 (or wherever)?
|
# ? Oct 11, 2023 22:06 |
|
You can structure each "script" so that it can also be imported. I'm not sure how they're structured now, but get each script into a form where everything runs from a single function call. Then at the end add if __name__ == "__main__": and have it call that function. That way the file still runs when you execute it as a script, but you can also import the function from another file and call it there.
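A minimal sketch of that pattern, assuming one of the query scripts is called export_orders.py and its work can live in a function named main() (both names are made up, and the rows are placeholders standing in for the real database query):

```python
"""export_orders.py: a hypothetical name for one of the 25 query scripts."""
import csv
from pathlib import Path


def main(out_dir: Path = Path(".")) -> Path:
    """Stand-in for 'run a query and export it to CSV'."""
    # Placeholder rows; the real script would fetch these from the database.
    rows = [("id", "total"), (1, 9.99), (2, 24.50)]
    out_path = Path(out_dir) / "orders.csv"
    with out_path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    return out_path


if __name__ == "__main__":
    # Runs for `python export_orders.py`, but not for `import export_orders`.
    print(f"wrote {main()}")
```

Returning the output path is optional, but it gives the calling script something to hand to the upload step.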
|
# ? Oct 11, 2023 22:12 |
|
^^ e: that's a much better answer than mine haha
abelwingnut posted: python novice, so this should be easy.
Also a novice, so take this with a grain of salt. I'm also open to critique of my suggestion.
This is a situation where I'd probably start using classes, or at the very least refactor everything that needs it into functions. Then you can import e.g. module_x.py, or even just the specific functions that you need, and run them from the main script.
|
# ? Oct 11, 2023 22:16 |
|
Are there any decent regex generators for Python? I'm talking about a library that has functions for constructing common string searches in a "pythonic way" and outputs a raw regex pattern. Terrible written-on-phone example: Python code:
Similarly, any decent "reverse engineering" regex tools in python? Like a semi supervised ML-based library that lets you input a bunch of similar strings and you can train a model to generate a regex string to parse it? Oysters Autobio fucked around with this message at 15:32 on Oct 12, 2023 |
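To make the "pythonic regex builder" idea concrete, here is a hand-rolled sketch using only the standard re module. It is not any particular library's API: the helpers literal, digits, any_of and named are invented for illustration, and so is the ID format being matched.

```python
import re


def literal(text: str) -> str:
    """Match text exactly, escaping regex metacharacters."""
    return re.escape(text)


def digits(count: int) -> str:
    """Match exactly `count` digits."""
    return rf"\d{{{count}}}"


def any_of(*options: str) -> str:
    """Match any one of the given sub-patterns."""
    return "(?:" + "|".join(options) + ")"


def named(name: str, pattern: str) -> str:
    """Capture `pattern` in a named group."""
    return f"(?P<{name}>{pattern})"


# Compose a raw pattern for made-up IDs like "INV-2023-0042" or "PO-2023-0007".
pattern = (
    named("kind", any_of(literal("INV"), literal("PO")))
    + literal("-")
    + named("year", digits(4))
    + literal("-")
    + named("seq", digits(4))
)

match = re.fullmatch(pattern, "INV-2023-0042")
```

Because every helper just returns a string, the result can be passed straight to re.compile or pasted into another tool.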
# ? Oct 12, 2023 15:27 |
|
abelwingnut posted: python novice, so this should be easy.
The term you're looking for is 'imports'. The simplest setup is to put all the scripts in a single folder, so that your orchestration script can import the others directly (Python puts the directory of the script you run on the import path). The more complex approach is to add them to your PYTHONPATH, but if you're brand new this might be confusing or lead you to do odd things, so probably just stick with 'get 'em all in a folder'. If they're all in the same folder, then next follow what this poster suggests:
Son of Thunderbeast posted: ^^ e: that's a much better answer than mine haha
Notably, I'd recommend sticking to functions if that's the main way you're doing it. If you can upload these (or at least one of them) to a GitHub gist, it'd be easier to recommend specifically how to refactor them, but the absolute simplest thing to do would be something like this: Python code:
Then from your orchestration script in the same directory: Python code:
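A minimal sketch of how the orchestration side could look, assuming every script in the folder has been refactored to expose a main() function. The names load_module, run_all and orchestrate.py are made up for illustration, and the upload step is left out since that part already works.

```python
import importlib.util
from pathlib import Path


def load_module(path: Path):
    """Import a .py file by its path and return the module object."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def run_all(script_dir: Path, skip: str = "orchestrate.py") -> list[str]:
    """Call main() in every script in script_dir, one at a time, in name order."""
    finished = []
    for path in sorted(Path(script_dir).glob("*.py")):
        if path.name == skip:
            continue  # don't re-run the orchestrator itself
        load_module(path).main()  # an exception here stops the run before any upload
        finished.append(path.name)
    return finished
```

From the orchestration script, run_all(Path(__file__).parent) would run everything sitting next to it, after which the S3 upload can start. If the scripts can't be refactored, calling subprocess.run([sys.executable, str(path)], check=True) for each path is the blunt alternative: it runs each file as its own process and raises if one exits with an error.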
|
# ? Oct 12, 2023 17:06 |
|
Falcon2001 posted:It might help if you explained more about the problem you have with those solutions.
A good example of where Pydantic falls down is on both implicit and explicit conversions. When you're interpreting JSON input with a defined representation, usually the standard will define things like a specific datetime format or a particular method of encoding binary data. It handles these kinds of use cases so badly that I'd rather not even try Vulture Culture fucked around with this message at 17:26 on Oct 12, 2023 |
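For comparison, a stdlib-only sketch of keeping those conversions explicit at the boundary instead of leaving them to a validation library. The wire format here is an assumption made up for the example (a UTC timestamp written as YYYY-MM-DDTHH:MM:SSZ and a base64 payload), as are the Upload class and both function names.

```python
import base64
from dataclasses import dataclass
from datetime import datetime, timezone

WIRE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed format, always UTC


@dataclass(frozen=True)
class Upload:
    """Domain object: real types only, no knowledge of the wire format."""
    created_at: datetime
    payload: bytes


def upload_from_wire(doc: dict) -> Upload:
    """Decode the assumed wire format; raises ValueError on malformed input."""
    created = datetime.strptime(doc["created_at"], WIRE_TIME_FORMAT)
    return Upload(
        created_at=created.replace(tzinfo=timezone.utc),
        payload=base64.b64decode(doc["payload"], validate=True),
    )


def upload_to_wire(obj: Upload) -> dict:
    """Encode back to the same wire format."""
    return {
        "created_at": obj.created_at.astimezone(timezone.utc).strftime(WIRE_TIME_FORMAT),
        "payload": base64.b64encode(obj.payload).decode("ascii"),
    }
```

The trade-off is more boilerplate per type, in exchange for every conversion being visible and testable on its own.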
# ? Oct 12, 2023 17:20 |
|
Vulture Culture posted: I'm mostly looking for inspo on how people are solving the problem of taking in data, based on some kind of protocol or standard, and shuttling it around the different intermediate domain representations in their programs/services. Some of the key problems that bug me day to day are:
Ahh okay. I wasn't sure from your original post what side of complexity you were irritated by. This all sounds pretty reasonable, but I don't have much in the way of useful answers for you based on this, since I haven't had to solve something this complex - all the web responses I work with have corresponding libraries for the most part, so there's already a (decent, not great) solution.
|
# ? Oct 12, 2023 17:51 |
|
thank you for the help! i'll see what comes of all this and follow up if i need some additional pointers.
|
# ? Oct 12, 2023 19:30 |
|
This is maybe more general than just Python, but figured I'd ask since implementation might be specific. Some background: I'm working on a scheduling app for developing oncall rotations or other things. A simple view of this ties it to specific days, but as soon as you add timezones/etc that gets a lot more complicated, and this ties into something I've failed to figure out before. What's the right pattern / name / etc for dealing with a series of things that occur during periods of time? For example, here's some basic pseudocode: Python code:
|
# ? Oct 15, 2023 20:26 |
|
Gantt chart? Or you want something that covers stored values that should not overlap?
|
# ? Oct 15, 2023 21:24 |
|
The most obvious approach imo is to keep the shift+user data in database tables and have your app run queries against it. Is there any reason that wouldn’t work?
|
# ? Oct 15, 2023 21:26 |
|
DoctorTristan posted: The most obvious approach imo is to keep the shift+user data in database tables and have your app run queries against it. Is there any reason that wouldn’t work?
I should probably explain a bit more then: the goal isn't to run something like PagerDuty/etc, it's to be able to quickly generate and manipulate a schedule that can then be exported to various formats for import to real pager management systems (or just print it out, whatever works.) Both at my current bigTech company and my previous one, we had systems for managing oncall schedules, but the actual 'build something to review' ended up being done in excel because the UI for the pager management system was terrible, so I wanted something more visually focused so you could quickly play around with things like 'oh what if we lengthened this rotation' or whatever.
Because of this, long-term storage is not really necessary or useful, so I'm hesitant to use a DB, unless there's some in-memory option that would be convenient. I also really haven't worked much with databases other than key/value ones like Dynamo. Additionally, even a super long, complex schedule is likely going to have maybe hundreds of rows and not thousands or anything crazier, and I think many teams would have something on the range of dozens of entries.
StumblyWumbly posted: Gantt chart? Or you want something that covers stored values that should not overlap?
I'm thinking of more like what data structure to use for access/storage - UI display is probably best done as a calendar view but that's an abstraction over the basic data set, at least in my view. I can probably look up how to render Gantt charts to get a feel for the approach though.
|
# ? Oct 15, 2023 21:37 |
|
It’s more than you need, but polars or pandas should cover it
|
# ? Oct 15, 2023 22:12 |
|
So it sounds like your data is a small collection of records with no general inter-record relationships other than ‘I want to consider these things together’. I’d just keep that as a bunch of dataclasses/namedtuples in a list unless/until you run into a reason not to (nothing you have said above seems like a reason not to). One caveat to that is if you’re planning on using any of the numerous python dataviz frameworks then you’ll probably want to keep the data in a pandas dataframe instead since that’s what those frameworks expect.
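A rough sketch of the "dataclasses in a list" approach for an oncall rotation, assuming timezone-aware datetimes throughout so comparisons stay correct across zones. Shift, build_rotation and on_call_at are hypothetical names, and the people and dates are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Shift:
    person: str
    start: datetime  # inclusive, timezone-aware
    end: datetime    # exclusive, timezone-aware


def build_rotation(people, first_start, length):
    """Hand out back-to-back shifts of `length`, one per person, in order."""
    shifts, start = [], first_start
    for person in people:
        shifts.append(Shift(person, start, start + length))
        start += length
    return shifts


def on_call_at(shifts, moment):
    """Return whoever covers `moment`, or None if there is a gap."""
    for shift in shifts:
        if shift.start <= moment < shift.end:
            return shift.person
    return None


start = datetime(2023, 10, 16, 9, 0, tzinfo=timezone.utc)
schedule = build_rotation(["alice", "bob", "carol"], start, timedelta(days=7))
```

With at most a few hundred shifts, the linear scan in on_call_at is plenty fast, and "what if we lengthened this rotation" is just another call to build_rotation with a different timedelta.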
|
# ? Oct 15, 2023 22:27 |
|
Many languages support the concept of a RangeMap, which you can insert Ranges (essentially, pairs of a start value and an end value) into and then look up single values in. Would that be useful for what you're trying to do?
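Python's standard library has no RangeMap, but a small one is easy to sketch on top of bisect. This version is an illustration rather than a drop-in library: it assumes half-open ranges [start, end) and rejects overlaps, and it works for any comparable keys, including datetimes.

```python
import bisect


class RangeMap:
    """Map half-open ranges [start, end) to values; ranges must not overlap."""

    def __init__(self):
        self._starts = []   # sorted range starts
        self._entries = []  # (start, end, value), in the same order as _starts

    def insert(self, start, end, value):
        if not start < end:
            raise ValueError("range must have start < end")
        i = bisect.bisect_left(self._starts, start)
        # Only the neighbours on either side can overlap the new range.
        if i > 0 and self._entries[i - 1][1] > start:
            raise ValueError("overlaps the previous range")
        if i < len(self._starts) and self._starts[i] < end:
            raise ValueError("overlaps the next range")
        self._starts.insert(i, start)
        self._entries.insert(i, (start, end, value))

    def get(self, key, default=None):
        """Return the value whose range contains key, else default."""
        i = bisect.bisect_right(self._starts, key) - 1
        if i >= 0:
            start, end, value = self._entries[i]
            if start <= key < end:
                return value
        return default
```

Lookups are a binary search, so this scales well past the sizes discussed here; the main benefit at small sizes is that overlapping shifts are caught at insert time.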
|
# ? Oct 15, 2023 23:21 |
|
Is there a good book or manual that goes over data analysis and visualization using python? I’m not that good at reading the official documentation for Pandas and NumPy and stuff yet.
|
# ? Oct 18, 2023 15:08 |
|
I find that pandas' Getting Started and User Guide pages are good starting points, depending on the kind of data analysis task you are trying to perform. The "Getting Started" stuff is much more fundamental, like how do you load in a CSV, run summary statistics, and draw a plot? The User Guide covers more specific stuff, like what if your data is categorical and not numerical, how can you encode it, etc. What kind of walkthrough are you looking for specifically?
|
# ? Oct 18, 2023 16:41 |
|
Macichne Leainig posted: I find that Panda's Getting Started and User Guides are good starting points depending on the kind of data analysis task you are trying to perform.
Thanks for the suggestions! Right now I'm looking for help dealing with creating scatterplots using PyPlot (for instance, creating a plot that shows 'UV Index vs. Population Density in World's 20 Largest Cities' with the dots proportionate to city population), and handling dataframes using Pandas, including creating data columns, creating numeric variable from column data, etc...
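A small sketch of that kind of plot, assuming pandas and matplotlib are installed. The three rows are made-up placeholders, not real measurements, and the column names and output filename are likewise invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder rows, NOT real data.
df = pd.DataFrame({
    "city": ["City A", "City B", "City C"],
    "population": ["37,400,000", "21,800,000", "9,500,000"],  # text, as in a raw CSV
    "area_km2": [13500, 1500, 1600],
    "uv_index": [6.0, 11.0, 4.0],
})

# Build a numeric column from text, then derive a new column from two others.
df["population"] = pd.to_numeric(df["population"].str.replace(",", ""))
df["density"] = df["population"] / df["area_km2"]

fig, ax = plt.subplots()
# `s` is the marker area in points squared, so scale population to a sensible range.
sizes = df["population"] / df["population"].max() * 600
ax.scatter(df["density"], df["uv_index"], s=sizes, alpha=0.6)
for row in df.itertuples():
    ax.annotate(row.city, (row.density, row.uv_index))
ax.set_xlabel("Population density (people per km²)")
ax.set_ylabel("UV index")
ax.set_title("UV index vs. population density")
fig.savefig("uv_vs_density.png")
```

With a real dataset, the DataFrame literal would be replaced by pd.read_csv on your own file; the rest stays the same shape.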
|
# ? Oct 18, 2023 19:47 |
|
BUUNNI posted: Thanks for the suggestions! Right now I'm looking for help dealing with creating scatterplots using PyPlot (for instance, creating a plot that shows 'UV Index vs. Population Density in World's 20 Largest Cities' with the dots proportionate to city population), and handling dataframes using Pandas, including creating data columns, creating numeric variable from column data, etc...
oh hey, you're me back in January. here's what got me started: https://youtube.com/playlist?list=PL9oKUrtC4VP7ry0um1QOUUfJBXKnkf-dA&si=w5PNghLnrayuJC0R
that channel also has a numpy playlist. I can't claim to have really absorbed it all, but watching them a couple times helped a lot
|
# ? Oct 18, 2023 20:04 |
|
BUUNNI posted: Is there a good book or manual that goes over data analysis and visualization using python? I’m not that good at reading the official documentation for Pandas and NumPy and stuff yet.
https://store.metasnake.com/effective-pandas-book is the best book on Pandas I know of.
|
# ? Oct 18, 2023 20:12 |
|
Wes McKinney's (creator of pandas) book is available for free on his site: https://wesmckinney.com/book/
|
# ? Oct 18, 2023 20:53 |
|
Zugzwang posted: Wes McKinney's (creator of pandas) book is available for free on his site: https://wesmckinney.com/book/
It isn't as good though
|
# ? Oct 18, 2023 21:04 |
|
BUUNNI posted: Thanks for the suggestions! Right now I'm looking for help dealing with creating scatterplots using PyPlot (for instance, creating a plot that shows 'UV Index vs. Population Density in World's 20 Largest Cities' with the dots proportionate to city population), and handling dataframes using Pandas, including creating data columns, creating numeric variable from column data, etc...
I've found that Plotly makes better graphs more easily compared to Matplotlib.
|
# ? Oct 19, 2023 00:11 |
|
Chin Strap posted: It isn't as good though
StumblyWumbly posted: I've found that Plotly makes better graphs more easily compared to Matplotlib.
Matplotlib's strength is its weakness; it can do everything. But its API can be cumbersome because of that. I like Seaborn for making the most common types of plots. It's nice because it's a wrapper around Matplotlib, so you can still add extra fancy customizations to its Figure objects if you want to.
|
# ? Oct 19, 2023 01:35 |
|
Speaking of plots and datavis/etc, I'm curious how people would approach this problem: I've been playing around with Advent of Code stuff, which has a lot of problems involving 2d maps, often on cartesian planes (so X/Y can be negative), and I've struggled quite a bit with storing/handling these. Here's one example (note that storing the actual full grid is probably not the right solution for this puzzle, but it's indicative of the style of problem): https://adventofcode.com/2022/day/15 - I'm trying to get some stuff in order for this year's advent.
I've used a numpy ndarray, forcing the indexes to work by offsetting things, but that was super weird to work with and my code became very hard to read. I've recently moved over to using a pandas dataframe with a custom index (as seen here: https://stackoverflow.com/questions/53494616/how-to-create-a-matrix-with-negative-index-position) but that has its own weirdnesses.
You might say 'well just store the points in a list!' but there's a fair number of these problems where you're doing neighbor lookups regularly, and having the full grid populated is actually very useful; not to mention debugging/checking your work becomes a lot easier when you have a fully populated grid. I actually threw together my own ndarray -> image function using PIL to turn an ndarray into a graphical representation based on the integer value, but I'm increasingly thinking I'm doing it wrong.
So the question: What's the correct (or at least painless) way to store (and ideally visualize) a 2d cartesian grid where each coordinate can be an arbitrary data type? I'm not looking for code golf style solutions, as I'm using these problems as ways to explore concepts/etc, and so it's more useful to have something that I can dig through and visualize/debug/interact with than a perfect impenetrable one-line solution.
|
# ? Oct 19, 2023 08:46 |
|
https://docs.xarray.dev/en/latest/internals/how-to-create-custom-index.html maybe?
|
# ? Oct 19, 2023 09:02 |
|
Zugzwang posted: I might check it out, thanks.
ChatGPT with GPT-4 does a decent job with very tailored matplotlib code built up over several prompts, if you end up in that edge case where seaborn and plotly don’t let you customize quite how you want. I’ve done this when making specifically shaped and annotated histograms, for example.
|
# ? Oct 19, 2023 12:10 |
|
Falcon2001 posted: Speaking of plots and datavis/etc, I'm curious how people would approach this problem:
Have you tried using DataFrame.iloc calls with Pandas? That can let you have weirdo indices and also check neighbors
|
# ? Oct 19, 2023 15:06 |
|
Falcon2001 posted: So the question: What's the correct (or at least painless) way to store (and ideally visualize) a 2d cartesian grid where each coordinate can be an arbitrary data type? I'm not looking for code golf style solutions, as I'm using these problems as ways to explore concepts/etc, and so it's more useful to have something that I can dig through and visualize/debug/interact with than a perfect impenetrable one-line solution.
a dict or defaultdict with a tuple of ints as the key type would be my first try
eta: not great for visualization though
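A short sketch of the dict-keyed-by-tuple idea, with a plain-text renderer to take some of the sting out of the visualization problem. The cell contents, the neighbors helper and the render helper are all made up for illustration; y is drawn increasing downward, which is an arbitrary choice.

```python
from collections import defaultdict

# Sparse grid keyed by (x, y). Note that grid[pos] on a missing cell returns "."
# AND stores it; use grid.get(pos, ".") to peek without growing the dict.
grid = defaultdict(lambda: ".")
grid[(-2, 1)] = "S"  # negative coordinates need no offsetting
grid[(0, 0)] = "B"
grid[(1, -1)] = "#"


def neighbors(x, y):
    """Yield the four orthogonal neighbours of (x, y)."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        yield (x + dx, y + dy)


def render(cells, default="."):
    """Draw the bounding box of all stored cells as text, y increasing downward."""
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    rows = []
    for y in range(min(ys), max(ys) + 1):
        rows.append("".join(str(cells.get((x, y), default))
                            for x in range(min(xs), max(xs) + 1)))
    return "\n".join(rows)


print(render(grid))
```

Values can be any type, since only the keys need to be hashable; for the text view it helps if str() of each value is a single character.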
|
# ? Oct 19, 2023 15:23 |