|
Falcon2001 posted:
Speaking of which, and this isn't really a Python question, but is there an open source CSV editor that has some of the functionality of excel without all the...overhead? It'd be nice to find something I can use to just muck around quickly with tabular data without having to constantly be like 'no, don't accidentally make it an xlsx, stop formatting it' etc - not that it's a super big problem or anything.

My opinion is a bit of a contrary one and probably a bad one to boot, but just stick with Excel because it's one of the few competent tabular data editors and that's probably what everyone else will be using.
|
# ? Nov 7, 2023 18:06 |
|
Unless you have anything that looks like a date string in your cells. I used an open source csv editor once a long time ago, it did not have a smaller footprint than excel, and was worse in almost every way. I feel like if excel is not going to work, the best option is to load it into a dict or a dataframe, do what you need to do in a jupyter notebook, then generate a new csv.
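A minimal sketch of that load, edit, and re-export round trip, assuming the standard-library `csv` module rather than a dataframe. The sample data, the column names, and the upper-casing edit are made up for illustration; in practice the text would come from a file opened with `newline=""`.

```python
import csv
import io

# Hypothetical sample data standing in for a real CSV file.
raw = "name,joined\nAlice,10-50\nBob,03-12\n"

# Every value is read as a plain string, so "10-50" is never turned into a date.
rows = list(csv.DictReader(io.StringIO(raw)))

# Example edit: upper-case the names.
for row in rows:
    row["name"] = row["name"].upper()

# Write the edited rows back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "joined"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
result = out.getvalue()
print(result)
```

Because nothing here guesses at types, date-looking strings come out exactly as they went in.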
|
# ? Nov 7, 2023 18:13 |
|
Falcon2001 posted:
Speaking of which, and this isn't really a Python question, but is there an open source CSV editor that has some of the functionality of excel without all the...overhead? It'd be nice to find something I can use to just muck around quickly with tabular data without having to constantly be like 'no, don't accidentally make it an xlsx, stop formatting it' etc - not that it's a super big problem or anything.

VSCode and PyCharm both have plugins that make csv editing less painful.
|
# ? Nov 7, 2023 18:40 |
|
DoctorTristan posted:VSCode and PyCharm both have plugins that make csv editing less painful Maybe this is the best answer, I'm not working with massive csvs or anything right now.
|
# ? Nov 7, 2023 20:20 |
|
The Fool posted:Unless you have anything that looks like a date string in your cells. To be that guy... If you are using Excel in any serious capacity, you should already know about this one critical weakness of it. Of course it's still going to bite you in the rear end anyway
|
# ? Nov 7, 2023 20:28 |
|
If you work in Genomics for the love of all that is good in the world, stay away from Excel
|
# ? Nov 7, 2023 22:35 |
|
Macichne Leainig posted:To be that guy... I still don't know how my data had "17:04 AM" in a timestamp, but I'm blaming Excel.
|
# ? Nov 8, 2023 00:40 |
|
They're making it so you can turn off that "feature" pretty soon. No, Excel, I did not mean "October 1, 1950" when I entered "10-50"
|
# ? Nov 8, 2023 00:46 |
|
Falcon2001 posted:
Speaking of which, and this isn't really a Python question, but is there an open source CSV editor that has some of the functionality of excel without all the...overhead? It'd be nice to find something I can use to just muck around quickly with tabular data without having to constantly be like 'no, don't accidentally make it an xlsx, stop formatting it' etc - not that it's a super big problem or anything.

https://www.moderncsv.com/

i generally just use excel tho
|
# ? Nov 8, 2023 07:29 |
|
Foxfire_ posted:
Does human editable mean 'editable by a programmer' (use something standard like everyone else has said) or 'editable by Bob from Marketing who needs explicit very friendly error messages on typos'?

Users are people with some level of familiarity editing yaml files, whom we're going to provide with tools to automate away some of their work. I'm going to get away with my favorite solution, which is 'not doing anything about this just yet', though.
|
# ? Nov 8, 2023 19:33 |
|
Apparently the python library oauth2 is actually an oauth v1 library. Well that was a frustrating hour wasted. Only found out due to a stack overflow post. Just felt like writing something out.
|
# ? Nov 9, 2023 12:20 |
|
His Divine Shadow posted:Apparently the python library oauth2 is actually an oauth v1 library. Well that was a frustrating hour wasted. Only found out due to a stack overflow post. pyoidc is a solid implementation if you're looking for a library rather than a full batteries-included framework for Django, FastAPI, etc.
|
# ? Nov 9, 2023 16:10 |
|
The real PITA is dealing with the backend stuff on Microsoft's crappy Azure environment. But I'm not feeling kindly disposed towards oauth in general at the moment, feels like total overkill for my purposes.
|
# ? Nov 9, 2023 16:39 |
|
His Divine Shadow posted:
The real PITA is dealing with the backend stuff on Microsoft's crappy Azure environment. But I'm not feeling kindly disposed towards oauth in general at the moment, feels like total overkill for my purposes.
|
# ? Nov 9, 2023 17:08 |
|
I was curious to see how different people would solve this problem that I received in one of my exams recently...

quote:
A random walk is a time series where the next value of the variable is equal to the previous value of the variable plus a random number with mean 0. Generate a Normally distributed random walk with a starting value of 0 as a Python list. A stopping time is a condition under which a time series stops generating new values. Make your random walk stop generating new values when its absolute value reaches three.
|
# ? Nov 10, 2023 19:35 |
|
I'm guessing numpy isn't allowed, because I think you could just do np.random.normal(0, 1) and otherwise the trick would be to just use Python's negative array indexers to add it to the previous value of the array, right? And the final check is a real simple if absolute value == 3
|
# ? Nov 10, 2023 21:45 |
|
It's frustrating because it almost fits in a comprehension, but I think the end condition means it won't.
|
# ? Nov 10, 2023 23:00 |
|
Macichne Leainig posted:
I'm guessing numpy isn't allowed, because I think you could just do np.random.normal(0, 1) and otherwise the trick would be to just use Python's negative array indexers to add it to the previous value of the array, right?

I think you would be dealing with floating point values, so abs(x) >= 3, but yeah.

I would define a generator. It contains a while loop that computes a new random step from a Gaussian centered at "a" with "sigma" width and adds that value to the previous value, which I guess is probably initialized to "b=a" but could be anything. The loop yields each new value if its absolute value is less than input argument "threshold" otherwise the loop just ends. This generator gets used in a list comprehension.

E: oh it's a stopping time, not a stopping distance. That's even easier then, the generator can just yield forever and the list comprehension is where you build in the time constraint (it isn't even an input to the walk, it's an external stopping condition). But then specifying that an absolute value is needed doesn't make sense. OP, did you quote the problem correctly?

QuarkJets fucked around with this message at 23:36 on Nov 10, 2023 |
# ? Nov 10, 2023 23:24 |
|
QuarkJets posted:
Yes, that's exactly how it was stated in the exam. By "generator" do you mean just a range() function?

BUUNNI fucked around with this message at 01:01 on Nov 11, 2023 |
# ? Nov 11, 2023 00:43 |
|
Are there any restrictions on libraries, like could I just import random for the random number generation? Or are you supposed to write that too? (Sorry, dunno what these kinds of tests are like, never taken a class)
|
# ? Nov 11, 2023 01:06 |
|
The instructor did not say that certain libraries are not allowed, so I imagine it's cool. FWIW I think like 90% of the class failed the exam lol
|
# ? Nov 11, 2023 01:19 |
|
BUUNNI posted:Yes, that's exactly how it was stated in the exam. Generator functions in python can be identified by the "yield" keyword as a return instead of er, well, return. It's actually a perfect application for something like this because "yielding" lets the generator function retain stuff in memory, so you can just define your value as a variable inside the generator function and add to that every time you yield. Something like this thanks to ChatGPT: Python code:
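A minimal sketch of a generator along those lines, not the code originally posted. It assumes standard-library `random.gauss` for the steps and that the value which crosses the threshold is kept as the last element of the list; the names `random_walk`, `sigma`, and `threshold` are illustrative.

```python
import random


def random_walk(start=0.0, mu=0.0, sigma=1.0, threshold=3.0):
    """Yield walk values until the absolute value reaches the threshold."""
    value = start
    yield value
    # The generator keeps `value` alive between yields, so each step
    # simply adds a fresh Gaussian draw to it.
    while abs(value) < threshold:
        value += random.gauss(mu, sigma)
        yield value


# The exam asks for a list, so the generator is drained into one.
walk = list(random_walk())
print(len(walk), walk[-1])
```

Whether the crossing value belongs in the list is a reading of the prompt; moving the `yield` inside an `if abs(value) < threshold` check would drop it instead.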
|
# ? Nov 11, 2023 01:36 |
|
oh drat, nice! I got this far before giving up, it kept giving me pretty even distributions and I had no idea how to make it into a normal distribution.
code:
e: Looking at yours, I may have misunderstood the time/absolute value thing
|
# ? Nov 11, 2023 02:11 |
|
I believe stopping time in this context simply means the number of steps taken. Obnoxious itertools solution:
Python code:
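A sketch of what an itertools-only version could look like, offered as an assumption rather than the code originally posted. It relies on `itertools.accumulate` with the `initial` keyword (Python 3.8+) and `itertools.takewhile`; the names `steps`, `walk`, and `stopping_time` are illustrative.

```python
import random
from itertools import accumulate, repeat, takewhile

# Endless stream of standard-normal steps.
steps = (random.gauss(0, 1) for _ in repeat(None))

# Running sum starting at 0, cut off at the first value with |x| >= 3.
# takewhile discards that crossing value, so every element kept is inside (-3, 3).
walk = list(takewhile(lambda x: abs(x) < 3, accumulate(steps, initial=0)))

# The list holds the start plus every step that stayed in bounds, so its
# length equals the number of steps it took to cross the threshold.
stopping_time = len(walk)
print(stopping_time)
```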
Zoracle Zed fucked around with this message at 19:42 on Nov 11, 2023 |
# ? Nov 11, 2023 19:39 |
|
The walk distribution needs to be a list, so you could just take the length of that to get the number of steps
|
# ? Nov 11, 2023 21:25 |
|
Generally I like the obnoxious itertools solution but as mentioned, yeah, doesn't quite meet the requirements laid out in the prompt:

quote:
Generate a Normally distributed random walk with a starting value of 0 as a Python list

I'd combine the techniques to just accumulate from a generator.
Python code:
nullfunction fucked around with this message at 21:39 on Nov 11, 2023 |
# ? Nov 11, 2023 21:37 |
|
nullfunction posted:
Generally I like the obnoxious itertools solution but as mentioned, yeah, doesn't quite meet the requirements laid out in the prompt:

I've been copying and running all the code you guys have provided and it's interesting to see how different every answer is. A big thank you to all who attempted this. Like I said, most of us failed the exam and many of the other questions were worded in a similarly confusing way. I think it's because the instructor is an economist and he seems to hate us.

I just noticed he gave us the answer that he wanted for this particular question and it doesn't seem to use any Gaussian stats tools...? Very weird.
Python code:
BUUNNI fucked around with this message at 23:38 on Nov 13, 2023 |
# ? Nov 13, 2023 23:31 |
|
BUUNNI posted:I just noticed he gave us the answer that he wanted for this particular question and it doesn't seem to use any Gaussian stats tools...? Very weird. Not to be that guy, but isn't a normal random variate distribution also a Gaussian distribution?
|
# ? Nov 13, 2023 23:43 |
|
Macichne Leainig posted:Not to be that guy, but isn't a normal random variate distribution also a Gaussian distribution? I have no idea, I'm just a dumb grad student lol
|
# ? Nov 13, 2023 23:46 |
|
I failed Calc 1 twice so maybe I should not be talking authoritatively about math in any manner lol
|
# ? Nov 13, 2023 23:55 |
|
BUUNNI posted:
I've been copying and running all the code you guys have provided and it's interesting to see how different every answer is.

It's worth noting this phenomenon, because this is a very important lesson for two reasons:

1. There's no such thing as 'the only way to do something', even in opinionated languages. Software design is a bit of an art form, and so you can fulfill the same requirement through a bunch of different ways. That being said...

2. Writing code is not the hard part of professional software development, requirements are the hard part. It can be jarring if you've only ever worked on personal projects or school projects that are highly structured, but in business, you'll often be dealing with people that deliver very vague requirements, and they might simply not have the context to understand what's missing.

This is an incredibly important part of software development, and is one of the major skills to pick up as you progress in your career. You can tell two devs that you want X, and they might deliver two ENTIRELY DIFFERENT SOLUTIONS because they both are going to fill in the blanks on what you asked for based on their own judgment.
|
# ? Nov 13, 2023 23:55 |
|
BUUNNI posted:I have no idea, I'm just a dumb grad student lol From Wikipedia: In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable.
|
# ? Nov 13, 2023 23:56 |
|
vikingstrike posted:From Wikipedia: Gotcha! We never used any Gaussian stats tools in any lessons so I know very little about it.
|
# ? Nov 14, 2023 00:13 |
|
I'm writing Python to control a 2d Plotter and make art. I've been wanting to explore randomness more in depth. Currently, I've just been doing stuff like plotting a series of points with random() as in the code below. (Code might not actually work, was just trying to give a simplified idea of what I've been working on)
code:
|
# ? Nov 14, 2023 05:56 |
|
huhu posted:I'm writing Python to control a 2d Plotter and make art. I've been wanting to explore randomness more in depth. Currently, I've just been doing stuff like plotting a series of points with random() as in the code below. (Code might not actually work, was just trying to give a simplified idea of what I've been working on) You could consider using numpy.random if you're going to generate a lot of points (like more than 10). I'd probably combine the magnitude and direction randomness, generate values from (-10 to 10) instead.
|
# ? Nov 14, 2023 07:11 |
|
One possibility is to pick a random direction and distance, instead of doing X and Y separately. Another possibility is to draw Bezier curves with randomly generated control points. You could bias your diffs in some direction - so that it looks generally random at the small scale, but slowly tracks across the plot when you look at it overall. -- Really there are lots of fun things you can do with this.
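A rough sketch of the direction-and-distance idea with an optional bias, assuming only the standard library. The function name `wander` and the parameters `max_step` and `drift` are made up for illustration, and the actual plotter calls are left out.

```python
import math
import random


def wander(n_points, max_step=10.0, drift=(0.0, 0.0), start=(0.0, 0.0)):
    """Return n_points (x, y) tuples, each a random polar step from the last."""
    x, y = start
    points = [(x, y)]
    for _ in range(n_points - 1):
        angle = random.uniform(0.0, 2.0 * math.pi)  # random direction
        dist = random.uniform(0.0, max_step)        # random distance
        # The drift term nudges every step the same way, so the path looks
        # random up close but slowly tracks across the plot overall.
        x += dist * math.cos(angle) + drift[0]
        y += dist * math.sin(angle) + drift[1]
        points.append((x, y))
    return points


points = wander(100, max_step=10.0, drift=(0.5, 0.0))
print(points[:3])
```

Setting `drift=(0.0, 0.0)` gives an unbiased wander; swapping `random.uniform` for `random.gauss` on the distance would cluster steps around a typical length instead.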
|
# ? Nov 14, 2023 07:13 |
|
BUUNNI posted:I've been copying and running all the code you guys have provided and it's interesting to see how different every answer is. This was my first solution: Python code:
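A plain-loop sketch in that spirit, not necessarily the code originally posted. It assumes `random.gauss(0, 1)` for the steps and keeps the value that crosses the threshold as the final element.

```python
import random

# Start at 0 and keep appending until the latest value leaves (-3, 3).
walk = [0.0]
while abs(walk[-1]) < 3:
    walk.append(walk[-1] + random.gauss(0, 1))

print(len(walk), walk[-1])
```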
All of those end-of-list accesses make me cringe a little but it's perfectly valid.
random.gauss() is the same as scipy.stats.norm.rvs()
|
# ? Nov 14, 2023 07:34 |
|
This would probably fit in a more security-adjacent thread as well but since we're dealing with Python, I'll post here. This is a "I got handed this and uhhh, is this actually sane?" kind of a thing. I also might be using some terminology wrong, since I don't usually deal with hashing or crypto-adjacent stuff, so sorry in advance.

I can't go too deeply into specifics, but there's a small database table (thousands, not tens of thousands of rows) of non-unique string identifiers. This table needs to be modified during an ETL task so that each row remains, but the identifiers themselves are transformed into, well, something else that will still be the same for each unique string and be reversible at a much later date if need be - the original list of possible identifiers will be available from elsewhere, so it'd need to get matched to the transformed identifiers. So, if I've got a table of 3 Alices and 3 Bobs, I need 3 xyz's and 3 abc's that can be reverted or matched somehow to Alice and Bob.

Before anyone gets their blood pressure up, these are not passwords or anything of the sort, and an outside attacker getting the original list wouldn't be the end of the world, but it's still data that we'd prefer to keep safe.

The current implementation that's been done before me just hashes the strings using sha3-256 with a saltphrase in the python code that's used for the ETL task. The idea (presumably, the comments are kinda scarce) is that the original list of identifiers could be used together with the salt to match the hashes to the original strings - the salt itself is in our password manager and only gets called on task execution.

Like I said, I don't usually deal with anything like this, so: is this actually sensible or safe? Due to reasons, an outsider that gains access to the data could probably guess the length and structure of the original identifier pretty easily, or possibly even gain the list of original identifiers. From my admittedly poor understanding these issues would make bruteforcing effective since it'd basically limit the character space by some amount.

If this is insecure and/or insane and should be improved, what's the best way? From a bit of searching I initially thought of using the cryptography module, basically https://cryptography.io/en/latest/fernet/#using-passwords-with-fernet, but I'm not sure on account of never implementing anything like this.
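One way the scheme described above is often tightened is to treat the secret phrase as a key and use HMAC from the standard library instead of mixing a salt in by hand. The sketch below is an assumption about how that could look, not the existing ETL code: `pseudonymize`, the placeholder key, and the sample names are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes) -> str:
    """Return a keyed SHA3-256 digest (HMAC) of the identifier as hex."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha3_256).hexdigest()


# Placeholder key; a real one would be long, random, and fetched from the
# password manager at task execution rather than written into the code.
key = b"example-key-from-password-manager"

originals = ["Alice", "Bob", "Alice", "Bob", "Alice", "Bob"]
transformed = [pseudonymize(name, key) for name in originals]

# "Reversal" is a lookup: recompute the digest for every known identifier
# and match it against the transformed column.
lookup = {pseudonymize(name, key): name for name in set(originals)}
recovered = [lookup[token] for token in transformed]
print(recovered == originals)
```

Identical identifiers still map to identical tokens, which is what the receiving party needs. The protection rests on the key: with a small, guessable identifier space, anyone holding both the key and a candidate list can rebuild the lookup, while without the key the candidate list alone does not help.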
|
# ? Nov 14, 2023 10:55 |
|
What are you actually worried about here? Is there a concern that someone might get access to this database table but somehow not the rest of your database where the original list of identifiers is stored?
|
# ? Nov 14, 2023 15:34 |
|
Knew I forgot something. The original db tables stay in our internal environment, but the transformed data is a part of a larger dataset that gets sent to another party, and we can't guarantee that the data is safe there (well, can't really guarantee it in our own environment either but you get my point). Of course there are contract stipulations etc., but we'd rather keep things as safe as reasonably possible, since this other party doesn't need the original identifiers (but needs to know which have identical identifiers).
|
# ? Nov 14, 2023 16:28 |