Python

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Python

Macichne Leainig: Jul 26, 2012; by VG

Falcon2001 posted:

Speaking of which, and this isn't really a Python question, but is there an open source CSV editor that has some of the functionality of excel without all the...overhead? It'd be nice to find something I can use to just muck around quickly with tabular data without having to constantly be like 'no, don't accidentally make it an xlsx, stop formatting it' etc - not that it's a super big problem or anything.

My opinion is a bit of a contrary one and probably a bad one to boot, but just stick with Excel because it's one of the few competent tabular data editors and that's probably what everyone else will be using.

# ? Nov 7, 2023 18:06

The Fool: Oct 16, 2003

Unless you have anything that looks like a date string in your cells.

I used an open source csv editor once a long time ago, it did not have a smaller footprint than excel, and was worse in almost every way.

I feel like if excel is not going to work, the best option is to load it into a dict or a dataframe, do what you need to do in a jupyter notebook, then generate a new csv.
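
A minimal sketch of that round trip, using only the standard library's csv module. The column names and values are made up for illustration, and an in-memory buffer stands in for real files. Every cell stays a plain string, so nothing that looks like a date gets reinterpreted along the way.

```python
import csv
import io

# Hypothetical sample data; in practice this would come from open("in.csv", newline="").
raw = "sku,batch\nA1,10-50\nB2,3/4\n"

# Load each row into a dict keyed by the header line. Values are always str.
rows = list(csv.DictReader(io.StringIO(raw)))

# Do whatever editing is needed; lowercasing one column is just a stand-in.
for row in rows:
    row["sku"] = row["sku"].lower()

# Write the edited rows back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["sku", "batch"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
result = out.getvalue()
```

Swapping the StringIO objects for real file handles gives the "load, edit, generate a new csv" workflow without Excel ever touching the data.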

# ? Nov 7, 2023 18:13

DoctorTristan: Mar 11, 2006; I would look up into your lifeless eyes and wave, like this. Can you and your associates arrange that for me, Mr. Morden?

Falcon2001 posted:

Speaking of which, and this isn't really a Python question, but is there an open source CSV editor that has some of the functionality of excel without all the...overhead? It'd be nice to find something I can use to just muck around quickly with tabular data without having to constantly be like 'no, don't accidentally make it an xlsx, stop formatting it' etc - not that it's a super big problem or anything.

VSCode and PyCharm both have plugins that make csv editing less painful

# ? Nov 7, 2023 18:40

Falcon2001: Oct 10, 2004; Eat your hamburgers, Apollo.; Pillbug

DoctorTristan posted:

VSCode and PyCharm both have plugins that make csv editing less painful

Maybe this is the best answer, I'm not working with massive csvs or anything right now.

# ? Nov 7, 2023 20:20

Macichne Leainig: Jul 26, 2012; by VG

The Fool posted:

Unless you have anything that looks like a date string in your cells.

To be that guy... :spergin:

If you are using Excel in any serious capacity, you should already know about this one critical weakness of it.

Of course it's still going to bite you in the rear end anyway

# ? Nov 7, 2023 20:28

lazerwolf: Dec 22, 2009; Orange and Black

If you work in Genomics for the love of all that is good in the world, stay away from Excel

# ? Nov 7, 2023 22:35

TasogareNoKagi: Jul 11, 2013

Macichne Leainig posted:

To be that guy...

If you are using Excel in any serious capacity, you should already know about this one critical weakness of it.

Of course it's still going to bite you in the rear end anyway

I still don't know how my data had "17:04 AM" in a timestamp, but I'm blaming Excel.
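
For anyone who wants to catch that kind of value before it lands in a report, here is a rough sketch of a check with datetime.strptime. The helper name parse_12h is made up, and it assumes the timestamps are meant to be 12-hour "HH:MM AM/PM" strings in an English locale.

```python
from datetime import datetime

def parse_12h(text):
    """Return a time parsed from 'HH:MM AM/PM', or None if the value is invalid."""
    try:
        # %I only accepts hours 01-12, so "17:04 AM" fails to parse.
        return datetime.strptime(text, "%I:%M %p").time()
    except ValueError:
        return None

good = parse_12h("05:04 PM")
bad = parse_12h("17:04 AM")
```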

# ? Nov 8, 2023 00:40

Zugzwang: Jan 2, 2005; You have a kind of sick desperation in your laugh.; Ramrod XTreme

They're making it so you can turn off that "feature" pretty soon.

No, Excel, I did not mean "October 1, 1950" when I entered "10-50"

# ? Nov 8, 2023 00:46

Generic Monk: Oct 31, 2011

Falcon2001 posted:

Speaking of which, and this isn't really a Python question, but is there an open source CSV editor that has some of the functionality of excel without all the...overhead? It'd be nice to find something I can use to just muck around quickly with tabular data without having to constantly be like 'no, don't accidentally make it an xlsx, stop formatting it' etc - not that it's a super big problem or anything.

https://www.moderncsv.com/

i generally just use excel tho

# ? Nov 8, 2023 07:29

rowkey bilbao: Jul 24, 2023

Foxfire_ posted:

Does human editable mean 'editable by a programmer' (use something standard like everyone else has said) or 'editable by Bob from Marketing who needs explicit very friendly error messages on typos'?

Users are people with some level of familiarity editing yaml files, which we're going to provide with tools to automate away some of their work.

I'm going to get away with my favorite solution, which is "not doing anything about this just yet", though.

# ? Nov 8, 2023 19:33

His Divine Shadow: Aug 7, 2000; I'm not a fascist. I'm a priest. Fascists dress up in black and tell people what to do.

Apparently the python library oauth2 is actually an oauth v1 library. Well that was a frustrating hour wasted. Only found out due to a stack overflow post.

Just felt like writing something out.

# ? Nov 9, 2023 12:20

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

His Divine Shadow posted:

Apparently the python library oauth2 is actually an oauth v1 library. Well that was a frustrating hour wasted. Only found out due to a stack overflow post.

Just felt like writing something out.

OAuth2 and OIDC are surprisingly simple to implement, and JWT-handling libraries like python-jose make it a lot harder to do unsafe things with JWTs. I recently did async client and server implementations of the auth code and device authorization flows as a toy project. I found py_simple_openid_connect to be a useful reference for my own implementation, which took about a day.
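
To give a feel for how little is involved, the first leg of the auth code flow is just a redirect carrying a handful of standard query parameters. This is a rough sketch with only the standard library, not my actual implementation: the endpoint, client ID, and redirect URI are made-up placeholders, and the token exchange and JWT validation are left to a real library.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scope="openid"):
    """Build the redirect URL for an OAuth2 authorization code request."""
    # Random state value that the callback handler must see echoed back.
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",  # asks for the authorization code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return f"{authorize_endpoint}?{urlencode(params)}", state

# Hypothetical values for illustration only.
url, state = build_authorization_url(
    "https://idp.example/authorize",
    "my-client-id",
    "https://app.example/callback",
)
query = parse_qs(urlsplit(url).query)
```

The caller stores state in the user's session and rejects any callback whose state does not match.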

pyoidc is a solid implementation if you're looking for a library rather than a full batteries-included framework for Django, FastAPI, etc.

# ? Nov 9, 2023 16:10

His Divine Shadow: Aug 7, 2000; I'm not a fascist. I'm a priest. Fascists dress up in black and tell people what to do.

The real PITA is dealing with the backend stuff on Microsoft's crappy Azure environment. But I'm not feeling kindly disposed towards OAuth in general at the moment, feels like total overkill for my purposes.

# ? Nov 9, 2023 16:39

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

His Divine Shadow posted:

The real PITA is dealing with the backend stuff on Microsoft's crappy Azure environment. But I'm not feeling kindly disposed towards OAuth in general at the moment, feels like total overkill for my purposes.

If you need to run your own IdP for a webapp, and you're not federating data access with external apps over some kind of API, then OAuth and JWTs are probably overkill vs. session cookies and a user database. OAuth2 only really removes complexity if you get to avoid handling logins at all, either because you've outsourced everything to one external IdP or you have a library like authlib that makes it easy to integrate multiple external IdPs. If for some reason I had to keep my own user store, but also integrate with external IdPs, that's the point where I'd try to implement my local login flow as an OIDC provider, because it at least keeps everything along the same flow.

# ? Nov 9, 2023 17:08

BUUNNI: Jun 23, 2023; by Pragmatica

I was curious to see how different people would solve this problem that I received in one of my exams recently...

quote:

A random walk is a time series where the next value of the variable is equal to the previous value of the variable plus a random number with mean 0. Generate a Normally distributed random walk with a starting value of 0 as a Python list. A stopping time is a condition under which a time series stops generating new values. Make your random walk stop generating new values when its absolute value reaches three.

# ? Nov 10, 2023 19:35

Macichne Leainig: Jul 26, 2012; by VG

I'm guessing numpy isn't allowed, because I think you could just do np.random.normal(0, 1) and otherwise the trick would be to just use Python's negative array indexers to add it to the previous value of the array, right?

And the final check is a real simple if absolute value == 3 :shrug:

# ? Nov 10, 2023 21:45

StumblyWumbly: Sep 12, 2007; Batmanticore!

It's frustrating because it almost fits in a comprehension, but I think the end condition means it won't.

# ? Nov 10, 2023 23:00

QuarkJets: Sep 8, 2008

Macichne Leainig posted:

I'm guessing numpy isn't allowed, because I think you could just do np.random.normal(0, 1) and otherwise the trick would be to just use Python's negative array indexers to add it to the previous value of the array, right?

And the final check is a real simple if absolute value == 3

I think you would be dealing with floating point values, so abs(x) >= 3, but yeah.

I would define a generator. It contains a while loop that computes a new random step from a Gaussian centered at "a" with "sigma" width and adds that value to the previous value, which I guess is probably initialized to "b=a" but could be anything. The loop yields each new value if its absolute value is less than input argument "threshold" otherwise the loop just ends.

This generator gets used in a list comprehension.
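
Roughly what I have in mind, as a sketch. The names a, sigma, and threshold follow the description above, and I'm assuming the value that crosses the threshold is dropped rather than kept, which the problem statement doesn't settle.

```python
import random

def walk(a=0.0, sigma=1.0, threshold=3.0):
    """Yield successive walk values until one reaches the threshold."""
    value = a  # previous value, initialized to the center of the Gaussian
    while True:
        value += random.gauss(a, sigma)
        if abs(value) >= threshold:
            return  # ends the generator without yielding the crossing value
        yield value

# The generator gets consumed by a list comprehension.
result = [x for x in walk()]
```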

E: oh it's a stopping time, not a stopping distance. That's even easier then, the generator can just yield forever and the list comprehension is where you build in the time constraint (it isn't even an input to the walk, it's an external stopping condition). But then specifying that an absolute value is needed doesn't make sense. OP, did you quote the problem correctly?
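
Under that reading, a sketch would look like this. The generator never stops by itself, and n_steps is a made-up stand-in for whatever the external time limit is.

```python
import random

def endless_walk(mu=0.0, sigma=1.0, start=0.0):
    """Yield the walk forever, beginning with the starting value."""
    value = start
    while True:
        yield value
        value += random.gauss(mu, sigma)

n_steps = 10  # hypothetical externally imposed stopping time

# zip stops as soon as range(n_steps) runs out, so the walk is cut off there.
path = [x for _, x in zip(range(n_steps), endless_walk())]
```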

QuarkJets fucked around with this message at 23:36 on Nov 10, 2023

# ? Nov 10, 2023 23:24

BUUNNI: Jun 23, 2023; by Pragmatica

QuarkJets posted:

E: oh it's a stopping time, not a stopping distance. That's even easier then, the generator can just yield forever and the list comprehension is where you build in the time constraint (it isn't even an input to the walk, it's an external stopping condition). But then specifying that an absolute value is needed doesn't make sense. OP, did you quote the problem correctly?

Yes, that's exactly how it was stated in the exam.

by "generator" you mean just a range( ) function?

BUUNNI fucked around with this message at 01:01 on Nov 11, 2023

# ? Nov 11, 2023 00:43

Son of Thunderbeast: Sep 21, 2002

Are there any restrictions on libraries, like could I just import random for the random number generation? Or are you supposed to write that too?

(Sorry, dunno what these kinds of tests are like, never taken a class)

# ? Nov 11, 2023 01:06

BUUNNI: Jun 23, 2023; by Pragmatica

The instructor did not say that certain libraries are not allowed so I imagine it's cool :shrug:

FWIW I think like 90% of the class failed the exam lol

# ? Nov 11, 2023 01:19

Macichne Leainig: Jul 26, 2012; by VG

BUUNNI posted:

Yes, that's exactly how it was stated in the exam.

by "generator" you mean just a range( ) function?

Generator functions in python can be identified by the "yield" keyword as a return instead of er, well, return. It's actually a perfect application for something like this because "yielding" lets the generator function retain stuff in memory, so you can just define your value as a variable inside the generator function and add to that every time you yield.

Something like this thanks to ChatGPT:

Python code:

import random

def random_walk_generator():
    value = 0
    while abs(value) < 3:
        step = random.gauss(0, 1)  # Generate a normally distributed random number with mean 0 and standard deviation 1
        value += step
        yield value

# Using the generator
for value in random_walk_generator():
    print(value)

Also TIL that python's inbuilt random also has a Gaussian distribution function. Neat!

# ? Nov 11, 2023 01:36

Son of Thunderbeast: Sep 21, 2002

oh drat, nice! I got this far before giving up, it kept giving me pretty even distributions and I had no idea how to make it into a normal distribution.

code:

import random
import time

stop_time = 3
random_walk = []
start_value = 0

start_time = cur_time = time.time()
elapsed = 0

while elapsed < stop_time:
    random_walk.append(start_value)
    start_value += random.randrange(-4, 4, step=1) # just for readability's sake
    cur_time = time.time()
    elapsed = cur_time - start_time

I also tried start_value += random.normalvariate(mu=0.0)

e: Looking at yours, I may have misunderstood the time/absolute value thing
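
For comparison, a quick sketch of why my steps looked so even: randrange picks each integer with equal probability, while gauss draws from a bell curve. The seed and sample size are arbitrary, just there to make the comparison repeatable.

```python
import random
import statistics

random.seed(12345)  # fixed seed only so the comparison is repeatable

# Integers -4..3, each equally likely (the stop value 4 is excluded).
uniform_steps = [random.randrange(-4, 4) for _ in range(10_000)]

# Normally distributed steps with mean 0 and standard deviation 1.
normal_steps = [random.gauss(0, 1) for _ in range(10_000)]

print("uniform mean:", statistics.mean(uniform_steps))
print("normal mean: ", statistics.mean(normal_steps))
```

Because randrange leaves out the stop value, the uniform steps also average around -0.5 rather than 0, so that walk drifts downward on top of everything else.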

# ? Nov 11, 2023 02:11

Zoracle Zed: Jul 10, 2001

I believe stopping time in this context simply means the number of steps taken. obnoxious itertools solution:

Python code:

from itertools import takewhile, accumulate, count
from random import gauss

walk = takewhile(lambda x: abs(x) < 3, accumulate(gauss(0, 1) for _ in count()))
stopping_time = 1 + sum(1 for _ in walk)

Zoracle Zed fucked around with this message at 19:42 on Nov 11, 2023

# ? Nov 11, 2023 19:39

QuarkJets: Sep 8, 2008

The walk distribution needs to be a list, so you could just take the length of that to get the number of steps

# ? Nov 11, 2023 21:25

nullfunction: Jan 24, 2005; Nap Ghost

Generally I like the obnoxious itertools solution but as mentioned, yeah, doesn't quite meet the requirements laid out in the prompt:

quote:

Generate a Normally distributed random walk with a starting value of 0 as a Python list

I'd combine the techniques to just accumulate from a generator.

Python code:

from typing import Generator
from itertools import takewhile, accumulate
from random import gauss

def gauss_gen(mu: float = 0.0, sigma: float = 1.0) -> Generator[float, None, None]:
    while True:
        yield gauss(mu, sigma)

walk = list(takewhile(lambda x: abs(x) < 3, accumulate(gauss_gen(), initial=0.0)))
stopping_time = len(walk)

print(f"Stopped in {stopping_time} steps:")
print(f"{walk=}")

e: In no way am I saying the original prompt is well-written. I'd be pissed if I was taking the exam.

nullfunction fucked around with this message at 21:39 on Nov 11, 2023

# ? Nov 11, 2023 21:37

BUUNNI: Jun 23, 2023; by Pragmatica

nullfunction posted:

Generally I like the obnoxious itertools solution but as mentioned, yeah, doesn't quite meet the requirements laid out in the prompt:

I'd combine the techniques to just accumulate from a generator.
Python code:
from typing import Generator
from itertools import takewhile, accumulate
from random import gauss

def gauss_gen(mu: float = 0.0, sigma: float = 1.0) -> Generator[float, None, None]:
    while True:
        yield gauss(mu, sigma)

walk = list(takewhile(lambda x: abs(x) < 3, accumulate(gauss_gen(), initial=0.0)))
stopping_time = len(walk)

print(f"Stopped in {stopping_time} steps:")
print(f"{walk=}")
e: In no way am I saying the original prompt is well-written. I'd be pissed if I was taking the exam.

I've been copying and running all the code you guys have provided and it's interesting to see how different every answer is.

A big thank you to all who attempted this. Like I said, most of us failed the exam and many of the other questions were worded in a similarly confusing way. I think it's because the instructor is an economist and he seems to hate us :v:

I just noticed he gave us the answer that he wanted for this particular question and it doesn't seem to use any Gaussian stats tools...? Very weird.

Python code:

from scipy.stats import norm
s = [0]
while abs(s[-1]) <= 3:
    s.append(s[-1]+norm.rvs())
print(s)

BUUNNI fucked around with this message at 23:38 on Nov 13, 2023

# ? Nov 13, 2023 23:31

Macichne Leainig: Jul 26, 2012; by VG

BUUNNI posted:

I just noticed he gave us the answer that he wanted for this particular question and it doesn't seem to use any Gaussian stats tools...? Very weird.
Python code:
from scipy.stats import norm
s = [0]
while abs(s[-1]) <= 3:
    s.append(s[-1]+norm.rvs())
print(s)

Not to be that guy, but isn't a normal random variate distribution also a Gaussian distribution?

# ? Nov 13, 2023 23:43

BUUNNI: Jun 23, 2023; by Pragmatica

Macichne Leainig posted:

Not to be that guy, but isn't a normal random variate distribution also a Gaussian distribution?

I have no idea, I'm just a dumb grad student lol :shrug:

# ? Nov 13, 2023 23:46

Macichne Leainig: Jul 26, 2012; by VG

I failed Calc 1 twice so maybe I should not be talking authoritatively about math in any manner lol

# ? Nov 13, 2023 23:55

Falcon2001: Oct 10, 2004; Eat your hamburgers, Apollo.; Pillbug

BUUNNI posted:

I've been copying and running all the code you guys have provided and it's interesting to see how different every answer is.

It's worth noting this phenomenon, because this is a very important lesson for two reasons:

1. There's no such thing as 'the only way to do something', even in opinionated languages. Software design is a bit of an art form, and so you can fulfill the same requirement through a bunch of different ways. That being said...
2. Writing code is not the hard part of professional software development, requirements are the hard part. It can be jarring if you've only ever worked on personal projects or school projects that are highly structured, but in business, you'll often be dealing with people that deliver very vague requirements, and they might simply not have the context to understand what's missing. This is an incredibly important part of software development, and is one of the major skills to pick up as you progress in your career. You can tell two devs that you want X, and they might deliver two ENTIRELY DIFFERENT SOLUTIONS because they both are going to fill in the blanks on what you asked for based on their own judgment.

# ? Nov 13, 2023 23:55

vikingstrike: Sep 23, 2007; whats happening, captain

BUUNNI posted:

I have no idea, I'm just a dumb grad student lol

From Wikipedia:

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable.
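
If it helps, here is a small sketch showing the two names point at the same thing in Python's standard library. statistics.NormalDist describes the distribution, and random.gauss and random.normalvariate both draw samples from it; the sample counts are arbitrary.

```python
import math
import random
from statistics import NormalDist

# The standard normal (a.k.a. Gaussian) distribution: mean 0, standard deviation 1.
standard = NormalDist(mu=0.0, sigma=1.0)

# Density at the mean is 1 / sqrt(2 * pi) for the standard normal.
peak = standard.pdf(0.0)
expected_peak = 1 / math.sqrt(2 * math.pi)

# Two differently named functions, one family of distributions.
samples = [random.gauss(0.0, 1.0) for _ in range(5)]
samples += [random.normalvariate(0.0, 1.0) for _ in range(5)]
```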

# ? Nov 13, 2023 23:56

BUUNNI: Jun 23, 2023; by Pragmatica

vikingstrike posted:

From Wikipedia:

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable.

Gotcha! We never used any Gaussian stats tools in any lessons so I know very little about it.

# ? Nov 14, 2023 00:13

huhu: Feb 24, 2006

I'm writing Python to control a 2d Plotter and make art. I've been wanting to explore randomness more in depth. Currently, I've just been doing stuff like plotting a series of points with random() as in the code below. (Code might not actually work, was just trying to give a simplified idea of what I've been working on)

code:

from random import randint, random

points = []

x_location = 0
y_location = 0

while True:
  x_direction = 1 if random() <= 0.5 else -1
  y_direction = 1 if random() <= 0.5 else -1

  x_magnitude = randint(0, 10)
  y_magnitude = randint(0, 10)

  x_location += x_direction * x_magnitude
  y_location += y_direction * y_magnitude

  points.append((x_location, y_location))

  # at some point break (100 points is an arbitrary cutoff)
  if len(points) >= 100:
    break

# plotter is my own plotter object, not defined in this snippet
plotter.plot(points)

I'm wondering if there are more interesting explorations I could do than this very simplistic approach I have in my code?

# ? Nov 14, 2023 05:56

QuarkJets: Sep 8, 2008

huhu posted:

I'm writing Python to control a 2d Plotter and make art. I've been wanting to explore randomness more in depth. Currently, I've just been doing stuff like plotting a series of points with random() as in the code below. (Code might not actually work, was just trying to give a simplified idea of what I've been working on)
code:
from random import randint, random

points = []

x_location = 0
y_location = 0

while True:
  x_direction = 1 if random() <= 0.5 else -1
  y_direction = 1 if random() <= 0.5 else -1

  x_magnitude = randint(0, 10)
  y_magnitude = randint(0, 10)

  x_location += x_direction * x_magnitude
  y_location += y_direction * y_magnitude

  points.append((x_location, y_location))

  # at some point break (100 points is an arbitrary cutoff)
  if len(points) >= 100:
    break

# plotter is my own plotter object, not defined in this snippet
plotter.plot(points)
I'm wondering if there are more interesting explorations I could do than this very simplistic approach I have in my code?

You could consider using numpy.random if you're going to generate a lot of points (like more than 10). I'd probably combine the magnitude and direction randomness, generate values from (-10 to 10) instead.
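
A sketch of the combined version. I'm using the standard library here instead of numpy.random just to keep it dependency-free, and random_path, n_steps, and max_step are names I made up for the example.

```python
import random

def random_path(n_steps, max_step=10, start=(0, 0)):
    """Return a list of (x, y) points from a walk with signed integer steps."""
    x, y = start
    points = [(x, y)]
    for _ in range(n_steps):
        # One draw covers both direction and magnitude: -max_step..max_step.
        x += random.randint(-max_step, max_step)
        y += random.randint(-max_step, max_step)
        points.append((x, y))
    return points

points = random_path(100)
```

One small difference from the original: multiplying a random sign by randint(0, 10) makes a zero step twice as likely as any other single value, while randint(-10, 10) weights all 21 values equally.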

# ? Nov 14, 2023 07:11

Jabor: Jul 16, 2010; #1 Loser at SpaceChem

One possibility is to pick a random direction and distance, instead of doing X and Y separately.

Another possibility is to draw Bezier curves with randomly generated control points.

You could bias your diffs in some direction - so that it looks generally random at the small scale, but slowly tracks across the plot when you look at it overall.

--

Really there are lots of fun things you can do with this.
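
A rough sketch of those ideas in plain Python, in case it's useful as a starting point. All the function names, parameters, and ranges here are made up for illustration, and the output would still need to be fed to whatever your plotter expects.

```python
import math
import random

def polar_walk(n_steps, max_distance=10.0, drift=(0.0, 0.0), start=(0.0, 0.0)):
    """Each step picks a random heading and distance, then adds a constant drift."""
    x, y = start
    points = [(x, y)]
    for _ in range(n_steps):
        angle = random.uniform(0.0, 2.0 * math.pi)
        distance = random.uniform(0.0, max_distance)
        x += distance * math.cos(angle) + drift[0]
        y += distance * math.sin(angle) + drift[1]
        points.append((x, y))
    return points

def quadratic_bezier(p0, p1, p2, samples=20):
    """Sample points along a quadratic Bezier curve with control points p0, p1, p2."""
    curve = []
    for i in range(samples + 1):
        t = i / samples
        # Bernstein weights for the three control points.
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        curve.append((a * p0[0] + b * p1[0] + c * p2[0],
                      a * p0[1] + b * p1[1] + c * p2[1]))
    return curve

# A walk that looks random up close but slowly tracks to the right.
biased_walk = polar_walk(200, drift=(0.5, 0.0))

# A curve through randomly generated control points.
controls = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(3)]
curve = quadratic_bezier(*controls)
```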

# ? Nov 14, 2023 07:13

QuarkJets: Sep 8, 2008

BUUNNI posted:

I've been copying and running all the code you guys have provided and it's interesting to see how different every answer is.

A big thank you to all who attempted this. Like I said, most of us failed the exam and many of the other questions were worded in a similarly confusing way. I think it's because the instructor is an economist and he seems to hate us

I just noticed he gave us the answer that he wanted for this particular question and it doesn't seem to use any Gaussian stats tools...? Very weird.
Python code:
from scipy.stats import norm
s = [0]
while abs(s[-1]) <= 3:
    s.append(s[-1]+norm.rvs())
print(s)

This was my first solution:

Python code:

import random

def random_walk(u=0, sigma=1, threshold=3):
    """Accumulate Gaussian steps until an absolute threshold is reached."""
    value = u
    yield value
    while abs(value) <= threshold:
        value += random.gauss(u, sigma)
        yield value

result = list(random_walk())

This is more what a computational scientist would write; the professor's is more what a data analyst would write. Both approaches are valid. This callable is more portable, reusable, and easier to read, but depending on the context those features may not matter.

All of those end-of-list accesses make me cringe a little but it's perfectly valid

random.gauss(0, 1) samples from the same standard normal distribution as scipy.stats.norm.rvs()

# ? Nov 14, 2023 07:34

Qtamo: Oct 7, 2012

This would probably fit in a more security-adjacent thread as well but since we're dealing with Python, I'll post here. This is a "I got handed this and uhhh, is this actually sane?" kind of a thing. I also might be using some terminology wrong, since I don't usually deal with hashing or crypto-adjacent stuff, so sorry in advance.

I can't go too deeply into specifics, but there's a small database table (thousands, not tens of thousands of rows) of non-unique string identifiers. This table needs to be modified during an ETL task so that each row remains, but the identifiers themselves are transformed into, well, something else that will still be the same for each unique string and be reversible at a much later date if need be - the original list of possible identifiers will be available from elsewhere, so it'd need to get matched to the transformed identifiers. So, if I've got a table of 3 Alices and 3 Bobs, I need 3 xyz's and 3 abc's that can be reverted or matched somehow to Alice and Bob. Before anyone gets their blood pressure up, these are not passwords or anything of the sort, and an outside attacker getting the original list wouldn't be the end of the world, but it's still data that we'd prefer to keep safe.

The current implementation that's been done before me just hashes the strings using sha3-256 with a saltphrase in the python code that's used for the ETL task. The idea (presumably, the comments are kinda scarce) is that the original list of identifiers could be used together with the salt to match the hashes to the original strings - the salt itself is in our password manager and only gets called on task execution. Like I said, I don't usually deal with anything like this, so: is this actually sensible or safe? Due to reasons, an outsider that gains access to the data could probably guess the length and structure of the original identifier pretty easily, or possibly even gain the list of original identifiers. From my admittedly poor understanding these issues would make bruteforcing effective since it'd basically limit the character space by some amount.
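
To make the question concrete, this is roughly what the current approach boils down to, as far as I can tell. It's a sketch and not our actual code: the function names, the sample identifiers, and the salt value are all placeholders.

```python
import hashlib

def pseudonymize(identifier, salt):
    """Return the hex SHA3-256 digest of salt + identifier."""
    return hashlib.sha3_256(salt + identifier.encode("utf-8")).hexdigest()

def build_reverse_lookup(known_identifiers, salt):
    """Map each digest back to its identifier, given the full candidate list."""
    return {pseudonymize(name, salt): name for name in set(known_identifiers)}

salt = b"not-the-real-salt"  # placeholder; the real one lives in the password manager

rows = ["Alice", "Bob", "Alice", "Bob", "Alice", "Bob"]
hashed_rows = [pseudonymize(name, salt) for name in rows]

# Much later: rebuild the mapping from the original list plus the salt.
lookup = build_reverse_lookup(["Alice", "Bob", "Carol"], salt)
restored = [lookup[digest] for digest in hashed_rows]
```

Identical identifiers always produce identical digests, which is what the other party needs, and anyone holding both the salt and the candidate list can rebuild the same lookup.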

If this is insecure and/or insane and should be improved, what's the best way? From a bit of searching I initially thought of using the cryptography module, basically https://cryptography.io/en/latest/fernet/#using-passwords-with-fernet, but I'm not sure on account of never implementing anything like this :kiddo:

# ? Nov 14, 2023 10:55

Jabor: Jul 16, 2010; #1 Loser at SpaceChem

What are you actually worried about here?

Is there a concern that someone might get access to this database table but somehow not the rest of your database where the original list of identifiers is stored?

# ? Nov 14, 2023 15:34

Qtamo: Oct 7, 2012

Knew I forgot something :doh:

The original db tables stay in our internal environment, but the transformed data is a part of a larger dataset that gets sent to another party, and we can't guarantee that the data is safe there (well, can't really guarantee it in our own environment either but you get my point). Of course there's contract stipulations etc., but we'd rather keep things as safe as reasonably possible, since this other party doesn't need the original identifiers (but needs to know which have identical identifiers).

# ? Nov 14, 2023 16:28
