Macichne Leainig
Jul 26, 2012

by VG

Falcon2001 posted:

Speaking of which, and this isn't really a Python question, but is there an open source CSV editor that has some of the functionality of excel without all the...overhead? It'd be nice to find something I can use to just muck around quickly with tabular data without having to constantly be like 'no, don't accidentally make it an xlsx, stop formatting it' etc - not that it's a super big problem or anything.

My opinion is a bit of a contrary one and probably a bad one to boot, but just stick with Excel because it's one of the few competent tabular data editors and that's probably what everyone else will be using.

The Fool
Oct 16, 2003


Unless you have anything that looks like a date string in your cells.

I used an open source csv editor once a long time ago, it did not have a smaller footprint than excel, and was worse in almost every way.

I feel like if excel is not going to work, the best option is to load it into a dict or a dataframe, do what you need to do in a jupyter notebook, then generate a new csv.
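A minimal sketch of that load, edit, and re-export round trip, using only the standard library's csv module. The sample data and the `code_prefix` column are made up for illustration; a real file would be opened with `open(path, newline="")` instead of `io.StringIO`.

```python
import csv
import io

# Hypothetical sample data; a value like "10-50" is the kind of thing a
# spreadsheet may reinterpret as a date.
raw = "id,code\n1,10-50\n2,03-04\n"

# csv.DictReader yields one dict per row and keeps every value as a string.
rows = list(csv.DictReader(io.StringIO(raw)))

# Example edit: add a derived column without touching the original values.
for row in rows:
    row["code_prefix"] = row["code"].split("-")[0]

# Write the edited rows back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["id", "code", "code_prefix"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Because the csv module never guesses types, "10-50" and "03-04" come back out exactly as they went in.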

DoctorTristan
Mar 11, 2006

I would look up into your lifeless eyes and wave, like this. Can you and your associates arrange that for me, Mr. Morden?

Falcon2001 posted:

Speaking of which, and this isn't really a Python question, but is there an open source CSV editor that has some of the functionality of excel without all the...overhead? It'd be nice to find something I can use to just muck around quickly with tabular data without having to constantly be like 'no, don't accidentally make it an xlsx, stop formatting it' etc - not that it's a super big problem or anything.

VSCode and PyCharm both have plugins that make csv editing less painful

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

DoctorTristan posted:

VSCode and PyCharm both have plugins that make csv editing less painful

Maybe this is the best answer, I'm not working with massive csvs or anything right now.

Macichne Leainig
Jul 26, 2012

by VG

The Fool posted:

Unless you have anything that looks like a date string in your cells.

To be that guy... :spergin:

If you are using Excel in any serious capacity, you should already know about this one critical weakness of it.

Of course it's still going to bite you in the rear end anyway

lazerwolf
Dec 22, 2009

Orange and Black
If you work in Genomics for the love of all that is good in the world, stay away from Excel

TasogareNoKagi
Jul 11, 2013

Macichne Leainig posted:

To be that guy... :spergin:

If you are using Excel in any serious capacity, you should already know about this one critical weakness of it.

Of course it's still going to bite you in the rear end anyway

I still don't know how my data had "17:04 AM" in a timestamp, but I'm blaming Excel.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
They're making it so you can turn off that "feature" pretty soon.

No, Excel, I did not mean "October 1, 1950" when I entered "10-50"

Generic Monk
Oct 31, 2011

Falcon2001 posted:

Speaking of which, and this isn't really a Python question, but is there an open source CSV editor that has some of the functionality of excel without all the...overhead? It'd be nice to find something I can use to just muck around quickly with tabular data without having to constantly be like 'no, don't accidentally make it an xlsx, stop formatting it' etc - not that it's a super big problem or anything.

https://www.moderncsv.com/

i generally just use excel tho

rowkey bilbao
Jul 24, 2023

Foxfire_ posted:

Does human editable mean 'editable by a programmer' (use something standard like everyone else has said) or 'editable by Bob from Marketing who needs explicit very friendly error messages on typos'?

Users are people with some level of familiarity editing yaml files, which we're going to provide with tools to automate away some of their work.

I'm going to get away with my favorite solution, which is "not doing anything about this just yet", though.

His Divine Shadow
Aug 7, 2000

I'm not a fascist. I'm a priest. Fascists dress up in black and tell people what to do.
Apparently the python library oauth2 is actually an oauth v1 library. Well that was a frustrating hour wasted. Only found out due to a stack overflow post.

Just felt like writing something out.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

His Divine Shadow posted:

Apparently the python library oauth2 is actually an oauth v1 library. Well that was a frustrating hour wasted. Only found out due to a stack overflow post.

Just felt like writing something out.
OAuth2 and OIDC are surprisingly simple to implement, and JWT-handling libraries like python-jose make it a lot harder to do unsafe things with JWTs. I recently did async client and server implementations of the auth code and device authorization flows as a toy project. I found py_simple_openid_connect to be a useful reference for my own implementation, which took about a day.

pyoidc is a solid implementation if you're looking for a library rather than a full batteries-included framework for Django, FastAPI, etc.

His Divine Shadow
Aug 7, 2000

I'm not a fascist. I'm a priest. Fascists dress up in black and tell people what to do.
The real PITA is dealing with the backend stuff on Microsoft's crappy Azure environment. But I'm not feeling kindly disposed towards oauth in general at the moment, feels like total overkill for my purposes.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

His Divine Shadow posted:

The real PITA is dealing with the backend stuff on Microsoft's crappy Azure environment. But I'm not feeling kindly disposed towards oauth in general at the moment, feels like total overkill for my purposes.
If you need to run your own IdP for a webapp, and you're not federating data access with external apps over some kind of API, then OAuth and JWTs are probably overkill vs. session cookies and a user database. OAuth2 only really removes complexity if you get to avoid handling logins at all, either because you've outsourced everything to one external IdP or you have a library like authlib that makes it easy to integrate multiple external IdPs. If for some reason I had to keep my own user store, but also integrate with external IdPs, that's the point where I'd try to implement my local login flow as an OIDC provider, because it at least keeps everything along the same flow.

BUUNNI
Jun 23, 2023

by Pragmatica
I was curious to see how different people would solve this problem that I received in one of my exams recently...

quote:

A random walk is a time series where the next value of the variable is equal to the previous value of the variable plus a random number with mean 0. Generate a Normally distributed random walk with a starting value of 0 as a Python list. A stopping time is a condition under which a time series stops generating new values. Make your random walk stop generating new values when its absolute value reaches three.

Macichne Leainig
Jul 26, 2012

by VG
I'm guessing numpy isn't allowed, because I think you could just do np.random.normal(0, 1) and otherwise the trick would be to just use Python's negative array indexers to add it to the previous value of the array, right?

And the final check is a real simple if absolute value == 3 :shrug:

StumblyWumbly
Sep 12, 2007

Batmanticore!
It's frustrating because it almost fits in a comprehension, but I think the end condition means it won't.

QuarkJets
Sep 8, 2008

Macichne Leainig posted:

I'm guessing numpy isn't allowed, because I think you could just do np.random.normal(0, 1) and otherwise the trick would be to just use Python's negative array indexers to add it to the previous value of the array, right?

And the final check is a real simple if absolute value == 3 :shrug:

I think you would be dealing with floating point values, so abs(x) >= 3, but yeah.

I would define a generator. It contains a while loop that computes a new random step from a Gaussian centered at "a" with "sigma" width and adds that value to the previous value, which I guess is probably initialized to "b=a" but could be anything. The loop yields each new value if its absolute value is less than input argument "threshold" otherwise the loop just ends.

This generator gets used in a list comprehension.

E: oh it's a stopping time, not a stopping distance. That's even easier then, the generator can just yield forever and the list comprehension is where you build in the time constraint (it isn't even an input to the walk, it's an external stopping condition). But then specifying that an absolute value is needed doesn't make sense. OP, did you quote the problem correctly?
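A rough sketch of that second reading, where the walk itself never stops and the caller imposes the stopping time from outside. The names `endless_walk` and `n_steps` are made up for illustration, and `itertools.islice` stands in for the list comprehension mentioned above, since it is the usual way to cut off an infinite generator.

```python
import random
from itertools import islice

def endless_walk(start=0.0, mu=0.0, sigma=1.0):
    """Yield a Gaussian random walk forever; the caller decides when to stop."""
    value = start
    while True:
        yield value
        value += random.gauss(mu, sigma)

# External stopping time: keep only the first n_steps values.
# n_steps is an arbitrary choice, not something the exam question specifies.
n_steps = 10
walk = list(islice(endless_walk(), n_steps))
print(walk)
```

Under this reading the absolute-value condition plays no role at all, which is why the wording of the question looks inconsistent.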

QuarkJets fucked around with this message at 23:36 on Nov 10, 2023

BUUNNI
Jun 23, 2023

by Pragmatica

QuarkJets posted:


E: oh it's a stopping time, not a stopping distance. That's even easier then, the generator can just yield forever and the list comprehension is where you build in the time constraint (it isn't even an input to the walk, it's an external stopping condition). But then specifying that an absolute value is needed doesn't make sense. OP, did you quote the problem correctly?

Yes, that's exactly how it was stated in the exam.

by "generator" you mean just a range( ) function?

BUUNNI fucked around with this message at 01:01 on Nov 11, 2023

Son of Thunderbeast
Sep 21, 2002
Are there any restrictions on libraries, like could I just import random for the random number generation? Or are you supposed to write that too?

(Sorry, dunno what these kinds of tests are like, never taken a class)

BUUNNI
Jun 23, 2023

by Pragmatica
The instructor did not say that certain libraries are not allowed so I imagine it's cool :shrug:

FWIW I think like 90% of the class failed the exam lol

Macichne Leainig
Jul 26, 2012

by VG

BUUNNI posted:

Yes, that's exactly how it was stated in the exam.

by "generator" you mean just a range( ) function?

Generator functions in python can be identified by the "yield" keyword as a return instead of er, well, return. It's actually a perfect application for something like this because "yielding" lets the generator function retain stuff in memory, so you can just define your value as a variable inside the generator function and add to that every time you yield.

Something like this thanks to ChatGPT:

Python code:
import random

def random_walk_generator():
    value = 0
    while abs(value) < 3:
        step = random.gauss(0, 1)  # Generate a normally distributed random number with mean 0 and standard deviation 1
        value += step
        yield value

# Using the generator
for value in random_walk_generator():
    print(value)
Also TIL that python's inbuilt random also has a Gaussian distribution function. Neat!

Son of Thunderbeast
Sep 21, 2002
oh drat, nice! I got this far before giving up, it kept giving me pretty even distributions and I had no idea how to make it into a normal distribution.

code:
import random
import time

stop_time = 3
random_walk = []
start_value = 0

start_time = cur_time = time.time()
elapsed = 0

while elapsed < stop_time:
    random_walk.append(start_value)
    start_value += random.randrange(-4, 4, step=1) # just for readability's sake
    cur_time = time.time()
    elapsed = cur_time - start_time
I also tried start_value += random.normalvariate(mu=0.0)

e: Looking at yours, I may have misunderstood the time/absolute value thing

Zoracle Zed
Jul 10, 2001
I believe stopping time in this context simply means the number of steps taken. obnoxious itertools solution:

Python code:
from itertools import takewhile, accumulate, count
from random import gauss

walk = takewhile(lambda x: abs(x) < 3, accumulate(gauss(0, 1) for _ in count()))
stopping_time = 1 + sum(1 for _ in walk )

Zoracle Zed fucked around with this message at 19:42 on Nov 11, 2023

QuarkJets
Sep 8, 2008

The walk distribution needs to be a list, so you could just take the length of that to get the number of steps

nullfunction
Jan 24, 2005

Nap Ghost
Generally I like the obnoxious itertools solution but as mentioned, yeah, doesn't quite meet the requirements laid out in the prompt:

quote:

Generate a Normally distributed random walk with a starting value of 0 as a Python list

I'd combine the techniques to just accumulate from a generator.

Python code:
from typing import Generator
from itertools import takewhile, accumulate
from random import gauss

def gauss_gen(mu: float = 0.0, sigma: float = 1.0) -> Generator[float, None, None]:
    while True:
        yield gauss(mu, sigma)

walk = list(takewhile(lambda x: abs(x) < 3, accumulate(gauss_gen(), initial=0.0)))
stopping_time = len(walk)

print(f"Stopped in {stopping_time} steps:")
print(f"{walk=}")
e: In no way am I saying the original prompt is well-written. I'd be pissed if I was taking the exam.

nullfunction fucked around with this message at 21:39 on Nov 11, 2023

BUUNNI
Jun 23, 2023

by Pragmatica

nullfunction posted:

Generally I like the obnoxious itertools solution but as mentioned, yeah, doesn't quite meet the requirements laid out in the prompt:

I'd combine the techniques to just accumulate from a generator.

Python code:
from typing import Generator
from itertools import takewhile, accumulate
from random import gauss

def gauss_gen(mu: float = 0.0, sigma: float = 1.0) -> Generator[float, None, None]:
    while True:
        yield gauss(mu, sigma)

walk = list(takewhile(lambda x: abs(x) < 3, accumulate(gauss_gen(), initial=0.0)))
stopping_time = len(walk)

print(f"Stopped in {stopping_time} steps:")
print(f"{walk=}")
e: In no way am I saying the original prompt is well-written. I'd be pissed if I was taking the exam.

I've been copying and running all the code you guys have provided and it's interesting to see how different every answer is.

A big thank you to all who attempted this. Like I said, most of us failed the exam and many of the other questions were worded in a similarly confusing way. I think it's because the instructor is an economist and he seems to hate us :v:

I just noticed he gave us the answer that he wanted for this particular question and it doesn't seem to use any Gaussian stats tools...? Very weird.

Python code:
from scipy.stats import norm
s = [0]
while abs(s[-1]) <= 3:
    s.append(s[-1]+norm.rvs())
print(s)

BUUNNI fucked around with this message at 23:38 on Nov 13, 2023

Macichne Leainig
Jul 26, 2012

by VG

BUUNNI posted:

I just noticed he gave us the answer that he wanted for this particular question and it doesn't seem to use any Gaussian stats tools...? Very weird.

Python code:
from scipy.stats import norm
s = [0]
while abs(s[-1]) <= 3:
    s.append(s[-1]+norm.rvs())
print(s)

Not to be that guy, but isn't a normal random variate distribution also a Gaussian distribution?

BUUNNI
Jun 23, 2023

by Pragmatica

Macichne Leainig posted:

Not to be that guy, but isn't a normal random variate distribution also a Gaussian distribution?

I have no idea, I'm just a dumb grad student lol :shrug:

Macichne Leainig
Jul 26, 2012

by VG
I failed Calc 1 twice so maybe I should not be talking authoritatively about math in any manner lol

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

BUUNNI posted:

I've been copying and running all the code you guys have provided and it's interesting to see how different every answer is.

It's worth noting this phenomenon, because this is a very important lesson for two reasons:

1. There's no such thing as 'the only way to do something', even in opinionated languages. Software design is a bit of an art form, and so you can fulfill the same requirement through a bunch of different ways. That being said...
2. Writing code is not the hard part of professional software development, requirements are the hard part. It can be jarring if you've only ever worked on personal projects or school projects that are highly structured, but in business, you'll often be dealing with people that deliver very vague requirements, and they might simply not have the context to understand what's missing. This is an incredibly important part of software development, and is one of the major skills to pick up as you progress in your career. You can tell two devs that you want X, and they might deliver two ENTIRELY DIFFERENT SOLUTIONS because they both are going to fill in the blanks on what you asked for based on their own judgment.

vikingstrike
Sep 23, 2007

whats happening, captain

BUUNNI posted:

I have no idea, I'm just a dumb grad student lol :shrug:

From Wikipedia:

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable.
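A quick way to convince yourself that "normal" and "Gaussian" are the same thing in practice: draw a large sample from each of the standard library's differently named samplers and compare the summary statistics. This is only an empirical sanity check, not a proof, and the sample size and seeds are arbitrary. `scipy.stats.norm.rvs()` is left out so the sketch has no third-party dependency.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable

n = 50_000
gauss_draws = [random.gauss(0, 1) for _ in range(n)]
normal_draws = [random.normalvariate(0, 1) for _ in range(n)]
dist_draws = statistics.NormalDist(mu=0, sigma=1).samples(n, seed=1)

samples = [
    ("gauss", gauss_draws),
    ("normalvariate", normal_draws),
    ("NormalDist", dist_draws),
]
for name, draws in samples:
    mean = statistics.fmean(draws)
    stdev = statistics.stdev(draws)
    print(f"{name:>14}: mean={mean:+.3f} stdev={stdev:.3f}")
```

All three should print a mean close to 0 and a standard deviation close to 1.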

BUUNNI
Jun 23, 2023

by Pragmatica

vikingstrike posted:

From Wikipedia:

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable.

Gotcha! We never used any Gaussian stats tools in any lessons so I know very little about it.

huhu
Feb 24, 2006
I'm writing Python to control a 2d Plotter and make art. I've been wanting to explore randomness more in depth. Currently, I've just been doing stuff like plotting a series of points with random() as in the code below. (Code might not actually work, was just trying to give a simplified idea of what I've been working on)

code:
from random import random, randint

points = []

x_location = 0
y_location = 0

while True:
  # Pick a direction (+1 or -1) and a step size independently for each axis
  x_direction = 1 if random() <= 0.5 else -1
  y_direction = 1 if random() <= 0.5 else -1

  x_magnitude = randint(0, 10)
  y_magnitude = randint(0, 10)

  x_location += x_direction * x_magnitude
  y_location += y_direction * y_magnitude

  points.append((x_location, y_location))

  # at some point break

# plotter is the plotter interface, not shown here
plotter.plot(points)

I'm wondering if there are more interesting explorations I could do than this very simplistic approach I have in my code?

QuarkJets
Sep 8, 2008

huhu posted:

I'm writing Python to control a 2d Plotter and make art. I've been wanting to explore randomness more in depth. Currently, I've just been doing stuff like plotting a series of points with random() as in the code below. (Code might not actually work, was just trying to give a simplified idea of what I've been working on)

code:
from random import random, randint

points = []

x_location = 0
y_location = 0

while True:
  # Pick a direction (+1 or -1) and a step size independently for each axis
  x_direction = 1 if random() <= 0.5 else -1
  y_direction = 1 if random() <= 0.5 else -1

  x_magnitude = randint(0, 10)
  y_magnitude = randint(0, 10)

  x_location += x_direction * x_magnitude
  y_location += y_direction * y_magnitude

  points.append((x_location, y_location))

  # at some point break

# plotter is the plotter interface, not shown here
plotter.plot(points)

I'm wondering if there are more interesting explorations I could do than this very simplistic approach I have in my code?

You could consider using numpy.random if you're going to generate a lot of points (like more than 10). I'd probably combine the magnitude and direction randomness, generate values from (-10 to 10) instead.
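A sketch of the "combine magnitude and direction" idea, written with the standard library so it runs without NumPy. `n_points` and the seed are arbitrary. With NumPy the same thing would presumably be one bulk integer draw from a random generator followed by a cumulative sum.

```python
import random
from itertools import accumulate

random.seed(1)  # arbitrary, only so the run is repeatable

n_points = 200  # hypothetical number of steps

# One signed draw per axis replaces the separate direction and magnitude draws.
dx = [random.randint(-10, 10) for _ in range(n_points)]
dy = [random.randint(-10, 10) for _ in range(n_points)]

# Running sums turn the steps into positions, starting from the origin.
xs = accumulate(dx, initial=0)
ys = accumulate(dy, initial=0)
points = list(zip(xs, ys))

print(points[:5])
```

One small difference from the original: multiplying a random sign by `randint(0, 10)` makes a zero step twice as likely as any other single value, while `randint(-10, 10)` is uniform over all 21 values.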

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
One possibility is to pick a random direction and distance, instead of doing X and Y separately.

Another possibility is to draw Bezier curves with randomly generated control points.

You could bias your diffs in some direction - so that it looks generally random at the small scale, but slowly tracks across the plot when you look at it overall.

--

Really there are lots of fun things you can do with this.
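Two of those ideas, the random direction plus distance and the slow bias, fit in a few lines. This is only a sketch: `drifting_walk`, `max_step`, and `drift` are made-up names, and the default values are arbitrary.

```python
import math
import random

def drifting_walk(n_steps, max_step=10.0, drift=(0.5, 0.0), seed=None):
    """Random walk with polar steps plus a constant drift added each step."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    points = [(x, y)]
    for _ in range(n_steps):
        # Choose a heading and a distance instead of separate x and y offsets.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        distance = rng.uniform(0.0, max_step)
        # The drift term nudges every step the same way, so the path wanders
        # locally but tracks across the page overall.
        x += distance * math.cos(angle) + drift[0]
        y += distance * math.sin(angle) + drift[1]
        points.append((x, y))
    return points

points = drifting_walk(500, seed=42)
print(points[-1])
```

Setting `drift=(0.0, 0.0)` gives the plain random-direction walk. A larger drift relative to `max_step` makes the overall trend more obvious.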

QuarkJets
Sep 8, 2008

BUUNNI posted:

I've been copying and running all the code you guys have provided and it's interesting to see how different every answer is.

A big thank you to all who attempted this. Like I said, most of us failed the exam and many of the other questions were worded in a similarly confusing way. I think it's because the instructor is an economist and he seems to hate us :v:

I just noticed he gave us the answer that he wanted for this particular question and it doesn't seem to use any Gaussian stats tools...? Very weird.

Python code:
from scipy.stats import norm
s = [0]
while abs(s[-1]) <= 3:
    s.append(s[-1]+norm.rvs())
print(s)

This was my first solution:
Python code:
import random

def random_walk(u=0, sigma=1, threshold=3):
    """Accumulate Gaussian steps until an absolute threshold is reached."""
    value = u
    yield value
    while abs(value) <= threshold:
        value += random.gauss(u, sigma)
        yield value

result = list(random_walk())
This is more what a computational scientist would write; the professor's is more what a data analyst would write. Both approaches are valid. This callable is more portable, reusable, and easier to read, but depending on the context those features may not matter.

All of those end-of-list accesses make me cringe a little but it's perfectly valid

random.gauss(0, 1) draws from the same distribution as scipy.stats.norm.rvs() with its default arguments

Qtamo
Oct 7, 2012
This would probably fit in a more security-adjacent thread as well but since we're dealing with Python, I'll post here. This is a "I got handed this and uhhh, is this actually sane?" kind of a thing. I also might be using some terminology wrong, since I don't usually deal with hashing or crypto-adjacent stuff, so sorry in advance.

I can't go too deeply into specifics, but there's a small database table (thousands, not tens of thousands of rows) of non-unique string identifiers. This table needs to be modified during an ETL task so that each row remains, but the identifiers themselves are transformed into, well, something else that still will be the same for each unique string and be reversible at a much later date if need be - the original list of possible identifiers will be available from elsewhere, so it'd need to get matched to the transformed identifiers. So, if I've got a table of 3 Alices and 3 Bobs, I need 3 xyz's and 3 abc's that can be reverted or matched somehow to Alice and Bob. Before anyone gets their blood pressure up, these are not passwords or anything of the sort, and an outside attacker getting the original list wouldn't be the end of the world, but it's still data that we'd prefer to keep safe.

The current implementation that's been done before me just hashes the strings using sha3-256 with a saltphrase in the python code that's used for the ETL task. The idea (presumably, the comments are kinda scarce) is that the original list of identifiers could be used together with the salt to match the hashes to the original strings - the salt itself is in our password manager and only gets called on task execution. Like I said, I don't usually deal with anything like this, so: is this actually sensible or safe? Due to reasons, an outsider that gains access to the data could probably guess the length and structure of the original identifier pretty easily, or possibly even gain the list of original identifiers. From my admittedly poor understanding these issues would make bruteforcing effective since it'd basically limit the character space by some amount.

If this is insecure and/or insane and should be improved, what's the best way? From a bit of searching I initially thought of using the cryptography module, basically https://cryptography.io/en/latest/fernet/#using-passwords-with-fernet, but I'm not sure on account of never implementing anything like this :kiddo:
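For reference, the salted-hash scheme described above might look roughly like this. It is a sketch under assumptions, not the actual ETL code: the helper name `pseudonymize`, the sample names, the salt value, and prepending the salt to the identifier are all made up for illustration.

```python
import hashlib

def pseudonymize(identifier: str, salt: str) -> str:
    """Return the SHA3-256 hex digest of salt + identifier."""
    return hashlib.sha3_256((salt + identifier).encode("utf-8")).hexdigest()

# Hypothetical values; in the described setup the salt lives in a password
# manager and is only fetched when the task runs.
salt = "example-salt-not-a-real-secret"
rows = ["Alice", "Bob", "Alice", "Bob", "Alice", "Bob"]

# Forward direction (the ETL step): equal identifiers map to equal tokens.
tokens = [pseudonymize(name, salt) for name in rows]

# Reverse direction: re-hash the known list of possible identifiers and match.
known_identifiers = ["Alice", "Bob", "Carol"]
lookup = {pseudonymize(name, salt): name for name in known_identifiers}
recovered = [lookup[token] for token in tokens]

print(len(set(tokens)), recovered == rows)
```

The mapping is deterministic, so three Alices still produce three identical tokens. The reverse lookup is exactly the operation an attacker would run if they held both the salt and a list of candidate identifiers, so in this scheme the protection rests on keeping the salt secret.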

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
What are you actually worried about here?

Is there a concern that someone might get access to this database table but somehow not the rest of your database where the original list of identifiers is stored?

Qtamo
Oct 7, 2012
Knew I forgot something :doh: The original db tables stay in our internal environment, but the transformed data is a part of a larger dataset that gets sent to another party, and we can't guarantee that the data is safe there (well, can't really guarantee it in our own environment either but you get my point). Of course there's contract stipulations etc., but we'd rather keep things as safe as reasonably possible, since this other party doesn't need the original identifiers (but needs to know which have identical identifiers).
