bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost
itertools is one of the great parts of python, except a very significant and good as hell portion of it is not actually part of the library, just shown in the docs as the "itertools recipes"

you want this one in it

code:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
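For reference, here's the recipe run against its own docstring example (copied self-contained so it executes as-is):

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# The single iterator is shared n ways, so zip_longest pulls
# consecutive elements into each tuple and pads the final one.
chunks = ["".join(group) for group in grouper("ABCDEFG", 3, "x")]
print(chunks)  # ['ABC', 'DEF', 'Gxx']
```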


SurgicalOntologist
Jun 17, 2004

cinci zoo sniper posted:

If your dataset has only 10000 records for the test then this will be an off-by-one error: the maximum i is 9999 and can never equal 10000, so the last modulo check will resolve to not True.

It's not running off data--the generator will go forever without the break. That's why I'm confused: the second if statement must be True, or the loop would never stop.

After a few more runs though, it looks like it's not consistent behavior. It's hit 10,000 iterations 3 out of 10 times. Something strange going on.

bob dobbs is dead posted:

itertools is one of the great parts of python, except a very significant and good as hell portion of it is not actually part of the library, just shown in the docs as the "itertools recipes"

you want this one in it

code:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

Not a bad idea to factor out the grouping aspect. I already have toolz as a dependency, so I'll use partition_all. Good call. The implementation is pretty obtuse, but looks like toolz does it the same way.
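For readers without toolz: partition_all differs from the grouper recipe in that it never pads, the last chunk is just short. Its behavior can be sketched with the stdlib (this is an illustrative reimplementation, not toolz's actual code):

```python
from itertools import islice

def partition_all_sketch(n, seq):
    """Illustrative stdlib sketch of toolz.partition_all's behavior:
    tuples of length n, with a short final tuple instead of a fill value."""
    it = iter(seq)
    while True:
        chunk = tuple(islice(it, n))
        if not chunk:
            return
        yield chunk

print(list(partition_all_sketch(3, "ABCDEFG")))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]
```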

SurgicalOntologist
Jun 17, 2004

Oh, figured it out. https://github.com/tqdm/tqdm/issues/613

KICK BAMA KICK
Mar 2, 2009

SurgicalOntologist posted:

And if you haven't seen it before, tqdm is just a progress bar library (a pretty great one)
Oh poo poo, thanks! I was just thinking I'd like to have exactly this for a thing but hadn't bothered to look for one yet.

cinci zoo sniper
Mar 15, 2013

Oh. :rip:

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

bob dobbs is dead posted:

itertools is one of the great parts of python, except a very significant and good as hell portion of it is not actually part of the library, just shown in the docs as the "itertools recipes"


This has always made me laugh.

"Let's write a bunch of useful and helpful code. Then let's take it and put it in the documentation instead of making it actually importable."

Dominoes
Sep 20, 2007

I'd be surprised if this is the only module attempting to correct that.

salisbury shake
Dec 27, 2011
I'm playing around with NumPy and Advent of Code. I'm working on vectorizing AoC Day 3's challenge.

Python code:
from typing import NamedTuple, List

import numpy as np

# AoC 2018 day 3: the fabric is 1000x1000 inches
# (constant was defined elsewhere in the original script)
SIDE_LENGTH = 1000

class Claim(NamedTuple):
    id: int
    x: int
    y: int
    width: int
    height: int

def get_fabric(claims: List[Claim], length: int = SIDE_LENGTH) -> np.ndarray:
    fabric = np.zeros((length, length), dtype=np.int8)
    increment = 1
    
    # add claims to fabric
    for claim in claims:
        x_indices = slice(claim.x, claim.x + claim.width)
        y_indices = slice(claim.y, claim.y + claim.height)
        indices = x_indices, y_indices
        
        np.add.at(fabric, indices, increment)
    
    return fabric

def count_overlapping(fabric: np.ndarray) -> int:
    # count indices where value is > 1
    return len(np.where(fabric > 1)[0])
I'm just learning to use NumPy and had to poke around the docs to do what I want. While the above code finds the solution correctly, I'm unsure if I'm using NumPy efficiently or even canonically.

For example, in get_fabric(), I'm trying to minimize sequential Python code in favor of using NumPy's ufuncs and routines where possible, but I'm still calling np.add.at() over a thousand times in total. That feels wrong, but I don't have the context to really know if/why it's wrong.

salisbury shake fucked around with this message at 06:30 on Dec 7, 2018

breaks
May 12, 2001

Thermopyle posted:

This has always made me laugh.

"Let's write a bunch of useful and helpful code. Then lets take it and put it in the documentation instead of making it actually importable."

Brought to you by the makers of dataclasses, asyncio, mypy, and assignment expressions.

Smart people making reasonable decisions leads to an astounding variety of useful and also very dumb poo poo.

Nippashish
Nov 2, 2005

Let me see you dance!

salisbury shake posted:

I'm just learning to use NumPy and had to poke around the docs to do what I want. While the above code finds the solution correctly, I'm unsure if I'm using NumPy efficiently or even canonically.

For example, in get_fabric(), I'm trying to minimize sequential Python code in favor of using NumPy's ufuncs and routines where possible, but I'm still calling np.add.at() over a thousand times in total. That feels wrong, but I don't have the context to really know if/why it's wrong.

The problem is that the pattern of indices you need to accumulate into is a bunch of irregularly scattered rectangles. There is good syntax for working with single rectangles of locations, or with irregularly scattered single locations, but irregularly scattered rectangles is a case that is not handled well.

If you build all of the indexes explicitly (e.g. using np.meshgrid or np.mgrid) and concatenate them together then you could probably get away with a single call to np.add.at (by changing the problem from indexing at an irregular collection of rectangles to indexing at an irregular collection of points), but this doesn't solve the real problem, which is that you need to loop over all of the claims to collect the indices in the first place. It might be worth trying as an exercise to learn the tools, but it won't be any more efficient and will probably be harder to understand.

For count_overlapping it would be more idiomatic to write np.sum(fabric > 1), but your way is fine too. It would also be more idiomatic to write fabric[indices] += increment instead of np.add.at(fabric, indices, increment) unless you are making use of the fact that np.add.at will increment repeated indices multiple times.
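The np.add.at vs fancy-index distinction is easy to miss, so here it is in minimal form (a toy sketch, not the fabric problem itself):

```python
import numpy as np

a = np.zeros(3, dtype=int)
idx = np.array([0, 0, 1])

# Fancy-index += is buffered: the repeated index 0 only takes one increment.
b = a.copy()
b[idx] += 1
print(b)  # [1 1 0]

# np.add.at is unbuffered: index 0 is incremented twice.
c = a.copy()
np.add.at(c, idx, 1)
print(c)  # [2 1 0]

# And the idiomatic overlap count: sum of a boolean mask.
fabric = np.array([[0, 1], [2, 3]])
print(np.sum(fabric > 1))  # 2
```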

Nippashish fucked around with this message at 09:21 on Dec 7, 2018

Foxfire_
Nov 8, 2010

salisbury shake posted:

I'm just learning to use NumPy and had to poke around the docs to do what I want. While the above code finds the solution correctly, I'm unsure if I'm using NumPy efficiently or even canonically.

For example, in get_fabric(), I'm trying to minimize sequential Python code in favor of using NumPy's ufuncs and routines where possible, but I'm still calling np.add.at() over a thousand times in total. That feels wrong, but I don't have the context to really know if/why it's wrong.

Outside of a code-golf type fun challenge, there's not really a reason to vectorize this kind of thing. It will make it less readable, slower, and more memory hungry than a loop version.

The problem is naturally solvable and readable with a loop, and loops are what computers are good at. The problem with doing a naive loop is that Python is very, very slow.

To get your math to run at a reasonable speed, you need to get the computation out of Python. You can do that by using NumPy ufuncs so that the only bit of Python code that runs is invoking the ufunc a few times and having each ufunc [which is implemented in C] do lots of math. But that means you have to figure out a way to vectorize instead of doing it the natural way.
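That "keep the loop out of Python" point in toy form: same arithmetic, two execution paths (illustrative only, not the fabric problem):

```python
import numpy as np

xs = np.arange(10_000)

# Pure-Python loop: the interpreter runs bytecode for every element.
total_loop = 0
for x in xs:
    total_loop += int(x) * 2

# Vectorized: one Python-level call; the multiply and sum happen in C.
total_vec = int((xs * 2).sum())
print(total_loop == total_vec)  # True
```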

An alternative is to use numba. It provides a decorator that you can apply to your Python function that says "instead of executing this function in the Python interpreter, just-in-time compile it into normal instructions the first time it is used and automatically generate the boilerplate to pass data back and forth". It can only work with a subset of the Python language, but it's often good enough.

Taking your get_fabric function and massaging it a bit (I don't remember offhand if namedtuples are supported in numba):

code:
import numba
import numpy as np

@numba.jit(nopython=True)
def get_fabric(claims, length):
    fabric = np.zeros((length, length), dtype=np.int8)

    # add claims to fabric: claims is an (n, 4) int array of x, y, width, height
    for claim_row in range(claims.shape[0]):
        claim = claims[claim_row, :]
        x = claim[0]
        y = claim[1]
        width = claim[2]
        height = claim[3]

        fabric[y:y+height, x:x+width] += 1

    return fabric
will be efficient.

Bonus thoughts for refining your solution:
- With your solution, what happens if you have a million elves on a 10x10 fabric?
- Can you avoid having to allocate the length x length array? i.e. can you make it work if the fabric is 10000000 x 10000000 as long as the claim list is short?

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I got another easy/dumb one.

Have df["Time"] with
code:
2 days
NULL
NULL
3 yrs
NULL
2 wks
1 yrs1 day
I want to convert these values to days where NULL = 0. I do this once a week on about 2k rows.

code:
temp = df["Time"].str.split()
table = []

for i in range(len(temp)):
    if temp[i][1] == "days":
        temp[i][0] = int(temp[i][0]) 
    elif pd.isnull(temp[i]):
        temp[i] = [0,0]
    elif temp[i][1] == "yrs":
        temp[i][0] = int(temp[i][0])*365
    elif temp[i][1] == "mths":
    temp[i][0] = int(temp[i][0])*30
    elif temp[i][1] == "wks":
        temp[i][0] = int(temp[i][0])*7
    elif temp[i][1] == "yrs1" and temp[i][2] == "days":
        temp[i][0] = int(temp[i][0])*365 + 1
    else:
        temp[i][0] = "some other regex"
    table.append([temp[i][0]])
    print([temp[i]])

df["TimeDays"] = pd.Series(table)
This throws the following when it gets to NULL:
code:
      4 for i in range(len(temp)):
----> 5     if temp[i][1] == "days":
      6         temp[i][0] = int(temp[i][0])
      7     elif pd.isnull(temp[i]):

TypeError: 'float' object is not subscriptable

But I think the real issue is the "isnull" part.

I think it's telling me "hey, there is no ith element to look up"... but there is, right?

cinci zoo sniper
Mar 15, 2013

CarForumPoster posted:

I got another easy/dumb one.

Have df["Time"] with
code:
2 days
NULL
NULL
3 yrs
NULL
2 wks
1 yrs1 day
I want to convert these values to days where NULL = 0. I do this once a week on about 2k rows.

code:
temp = df["Time"].str.split()
table = []

for i in range(len(temp)):
    if temp[i][1] == "days":
        temp[i][0] = int(temp[i][0]) 
    elif pd.isnull(temp[i]):
        temp[i] = [0,0]
    elif temp[i][1] == "yrs":
        temp[i][0] = int(temp[i][0])*365
    elif temp[i][1] == "mths":
    temp[i][0] = int(temp[i][0])*30
    elif temp[i][1] == "wks":
        temp[i][0] = int(temp[i][0])*7
    elif temp[i][1] == "yrs1" and temp[i][2] == "days":
        temp[i][0] = int(temp[i][0])*365 + 1
    else:
        temp[i][0] = "some other regex"
    table.append([temp[i][0]])
    print([temp[i]])

df["TimeDays"] = pd.Series(table)
This throws the following when it gets to NULL:
code:
      4 for i in range(len(temp)):
----> 5     if temp[i][1] == "days":
      6         temp[i][0] = int(temp[i][0])
      7     elif pd.isnull(temp[i]):

TypeError: 'float' object is not subscriptable

But I think the real issue is the "isnull" part.

I think it's telling me "hey, there is no ith element to look up"... but there is, right?

It tries to get the 2nd element of the NULL that temp[i] corresponds to. Also, I would suggest using dateparser for this.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

cinci zoo sniper posted:

It tries to get the 2nd element of the NULL that temp[i] corresponds to. Also, I would suggest using dateparser for this.

Good call. This works and should be more robust than my previous one:

code:
import dateparser
from datetime import datetime

temp = df["Time"]
temp = temp.fillna("0 days")
temp = temp.str.replace("yrs","years ")
temp = temp.str.replace("mths","months ")
temp = temp.str.replace("wks","weeks ")
temp = temp.str.strip()
temp2 = temp.apply(lambda x: (datetime.today()-dateparser.parse(x)).days)
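If you'd rather stay stdlib-only, the unit mapping can also be done in one regex pass. Everything here (UNIT_DAYS, to_days) is a made-up name for illustration; note it happens to handle the glued-together "1 yrs1 day" case too:

```python
import re

# days per unit token as they appear in the column
UNIT_DAYS = {"day": 1, "days": 1, "wk": 7, "wks": 7,
             "mth": 30, "mths": 30, "yr": 365, "yrs": 365}

def to_days(value):
    # None or NaN (NaN != NaN) counts as 0 days
    if value is None or value != value:
        return 0
    total = 0
    # grabs every number-unit pair, even without spaces: "1 yrs1 day"
    for num, unit in re.findall(r"(\d+)\s*([a-z]+)", str(value)):
        total += int(num) * UNIT_DAYS.get(unit, 0)
    return total

print(to_days("1 yrs1 day"))  # 366
print(to_days("2 wks"))       # 14
print(to_days(None))          # 0
```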

Jose Cuervo
Aug 25, 2004
I am trying to compute a kernel density estimate for the rate of incidents around the state of Virginia, similar to what is done here.

My previous attempt, using 'euclidean' as the metric, produces this KDE plot:


I then realised that I should be using 'haversine' as the metric because I have a two dimensional vector space as described here. I have had to modify the code from the Python Data Science Handbook example because I am using geopandas to plot the state map instead of the matplotlib basemap, and my data is in a dataframe and not numpy arrays like the data seems to be in the linked example.

Here is my code:
Python code:
# Plot map of Virginia
fig, subplot = plt.subplots(figsize=(20, 10))
subplot.set_aspect('equal')
subplot.axis('off')
subplot.grid(False)    
va_gdf.plot(ax=subplot, facecolor='none', alpha=0.75, linewidth=1.5, edgecolor='#444444')
va_counties_gdf.plot(ax=subplot, facecolor='none', alpha=0.75, linewidth=1.5, edgecolor='#444444')

# https://jakevdp.github.io/PythonDataScienceHandbook/05.13-kernel-density-estimation.html
# Peform the kernel density estimate   
kde = KernelDensity(bandwidth=.03, metric='haversine')  # Haversine requires latitude, longitude order
latlon = np.radians(np.vstack([df['Latitude'], df['Longitude']]).T)
kde.fit(latlon)

minx, miny, maxx, maxy = va_gdf.total_bounds
X, Y = np.mgrid[minx:maxx:100j, miny:maxy:100j]
xy = np.radians(np.vstack([X.ravel(), Y.ravel()]).T)
Z = np.exp(kde.score_samples(xy))
Z = Z.reshape(X.shape)

subplot.scatter(x=df['Longitude'], y=df['Latitude'], s=10, alpha=0.5)
subplot.imshow(Z, cmap=cmap_name, extent=[minx, maxx, miny, maxy], alpha=0.5)
plt.tight_layout()
plt.show() 
and this is the result


There are about 8000 points plotted on the map with some well defined clusters of points. Unfortunately the KDE plot does not seem to be picking up on them as it does in the linked example. I have tried playing with the bandwidth for the KDE but this does not seem to change things for the better - for example when the bandwidth is set to 0.3 this is the resulting KDE plot:


I was expecting the KDE using 'haversine' as the metric to be slightly different to the original KDE plot, but still somewhat similar. I think that the KDE plots I have now are incorrect but I cannot tell what I am doing wrong. Thoughts?

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost
euclidean distance metric is fine unless you're using the map for navigation tho?

2-space is within the set of n-spaces, yes

what do the latitude and longitude data points look like?

Jose Cuervo
Aug 25, 2004

bob dobbs is dead posted:

euclidean distance metric is fine unless you're using the map for navigation tho?

2-space is within the set of n-spaces, yes

what do the latitude and longitude data points look like?

From the Python Data Science Handbook it seemed to say that you should use 'haversine' when performing KDE where the points are latitude and longitude - that is why I went from using 'euclidean' to 'haversine'. By 'fine' do you mean the error in the distances between points (because the distances will not be great circle distances but just straight line distances) is small enough to be ignored if you are not trying to navigate between points?

I am not sure what the '2-space is within the set of n-spaces, yes' comment means exactly.

The Longitude minimum and maximum are -83.6311 and -75.3771, while the Latitude maximum and minimum are 36.5454 and 39.4172. I believe they are in units of decimal degrees. Is this what you were asking?

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost

Jose Cuervo posted:

From the Python Data Science Handbook it seemed to say that you should use 'haversine' when performing KDE where the points are latitude and longitude - that is why I went from using 'euclidean' to 'haversine'. By 'fine' do you mean the error in the distances between points (because the distances will not be great circle distances but just straight line distances) is small enough to be ignored if you are not trying to navigate between points?

I am not sure what the '2-space is within the set of n-spaces, yes' comment means exactly.

The Longitude minimum and maximum are -83.6311 and -75.3771, while the Latitude maximum and minimum are 36.5454 and 39.4172. I believe they are in units of decimal degrees. Is this what you were asking?

what i thought that you were thinking is that "oh, you can't use euclidean at all for this", but the real statement to make is "euclidean will introduce distortions in this. but if you're overlaying it over a flat projection in the first place, the distortions will basically look like the map distortions". the other possible misunderstanding is just because euclidean distance is viable for any dimensional space (n-spaces) you think you can't just use it for 2-dimensional space (2-space)

also, the underlying mistake is prolly you need it in radians (use this one https://docs.scipy.org/doc/numpy-1.3.x/reference/generated/numpy.deg2rad.html or https://docs.python.org/2/library/math.html#math.radians)

bob dobbs is dead fucked around with this message at 20:33 on Dec 7, 2018

Jose Cuervo
Aug 25, 2004

bob dobbs is dead posted:

what i thought that you were thinking is that "oh, you can't use euclidean at all for this", but the real statement to make is "euclidean will introduce distortions in this. but if you're overlaying it over a flat projection in the first place, the distortions will basically look like the map distortions". the other possible misunderstanding is just because euclidean distance is viable for any dimensional space (n-spaces) you think you can't just use it for 2-dimensional space (2-space)

also, the underlying mistake is prolly you need it in radians

Thanks for the clarification regarding the 'euclidean' versus 'haversine' issue.

I am using radians though: in the code I posted I convert the latlon values to radians using numpy.radians() and I do the same for the sample points with the line
Python code:
xy = np.radians(np.vstack([X.ravel(), Y.ravel()]).T)
which is why I am so confused.

EDIT: But since it seems reasonable to use 'euclidean' I will just use that and not worry about why this is not working.
Edit2: Does posting on the forums count as duck debugging? I noticed that I had the sample points as (long, lat) pairs, not (lat, long) pairs, i.e. the line
Python code:
xy = np.radians(np.vstack([X.ravel(), Y.ravel()]).T)
should be
Python code:
xy = np.radians(np.vstack([Y.ravel(), X.ravel()]).T)
That was the issue.
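For anyone tripped by the same thing: the haversine metric expects (latitude, longitude) pairs, in radians, in that order. A stdlib sketch of the great-circle formula shows why the order matters, since at high latitudes a degree of longitude is much shorter than a degree of latitude:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance; inputs in decimal degrees, output in km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Two points at 60 N, 10 degrees of longitude apart: the cos(lat)
# factor shrinks the east-west distance to roughly 555 km.
d = haversine_km(60, 0, 60, 10)

# Feed the same numbers in (lon, lat) order and those 10 degrees get
# treated as latitude, giving roughly double the distance.
d_swapped = haversine_km(0, 60, 10, 60)
print(round(d), round(d_swapped))  # 555 1112
```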

Jose Cuervo fucked around with this message at 20:40 on Dec 7, 2018

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost
here's to not actually lookin at your code lol

accipter
Sep 12, 2003

SurgicalOntologist posted:

I'm doing a bunch of computations and saving them to a database. The following function is supposed to be committing to the database every 100 results (based on the variable commit_interval), and stopping after 10,000 (the value of total). But I'm not getting 10,000 results on each backtest_set, instead I'm getting 9,900. Can anyone spot the error? Am I making a basic off-by-one -ish mistake or do I have some weird race condition? I can't figure it out. Here's the code.
Python code:
    results = []
    for i, result in enumerate(tqdm(
            backtest_set.run(),
            unit='entry', desc='Backtesting', total=total, unit_scale=True,
    ), 1):
        results.append(dict(backtest_id=backtest_set.id, fantasy_points_hundredths=result))
        if not i % commit_interval or i == total:
            db.execute(models.BacktestResult.__table__.insert(), results)
            db.commit()
            results = []
        if i == total:
            break
Any ideas?

The actual work here is done inside the generator function backtest_set.run(). And if you haven't seen it before, tqdm is just a progress bar library (a pretty great one); as I'm using it, it just wraps an iterable. The only other things there are my models and sqlalchemy session db.

You are testing that i == total, but on the last iteration i == (total - 1).

SurgicalOntologist
Jun 17, 2004

Nope, if that were the case, I'd get 10,001 results not 9,900 (the iterator is infinite so the break statement is the only thing stopping it).

Note that I use the second argument to enumerate to start the count at 1, to prevent the issue you mention.

It turned out to be a known tqdm bug, a weird async interaction that occurs when the progress bar is killed with a break statement.

School of How
Jul 6, 2013

quite frankly I don't believe this talk about the market
Does anybody know of a good python library that has an implementation of "nth prime" that works in both python 2 and 3?

>>> from some_library import nth_prime
>>> nth_prime(999)
7907

You'd think this would be easy to find, but all I can find from googling are snippets on stackoverflow that only work in python 2.

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost

School of How posted:

Does anybody know of a good python library that has an implementation of "nth prime" that works in both python 2 and 3?

>>> from some_library import nth_prime
>>> nth_prime(999)
7907

You'd think this would be easy to find, but all I can find from googling are snippets on stackoverflow that only work in python 2.

thats cuz your search keyword is "sieve of eratosthenes" or "number sieve", not "nth prime"

its not precisely what you want but its real close

School of How
Jul 6, 2013

quite frankly I don't believe this talk about the market

bob dobbs is dead posted:

thats cuz your search keyword is "sieve of eratosthenes" or "number sieve", not "nth prime"

its not precisely what you want but its real close

That's a completely different algorithm. A sieve gives you the primes below a certain number, which is different from the nth prime...

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost

School of How posted:

That's a completely different algorithm. A sieve gives you the primes below a certain number, which is different from the nth prime...

cache the sieve results in a list? or are we talkin O(10e6) th prime here
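The "grow the sieve until it has enough primes" idea in sketch form, matching the earlier nth_prime(999) == 7907 example (hypothetical helper, not a library function):

```python
def nth_prime(n):
    """Grow a Sieve of Eratosthenes until it has seen at least n primes,
    then return the nth (1-indexed: nth_prime(999) -> 7907).
    Works identically on Python 2 and 3."""
    limit = 16
    while True:
        sieve = [True] * limit
        sieve[0] = sieve[1] = False
        for i in range(2, int(limit ** 0.5) + 1):
            if sieve[i]:
                # knock out multiples starting at i*i
                for j in range(i * i, limit, i):
                    sieve[j] = False
        primes = [i for i, is_p in enumerate(sieve) if is_p]
        if len(primes) >= n:
            return primes[n - 1]
        limit *= 2  # not enough primes yet, double the sieve and retry

print(nth_prime(999))  # 7907
```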

mewbert
Jun 12, 2011
hatsuune miku wishes you

a happy and safe halloween

School of How posted:

Does anybody know of a good python library that has an implementation of "nth prime" that works in both python 2 and 3?

>>> from some_library import nth_prime
>>> nth_prime(999)
7907

You'd think this would be easy to find, but all I can find from googling are snippets on stackoverflow that only work in python 2.

The SymPy library has such a function.

Gothmog1065
May 14, 2009

QuarkJets posted:

Not using self means that you've defined a class attribute. This is valid, all instances of the class are accessing a shared version of the variable and that may be the behavior that you want. foo.num and bar.num will always be the same value in your code

Using self creates an instance attribute, meaning all instances of the classes are accessing their own version of the variable. Declaring "self.num = 0" in your constructor creates an instance attribute, and then foo.num and bar.num can be different values

Ah, okay. That actually explains the variable instance better. For some reason I thought if defined outside of the __init__, it was a class variable, while inside the __init__ it was an instance. That actually helps clear up a lot, thank you!

MISAKA
Dec 10, 2018
When you make a new class, what actually happens is that everything inside the class statement is run like a normal python function body, and then all the local variables that exist at the end of that function get added to the class dictionary. And there is absolutely nothing special about the self. syntax: you are just setting an attribute on an object (one that happens to be your current instance).
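A rough sketch of that mechanism using type() directly (illustrative only; the real machinery has more moving parts, e.g. metaclasses):

```python
def body():
    # pretend this is the suite of a `class Demo:` statement
    x = 1 + 1
    def double(self):
        return self.x * 2
    return locals()  # {'x': 2, 'double': <function>}

# Roughly what the class statement does: run the body, hand the
# resulting locals to type() as the class dictionary.
Demo = type("Demo", (), body())
d = Demo()
print(Demo.x, d.double())  # 2 4
```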

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

^^^ :yeah: that's what you get when you leave a window open

Gothmog1065 posted:

Ah, okay. That actually explains the variable instance better. For some reason I thought if defined outside of the __init__, it was a class variable, while inside the __init__ it was an instance. That actually helps clear up a lot, thank you!

If it's not clear, Python's a dynamic language where you can just assign attributes and functions to objects whenever you like. So you can take any thing and go bitmap.butts = 101 or whatever you like - under the hood there's a local namespace with a dictionary of attributes and functions, and you can add and remove from that however you want

So when you define a class, you can set attributes on the class object itself - you're adding to the class's dictionary, so any instances can see that stuff in a higher scope. So that works as a class variable - instances can reference it, and you can just reference it as an attribute on the class itself if you like, MyClass.x

What you're doing in the __init__ constructor is taking the instance itself (passed in as a parameter called self by convention) and just assigning attributes to that object. So when you do self.x = 69 you're adding that attribute to that object's local dictionary, which only affects that instance. So that basically works as an instance variable. There's nothing special about it - you're just adding that property to that object that was passed in

The maybe weird thing is that you define all these functions in the class as def whatever(self, x, y), but when you call the method on an instance, you just do thing.whatever(x, y). Under the hood, it rewrites the call and does MyClass.whatever(thing, x, y) - it calls the function in the class object, and passes in the instance object, so you can mess with it in the body of the function. That's why they all have that self parameter - so they can reference and affect the actual instance. The language sugar takes care of rewriting those calls for you, but in the function itself you have to work with the instance parameter explicitly
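The whole walkthrough above in runnable form (a minimal sketch; the names are made up):

```python
class Counter:
    kind = "demo"          # class attribute: lives in Counter's dictionary

    def __init__(self):
        self.count = 0     # instance attribute: lives in this object's dictionary

    def bump(self):
        self.count += 1

a, b = Counter(), Counter()
a.bump()
print(a.count, b.count)    # 1 0 -- each instance has its own count
print(a.kind is b.kind)    # True -- both resolve to Counter.kind

# a.bump() is sugar for calling the function on the class with the
# instance passed in as self:
Counter.bump(a)
print(a.count)             # 2
```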

The XKCD Larper
Mar 1, 2009

by Lowtax
A question about class variables. I watched the MIT OCW video about class objects, and the instructor says that class variables should be set and retrieved using getter and setter functions rather than assigned directly. They say this makes renaming variables within the class easier. Is there any other reason to work this way?

Dominoes
Sep 20, 2007

The XKCD Larper posted:

A question about class variables. I watched the MIT OCW video about class objects, and the instructor says that class variables should be set and retrieved using getter and setter functions rather than assigned directly. They say this makes renaming variables within the class easier. Is there any other reason to work this way?
Sounds like Java-style. Ignore.

QuarkJets
Sep 8, 2008

The XKCD Larper posted:

A question about class variables. I watched the MIT OCW video about class objects, and the instructor says that class variables should be set and retrieved using getter and setter functions rather than assigned directly. They say this makes renaming variables within the class easier. Is there any other reason to work this way?

Using setters/getters means being able to do something with those values before storing them or before handing them to the requester, which can be extremely useful. Really basic example: say you want a stored value describing an angle to always be in the range 0-360; if you use a setter, your user can pass in values outside that range and your setter can fix the input or raise an exception or whatever you want

But you shouldn't just create setters/getters for all of your attributes, you should create setters and/or getters when you have some reason to do so. People who insist on all of their attributes having getters/setters that basically do nothing are idiots who should go back to Java

Setting attributes directly is such a standard and normal part of Python that there's even a set of decorators (properties) that allows getters/setters to be syntactically identical to directly accessing attributes, e.g.
Python code:
foo.x = 5
can invoke the set_x() method if you use properties
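The angle example from above, sketched with a property (made-up class, but the decorator usage is the standard pattern):

```python
class Angle:
    def __init__(self, degrees):
        self.degrees = degrees   # attribute syntax, but it runs the setter

    @property
    def degrees(self):
        return self._degrees

    @degrees.setter
    def degrees(self, value):
        # normalize into 0-360 instead of storing raw input
        self._degrees = value % 360

a = Angle(450)
print(a.degrees)  # 90
a.degrees = -90   # plain assignment, still goes through the setter
print(a.degrees)  # 270
```

Callers never see the difference between this and a raw attribute, which is exactly why Python code skips the Java-style boilerplate until there's logic to add.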

MISAKA
Dec 10, 2018

The XKCD Larper posted:

A question about class variables. I watched the MIT OCW video about class objects, and the instructor says that class variables should be set and retrieved using getter and setter functions rather than assigned directly. They say this makes renaming variables within the class easier. Is there any other reason to work this way?

The reason people do it in java is that you can't add extra logic on variable set/get if you are using a raw class attribute, and if you later decide to change from raw access to getters/setters, you break code that used your class before. Thankfully, in python you can use descriptors to easily add special get/set logic, and it will look the same as raw access and won't break already written code.

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost
dont learn oop from java or python, learn it from smalltalk or ruby

Chopstick Dystopia
Jun 16, 2010


lowest high and highest low loser of: WEED WEEK

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost

its worthwhile if you actually read what it was originally for and its history and understand it

it hasn't proven to be a magic bullet. it's not a synonym for modular or structured programming

Dominoes
Sep 20, 2007

It may help to think of classes in python as either types you can make, or bags of related data with associated functions. Check out dataclasses, and see how far you can get with them alone.

keyframe
Sep 15, 2007

I have seen things
I am getting my feet wet with web scraping. Is there a reason to use beautifulsoup over selenium? I find selenium much easier to work with but I am doing super basic stuff right now while learning. Was just curious if BS brings something to the table selenium doesn't have.


bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost

keyframe posted:

I am getting my feet wet with web scraping. Is there a reason to use beautifulsoup over selenium? I find selenium much easier to work with but I am doing super basic stuff right now while learning. Was just curious if BS brings something to the table selenium doesn't have.

beautifulsoup is better for having a html doc (doesnt even have to be from the web) and doing arcane queries on it
selenium is better for doing arcane sophisticated things in a ui or some poo poo

so if you have to trigger weirdo browser events, you gotta use selenium. if you gotta get the second href that contains the word "bob dobbs" in the inner html but which don't also have this other id and yadda yadda, you're better off using the beautifulsoup capabilities more. you can use both, you can also use neither
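The "second href containing 'bob dobbs'" kind of query is a one-liner in BeautifulSoup; for a dependency-free flavor of the same idea, here's a stdlib sketch with html.parser (LinkFinder is a made-up name, and bs4 makes this far shorter):

```python
from html.parser import HTMLParser

class LinkFinder(HTMLParser):
    """Collect every href whose link text contains 'bob dobbs'."""
    def __init__(self):
        super().__init__()
        self.hrefs, self._href, self._text = [], None, ""

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = ""

    def handle_data(self, data):
        self._text += data

    def handle_endtag(self, tag):
        if tag == "a" and self._href and "bob dobbs" in self._text:
            self.hrefs.append(self._href)

p = LinkFinder()
p.feed('<a href="/1">bob dobbs is dead</a><a href="/2">other</a>')
print(p.hrefs)  # ['/1']
```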
