i vomit kittens
Apr 25, 2019


It's just chaining a bunch of method calls together; I don't think that's exactly uncommon. Off the top of my head I do it all the time in SQLAlchemy:

Python code:

foos = Foo.query.filter(Foo.name == "bar").all()

That makes more sense to me than:

Python code:

foos = Foo.query.filter(Foo.name == "bar")
foos = foos.all()

I don't have a reason to save the return value of the filter() call because I'm just going to call all(), one(), or something else after it anyways. I would only save the intermediate return values back to the variable (or a separate one) if I actually needed it for something or if it helped readability.


QuarkJets
Sep 8, 2008

duck monster posted:

Can someone tell me whats going on with this "fluent interface" poo poo.

https://github.com/kkroening/ffmpeg-python

Because it kind of makes me feel like bullying a javascript coder, and I'm not entirely sure why.

Is this actually a *thing* in python now? or is this dude just abusing import semantics? Cos it doesnt *look* pythonic, but I'm old and have bad knees and dont like kids on lawns or something.

As soon as I saw ffmpeg my brain went "OH GET READY TO LOOK AT SOME lovely CODE!" Like, don't get me wrong, ffmpeg is a masterpiece that basically the whole world relies on, but it's actually pretty scary knowing that the whole world is resting on top of a bunch of rotting pizza boxes

A "fluent" interface is what i vomit kittens described, it basically just means that you can chain methods together. A bunch of string methods let you do this, by having a method return `self` you make it very easy to chain operations together and this results in slimmer, sexier code. Here's some code that makes use of string's fluent interface:

Python code:
"some_string".strip().title().ljust(10)
You probably saw this on the github page:
Python code:
import ffmpeg
(
    ffmpeg
    .input('input.mp4')
    .hflip()
    .output('output.mp4')
    .run()
)
You were right to be like "what the gently caress is this poo poo". Don't be fooled by the parens; they have nothing to do with the import statement, this dork is just using them in order to have arbitrary amounts of indenting. What a jackass. That's the same as this:
Python code:
import ffmpeg
ffmpeg.input('input.mp4').hflip().output('output.mp4').run()

QuarkJets fucked around with this message at 08:57 on Jul 26, 2022

lazerwolf
Dec 22, 2009

Orange and Black
I would argue the parens example is much more readable. Plus there is the added benefit of being able to comment out a chain easily

code:
(
    string 
    .func()
    .func2()
  # .func3()
    .func4()
)
I use this commonly chaining pandas methods together

Pyromancer
Apr 29, 2011

This man must look upon the fire, smell of it, warm his hands by it, stare into its heart

duck monster posted:

Can someone tell me whats going on with this "fluent interface" poo poo.

https://github.com/kkroening/ffmpeg-python

Because it kind of makes me feel like bullying a javascript coder, and I'm not entirely sure why.

Is this actually a *thing* in python now? or is this dude just abusing import semantics? Cos it doesnt *look* pythonic, but I'm old and have bad knees and dont like kids on lawns or something.

That's not exactly a JS invention; pretty sure it started in Java DSL frameworks like Spring.
But you can do that in most object-oriented languages by making a class where all setter methods return "this" or "self" or whatever it is called in that language.
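
A minimal sketch of what that looks like in python (a toy class, nothing to do with any real library):
Python code:
class QueryBuilder:
    """Toy fluent interface: every setter returns self so the calls can be chained."""

    def __init__(self):
        self._table = None
        self._conditions = []

    def table(self, name):
        self._table = name
        return self  # returning self is the whole trick

    def where(self, condition):
        self._conditions.append(condition)
        return self

    def build(self):
        where = " AND ".join(self._conditions) or "1=1"
        return f"SELECT * FROM {self._table} WHERE {where}"

# each call hands back the same builder, so the calls chain
sql = QueryBuilder().table("users").where("age > 21").where("active = 1").build()
print(sql)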

CarForumPoster
Jun 26, 2013

⚡POWER⚡

lazerwolf posted:

I would argue the parens example is much more readable. Plus there is the added benefit of being able to comment out a chain easily

code:
(
    string 
    .func()
    .func2()
  # .func3()
    .func4()
)
I use this commonly chaining pandas methods together

This!

This is very common in ETL/data science/setting up for some ML training type workflows because you’ll do these types of chained operations a LOT.
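
For example, a typical chained pandas cleanup step might look something like this (the dataframe and column names are made up):
Python code:
import pandas as pd

df = pd.DataFrame({
    "name": [" Alice ", "bob", None],
    "score": [10, 25, 40],
})

cleaned = (
    df
    .dropna(subset=["name"])                                    # drop rows with no name
    .assign(name=lambda d: d["name"].str.strip().str.title())   # tidy the names up
    .query("score > 15")                                        # keep the decent scores
    .reset_index(drop=True)
)
print(cleaned)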

duck monster
Dec 15, 2004

Yes I know what fluent is in the usual context of it just being a dumb word for "chainable methods". Those have been around in python since the 90s.

But putting them INSIDE an import statement, that's what's got my hackles up. Like, what the gently caress is it actually importing, or is it just an unpythonic hack to access some stuff in another library?

a foolish pianist
May 6, 2007

(bi)cyclic mutation

duck monster posted:

Yes I know what fluent is in the usual context of it just being a dumb word for "chainable methods". Those have been around in python since the 90s.

But putting them INSIDE an import statement, that's what's got my hackles up. Like, what the gently caress is it actually importing, or is it just an unpythonic hack to access some stuff in another library?

It’s not inside the import.

Wallet
Jun 19, 2006

QuarkJets posted:

You probably saw this on the github page:
Python code:
import ffmpeg
(
    ffmpeg
    .input('input.mp4')
    .hflip()
    .output('output.mp4')
    .run()
)

If you have your line length short enough this is more reasonable than the alternative wrapping.

In many cases it's the best way to format poo poo like that when chaining a bunch of functions. E.g. this is readable, but trying to line wrap it any other way definitely isn't.
Python code:
    sq = (
        db.session.query(M.p_id)
        .with_entities(
            M.p_id,
            func.min(
                func.ST_Distance(M.fuzzed, active, use_spheroid=True)
            ).label("distance"),
        )
        .filter(M.p_id == P.id)
        .filter(M.fuzzed.isnot(None))
        .filter(M.complete == True)  # noqa
        .filter(P.complete == True)
        .filter(P.finished == True)
        .filter(P.visible == True)
        .group_by(M.p_id)
    )

Wallet fucked around with this message at 14:56 on Jul 27, 2022

Slimchandi
May 13, 2005
That finger on your temple is the barrel of my raygun

I watched quite a lot of these types of tutorials when I was learning about python development, and thought the same way. 'That pattern looks ugly, why do it that way?'. I'm still not entirely sure what 'inversion of control' is but I'm sure it will bite me on the rear end someday soon.

For me, the saying that encapsulates this type of experience is: 'Life gives the test first, and the lesson later'.

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I'm trying to calculate the union of some time logs, then for times that are closer than a "gap time", merge them. My first-pass implementation of this is pretty slow.

So getting the union is fairly easy. If unions is storing my times:
code:
unions = [(Timestamp('2022-07-26 09:32:19.100528'),
  Timestamp('2022-07-26 09:42:19.100528')),
 (Timestamp('2022-07-26 09:36:11.603177'),
  Timestamp('2022-07-26 09:46:11.603177')),
 (Timestamp('2022-07-26 09:51:31.793032'),
  Timestamp('2022-07-26 10:01:31.793032')),
 (Timestamp('2022-07-26 09:55:10.747128'),
  Timestamp('2022-07-26 10:05:10.747128')),
 (Timestamp('2022-07-26 09:58:37.018675'),
  Timestamp('2022-07-26 10:08:37.018675')),
 (Timestamp('2022-07-26 10:07:12.019496'),
  Timestamp('2022-07-26 10:17:12.019496')),
 (Timestamp('2022-07-26 10:15:13.668086'),
  Timestamp('2022-07-26 10:25:13.668086'))]
I can find the unions of those like so:
code:
def unionize(intervals):
    sorted_by_lower_bound = sorted(intervals, key=lambda tup: tup[0])
    merged = []

    for higher in sorted_by_lower_bound:
        if not merged:
            merged.append(higher)
        else:
            lower = merged[-1]
            # test for intersection between lower and higher:
            # we know via sorting that lower[0] <= higher[0]
            if higher[0] <= lower[1]:
                upper_bound = max(lower[1], higher[1])
                merged[-1] = (lower[0], upper_bound)  # replace by merged interval
            else:
                merged.append(higher)
    return merged


output = unionize(unions)
Now the output is a list of times that don't have a union:
code:
[(Timestamp('2022-07-26 09:32:19.100528'),
  Timestamp('2022-07-26 09:46:11.603177')),
 (Timestamp('2022-07-26 09:51:31.793032'),
  Timestamp('2022-07-26 10:31:22.827263')),
 (Timestamp('2022-07-26 10:31:38.523307'),
  Timestamp('2022-07-26 10:41:38.523307')),
 (Timestamp('2022-07-26 10:57:19.400866'),
  Timestamp('2022-07-26 11:10:19.591433')),
 (Timestamp('2022-07-26 11:37:19.066399'),
  Timestamp('2022-07-26 14:07:23.102825')),
 (Timestamp('2022-07-26 14:17:19.139540'),
  Timestamp('2022-07-26 15:45:06.070108')),
 (Timestamp('2022-07-26 15:52:19.856122'),
  Timestamp('2022-07-26 16:47:20.009336'))]
Now I need to find the ones that are within 10 minutes of each other and, for all the ones that have a gap under 10 minutes, combine them using the earliest start and latest end.

I can do this by checking whether output[n][1] is within 10 minutes of output[n+1][0] easily enough, then merge them if so. However I'd then need to check the resulting merged one against the next one in the loop... also that results in me modifying the list in the loop, which is a no-no.

I feel like I'm missing an obvious answer here that doesnt require iterating over the results multiple times until no more gaps are found. Thoughts?
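
A rough sketch of that single-pass idea, building a new list instead of modifying in place (assuming the (start, end) tuples above, already sorted by start time):
Python code:
from datetime import timedelta

def merge_close(intervals, max_gap=timedelta(minutes=10)):
    """Merge sorted, non-overlapping (start, end) tuples whose gap to the previous one is under max_gap."""
    merged = []
    for start, end in intervals:
        if merged and start - merged[-1][1] <= max_gap:
            # small gap: extend the previous block instead of starting a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# merged_output = merge_close(output)
Because the list is sorted, each interval only ever needs to be compared against the last merged one, so one pass is enough.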

EDIT: Here's a more graphical rep of the data.


So the output should be three rows because indexes 0, 1 and 2 should be merged, 3 should not be modified, 4, 5 and 6 should be merged.

CarForumPoster fucked around with this message at 14:35 on Jul 28, 2022

SirPablo
May 1, 2004

Pillbug
Could you just treat each start/end time as a data frame (or series?) and do an inner join based on the datetime index?

DoctorTristan
Mar 11, 2006

I would look up into your lifeless eyes and wave, like this. Can you and your associates arrange that for me, Mr. Morden?
You’re partitioning the set into its equivalence classes defined by the relation ‘the timestamps overlap’ - I don’t believe there’s any faster way to do this than the naïve brute force method.

Since it looks like you’re doing it on a pandas dataframe it would probably be simpler to use a new column to keep track of the equivalence classes - phone posting so I’ll have to do it in pseudo code but the idea is:

* create a new integer column of zeros, called ‘equivc’ or whatever
* start with the first row, set .loc[0, ‘equivc’] = 1
* find every row that overlaps with a row having equivc == 1, set equivc=1 for those rows
* repeat until you don’t find any more rows
* now find the first row that still has equivc==0, set equivc=2 for that row, and repeat the above steps
* keep going until there are no rows left with equivc == 0

Once you’re done with this then every row with equivc==1 has mutually overlapping intervals, as does every row with equivc ==2 and so on.
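
A rough translation of that pseudocode into actual pandas might look like this (just a sketch; the work_start/work_end column names and the envelope-style overlap test are assumptions):
Python code:
import pandas as pd

def label_equivalence_classes(df):
    """Give every row an integer class id; rows sharing an id have (transitively) overlapping intervals."""
    df = df.copy()
    df["equivc"] = 0
    next_class = 1
    while (df["equivc"] == 0).any():
        # seed the next class with the first still-unlabelled row
        seed = df.index[df["equivc"] == 0][0]
        df.loc[seed, "equivc"] = next_class
        changed = True
        while changed:
            members = df[df["equivc"] == next_class]
            lo, hi = members["work_start"].min(), members["work_end"].max()
            # pull in every unlabelled row whose interval touches the current class
            overlaps = (df["equivc"] == 0) & (df["work_start"] <= hi) & (df["work_end"] >= lo)
            changed = bool(overlaps.any())
            df.loc[overlaps, "equivc"] = next_class
        next_class += 1
    return df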

SurgicalOntologist
Jun 17, 2004

Assuming you are just grouping on 'work_start', you can get your equivalence classes with something close to:

Python code:
(
    df.work_start
    .diff().fillna(0)
    .gt(INTERVAL_THRESHOLD).astype(int)
    .cumsum()
)
Basically, take the differences, check if it's greater than the interval such that a 0 indicates it should be grouped, then take the cumsum. Every row that should be grouped will have the same value as the previous. The fillna(0) is for the first row.

Then this part I'm not sure about, but you should probably define a function that outputs each row you want from a subset of the DF, let's call it merge_group(group: DataFrame) -> Series (I think it should output Series, a row of a DF).

Python code:
(
    df.groupby(
        df.work_start
        .diff().fillna(0)
        .gt(INTERVAL_THRESHOLD).astype(int)
        .cumsum()
    )
    .apply(merge_group)
)
Edit: if you want to check between end and start, the classes are something like:
Python code:
(
    df.work_start
    .sub(df.work_end.shift(1)).fillna(0)
    .gt(INTERVAL_THRESHOLD).astype(int)
    .cumsum()
)
Edit2: perhaps 0 and INTERVAL_THRESHOLD need to be TimeDeltas or whatever

Edit3: because I'm bored I checked pandas docs and I think you could do
Python code:
(
    df.groupby(
        df.work_start
        .sub(df.work_end.shift(1)).fillna(0)
        .gt(INTERVAL_THRESHOLD).astype(int)
        .cumsum()
    )
    .aggregate(dict(
        work_start='min',
        work_end='max',
    ))
)
Although if you have more columns you may still need a custom function.

SurgicalOntologist fucked around with this message at 19:14 on Jul 28, 2022

Foxfire_
Nov 8, 2010

I'm not 100% sure I'm following what you're doing, but I think you have each timespan starting out on its own and you keep merging them until the gaps are bigger than 10mins?

Disjoint set forest is a data structure that can do that efficiently. For N things, they are grouped into sets with each object belonging to exactly one set. Finding which set something belongs to and merging sets is amortized ~O(1)

(But depending on your problem size & how often you will do it, it may not be worthwhile to implement & test it vs just waiting out a trivial python version, doing a trivial implementation in some faster non-python thing (either python-adjacent numba/numpy, or a completely separate language), or finding someone who's already implemented the data structure in python.)
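
For reference, a bare-bones disjoint set forest in python looks something like this (path compression plus union by size, not tuned at all):
Python code:
class DisjointSet:
    """Minimal disjoint set forest: find() and union() are amortized near-constant time."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # walk up to the root, flattening the path as we go
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
You'd union any two timespans whose gap is under 10 minutes, then group the spans by find().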

CarForumPoster
Jun 26, 2013

⚡POWER⚡
Thanks for the help, you guys helped me zoom out of being fixated on one path. The end goal is to calculate for many people the number of hours worked in a day, allowing for short breaks. I can just add the time gaps that are less than 10 minutes and get the same total number of hours, I don't need to further reduce them at all.

code:
import pandas as pd
from datetime import timedelta

df2 = pd.DataFrame(output, columns=['work_start', 'work_end'])
df2['work_start_shifted'] = df2['work_start'].shift(-1)        # start of the next block
df2['td_delta'] = df2['work_start_shifted'] - df2['work_end']  # gap until the next block

def calculate_row_time(row, max_gap=10):
    worked_time = row['work_end'] - row['work_start']
    # fold short breaks (under max_gap minutes) into the worked time
    if row['td_delta'] < timedelta(minutes=max_gap):
        worked_time += row['td_delta']
    return worked_time

df2['worked_time'] = df2['work_end'] - df2['work_start']
df2['total_worked_time'] = df2.apply(lambda row: calculate_row_time(row, max_gap=10), axis=1)
df2

Example output:

SurgicalOntologist
Jun 17, 2004

These types of problems are my bread and butter (finding consecutive sections of a timeseries that meet a set of criteria) and the basic pattern is (see the sketch after this list):
  • Apply the criteria to get a boolean
  • Get a diff on the boolean to find "edges" of the groups
  • Take a cumsum to get indices of the groups
  • (It's not your case but often in my case I want a "null group" in between the groups. In that case, apply back the False values from step 1 to fill those as zeros or null.)
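
A tiny sketch of that pattern on a made-up series (the threshold and values are placeholders):
Python code:
import pandas as pd

# toy series: find consecutive runs where the value is above a threshold
s = pd.Series([3, 12, 15, 4, 20, 22, 25, 1])

meets = s.gt(10)                               # 1. apply the criteria to get a boolean
group_id = meets.ne(meets.shift()).cumsum()    # 2./3. mark the edges, cumsum to label the runs
# only keep the groups where the criterion actually holds
print(s[meets].groupby(group_id[meets]).agg(["min", "max", "count"]))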

sugar free jazz
Mar 5, 2008

heh cumsum

SirPablo
May 1, 2004

Pillbug
Is there a way to scrape threads?

Armitag3
Mar 15, 2020

Forget it Jake, it's cybertown.


SirPablo posted:

Is there a way to scrape threads?

Sure, you can just make the same requests as your browser does with requests and scrape the content with beautifulsoup. I'd drop a line to astral first though.
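
A bare-bones sketch of that approach (URL and CSS selector are placeholders; you still need to deal with login cookies somehow):
Python code:
import requests
from bs4 import BeautifulSoup

session = requests.Session()
# either reuse cookies from a logged-in browser or POST the login form with this session first
resp = session.get("https://example.com/some-thread", timeout=30)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
for post in soup.select("div.post"):  # placeholder selector
    print(post.get_text(strip=True))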

SirPablo
May 1, 2004

Pillbug
How do you pass authentication to it? I haven't learned that part. Making the request I can handle in general.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

SirPablo posted:

How do you pass authentication to it? I haven't learned that part. Making the request I can handle in general.

You make a POST request with your login credentials. I'd suggest you do babbys first scraper with selenium, i.e. actually using a web browser. It will let you visualize what's going on as well as interact manually for things like logins.
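
A minimal selenium login sketch (URLs and form field names are made up):
Python code:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # opens a real browser window you can watch
driver.get("https://example.com/login")

# fill in and submit the login form
driver.find_element(By.NAME, "username").send_keys("my_user")
driver.find_element(By.NAME, "password").send_keys("my_password")
driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()

# the same driver keeps the session cookies, so later pages are authenticated
driver.get("https://example.com/some-thread")
print(driver.page_source[:500])
driver.quit()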

SirPablo
May 1, 2004

Pillbug
Thanks for the pointers (sometimes hardest part is learning what's out there to use), I'll take a look.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
What's a more pythonic/optimized way of accomplishing this?

Goal: count the occurrences of colors found in a text file

I'll likely add a lot more colors, and will be iterating over many files.

Python code:
count = {'red': 0, 'yellow': 0, 'blue': 0}

with open ('./file.txt', 'r') as f:
	for line in f:
		if 'yellow' in line:
			count['yellow'] += 1

		if 'red' in line:
			count['red'] += 1

		if 'blue' in line:
			count['blue'] += 1

print(count)

KICK BAMA KICK
Mar 2, 2009

Look at collections.Counter(), made for exactly that
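
e.g. a rough sketch of the same loop with Counter (note this one counts every word occurrence, case-insensitively, rather than once per line):
Python code:
from collections import Counter

colors = {"red", "yellow", "blue"}
counts = Counter()

with open("./file.txt") as f:
    for line in f:
        counts.update(word for word in line.lower().split() if word in colors)

print(counts)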

Foxfire_
Nov 8, 2010

Do you intend to count a line like

"yellow yellow"

As one yellow count? Also, do you intend it to be case specific?

Zoracle Zed
Jul 10, 2001
I like the way set operations work with dict keys:

code:
for color in (line.split() & count.keys()):
    count[color] += 1
edit: assuming you don't want to count duplicates within a line (each color counts once per line), like foxfire_ hinted

Hughmoris
Apr 21, 2007
Let's go to the abyss!

KICK BAMA KICK posted:

Look at collections.Counter(), made for exactly that

That looks like it'll work, thanks!

Foxfire_ posted:

Do you intend to count a line like

"yellow yellow"

As one yellow count? Also, do you intend it to be case specific?

It would count as two, and no not case specific. It's for a small toy project, to see how 'colorful' some text files are.


Zoracle Zed posted:

I like the way set operations work with dict keys:

code:
for color in (line.split() & count.keys()):
    count[color] += 1
edit: assuming you don't want to count duplicates within a line (each color counts once per line), like foxfire_ hinted

I'll take a look at this method as well, thanks!

QuarkJets
Sep 8, 2008

Hughmoris posted:

What's a more pythonic/optimized way of accomplishing this?

Goal: count the occurrences of colors found in a text file

I'll likely add a lot more colors, and will be iterating over many files.

Python code:
count = {'red': 0, 'yellow': 0, 'blue': 0}

with open ('./file.txt', 'r') as f:
	for line in f:
		if 'yellow' in line:
			count['yellow'] += 1

		if 'red' in line:
			count['red'] += 1

		if 'blue' in line:
			count['blue'] += 1

print(count)

Instead of writing out each if statement you could iterate over the keys in the dict. That'd be way more compact:
Python code:
count = {'red': 0, 'yellow': 0, 'blue': 0}

with open('./file.txt', 'r') as f:
    for line in f:
        for color in count:
            if color in line:
                count[color] += 1
Formatting: you have a space between "open" and its arguments; you should remove that. You can use tabs, but I know a lot of programmers who use tab-expansion, e.g. when you press tab it inserts 4 spaces instead of a tab character - that's personal preference

You could fetch 'file.txt' from sys.argv (e.g. as a command-line argument), or assign it to an all-caps variable near the top of your file (this is a good way to treat constants such as this file name, it's easier to find and change them later)

Someone else mentioned collections.Counter, that would be an excellent improvement. To demonstrate another kind of tool that you may not have seen before, I'll show how you could use a dataclass (they replace the old-school dict way of managing data).

Python code:
from dataclasses import dataclass

FILE_NAME = './file.txt'

@dataclass
class Color:
    name: str
    count: int = 0

colors = [Color('red'), Color('blue'), Color('yellow')]
with open(FILE_NAME, 'r') as f:
    for line in f:
        for color in colors:
            if color.name in line:
                color.count += 1
What you have set up would count the line "yellow yellow" as only 1 yellow. Another poster has pointed out how you could count it as 2 yellows (by splitting the line and iterating over every word)

QuarkJets fucked around with this message at 16:40 on Aug 5, 2022

Hughmoris
Apr 21, 2007
Let's go to the abyss!

QuarkJets posted:

Dataclass and stuff...

I've zero experience with data classes but that looks simple and friendly enough. Thanks for the feedback.

Ultimately, this simple script will reside in an AWS Lambda as a learning exercise. Upload a file to S3 -> Triggers Lambda to process it -> Writes out to Postgres or something.

*The file name will be the uploaded S3 object, and I'm thinking the Color List will be a text file residing in S3 as well which can be updated on-demand with new colors.

QuarkJets
Sep 8, 2008

Hughmoris posted:

I've zero experience with data classes but that looks simple and friendly enough. Thanks for the feedback.

Ultimately, this simple script will reside in an AWS Lambda as a learning exercise. Upload a file to S3 -> Triggers Lambda to process it -> Writes out to Postgres or something.

*The file name will be the uploaded S3 object, and I'm thinking the Color List will be a text file residing in S3 as well which can be updated on-demand with new colors.

Nice. Are you aware of smart_open?
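
(smart_open gives you the builtin open() interface for S3 URIs; a minimal sketch with a placeholder bucket:)
Python code:
from smart_open import open  # pip install smart_open[s3]

# same interface as the builtin open(), but the "file" lives in S3
with open("s3://my-bucket/file.txt", "r") as f:
    for line in f:
        print(line, end="")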

Hughmoris
Apr 21, 2007
Let's go to the abyss!

QuarkJets posted:

Nice. Are you aware of smart_open?

I had not heard of it but I went off and looked it up. Seems like a sweet tool that I'll put in my back pocket for future use on bigger projects.

I got version 1 of this program working with yalls help. Uploading a text file to S3 triggers the python Lambda, which parses the contents and writes out its findings. The code is ugly compared to what most of you write but it works so I'll take it.

QuarkJets
Sep 8, 2008

Hughmoris posted:

I had not heard of it but I went off and looked it up. Seems like a sweet tool that I'll put in my back pocket for future use on bigger projects.

I got version 1 of this program working with yalls help. Uploading a text file to S3 triggers the python Lambda, which parses the contents and writes out its findings. The code is ugly compared to what most of you write but it works so I'll take it.

Cool, if you'd like feedback feel free to post some.

Mursupitsku
Sep 12, 2011
I have a monte carlo simulation that is of the following structure:

Python code:
import random

probs = {('condition1_1', 'condition2_1', 'condition3_1'): [0.25, 0.25, 0.25, 0.25],
         ('condition1_2', 'condition2_2', 'condition3_2'): [0.5, 0.5, 0, 0],
         ('condition1_3', 'condition2_3', 'condition3_3'): [0, 0, 0.75, 0.25]}
         
probs2 = {('condition1_1', 'condition2_1', 'condition3_1'): [0.25, 0.25, 0.25, 0.25],
          ('condition1_2', 'condition2_2', 'condition3_2'): [0, 0, 0, 1],
          ('condition1_3', 'condition2_3', 'condition3_3'): [0, 0.25, 0.75, 0]}
         
def draw_options(condition1, condition2, condition3):
    
    if len(condition1) == 1:
        probabilities = probs[(condition1, condition2, condition3)]
    else:
        probabilities = probs2[(condition1, condition2, condition3)]
    
    options = ['option 1', 'option 2', 'option 3', 'option 4']
    result = random.choices(options, probabilities)[0]
    
    return result
    
for _ in range(200000):
    #get conditions
    
    option = draw_options()
    
    #do something
The draw_options function is painfully slow. Is there a way to make it faster? In my actual code the probs dictionaries are much much larger.

Previously I had it set up as follows and it is much faster like this:

Python code:
import random

probs = {('condition1_1', 'condition2_1', 'condition3_1'): ['option 1', 'option 2', 'option 3', 'option 4'],
         ('condition1_2', 'condition2_2', 'condition3_2'): ['option 1', 'option 1', 'option 2', 'option 2'],
         ('condition1_3', 'condition2_3', 'condition3_3'): ['option 3', 'option 3', 'option 3', 'option 4']}
         
probs2 = {('condition1_1', 'condition2_1', 'condition3_1'): ['option 1', 'option 2', 'option 3', 'option 4'],
          ('condition1_2', 'condition2_2', 'condition3_2'): ['option 4'],
          ('condition1_3', 'condition2_3', 'condition3_3'): ['option 2', 'option 3', 'option 3', 'option 3']}
         
def draw_options(condition1, condition2, condition3):
    
    if len(condition1) == 1:
        result = random.choice(probs[(condition1, condition2, condition3)])
    else:
        result = random.choice(probs2[(condition1, condition2, condition3)])
    
    return result
    
for _ in range(200000):
    #get conditions
    
    option = draw_options()
    
    #do something
However there are some reasons why I'd like to use the first method.

Hopefully I didn't simplify the code too much. The actual code is more complicated and I dont think it makes sense to post it. In the actual code the probs dictionaries are much larger and there are a few more if/else conditions deciding where to draw the probabilities from.

Foxfire_
Nov 8, 2010

Is it the choices call that's slow or the rest of it?

Non-python code (numpy random, and numpy generally) will run much faster. The typical way to do numerical computing with python is to use python as a glue language connecting non-python code (numba or numpy)

Mursupitsku
Sep 12, 2011
Removing the choice seems to speed it up a bit but it doesn't seem to be the main problem.

QuarkJets
Sep 8, 2008

I think that hashing (condition1, condition2, condition3) for every lookup is going to impose a performance bottleneck. It looks like 'condition1_1' is always paired with 'condition2_1' and 'condition3_1'. If that's the case then you don't need condition2 and condition3 in the lookup.

If you need to have all 3 for whatever reason then try nesting the dictionary instead of creating keys out of tuples, e.g. probs[condition1][condition2][condition3]
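
i.e. restructured into something shaped like this (values are placeholders):
Python code:
probs = {
    "condition1_1": {
        "condition2_1": {
            "condition3_1": [0.25, 0.25, 0.25, 0.25],
        },
    },
}

# one small string hash per level instead of hashing a 3-tuple on every lookup
probabilities = probs["condition1_1"]["condition2_1"]["condition3_1"]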

Foxfire_
Nov 8, 2010

Can you set up some smaller dummy thing that duplicates the problem?

When I run this:
Python code:
# Make a bigish dict indexed by a tuple of strings
probs = {}
for i in range(1000):
    probs[(f"thing1_{i}", f"thing2_{i}", f"thing3_{i}")] = i

def profile():
    """ The code we're interested in timing """
    fake_work = 0
    for _ in range(200_000):
        # Dict lookup, pretty sure CPython doesn't cache anything so every call does the full lookup
        fake_work += probs[("thing1_300", "thing2_300", "thing3_300")]
    return fake_work
%timeit profile() is telling me it's only tens of ms to run profile(). Doesn't seem like the dict lookup should matter much for runtime, unless your probs is a lot bigger than 1000 things

Biffmotron
Jan 12, 2007

I'm not entirely following what the code does. I get that you're choosing one of option_X from probability distribution prob_Y, but you define draw_options() with arguments and then don't pass anything to them.

I ran some profiles with %timeit. random.choice is reasonably performant, but random.choices scales poorly.

random.choice
  • 10 items, 189 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
  • 10**6 items, 362 ns ± 4.58 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
  • 10**8 items, 421 ns ± 24.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

random.choices
  • 10 items, 889 ns ± 12.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
  • 10**6 items, 32.9 ms ± 271 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
  • 10**8 items, 3.8 s ± 82.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
random.choices has to start by iterating the entire list of weights, which of course will get slower the more items you have.

If you supply cumulative weights to random.choices then it should perform a bit better.
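
e.g. something like this, precomputing the cumulative weights once outside the hot loop (a sketch):
Python code:
import random
from itertools import accumulate

options = ['option 1', 'option 2', 'option 3', 'option 4']
weights = [0.25, 0.25, 0.25, 0.25]

# do this once per key, outside the simulation loop
cum_weights = list(accumulate(weights))

# inside the loop, choices() no longer has to sum the weights itself
result = random.choices(options, cum_weights=cum_weights)[0]
print(result)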


Seventh Arrow
Jan 26, 2005

I've been learning about scopes and namespaces (and maybe not paying enough attention!), so I'm trying to understand how this works:



:siren::siren::siren: without getting into an internet slapfight about the practice of tipping :siren::siren::siren:, I'm not sure how the add_tip function is able to access the 'total' variable from total_bill. Isn't 'total' local to total_bill and thus inaccessible to add_tip? Shouldn't the parameter for add_tip be 'def add_tip(total_bill)'?
