i vomit kittens
Apr 25, 2019


It's just chaining a bunch of method calls together; I don't think that's exactly uncommon. Off the top of my head I do it all the time in SQLAlchemy:

Python code:

foos = Foo.query.filter(Foo.name == "bar").all()

That makes more sense to me than:

Python code:

foos = Foo.query.filter(Foo.name == "bar")
foos = foos.all()

I don't have a reason to save the return value of the filter() call because I'm just going to call all(), one(), or something else after it anyways. I would only save the intermediate return values back to the variable (or a separate one) if I actually needed it for something or if it helped readability.


QuarkJets
Sep 8, 2008

duck monster posted:

Can someone tell me whats going on with this "fluent interface" poo poo.

https://github.com/kkroening/ffmpeg-python

Because it kind of makes me feel like bullying a javascript coder, and I'm not entirely sure why.

Is this actually a *thing* in python now? or is this dude just abusing import semantics? Cos it doesnt *look* pythonic, but I'm old and have bad knees and dont like kids on lawns or something.

As soon as I saw ffmpeg my brain went "OH GET READY TO LOOK AT SOME lovely CODE!" Like, don't get me wrong, ffmpeg is a masterpiece that basically the whole world relies on, but it's actually pretty scary knowing that the whole world is resting on top of a bunch of rotting pizza boxes

A "fluent" interface is what i vomit kittens described, it basically just means that you can chain methods together. A bunch of string methods let you do this, by having a method return `self` you make it very easy to chain operations together and this results in slimmer, sexier code. Here's some code that makes use of string's fluent interface:

Python code:
"some_string".strip().title().ljust(10)
You probably saw this on the github page:
Python code:
import ffmpeg
(
    ffmpeg
    .input('input.mp4')
    .hflip()
    .output('output.mp4')
    .run()
)
You were right to be like "what the gently caress is this poo poo". Don't be fooled by the parens; they have nothing to do with the import statement, this dork is just using them in order to have arbitrary amounts of indenting. What a jackass. That's the same as this:
Python code:
import ffmpeg
ffmpeg.input('input.mp4').hflip().output('output.mp4').run()

QuarkJets fucked around with this message at 08:57 on Jul 26, 2022

lazerwolf
Dec 22, 2009

Orange and Black
I would argue the parens example is much more readable. Plus there is the added benefit of being able to comment out a chain easily

code:
(
    string 
    .func()
    .func2()
  # .func3()
    .func4()
)
I use this commonly chaining pandas methods together

Pyromancer
Apr 29, 2011

This man must look upon the fire, smell of it, warm his hands by it, stare into its heart

duck monster posted:

Can someone tell me whats going on with this "fluent interface" poo poo.

https://github.com/kkroening/ffmpeg-python

Because it kind of makes me feel like bullying a javascript coder, and I'm not entirely sure why.

Is this actually a *thing* in python now? or is this dude just abusing import semantics? Cos it doesnt *look* pythonic, but I'm old and have bad knees and dont like kids on lawns or something.

That's not exactly a JS invention; pretty sure it started in Java DSL frameworks like Spring.
But you can do that in most object-oriented languages by making a class where all setter methods return "this" or "self" or whatever it is called in that language.
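
A minimal sketch of what that looks like in python (a toy class, nothing to do with any real library):
Python code:
class QueryBuilder:
    """Toy fluent interface: every setter returns self so the calls can be chained."""

    def __init__(self):
        self._table = None
        self._conditions = []

    def table(self, name):
        self._table = name
        return self  # returning self is the whole trick

    def where(self, condition):
        self._conditions.append(condition)
        return self

    def build(self):
        where = " AND ".join(self._conditions) or "1=1"
        return f"SELECT * FROM {self._table} WHERE {where}"

# each call hands back the same builder, so the calls chain
sql = QueryBuilder().table("users").where("age > 21").where("active = 1").build()
print(sql)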

CarForumPoster
Jun 26, 2013

⚡POWER⚡

lazerwolf posted:

I would argue the parens example is much more readable. Plus there is the added benefit of being able to comment out a chain easily

code:
(
    string 
    .func()
    .func2()
  # .func3()
    .func4()
)
I use this commonly chaining pandas methods together

This!

This is very common in ETL/data science/setting up for some ML training type workflows because you’ll do these types of chained operations a LOT.
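
For example, a typical chained pandas cleanup step might look something like this (the dataframe and column names are made up):
Python code:
import pandas as pd

df = pd.DataFrame({
    "name": [" Alice ", "bob", None],
    "score": [10, 25, 40],
})

cleaned = (
    df
    .dropna(subset=["name"])                                    # drop rows with no name
    .assign(name=lambda d: d["name"].str.strip().str.title())   # tidy the names up
    .query("score > 15")                                        # keep the decent scores
    .reset_index(drop=True)
)
print(cleaned)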

duck monster
Dec 15, 2004

Yes I know what fluent is in the usual context of it just being a dumb word for "chainable methods". Those have been around in python since the 90s.

But putting them INSIDE an import statement, that's what's got my hackles up. Like, what the gently caress is it actually importing, or is it just an unpythonic hack to access some stuff in another library?

a foolish pianist
May 6, 2007

(bi)cyclic mutation

duck monster posted:

Yes I know what fluent is in the usual context of it just being a dumb word for "chainable methods". Those have been around in python since the 90s.

But putting them INSIDE an import statement, that's what's got my hackles up. Like, what the gently caress is it actually importing, or is it just an unpythonic hack to access some stuff in another library?

It’s not inside the import.

Wallet
Jun 19, 2006

QuarkJets posted:

You probably saw this on the github page:
Python code:
import ffmpeg
(
    ffmpeg
    .input('input.mp4')
    .hflip()
    .output('output.mp4')
    .run()
)

If you have your line length short enough this is more reasonable than the alternative wrapping.

In many cases it's the best way to format poo poo like that when chaining a bunch of functions. E.g. this is readable, but trying to line wrap it any other way definitely isn't.
Python code:
    sq = (
        db.session.query(M.p_id)
        .with_entities(
            M.p_id,
            func.min(
                func.ST_Distance(M.fuzzed, active, use_spheroid=True)
            ).label("distance"),
        )
        .filter(M.p_id == P.id)
        .filter(M.fuzzed.isnot(None))
        .filter(M.complete == True)  # noqa
        .filter(P.complete == True)
        .filter(P.finished == True)
        .filter(P.visible == True)
        .group_by(M.p_id)
    )

Wallet fucked around with this message at 14:56 on Jul 27, 2022

Slimchandi
May 13, 2005
That finger on your temple is the barrel of my raygun

I watched quite a lot of these types of tutorials when I was learning about python development, and thought the same way. 'That pattern looks ugly, why do it that way?'. I'm still not entirely sure what 'inversion of control' is but I'm sure it will bite me on the rear end someday soon.

For me, the saying that encapsulates this type of experience is: 'Life gives the test first, and the lesson later'.

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I'm trying to calculate the union of some time logs, then for times that are closer than a "gap time", merge them. My first-pass implementation of this is pretty slow.

So getting the union is fairly easy. If unions is storing my times:
code:
unions = [(Timestamp('2022-07-26 09:32:19.100528'),
  Timestamp('2022-07-26 09:42:19.100528')),
 (Timestamp('2022-07-26 09:36:11.603177'),
  Timestamp('2022-07-26 09:46:11.603177')),
 (Timestamp('2022-07-26 09:51:31.793032'),
  Timestamp('2022-07-26 10:01:31.793032')),
 (Timestamp('2022-07-26 09:55:10.747128'),
  Timestamp('2022-07-26 10:05:10.747128')),
 (Timestamp('2022-07-26 09:58:37.018675'),
  Timestamp('2022-07-26 10:08:37.018675')),
 (Timestamp('2022-07-26 10:07:12.019496'),
  Timestamp('2022-07-26 10:17:12.019496')),
 (Timestamp('2022-07-26 10:15:13.668086'),
  Timestamp('2022-07-26 10:25:13.668086'))]
I can find the unions of those like so:
code:
def unionize(intervals):
    sorted_by_lower_bound = sorted(intervals, key=lambda tup: tup[0])
    merged = []

    for higher in sorted_by_lower_bound:
        if not merged:
            merged.append(higher)
        else:
            lower = merged[-1]
            # test for intersection between lower and higher:
            # we know via sorting that lower[0] <= higher[0]
            if higher[0] <= lower[1]:
                upper_bound = max(lower[1], higher[1])
                merged[-1] = (lower[0], upper_bound)  # replace by merged interval
            else:
                merged.append(higher)
    return merged


output = unionize(unions)
Now the output is a list of times that don't have a union:
code:
[(Timestamp('2022-07-26 09:32:19.100528'),
  Timestamp('2022-07-26 09:46:11.603177')),
 (Timestamp('2022-07-26 09:51:31.793032'),
  Timestamp('2022-07-26 10:31:22.827263')),
 (Timestamp('2022-07-26 10:31:38.523307'),
  Timestamp('2022-07-26 10:41:38.523307')),
 (Timestamp('2022-07-26 10:57:19.400866'),
  Timestamp('2022-07-26 11:10:19.591433')),
 (Timestamp('2022-07-26 11:37:19.066399'),
  Timestamp('2022-07-26 14:07:23.102825')),
 (Timestamp('2022-07-26 14:17:19.139540'),
  Timestamp('2022-07-26 15:45:06.070108')),
 (Timestamp('2022-07-26 15:52:19.856122'),
  Timestamp('2022-07-26 16:47:20.009336'))]
Now I need to find the ones that are within 10 minutes of each other and, for all the ones that have a gap under 10 minutes, combine them using the earliest start and latest end.

I can do this by checking whether output[n][1] is within 10 minutes of output[n+1][0] easily enough, then merge them if so. However I'd then need to check the resulting merged one against the next one in the loop... also that results in me modifying the list in the loop, which is a no-no.

I feel like I'm missing an obvious answer here that doesnt require iterating over the results multiple times until no more gaps are found. Thoughts?
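
A rough sketch of that single-pass idea, building a new list instead of modifying in place (assuming the (start, end) tuples above, already sorted by start time):
Python code:
from datetime import timedelta

def merge_close(intervals, max_gap=timedelta(minutes=10)):
    """Merge sorted, non-overlapping (start, end) tuples whose gap to the previous one is under max_gap."""
    merged = []
    for start, end in intervals:
        if merged and start - merged[-1][1] <= max_gap:
            # small gap: extend the previous block instead of starting a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# merged_output = merge_close(output)
Because the list is sorted, each interval only ever needs to be compared against the last merged one, so one pass is enough.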

EDIT: Here's a more graphical rep of the data.


So the output should be three rows because indexes 0, 1 and 2 should be merged, 3 should not be modified, 4, 5 and 6 should be merged.

CarForumPoster fucked around with this message at 14:35 on Jul 28, 2022

SirPablo
May 1, 2004

Pillbug
Could you just treat each start/end time as a data frame (or series?) and do an inner join based on the datetime index?

DoctorTristan
Mar 11, 2006

I would look up into your lifeless eyes and wave, like this. Can you and your associates arrange that for me, Mr. Morden?
You’re partitioning the set into its equivalence classes defined by the relation ‘the timestamps overlap’ - I don’t believe there’s any faster way to do this than the naïve brute force method.

Since it looks like you’re doing it on a pandas dataframe it would probably be simpler to use a new column to keep track of the equivalence classes - phone posting so I’ll have to do it in pseudo code but the idea is:

* create a new integer column of zeros, called ‘equivc’ or whatever
* start with the first row, set .loc[0, ‘equivc’] = 1
* find every row that overlaps with a row having equivc == 1, set equivc=1 for those rows
* repeat until you don’t find any more rows
* now find the first row that still has equivc==0, set equivc=2 for that row, and repeat the above steps
* keep going until there are no rows left with equivc == 0

Once you’re done with this then every row with equivc==1 has mutually overlapping intervals, as does every row with equivc ==2 and so on.
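
A rough translation of that pseudocode into actual pandas might look like this (just a sketch; the work_start/work_end column names and the envelope-style overlap test are assumptions):
Python code:
import pandas as pd

def label_equivalence_classes(df):
    """Give every row an integer class id; rows sharing an id have (transitively) overlapping intervals."""
    df = df.copy()
    df["equivc"] = 0
    next_class = 1
    while (df["equivc"] == 0).any():
        # seed the next class with the first still-unlabelled row
        seed = df.index[df["equivc"] == 0][0]
        df.loc[seed, "equivc"] = next_class
        changed = True
        while changed:
            members = df[df["equivc"] == next_class]
            lo, hi = members["work_start"].min(), members["work_end"].max()
            # pull in every unlabelled row whose interval touches the current class
            overlaps = (df["equivc"] == 0) & (df["work_start"] <= hi) & (df["work_end"] >= lo)
            changed = bool(overlaps.any())
            df.loc[overlaps, "equivc"] = next_class
        next_class += 1
    return df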

SurgicalOntologist
Jun 17, 2004

Assuming you are just grouping on 'work_start', you can get your equivalence classes with something close to:

Python code:
(
    df.work_start
    .diff().fillna(0)
    .gt(INTERVAL_THRESHOLD).astype(int)
    .cumsum()
)
Basically, take the differences, check if it's greater than the interval such that a 0 indicates it should be grouped, then take the cumsum. Every row that should be grouped will have the same value as the previous. The fillna(0) is for the first row.

Then this part I'm not sure about, but you should probably define a function that outputs each row you want from a subset of the DF, let's call it merge_group(group: DataFrame) -> Series (I think it should output Series, a row of a DF).

Python code:
(
    df.groupby(
        df.work_start
        .diff().fillna(0)
        .gt(INTERVAL_THRESHOLD).astype(int)
        .cumsum()
    )
    .apply(merge_group)
)
Edit: if you want to check between end and start, the classes are something like:
Python code:
(
    df.work_start
    .sub(df.work_end.shift(1)).fillna(0)
    .gt(INTERVAL_THRESHOLD).astype(int)
    .cumsum()
)
Edit2: perhaps 0 and INTERVAL_THRESHOLD need to be TimeDeltas or whatever

Edit3: because I'm bored I checked pandas docs and I think you could do
Python code:
(
    df.groupby(
        df.work_start
        .sub(df.work_end.shift(1)).fillna(0)
        .gt(INTERVAL_THRESHOLD).astype(int)
        .cumsum()
    )
    .aggregate(dict(
        work_start='min',
        work_end='max',
    ))
)
Although if you have more columns you may still need a custom function.

SurgicalOntologist fucked around with this message at 19:14 on Jul 28, 2022

Foxfire_
Nov 8, 2010

I'm not 100% sure I'm following what you're doing, but I think you have each timespan starting out on its own and you keep merging them until the gaps are bigger than 10mins?

Disjoint set forest is a data structure that can do that efficiently. For N things, they are grouped into sets with each object belonging to exactly one set. Finding which set something belongs to and merging sets is amortized ~O(1)

(But depending on your problem size & how often you will do it, it may not be worthwhile to implement & test it vs just waiting out a trivial python version, doing a trivial implementation in some faster non-python thing (either python-adjacent numba/numpy, or a completely separate language), or finding someone who's already implemented the data structure in python.)
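
For reference, a bare-bones disjoint set forest in python looks something like this (path compression plus union by size, not tuned at all):
Python code:
class DisjointSet:
    """Minimal disjoint set forest: find() and union() are amortized near-constant time."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # walk up to the root, flattening the path as we go
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
You'd union any two timespans whose gap is under 10 minutes, then group the spans by find().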

CarForumPoster
Jun 26, 2013

⚡POWER⚡
Thanks for the help, you guys helped me zoom out of being fixated on one path. The end goal is to calculate for many people the number of hours worked in a day, allowing for short breaks. I can just add the time gaps that are less than 10 minutes and get the same total number of hours, I don't need to further reduce them at all.

code:
import pandas as pd
from datetime import timedelta

df2 = pd.DataFrame(output, columns=['work_start', 'work_end'])
df2['work_start_shifted'] = df2['work_start'].shift(-1)        # start of the next block
df2['td_delta'] = df2['work_start_shifted'] - df2['work_end']  # gap until the next block

def calculate_row_time(row, max_gap=10):
    worked_time = row['work_end'] - row['work_start']
    # fold short breaks (under max_gap minutes) into the worked time
    if row['td_delta'] < timedelta(minutes=max_gap):
        worked_time += row['td_delta']
    return worked_time

df2['worked_time'] = df2['work_end'] - df2['work_start']
df2['total_worked_time'] = df2.apply(lambda row: calculate_row_time(row, max_gap=10), axis=1)
df2

Example output:

SurgicalOntologist
Jun 17, 2004

These types of problems are my bread and butter (finding consecutive sections of a timeseries that meet a set of criteria) and the basic pattern is (see the sketch after this list):
  • Apply the criteria to get a boolean
  • Get a diff on the boolean to find "edges" of the groups
  • Take a cumsum to get indices of the groups
  • (It's not your case but often in my case I want a "null group" in between the groups. In that case, apply back the False values from step 1 to fill those as zeros or null.)
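
A tiny sketch of that pattern on a made-up series (the threshold and values are placeholders):
Python code:
import pandas as pd

# toy series: find consecutive runs where the value is above a threshold
s = pd.Series([3, 12, 15, 4, 20, 22, 25, 1])

meets = s.gt(10)                               # 1. apply the criteria to get a boolean
group_id = meets.ne(meets.shift()).cumsum()    # 2./3. mark the edges, cumsum to label the runs
# only keep the groups where the criterion actually holds
print(s[meets].groupby(group_id[meets]).agg(["min", "max", "count"]))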

sugar free jazz
Mar 5, 2008

heh cumsum

SirPablo
May 1, 2004

Pillbug
Is there a way to scrape threads?

Armitag3
Mar 15, 2020

Forget it Jake, it's cybertown.


SirPablo posted:

Is there a way to scrape threads?

Sure, you can just make the same requests as your browser does with requests and scrape the content with beautifulsoup. I'd drop a line to astral first though.
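
A bare-bones sketch of that approach (URL and CSS selector are placeholders; you still need to deal with login cookies somehow):
Python code:
import requests
from bs4 import BeautifulSoup

session = requests.Session()
# either reuse cookies from a logged-in browser or POST the login form with this session first
resp = session.get("https://example.com/some-thread", timeout=30)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
for post in soup.select("div.post"):  # placeholder selector
    print(post.get_text(strip=True))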

SirPablo
May 1, 2004

Pillbug
How do you pass authentication to it? I haven't learned that part. Making the request I can handle in general.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

SirPablo posted:

How do you pass authentication to it? I haven't learned that part. Making the request I can handle in general.

You make a POST request with your login credentials. I'd suggest you do babbys first scraper with selenium, i.e. actually using a web browser. It will let you visualize what's going on as well as interact manually for things like logins.
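
A minimal selenium login sketch (URLs and form field names are made up):
Python code:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # opens a real browser window you can watch
driver.get("https://example.com/login")

# fill in and submit the login form
driver.find_element(By.NAME, "username").send_keys("my_user")
driver.find_element(By.NAME, "password").send_keys("my_password")
driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()

# the same driver keeps the session cookies, so later pages are authenticated
driver.get("https://example.com/some-thread")
print(driver.page_source[:500])
driver.quit()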

SirPablo
May 1, 2004

Pillbug
Thanks for the pointers (sometimes hardest part is learning what's out there to use), I'll take a look.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
What's a more pythonic/optimized way of accomplishing this?

Goal: count the occurrences of colors found in a text file

I'll likely add a lot more colors, and will be iterating over many files.

Python code:
count = {'red': 0, 'yellow': 0, 'blue': 0}

with open ('./file.txt', 'r') as f:
	for line in f:
		if 'yellow' in line:
			count['yellow'] += 1

		if 'red' in line:
			count['red'] += 1

		if 'blue' in line:
			count['blue'] += 1

print(count)

KICK BAMA KICK
Mar 2, 2009

Look at collections.Counter(), made for exactly that
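
e.g. a rough sketch of the same loop with Counter (note this one counts every word occurrence, case-insensitively, rather than once per line):
Python code:
from collections import Counter

colors = {"red", "yellow", "blue"}
counts = Counter()

with open("./file.txt") as f:
    for line in f:
        counts.update(word for word in line.lower().split() if word in colors)

print(counts)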

Foxfire_
Nov 8, 2010

Do you intend to count a line like

"yellow yellow"

As one yellow count? Also, do you intend it to be case specific?

Zoracle Zed
Jul 10, 2001
I like the way set operations work with dict keys:

code:
for color in (line.split() & count.keys()):
    count[color] += 1
edit: assuming you don't want to count duplicates within a line (each color counts once per line), like foxfire_ hinted

Hughmoris
Apr 21, 2007
Let's go to the abyss!

KICK BAMA KICK posted:

Look at collections.Counter(), made for exactly that

That looks like it'll work, thanks!

Foxfire_ posted:

Do you intend to count a line like

"yellow yellow"

As one yellow count? Also, do you intend it to be case specific?

It would count as two, and no not case specific. It's for a small toy project, to see how 'colorful' some text files are.


Zoracle Zed posted:

I like the way set operations work with dict keys:

code:
for color in (line.split() & count.keys()):
    count[color] += 1
edit: assuming you don't want to count duplicates within a line (each color counts once per line), like foxfire_ hinted

I'll take a look at this method as well, thanks!

QuarkJets
Sep 8, 2008

Hughmoris posted:

What's a more pythonic/optimized way of accomplishing this?

Goal: count the occurrences of colors found in a text file

I'll likely add a lot more colors, and will be iterating over many files.

Python code:
count = {'red': 0, 'yellow': 0, 'blue': 0}

with open ('./file.txt', 'r') as f:
	for line in f:
		if 'yellow' in line:
			count['yellow'] += 1

		if 'red' in line:
			count['red'] += 1

		if 'blue' in line:
			count['blue'] += 1

print(count)

Instead of writing out each if statement you could iterate over the keys in the dict. That'd be way more compact:
Python code:
count = {'red': 0, 'yellow': 0, 'blue': 0}

with open('./file.txt', 'r') as f:
    for line in f:
        for color in count:
            if color in line:
                count[color] += 1
Formatting: you have a space between "open" and its arguments; you should remove that. You can use tabs, but I know a lot of programmers who use tab-expansion, e.g. when you press tab it inserts 4 spaces instead of a tab character - that's personal preference

You could fetch 'file.txt' from sys.argv (e.g. as a command-line argument), or assign it to an all-caps variable near the top of your file (this is a good way to treat constants such as this file name, it's easier to find and change them later)

Someone else mentioned collections.Counter, that would be an excellent improvement. To demonstrate another kind of tool that you may not have seen before, I'll show how you could use a dataclass (they replace the old-school dict way of managing data).

Python code:
from dataclasses import dataclass

FILE_NAME = './file.txt'

@dataclass
class Color:
    name: str
    count: int = 0

colors = [Color('red'), Color('blue'), Color('yellow')]
with open(FILE_NAME, 'r') as f:
    for line in f:
        for color in colors:
            if color.name in line:
                color.count += 1
What you have set up would count the line "yellow yellow" as only 1 yellow. Another poster has pointed out how you could count it as 2 yellows (by splitting the line and iterating over every word)

QuarkJets fucked around with this message at 16:40 on Aug 5, 2022

Hughmoris
Apr 21, 2007
Let's go to the abyss!

QuarkJets posted:

Dataclass and stuff...

I've zero experience with data classes but that looks simple and friendly enough. Thanks for the feedback.

Ultimately, this simple script will reside in an AWS Lambda as a learning exercise. Upload a file to S3 -> Triggers Lambda to process it -> Writes out to Postgres or something.

*The file name will be the uploaded S3 object, and I'm thinking the Color List will be a text file residing in S3 as well which can be updated on-demand with new colors.

QuarkJets
Sep 8, 2008

Hughmoris posted:

I've zero experience with data classes but that looks simple and friendly enough. Thanks for the feedback.

Ultimately, this simple script will reside in an AWS Lambda as a learning exercise. Upload a file to S3 -> Triggers Lambda to process it -> Writes out to Postgres or something.

*The file name will be the uploaded S3 object, and I'm thinking the Color List will be a text file residing in S3 as well which can be updated on-demand with new colors.

Nice. Are you aware of smart_open?
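
(smart_open gives you the builtin open() interface for S3 URIs; a minimal sketch with a placeholder bucket:)
Python code:
from smart_open import open  # pip install smart_open[s3]

# same interface as the builtin open(), but the "file" lives in S3
with open("s3://my-bucket/file.txt", "r") as f:
    for line in f:
        print(line, end="")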

Hughmoris
Apr 21, 2007
Let's go to the abyss!

QuarkJets posted:

Nice. Are you aware of smart_open?

I had not heard of it but I went off and looked it up. Seems like a sweet tool that I'll put in my back pocket for future use on bigger projects.

I got version 1 of this program working with yalls help. Uploading a text file to S3 triggers the python Lambda, which parses the contents and writes out its findings. The code is ugly compared to what most of you write but it works so I'll take it.

QuarkJets
Sep 8, 2008

Hughmoris posted:

I had not heard of it but I went off and looked it up. Seems like a sweet tool that I'll put in my back pocket for future use on bigger projects.

I got version 1 of this program working with yalls help. Uploading a text file to S3 triggers the python Lambda, which parses the contents and writes out its findings. The code is ugly compared to what most of you write but it works so I'll take it.

Cool, if you'd like feedback feel free to post some.

Mursupitsku
Sep 12, 2011
I have a monte carlo simulation that is of the following structure:

Python code:
import random

probs = {('condition1_1', 'condition2_1', 'condition3_1'): [0.25, 0.25, 0.25, 0.25],
         ('condition1_2', 'condition2_2', 'condition3_2'): [0.5, 0.5, 0, 0],
         ('condition1_3', 'condition2_3', 'condition3_3'): [0, 0, 0.75, 0.25]}
         
probs2 = {('condition1_1', 'condition2_1', 'condition3_1'): [0.25, 0.25, 0.25, 0.25],
          ('condition1_2', 'condition2_2', 'condition3_2'): [0, 0, 0, 1],
          ('condition1_3', 'condition2_3', 'condition3_3'): [0, 0.25, 0.75, 0]}
         
def draw_options(condition1, condition2, condition3):
    
    if len(condition1) == 1:
        probabilities = probs[(condition1, condition2, condition3)]
    else:
        probabilities = probs2[(condition1, condition2, condition3)]
    
    options = ['option 1', 'option 2', 'option 3', 'option 4']
    result = random.choices(options, probabilities)[0]
    
    return result
    
for _ in range(200000):
    #get conditions
    
    option = draw_options()
    
    #do something
The draw_options function is painfully slow. Is there a way to make it faster? In my actual code the probs dictionaries are much much larger.

Previously I had it set up as follows and it is much faster like this:

Python code:
import random

probs = {('condition1_1', 'condition2_1', 'condition3_1'): ['option 1', 'option 2', 'option 3', 'option 4'],
         ('condition1_2', 'condition2_2', 'condition3_2'): ['option 1', 'option 1', 'option 2', 'option 2'],
         ('condition1_3', 'condition2_3', 'condition3_3'): ['option 3', 'option 3', 'option 3', 'option 4']}
         
probs2 = {('condition1_1', 'condition2_1', 'condition3_1'): ['option 1', 'option 2', 'option 3', 'option 4'],
          ('condition1_2', 'condition2_2', 'condition3_2'): ['option 4'],
          ('condition1_3', 'condition2_3', 'condition3_3'): ['option 2', 'option 3', 'option 3', 'option 3']}
         
def draw_options(condition1, condition2, condition3):
    
    if len(condition1) == 1:
        result = random.choice(probs[(condition1, condition2, condition3)])
    else:
        result = random.choice(probs2[(condition1, condition2, condition3)])
    
    return result
    
for _ in range(200000):
    #get conditions
    
    option = draw_options()
    
    #do something
However there are some reasons why I'd like to use the first method.

Hopefully I didn't simplify the code too much. The actual code is more complicated and I dont think it makes sense to post it. In the actual code the probs dictionaries are much larger and there are a few more if/else conditions deciding where to draw the probabilities from.

Foxfire_
Nov 8, 2010

Is it the choices call that's slow or the rest of it?

Non-python code (numpy random, and numpy generally) will run much faster. The typical way to do numerical computing with python is to use python as a glue language connecting non-python code (numba or numpy)

Mursupitsku
Sep 12, 2011
Removing the choice seems to speed it up a bit but it doesn't seem to be the main problem.

QuarkJets
Sep 8, 2008

I think that hashing (condition1, condition2, condition3) for every lookup is going to impose a performance bottleneck. It looks like 'condition1_1' is always paired with 'condition2_1' and 'condition3_1'. If that's the case then you don't need condition2 and condition3 in the lookup.

If you need to have all 3 for whatever reason then try nesting the dictionary instead of creating keys out of tuples, e.g. probs[condition1][condition2][condition3]
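
i.e. restructured into something shaped like this (values are placeholders):
Python code:
probs = {
    "condition1_1": {
        "condition2_1": {
            "condition3_1": [0.25, 0.25, 0.25, 0.25],
        },
    },
}

# one small string hash per level instead of hashing a 3-tuple on every lookup
probabilities = probs["condition1_1"]["condition2_1"]["condition3_1"]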

Foxfire_
Nov 8, 2010

Can you set up some smaller dummy thing that duplicates the problem?

When I run this:
Python code:
# Make a bigish dict indexed by a tuple of strings
probs = {}
for i in range(1000):
    probs[(f"thing1_{i}", f"thing2_{i}", f"thing3_{i}")] = i

def profile():
    """ The code we're interested in timing """
    fake_work = 0
    for _ in range(200_000):
        # Dict lookup, pretty sure CPython doesn't cache anything so every call does the full lookup
        fake_work += probs[("thing1_300", "thing2_300", "thing3_300")]
    return fake_work
%timeit profile() is telling me it's only tens of ms to run profile(). Doesn't seem like the dict lookup should matter much for runtime, unless your probs is a lot bigger than 1000 things

Biffmotron
Jan 12, 2007

I'm not entirely following what the code does. I get that you're choosing one of option_X from probability distribution prob_Y, but you define draw_options() with arguments and then don't pass anything to them.

I ran some profiles with %timeit. random.choice is reasonably performant, but random.choices scales poorly.

random.choice
  • 10 items, 189 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
  • 10**6 items, 362 ns ± 4.58 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
  • 10**8 items, 421 ns ± 24.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

random.choices
  • 10 items, 889 ns ± 12.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
  • 10**6 items, 32.9 ms ± 271 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
  • 10**8 items, 3.8 s ± 82.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
random.choices has to start by iterating the entire list of weights, which of course will get slower the more items you have.

If you supply cumulative weights to random.choices then it should perform a bit better.
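
e.g. something like this, precomputing the cumulative weights once outside the hot loop (a sketch):
Python code:
import random
from itertools import accumulate

options = ['option 1', 'option 2', 'option 3', 'option 4']
weights = [0.25, 0.25, 0.25, 0.25]

# do this once per key, outside the simulation loop
cum_weights = list(accumulate(weights))

# inside the loop, choices() no longer has to sum the weights itself
result = random.choices(options, cum_weights=cum_weights)[0]
print(result)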


Seventh Arrow
Jan 26, 2005

I've been learning about scopes and namespaces (and maybe not paying enough attention!), so I'm trying to understand how this works:



:siren::siren::siren: without getting into an internet slapfight about the practice of tipping :siren::siren::siren:, I'm not sure how the add_tip function is able to access the 'total' variable from total_bill. Isn't 'total' local to total_bill and thus inaccessible to add_tip? Shouldn't the parameter for add_tip be 'def add_tip(total_bill)'?
