It's just chaining a bunch of method calls together; I don't think that's exactly uncommon. Off the top of my head, I do it all the time in SQLAlchemy: Python code:
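The original snippet was stripped; as a stand-in, here's a toy Query class showing the same shape of chain. The class and data are made up, not SQLAlchemy's actual API:

```python
# Toy stand-in for SQLAlchemy's fluent Query: each method returns a
# new Query so calls chain; nothing happens until all()/one()
class Query:
    def __init__(self, rows):
        self._rows = list(rows)

    def filter(self, pred):
        return Query(r for r in self._rows if pred(r))

    def all(self):
        return self._rows

    def one(self):
        if len(self._rows) != 1:
            raise ValueError("expected exactly one row")
        return self._rows[0]

users = [{"name": "a", "active": True}, {"name": "b", "active": False}]
active = Query(users).filter(lambda u: u["active"]).all()
```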
That makes more sense to me than: Python code:
I don't have a reason to save the return value of the filter() call because I'm just going to call all(), one(), or something else after it anyway. I would only save the intermediate return values back to the variable (or a separate one) if I actually needed them for something or if it helped readability.
|
|
# ? Jul 26, 2022 08:22 |
|
|
# ? May 14, 2024 13:01 |
|
duck monster posted: Can someone tell me whats going on with this "fluent interface" poo poo. As soon as I saw ffmpeg my brain went "OH GET READY TO LOOK AT SOME lovely CODE!" Like don't get me wrong, ffmpeg is a masterpiece that basically the whole world relies on, but it's actually pretty scary knowing that the whole world is resting on top of a bunch of rotting pizza boxes

A "fluent" interface is what i vomit kittens described: it basically just means that you can chain methods together. A bunch of string methods let you do this; by having a method return `self` you make it very easy to chain operations together, and this results in slimmer, sexier code. Here's some code that makes use of string's fluent interface: Python code:
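The posted snippet didn't survive; here's roughly the kind of thing it would show, since every str method returns a new string:

```python
# Each str method returns a new string, so the calls chain directly
words = "  Hello, World!  ".strip().lower().replace(",", "").split()
```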
Python code:
Python code:
QuarkJets fucked around with this message at 08:57 on Jul 26, 2022 |
# ? Jul 26, 2022 08:49 |
|
I would argue the parens example is much more readable. Plus there is the added benefit of being able to comment out a chain easily. code:
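The code didn't make it into the quote; a small sketch of what the parens style buys you:

```python
# Wrapping the chain in parentheses puts one call per line, so any
# single step can be commented out without touching the others
text = "  Hello, World  "
words = (
    text
    .strip()
    .lower()
    # .replace(",", "")  # toggle a step by commenting it out
    .split()
)
```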
|
# ? Jul 26, 2022 10:31 |
|
duck monster posted: Can someone tell me whats going on with this "fluent interface" poo poo.

That's not exactly a JS invention; pretty sure it started in Java DSL frameworks like Spring. But you can do that in most object-oriented languages by making a class where all setter methods return "this" or "self" or whatever it is called in that language.
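In Python that looks something like this (a made-up builder, just to show the return-self trick):

```python
# Minimal fluent builder: every setter returns self so calls chain
class RequestBuilder:
    def __init__(self):
        self.method = "GET"
        self.headers = {}

    def with_method(self, method):
        self.method = method
        return self

    def with_header(self, key, value):
        self.headers[key] = value
        return self

req = RequestBuilder().with_method("POST").with_header("Accept", "application/json")
```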
|
# ? Jul 26, 2022 11:34 |
|
lazerwolf posted: I would argue the parens example is much more readable. Plus there is the added benefit of being able to comment out a chain easily

This! This is very common in ETL/data science/setting up for some ML training type workflows because you’ll do these types of chained operations a LOT.
|
# ? Jul 26, 2022 11:35 |
|
Yes, I know what fluent is in the usual context of it just being a dumb word for "chainable methods". Those have been around in Python since the 90s. But putting them INSIDE an import statement, that's what's got my hackles up. Like, what the gently caress is it actually importing, or is it just an unpythonic hack to access some stuff in another library?
|
# ? Jul 26, 2022 15:12 |
duck monster posted:Yes I know what fluent is in the usual context of it just being a dumb word for "chainable methods". Those have been around in python since the 90s. It’s not inside the import.
|
|
# ? Jul 26, 2022 15:18 |
|
QuarkJets posted: You probably saw this on the github page:

If you have your line length short enough, this is more reasonable than the alternative wrapping. In many cases it's the best way to format poo poo like that when chaining functions. E.g. this is readable, but trying to line-wrap it any other way definitely isn't. Python code:
Wallet fucked around with this message at 14:56 on Jul 27, 2022 |
# ? Jul 27, 2022 14:48 |
|
I watched quite a lot of these types of tutorials when I was learning about python development, and thought the same way. 'That pattern looks ugly, why do it that way?'. I'm still not entirely sure what 'inversion of control' is but I'm sure it will bite me on the rear end someday soon. For me, the saying that encapsulates this type of experience is: 'Life gives the test first, and the lesson later'.
|
# ? Jul 27, 2022 21:27 |
|
I'm trying to calculate the union of some time logs, then for times that are closer than a "gap time", merge them. The first-pass implementation of this is pretty slow. Getting the union is fairly easy; if unions is storing my times: code:
code:
code:
I can do this by checking whether output[n][1] is within 10 minutes of output[n+1][0] easily enough, then merge them if so. However I'd then need to check the resulting merged one against the next one in the loop... also that results in me modifying the list in the loop, which is a no-no. I feel like I'm missing an obvious answer here that doesn't require iterating over the results multiple times until no more gaps are found. Thoughts? EDIT: Here's a more graphical rep of the data. So the output should be three rows, because indexes 0, 1 and 2 should be merged, 3 should not be modified, and 4, 5 and 6 should be merged. CarForumPoster fucked around with this message at 14:35 on Jul 28, 2022 |
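One way to sidestep the modify-while-iterating problem is to build a new list in a single pass over the sorted intervals, extending the last merged interval whenever the gap is small enough. A numeric sketch (swap in datetimes and a timedelta for real data):

```python
# One-pass merge of sorted (start, end) intervals, joining any pair
# whose gap is <= max_gap (e.g. 10 minutes)
def merge_intervals(intervals, max_gap):
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap:
            # Small gap: extend the previous interval instead of appending
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]
```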
# ? Jul 28, 2022 14:23 |
|
Could you just treat each start/end time as a data frame (or series?) and do an inner join based on the datetime index?
|
# ? Jul 28, 2022 15:29 |
|
You’re partitioning the set into its equivalence classes defined by the relation ‘the timestamps overlap’ - I don’t believe there’s any faster way to do this than the naïve brute force method. Since it looks like you’re doing it on a pandas dataframe it would probably be simpler to use a new column to keep track of the equivalence classes - phone posting so I’ll have to do it in pseudocode, but the idea is:

* create a new integer column of zeros, called ‘equivc’ or whatever
* start with the first row, set .loc[0, ‘equivc’] = 1
* find every row that overlaps with a row having equivc == 1, set equivc = 1 for those rows
* repeat until you don’t find any more rows
* now find the first row that still has equivc == 0, set equivc = 2 for that row, and repeat the above steps
* keep going until there are no rows left with equivc == 0

Once you’re done with this, then every row with equivc == 1 has mutually overlapping intervals, as does every row with equivc == 2 and so on.
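The steps above, sketched without pandas on a plain list of (start, end) tuples (function name made up):

```python
# Label each interval with its equivalence class: grow class 1 from a
# seed row until no new overlaps appear, then start class 2, and so on
def label_equivalence_classes(intervals):
    labels = [0] * len(intervals)  # 0 = not yet assigned
    def overlaps(a, b):
        return a[0] <= b[1] and b[0] <= a[1]
    current = 0
    while 0 in labels:
        current += 1
        labels[labels.index(0)] = current  # seed a new class
        changed = True
        while changed:
            changed = False
            for i, iv in enumerate(intervals):
                if labels[i] == 0 and any(
                    labels[j] == current and overlaps(iv, intervals[j])
                    for j in range(len(intervals))
                ):
                    labels[i] = current
                    changed = True
    return labels
```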
|
# ? Jul 28, 2022 18:07 |
|
Assuming you are just grouping on 'work_start', you can get your equivalence classes with something close to:Python code:
Then this part I'm not sure about, but you should probably define a function that outputs each row you want from a subset of the DF, let's call it merge_group(group: DataFrame) -> Series (I think it should output Series, a row of a DF). Python code:
Python code:
Edit3: because I'm bored I checked pandas docs and I think you could do Python code:
SurgicalOntologist fucked around with this message at 19:14 on Jul 28, 2022 |
# ? Jul 28, 2022 19:00 |
|
I'm not 100% sure I'm following what you're doing, but I think you have each timespan starting out on its own and you keep merging them until the gaps are bigger than 10 mins? A disjoint-set forest is a data structure that can do that efficiently. For N things, they are grouped into sets with each object belonging to exactly one set; finding which set something belongs to and merging sets is amortized ~O(1). (But depending on your problem size and how often you will do it, it may not be worthwhile to implement and test it vs. just waiting out a trivial Python version, or doing a trivial implementation in some faster non-Python thing - either Python-adjacent numba/numpy or a completely separate language. Or find someone who's already implemented the data structure in Python.)
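For reference, the trivial Python version of the data structure is pretty short:

```python
# Minimal disjoint-set forest with path compression and union by size
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # attach the smaller tree under the larger
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

ds = DisjointSet(4)
ds.union(0, 1)
ds.union(1, 2)
```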
|
# ? Jul 28, 2022 19:16 |
|
Thanks for the help, you guys helped me zoom out of being fixated on one path. The end goal is to calculate for many people the number of hours worked in a day, allowing for short breaks. I can just add the time gaps that are less than 10 minutes and get the same total number of hours, I don't need to further reduce them at all. code:
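The stripped code aside, the simplified idea might look like this (numeric sketch in minutes; the real version would use timedeltas):

```python
# Total worked time: sum the interval lengths, and also count any gap
# shorter than max_gap as worked time (a short break)
def total_worked(intervals, max_gap=10):
    total = 0
    prev_end = None
    for start, end in sorted(intervals):
        total += end - start
        if prev_end is not None and 0 < start - prev_end <= max_gap:
            total += start - prev_end  # short break counts as worked
        prev_end = end
    return total
```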
|
# ? Jul 28, 2022 19:53 |
|
These types of problems are my bread and butter (finding consecutive sections of a timeseries that meet a set of criteria) and the basic pattern is:
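The pattern being described (boolean mask, find the rising edges, cumulative-sum the edges to label each run) is usually done with pandas/numpy cumsum; a plain-Python sketch of the same idea:

```python
# Label each consecutive run where the condition holds: a rising edge
# starts a new group number, and False positions stay 0
def label_runs(mask):
    labels = []
    group = 0
    prev = False
    for m in mask:
        if m and not prev:
            group += 1  # rising edge: a new run starts here
        labels.append(group if m else 0)
        prev = m
    return labels
```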
|
# ? Jul 29, 2022 09:24 |
|
heh cumsum
|
# ? Jul 30, 2022 17:16 |
|
Is there a way to scrape threads?
|
# ? Jul 31, 2022 19:42 |
|
SirPablo posted: Is there a way to scrape threads?

Sure, you can just make the same requests as your browser does with requests and scrape the content with beautifulsoup. I'd drop a line to astral first though.
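beautifulsoup makes this much nicer, but even the stdlib can pull things out of HTML; a small stand-in using html.parser:

```python
# Stdlib-only sketch: collect every <a href="..."> from an HTML string
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

p = LinkCollector()
p.feed('<a href="/thread/1">one</a><a href="/thread/2">two</a>')
```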
|
# ? Jul 31, 2022 19:59 |
|
How do you pass authentication to it? I haven't learned that part. Making the request I can handle in general.
|
# ? Jul 31, 2022 20:05 |
|
SirPablo posted: How do you pass authentication to it? I haven't learned that part. Making the request I can handle in general.

You make a POST request. I'd suggest you do babbys first scraper with selenium, i.e. actually using a web browser. It will let you visualize what's going on as well as interact manually for things like logins.
|
# ? Jul 31, 2022 20:08 |
|
Thanks for the pointers (sometimes hardest part is learning what's out there to use), I'll take a look.
|
# ? Jul 31, 2022 20:13 |
|
What's a more pythonic/optimized way of accomplishing this? Goal: count the occurrences of colors found in a text file. I'll likely add a lot more colors, and will be iterating over many files. Python code:
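The posted code was lost; based on the replies, it was presumably the "one if statement per color" shape, roughly like this (reconstruction; the real script reads a file, a string is used here):

```python
# Hypothetical reconstruction of the naive version being asked about
text = "red sky at night blue moon red"
color_count = {"red": 0, "blue": 0, "yellow": 0}
for word in text.split():
    if word == "red":
        color_count["red"] += 1
    if word == "blue":
        color_count["blue"] += 1
    if word == "yellow":
        color_count["yellow"] += 1
```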
|
# ? Aug 5, 2022 15:37 |
|
Look at collections.Counter(), made for exactly that
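For example:

```python
# Counter tallies an iterable in one shot; filter to the colors you care about
from collections import Counter

colors = {"red", "blue", "yellow"}
text = "red sky at night blue moon red"
counts = Counter(word for word in text.split() if word in colors)
```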
|
# ? Aug 5, 2022 15:47 |
|
Do you intend to count a line like "yellow yellow" as one yellow count? Also, do you intend it to be case-sensitive?
|
# ? Aug 5, 2022 16:09 |
|
I like the way set operations work with dict keys:code:
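The snippet was stripped; the idea is probably along these lines (data made up):

```python
# dict.keys() behaves like a set, so it intersects cleanly with the
# set of words on a line
color_count = dict.fromkeys(["red", "blue", "yellow"], 0)
words = "red fish blue fish".split()
for color in color_count.keys() & set(words):
    color_count[color] += words.count(color)
```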
|
# ? Aug 5, 2022 16:11 |
|
KICK BAMA KICK posted: Look at collections.Counter(), made for exactly that

That looks like it'll work, thanks!

Foxfire_ posted: Do you intend to count a line like

It would count as two, and no, not case specific. It's for a small toy project, to see how 'colorful' some text files are.

Zoracle Zed posted: I like the way set operations work with dict keys:

I'll take a look at this method as well, thanks!
|
# ? Aug 5, 2022 16:13 |
|
Hughmoris posted:What's a more pythonic/optimized way of accomplishing this? Instead of writing out each if statement you could iterate over the keys in the dict. That'd be way more compact: Python code:
You could fetch 'file.txt' from sys.argv (e.g. as a command-line argument), or assign it to an all-caps variable near the top of your file (this is a good way to treat constants such as this file name, it's easier to find and change them later) Someone else mentioned collections.Counter, that would be an excellent improvement. To demonstrate another kind of tool that you may not have seen before, I'll show how you could use a dataclass (they replace the old-school dict way of managing data). Python code:
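A sketch of what the dataclass version might look like (class and field names are made up, not from the original post):

```python
# The counts live on a small dataclass instead of a bare dict
from dataclasses import dataclass, field
from collections import Counter

@dataclass
class ColorReport:
    colors: frozenset = frozenset({"red", "blue", "yellow"})
    counts: Counter = field(default_factory=Counter)

    def scan(self, text):
        # Tally only the words that are known colors
        self.counts.update(w for w in text.split() if w in self.colors)

report = ColorReport()
report.scan("red red blue")
```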
QuarkJets fucked around with this message at 16:40 on Aug 5, 2022 |
# ? Aug 5, 2022 16:36 |
|
QuarkJets posted:Dataclass and stuff... I've zero experience with data classes but that looks simple and friendly enough. Thanks for the feedback. Ultimately, this simple script will reside in an AWS Lambda as a learning exercise. Upload a file to S3 -> Triggers Lambda to process it -> Writes out to Postgres or something. *The file name will be the uploaded S3 object, and I'm thinking the Color List will be a text file residing in S3 as well which can be updated on-demand with new colors.
|
# ? Aug 5, 2022 17:59 |
|
Hughmoris posted:I've zero experience with data classes but that looks simple and friendly enough. Thanks for the feedback. Nice. Are you aware of smart_open?
|
# ? Aug 5, 2022 20:02 |
|
QuarkJets posted: Nice. Are you aware of smart_open?

I had not heard of it but I went off and looked it up. Seems like a sweet tool that I'll put in my back pocket for future use on bigger projects. I got version 1 of this program working with yalls help. Uploading a text file to S3 triggers the python Lambda, which parses the contents and writes out its findings. The code is ugly compared to what most of you write, but it works so I'll take it.
|
# ? Aug 6, 2022 21:06 |
|
Hughmoris posted:I had not heard of it but I went off and looked it up. Seems like a sweet tool that I'll put in my back pocket for future use on bigger projects. Cool, if you'd like feedback feel free to post some.
|
# ? Aug 6, 2022 21:20 |
|
I have a monte carlo simulation that is of the following structure: Python code:
Previously I had it set up as follows and it is much faster like this: Python code:
Hopefully I didn't simplify the code too much. The actual code is more complicated and I don't think it makes sense to post it. In the actual code the probs dictionaries are much larger and there are a few more if/else conditions to decide from where to draw the probabilities.
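Since the real code can't be posted, here's a made-up miniature of the two shapes being compared: looking up the weights inside the hot loop vs. resolving them once outside it.

```python
# Hypothetical sketch: tuple-keyed dict lookup per iteration (slow
# version) vs. weights hoisted out of the loop (fast version)
import random

options = ["a", "b", "c"]
probs = {("c1", "c2", "c3"): [0.5, 0.3, 0.2]}

def draw_slow(n):
    out = []
    for _ in range(n):
        weights = probs[("c1", "c2", "c3")]  # lookup every iteration
        out.append(random.choices(options, weights=weights)[0])
    return out

def draw_fast(n):
    weights = probs[("c1", "c2", "c3")]  # looked up once, outside the loop
    return [random.choices(options, weights=weights)[0] for _ in range(n)]
```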
|
# ? Aug 11, 2022 08:12 |
|
Is it the choices call that's slow, or the rest of it? Non-Python code (numpy.random, and numpy generally) will run much faster. The typical way to do numerical computing with Python is to use Python as a glue language connecting non-Python code (numba or numpy)
|
# ? Aug 11, 2022 18:31 |
|
Removing the choice seems to speed it up a bit, but it doesn't seem to be the main problem.
|
# ? Aug 11, 2022 20:28 |
|
I think that hashing (condition1, condition2, condition3) for every lookup is going to impose a performance bottleneck. It looks like 'condition1_1' is always paired with 'condition1_2' and 'condition1_3'. If that's the case then you don't need condition2 and condition3 in the lookup. If you need to have all 3 for whatever reason, then try nesting the dictionary instead of creating keys out of tuples, e.g. probs[condition1][condition2][condition3]
|
# ? Aug 11, 2022 21:13 |
|
Can you set up some smaller dummy thing that duplicates the problem? When I run this: Python code:
|
# ? Aug 11, 2022 21:29 |
|
I'm not entirely following what the code does. I get that you're choosing one of option_X from probability distribution prob_Y, but you define draw_options() with arguments and then don't pass anything to them. I ran some profiles with %timeit. random.choice is reasonably performant, but random.choices scales poorly. random.choice
random.choices
|
# ? Aug 12, 2022 03:42 |
|
random.choices has to start by iterating the entire list of weights, which of course will get slower the more items you have. If you supply cumulative weights to random.choices then it should perform a bit better.
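Concretely, the cumulative weights can be precomputed once with itertools.accumulate and reused across draws:

```python
# Precompute cumulative weights once so random.choices skips the
# per-call accumulation pass over the weights list
import random
from itertools import accumulate

options = ["a", "b", "c"]
weights = [5, 3, 2]
cum = list(accumulate(weights))  # [5, 8, 10], computed once up front
pick = random.choices(options, cum_weights=cum, k=1)[0]
```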
|
# ? Aug 12, 2022 04:31 |
|
|
I've been learning about scopes and namespaces (and maybe not paying enough attention!), so I'm trying to understand how this works: without getting into an internet slapfight about the practice of tipping, I'm not sure how the add_tip function is able to access the 'total' variable from total_bill. Isn't 'total' local to total_bill and thus inaccessible to add_tip? Shouldn't the parameter for add_tip be 'def add_tip(total_bill)'?
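The tutorial code wasn't quoted, but a guess at its shape (function bodies made up): the parameter named 'total' is add_tip's own local name, bound to whatever value is passed in; it is unrelated to the local variable of the same name inside total_bill.

```python
def total_bill(items):
    total = sum(items)  # this 'total' is local to total_bill
    return total

def add_tip(total, pct=0.2):
    # This 'total' is a separate local name: it's the parameter, bound
    # to the argument value at call time, not total_bill's variable
    return total * (1 + pct)

bill = total_bill([10.0, 15.0])
with_tip = add_tip(bill)
```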
|
# ? Aug 12, 2022 05:01 |