|
Too simple for Python??? Yeah, probably. I like writing Python so here's a crack at it anyway: Python code:
quote: Username (userID1) 2021-07-14 14:49:22
The program prints:
quote: defaultdict(<class 'list'>, {'Username (userID1)': ['wtf are you reading?', 'guys, erek is reading 50 shades of grey', 'lets all laugh and point at him', 'wrong', 'there is like 3 books', 'I um', 'uhh', 'gotta go!'], 'Username (userID2)': ['nothing go away', '50 Shades of Grey is a book not a series', 'I think', 'To google!', 'ew', 'there are three-...', "How'd you know that..."]})
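A minimal sketch of the kind of script that would produce output like this. It assumes (this is a guess, not something the post confirms) that the log alternates a header line shaped like `Username (userID1) 2021-07-14 14:49:22` with the message text on the following line. The `HEADER` pattern, the `group_messages` name, and every timestamp in the sample after the first are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed header shape: "Username (userID1) 2021-07-14 14:49:22"
HEADER = re.compile(r"^(?P<user>.+ \(\w+\)) \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$")


def group_messages(lines):
    """Collect each user's messages, in order, keyed by 'Name (id)'."""
    messages = defaultdict(list)
    current_user = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            # A header line switches the speaker for the lines that follow.
            current_user = match.group("user")
        elif current_user is not None:
            messages[current_user].append(line)
    return messages


# Hypothetical sample in the assumed format.
sample_log = """\
Username (userID1) 2021-07-14 14:49:22
wtf are you reading?
Username (userID2) 2021-07-14 14:49:30
nothing go away
Username (userID1) 2021-07-14 14:49:41
guys, erek is reading 50 shades of grey
"""

grouped = group_messages(sample_log.splitlines())
print(grouped)
```

`defaultdict(list)` is what removes the "have I seen this user yet?" check that makes the awk/grep version fiddly.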
|
# ? Feb 3, 2023 08:51 |
|
God drat, this is way better than my garbage awk and grep. Thanks QuarkJets, you're a legend. For some reason this just feels like a much more elegant solution.
|
# ? Feb 3, 2023 15:40 |
|
QuarkJets posted: Too simple for Python??? Yeah, probably. I like writing python so here's a crack at it anyway:
Haha, I was just coming here to write this, glad I swapped pages before doing it. I work with a bunch of shell script folks, and there was this weird nested dictionary we had to modify. They gave up on sed, and I managed to hammer out a script to not only modify it, but generate valid XML from it! Which is great until I need to write a shell script and suddenly I'm (justifiably) the idiot.
|
# ? Feb 3, 2023 20:34 |
|
Sed, awk, and grep are fantastic and powerful tools, but bash scripts have famously bad readability, and readability matters for anything you're ever going to reuse. I like seeing a clever bash one-liner on Stack Overflow sometimes, but not in production code.
|
# ? Feb 3, 2023 23:04 |
|
And now technology has come full circle, and dockerfiles are full of sed and awk commands to manipulate configuration files etc etc.
|
# ? Feb 3, 2023 23:31 |
|
I am running into an issue. I have a Jupyter notebook which contains code that I want to run multiple times using different parameters. The table in the SQLite3 database from which the data will be pulled depends on a parameter passed in (there is one table with 4 hour data, one table with 6 hour data, and one table with 8 hour data). From reading Stack Overflow posts (for example this one) it looks like the table name cannot be parameterized. The reasoning behind this is that they want to prevent SQL injection. There are ways around this: Python code:
I understand this is bad practice in general, but in this particular context where I am the user and will be supplying the tss_length_hrs string which will not be malicious, it should be ok, right?
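For reference, a rough sketch of the workaround being described: the table name is formatted into the SQL text with an f-string, while ordinary values still go through `?` placeholders. Only `tss_length_hrs` comes from the post; the `tss_{n}hr` table naming, the column names, and the in-memory database are assumptions made up so the example runs on its own.

```python
import sqlite3

# Throwaway in-memory database with three hypothetical tables.
conn = sqlite3.connect(":memory:")
for hours in ("4", "6", "8"):
    conn.execute(f"CREATE TABLE tss_{hours}hr (patient_id INTEGER, value REAL)")
conn.execute("INSERT INTO tss_6hr VALUES (?, ?)", (1, 0.75))

# Supplied by the notebook author, not by an outside user.
tss_length_hrs = "6"

# Placeholders cannot stand in for identifiers, so the table name is
# formatted into the SQL text; the value still uses a "?" placeholder.
table_name = f"tss_{tss_length_hrs}hr"
rows = conn.execute(
    f"SELECT patient_id, value FROM {table_name} WHERE patient_id = ?",
    (1,),
).fetchall()
print(rows)
```

This is only as safe as the source of `tss_length_hrs`.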
|
# ? Feb 4, 2023 17:45 |
|
Yes, if you completely control the inputs it's okay to do. The danger comes from uncontrolled inputs.
|
# ? Feb 4, 2023 17:57 |
|
Jose Cuervo posted: I am running into an issue. I have a Jupyter notebook which contains code that I want to run multiple times using different parameters. The table in the SQLite3 database from which the data will be pulled depends on a parameter passed in (there is one table with 4 hour data, one table with 6 hour data, and one table with 8 hour data). From reading Stack Overflow posts (for example this one) it looks like the table name cannot be parameterized. The reasoning behind this is that they want to prevent SQL injection.
SQL injection is only possible if you accept unsanitized user input; there is nothing wrong with what you've written here. Put the f-string directly into the execute statement if you want, even. Just so long as you control the content of 'tss_length_hrs' and aren't just slamming in whatever a user provides. Dictionary dispatch would work well for that.
|
# ? Feb 4, 2023 22:14 |
|
QuarkJets posted: SQL injection is only possible if you accept unsanitized user input; there is nothing wrong with what you've written here. Put the f-string directly into the execute statement if you want, even. Just so long as you control the content of 'tss_length_hrs' and aren't just slamming in whatever a user provides. Dictionary dispatch would work well for that.
Can you elaborate on what you mean by dictionary dispatch and also explain why it would work well in this context?
|
# ? Feb 5, 2023 02:33 |
|
Jose Cuervo posted: Can you elaborate on what you mean by dictionary dispatch and also explain why it would work well in this context?
Sure, say that you want a user to be able to select which table is queried via a command line argument. Passing input directly to the query is dangerous because it would leave you vulnerable to SQL injection, but you can easily pass that value to a dictionary that maps to only valid names: Python code:
This is dictionary dispatch; it's just mapping some input to a set of outputs. You could use the table_name variable here to build a SQL query without issue, because you know the valid table names already. If an invalid table name is provided then an exception is raised. If someone tries to attack you with an SQL injection input then it just dies at this dictionary lookup
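A minimal sketch of the idea, not necessarily the exact code from the post. The `VALID_TABLES` mapping, the `resolve_table` helper, and the table names are all hypothetical.

```python
# Whitelist: the only table names that can ever reach the SQL text.
VALID_TABLES = {
    "4": "tss_4hr",
    "6": "tss_6hr",
    "8": "tss_8hr",
}


def resolve_table(user_input):
    """Map raw user input to a known table name; KeyError for anything else."""
    return VALID_TABLES[user_input]


# A legitimate value maps to a known-safe identifier.
table_name = resolve_table("6")
query = f"SELECT * FROM {table_name}"
print(query)

# An injection attempt never reaches the query; it fails at the lookup.
try:
    resolve_table("6; DROP TABLE tss_6hr")
    rejected = False
except KeyError:
    rejected = True
print(rejected)
```

The f-string only ever sees values that were written into the dictionary by hand, which is why the lookup doubles as validation.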
|
# ? Feb 5, 2023 02:57 |
|
QuarkJets posted: Sure, say that you want a user to be able to select which table is queried via a command line argument. Passing input directly to the query is dangerous because it would leave you vulnerable to SQL injection, but you can easily pass that value to a dictionary that maps to only valid names:
Got it, thanks!
|
# ? Feb 5, 2023 03:36 |
|
dictionary dispatch is a great pattern. if you're on a newer version of python (3.10+) you also have structural pattern matching. you should totally give this whole doc a read through, but this snippet in particular is especially relevant: Python code:
i'd probably still start by implementing dictionary dispatch, it seems like a much better fit for your current problem set of "how do I translate user input into a known-safe thing", but if you're writing user tools you should definitely check out the above link which has "handling user input" as a center-stage feature of the tutorial and examples.
|
# ? Feb 5, 2023 03:49 |
|
I'm looking for Python courses that I can do fully from an IDE or REPL, or failing that through Jupyter notebooks. About 6-7 years ago I did this great interactive course in R that was delivered entirely through R itself, as a package you ran, and all of the tutorials and exercises were done right from inside RStudio. I really enjoyed the "immersive" nature of it. I tried searching for something like this for Python, but all I can find are interactive courses online that are taught through a web browser IDE. Maybe it's a bit picky, but I found something appealing about learning a language while getting familiar with the tools themselves, and about it being fully automated and even possible offline. Something that I could run in VS Code would be fantastic. I've found a few courses taught over Jupyter notebooks, which will be fine too obviously, but I'm just curious if anyone knows of any Python courses that run fully independently as packages.
|
# ? Feb 6, 2023 15:24 |
|
I'd never spent any thought on default arguments before, and when they're bound, etc. Python code:
pre: ['a'] ['a', 'b']
question: is handling command dispatch via getattr(self, 'command_' + command, self.unknown_command) remotely sane? It's a tossup between that and dict dispatch, but the latter has 3 times the boilerplate.
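The `['a'] ['a', 'b']` output is what a mutable default argument produces. A minimal sketch that reproduces it (function names are made up): the default list is created once, when the `def` statement runs, and is then shared by every call that omits the argument.

```python
def append_shared(item, bucket=[]):
    # The default list is built once, at definition time, and reused.
    bucket.append(item)
    return bucket


first = list(append_shared("a"))   # copy, so later calls don't change it
second = list(append_shared("b"))
print(first, second)  # ['a'] ['a', 'b']


def append_fresh(item, bucket=None):
    # Usual workaround: use None as a sentinel and build the list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


print(append_fresh("a"), append_fresh("b"))  # ['a'] ['b']
```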
|
# ? Feb 6, 2023 23:07 |
|
Harik posted: question: is handling command dispatch via getattr(self, 'command_' + command, self.unknown_command) remotely sane? It's a tossup between that and dict dispatch but
I try to avoid use of `getattr`, but there are rare circumstances where it makes sense. Can you provide a more complete example of what you're thinking about?
|
# ? Feb 6, 2023 23:24 |
|
QuarkJets posted: I try to avoid use of `getattr`, but there are rare circumstances where it makes sense. Can you provide a more complete example of what you're thinking about?
Sure, it'd be something like Python code:
Python code:
e: obviously there's a lot more going on with async callbacks and state and access checks, but none of that was really relevant to the core question.
e2: I guess the same could be achieved in __init__ by iterating over dir(self) and creating a dict of all the methods whose names startswith('command_')
Harik fucked around with this message at 23:46 on Feb 6, 2023 |
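A stripped-down sketch of both variants side by side, under the assumption that commands are plain strings and handlers take a list of arguments. The class name, the `ping` and `echo` commands, and the method signatures are invented; only the `command_` prefix, `unknown_command`, and the `getattr` call shape come from the posts.

```python
class CommandHandler:
    def __init__(self):
        # "e2" variant: build the lookup table once from every method
        # whose name starts with "command_".
        prefix = "command_"
        self.commands = {
            name[len(prefix):]: getattr(self, name)
            for name in dir(self)
            if name.startswith(prefix)
        }

    def command_ping(self, args):
        return "pong"

    def command_echo(self, args):
        return " ".join(args)

    def unknown_command(self, args):
        return "unknown command"

    def dispatch_getattr(self, command, args):
        # Attribute lookup on every call, falling back to unknown_command.
        handler = getattr(self, "command_" + command, self.unknown_command)
        return handler(args)

    def dispatch_dict(self, command, args):
        # Plain dict lookup against the table built in __init__.
        return self.commands.get(command, self.unknown_command)(args)


handler = CommandHandler()
print(handler.dispatch_getattr("ping", []))
print(handler.dispatch_dict("echo", ["a", "b"]))
print(handler.dispatch_dict("nope", []))
```

Either way, only names carrying the `command_` prefix are reachable from user input, which is what keeps the `getattr` version from exposing arbitrary attributes.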
# ? Feb 6, 2023 23:42 |
|
Harik posted: Sure, it'd be something like
Yeah, iterating over the methods would be a cool way to implement this without resorting to getattr. Is this for a CLI? Have you looked at dispatch?
|
# ? Feb 7, 2023 00:31 |
|
QuarkJets posted: Yeah iterating over the methods would be a cool way to implement this without resorting to getattr. Is this for a CLI? Have you looked at dispatch?
dispatch looks useful, I may try it on my next CLI. But no, this isn't for a CLI; it's an internet-facing API, so the documentation is separate from the code. It's a pattern I've been playing with to eliminate redundancy in already bloated code. I'll make the change to a dict populated via dir(self) though, that seems like the best overall. Not that the performance hit matters much (times in nanoseconds):
pre:
getattr: 83
'command_' + cmd: 78
dict.get: 32
Thanks!
|
# ? Feb 7, 2023 01:12 |
|
Harik posted: I'd never spent any thought on default arguments before, and when they're bound, etc.
That honestly feels like a bad design decision on Python's part. Almost worth reporting it as a bug. Principle of least surprise and all that.
|
# ? Feb 11, 2023 13:51 |
|
I have a Python memory management and speed question: I have a Python program where I'm going to be reading in a stream of data and displaying the data received over the last X seconds. Data outside that window can be junked. I'd like to optimize for speed first, and size second. I know how I'd do this in C, but I'm not sure about Python memory management. The data are going to be in a live updating graph, so accessing it contiguously would be good. I have my own ideas about circular buffers or offset pre-allocated buffers, but I have the feeling Python has something off the shelf that will handle this well. Does anything like that exist?
|
# ? Feb 11, 2023 14:50 |
duck monster posted: That honestly feels like a bad design decision on pythons behalf. Almost worth reporting it as a bug. Principle of least surprise and all that.
The thing about mutable default arguments is one of the biggest gotchas and interview-interrogation points about Python, and one of the great sources of truth-and-beauty discussions when you and the other devs start talking about coding best practices. It's been that way for so long that changing it now would probably break a bunch of people's code. https://docs.python-guide.org/writing/gotchas/
Still would be nice to be able to do stuff like Python code:
Python code:
|
|
# ? Feb 11, 2023 17:02 |
|
duck monster posted: That honestly feels like a bad design decision on pythons behalf. Almost worth reporting it as a bug. Principle of least surprise and all that.
They've seen the complaint already, you can be sure of that. I understand why it works that way; it's the natural result of other design choices. It's still surprising and a pitfall.
|
# ? Feb 11, 2023 19:22 |
|
check out PEP 671 - late-bound function argument defaults
|
# ? Feb 11, 2023 20:24 |
|
StumblyWumbly posted: I have a Python memory management and speed question: I have a Python program where I'm going to be reading in a stream of data and displaying the data received over the last X seconds. Data outside that window can be junked, I'd like to optimize for speed first, and size second. I know how I'd do this in C, but I'm not sure about Python memory management.
If you want to hold the last N items in a structure similar to a ring buffer without implementing your own, a deque is probably what you're after: https://docs.python.org/3/library/collections.html#collections.deque
Deques can be given an optional maxlen to limit their size, but if you're expecting the stream volume to ebb and flow, you're probably better off implementing the cleanup of the tail end yourself and tying it to the time of the streamed event. In terms of performance, it's O(1) to access either end of the deque, O(n) in the middle, so if you're not doing a ton of random reads, or have a relatively small collection, a deque makes for a really nice choice.
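A minimal sketch of the "clean up the tail yourself" approach described above, assuming each sample arrives with (or is stamped with) a timestamp in seconds. The `TimeWindow` class and its method names are made up; `deque.append` and `deque.popleft` are the standard O(1) end operations.

```python
import time
from collections import deque


class TimeWindow:
    """Keep only the samples received during the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, value) pairs, oldest on the left

    def add(self, value, timestamp=None):
        now = time.monotonic() if timestamp is None else timestamp
        self.samples.append((now, value))
        # Drop everything that has fallen out of the window.
        cutoff = now - self.window_seconds
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def values(self):
        # Iterating a deque from end to end is O(n) overall, which is fine
        # for redrawing a graph; it is indexing into the middle that is slow.
        return [value for _, value in self.samples]


window = TimeWindow(window_seconds=10.0)
for stamp, reading in [(0.0, 1), (4.0, 2), (9.0, 3), (12.0, 4)]:
    window.add(reading, timestamp=stamp)
print(window.values())  # [2, 3, 4]
```

If the sample rate is fixed, `deque(maxlen=N)` does the same job with no cleanup loop at all.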
|
# ? Feb 11, 2023 21:52 |
|
Does python not have an arraydeque of any kind? It shouldn't be O(n) just to read items by index in the middle.
|
# ? Feb 11, 2023 23:21 |
|
Jabor posted: Does python not have an arraydeque of any kind? It shouldn't be O(n) just to read items by index in the middle.
Assuming you're referring to Java's arraydeque, Python's equivalent is just called deque: https://docs.python.org/3/library/collections.html#collections.deque
|
# ? Feb 11, 2023 23:47 |
|
QuarkJets posted: Assuming you're referring to Java's arraydeque, Python's equivalent is just called deque: https://docs.python.org/3/library/collections.html#collections.deque
quote: Indexed access is O(1) at both ends but slows to O(n) in the middle.
This doesn't look like an arraydeque to me. It should have O(1) indexed access (as long as you're just peeking at elements and not adding/removing).
|
# ? Feb 11, 2023 23:49 |
|
StumblyWumbly posted: I have a Python memory management and speed question: I have a Python program where I'm going to be reading in a stream of data and displaying the data received over the last X seconds. Data outside that window can be junked, I'd like to optimize for speed first, and size second. I know how I'd do this in C, but I'm not sure about Python memory management.
There's a cachetools package that might be relevant (I see it has a TTLCache). Or if performance is critical, you may want to actually write it in C and bind it. Pybind11 is pretty good if you're familiar with C++.
|
# ? Feb 12, 2023 00:00 |
|
Jabor posted: This doesn't look like an arraydeque to me. It should have O(1) indexed access (as long as you're just peeking at elements and not adding/removing).
They are indeed different. I'm not aware of an arraydeque equivalent with fast access in the middle in Python's stdlib, though I'd be surprised if there wasn't an implementation somewhere out there in the broader ecosystem. Cachetools is an option, provided an LRU cache with TTL on top of it is acceptable. If you have the necessary skills and need the raw performance, binding something faster to Python is always an option as well. I would start by throwing together a proof of concept in Python and profiling it before adding dependencies or binding another language, though.
|
# ? Feb 12, 2023 00:53 |
|
Jabor posted: This doesn't look like an arraydeque to me. It should have O(1) indexed access (as long as you're just peeking at elements and not adding/removing).
Oh, does Java's arraydeque have indexable access? I didn't think that it did. I'm not aware of a Python deque with faster indexed access; I'm most often a numpy pain pig, sorry. What kind of use case are you thinking about?
|
# ? Feb 12, 2023 07:00 |
|
Rust's VecDeque supports indexed access. It's not actually particularly useful for the sort of things that you'd use a deque for - but O(n) access implies that it's some sort of slow linked data structure instead of a contiguous ring buffer, which is pretty bad.
|
# ? Feb 12, 2023 08:52 |
|
Out of curiosity, I asked ChatGPT for a beginner python puzzle, and this is what I got:
quote: Write a function `def product_except_self(nums):` that takes in a list of integers `nums` and returns a list of integers where each element is the product of all the numbers in `nums` except the one at that index.
And this was the solution: code:
code:
I guess we have different definitions of 'beginner'!
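For context, a sketch of the usual division-free answer to this puzzle, which may or may not match what ChatGPT produced: one left-to-right pass stores the product of everything before each index, and one right-to-left pass multiplies in the product of everything after it.

```python
def product_except_self(nums):
    """Return, for each index, the product of every other element."""
    n = len(nums)
    result = [1] * n

    # Pass 1: result[i] = product of nums[0..i-1].
    prefix = 1
    for i in range(n):
        result[i] = prefix
        prefix *= nums[i]

    # Pass 2: multiply in the product of nums[i+1..n-1].
    suffix = 1
    for i in range(n - 1, -1, -1):
        result[i] *= suffix
        suffix *= nums[i]

    return result


print(product_except_self([1, 2, 3, 4]))  # [24, 12, 8, 6]
print(product_except_self([0, 2, 3]))     # [6, 0, 0]
```

It runs in O(n) time and copes with zeros in the input.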
|
# ? Feb 12, 2023 22:30 |
|
Python code:
|
# ? Feb 12, 2023 23:06 |
|
That can give a divide by zero answer, but otherwise is better. The question is easy enough but the ChatGPT answers are pretty cryptic. Thanks everyone for the Python tips, I'll definitely play around with deque.
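The divide-by-zero remark suggests the shorter answer multiplied everything together and divided by each element. A sketch of that approach, as a guess at the idea rather than the code actually posted (the function name is made up):

```python
from math import prod


def product_except_self_div(nums):
    """Divide the total product by each element; breaks if any element is 0."""
    total = prod(nums)
    # Integer division is exact here because every element divides the total.
    return [total // n for n in nums]


print(product_except_self_div([1, 2, 3, 4]))  # [24, 12, 8, 6]

try:
    product_except_self_div([0, 2, 3])
    failed = False
except ZeroDivisionError:
    failed = True
print(failed)  # True
```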
|
# ? Feb 12, 2023 23:22 |
I had ChatGPT help me out with some thorny unit test mocking approaches. It was surreal and it worked.
|
|
# ? Feb 12, 2023 23:29 |
|
I'm finally learning to do anything more than basic PowerShell one-liners and I'm having a stupid newbie moment here. I'm trying to take a bunch of values, divide them by the largest value, and output the results with a couple of decimal places.
INPUT: 5 30.0 50.0 10.0 100.0 65.0
should break down to 0.30 0.50 0.10 1.00 0.65
I've got something that'll work with that just fine: it grabs the six inputs and creates vars a-f, then checks for the largest of them, divides each of the vars by that largest number, and outputs. Simple enough. But I didn't actually read things well enough, and I need to have the first input be an integer that indicates the number of values that will be coming in, and that's got me banging against a wall. I'm sure it's because my approach is hot garbage here, but I'm not sure where to even begin doing that sort of dynamic creation of x number of variables. I'm sure this is dumb, please tell me how dumb this is: Python code:
|
# ? Feb 14, 2023 22:34 |
|
Make a list. Populate it with the values. Iterate through it.
|
# ? Feb 14, 2023 22:42 |
|
That's the basic poo poo I need - now I have something far less dumb: code:
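A sketch of what the list-based version might look like, assuming the inputs arrive as text tokens with the count first. The `normalize` name is hypothetical, and taking a list of strings instead of calling `input()` is a choice made so the example runs without a keyboard.

```python
def normalize(tokens):
    """tokens[0] is the count; the next `count` tokens are the values."""
    num_vals = int(tokens[0])

    # One list replaces the separate a-f variables.
    values = []
    for token in tokens[1:num_vals + 1]:
        values.append(float(token))

    largest = max(values)

    results = []
    for value in values:
        results.append(f"{value / largest:.2f}")
    return results


print(" ".join(normalize("5 30.0 50.0 10.0 100.0 65.0".split())))
# 0.30 0.50 0.10 1.00 0.65
```

In an interactive script the tokens would come from `input()` calls instead.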
|
# ? Feb 14, 2023 23:15 |
|
You could also get an arbitrary number of floats by having the user specify something like a space-delimited string. For example: code:
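Roughly what that could look like; `parse_floats` is a made-up helper name and the prompt text in the comment is only an example. `str.split()` with no arguments splits on any run of whitespace, so the user doesn't have to say how many values are coming.

```python
def parse_floats(line):
    """Turn a space-delimited string such as '30.0 50.0 10.0' into floats."""
    return list(map(float, line.split()))


# In a real script the line would come from something like:
#     line = input("Enter values separated by spaces: ")
values = parse_floats("30.0 50.0 10.0 100.0 65.0")
print(values)
print(len(values))
```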
|
# ? Feb 14, 2023 23:32 |
|
Soylent Majority posted: Thats the basic poo poo I need - now I have something far less dumb:
Yeah, this is much better. Do you need num_vals, e.g. is this for an assignment that says you have to have a num_vals? I like Zugzwang's suggestion, but maybe you're required to have that num_vals input for some reason. Do you know about list comprehensions yet? List comprehensions own. Try this poo poo: code:
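A short sketch of the list-comprehension version, using the sample values from earlier in the thread. The variable names are made up, and this is only an illustration of the technique.

```python
values = [30.0, 50.0, 10.0, 100.0, 65.0]
largest = max(values)

# Each comprehension replaces a "make an empty list, loop, append" block.
scaled = [value / largest for value in values]
formatted = [f"{x:.2f}" for x in scaled]

print(" ".join(formatted))  # 0.30 0.50 0.10 1.00 0.65
```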
|
# ? Feb 15, 2023 03:58 |