First of all you're repeating yourself; you're doing the exact same random number generation thing both outside the while loop and inside it. If you find yourself doing that, you should be thinking "a) I should refactor this repeated code into a single method I can call multiple times; and b) I should rethink this while loop pattern so I only have to write it once in the first place". Secondly, you shouldn't have to do a query at all. Assuming (and if this is a bad assumption, disregard this suggestion) that the index is a unique column in your table, if you just try to insert a row with a pregenerated index that collides with an existing row, it ought to raise a DatabaseError (or something similar; I'm guessing this is sqlalchemy and I'm not sure what the exceptions look like). So what I would try is the "forgiveness rather than permission" approach: attempt the insert, and if it raises, generate a new index and try again.
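A minimal sketch of that try-the-insert-first pattern, using the standard library's sqlite3 so it runs on its own. The `items` table, `new_index`, and `insert_with_retry` are hypothetical names, not the OP's code; with SQLAlchemy the exception to catch would most likely be `sqlalchemy.exc.IntegrityError`.

```python
import random
import sqlite3


def new_index() -> int:
    """Hypothetical stand-in for the OP's random index generator."""
    return random.randint(0, 999_999)


def insert_with_retry(conn, payload, make_index=new_index, max_tries=10):
    """Attempt the insert; on a unique-key collision, pick a new index and retry."""
    for _ in range(max_tries):
        idx = make_index()
        try:
            with conn:  # commits on success, rolls back if the insert raises
                conn.execute(
                    "INSERT INTO items (idx, payload) VALUES (?, ?)", (idx, payload)
                )
        except sqlite3.IntegrityError:
            continue  # collision: loop around and generate another index
        return idx
    raise RuntimeError(f"no free index found after {max_tries} tries")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (idx INTEGER PRIMARY KEY, payload TEXT)")
print(insert_with_retry(conn, "first row"))
```

The index generation lives in one function and the loop is written once, which covers both complaints above. The `max_tries` cap is an added assumption so a nearly full index space fails loudly instead of spinning forever.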
Data Graham fucked around with this message at 04:25 on Oct 1, 2023 |
|
# ? Oct 1, 2023 04:18 |
|
Why not use UUIDv4 instead of generating this weird custom hashing thing? It's basically guaranteed to not collide, and v4 is generated from random bits. The collision chance of UUIDv4 is essentially zero: one common estimate is that you'd have to generate a billion UUIDv4s per second for about 86 years to have a 50% chance of a single duplicate.
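For reference, generating one takes a single standard-library call. A quick sketch (the `row_id` name is only for illustration):

```python
import uuid

row_id = uuid.uuid4()      # 122 random bits; no seed or source data involved
print(row_id)              # a different value on every run
print(row_id.version)      # 4
print(len(str(row_id)))    # 36 characters, hyphens included
```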
|
# ? Oct 1, 2023 05:02 |
|
Data Graham posted:First of all you're repeating yourself; you're doing the exact same random number generation thing both outside the while loop and inside it. If you find yourself doing that, you should be thinking "a) I should refactor this repeated code into a single method I can call multiple times; and b) I should rethink this while loop pattern so I only have to write it once in the first place".
Huge thanks for the tip and example there. Honestly it just drives home the point that I need to finally get off my rear end and learn how to actually handle errors. Because yeah, my script would error when trying to insert the dupe index value, and the hackjob was just so I could get the errors off my back and push back having to learn how to deal with them properly. Guess that era's over!
Falcon2001 posted:Why not use UUIDv4 instead of generating this weird custom hashing thing? It's basically guaranteed to not collide, and v4 is generated from random bits. The collision chance of UUIDv4 is essentially zero: one common estimate is that you'd have to generate a billion UUIDv4s per second for about 86 years to have a 50% chance of a single duplicate.
e: oh, I remember one reason why I didn't think to try it Data Graham's way at first: the function in question makes several dataframes (each with its own function building it), generates an index for each, and then inserts them all into their various tables at once at the end, so there's a lot happening between the generate and the insert. I realize that's a design problem and I'll need to look at a deeper refactor later.
e2: actually, now, since the design issue is blocking the fix, lol, dammit I was gonna play video games
Son of Thunderbeast fucked around with this message at 06:26 on Oct 1, 2023 |
# ? Oct 1, 2023 05:37 |
|
Data Graham posted:First of all you're repeating yourself; you're doing the exact same random number generation thing both outside the while loop and inside it. If you find yourself doing that, you should be thinking "a) I should refactor this repeated code into a single method I can call multiple times; and b) I should rethink this while loop pattern so I only have to write it once in the first place".
This is a great time to mention `suppress`, I like to recommend that for bare except blocks
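For anyone who hasn't met it, `contextlib.suppress` is the context-manager spelling of try/except/pass for the exception types you name. A small sketch; `remove_if_present` and the file name are made up for illustration:

```python
import os
from contextlib import suppress


def remove_if_present(path):
    """Delete a file, treating 'already gone' as success."""
    with suppress(FileNotFoundError):
        os.remove(path)


# Equivalent long form:
#     try:
#         os.remove(path)
#     except FileNotFoundError:
#         pass
remove_if_present("definitely-not-a-real-file.tmp")  # no exception raised
```

Because you have to name the exception, anything unexpected (a PermissionError, say) still propagates.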
|
# ? Oct 1, 2023 07:14 |
|
Son of Thunderbeast posted:Yeah, I recognized the repeated code as an issue even at the time, and was like "I'll fix that up when I have time or need to" and well here I am now haha. Yeah, it's sqlalchemy. Took a minute to get used to it but once I did it was great.
So if all you're doing is generating a unique ID, and the actual length of that ID isn't particularly concerning from a performance standpoint, then a UUID is a great way to do that; UUIDv5 is deterministic, whereas UUIDv4 is not (it's more complex than that obviously, but that's a good starting point). The deterministic part is useful if you want to consistently produce the same UUIDv5 from source data.
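To make the deterministic-versus-random distinction concrete, here is a short sketch with the standard `uuid` module. `APP_NAMESPACE`, `stable_id`, and the `post-...` strings are hypothetical names chosen for the example:

```python
import uuid

# Hypothetical namespace for this app; any fixed UUID works as long as it never changes.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def stable_id(source: str) -> uuid.UUID:
    """Derive the same UUID every time from the same source string."""
    return uuid.uuid5(APP_NAMESPACE, source)


print(stable_id("post-12345") == stable_id("post-12345"))  # True
print(stable_id("post-12345") == stable_id("post-12346"))  # False
print(uuid.uuid4() == uuid.uuid4())                        # False (random each call)
```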
|
# ? Oct 1, 2023 07:14 |
|
QuarkJets posted:This is a great time to mention `suppress`, I like to recommend that for bare except blocks
Ahh that's neat, thank you
|
# ? Oct 1, 2023 07:30 |
Seconded, that's rad. But yeah OP, mastering exceptions is one of those things you should definitely make time for, because it's not some kind of advanced afterthought system for dealing inelegantly with corner cases the way some languages treat it; in Python it's a robust, first-class and recommended pattern for handling certain kinds of logic flow. Once I started thinking of exceptions as a top-drawer tool just like for loops and class methods, it opened up a whole world of straightforward and concise code. Clearly there's always more to learn about them (case in point), but the fundamentals of them are real solid and pretty simple.
|
|
# ? Oct 1, 2023 12:46 |
|
Tiniest thing that would almost never cause a problem but would be hell to figure out if you weren't using an IDE or linter that would have already pointed it out: id is a built-in you should avoid shadowing with an argument of the same name.
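A contrived sketch of how that bites; `tag_record` and `tag_record_fixed` are invented for the example, not taken from the OP's script:

```python
def tag_record(id, record):
    """Buggy on purpose: the argument `id` hides the builtin id() in here."""
    return {"id": id, "memory_id": id(record)}  # TypeError when id is an int


def tag_record_fixed(record_id, record):
    """Same idea with a non-shadowing argument name."""
    return {"id": record_id, "memory_id": id(record)}


record = {"name": "example"}
try:
    tag_record(7, record)
except TypeError as exc:
    shadowing_error = str(exc)
    print(shadowing_error)  # 'int' object is not callable

print(tag_record_fixed(7, record)["id"])  # 7
```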
|
# ? Oct 1, 2023 18:06 |
|
Falcon2001 posted:So if all you're doing is generating a unique ID, and the actual length of that ID isn't particularly concerning from a performance standpoint, then a UUID is a great way to do that; UUIDv5 is deterministic, whereas UUIDv4 is not (it's more complex than that obviously, but that's a good starting point). The deterministic part is useful if you want to consistently produce the same UUIDv5 from source data.
QuarkJets posted:This is a great time to mention `suppress`, I like to recommend that for bare except blocks
KICK BAMA KICK posted:Tiniest thing that would almost never cause a problem but would be hell to figure out if you weren't using an IDE or linter that would have already pointed it out: id is a built-in you should avoid shadowing with an argument of the same name.
Data Graham posted:Seconded, that's rad
e: I'm aware of pytest and have a tab open with the documentation, just haven't dipped my toes in yet
|
# ? Oct 1, 2023 21:07 |
|
Quick update, I did the refactor and the script is loving zooming now, even faster than before (and it wasn't that bad before I hosed it up). Thanks for all the help! Before it was like Python code:
Python code:
Son of Thunderbeast fucked around with this message at 22:45 on Oct 2, 2023 |
# ? Oct 2, 2023 19:26 |
|
Writing a small Flask website that displays a bar chart. The bar chart displays the data for a single unit and there are 12 units. I want to easily navigate between the bar chart for each of the 12 units. I know I can build a form with a drop down selector and a submit button, and when I click the submit button the entire page will reload with the new data. I want to avoid reloading the entire page (and just update the bar chart) by using the drop down selector and fetch to accomplish this functionality. What event do I need to listen out for with the drop down selector? I tried using 'click', but since you have to click once to bring up the list of options the fetch is being fired before a new unit has been selected (note that qrData is a variable which is declared earlier on the page, and I am trying to set it to the new data passed back by the "/qr_data" route). JavaScript code:
Jose Cuervo fucked around with this message at 00:55 on Oct 3, 2023 |
# ? Oct 2, 2023 19:31 |
|
Folks. Don't suppress() errors. And don't except: pass them either. Handle the drat things! Otherwise you're just writing dangerous code.
|
# ? Oct 3, 2023 08:14 |
|
Jose Cuervo posted:jablahblahscipr
Make life easier for yourself. JavaScript is the hell dimension, but there are ways to make life less complicated: https://svelte.dev/ (If it's just in-house stuff and you don't mind it looking like an IBM website, I solidly recommend the Svelte Carbon components from IBM.)
|
# ? Oct 3, 2023 08:17 |
|
duck monster posted:Folks. Don't suppress() errors. And don't except: pass them either.
There are plenty of times where you just want to say 'yeah there's no action to take here' and that pattern is fine. It's better than a bare except as well, since using suppress at least has you naming a specific exception.
|
# ? Oct 3, 2023 11:17 |
|
I'd only use it if I expected a specific kind of error to arise that I knew was okay to ignore. One of the files I need to regularly read at work comes from a 3rd party and is essentially a giant LZMA2-compressed text file. Whenever I get to the end of it, it raises an EOFError due to a glitch in whatever they're using to compress it. (This error also shows up in 7-Zip; or rather, 7-Zip says "hey this file looks a bit odd" but still reads it okay.) I had to write a context manager specifically for ignoring EOFErrors in that dumb file type because otherwise my code would spend like 30 minutes rolling through it completely fine and then crash when it got to the very last line.
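A rough sketch of that situation, with `contextlib.suppress` standing in for the custom context manager (which isn't shown in the post). The truncated stream below is an assumption used to imitate the glitchy third-party file, and `read_all_lines` is a made-up helper name:

```python
import contextlib
import io
import lzma


def read_all_lines(raw: bytes) -> list[str]:
    """Collect every line that decodes cleanly, ignoring a truncated-stream EOFError."""
    lines = []
    with contextlib.suppress(EOFError):
        with lzma.open(io.BytesIO(raw), "rt", encoding="utf-8") as fh:
            for line in fh:
                lines.append(line)
    return lines


# Imitate the damaged file by chopping the end off an otherwise valid stream.
original = [f"row {i}\n" for i in range(1000)]
damaged = lzma.compress("".join(original).encode("utf-8"))[:-4]

lines = read_all_lines(damaged)
print(f"recovered {len(lines)} of {len(original)} lines without crashing")
```

Only EOFError is ignored here; a genuinely corrupt stream would still raise `lzma.LZMAError`, which is arguably what you want.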
|
# ? Oct 3, 2023 11:42 |
|
Selenium raises whenever an element isn’t present in a web page, so try/except NoSuchElementException is basically if/then. It’s normal to verify something ISN’T there by passing or suppressing.
|
# ? Oct 3, 2023 14:01 |
|
duck monster posted:Folks. Don't suppress() errors. And don't except: pass them either.
Suppression just means "I expect this error, and I want to handle it by ignoring it." Sometimes that's okay. Consider a situation where you want to update an object from extracted values that are in a bunch of nested dictionaries, but if any of the keys are missing then you don't want to perform any updates. I'd argue that the best design here is to wrap the function call in suppress(KeyError).
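Roughly what that could look like; the `Unit` class, `update_unit`, and the payload layout are all invented for illustration:

```python
from contextlib import suppress
from dataclasses import dataclass


@dataclass
class Unit:
    name: str = "unknown"
    capacity: int = 0


def update_unit(unit: Unit, payload: dict) -> None:
    """Update `unit` from nested dicts; raises KeyError if any key is missing."""
    # Look everything up first so a missing key leaves `unit` untouched.
    name = payload["meta"]["unit"]["name"]
    capacity = payload["stats"]["capacity"]
    unit.name, unit.capacity = name, capacity


unit = Unit()
with suppress(KeyError):
    update_unit(unit, {"meta": {"unit": {"name": "A"}}})  # "stats" is missing
print(unit)  # Unit(name='unknown', capacity=0)

with suppress(KeyError):
    update_unit(unit, {"meta": {"unit": {"name": "A"}}, "stats": {"capacity": 12}})
print(unit)  # Unit(name='A', capacity=12)
```

The all-or-nothing behavior comes from doing every lookup before any assignment; suppress only decides what happens to the KeyError afterwards.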
|
# ? Oct 3, 2023 16:53 |
|
Import errors are another common reason to use suppression, for instance this is from scikit-learn:Python code:
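The scikit-learn snippet itself isn't reproduced here, so as a generic sketch of the optional-import pattern (the module name `hypothetical_fast_json` is made up and is expected not to exist):

```python
import json
from contextlib import suppress

backend = json  # always-available fallback
with suppress(ImportError):
    import hypothetical_fast_json as backend  # made-up module name; normally absent

print(backend.dumps({"ok": True}))  # {"ok": true}
```

If the optional module is installed it replaces the fallback; if not, the failed import never rebinds `backend`, so the standard library version stays in place.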
|
# ? Oct 3, 2023 16:56 |
|
*screams in programmer documentation* Why do libraries have such poo poo documentation? Trying to implement Tortoise ORM with Aerich migrations in FastAPI and holy gently caress is this painful. There's a giant config object that is required that appears to be completely undocumented, and *all* possible configuration errors produce the same error with no hint on what's wrong. How do you mark a table unmanaged so the migrations leave it alone? Who the gently caress knows. Come back Django. All is forgiven.
duck monster fucked around with this message at 06:51 on Oct 5, 2023 |
# ? Oct 5, 2023 06:48 |
|
Python 3.12 is out and looks interesting. If you believe some benchmarks it looks like it could be 20-30% faster than 3.10 which I run a lot of.
|
# ? Oct 5, 2023 12:36 |
|
I want to make sure that a string that is passed in has the year-month format 'yyyy-mm'. So for example, '2022-09' would work or even '2022-9', but not '2022 09' or '22-09' etc. Should I be trying to use a regular expression for this?
|
# ? Oct 6, 2023 03:13 |
|
You could, but you could also split the string on - and parse the results into integers.
e: woops, yeah, regex will validate that. Four digits, a -, then either 1 or 2 digits.
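That description translates fairly directly into a pattern. A minimal sketch (`looks_like_year_month` is a made-up name); note it only checks the shape, so a month like 17 still passes:

```python
import re

# Four digits, a hyphen, then one or two digits, and nothing else.
YEAR_MONTH = re.compile(r"[0-9]{4}-[0-9]{1,2}")


def looks_like_year_month(text: str) -> bool:
    return YEAR_MONTH.fullmatch(text) is not None


for candidate in ("2022-09", "2022-9", "2022 09", "22-09", "2022-17"):
    print(candidate, looks_like_year_month(candidate))
```

Using `fullmatch` instead of `match` avoids accepting strings with trailing junk such as "2022-091".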
|
# ? Oct 6, 2023 03:18 |
|
Jose Cuervo posted:I want to make sure that a string that is passed in has the year-month format 'yyyy-mm'. So for example, '2022-09' would work or even '2022-9', but not '2022 09' or '22-09' etc. Should I be trying to use a regular expression for this?
This seemed like a fun thing to pass to ChatGPT, so I asked for three ways it could be done because two seemed obvious: regex and datetime. It found a third, and that other way ended up being the fastest (according to the ChatGPT Advanced Data Analysis plugin; I didn't test it) code:
code:
|
# ? Oct 6, 2023 03:56 |
|
Looks like that third method (using str.split() and isdigit() ) will parse “2022-2022” as a valid date, and only the datetime.strptime() method will reject invalid months like “2022-17” so apples and oranges.
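The generated code isn't shown above, so the `split_check` below is only a guess at what a split()/isdigit() version might look like, next to a strptime-based check for comparison:

```python
from datetime import datetime


def split_check(text: str) -> bool:
    """Guess at a split()/isdigit() check; not the actual generated code."""
    parts = text.split("-")
    return len(parts) == 2 and len(parts[0]) == 4 and all(p.isdigit() for p in parts)


def strptime_check(text: str) -> bool:
    """Let datetime decide whether the text is a real year-month."""
    try:
        datetime.strptime(text, "%Y-%m")
    except ValueError:
        return False
    return True


for candidate in ("2022-09", "2022-9", "2022-2022", "2022-17"):
    print(candidate, split_check(candidate), strptime_check(candidate))
```

Under this guess, both "2022-2022" and "2022-17" get through the split version, while strptime rejects them.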
|
# ? Oct 6, 2023 07:40 |
|
Yeah I like how ChatGPT decided that there can't be more than 9999 years but more than 12 months is okay
|
# ? Oct 6, 2023 16:50 |
|
QuarkJets posted:Yeah I like how ChatGPT decided that there can't be more than 9999 years but more than 12 months is okay
DoctorTristan posted:Looks like that third method (using str.split() and isdigit() ) will parse “2022-2022” as a valid date, and only the datetime.strptime() method will reject invalid months like “2022-17” so apples and oranges.
This is, IMO, the biggest problem with ChatGPT and similar tools: they'll be wrong in ways that humans generally aren't, and generating code with them will cause all sorts of weird bugs, because the model hasn't even attempted to work through the problem like a human would; it's just going to go 'yeah this is what it should look like'. It's like taking advice from a guy who had read every Stack Overflow post but hadn't written any actual code, and who just wrote things down on paper and sent them to you. Not bad for basic boilerplate stuff where issues will be apparent, but every time I've asked it for regex it's been wrong, and it's hard to tell since regex is already an occult language.
|
# ? Oct 6, 2023 17:34 |
|
I thought I must be missing some slick way to use regular expressions for this task, but I think I will stick to checking that the length of the string is 7 or 6. If the length is 7, then I will check for the 5th character being '-', the last two characters need to be one of 12 options ('01' through '12'), and the first four characters need to be a number between 2006 and 2023 (there is no data pre-2006). If the length is 6, then the only thing that changes is that the last character needs to be one of 9 options ('1' through '9').
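One possible way to write those rules down, as a sketch; `is_valid_year_month` and the two lookup sets are names invented here:

```python
# Allowed month spellings: '01'..'12' for 7-character input, '1'..'9' for 6-character input.
VALID_MONTHS = {f"{m:02d}" for m in range(1, 13)} | {str(m) for m in range(1, 10)}
# There is no data before 2006, and 2023 is the latest year.
VALID_YEARS = {str(y) for y in range(2006, 2024)}


def is_valid_year_month(text: str) -> bool:
    """Apply the length, separator, year, and month rules described above."""
    if len(text) not in (6, 7) or text[4] != "-":
        return False
    return text[:4] in VALID_YEARS and text[5:] in VALID_MONTHS


for candidate in ("2022-09", "2022-9", "2022 09", "22-09", "2022-13", "2005-01"):
    print(candidate, is_valid_year_month(candidate))
```

Because the length check fixes how many characters follow the hyphen, a single month set covers both cases: a two-character tail can only match '01' through '12', and a one-character tail can only match '1' through '9'.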
|
# ? Oct 6, 2023 19:03 |
|
LLMs are useful when there's a lot of data ("how do I web scrape this page in Python") but decidedly less useful when there's not. It doesn't help at all that they don't actually understand anything, they just mush together stuff that is usually mushed together. Zugzwang fucked around with this message at 19:59 on Oct 6, 2023 |
# ? Oct 6, 2023 19:56 |
|
Jose Cuervo posted:thought I must be missing some slick way to use regular expressions for this task, but I I think I will stick to checking the length of the string is 7 or 6. If the length is 7, then I will check for the 5th character being '-', the last two characters need to be one of 12 options ('01' through '12'), and that the first 4 four characters are a number between 2006 and 2023 (there is no data pre-2006). If the length is 6, then the only thing that changes is that the last character needs to be one of 9 options ('1' through '9').
You could also try using https://www.programiz.com/python-programming/datetime/strptime to brute force it and do error checking to see if you've passed in a bad date. At least that way you're using a consistent, proven method for date validation instead of rolling your own.
|
# ? Oct 6, 2023 22:38 |
|
Yeah datetime.strptime does most of that poo poo for you, I don't remember if it needs to be given different character codes for 4-digit vs 2-digit years but that's the much simpler implementation. Use them standard tools
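On the format-code question: `%Y` is the four-digit year and `%y` is the two-digit one, and strptime won't accept one in place of the other. A small sketch (`parses` is just a helper name for the demo):

```python
from datetime import datetime


def parses(text: str, fmt: str) -> bool:
    """Return True if strptime accepts `text` under `fmt`."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True


print(parses("2022-09", "%Y-%m"))                # True
print(parses("22-09", "%Y-%m"))                  # False: %Y wants four digits
print(parses("22-09", "%y-%m"))                  # True
print(datetime.strptime("22-09", "%y-%m").year)  # 2022
```

So sticking with `%Y-%m` already rejects the '22-09' case from the original question.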
|
# ? Oct 6, 2023 22:44 |
|
Zugzwang posted:It doesn't help at all that they don't actually understand anything, they just mush together stuff that is usually mushed together.
succinct description of most reddit comments
|
# ? Oct 6, 2023 22:58 |
I was gonna suggest strptime but wasn’t the question about the fastest method, strptime being one possible option? (And the slowest)
|
|
# ? Oct 6, 2023 23:05 |
|
Data Graham posted:I was gonna suggest strptime but wasn’t the question about the fastest method, strptime being one possible option? (And the slowest)
The original question didn't require fastest, that was just a followup note.
Jose Cuervo posted:I want to make sure that a string that is passed in has the year-month format 'yyyy-mm'. So for example, '2022-09' would work or even '2022-9', but not '2022 09' or '22-09' etc. Should I be trying to use a regular expression for this?
Regex would probably be fastest, but like... this isn't exactly a hyper expensive thing to check unless you're doing this millions of times a second or something. I dunno the workflow but it would be an insane setup to revalidate something like that so frequently.
Edit: wow I'm an idiot, it's literally in the timeit results from the other goon. Teach me to post while working. Still, a 6x increase in time is probably not meaningful for something so small.
Falcon2001 fucked around with this message at 23:40 on Oct 6, 2023 |
# ? Oct 6, 2023 23:27 |
|
ComradePyro posted:succinct description of most reddit comments
|
# ? Oct 7, 2023 00:00 |
|
Yeah I am not going to worry about the efficiency of strptime until I've seen proof that it's bogging down my software, often that half-millisecond is not going to matter and if it does then I'm probably not going to use option 1 or 2 either
QuarkJets fucked around with this message at 04:47 on Oct 7, 2023 |
# ? Oct 7, 2023 04:45 |
|
Do you guys have any tips for using Python to do insane amounts of crime https://x.com/molly0xFFF/status/1710718416724595187?s=20
|
# ? Oct 7, 2023 22:35 |
|
Yeah, don't get caught like that fuckin' assclown
|
# ? Oct 7, 2023 23:08 |
|
I am a mediocre programmer and only know how to commit sane amounts of crime
|
# ? Oct 7, 2023 23:54 |
|
KICK BAMA KICK posted:Do you guys have any tips for using Python to do insane amounts of crime
Obfuscate better than def do_crimes()
|
# ? Oct 8, 2023 00:11 |
|
from crimes import financial_fraud
|
# ? Oct 8, 2023 04:10 |