|
nullfunction posted:sorted() gives you the sorted keys, not key-value pairs. I mean, fundamentally, iterating over a dictionary directly yields its keys, so passing a dict to sorted() is going to return a sorted list of keys. If you give it d.items() you get them back as tuples, but you can turn that back into a dict if you want to. Python code:
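The original snippet isn't quoted above, so here's a rough sketch of the round trip being described (variable names are mine):

```python
d = {"b": 2, "a": 1, "c": 3}

# Iterating a dict (or passing it to sorted()) yields keys only
keys = sorted(d)                 # ['a', 'b', 'c']

# d.items() gives (key, value) tuples, which sort by key first
pairs = sorted(d.items())        # [('a', 1), ('b', 2), ('c', 3)]

# ...and dict() turns a list of 2-tuples back into a dict
redicted = dict(pairs)           # {'a': 1, 'b': 2, 'c': 3}
print(redicted)
```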
|
# ? Sep 28, 2021 23:10 |
|
|
|
Neat, that's definitely nicer to look at than a dict comprehension. I guess I've never really thought about lists of tuples being transformed back to a dict like that, but it makes a lot of sense.
|
# ? Sep 28, 2021 23:38 |
|
I have heard that Python isn't the best language for concurrency, parallelism and all that jazz, but I am trying to learn it, so I decided to do some experimentation. I created a backup script to do my regularly scheduled backup. It's an rsync-style implementation, meaning that if something is already there and hasn't been modified, it's not copied again. These are the key lines of code: code:
The multiprocess version has this as the key code: code:
code:
code:
code:
So... why is that? Part of the reason for running this experiment was to get some empirical results that would show me whether working with memory is more of an IO problem or a CPU problem. My ideas:
- Maybe memory is a different thing, parallelism/concurrency is not really possible because of the von Neumann bottleneck, and all I am doing is adding needless complexity?
- Maybe I have just programmed it wrong and I am using threads and processes like an rear end?
- Maybe I don't know how to read the results right?
- Maybe I should hang my head in shame and go do cross-stitch instead?
- Other?
Thanks for any suggestions and comments.
|
# ? Oct 1, 2021 11:35 |
There's a lot of guessing in my reply because you haven't shared all of the code. However, I assume "dobackup" is the entire process of walking the filesystem, running the comparison and optionally copying the file. In that case, all you're doing in the multiprocessing and multithreaded runs is asking for that whole process to run 3 times, so I am unsurprised you saw an increase in time on the first run: it's likely you ran into a race condition with the if-path-exists check and the files were copied at least twice, if not thrice. That would also make sense of the subsequent runs, since they're all roughly the same time. To make use of concurrency here you'd want to walk the source first and have that data set available; at that point you could iterate over the file list and distribute it to parallel workers/async tasks, e.g. (with trio): code:
code:
code:
In your code you're asking Python to do the entire operation three times. What you want to do is break the workload up into three pieces. If you want to look at the async approach for concurrency, I recommend trio; it has documented help on file operations on its readthedocs page.
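The trio snippets weren't preserved above; as a stand-in, here's the same walk-first-then-distribute shape sketched with the stdlib's concurrent.futures instead. The helper names are invented, and the mtime/size comparison is only one plausible way to do the rsync-style check:

```python
import shutil
import concurrent.futures as cf
from pathlib import Path

def copy_if_changed(src: Path, dst: Path) -> bool:
    """Copy src to dst unless dst already exists with the same size and
    an mtime at least as new (1 s of slack for coarse filesystem stamps)."""
    if dst.exists():
        s, d = src.stat(), dst.stat()
        if s.st_size == d.st_size and s.st_mtime <= d.st_mtime + 1:
            return False
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return True

def backup(src_root: Path, dst_root: Path, workers: int = 4) -> int:
    # Walk the source first so the whole file list exists up front,
    # then fan it out to a pool of workers instead of running the
    # entire walk+copy job N times
    files = [p for p in src_root.rglob("*") if p.is_file()]
    with cf.ThreadPoolExecutor(max_workers=workers) as pool:
        copied = pool.map(
            lambda p: copy_if_changed(p, dst_root / p.relative_to(src_root)),
            files,
        )
        return sum(copied)
```

Threads are a reasonable default here because the work is I/O-bound; a ProcessPoolExecutor would only pay off if the per-file work became CPU-heavy.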
|
|
# ? Oct 1, 2021 12:45 |
|
To answer your question, every thread/process is given a directory and is in charge of walking/copying that entire directory. Linux_dirs is a tuple of directory names. There's no reason for it to be a class (actually a method of a class, iirc) other than forcing myself to do things the object-oriented way to practice it. I *think* processes are given only one directory and no directory is processed twice, but I am very open to the idea that I did it wrong and/or that there are more optimal ways. I'm also gonna try your suggestions! Thanks!
|
# ? Oct 1, 2021 13:24 |
|
Instead of multiprocessing, use concurrent.futures.ProcessPoolExecutor. It won't solve the problem you're seeing, but it's the new thing.
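A minimal sketch of the newer interface (the function and inputs are invented for illustration):

```python
import concurrent.futures

def cube(x):
    return x ** 3

if __name__ == "__main__":
    # Drop-in replacement for the multiprocessing.Pool map-over-workers pattern
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as ex:
        print(list(ex.map(cube, [1, 2, 3])))  # → [1, 8, 27]
```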
|
# ? Oct 1, 2021 16:57 |
|
I also agree with NinpoEspiritoSanto; without seeing what the underlying code is actually doing, it's difficult to say what the problem could be. I think the best approach would be to walk with the main process to send changed files to a Queue, and then transfer them with one or two other processes. It would be good to see the current underlying implementation though, because I think your design should see a small performance boost with the SSD in the multiprocessing case. Please show more code
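A rough sketch of that walker-plus-transfer-workers shape (shown with threads and queue.Queue for brevity; multiprocessing.Process and multiprocessing.Queue have the same pattern, and the file list here is made up):

```python
import queue
import threading

STOP = None  # sentinel telling a worker there is no more work

def walk_and_enqueue(work_q, changed_files):
    # Producer: the main process walks the tree and pushes changed files
    for path in changed_files:
        work_q.put(path)
    work_q.put(STOP)

def transfer_worker(work_q, transferred):
    # Consumer: pull paths off the queue and transfer them
    while True:
        path = work_q.get()
        if path is STOP:
            break
        transferred.append(path)  # a real worker would shutil.copy2 here

work_q = queue.Queue()
transferred = []
worker = threading.Thread(target=transfer_worker, args=(work_q, transferred))
worker.start()
walk_and_enqueue(work_q, ["a.txt", "b.txt", "c.txt"])
worker.join()
print(transferred)  # → ['a.txt', 'b.txt', 'c.txt']
```

With two transfer workers you'd put one STOP sentinel per worker so each one gets told to shut down.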
|
# ? Oct 1, 2021 20:41 |
|
Ok, more code! This is the main function that does the walking. It is fed one of the folders that I want to back up and simply walks it, checks for existence, then copies or skips. There's one instance of this function in the sequential version, working one folder at a time. In the multiprocessed and multithreaded versions, this function is mapped in the process/thread pool over a tuple of directory names. code:
In the multithread/process version I create another function called "start" that launches the pools as I copied and pasted above. What I posted above is the entirety of the start function. And if you are a glutton for punishment you can also check the entirety of my little script, including finding out which removable disks are mounted, mixing two distinct lists of directories and other silly stuff. Any comments and criticism welcome. Es mi dia primero, to quote Homer Simpson. Sequential version Multiprocess version Multithreaded version
|
# ? Oct 2, 2021 21:24 |
|
That code is pretty hard to read. I decided to reimplement this problem in very simple terms: 10 GB of data moved around on a single NVMe drive.
t = 33.7 s for single process
t = 33.5 s for multi process walk + transfer
t = 30.2 s for multi process transfer only
Why isn't the performance any better? Probably hardware reasons, I'd guess. I believe that the *nix rsync utility is a single threaded application for this reason. Python code:
|
# ? Oct 4, 2021 06:52 |
|
I managed to land my first dev title (half dev, half my current specialty) in a mostly Python group at a big company. I know the first rule of software engineering is 'do what your team does as long as it's not insane' but are there any other tips in terms of like real-world stuff I should consider?
|
# ? Oct 4, 2021 18:26 |
Falcon2001 posted:I managed to land my first dev title (half dev, half my current specialty) in a mostly Python group at a big company. I know the first rule of software engineering is 'do what your team does as long as it's not insane' but are there any other tips in terms of like real-world stuff I should consider? Get real familiar with the testing infrastructure first thing.
|
|
# ? Oct 5, 2021 03:03 |
|
a foolish pianist posted:Get real familiar with the testing infrastructure first thing. “Internal group where the devs are half Python dev half something else” and “test infrastructure” have very little overlap ime May still be good advice tho
|
# ? Oct 5, 2021 04:12 |
|
QuarkJets posted:That code is pretty hard to read
Solid state disk is much slower than main RAM
Main RAM is much slower than L3 cache
L3 cache is slower than L2 cache
L2 is slower than L1
L1 is slower than CPU registers
Copying a file is essentially no CPU work if the underlying copy is implemented sanely. The CPU is telling the disk controller "Copy this sector of data to main memory, interrupt me when you're done", taking a nap, telling the disk controller to copy in the other direction and napping again, then repeating that till everything's copied. Having more or faster CPUs won't help because they're napping 99% of the time anyway. It's like you've got one guy with a shovel and ten people standing around telling him where to dig next. It's not faster than one guy with the shovel and one boss.
e: also, unrelatedly, you have to do import multiprocessing; multiprocessing.set_start_method('spawn') on Unix to get not-broken behavior. The default ('fork') violates fork()'s specification and may randomly deadlock depending on what other code in your process is doing.
Foxfire_ fucked around with this message at 04:36 on Oct 5, 2021 |
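For the start-method note, multiprocessing.get_context is an alternative that scopes the choice to one pool instead of setting it globally (the toy function here is mine):

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # An explicitly spawn-based context, rather than relying on the
    # Unix default of fork (set_start_method('spawn') is the global form)
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(square, range(4)))  # → [0, 1, 4, 9]
```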
# ? Oct 5, 2021 04:26 |
|
Can anyone tell me why this works: Python code:
Python code:
code:
p.s. I know it's a bad idea to use hostKeys = None, this is just for some quick testing
|
# ? Oct 5, 2021 11:58 |
Probably need to expand cinfo as kwargs, no? Python code:
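The fix being suggested, in general form (the connect function here is a hypothetical stand-in, not the real API being called):

```python
def connect(host, port=22, username=None):
    # Hypothetical stand-in for the real connection call
    return f"{username}@{host}:{port}"

cinfo = {"host": "example.com", "port": 2222, "username": "me"}

# connect(cinfo) would pass the whole dict as the first positional
# argument; connect(**cinfo) unpacks it into keyword arguments
print(connect(**cinfo))  # → me@example.com:2222
```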
|
|
# ? Oct 5, 2021 12:46 |
|
Oh nice, that worked. Thank you!
|
# ? Oct 5, 2021 14:28 |
|
QuarkJets posted:That code is pretty hard to read Thanks everybody for your explanations. I get now why it doesn't go any faster. And, QJ, I'm going to read your example carefully and also read a book on writing better code. Sorry it was hard to read. If you have pointers on what not to do, great, but as I said I'll read on the topic a bit.
|
# ? Oct 5, 2021 14:47 |
|
If you run flake8 it will give you all kinds of style suggestions and other recommended modifications; it's good. PyCharm will do the same, but will also make it easy to make the changes.
|
# ? Oct 5, 2021 16:12 |
|
QuarkJets posted:If you run flake8 it will give you all kinds of style suggestions and other modifications, it's good. Pycharm will do the same but will also make it easy to make changes. This!
|
# ? Oct 8, 2021 02:25 |
|
So just for my own edification, I wrote a quick and dirty script to encode things in a cryptogram-puzzle-style format (a random mapping based on ord() and chr()). I'm gonna post it here and ask for critiques on how to handle it better. It's all contained in a single Python file, cipher.py. My kid was interested in it, so I did this. Python code:
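The script itself wasn't preserved in the quote, but a minimal version of the idea (my own sketch, not the posted code) looks like:

```python
import random
import string

def make_cipher(seed=None):
    """Build a random letter-to-letter substitution mapping."""
    rng = random.Random(seed)
    letters = list(string.ascii_uppercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return dict(zip(letters, shuffled))

def encode(text, mapping):
    # Non-letters (spaces, punctuation) pass through unchanged
    return "".join(mapping.get(ch, ch) for ch in text.upper())

cipher = make_cipher(seed=42)
decoder = {v: k for k, v in cipher.items()}  # inverse mapping decodes
secret = encode("HELLO WORLD", cipher)
print(secret, "->", encode(secret, decoder))
```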
|
# ? Oct 8, 2021 16:37 |
|
code:
|
# ? Oct 8, 2021 16:49 |
|
DoctorTristan posted:
code:
|
# ? Oct 8, 2021 17:34 |
|
I recommend reading Clean Code by Robert Martin, which has a lot of good and timeless advice applicable to all programming languages. For example, you should assign the magic number 65 to a well-named variable. And the comment that says "Imports" is useless; may as well remove it. You should try to use more list comprehensions and more direct iteration, by which I mean try not to use an index as often. Like this: Python code:
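The posted example is gone, so here's a hedged illustration of both points (the shift function and names are invented):

```python
ORD_UPPER_A = 65  # the magic number, given a name

def shift_letters(message, offset):
    # Direct iteration over the characters in a comprehension,
    # instead of indexing with range(len(message))
    return "".join(
        chr((ord(ch) - ORD_UPPER_A + offset) % 26 + ORD_UPPER_A)
        for ch in message
    )

print(shift_letters("ABC", 1))  # → BCD
```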
|
# ? Oct 8, 2021 18:00 |
|
That dumbass feeling you get when you realize why the logging module you spent so much time unfucking is hosed again and not spitting log_debug output to the stream handler like it should? That dumbass feeling when you realize you had a DEBUG_MODE constant in the module that you weren't setting based on the --debug command-line argument? Yeah, I know that feeling.
|
# ? Oct 14, 2021 18:37 |
|
Edit: lol nvm figured it out. underscore vs dash version name fuckery. x_x too simple to leave up for posterity DearSirXNORMadam fucked around with this message at 20:04 on Oct 14, 2021 |
# ? Oct 14, 2021 19:58 |
|
Is there a quick fix in PyCharm for type inspection warnings that are actually ok? For example I'm using yarl for URLs and requests.get(yarl.URL("http://asdf.net")) gives a warning about the type not being Union[str, bytes], but it works fine. I see this from time to time ("with open(pathlib.Path)" gave a similar error for a while). Would be nice if there were an easy way to tell PyCharm "this is ok". Or perhaps it is not; maybe even though it works, we want to explicitly cast to str if that is what requests is telling us it wants.
|
# ? Oct 15, 2021 21:50 |
mr_package posted:Is there a quick fix in PyCharm for type inspection warnings that are actually ok? For example I'm using yarl for URLs and requests.get(yarl.URL("http://asdf.net") gives warning about type not being Union[str, bytes], but it works fine. I see this from time to time ("with open(pathlib.Path)" gave similar error for a while). Would be nice if there was an easy way to tell PyCharm "this is ok". Requests doesn’t ask for that, URL just has to have a string representation - it will decode if bytestring, otherwise cast to string whatever you pass. The entire module is untyped, so I suspect this is a PyCharm inference issue. You can suppress inspections for a line by adding a comment right above it: Python code:
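For example (the function here is a dummy; `# noinspection` is PyCharm's own suppression-comment mechanism, named after the inspection being silenced):

```python
def takes_str(s: str) -> str:
    return str(s).upper()

# noinspection PyTypeChecker
value = takes_str(123)  # PyCharm would normally flag the int argument
print(value)  # → 123
```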
|
|
# ? Oct 15, 2021 22:05 |
|
Has anybody here ever directly used a slice object for something? Slicing sure, but I mean actually creating a slice() and going nuts with it.
|
# ? Nov 5, 2021 07:55 |
|
Rocko Bonaparte posted:Has anybody here ever directly used a slice object for something? Slicing sure, but I mean actually creating a slice() and going nuts with it. Yes.
|
# ? Nov 5, 2021 08:19 |
|
mr_package posted:Is there a quick fix in PyCharm for type inspection warnings that are actually ok? For example I'm using yarl for URLs and requests.get(yarl.URL("http://asdf.net") gives warning about type not being Union[str, bytes], but it works fine. I see this from time to time ("with open(pathlib.Path)" gave similar error for a while). Would be nice if there was an easy way to tell PyCharm "this is ok". In IntelliJ you can do this with alt+enter, it's likely the same in pycharm.
|
# ? Nov 5, 2021 09:40 |
|
QuarkJets posted:Yes. For what? It seems so bizarre.
|
# ? Nov 5, 2021 23:09 |
|
Rocko Bonaparte posted:For what? It seems so bizarre. I do a lot of CV and it's pretty nice to be able to programmatically slice a few arrays in the same complicated way; you can accomplish the same thing with an index array, but that's extremely impractical if you have a 10 GB image or something. It's like wrapping blocks of code with a function, but just for slice notation. All slicing notation does is create a slice object, so if you've ever sliced a bunch of arrays with identical notation then you could have used a slice object to improve those lines of code. QuarkJets fucked around with this message at 02:07 on Nov 6, 2021 |
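A tiny illustration of reusing one slice object across several sequences (the names are mine):

```python
# A reusable slice object: everything but the first and last element
trim_ends = slice(1, -1)

nums = [0, 1, 2, 3, 4]
word = "python"
triple = (10, 20, 30)

# The same slice applied identically everywhere, defined in one place
print(nums[trim_ends])    # → [1, 2, 3]
print(word[trim_ends])    # → ytho
print(triple[trim_ends])  # → (20,)
```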
# ? Nov 6, 2021 01:56 |
|
If I want a dead simple, easy-to-use, cross-platform 2D graphics API (not full screen), what is my go-to these days? The only other thing I've used so far is tkinter and it always felt really clunky. I guess there's Qt but I haven't revisited that in ages. Basically I want a windowed "app" with some text down the left side with readouts of various metrics, maybe 1 or 2 graphs, and on the right is a big square with I guess 20
|
# ? Nov 6, 2021 12:14 |
As they say, it’s not a crime to want things.
|
|
# ? Nov 6, 2021 12:33 |
|
Kivy? HTML Canvas and Javascript?
OnceIWasAnOstrich fucked around with this message at 14:36 on Nov 6, 2021 |
# ? Nov 6, 2021 14:34 |
|
How is Kivy vs, like, pygame or whatever? I refuse to touch HTML or JavaScript
|
# ? Nov 6, 2021 14:39 |
|
Hadlock posted:How is kivy vs like, pygame or whatever There is a lot more to it and it is much better maintained. It's a lot more batteries-included for UI stuff especially. Pygame is one (of several) of the backends Kivy will use for rendering. It is also genuinely not very hard to package it for various platforms and handle touch control.
|
# ? Nov 6, 2021 14:48 |
OnceIWasAnOstrich posted:Kivy? HTML Canvas and Javascript? This Kivy thing looks very cool, I’ve never heard of it before. Thank you for bringing it up!
|
|
# ? Nov 6, 2021 15:16 |
|
I've never done any serious work with it but I did throw together a quick gimmick android app with a few buttons and a constantly-updating visual with it once. It came together extremely smoothly and quickly considering I just don't make GUIs as a rule. Without a bunch of aesthetics work it does have A Look that makes it very obvious it isn't native anything.
|
# ? Nov 6, 2021 15:18 |
|
|
|
I'm going through HackerRank problems, and I just finished the Nested Lists problem successfully after two failed submissions involving missing edge cases. I'm thinking about how to write code which tests a function, but without having to create a separate file or import anything that requires installation, because setting that up sounds like it'd eat up some extra time for HackerRank problems or screening tests with a time limit. If I'm really stumped at finding edge cases and coming up with custom input which helps me figure out what to do, I'd try writing a function which generates random arguments within constraints. In this case, it did help me find the edge case. Python code:
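Not the posted code (that's in the pastebin link below), but the random-arguments idea can be sketched like this; the function under test and its constraints are invented for illustration:

```python
import random

def second_lowest(nums):
    """Function under test: the second-smallest distinct value."""
    return sorted(set(nums))[1]

def fuzz(trials=200, seed=0):
    # Generate random arguments within constraints and check an invariant
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(0, 5) for _ in range(rng.randint(2, 8))]
        if len(set(nums)) < 2:
            continue  # the edge case: no second-lowest exists at all
        assert second_lowest(nums) > min(nums), nums
    return True

print(fuzz())  # → True
```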
https://pastebin.com/0pRj8r5Q galenanorth fucked around with this message at 16:19 on Nov 7, 2021 |
# ? Nov 7, 2021 02:40 |