I'm not sure if this is a short question or not, but here goes. I'm a physicist, and coding is generally something we do rather than something we're taught, so I'm hoping that someone with more knowledge of the available tools can tell me whether there's a better way to solve my problem.

I have two sets of run numbers, and each run number comes with a list of event numbers. I want to check the overlap between the two runstreams so that I don't look at the same event twice, but I want to look at every event at least once. For example:

RunStream1
  Run 152000 has event numbers (1, 2, 3, 41, 45)
  Run 152234 has event numbers (1, 2, 14, 15, 20)
  .....
RunStream2
  Run 152000 has event numbers (30, 31, 32, 34, 45)
  Run 156000 has event numbers (blah blah blah)
  .....

A single RunStream is guaranteed not to have run/event duplicates, but there is no such guarantee between runstreams; in this short example, run 152000 event 45 is in both sets. I have upwards of billions of events distributed across hundreds of runs, so checking run/event overlap between the two runstreams becomes a pretty large task. I'm working on a server, not a supercomputer, so memory limitations are a concern.

Right now, for overlap checking, I use a dictionary where the run numbers are the keys and the value of each key is a set() filled with event numbers. I only fill the dictionary/sets when looking at RunStream1, and I only check for overlap when looking at RunStream2. Dictionaries and sets are both hash tables, so checking for membership is fast and easy, as is adding new runs and events. Is there a more effective way that might use less memory while maintaining at least the speed of a hash table?
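A minimal sketch of the dictionary-of-sets scheme described above, assuming each runstream can be iterated as (run number, list of event numbers) pairs. The `build_index` and `unseen_events` names, and the event numbers used for run 156000, are made up for illustration. Only RunStream1 is held in memory; RunStream2 is streamed past it and only its new events come out.

```python
def build_index(stream):
    """Map run number -> set of event numbers for one runstream (RunStream1)."""
    index = {}
    for run, events in stream:
        index.setdefault(run, set()).update(events)
    return index


def unseen_events(stream, index):
    """Yield (run, event) pairs from stream that are not already in index."""
    empty = frozenset()
    for run, events in stream:
        seen = index.get(run, empty)  # runs absent from RunStream1 have nothing to skip
        for event in events:
            if event not in seen:     # O(1) average-case hash lookup
                yield run, event


# Example data from the post; run 156000's events are hypothetical placeholders.
stream1 = [(152000, [1, 2, 3, 41, 45]), (152234, [1, 2, 14, 15, 20])]
stream2 = [(152000, [30, 31, 32, 34, 45]), (156000, [7, 8])]

index = build_index(stream1)
print(list(unseen_events(stream2, index)))
# [(152000, 30), (152000, 31), (152000, 32), (152000, 34), (156000, 7), (156000, 8)]
```

Memory is dominated by the RunStream1 sets: in CPython each stored event costs an int object plus a hash-table slot, i.e. several tens of bytes, which is the figure to compare any alternative against.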
# Sep 30, 2010 03:54
|
In Python 2.6 I want to take a dictionary and convert all of the keys, which are strings with various cases, to lowercase strings. Right now I am doing it like this:
I just want to change the keys to lowercase, not create a second copy of the dictionary in memory. Does this code do that, and how can I check that this is the case?
|
# Feb 11, 2013 19:42
|
yaoi prophet posted:
Honestly, that's an odd enough request that I'm curious as to why exactly you want to do this. I'm not saying it's wrong, I'm just curious.

I am taking dictionaries as input and trying to read values with specific keys, but there are sometimes inconsistencies in capitalization. Sometimes the dictionaries can be large, and although I do have plenty of memory available, I just want to write effective code that doesn't waste time and memory making huge dictionary copies. So I think what I will do is check whether a key is already lowercase, and if it isn't, create a new lowercase key with the old value. That would be better than a full dictionary copy, I think.
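A minimal sketch of that approach (only touching keys that are not already lowercase), assuming plain string keys; `lowercase_keys_in_place` is a hypothetical name. It renames keys on the existing dictionary instead of building a new one, and comparing `id()` before and after is one way to confirm that the same object was modified. If two keys differ only by case (say "Energy" and "energy"), one value silently overwrites the other.

```python
def lowercase_keys_in_place(d):
    """Rename non-lowercase keys of d in place and return the same dict."""
    # Iterate over a snapshot of the keys: adding or removing keys while
    # iterating over the dict itself is an error.
    for key in list(d):
        lower = key.lower()
        if lower != key:             # most keys are already lowercase; skip them
            d[lower] = d.pop(key)    # moves the existing value object; nothing is copied
    return d


data = {"Energy": 1.0, "mass": 2.0, "PT": 3.0}
before = id(data)
lowercase_keys_in_place(data)
print(id(data) == before)   # True: the same dictionary object was modified
print(sorted(data))         # ['energy', 'mass', 'pt']
```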
|
# Feb 12, 2013 01:33
|
Emacs Headroom posted:
It would probably only be better if you have a relatively low proportion of non-lowercase keys and making new keys is a rare operation. You can't "change" a key, since that would also change its hash value; you can only make new keys and delete old ones. So it might end up being six of one, half a dozen of the other when choosing between the solutions. You can always profile with your data to see if there's a winner, just to make sure.

Yeah, having to create a lowercase key is relatively uncommon (most come lowercased already). I suppose I could run some tests and find out for sure whether one way or the other is actually faster on average for my data.
# Feb 12, 2013 01:59
|
Thinking about it more, I should specify that the dictionary values are actually numpy arrays, each with 1k to 1B entries. There are maybe only 100 keys in each dictionary; it's really the arrays that are large. When I create a new key and give it the same value as the old key, a reference gets passed and I am not actually creating any new arrays in memory, correct? So I don't really even need to worry about deleting the old keys, since two keys pointing to the same array use minimal memory.
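A quick check of the reference question, assuming numpy is available (the array size and key names are made up): a second key bound to the same array is just another reference, which `is` confirms, and deleting the old key does not free the array while another key still refers to it.

```python
import numpy as np

d = {"Energy": np.arange(1000000, dtype=np.float64)}  # ~8 MB array

d["energy"] = d["Energy"]          # binds a second key to the same array object
print(d["energy"] is d["Energy"])  # True: one array, two references

d["energy"][0] = 42.0              # a change made through one key...
print(d["Energy"][0])              # 42.0  ...is visible through the other

del d["Energy"]                    # drops a reference only; the array survives via "energy"
print(d["energy"].nbytes)          # 8000000
```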
|
# Feb 12, 2013 02:17
|
The Insect Court posted:
Unless you're going to be vastly increasing the number of keys, it shouldn't be an issue. That said, you can pretty easily do:

Excellent, thanks guys!
|
# Feb 12, 2013 06:53
|
What's the general feeling regarding Python 2 vs. Python 3? None of the computers at my workplace have Python 3 installed, and I know that it's not backwards compatible, so isn't every new Python 2 script just going to create problems in the future when Python 2 eventually gets abandoned? Python 3.x doesn't come with our Red Hat installs, and getting IT to install it for us would be a pain. Is it worth it? I've been slowly ramping up my effort to get people in my workplace to switch from MATLAB to numpy/scipy, but if this is going to create headaches for a Python 2.x to 3.x switchover in the future, then I'd like to make the switch happen sooner rather than later.

aeverous posted:
What do you guys use for your Python work? I'm currently using Notepad++ with the pyNPP plugin, but earlier this year I used VS2010 for a C# project and gently caress if I didn't get really spoiled by the code completion. I've looked around at Python IDEs and they all look a bit crap except PyCharm, which is pretty expensive. Are there any free/OSS Python IDEs with really solid code completion and a built-in dark theme?

Vim. Although recently I tried Spyder 2 on my Windows box (it comes with the Pythonxy package) and it worked really, really well, so I may start using that more.
|
# Mar 7, 2013 05:39
|
Thanks for the feedback, guys.

Modern Pragmatist posted:
In my experience, 2to3 works pretty well for converting python2 code to be compatible with python3. The biggest hurdle is the bytes/str/unicode change, but as long as you don't do much work with strings it should be pretty painless to migrate.

That's good news; it sounds like I don't need to worry about this too much and can just make a casual python3 software request without banging on doors.
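For Python 2 code written in the meantime, a few `__future__` imports (all available from Python 2.6 on) make it behave like Python 3 in the spots that most often trip up 2to3: integer division, `print`, and string literals. A hedged sketch; the function names are illustrative only.

```python
from __future__ import division, print_function, unicode_literals


def mean(values):
    # With `division` imported, / is true division on Python 2 as well as 3.
    return sum(values) / len(values)


def decode_label(raw):
    # Keep bytes (b"...") and text separate: decode once, at the I/O boundary.
    return raw.decode("utf-8")


print(mean([1, 2]))                 # 1.5 on both Python 2 and 3
print(decode_label(b"run 152000"))  # run 152000
```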
|
# Mar 7, 2013 21:28
|
JetsGuy posted:
Same. I avoided lambda for a long time because it just seemed lazy and dumb to me. Then I programmed in python for a few years and understood.

Can you help me understand? Lambda seems lazy and dumb to me, and while I do use Python a lot, I wouldn't say that I'm more than averagely skilled, so I'm genuinely interested in learning more about the parts of Python that I don't get to see day-to-day. Wouldn't it be better to write a clean, documented def in case someone else has to use your code in the future? I've been following along, but I still can't think of a circumstance where a lambda is easier to understand than a def.
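One common case where a lambda reads naturally is a throwaway key function passed straight to `sorted()`. A hedged example with made-up data, shown next to the equivalent `def` and the standard library's `operator.itemgetter`:

```python
import operator

# Hypothetical (run number, event count) pairs.
runs = [(152234, 5), (152000, 12), (156000, 3)]

# Inline lambda: a one-off key function, read right where it is used.
by_count = sorted(runs, key=lambda pair: pair[1])


# Equivalent named function: worth it when the logic is reused or needs a docstring.
def event_count(pair):
    """Return the event count from a (run, count) pair."""
    return pair[1]


print(by_count == sorted(runs, key=event_count))             # True
print(by_count == sorted(runs, key=operator.itemgetter(1)))  # True
print(by_count)  # [(156000, 3), (152234, 5), (152000, 12)]
```

Which spelling is clearer is a judgment call; the `def` wins as soon as the logic grows past a single expression or is needed in more than one place.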
# Mar 13, 2013 08:06
|
I don't really understand how to spot whether I'm writing spaghetti or ravioli code. I want to be a good Python programmer, so I have read the most recent PEP and I try to write readable but efficient code. Would anyone mind posting examples of Python spaghetti/ravioli code to show how not to code? With no formal training but years of experience, I am worried about bad habits that I may not even realize I have.
|
# Mar 15, 2013 21:20
|
I have used Qt and wxPython on Red Hat and Windows. I couldn't say that one was necessarily better than the other, so if wxPython has Mac problems, then you may as well go with Qt.
|
# Mar 20, 2013 21:44
|
DARPA posted:
If you're new to both python and programming I recommend Tkinter. It's incredibly easy to learn, handles data binding simply, and comes bundled with the standard python installation. Using the widgets from ttk makes things look nicer than Tk's reputation lets on. Just make sure the python ttk package is installed and imported. Then all it takes is changing your widget definitions like Button(...) to ttk.Button(...) and you'll get a better themed look.

That said, all of the Tkinter examples that I've seen look like poo poo, so think of it as a good educational tool only.
|
# Mar 21, 2013 09:22
|
Dominoes posted:
Thanks for the GUI advice dudes - I'm going to use QT.

Spyder 2 does that also, I think.
|
# Mar 23, 2013 01:29
|
Popper posted:
This is basically how people are taught classes.

The assignment makes it sound like he hasn't been told about classes yet (specifically, it says to use functions), so he shouldn't use a class (even if it would be objectively better, and it was my first thought, too).
|
# Mar 25, 2013 00:45
|
I have something like this:
So here's my question: are there any dangers to making each class threaded? I'm new to threading and haven't experienced the difficulties of creating a threaded program before. For instance, class host creates a list of dir() objects, and class dir creates a list of file() objects, so I could theoretically thread each of these and get a speed increase, so long as I'm careful to wait for the dictionary-filling and class-creating operations to complete before declaring the thread complete (i.e., use join() to make sure that everything is filled before accessing any of the dictionaries)? But since this is a lot of file I/O, won't it benefit less from threading?
|
# Mar 25, 2013 23:19
|
Suspicious Dish posted:
You seem to have a fundamental misunderstanding of threading. I'd lay it to rest until you pick up more of the basics of CS. Threading is a fairly complex subject in and of itself, so it's something to tackle after you understand more of Python and more about memory and processes and everything.

I'm self-taught and have been coding for nearly a decade now (scientific coding, a means to an end), but I don't know anything about threading aside from whatever I've read on the Internet. I'm well-versed in dealing with memory, but I haven't really learned any actual computer science.
|
# Mar 26, 2013 12:19
|
Suspicious Dish posted:
So, you can't make objects threaded. Threads are like additional programs that run in the same memory space (in Linux, that's literally the only difference from forked processes; they get their own PID and everything)

This is all easy to understand, from the way you've described it. Thanks. My project consists of a step in which specific files are read in and a second step in which calculations are performed using data from those files, but neither the files nor the data in them are changed (i.e., results are stored in new variables and saved elsewhere on the disk, MySQL insertions are performed, etc.). So I could set up a Queue for the read-in (which shouldn't benefit much due to a file I/O bottleneck) and another queue for the operations, and before queuing operations I would just need to wait for the read-in queue to empty, right? Alternatively, if the file I/O queue really doesn't improve speed at all, then I could just read everything in normally and then set up a queue for processing the data, so long as I'm careful not to change the data that is being operated on. I'm very comfortable with coding in Python and using Python; I've just never done computer science, which I see as "understanding what the code is making the computer actually do." I have a lot of experience in C++ as well, so I know all about memory management and reference passing and how memory management in Python works differently (basically, Python uses something analogous to smart pointers: a chunk of memory can only be cleaned up after the number of active references to it drops to 0, all done automatically). I'm just really clueless about threading and multiprocessing, and I'm not well-versed in the "guts" of Python, just the high-level stuff.
AKA, there is a gap between what many of the hard-science programs teach (a bare-bones introduction to scientific computing) and what is actually needed in the real world of hard science (actual computer science knowledge for real, efficient scientific computing), and overcoming this gap on my own is what I've been trying to do for the last few years. This July I'll become eligible for free remote-learning university courses through my employer, and I already plan to take some intro-level CS courses in the hope of building a better understanding of what is going on under the hood.

Thern posted:
You can release the GIL? I need to investigate this further as I have something that is very I/O bound. And Multiprocessing is a bit of hack I feel.

According to this page, the GIL is always released when doing I/O. I don't know whether this helps you.
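A minimal sketch of the read-in step using a `Queue` and a few worker threads, assuming `read_file` is whatever function loads one file (a hypothetical parameter here) and that nothing mutates the results afterwards. Names are Python 3's; Python 2 spells the module `Queue`. Joining every thread before returning is what guarantees the processing step never sees a half-filled dictionary. CPython releases the GIL during blocking reads, so threads can overlap I/O waits, but if the disk itself is the bottleneck the speed-up may be small, and CPU-heavy processing would need `multiprocessing` instead.

```python
import queue
import threading


def read_all(paths, read_file, n_workers=4):
    """Read every path using n_workers threads; return {path: data}."""
    tasks = queue.Queue()
    for path in paths:          # fill the queue before any worker starts
        tasks.put(path)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                path = tasks.get_nowait()
            except queue.Empty:
                return              # queue drained: this worker is finished
            data = read_file(path)  # blocking I/O happens here
            with lock:              # protect the shared results dict
                results[path] = data

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                # wait until every read has finished
    return results
```

Usage would look like `data = read_all(file_list, read_file=load_one_file)`, with `load_one_file` standing in for your own loader; the processing step then runs on `data` only after `read_all` returns.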
# Mar 26, 2013 20:55
|
[i for i in u] will create a list that contains all of the elements of u. If u is list1, then that line fills list3 with the same values as list1. You don't appear to be using enumerate, so instead of keeping track of an index and incrementing it yourself, you could just:
This isn't great, though, because if b is shorter than u, the code will bomb out.
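The snippets from this post did not survive, so this is only a sketch, assuming the goal is to walk `u` while pairing each element with the element of `b` at the same position (the list contents are made up). `enumerate` removes the hand-maintained index, and `zip` avoids the crash when `b` is shorter by stopping at the shorter list.

```python
u = [10, 20, 30]
b = [1, 2]          # deliberately shorter than u

# enumerate supplies the index: no counter to initialise and increment by hand.
pairs = []
for i, x in enumerate(u):
    if i < len(b):              # without this guard, b[i] raises IndexError at i == 2
        pairs.append((x, b[i]))

# zip pairs the lists element by element and stops at the shorter one.
print(list(zip(u, b)))           # [(10, 1), (20, 2)]
print(pairs == list(zip(u, b)))  # True
```

`zip` silently drops the leftover elements of `u`; if they matter, `itertools.zip_longest` (`izip_longest` in Python 2) pads the shorter list instead.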
|
# Mar 29, 2013 21:08
|
ARACHNOTRON posted:
why is Python so weird??

C and other languages work in a similar way; would you prefer it if Python/C/etc. recursively searched your entire file system every time you tried to import something? Setting your PYTHONPATH only takes a second: just point it at the directory where you're currently testing things. It would also be way faster than any of the workarounds that you're considering.
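For reference, entries from the PYTHONPATH environment variable (set in a shell with something like `export PYTHONPATH=/path/to/dir`) end up in `sys.path`, which is the list of directories `import` actually searches. A sketch; the `testing_dir` location is hypothetical.

```python
import os
import sys

# PYTHONPATH entries (os.pathsep-separated) are added to sys.path at start-up,
# ahead of the standard library and site-packages.
print(os.environ.get("PYTHONPATH"))

# A single process can also extend the search path at runtime.
testing_dir = os.path.join(os.path.expanduser("~"), "testing")  # hypothetical directory
sys.path.insert(0, testing_dir)  # searched first, for this process only

print(sys.path[0] == testing_dir)  # True
```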
|
# Mar 29, 2013 23:29
|
ARACHNOTRON posted:
It would be nice if it searched back until it didn't find an init file. I don't know.

Ah, but Python also supports importing modules that reside in directories without an __init__.py file. And you might have more than one area holding .py files that you want to use, which would ultimately necessitate some sort of PATH variable anyway. (Java uses a PATH-like variable too, CLASSPATH; it was probably set for you by your IDE if you're working in Windows. I haven't used Java in a long time.)
|
# Mar 30, 2013 21:29
|
Social Animal posted:
So I learned the basics of Python from code academy and now I really want to start a small project to practice. I was thinking of a page I can upload/download files from sort of like my own ghetto megaupload. What's a good place to start? I looked at Flask and it looks like I can use this but I'm hitting a brick wall. When it comes to coding I am pretty bad (probably why I was attracted to Python's easy to use/read syntax.) The problem is it feels like frameworks are a whole different language in themselves, and I'm pretty lost. Can anyone please recommend me a good tutorial or path I should take to really start learning? Or recommend a good beginner's project I can start with?

Have you completed the Python Challenge yet? That's a really good first project.
|
# Apr 3, 2013 06:55
|
Social Animal posted:
This is the only page I could find:

That's the one. It was made in 2005, so it's not exactly the height of web design.
|
# Apr 3, 2013 11:43
|
BeefofAges posted:
Wow, those are crappy function names.

They're pretty similar to the old atoi and atof functions that C has.
|
# Apr 5, 2013 10:07
|
Is there a handy but thorough list of advantages gained by using new-style classes in Python 2.x? I know a few of the advantages, like being able to use super(), but I'd like to have a whole list handy to show to a coworker.
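A short illustration with hypothetical class names. In Python 2, a class that inherits from `object` is new-style; among other things this enables `super()`, gives a well-defined method resolution order exposed as `__mro__`, makes `type(obj)` return the actual class, and makes descriptors such as `property` setters and `__slots__` work as documented. In Python 3 every class is new-style, so this code runs unchanged there.

```python
class Detector(object):      # inheriting from object makes this new-style in Python 2
    def __init__(self, name):
        self.name = name


class Calorimeter(Detector):
    def __init__(self, name, n_cells):
        # super() only works with new-style classes in Python 2.
        super(Calorimeter, self).__init__(name)
        self.n_cells = n_cells


cal = Calorimeter("ECAL", 1024)
print("%s %d" % (cal.name, cal.n_cells))           # ECAL 1024
print(type(cal) is Calorimeter)                    # True (old-style instances report type 'instance')
print([c.__name__ for c in Calorimeter.__mro__])   # ['Calorimeter', 'Detector', 'object']
```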
|
# Apr 12, 2013 08:45
|
dedian posted:
I wrote a little script this weekend to calculate and store md5 hashes of files in a list of directories I point the script to, for the purpose of finding duplicates. I fully realize there's lots of tools to do this already, and do it better, but I wanted to write something myself as I get more familiar with Python (I keep meaning to go through Learn python the hard way or other tutorials ). Does any of this look like the completely wrong way of doing things? It seems like the for loops don't need to be as nested... somehow? List comprehensions, or.. something? Anyway, just something dumb to poke holes in on a Monday morning (I'm happy at least that it works )

MD5 is great, but if you're like me, you worry that your hashes might flag file duplicates when none exist (i.e., two unique files can produce the same hash). If that's the case, then you want your hash strings to be as long as possible so as to minimize the likelihood of collisions. You'd only need to change 1-2 lines in order to use sha256 or sha512 instead of md5 (hashlib supports both). You could also check the file size in bytes when checking for duplicates; if two files have the same hash but different file sizes, then they can't be duplicates, no?
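A sketch combining both suggestions (dedian's original script is not reproduced here, and the function names are hypothetical): group files by size first, since a file with a unique size cannot have a duplicate and never needs hashing, then hash only the remaining candidates with SHA-256 via `hashlib`, reading in chunks so large files are not loaded into memory at once.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(directories):
    """Return lists of paths whose size and SHA-256 digest both match."""
    by_size = defaultdict(list)
    for top in directories:
        for root, _dirs, files in os.walk(top):
            for name in files:
                path = os.path.join(root, name)
                by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate; skip hashing it
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)

    return [paths for paths in by_digest.values() if len(paths) > 1]
```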
|
# Apr 15, 2013 18:28
|
Scaevolus posted:
Unless you have a habit of storing outputs of MD5 collision creators, you will never see an MD5 collision on your filesystem.

You probably won't see a collision on your filesystem with md5, you mean. You can trivially switch to any of the SHA-2 hash functions for much better collision resistance, and it costs you next to nothing, so why not? Checking file sizes will make the probability basically zero and will make the duplicate checking much faster, but there's always that one-in-a-gazillion chance that two unique files with the same file size will also have the same hash...
|
# Apr 16, 2013 05:47
|
Haystack posted:
Given that dedian would need to have a folder with about 10 billion billion files in it before he'd have a decent chance of having single MD5 collision, he'd probably be better off spending his time shielding his computer from cosmic radiation

Hey, I agree with you that the probability is basically zero and he doesn't need to worry; I just like effortless solutions that teach people new things.
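Haystack's figure follows from the standard birthday-bound approximation for the probability of at least one collision among n files hashed into N equally likely values (N = 2^128 for MD5):

```latex
p(n) \approx 1 - \exp\!\left(-\frac{n^2}{2N}\right), \qquad N = 2^{128} \approx 3.40 \times 10^{38}

n = 10^{19}: \quad \frac{n^2}{2N} = \frac{10^{38}}{6.81 \times 10^{38}} \approx 0.147, \qquad p \approx 1 - e^{-0.147} \approx 0.14
```

So 10^19 files gives roughly a one-in-seven chance. With SHA-256 (N = 2^256 ≈ 1.16 × 10^77) the same n gives p ≈ n²/(2N) ≈ 4 × 10^-40.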
|
# Apr 16, 2013 06:58
|
My girlfriend is interested in learning more web development skills. Maintaining a web site is a small facet of her job, so it actually does have some applicability to what she's doing; she already knows how to use CSS and JavaScript. She knows that I've been using Python for a long time, and she has asked me whether Python is a useful language to learn for web development. I wasn't sure what to tell her; I've never done any web stuff. Searching around gives links to people talking about how awesome Django is as a web framework, but I don't really understand what Django does. Is it only useful for creating web applications with forms and the like, or can it also be used to make nice-looking web sites in a relatively easy way?
|
# Apr 17, 2013 08:41
|
I feel like I'm typing 'self' a whole lot when I write classes: self.method() when I need to call internal methods, self.object to access internal variables, etc. This looks ugly to me; is there a way to prettify it?
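Part of why `self` keeps appearing: calling a method on an instance is just calling the class's function with the instance passed as the first argument, and `self` is the conventional name for that argument. A small illustration with a hypothetical class:

```python
class Counter(object):
    def __init__(self):
        self.count = 0            # instance attribute, reached through self

    def increment(self):
        self.count += 1

    def increment_twice(self):
        self.increment()          # the usual spelling...
        Counter.increment(self)   # ...is shorthand for this explicit call


c = Counter()
c.increment_twice()
print(c.count)  # 2
```

Python's design FAQ explains why `self` is explicit rather than implicit, so there is no supported way to drop it.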
|
# Apr 18, 2013 19:25
|
yaoi prophet posted:
That's even uglier than just using 'self' everywhere
|
# Apr 18, 2013 21:00
|
Dominoes posted:
I'm making a binary because I'd like to distribute this program to coworkers. I'd prefer the program to just work. There's got to be something I can add to my setup.py, or a way to get the proper .dll manually into the program's folder.

Do you have the Visual Studio 2010 redistributable package on the computer that you're using to create the binary?
|
# Apr 19, 2013 04:19
|
Dominoes posted:
No, but I have various 'Microsoft Visual C++ (year), x86 and x64 Redistributables' under the uninstall-a-program dialog.

Check whether there are any keys in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\VisualStudio\10.0\VC\VCRedist\

If you don't have it installed, then you might try installing it on the machine that you're using to create the executable; at some point the Visual Studio compiler is getting called (since you're on Windows). It's not totally clear to me how cx_Freeze works on Windows.
|
# Apr 19, 2013 11:07
|
I am using paramiko to sftp get a large file from a remote server:
The implementation works great, but keyboard interrupts cause weird behavior. One of two things happens:
A) Paramiko becomes a zombie and I have to kill the python process by hand.
B) Paramiko catches the keyboard interrupt and raises a paramiko.SFTPError with the text "Garbage packet received."
Has anyone else had this experience? Can I somehow prevent A) from happening?
|
# Apr 19, 2013 20:58
|
Dominoes posted:
Lesson in reinventing the wheel today: I created a function that allows date addition and subtraction using modulus and floor division. Turns out the datetime function already does that! It was a good learning experience.

And quite well, at that! MySQLdb even converts its own datetime format into the Python datetime format. Very handy, although inserting a datetime with MySQLdb requires converting it to a string first.
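For comparison with the hand-rolled modulus/floor-division version, date arithmetic with the standard library's `datetime` and `timedelta` looks like this (the dates are arbitrary examples):

```python
from datetime import datetime, timedelta

start = datetime(2013, 4, 21, 5, 48)

# timedelta handles day and month rollover; no manual modulus or floor division.
later = start + timedelta(days=10, hours=20)
print(later)                            # 2013-05-02 01:48:00

# Subtracting two datetimes gives a timedelta back.
print((later - start).total_seconds())  # 936000.0
```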
|
# Apr 21, 2013 05:48
|
Agreed; I have no idea what you're asking for at this point. What is this most recent Python code supposed to do?
e: To me, nothing in your pseudocode looks anything like the code that you just posted.
|
# Apr 21, 2013 20:34
|
Dominoes posted:
I'm not asking anything at this point; Nippa and Plork posted examples that I turned into a solution. I was wondering if there's a clean way to implement variables in code similar to the .join and %s abilities of strings.

This is terrible, don't go looking for this. It's probably not what you actually want to do.
e: You asked for clean; what you did is way cleaner than trying to pull variables from strings and then loop over them or whatever.
# Apr 21, 2013 20:39
|
Dominoes posted:
Iterators.

Don't do this:
And instead of a failure flag, you could just return when your failure condition is met. Plus it looks like result never holds more than one element, so couldn't you just scrap it entirely? Like this:
... Wait, symbol isn't ever used or modified anywhere in the code! Can't this just return True or False?
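The code from this exchange did not survive, so this is only a generic sketch of the two suggestions, with hypothetical function names: loop over the items directly rather than over indices, and return as soon as the failure condition is met instead of setting a flag. The built-in `all()` expresses the same check in one line.

```python
def all_positive_flagged(values):
    # Index-based loop with a failure flag: works, but carries extra state.
    failed = False
    for i in range(len(values)):
        if values[i] <= 0:
            failed = True
            break
    return not failed


def all_positive(values):
    # Iterate over the items themselves and leave as soon as one fails.
    for value in values:
        if value <= 0:
            return False
    return True


print(all_positive([3, 1, 4]), all_positive([3, -1, 4]))  # True False
print(all(v > 0 for v in [3, -1, 4]))                     # False
```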
# Apr 21, 2013 22:39
|
When you're done with an object that takes up a lot of memory (for instance, a 1k by 1k by 1k numpy array), is it considered better practice to delete it with del or just leave it for garbage collection?
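In CPython, an object is freed as soon as its reference count reaches zero, so `del name` frees a large array only if that name held the last reference; otherwise it just unbinds the name. A name going out of scope (for example, at the end of a function) has the same effect, and `gc.collect()` only matters for reference cycles. A sketch using a `weakref` to observe when the array is actually released, with a scaled-down stand-in for the 1k × 1k × 1k array (which would take 8 GB as float64):

```python
import weakref

import numpy as np

big = np.zeros((100, 100, 100))  # 8 MB stand-in for the 1k^3 array
alias = big                      # a second reference to the same array
probe = weakref.ref(big)         # observes the array without keeping it alive

del big
print(probe() is None)  # False: `alias` still refers to the array

del alias
print(probe() is None)  # True: last reference gone, CPython freed it at once
```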
|
# Apr 28, 2013 21:07
|
^^^ Because he's probably on Windows, and the Windows command line is ghetto as hell.

dantheman650 posted:
I am completely new to programming and am playing around with Python after learning some basics on CodeAcademy. I tried using Notepad++ on a friend's recommendation but it turns out getting Python code to run from it is a pain. The OP of this thread is mostly going over my head and the tutorial on setting up VIM is much more advanced and complicated than anything I need at this point. What is a good, simple IDE to write and run Python code? The wiki has a huge list and I don't know which to pick.

I really, really like Spyder 2. It comes with Pythonxy, which is basically a big executable full of additional Python libraries (numpy and others) and IDEs. Pythonxy is aimed at people who want to switch from MATLAB to Python on Windows systems. It's also free. Even if you're not interested in any of the computational stuff, it's still a great starting point simply because it gives you the option of installing a bunch of different IDEs (so that you can try a bunch of them out and then keep whichever one you like best) and a bunch of extra libraries to play with (although they're all optional components that you can add later). Spyder 2 is as simple as you want, but it's also incredibly easy to run your code in it. It comes with a command-line interpreter that has all of the features you'd expect of a well-developed command-line interpreter (such as tab completion), but there's also a keyboard shortcut for just running code in a fresh window. I suggest trying it.
|
# May 1, 2013 06:01
|
BeefofAges posted:
First make it work, then make it fast (if you need to), then make it pretty (if you need to).

If by "pretty" you mean "readable," then shouldn't that be part of "make it work"? Most scientific programming is done in the style of "I'm going to get this to work, I don't care if it's fast or readable," and it's actually a huge problem when a change to the code needs to be made but the entire house of cards falls apart because the code has turned into a black box and no one knows what makes it work.
|
# May 14, 2013 08:09