|
OnceIWasAnOstrich posted:The way I usually handle this is by having each script call functions defined in that file in an if __name__ == '__main__': block, separating out the output from the actual work, and then I can easily combine these into one superscript by importing those functions and making the whole pipeline call the imported functions in the correct order. Then I get one big script that can do everything, as well as a bunch of scripts that can do each step. I have used your solution in the past for faster scripts. My problem is that currently some of the function calls take hours to complete. So I don't want to just have one sequential list of function calls. I have gotten around this by manually switching function calls on/off depending on whether I need to update the entire sequence. This works well until I forget what needs to be called. I am thinking about decorating each function with lists of files that the function depends on (requires), and files that the function creates (provides). The decorated function would then only run if the required files are newer than the provided files, or the provided files don't exist.
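A rough sketch of that requires/provides decorator idea, just to make it concrete. The names file_step and needs_update are made up for illustration, and comparing modification times with os.path.getmtime is an assumption about how "newer" would be decided:

```python
import functools
import os


def needs_update(requires, provides):
    """Return True if an output is missing or older than the newest input."""
    # With no declared outputs there is nothing to compare, so always run.
    if not provides or not all(os.path.exists(path) for path in provides):
        return True
    if not requires:
        return False
    newest_input = max(os.path.getmtime(path) for path in requires)
    oldest_output = min(os.path.getmtime(path) for path in provides)
    return newest_input > oldest_output


def file_step(requires=(), provides=()):
    """Hypothetical decorator: skip the function when its outputs are current."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not needs_update(requires, provides):
                print("skipping {}: outputs are up to date".format(func.__name__))
                return None
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

You would then write something like @file_step(requires=["raw.csv"], provides=["clean.csv"]) above each slow function (file names invented here) and call everything in order; steps whose outputs are current just get skipped. This is basically what make does, which is part of why tools like ruffus exist.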
|
# ? Jan 29, 2014 02:47 |
|
deimos posted:Oh poo poo, a fellow puertorican goon? No, just liked the name.
|
# ? Jan 29, 2014 02:50 |
|
accipter posted:I have used your solution in the past for faster scripts. My intuition would be to set up the pipeline in a way that you can start it from the top, or you can start it from any step along the way. Each step calls the next step in the pipeline. That's assuming it's entirely linear, i.e. if you update the first step, you'll have to re-run everything.
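For what it's worth, a minimal sketch of the "start from any step" idea for a purely linear pipeline. Instead of each step calling the next one directly, this variant uses a small driver that walks an ordered list; the step names are hypothetical placeholders:

```python
log = []


# Placeholder steps; real ones would do the hours-long work.
def load_data():
    log.append("load_data")


def fit_model():
    log.append("fit_model")


def make_plots():
    log.append("make_plots")


STEPS = [load_data, fit_model, make_plots]


def run_pipeline(steps, start=None):
    """Run steps in order, beginning at the step named `start` (or the first)."""
    names = [step.__name__ for step in steps]
    first = 0 if start is None else names.index(start)
    for step in steps[first:]:
        step()
    return names[first:]
```

run_pipeline(STEPS, start="fit_model") would skip load_data and run the remaining two steps in order.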
|
# ? Jan 29, 2014 03:52 |
|
One of the things I have seen a lot of pipelines do (and what I did for one of the ones I had with hours-long steps) was have the script accept an output folder as the placeholder for that run, and each step gets its own numbered subdirectory. This also makes it easy to have auto-restart because you can simply place a 0-byte file in the subfolder once the step is complete, and the pipeline script can check for the existence of properly named subfolders and then check whether that step had completed. If other people are going to be using it, it helps to keep a hash or something of the input files so you can avoid errors when people rerun the script with different data and the same output folder. Some of my intermediate files got ridiculously large so I had an option to delete subfolders after the subsequent step had finished. There are also solutions like ruffus (a very lightweight tool originally built for bioinformatics workflows) or luigi (a much more complicated library used by Spotify to manage batch and Hadoop jobs) which are designed for combining multiple long-running steps into a flexible workflow.
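The numbered-subdirectory-plus-marker-file scheme could look roughly like this. The marker name ".done" and the "01_name" folder format are assumptions for the sketch, not anything a particular tool requires:

```python
import os

DONE_MARKER = ".done"  # hypothetical name for the 0-byte completion file


def run_step(output_root, number, name, func):
    """Run func(step_dir) unless this step already finished in output_root.

    Returns True if the step ran, False if it was skipped.
    """
    step_dir = os.path.join(output_root, "{:02d}_{}".format(number, name))
    marker = os.path.join(step_dir, DONE_MARKER)
    if os.path.exists(marker):
        return False
    os.makedirs(step_dir, exist_ok=True)
    func(step_dir)
    # Only written after func returns, so a crashed step is retried next run.
    open(marker, "w").close()
    return True
```

Because the marker is created last, killing the script halfway through a step leaves no marker behind, and rerunning with the same output folder picks up at that step.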
|
# ? Jan 29, 2014 04:45 |
|
Jose Cuervo posted:No, just liked the name. What? No, the data you posted is PR Municipalities.
|
# ? Jan 29, 2014 05:00 |
|
accipter posted:Typically, I create multiple Python files that accomplish a small task. By the end of a project, I might have a dozen or so scripts that rely on the output from a couple different other scripts. I don't want to create one massive script because there are times when just one portion needs to be updated. This might be overkill, but I use Jenkins ( http://jenkins-ci.org/ ) for managing this sort of thing. It's got great built in support for scheduling things, creating pipelines of tasks, archiving results, publishing results, and so on.
|
# ? Jan 29, 2014 06:24 |
|
OnceIWasAnOstrich posted:... It looks like ruffus is just what I was looking for. Thanks! BeefofAges posted:This might be overkill, but I use Jenkins ( http://jenkins-ci.org/ ) for managing this sort of thing. It's got great built in support for scheduling things, creating pipelines of tasks, archiving results, publishing results, and so on. Just poked around on the Jenkins site and it does seem a little overkill for this project.
|
# ? Jan 29, 2014 15:54 |
|
Lysidas posted:I strongly recommend that you use Python 3, where the bytes object behaves the way you want: Trying this out now after having trouble parsing character streams in python2 and hitting a strange error. I can't imagine what delta there is between your terminal and mine? code:
|
# ? Jan 29, 2014 20:17 |
|
accipter posted:I have used your solution in the past for faster scripts. I would pickle the intermediate data structures after they are created. If something is canceled and resumed, have it read back in the pickled stuff and pick up where it left off. If you want the intermediate results files to be shareable and/or archiveable then pickle probably isn't the right choice.
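A small sketch of the pickle-the-intermediates approach. The helper name cached is made up; the write-to-temp-then-rename step is an extra precaution added here so an interrupted run doesn't leave a half-written pickle behind:

```python
import os
import pickle


def cached(path, compute):
    """Load a pickled result from `path`, or compute it and save it there."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    result = compute()
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as handle:
        pickle.dump(result, handle)
    # Rename only after the dump succeeded.
    os.replace(tmp_path, path)
    return result
```

Something like table = cached("step1.pkl", build_table) (both names hypothetical) would then only pay for build_table the first time through.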
|
# ? Jan 29, 2014 20:22 |
|
JawnV6 posted:Trying this out now after having trouble parsing character streams in python2 and hitting a strange error. I can't imagine what delta there is between your terminal and mine? base64 module documentation posted:Changed in version 3.3: ASCII-only Unicode strings are now accepted by the decoding functions of the modern interface.
|
# ? Jan 29, 2014 20:22 |
|
I wrote a Python script that I'd like to share with someone to run on another computer. What's the easiest way to get them set up with the required libraries that I am using in my script? Also, I used py_compile to compile it to a .pyc. I'd rather them not be able to see the source, is this a good way of doing that? I know it's probably reversible if someone's very tech-savvy but as long as an average person can't easily do it that would be fine.
|
# ? Jan 29, 2014 20:31 |
|
Phiberoptik posted:I wrote a Python script that I'd like to share with someone to run on another computer. What's the easiest way to get them set up with the required libraries that I am using in my script? Also, I used py_compile to compile it to a .pyc. I'd rather them not be able to see the source, is this a good way of doing that? I know it's probably reversible if someone's very tech-savvy but as long as an average person can't easily do it that would be fine. The first question: make a requirements.txt then have the other person run code:
E: Getting pip installed in the first place might be a bit tricky if they're not on Linux--does this person have a Python install ready to go? SurgicalOntologist fucked around with this message at 22:29 on Jan 29, 2014 |
# ? Jan 29, 2014 22:25 |
|
SurgicalOntologist posted:E: Getting pip installed in the first place might be a bit tricky if they're not on Linux--does this person have a Python install ready to go? pip installs pretty much everywhere easily nowadays. edit: To make it clear, for Windows you use Christoph Gohlke's wonderful world of Windows Python packages to install setuptools, then pip. deimos fucked around with this message at 22:45 on Jan 29, 2014 |
# ? Jan 29, 2014 22:42 |
|
Crosscontaminant posted:You need to pass it bytes in 3.2. Ah, oops. Sorry about that. Newer Python 3 releases have made the strict bytes/text separation a lot nicer to deal with, I didn't realize that b64decode accepting a str was one of the enhancements in 3.3. Bytes literals require a b prefix in Python 3; this prefix is accepted in 2.6 and 2.7 but just gives you instances of those versions' str type. Python code:
Python code:
Python code:
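The snippets from this post didn't survive, so the following is not the original code, just a minimal illustration of the point about passing bytes to the base64 functions. The sample data is made up:

```python
import base64

# Bytes in, bytes out: this form works on every Python 3 release.
encoded = base64.b64encode(b"hello")
decoded = base64.b64decode(encoded)

# If the input arrives as text, encode it to ASCII bytes first rather than
# relying on the 3.3+ behaviour of accepting str.
text_input = "aGVsbG8="
decoded_from_text = base64.b64decode(text_input.encode("ascii"))
```

Encoding explicitly keeps the same code working on 3.2, where b64decode rejects str.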
|
# ? Jan 29, 2014 23:31 |
|
video talking about twitter's PEX, a tool for packaging python programs: https://www.youtube.com/watch?v=NmpnGhRwsu0 It is yet-another-python-packager with the unique feature of creating a single run-anywhere (the video specifies OSX and Linux, didn't say anything about Windows) output file.
|
# ? Jan 30, 2014 20:09 |
|
I feel like the boy who cried wolf...I asked a bunch of silly questions but now I've run into a real puzzler. I'm running into major headaches with pickling functions that are passed from another script. I've been reading on this issue for the past couple hours and I think I have my head around the various pitfalls and potential solutions. But maybe someone has a suggestion for me. I know I usually get asked to provide more context, but I think this is a fairly well-encapsulated problem, so I think a minimal example will suffice. But I'm happy to share the source code if it would help. (Python 3.3) First the library code... Python code:
Python code:
code:
I inspected the __module__ attribute on the function, it's '__main__'. So, I tried setting __module__ to 'test' (via sys.argv[0]), hoping that then pickle would know where to look, but then I can't pickle: code:
I'm going to try that last one, it's the only one that doesn't require a __getstate__. But in the meantime if anyone has any other ideas, or if there's a simple solution I missed, I'm all ears. Edit: gently caress. If I save sys.argv[0] as an instance attribute, I can't get to it before I unpickle. I guess I could make the original script's filename an argument to load_pickled_instance... Edit2: I can't figure out how to do the equivalent of Python code:
SurgicalOntologist fucked around with this message at 06:25 on Jan 31, 2014 |
# ? Jan 31, 2014 06:12 |
|
The easiest solution is just not to pickle at all. Pickle is a bad, confusing library, and constantly gets some stuff wrong, as you're figuring out! Pasting the pickle file in here might give us better clues about what's going on. Pickle is not magic, and it doesn't store the bytecode of a function (for various reasons), instead it stores the name of a function as a global. So, if you have a module "foo" and inside of it was a function "blah()", then Pickle stores in plaintext the full path: "foo.blah" and then is able to resolve it later using __import__ and some fancy name mangling. (It actually has bytecode itself: import module "foo", and then from that, pull out the function called "blah") In the case of your main script, the main script is always run in a special module called __main__, so what's happening is the reference being stored is "__main__.blah", not "foo.blah". When run against a different module, __main__.blah() can't be found, and that's where that error comes from. Personally, issues like this have bitten me so many times that I've just stopped using pickle altogether. You really should have your code always be in modules, with the __main__ just importing a module and calling a function on it. But I understand you want to use Python as your own pseudo-DSL thing.
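To see the by-reference behaviour for yourself, here is a tiny sketch that pickles a function from the standard library (json.dumps, chosen arbitrarily) and looks at what ends up in the payload:

```python
import json
import pickle

# Protocol 0 is text-based, which makes the stored reference easy to read.
payload = pickle.dumps(json.dumps, protocol=0)

# The payload names the module and the function; no bytecode is stored.
has_module_name = b"json" in payload
has_function_name = b"dumps" in payload

# Unpickling re-imports the module and looks the name up again, so you get
# back the very same function object, not a copy.
restored = pickle.loads(payload)
same_object = restored is json.dumps
```

A function defined in a script that is run directly has __module__ == "__main__", so the reference stored for it is "__main__.<name>", and that name has to exist in whatever __main__ is doing the unpickling.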
|
# ? Jan 31, 2014 06:29 |
|
Yeah, I understand that pickle works by reference, that's why I'm talking about how I can make sure pickle knows where to look to unpickle the functions, by either importing them into __main__ before unpickling or manually changing their __module__ attribute before pickling. How can I get a string from the pickled file (to post here)? Is there an easier way than changing pickle.dump to pickle.dumps? And I've also realized that pickling is difficult. I wonder if there's a way around it. The fact of the matter is, if I'm running an experiment, it needs to be run over many different Python sessions. However, there might be a way around it. I'd have to redesign things, but perhaps I can make it work. gently caress. People are waiting on me on this. Everything else is ready. I'm not sure what you mean about a DSL. This started as a learn-the-language toy project (well, like my 3rd or 4th). With a pretty simple premise: given a function that runs a trial, systematically vary the kwargs to that function according to the principles of experimental design. But it turns out that setting things up just right to run that function, and doing so over multiple sessions, is difficult. I didn't think it would be, though. I was doing all this poo poo in Matlab before, for Christ's sakes. Maybe my mistake was talking this project up with colleagues and now they want in on the action, and are learning Python. I did already use this library to run an experiment I'm submitting for publication, so it was working, but it had a terrible subclass interface. Instead of adding functions as attributes, you overrode the methods. That pickled fine. But I don't want to go back to that, I think trying to get it to work without pickling would be better. SurgicalOntologist fucked around with this message at 06:58 on Jan 31, 2014 |
# ? Jan 31, 2014 06:41 |
|
Suspicious Dish posted:You really should have your code always be in modules, with the __main__ just importing a module and calling a function on it. But I understand you want to use Python as your own pseudo-DSL thing. Maybe there's a solution here... can I get pickle to store the function reference as foo.bar instead of __main__.bar? Instead of having client code be a script that ultimately creates the pickle file, have it define a function that does so. Then have them import their script (now a module, I guess) and run that function. Hmm... (and for the record, your suggestion is how everything else works. Once the experiment is pickled, the functions that load it and run bits and pieces of it are called from __main__ based on CLI arguments) Edit: This is clearly a complicated issue, if anyone is actually interested in talking through a solution, maybe we should do so in the repo instead of taking up the whole thread. SurgicalOntologist fucked around with this message at 07:14 on Jan 31, 2014 |
# ? Jan 31, 2014 06:56 |
|
SurgicalOntologist posted:Edit2: I can't figure out how to do the equivalent of You want importlib.import_module()
|
# ? Jan 31, 2014 08:30 |
|
How do I do the '*' syntax with that function? That's what I couldn't figure out. Otherwise it imports into the namespace as foo.function_name but (one of) my possible hacky solutions would require importing the function as __main__.function_name, which I think is the equivalent of 'from foo import *'.
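As far as I know importlib.import_module() has no built-in '*' mode, but the effect can be approximated by copying names into a namespace yourself. A sketch, with import_star being a made-up helper name; it follows the usual rule of honouring __all__ when the module defines it and otherwise taking every name that doesn't start with an underscore:

```python
import importlib


def import_star(module_name, namespace):
    """Roughly emulate `from module_name import *` into `namespace` (a dict)."""
    module = importlib.import_module(module_name)
    names = getattr(module, "__all__", None)
    if names is None:
        names = [name for name in vars(module) if not name.startswith("_")]
    for name in names:
        namespace[name] = getattr(module, name)
    return module
```

Calling import_star("foo", globals()) from the main script would make foo's public functions reachable as __main__.function_name, which is what the unpickling lookup needs in the scenario above. It is still a hack, and it clobbers any names already defined in that namespace.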
|
# ? Jan 31, 2014 17:10 |
Is there any way to set the max number of http retries that a pip install will attempt? I keep getting this error trying to grab stuff off s3 from ec2: "Max retries exceeded with url: /pypi/uwsgi-1.9.18.2.tar.gz (Caused by <class 'httplib.BadStatusLine'>:" It seems odd that I can't reproduce it with this though: code:
|
|
# ? Jan 31, 2014 20:03 |
|
Woohoo! An easier solution worked, I didn't have to gently caress with the __main__ namespace. Even still, it's a dirty hack. If anyone cares, check it out: Python code:
SurgicalOntologist fucked around with this message at 03:29 on Feb 1, 2014 |
# ? Feb 1, 2014 03:10 |
|
I don't know if anyone here is interested in MUDs in this day and age, but there is a small group working on a pretty robust python mud framework called Evennia. I was looking through the code and functionality and was pretty impressed. They seem to have put together a pretty good project so far. I am playing around with it a bit to help improve my python skills and have submitted a couple bug fixes to them so far. I don't expect to create a full mud or anything like that, but it is fun to mess around with in my free time.
|
# ? Feb 1, 2014 05:07 |
Windows 8.1 x64. How do I make Python 2.7 run from cmd ? Python 2.7 is installed on C:/Python27 (from python(x,y)). I have tried:
User variable Path: http://privatepaste.com/0564253fc5 System variable Path: http://privatepaste.com/cd39597e02 cinci zoo sniper fucked around with this message at 18:56 on Feb 2, 2014 |
|
# ? Feb 2, 2014 17:10 |
|
Check the directory structure of C:\Python27. I vaguely recall the executable being in C:\Python27\bin, which is why adding C:\Python27 to your PATH didn't work.
|
# ? Feb 2, 2014 18:50 |
|
I saw something about using an asterisk alone in a function definition in Python 3, but I can't find any reference to it in the documentation. Does it allow you to do this: Python code:
|
# ? Feb 2, 2014 18:51 |
Crosscontaminant posted:Check the directory structure of C:\Python27. I vaguely recall the executable being in C:\Python27\bin, which is why adding C:\Python27 to your PATH didn't work. python.exe is located in C:\Python27\ so maybe it is something else
|
|
# ? Feb 2, 2014 18:53 |
|
SurgicalOntologist posted:I saw something about using an asterisk alone in a function definition in Python 3, but I can't find any reference to it in the documentation. code:
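The snippet from this reply is missing, so this is not the original answer, only a minimal illustration of what a bare asterisk does in a Python 3 signature (keyword-only arguments, described in PEP 3102). The function and parameter names are invented:

```python
def connect(host, *, timeout=10, retries=3):
    """Everything after the bare * can only be passed by keyword."""
    return (host, timeout, retries)


# Keyword use works as usual.
result = connect("example.org", timeout=5)

# Passing the same value positionally is rejected.
try:
    connect("example.org", 5)
    positional_allowed = True
except TypeError:
    positional_allowed = False
```

The bare * takes the place of *args without actually collecting any extra positional arguments.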
|
# ? Feb 2, 2014 19:04 |
|
Ah, thanks, so it's been around longer than I thought. I did figure it out by trial-and-error, but it's nice to read about it. Strange that it's not discussed in the documentation, as far as I can tell.
|
# ? Feb 2, 2014 19:53 |
|
kalstrams posted:Windows 8.1 x64. How do I make Python 2.7 run from cmd ? Adding C:\python27; to path should be sufficient. Did you restart the cmd prompt? Try installing normal python in another folder. Maybe the package is hosed.
|
# ? Feb 3, 2014 15:53 |
Edit: Solved by not using python(x,y). I decided to switch because I read that the author of python(x,y) did not rely on official packages. Installed Anaconda - everything works. cinci zoo sniper fucked around with this message at 00:31 on Feb 4, 2014 |
|
# ? Feb 3, 2014 20:26 |
|
Has anyone ever found a way to do session cookies with httplib? I have searched, but just can't see much. I'm adding an authentication backend to one of our services, but our client machines only have httplib, and I'm trying to find a way to shoehorn persistent authentication into just httplib. I don't have control of the client machines or I would just make them do it with something a little more powerful like requests.
|
# ? Feb 3, 2014 21:35 |
|
My Rhythmic Crotch posted:Has anyone ever found a way to do session cookies with httplib? I have searched, but just can't see much. I'm adding an authentication backend to one of our services, but our client machines only have httplib, and I'm trying to find a way to shoehorn persistent authentication into just httplib. I don't have control of the client machines or I would just make them do it with something a little more powerful like requests. I'm sorry. Can you use urllib2? It can use CookieJar http://docs.python.org/2/library/cookielib.html
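A sketch of the cookie-jar approach, written with the Python 3 module names (urllib.request and http.cookiejar); on Python 2 the same classes live in urllib2 and cookielib. The helper name make_session_opener is made up:

```python
import http.cookiejar
import urllib.request


def make_session_opener():
    """Build an opener that stores and resends cookies across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar
```

Every request made through opener.open(...) then shares the same jar, so a session cookie set by a login response gets sent back on later requests. The login URL and form fields depend entirely on the service, so they are left out here.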
|
# ? Feb 4, 2014 13:08 |
|
I was able to convince the owner of the client machines to refactor his code to use requests. He looked at urllib2 and didn't even want to touch it.
|
# ? Feb 4, 2014 16:42 |
|
Hey guys, there were some of y'all interested in interactive visualization targeting the browser from python, so I just wanted to mention that we just cut another Bokeh release today, now 0.4. Preliminary matplotlib compat layer, new tools and datetime axis improvements, image plots available from python, added python 3 support, lots of bug fixes, and greatly expanded docs, especially about the JS side of the library. Check it out at: http://bokeh.pydata.org Lots of fun stuff on deck in the next few months!
|
# ? Feb 5, 2014 02:51 |
|
Is there an easy way to just save a plot as .png? Is there an easy way to save a plot as .png without displaying it first? I like your interface a lot more than matplotlib's, so I'm interested in using it to actually plot scientific data for papers and such.
|
# ? Feb 5, 2014 08:11 |
|
QuarkJets posted:Is there an easy way to just save a plot as .png? Is there an easy way to save a plot as .png without displaying it first? I like your interface a lot more than matplotlib's, so I'm interested in using it to actually plot scientific data for papers and such. edit: completely misread your post, but I'm leaving this here anyway because it might be useful for someone. code:
FoiledAgain fucked around with this message at 08:51 on Feb 5, 2014 |
# ? Feb 5, 2014 08:48 |
|
QuarkJets posted:Is there an easy way to just save a plot as .png? Is there an easy way to save a plot as .png without displaying it first? I like your interface a lot more than matplotlib's, so I'm interested in using it to actually plot scientific data for papers and such. I too am interested in this, except I want to save PDFs. I'd like to be able to plot and save to PDF without ever displaying it on the screen or having a human in the loop to push a button.
|
# ? Feb 5, 2014 10:29 |
So in Python 3, the cmp argument to the list sort method has been removed, allowing for key only. My understanding is that it takes a function to be called on each element when doing the sort? So I have a list like this: list = [[3, 2], [6, 8], [20, 1], [6, 10], [3, 3], [10, 1]] and I want to sort the elements [x, y] by highest x then highest y. Obviously it makes sense to do list.sort(key=lambda x: x[0]) to sort by the first value, but my brain stops working on how I can nicely sort by the second value. I know that sort() is stable, so I could do: list.sort(key=lambda x: x[1]) list.sort(key=lambda x: x[0]) But I'm upset that I can't just use a cmp function and pass that in one call, it makes less sense with this key malarkey because it can only act on one value?
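One way to look at it: the key function can return a tuple, and tuples compare element by element, so a single key can cover both values. A sketch using the list from the post (renamed to pairs so it doesn't shadow the built-in list), plus functools.cmp_to_key for when an old-style comparison function really is preferred:

```python
from functools import cmp_to_key

pairs = [[3, 2], [6, 8], [20, 1], [6, 10], [3, 3], [10, 1]]

# Highest x first, ties broken by highest y: one tuple key, reversed.
by_key = sorted(pairs, key=lambda p: (p[0], p[1]), reverse=True)


def compare(a, b):
    """Old-style cmp: negative if a should come before b."""
    by_x = (b[0] > a[0]) - (b[0] < a[0])
    return by_x or (b[1] > a[1]) - (b[1] < a[1])


# cmp_to_key wraps the comparison function so sort() accepts it as a key.
by_cmp = sorted(pairs, key=cmp_to_key(compare))

# Mixed directions (highest x, then lowest y) work by negating one field.
mixed = sorted(pairs, key=lambda p: (-p[0], p[1]))
```

Both by_key and by_cmp come out as [[20, 1], [10, 1], [6, 10], [6, 8], [3, 3], [3, 2]].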
|
|
# ? Feb 5, 2014 13:34 |