|
Cingulate posted:Yes, but so does everything that's not "a word document named manuscript_final-version_b_2014_reallyfinal_mkII_d.docx".

You just told us that we should be sharing notebooks instead of outputting to latex, which at least handles the formatting for us. Now you want us to do the formatting manually and export to PDF? What advantage is gained by doing that?

Cingulate posted:Big difference: everyone can see the (potentially bad) code you used to get to the results.

Not really; as has been discussed, notebooks are lovely IDEs, so most of the code probably isn't even on the notebook. And even if it's all there, does this actually make a difference if all of your colleagues are pressing the "code until it works" button just as hard as you? It seems like the best notebooks that I've seen are all designed from the start as educational examples, and I think that's good and effective. I just don't know if the world is ready to throw out the journal article in exchange for the ipython notebook.
|
# ? Sep 25, 2015 02:14 |
|
|
|
QuarkJets posted:You just told us that we should be sharing notebooks instead of outputting to latex, which at least handles the formatting for us. Now you want us to do the formatting manually and export to PDF? What advantage is gained by doing that?

quote:Not really; as has been discussed, notebooks are lovely IDEs, so most of the code probably isn't even on the notebook.

quote:I just don't know if the world is ready to throw out the journal article in exchange for the ipython notebook
|
# ? Sep 25, 2015 02:37 |
|
Emacs Headroom posted:Well that's OK when you're putting your model library into code review and deploying it to your company (or lab's) pypi server

But that's my point. If you're doing intraorganizational code reviews on a pypi server, then what does having a notebook for the analysis portion buy you? Why not use the same code review procedure that was applied to the rest of the project?

quote:

I agree, but I was wondering if other posters did not.
|
# ? Sep 25, 2015 03:48 |
|
Luckily I had my Jupyter notebook handy to calculate the results of this plague:
|
# ? Sep 25, 2015 05:06 |
|
QuarkJets posted:But that's my point.

It gets you a look at the actual analysis and findings? Maybe my view is biased because the code I write is typically signal processing or statistics or models, and those things are applied to real data. You don't check your data into a code repo (especially ~~big data~~), but you can show the results of applying your model or analysis on the available data, make notes about what you see, have embedded graphs, and also a reference for how to use your library code in practice.
|
# ? Sep 25, 2015 05:49 |
|
QuarkJets what kind of science are you doing? Cause my reference is people analysing neuroscientific data (and I would argue, it also translates to what most psychologists, sociologists, and probably most people in any kind of medicine are doing). If you're talking about, I don't know, computer science, or even a part of science where the writing of the code itself may be where the major creative component lies, the matter may be different.
|
# ? Sep 25, 2015 07:49 |
|
Cingulate posted:QuarkJets what kind of science are you doing?

I used to do a lot of particle physics, but these days I'm doing more high resolution astronomy. Both have a lot of max likelihood problems and huge datasets, where running on a single server is impractical, basically. But I can still imagine some use cases where it would make sense to write a short script; if I just want to plot the PSSNR of some data then that would be pretty simple to put in a notebook. But it would also be pretty easy to just write a python script in PyCharm that spits out a nice PDF that I can give to colleagues that haven't even heard of IPython, so I'm still kind of flummoxed on the whole notebook thing.

Emacs Headroom posted:It gets you a look at the actual analysis and findings?

I'm still not getting it. Why should the people reviewing my code give a poo poo about the findings? The purpose of a code review should be to verify that the code does what it's supposed to do, not to review the results of a scientific analysis. In this case all that you need is a definition of the input data, not the input data itself. It can be helpful to use small amounts of known input data to verify outputs, but that's not what we're really talking about.

Likewise, why should the people reviewing the results give a poo poo about the underlying code? When I'm reading a research paper I almost never want to see the underlying code. If I want to reproduce the results of a specific project then I should be writing my own code to do that in order to prevent biases from creeping in, so looking at someone else's notebook is the last thing that I should ever do.

QuarkJets fucked around with this message at 10:09 on Sep 25, 2015 |
# ? Sep 25, 2015 10:05 |
|
Lest people think that I'm just making GBS threads on IPython Notebook, I'm really not. Like look at this thing: http://nbviewer.ipython.org/gist/twiecki/3962843

That's a cool way of showing how code takes an input and produces an output. It's very cool and educational and the instant visualization of everything is really nice. But how was this notebook created? Probably the code was written in PyCharm until things were working right (because I want a powerful IDE when I'm writing code) and then it was copied piece by piece into a notebook to create a cool educational resource. But if I'm writing a research paper (PUBLISH OR PERISH), then what good is the notebook copying step? I'm just going to take all of the cool things on this notebook and dump them to latex anyway, so why not skip copying to a notebook and just dump to latex directly?

As an educational tool, either for teaching Python or scientific Python or specific techniques in Python or any number of other Pythony things, I think these are great. But when someone says "IPython notebooks are invaluable for scientific research" I just kind of scratch my head, because I don't see a huge benefit to using them in most of the research environments that I've encountered. The benefit seems to be in immediate visualization of output, which relies on the dataset being small and the analysis being computationally straightforward. I don't think that this type of situation is commonly found in science, but maybe I'm wrong. I'm used to dealing with datasets so large that Big Data is an understatement.
|
# ? Sep 25, 2015 10:32 |
|
QuarkJets posted:As an educational tool, either for teaching Python or scientific Python or specific techniques in Python or any number of other Pythony things, I think these are great. But when someone says "IPython notebooks are invaluable for scientific research" I just kind of scratch my head, because I don't see a huge benefit to using them in most of the research environments that I've encountered. The benefit seems to be in immediate visualization of output, which relies on the dataset being small and the analysis being computationally straightforward. I don't think that this type of situation is commonly found in science, but maybe I'm wrong. I'm used to dealing with datasets so large that Big Data is an understatement.

As said above, I'm not entirely sold on IPy notebooks, but they aren't restricted to small datasets, immediate results or straightforward computation. IPython has gone to a lot of effort to let you use remote clusters and leave something running to get the answer back later. As for straightforward, I think a lot of people in the biological sciences find IPN useful for ad hoc analyses, where you want to be clear what parameters / which controls you used and whether you trimmed the sequences / dropped outliers / etc. rather than letting some script rip and then later (maybe much later) looking at the results and saying "I have no idea how I got this". (I'm not saying that this is the only way to capture process & parameters, but it's a light-weight and user-friendly way that makes a lot of sense to me.)

nonathlon fucked around with this message at 12:42 on Sep 25, 2015 |
# ? Sep 25, 2015 12:32 |
|
QuarkJets how much of your code is written specifically for each analysis? For reference, most of what I do looks a lot like Thomas Wiecki's analyses, in scope and outcome. (I've in fact used his packages a few times.)

As outlier noted, needing to run distributed computations does not speak against the notebook. However, it is possible that at particularly large scales, the semi-interactive notebook model doesn't bring you much benefit - I think most of what people do on notebooks takes at best a day or so to run. If there is no part of your script where you tend to change a parameter, re-run the analysis, and look at the plot five seconds later, maybe the notebook format doesn't help you.

I really don't think your idea - that people develop in an IDE, and then copy it into the notebook for demonstration - is true. I'm admittedly a lovely, or probably better said, not even a programmer, but most of what I do is typed directly into notebooks. Something on the scope of the notebook you posted, I don't see how that requires a real IDE!
|
# ? Sep 25, 2015 13:05 |
|
QuarkJets posted:That's a cool way of showing how code takes an input and produces an output. It's very cool and educational and the instant visualization of everything is really nice. But how was this notebook created? Probably the code was written in PyCharm until things were working right (because I want a powerful IDE when I'm writing code) and then it was copied piece by piece into a notebook to create a cool educational resource.

Notebooks have a lot of the nice features of an IDE (like autocomplete), and are in some cases a little better, since often the objects are in memory and can be introspected on, as opposed to an IDE which has to rely on janky type annotations and the like. Plus you might be interested in the fact that PyCharm supports IPython notebooks: https://www.jetbrains.com/pycharm/help/ipython-notebook-support.html
|
# ? Sep 25, 2015 13:51 |
|
QuarkJets posted:Why should the people reviewing my code give a poo poo about the findings? The purpose of a code review should be to verify that the code does what it's supposed to do, not to review the results of a scientific analysis.

You might not reject the code because of the notebook, sure. But let's say you get hit by a bus tomorrow, and a new grad student comes in and is asked to take over your galactic formation model or whatnot. Say you have a few models -- one using some multiscale Ising model, one using some fancy Gaussian process, etc. If you had notebooks going along with your code, you could demonstrate how the models are fit (maybe more than one way), how they can be used for simulation, and how you can evaluate them against recorded data (say with your de-biased KL divergence code). Then the new grad student gets an idea of how the algorithms work in practice, not just that your implementation is not buggy.
|
# ? Sep 25, 2015 14:03 |
|
QuarkJets posted:Likewise, why should the people reviewing the results give a poo poo about the underlying code? When I'm reading a research paper I almost never want to see the underlying code. If I want to reproduce the results of a specific project then I should be writing my own code to do that in order to prevent biases from creeping in, so looking at someone else's notebook is the last thing that I should ever do.

They might want to throw their own data at your code?
|
# ? Sep 25, 2015 17:28 |
|
Looking for advice on matplotlib date formatting. I'd like to plot a timeseries of data, and a filtered result of the data, and have the x axis display dates in intervals. I used this page as a guide. In the code below, I'm unable to plot the second data set, xs, without the result appearing corrupt. Additionally, the data formatter's not working; the label format is showing as the default. data_series is a pandas time series, and data is its numpy version. xs is a numpy array. Python code:
Dominoes fucked around with this message at 23:53 on Sep 25, 2015 |
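Since the code block didn't survive the quote, here is a minimal, hypothetical sketch of the setup being described — a raw series and a filtered series plotted against the same date axis, with an explicit tick locator and formatter. The names data_series and xs come from the post; all the data and parameters here are made up. Two common causes of the symptoms described: plotting xs on its own (so it lands on an integer axis instead of the date axis), and creating a formatter without attaching it via ax.xaxis.set_major_formatter.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless backend so this sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical stand-ins for the post's data: a pandas time series
# and a filtered numpy array of the same length.
dates = pd.date_range('2015-01-01', periods=120, freq='D')
data_series = pd.Series(np.random.randn(120).cumsum(), index=dates)
xs = data_series.rolling(window=7).mean().values  # the "filtered result"

fig, ax = plt.subplots()

# Key point: plot BOTH series against the same x values (the dates).
# Plotting xs alone would put it on a 0..N integer axis and the two
# curves would not line up.
ax.plot(data_series.index, data_series.values, label='raw')
ax.plot(data_series.index, xs, label='filtered')

# The locator controls WHERE ticks go; the formatter controls HOW they
# render. A formatter that is never attached to the axis leaves the
# default label format in place.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))

ax.legend()
fig.autofmt_xdate()
fig.savefig('series.png')
```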
# ? Sep 25, 2015 18:38 |
|
Have you tried pandas' own plotting faculties (e.g. data_series.plot())?
|
# ? Sep 25, 2015 19:11 |
Python 2 question. Say I have my_module.py and a task.py I use to do things with my module. Is there a convenient way to implement a function in my_module.py that I could invoke at the end of task.py so that it produces a session.py file which contains both my_module.py and task.py inside it?

Edit: Also, is there a convenient way to implement a function in my_module.py that I could invoke at the end of task.py which would export, to a text file or to the same session.py mentioned before, a list with the Python version and the packages' names/versions used during the execution of task.py?

cinci zoo sniper fucked around with this message at 20:40 on Sep 25, 2015 |
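There's no standard mechanism for this, but one way to do both at once is a small helper that concatenates the source files and appends a provenance comment block, using only the stdlib. This is a sketch under assumptions (the function name and file layout are invented); it records versions only for top-level modules that happen to expose a __version__ attribute, which covers most scientific packages but not everything:

```python
import sys

def dump_session(out_path, source_files):
    """Concatenate the given source files into one 'session' file, then
    append a comment block recording the interpreter version and the
    versions of currently imported top-level packages."""
    with open(out_path, 'w') as out:
        for path in source_files:
            out.write('# --- begin %s ---\n' % path)
            with open(path) as src:
                out.write(src.read())
            out.write('\n# --- end %s ---\n' % path)

        # Provenance: interpreter version, then imported package versions.
        out.write('\n# Python %s\n' % sys.version.replace('\n', ' '))
        for name, module in sorted(list(sys.modules.items())):
            version = getattr(module, '__version__', None)
            if version is not None and '.' not in name:
                out.write('# %s == %s\n' % (name, version))

# Typical use at the very end of task.py:
# dump_session('session.py', ['my_module.py', 'task.py'])
```

Since it's plain string formatting and file I/O, the same code runs under Python 2 and 3.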
|
# ? Sep 25, 2015 20:34 |
|
Cingulate posted:Have you tried pandas' own plotting faculties (e.g. data_series.plot())?
|
# ? Sep 25, 2015 23:52 |
|
Emacs Headroom posted:Notebooks have a lot of the nice features of an IDE (like autocomplete), and are in some cases a little better, since often the objects are in memory and can be introspected on, as opposed to an IDE which has to rely on janky type annotations and the like.

Spyder, the IDE included with Anaconda, can do introspection and also provides an ipython console.

Munkeymon posted:They might want to throw their own data at your code?

...why can't they do that with a regular Python script? There's nothing magical about ipynb that allows people to provide different inputs and regenerate output that can't be done with regular scripts/modules/IDEs. Still agreeing with QJ for my HPC workflow too! Can't even possibly do some of the work things I do with Python modules in a ipynb. Notebooks might be useful for short tasks or teaching but no need to use them for everything.
|
# ? Sep 26, 2015 00:31 |
|
I am trying out the Project Euler problem sets to learn some python and I can't for the life of me get an output I want from some functions I made

code:

I only get a result of none out of firstprimedivisor(). It seems like I do not know how to return values from functions that are using for loops correctly... Is there anything super obvious? I was able to do this with recursion but it falls apart with a very small input.

Demonachizer fucked around with this message at 20:57 on Sep 26, 2015 |
# ? Sep 26, 2015 20:47 |
|
Demonachizer posted:I only get a result of none out of firstprimedivisor(). It seems like I do not know how to return values from functions that are using for loops correctly... Is there anything super obvious?

Interestingly, your function works correctly in Python 3 but not Python 2. If you are trying to run this in 2.x then your n/i == int(n/i) doesn't make any sense, because dividing two integers in 2.x returns the truncated answer instead of a floating point number, so the comparison is always true.

edit: You can from __future__ import division and avoid this problem in 2.x, or you could change your math to use the modulo operator or something. Also, in general it is preferred to use True/False instead of 1/0 for something like checkprime, and to name it something like is_prime, but those are just style things.

OnceIWasAnOstrich fucked around with this message at 21:29 on Sep 26, 2015 |
# ? Sep 26, 2015 21:25 |
|
n % i == 0 is a better way to check for divisibility.
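Putting the two pieces of advice together, a version of the two helpers under discussion might look like this. The original code isn't shown, so this is a sketch, not a fix of the actual functions; it uses modulo (which behaves identically in Python 2 and 3) and returns True/False:

```python
def is_prime(n):
    """Return True if n is prime; n % i avoids the 2.x division trap."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:  # same result in Python 2 and Python 3
            return False
        i += 1
    return True

def first_prime_divisor(n):
    """Return the smallest prime factor of n (n itself if n is prime)."""
    i = 2
    while i * i <= n:
        if n % i == 0:
            return i  # return from inside the loop, not after it,
                      # or the function falls through and returns None
        i += 1
    return n
```

The comment in first_prime_divisor is the usual cause of a function that "only returns None": the return statement sits after the loop instead of inside it.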
|
# ? Sep 26, 2015 21:32 |
|
OK, I'm running into an issue I'm just not getting. pexpect is dumping out on me because of a weird error regarding the timeout value. It dumps out every time at 'i = device.expect(self.password_prompt, self.en_prompt)' with the following error:

code:
Python code:
Python code:
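Without the full traceback it's hard to be sure, but one likely culprit: expect() takes a single pattern or a list of patterns as its first argument, and its second positional argument is timeout — so device.expect(self.password_prompt, self.en_prompt) hands the second prompt to pexpect as the timeout value. A sketch of the difference, using a trivial 'cat' child so it's self-contained (the prompt strings are made up):

```python
import pexpect

# 'cat' echoes whatever we send, which is enough to demonstrate expect().
child = pexpect.spawn('cat', encoding='utf-8')
child.sendline('hello')

# Wrong (the shape from the post): two patterns as two positional
# arguments -- the second one is interpreted as the timeout.
# i = child.expect('hello', 'world')

# Right: one list of patterns; expect() returns the index of the match.
i = child.expect(['world', 'hello'], timeout=5)
print(i)
child.close()
```

With the real code that would be i = device.expect([self.password_prompt, self.en_prompt]), then branching on whether i is 0 or 1.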
|
# ? Sep 26, 2015 21:43 |
|
'ST posted:Just add each value for "ip" into a running set: https://docs.python.org/3/library/stdtypes.html?highlight=set#set-types-set-frozenset

Thanks for the suggestion - I'm rather new to Python (and my algorithm-fu is lackluster), so the help is appreciated!
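For anyone following along, the suggestion amounts to something like this (the record layout is made up, since the original code isn't shown):

```python
# Collect unique IPs from a stream of records; a set ignores
# duplicates automatically, so no membership checks are needed.
records = [
    {'ip': '10.0.0.1'},
    {'ip': '10.0.0.2'},
    {'ip': '10.0.0.1'},  # duplicate, silently dropped by the set
]

seen_ips = set()
for record in records:
    seen_ips.add(record['ip'])

print(sorted(seen_ips))  # ['10.0.0.1', '10.0.0.2']
```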
|
# ? Sep 26, 2015 22:02 |
|
The March Hare posted:I'm doing the Coursera Algorithms 1 class in order to (hopefully) get myself a bit more comfortable in interviews, and also because I think it is probably worth doing. That said, the course is in Java, which I don't know or really care to learn, so I'm doing it in Python instead. Because of this, however, I don't really have a feedback option for my implementation of the homework stuff. If any of you guys have some free time, I'd appreciate some crits.

Posted about this ~a month ago, got some good feedback, thank you dudes :~). The course I was in was ending though, so I had to wait a bit for it to start back up again so I could access the new material. I just finished week 2, which I found a bit easier than week 1, but I'm still hoping for some feedback!

I've done away w/ getters and setters for week 2 (I'm familiar w/ them, and have an understanding of properties in Python anyway, so I feel OK with "cheating" there.) But otherwise have tried to stick pretty closely to the spec, including implementing resizing arrays using lists as the base container, even though it felt really dumb while I was doing it. This week was Deques and Randomized Queues. I opted to use a doubly linked list for the Deque implementation, and a resizing array for Randomized Queues. I'm pretty open to feedback on absolutely anything, but especially if I have obviously hosed something up or misunderstood it.

My code is here https://github.com/LarryBrid/algos/tree/master/Week%202%20-%20Randomized%20Queues%20and%20Deques and the assignment spec is here http://coursera.cs.princeton.edu/algs4/assignments/queues.html -- TIA for any feedback, it's a big help and I'll buy you all beer once I get a job :~).

The March Hare fucked around with this message at 20:42 on Sep 27, 2015 |
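For reference on the design choice mentioned above, the reason a doubly linked list suits a Deque is that every operation the spec requires is O(1) at either end. A minimal sketch of that approach (not The March Hare's code; names and error handling are assumptions):

```python
class _Node(object):
    __slots__ = ('item', 'prev', 'next')
    def __init__(self, item):
        self.item = item
        self.prev = None
        self.next = None

class Deque(object):
    """Doubly linked list deque: O(1) add/remove at both ends."""
    def __init__(self):
        self.first = None
        self.last = None
        self.n = 0

    def add_first(self, item):
        node = _Node(item)
        if self.first is None:
            self.first = self.last = node
        else:
            node.next = self.first
            self.first.prev = node
            self.first = node
        self.n += 1

    def add_last(self, item):
        node = _Node(item)
        if self.last is None:
            self.first = self.last = node
        else:
            node.prev = self.last
            self.last.next = node
            self.last = node
        self.n += 1

    def remove_first(self):
        if self.first is None:
            raise IndexError('deque is empty')
        node = self.first
        self.first = node.next
        if self.first is None:
            self.last = None       # deque became empty
        else:
            self.first.prev = None
        self.n -= 1
        return node.item

    def remove_last(self):
        if self.last is None:
            raise IndexError('deque is empty')
        node = self.last
        self.last = node.prev
        if self.last is None:
            self.first = None      # deque became empty
        else:
            self.last.next = None
        self.n -= 1
        return node.item
```

The two "deque became empty" branches are the spots that are easiest to get wrong, since removing the last node has to clear both end pointers.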
# ? Sep 27, 2015 20:39 |
|
Oh hey, I'm doing that course too right now, also my first experience with Java. Just finished debugging the Week 3 assignment to 100%, significantly tougher than the other ones I thought. I was also tempted to do it in Python and then translate but stuck with the Java. I always liked being able to just spit a command at the Python interpreter and see what comes out but never really appreciated how much easier it is to debug. I don't mind static typing and semicolons and stuff like that but I don't think I can live without the REPL.
|
# ? Sep 27, 2015 23:55 |
|
KICK BAMA KICK posted:Oh hey, I'm doing that course too right now, also my first experience with Java. Just finished debugging the Week 3 assignment to 100%, significantly tougher than the other ones I thought. I was also tempted to do it in Python and then translate but stuck with the Java. I always liked being able to just spit a command at the Python interpreter and see what comes out but never really appreciated how much easier it is to debug. I don't mind static typing and semicolons and stuff like that but I don't think I can live without the REPL.

Sup classmate~ I considered doing it in Java, but I've never used it before and it seemed like learning Java and the course material at the same time would probably have been worse than doing it in something I'm already familiar w/ and have a nice development environment set up for. I've done a few other online classes, and this is probably the best one I've taken so far. It's super clear and feels like it has a really good pace to it.
|
# ? Sep 28, 2015 00:29 |
|
KICK BAMA KICK posted:I always liked being able to just spit a command at the Python interpreter and see what comes out but never really appreciated how much easier it is to debug. I don't mind static typing and semicolons and stuff like that but I don't think I can live without the REPL.

/derail Are you using IntelliJ? Do so if you're not already, and run in debug (click your breakpoints wherever). You can look inside objects, step into and over functions, etc. It's fairly easy to debug Java as long as you're using an IDE. Once you're used to that, look into Scala -- you get the interactive notebook / REPL, static typing, compile time checking, and functional stuff all at once.
|
# ? Sep 28, 2015 00:42 |
|
Emacs Headroom posted:Are you using IntelliJ?

Have always wanted to check out a functional language too, will look into Scala when I get a chance.
|
# ? Sep 28, 2015 01:38 |
|
pmchem posted:...why can't they do that with a regular Python script? There's nothing magical about ipynb that allows people to provide different inputs and regenerate output that can't be done with regular scripts/modules/IDEs.

QJ was arguing against sharing code at all, not about how it's shared, and that bugs me purely from a consumer of scientific output standpoint, because it's a great way to accidentally hide errors that affect results.
|
# ? Sep 28, 2015 15:29 |
|
Munkeymon posted:QJ was arguing against sharing code at all, not about how it's shared and that bugs me purely from a consumer of scientific output standpoint because it's a great way to accidentally hide errors that affect results.

But that's *not* what he's saying? Unless I completely missed something elsewhere, he even mentioned having the code available for review to make sure the methodologies are appropriate and correct. He's just saying he doesn't see the point of a Notebook at all and it's sloppier than using a formally coded program.
|
# ? Sep 28, 2015 15:52 |
|
flosofl posted:But that's *not* what he's saying? Unless I completely missed something elsewhere, he even mentioned having the code available for review to make sure the methodologies are appropriate and correct. He's just saying he doesn't see the point of a Notebook at all and it's sloppier than using a formally coded program.

Yeah, maybe I was reading that paragraph in isolation rather than proper context, but that's what it seemed like at the time
|
# ? Sep 28, 2015 15:57 |
|
Munkeymon posted:Yeah, maybe I was reading that paragraph in isolation rather than proper context, but that's what it seemed like at the time

This happens to me. People will be arguing about something I don't give a poo poo about so I'll skim it and a paragraph will catch my eye and I'll be like "WHAT THE gently caress, YOU'RE WRONG MISTER" and then proceed to miss the point.
|
# ? Sep 28, 2015 17:56 |
|
Munkeymon posted:QJ was arguing against sharing code at all, not about how it's shared and that bugs me purely from a consumer of scientific output standpoint because it's a great way to accidentally hide errors that affect results.

flosofl posted:But that's *not* what he's saying? Unless I completely missed something elsewhere, he even mentioned having the code available for review to make sure the methodologies are appropriate and correct. He's just saying he doesn't see the point of a Notebook at all and it's sloppier than using a formally coded program.

flosofl understands what I'm trying to say. Code review is useful and should be a more central part of the scientific process, because it's lovely if bad code results in false positives or false negatives. But sometimes you want to test a methodology, rather than a specific implementation, and in those cases you should write your own code. This is analogous to building an experimental apparatus (writing code) vs using someone else's apparatus (receiving code).

To me, the question of whether or not notebooks enhance any of that does not have a clear answer. For code review, you need to share your entire code package anyway, so putting some of it in a notebook just complicates the process. I can see the appeal of having example code in a notebook for testing and educational purposes. For instance, it seems like it'd be cool to include a .ipynb with a github project, but hopefully you've written a nice README that accomplishes the same thing, so at that point the .ipynb feels kind of superfluous. Maybe you can use a .ipynb to more easily generate the README and then include both?
|
# ? Sep 29, 2015 05:23 |
|
Don't you ever just want to poke around some data and check out what it looks like? Maybe look at it, plot a thing, flip it upside down, plot another thing, etc, etc? Notebooks are great for that.
|
# ? Sep 29, 2015 22:47 |
|
Nippashish posted:Don't you ever just want to poke around some data and check out what it looks like? Maybe look at it, plot a thing, flip it upside down, plot another thing, etc, etc? Notebooks are great for that.

You don't need a notebook for that, though. Other things are great for it too. One example is using Spyder in Anaconda and running code which executes matplotlib commands after data analysis and displays them in the ipython console. You could have a bash script that runs a command-line python script which renders plots and then displays them afterward. Notebooks are certainly convenient for some things, but for your example I think Spyder works equally well (if not better).
|
# ? Sep 29, 2015 23:12 |
|
pmchem posted:You don't need a notebook for that, though. Other things are great for it too. One example is using Spyder in Anaconda and running code which executes matplotlib commands after data analysis and displays them in the ipython console. You could have a bash script that runs a command-line python script which renders plots and then displays them afterward.

But then, everybody who only wants to look at the results needs the full data, and the used libs, and the computing power.
|
# ? Sep 30, 2015 00:02 |
|
Cingulate posted:But then, everybody who only wants to look at the results needs the full data, and the used libs, and the computing power.

I must be missing something, so your basic argument is that the notebook saves some data, like a ndarray object or whatever (take your pick of data object), and then you can plot it how you want? I mean if that's really what you want to do, you can just save the data to a file and read it and plot it in a myriad of ways?
|
# ? Sep 30, 2015 00:08 |
|
pmchem posted:I must be missing something, so your basic argument is that the notebook saves some data, like a ndarray object or whatever (take your pick of data object), and then you can plot it how you want?

The notebook contains the plots.
|
# ? Sep 30, 2015 00:27 |
|
Cingulate posted:The notebook contains the plots.

Yeah, that falls under the "take your pick of data object", as it is an object storing the data that is plotted. That presumably only includes the data that was passed to the object at time of plot creation and not arbitrary previously calculated raw data. So, if you created a plot based on a slice of a list or ndarray you'd only have access to that slice in the plot object (yes?). If that is right, it's actually more limited than a traditional workflow where you dump the data to a file -- it may be a large amount of raw data -- and then manipulate later, plotting as you please. Unless, of course, you're going to recalculate the raw data and at that point you're no better off than doing it the traditional way either.

Again, just like QJ, I'm failing to see a real advantage to the notebook workflow other than (a) personal preference because you're used to doing it that way or (b) educational. Can someone link a notebook that's doing something useful enough that I'll want to adopt it into my daily workflow?
|
# ? Sep 30, 2015 00:42 |
|
|
|
A few more advantages of notebooks (a non-exhaustive list):

1. Very quick iteration time -- instead of "edit script, save script, run script, look at plots, repeat", it's "edit cell, hit shift-enter, repeat"
2. Collaboration -- you can send notebooks to technical people, or send html output to not-so-technical people, or run a jupyter server and edit collaboratively
3. Widgets and whatnot -- you can quite easily make interactive javascript to let people (or yourself) mess around with different views of the data

But at the end of the day they're a tool. If you insist that you don't like them as a tool, or that the tools you have are better, we're not your boss and that's fine.
|
# ? Sep 30, 2015 04:26 |