QuarkJets
Sep 8, 2008

Cingulate posted:

Yes, but so does everything that's not "a word document named manuscript_final-version_b_2014_reallyfinal_mkII_d.docx".

And you can easily export the notebook to a PDF.

You just told us that we should be sharing notebooks instead of outputting to latex, which at least handles the formatting for us. Now you want us to do the formatting manually and export to PDF? What advantage is gained by doing that?

Cingulate posted:

Big difference: everyone can see the (potentially bad) code you used to get to the results.

Not really; as has been discussed, notebooks are lousy IDEs, so most of the code probably isn't even in the notebook. And even if it's all there, does this actually make a difference if all of your colleagues are pressing the "code until it works" button just as hard as you?

It seems like the best notebooks that I've seen are all designed from the start as educational examples, and I think that's good and effective. I just don't know if the world is ready to throw out the journal article in exchange for the ipython notebook

Emacs Headroom
Aug 2, 2003

QuarkJets posted:

You just told us that we should be sharing notebooks instead of outputting to latex, which at least handles the formatting for us. Now you want us to do the formatting manually and export to PDF? What advantage is gained by doing that?
IPython / Jupyter will actually handle markdown and LaTeX for nice formatting
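For instance (toy snippet; a markdown cell does the same thing with no code at all):

Python code:
# Rendering markdown / LaTeX inline in a notebook; IPython.display ships with IPython.
from IPython.display import Markdown, Latex, display

display(Markdown("## Results\nThe fit converged after **12** iterations (made-up number)."))
display(Latex(r"$\hat{\beta} = 0.42 \pm 0.05$"))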

quote:

Not really; as has been discussed, notebooks are lousy IDEs, so most of the code probably isn't even in the notebook.
Well that's OK when you're putting your model library into code review and deploying it to your company's (or lab's) pypi server

quote:

I just don't know if the world is ready to throw out the journal article in exchange for the ipython notebook
A journal article is a very different medium. They don't replace each other, any more than the existence of Twitter means we throw out the New York Times.

QuarkJets
Sep 8, 2008

Emacs Headroom posted:

Well that's OK when you're putting your model library into code review and deploying it to your company's (or lab's) pypi server

But that's my point.

If you're doing intraorganizational code reviews on a pypi server, then what does having a notebook for the analysis portion buy you? Why not use the same code review procedure that was applied to the rest of the project?

quote:


A journal article is a very different medium. They don't replace each other, any more than the existence of Twitter means we throw out the New York Times.

I agree, but I was wondering if other posters did not.

ahmeni
May 1, 2005

It's one continuous form where hardware and software function in perfect unison, creating a new generation of iPhone that's better by any measure.
Grimey Drawer
Luckily I had my Jupyter notebook handy to calculate the results of this plague:

[post attachment not shown]

Emacs Headroom
Aug 2, 2003

QuarkJets posted:

But that's my point.

If you're doing intraorganizational code reviews on a pypi server, then what does having a notebook for the analysis portion buy you? Why not use the same code review procedure that was applied to the rest of the project?

It gets you a look at the actual analysis and findings?

Maybe my view is biased because the code I write is typically signal processing or statistics or models, and those things are applied to real data. You don't check your data into a code repo (especially ~~big data~~), but you can show the results of applying your model or analysis on the available data, make notes about what you see, have embedded graphs, and also a reference for how to use your library code in practice.
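To make that concrete, here's a toy sketch of the kind of cell I mean (fake data, and moving_average standing in for whatever the real library routine is):

Python code:
# Hypothetical notebook cell: run the library code on data and keep the plot next to the notes.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 10, 500)) + 0.3 * rng.standard_normal(500)

def moving_average(x, window=25):
    # stand-in for a routine from the reviewed library
    return np.convolve(x, np.ones(window) / window, mode='same')

plt.plot(signal, alpha=0.4, label='raw data')
plt.plot(moving_average(signal), label='model output')
plt.legend()
plt.show()  # the figure stays embedded in the notebook, right under the markdown notes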

Cingulate
Oct 23, 2012

by Fluffdaddy
QuarkJets what kind of science are you doing?

Cause my reference is people analysing neuroscientific data (and I would argue, it also translates to what most psychologists, sociologists, and probably most people in any kind of medicine are doing).
If you're talking about, I don't know, computer science, or even a part of science where the writing of the code itself may be where the major creative component lies, the matter may be different.

QuarkJets
Sep 8, 2008

Cingulate posted:

QuarkJets what kind of science are you doing?

Cause my reference is people analysing neuroscientific data (and I would argue, it also translates to what most psychologists, sociologists, and probably most people in any kind of medicine are doing).
If you're talking about, I don't know, computer science, or even a part of science where the writing of the code itself may be where the major creative component lies, the matter may be different.

I used to do a lot of particle physics, but these days I'm doing more high-resolution astronomy. Both involve a lot of max-likelihood problems and huge datasets where running on a single server is basically impractical. But I can still imagine some use cases where it would make sense to write a short script; if I just want to plot the PSSNR of some data, that would be pretty simple to put in a notebook. But it would also be pretty easy to just write a Python script in PyCharm that spits out a nice PDF I can give to colleagues who haven't even heard of IPython, so I'm still kind of flummoxed by the whole notebook thing.

Emacs Headroom posted:

It gets you a look at the actual analysis and findings?

Maybe my view is biased because the code I write is typically signal processing or statistics or models, and those things are applied to real data. You don't check your data into a code repo (especially ~~big data~~), but you can show the results of applying your model or analysis on the available data, make notes about what you see, have embedded graphs, and also a reference for how to use your library code in practice.

I'm still not getting it.

Why should the people reviewing my code care about the findings? The purpose of a code review should be to verify that the code does what it's supposed to do, not to review the results of a scientific analysis. In this case all you need is a definition of the input data, not the input data itself. It can be helpful to use small amounts of known input data to verify outputs, but that's not what we're really talking about.

Likewise, why should the people reviewing the results care about the underlying code? When I'm reading a research paper I almost never want to see the underlying code. If I want to reproduce the results of a specific project then I should be writing my own code to do that in order to prevent biases from creeping in, so looking at someone else's notebook is the last thing that I should ever do.

QuarkJets fucked around with this message at 10:09 on Sep 25, 2015

QuarkJets
Sep 8, 2008

Lest people think that I'm just crapping on IPython Notebook, I'm really not. Like, look at this thing: http://nbviewer.ipython.org/gist/twiecki/3962843

That's a cool way of showing how code takes an input and produces an output. It's very cool and educational and the instant visualization of everything is really nice. But how was this notebook created? Probably the code was written in PyCharm until things were working right (because I want a powerful IDE when I'm writing code) and then it was copied piece by piece into a notebook to create a cool educational resource. But if I'm writing a research paper (PUBLISH OR PERISH), then what good is the notebook copying step? I'm just going to take all of the cool things on this notebook and dump them to latex anyway, so why not skip copying to a notebook and just dump to latex directly?

As an educational tool, either for teaching Python or scientific Python or specific techniques in Python or any number of other Pythony things, I think these are great. But when someone says "IPython notebooks are invaluable for scientific research" I just kind of scratch my head, because I don't see a huge benefit to using them in most of the research environments that I've encountered. The benefit seems to be in immediate visualization of output, which relies on the dataset being small and the analysis being computationally straightforward. I don't think that this type of situation is commonly found in science, but maybe I'm wrong. I'm used to dealing with datasets so large that Big Data is an understatement.

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

QuarkJets posted:

As an educational tool, either for teaching Python or scientific Python or specific techniques in Python or any number of other Pythony things, I think these are great. But when someone says "IPython notebooks are invaluable for scientific research" I just kind of scratch my head, because I don't see a huge benefit to using them in most of the research environments that I've encountered. The benefit seems to be in immediate visualization of output, which relies on the dataset being small and the analysis being computationally straightforward. I don't think that this type of situation is commonly found in science, but maybe I'm wrong. I'm used to dealing with datasets so large that Big Data is an understatement.

As said above, I'm not entirely sold on IPy notebooks, but they aren't restricted to small datasets, immediate results or straightforward computation. IPython has gone to a lot of effort to let you use remote clusters and leave something running to get the answer back later. As for straightforward, I think a lot of people in the biological sciences find IPN useful for ad hoc analyses, where you want to be clear what parameters / which controls you used and whether you trimmed the sequences / dropped outliers / etc., rather than letting some script rip and then later (maybe much later) looking at the results and saying "I have no idea how I got this".

(I'm not saying that this is the only way to capture process & parameters, but it's a light-weight and user-friendly way that makes a lot of sense to me.)
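For the remote-cluster point, the sort of thing I mean looks roughly like this (untested sketch; assumes an ipcluster is already running and that ipyparallel, formerly IPython.parallel, is installed):

Python code:
# Sketch only: submit work to an already-running cluster from the notebook,
# walk away, and collect the answers later in the same notebook.
from ipyparallel import Client

rc = Client()                      # connect to the running cluster
view = rc.load_balanced_view()

def analyse(sample_id):
    # stand-in for the real per-sample analysis
    return sample_id ** 2

async_result = view.map_async(analyse, range(1000))
# ...come back later...
results = async_result.get()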

nonathlon fucked around with this message at 12:42 on Sep 25, 2015

Cingulate
Oct 23, 2012

by Fluffdaddy
QuarkJets how much of your code is written specifically for each analysis?

For reference, most of what I do looks a lot like Thomas Wiecki's analyses, in scope and outcome. (I've in fact used his packages a few times.)

As outlier noted, needing to run distributed computations does not speak against the notebook. However, it is possible that at particularly large scales, the semi-interactive notebook model doesn't bring you much benefit - I think most of what people do in notebooks takes at most a day or so to run. If there is no part of your script where you tend to change a parameter, re-run the analysis, and look at the plot five seconds later, then maybe the notebook format doesn't help you.

I really don't think your idea - that people develop in an IDE, and then copy it into the notebook for demonstration - is true. I'm admittedly a bad programmer, or probably better said, not a programmer at all, but most of what I do is typed directly into notebooks. And for something on the scale of the notebook you posted, I don't see how that would require a real IDE!

Emacs Headroom
Aug 2, 2003

QuarkJets posted:

That's a cool way of showing how code takes an input and produces an output. It's very cool and educational and the instant visualization of everything is really nice. But how was this notebook created? Probably the code was written in PyCharm until things were working right (because I want a powerful IDE when I'm writing code) and then it was copied piece by piece into a notebook to create a cool educational resource.

Notebooks have a lot of the nice features of an IDE (like autocomplete), and are in some cases a little better, since the objects are often in memory and can be introspected directly, as opposed to an IDE which has to rely on janky type annotations and the like.

Plus you might be interested in the fact that PyCharm supports IPython notebooks: https://www.jetbrains.com/pycharm/help/ipython-notebook-support.html
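e.g. the kind of poking at live objects I mean (the ? and %timeit lines are IPython-only syntax, so this only runs in a notebook or IPython shell):

Python code:
import numpy as np
a = np.random.rand(1000)

a?               # type, shape and docstring of the actual object in memory
np.convolve?     # same for any function you're about to call
%timeit a.sum()  # quick timing without leaving the session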

Emacs Headroom
Aug 2, 2003

QuarkJets posted:

Why should the people reviewing my code care about the findings? The purpose of a code review should be to verify that the code does what it's supposed to do, not to review the results of a scientific analysis.

You might not reject the code because of the notebook, sure. But let's say you get hit by a bus tomorrow, and a new grad student comes in and is asked to take over your galactic formation model or whatnot. Say you have a few models -- one using some multiscale Ising model, one using some fancy Gaussian process, etc. If you had notebooks going along with your code, you could demonstrate how the models are fit (maybe more than one way), how they can be used for simulation, and how you can evaluate them against recorded data (say with your de-biased KL divergence code). Then the new grad student gets an idea of how the algorithms work in practice, not just that your implementation is not buggy.
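Very roughly, the skeleton of such a notebook might look like this (every name and number below is made up; it's just the shape of the thing):

Python code:
# Hypothetical handover notebook: fit, simulate, evaluate -- with toy stand-ins.
import numpy as np
from scipy.stats import entropy

rng = np.random.default_rng(42)
t = np.linspace(0, 1, 200)
recorded = 2.0 * t + 0.1 * rng.standard_normal(t.size)   # stand-in for recorded data

# 1. how the model is fit
coeffs = np.polyfit(t, recorded, deg=1)

# 2. how it can be used for simulation
simulated = np.polyval(coeffs, t) + 0.1 * rng.standard_normal(t.size)

# 3. how it's evaluated against recorded data (histogram KL divergence as a toy metric)
p, edges = np.histogram(recorded, bins=20, density=True)
q, _ = np.histogram(simulated, bins=edges, density=True)
print("KL(recorded || simulated) =", entropy(p + 1e-12, q + 1e-12))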

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



QuarkJets posted:

Likewise, why should the people reviewing the results care about the underlying code? When I'm reading a research paper I almost never want to see the underlying code. If I want to reproduce the results of a specific project then I should be writing my own code to do that in order to prevent biases from creeping in, so looking at someone else's notebook is the last thing that I should ever do.

They might want to throw their own data at your code?

Dominoes
Sep 20, 2007

Looking for advice on matplotlib date formatting. I'd like to plot a timeseries of data, and a filtered result of the data, and have the x axis display dates in intervals. I used this page as a guide. In the code below, I'm unable to plot the second data set, xs, without the result appearing corrupt. Additionally, the data formatter's not working; the label format is showing as the default. data_series is a pandas time series, and data is its numpy version. xs is a numpy array.


Python code:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

months = mdates.MonthLocator()

fig, ax = plt.subplots()

ax.plot(data_series.index, data, 'k.')  # Original data. Plots OK with the below line commented out.
ax.plot(xs[:, 0], 'b-')  # Plotting this in addition to the line above screws up the plot. With or without plotting it against the index.

ax.xaxis.set_minor_locator(months)
ax.format_xdata = mdates.DateFormatter('%m-%d') # Not having any effect.
       
plt.show()
Note that running plt.plot(data), plt.plot(xs[:, 0]), plt.show() displays the data correctly, but with a not-useful x-axis scale.

Dominoes fucked around with this message at 23:53 on Sep 25, 2015

Cingulate
Oct 23, 2012

by Fluffdaddy
Have you tried pandas' own plotting faculties (e.g. data_series.plot())?
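Or with plain matplotlib, something like this might do it (untested sketch with fake data standing in for your data_series and xs; the key points are plotting xs against the same date index and setting the major formatter, since format_xdata only changes the cursor readout):

Python code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Fake stand-ins for your data_series / xs -- swap in your own.
idx = pd.date_range('2015-01-01', periods=180, freq='D')
data_series = pd.Series(np.random.randn(180).cumsum(), index=idx)
xs = np.convolve(data_series.values, np.ones(10) / 10, mode='same').reshape(-1, 1)

fig, ax = plt.subplots()
ax.plot(data_series.index, data_series.values, 'k.', label='data')
ax.plot(data_series.index, xs[:, 0], 'b-', label='filtered')  # same date index for both

ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))  # tick labels come from the major formatter
ax.legend()
plt.show()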

cinci zoo sniper
Mar 15, 2013




Python 2 question. Say I have my_module.py and a task.py that I use to do things with my module. Is there a convenient way to implement a function in my_module.py that I could invoke at the end of task.py so that it produces a session.py file containing both my_module.py and task.py?

Edit: Also, is there a convenient way to implement a function in my_module.py, invoked at the end of task.py, which would export, to a text file or to the same session.py mentioned before, the Python version and the names/versions of the packages used during the execution of task.py?
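Roughly the kind of helpers I have in mind, in case it clarifies the question (untested sketch; it records installed packages rather than strictly the ones task.py imported):

Python code:
import sys
import pkg_resources

def dump_session(source_files, out_path='session.py'):
    # concatenate my_module.py and task.py into one session.py
    with open(out_path, 'w') as out:
        for path in source_files:
            out.write('# ---- %s ----\n' % path)
            with open(path) as src:
                out.write(src.read() + '\n')

def dump_environment(out_path='environment.txt'):
    # record the interpreter version and installed package versions
    with open(out_path, 'w') as out:
        out.write('Python %s\n' % sys.version)
        for dist in sorted(pkg_resources.working_set, key=lambda d: d.project_name.lower()):
            out.write('%s==%s\n' % (dist.project_name, dist.version))

dump_session(['my_module.py', 'task.py'])
dump_environment()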

cinci zoo sniper fucked around with this message at 20:40 on Sep 25, 2015

Dominoes
Sep 20, 2007

Cingulate posted:

Have you tried pandas' own plotting faculties (e.g. data_series.plot())?
That plots the dates of that data, but I'm unable to get it to plot alongside separate data; xs in my example.

pmchem
Jan 22, 2010


Emacs Headroom posted:

Notebooks have a lot of the nice features of an IDE (like autocomplete), and are in some cases a little better, since the objects are often in memory and can be introspected directly, as opposed to an IDE which has to rely on janky type annotations and the like.

Spyder, the IDE included with Anaconda, can do introspection and also provides an ipython console.

Munkeymon posted:

They might want to throw their own data at your code?

...why can't they do that with a regular Python script? There's nothing magical about ipynb that allows people to provide different inputs and regenerate output that can't be done with regular scripts/modules/IDEs.

Still agreeing with QJ for my HPC workflow too! Some of the things I do with Python modules couldn't possibly be done in an ipynb. Notebooks might be useful for short tasks or teaching, but there's no need to use them for everything.

Demonachizer
Aug 7, 2004
I am trying out the Project Euler problem sets to learn some Python, and I can't for the life of me get the output I want from some functions I made:
code:
import sys
import math

inputvalue = int(sys.argv[1])

def firstprimedivisor(n):
    for i in range (2, n):
        if n/i == int(n/i) and checkprime(i):
            return i
            break

def checkprime(n):
    stop=int(math.ceil(math.sqrt(n)))
    for i in range (2, stop):
        if n/i == int(n/i):
            return 0
        else:
            return 1

def hpvfinder(n, out):
    if n == 1:
        return out
    else:
        out = max(out, n/firstprimedivisor(n))
        n = n/firstprimedivisor(n)
        return hpvfinder(n, out)

def main():
    if inputvalue < 2:
        print('Entered value was less than 2')
        exit
    else:
        print(firstprimedivisor(inputvalue))
        print int(hpvfinder(inputvalue, 0))


main()


I am hoping to input a value, find the first prime divisor, then divide the input by that, and then repeat with what remains until I have a list of the prime divisors, with the max value of that list being the output. I only get a result of None out of firstprimedivisor(). It seems like I do not know how to return values correctly from functions that use for loops... Is there anything super obvious?

I was able to do this with recursion but it falls apart with a very small input.

Demonachizer fucked around with this message at 20:57 on Sep 26, 2015

OnceIWasAnOstrich
Jul 22, 2006

Demonachizer posted:

I only get a result of None out of firstprimedivisor(). It seems like I do not know how to return values correctly from functions that use for loops... Is there anything super obvious?


Interestingly, your function works correctly in Python 3 but not Python 2. If you are trying to run this in 2.x then your n/i == int(n/i) doesn't make any sense, because division in 2.x returns the truncated answer instead of a floating point number.

edit: You can from __future__ import division and avoid this problem in 2.x, or you could change your math to use the modulo operator or something.
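For example (this runs the same in 2.x with the future import and in 3.x):

Python code:
# The crux of it: in Python 2, 7 / 2 is 3 (truncated), so n/i == int(n/i) is always True
# for ints; in Python 3 it's 3.5 and the check behaves as you intended.
from __future__ import division  # Python 3 style division in a 2.x file (no-op in 3.x)

print(7 / 2)   # 3.5
print(7 // 2)  # 3  -- explicit floor division if you want the old behaviour
print(7 % 2)   # 1  -- modulo is the clearer divisibility test either way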

Also, in general it is preferred to use True/False instead of 1/0 for something like checkprime, and to name it something like is_prime, but those are just style things.

OnceIWasAnOstrich fucked around with this message at 21:29 on Sep 26, 2015

Nippashish
Nov 2, 2005

Let me see you dance!
n % i == 0 is a better way to check for divisibility.
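For example, the whole problem collapses to something like this (one possible rewrite, not the only one):

Python code:
# Largest prime factor via repeated division, using % for the divisibility test.
def largest_prime_factor(n):
    largest = 1
    factor = 2
    while factor * factor <= n:
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    return n if n > 1 else largest  # whatever is left over is the largest prime factor

print(largest_prime_factor(600851475143))  # Project Euler #3 input -> 6857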

Proteus Jones
Feb 28, 2013



OK, I'm running into an issue I'm just not getting. pexpect is dumping out on me because of a weird error regarding the timeout value. It dumps out every time at 'i = device.expect(self.password_prompt, self.en_prompt)' with the following error:
code:
  File "/Users/philf/anaconda/lib/python3.4/site-packages/pexpect/__init__.py", line 1514, in expect_loop
    end_time = time.time() + timeout
TypeError: unsupported operand type(s) for +: 'float' and 'str'
The thing is, when I put a break point right before that line, the device object has timeout = {int} 30 when I inspect it in the pycharm debugger. This is driving me crazy. I've tried manually setting

Python code:

    def check_ssh_logon(self, device, results, pull_results):
        ssh_logon_success = False
        ssh_logon_return = [SSH_FIRST_TIME,
                            self.password_prompt,
                            pexpect.TIMEOUT,
                            pexpect.EOF
                            ]
        ssh_attempt = device.expect(ssh_logon_return)
        if ssh_attempt == 0:
            device.sendline('yes')
            device.expect(self.password_prompt)
            device.sendline(self.password)
            ssh_logon_success = True
        elif ssh_attempt == 1:
            ssh_logon_success = True
            device.sendline(self.password)
        elif ssh_attempt == 2:
            ssh_logon_success = False
            print('!!!SSH TIMEOUT!!!\r\n')
            results.write('ssh,N,NETWORK TIMEOUT\r\n')
            pull_results.write('ssh,N,NETWORK TIMEOUT\r\n')
        elif ssh_attempt == 3:
            ssh_logon_success = False
            print('!!!SSH session refused!!!\r\n')
            results.write('ssh,N,SESSION NOT ALLOWED\r\n')
            pull_results.write('ssh,N,NETWORK TIMEOUT\r\n')
        return ssh_logon_success

    def ssh_try(self, results, pull_results):
        credential_error = '\r\n!!BAD CREDENTIALS(ssh)' \
                           '\r\n*****************\r\n\r\n'
        success_string = '\r\n *** Success (ssh) ***\r\n\r\n'
        device = pexpect.spawnu('ssh ' + self.username + '@' + self.ip)
        ssh_success = self.check_ssh_logon(device, results, pull_results)
        if ssh_success:
            i = device.expect(self.password_prompt, self.en_prompt)
            if i == 0:
                print(credential_error)
                results.write('ssh,N,BAD CREDENTIALS\r\n')
                pull_results.write('ssh,N,BAD CREDENTIALS\r\n')
                device.terminate()
                return
            elif i == 1:
                print(success_string)
                self.process_commands(results, pull_results, device)
        else:
            device.terminate()
            return
EDIT: NVM. As soon as I hit submit, I figured it out. I forgot to wrap the patterns for device.expect in a list.

Python code:
i = device.expect([self.password_prompt, self.en_prompt])

Dumlefudge
Feb 25, 2013

'ST posted:

Just add each value for "ip" into a running set: https://docs.python.org/3/library/stdtypes.html?highlight=set#set-types-set-frozenset

Something like
Python code:
node_values = set()
for node_obj in response['nodes']:
    node_values.add(node_obj['ip'])

Thanks for the suggestion - I'm rather new to Python (and my algorithm-fu is lackluster), so the help is appreciated!

The March Hare
Oct 15, 2006

Je rêve d'un
Wayne's World 3
Buglord

The March Hare posted:

I'm doing the Coursera Algorithms 1 class in order to (hopefully) get myself a bit more comfortable in interviews, and also because I think it is probably worth doing. That said, the course is in Java, which I don't know or really care to learn, so I'm doing it in Python instead. Because of this, however, I don't really have a feedback option for my implementation of the homework stuff. If any of you guys have some free time, I'd appreciate some crits.

Posted about this ~a month ago, got some good feedback, thank you dudes :~). The course I was in was ending though, so I had to wait a bit for it to start back up again so I could access the new material.

I just finished week 2, which I found a bit easier than week 1, but I'm still hoping for some feedback!

I've done away w/ getters and setters for week 2 (I'm familiar w/ them, and have an understanding of properties in Python anyway, so I feel OK with "cheating" there.) But otherwise have tried to stick pretty closely to the spec, including implementing resizing arrays using lists as the base container, even though it felt really dumb while I was doing it.

This week was Deques and Randomized Queues. I opted to use a doubly linked list for the Deque implementation, and a resizing array for Randomized Queues. I'm pretty open to feedback on absolutely anything, but especially if I have obviously messed something up or misunderstood it.

My code is here https://github.com/LarryBrid/algos/tree/master/Week%202%20-%20Randomized%20Queues%20and%20Deques and the assignment spec is here http://coursera.cs.princeton.edu/algs4/assignments/queues.html -- TIA for any feedback, it's a big help and I'll buy you all beer once I get a job :~).

The March Hare fucked around with this message at 20:42 on Sep 27, 2015

KICK BAMA KICK
Mar 2, 2009

Oh hey, I'm doing that course too right now, also my first experience with Java. Just finished debugging the Week 3 assignment to 100%, significantly tougher than the other ones I thought. I was also tempted to do it in Python and then translate but stuck with the Java. I always liked being able to just spit a command at the Python interpreter and see what comes out but never really appreciated how much easier it is to debug. I don't mind static typing and semicolons and stuff like that but I don't think I can live without the REPL.

The March Hare
Oct 15, 2006

Je rêve d'un
Wayne's World 3
Buglord

KICK BAMA KICK posted:

Oh hey, I'm doing that course too right now, also my first experience with Java. Just finished debugging the Week 3 assignment to 100%, significantly tougher than the other ones I thought. I was also tempted to do it in Python and then translate but stuck with the Java. I always liked being able to just spit a command at the Python interpreter and see what comes out but never really appreciated how much easier it is to debug. I don't mind static typing and semicolons and stuff like that but I don't think I can live without the REPL.

Sup classmate~ I considered doing it in Java, but I've never used it before and it seemed like learning Java and the course material at the same time would probably have been worse than doing it in something I'm already familiar w/ and have a nice development environment set up for. I've done a few other online classes, and this is probably the best one I've taken so far. It's super clear and feels like it has a really good pace to it.

Emacs Headroom
Aug 2, 2003

KICK BAMA KICK posted:

I always liked being able to just spit a command at the Python interpreter and see what comes out but never really appreciated how much easier it is to debug. I don't mind static typing and semicolons and stuff like that but I don't think I can live without the REPL.

/derail

Are you using IntelliJ? Do so if you're not already, and run in debug (click your breakpoints wherever). You can look inside objects, step into and over functions, etc. It's fairly easy to debug Java as long as you're using an IDE.

Once you're used to that, look into Scala -- you get the interactive notebook / REPL, static typing, compile time checking, and functional stuff all at once.

KICK BAMA KICK
Mar 2, 2009

Emacs Headroom posted:

Are you using IntelliJ?
They provided us with a package that installs Java and a few class-specific libraries and this godawful IDE called DrJava; it's so bad I went and downloaded the IntelliJ Community Edition, being very familiar with PyCharm, but I haven't set it up yet. DrJava does have a debugger with breakpoints, stepping, watch variables and all that, it's just really bad at it; the look and feel is like Windows 3.1 software. Will make a point to get going with IntelliJ.

Have always wanted to check out a functional language too, will look into Scala when I get a chance.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



pmchem posted:

...why can't they do that with a regular Python script? There's nothing magical about ipynb that allows people to provide different inputs and regenerate output that can't be done with regular scripts/modules/IDEs.

QJ was arguing against sharing code at all, not about how it's shared, and that bugs me purely from a consumer-of-scientific-output standpoint, because not sharing is a great way to accidentally hide errors that affect results.

Proteus Jones
Feb 28, 2013



Munkeymon posted:

QJ was arguing against sharing code at all, not about how it's shared, and that bugs me purely from a consumer-of-scientific-output standpoint, because not sharing is a great way to accidentally hide errors that affect results.

But that's *not* what he's saying? Unless I completely missed something elsewhere, he even mentioned having the code available for review to make sure the methodologies are appropriate and correct. He's just saying he doesn't see the point of a Notebook at all and it's sloppier than using a formally coded program.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



flosofl posted:

But that's *not* what he's saying? Unless I completely missed something elsewhere, he even mentioned having the code available for review to make sure the methodologies are appropriate and correct. He's just saying he doesn't see the point of a Notebook at all and it's sloppier than using a formally coded program.

Yeah, maybe I was reading that paragraph in isolation rather than proper context, but that's what it seemed like at the time :shrug:

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Munkeymon posted:

Yeah, maybe I was reading that paragraph in isolation rather than proper context, but that's what it seemed like at the time :shrug:

This happens to me. People will be arguing about something I don't care about, so I'll skim it, and a paragraph will catch my eye and I'll be like "WHAT THE HELL, YOU'RE WRONG MISTER" and then proceed to miss the point.

QuarkJets
Sep 8, 2008

Munkeymon posted:

QJ was arguing against sharing code at all, not about how it's shared, and that bugs me purely from a consumer-of-scientific-output standpoint, because not sharing is a great way to accidentally hide errors that affect results.

flosofl posted:

But that's *not* what he's saying? Unless I completely missed something elsewhere, he even mentioned having the code available for review to make sure the methodologies are appropriate and correct. He's just saying he doesn't see the point of a Notebook at all and it's sloppier than using a formally coded program.

flosofl understands what I'm trying to say. Code review is useful and should be a more central part of the scientific process, because it's awful when bad code produces false positives or false negatives. But sometimes you want to test a methodology, rather than a specific implementation, and in those cases you should write your own code. This is analogous to building an experimental apparatus (writing code) vs using someone else's apparatus (receiving code).

To me, the question of whether or not notebooks enhance any of that does not have a clear answer. For code review, you need to share your entire code package anyway, so putting some of it in a notebook just complicates the process. I can see the appeal of having example code in a notebook for testing and educational purposes. For instance, it seems like it'd be cool to include a .ipynb with a github project, but hopefully you've written a nice README that accomplishes the same thing, so at that point the .ipynb feels kind of superfluous. Maybe you can use a .ipynb to more easily generate the README and then include both?

Nippashish
Nov 2, 2005

Let me see you dance!
Don't you ever just want to poke around some data and check out what it looks like? Maybe look at it, plot a thing, flip it upside down, plot another thing, etc, etc? Notebooks are great for that.
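e.g. the sort of throwaway poking I mean (toy data):

Python code:
# Look at it, plot a thing, flip it upside down, plot another thing.
import numpy as np
import matplotlib.pyplot as plt

img = np.random.rand(64, 64)                 # pretend this is "some data"
plt.imshow(img); plt.colorbar(); plt.show()  # look at it
plt.imshow(img[::-1]); plt.show()            # flip it upside down
plt.hist(img.ravel(), bins=30); plt.show()   # plot another thing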

pmchem
Jan 22, 2010


Nippashish posted:

Don't you ever just want to poke around some data and check out what it looks like? Maybe look at it, plot a thing, flip it upside down, plot another thing, etc, etc? Notebooks are great for that.

You don't need a notebook for that, though. Other things are great for it too. One example is using Spyder in Anaconda and running code which executes matplotlib commands after data analysis and displays them in the ipython console. You could have a bash script that runs a command-line python script which renders plots and then displays them afterward.

Notebooks are certainly convenient for some things, but for your example I think Spyder works equally as well (if not better).

Cingulate
Oct 23, 2012

by Fluffdaddy

pmchem posted:

You don't need a notebook for that, though. Other things are great for it too. One example is using Spyder in Anaconda and running code which executes matplotlib commands after data analysis and displays them in the ipython console. You could have a bash script that runs a command-line python script which renders plots and then displays them afterward.

Notebooks are certainly convenient for some things, but for your example I think Spyder works equally as well (if not better).
But then everybody who only wants to look at the results needs the full data, the libraries used, and the computing power.

pmchem
Jan 22, 2010


Cingulate posted:

But then everybody who only wants to look at the results needs the full data, the libraries used, and the computing power.

I must be missing something, so your basic argument is that the notebook saves some data, like a ndarray object or whatever (take your pick of data object), and then you can plot it how you want?

I mean if that's really what you want to do, you can just save the data to a file and read it and plot it in a myriad of ways?

Cingulate
Oct 23, 2012

by Fluffdaddy

pmchem posted:

I must be missing something, so your basic argument is that the notebook saves some data, like a ndarray object or whatever (take your pick of data object), and then you can plot it how you want?

I mean if that's really what you want to do, you can just save the data to a file and read it and plot it in a myriad of ways?
The notebook contains the plots.

pmchem
Jan 22, 2010


Cingulate posted:

The notebook contains the plots.

Yeah, that falls under the "take your pick of data object", as it is an object storing the data that is plotted. That presumably only includes the data that was passed to the object at time of plot creation and not arbitrary previously calculated raw data. So, if you created a plot based on a slice of a list or ndarray you'd only have access to that slice in the plot object (yes?). If that is right, it's actually more limited than a traditional workflow where you dump the data to a file -- it may be a large amount of raw data -- and then manipulate later, plotting as you please. Unless, of course, you're going to recalculate the raw data and at that point you're no better off than doing it the traditional way either.

Again, just like QJ, I'm failing to see a real advantage to the notebook workflow other than (a) personal preference because you're used to doing it that way or (b) educational use.

Can someone link a notebook that's doing something useful enough that I'll want to adopt it into my daily workflow?

Emacs Headroom
Aug 2, 2003
A few more advantages of notebooks (a non-exhaustive list):

1. Very quick iteration time -- instead of "edit script, save script, run script, look at plots, repeat", it's "edit cell, hit shift-enter, repeat"
2. Collaboration -- you can send notebooks to technical people, or send html output to not-so-technical people, or run a jupyter server and edit collaboratively
3. Widgets and whatnot -- you can quite easily make interactive JavaScript widgets to let people (or yourself) mess around with different views of the data (rough sketch below)
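A rough sketch of point 3, assuming ipywidgets is installed next to Jupyter:

Python code:
# A slider that re-renders a plot as you drag it.
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact

def show(freq=1.0):
    x = np.linspace(0, 2 * np.pi, 500)
    plt.plot(x, np.sin(freq * x))
    plt.show()

interact(show, freq=(0.5, 10.0, 0.5))  # drag the slider, the plot updates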

But at the end of the day they're a tool. If you insist that you don't like them as a tool, or that the tools you have are better, we're not your boss and that's fine.

  • Locked thread