  • Locked thread
Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Gothmog1065 posted:

What issue? I use variable1/2/3 all the time (or list_1, list_2, list3 and never had a problem).

There are actual technical issues, as in "you can't do that, it's a syntax error", and then there's "you shouldn't do that, it makes the code hard to read".

If you are writing a short function that obviously does something with two lists (because of the nature of whatever it does), then having variables named list1 and list2 is not a big deal. In any other situation you can give variables names like that, but you probably shouldn't.

I know this is tangential to the discussion at hand, but I want to make what I think is a useful stylistic point.


SurgicalOntologist
Jun 17, 2004

Down that road lies questions like "how can I iterate over integers in variable names" and eval.

OnceIWasAnOstrich
Jul 22, 2006

SurgicalOntologist posted:

Down that road lies questions like "how can I iterate over integers in variable names" and eval.

Who needs eval?

Python code:
for list_ in ({**globals(),**locals()}['list{}'.format(x)] for x in range(3)):
        print(list_)
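(The non-joke version of the same loop keeps the lists in a container instead of numbered globals; a sketch:)

Python code:
```python
# keep related lists in a dict (or a plain list of lists)
lists = {
    "list0": [1, 2],
    "list1": [3, 4],
    "list2": [5, 6],
}
for x in range(3):
    list_ = lists["list{}".format(x)]
    print(list_)
```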

Dominoes
Sep 20, 2007

Is there a clean way to update and merge timeseries in Pandas? For example, I have data_current, which is a timeseries of dates and values from October 1 - October 10; I have data_future, which is in the same format, from October 7 - October 20. I would like to merge the two into a timeseries from October 1 - October 20. The overlapping values would have to match, or raise an error. (Or you could specify a default, like the first series value, the second series value, NA etc.)

edit: Solved with updated_data = data_current.combine_first(data_future)
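Note that combine_first alone silently prefers the caller's values on overlaps; a sketch of adding the match-or-raise check on top of it (toy series standing in for data_current / data_future):

Python code:
```python
import pandas as pd

# toy stand-ins for the real data
data_current = pd.Series([1.0, 2.0, 3.0],
                         index=pd.date_range("2015-10-01", periods=3))
data_future = pd.Series([3.0, 4.0],
                        index=pd.date_range("2015-10-03", periods=2))

# verify that the overlapping timestamps agree before merging
overlap = data_current.index.intersection(data_future.index)
if not data_current.loc[overlap].equals(data_future.loc[overlap]):
    raise ValueError("overlapping values disagree")

updated_data = data_current.combine_first(data_future)
```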

Dominoes fucked around with this message at 21:31 on Sep 15, 2015

ahmeni
May 1, 2005

It's one continuous form where hardware and software function in perfect unison, creating a new generation of iPhone that's better by any measure.
Grimey Drawer

Hadlock posted:

OpenCV just got official Python 3 support in June of this year. There's enough big legacy projects out there that still require 2.7, and I haven't run in to anything that explicitly requires 3 yet. Async sounds interesting though.

Only thing I've found so far is JupyterHub, which is a nice shared implementation of the Jupyter/iPython notebook. I've got a pull request pending to fix domain username support and will be hassling my DevOps group shortly to join in the pandas / bokeh fun.

qntm
Jun 17, 2009
I'm working on a library for finite state machines. When you make a finite state machine you have to supply an alphabet of symbols, which in this case is a set of hashable values (they get used as keys in a dict). But I would also like there to be a special "anything else" value which you can add to your alphabet. So if your alphabet is {"a", "b", "c", "d", fsm.anything_else}, and you pass "e" into your FSM, then the library treats that as "anything else" and follows the transition you selected for that special value.

Problem is, what should I set the value fsm.anything_else to? I can't set it to "e" because that might be part of the user's chosen alphabet. I can't set it to None for the same reason. In fact the alphabet could in theory legitimately contain any hashable value. Users don't care, of course, unless there's a clash, because they'll use the symbol, not the value. Is the best approach really to just use a very large integer which nobody is ever likely to run into by accident?

Nippashish
Nov 2, 2005

Let me see you dance!

qntm posted:

Problem is, what should I set the value fsm.anything_else to?

Make an empty class called AnythingElse and set it to an instance of that.
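A sketch of that sentinel idea (the transition table and names are hypothetical):

Python code:
```python
class AnythingElse:
    """Empty marker class; an instance compares equal only to itself."""

anything_else = AnythingElse()

# hypothetical transition table for one FSM state
transitions = {"a": 1, "b": 2, anything_else: 99}

def next_state(symbol):
    # fall back to the sentinel for any symbol outside the alphabet
    return transitions[symbol if symbol in transitions else anything_else]
```

No user-supplied hashable value can ever collide with it, because equality falls back to identity.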

Kuule hain nussivan
Nov 27, 2008

I am trying to make a GUI with Python using Tkinter. This is also the first thing I've made in Python so I'm 100% sure this is a retarded question.

I have a main function which sets up the window and widgets for the UI. I have a button that calls a function, which injects text into a Field in the UI. If I do...

def foo():
    baa.insert(INSERT, "This is text")

def main():
    boo = Button(command=foo())
    baa = Text()

I get a complaint in foo that baa is an unresolved reference. If I do it the other way around, boo complains that foo is an unresolved reference. Is there some sort of forward declaration in Python, or is there something else I'm completely missing?

Asymmetrikon
Oct 30, 2009

I believe you're a big dork!
First of all, looking at the Tkinter docs, command is a callback, so it expects a function; you're passing it the result of calling foo.

I don't know Tkinter, so I don't know if there's a way of passing in variables at call time, but you can use a closure to make a function with baa bound correctly, like so:

code:
def mkFoo(text):
  def foo():
    text.insert(INSERT, "This is text")
  return foo

def main():
  baa = Text()
  foo = mkFoo(baa)
  boo = Button(command=foo)
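The function-vs-call distinction can be seen without Tkinter at all; a sketch with a stub Button (hypothetical) that just stores whatever `command` receives:

Python code:
```python
class FakeButton:
    """Stand-in for Tkinter's Button: just stores the callback."""
    def __init__(self, command=None):
        self.command = command

field = []  # stand-in for a Text widget

def make_foo(widget):
    def foo():
        widget.append("This is text")
    return foo

boo = FakeButton(command=make_foo(field))  # pass the function, don't call it
boo.command()  # Tkinter would invoke this when the button is clicked
```

With real Tkinter, `command=lambda: baa.insert(INSERT, "This is text")` is the usual shorthand for binding arguments at call time.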

tef
May 30, 2004

-> some l-system crap ->

qntm posted:

I'm working on a library for finite state machines. When you make a finite state machine you have to supply an alphabet of symbols, which in this case is a set of hashable values (they get used as keys in a dict). But I would also like there to be a special "anything else" value which you can add to your alphabet. So if your alphabet is {"a", "b", "c", "d", fsm.anything_else}, and you pass "e" into your FSM, then the library treats that as "anything else" and follows the transition you selected for that special value.

Problem is, what should I set the value fsm.anything_else to? I can't set it to "e" because that might be part of the user's chosen alphabet. I can't set it to None for the same reason. In fact the alphabet could in theory legitimately contain any hashable value. Users don't care, of course, unless there's a clash, because they'll use the symbol, not the value. Is the best approach really to just use a very large integer which nobody is ever likely to run into by accident?

foo = object()

>>> foo = object()
>>> bar = object()
>>> foo == bar
False
>>> foo is bar
False
>>> any(foo == x for x in [1,2,3,"a","b","c", None, bar])
False

Kuule hain nussivan
Nov 27, 2008

Asymmetrikon posted:

First of all, looking at the Tkinter docs, command is a callback, so it expects a function; you're passing it the result of calling foo.

I don't know Tkinter, so I don't know if there's a way of passing in variables at call time, but you can use a closure to make a function with baa bound correctly, like so:

code:
def mkFoo(text):
  def foo():
    text.insert(INSERT, "This is text")
  return foo

def main():
  baa = Text()
  foo = mkFoo(baa)
  boo = Button(command=foo)
Yeah, that was a mistake on my part. Luckily, it looks like my original problem was just a fart with the IDE, since moving the main method to the top didn't cause any problems the second time. So no need for closures. Now my only problem is that the sqlite query seems to have trouble with an input string that's more than 1 character long.

Edit: Nevermind, got it to work. Changing the parameter to a list rather than a single parameter did the trick.
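For reference, that sqlite3 symptom is most likely the classic string-as-parameter-sequence mistake; a minimal sketch:

Python code:
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")

# a bare string is itself a sequence of characters, so sqlite3 sees
# five bindings for one placeholder and complains
got_error = False
try:
    conn.execute("INSERT INTO words VALUES (?)", "hello")
except sqlite3.ProgrammingError:
    got_error = True

# wrap the parameter in a tuple or list instead
conn.execute("INSERT INTO words VALUES (?)", ("hello",))
rows = conn.execute("SELECT word FROM words").fetchall()
```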

Kuule hain nussivan fucked around with this message at 17:20 on Sep 17, 2015

Gothmog1065
May 14, 2009
Okay, more of a theoretical question than a straight code question, but I'm still working on the Skynet game I started ([url=https://gist.github.com/gothmog1065/449da65f1320fb6390c1]Here is my ultimate revised code[/url] and Here is the game itself).

Now if you just play the "triple star" scenario, you'll see there's 3 gateways the "enemy" can go to. One of the achievements is to complete the game (trap the enemy so it can't get to any node) with 50 links available. I've gotten it close, but my current code leaves 41 links left, which means you have to trap the enemy on one of the stars and not let it out. My question is how do you predict where the enemy is going well enough to accomplish that? Playing out with my current code, it leaves three "wasted" link breakages, the rest stop them directly at the gateway.

I guess I'm trying to problem solve it now.

One way I could think of is to break certain links (the links on the rings that are only between two nodes) at the ends of the "stars", and when the virus goes around a star and has to come back, break the other side so it's trapped on the outer ring of a star, but I'm not sure how I'd go about determining where to break.
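One way to quantify "where the enemy is going" is a breadth-first search from the agent's node: cut next to whichever gateway it can reach in the fewest hops. A sketch over a hypothetical toy graph (node numbers made up, not from the actual game map):

Python code:
```python
from collections import deque

def hops_from(links, start):
    """Shortest hop count from start to every node over undirected links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

# toy graph: agent at node 0, gateways at nodes 3 and 4
links = [(0, 1), (1, 2), (2, 3), (1, 4)]
dist = hops_from(links, 0)
nearest_gateway = min([3, 4], key=dist.get)
```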

FoiledAgain
May 6, 2007

I'm experiencing some confusion/frustration with PyQt.

I have a QDialogBox that returns a Corpus object, like this:

code:
dialog = CorpusLoadDialog(self, self.settings)
result = dialog.exec_()
if result:
    corpus = dialog.corpus
The Corpus has an attribute .inventory. This is an Inventory object, which itself is a subclass of QAbstractTableModel. But when I try to call any inherited methods, I get a RuntimeError, e.g.

code:
corpus.inventory.headerDataChanged.emit(1,1,1)
RuntimeError: super-class __init__() of type Inventory was never called
As far as I can tell, Python is just lying to me here. I really do call super(), and the call works. Relevant code looks like this:

code:
class Corpus(object):

    def __init__(self):
        self.inventory = Inventory()
        self.inventory.headerDataChanged.emit(1,1,1) #does not raise an Error

class Inventory(QAbstractTableModel):
    def __init__(self):
        super().__init__() #Here's the crucial line!
As far as I can tell, the super() call works, because I can call Inventory's inherited methods back up inside the Corpus.__init__() method without a problem. However, as soon as the Corpus gets returned from the QDialogBox, all bets are off, as I described at the beginning of the post. Trying to call an inherited method raises the RuntimeError claiming I never called super().

How can the super() call get "nullified" like this? Am I using super() incorrectly?

I should also mention that the Inventory class has .data(), .headerData(), .columnCount(), and .rowCount(), i.e. all the methods you're required to implement when you subclass QAbstractTableModel. If I call super() a second time after getting the Corpus back from the QDialogBox, then things partially work (but not completely, because the QTableView that gets this model refuses to display headers). In any case, calling super() twice feels really suspicious, and I don't think it's an acceptable work-around.

tef
May 30, 2004

-> some l-system crap ->

FoiledAgain posted:

How can the super() call get "nullified" like this? Am I using super() incorrectly?

super takes args in 2.x: https://docs.python.org/2/library/functions.html#super

in the current version of python, you can use the no-arg form: https://docs.python.org/3/library/functions.html#super "The zero argument form only works inside a class definition, as the compiler fills in the necessary details to correctly retrieve the class being defined, as well as accessing the current instance for ordinary methods."

my advice is to do SuperClass.__init__(self) explicitly and just avoid super altogether unless you're doing multiple inheritance

chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

Are you working in Python 2 or 3? You're manually inheriting from object which implies Python 2, but calling super() without arguments which implies (really requires) Python 3.

My best guess is there's some weird inheritance diamond stuff going on. What happens if you change the super call to super(Inventory, self).__init__()?

FoiledAgain
May 6, 2007

chutwig posted:

Are you working in Python 2 or 3? You're manually inheriting from object which implies Python 2, but calling super() without arguments which implies (really requires) Python 3.

My best guess is there's some weird inheritance diamond stuff going on. What happens if you change the super call to super(Inventory, self).__init__()?

I'm using 3.4, but I started on 2.7, so I'm in the habit of inheriting from object.
super(Inventory, self).__init__() has the same effect as plain super()

tef posted:

my advice is to do SuperClass.__init__(self) explicitly and just avoid super altogether unless you're doing multiple inheritance

Calling QAbstractTableModel.__init__(self) doesn't work either. Same RuntimeError.

FoiledAgain fucked around with this message at 21:55 on Sep 20, 2015

FoiledAgain
May 6, 2007

I think I'm on to something. I just discovered that a pickled copy of the Corpus is saved before the DialogBox returns. A little googling suggests that you can't reliably pickle some Qt objects (possibly including QAbstractTableModel). Does anyone know anything about this?
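That diagnosis would fit: unpickling the object can skip the Qt-side initialisation, which matches the "super-class __init__() was never called" error. A common workaround, sketched here with a threading.Lock standing in for the unpicklable Qt model (names hypothetical), is to drop the attribute from the pickled state and rebuild it on load:

Python code:
```python
import pickle
import threading

class Corpus:
    def __init__(self):
        self.words = ["a", "b"]
        # threading.Lock() stands in for an unpicklable Qt object here
        self.lock = threading.Lock()

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["lock"]              # leave the unpicklable part out
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.lock = threading.Lock()   # rebuild it fresh on unpickle

restored = pickle.loads(pickle.dumps(Corpus()))
```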

FoiledAgain
May 6, 2007

I fixed the problem I was describing in my last two posts, but in the process came across another weird Qt thing. When trying to display my TableView headers, I initially had this code:

code:
def headerData(self, row_or_col, orientation, role=None):
        try:
            if orientation == Qt.Horizontal:
                return self.column_data[row_or_col]
            elif orientation == Qt.Vertical:
                return self.column_data[row_or_col]
        except KeyError:
            return QVariant()
But no headers would show up. I changed the code very minimally to include a comparison, and suddenly everything works:

code:
def headerData(self, row_or_col, orientation, role=None):
        try:
            if orientation == Qt.Horizontal and role == Qt.DisplayRole:
                return self.column_data[row_or_col]
            elif orientation == Qt.Vertical and role == Qt.DisplayRole:
                return self.column_data[row_or_col]
        except KeyError:
            return QVariant()
Why is this? How come checking for Qt.DisplayRole makes the headers visible? It's not like I'm actually setting the Role, I'm just doing a comparison.

hooah
Feb 6, 2006
WTF?
I'm trying to wrap my head around DEAP (documentation), and ran across an error I don't really understand when trying to do a simple onemax genetic algorithm. Here's my setup code:
Python code:
import random

from deap import tools
from deap import base
from deap import creator
from deap import algorithms


def evaluate(individual):
    return sum(individual),


def select(population):
    return population


def randBinList(n):
    return [random.randint(0, 1) for _ in range(1, n+1)]

IND_SIZE = 20
POP_SIZE = 100

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("attr_bin", randBinList)
#toolbox.register("attr_float", random.random)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr_bin, n=IND_SIZE)
toolbox.register("mate", tools.cxOnePoint)
toolbox.register("mutate", tools.mutFlipBit, indpb=0.2)
toolbox.register("select", select)
toolbox.register("evaluate", evaluate)

pop = [toolbox.individual() for _ in range(POP_SIZE)]
When I run this, I get a TypeError on the last line: "randBinList() missing 1 required positional argument: 'n'". I get that randBinList isn't getting its argument, but I don't understand why. I thought the initRepeat should take care of that?

chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

hooah posted:

When I run this, I get a TypeError on the last line: "randBinList() missing 1 required positional argument: 'n'". I get that randBinList isn't getting its argument, but I don't understand why. I thought the initRepeat should take care of that?
It's because when you supply n=IND_SIZE, that's going to get passed as an argument to tools.initRepeat and not to randBinList. If that register were turned into a method invocation, it would look like
Python code:
tools.initRepeat(creator.Individual, toolbox.attr_bin, n=IND_SIZE)
The callable that you supply to tools.initRepeat looks like it needs to take no arguments or supply defaults for any arguments. If you register randBinList with an argument for n, it works as expected:
Python code:
toolbox.register("attr_bin", randBinList, n=IND_SIZE)

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...
So what do people use for testing within IPython notebooks? Do people do testing within IPython notebooks? I use them for laying out scientific analyses, so it's fairly important to "get things right". On one hand, you could argue that anything substantial should be rolled out into an external module with its own testing, but with science we're always trying new analyses and bespoke / ad hoc techniques, so there's always going to be something new in there.

hooah
Feb 6, 2006
WTF?

chutwig posted:

It's because when you supply n=IND_SIZE, that's going to get passed as an argument to tools.initRepeat and not to randBinList. If that register were turned into a method invocation, it would look like
Python code:
tools.initRepeat(creator.Individual, toolbox.attr_bin, n=IND_SIZE)
The callable that you supply to tools.initRepeat looks like it needs to take no arguments or supply defaults for any arguments. If you register randBinList with an argument for n, it works as expected:
Python code:
toolbox.register("attr_bin", randBinList, n=IND_SIZE)

That makes sense, but after changing the attr_bin registration (and leaving the individual registration as is), the framework seems to be calling randBinList twice, so I get a population that has individuals which are made up of 20 lists of 20 1s/0s apiece. I changed the individual's register to take n=1 and left the attr_bin register at 20, but that made each individual a list that contains a single list which then contains the 20 binary elements.
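The nesting falls out of what initRepeat does; a simplified reimplementation (not DEAP's actual source) makes it visible:

Python code:
```python
import random

def init_repeat(container, func, n):
    """Essentially what DEAP's tools.initRepeat does (simplified)."""
    return container(func() for _ in range(n))

def rand_bin_list(n):
    return [random.randint(0, 1) for _ in range(n)]

# registering rand_bin_list with n=20 *and* repeating 20 times nests lists:
nested = init_repeat(list, lambda: rand_bin_list(20), 20)

# generate one attribute per call and let initRepeat supply the repetition:
individual = init_repeat(list, lambda: random.randint(0, 1), 20)
```

In DEAP terms, the usual fix is an attribute generator that returns one value per call, e.g. `toolbox.register("attr_bit", random.randint, 0, 1)`, keeping `n=IND_SIZE` on the individual registration; worth double-checking against the DEAP docs.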

Dominoes
Sep 20, 2007

I'm looking for advice on cleaning a pandas timeseries. I currently have financial data in minute increments. Sometimes the data includes every minute, sometimes it goes in increments of a few minutes at a time. The cleaned data needs to include an entry for every minute, during set hours on set days.

I can populate data for the missing minutes using this (as NaN, backfill, forward fill, etc.):
Python code:
data.asfreq('1Min')
However, this also fills the large chunks of time I'm not interested in.

I can then filter for the relevant times and dates using this:
Python code:
for timestamp in data.index:
    if not in_range(timestamp):
        data.drop(timestamp, inplace=True)
Where in_range is a function that evaluates whether each time is in the range. However, this currently takes an unacceptably large amount of time to run through the data. I could try to optimize my in_range function, but I suspect there's a cleaner way to do this.

The Pandas docs on Timeseries' and Missing data seem to point in the right direction, but I'm unable to find a built-in solution.
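A vectorized sketch, assuming pandas' between_time and dayofweek cover the "set hours on set days" requirement (toy data below):

Python code:
```python
import numpy as np
import pandas as pd

# toy minutely data spanning a full week
idx = pd.date_range("2015-09-21", "2015-09-27", freq="1min")
data = pd.Series(np.arange(len(idx), dtype=float), index=idx)

minutely = data.asfreq("1min")                      # fill in missing minutes
in_hours = minutely.between_time("09:30", "16:00")  # keep the set hours
on_days = in_hours[in_hours.index.dayofweek < 5]    # keep Mon-Fri only

earliest = min(on_days.index.time)
latest = max(on_days.index.time)
```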

Dominoes fucked around with this message at 17:03 on Sep 23, 2015

Emacs Headroom
Aug 2, 2003
I'd probably do it a super lazy way, like

Python code:
times = [start_t + datetime.timedelta(minutes=i) for i in range(something)]
full_series = pd.Series(np.zeros(len(times)), index=times)
full_series[partial_series.index] = partial_series
edit: er, just replace 'times' with the minutes of the times you do actually care about and it should work
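For what it's worth, reindex does the same fill-to-a-target-index step in one call; a sketch with hypothetical names:

Python code:
```python
import pandas as pd

# 'wanted' = the minutes you actually care about (hypothetical)
wanted = pd.date_range("2015-09-24 09:30", periods=5, freq="1min")
partial_series = pd.Series([1.0, 2.0], index=wanted[[0, 3]])

full_series = partial_series.reindex(wanted)  # NaN where no data existed
filled = full_series.fillna(0.0)              # or .ffill() / .bfill()
```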

Emacs Headroom fucked around with this message at 06:35 on Sep 24, 2015

QuarkJets
Sep 8, 2008

outlier posted:

So what do people use for testing within IPython notebooks? Do people do testing within IPython notebooks? I use them for laying out scientific analyses, so it's fairly important to "get things right". On one hand, you could argue that anything substantial should be rolled out into an external module with its own testing, but with science we're always trying new analyses and bespoke / ad hoc techniques, so there's always going to be something new in there.

Just use git or some other version control software. I don't see a lot of advantage to using IPython notebooks for scientific analyses

ahmeni
May 1, 2005

It's one continuous form where hardware and software function in perfect unison, creating a new generation of iPhone that's better by any measure.
Grimey Drawer

outlier posted:

So what do people use for testing within IPython notebooks? Do people do testing within IPython notebooks? I use them for laying out scientific analyses, so it's fairly important to "get things right". On one hand, you could argue that anything substantial should be rolled out into an external module with its own testing, but with science we're always trying new analyses and bespoke / ad hoc techniques, so there's always going to be something new in there.

I'm always for more testing when possible. I've seen one or two people roll IPython Nose into their notebooks as a collapsed cell and it seems to be a nice and tidy way to keep things tested.

QuarkJets posted:

Just use git or some other version control software. I don't see a lot of advantage to using IPython notebooks for scientific analyses

You're right in that IPython is not version control, but it's also great when used with it. There is definitely an advantage to distributing your scientific analyses with built in data display via nice plotting systems like Bokeh. Plenty of examples in A Gallery Of Interesting IPython Notebooks.

QuarkJets
Sep 8, 2008

I never liked using Mathematica notebooks because of the whole fakeness of having to get a perfect-working state and then copying it all into a new notebook to make it look nice, and that's on top of a bunch of manual formatting. The whole thing feels hokey. At that point it feels like you'd save time just saving the output that you need and writing a Latex document. I imagine that IPython notebooks suffer from the same problems, but I guess I've never used them so I don't really know.

I mean I guess it works well if you're talking about < 100 lines of code, but that seems more relevant to a class room than to a lab. Unless the objective is to just use IPython notebooks for your frontend presentation on top of a bunch of backend code, but again I don't really understand the point of that. I'm not saying it's bad or stupid or whatever, I just don't understand it

Cingulate
Oct 23, 2012

by Fluffdaddy
At the point I'm at, doing science in Python without using iPython notebooks seems like using Python without list comprehensions. Sure, you still got a useful package, but why??

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

QuarkJets posted:

I never liked using Mathematica notebooks because of the whole fakeness of having to get a perfect-working state and then copying it all into a new notebook to make it look nice, and that's on top of a bunch of manual formatting. The whole thing feels hokey. At that point it feels like you'd save time just saving the output that you need and writing a Latex document. I imagine that IPython notebooks suffer from the same problems, but I guess I've never used them so I don't really know.

I mean I guess it works well if you're talking about < 100 lines of code, but that seems more relevant to a class room than to a lab. Unless the objective is to just use IPython notebooks for your frontend presentation on top of a bunch of backend code, but again I don't really understand the point of that. I'm not saying it's bad or stupid or whatever, I just don't understand it

I use IPython notebooks for science & analysis a lot ... and I'm in two minds about it. It's a very good way to document your analysis workflow and show results to colleagues. On the other hand, it's not the best development environment and once you start doing major complex code in the notebook, the cracks start to show. I really need to use that function I wrote in the other notebook ... uh, better cut and paste ...

QuarkJets
Sep 8, 2008

e: ^^^ Okay, yeah, that's basically how I felt about using Mathematica notebooks. It feels good for small analyses but messy for more complicated projects. So if I'm just going to analyze an output file real quick then that makes sense to put in a notebook, but if I need to generate a bunch of plots for a .tex paper that are all sort of formatted in the same way then it'd actually be faster to just write a function to do that

Cingulate posted:

At the point I'm at, doing science in Python without using iPython notebooks seems like using Python without list comprehensions. Sure, you still got a useful package, but why??

Can you elaborate, maybe? I mean I get that it's cool to be able to make plots and notes in the same window, and that you can make it look nice with some formatting. So it's basically useful for just the data analysis portion of a project, yeah?

In my line of work I'm either writing code that will run on a supercomputer, code that's complicated enough to be put in its own module, or both. A notebook isn't really a good IDE, and I have PyCharm for that anyway. I could use a notebook during analysis, I guess, but the rest of my workflow is already in PyCharm... so it feels easier to just write a script in PyCharm that outputs content directly to latex. What am I missing?

QuarkJets fucked around with this message at 10:04 on Sep 24, 2015

Cingulate
Oct 23, 2012

by Fluffdaddy
A lot of what scientists do is share analyses - at every stage, including preliminary stages. A notebook is self-documenting, can be directly interacted with by the receiver, and can be easily annotated. And a lot of data-science code isn't really so complicated that you need a real IDE.

Sure, there are steps that might better be done in a real IDE, but eventually, when you HAVE that module, you can still import it in the notebook. And if you HAVE crunched the numbers, you can still visualize (and document) it in the notebook

For me, another enormous benefit is how easily I can use our servers to crunch numbers and plot them all over a spotty SSH connection - I do a lot of work in airports, on trains, etc. Sure, you can run your regular Python session in e.g. a screen. But that way you can't easily plot the results. It's trivial to reconnect to a notebook, look at previous results and pick up right where you left off.

In R, an analogous workflow exists with the rmarkdown/knitr packages for a reason - it's simply a great way to do data science.

Cingulate
Oct 23, 2012

by Fluffdaddy

QuarkJets posted:

e: ^^^ Okay, yeah, that's basically how I felt about using Mathematica notebooks. It feels good for small analyses but messy for more complicated projects. So if I'm just going to analyze an output file real quick then that makes sense to put in a notebook, but if I need to generate a bunch of plots for a .tex paper that are all sort of formatted in the same way then it'd actually be faster to just write a function to do that
Or just directly share the notebook.

Dumlefudge
Feb 25, 2013
I am fetching data from the API on a number of devices, each of which have a list of network nodes that they are associated with, in the following format
code:
# response 1
{
	# other fields omitted
	'nodes': [
		{
			'ip': '1.2.3.4'
			# more values here
		},
		{
			'ip':  '2.3.4.5'
		}
	]
}

# response 2
{
	'nodes': [
		{
			'ip': '1.2.3.4'
		}
	]
}
I want to take all the entries in 'nodes' and combine them into a single list, where no duplicates exist (items are considered duplicates if their 'ip' values are equal).

My first thought was to iterate over the list being built, as I inspect the content of each incoming dictionary, checking for a match.
However, that doesn't seem like the right way to go about it, since I end up iterating over a constantly-growing list.
Is there a cleaner way to approach this problem?

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

ahmeni posted:

I'm always for more testing when possible. I've seen one or two people roll IPython Nose into their notebooks as a collapsed cell and it seems to be a nice and tidy way to keep things tested.

That is so cool and so useful. Thanks!

'ST
Jul 24, 2003

"The world has nothing to fear from military ambition in our Government."

Dumlefudge posted:

My first thought was to iterate over the list being built, as I inspect the content of each incoming dictionary, checking for a match.
However, that doesn't seem like the right way to go about it, since I end up iterating over a constantly-growing list.
Is there a cleaner way to approach this problem?
Just add each value for "ip" into a running set: https://docs.python.org/3/library/stdtypes.html?highlight=set#set-types-set-frozenset

Something like
Python code:
node_values = set()
for node_obj in response['nodes']:
    node_values.add(node_obj['ip'])
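If the whole node dicts need to survive (not just the IPs), a dict keyed by 'ip' dedupes in one pass; a sketch with a hypothetical responses list:

Python code:
```python
responses = [
    {"nodes": [{"ip": "1.2.3.4", "port": 80}, {"ip": "2.3.4.5"}]},
    {"nodes": [{"ip": "1.2.3.4"}]},
]

unique = {}
for response in responses:
    for node in response["nodes"]:
        unique.setdefault(node["ip"], node)   # first occurrence wins

merged = list(unique.values())
```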

Emacs Headroom
Aug 2, 2003

outlier posted:

I use IPython notebooks for science & analysis a lot ... and I'm in two minds about it. It's a very good way to document your analysis workflow and show results to colleagues. On the other hand, it's not the best development environment and once you start doing major complex code in the notebook, the cracks start to show. I really need to use that function I wrote in the other notebook ... uh, better cut and paste ...

As my analysis / modeling evolves, I end up moving code from inside the notebook to a library in Python that gets imported into the notebook (and re-used by other notebooks). The library is the time to add unit tests, write good docstrings, etc.

If you're doing data engineering in industry, the library can also be a good reference to base your streaming / hadoop / spark / whatever version on as well.

Proteus Jones
Feb 28, 2013



Emacs Headroom posted:

As my analysis / modeling evolves, I end up moving code from inside the notebook to a library in Python that gets imported into the notebook (and re-used by other notebooks). The library is the time to add unit tests, write good docstrings, etc.

If you're doing data engineering in industry, the library can also be a good reference to base your streaming / hadoop / spark / whatever version on as well.

I don't use notebooks (really I do different stuff with my programs), but I've found creating libraries has been very useful to me, since there have been some classes I've used over and over again in different projects. I've found that while it adds a little more time to planning and a bit more effort programming and documenting when creating it for the first time, it's saved me oh so much time as opposed to cutting and pasting code and hammering it to fit.

**kwargs are my bestest friends.

QuarkJets
Sep 8, 2008

Cingulate posted:

Or just directly share the notebook.

That doesn't really work with anyone who's over 50. Old people want a white paper. And it certainly doesn't work if I want to publish a journal article

e: The notebook approach feels like it's the "code until it works" approach that scientific code is so ill-known for. Would you disagree?

QuarkJets fucked around with this message at 19:36 on Sep 24, 2015

Cingulate
Oct 23, 2012

by Fluffdaddy

QuarkJets posted:

That doesn't really work with anyone who's over 50. Old people want a white paper. And it certainly doesn't work if I want to publish a journal article
Yes, but so does everything that's not "a word document named manuscript_final-version_b_2014_reallyfinal_mkII_d.docx".

And you can easily export the notebook to a PDF.

QuarkJets posted:

e: The notebook approach feels like it's the "code until it works" approach that scientific code is so ill-known for. Would you disagree?
Big difference: everyone can see the (potentially bad) code you used to get to the results.

Cingulate fucked around with this message at 01:08 on Sep 25, 2015


pmchem
Jan 22, 2010


Cingulate posted:

Big difference: everyone can see the (potentially bad) code you used to get to the results.

I agree with QJets on this issue. If you work in a large team with diverse ages then trying to get people to adopt, or even look at, ipython nb's is a hopeless endeavor. Other programmers that are not primarily Python guys can open up .py source in vim or emacs but not ipynb files. Ipynb only really works when you're in a herd of other people that also use it. Even then, I wouldn't want to use it for large scientific projects when work on a remote machine will be done with regular py and not ipynb.
