  • Locked thread
SelfOM
Jun 15, 2010

JetsGuy posted:

Echoing matplotlib. It loving rules, and I've even used it in graphs for my professional academic journal articles.

I heartily endorse it.

Pretty much the first thing I do when I get a new machine is Python 2.7, NumPy, SciPy, PyFits and matplotlib.

I also recently got turned on to how awesome IPython really is. I'm really annoyed at myself for resisting using it for so long!

I recently switched from doing a lot of data analysis in R to IPython plus the libraries you mentioned. One of the nice features is the interactive cluster functionality. The notebook is also amazing if you haven't tried it yet. Both of these features require ZeroMQ, and the notebook also requires Tornado. Here is an example of it in use: http://healthyalgorithms.com/2012/02/09/powells-method-for-maximization-in-pymc/. It would be cool if the IPython notebook had a nice export-to-HTML function, though.

Also check out: http://pandas.pydata.org/. It's a really great data munging package.


SelfOM
Jun 15, 2010
I find myself using this pattern with apply, where I pass in additional pandas Series or DataFrames as extra arguments. The other DataFrames are usually the same size, but the operation can't be done simply by matrix multiplication (or at least not at first glance). I also like being able to reference the position of the current column inside the apply through its index, i.e. x.name in the example below, even though that requires a lookup as well as passing in another DataFrame or Series.

Python code:
import pandas as pd
import numpy as np

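# df1 is a pd.DataFrame
# s1 and s2 are pd.Series
# Assumption: `offset` (undefined in the original) stands for series1, and the
# undefined `low_level` threshold is taken to be the same as `tol`.

def apply_func(x, series1, series2, tol=1e-10, maxiter=50):
    offset = series1
    keep = x > tol
    cur_beta = (x[keep] / np.exp(offset[keep])).sum()
    cur_beta = np.log(cur_beta / x.shape[0])
    # Newton-Raphson
    for i in range(maxiter):
        mu = np.exp(cur_beta + offset)
        denom = 1 + mu * series2[x.name]
        # More irrelevant stuff (elided in the original post).
    return cur_beta

df1.apply(apply_func, args=(s1, s2))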

For speed, should I be dropping down to C/Cython and looping over pointers rather than using apply?
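Before reaching for Cython, it can be worth checking whether the apply can be replaced by index-aligned arithmetic. Below is a minimal, self-contained sketch with made-up data; df, row_offsets, col_scales and scaled_sum are hypothetical stand-ins, not the real Newton-Raphson code. It shows how args= forwards extra Series to the function, how col.name identifies the current column, and a vectorized equivalent of the same toy computation. An iterative per-column solver may not vectorize this neatly.

Python code:
```python
import pandas as pd

# Hypothetical toy inputs: df has two columns; row_offsets is indexed like
# df's rows; col_scales is indexed by df's column labels.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
row_offsets = pd.Series([0.0, 1.0, 2.0])
col_scales = pd.Series({"a": 10.0, "b": 100.0})


def scaled_sum(col, offsets, scales):
    # With the default axis=0, apply hands over one column at a time;
    # col.name is that column's label, so scales[col.name] is a lookup.
    return ((col + offsets) * scales[col.name]).sum()


via_apply = df.apply(scaled_sum, args=(row_offsets, col_scales))

# The same computation using index-aligned arithmetic, with no Python-level loop:
# add row_offsets down the rows, multiply each column by its scale, sum columns.
vectorized = df.add(row_offsets, axis=0).mul(col_scales, axis=1).sum()
```

Both give a=90.0 and b=1800.0 here. If the real per-column work really is an iterative solver, the apply version (or Cython) may still be the practical choice.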

SelfOM
Jun 15, 2010
Something like this works for me: https://gist.github.com/anonymous/5962369. Maybe it has something to do with what's going on in Button, since this works for me:

Python code:
test = Test()
test.something(20)
test.seg_button.push()

I'm having trouble with Cython memoryviews:
Python code:
import numpy as np
cimport numpy as np
cpdef double test(int[:]):
    ### Stuff ##
Used to work, now I get:
code:
cpdef double test(int[:]):
                  ^
Expected an identifier or literal

SelfOM
Jun 15, 2010
I find myself using this pattern a lot to test whether a NumPy array was passed as an optional argument. Is this the best way to do it?

Python code:
import numpy as np

def test_func(x, y=None):
    if getattr(y, 'shape', False):
        print('Array passed for y')
    else:
        print('No array passed for y')
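One commonly used alternative is to compare against the None default directly and then coerce with np.asarray. This is a sketch, not the only valid approach; describe_y is a hypothetical name mirroring test_func. Probing for .shape has a subtle failure: a 0-d array such as np.array(5.0) has shape (), which is falsy, so getattr(y, 'shape', False) treats it as "no array".

Python code:
```python
import numpy as np


def describe_y(x, y=None):
    # Compare against the None sentinel directly. Probing for .shape is
    # fragile: a 0-d array has shape (), which is falsy, so
    # getattr(y, "shape", False) would report "no array" for it.
    if y is None:
        return "No array passed for y"
    y = np.asarray(y)  # also accepts lists, tuples and scalars
    return "Array passed for y with shape {}".format(y.shape)
```

For example, describe_y(1, [1, 2, 3]) returns "Array passed for y with shape (3,)".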

SelfOM
Jun 15, 2010

Thermopyle posted:

Also Guido wants to bring something like mypy into core python. In fact, you can use mypy with type annotations right now!
Python code:
from typing import Iterator

def fib(n: int) -> Iterator[int]:
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a+b
http://www.mypy-lang.org/tutorial.html

I really like this, but why not use syntax closer to Cython's? I guess this is more Pythonic, but it would be great if it could all converge so that optionally statically typed Python code could then be easily compiled with Cython. Cython is great because it already integrates with NumPy and C libraries.

SelfOM
Jun 15, 2010
Does anyone know whether, in matplotlib, you can create subplot axes dynamically without knowing beforehand how many subplots you will have? I want to be able to use methods like ax.add_patch in convenience functions that just add plots sequentially. The other option I was considering is using holder objects that call add_patch once everything is set up.
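The holder-object option can be sketched like this: queue one drawing callback per panel and only call plt.subplots once the panel count is known. DeferredFigure, draw_box and draw_disc are hypothetical names for illustration; the matplotlib calls (plt.subplots with squeeze=False, ax.add_patch, Rectangle, Circle) are standard.

Python code:
```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, Rectangle


class DeferredFigure:
    """Hypothetical helper: queue drawing callbacks, create the axes at the end."""

    def __init__(self):
        self._panels = []

    def add_panel(self, draw):
        # draw is any callable that takes an Axes, e.g. one calling ax.add_patch
        self._panels.append(draw)

    def render(self, **subplots_kw):
        # The panel count is known only now, so plt.subplots can size the grid.
        n = len(self._panels)
        fig, axes = plt.subplots(n, 1, squeeze=False, **subplots_kw)
        for draw, ax in zip(self._panels, axes[:, 0]):
            draw(ax)
        return fig, axes[:, 0]


def draw_box(ax):
    ax.add_patch(Rectangle((0.1, 0.1), 0.5, 0.3))


def draw_disc(ax):
    ax.add_patch(Circle((0.5, 0.5), 0.2))


panels = DeferredFigure()
panels.add_panel(draw_box)
panels.add_panel(draw_disc)
fig, axes = panels.render(figsize=(4, 6))
```

squeeze=False keeps axes two-dimensional even when there is only one panel, so render does not need a special case.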

SelfOM
Jun 15, 2010
Is there a way to have a variable-scale axis in matplotlib, i.e. compress 1-100 at 10:1, show 100-110 at 1:1, and then go back to 10:1? I could rescale the underlying data, but I want to avoid that.
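One way to get this without touching the data is a piecewise-linear custom scale via ax.set_xscale("function", functions=(forward, inverse)), which assumes matplotlib 3.1 or newer. The breakpoints LO, HI and the compression factor SQUEEZE are hypothetical values taken from the question. The data stays in original units, so tick labels are still real x values. The axvspan call shades the 1:1 region so readers can see where the scale changes.

Python code:
```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical breakpoints from the question: below LO and above HI are
# compressed SQUEEZE:1; the band [LO, HI) is shown at 1:1.
LO, HI, SQUEEZE = 100.0, 110.0, 10.0
Y_LO = LO / SQUEEZE       # display coordinate of x == LO
Y_HI = Y_LO + (HI - LO)   # display coordinate of x == HI


def forward(x):
    # Data -> display: piecewise linear and strictly increasing everywhere.
    x = np.asarray(x, dtype=float)
    return np.where(x < LO, x / SQUEEZE,
                    np.where(x < HI, Y_LO + (x - LO), Y_HI + (x - HI) / SQUEEZE))


def inverse(y):
    # Display -> data: the exact inverse of forward, piece by piece.
    y = np.asarray(y, dtype=float)
    return np.where(y < Y_LO, y * SQUEEZE,
                    np.where(y < Y_HI, LO + (y - Y_LO), HI + (y - Y_HI) * SQUEEZE))


x = np.linspace(1, 200, 400)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x / 5))
ax.set_xscale("function", functions=(forward, inverse))
ax.axvspan(LO, HI, alpha=0.2)  # shade the 1:1 band
fig.canvas.draw()
```

The default locator still picks evenly spaced data values, so ticks may crowd in the compressed regions; ax.set_xticks can pin them explicitly.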

SelfOM
Jun 15, 2010

QuarkJets posted:

You want to squish the data around another set of data, basically? I don't think that you should do this for many reasons that are non-programmatic, such as it could cause misinterpretation of your results. Make a second graph of the range that you're interested in, you could even plot it in the same figure, just please don't go loving with axes in weird ways like this.

People who don't understand variable axis scaling aren't the audience I care about. I've already tried the separate-graph approach, and the figures take up too much space.

SelfOM
Jun 15, 2010
The banded idea is smart.

Here is something similar, where the X-axis is scaled differently in different regions (but there is no explicit x-axis labeling, which is a problem for me, and looking at the code, the data itself is rescaled):
http://miso.readthedocs.org/en/fastmiso/_images/sashimi-plot-example.png

SelfOM
Jun 15, 2010

ShadowHawk posted:

Python2 will always be what gets run in the "python" command because it is brain-dead moronic for a Linux distribution to override that and break user scripts. Only one distro to my knowledge is hostile enough to users to have done that (I think Arch?)

Meanwhile everyone else is smart enough to have "python" run python2 like existing scripts expect and "python3" run python3, even when 100% of the system components are python3.

Distro scripts, for what it's worth, should explicitly declare the version of python they expect (eg #!/usr/bin/python2.7 or #!/usr/bin/python3.4) -- this tells you what remains to be tested and ported when you the distro maker are considering dropping an older version of python (say, 3.3). Sometimes porting is as simple as changing that line and seeing if anything breaks.

Yep, it's definitely Arch.
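Alongside an explicit shebang, as the quoted post recommends, a script can also fail fast with a readable message when run under the wrong interpreter. This is a complementary technique, not something from the quoted post; REQUIRED and check_python are hypothetical names, and the minimum version is an assumption. The message formatting avoids newer syntax so the guard itself still parses under Python 2.

Python code:
```python
#!/usr/bin/python3
# The shebang above pins the interpreter explicitly.
import sys

# Hypothetical minimum version this script has been tested against.
REQUIRED = (3, 4)


def check_python(required=REQUIRED, current=None):
    """Exit with a clear message instead of failing later on an API difference."""
    if current is None:
        current = tuple(sys.version_info[:2])
    if current < tuple(required):
        raise SystemExit("This script needs Python {}.{}+, found {}.{}".format(
            required[0], required[1], current[0], current[1]))
    return current
```

Calling check_python() at the top of the script's entry point is enough.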

SelfOM
Jun 15, 2010
What is the best way to generate a sorted random int array of variable length in Cython (the array is generated in a loop, which I don't know how to type)? I'm trying to weave two arrays together, where the switchover positions occur at the random indexes.
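A pure-NumPy alternative may avoid the typed loop entirely. Draw distinct switch positions, sort them, then use a cumulative count of switches to decide, element by element, which array to take from. This assumes NumPy 1.17 or newer for np.random.default_rng; random_switch_points and weave are hypothetical names, and starting from array a is an arbitrary choice.

Python code:
```python
import numpy as np


def random_switch_points(n, k, rng):
    # k distinct positions drawn from 1..n-1, returned in ascending order.
    # Drawing without replacement matters: a repeated index would silently
    # give fewer than k switches.
    return np.sort(rng.choice(np.arange(1, n), size=k, replace=False))


def weave(a, b, switch_points):
    # The output starts with elements of a and flips to the other array at
    # each switch point; the running count of switches decides which one.
    a = np.asarray(a)
    b = np.asarray(b)
    toggles = np.zeros(a.shape[0], dtype=np.int64)
    toggles[switch_points] = 1
    use_b = np.cumsum(toggles) % 2 == 1
    return np.where(use_b, b, a)
```

For example, weave(np.zeros(10), np.ones(10), [3, 7]) takes indexes 0-2 from the first array, 3-6 from the second, and 7-9 from the first again.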

SelfOM
Jun 15, 2010

Rosalind posted:

My questions now are: 1. How the heck does anyone ever learn how to use this? I'm the first to admit that I'm no programming guru, but I have some experience with several different languages and actually getting started with Python makes absolutely no sense. Everything feels like the biggest clusterfuck of dependencies and "install X, Y, and Z to get W to work but to get X to work install A and B which require C to run on Windows." 2. Is there a simple, idiot-proof (because that's what I am apparently) guide to moving from R to Python for data analysis and hopefully eventually machine learning?

I use both R and Python a lot. On Windows I would use Anaconda (or its minimal installer, Miniconda: https://conda.io/miniconda.html). It's not perfect, but I find myself using it for non-Python libraries as well. I don't recommend rpy2 at all for beginners: it's for importing R objects, and oftentimes you can just call R via subprocess and grab the output data instead. For R-style data frames, install pandas. Conda should handle the dependencies.
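The subprocess route can be sketched as follows: have R print CSV to stdout and parse it with pandas. This assumes the Rscript executable is on PATH; run_rscript and frame_from_csv_text are hypothetical helper names. write.csv with no file argument prints to the console.

Python code:
```python
import io
import subprocess

import pandas as pd


def run_rscript(r_code, rscript="Rscript"):
    # Assumes R's Rscript executable is on PATH. Runs the snippet and returns
    # whatever it printed to stdout; raises CalledProcessError if R fails.
    result = subprocess.run(
        [rscript, "-e", r_code],
        check=True, stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout


def frame_from_csv_text(text):
    # Parse CSV printed by R (e.g. via write.csv) into a pandas DataFrame.
    return pd.read_csv(io.StringIO(text))


# Usage, if R is installed:
# df = frame_from_csv_text(run_rscript("write.csv(head(mtcars), row.names = FALSE)"))
```

Plain CSV text keeps the two languages decoupled, at the cost of losing R-specific types such as factors.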


SelfOM
Jun 15, 2010

Eela6 posted:

np.sort defaults to quicksort, also has mergesort and heapsort. For integers in a restricted range, counting sort should be fastest of all - if you're really burning for speed, you can use that. But this should be pretty fast.

Thanks, this is the solution I'm using at the moment.
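The counting-sort suggestion in the quoted reply can be sketched with np.bincount. It only applies to non-negative integers, and memory grows with the largest value, so it pays off only when the range is small. counting_sort is a hypothetical name.

Python code:
```python
import numpy as np


def counting_sort(values):
    # Counting sort for non-negative integers in a small range: tally each
    # value with np.bincount, then emit each value as many times as it was seen.
    values = np.asarray(values)
    counts = np.bincount(values)
    return np.repeat(np.arange(counts.size), counts)
```

For example, counting_sort([3, 1, 2, 1]) gives [1, 1, 2, 3].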
