Nippashish
Nov 2, 2005

Let me see you dance!

Pollyanna posted:

Okay, this should be a (sorta) working version of Conway's Game of Life:

http://pastebin.com/7b8Zt6kf

It asks for a height and width, and currently doesn't toggle any cells. That comes once I figure out how to give it a GUI.

This should all probably be packaged up into a class so things like high and wide and the_world become member variables instead of globals, but this is orthogonal to my comments below.

Your code doesn't actually implement the correct rules for the game of life as described on the wiki page. I don't know if this is intentional or not.

You should name togglecell and checkneighbors as toggle_cell and check_neighbors respectively, to match the rest of your code and normal python convention.

neighbors would be clearer if you generate a list of all neighbors and then prune out the ones outside the grid like so (pretty formatting optional):
code:
def neighbors(x, y):
    my_neighbors = [
        (x-1, y-1), (x, y-1), (x+1, y-1),
        (x-1, y),             (x+1, y),
        (x-1, y+1), (x, y+1), (x+1, y+1),
        ]
    # keep only the positions inside the grid (wide and high are the existing globals)
    return [ (i,j) for i,j in my_neighbors if 0 <= i < wide and 0 <= j < high ]
You shouldn't call a variable neighbors inside a function called neighbors. Confusing things would start happening if you ever tried to call neighbors recursively.

toggle_cell can be simpler (it also isn't used anywhere):
code:
def toggle_cell(x, y):
    the_world[x,y] = not the_world[x,y]
next_round could just be replaced with the_world = updated_world unless you have some plan for why you need two copies of the updated world around.

check_neighbors is badly named for what it does. It really figures out what the location x,y should look like at the next time step.

I think a better structure would be to have a function like advance_time which would move your world one step into the future like this:
code:
the_world = advance_time(the_world)
The advance_time function takes a world dictionary as an argument and returns a new world dictionary that represents the state of the world one time step in the future. You can implement this function in terms of a next_state function that works like this:
code:
new_world[x,y] = next_state(the_world, x, y)
The next_state function takes a world and a position and returns the state of the cell at the given position one time step in the future.
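A minimal sketch of that structure, assuming the world is a dict mapping (x, y) to True/False and passing the grid size explicitly instead of using the globals wide and high. It uses the standard rules from the wiki page (a live cell with 2 or 3 live neighbors survives, a dead cell with exactly 3 becomes alive); the blinker at the end is just test data.

```python
def neighbors(x, y, wide, high):
    """Positions around (x, y) that lie inside a wide x high grid."""
    candidates = [
        (x-1, y-1), (x, y-1), (x+1, y-1),
        (x-1, y),             (x+1, y),
        (x-1, y+1), (x, y+1), (x+1, y+1),
    ]
    return [(i, j) for i, j in candidates if 0 <= i < wide and 0 <= j < high]


def next_state(world, x, y, wide, high):
    """State of cell (x, y) one time step in the future."""
    live = sum(world[pos] for pos in neighbors(x, y, wide, high))
    if world[x, y]:
        return live in (2, 3)   # survival
    return live == 3            # birth


def advance_time(world, wide, high):
    """Return a brand new world dict; the argument is never modified."""
    new_world = {}
    for x in range(wide):
        for y in range(high):
            new_world[x, y] = next_state(world, x, y, wide, high)
    return new_world


# A horizontal "blinker" in a 5x5 world turns vertical after one step.
wide, high = 5, 5
the_world = {(x, y): False for x in range(wide) for y in range(high)}
for x in (1, 2, 3):
    the_world[x, 2] = True
the_world = advance_time(the_world, wide, high)
print(sorted(pos for pos, alive in the_world.items() if alive))
# [(2, 1), (2, 2), (2, 3)]
```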

I know other people in the thread told you to use a dictionary to represent the world, but I think this is a really absurd data structure to represent a dense matrix. A 2d numpy array of booleans or integers is a much more sane choice imo.
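For comparison, a sketch of one step on a 2d numpy boolean array written in a vectorized style, with whole-array operations instead of a Python loop over cells. It assumes cells outside the grid count as dead (like the neighbors function above) and uses the standard rules; it is an illustration, not code from any pastebin in this thread.

```python
import numpy as np


def tick(world):
    """Return the next generation of a 2d boolean array; cells outside the grid count as dead."""
    rows, cols = world.shape
    # Surround the grid with a one-cell border of dead cells so that edge cells
    # only ever see neighbors that are inside the grid.
    padded = np.pad(world.astype(np.int8), 1, mode="constant")
    # Each shifted slice lines up one of the eight neighbors with every cell;
    # adding them gives the live-neighbor count for the whole grid at once.
    counts = sum(
        padded[1 + dy:1 + dy + rows, 1 + dx:1 + dx + cols]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth on exactly 3 neighbors, survival on 2 or 3.
    return (counts == 3) | (world & (counts == 2))


world = np.zeros((5, 5), dtype=bool)
world[2, 1:4] = True                 # a horizontal "blinker"
print(tick(world).astype(int))       # the blinker is now vertical
```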


Nippashish
Nov 2, 2005

Let me see you dance!

Pollyanna posted:

I renamed it "tick", so it'll return what the gameboard will look like next round (or "tick").
...
I think this is what I was trying to do with next_round, but it didn't work very well.

tick is a good name for that function.

Instead of having tick update the new board directly I recommend writing it so you use it like this:
code:
def tick(x, y, before):
    ...
    # return the next state for x,y (True or False)

...

next_world[x,y] = tick(x, y, array_world)
You should also probably have two tick functions, tick_world(before) and tick_position(x, y, before). tick_world is responsible for creating the new world and calling tick_position to fill in each cell and then returning the newly created world.

In general having functions create the values they return instead of accepting an argument that they mutate (like you do in tick now) will make your code a lot easier to reason about since you never need to worry that a function might change some data you give it.
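A sketch of the tick_world/tick_position split described above, assuming the world is a 2d numpy boolean array and the standard rules. The point is that tick_world allocates the new board itself, so the board passed in is only read and never modified.

```python
import numpy as np


def tick_position(x, y, before):
    """Next state (True/False) of cell (x, y); `before` is only read, never written."""
    rows, cols = before.shape
    # The 3x3 window around (x, y), clipped at the edges of the grid.
    window = before[max(x - 1, 0):min(x + 2, rows), max(y - 1, 0):min(y + 2, cols)]
    live = int(window.sum()) - int(before[x, y])   # don't count the cell itself
    return live == 3 or (bool(before[x, y]) and live == 2)


def tick_world(before):
    """Create and return a new world; `before` is left unchanged."""
    after = np.zeros_like(before)
    for x in range(before.shape[0]):
        for y in range(before.shape[1]):
            after[x, y] = tick_position(x, y, before)
    return after


board = np.zeros((5, 5), dtype=bool)
board[2, 1:4] = True
snapshot = board.copy()
next_board = tick_world(board)
print((board == snapshot).all())   # True: the caller's board was not mutated
```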

Nippashish
Nov 2, 2005

Let me see you dance!

Pollyanna posted:

Cool, thanks. I updated it:

tick_world is still doing the mutate-its-argument thing.

Pollyanna posted:

Now I'm trying to figure out how to make it return an image of the array.

PIL has an Image.fromarray function for creating an image out of a numpy array. Alternatively you can use matplotlib.
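A minimal sketch with PIL's Image.fromarray, assuming the world is a 2d numpy boolean array. world_to_image and its scale parameter are hypothetical helpers, not part of PIL; the scaling just blows each cell up into a visible block of pixels.

```python
import numpy as np
from PIL import Image


def world_to_image(world, scale=10):
    """Render a 2d boolean array as a grayscale PIL image (live = white, dead = black)."""
    pixels = np.where(world, 255, 0).astype(np.uint8)            # 8-bit gray values
    # Repeat each cell scale times along both axes so it becomes a block.
    pixels = np.repeat(np.repeat(pixels, scale, axis=0), scale, axis=1)
    return Image.fromarray(pixels)                                # 2d uint8 array -> mode "L"


world = np.zeros((5, 8), dtype=bool)
world[2, 1:4] = True
img = world_to_image(world)
print(img.size, img.mode)   # (80, 50) L, since PIL reports size as (width, height)
# img.save("world.png") would write it to disk
```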

Nippashish
Nov 2, 2005

Let me see you dance!
Since we're all comparing GOL implementations now here's mine: http://pastebin.com/zLu8eNWc. I've included the interface Dren's flatfile_gol.py expects for reading/writing grids, but everything of interest happens in tick().

Some times for different sized grids:
code:
50x50
1000 loops, best of 3: 1.82 msec per loop
200x200
10 loops, best of 3: 26.2 msec per loop
1000x1000
10 loops, best of 3: 652 msec per loop
For comparison, Dren's code takes 325 msec per loop with a 200x200 grid on my machine.

Nippashish
Nov 2, 2005

Let me see you dance!

the posted:

I'm trying to refer to each point1, point2, point3 while I cycle through the for loop. Anyone?

Please avoid wanting to do that.
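The usual alternative to numbered variable names is a single list (or a dict keyed by name) that you iterate over or index into. A minimal sketch; the coordinates are made up for illustration.

```python
# Instead of point1, point2, point3 = ..., keep the points in one list.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

for i, (x, y) in enumerate(points):
    print("point", i + 1, "is at", x, y)

# Anything you would have called point2 is now points[1].
dx = points[1][0] - points[0][0]
dy = points[1][1] - points[0][1]
distance = (dx ** 2 + dy ** 2) ** 0.5
print(distance)   # 5.0
```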

Nippashish
Nov 2, 2005

Let me see you dance!

the posted:

Difficult and abstract question, but how can I figure out how to speed my code up? It runs unbearably slow at the moment, taking 5-10 minutes to do just a bit of what I need.

Difficult and abstract answer, but try to vectorize it. For example, instead of writing
code:
for i in range(0,num,1):
        theta[i] = numpy.tan(v[i,0]/v[i,1])
write
code:
theta = numpy.tan(v[:,0]/v[:,1])
Numpy isn't a magical thing where using its types makes your code faster; you need to write vectorized code if you want to see any of the benefits.
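A small check, using made-up random input, that the loop and the vectorized line above compute the same theta. Timing the two with timeit on your own data is the easiest way to see how much the vectorized version saves.

```python
import numpy as np

rng = np.random.default_rng(0)
num = 10_000
v = rng.uniform(1.0, 2.0, size=(num, 2))   # made-up input: num rows of (a, b)

# Loop version: one Python-level iteration (and one numpy call) per row.
theta_loop = np.empty(num)
for i in range(num):
    theta_loop[i] = np.tan(v[i, 0] / v[i, 1])

# Vectorized version: a single call that works on whole columns at once.
theta = np.tan(v[:, 0] / v[:, 1])

print(np.allclose(theta_loop, theta))   # True
```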

Nippashish
Nov 2, 2005

Let me see you dance!

the posted:

Tried to find the first 100 prime numbers in 10 lines or less. I'm at 12. How can I make it smaller?

This is the shortest thing I could come up with:
code:
from itertools import count, islice, takewhile, chain
def primes(): return chain([2], (n for n in count(3) if not any(n % p == 0 for p in takewhile(lambda x: x<=n/2, primes()))))
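print(list(islice(primes(), 100)))
It is extremely inefficient: for every candidate n it builds a brand new primes() generator (which recursively does the same thing for its own candidates) and trial-divides n by the primes up to n/2.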

Nippashish
Nov 2, 2005

Let me see you dance!

Pollyanna posted:

Something something big O notation. How would you try and optimize that algorithm anyway? What's the usual approach to something like that?

The usual approach to better prime finding algorithms is "lots of number theory," which is not really a typical case.
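That said, the classic first step up from trial division needs nothing beyond the definition of a prime: the sieve of Eratosthenes. A sketch; first_primes is a hypothetical helper that keeps doubling the sieve limit until it has found enough primes.

```python
def sieve(limit):
    """Return all primes <= limit using the sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            # Every multiple of n from n*n upward is composite.
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return [n for n, prime in enumerate(is_prime) if prime]


def first_primes(count):
    """Return the first `count` primes, growing the sieve until it is big enough."""
    limit = 16
    while True:
        primes = sieve(limit)
        if len(primes) >= count:
            return primes[:count]
        limit *= 2


print(first_primes(10))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```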

Nippashish
Nov 2, 2005

Let me see you dance!

FoiledAgain posted:

I have a question about sorting in Python. I have a list of lists, and each of the sublists contains only floats. Some of the lists have exactly one value of 1.0 and all other values are 0.0. Some of the lists have more variety in their values. Each list is the same length. I want to sort these lists such that all of the lists with exclusively 1.0 and 0.0 values come first, and all the other lists follow. Is this something I can do with the key keyword that sort() uses? Or do I need a more complicated solution? (A quick technical explanation is that these are values in a Markov chain's transition matrix before they get put into a numpy matrix. I want to sort all the absorbing states before the non-absorbing states.)

Write a key function that returns 0 for the rows you want at the start and 1 for all the others.
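A minimal sketch, assuming the rows are plain Python lists of floats as described; absorbing_first is a hypothetical name. Exact comparison with 0.0 and 1.0 is fine here because those values are stored exactly. Since list.sort is stable, rows keep their original relative order within each group.

```python
def absorbing_first(row):
    """Sort key: 0 for rows with exactly one 1.0 and every other entry 0.0, else 1."""
    is_absorbing = row.count(1.0) == 1 and all(v in (0.0, 1.0) for v in row)
    return 0 if is_absorbing else 1


rows = [
    [0.2, 0.5, 0.3],
    [0.0, 1.0, 0.0],
    [0.1, 0.1, 0.8],
    [1.0, 0.0, 0.0],
]
rows.sort(key=absorbing_first)
print(rows)
# [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]
```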

Nippashish
Nov 2, 2005

Let me see you dance!

Mortimer posted:

I'm trying to use python to make my data analysis more automated but I can't seem to find a way to use its csv module to perform operations between columns.

I've got a column called "current" and a column called "voltage". I want this program to take each line of voltage and divide it by each line of current and put the resistance in a third column.

I've tried both reader and dictreader methods, dictreader being easier to convert both into floats, but I can't seem to do the line by line division. I think csv might work like making an array of arrays, at least that's what I think after voltage[2] returns "l" (in non-dictreader). Anyone have experience with csv or can point me to a solution? I feel like I've looked nearly everywhere and read the documentation a hundred times already.

You should use pandas for dealing with tables of data in python.
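A minimal pandas sketch of the voltage/current case. The inline CSV text and the output filename are made up for illustration; in practice you would pass the path of your own file to pd.read_csv.

```python
import io

import pandas as pd

# Stand-in for the real CSV file; the column names match the question.
csv_text = io.StringIO("voltage,current\n5.0,0.5\n12.0,2.0\n9.0,3.0\n")

df = pd.read_csv(csv_text)                        # numeric columns are parsed as floats
df["resistance"] = df["voltage"] / df["current"]  # divides row by row, no explicit loop
print(df)
# df.to_csv("with_resistance.csv", index=False) would write the new table out
```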

Nippashish
Nov 2, 2005

Let me see you dance!

Pollyanna posted:

Fake edit: Apparently that's sort of what mod_python does, but I'm still weirded out by that test. I can foresee nothing but tears coming from this.

I don't really see how pml is inherently less secure than having your browser run whatever shifty javascript code a webpage pulls in.

Nippashish
Nov 2, 2005

Let me see you dance!

Plorkyeran posted:

What makes you think that?

I'm not sure what made me think that, and I realized after I made the post that that part didn't make sense, which is why I edited it out.

Nippashish
Nov 2, 2005

Let me see you dance!

QuarkJets posted:

Basically what I'm saying is that the Notepad and Windows cmd route takes all of the fun out of Python programming and requires a new user to go through a lot of unnecessary bullshit, whereas a basic IDE lets you get started and learning Python right away. Even the distributors of Python recognize that, which is why the Python.org distribution comes with an IDE

This. Insisting new users start with the command line and vanilla packages is like insisting that people learn assembly before a higher level programming language.

Nippashish
Nov 2, 2005

Let me see you dance!

salisbury shake posted:

Has anyone used NLTK? It's py2k only, but suits my needs for funny text generation. Above was generated from a trigram model of this thread's last 50 pages.

NLTK is my go-to for text wrangling in python, although I'm far from an expert in it. (I'm 100% python 2.7 so the version restriction doesn't bother me). I made this thing one bored evening a while back http://uligen.herokuapp.com/. I also used a trigram model iirc.

Nippashish
Nov 2, 2005

Let me see you dance!

QuarkJets posted:

Is there an easy way to just save a plot as .png? Is there an easy way to save a plot as .png without displaying it first? I like your interface a lot more than matplotlib's, so I'm interested in using it to actually plot scientific data for papers and such.

I too am interested in this, except I want to save PDFs. I'd like to be able to plot and save to PDF without ever displaying it on the screen or having a human in the loop to push a button.
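For comparison only (this is not the interface being discussed), matplotlib can already do this headlessly: select the non-interactive Agg backend and call savefig, which picks the output format from the file extension. A sketch; the data and the temporary output directory are made up.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")            # non-interactive backend: no window is ever opened
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
ax.set_xlabel("x")
ax.set_ylabel("x squared")

out_dir = tempfile.mkdtemp()
for ext in ("png", "pdf"):
    # The format is inferred from the file extension.
    fig.savefig(os.path.join(out_dir, "plot." + ext))
plt.close(fig)
```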

Nippashish
Nov 2, 2005

Let me see you dance!

Lysidas posted:

I think I can count how many times I've been disappointed with the Python standard library on one hand.

I wish they'd add a prod function to go with sum. As far as I can tell the rationale for not having one is that no one would have a use for it. I would use it :colbert:

Nippashish
Nov 2, 2005

Let me see you dance!

fritz posted:

reduce+operator.mul?

Good point, we should probably get rid of sum in that case; people can use reduce+operator.add and be grateful for it.
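For reference, the reduce+operator.mul workaround fritz mentions, as a small helper. Passing 1 as the initial value is what makes the empty case return 1 (mirroring sum([]) == 0) instead of raising TypeError. Python 3.8 later added math.prod, which behaves the same way for numbers.

```python
import operator
from functools import reduce


def prod(values):
    """Multiply everything in values together; the empty product is 1."""
    return reduce(operator.mul, values, 1)


print(prod([2, 3, 7]))   # 42
print(prod([]))          # 1
```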

Nippashish
Nov 2, 2005

Let me see you dance!

Suspicious Dish posted:

Just write a method to call a*b*c.

Do you think having sum in the standard library was a mistake?

Nippashish
Nov 2, 2005

Let me see you dance!
I'm not buying this edge case argument. Literally every function ever written has edge cases.

BeefofAges posted:

Just write a library that spins up a hadoop cluster to do your prod operation via mapreduce, then make it part of the standard library.

I think this is the optimal solution.

Nippashish
Nov 2, 2005

Let me see you dance!

Suspicious Dish posted:

prod and sum to me, are such simple things that there isn't a benefit in making them builtin. I'm in favor of builtins that are non-trivial to implement, and have clear meanings.

Ugh, you are the reason C++ added concurrency to the standard library but didn't add semaphores, and also the reason why boost.ublas doesn't have determinants or inverses. Python is supposed to be batteries included.

Nippashish
Nov 2, 2005

Let me see you dance!
I usually solve this by making the callback a class with a __call__ method, i.e.
code:
class MyCallback(object):
    def __call__(self, whatever, bro):
        return bro
which pickle handles better than lambdas and nested functions (i.e. at all). This is a bit more annoying for the user though.
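A sketch of why this helps, using a hypothetical Scale callback: an instance of a module-level class pickles as a reference to its class plus its attribute values, so any parameters travel with it, whereas a lambda cannot be pickled by the standard pickle module.

```python
import pickle


class Scale(object):
    """Picklable stand-in for `lambda x: x * factor`."""

    def __init__(self, factor):
        self.factor = factor          # state that a closure would otherwise capture

    def __call__(self, x):
        return x * self.factor


double = pickle.loads(pickle.dumps(Scale(2)))
print(double(21))   # 42

try:
    pickle.dumps(lambda x: x * 2)
except (pickle.PicklingError, AttributeError) as err:
    print("lambda could not be pickled:", err)
```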

Nippashish
Nov 2, 2005

Let me see you dance!

Cheekio posted:

Is there an easy way to do this? It seems like the thing 'for's and 'while's were designed to do.
Something like this:
code:
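from string import ascii_lowercase

def n_letters(n):
    # yields every string of n lowercase letters, in alphabetical order
    if n == 0:
        yield ""   # exactly one string of length 0; `return ""` here would yield nothing
        return

    for i in ascii_lowercase:
        for tail in n_letters(n-1):
            yield i + tail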
but in reality you should do what KICK BAMA KICK said and use itertools.
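The itertools version, as a sketch: itertools.product with repeat=n produces the same strings, in the same alphabetical order the recursive generator above is meant to, without any recursion.

```python
from itertools import product
from string import ascii_lowercase


def n_letters(n):
    """Yield all length-n strings of lowercase letters: 'aa', 'ab', ..., 'zz' for n=2."""
    for letters in product(ascii_lowercase, repeat=n):
        yield "".join(letters)


words = list(n_letters(2))
print(len(words), words[:3], words[-1])   # 676 ['aa', 'ab', 'ac'] zz
```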

Nippashish
Nov 2, 2005

Let me see you dance!

OnceIWasAnOstrich posted:

Today I spent some time testing out some scripts that fit a lot of linear models. I'm using statsmodels for the formula building and fitting and pandas for the data. I started off using my generic Anaconda environment which has all of the Accelerate packages installed, like MKL numpy. When I fit my model (~250 data points, 4 predictors) it uses all 16 threads on my processor and takes about 250ms. This seemed kind of slow so I compared it to the standard numpy environment to see if my model fitting is just really really slow that way. It turns out standard numpy performs the same fit in 60% of the time using only one thread.

Your models are very small so I'm not surprised you're losing time to overhead from multiple cores. Try using the anaconda + accelerate with MKL_NUM_THREADS=1 set in your environment; this will use MKL but will tell it to only use one thread. If your models are independent then consider parallelizing over models instead of inside the optimization. I like to use joblib for this since it has a very nice and painless interface for executing task-parallel jobs using multiprocessing.
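A sketch of parallelizing over models with joblib's Parallel and delayed, assuming the models are independent. The numpy least-squares fit stands in for the statsmodels formula fit and the data are synthetic. Setting MKL_NUM_THREADS from inside the script only takes effect if it happens before numpy is first imported, so setting it in the shell is more reliable.

```python
import os
os.environ.setdefault("MKL_NUM_THREADS", "1")   # only effective before numpy is imported

import numpy as np
from joblib import Parallel, delayed


def fit(X, y):
    """Ordinary least squares for one small model (stand-in for a statsmodels fit)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(0)
true_coef = np.array([1.0, -2.0, 0.5, 3.0])
datasets = []
for _ in range(8):
    X = rng.normal(size=(250, 4))          # ~250 points, 4 predictors
    datasets.append((X, X @ true_coef))    # noiseless, so each fit recovers true_coef

# Parallelize across models instead of across threads inside each tiny fit.
results = Parallel(n_jobs=2)(delayed(fit)(X, y) for X, y in datasets)
```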

Nippashish
Nov 2, 2005

Let me see you dance!

OnceIWasAnOstrich posted:

I've been using the map function of multiprocessing Pools to parallelize the normal version, which seems a lot simpler than bringing in a whole other library to do the same thing with extra syntax and imports.

The embarrassingly parallel for loops thing I linked to essentially does this but with less boilerplate. If you're already set up using multiprocessing directly then there is not much point to switch to something new though. I just mention it because I like their interface a lot more than using multiprocessing directly.

Nippashish
Nov 2, 2005

Let me see you dance!
Has anyone worked with remote interpreters in pycharm? I would like to run pycharm locally on my laptop for development, but the code I am writing needs to run on a remote server because it needs access to special hardware that my laptop doesn't have.

I can set up pycharm on my laptop to invoke python on the server when I press ctrl+R to run, but I don't understand how to have pycharm synchronize the code between my laptop and the server. It seems like I can set up a deployment in pycharm to copy the code to the server, but then I would need to run the upload manually each time I make changes and I don't want to do that.

I have also tried storing the code on the server and accessing it from my laptop through sshfs, but pycharm is completely unusable with this setup (frequently goes unresponsive for multiple minutes, locks up completely if the internet connection goes down, etc).

I think pycharm would be happy if I set up sshfs in the other direction (storing the files on my laptop and mounting my laptop via sshfs from the server) but I am frequently in places where I can't accept incoming connections so I don't see how I can make this viable. I'd also need to log in to the server to re-mount my laptop each time I start working which is not ideal.

Nippashish
Nov 2, 2005

Let me see you dance!

ahmeni posted:

Have you tried PyCharm's debug server?

That being said, I can highly advise writing mock services if you need to test against specific hardware. It's a bit of work up front but it saves heaps of trouble in the long run because you can write unit tests that both validate locally against the mock and remotely against the hardware on deployment.

That is more complicated than what I am trying to do (although remote debugging is something I'd like to get working eventually), and also doesn't solve the problem I'm having. Step 2 in "To prepare for remote debugging" on that page is "Copy the local script to the remote location, where you want to debug it.", which is precisely what I want pycharm to handle for me.

Right now I have a nice tweak script -> run script -> get results workflow on my local machine. I'd like to keep this workflow but have the "run script" step execute on the server instead of on my laptop. Something like tramp mode in emacs would be fantastic. Of course I can just work on the server directly with vim+ssh but I'd much rather use pycharm because it is very nice.

Being able to run the code from within pycharm isn't really essential (maybe my original question was a bit misleading), what I really want is something to handle the synchronization. I don't want to use source control for this because a big part of what I'm doing is making tiny changes between runs (like changing a 0.1 to 0.15 to see what happens) and I don't want a million tiny commits with useless parameter changes, and I also don't want tweak -> run to turn into tweak -> commit to git -> push to remote -> switch to server -> pull from remote -> run.

I don't think mocking the hardware makes sense here because what I really care about is the program output. The special hardware I'm using is a GPU to do fast linear algebra and as far as I know mocking this would mean either making up numbers (which doesn't help me because I need the non-made-up numbers) or running the calculations on the CPU, which is what I'm doing now, but the whole point of using the GPU is that it's a lot faster.

Nippashish fucked around with this message at 15:43 on Apr 23, 2014

Nippashish
Nov 2, 2005

Let me see you dance!

vikingstrike posted:

Install pycharm on the remote server and forward it over X?

I tried this too but the server is far away so it is pretty painful. Dropbox is a good idea though, I don't know why I didn't think of that.

Nippashish
Nov 2, 2005

Let me see you dance!

SurgicalOntologist posted:

I think PyCharm can automate sshfs for you. Check out Tools > Deployment > Automatic Upload. You will have to configure first using Tools > Deployment > Configure.

I eventually found a stack overflow post this afternoon that pointed me at this option, and it does the trick. I've got remote execution and debugging working and it's really quite nice now that I know how to set it up.

Nippashish
Nov 2, 2005

Let me see you dance!

BigRedDot posted:

I work for Continuum, and I wrote the original version of conda but it has been taken over and taken much further by others in the last year. These days I mostly work on Bokeh. Happy to answer any Anaconda questions, though!

When is conda going to support activating environments by relative path? It's silly to need to source activate $(pwd)/venv instead of just source activate venv to activate an environment in my working directory. Why does conda want to keep my environments in a central location by default? I don't understand the workflow that fits with. (I use conda every day tyvm for writing it).

Nippashish
Nov 2, 2005

Let me see you dance!

BigRedDot posted:

Yah that was an early decision, we wanted simple named environments but also didn't want to have to introduce a persistence layer to map those names to absolute paths. conda started out as just a devops tool for us, if I'd known how popular it would become I probably would have made a different decision. Still I can imagine it would not be too difficult to add this kind of behavior: source activate -r <relpath> or something similar. I'll mention this to Aaron and Ilan but the best way is to make a GH issue (https://github.com/conda/conda) or ask about it on the mailing list. (Edit: or even better a Pull Request!)

More Edit: I think source activate ./myenv should work too?

I did ask on github a while ago but I don't think it's high on anyone's priority list (it's not high on mine either, since I've got my own scripts around conda at this point). It is nice that source activate ./myenv works though. I remember trying this back when I started using conda and back then it would just add ./myenv to my path which obviously doesn't work if you change to a different working directory. This seems to be fixed though.

Nippashish
Nov 2, 2005

Let me see you dance!
As long as you don't care about using new language features (last time I used jython it implemented python 2.6) and don't care about C extensions, CPython and jython are basically identical from a programmer's perspective.

Nippashish
Nov 2, 2005

Let me see you dance!

the posted:

I have a list that contains two columns: Column A is made up a mix of 4 different strings, and Column B is True or False.

I want to easily get a count of one of the strings in Column A and the amount of Trues.

If you want to deal with tables of data in python you should very seriously consider using pandas.
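A minimal pandas sketch with made-up rows; the column names A and B follow the question. The groupby line answers the question for every string at once, and the last two lines do it for a single string.

```python
import pandas as pd

# Made-up rows: column A is one of four strings, column B is True/False.
rows = [
    ("red", True), ("blue", False), ("red", False),
    ("green", True), ("red", True), ("yellow", False),
]
df = pd.DataFrame(rows, columns=["A", "B"])

# For each string in A: how many rows it appears in, and how many of those are True.
summary = df.groupby("A")["B"].agg(["count", "sum"])
print(summary)

# Just one string:
red = df[df["A"] == "red"]
print(len(red), red["B"].sum())   # 3 2
```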

Nippashish
Nov 2, 2005

Let me see you dance!

Jose Cuervo posted:

The parameter filter is optional, but if provided will be a string (e.g., 'agent'). I am not sure why (as far as I can tell) the if statement works, i.e., when filter is not specified the code following the if statement is not executed, but if filter is set to something other than None then the code following the if statement is executed.

Any clarification as to why this works would be appreciated.

If you don't specify a filter then it gets set to None. If the filter is not None then the code inside the if is run.
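A minimal sketch of the pattern being described; select is a hypothetical function, and the parameter is named filter only to match the question (inside the function it shadows the builtin filter).

```python
def select(items, filter=None):
    """Return items, keeping only those equal to filter if one was given."""
    # filter defaults to None, so this test is False unless the caller passed something.
    if filter is not None:
        items = [item for item in items if item == filter]
    return items


people = ["agent", "customer", "agent"]
print(select(people))            # ['agent', 'customer', 'agent']
print(select(people, "agent"))   # ['agent', 'agent']
```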

Nippashish
Nov 2, 2005

Let me see you dance!

Jose Cuervo posted:

I understand that part. What is unclear to me is why "None is not None" evaluates to False, but something like "'agent' is not None" evaluates to True.

There is only one None in python. That's why None is None. On the other hand, is is sometimes pretty weird:
code:
Python 2.7.6 |Continuum Analytics, Inc.| (default, Jan 10 2014, 11:23:15) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> None is None
True
>>> 123456789 is 123456789
True
>>> a = 123456789
>>> b = 123456789
>>> a is b
False
>>> a is 123456789
False
>>> a = 5
>>> b = 5
>>> a is b
True

Nippashish
Nov 2, 2005

Let me see you dance!

Lyon posted:

So if you want to know if two variables point to the same object you use is, if you want to know if two values are equal you use ==.

I was just trying to point out that when two things are actually the same object is not always intuitively obvious.
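A small example with lists, where the distinction Lyon describes is predictable (unlike the integer results in the session above, which depend on CPython implementation details).

```python
a = [1, 2, 3]
b = [1, 2, 3]      # a separate list that happens to hold equal values
c = a              # a second name for the same list object

print(a == b, a is b)   # True False
print(a == c, a is c)   # True True

c.append(4)             # mutating through c is visible through a...
print(a)                # [1, 2, 3, 4]
print(b)                # ...but not through b: [1, 2, 3]
```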

Nippashish
Nov 2, 2005

Let me see you dance!

KernelSlanders posted:

I'm sure there's a "right" way to do this, but it's not obvious to me what it is.

The right way to do this is to use row vectors. Everything goes a little bit smoother in numpy if your data are in rows instead of in columns.

Nippashish
Nov 2, 2005

Let me see you dance!

SurgicalOntologist posted:

But the real answer to this, that goes along with Nippashish's answer, is to avoid growing arrays altogether.

This is true, it often makes sense to build a list of vectors and then concatenate them into a matrix at the end instead of growing an array like you would in matlab, but that doesn't solve the shape issue. What I was trying to say is that if you work with row vectors instead of column vectors then you don't need to deal with any reshaping/newaxis rejiggering to get them to concatenate nicely into a matrix.

e: For example:

code:
np.vstack((M.T, 2*M[:,0] - M[:,1])).T
becomes

code:
np.vstack((M, 2*M[0] - M[1]))
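A sketch checking that the two vstack lines above build the same thing, plus the build-a-list-then-stack-once pattern. C and R are hypothetical names for column- and row-oriented copies of the same random, made-up data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Column convention: each data vector is a column of C (two vectors of length 3).
C = rng.normal(size=(3, 2))
C_new = np.vstack((C.T, 2 * C[:, 0] - C[:, 1])).T   # transposes in and out

# Row convention: the same vectors stored as rows of R.
R = C.T
R_new = np.vstack((R, 2 * R[0] - R[1]))             # no transposes needed

print(np.allclose(C_new, R_new.T))   # True: same data, just transposed

# Growing: collect row vectors in a list and stack once at the end.
rows = []
for k in range(4):
    rows.append(np.arange(3) * k)    # stand-in for a computed vector
M = np.vstack(rows)                  # shape (4, 3), no reshaping or newaxis
```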

Nippashish fucked around with this message at 07:55 on Jul 24, 2014

Nippashish
Nov 2, 2005

Let me see you dance!

SurgicalOntologist posted:

No, unless the use-system-site-packages flag is used creating the virtualenv, which is not a good idea for us.

Anyways it's not desirable to have the same system package in every environment because if/when it gets upgraded we will inevitably start having the situation where different environments want a different version, which is the whole reason to use environments.

Which actually, my plan wouldn't support either. Each env would have a specific version of the make install part but a system-wide version of the C application part (in /opt and the symlink). Ugh. I think I'll package the entire upstream bundle as a conda package then, that should work.

I usually solve this by having an env folder in my project root that I install software into. I end up with a structure like my_project/env/lib and my_project/env/bin and whatever else the installers create. It's usually pretty easy to coerce software to install like this, although sometimes you need to fiddle around a bit if you need to install multiple things that depend on each other (e.g. telling the linker for project B to look in my_project/env/lib to find lib A). I also have a bash script in the project root that rebuilds everything in env from scratch, so if something gets messed up in my environment I can just nuke the whole thing and start over.

My setup sounds more or less like what you were planning on doing, except I don't ever put things into system folders.

Nippashish
Nov 2, 2005

Let me see you dance!

the posted:

Are there that many? Let's say I don't need to be exact, but like 95% accurate on an idea of how many males/females I have in this list.

Maybe check what kind of coverage you have from these lists: http://www.census.gov/genealogy/www/data/1990surnames/names_files.html


Nippashish
Nov 2, 2005

Let me see you dance!

duck hunt posted:

Should I be so freaked out?

No, you should be learning how to make a hash table.
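A teaching sketch of a hash table using separate chaining; the class and its resize threshold are made up for illustration (Python's own dict uses open addressing instead).

```python
class HashTable(object):
    """A minimal hash table: a list of buckets, each a list of (key, value) pairs."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]
        self.size = 0

    def _bucket(self, key):
        # hash() turns the key into an int; the modulo picks one bucket.
        return self.buckets[hash(key) % len(self.buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.size += 1
        if self.size > 2 * len(self.buckets):
            self._resize(2 * len(self.buckets))   # keep the buckets short

    def __getitem__(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def __len__(self):
        return self.size

    def _resize(self, n_buckets):
        old_items = [item for bucket in self.buckets for item in bucket]
        self.buckets = [[] for _ in range(n_buckets)]
        for k, v in old_items:
            self._bucket(k).append((k, v))


table = HashTable()
table["apple"] = 1
table["apple"] = 2
print(table["apple"], len(table))   # 2 1
```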

  • Locked thread