  • Locked thread
Nippashish
Nov 2, 2005

Let me see you dance!

FoiledAgain posted:

I have two agents, call them Sender and Receiver. The Sender chooses two items x,y from a set of items and combines them into z, which is sent to the Receiver. On the basis of z, the Receiver should be able to work out what x and y are. It would be ideal if this could be done imperfectly - I actually want the Receiver to occasionally deduce x or y incorrectly. The items x and y are generated from a set of binary features, they could be represented as a sequence of 1s and 0s.

"Error correcting code" is probably a good search term to start with.

I'm imagining a process like this:

1. Sender combines x and y in a reversible way to get z (e.g. concatenate their bitstring representations)
2. Sender encodes z using an error correcting code.
3. Receiver gets z', which is the encoded z with some bits flipped. Choose the number of flipped bits randomly around the error correction threshold of the code you're using. If you pick a number below the threshold then Receiver can recover z from z' and thus decode x and y correctly; if you pick one above the threshold then z is no longer guaranteed to be recovered and the x and y Receiver sees may not be correct. Adjust the amount of noise to control how often Receiver makes mistakes.
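A minimal sketch of those three steps, assuming a simple repetition code as the error correcting code (each bit sent several times, decoded by majority vote). That choice is only a stand-in to keep the example short; a real design would more likely use something like a Hamming or BCH code. All function names here are made up for illustration.

```python
import random

def combine(x, y):
    # Step 1: a reversible combination, here plain concatenation of bit lists
    return x + y

def split(z, n_x):
    # Inverse of combine: the first n_x bits are x, the rest are y
    return z[:n_x], z[n_x:]

def encode(z, copies=5):
    # Step 2 (assumed code): repetition code, every bit is sent `copies` times
    return [bit for bit in z for _ in range(copies)]

def add_noise(codeword, n_flips, rng):
    # Step 3: flip exactly n_flips distinct, randomly chosen positions
    noisy = list(codeword)
    for i in rng.sample(range(len(noisy)), n_flips):
        noisy[i] ^= 1
    return noisy

def decode(noisy, copies=5):
    # Majority vote inside each block of `copies` received bits
    return [int(sum(noisy[i:i + copies]) > copies // 2)
            for i in range(0, len(noisy), copies)]

def transmit(x, y, n_flips, rng, copies=5):
    codeword = encode(combine(x, y), copies)
    z_hat = decode(add_noise(codeword, n_flips, rng), copies)
    return split(z_hat, len(x))
```

With copies=5 any 2 flipped bits are always corrected; beyond that, Receiver only goes wrong when 3 or more flips land in the same block. To get occasional mistakes you could draw the number of flips around that threshold, e.g. n_flips = max(0, int(round(rng.gauss(3, 1)))), and tune the mean until the error rate looks right.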


Nippashish
Nov 2, 2005

Let me see you dance!

Winkle-Daddy posted:

Since I'm not wanting to increment but save the result of my query to a variable each time, can I get around this by declaring a new cursor variable each time?

Have you considered saving the result of your query to a variable each time instead?

Nippashish
Nov 2, 2005

Let me see you dance!

Seaside Loafer posted:

I mean I'd like to be able to say in pseudocode:

code:
collectionofdata = new collection
collectionofdata.add ["zaslkjas", "asldkj", "fdsdf"]
collectionofdata.orderme

You can do this:
code:
collectionofdata = ["zaslkjas", "asldkj", "fdsdf"]
collectionofdata.sort()
sort() also takes some options (key and reverse) if you want to change the order, see here.
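A quick sketch of those two options on the same list. Note that sort() works in place and returns None, so each result is copied out before the next sort:

```python
collectionofdata = ["zaslkjas", "asldkj", "fdsdf"]

collectionofdata.sort()                 # in place, alphabetical
alphabetical = list(collectionofdata)

collectionofdata.sort(reverse=True)     # in place, reverse alphabetical
reverse_alphabetical = list(collectionofdata)

collectionofdata.sort(key=len)          # in place, shortest string first
by_length = list(collectionofdata)
```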

Nippashish
Nov 2, 2005

Let me see you dance!

Thermopyle posted:

Before I go and spend a bunch of time trying out a bunch of different libraries, anyone have any suggestions for making graphs with python?

I'm looking for something simple to use and that makes pretty graphs easily. I can barely make graphs with Excel, and I barely want to make these graphs anyway so I don't have a lot of motivation to learn something complex.

CairoPlot almost fits the bill, but it hasn't had a release since 2008.

I use matplotlib for python plots. Their pyplot module is pretty easy to use.
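For a sense of how little pyplot needs, here is a minimal line plot. The Agg backend line just makes it write to a file without opening a window, and the file name is arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # draw to files only; drop this line to get an interactive window
import matplotlib.pyplot as plt

x = list(range(10))
y = [v ** 2 for v in x]

plt.plot(x, y, marker="o", label="y = x**2")
plt.xlabel("x")
plt.ylabel("y")
plt.title("A minimal pyplot line plot")
plt.legend()
plt.savefig("squares.png")  # hypothetical output file name
```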

Nippashish
Nov 2, 2005

Let me see you dance!

JetsGuy posted:

Pretty much the first thing I do when I get a new machine is Python 2.7, NumPy, SciPy, PyFits and matplotlib.

If you're using python for scientific computing I highly recommend Enthought's python distribution if you're not already using it. Its killer feature IMO is that it comes with numpy backed by Intel's MKL, meaning numpy is really, really fast. It's free if you're attached to a university.

Nippashish
Nov 2, 2005

Let me see you dance!

JetsGuy posted:

So lets say I'm running something where not only the counts but the time bin sizes can change depending on what I'm doing. I want the code to be able to tell me exactly where counts=95 or what's the value at time=5. I can't just index it for one case because it will be different every loop. For cases like this, I just reuse a simple function I wrote years ago that works much like Excel's Lookup:

The stupid question part is - surely there's a built-in function to do this that I just don't know about. What could it be?

You can use boolean arrays as indexes, so "what's the count at time == 5" can be written as counts[time==5]. To get the indices where counts==95 you can say i, = np.nonzero(counts==95). (The comma on the LHS unpacks the tuple returned by np.nonzero. If counts was n x m you'd do i,j = np.nonzero(counts==95)).

Both of these return all the values which match the condition, unlike your version which returns only the first match. Returning all the matches means they immediately generalize to things like counts[np.logical_and(3 <= time, time < 10)].
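A small sketch of those expressions on made-up time and counts arrays (the data is hypothetical; only the indexing patterns come from above). If you really only want the first match, like a Lookup, take element 0 of the nonzero result:

```python
import numpy as np

# Hypothetical data: time stamps and the counts recorded at each one
time = np.array([0.0, 2.5, 5.0, 7.5, 10.0, 12.5])
counts = np.array([10, 95, 40, 95, 60, 20])

at_time_5 = counts[time == 5]             # every count recorded at time == 5
i, = np.nonzero(counts == 95)             # every index where counts == 95
window = counts[np.logical_and(3 <= time, time < 10)]  # counts with 3 <= time < 10
first_95 = i[0] if i.size else None       # only the first match, like a Lookup
```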

Nippashish
Nov 2, 2005

Let me see you dance!

the posted:

Strange.

For some reason, it said that all of these modules were installed already.

However, earlier today I was getting error messages that they weren't installed, and I gave up.

Enthought exists separately from your system python; packages that are installed with apt-get won't be seen by Enthought.

Replacing your system python with Enthought is a bad idea (things will break), you need to keep them separate.

If you can get everything you need through apt-get then it's easier to stick with that method. If you want the features of Enthought (mainly fast linear algebra in numpy) then setting up Enthought may be worth it to you.
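When a package "is installed" but the import still fails, it's usually worth asking which interpreter is actually running and where it finds the package. A small diagnostic sketch using only sys.executable, sys.version and the module's __file__ attribute; run it with each python you have:

```python
import sys

# Which interpreter is running? The system python (the one apt-get installs
# packages for) and Enthought's python live at different paths and each
# has its own set of installed packages.
print(sys.executable)
print(sys.version)

try:
    import numpy
    # Where this particular interpreter finds numpy
    print(numpy.__file__)
except ImportError:
    print("numpy is not installed for this interpreter")
```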

Nippashish
Nov 2, 2005

Let me see you dance!

Captain von Trapp posted:

What's the Pythonic way to do this? At first I tried:

code:
print [x for x in range(1000,10000) if list(str(x)).sort() == list(str(4*x)).sort()]

You can use sorted, which returns a new sorted list. (list.sort() sorts in place and returns None, so your version was comparing None == None, which is always true.) Your one liner becomes:
code:
print [x for x in range(1000,10000) if sorted(str(x)) == sorted(str(4*x))]
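The same idea wrapped in a function, with the None pitfall shown explicitly. The function name is made up; the comparison is the one from the one liner above:

```python
def digits_permuted_by_4(lo=1000, hi=10000):
    # sorted() returns a new list, so the two digit lists can be compared directly
    return [x for x in range(lo, hi) if sorted(str(x)) == sorted(str(4 * x))]

# list.sort() sorts in place and returns None, which is why comparing two
# .sort() results is always None == None
broken = list("4321").sort()

print(digits_permuted_by_4())
```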

Nippashish
Nov 2, 2005

Let me see you dance!

huhu posted:

Also another question about the line if x<y<z and x**2+y**2==z**2: if it finds that x<y<z is false will it break the loop and go back to the beginning or will it still try to compute x**2+y**2==z**2? I feel that this operation could slow down the code.

When you write 'A and B' and A is false, B will not be evaluated. This is called short circuiting, and 'or' does the same thing the other way around: in 'A or B', B is not evaluated if A is true.

Really though, you don't need the 'if x < y < z' check at all if you set up the bounds of your loops to make it so this is always true.
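A sketch of both points. The first part uses a hypothetical expensive_check that records when it runs, to show which right-hand sides are skipped. The second part starts each inner loop just past the outer variable, so x < y < z holds by construction and the check disappears:

```python
calls = []

def expensive_check(value):
    # Hypothetical helper: records every time it actually runs
    calls.append(value)
    return value

False and expensive_check("and")   # left side False: right side never runs
True or expensive_check("or")      # left side True: right side never runs
True and expensive_check("ran")    # left side True: right side has to run

def ordered_triples(limit):
    # Each inner range starts one past the outer variable, so x < y < z
    # always holds and no 'if x < y < z' test is needed
    for x in range(1, limit):
        for y in range(x + 1, limit):
            for z in range(y + 1, limit):
                if x**2 + y**2 == z**2:
                    yield (x, y, z)
```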

Nippashish
Nov 2, 2005

Let me see you dance!

huhu posted:

Any suggestions for making this more efficient though? It took somewhere between two and five minutes to solve. Also, is making coding more efficient in regards to math problems mostly done by knowing more advanced theories? Like for example, here is another method for solving this problem:
http://ca.answers.yahoo.com/question/index?qid=20100123221401AA7q4lR

Often the best way to make a program more efficient is to find a strategy for solving the same problem that requires fewer steps, rather than trying to make each step faster. This applies pretty much everywhere, not just to coding in regards to math.
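As an illustration of "fewer steps" (assuming a problem of the form "find x < y < z below some limit with x**2 + y**2 == z**2", which may not be exactly your problem): instead of searching over z in a third loop, compute the only candidate z from x and y and check it. That turns three nested loops into two:

```python
import math

def triples_brute(limit):
    # Three nested loops: z is searched for
    found = []
    for x in range(1, limit):
        for y in range(x + 1, limit):
            for z in range(y + 1, limit):
                if x**2 + y**2 == z**2:
                    found.append((x, y, z))
    return found

def triples_fewer_steps(limit):
    # Two loops: the only possible z is computed from x and y, then checked
    found = []
    for x in range(1, limit):
        for y in range(x + 1, limit):
            s = x**2 + y**2
            z = int(round(math.sqrt(s)))
            if z < limit and z * z == s:
                found.append((x, y, z))
    return found
```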

Nippashish
Nov 2, 2005

Let me see you dance!

huhu posted:

Thanks again for all of your input.

Can anyone point me to a basic introduction to matrices and interacting with rows, columns, changing elements, adding rows/columns, etc? I must be searching the wrong thing because I've been looking for twenty minutes and can't find anything useful.

Do you want to know about doing these things in code or are you asking where you can learn linear algebra? Assuming the former (since you're in the python thread) you should look at numpy.
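A quick tour of the operations in the question using numpy. Note that vstack, hstack and delete return new arrays rather than changing A:

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)   # 3x3 matrix [[1,2,3],[4,5,6],[7,8,9]]

row = A[1, :]        # second row -> [4, 5, 6]
col = A[:, 2]        # third column -> [3, 6, 9]
A[0, 0] = 100        # change a single element in place

B = np.vstack([A, [10, 11, 12]])        # add a row at the bottom (new array)
C = np.hstack([A, [[0], [0], [0]]])     # add a column on the right (new array)
D = np.delete(A, 1, axis=0)             # drop the second row (new array)
```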

Nippashish
Nov 2, 2005

Let me see you dance!

Pangolin Poetry posted:

I have two sets of data in the form of tables in a database.

If you're going to be wrangling around data like this you might want to look at pandas.
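For a flavor of what pandas gives you, here is a sketch joining two small hypothetical tables on a shared "id" column. In practice the tables could come straight out of the database with pd.read_sql:

```python
import pandas as pd

# Two hypothetical tables that share an "id" column
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [20, 30, 40]})

inner = left.merge(right, on="id", how="inner")   # rows whose id is in both tables
outer = left.merge(right, on="id", how="outer")   # all ids, NaN where a side is missing
```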

Nippashish
Nov 2, 2005

Let me see you dance!

GrumpyDoctor posted:

Let me ask a more generic question. If I have numpy array, and I need to apply some weird-rear end function (in this case it was calculate_flex) to each element of it to get a result array, how should I do it?

Instead of writing a scalar function to operate on each element of the array it's usually better to write a function that operates on the whole array elementwise with operations that are implemented in C.

For example calculate_flex might look something like this:
code:
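import numpy as np

def calculate_flex(X, shed_length):
    X = np.asarray(X, dtype=float)  # float copy: avoids integer division
    Y = (24.0 / shed_length) * ((X - 1) / X)  # whole array at once
    Y[X > 24.0 / (24 - shed_length)] = 1  # clamp above the breakpoint
    Y[X < 1] = 0  # and below 1
    return Y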
This should give the same result as your np.vectorize version, assuming I actually understand what np.vectorize does.
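One way to check that kind of equivalence is to write the scalar rule out, wrap it with np.vectorize, and compare against the array version. The scalar function below is a hypothetical reconstruction of the piecewise rule, not the original calculate_flex; shed_length=8 and the test points are arbitrary:

```python
import numpy as np

def calculate_flex_scalar(x, shed_length):
    # Hypothetical scalar rule (one element at a time), mirroring the array code
    if x < 1:
        return 0.0
    if x > 24.0 / (24 - shed_length):
        return 1.0
    return (24.0 / shed_length) * ((x - 1.0) / x)

def calculate_flex_array(X, shed_length):
    # Same rule applied to the whole array with numpy's C-level operations
    X = np.asarray(X, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        Y = (24.0 / shed_length) * ((X - 1) / X)   # X == 0 gives -inf, zeroed below
    Y[X > 24.0 / (24 - shed_length)] = 1
    Y[X < 1] = 0
    return Y

X = np.linspace(0.0, 3.0, 301)
slow = np.vectorize(calculate_flex_scalar, otypes=[float])(X, 8)  # a Python loop in disguise
fast = calculate_flex_array(X, 8)
```

np.vectorize is a convenience wrapper, not a speedup: it still calls the Python function once per element, which is why the array version is preferred.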

Nippashish
Nov 2, 2005

Let me see you dance!

Sylink posted:

How does one write a python module in C?

If you're willing to use C++, boost::python is actually pretty awesome for this once you figure out how to set it up properly (which is needlessly hard).

Nippashish
Nov 2, 2005

Let me see you dance!

This is the right answer. You can probably turn those three for loops into a single expression with the right slicing of the arrays. If you find yourself writing a loop over the elements of a numpy array there is a >99.9% chance you are doing something wrong and your code will be stupendously slow.

The mindset you need when you're using numpy is pretty much the same as the mindset you need when writing matlab: Executing a line of python code is super, super slow and you should be making every effort to execute as few lines of python as possible. This means you need to push elementwise operations into numpy instead of writing the loops in python.

As an illustration, the difference in speed between these two snippets is about a factor of 10 on my machine:
code:
import numpy as np

X = np.random.standard_normal(size=(100,100,100))
Y = X**2
and
code:
import numpy as np

X = np.random.standard_normal(size=(100,100,100))
Y = np.zeros_like(X)
for x in xrange(100):
    for y in xrange(100):
        for z in xrange(100):
            Y[x,y,z] = X[x,y,z]**2
The result in both cases is the same. The first is an example of the right way to use numpy, and the second is an example of the wrong way to use numpy.
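If you want to reproduce the comparison yourself, a rough timeit sketch is below. It uses a smaller 50x50x50 array so the loop version finishes quickly; the ratio you see will depend on your machine and numpy version, so no particular number is claimed here:

```python
import timeit
import numpy as np

X = np.random.standard_normal(size=(50, 50, 50))   # smaller than 100**3 to keep this quick

def vectorized():
    return X**2

def python_loops():
    Y = np.zeros_like(X)
    for x in range(X.shape[0]):
        for y in range(X.shape[1]):
            for z in range(X.shape[2]):
                Y[x, y, z] = X[x, y, z]**2
    return Y

t_vec = timeit.timeit(vectorized, number=3)
t_loop = timeit.timeit(python_loops, number=3)
print("loop / vectorized time ratio: %.0f" % (t_loop / t_vec))
```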

Nippashish
Nov 2, 2005

Let me see you dance!

FoiledAgain posted:

Basically, I have a list of objects passed to me (I don't always know anything about them or how many are in the list, but I do know that they all have a .name attribute) and I need to turn it into a dictionary where the keys are the name attribute and the values are the objects themselves. Is there some dictionary method I could use to make this a single line? (It's not that I need a one-liner, I'm just a low-intermediate programmer wondering about new ways to do things.)

dict() will accept a sequence of (key, value) tuples so you can do something like d = dict((i.name, i) for i in getInventory()).
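A sketch with a hypothetical Item class standing in for your objects. Since Python 2.7 a dict comprehension does the same thing; with either form, if two objects share a name the later one wins:

```python
class Item(object):
    # Hypothetical stand-in for the objects in the list; only .name matters
    def __init__(self, name):
        self.name = name

items = [Item("sword"), Item("shield"), Item("potion")]

by_name = dict((i.name, i) for i in items)   # dict() from (key, value) pairs
by_name_2 = {i.name: i for i in items}       # dict comprehension, Python 2.7+
```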

Nippashish
Nov 2, 2005

Let me see you dance!

Symbolic Butt posted:

I'm doing Coursera's Interactive Introduction to Interactive Programming in Python and it's really weirding me out by how much they use globals there. http://www.codeskulptor.org/#examples-blackjack_template.py :confused:

Most of the things at the top are static data, which is fine as globals because it's not going to change. Also, people will hate on you for globals, but sometimes (like when you're writing an isolated 100 line script) it's not a big deal.

The stuff at the bottom I suppose should be inside an if __name__ == "__main__": block, but I can understand why they left that out in an introduction because honestly that is a seriously weird convention.
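For reference, the convention looks like this. The module name blackjack.py and the deal/main functions are hypothetical placeholders:

```python
def deal(deck_size):
    # Ordinary module-level definitions: safe to import from other files
    return list(range(deck_size))

def main():
    hand = deal(52)[:2]
    print(hand)

if __name__ == "__main__":
    # Runs only when the file is executed directly (python blackjack.py),
    # not when another file does "import blackjack"
    main()
```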

Nippashish
Nov 2, 2005

Let me see you dance!

SirPablo posted:

See above. I need to compute 65,160 regressions individually. So recasting it defeats my goal.

Just do the linear regression calculations yourself instead of calling polyfit, like so:
code:
import numpy as np

# Generate some dummy data
n, m, d = 181, 360, 10
D = np.dstack([i+np.zeros((n,m)) for i in xrange(d)])
D += 0.1*np.random.standard_normal(size=D.shape)
D += 2

# solve the grid of linear regression problems
X = np.vstack([np.ones(d), np.arange(d)])
solution = np.einsum('ij,klj->kli',
    np.linalg.solve(np.dot(X, X.T), X),
    D)
slopes = solution[:,:,1]
yints = solution[:,:,0]
slopes and yints are now (n,m) arrays where slopes[i,j] and yints[i,j] are the parameters of the line fit to D[i,j,:].
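To convince yourself the einsum version gives the same answer as polyfit, here is a sketch that repeats the construction on a small random grid (sizes and seed are arbitrary) and spot-checks one cell against np.polyfit, which returns [slope, intercept] for a degree 1 fit:

```python
import numpy as np

# Same construction as above on a smaller grid, then spot-check against polyfit
n, m, d = 4, 5, 10
rng = np.random.RandomState(0)
D = 2 + np.arange(d) + 0.1 * rng.standard_normal(size=(n, m, d))

X = np.vstack([np.ones(d), np.arange(d)])      # 2 x d design matrix
P = np.linalg.solve(np.dot(X, X.T), X)         # (X X^T)^-1 X, shared by every cell
solution = np.einsum('ij,klj->kli', P, D)      # (n, m, 2): [intercept, slope] per cell
slopes, yints = solution[:, :, 1], solution[:, :, 0]

slope_ref, yint_ref = np.polyfit(np.arange(d), D[2, 3, :], 1)  # one cell, the slow way
```

The point of the trick is that the 2 x d matrix P only depends on the x values, so it is computed once and applied to all n*m series in a single einsum call.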

Nippashish
Nov 2, 2005

Let me see you dance!

JetsGuy posted:

So I remember this lesson from a while ago, but for my own clarification, what constitutes "appending'? That is, if I *change* the value of a numpy array, does numpy create a copy of that array in memory like it does when it appends? Or does it edit that specific value in the memory?

Appending means changing the number of elements. Numpy arrays are contiguous blocks of memory (which is essential for doing matrix operations quickly) but it means that if you add a new element then behind the scenes numpy has to re-allocate the buffer it's using and copy all the old array contents to the new buffer. Allocating some memory with np.zeros() and then overwriting the zeros with interesting numbers is okay; another alternative is to build an ordinary python list by append()-ing values (or entire arrays) to it and then converting it to a numpy type with np.asarray or np.concatenate.
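A sketch of both patterns, for the case where values really do arrive one at a time (if you can compute them as a whole-array operation, do that instead):

```python
import numpy as np

n = 1000

# Option 1: allocate once, then overwrite entries in place (no reallocation)
A = np.zeros(n)
for i in range(n):
    A[i] = i ** 0.5

# Option 2: append to a plain Python list, convert to an array once at the end
values = []
for i in range(n):
    values.append(i ** 0.5)
B = np.asarray(values)

# Pieces that are already arrays can be joined once with np.concatenate
chunks = [np.arange(3), np.arange(3, 6)]
C = np.concatenate(chunks)     # -> [0, 1, 2, 3, 4, 5]
```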

JetsGuy posted:

I have yet to teach myself how to manage thread output merging all the threads (would appreciate a clearer explanation than what I'm finding on google, guys!) But this may give you what you want.

Doing CPU intensive work in threads doesn't work well in python because of the global interpreter lock. Maybe someone more knowledgeable than I can comment on why this is the case, but in practice this means that you can't really get more than one core's worth of CPU work from python code, even if you have multiple threads. (This applies to code written in python, but C modules can create their own threads for which this doesn't apply. That's why things like np.dot can use more than one core for big operations). I think there are other implementations of python that avoid this limitation, but that's a moot point because if you want to use numpy you have no choice but to use CPython (i.e. the standard python).

There are two ways to work around this. One is to use the multiprocessing module, which has an api more or less like the threading module but uses processes instead of threads. Processes have a bit higher startup cost than threads, so you need to have enough work for each process that this cost is worthwhile. They also have more restrictions on how memory can be shared, but as long as you just want to have a bunch of worker processes that each work independently this is rarely a problem. A good way to set this up is to create a multiprocessing.Pool and then use pool.map(function_to_call, list_of_arguments_for_each_call). The multiprocessing module uses pickle to transport objects between processes, so the arguments and return values of function_to_call need to be pickle-able.
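A minimal multiprocessing.Pool sketch of that pattern. The worker function square is a placeholder for real work; it lives at module level so it can be pickled, and the pool is closed and joined explicitly so the code runs on both Python 2.7 and 3:

```python
import multiprocessing

def square(x):
    # Placeholder work; must be a module-level function so it can be pickled
    return x * x

def run_in_pool(args, processes=2):
    pool = multiprocessing.Pool(processes=processes)
    try:
        # One call of square per argument, spread over the worker processes;
        # results come back in the same order as args
        return pool.map(square, args)
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    print(run_in_pool(range(10)))
```

The if __name__ == "__main__" guard matters on platforms where worker processes re-import the main module; without it each worker would try to start its own pool.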

Using the multiprocessing module requires a bit of boilerplate to set up. An easier option, which uses a multiprocessing.Pool under the hood, is the joblib module. joblib has tools that make the pattern I described above easier to execute with their embarrassingly parallel helper. They also have some tools for memoizing expensive functions, which I haven't personally used. Their embarrassingly parallel helper is really nice though: it hides all the boilerplate you need for multiprocessing and works around some weird warts multiprocessing has, like not being able to ctrl+c while a pool is running and not properly propagating errors in the worker processes back to the parent process so you can see what went wrong. I highly recommend joblib if you want to write embarrassingly parallel code in python.
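The same job through joblib's Parallel and delayed helpers, as a sketch (slow_square is a placeholder for an expensive, independent piece of work; n_jobs=2 is arbitrary):

```python
from joblib import Parallel, delayed

def slow_square(x):
    # Placeholder for an expensive, independent piece of work
    return x * x

# Run slow_square on each input across 2 worker processes; results keep input order
results = Parallel(n_jobs=2)(delayed(slow_square)(i) for i in range(10))
```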

Nippashish fucked around with this message at 21:44 on Mar 1, 2013

Nippashish
Nov 2, 2005

Let me see you dance!

QuarkJets posted:

I've been slowly ramping up my effort to get people in my workplace to switch from MATLAB to numpy/scipy, but if this is creating headaches for a Python 2.X to 3.X switchover in the future then I'd like to make the switch happen sooner rather than later

Basically everyone doing numeric work in python is using 2.7 and will continue to do so for the foreseeable future.

Nippashish
Nov 2, 2005

Let me see you dance!

Suspicious Dish posted:

Define a function so you can explain why you need such a weird getter in a docstring.

How is this better than a comment over the line with the lambda, assuming it's not already obvious?

This sounds like silly dogma.

Nippashish
Nov 2, 2005

Let me see you dance!

Emacs Headroom posted:

I think the point isn't that lambdas will break your code or ruin everything, but rather that consistent style guides are good (even if you personally don't like them, it's good to have an official document guiding the style and it's generally better to conform where it doesn't hurt you).

This is a reasonable view, but what happens pretty often instead is that people say things like "I can't imagine why you'd ever use a lambda" and then get snarky when someone tries to suggest that maybe they aren't the worst thing in the world.

Nippashish
Nov 2, 2005

Let me see you dance!
Since we are still on the topic of lambdas, I don't see why I would ever want to use operator.itemgetter(0) over lambda x: x[0]. The second one is shorter to type, no less clear, and doesn't require I import an extra module (i.e. doesn't require a code change many lines away).
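For comparison, the two spellings side by side. operator.itemgetter(0) returns a callable, so both work as a sort key; itemgetter can also take several indices and then returns a tuple:

```python
import operator

pairs = [(3, "c"), (1, "a"), (2, "b")]

get_first = operator.itemgetter(0)   # a callable: get_first(p) returns p[0]
by_itemgetter = sorted(pairs, key=get_first)
by_lambda = sorted(pairs, key=lambda x: x[0])

# itemgetter with several indices returns a tuple
get_both = operator.itemgetter(1, 0)   # get_both(p) returns (p[1], p[0])
```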

Nippashish
Nov 2, 2005

Let me see you dance!

a lovely poster posted:

Because writing the shortest lines possible is not really a goal for most software developers. I would use the first because in the case of someone else editing my code it would be a lot easier to understand.

operator.itemgetter(0) looks like a python statement, anyone with two weeks of python would understand what's going on

lambda x: x[0] might as well be a different language as far as a lot of developers go

Now, it's important to be concise and clear, but I'm unconvinced that it's worth shaving off five characters

My point wasn't that optimizing for line length is a worthwhile goal in and of itself, my point was that I see many advantages to the lambda version and one of them is that it's shorter. I don't agree that the itemgetter version is easier to understand, unless you simply don't know what lambda does, but the idea that you should avoid language features because your coworkers don't understand the language they're using is silly.

As fritz mentions, it's also not clear a priori that operator.itemgetter(0) returns a callable object.

Nippashish
Nov 2, 2005

Let me see you dance!

Lysidas posted:

The problems are quite neat; my only gripe with the site is that they recommend installing Python 2 instead of 3.

Probably because the scientific python world still lives in 2.7.

Nippashish
Nov 2, 2005

Let me see you dance!

Dren posted:

Suggesting that it never be used seems extreme.

Never use shell=True is sort of like never use lambda.

Nippashish
Nov 2, 2005

Let me see you dance!
Assuming you're using numpy, it has its own file format (see numpy.save/numpy.savez).
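A round-trip sketch of both formats: np.save writes one array to a .npy file, and np.savez stores several named arrays in one .npz file. The temporary directory and file names are arbitrary:

```python
import os
import tempfile
import numpy as np

A = np.arange(12.0).reshape(3, 4)
B = np.eye(3)

tmpdir = tempfile.mkdtemp()
single = os.path.join(tmpdir, "A.npy")
several = os.path.join(tmpdir, "arrays.npz")

np.save(single, A)              # one array, dtype and shape preserved
np.savez(several, A=A, B=B)     # several named arrays in one file

A_loaded = np.load(single)
archive = np.load(several)      # lazy archive, arrays looked up by name
B_loaded = archive["B"]
archive.close()
```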

Don't put numpy matrices in json files :ughh:

Nippashish
Nov 2, 2005

Let me see you dance!

Dominoes posted:

What's the preferred way to handle this?

Python code:
data = ['a', 'b', 'c', 'd']
input1 = 1
input2 = 2

def func(data, input):
    return data[input]
    
func(data, input1)
func(data, input2)

Nippashish
Nov 2, 2005

Let me see you dance!

Dominoes posted:

I was wondering if there's a clean way to implement variables in code similar to the .join and %s abilities of strings.

This is never a good idea. Don't do this. Avoid wanting to do this.

Nippashish
Nov 2, 2005

Let me see you dance!

LOOK I AM A TURTLE posted:

Master_Odin, depending on how serious this poker app you're making is, you may want to think about the security implications of the tools you're using.

The counterpoint being that unless there's real money involved you can ignore everything in this post (except for the part about random.shuffle, you should use random.shuffle).
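A sketch of shuffling a deck with random.shuffle. The card encoding is made up for illustration. If real money were involved, one common option is to run the same shuffle on random.SystemRandom, which draws from the operating system's randomness source:

```python
import random

ranks = "23456789TJQKA"
suits = "cdhs"
deck = [r + s for s in suits for r in ranks]   # 52 two-character cards

random.shuffle(deck)   # shuffles in place, returns None

# Same call on a generator backed by the OS randomness source
secure_deck = list(deck)
random.SystemRandom().shuffle(secure_deck)
```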

Nippashish
Nov 2, 2005

Let me see you dance!
You might as well just write mydict = {k:v for k,v in mydict.iteritems() if v.objectvalue != 0} and avoid this whole business of copying and deleting.
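A runnable version with a hypothetical Thing class supplying objectvalue. It uses .items(), which works in both Python 2 and 3; iteritems() is the Python 2-only lazy variant:

```python
class Thing(object):
    # Hypothetical value type with the objectvalue attribute from the question
    def __init__(self, objectvalue):
        self.objectvalue = objectvalue

mydict = {"a": Thing(0), "b": Thing(3), "c": Thing(0), "d": Thing(7)}

# Build a new dict holding only the entries to keep, instead of deleting in a loop
mydict = {k: v for k, v in mydict.items() if v.objectvalue != 0}
```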

Nippashish
Nov 2, 2005

Let me see you dance!
You can put from x import * into __init__.py. This will let you import x and then access its members like x.fancy_things().

Nippashish
Nov 2, 2005

Let me see you dance!

ATM Machine posted:

On the topic of if x in y, would there happen to be an anagram library of some sort that can take strings from a list, then mutate the input string to see if it matches the list string?

I don't know if I actually understand what you're asking, but can't you just sort the letters in all the strings and then compare normally?
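A sketch of the sort-and-compare idea: two words are anagrams exactly when their sorted letters match, so sorting gives a key you can compare directly (function names and word list are illustrative):

```python
def anagram_key(word):
    # Two words are anagrams exactly when their sorted letters match
    return "".join(sorted(word.lower()))

def find_anagrams(candidate, word_list):
    key = anagram_key(candidate)
    return [w for w in word_list if anagram_key(w) == key]

words = ["listen", "google", "enlist", "inlets", "banana"]
```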

Nippashish
Nov 2, 2005

Let me see you dance!

the posted:

Trying to figure out how to make an n * n matrix in Python and then populate it with random numbers. The user specifies a size (example, 5 makes a 5x5 matrix).

Use the functions in numpy.random.
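The size only has to be known when the call happens, so it can come from the user at run time. A sketch with a hypothetical wrapper function showing two of the numpy.random functions (uniform floats and bounded integers):

```python
import numpy as np

def random_square_matrix(n, low=0, high=10):
    # n can come from the user at run time (e.g. via input()); the shape
    # tuple is built from it when the function is called
    uniform = np.random.random_sample((n, n))              # floats in [0, 1)
    integers = np.random.randint(low, high, size=(n, n))   # ints in [low, high)
    return uniform, integers

U, I = random_square_matrix(5)
```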

the posted:

But how do I create one without knowing the size?

More generally, you can allocate blocks using ones/zeros/np.random.whatever and fill them in element by element or stitch them together using hstack/vstack/dstack/concatenate/kron/etc to build up structured arrays.
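A small sketch of stitching blocks: the same 2n x 2n block matrix built once with hstack/vstack and once with kron (the block contents are arbitrary):

```python
import numpy as np

n = 3
block = np.ones((n, n))
zero = np.zeros((n, n))

# Stitch four n x n blocks into a 2n x 2n block matrix
top = np.hstack([block, zero])
bottom = np.hstack([zero, 2 * block])
M = np.vstack([top, bottom])

# kron repeats a block pattern: a 2x2 pattern of n x n blocks
K = np.kron(np.array([[1, 0], [0, 2]]), np.ones((n, n)))
```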

Nippashish
Nov 2, 2005

Let me see you dance!

the posted:

I'm trying to make a coordinate grid of points that go from 0 to 10 in both "x and y" dimensions. So I want a 2d array that has (0,0), (0,1), (1,0), (1,1), etc...

thanks!

Try meshgrid.
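A sketch for the 0 to 10 grid. meshgrid gives two 2d coordinate arrays, and column_stack turns them into one (x, y) pair per row:

```python
import numpy as np

xs = np.arange(11)              # 0, 1, ..., 10
ys = np.arange(11)
X, Y = np.meshgrid(xs, ys)      # X[i, j] = xs[j], Y[i, j] = ys[i]

# One (x, y) pair per row: (0, 0), (1, 0), ..., (10, 10)
points = np.column_stack([X.ravel(), Y.ravel()])
```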

Nippashish
Nov 2, 2005

Let me see you dance!

the posted:

Now, let's say I have a 10x10 array that has in each spot either a 0 or a 1. How would I go about plotting only the spots with 1, and in the coordinates that have a 1 in them? Like if array[5,3] has a 1, I'd put a dot at (5,3).

You can use spy() from matplotlib.
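One caveat: spy draws the array the way it prints, with the row index running down the vertical axis and the column index along the horizontal axis. So array[5,3] shows up at column 3, row 5. If you literally want a dot at (x, y) = (5, 3), scatter the np.nonzero indices instead. A sketch of both (the Agg backend line only avoids opening a window):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

A = np.zeros((10, 10), dtype=int)
A[5, 3] = 1
A[2, 7] = 1

# spy: matrix orientation, rows down the vertical axis, columns across
plt.figure()
plt.spy(A)

# scatter: a dot at (x, y) = (row, col), so A[5, 3] is drawn at (5, 3)
rows, cols = np.nonzero(A)
plt.figure()
plt.scatter(rows, cols)
plt.xlim(-0.5, 9.5)
plt.ylim(-0.5, 9.5)
```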

Nippashish
Nov 2, 2005

Let me see you dance!
I can't tell what proportion of the lines have TMAX in them, but if you're ignoring most of them it will probably be a lot faster to select those lines outside of python with grep or similar and create files that have only the lines you want for python. The speed difference between grepping 2.4GB of text and looping line by line over 2.4GB of text in python is quite large. This might help even if you are using most of the lines, since it would avoid doing two passes through each file in python.
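From the shell this is just grep TMAX input > output. If you'd rather drive it from the python script, a sketch is below; it assumes a grep binary on the PATH, and the sample file contents are made up. Passing an argument list means no shell=True is needed:

```python
import os
import subprocess
import tempfile

def grep_to_file(pattern, src, dst):
    # Let grep do the bulk filtering; "--" stops pattern being read as an option
    with open(dst, "w") as out:
        subprocess.call(["grep", "--", pattern, src], stdout=out)

# Hypothetical input: a few lines, only some of which mention TMAX
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "weather.txt")
dst = os.path.join(tmpdir, "tmax_only.txt")
with open(src, "w") as f:
    f.write("USW001 TMAX 250\nUSW001 TMIN 100\nUSW002 TMAX 310\n")

grep_to_file("TMAX", src, dst)
with open(dst) as f:
    tmax_lines = f.read().splitlines()
```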

Nippashish
Nov 2, 2005

Let me see you dance!

sharktamer posted:

Does anyone know of any module I can use to export generated strings into a html template? I don't mean something as sophisticated as Django, just something where I can define a html like:

with %string1 and %string2 being the exported strings. Something like a class that took in the html file and stuck in the strings would be all I needed.

Jinja is pretty cool.
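A minimal Jinja sketch. Placeholders are written {{ string1 }} rather than %string1; for a template stored in a file you would load it through an Environment with a FileSystemLoader instead of building Template from a string:

```python
from jinja2 import Template

# Placeholders use {{ name }} rather than %string1
page = Template(
    "<html><body>\n"
    "<h1>{{ string1 }}</h1>\n"
    "<p>{{ string2 }}</p>\n"
    "</body></html>"
)

html = page.render(string1="Report", string2="Generated text goes here")
```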

Nippashish
Nov 2, 2005

Let me see you dance!

Dren posted:

represent a list of x by y (8 by 8) dimensions with a flat list

Then write a function to translate (x,y) coordinates into a flat list index.

This is both more complicated and more verbose than just using a 2d numpy array.
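Side by side, for an 8 by 8 board (flat_index is the kind of helper the flat-list approach needs; the 2d array indexes directly):

```python
import numpy as np

# Flat-list approach: a helper turns (x, y) into an index
width, height = 8, 8
flat = [0] * (width * height)

def flat_index(x, y):
    return y * width + x

flat[flat_index(3, 5)] = 1

# 2d numpy array: index directly, no helper needed
board = np.zeros((height, width), dtype=int)
board[5, 3] = 1      # row y = 5, column x = 3
```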


Nippashish
Nov 2, 2005

Let me see you dance!
Use a 2d numpy array to model the states of cells in the world. Use convolution operations to count neighbors.
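A sketch of that approach for Conway's Game of Life, assuming the standard rules (a dead cell with exactly 3 live neighbors is born, a live cell with 2 or 3 survives) and treating cells beyond the edge as dead. It uses scipy.signal.convolve2d with a 3x3 kernel of ones that has a 0 in the middle:

```python
import numpy as np
from scipy.signal import convolve2d

# 3x3 kernel of ones with a 0 in the middle: sums the 8 neighbors of each cell
KERNEL = np.array([[1, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]])

def life_step(world):
    # world: 2d array of 0/1 cell states; cells outside the edge count as dead
    neighbors = convolve2d(world, KERNEL, mode="same", boundary="fill", fillvalue=0)
    born = (world == 0) & (neighbors == 3)
    survive = (world == 1) & ((neighbors == 2) | (neighbors == 3))
    return (born | survive).astype(world.dtype)

# A "blinker": a row of three cells that flips to a column and back
world = np.zeros((5, 5), dtype=int)
world[2, 1:4] = 1
```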
