Python information and short questions megathread.

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Python information and short questions megathread.


Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

That's not a lock-up. That's the main loop running. You never called "start_up", so the window is never created.

# ? Feb 22, 2013 04:05


Gothmog1065: May 14, 2009

Okay, I'm confused. It was opening the window back when it was throwing the Tk error. I'd click the "Begin" button, and it'd call the window. What is it doing differently this time? Literally, I change the code back to self in the Text bit, and it will start up the window, but naturally fail when creating the text box. The end/close button in the popup even works, just no text box (and the error). What is so drastically different that one word completely changes it? Is the error breaking something that will cause it to stop before it starts the loop?

# ? Feb 22, 2013 04:23

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Oh, sorry, I'm bad at this. Clicking Begin calls start_up.

One thing you can try is running your program outside of IDLE. IDLE uses Tk as well, and two mainloops running at the same time wreak havoc.

Tk and IDLE are not well-engineered programs.

# ? Feb 22, 2013 04:39

The Gripper: Sep 14, 2004; i am winner

Gothmog1065 posted:

Okay, I'm confused. It was opening the window back when it was throwing the Tk error. I'd click the "Begin" button, and it'd call the window. What is it doing differently this time? Literally, I change the code back to self in the Text bit, and it will start up the window, but naturally fail when creating the text box. The end/close button in the popup even works, just no text box (and the error). What is so drastically different that one word completely changes it? Is the error breaking something that will cause it to stop before it starts the loop?

You're not packing mess in start_up, do that and all is well!

edit; well I mean that will fix the hang, but there's another problem I'm sure you will see and sort out pretty quickly.

The drastic difference is that with your changed code there is no actual exception being thrown AND the Text object is being created, but because you aren't packing the Text object (which will place it within the parent) some other function is getting stuck in a loop trying to deal with it.

The Gripper fucked around with this message at 04:55 on Feb 22, 2013

# ? Feb 22, 2013 04:47

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

The Gripper posted:

edit; well I mean that will fix the hang

Why would it?

# ? Feb 22, 2013 04:51

The Gripper: Sep 14, 2004; i am winner

Suspicious Dish posted:

Why would it?

I have no idea about the internals of it, but any widget (Text, Button, whatever) that you apply a grid to with <widget>.grid(...) will cause a hang if you forget to pack() it.

# ? Feb 22, 2013 05:00

Gothmog1065: May 14, 2009

Suspicious Dish posted:

Tk and IDLE are not well-engineered programs.

I only use IDLE to actually step the program through on occasion if it does something like what it's doing now. Normally I run a .bat with pause in it to catch the errors in the console.

Packing mess (changed it to message, because a mess is something your kid makes in their bedroom) fixed the loop error, and I'm sure the next problem popped up because there's no butans! Moving on, the attribute error is next on my list; I'm going to do some googling tomorrow to see what I can discover.

I'm trying to find out as much as I can on my own, it's been a long time since I've done anything really programming related and not had my hand held the entire time.

e: I'm using Tk to cut my teeth because I have some documentation for it. As I move away from the beginner's book, I'll look into other GUIs.

# ? Feb 22, 2013 05:02

The Gripper: Sep 14, 2004; i am winner

Gothmog1065 posted:

Packing mess (changed it to message, because a mess is something your kid makes in their bedroom) fixed the loop error, and I'm sure the next problem popped up because there's no butans! Moving on, the attribute error is next on my list; I'm going to do some googling tomorrow to see what I can discover.

Buttons and Textbox worked fine for me when I tested it a few minutes ago :o

# ? Feb 22, 2013 05:04

Gothmog1065: May 14, 2009

Unless I'm packing it in the wrong spot (right after creating message), depending on how I try to pack it I get two attribute errors:

self.message.pack() gets (<- Remember, I renamed mess to message and changed all the subsequent variables)
AttributeError: 'New_window' object has no attribute 'message'

message.pack() gets
AttributeError: 'str' object has no attribute 'pack'

I'm going to go under the assumption that the second one is simply wrong.

# ? Feb 22, 2013 05:23

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

The Gripper posted:

I have no idea about the internals of it, but any widget (Text, Button, whatever) that you apply a grid to with <widget>.grid(...) will cause a hang if you forget to pack() it.

I've never used Tkinter. This seems absurdly broken.

# ? Feb 22, 2013 05:29

The Gripper: Sep 14, 2004; i am winner

Gothmog1065 posted:

Unless I'm packing it in the wrong spot (right after creating message), depending on how I try to pack it I get two attribute errors:

self.message.pack() gets (<- Remember, I renamed mess to message and changed all the subsequent variables)
AttributeError: 'New_window' object has no attribute 'message'

message.pack() gets
AttributeError: 'str' object has no attribute 'pack'

I'm going to go under the assumption that the second one is simply wrong.

I think in the first case you haven't made a self.message object (should be self.message = Text(...)), maybe it's still called mess or you forgot the self. The second error sounds like maybe you have message = "Some text here" for some other reason in your start_up method. The code I tested with is here if you want to look it over and see what the difference is: https://gist.github.com/farces/5010759

Suspicious Dish posted:

I've never used Tkinter. This seems absurdly broken.

Yeah, it has its quirks. There's a good reason that, despite it shipping with Python, people tend to favor Qt (via PyQt or PySide) over it.

# ? Feb 22, 2013 05:40

ArcticZombie: Sep 15, 2010

I'm trying to use radio buttons in GTK, but the documentation seems to gloss over how you actually get the active radio button from a group. I've seen some vague references to get_group() but nothing explaining what has that method so I can call it. I could resort to using a global variable and changing it whenever a radio button is toggled but I'd rather not.

# ? Feb 22, 2013 23:09

FoiledAgain: May 6, 2007

I'm new to object inheritance, and I'm confused about how to properly design something. I have two objects which represent agents interacting in a game. They look something like this:

code:

class NormalAgent(object):
    def __init__(self,a,b,c,d,e):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
        self.e = e

        self.make_friends()
        self.do_something_normal()
        self.be_boring()

class CrazyAgent(object):
    def __init__(self,a,b,c,f,g):
        self.a = a
        self.b = b
        self.c = c
        self.f = f
        self.g = g

        self.make_friends()
        self.do_something_crazy()
        self.be_eccentric()

It feels like there is some redundancy here that I could eliminate. The acceptable values for a,b,c are identical for both classes. What's the best way to split this kind of information up? Should one of these inherit from the other? Should I make a third class they both inherit from?

# ? Feb 22, 2013 23:11

accipter: Sep 12, 2003

FoiledAgain posted:

I'm new to object inheritance...

I would probably do this:

code:

class Agent(object):
    def __init__(self,a,b,c,d,e):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
        self.e = e

        self.make_friends()
        self.do_something_normal()
        self.be_boring()

class CrazyAgent(Agent):
    def __init__(self,a,b,c,f,g):
        # Agent.__init__ takes exactly (a, b, c, d, e), so f and g fill the
        # d and e slots. It also already calls make_friends(), along with
        # do_something_normal() and be_boring().
        super(CrazyAgent, self).__init__(a,b,c,f,g)

        self.do_something_crazy()
        self.be_eccentric()

I am not sure why you would need a third agent class.
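For comparison, here is a minimal sketch of the other design FoiledAgain asked about: a third base class that holds only the shared a, b, c and the common make_friends() step, so neither agent type inherits behaviour it does not use. The class name BaseAgent and the method bodies (which just record which step ran) are hypothetical stand-ins so the sketch is runnable.

```python
class BaseAgent(object):
    """Holds the attributes and setup step shared by every agent type."""

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
        self.log = []  # hypothetical: records which behaviours ran
        self.make_friends()

    def make_friends(self):
        self.log.append("make_friends")


class NormalAgent(BaseAgent):
    def __init__(self, a, b, c, d, e):
        super(NormalAgent, self).__init__(a, b, c)  # shared part
        self.d = d
        self.e = e
        self.do_something_normal()
        self.be_boring()

    def do_something_normal(self):
        self.log.append("do_something_normal")

    def be_boring(self):
        self.log.append("be_boring")


class CrazyAgent(BaseAgent):
    def __init__(self, a, b, c, f, g):
        super(CrazyAgent, self).__init__(a, b, c)  # shared part
        self.f = f
        self.g = g
        self.do_something_crazy()
        self.be_eccentric()

    def do_something_crazy(self):
        self.log.append("do_something_crazy")

    def be_eccentric(self):
        self.log.append("be_eccentric")


crazy = CrazyAgent(1, 2, 3, "f", "g")
print(crazy.log)  # ['make_friends', 'do_something_crazy', 'be_eccentric']
```

Compared with making CrazyAgent a subclass of the normal agent, this keeps the "normal" steps out of CrazyAgent entirely, at the cost of one extra class.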

# ? Feb 22, 2013 23:39

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

ArcticZombie posted:

I'm trying to use radio buttons in GTK, but the documentation seems to gloss over how you actually get the active radio button from a group. I've seen some vague references to get_group() but nothing explaining what has that method so I can call it. I could resort to using a global variable and changing it whenever a radio button is toggled but I'd rather not.

Why do you care about which radio button is active? You should change whatever you were going to change when the toggle button is selected. A group of radio buttons is just a list of radio buttons, so you can get the group with get_group, iterate over it, and check for get_active.

# ? Feb 23, 2013 00:14

FoiledAgain: May 6, 2007

accipter posted:

I would probably do this:

Ah, that helps a lot. I wasn't completely sure how to use super(), but this makes sense now. Thank you very much.

quote:

I am not sure why you would need a third agent class.

Because I'm stupid newbie.

# ? Feb 23, 2013 00:33

ArcticZombie: Sep 15, 2010

Suspicious Dish posted:

Why do you care about which radio button is active? You should change whatever you were going to change when the toggle button is selected. A group of radio buttons is just a list of radio buttons, so you can get the group with get_group, iterate over it, and check for get_active.

I'm just playing around with GTK, and my test example just makes a story out of all the info you provide using entry boxes, toggle buttons, check buttons, radio buttons and combo boxes. Once you press the "Go" button, it looks at everything you've entered/selected. So finding which radio button was active and getting its label seemed like the way to do this, rather than changing a variable to the button's label every time you pick one.

I didn't know how to use get_group() but now I know it's some_radio_button.get_group() to get the group of some_radio_button. I saw some example online where he was doing it some other way and I couldn't get it to work.

# ? Feb 23, 2013 13:05

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

Update: After a try or two forgetting I was running the damned thing, I'm getting done doing a big test on the sleep times we talked about like a week ago. Hopefully I won't find out that there's some dumb typo at the very end and I'll have a plot for y'all soon.
-------------

Now, I've run into a strange thing with subprocess

While OSErrors like command not found don't surprise me, this one kinda does. I run idl routines often in the terminal. The IDL command works perfectly from the terminal.

However, let's try this for a dumb example:

code:

In [24]: task = subprocess.Popen("idl -e 'print, \"Hello, World\"'", shell=True)

In [25]: /bin/sh: idl: command not found

So for whatever reason, python appears to be ignoring the path that it would normally use to find IDL. Indeed, when I try something like:

code:

In [25]: task = subprocess.Popen("/Applications/itt/idl/idl81/bin/idl -e 'print, \"Hello, World\"'", shell=True)

In [26]: IDL Version 8.1, Mac OS X (darwin x86_64 m64). (c) 2011, ITT Visual Information Solutions
Installation number: (redacted)
Licensed for use by: (redacted)

Hello, World

Works wonderfully. Is this a common thing for Python? I suppose I could just put a symbolic link or something in a directory on /bin/sh's PATH for ease, but the examples I am seeing suggest that Popen should be using the PATH.

# ? Feb 25, 2013 23:07

spankweasel: Jan 4, 2006

Are you running things as root or via sudo? sudo does goofy stuff with environment variables. Also, try dumping your environment (via 'env') and see what your PATH is set to?

edit: you can also pass an environment dictionary to subprocess

spankweasel fucked around with this message at 02:34 on Feb 26, 2013

# ? Feb 26, 2013 02:31

Houston Rockets: Apr 15, 2006

FWIW I use s3cmd with Popen for some basic tasks and I always have to specify the location of the dotfile using a parameter.

Also, you should give your commands in arrays instead of strings. That may actually fix the path issue you're having.

quote:

args should be a sequence of program arguments or else a single string. By default, the program to execute is the first item in args if args is a sequence. If args is a string, the interpretation is platform-dependent and described below. ... Unless otherwise stated, it is recommended to pass args as a sequence.

On Unix, if args is a string, the string is interpreted as the name or path of the program to execute. However, this can only be done if not passing arguments to the program.

Source: http://docs.python.org/2/library/subprocess.html#popen-constructor
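A minimal sketch of what the quoted documentation means in practice, assuming Python 3.7+ for subprocess.run(..., capture_output=True, text=True). With shell=False a single string is taken as the program name, which is why "ls -l" on its own fails with Errno 2; shlex.split turns a shell-style command line into the argument list instead. The runnable part uses the current interpreter (sys.executable) in place of IDL so it works on any machine.

```python
import shlex
import subprocess
import sys

# The IDL command line from above, split the way a POSIX shell would split it
command_line = "idl -e 'print, \"Hello, World\"'"
args = shlex.split(command_line)
print(args)  # ['idl', '-e', 'print, "Hello, World"']

# The same list form, run with a program that is guaranteed to exist:
# the running Python interpreter (sys.executable is an absolute path).
result = subprocess.run(
    [sys.executable, "-c", "print('Hello, World')"],
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError on a non-zero exit status
)
print(result.stdout.strip())  # Hello, World
```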

# ? Feb 26, 2013 03:01

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

Houston Rockets posted:

FWIW I use s3cmd with Popen for some basic tasks and I always have to specify the location of the dotfile using a parameter.

Also, you should give your commands in arrays instead of strings. That may actually fix the path issue you're having.

Right, I remember y'all telling me about that the last time I had questions about Popen. Unfortunately, it doesn't fix things.

Indeed, if I tried something like:

code:

In [28]: subprocess.Popen("ls -l", shell=False)
ERROR: An unexpected error occurred while tokenizing input
.
.
bunch of stuff
.
.
OSError: [Errno 2] No such file or directory

But if I shlex-split the input, that works, just as using the string with shell=True works.

However, trying it with IDL still throws the error I mentioned above. It's like Python is not using the whole PATH, which I suppose I could edit in the script. The way I see it, that's just as easy as giving a /path/to/dir when I use Popen.

spankweasel posted:

Are you running things as root or via sudo? sudo does goofy stuff with environment variables. Also, try dumping your environment (via 'env') and see what your PATH is set to?

edit: you can also pass an environment dictionary to subprocess

Neither. I never run as root on OS X (I'm not even sure there is a technical "root" on OS X?). I have admin rights, but I do have to sudo everything on installs to make sure they run right.

Apparently the issue has something to do with how PYTHONPATH and PATH interact, yes. With your post, I was able to find this, which is just confusing me. Apparently, shell=False is supposed to use the relative paths, but that doesn't explain why shell=False works when using shlex.

http://stackoverflow.com/questions/5658622/python-subprocess-popen-environment-path
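Following spankweasel's suggestion, here is a minimal sketch (assuming Python 3.3+ for shutil.which and 3.7+ for subprocess.run's capture_output/text) that prints the interpreter and PATH this process actually sees, then resolves the program against an explicitly extended PATH before running it. The helper names path_with and run_with_path are hypothetical, and the IDL directory comes from the post above. One plausible cause, not confirmed in the thread, is that the terminal adds IDL to PATH (or defines `idl` as an alias) in an interactive shell startup file that a non-interactive /bin/sh or an IDE-launched Python never reads.

```python
import os
import shutil
import subprocess
import sys


def path_with(extra_dirs, base_env=None):
    """Return a PATH string that searches extra_dirs before the existing PATH."""
    env = os.environ if base_env is None else base_env
    return os.pathsep.join(list(extra_dirs) + [env.get("PATH", "")])


def run_with_path(args, extra_dirs=()):
    """Run args (a list) with extra_dirs prepended to PATH for the child."""
    env = dict(os.environ)
    env["PATH"] = path_with(extra_dirs, env)
    # Resolve the program ourselves so the lookup is explicit and the
    # error message shows exactly which PATH was searched.
    program = shutil.which(args[0], path=env["PATH"])
    if program is None:
        raise FileNotFoundError("%s not found on PATH: %s" % (args[0], env["PATH"]))
    return subprocess.run([program] + list(args[1:]), env=env,
                          capture_output=True, text=True, check=True)


# Which interpreter and PATH is this process using? Compare with
# `which python` and `echo $PATH` in the terminal where idl works.
print(sys.executable)
print(os.environ.get("PATH", ""))

# Hypothetical usage for the IDL case from the thread:
# run_with_path(["idl", "-e", 'print, "Hello, World"'],
#               extra_dirs=["/Applications/itt/idl/idl81/bin"])
```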

# ? Feb 26, 2013 18:00

OppositeAstronomer: May 26, 2008; yoink!

This is my first language, and I'm a bit stuck. How do you program something to occur at regular intervals using, I'm assuming, either a for or a while loop? I've been at it all day flipping through my textbook and trying different things and I can't get the proper output no matter what I do or where I plug it in into the code. I'm trying to get a number to randomize every 15 seconds (or so).

# ? Feb 27, 2013 03:55

Emacs Headroom: Aug 2, 2003

code:

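import time
import numpy as np

for i in range(100):
    # np.random is a module; np.random.random() draws a float in [0.0, 1.0)
    print(np.random.random())
    time.sleep(15)  # wait 15 seconds before the next number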

edit:
vvvv I'm just used to using numpy; no special reason not to use built-in random I guess

Emacs Headroom fucked around with this message at 05:06 on Feb 27, 2013

# ? Feb 27, 2013 04:06

FoiledAgain: May 6, 2007

Out of curiosity why import numpy instead of random? Do their .random() methods differ?
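For a single number the two are interchangeable: random.random() and np.random.random() both return a float in [0.0, 1.0); numpy's version mainly adds drawing whole arrays at once. Below is a standard-library-only sketch of the "every 15 seconds" loop, assuming Python 3.3+ for time.monotonic; the helper names run_every and new_number are hypothetical. Sleeping until a running deadline, instead of a flat time.sleep(15) after each step, keeps the time spent in the action from accumulating as drift.

```python
import random
import time


def run_every(interval, count, action):
    """Call action() `count` times, starting now, about `interval` seconds apart.

    Returns the list of values action() produced.
    """
    results = []
    deadline = time.monotonic()
    for _ in range(count):
        results.append(action())
        # Sleep until the next scheduled time rather than a flat `interval`,
        # so time spent inside action() does not add up as drift.
        deadline += interval
        time.sleep(max(0.0, deadline - time.monotonic()))
    return results


def new_number():
    """Hypothetical action: print and return a fresh random float in [0.0, 1.0)."""
    value = random.random()
    print(value)
    return value


# Usage (prints a new number every 15 seconds, 100 times):
# run_every(15, 100, new_number)
```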

# ? Feb 27, 2013 04:32

Ashex: Jun 25, 2007; These pipes are cleeeean!!!

I'm using py2exe to build a binary I can run wherever I need it. I was going to wrap it in NSIS (basing it off this) so I can just copy one file that unpacks, runs, and cleans up after itself. I realized a bit earlier this is going to be a little tricky as the application reads a config file that I'd be updating with whatever requirements are needed for the machine I'm working on. There's also a couple directories containing pre-req files, but I may include those files in a repository I'm building that files are fetched from.

Any suggestions on the best way to handle a config file? I'm thinking I'll have the config file next to the "installer" and it will just copy the config file to the temp dir where I'm running the app from.

edit: Decided to scrap using NSIS and instead use the bundle_files option so it builds a single executable file.
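For the config-file question, a minimal sketch of one common approach with a single-file build: ship the config next to the .exe and look it up relative to the executable. py2exe sets sys.frozen in the built program, so sys.executable points at the .exe; when running from source, the script's own directory is used instead. Assumes Python 3's configparser (Python 2 calls the module ConfigParser); the file name settings.ini and the helpers app_dir and load_config are hypothetical.

```python
import configparser
import os
import sys


def app_dir(script_file=None):
    """Directory holding the running program.

    Frozen builds (py2exe sets sys.frozen) report the .exe in sys.executable;
    otherwise fall back to the script's location.
    """
    if getattr(sys, "frozen", False):
        return os.path.dirname(os.path.abspath(sys.executable))
    if script_file is None:
        script_file = sys.argv[0]
    return os.path.dirname(os.path.abspath(script_file))


def load_config(filename="settings.ini", script_file=None):
    """Read filename from the program's directory; returns (parser, files_read)."""
    parser = configparser.ConfigParser()
    path = os.path.join(app_dir(script_file), filename)
    files_read = parser.read(path)  # empty list if the file is missing
    return parser, files_read
```

Because parser.read() silently skips missing files, checking the returned list is the easy way to report "no config next to the executable" instead of failing later on a missing section.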

Ashex fucked around with this message at 08:29 on Feb 28, 2013

# ? Feb 27, 2013 04:47

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

FoiledAgain posted:

Out of curiosity why import numpy instead of random? Do their .random() methods differ?

I'm not sure how their randomization differs, but since I use numpy in literally every program, it's a lot easier to use np.random.uniform() or np.random.normal() than to load another module. I also adore numpy, in case I haven't said it recently. :3:

# ? Feb 27, 2013 08:21

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

By the way, for those of you who haven't had to use it, count your damned blessings that you don't have to use IDL. IDL is literally the worst. Coding in straight machine code would be kinder.

# ? Feb 28, 2013 07:36

SirPablo: May 1, 2004; Pillbug

Is there a smart/quick way to get the slope/y-int of a linear regression line along the 3rd axis of an array using Numpy? Here is an example array...

code:

import numpy as np

data = np.array([[[0, 0, 4, 9, 1],
        [6, 9, 6, 3, 7],
        [6, 1, 3, 7, 0]],

       [[9, 1, 5, 6, 2],
        [3, 0, 0, 7, 1],
        [1, 3, 5, 3, 4]],

       [[1, 9, 4, 1, 6],
        [2, 2, 6, 2, 6],
        [2, 4, 7, 0, 4]]])

For this 3x3x5 array I'd like to get a 3x3 array with the slope and another 3x3 array with the y-int for the linear regression calculated along the n=5 (third) axis. Such as this...

code:

slope = \
array([[  1.10000000e+00,  -4.00000000e-01,  -6.00000000e-01],
       [ -9.00000000e-01,   3.00000000e-01,   6.00000000e-01],
       [  2.00000000e-01,   8.00000000e-01,  -2.99404884e-16]])

yint = \
array([[-0.5,  7.4,  5.2],
       [ 7.3,  1.3,  1.4],
       [ 3.6,  1.2,  3.4]])

Right now my script is just iterating through each row/column combination which is terribly slow and inefficient. My googling doesn't turn up any help, thus I turn to you learned goons.
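Because every cell uses the same x values, the least-squares slope and intercept have a closed form that numpy can evaluate along the last axis for all cells at once: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope·x̄. A minimal sketch on the example array, with x = np.arange(5) as SirPablo describes later in the thread; fit_last_axis is a hypothetical helper name. Worth noting: with x = np.arange(5) the first cell ([0, 0, 4, 9, 1]) gets slope 1.1 and intercept 0.6, while the `yint` values shown above match x = 1..5 (intercept −0.5 for that cell). The slopes are the same either way.

```python
import numpy as np

data = np.array([[[0, 0, 4, 9, 1],
                  [6, 9, 6, 3, 7],
                  [6, 1, 3, 7, 0]],
                 [[9, 1, 5, 6, 2],
                  [3, 0, 0, 7, 1],
                  [1, 3, 5, 3, 4]],
                 [[1, 9, 4, 1, 6],
                  [2, 2, 6, 2, 6],
                  [2, 4, 7, 0, 4]]], dtype=float)


def fit_last_axis(y, x):
    """Least-squares line y ~ slope*x + intercept along y's last axis.

    y has shape (..., n) and x has shape (n,); both results have shape y.shape[:-1].
    """
    x = np.asarray(x, dtype=float)
    x_centered = x - x.mean()
    y_mean = y.mean(axis=-1)
    # x_centered (n,) broadcasts against y (..., n), so every cell is handled at once
    slope = (x_centered * (y - y_mean[..., np.newaxis])).sum(axis=-1) / (x_centered ** 2).sum()
    intercept = y_mean - slope * x.mean()
    return slope, intercept


slope, yint = fit_last_axis(data, np.arange(5))
```

For the real 181x360x10 array the call would be fit_last_axis(D, np.arange(10)), with no Python-level loop over grid points.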

# ? Feb 28, 2013 08:54

Wildtortilla: Jul 8, 2008

I'm currently taking PSU's Certification in Geographic Information Systems (aka, intelligently using ArcMAP). In May I'll be starting my final course for the certificate and I have a huge array of options, but I'm leaning towards their course in Python. From looking at job postings for GIS positions, I'd wager at least 50% of postings include knowing Python as a desirable skill. However, coming from a background in geology, I have no experience with coding; would the links in the OP be a good place for me to start or should I start elsewhere since I have no experience? The first link in the tutorials "MIT Introduction to Computer Science and Programming" and the contents of this post seem like they'd be a good start for me. Any suggestions?

# ? Feb 28, 2013 15:32

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

SirPablo posted:

Is there a smart/quick way to get the slope/y-int of a linear regression line along the 3rd axis of an array using Numpy?...

Two quick things:

1) Is this 3x3x5 array just your y-values? I'm trying to figure out how you're getting the slope from, say, a 1x5 array. You'll need the x-values (and presumably some error bars)?

2) Could you maybe post a little more of the code? I have found that when I get to this point with data, it would have been actually easier to process a step while making a "master" data array. That is, at some point, you're creating this array of values. Maybe as you're throwing these values together, it would be easier to calculate the slope/int at the same time and just make the slope/int arrays on the fly.

EDIT:

For example, a while back in grad school, I would often create arrays of data, think of something else I wanted, and made a new loop to go through the data table to do it. That's not elegant and probably not fast. The way I would do it now is to just define a bunch of empty arrays and 'loop' once.

JetsGuy fucked around with this message at 19:04 on Feb 28, 2013

# ? Feb 28, 2013 18:16

SirPablo: May 1, 2004; Pillbug

JetsGuy posted:

1) Is this 3x3x3 array just your y-values? I'm trying to figure out how you're getting the slope from a, say 1x5 array. You'll need the x-values (and presumably some error bar)?

Yes sorry, for this example I just used x = np.arange(5) so I could demonstrate.

quote:

2) Could you maybe post a little more of the code? I have found that when I get to this point with data, it would have been actually easier to process a step while making a "master" data array. That is, at some point, you're creating this array of values. Maybe as you're throwing these values together, it would be easier to calculate the slope/int at the same time and just make the slope/int arrays on the fly.

Yea, here is my code. The arrays I am working on (weather model data) are 181x360 and I'm obtaining the slope/yint for 10 points. Thus, the array is 181x360x10. I do start with an empty 181x360x10 array and as I open each data file I load the data into an empty 2d array on the third axis. Once they're loaded in, then the only way I can figure to get the linear regression information along the third axis is to go point by point. Here is the code, trimmed a bit for easier reading.

code:

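import numpy as np
import pygrib

# Big data array
D = np.zeros((181,360,10))

# Load data
for x in range(10):
    # ndarray.resize() works in place and returns None; reshape() returns the array
    D[:,:,x] = pygrib.open(files[x]).values.reshape((181,360))

# Initiate arrays for linear regression
slopes = np.zeros((181,360))
yints = np.zeros((181,360))

# Compute regression: D[iy,ix,:] is the 10-value series for one grid point
for ix in range(0,360):
    for iy in range(0,181):
        # polyfit(x, y, 1) returns [slope, intercept], so unpack the whole result
        slopes[iy][ix], yints[iy][ix] = np.polyfit(np.arange(10), D[iy,ix,:], 1)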

Edit: FYI It takes my machine 41 seconds to loop through all those, biggest killer in my script.

SirPablo fucked around with this message at 21:33 on Feb 28, 2013

# ? Feb 28, 2013 21:23

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

SirPablo posted:

Yea, here is my code. The arrays I am working on (weather model data) are 181x360 and I'm obtaining the slope/yint for 10 points. Thus, the array is 181x360x10...

Edit: FYI It takes my machine 41 seconds to loop through all those, biggest killer in my script.

OK. I am in the middle of some satellite crunching right now, but since you may still be at work, I figured I'd at least give you an initial impression of what might help. It's really funny to me because I was going to write an example earlier where using zeros made a code of mine slower. However, running THAT many fits in about 45 seconds really doesn't seem too bad to me. You COULD always thread it, but I'm literally just teaching myself that, so I'm not the person to ask about that.

Anyway, anecdotes aside, my first order of business would be to only create numpy arrays as needed. I'm betting creating (and re-editing) the zeros arrays is slowing you down. I'm far from senor memory management, though.

I don't know how pygrib works, but it looks like a file reader you'll need. I'm assuming it returns to you a nice 181x360 array, but NOT a numpy array??

This is just a stab, and may end up being slower. I love numpy, but I find when I only define numpy arrays when I have to, things tend to go better.

code:

# Load Data
D = []
for x in range(10):
    # reshape() returns the array; resize() would return None
    bit = pygrib.open(files[x]).values.reshape((181,360))
    D.append(bit)

# Initiate arrays for linear regression
slopes = []
yints = []

# Compute regression (slopes[i][j] is indexed [ix][iy] here)
for i in range(0, 360):
    column_m = []
    column_b = []
    for j in range(0,181):
        # D is a list of 2-D arrays, so gather the 10 values for grid point (j, i)
        series = [layer[j, i] for layer in D]
        temp_slope, temp_int = np.polyfit(range(0,10,1), series, 1)
        column_m.append(temp_slope)
        column_b.append(temp_int)
    slopes.append(column_m)
    yints.append(column_b)

Again, you're going to want to review this, because I undoubtedly screwed up dimensions or something somewhere. This should give you a decent place to start though, I hope!

# ? Feb 28, 2013 22:13

SirPablo: May 1, 2004; Pillbug

Thanks for the response. I guess I was under the impression that numpy arrays work much quicker than regular python arrays and I try to use them everywhere I can. I'll take a stab at your process and see if it is any faster. I think making those 65,160 iterations though will still slow it down - wish there was a sleek way numpy could slice in and do a linear regression along a defined axis!

# ? Feb 28, 2013 22:37

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

SirPablo posted:

Thanks for the response. I guess I was under the impression that numpy arrays work much quicker than regular python arrays and I try to use them everywhere I can. I'll take a stab at your process and see if it is any faster. I think making those 65,160 iterations though will still slow it down - wish there was a sleek way numpy could slice in and do a linear regression along a defined axis!

I was too. Somewhere earlier in this thread someone pointed out to me that every time you append to a numpy array it makes an entirely new array in memory, which can add up fast! Apparently, using native Python lists does something that's cleaner... I don't really remember.

However, the more I am thinking about this, the more I'm convinced you'll want to thread this. It doesn't even look like it'd be a very difficult thing to thread. That's what's really going to save you now.

# ? Feb 28, 2013 22:42

Emacs Headroom: Aug 2, 2003

SirPablo posted:

Thanks for the response. I guess I was under the impression that numpy arrays work much quicker than regular python arrays and I try to use them everywhere I can. I'll take a stab at your process and see if it is any faster. I think making those 65,160 iterations though will still slow it down - wish there was a sleek way numpy could slice in and do a linear regression along a defined axis!

Numpy arrays typically are a lot faster. I might not be understanding your problem correctly, but maybe this makes sense:

1) You have n points that have dimensionality x by y. You want to get a regression across these points, each of which is represented by a 2d matrix.
2) The linear regression functionality in numpy expects 1d arrays, not 2d matrices
3) So why not cast your matrices as arrays? Then you'd get n points of size 181x360 = 65160. You could load these into the regression, and then just reshape them into 2d matrices when you're done. There's nothing special about arranging the parameters into the 2d matrix right? You're considering them all as independent dimensions, so just go ahead and treat them this way.

This is more of a linear algebra (and problem statement) comment than a numpy comment I guess.

edit: as a rule of thumb, any time you think you're struggling with some code related to linear algebra, bust out the pen and paper and make sure you know what's going on. These types of least-squares problems always decompose into a single matrix inversion, so if you're stuck, write out what the matrix inversion is going to be.
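The reshape idea maps directly onto np.polyfit, which accepts a 2-D y and fits each column separately against the shared x. A minimal sketch, assuming D has shape (181, 360, 10) with the series on the last axis as in SirPablo's code; seeded random data stands in for the GRIB fields here. Each column is still its own independent fit, so the 65,160 separate regressions are preserved; they just share one least-squares solve.

```python
import numpy as np

rows, cols, n = 181, 360, 10
D = np.random.RandomState(0).standard_normal((rows, cols, n))  # stand-in data

x = np.arange(n)
# One grid point per column: shape (n, rows*cols)
y = D.reshape(rows * cols, n).T
coeffs = np.polyfit(x, y, 1)             # shape (2, rows*cols): [slopes, intercepts]
slopes = coeffs[0].reshape(rows, cols)   # slopes[iy, ix] fits D[iy, ix, :]
yints = coeffs[1].reshape(rows, cols)
```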

edit2:

JetsGuy posted:

I was too, somewhere earlier in this thread someone pointed out to me that every time you append to a numpy array it makes an entirely new array in the memory, which can add up fast! Apparently, using the native python arrays does something that's cleaner... I don't really remember.

Yeah if you try to append to an array it'll have to copy it. Instead of using built-in lists though, I find it's almost always better just to initialize a np.zeros() array with the correct size and then put values into it.

Emacs Headroom fucked around with this message at 22:53 on Feb 28, 2013

# ? Feb 28, 2013 22:44

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

Emacs Headroom posted:

Numpy arrays typically are a lot faster. I might not be understanding your problem correctly, but maybe this makes sense:

That's interesting, because for *operations* on an array, I've seen this to be true. However, when building up an array, I have found np.append() (etc.) to be deadly for speed.

# ? Feb 28, 2013 22:51

BigRedDot: Mar 6, 2008

SirPablo posted:

Yes sorry, for this example I just used x = np.arange(5) so I could demonstrate.

Yea, here is my code. The arrays I am working on (weather model data) are 181x360 and I'm obtaining the slope/yint for 10 points...

Edit: FYI It takes my machine 41 seconds to loop through all those, biggest killer in my script.

I doubt it will make much difference, but in general you should do:

code:

slopes[iy,ix]

instead of

code:

slopes[iy][ix]

It's one less lookup. You should also prefer xrange over range. You can also use np.empty() to create an uninitialized array, instead of np.zeros(). I doubt this will make much difference in this case though. You can hoist the np.arange(10) outside of the loop and reuse it. That might actually make a noticeable difference, since you are creating and destroying that same little array sixty-five thousand times. Beyond that, polyfit is non-trivial and you are doing quite a few of them.

Never append to a numpy array. They are optimized for other use cases and appending will force a copy. If anyone had asked me I would have said even including an append function in the api is a mistake.
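A small sketch of the two patterns recommended above, with stand-in data: either preallocate with np.empty/np.zeros and fill by index, or collect into a Python list and convert once at the end. The last loop shows the np.append pattern to avoid, since it builds a brand-new array on every call.

```python
import numpy as np

n_rows, n_cols = 5, 3

# Pattern 1: preallocate, then assign into slices
filled = np.empty((n_rows, n_cols))  # contents are arbitrary until assigned
for i in range(n_rows):
    filled[i, :] = np.arange(n_cols) * i

# Pattern 2: append to a Python list, convert once at the end
rows = []
for i in range(n_rows):
    rows.append(np.arange(n_cols) * i)
stacked = np.array(rows)  # one copy, shape (n_rows, n_cols)

# To avoid in a loop: np.append returns a new, fully copied array each time
grown = np.empty((0, n_cols))
for i in range(n_rows):
    grown = np.append(grown, [np.arange(n_cols) * i], axis=0)
```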

BigRedDot fucked around with this message at 00:36 on Mar 1, 2013

# ? Mar 1, 2013 00:33

SirPablo: May 1, 2004; Pillbug

Emacs Headroom posted:

1) You have n points that have dimensionality x by y. You want to get a regression across these points, each of which is represented by a 2d matrix.

Not quite. I have a three dimensional array (181x360x10). I am after a regression through the third axis (10) for each point on the "face" of the 181x360 array. The regression for each point is done against a simple np.arange(10) because in the regression, the x-axis values do not vary and aren't important.

quote:

2) The linear regression functionality in numpy expects 1d arrays, not 2d matrices

Correct.

quote:

3) So why not cast your matrices as arrays? Then you'd get n points of size 181x360 = 65160. You could load these into the regression, and then just reshape them into 2d matrices when you're done. There's nothing special about arranging the parameters into the 2d matrix right? You're considering them all as independent dimensions, so just go ahead and treat them this way.

See above. I need to compute 65,160 regressions individually. So recasting it defeats my goal.

quote:

This is more of a linear algebra...

Oh the horrors!

# ? Mar 1, 2013 00:44

Nippashish: Nov 2, 2005; Let me see you dance!

SirPablo posted:

See above. I need to compute 65,160 regressions individually. So recasting it defeats my goal.

Just do the linear regression calculations yourself instead of calling polyfit, like so:

code:

import numpy as np

# Generate some dummy data
n, m, d = 181, 360, 10
D = np.dstack([i+np.zeros((n,m)) for i in xrange(d)])
D += 0.1*np.random.standard_normal(size=D.shape)
D += 2

# solve the grid of linear regression problems
X = np.vstack([np.ones(d), np.arange(d)])
solution = np.einsum('ij,klj->kli',
    np.linalg.solve(np.dot(X, X.T), X),
    D)
slopes = solution[:,:,1]
yints = solution[:,:,0]

slopes and yints are now (n,m) arrays where slopes[i,j] and yints[i,j] are the parameters of the line fit to D[i,j,:].
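The einsum call applies one precomputed least-squares operator, (X X^T)^-1 X, to every cell's series at once. Row 0 of the result is the intercept (from the row of ones) and row 1 the slope. A minimal sketch that wraps Nippashish's two lines in a hypothetical helper, fit_grid_einsum, and spot-checks a cell against np.polyfit on small random data (assumes Python 3; the original's xrange is Python 2 only):

```python
import numpy as np


def fit_grid_einsum(D):
    """Fit y = yint + slope*t along the last axis of D, with t = 0..d-1."""
    d = D.shape[-1]
    X = np.vstack([np.ones(d), np.arange(d)])  # shape (2, d): rows are [1, t]
    # (X X^T)^-1 X maps each length-d series to [intercept, slope]
    solution = np.einsum('ij,klj->kli', np.linalg.solve(np.dot(X, X.T), X), D)
    return solution[:, :, 1], solution[:, :, 0]


D = np.random.RandomState(0).standard_normal((4, 5, 10))
slopes, yints = fit_grid_einsum(D)

# Spot-check one cell against np.polyfit, which returns [slope, intercept]
slope_ref, yint_ref = np.polyfit(np.arange(10), D[2, 3, :], 1)
print(np.allclose([slopes[2, 3], yints[2, 3]], [slope_ref, yint_ref]))  # True
```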

# ? Mar 1, 2013 08:11


accipter: Sep 12, 2003

Nippashish posted:

Code with Einstein summation.

Thanks for pointing out that einsum was added. I am not sure when I will get a chance to use it, but it looks pretty powerful and would need some good documentation in the code so that others could understand it.

# ? Mar 1, 2013 16:36
