Python information and short questions megathread.

Rohaq: Aug 11, 2006

the posted:

I'm running a sci/numerical python program in Canopy right now. It takes about 4 hours to run. Afterwards I need to manipulate the data in the console (like check specific values and whatnot). Just a bunch of 41x41x41 arrays and such.

Is there any way I can *save* all the computed data so I can just instantly load it up again to work with it? That would save me literal hours of work in having to run this program again if I have to shut down my computer for some reason.

Sounds like you want to be using the dump and load functions of the pickle module: http://docs.python.org/2/library/pickle.html
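
A minimal sketch of that round trip, assuming the results are collected into one dictionary. The file name `results.pkl` and the dictionary keys are made up for illustration:

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the computed data: any picklable object works.
results = {"ex": [[0.0, 1.5], [2.5, 3.0]], "steps": 60}

# Illustrative path only; in practice pick a location you will keep.
path = os.path.join(tempfile.gettempdir(), "results.pkl")

# Save once, after the long run finishes.
with open(path, "wb") as f:
    pickle.dump(results, f, protocol=pickle.HIGHEST_PROTOCOL)

# Later, in a fresh session, load it back instead of recomputing.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Only unpickle files you created yourself, since loading a pickle can run arbitrary code.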

# ? Apr 20, 2013 22:25

evensevenone: May 12, 2001; Glass is a solid.

Pickle, yaml, or json are all pretty easy ways to achieve this. You'll want to make an object that contains all the data, then dump it to a file. Json and yaml are better if you might want to read the data in other languages/platforms. Pickle is faster and produces smaller files, so if you've got a shitload of data that might make sense.

# ? Apr 20, 2013 22:26

Emacs Headroom: Aug 2, 2003

For data that fits a simple structure like json's, json is (much) faster in my experience.

Pickle is really slow for large-ish things (even cPickle); maybe someone knows more about the details but I assume it has something to do with how pickle does the object instantiation.

# ? Apr 20, 2013 22:29

Nippashish: Nov 2, 2005; Let me see you dance!

Assuming you're using numpy, it has its own file format (see numpy.save/numpy.savez).
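
Something along these lines, as a rough sketch. The array names and the `fields.npz` file name are placeholders, not anything from the original program:

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for the 41x41x41 result arrays.
ex = np.random.rand(41, 41, 41)
ey = np.random.rand(41, 41, 41)

# Illustrative path only.
path = os.path.join(tempfile.gettempdir(), "fields.npz")

# savez stores several named arrays in one .npz file.
np.savez(path, ex=ex, ey=ey)

# Loading gives a dict-like object keyed by the names used above.
with np.load(path) as saved:
    ex_loaded = saved["ex"]
    ey_loaded = saved["ey"]
```

For a single array, `numpy.save` and `numpy.load` with a `.npy` file work the same way.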

Don't put numpy matrices in json files :ughh:

# ? Apr 20, 2013 23:16

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

I'm becoming more interested in TDD, but I'm largely clueless because most of what I use Python for is async with gevent. Anyone have any recommendations for starting points with testing concurrent clusterfuck applications?

# ? Apr 21, 2013 00:16

Dominoes: Sep 20, 2007

Lesson in reinventing the wheel today: I created a function that allows date addition and subtraction using modulus and floor division. Turns out the datetime module already does that! It was a good learning experience.
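
For reference, a short sketch of what the standard library version looks like. The dates here are arbitrary examples:

```python
from datetime import date, timedelta

start = date(2013, 4, 21)

# Adding a timedelta takes care of month lengths for you.
later = start + timedelta(days=30)

# Leap years are handled as well: 2012 had a February 29.
leap_day = date(2012, 2, 28) + timedelta(days=1)

# Subtracting two dates gives a timedelta.
gap = date(2014, 1, 1) - start

print(later, leap_day, gap.days)
```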

Dominoes fucked around with this message at 02:37 on Apr 21, 2013

# ? Apr 21, 2013 02:33

Rohaq: Aug 11, 2006

JSON and YAML are great for structured data, numpy has its own format as said, but pickle works with objects as a whole; if your files are meant to work exclusively with your script, pickle is probably the best guarantee that your object remains the same once it's been saved and retrieved.
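
A small illustration of that difference, using a made-up record: JSON has no tuple type and only allows string keys, so a round trip changes the object, while pickle hands back an equal one.

```python
import json
import pickle

# Hypothetical record with a tuple value and an integer key.
record = {"shape": (41, 41, 41), 1: "integer key"}

via_pickle = pickle.loads(pickle.dumps(record))
via_json = json.loads(json.dumps(record))

print(via_pickle == record)  # True
print(via_json)              # {'shape': [41, 41, 41], '1': 'integer key'}
```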

# ? Apr 21, 2013 02:38

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Dominoes posted:

Lesson in reinventing the wheel today: I created a function that allows date addition and subtraction using modulus and floor division.

And chances are you got it wrong.

# ? Apr 21, 2013 04:32

Dominoes: Sep 20, 2007

Suspicious Dish posted:

And chances are you got it wrong.

There is some error, but it was acceptable for what I used it for. No more than 2 days of error for the first few years.

# ? Apr 21, 2013 05:30

QuarkJets: Sep 8, 2008

Dominoes posted:

Lesson in reinventing the wheel today: I created a function that allows date addition and subtraction using modulus and floor division. Turns out the datetime module already does that! It was a good learning experience.

And quite well, at that!

MySQLdb even converts its own datetime format into the Python datetime format. Very handy, although inserting a datetime with MySQLdb requires converting it to a string first

# ? Apr 21, 2013 05:48

Dominoes: Sep 20, 2007

Technique question: I have two very similar, lengthy bits of code (pulling financial data from two different sources; one uses json, the other csv). There are common replaceable items. For example, where one function uses 'data[i][4]', the other might use 'data[n][day]['date']'. I'd pass '[n][day]' and '[i][4]' as parameters to the new function, reusing the code instead of having it exist twice. Logically, it seems easy to combine these into a common function, but simply turning them into arguments doesn't seem to work.

Option 1: Keep the functions separate due to this issue. Don't try to reuse the code

Option 2: I've read about using exec(), but it seems like a messy hack, and I can't get it working.

Option 3: A Python solution designed for this scenario that I haven't found. Someone earlier helped me with a similar problem using __getattribute__ with %s, but I can't seem to get it working in this scenario, and it also feels like a weird hack.

What's the preferred way to handle this?

example:

Python code:

data = ['a', 'b', 'c', 'd']
input1 = '[1]'
input2 = '[2]'

def func(data, input):
    return ''.join(data, input) #pseudocode
    
func(data, input1)
func(data, input2)

Dominoes fucked around with this message at 18:55 on Apr 21, 2013

# ? Apr 21, 2013 18:34

Nippashish: Nov 2, 2005; Let me see you dance!

Dominoes posted:

What's the preferred way to handle this?

Python code:

data = ['a', 'b', 'c', 'd']
input1 = 1
input2 = 2

def func(data, input):
    return data[input]
    
func(data, input1)
func(data, input2)

# ? Apr 21, 2013 19:18

Dominoes: Sep 20, 2007

Nippashish posted:

Python code:

data = ['a', 'b', 'c', 'd']
input1 = 1
input2 = 2

def func(data, input):
    return data[input]
    
func(data, input1)
func(data, input2)

What about this?

Python code:

data1 = ['a', 'b', 'c', 'd']
data2 = [['a', 'b', 'c', 'd'],['w', 'x', 'y', 'z']]
input1 = '[1]'
input2 = '[0][1]'

def func(data, input):
    return ''.join(data, input) #pseudocode
    
func(data1, input1)
func(data2, input2)

edit: Actually it looks like your solution works. Just need to pass the arguments as lists/tuples and account for all possible slots.

Python code:

data1 = ['a', 'b', 'c', 'd']
data2 = [['a', 'b', 'c', 'd'],['w', 'x', 'y', 'z']]
input1 = (1, 0)
input2 = (1, 2)

def func(data, input):
    return data[input[0]][input[1]]
    
func(data1, input1)
func(data2, input2)

Dominoes fucked around with this message at 19:42 on Apr 21, 2013

# ? Apr 21, 2013 19:27

Plorkyeran: Mar 22, 2007; To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Python code:

data1 = ['a', 'b', 'c', 'd']
data2 = [['a', 'b', 'c', 'd'],['w', 'x', 'y', 'z']]
input1 = [1]
input2 = [0, 1]

def func(data, input):
    for key in input:
        data = data[key]
    return data

func(data1, input1) # 'b'
func(data2, input2) # 'b'
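
The same loop can also be written with `functools.reduce`, which works for any mix of list indices and dict keys. This is only an alternative sketch; `get_nested` and the `record` data are names invented here:

```python
from functools import reduce
from operator import getitem


def get_nested(data, keys):
    """Follow each key in turn: get_nested(d, [0, 1]) is d[0][1]."""
    return reduce(getitem, keys, data)


data2 = [['a', 'b', 'c', 'd'], ['w', 'x', 'y', 'z']]
# Hypothetical json-style record mixing dict keys and list indices.
record = {'quotes': [{'date': '2013-04-19', 'close': '12.5'}]}

first = get_nested(data2, [0, 1])                    # 'b'
second = get_nested(record, ['quotes', 0, 'close'])  # '12.5'
```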

# ? Apr 21, 2013 19:35

Dominoes: Sep 20, 2007

Thanks Plork - that works too.

Last one! I think this can be solved with an if statement asking which input format you're entering (as an additional parameter) and having two separate return lines, but I'm wondering if there's a cleaner way.

Python code:

data1 = ['a', 'b', 'c', 'd']
data2 = [['a', 'b', 'c', 'd'],['w', 'x', 'y', 'z']]
input1 = '[1]'
input2 = '[0][n]'

def func(data, input):
    for n in range(3):
        return ''.join(data, input) #pseudocode
    
func(data1, input1)
func(data2, input2)

Dominoes fucked around with this message at 20:09 on Apr 21, 2013

# ? Apr 21, 2013 20:02

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Please share more about your actual problem. This seems like overengineering at the highest level.

# ? Apr 21, 2013 20:21

Dominoes: Sep 20, 2007

Suspicious Dish posted:

Please share more about your actual problem. This seems like overengineering at the highest level.

It's similar to the example I posted, but more complicated. I'm looking at historical stock data from Yahoo and Tradeking. I'm working with the Yahoo data as a .csv, and Tradeking as a .json phrase.

It looks like I have a working solution using Nippa's and Plork's examples. I used an if statement as described to sort the last example I posted. Here's the current implementation:

Python code:

eval(data, parameters, n, (n, 'date'), (n, 'close'), symbols[n], 'tk')

def eval(data, parameters, len_source, date_loc, price_loc, symbol, _type):
    failed = False
    result = []
    for n2 in parameters:
        start_date = date.today() - timedelta(days = n2.days_start)
        end_date = date.today() - timedelta(days = n2.days_end)
        
        #Checks for weekend dates; uses nearest Friday instead
        if date.weekday(start_date) == 5: 
            start_date -= timedelta(days = 1)
        if date.weekday(start_date) == 6:
            start_date -= timedelta(days = 2)
        if date.weekday(end_date) == 5:
            end_date -= timedelta(days = 1)
        if date.weekday(end_date) == 6:
            end_date -= timedelta(days = 2)
        
        count = 0 #for breaking loop once both dates determined, to save time
        #loops through the days, checking if dates match. Find a way around this?
        for n3 in range(len(data[len_source])): 
            if _type == 'tk':
                if data[date_loc[0]][n3][date_loc[1]] == str(start_date):
                    start_value = float(data[price_loc[0]][n3][price_loc[1]])
                    count += 1
                if data[date_loc[0]][n3][date_loc[1]] == str(end_date):
                    end_value = float(data[price_loc[0]][n3][price_loc[1]])
                    count += 1
                if count == 2:
                    break
            elif _type == 'yf':
                if data[date_loc[0]][n3][date_loc[1]] == str(start_date):
                    start_value = float(data[price_loc[0]][price_loc[1]])
                    count += 1
                if data[date_loc1[0]][n3][date_loc1[1]] == str(end_date):
                    end_value = float(data[price_loc[0]][price_loc[1]])
                    count += 1
                if count == 2:
                    break
        
        change = ((end_value - start_value) / start_value) * 100

        print('\n', symbol, n2.name)
        print ("change: " + str(change))
        print("start:", start_value, "end:", end_value)
        print("start:", start_date, "end:", end_date)
        
        if not n2._floor <= change <= n2._ceil:
            failed = True
            print (symbol, "failed for", n2.name)
            break

    if failed == False:
        result.append(symbol)
    return result

# ? Apr 21, 2013 20:33

QuarkJets: Sep 8, 2008

Agreed; I have no idea what you're asking for at this point. What is this most recent Python code supposed to do?

e: To me, nothing in your pseudo-code looks anything like the code that you just posted :psyduck:

# ? Apr 21, 2013 20:34

Dominoes: Sep 20, 2007

QuarkJets posted:

Agreed; I have no idea what you're asking for at this point. What is this most recent Python code supposed to do?

Nippa and Plork posted examples that I turned into a solution. I was wondering if there's a clean way to implement variables in code similar to the .join and %s abilities of strings.

I'm now curious if there's a way to clean up the duplicate 'if' statements in a situation like this.

Dominoes fucked around with this message at 20:43 on Apr 21, 2013

# ? Apr 21, 2013 20:36

QuarkJets: Sep 8, 2008

Dominoes posted:

I'm not asking anything at this point; Nippa and Plork posted examples that I turned into a solution. I was wondering if there's a clean way to implement variables in code similar to the .join and %s abilities of strings.

This is terrible, don't go looking for this. It's probably not what you actually want to do.

e: You asked for clean, what you did is way cleaner than trying to pull variables from strings and then loop over them or whatever

QuarkJets fucked around with this message at 20:44 on Apr 21, 2013

# ? Apr 21, 2013 20:39

Nippashish: Nov 2, 2005; Let me see you dance!

Dominoes posted:

I was wondering if there's a clean way to implement variables in code similar to the .join and %s abilities of strings.

This is never a good idea. Don't do this. Avoid wanting to do this.

# ? Apr 21, 2013 20:40

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Why do you have variables named n2 and n3?

# ? Apr 21, 2013 22:31

Dominoes: Sep 20, 2007

Suspicious Dish posted:

Why do you have variables named n2 and n3?

Iterators.

Dominoes fucked around with this message at 22:35 on Apr 21, 2013

# ? Apr 21, 2013 22:33

QuarkJets: Sep 8, 2008

Dominoes posted:

Iterators.

Don't do this:

code:

if failed == False:
    result.append(symbol)

Instead:

code:

if not failed:
    result.append(symbol)

...

And instead of a failure flag, you could just return when your failure condition is met. Plus it looks like result is never longer than length==1, so couldn't you just scrap it entirely? Like this :

Python code:

eval(data, parameters, n,
     (n, 'date'), (n, 'close'),
     symbols[n], 'tk')

def eval(data, parameters, len_source,
         date_loc, price_loc, symbol, _type):
    for n2 in parameters:
        # some code

        if not n2._floor <= change <= n2._ceil:
            print(symbol, "failed for", n2.name)
            return None
    return symbol

If eval returns None, then eval failed, otherwise you've got your symbol object...

...

Wait, symbol isn't ever used or modified anywhere in the code! Can't this just return True or False?

QuarkJets fucked around with this message at 22:53 on Apr 21, 2013

# ? Apr 21, 2013 22:39

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Dominoes posted:

Iterators.

Name them something better. I also see a lot of issues with your code; use more variables. Like, I see that some of your code references date_loc1, which AFAICT doesn't exist. I don't know much about your data structures, but you should be able to use zip() or something instead of the range.
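
A rough sketch of what that could look like, guessing at the data layout (one dict per trading day, or parallel columns). All the sample values are invented:

```python
# Hypothetical rows shaped like the 'tk' data: one dict per trading day.
days = [
    {'date': '2013-04-18', 'close': '10.0'},
    {'date': '2013-04-19', 'close': '10.5'},
]

# Iterate over the rows themselves instead of range(len(...)).
closes_by_date = {}
for day in days:
    closes_by_date[day['date']] = float(day['close'])

# zip() pairs up parallel sequences when the data comes as separate columns.
dates = ['2013-04-18', '2013-04-19']
closes = ['10.0', '10.5']
closes_from_columns = {d: float(c) for d, c in zip(dates, closes)}
```

Once the prices sit in a dict keyed by date, finding the start and end values is a lookup rather than a loop over every day.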

# ? Apr 21, 2013 23:52

Dominoes: Sep 20, 2007

QuarkJets posted:

Don't do this:
code:
if failed == False:
    result.append(symbol)
Instead:
code:
if not failed:
    result.append(symbol)
...

And instead of a failure flag, you could just return when your failure condition is met. Plus it looks like result is never longer than length==1, so couldn't you just scrap it entirely? Like this :
Python code:
eval(data, parameters, n,
     (n, 'date'), (n, 'close'),
     symbols[n], 'tk')

def eval(data, parameters, len_source,
         date_loc, price_loc, symbol, _type):
    for n2 in parameters:
        # some code

        if not n2._floor <= change <= n2._ceil:
            print(symbol, "failed for", n2.name)
            return None
    return symbol
If eval returns None, then eval failed, otherwise you've got your symbol object...

...

Wait, symbol isn't ever used or modified anywhere in the code! Can't this just return True or False?

Good catches. Didn't know about the 'if not failed' boolean logic. You're right about the append being unnecessary; I was mixing up code in this function and in the one that calls it. Made your change to the return logic. 'symbol' is passed as an argument. It looks like you're right that I could turn this function into a True/False return. Not sure if it's better; I'll think about it. I'd still want to pass the symbol name for the debugging prints.

edit: I like your True/False return suggestion more than returning the symbol name. Changed. Removed the "failed" variable.

Suspicious Dish posted:

Name them something better. I also see a lot of issues with your code; use more variables. Like, I see that some of your code references date_loc1, which AFAICT doesn't exist. I don't know much about your data structures, but you should be able to use zip() or something instead of the range.

The tutorials I learned from would use statements like "for parameter in parameters" or "for day in range". I don't like them because I have the phrase "parameter(s)" and "day(s)" several other places in the program, and want to make it easy to distinguish. The short names also make the code easier to read, although harder for someone else to interpret. I'm open to changing and suggestions on alternate ways of doing this. "loc1" was a typo. I'll look up zip().

Dominoes fucked around with this message at 02:22 on Apr 22, 2013

# ? Apr 22, 2013 00:44

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Dominoes posted:

The tutorials I learned from would use statements like "for parameter in parameters" or "for day in range". I don't like them because I have the phrase "parameter(s)" and "day(s)" several other places in the program, and want to make it easy to distinguish.

This means two things:

1) Your functions are too long and mix too many concerns, so you're not able to properly leverage variable scope.
2) Your variable names are not descriptive in the first place.

Both of these things also make it harder to unit test your code, which starts the death spiral of "this code is unreadable and I don't want to touch it because I might break something."
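
As a sketch of what splitting out concerns might look like for the code posted earlier, the weekend check and the percentage calculation can each become a small function with descriptive names. The function names here are made up:

```python
from datetime import date, timedelta


def previous_weekday(day):
    """Move a Saturday or Sunday back to the preceding Friday."""
    weekday = day.weekday()  # Monday is 0, Sunday is 6
    if weekday == 5:
        return day - timedelta(days=1)
    if weekday == 6:
        return day - timedelta(days=2)
    return day


def percent_change(start_value, end_value):
    """Percentage change from start_value to end_value."""
    return (end_value - start_value) / start_value * 100
```

Each of these can be unit tested on its own with a couple of known dates and prices, without touching any stock data.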

Dominoes posted:

The short names also make the code easier to read, although harder for someone else to interpret. I'm open to changing and suggestions on alternate ways of doing this. "loc1" was a typo.

Also harder for you to interpret once you take a few weeks off from this code, which honestly is the bigger concern to most developers.

# ? Apr 22, 2013 01:02

Dominoes: Sep 20, 2007

Replaced all instances of "n" and "i" iterators with names that make sense in both my programs.
Do y'all usually include __repr__ functions in classes containing instances?

Dominoes fucked around with this message at 02:22 on Apr 22, 2013

# ? Apr 22, 2013 02:01

the: Jul 18, 2004; by Cowcaster

So I have a 3d-array ey that increments over i,j,k. I also have an array eytimes that's 4 dimensions, I want it to be the 3d component of ey plus a counter that goes from 0-59 (because ey evolves in a while loop that counts 60 times).

Python code:

eytimes[i,j,k,counter] = ey[i,j,k] + counter

So, I want to plot all of the values on top of each other, like... I want to plot [i,0,0,1] and [i,0,0,2], etc... all on the same graph versus x. How would I do this?

I had

Python code:

pylab.plot(x,eytimes[:,half,half,:])

But that doesn't work

edit: fixed it

the fucked around with this message at 06:36 on Apr 22, 2013

# ? Apr 22, 2013 03:07

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

the posted:

So I have a 3d-array ey that increments over i,j,k. I also have an array eytimes that's 4 dimensions, I want it to be the 3d component of ey plus a counter that goes from 0-59 (because ey evolves in a while loop that counts 60 times).
Python code:
eytimes[i,j,k,counter] = ey[i,j,k] + counter
So, I want to plot all of the values on top of each other, like... I want to plot [i,0,0,1] and [i,0,0,2], etc... all on the same graph versus x. How would I do this?

I had
Python code:
pylab.plot(x,eytimes[:,half,half,:])
But that doesn't work

I'm assuming you have some 3D array of values (say x, y, t), and 60 of those? If that's the case, you may wanna check out the "meshgrid" functions and how to plot those. It may be more useful for you. If you're just concerned with the first value over "counter" (where y=t=0 in my example), here are a few things:

1) Stop using pylab, I keep telling you to stop using pylab. :v:

2) From what you've described here, it doesn't look like eytimes[:,half,half,:] would be a 1D array, which pyplot.plot() is expecting for y.

3) Maybe I'm misunderstanding how you're constructing eytimes, but those indices don't make sense to me here. Counter should be the "lead" one I think. As in:

code:

In [4]: num.zeros([2,3,3,3])
Out[4]: 
array([[[[ 0.,  0.,  0.],
         [ 0.,  0.,  0.],
         [ 0.,  0.,  0.]],

        [[ 0.,  0.,  0.],
         [ 0.,  0.,  0.],
         [ 0.,  0.,  0.]],

        [[ 0.,  0.,  0.],
         [ 0.,  0.,  0.],
         [ 0.,  0.,  0.]]],


       [[[ 0.,  0.,  0.],
         [ 0.,  0.,  0.],
         [ 0.,  0.,  0.]],

        [[ 0.,  0.,  0.],
         [ 0.,  0.,  0.],
         [ 0.,  0.,  0.]],

        [[ 0.,  0.,  0.],
         [ 0.,  0.,  0.],
         [ 0.,  0.,  0.]]]])

Anyway, my advice here is to not shortcut it at the plot command, and instead build the data array you're trying to get from "eytimes[:,half,half,:]" on its own first. You're probably getting something like this:

code:

In [8]: x[:,:,0,0]
Out[8]: 
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

So you'll either have to do a couple of appends, or rethink how you cast eytimes. I think that may be the better solution.
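
A quick way to check the shapes, as a sketch only. It assumes the counter is the last axis, as in the assignment quoted above, and uses zeros in place of the real data:

```python
import numpy as np

# Hypothetical stand-in: 41x41x41 grid, 60 time steps on the last axis.
eytimes = np.zeros((41, 41, 41, 60))
half = 20

# Fixing two indices leaves two axes: one row per i, one column per counter.
line_cut = eytimes[:, half, half, :]
print(line_cut.shape)  # (41, 60)

# Fixing the counter as well gives the 1D curve for a single time step.
one_step = eytimes[:, half, half, 3]
print(one_step.shape)  # (41,)

# One 1D curve per time step, each of which can be plotted against x in turn.
curves = [eytimes[:, half, half, counter]
          for counter in range(eytimes.shape[-1])]
```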

# ? Apr 22, 2013 06:53

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

Goddamnit. Beaten while writing my post again.

# ? Apr 22, 2013 06:54

Dren: Jan 5, 2001; Pillbug

Dominoes, you were talking about having data from two sources. One dataset is in json the other is in csv. Do the two data sources provide you the same data? If so, you should look at creating your own type, call it StockData. When you ingest data, load it into a StockData object. The custom code for how to read in each type of data will go into those load functions. Your code will access the StockData object. This way you don't have to write nutty custom accessor stuff (like the code you have).

This is just a suggestion, you might have a need for the way your stuff is right now.
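
Roughly what I have in mind, as a sketch. The column names, the json layout, and the 'XYZ' symbol are guesses for illustration, not what Yahoo or Tradeking actually send:

```python
import csv
import io
import json
from datetime import date, datetime


def _parse_day(text):
    # Assumed date format for both sources.
    return datetime.strptime(text, '%Y-%m-%d').date()


class StockData:
    """Closing prices keyed by date, whatever the source format was."""

    def __init__(self, symbol, closes):
        self.symbol = symbol
        self.closes = closes  # {date: float}

    @classmethod
    def from_csv(cls, symbol, text):
        # Assumed layout: a header row with 'Date' and 'Close' columns.
        rows = csv.DictReader(io.StringIO(text))
        closes = {_parse_day(row['Date']): float(row['Close']) for row in rows}
        return cls(symbol, closes)

    @classmethod
    def from_json(cls, symbol, text):
        # Assumed layout: a list of {"date": ..., "close": ...} objects.
        days = json.loads(text)
        closes = {_parse_day(day['date']): float(day['close']) for day in days}
        return cls(symbol, closes)

    def close_on(self, day):
        return self.closes[day]


from_yahoo = StockData.from_csv('XYZ', "Date,Close\n2013-04-19,10.5\n")
from_tk = StockData.from_json('XYZ', '[{"date": "2013-04-19", "close": "10.5"}]')
print(from_yahoo.close_on(date(2013, 4, 19)))
```

The rest of the program then only deals with StockData and never needs to know which source a price came from.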

# ? Apr 22, 2013 16:18

Houston Rockets: Apr 15, 2006

Is it possible to have my setup.py download a binary file and include it with the module?

I'm considering hacking it by having setup.py just download the file using urllib before setup() is invoked, but I wanted to know if there was a better way.

# ? Apr 22, 2013 20:07

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Don't do that. Your package's payload should have all the stuff it needs. Offline installation should be supported.

# ? Apr 22, 2013 21:10

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Suspicious Dish posted:

Don't do that. Your package's payload should have all the stuff it needs. Offline installation should be supported.

Sometimes this isn't possible due to licensing issues (think of the video driver or MS corefont installers in most Linux distros)

For basically all other instances though, agreed.

# ? Apr 22, 2013 21:48

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Why would the license allow downloading it over the internet as part of the install process? Like, corefonts never supported that. MS pulled it after people started extracting things from the .cab file.

Please do not install video drivers from setup.py.

# ? Apr 22, 2013 22:24

FoiledAgain: May 6, 2007

I want to be able to draw some simple lines and circles on screen, as a way of visualizing some data that my program outputs. Essentially I just want to draw a whole bunch of number lines from 1-100 with a few annotations on each line. I have no experience doing anything visual other than making some figures in matplotlib. What's a good package to use? The official docs only seem to include Tkinter/turtle, and then recommendations for wxPython, PyQt, or PyGTK. I haven't used any of these before. I'm willing to learn new stuff, but this is a program that I want other people to use, and my target audience is academic colleagues who are afraid of code so I want to avoid making them download anything other than Python if that's possible.

# ? Apr 22, 2013 22:47

Masa: Jun 20, 2003; Generic Newbie

I suck at Pyplot and Matplotlib and the documentation and examples never seem to help me, is there some simple way that I'm missing to show the same labels for the yticks on both the left and right sides of the graph?

# ? Apr 22, 2013 23:10

accipter: Sep 12, 2003

Masa posted:

I suck at Pyplot and Matplotlib and the documentation and examples never seem to help me, is there some simple way that I'm missing to show the same labels for the yticks on both the left and right sides of the graph?

This is the example you are looking for: http://matplotlib.org/examples/api/two_scales.html

You will have to copy the limits, ticks, and labels over with:

code:

ax2.set_ylim(*ax1.get_ylim())
ax2.set_yticks(ax1.get_yticks())
ax2.set_ylabel(ax1.get_ylabel())

edit: And another example: http://matplotlib.org/examples/api/fahrenheit_celsius_scales.html

accipter fucked around with this message at 23:24 on Apr 22, 2013

# ? Apr 22, 2013 23:17

accipter: Sep 12, 2003

FoiledAgain posted:

I want to be able to draw some simple lines and circles on screen, as a way of visualizing some data that my program outputs. Essentially I just want to draw a whole bunch of number lines from 1-100 with a few annotations on each line. I have no experience doing anything visual other than making some figures in matplotlib. What's a good package to use? The official docs only seem to include Tkinter/turtle, and then recommendations for wxPython, PyQt, or PyGTK. I haven't used any of these before. I'm willing to learn new stuff, but this is a program that I want other people to use, and my target audience is academic colleagues who are afraid of code so I want to avoid making them download anything other than Python if that's possible.

Are you looking to create a static image, or provide some interactive capabilities? If you are interested in a static image I would just create it with matplotlib.

# ? Apr 22, 2013 23:19
