|
Alright, I'll change my terminology, thanks for the correction. By running it directly I mean cd'ing into the directory and doing "python code2.py" to test some of its internal methods, then doing the same thing with code1.py. Originally it was set up such that package 1 and package 2 had to exist side by side, and package 1 added package 2 to the os path, but I changed it so that package 2 exists inside package 1. Changing to fully-qualified imports breaks doing "python code2.py" but does fix doing "python code1.py". I think my best solution would be to do: code:
|
# ? Jun 30, 2016 06:32 |
|
|
|
Yeah, fully-qualified imports will work for every use case if you install the top-level package as a distribution. Making a setup.py file and pip-installing it is the right way to go, and will guarantee that it works however you want to use it. The only other way to meet your requirements is probably to start hacking with sys.path at the top of every file. When you run a file as a script, Python adds its directory to the path, but not the directory above it; that's the main source of your problems. SurgicalOntologist fucked around with this message at 06:46 on Jun 30, 2016 |
# ? Jun 30, 2016 06:44 |
|
I have a dataframe with 750,000 values in Unique_ID -- I need to test which Unique_IDs appear inside any other Unique_ID -- and label those as parents. For example, rows 0 and 1 are parent and child: I want to label row 0 as a parent because 349894 is in 349894_4073467. A child will always be in the form PARENTID_####### -- code:
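A vectorized sketch of one way to do the parent check (the column name Unique_ID is from the post; the sample values and is_parent label are made up):

```python
import pandas as pd

df = pd.DataFrame({"Unique_ID": ["349894", "349894_4073467", "12345"]})

# Children look like PARENTID_#######, so every parent ID is the part of a
# child ID before the first underscore.
parents = set(
    df.loc[df["Unique_ID"].str.contains("_"), "Unique_ID"].str.split("_").str[0]
)

# Label a row as a parent if some child references its ID.
df["is_parent"] = df["Unique_ID"].isin(parents)
```

Building the set once and using isin avoids the O(n²) substring scan, which matters at 750,000 rows.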
|
# ? Jul 1, 2016 21:42 |
|
Not sure what you're really looking for as far as the output goes, but is this similar to what you have in mind? code:
vikingstrike fucked around with this message at 22:58 on Jul 1, 2016 |
# ? Jul 1, 2016 22:52 |
I don't know how your data framework works, but this is the most 'Pythonic' way I can think of to handle the problem without needing to import anything. Hope this helps some! code:
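A no-imports sketch of the idea, using only builtins (the sample IDs are made up):

```python
# Group child IDs under their parent prefix using plain dicts and sets.
ids = ["349894", "349894_4073467", "12345"]

children = {}
for uid in ids:
    if "_" in uid:
        parent, _, _rest = uid.partition("_")
        children.setdefault(parent, []).append(uid)

# An ID is a parent if at least one child references it.
is_parent = {uid: uid in children for uid in ids}
```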
Eela6 fucked around with this message at 05:41 on Jul 2, 2016 |
|
# ? Jul 2, 2016 04:49 |
|
vikingstrike posted:Not sure what you're really looking for as the output goes, but is this similar to what you have in mind? This is pretty much it, thanks! I was offered this solution by somebody as well, and it's pretty interesting. code:
Eela6, your solution is kind of what I was trying to figure out how to do, as it's a more intuitive approach, but a vectorized Pandas solution like the two above is generally much faster for the sort of data I deal with.
|
# ? Jul 3, 2016 13:45 |
|
Something to keep in mind: manipulating DataFrames element by element is very slow. If speed's an issue, you'll get order-of-magnitude (OOM) improvements by converting to a NumPy array first.
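A tiny sketch of the tip, assuming the hot part of the computation can run on the raw values (the column name is made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0]})

# Pull out the underlying NumPy array once, then operate on it directly
# instead of indexing into the DataFrame inside a tight loop.
arr = df["x"].values
total = float(np.sum(arr * 2))
```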
Dominoes fucked around with this message at 17:05 on Jul 3, 2016 |
# ? Jul 3, 2016 13:53 |
|
I work with much larger data frames in pandas, and for most things it's quick enough with the built-in functions. Every now and then it will blow up and I'll get a bit more involved with coding what I need, but the developers have really done a nice job over the last year or so at making it faster and using Cython when possible.
|
# ? Jul 3, 2016 16:52 |
|
SurgicalOntologist posted:Sorry for the doublepost, but I have my own question. I want to make a combination of collections.deque and asyncio.Queue. That is, I want a deque that I can do await deque.popleft() with. To put it another way, I want a Queue that I can treat as a collection. Supposedly, deque is threadsafe and Queue is built on top of a deque, so is there an easier way than making my own class? If there is, I can't figure it out. Popped back in to say that I built it. I used state-based testing with hypothesis to test it, which was cool. You tell it all the operations, when they're allowed, and what they should return, and it puts them together in different ways to try to break your program. Found some really subtle bugs that way.
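This is not the poster's implementation, but a rough single-event-loop sketch of the idea (the class and its internals are invented):

```python
import asyncio
from collections import deque

class AwaitableDeque:
    """A deque you can `await .popleft()` on; single event loop only."""

    def __init__(self):
        self._items = deque()
        self._not_empty = asyncio.Event()

    def append(self, item):
        self._items.append(item)
        self._not_empty.set()

    async def popleft(self):
        # Wait until something is available, then pop it.
        while not self._items:
            self._not_empty.clear()
            await self._not_empty.wait()
        item = self._items.popleft()
        if not self._items:
            self._not_empty.clear()
        return item
```

A real version would also forward the rest of the deque/collection interface (len, iteration, indexing), which is where the subtle bugs that hypothesis finds tend to live.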
|
# ? Jul 4, 2016 19:44 |
|
Does anyone know whether numba is able to use shared memory arrays (multiprocessing.Array)? I tried looking around for an answer, but most of the information that I found was related to CUDA programming, whereas the thing that I'm working on just uses the CPU. Basically, right now I'm using multiprocessing in order to utilize all of my cores, and I'm using a shared memory array to hold read-only data. I'd like to try compiling some of the more expensive functions, but I've never tried giving a shared memory array to a numba function before.
|
# ? Jul 5, 2016 05:09 |
|
I'm by no means an expert on either multiprocessing or numba, but I threw together a quick test and it seems to be working. I included a write in it to make sure processes weren't dealing with copies: code:
0.0 84147098.6234422 90929742.60248955 14112000.778887846 If I comment out the @nb.jit(nopython=True), the runtime goes from half a second up to 38 seconds (with the same results).
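For reference, a sketch of the no-copy part of that kind of setup: viewing a multiprocessing.Array as a NumPy array, which is what lets a compiled (e.g. numba-jitted) function operate on the shared buffer directly. Variable names are made up:

```python
import multiprocessing as mp
import numpy as np

# A raw shared buffer of 4 doubles; lock=False returns a plain ctypes
# array that supports the buffer protocol.
shared = mp.Array('d', 4, lock=False)

# Zero-copy NumPy view over the shared memory; this (not the mp.Array
# itself) is what you'd hand to a numba-compiled function.
view = np.frombuffer(shared, dtype=np.float64)
view[:] = [0.0, 1.0, 2.0, 3.0]

# Writes through the view land in the shared buffer itself.
view[0] = 42.0
```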
|
# ? Jul 6, 2016 02:22 |
|
Nice! Thanks for doing the legwork. My project is on the back-burner, and this should help me get jump-started once I have time to resume it.
|
# ? Jul 6, 2016 06:24 |
|
I need some advice on how to structure some arrays of data. I'm essentially trying to code something akin to a CFD problem. I have a grid of 2D points in X and Y. Each point has a bunch of properties associated with it - velocity in x and y, temperature, pressure, density. In a lovely attempt from my undergrad days, I've got parallel arrays for each property - one 2D array for x coordinates, another for y coordinates, another for Vx, and so on. I tried a quick mashup just now where I had a 1D array of these values assigned to one element of a 2D array representing my 2D XY grid, but I couldn't assign this correctly in numpy. What are my options here for getting this kind of indexing to work, instead of having a lot of arrays making GBS threads up the place?
|
# ? Jul 6, 2016 23:21 |
|
Zero Gravitas posted:I've got parallel arrays for each property - one 2D array for x coordinates, another for y coordinates, another for Vx, and so on. This sounds like a good format for the type of data you have, especially if you want to do the same computation at every point in space. I don't understand what you mean by Zero Gravitas posted:I tried a quick mashup just now where I had a 1D array of these values assigned to one element of a 2D array representing my 2D XY grid, but I couldn't assign this correctly in numpy.
|
# ? Jul 6, 2016 23:28 |
Is it possible to use anonymous pipes with subprocess? I have a bash command which I would like to run from my python script, which looks like this: Bash code:
|
|
# ? Jul 7, 2016 01:01 |
|
n/m solved just after posting
|
# ? Jul 7, 2016 02:03 |
|
Zero Gravitas posted:I need some advice on how to structure some arrays of data. Use xarray.
|
# ? Jul 7, 2016 03:53 |
|
This seems like a cool module!
|
# ? Jul 7, 2016 05:51 |
|
Yeah. I rarely have timeseries of more than 1D but I still use it anyway, even if just to be able to do state.mean(dim='time') rather than state.mean(axis=0).
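A minimal sketch of that named-dimension convenience, assuming xarray is installed (the dimension names are made up):

```python
import numpy as np
import xarray as xr

state = xr.DataArray(
    np.arange(6.0).reshape(3, 2),
    dims=("time", "space"),
)

# Reduce over a named dimension instead of remembering which axis it is.
by_name = state.mean(dim="time")
by_axis = state.mean(axis=0)
```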
|
# ? Jul 7, 2016 06:14 |
|
VikingofRock posted:Is it possible to use anonymous pipes with subprocess? I have bash command which I would like to run from my python script, which looks like this: That works by bash either making a fifo or using the /dev/fd symlinks to turn an open file descriptor into a file path. Python code:
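A sketch of doing the equivalent from Python on Linux: create a pipe and hand the child its read end as a /dev/fd path. The command (`cat`) and data are just illustrations:

```python
import os
import subprocess

r, w = os.pipe()

# pass_fds keeps the read end open (and inheritable) in the child, so the
# child can open it via the /dev/fd symlinks, like bash process substitution.
proc = subprocess.Popen(
    ["cat", "/dev/fd/%d" % r],
    stdout=subprocess.PIPE,
    pass_fds=[r],
)
os.close(r)               # parent no longer needs the read end
os.write(w, b"hello\n")
os.close(w)               # closing the write end is EOF for the reader

out, _ = proc.communicate()
```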
|
# ? Jul 7, 2016 08:27 |
|
Anyone else heading to scipy2016 next week?
|
# ? Jul 7, 2016 08:31 |
Edison was a dick posted:That works by bash either making a fifo or using the /dev/fd symlinks to turn an open file descriptor into a file path. Thanks, this is really helpful.
|
|
# ? Jul 7, 2016 08:52 |
|
Nippashish posted:This sounds like a good format for the type of data you have, especially if you want to do the same computation at every point in space. OK. My issue is that in my prototype script from last year I have a fuckton of duplicated arrays for each property of the fluid in the flow. They're all 2D arrays, structured like this: I've got a nozzle that's separated into three different regions, so three different grids, and each grid has something like 10 properties. So I have 3x10x(whatever my grid size is). Error decreases with increasing grid density, so if I have a 100x100 grid, I have 300,000 points to calculate for. Most of these calculations have a lot of steps and iteration to reduce residuals, and that's before I start adding even more complex stuff in the latest version that's got lots of integration. What I'd really like to do is clean these 30 arrays up so I don't have a ton of separate arrays to try and keep track of, something nice like this: If I create a np array of zeros, then set individual elements to numbers, that's fine. If I create an np array of zeros and try to append a 1D np array or normal array, I get an error. If I create a normal array and insert another 1D array into one of its elements, it's fine. It's looking like I can do what I need to do with ordinary arrays - I'm just concerned about the speed of operations, and I'm sure there must be a better way. I've had a look at that but I cannot get how it works. Might be because I've got the worst headache known to man, but it doesn't appear all that intuitive. I'll have another look in the morning.
|
# ? Jul 7, 2016 22:33 |
|
Zero Gravitas posted:
If I follow you, then you want to do something like: code:
You might think that you want to have a 2d array where you can do grid[x,y].Vy to get Vy[x,y], but that's a really terrible data layout for vectorized computations. You're much better off with grid.Vx[x,y] so you can write things like np.sqrt(grid.Vx ** 2 + grid.Vy ** 2) to get the norm of the velocity at every point simultaneously (or whatever else it is you need to calculate).
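A concrete sketch of that layout (the Grid class and its field names are made up, not from the original post):

```python
import numpy as np

class Grid:
    """One 2D array per physical property, bundled in a single object."""

    def __init__(self, nx, ny):
        self.Vx = np.zeros((nx, ny))  # x velocity
        self.Vy = np.zeros((nx, ny))  # y velocity
        self.T = np.zeros((nx, ny))   # temperature
        self.P = np.zeros((nx, ny))   # pressure

grid = Grid(3, 3)
grid.Vx[:] = 3.0
grid.Vy[:] = 4.0

# Norm of the velocity at every point, computed in one vectorized step.
speed = np.sqrt(grid.Vx ** 2 + grid.Vy ** 2)
```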
|
# ? Jul 7, 2016 23:11 |
|
Ah! I didn't realise that! That looks like it could be pretty useful. It being ugly, and the remembering problem, isn't so much of an issue because it's still a marked improvement over what I've got already - and remembering which array is which is rather trivial in comparison to the rest of the operations I need to do. That said, there's not a lot of vectorised operations going on. The calculations propagate from left to right ( [:,0] are generally given as boundary conditions), so really they're mainly there as an aid to plotting the result with matplotlib later, for comparison with CFD data from Fluent or OpenFOAM. Haven't done all that much object-oriented stuff - am I right in thinking I do something like: code:
Thanks for putting me on the right track.
|
# ? Jul 7, 2016 23:58 |
|
A structured array can probably do what you want: http://docs.scipy.org/doc/numpy/user/basics.rec.html You create a custom dtype: instead of every element in the array being a single integer or float, it's multiple values, each of which can be accessed by name. So if you have an array of size (100x100x3) with an "x_vel" field, you could access it with cool_array_name[80, 63, 1]['x_vel']. If you wanted to find the maximum value of 'x_vel' in the entire array you would do numpy.max(cool_array_name['x_vel']).
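A short sketch of the structured-array approach (the "x_vel" field name follows the post's example; the rest is made up):

```python
import numpy as np

# Custom dtype: each element holds several named floats instead of one value.
point = np.dtype([("x_vel", np.float64), ("y_vel", np.float64)])
grid = np.zeros((100, 100, 3), dtype=point)

# Field access by name returns a view, so assignment writes through.
grid["x_vel"][80, 63, 1] = 2.5

# Reductions over a single field work on the whole array at once.
max_xvel = np.max(grid["x_vel"])
```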
|
# ? Jul 8, 2016 03:05 |
|
SirPablo posted:Anyone else heading to scipy2016 next week? Yup. I'm co-presenting the Bokeh tutorial, but I can't make the main conference. BigRedDot fucked around with this message at 19:30 on Jul 9, 2016 |
# ? Jul 8, 2016 04:32 |
|
Zero Gravitas posted:Havent done all that much object oriented stuff - am I right in thinking I do something like: Yes exactly that.
|
# ? Jul 8, 2016 08:50 |
|
I decided to try Bokeh today for an interactive plot, and it is...not going well. My goal is to generate a simple histogram using the chart API and update the data every few seconds. After reading through most of the documentation, I cannot figure out how to do this. All of the information about streaming data to a browser requires you to interact directly with plotting API elements (and their data source). Is it not possible to directly give a chart a ColumnDataSource as the input data and then push that? Is there some other way to update chart-level plot data without extracting plotting elements from it and updating them individually? Alternatively, I haven't even been able to figure out how to delete a plot from the document so I can just replot. I have tried curdoc().clear() and curdoc().remove_root(hist), which successfully remove the plot from curdoc().roots, but I can't figure out what command is necessary to make the page that has loaded the session actually remove the plot; I just end up with infinite histograms down the page.
|
# ? Jul 9, 2016 15:51 |
|
OnceIWasAnOstrich posted:I decided to try Bokeh today for an interactive plot, and it is...not going well. My goal is to generate a simple histogram using the chart API, and every few seconds update the data. After reading through most of the documentation, I cannot figure out how to do this. All of the information about streaming data to a browser requires you to interact directly with plotting api elements (and their datasource). You're working too hard; I guess we need to try to make it clearer that it is much simpler than that. To replace one plot in a layout with a new one, just assign to it, as the crossfilter example does: https://github.com/bokeh/bokeh/blob/master/examples/app/crossfilter/main.py#L72-L73 Regarding ColumnDataSource and charts, the consideration is that the data you pass to the chart is not necessarily the data that is needed to draw. In fact, the whole point of charts is to do things like grouping and aggregation automatically; the data needed to draw is derived from the original data. But this means there is not a straightforward relationship between the data you provide and the data in the ColumnDataSource. The glyph-based plots in bokeh.plotting have a very straightforward 1-1 relationship from the data to what is drawn on the screen: every visual aspect of a glyph is either a fixed value or a column of values from a column data source. The "updating an existing plot without replacing it" mode of operation works much better with bokeh.plotting for this reason.
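A minimal sketch of that assign-into-the-layout pattern, assuming bokeh is installed (the plot titles are made up):

```python
from bokeh.layouts import column
from bokeh.plotting import figure

old_plot = figure(title="old")
layout = column(old_plot)

# Replacing a child of the layout is what swaps the plot shown in the
# document; reassigning the Python variable alone does nothing.
new_plot = figure(title="new")
layout.children[0] = new_plot
```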
|
# ? Jul 9, 2016 19:40 |
|
Hmm, to rephrase what you are telling me: the sticking point is that I need to obtain a reference to the layout object that my plot is part of. I had been adding my plot directly to the document (like many examples do), so there was no obvious layout object, if there even is one in that situation. Simply assigning to the variable that contains the plot obviously is not enough, and assigning to a curdoc().roots slot does not work either, so it isn't immediately obvious why assigning to a layout.children slot does. OnceIWasAnOstrich fucked around with this message at 22:29 on Jul 9, 2016 |
# ? Jul 9, 2016 22:22 |
|
OnceIWasAnOstrich posted:so it isn't immediately obvious why assigning to a layout.children slot does. BigRedDot fucked around with this message at 22:49 on Jul 9, 2016 |
# ? Jul 9, 2016 22:39 |
|
Has anyone thought about making a wrapper for matplotlib to make the API easier? It seems kind of a mess atm*, especially when you have to invoke the figure/axes/subplot syntax, i.e. for doing basic things like drawing the axes. If no one has a suggestion, I might just do it! *Some people like it I guess, but it seems like a lot of boilerplate to me. I have to look up the docs/examples whenever I plot something. Dominoes fucked around with this message at 01:04 on Jul 10, 2016 |
# ? Jul 10, 2016 00:49 |
|
matplotlib is a mess because it's designed to look and feel like MATLAB, which is also a mess. However, you can actually take a fully object-oriented approach with it instead of using bizarre global declarations with pyplot; the tutorials all use the MATLAB-style global syntax, though. What I would like to see is an overhaul of the matplotlib tutorials, and then any sensible changes could be made as part of that. I'd be worried about accidentally trying to improve things that already have superior alternatives in the API; you'd really need to take a deep look at some of that code.
|
# ? Jul 10, 2016 01:28 |
|
What about something like this: Python code:
Dominoes fucked around with this message at 01:49 on Jul 10, 2016 |
# ? Jul 10, 2016 01:38 |
|
Not that I can think of; the syntax changes I was referring to were using stuff like plt.subplots(2, 3) for a 2x3 figure instead of plt.subplot(231), and using object methods instead of calling various subplot functions. One thing I'd suggest would be returning the axis object. And making the plt.show() lines optional (I never ever use plt.show; I write the figures to a file instead).
|
# ? Jul 10, 2016 02:25 |
|
The newly released version 5 of the IPython console is really nice.
mystes fucked around with this message at 02:53 on Jul 10, 2016 |
# ? Jul 10, 2016 02:50 |
|
Dominoes posted:Has anyone thought about making a wrapper for matplotlib to make the api easier? Isn't that what Seaborn is?
|
# ? Jul 10, 2016 03:39 |
|
QuarkJets posted:Not that I can think of, the syntax changes that I was referring to was using stuff like plt.subplots(2, 3) for a 2x3 figure instead of plt.subplot(231), and using object methods instead of calling various subplot functions quote:One thing I'd suggest would be returning the axis object. And making the plt.show() lines optional (I never ever use plt.show, I write the figures to a file instead) BigRedDot posted:Isn't that what Seaborn is? What I have so far: Fplot on Github. The goal is easy syntax with reasonable defaults: show the axes and grid for 2D plots, don't make the user set up the arrays/meshes by hand, use a tight layout, and allow basic customization (colormaps, labels etc.) via kwargs. The alternative, AFAIK, is entering by hand, every time, the code that the functions in this GitHub repo contain. Syntax examples: edit: Also: Python code:
Dominoes fucked around with this message at 20:31 on Jul 10, 2016 |
# ? Jul 10, 2016 09:04 |
|
|
|
I just wanted to point out Pendulum: http://pendulum.eustace.io/ . It looks like it solves a number of datetime's issues without some of arrow's issues.
|
# ? Jul 11, 2016 18:30 |