|
Alright, I'll change my terminology, thanks for the correction. By running it directly I mean cd'ing into the directory and doing "python code2.py" to test some of its internal methods, then doing the same thing with code1.py. Originally it was set up such that package 1 and package 2 had to exist side by side, and package 1 added package 2 to the os path, but I changed it so that package 2 exists inside package 1. Changing to fully-qualified imports breaks doing "python code2.py" but does fix doing "python code1.py". I think my best solution would be to do: code:
|
# ? Jun 30, 2016 06:32 |
|
|
|
Yeah, fully-qualified imports will work for every use case if you install the top-level package as a distribution. Making a setup.py file and pip-installing it is the right way to go, and will guarantee that it works however you want to use it. The only other way to meet your requirements is probably to start hacking with sys.path at the top of every file. When you run a file as a script, Python adds its directory to the path, but not the directory above it; that's the main source of your problems. SurgicalOntologist fucked around with this message at 06:46 on Jun 30, 2016 |
# ? Jun 30, 2016 06:44 |
|
I have a dataframe with 750,000 values in Unique_ID -- I need to test which Unique_IDs appear inside any other Unique_ID -- and label those as parents. For example, rows 0 and 1 are parent and child: I want to label row 0 as a parent because 349894 is in 349894_4073467. A child will always be in the form PARENTID_####### -- code:
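A vectorized sketch of one way to do the parent check (the column name Unique_ID is from the post; the sample values and is_parent label are made up):

```python
import pandas as pd

df = pd.DataFrame({"Unique_ID": ["349894", "349894_4073467", "12345"]})

# Children look like PARENTID_#######, so every parent ID is the part of a
# child ID before the first underscore.
parents = set(
    df.loc[df["Unique_ID"].str.contains("_"), "Unique_ID"].str.split("_").str[0]
)

# Label a row as a parent if some child references its ID.
df["is_parent"] = df["Unique_ID"].isin(parents)
```

Building the set once and using isin avoids the O(n²) substring scan, which matters at 750,000 rows.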
|
# ? Jul 1, 2016 21:42 |
|
Not sure what you're really looking for as far as the output goes, but is this similar to what you have in mind? code:
vikingstrike fucked around with this message at 22:58 on Jul 1, 2016 |
# ? Jul 1, 2016 22:52 |
I don't know how your data framework works, but this is the most 'Pythonic' way I can think of to handle the problem without needing to import anything. Hope this helps some! code:
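A no-imports sketch of the idea, using only builtins (the sample IDs are made up):

```python
# Group child IDs under their parent prefix using plain dicts and sets.
ids = ["349894", "349894_4073467", "12345"]

children = {}
for uid in ids:
    if "_" in uid:
        parent, _, _rest = uid.partition("_")
        children.setdefault(parent, []).append(uid)

# An ID is a parent if at least one child references it.
is_parent = {uid: uid in children for uid in ids}
```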
Eela6 fucked around with this message at 05:41 on Jul 2, 2016 |
|
# ? Jul 2, 2016 04:49 |
|
vikingstrike posted:Not sure what you're really looking for as the output goes, but is this similar to what you have in mind? This is pretty much it, thanks! I was offered this solution by somebody as well, and it's pretty interesting. code:
Eela6, your solution is kind of what I was trying to figure out how to do, as it's a more intuitive approach, but a vectorized Pandas solution like the two above is generally much faster for the sort of data I deal with.
|
# ? Jul 3, 2016 13:45 |
|
Something to keep in mind: manipulating DataFrames element by element is very slow. If speed's an issue, you'll get order-of-magnitude (OOM) improvements by converting to a NumPy array first.
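A tiny sketch of the tip, assuming the hot part of the computation can run on the raw values (the column name is made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0]})

# Pull out the underlying NumPy array once, then operate on it directly
# instead of indexing into the DataFrame inside a tight loop.
arr = df["x"].values
total = float(np.sum(arr * 2))
```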
Dominoes fucked around with this message at 17:05 on Jul 3, 2016 |
# ? Jul 3, 2016 13:53 |
|
I work with much larger data frames in pandas, and for most things it's quick enough with the built-in functions. Every now and then it will blow up and I'll get a bit more involved with coding what I need, but the developers have really done a nice job over the last year or so at making it faster and using Cython when possible.
|
# ? Jul 3, 2016 16:52 |
|
SurgicalOntologist posted:Sorry for the doublepost, but I have my own question. I want to make a combination of collections.deque and asyncio.Queue. That is, I want a deque that I can do await deque.popleft() with. To put it another way, I want a Queue that I can treat as a collection. Supposedly, deque is threadsafe and Queue is built on top of a deque, so is there an easier way than making my own class? If there is, I can't figure it out. Popped back in to say that I built it. I used state-based testing with hypothesis to test it, which was cool. You tell it all the operations, when they're allowed, and what they should return, and it puts them together in different ways to try to break your program. Found some really subtle bugs that way.
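This is not the poster's implementation, but a rough single-event-loop sketch of the idea (the class and its internals are invented):

```python
import asyncio
from collections import deque

class AwaitableDeque:
    """A deque you can `await .popleft()` on; single event loop only."""

    def __init__(self):
        self._items = deque()
        self._not_empty = asyncio.Event()

    def append(self, item):
        self._items.append(item)
        self._not_empty.set()

    async def popleft(self):
        # Wait until something is available, then pop it.
        while not self._items:
            self._not_empty.clear()
            await self._not_empty.wait()
        item = self._items.popleft()
        if not self._items:
            self._not_empty.clear()
        return item
```

A real version would also forward the rest of the deque/collection interface (len, iteration, indexing), which is where the subtle bugs that hypothesis finds tend to live.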
|
# ? Jul 4, 2016 19:44 |
|
Does anyone know whether numba is able to use shared memory arrays (multiprocessing.Array)? I tried looking around for an answer, but most of the information that I found was related to CUDA programming, whereas the thing that I'm working on just uses the CPU. Basically, right now I'm using multiprocessing in order to utilize all of my cores, and I'm using a shared memory array to hold read-only data. I'd like to try compiling some of the more expensive functions, but I've never tried giving a shared memory array to a numba function before.
|
# ? Jul 5, 2016 05:09 |
|
I'm by no means an expert on either multiprocessing or numba, but I threw together a quick test and it seems to be working. I included a write in it to make sure processes weren't dealing with copies: code:
0.0 84147098.6234422 90929742.60248955 14112000.778887846 If I comment out the @nb.jit(nopython=True), the runtime goes from half a second up to 38 seconds (with the same results).
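For reference, a sketch of the no-copy part of that kind of setup: viewing a multiprocessing.Array as a NumPy array, which is what lets a compiled (e.g. numba-jitted) function operate on the shared buffer directly. Variable names are made up:

```python
import multiprocessing as mp
import numpy as np

# A raw shared buffer of 4 doubles; lock=False returns a plain ctypes
# array that supports the buffer protocol.
shared = mp.Array('d', 4, lock=False)

# Zero-copy NumPy view over the shared memory; this (not the mp.Array
# itself) is what you'd hand to a numba-compiled function.
view = np.frombuffer(shared, dtype=np.float64)
view[:] = [0.0, 1.0, 2.0, 3.0]

# Writes through the view land in the shared buffer itself.
view[0] = 42.0
```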
|
# ? Jul 6, 2016 02:22 |
|
Nice! Thanks for doing the legwork. My project is on the back-burner, and this should help me get jump-started once I have time to resume it.
|
# ? Jul 6, 2016 06:24 |
|
I need some advice on how to structure some arrays of data. I'm essentially trying to code something akin to a CFD problem. I have a grid of 2D points in X and Y. Each point has a bunch of properties associated with it - velocity in x and y, temperature, pressure, density. In a lovely attempt from my undergrad days, I've got parallel arrays for each property - one 2D array for x coordinates, another for y coordinates, another for Vx, and so on. I tried a quick mashup just now where I had a 1D array of these values assigned to one element of a 2D array representing my 2D XY grid, but I couldn't assign this correctly in numpy. What are my options here for getting this kind of indexing to work, instead of having a lot of arrays making GBS threads up the place?
|
# ? Jul 6, 2016 23:21 |
|
Zero Gravitas posted:I've got parallel arrays for each property - one 2D array for x coordinates, another for y coordinates, another for Vx, and so on. This sounds like a good format for the type of data you have, especially if you want to do the same computation at every point in space. I don't understand what you mean by Zero Gravitas posted:I tried a quick mashup just now where I had a 1D array of these values assigned to one element of a 2D array representing my 2D XY grid, but I couldn't assign this correctly in numpy.
|
# ? Jul 6, 2016 23:28 |
Is it possible to use anonymous pipes with subprocess? I have a bash command which I would like to run from my python script, which looks like this: Bash code:
|
|
# ? Jul 7, 2016 01:01 |
|
n/m solved just after posting
|
# ? Jul 7, 2016 02:03 |
|
Zero Gravitas posted:I need some advice on how to structure some arrays of data. Use xarray.
|
# ? Jul 7, 2016 03:53 |
|
This seems like a cool module!
|
# ? Jul 7, 2016 05:51 |
|
Yeah. I rarely have timeseries of more than 1D but I still use it anyway, even if just to be able to do state.mean(dim='time') rather than state.mean(axis=0).
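A minimal sketch of that named-dimension convenience, assuming xarray is installed (the dimension names are made up):

```python
import numpy as np
import xarray as xr

state = xr.DataArray(
    np.arange(6.0).reshape(3, 2),
    dims=("time", "space"),
)

# Reduce over a named dimension instead of remembering which axis it is.
by_name = state.mean(dim="time")
by_axis = state.mean(axis=0)
```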
|
# ? Jul 7, 2016 06:14 |
|
VikingofRock posted:Is it possible to use anonymous pipes with subprocess? I have bash command which I would like to run from my python script, which looks like this: That works by bash either making a fifo or using the /dev/fd symlinks to turn an open file descriptor into a file path. Python code:
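A sketch of doing the equivalent from Python on Linux: create a pipe and hand the child its read end as a /dev/fd path. The command (`cat`) and data are just illustrations:

```python
import os
import subprocess

r, w = os.pipe()

# pass_fds keeps the read end open (and inheritable) in the child, so the
# child can open it via the /dev/fd symlinks, like bash process substitution.
proc = subprocess.Popen(
    ["cat", "/dev/fd/%d" % r],
    stdout=subprocess.PIPE,
    pass_fds=[r],
)
os.close(r)               # parent no longer needs the read end
os.write(w, b"hello\n")
os.close(w)               # closing the write end is EOF for the reader

out, _ = proc.communicate()
```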
|
# ? Jul 7, 2016 08:27 |
|
Anyone else heading to scipy2016 next week?
|
# ? Jul 7, 2016 08:31 |
Edison was a dick posted:That works by bash either making a fifo or using the /dev/fd symlinks to turn an open file descriptor into a file path. Thanks, this is really helpful.
|
|
# ? Jul 7, 2016 08:52 |
|
Nippashish posted:This sounds like a good format for the type of data you have, especially if you want to do the same computation at every point in space. OK. My issue is that in my prototype script from last year I have a fuckton of duplicated arrays for each property of the fluid in the flow. They're all 2D arrays, structured like this: I've got a nozzle that's separated into three different regions, so three different grids, and each grid has something like 10 properties. So I have 3x10x(whatever my grid size is). Error decreases with increasing grid density, so if I have a 100x100 grid, I have 300,000 points to calculate for. Most of these calculations have a lot of steps and iteration to reduce residuals, and that's before I start adding even more complex stuff in the latest version that's got lots of integration. What I'd really like to do is clean these 30 arrays up so I don't have a ton of separate arrays to try and keep track of, something nice like this: If I create a np array of zeros, then set individual elements to numbers, that's fine. If I create an np array of zeros and try to append a 1D np array or normal array, I get an error. If I create a normal array and insert another 1D array into one of its elements, it's fine. It's looking like I can do what I need to do with ordinary arrays - I'm just concerned about the speed of operations, and I'm sure there must be a better way. I've had a look at that but I cannot get how it works. Might be because I've got the worst headache known to man, but it doesn't appear all that intuitive. I'll have another look in the morning.
|
# ? Jul 7, 2016 22:33 |
|
Zero Gravitas posted:
If I follow you, then you want to do something like: code:
You might think that you want to have a 2d array where you can do grid[x,y].Vy to get Vy[x,y], but that's a really terrible data layout for vectorized computations. You're much better off with grid.Vx[x,y] so you can write things like np.sqrt(grid.Vx ** 2 + grid.Vy ** 2) to get the norm of the velocity at every point simultaneously (or whatever else it is you need to calculate).
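A concrete sketch of that layout (the Grid class and its field names are made up, not from the original post):

```python
import numpy as np

class Grid:
    """One 2D array per physical property, bundled in a single object."""

    def __init__(self, nx, ny):
        self.Vx = np.zeros((nx, ny))  # x velocity
        self.Vy = np.zeros((nx, ny))  # y velocity
        self.T = np.zeros((nx, ny))   # temperature
        self.P = np.zeros((nx, ny))   # pressure

grid = Grid(3, 3)
grid.Vx[:] = 3.0
grid.Vy[:] = 4.0

# Norm of the velocity at every point, computed in one vectorized step.
speed = np.sqrt(grid.Vx ** 2 + grid.Vy ** 2)
```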
|
# ? Jul 7, 2016 23:11 |
|
Ah! I didn't realise that! That looks like it could be pretty useful. It being ugly, and the remembering problem, isn't so much of an issue because it's still a marked improvement over what I've got already - and remembering which array is which is rather trivial in comparison to the rest of the operations I need to do. That said, there's not a lot of vectorised operations going on. The calculations propagate from left to right ( [:,0] are generally given as boundary conditions), so really they're mainly there as an aid to plotting the result with matplotlib later, for comparison with CFD data from Fluent or OpenFOAM. Haven't done all that much object-oriented stuff - am I right in thinking I do something like: code:
Thanks for putting me on the right track.
|
# ? Jul 7, 2016 23:58 |
|
A structured array can probably do what you want: http://docs.scipy.org/doc/numpy/user/basics.rec.html You create a custom dtype: instead of every element in the array being a single integer or float, it's multiple values, each of which can be accessed by name. So if you have an array of size (100x100x3) with an "x_vel" field, you could access it with cool_array_name[80, 63, 1]['x_vel']. If you wanted to find the maximum value of 'x_vel' in the entire array you would do numpy.max(cool_array_name['x_vel']).
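A short sketch of the structured-array approach (the "x_vel" field name follows the post's example; the rest is made up):

```python
import numpy as np

# Custom dtype: each element holds several named floats instead of one value.
point = np.dtype([("x_vel", np.float64), ("y_vel", np.float64)])
grid = np.zeros((100, 100, 3), dtype=point)

# Field access by name returns a view, so assignment writes through.
grid["x_vel"][80, 63, 1] = 2.5

# Reductions over a single field work on the whole array at once.
max_xvel = np.max(grid["x_vel"])
```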
|
# ? Jul 8, 2016 03:05 |
|
SirPablo posted:Anyone else heading to scipy2016 next week? Yup. I'm co-presenting the Bokeh tutorial, but I can't make the main conference. BigRedDot fucked around with this message at 19:30 on Jul 9, 2016 |
# ? Jul 8, 2016 04:32 |
|
Zero Gravitas posted:Havent done all that much object oriented stuff - am I right in thinking I do something like: Yes exactly that.
|
# ? Jul 8, 2016 08:50 |
|
I decided to try Bokeh today for an interactive plot, and it is...not going well. My goal is to generate a simple histogram using the chart API and update the data every few seconds. After reading through most of the documentation, I cannot figure out how to do this. All of the information about streaming data to a browser requires you to interact directly with plotting API elements (and their data source). Is it not possible to directly give a chart a ColumnDataSource as the input data and then push that? Is there some other way to update chart-level plot data without extracting plotting elements from it and updating them individually? Alternatively, I haven't even been able to figure out how to delete a plot from the document so I can just replot. I have tried curdoc().clear() and curdoc().remove_root(hist), which successfully remove the plot from curdoc().roots, but I can't figure out what command is necessary to make the page that has loaded the session actually remove the plot; I just end up with infinite histograms down the page.
|
# ? Jul 9, 2016 15:51 |
|
OnceIWasAnOstrich posted:I decided to try Bokeh today for an interactive plot, and it is...not going well. My goal is to generate a simple histogram using the chart API, and every few seconds update the data. After reading through most of the documentation, I cannot figure out how to do this. All of the information about streaming data to a browser requires you to interact directly with plotting api elements (and their datasource). You're working too hard; I guess we need to try to make it clearer that it is much simpler than that. To replace one plot in a layout with a new one, just assign to it, as the crossfilter example does: https://github.com/bokeh/bokeh/blob/master/examples/app/crossfilter/main.py#L72-L73 Regarding ColumnDataSource and charts, the consideration is that the data you pass to the chart is not necessarily the data that is needed to draw. In fact, the whole point of charts is to do things like grouping and aggregation automatically; the data needed to draw is derived from the original data. But this means there is not a straightforward relationship between the data you provide and the data in the ColumnDataSource. The glyph-based plots in bokeh.plotting have a very straightforward 1-1 relationship from the data to what is drawn on the screen: every visual aspect of a glyph is either a fixed value or a column of values from a column data source. The "updating an existing plot without replacing it" mode of operation works much better with bokeh.plotting for this reason.
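A minimal sketch of that assign-into-the-layout pattern, assuming bokeh is installed (the plot titles are made up):

```python
from bokeh.layouts import column
from bokeh.plotting import figure

old_plot = figure(title="old")
layout = column(old_plot)

# Replacing a child of the layout is what swaps the plot shown in the
# document; reassigning the Python variable alone does nothing.
new_plot = figure(title="new")
layout.children[0] = new_plot
```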
|
# ? Jul 9, 2016 19:40 |
|
Hmm, to rephrase what you are telling me: the sticking point is that I need to obtain a reference to the layout object that my plot is part of. I had been adding my plot directly to the document (like many examples do), so there was no obvious layout object, if there even is one in that situation. Simply assigning to the variable that contains the plot obviously is not enough, and assigning to a curdoc().roots slot does not work either, so it isn't immediately obvious why assigning to a layout.children slot does. OnceIWasAnOstrich fucked around with this message at 22:29 on Jul 9, 2016 |
# ? Jul 9, 2016 22:22 |
|
OnceIWasAnOstrich posted:so it isn't immediately obvious why assigning to a layout.children slot does. BigRedDot fucked around with this message at 22:49 on Jul 9, 2016 |
# ? Jul 9, 2016 22:39 |
|
Has anyone thought about making a wrapper for matplotlib to make the API easier? It seems kind of a mess atm*, especially when you have to invoke the figure/axes/subplot syntax, i.e. for doing basic things like drawing the axes. If no one has a suggestion, I might just do it! *Some people like it I guess, but it seems like a lot of boilerplate to me. I have to look up the docs/examples whenever I plot something. Dominoes fucked around with this message at 01:04 on Jul 10, 2016 |
# ? Jul 10, 2016 00:49 |
|
matplotlib is a mess because it's designed to look and feel like MATLAB, which is also a mess. However, you can actually take a fully object-oriented approach with it instead of using bizarre global declarations with pyplot; the tutorials all use the MATLAB-style global syntax, though. What I would like to see is an overhaul of the matplotlib tutorials, and then any sensible changes could be made as part of that. I'd be worried about accidentally trying to improve things that already have superior alternatives in the API; you'd really need to take a deep look at some of that code.
|
# ? Jul 10, 2016 01:28 |
|
What about something like this: Python code:
Dominoes fucked around with this message at 01:49 on Jul 10, 2016 |
# ? Jul 10, 2016 01:38 |
|
Not that I can think of; the syntax changes I was referring to were using stuff like plt.subplots(2, 3) for a 2x3 figure instead of plt.subplot(231), and using object methods instead of calling various subplot functions. One thing I'd suggest would be returning the axis object. And making the plt.show() lines optional (I never ever use plt.show; I write the figures to a file instead).
|
# ? Jul 10, 2016 02:25 |
|
The newly released version 5 of the IPython console is really nice.
mystes fucked around with this message at 02:53 on Jul 10, 2016 |
# ? Jul 10, 2016 02:50 |
|
Dominoes posted:Has anyone thought about making a wrapper for matplotlib to make the api easier? Isn't that what Seaborn is?
|
# ? Jul 10, 2016 03:39 |
|
QuarkJets posted:Not that I can think of, the syntax changes that I was referring to was using stuff like plt.subplots(2, 3) for a 2x3 figure instead of plt.subplot(231), and using object methods instead of calling various subplot functions quote:One thing I'd suggest would be returning the axis object. And making the plt.show() lines optional (I never ever use plt.show, I write the figures to a file instead) BigRedDot posted:Isn't that what Seaborn is? What I have so far: Fplot on Github. The goal is easy syntax with reasonable defaults: show the axes and grid for 2D plots, don't make the user set up the arrays/meshes by hand, use a tight layout, and allow basic customization (colormaps, labels etc.) via kwargs. The alternative, AFAIK, is entering by hand, every time, the code that the functions in this GitHub repo contain. Syntax examples: edit: Also: Python code:
Dominoes fucked around with this message at 20:31 on Jul 10, 2016 |
# ? Jul 10, 2016 09:04 |
|
|
|
I just wanted to point out Pendulum: http://pendulum.eustace.io/ . It looks like it solves a number of datetime's issues without some of arrow's issues.
|
# ? Jul 11, 2016 18:30 |