|
I think it's important to know why you should never do code:
And that the reason is that floating point numbers are approximations of the actual real numbers. I think it's important to know that the proper way of doing this comparison is code:
And that the difference between this and the first one is that you're now comparing the residual between the computed and expected values, and that eps should be chosen to be small enough for your application/hardware. I think for most people this is fine yeah. This is what I told the guy who asked and he seemed to appreciate it. Boris Galerkin fucked around with this message at 12:10 on Jun 12, 2017 |
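Since the two code blocks didn't survive the archive, here's a minimal sketch of the pattern being described (the exact eps value is an application-dependent choice, not from the original post):

```python
a = 0.1 + 0.2

# Never do this: exact equality on floats fails due to rounding error.
print(a == 0.3)  # False

# Do this: compare the residual against a tolerance chosen for your application.
eps = 1e-9
print(abs(a - 0.3) < eps)  # True

# Python 3.5+ ships this pattern as math.isclose (relative tolerance by default).
import math
print(math.isclose(a, 0.3))  # True
```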
# ? Jun 12, 2017 11:59 |
|
|
# ? May 16, 2024 18:37 |
|
LochNessMonster posted:Just read some stuff on the Floating-Point Guide that was linked earlier as well as the official docs and it makes some more sense now. At least now I know to round my floats when using them for basic calculations! Actually no, you don't want to round your floats. Let the computer handle the precision of your floats, and let yourself handle what an acceptable cutoff for "the answer" is. Unless you're talking about printing values purely for human eyes to look at; in that case, truncate/round away.
|
# ? Jun 12, 2017 12:09 |
|
shrike82 posted:An ML practitioner or academic could decide to ignore precision based on their needs, but I'd find it hard to argue that one can make an informed decision without understanding the basics of floating point arithmetic. (For applied guys like me, it might be different, and it's possible people are getting hosed over here from time to time.)
|
# ? Jun 12, 2017 12:19 |
|
QuarkJets posted:Back when bitcoin was still newer a bunch of people learned first-hand that floating-point arithmetic has precision issues. oh man, I vaguely remember hearing about this fiasco, do you have a link?
|
# ? Jun 12, 2017 12:40 |
|
shrike82 posted:An ML practitioner or academic could decide to ignore precision based on their needs, but I'd find it hard to argue that one can make an informed decision without understanding the basics of floating point arithmetic. Yeah, even then "I understand the complexity around floating point arithmetic and have concluded that for this application the precision limitations are not an issue" isn't really the same thing as ignoring it. Of course there are plenty of times it does matter. There's a reason we take the log of likelihoods and add those together rather than just multiplying.
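To make that concrete, here's a tiny sketch (numbers invented) of why you sum log-likelihoods instead of multiplying raw likelihoods: the product of many small probabilities underflows to zero long before the log-sum runs out of range.

```python
import math

probs = [1e-4] * 100  # 100 independent likelihoods, purely for illustration

# Naive product underflows: 1e-400 is below the smallest representable double.
product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0

# Summing logs stays comfortably in range.
log_likelihood = sum(math.log(p) for p in probs)
print(log_likelihood)  # roughly -921.03
```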
|
# ? Jun 12, 2017 13:58 |
|
Symbolic Butt posted:oh man, I vaguely remember hearing about this fiasco, do you have a link? Nah that was years ago, sorry. I recall several different reddit threads about the problem appearing on "professional" exchanges though, back when everyone in the ecosystem had no more experience than "I built my gaming rig from parts I ordered on newegg"
|
# ? Jun 12, 2017 21:28 |
|
SurgicalOntologist posted:Do you have an actual algorithm in mind? Because if so I'd be curious to read up on it. I always figured that after solver issues (i.e. discretization error, stiffness, etc.) and measurement error on initial conditions, floating-point arithmetic is pretty low on the list of concerns. I guess in the sense that getting long term quantitative predictions using floating point is useless due to the accumulated error being magnified. It was a bit glib. I have heard good things about differential quadrature methods but can't vouch personally.
|
# ? Jun 13, 2017 19:43 |
|
Cingulate posted:I generally don't do this, but- if your fun is being needlessly mean towards other people, here's some life advice for you: try doing that less. Try changing. You'll become a happier person, and contribute more to the lives of those around you. Yes I'm going to take life advice from posting on the something awful comedy forums (I think it's pretty clear that every post is tongue firmly jammed in cheek). People who tell learners to ignore things that they barely understand never fail to amuse.
|
# ? Jun 13, 2017 20:17 |
|
Malcolm XML posted:Yes I'm going to take life advice from posting on the something awful comedy forums Did anyone actually say that? I thought it was more "you need this basic level of knowledge" vs "you need this advanced level of knowledge"
|
# ? Jun 13, 2017 21:40 |
|
What would be the best way to take: [{u'amt': 10.0}, {u'amt': 0.1}, {u'amt': 1.0}, {u'amt': 3.0}, {u'amt': 5.0}] and turn it into [10.0, 0.1, 1.0, 3.0, 5.0] edit, think I found a good way: values = [value for singleDict in dictlist for value in singleDict.values()] huhu fucked around with this message at 15:50 on Jun 14, 2017 |
# ? Jun 14, 2017 15:43 |
|
huhu posted:What would be the best way to take: Python code:
|
# ? Jun 14, 2017 15:47 |
huhu posted:What would be the best way to take: I'd do Python code:
Edit: the difference between Thermopyle's solution and mine is that if there are other items in the dictionaries, his will select out the values with 'amt' keys, whereas mine will just give all the values regardless of key. VikingofRock fucked around with this message at 15:51 on Jun 14, 2017 |
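The quoted code blocks didn't survive the archive, but going by the descriptions in this post, the two approaches were presumably along these lines (variable names are my reconstruction, not the original posts):

```python
dictlist = [{u'amt': 10.0}, {u'amt': 0.1}, {u'amt': 1.0}, {u'amt': 3.0}, {u'amt': 5.0}]

# Key-based: pulls out the value under 'amt', ignoring any other keys.
by_key = [d['amt'] for d in dictlist]

# Value-based: every value in every dict, regardless of key.
all_values = [value for d in dictlist for value in d.values()]

print(by_key)      # [10.0, 0.1, 1.0, 3.0, 5.0]
print(all_values)  # [10.0, 0.1, 1.0, 3.0, 5.0]
```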
|
# ? Jun 14, 2017 15:48 |
|
huhu posted:What would be the best way to take: If the key never changes code:
|
# ? Jun 14, 2017 15:53 |
|
I think I need async/await...? But I have never used either, nor understood the descriptions. I am training a neural network on synthetic data, and I'm alternating data generation and training. So it looks like this: code:
code:
Edit: both functions fully occupy 1 core. I think that means await is not enough? Maybe I should just abuse joblib heavily, that I could do. Double Edit: joblib uses pickling and the model is compiled and sent to the GPU, so I think joblib wouldn't work. Cingulate fucked around with this message at 14:35 on Jun 15, 2017 |
# ? Jun 15, 2017 14:32 |
|
Async basically only helps with IO-bound tasks. If you are mostly CPU-bound you should probably use multiprocessing. (I'm on phone and basically didn't even look at your code)
|
# ? Jun 15, 2017 14:42 |
Thermopyle posted:Async basically only helps with IO-bound tasks. If you are mostly CPU-bound you should probably use multiprocessing. I agree with thermopyle. If you're actually pegging one of your CPUs asyncio is probably not the solution.
|
|
# ? Jun 15, 2017 14:52 |
|
Thermopyle posted:Async basically only helps with IO-bound tasks. If you are mostly CPU-bound you should probably use multiprocessing. Python is like the worst tool for this. Any code I have that's deeply improved by threading I've shifted over to Elixir, since BEAM was designed right and can communicate with Python processes for the numpy stuff. I wish asyncio was more like Erlang.
|
# ? Jun 15, 2017 15:47 |
|
CPython has the Global Interpreter Lock, which means that only one thread can process Python bytecode at a time. It's not that Python code can't be multi-threaded, but that the interpreter is locked while one thread is executing Python bytecode; no others can process bytecode. Apparently it helps because the CPython memory management isn't thread-safe. As far as I understand it, anyway, I'm no CPython internals guru (I've looked in there precisely once). The GIL is acquired by a thread and released on I/O operations, if it helps you understand where bottlenecks might occur. I love Python, but the CPython implementation has some serious drawbacks and other languages should be used in certain cases. For what it's worth, many Python modules written in C can release the lock while they do their heavy lifting (because they're running C code, not Python bytecode).
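A minimal sketch of the practical upshot: for a pure-Python CPU-bound function, multiprocessing sidesteps the GIL because each process gets its own interpreter (the workload here is a toy stand-in):

```python
import multiprocessing as mp

def cpu_bound(n):
    # Pure-Python loop: holds the GIL for its entire runtime, so threads
    # running this would serialize. Separate processes each have their own GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    with mp.Pool(4) as pool:
        # Four independent interpreters crunching in parallel.
        results = pool.map(cpu_bound, [100_000] * 4)
    print(results[0])  # 333328333350000
```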
|
# ? Jun 15, 2017 16:12 |
|
Cingulate posted:I think I need async/await ..? But I have never used either, nor understood the descriptions. What libraries are you using for this? Have you looked into using dask? Take a look at this and see if it will help: http://matthewrocklin.com/blog/work/2016/07/12/dask-learn-part-1
|
# ? Jun 15, 2017 16:33 |
|
Thanks everyone. I think step 1 should be to make my generate_data less horribly inefficient; it's a big nested loop. Bummer - I thought this'd force me to learn something new
|
# ? Jun 15, 2017 16:48 |
|
Cingulate posted:Thanks everyone. This could be an opportunity to learn multiprocessing, if that's something you don't know already. I think your problem just calls for 2 Queues, and a Process that reads requests from queue1 and places (X, y) tuples into queue2. Your main process reads (X, y) tuples from queue2 and places a limited number of requests (which can just be None or whatever you want) into queue1. This throttles the subprocess so it doesn't generate everything at once; if you don't mind all of your (X, y) tuples sitting in memory at the same time, you can skip queue1 and just have your subprocess continuously pump data into queue2. Multiprocessing sidesteps the GIL by launching multiple processes instead of multiple threads, so the best design is to generate simple data in separate processes and use more complex objects (such as the model) in your main process.
|
# ? Jun 15, 2017 21:03 |
|
I know joblib and multiprocessing's pool - I use them a lot actually, because most of my problems are trivially parallelizable. This one, I fear, is not: the training happens in Theano (I think I hosed something up with my Tensorflow installation?), so it compiles CUDA code for the GPU and then it sits there. So the training always needs to happen in the main thread/kernel/session. But yes, if I can send the data generation to a secondary process, that would save me some time. So this can be done with the multiprocessing module, did I get that right?.. (Having multiple X sit in memory probably isn't viable cause they're pretty big - tens of GB - , but I guess I can make them smaller and find a few GB somewhere and it should work.)
|
# ? Jun 15, 2017 22:12 |
|
Cingulate posted:I know joblib and multiprocessing's pool - I use them a lot actually, because most of my problems are trivially parallelizable. This one, I fear, is not: the training happens in Theano (I think I hosed something up with my Tensorflow installation?), so it compiles CUDA code for the GPU and then it sits there. So the training always needs to happen in the main thread/kernel/session. But yes, if I can send the data generation to a secondary process, that would save me some time. So this can be done with the multiprocessing module, did I get that right?.. Yes, you can do that with the multiprocessing module. In your main thread you'd start by adding a request for (X,y) to an input Queue, then you'd set up a for loop that reads (X,y) from an output Queue, places a new entry in the input Queue, and then begins running the model. A separate Process reads from the request Queue and writes to the output Queue. This creates a situation where your main process is handling the model training while at the same time a Process is creating the (X,y) tuple for the next iteration of model training. If you have enough memory to hold all of the (X,y) tuples in memory simultaneously then you can do all of the above with just 1 Queue instead of 2 (you can start a Process that just fills the output Queue with as many (X,y) tuples as you want without ever checking an input Queue for requests). Or you can use an input queue and just make sure that it never has more than M requests, in case you're worried about the generation of (X,y) tuples sometimes taking longer than a model training iteration QuarkJets fucked around with this message at 00:37 on Jun 16, 2017 |
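A rough sketch of the two-Queue design being described, with a trivial stand-in for the real data generator (the names and the fake (X, y) payload are mine, not from the thread):

```python
import multiprocessing as mp

def generator(request_queue, data_queue):
    # Worker: produce one (X, y) batch per request; None is the shutdown signal.
    while True:
        if request_queue.get() is None:
            break
        X = [[0.0, 1.0], [1.0, 0.0]]  # stand-in for real synthetic data
        y = [0, 1]
        data_queue.put((X, y))

if __name__ == '__main__':
    request_queue, data_queue = mp.Queue(), mp.Queue()
    worker = mp.Process(target=generator, args=(request_queue, data_queue))
    worker.start()

    request_queue.put('go')          # prime the first batch
    batches = []
    for epoch in range(3):
        X, y = data_queue.get()      # blocks until the next batch is ready
        request_queue.put('go')      # request the next batch before training,
        batches.append((X, y))       # so generation overlaps with model.fit here
    request_queue.put(None)          # tell the worker to exit
    worker.join()
    print(len(batches))  # 3
```

The key point is that the main process only ever touches the model, while the subprocess only ever touches plain picklable data, which matches the constraint that the compiled Theano/CUDA model has to stay in the main process.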
# ? Jun 16, 2017 00:34 |
|
Out of curiosity, what is the NN problem you're working with? Training only for a single epoch for a given training set (some kind of augmenting going on there?) and data generation taking 30% of training time smells of there being room for cleaning up the code around there before spending too much time on shifting to a multiprocess setup.
|
# ? Jun 16, 2017 03:35 |
|
I'm reading up on Breadth First Search and this is the implementation the video I'm watching talks about: code:
Edit: I'm thinking this would be good: code:
huhu fucked around with this message at 15:13 on Jun 16, 2017 |
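Neither code block survived the archive, so here's a minimal BFS sketch using the usual collections.deque idiom (popping from the front of a list is O(n), a deque's popleft is O(1)); the example graph is invented:

```python
from collections import deque

def bfs(graph, start):
    """Return the vertices reachable from start, in breadth-first order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order

graph = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D'],
    'D': [],
}
print(bfs(graph, 'A'))  # ['A', 'B', 'C', 'D']
```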
# ? Jun 16, 2017 15:08 |
|
shrike82 posted:Out of curiosity, what is the NN problem you're working with?
|
# ? Jun 16, 2017 21:05 |
|
Hello thread, looking for some advice: I'm currently doing a project that involves running some test models against a medium-sized data set and I'm trying to see how I can optimise the code. I've used pool.map with 5 processes on my PC, which gets me from 26.2 seconds to 10. When I profile the code, it seems that the majority of the time is spent in this one function, which gets called about 2 million times when I run the model I've made across the data set: code:
|
# ? Jun 17, 2017 16:27 |
|
Tried PyPy?
|
# ? Jun 17, 2017 16:36 |
Loving Africa Chaps posted:Hello thread looking for some advice: You should do this with linear algebra and numpy. There's no need to use PyPy or anything like that - this is the exact use case for numpy's numeric computing tools. Don't wrap it in a class. If you give me the actual math or more context about what you're trying to do, I can show you how.
|
|
# ? Jun 17, 2017 16:44 |
|
Dunno about the C stuff but you're repeating a few calculations in there Python code:
Python code:
You might want to look into memoisation too, if that's appropriate to what you're working with e- thanks code formatting, I ain't fixing that
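The quoted code didn't make it into the archive, so here's a hypothetical before/after illustrating the pattern baka kaba is describing: any subexpression evaluated more than once per call gets hoisted into a local (the exp term and constants are invented for illustration, not the real model):

```python
import math

# Before: the same exp term is evaluated three times per call.
def step_slow(x, k, dt):
    return (x * math.exp(-k * dt)
            + 0.5 * math.exp(-k * dt)
            - 0.1 * math.exp(-k * dt))

# After: hoist it into a local; identical result, one exp call per step.
def step_fast(x, k, dt):
    decay = math.exp(-k * dt)
    return x * decay + 0.5 * decay - 0.1 * decay

print(math.isclose(step_slow(2.0, 0.3, 1.0), step_fast(2.0, 0.3, 1.0)))  # True
```

For the memoisation suggestion, functools.lru_cache on a pure function is the standard stdlib route.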
|
# ? Jun 17, 2017 17:02 |
|
Eela6 posted:You should do this with linear algebra and numpy. There's no need to use PyPy or anything like that - this is the exact use case for numpy's numeric computing tools. Don't wrap it in a class. If you give me the actual math or more context about what you're trying to do, I can show you how. So it's a 3-compartment pharmacokinetic model. I have a 600-patient dataset where I'm modelling the predicted concentration in the central compartment (x1) and seeing how it matches up to the model. It's a series of differential equations that you can model fairly accurately by doing it in the 1-second steps shown, but I don't doubt there's a better way of doing it. The data I have is also presented slightly oddly, and slightly differently depending on the study it came from, so modelling infusions and boluses of a drug as mg/s was convenient too. Will have a look into the stuff numpy has to make this better, as I'm already going to be using it for some of its minimisation functions. baka kaba posted:Dunno about the C stuff but you're repeating a few calculations in there Neat, this brought it from 10.25 seconds to 7.53.
|
# ? Jun 17, 2017 17:20 |
|
Loving Africa Chaps posted:So it's a 3 compartment pharmacokinetic model. I have a 600 patient dataset where i'm modelling the predicted concentration in the central compartment (x1) and seeing how it matches up to the model. Uh yeah definitely check my working pls and thx
|
# ? Jun 17, 2017 17:30 |
|
baka kaba posted:Uh yeah definitely check my working pls and thx Gives exactly the same results as before so you're good, pal
|
# ? Jun 17, 2017 17:37 |
|
I'm sure if you can organize it in matrix form/numpy/pandas, it's gonna run in a fraction of the time. Essentially, you want to vectorize it: instead of having objects per dataset, you just have one matrix, and apply each of these operations simultaneously to all of its rows. And to perhaps slightly increase readability for baka's code: code:
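To illustrate what vectorizing buys you (all numbers invented): the per-patient Python loop becomes a single numpy expression over the whole array, and numpy applies the update element-wise in compiled code.

```python
import numpy as np

k, dt = 0.25, 1.0                    # made-up rate constant and time step
x1 = np.array([1.2, 3.4, 2.2, 0.7])  # central-compartment levels, one per patient

# Loop version: one Python-level iteration per patient.
stepped_loop = [c - k * c * dt for c in x1]

# Vectorized: the same update for every patient in one shot.
stepped_vec = x1 - k * x1 * dt

print(np.allclose(stepped_loop, stepped_vec))  # True
```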
|
# ? Jun 17, 2017 17:38 |
Loving Africa Chaps posted:It's a series of differential equations Would you mind giving me the differential equations? My skills are a little rusty but this is something I used to be pretty good at. I have an easier time with the mathematical notation. (Funnily enough, I had my old Numerical Methods for Ordinary Differential Systems book on my lap as you posted this - I'm putting all my books in cardboard boxes before I move.)
|
|
# ? Jun 17, 2017 17:46 |
|
Lol use an actual diffeq solver instead of doing it by hand unless you are researching solvers
|
# ? Jun 17, 2017 18:37 |
|
Like instead of the janky first order stuff you could probably use the standard rk4 and be algorithmically faster
|
# ? Jun 17, 2017 18:38 |
I agree. Differential equations are prone to all sorts of numerical analysis problems. Don't roll your own solutions - use a known stable algorithm.
|
|
# ? Jun 17, 2017 18:41 |
|
Malcolm XML posted:Lol use an actual diffeq solving codes instead of doing it by hand unless you are researching solvers Malcolm XML posted:Like instead of the janky first order stuff you could probably use the standard rk4 and be algorithmically faster Did this thread get moved to yospos? (I don't disagree)
|
# ? Jun 17, 2017 19:04 |
|
|
|
Specifically, you should be okay using scipy.integrate.odeint and not even thinking about the kind of solver it uses under the hood. You just need to formulate your problem in terms of a function that takes the state as an input vector and outputs a vector of the rates of change. It could easily be 1000x faster or more than doing the Euler method by hand. Edit: and inside said function you should use dot product or matrix multiplication (via numpy) for all that multiplying and adding. SurgicalOntologist fucked around with this message at 19:15 on Jun 17, 2017 |
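A sketch of what that looks like for a generic linear three-compartment model (the rate-constant matrix is invented for illustration; the real model's equations would replace K):

```python
import numpy as np
from scipy.integrate import odeint

# Invented rate-constant matrix for a linear 3-compartment model;
# each column sums to zero, so total drug mass is conserved.
K = np.array([
    [-0.5,  0.1,  0.05],
    [ 0.3, -0.1,  0.00],
    [ 0.2,  0.0, -0.05],
])

def rates(x, t):
    # State vector in, rates of change out: one matrix multiply, no loops.
    return K @ x

t = np.linspace(0, 60, 61)        # one sample per second for a minute
x0 = np.array([10.0, 0.0, 0.0])   # bolus into the central compartment
solution = odeint(rates, x0, t)
print(solution.shape)  # (61, 3)
```

The solver picks its own internal step sizes and error control, so the 1-second hand-stepping disappears entirely; t just says where you want the solution sampled.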
# ? Jun 17, 2017 19:12 |