Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
I think it's important to know why you should never do

code:

if 0.1 + 0.2 == 0.3:
    do_stuff()

And that the reason is because floating point numbers are something like approximations of the actual real number.

I think it's important to know the proper way of doing this comparison is

code:

eps = 1e-8
if abs(a - b) < eps:
    do_stuff()

And that the difference between this and the first one is that you're now comparing the residual between the computed value against the expected value, and that eps should be chosen to be small enough for your application/hardware.

I think for most people this is fine yeah. This is what I told the guy who asked and he seemed to appreciate it.
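Worth adding: since Python 3.5 the stdlib also has math.isclose, which does this comparison for you (a minimal sketch, reusing do_stuff from above):

code:

import math

a = 0.1 + 0.2
b = 0.3

# rel_tol scales with the magnitude of the inputs; abs_tol is what matters near zero
if math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12):
    do_stuff()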

Boris Galerkin fucked around with this message at 12:10 on Jun 12, 2017


Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

LochNessMonster posted:

Just read some stuff on the Floating-Point Guide that was linked earlier as well as the official docs and it makes some more sense now. At least now I know to round my floats when using them for basic calculations!

(and to use decimals if I want to do something with currency)

Actually, no, you don't want to round your floats. Let the computer handle the precision of your floats, and let yourself handle what an acceptable cutoff for "the answer" is. Unless you're talking about printing values purely for human eyes to look at, in which case truncate/round away.
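E.g. a minimal sketch of rounding only at the display boundary:

code:

value = 0.1 + 0.2
print("{:.2f}".format(value))  # shows 0.30 for the reader; value itself keeps full precision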

Cingulate
Oct 23, 2012

by Fluffdaddy

shrike82 posted:

An ML practitioner or academic could decide to ignore precision based on their needs, but I'd find it hard to argue that one can make an informed decision without understanding the basics of floating point arithmetic.
I guess in practice, it's irrelevant cause the guys who wrote e.g. Tensorflow definitely understand the fine-grained details here, and if the issue were relevant, they most certainly would be able to account for that. You probably don't get into a situation where you do heavy AI stuff without automatically understanding floating point issues.

(For applied guys like me, it might be different, and it's possible people are getting hosed over here from time to time.)

Symbolic Butt
Mar 22, 2009

(_!_)
Buglord

QuarkJets posted:

Back when bitcoin was still newer a bunch of people learned first-hand that floating-point arithmetic has precision issues.

oh man, I vaguely remember hearing about this fiasco, do you have a link?

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

shrike82 posted:

An ML practitioner or academic could decide to ignore precision based on their needs, but I'd find it hard to argue that one can make an informed decision without understanding the basics of floating point arithmetic.

Yeah, even then "I understand the complexity around floating point arithmetic and have concluded that for this application the precision limitations are not an issue" isn't really the same thing as ignoring it.

Of course there are plenty of times it does matter. There's a reason we take the log of likelihoods and add those together rather than just multiplying.
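A quick sketch of why (made-up numbers): 100 likelihoods of 1e-5 each multiply out to 1e-500, which is below the smallest positive double (about 5e-324), so the product underflows to exactly 0.0, while the sum of logs stays perfectly representable:

code:

import math

likelihoods = [1e-5] * 100

product = 1.0
for p in likelihoods:
    product *= p
print(product)  # 0.0 -- underflowed

log_likelihood = sum(math.log(p) for p in likelihoods)
print(log_likelihood)  # about -1151.29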

QuarkJets
Sep 8, 2008

Symbolic Butt posted:

oh man, I vaguely remember hearing about this fiasco, do you have a link?

Nah that was years ago, sorry. I recall several different reddit threads about the problem appearing on "professional" exchanges though, back when everyone in the ecosystem had no more experience than "I built my gaming rig from parts I ordered on newegg"

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

SurgicalOntologist posted:

Do you have an actual algorithm in mind? Because if so I'd be curious to read up on it. I always figured that after solver issues (i.e. discretization error, stiffness, etc.) and measurement error on initial conditions, floating-point arithmetic is pretty low on the list of concerns.

But yeah, dealing with chaotic systems, good luck if you need to make specific predictions. Investigating the qualitative behavior of the system is more useful anyway.

In any case, you started by saying "floats are completely useless" for simulating complex systems, which I still don't understand--should I not be using floats? Or are you just making the point that chaos magnifies errors? In which case, I have errors of much higher orders of magnitude that I'm already concerned about.

I guess in the sense that getting long-term quantitative predictions out of floating point is useless, due to the accumulated error being magnified. It was a bit glib.

I have heard good things about differential quadrature methods, but can't vouch personally.

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Cingulate posted:

I generally don't do this, but- if your fun is being needlessly mean towards other people, here's some life advice for you: try doing that less. Try changing. You'll become a happier person, and contribute more to the lives of those around you.
I don't mean this to disparage you or as a counterargument to your position - which it is not - but as advice from someone who should've been given that lesson much earlier themselves, too.

They can, and do, largely ignore them. They can also try to reduce precision even more to get some performance benefits in certain applied cases.

(I'm obviously not saying these people are ignorant of the issues, just that the issues are largely irrelevant, and being ignored.)

Yes I'm going to take life advice from posting on the something awful comedy forums

(I think it's pretty clear that every post is tongue firmly jammed in cheek)

People who tell learners to ignore things that they barely understand never fail to amuse

QuarkJets
Sep 8, 2008

Malcolm XML posted:

Yes I'm going to take life advice from posting on the something awful comedy forums

(I think it's pretty clear that every post is tongue firmly jammed in cheek)

People who tell learners to ignore things that they barely understand never fail to amuse

Did anyone actually say that? I thought it was more "you need this basic level of knowledge" vs "you need this advanced level of knowledge"

huhu
Feb 24, 2006
What would be the best way to take:

[{u'amt': 10.0}, {u'amt': 0.1}, {u'amt': 1.0}, {u'amt': 3.0}, {u'amt': 5.0}]

and turn it into

[10.0, 0.1, 1.0, 3.0, 5.0]

edit, think I found a good way:

values = [value for singleDict in dictlist for value in singleDict.values()]

huhu fucked around with this message at 15:50 on Jun 14, 2017

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

huhu posted:

What would be the best way to take:

[{u'amt': 10.0}, {u'amt': 0.1}, {u'amt': 1.0}, {u'amt': 3.0}, {u'amt': 5.0}]

and turn it into

[10.0, 0.1, 1.0, 3.0, 5.0]

Python code:
x = [{u'amt': 10.0}, {u'amt': 0.1}, {u'amt': 1.0}, {u'amt': 3.0}, {u'amt': 5.0}]
[d['amt'] for d in x]

VikingofRock
Aug 24, 2008




huhu posted:

What would be the best way to take:

[{u'amt': 10.0}, {u'amt': 0.1}, {u'amt': 1.0}, {u'amt': 3.0}, {u'amt': 5.0}]

and turn it into

[10.0, 0.1, 1.0, 3.0, 5.0]

I'd do

Python code:

[val for d in dict_list for val in d.values()]

Edit: the difference between Thermopyle's solution and mine is that if there are other items in the dictionaries, his will select out the values with 'amt' keys, whereas mine will just give all the values regardless of key.

VikingofRock fucked around with this message at 15:51 on Jun 14, 2017

MonkeyMaker
May 22, 2006

What's your poison, sir?

huhu posted:

What would be the best way to take:

[{u'amt': 10.0}, {u'amt': 0.1}, {u'amt': 1.0}, {u'amt': 3.0}, {u'amt': 5.0}]

and turn it into

[10.0, 0.1, 1.0, 3.0, 5.0]

If the key never changes

code:
amounts = [{u'amt': 10.0}, {u'amt': 0.1}, {u'amt': 1.0}, {u'amt': 3.0}, {u'amt': 5.0}]
[v['amt'] for v in amounts]

Cingulate
Oct 23, 2012

by Fluffdaddy
I think I need async/await ..? But I have never used either, nor understood the descriptions.

I am training a neural network on synthetic data, and I'm alternating data generation and training. So it looks like this:

code:
for epochs in range(number_of_epochs):
    X, y = generate_data(10000)  # generate training data
    model.train(X, y)  # train the model for one epoch
Now the data generation actually takes a significant fraction of the time it takes to train the model on the generated data (about 30%). But both run only on 1 CPU core (the model runs on the GPU). So it would save me some time if I could generate the next batch of data while the model is training. I.e., if I could do something like:

code:
X, y = generate_data(10000)  # generate training data
for epochs in range(number_of_epochs):
    model_finished = without_GIL(model.train(X, y))  # start training and release GIL
    if epochs < number_of_epochs - 1:  # no new data needed after the last epoch
        X, y = generate_data(10000)  # use CPU while model is running on GPU
    wait_for(model_finished)  # only proceed to next iteration once both training and data generation have finished
Is that something I could use await for?

Edit: both functions fully occupy 1 core. I think that means await is not enough? Maybe I should just abuse joblib heavily, that I could do.
Double Edit: joblib uses pickling and the model is compiled and sent to the GPU, so I think joblib wouldn't work.

Cingulate fucked around with this message at 14:35 on Jun 15, 2017

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Async basically only helps with IO-bound tasks. If you are mostly CPU-bound you should probably use multiprocessing.

(I'm on phone and basically didn't even look at your code)

Eela6
May 25, 2007
Shredded Hen

Thermopyle posted:

Async basically only helps with IO-bound tasks. If you are mostly CPU-bound you should probably use multiprocessing.

(I'm on phone and basically didn't even look at your code)

I agree with Thermopyle. If you're actually pegging one of your CPUs, asyncio is probably not the solution.

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Thermopyle posted:

Async basically only helps with IO-bound tasks. If you are mostly CPU-bound you should probably use multiprocessing.

(I'm on phone and basically didn't even look at your code)

Python is like the worst tool for this. Any code I have that's deeply improved by threading, I've shifted over to Elixir, since BEAM was designed right, and it can communicate with Python processes for the numpy stuff

I wish asyncio was more like erlang :(

Ghost of Reagan Past
Oct 7, 2003

rock and roll fun
CPython has the Global Interpreter Lock, which means that only one thread can process Python bytecode at a time. It's not that Python code can't be multi-threaded, but that the interpreter is locked if one thread is executing Python bytecode; no others can process bytecode. Apparently it helps because the CPython memory management isn't thread-safe. As far as I understand it, anyway, I'm no CPython internals guru (I've looked in there precisely once).

The GIL is acquired by a thread and released on I/O operations, if it helps you understand where bottlenecks might occur. I love Python, but the CPython implementation has some serious drawbacks and other languages should be used in certain cases.

For what it's worth, many Python modules written in C release the lock while they do their heavy lifting (because they're running C code rather than Python bytecode).
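A tiny sketch of the consequence for pure-Python work:

code:

import time
from threading import Thread

def spin(n=10000000):
    while n:  # pure bytecode, so this thread holds the GIL while it runs
        n -= 1

start = time.time()
threads = [Thread(target=spin) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# takes roughly as long as two sequential spin() calls, even on multiple cores
print(time.time() - start)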

accipter
Sep 12, 2003

Cingulate posted:

I think I need async/await ..? But I have never used either, nor understood the descriptions.

[code trimmed] So it would save me some time if I could generate the next batch of data while the model is training. Is that something I could use await for?

What libraries are you using for this? Have you looked into using dask? Take a look at this and see if it will help:

http://matthewrocklin.com/blog/work/2016/07/12/dask-learn-part-1

Cingulate
Oct 23, 2012

by Fluffdaddy
Thanks everyone.

I think step 1 should be to make my generate_data less horribly inefficient; it's a big nested loop. Bummer - I thought this'd force me to learn something new :v:

QuarkJets
Sep 8, 2008

Cingulate posted:

Thanks everyone.

I think step 1 should be to make my generate_data less horribly inefficient; it's a big nested loop. Bummer - I thought this'd force me to learn something new :v:

This could be an opportunity to learn multiprocessing, if that's something you don't know already. I think your problem just calls for 2 Queues, and a Process that reads requests from queue1 and places (X, y) tuples into queue2. Your main process reads (X, y) tuples from queue2 and places a restricted number of requests (which can just be None or whatever you want) into queue1. (This restricts the subprocess from processing everything at once; if you don't mind all of your (X, y) tuples sitting in memory at the same time, you can skip queue1 and just have your subprocess continuously pump data into queue2.)

Multiprocessing sidesteps the GIL by launching multiple processes instead of multiple threads, so the best design is to generate simple data in separate processes and use more complex objects (such as the model) in your main process.

Cingulate
Oct 23, 2012

by Fluffdaddy
I know joblib and multiprocessing's pool - I use them a lot actually, because most of my problems are trivially parallelizable. This one, I fear, is not: the training happens in Theano (I think I hosed something up with my Tensorflow installation?), so it compiles CUDA code for the GPU and then it sits there. So the training always needs to happen in the main thread/kernel/session. But yes, if I can send the data generation to a secondary process, that would save me some time. So this can be done with the multiprocessing module, did I get that right?..

(Having multiple X sit in memory probably isn't viable cause they're pretty big - tens of GB - but I guess I can make them smaller and find a few GB somewhere and it should work.)

QuarkJets
Sep 8, 2008

Cingulate posted:

I know joblib and multiprocessing's pool - I use them a lot actually, because most of my problems are trivially parallelizable. This one, I fear, is not: the training happens in Theano (I think I hosed something up with my Tensorflow installation?), so it compiles CUDA code for the GPU and then it sits there. So the training always needs to happen in the main thread/kernel/session. But yes, if I can send the data generation to a secondary process, that would save me some time. So this can be done with the multiprocessing module, did I get that right?..

(Having multiple X sit in memory probably isn't viable cause they're pretty big - tens of GB - but I guess I can make them smaller and find a few GB somewhere and it should work.)

Yes, you can do that with the multiprocessing module. In your main thread you'd start by adding a request for (X,y) to an input Queue, then you'd set up a for loop that reads (X,y) from an output Queue, places a new entry in the input Queue, and then begins running the model. A separate Process reads from the request Queue and writes to the output Queue. This creates a situation where your main process is handling the model training while at the same time a Process is creating the (X,y) tuple for the next iteration of model training.

If you have enough memory to hold all of the (X,y) tuples in memory simultaneously then you can do all of the above with just 1 Queue instead of 2 (you can start a Process that just fills the output Queue with as many (X,y) tuples as you want without ever checking an input Queue for requests). Or you can use an input queue and just make sure that it never has more than M requests, in case you're worried about the generation of (X,y) tuples sometimes taking longer than a model training iteration
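A minimal sketch of the 2-Queue version (names are made up; generate_data, model, and number_of_epochs are from your code, and on Windows you'd want the usual if __name__ == '__main__' guard):

code:

from multiprocessing import Process, Queue

def producer(request_q, data_q):
    # runs in its own process, so it has its own interpreter and GIL
    while request_q.get() is not None:  # None is the shutdown signal
        data_q.put(generate_data(10000))

request_q, data_q = Queue(), Queue()
worker = Process(target=producer, args=(request_q, data_q))
worker.start()

request_q.put(True)  # ask for the first batch up front
for epoch in range(number_of_epochs):
    X, y = data_q.get()  # blocks until the next batch is ready
    if epoch < number_of_epochs - 1:
        request_q.put(True)  # start generating the next batch...
    model.train(X, y)  # ...while this one trains on the GPU
request_q.put(None)  # shut the worker down
worker.join()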

QuarkJets fucked around with this message at 00:37 on Jun 16, 2017

shrike82
Jun 11, 2005

Out of curiosity, what is the NN problem you're working with?
Training only for a single epoch per generated training set (some kind of augmentation going on there?) and data generation taking 30% of training time smells like there's room for cleaning up the code before spending too much time on shifting to a multiprocess setup.

huhu
Feb 24, 2006
I'm reading up on Breadth First Search and this is the implementation the video I'm watching talks about :

code:
class Node(object):
    def __init__(self, name):
        self.name = name
        self.adjacencyList = []
        self.visited = False
        self.predecessor = None

class BreadthFirstSearch(object):
    def bfs(self, startNode):
        queue = []
        queue.append(startNode)
        startNode.visited = True

        while queue:
            actualNode = queue.pop(0)
            print("{}".format(actualNode.name))

            for n in actualNode.adjacencyList:
                if not n.visited:
                    n.visited = True
                    queue.append(n)

node1 = Node("A")
node2 = Node("B")
node3 = Node("C")

node1.adjacencyList.append(node2)
node1.adjacencyList.append(node3)

bfs = BreadthFirstSearch()
bfs.bfs(node1)
I feel like this is missing a part where, if you do X.adjacencyList.append(Y), then Y.adjacencyList.append(X) should also happen automatically. Because if you only did the first part and tried to call BreadthFirstSearch on Y, you wouldn't get X. Is that how it should be done?

Edit:

I'm thinking this would be good:
code:
    def appendNode(self, node):
        self.adjacencyList.append(node)
        node.adjacencyList.append(self)

huhu fucked around with this message at 15:13 on Jun 16, 2017

Cingulate
Oct 23, 2012

by Fluffdaddy

shrike82 posted:

Out of curiosity, what is the NN problem you're working with?
Training only for a single epoch per generated training set (some kind of augmentation going on there?) and data generation taking 30% of training time smells like there's room for cleaning up the code before spending too much time on shifting to a multiprocess setup.
Turns out there is a perfectly fine Keras module for doing just what I did manually very badly. :downs:
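(Presumably something like Keras's fit_generator, which pulls (X, y) batches from a generator while training runs - a rough sketch:)

code:

def data_stream():
    while True:
        yield generate_data(10000)  # yields (X, y) tuples

model.fit_generator(data_stream(), steps_per_epoch=1, epochs=number_of_epochs)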

Loving Africa Chaps
Dec 3, 2007


We had not left it yet, but when I would wake in the night, I would lie, listening, homesick for it already.

Hello thread looking for some advice:

I'm currently doing a project that involves running some test models against a medium-sized data set, and I'm trying to see how I can optimise the code. I've used pool.map with 5 processes on my PC, which gets me from 26.2 seconds to 10. When I profile the code, it seems that the majority of the time is spent in this one function, which gets called about 2 million times when I run the model I've made across the data set:

code:
def wait_time(self, time_seconds):
    current_x1 = self.x1
    current_x2 = self.x2
    current_x3 = self.x3
    current_xeo = self.xeo

    self.x1 = current_x1 + (self.k21 * current_x2 - self.k12 * current_x1 + self.k31 * current_x3 - self.k13 * current_x1 - self.k10 * current_x1) * time_seconds
    self.x2 = current_x2 + (-self.k21 * current_x2 + self.k12 * current_x1) * time_seconds
    self.x3 = current_x3 + (-self.k31 * current_x3 + self.k13 * current_x1) * time_seconds
    self.xeo = current_xeo + (-self.keo * current_xeo + self.keo * current_x1) * time_seconds
Reducing the number of times this runs is not really an option, but I wonder if there's maybe another option, such as using a library so this function runs in pure C or something? Does anyone think that might be useful?

shrike82
Jun 11, 2005

Tried PyPy?

Eela6
May 25, 2007
Shredded Hen

Loving Africa Chaps posted:

Hello thread looking for some advice:

[code trimmed] When I profile the code, it seems that the majority of the time is spent in this one function, which gets called about 2 million times... I wonder if there's maybe another option, such as using a library so this function runs in pure C or something?

You should do this with linear algebra and numpy. There's no need to use PyPy or anything like that - this is the exact use case for numpy's numeric computing tools. Don't wrap it in a class. If you give me the actual math or more context about what you're trying to do, I can show you how.
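In the meantime, a sketch of the general shape (the rate constants are placeholders - check the matrix against your actual equations):

Python code:

import numpy as np

# placeholder rate constants -- plug in your fitted values
k10, k12, k13, k21, k31, keo = 0.1, 0.3, 0.2, 0.1, 0.005, 0.4

# rows are (x1, x2, x3, xeo); one column per patient
A = np.array([
    [-(k12 + k13 + k10),  k21,   k31,   0.0],
    [ k12,               -k21,   0.0,   0.0],
    [ k13,                0.0,  -k31,   0.0],
    [ keo,                0.0,   0.0,  -keo],
])

def wait_time(states, dt):
    # one Euler step for all 600 patients at once: x <- x + (A @ x) * dt
    return states + (A @ states) * dt

states = np.ones((4, 600))  # stand-in initial states
states = wait_time(states, 1.0)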

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Dunno about the C stuff but you're repeating a few calculations in there

Python code:
def wait_time(self, time_seconds):
    current_x1 = self.x1
    current_x2 = self.x2
    current_x3 = self.x3
    current_xeo = self.xeo

    x1k10 = current_x1 * self.k10
    x1k12 = current_x1 * self.k12
    x1k13 = current_x1 * self.k13
    x2k21 = current_x2 * self.k21
    x3k31 = current_x3 * self.k31

    self.x1 = current_x1 + (x2k21 - x1k12 + x3k31 - x1k13 - x1k10) * time_seconds
    self.x2 = current_x2 + (x1k12 - x2k21) * time_seconds
    self.x3 = current_x3 + (x1k13 - x3k31) * time_seconds
    self.xeo = current_xeo + (-self.keo * current_xeo + self.keo * current_x1) * time_seconds
And you could cut it down a bit more (maybe it's getting less readable here)
Python code:
def wait_time(self, time_seconds):
    # skipping the temp variables
    x1k10 = self.x1 * self.k10
    x1k12 = self.x1 * self.k12
    x1k13 = self.x1 * self.k13
    x2k21 = self.x2 * self.k21
    x3k31 = self.x3 * self.k31

    # update xeo first, while self.x1 still holds the old value
    # (factoring out self.keo so you multiply once)
    self.xeo = self.xeo + self.keo * (self.x1 - self.xeo) * time_seconds

    self.x1 = self.x1 + (x2k21 - x1k12 + x3k31 - x1k13 - x1k10) * time_seconds
    self.x2 = self.x2 + (x1k12 - x2k21) * time_seconds
    self.x3 = self.x3 + (x1k13 - x3k31) * time_seconds
It might run a little faster (also make sure I didn't mess it up!)
You might want to look into memoisation too, if that's appropriate to what you're working with
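(If memoisation does apply, the stdlib sketch is functools.lru_cache - though only for pure functions of hashable arguments, so not wait_time as written, since it mutates self:)

Python code:

from functools import lru_cache

@lru_cache(maxsize=None)
def expensive(x):
    return x * x  # computed once per distinct x, served from cache afterwards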


Loving Africa Chaps
Dec 3, 2007


We had not left it yet, but when I would wake in the night, I would lie, listening, homesick for it already.

Eela6 posted:

You should do this with linear algebra and numpy. There's no need to use PyPy or anything like that - this is the exact use case for numpy's numeric computing tools. Don't wrap it in a class. If you give me the actual math or more context about you're trying to do, I can show you how.

So it's a 3-compartment pharmacokinetic model. I have a 600-patient dataset where I'm modelling the predicted concentration in the central compartment (x1) and seeing how it matches up to the model. It's a series of differential equations that you can model fairly accurately by doing it in the 1-second steps shown, but I don't doubt there's a better way of doing it. The data I have is also presented slightly oddly, and slightly differently depending on the study it came from, so modelling infusions and boluses of a drug as mg/s was convenient too. Will have a look into the stuff numpy has to make this better, as I'm already going to be using it for some of its minimisation functions

baka kaba posted:

Dunno about the C stuff but you're repeating a few calculations in there

[code trimmed] It might run a little faster (also make sure I didn't mess it up!)

Neat, this brought it from 10.25 seconds to 7.53

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Loving Africa Chaps posted:

So it's a 3-compartment pharmacokinetic model. I have a 600-patient dataset where I'm modelling the predicted concentration in the central compartment (x1) and seeing how it matches up to the model.

Uh yeah definitely check my working pls and thx :v:

Loving Africa Chaps
Dec 3, 2007


We had not left it yet, but when I would wake in the night, I would lie, listening, homesick for it already.

baka kaba posted:

Uh yeah definitely check my working pls and thx :v:

Gives exactly the same results as before, so you're good pal

Cingulate
Oct 23, 2012

by Fluffdaddy
I'm sure if you can organize it in matrix form/numpy/pandas, it's gonna run in a fraction of the time. Essentially, you want to vectorize it: instead of having objects per dataset, you just have one matrix, and apply each of these operations simultaneously to all of its rows.

And to perhaps slightly increase readability for baka's code:
code:
def wait_time(self, time_seconds):
    # skipping the temp variables
    x1k10 = self.x1 * self.k10
    x1k12 = self.x1 * self.k12
    x1k13 = self.x1 * self.k13
    x2k21 = self.x2 * self.k21
    x3k31 = self.x3 * self.k31

    # update xeo first, while self.x1 still holds the old value
    # (factoring out self.keo so you multiply once)
    self.xeo += self.keo * (self.x1 - self.xeo) * time_seconds

    self.x1 += sum((x2k21, -x1k12, x3k31, -x1k13, -x1k10)) * time_seconds
    self.x2 += (x1k12 - x2k21) * time_seconds
    self.x3 += (x1k13 - x3k31) * time_seconds
(Please double check if I botched any of the signs :v:)

Eela6
May 25, 2007
Shredded Hen

Loving Africa Chaps posted:

It's a series of differential equations

Would you mind giving me the differential equations? My skills are a little rusty but this is something I used to be pretty good at. I have an easier time with the mathematical notation. (Funnily enough, I had my old Numerical Methods for Ordinary Differential Systems book on my lap as you posted this - I'm putting all my books in cardboard boxes before I move.)

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
Lol, use actual diffeq solver codes instead of doing it by hand, unless you are researching solvers

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
Like, instead of the janky first-order stuff you could probably use the standard RK4 and be algorithmically faster

Eela6
May 25, 2007
Shredded Hen
I agree. Differential equations are prone to all sorts of numerical analysis problems. Don't roll your own solutions - use a known stable algorithm.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Malcolm XML posted:

Lol use an actual diffeq solving codes instead of doing it by hand unless you are researching solvers


Malcolm XML posted:

Like instead of the janky first order stuff you could probably use the standard rk4 and be algorithmically faster

Did this thread get moved to yospos?

(I don't disagree)


SurgicalOntologist
Jun 17, 2004

Specifically, you should be okay using scipy.integrate.odeint and not even thinking about the kind of solver it uses under the hood. You just need to formulate your problem in terms of a function that takes the state as an input vector and outputs a vector of the rates of change. It could easily be 1000x faster or more than doing the Euler method by hand.

Edit: and inside said function you should use dot product or matrix multiplication (via numpy) for all that multiplying and adding.
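A sketch of that formulation for the wait_time model posted above (rate constants are placeholders - double-check the signs against your model):

Python code:

import numpy as np
from scipy.integrate import odeint

k10, k12, k13, k21, k31, keo = 0.1, 0.3, 0.2, 0.1, 0.005, 0.4  # placeholders

A = np.array([
    [-(k12 + k13 + k10),  k21,   k31,   0.0],
    [ k12,               -k21,   0.0,   0.0],
    [ k13,                0.0,  -k31,   0.0],
    [ keo,                0.0,   0.0,  -keo],
])

def rates(state, t):
    # state is (x1, x2, x3, xeo); returns the vector of rates of change
    return A @ state

t = np.arange(0.0, 3600.0)  # one hour in one-second steps, for comparison
trajectory = odeint(rates, [1.0, 0.0, 0.0, 0.0], t)  # one row of states per time point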

SurgicalOntologist fucked around with this message at 19:15 on Jun 17, 2017
