hyphz posted: Is there a good clear explanation of how backpropagation works and why?

Backprop is a computer-sciencey version of the chain rule from elementary calculus. If you (or anyone who ever reads this) don't know elementary calculus, here's a crash course:

- The derivative: if you take the slope of a line over an infinitely small interval, it tells you the "instantaneous rate of change" of a function. You can tell whether the function is increasing, decreasing, or staying the same (at a maximum or minimum value) based on whether that value is positive, negative, or 0.
- The derivative of F(x) is written F'(x).
- In machine learning, you're trying to minimize a loss function. That's why you need a derivative: I know I'm heading toward a minimum if my derivative is negative for a while, and I know I've arrived when it hits 0.
- If you have multiple variables, F(x, y, z, t, w, ...), then let's just say we call the vector of all those derivatives a "gradient." It tells us how much we're changing in a whole bunch of directions at once.

So here's backpropagation:

1) If I have some poo poo, and multiply that poo poo by a bunch of matrices, that's a composite function -- x times a matrix H, times a matrix G, times a matrix F can be written as F(G(H(x))). This is what deep learning does. All deep learning actually is is a graph of tensors, with ops as the edges connecting those tensors.

1.5) Let's assume we can calculate the derivative of any one function on a computer pretty easily.

2) The chain rule: the derivative ("gradient" -- it's actually the Jacobian iirc, but who cares) of a well-behaved* composite function F(G(H(x))) is F'(G(H(x))) * G'(H(x)) * H'(x). There's a very easy proof/derivation of this actually, but you can live a rich and fulfilling life without ever bothering to look it up or verify it.

So, backpropagation:

- In deep learning, I'm doing ((x * H) * G) * F, where let's say our * operator boils down to matrix multiplication or something similar.
- I can rewrite that as F(G(H(x))).
- As I go through each step of the forward pass, I can compute a derivative F', G', H' and set up placeholders for when I know what G(x) and H(x) are.
- When I'm done, I can go backwards: start with H'(x), then plug H(x) into my placeholder for G'(H(x)), on and on and on.
- That's backpropagation. Out of it I finally get something that behaves like a derivative, and that derivative will tell me if I'm heading in the right direction.

So if I calculate H'(?), G'(?), F'(?) as I go, where "?" is a placeholder, I can just shove the intermediate outputs of F(G(H(x))) (because I'm computing them sequentially) into those derivatives and "backpropagate" the gradients. There's a sketch of that bookkeeping at the end of this post. In TensorFlow this is done by creating a second backwards graph before running your stuff; in PyTorch every tensor you create has a backward() method that gets called by the autograd engine (remember, everything in TF/PT is a tensor -- the whole notion of "layers" is syntactic sugar).

*"Well-behaved" in this context means continuously differentiable on the interval you're interested in, or something.

It really is that simple!
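Here's a minimal sketch of that placeholder bookkeeping, with toy stand-ins for F, G, H (sum, squaring, and scaling by 3 -- all made up for illustration, any differentiable ops would do), checked against what the autograd engine computes:

```python
import torch

x = torch.randn(4, requires_grad=True)

# Forward pass: compute each stage and keep the intermediates around,
# because the "placeholder" derivatives need them later.
h_out = x * 3.0          # h(x) = 3x      -> h'(x) = 3
g_out = h_out ** 2       # g(u) = u^2     -> g'(u) = 2u
f_out = g_out.sum()      # f(v) = sum(v)  -> f'(v) = 1 per element

# Backward pass, by hand: walk the chain in reverse, plugging the saved
# forward values into each local derivative and multiplying as we go.
grad_f = torch.ones_like(g_out)           # F'(G(H(x)))
grad_g = grad_f * 2.0 * h_out.detach()    # ... * G'(H(x))
grad_h = grad_g * 3.0                     # ... * H'(x)

# The same thing via the autograd engine.
f_out.backward()
print(torch.allclose(grad_h, x.grad))  # True: both are 18x
```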
Jun 13, 2023 22:22
Yeah the actual human brain works basically nothing like a deep neural network (to the extent we know how it works at all). Neural networks and "activations" get their name from superficial similarities. The branch of CS devoted to simulating brains is called "neuromorphic" computing, that's basically the extent of my knowledge on it but there's a Wikipedia article about it https://en.wikipedia.org/wiki/Neuromorphic_engineering
Jun 14, 2023 01:56
Rahu posted: I've been trying to learn some ML stuff lately and to that end I've been reading over Andrej Karpathy's nanoGPT. The forward method of their GPT model feeds that input through an nn.Embedding layer which requires torch.long (int64) input

They're doing the conversion in the batch code because that code will generally run on the CPU, or at the very least can be precomputed/queued, whereas a conversion further down in the actual network, at the Embedding layer, would happen on the GPU. There's a sketch of what that looks like below.

Xun posted: Anyone going to ICML? I'm not presenting (rip) but I managed to get a travel grant for it anyway

Honestly the major conferences are notorious for being a horseshit arbitrary process to get accepted, so you really deserve respect for being able to get a travel grant these days.
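To make the dtype point concrete, here's a toy sketch (the sizes and variable names are mine, not lifted from nanoGPT): nn.Embedding is an integer-indexed lookup table, so the indices have to be torch.long, and the cheap place to cast is while the batch is still on the CPU.

```python
import torch
import torch.nn as nn

vocab_size, n_embd = 50304, 768
wte = nn.Embedding(vocab_size, n_embd)  # lookup table of float vectors

# Batch assembly on the CPU: cast to int64 here, then ship to the device.
idx = torch.randint(0, vocab_size, (8, 32), dtype=torch.int32)
idx = idx.to(torch.long)                    # conversion happens CPU-side
# idx = idx.to("cuda", non_blocking=True)   # then the transfer, if you have a GPU

tok_emb = wte(idx)      # token indices -> embeddings
print(tok_emb.shape)    # torch.Size([8, 32, 768])
```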
Jun 14, 2023 19:39