|
Suspicious Dish posted:dumb serious question. why do we use nonlinear functions instead of linear ones if they're all dumb poo poo like ReLU which seems like it barely works. I guess second question: is there any theoretical basis for why ReLU works at all? i thought the state of the art with why some activation functions work better than other functions in some cases is essentially
|
# ? Dec 15, 2019 08:19 |
|
could be based on empirical observation & adjustment, i.e. tweak the nn input until you get the desired output
|
# ? Dec 15, 2019 08:42 |
|
nonlinear sounds cooler
|
# ? Dec 15, 2019 08:50 |
|
You'd think training on a linear activation function would be a whole lot easier to optimize for
|
# ? Dec 15, 2019 09:17 |
|
Suspicious Dish posted:dumb serious question. why do we use nonlinear functions instead of linear ones if they're all dumb poo poo like ReLU which seems like it barely works. I guess second question: is there any theoretical basis for why ReLU works at all?

I know this one. the problem with linear activation functions is that they collapse the representative power of the whole network down to the representative power of a single layer. the final outputs become a linear combination of linear combinations...of linear combinations of the inputs, which can be simplified to a single linear combination of the inputs.

on ReLU, I can say why it works better than e.g. a sigmoid function, but I don't have much of a math background so no difficult followup questions. the problem with the sigmoid is that these networks are trained using gradient descent, which depends on the existence of a gradient to descend. if a neuron's output barely changes as the weights are adjusted, it's hard to choose an adjustment that will move the output in the right direction. the sigmoid function flattens out for high-magnitude input values, so if, during training, a neuron's weights shift such that the value going into the (sigmoid) activation function is huge, the sigmoid "saturates" and kills the gradient, and that part of the network gets stuck.

ReLU doesn't have that problem because there's always a gradient for positive inputs. neurons can still get stuck when the activation function receives large negative inputs, but in practice the weights in the rest of the network move around enough during training that stuck neurons usually get unstuck. people have also experimented with "leaky ReLU", aka f(x) = max(0.01*x, x), but last I heard it wasn't much of an improvement
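a quick numpy sketch of the gradients in question (function names mine), showing why a saturated sigmoid stalls gradient descent while ReLU keeps a usable gradient for positive inputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def d_relu(x):
    return (np.asarray(x, dtype=float) > 0).astype(float)

def d_leaky_relu(x, alpha=0.01):
    return np.where(np.asarray(x, dtype=float) > 0, 1.0, alpha)

# a large pre-activation saturates the sigmoid: its gradient is ~0,
# so gradient descent barely moves the upstream weights
print(d_sigmoid(10.0))      # ~4.5e-5, vanishing
print(d_relu(10.0))         # 1.0, healthy gradient
print(d_relu(-10.0))        # 0.0, a "stuck" ReLU neuron
print(d_leaky_relu(-10.0))  # 0.01, the leaky variant keeps a trickle
```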
|
# ? Dec 15, 2019 09:26 |
|
ninepints posted:I know this one. the problem with linear activation functions is that they collapse the representative power of the whole network down to the representative power of a single layer. the final outputs become a linear combination of linear combinations...of linear combinations of the inputs, which can be simplified to a single linear combination of the inputs.

one can, for intuition, frame this as not having "decisions": if a layer wants to differentiate between constant values x and y, while being reasonably robust to small errors, it can have parameters which place x and y into a part of the activation function where such fuzziness is truncated out; e.g. scale x so values in its neighborhood land in the flat 0 of ReLU, or put x and y on the bottom and top of a sigmoid for a cleaned-up {x -> 0, y -> 1} mapping. that is, the layer can take a value z=0.9*x+0.1*y and simulate deciding that z is actually x by outputting z'=0.99999*x+0.00001*y (in the sigmoid case). with only linear activation functions you cannot achieve the same thing, since the output will just be some linear scaling of z, retaining both its x and y "parts".

the collapse to one layer is the more powerful argument, but it might be useful to think about how non-linearity lets the network tweak what information is retained.
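the collapse argument can be checked numerically in a few lines; toy weights of my own choosing, identity activation standing in for "linear":

```python
import numpy as np

x = np.array([1.0, -2.0])

# two "layers" with a linear (identity) activation in between
W1, b1 = np.array([[1.0, 2.0], [3.0, -1.0]]), np.array([0.5, -0.5])
W2, b2 = np.array([[1.0, 1.0]]), np.array([0.0])

two_layer = W2 @ (W1 @ x + b1) + b2

# ...is exactly equivalent to one linear layer with composed weights
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b
assert np.allclose(two_layer, one_layer)

# with a ReLU in between, no such collapse exists: the output differs
relu = lambda z: np.maximum(z, 0.0)
nonlinear = W2 @ relu(W1 @ x + b1) + b2
assert not np.allclose(nonlinear, one_layer)
```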
|
# ? Dec 15, 2019 11:06 |
|
there's also the universal approximation theorem, which states that single-layer (shallow) NNs can approximate functions over reasonable chunks of R^n. that's been proved for sigmoid and ReLU. in practice we use deep networks instead; there are some results about how they're much more efficient in terms of representation power.

there's also some work on "verifying" ReLU-based networks, which relies on the simple structure of ReLU to prove geometric properties of the network. in practice these can only prove generalities, though, stuff like "if you make a small change to the input, the output can only change by this amount." they can't prove deeper specs about network correctness, because, if we could specify exactly what operation we wanted the network to do, we wouldn't need a neural network, now would we?

one other way to think about the linear stuff is just to intuit that a linear operation followed by another linear operation is linear. so, stacking linear layers, you're really only training a single linear transformation. you don't generally have that problem with nonlinear activations by definition; that property is pretty unusual. and the algorithms can optimize through pretty much whatever operations you want, so the nonlinearity isn't a problem. ReLU is used because it works very very well in practice (and is cheap in hardware), idk why it works so well though.
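the simplest flavor of the "output can only change by this amount" property: since ReLU is 1-Lipschitz, the product of the layers' spectral norms bounds how much the output can move. a toy sketch with made-up weights (real verifiers like ReluVal/Marabou prove much tighter, region-specific bounds):

```python
import numpy as np

W1 = np.array([[2.0, -1.0], [0.5, 1.5]])
W2 = np.array([[1.0, -2.0]])
relu = lambda z: np.maximum(z, 0.0)
f = lambda x: W2 @ relu(W1 @ x)

# ReLU is 1-Lipschitz, so the whole net is Lipschitz with constant
# at most the product of the weight matrices' spectral norms
L = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)

x = np.array([0.3, -0.7])
delta = np.array([0.01, -0.02])
change = np.linalg.norm(f(x + delta) - f(x))
assert change <= L * np.linalg.norm(delta) + 1e-12
```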
|
# ? Dec 15, 2019 14:23 |
|
https://twitter.com/dril_gpt2/status/1208788034407292930 https://twitter.com/dril_gpt2/status/1208854462984577025 https://twitter.com/dril_gpt2/status/1208454587054735361 https://twitter.com/dril_gpt2/status/1208419178442641408
|
# ? Dec 23, 2019 04:00 |
|
https://www.youtube.com/watch?v=1zZZjaYl4AA lol
|
# ? Dec 24, 2019 01:21 |
|
hahaha. he made a lovely course that cost $200 saying he'd personally mentor at most 500 students, then opened multiple slack accounts and loaded them up. https://medium.com/@gantlaborde/siraj-rival-no-thanks-fe23092ecd20
|
# ? Dec 24, 2019 04:55 |
|
oh my god... https://www.youtube.com/watch?v=2GwzlT2M59A
|
# ? Dec 24, 2019 04:56 |
|
https://twitter.com/lindsey/status/1211698750759944193 lol (Also read the thread the quoted tweet is in. Dude knows what's up.)
|
# ? Dec 31, 2019 01:22 |
|
lol https://twitter.com/bschulz5/status/1212198171436310533?s=21
|
# ? Jan 3, 2020 03:11 |
|
A NN classifier for fizz buzz but unironically
|
# ? Jan 3, 2020 03:15 |
|
this guy might be a dumbass https://twitter.com/Bschulz5/status/1202631062410674176
|
# ? Jan 3, 2020 03:18 |
|
it's a pity that nlp and cv have fallen to the ml gremlins
|
# ? Jan 3, 2020 03:20 |
|
don't mind me, just informing the director of the cornell CS department about machine learning
|
# ? Jan 3, 2020 03:48 |
|
https://twitter.com/Bschulz5/status/1206577850234658817 That's really deep, man.
|
# ? Jan 3, 2020 03:52 |
|
I love the idea of our entire universe being the waste heat from gods GPU while they’re playing 20-dimensional call of duty. it makes at least as much sense as any other explanation too
|
# ? Jan 3, 2020 05:19 |
|
Nomnom Cookie posted:I love the idea of our entire universe being the waste heat from gods GPU while they’re playing 20-dimensional call of duty. it makes at least as much sense as any other explanation too hell is amd cpu cooling
|
# ? Jan 3, 2020 07:07 |
|
redleader posted:it's a pity that nlp and cv have fallen to the ml gremlins

a lot of ml research is dumb and bad (almost as dumb and bad as "tech visionaries'" ideas of what ml can do), amounting to a phd student twiddling various parameters until a model seems to learn something, with no deeper analysis or insight. however, it is not right to view ml as having been bad for computer vision and natural language processing, not least because ml is pretty simple, so it is not a hard tool to apply.

i've historically been in nlp myself, and while it's been a weird decade ever since statistical n-gram methods broke the backs of the chomskyites, things are looking better and better now. it has become clear which ml bits are indispensable and possible to analyse (e.g. word embeddings), and the field is just getting a lot more high-level thanks to ml bits handling a lot of the nitty-gritty (e.g. making it a lot easier to make stuff robust against grammatical mistakes). it has chilled interest in formal grammars and automata solutions a lot, but there was so much theoretical navel-gazing there that i don't think that's bad (and that was my primary research area).

i am also currently pretty excited about a new research program from the research group i primarily affiliate with, where they are embracing the bias-soaking nature of ml to study gender bias in written text. that is, as a very first step, looking at the word embeddings for ostensibly non-gendered words in a given publication and seeing how orthogonal those feature vectors are to gendered vectors. there are a ton of tricky issues (e.g. it matters a lot how the dimensionality reduction, i.e. the ml, works), but a bright new phd student (affiliated both with us at cs and the dept. of gender studies) is working on it, and i think it'll be extremely interesting research no matter the exact outcome.

well, that's a long post. tl;dr: ml *in* research often good.
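a very crude sketch of the orthogonality idea, with made-up toy vectors (the actual project would use embeddings trained on the publication in question, and the gender direction is usually built from many word pairs, not one):

```python
import numpy as np

# toy 3-d "embeddings"; entries are invented for illustration
emb = {
    "he":    np.array([ 1.0, 0.1, 0.0]),
    "she":   np.array([-1.0, 0.1, 0.0]),
    "nurse": np.array([-0.6, 0.5, 0.3]),
    "table": np.array([ 0.0, 0.2, 0.9]),
}

# crude gender direction: normalized difference of a gendered word pair
g = emb["he"] - emb["she"]
g = g / np.linalg.norm(g)

def gender_loading(word):
    # cosine of the word vector with the gender axis
    v = emb[word]
    return float(v @ g / np.linalg.norm(v))

# an ostensibly non-gendered word with large |loading| is a bias signal;
# a loading near 0 means roughly orthogonal to the gender direction
print(gender_loading("nurse"))
print(gender_loading("table"))
```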
|
# ? Jan 3, 2020 12:57 |
|
Cybernetic Vermin posted:
it's definitely a handy hammer to have. not everything's a nail though. also lol at the legions of prospective PhD students whose life aspiration is to just twiddle knobs until they can finally get all those overpaid truckers fired
|
# ? Jan 3, 2020 13:25 |
|
https://twitter.com/dril_gpt2/status/1212951044143104001
|
# ? Jan 3, 2020 13:30 |
|
Cybernetic Vermin posted:a lot of ml research is dumb and bad (almost as dumb and bad as "tech visionaries" ideas of what ml can do), amounting to a phd student twiddling various parameters until a model seems to learn something, with no deeper analysis or insight. however, it is not right to view ml as having been bad for computer vision and natural language processing, not least ml is pretty simple, so it is not a hard tool to apply. is nlp still basically only done in english
|
# ? Jan 3, 2020 17:51 |
|
fart simpson posted:is nlp still basically only done in english

the available good *structured* data is mostly english (e.g. the penn treebank and the amr semantics bank), and of course the papers are written in english, so to some extent it always remains the case. when doing formal grammar/automata work one habitually invokes random languages for having tricky grammatical structures, like swiss german for having cross-serial dependencies (they exist in english only in contrived cases, like "the coffee, cake, and biscuit cost $2, $3 and $4, respectively", but are a normal grammatical feature in swiss german). that is mostly a matter of motivating the navel-gazing however, keeping alive research in "mildly context-sensitive" grammatical formalisms for ages without them ever really demonstrating any practical usefulness.

dumb statistical and ml models have luckily improved the situation a fair bit, since it doesn't matter nearly as much how much painstakingly cleaned and hand-annotated data you have. initially it was all n-gram work entirely devoid of grammar, but most research now mixes things a bit, with a bit of grammatical structure both induced by statistical models and fed to other statistical models, in a way that generalizes pretty easily to most languages.

it also now seems obvious that this is the only way to do it, the idea that humans have some inherent grammar which is not hopelessly intermingled with general intelligence seeming hopelessly naive. well, to me. this is plenty controversial.
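the n-gram work mentioned above really is this dumb at its core; a minimal maximum-likelihood bigram model over a toy corpus (corpus mine):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()

# count adjacent word pairs and the words they condition on
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def p_next(prev, word):
    # maximum-likelihood estimate of P(word | prev)
    return bigrams[(prev, word)] / unigrams[prev]

print(p_next("the", "cat"))  # 2 of the 3 "the" tokens are followed by "cat"
```

real systems add smoothing for unseen pairs and longer contexts, but the grammar-free, pure-counting character is the same.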
|
# ? Jan 3, 2020 18:38 |
|
Sagebrush posted:don't mind me, just informing the director of the cornell CS department about machine learning "smartest guy in the room syndrome" is an epidemic among nerds
|
# ? Jan 3, 2020 21:37 |
|
https://twitter.com/joose_rajamaeki/status/1096397000520749056
|
# ? Jan 3, 2020 21:59 |
|
i was looking around for some software to segment chinese text into words and it seems anything with more than 90% accuracy is like cutting edge university research algorithms. the most popular one people in china actually use is so bad that i, a non native speaker, can find errors in basically every sentence i throw at it

fart simpson fucked around with this message at 02:26 on Jan 4, 2020 |
# ? Jan 4, 2020 02:18 |
|
i don't know chinese, but i presume that word segmentation is a mostly artificial idea in it. that is, it does not exist explicitly on the page and is not central in the mind of the people communicating in chinese (who will compound concepts as they see useful), making the problem pretty ill-defined. a lot of nlp problems suffer badly from an (elitist) normative view of language, e.g. grammar checkers defined entirely by a certain kind of person going "well that's not *proper* english" over and over.

statistical methods are unlikely to do a better job segmenting text; rather they are used to scan through the text, extracting the relevant concepts and components (an abstract paraphrase in a sense), hopefully leaping over a bunch of hurdles like "incorrect" segmenting, compounding, typos, unexpected typography, etc. (in effect by looking at a larger context with a bit more "understanding"). ye olde nlp systems would just get this wrong in step 1 of a cascade of transformations, and then never recover.

in many ways this is precisely the kind of thing the thread (rightly) hates, in that it takes something that used to be about a strict syntactic understanding of something important, and then throws ml at it, muddling parts into a pile of incomprehensible statistics. the difference imho is that here the strict understandable solution never existed, at least not in any human brain, and that the ml bits are well-defined in both extent and purpose.
|
# ? Jan 4, 2020 12:09 |
|
idk maybe? i dont really know anything about linguistics but i do know you can give a chinese speaker a sentence and they can easily split it into component words. i mean it is more complicated in chinese because words can have sorta layered meanings in a way, so maybe that means it's an artificial idea?

but like, what seems to be the most popular tool that people in china use for this is a python library called jieba. i downloaded jieba and the very first sentence i threw into it was the first sentence from the chinese wikipedia article for "flower": 花是被子植物的繁殖器官 (flowers are the reproductive organs of angiosperms)

if you asked any chinese reader they'd come up with:

花 / 是 / 被子植物 / 的 / 繁殖 / 器官
flowers / are / angiosperm / (possessive marker) / reproductive / organs

jieba segmented this as the obviously nonsensical:

花是 / 被子植物 / 的 / 繁殖 / 器官
flowersare / angiosperm / (possessive marker) / reproductive / organs

actually google translate does word segmentation too and also fails even worse than jieba on this sentence, although it gets the overall meaning correct, so i guess i see your point about the statistical methods thing
|
# ? Jan 4, 2020 12:47 |
|
should note that i am neither a linguist, a chinese speaker, nor all that successful an nlp researcher, so take everything with a grain of salt. it is pretty interesting though. from a quick google 是 is common enough in compounds, but extremely common in this copular verb form. despite cynicism about the problem in general i'd have expected one of these dictionary-driven things to manage this much. maybe there's still some flag/parameter set particularly poorly?
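for reference, the dictionary-driven baseline is simple enough to sketch. a toy greedy longest-match segmenter (not how jieba actually works; it layers statistical models and an HMM for unknown words on top of its dictionary):

```python
def segment(text, vocab):
    """greedily take the longest dictionary word at each position,
    falling back to a single character when nothing matches."""
    out, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab or j == i + 1:
                out.append(text[i:j])
                i = j
                break
    return out

vocab = {"花", "是", "被子植物", "的", "繁殖", "器官"}
print(segment("花是被子植物的繁殖器官", vocab))
# -> ['花', '是', '被子植物', '的', '繁殖', '器官']

# if a compound like "花是" sneaks into the dictionary, greedy longest-match
# reproduces exactly the bad split described above
bad_vocab = vocab | {"花是"}
print(segment("花是被子植物的繁殖器官", bad_vocab))
# -> ['花是', '被子植物', '的', '繁殖', '器官']
```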
|
# ? Jan 4, 2020 13:24 |
|
yeah i was surprised too. i think there's probably some settings i can adjust because i found a javascript reimplementation of jieba that gets this sentence correct. but yeah i'm playing around with a dataset i found of 300k news articles written in chinese and just this usage of "是" as a verb makes up 1% of the entire body of text of the dataset. it's probably either the 1st or 2nd most commonly used verb in chinese. google's segmenter got that word correct but totally butchered the segmentation of angiosperms into 3 separate words which would translate as like, "blanket seed plants" or something which i guess is kinda what angiosperms are anyway? especially because the word 被 can be a noun meaning blanket or a verb meaning to cover
|
# ? Jan 4, 2020 13:52 |
|
WASHINGTON (Reuters) - The Trump administration took measures on Friday to crimp exports of artificial intelligence software as part of a bid to keep sensitive technologies out of the hands of rival powers like China. i'm glad theyre finally doing the responsible thing and making machine learning illegal
|
# ? Jan 4, 2020 16:28 |
|
https://twitter.com/VPrasadMDMPH/status/1212840987363442689
|
# ? Jan 4, 2020 16:36 |
|
is the reason "the outcome you're looking for (presence of cancer) is not the same as outcome you're training on (diagnosis of cancer among women tested)"
|
# ? Jan 4, 2020 17:00 |
|
is breast cancer such a popular target because the image sets are large and well-labelled?
|
# ? Jan 5, 2020 04:51 |
|
perhaps it’s unique amongst screenings in that it uses radiographic data rather than values 🤷♂️
|
# ? Jan 5, 2020 06:21 |
|
Pinterest Mom posted:is the reason "the outcome you're looking for (presence of cancer) is not the same as outcome you're training on (diagnosis of cancer among women tested)" The outcome you're looking for is improved quality adjusted life years*, and the relationship between that and seeing a tumour in a screen is so complex that you have to measure that, not just if you can find a tumour. *Sometimes just a cost reduction is ok too.
|
# ? Jan 5, 2020 09:37 |
|
Pinterest Mom posted:is the reason "the outcome you're looking for (presence of cancer) is not the same as outcome you're training on (diagnosis of cancer among women tested)"

according to the tweets, the outcome they're actually looking for is "distinguishing cancer that's aggressive enough to be dangerous but not so aggressive that it's incurable". meanwhile, the algorithms are just being trained on "presence of cancer".

apparently there's some concern these days that cancer screening may be driving overdiagnosis and overtreatment, because individual cancer cases vary wildly in behavior, ranging from "spreads rapidly and grows aggressively" to "just sits there growing a little and doesn't really do anything". the former extreme is often incurable even if treated early, while the latter extreme is fine even if it's left alone, and there's a sweet spot somewhere in the middle that gets by far the most benefit from treatment. so cancer researchers are now less interested in identifying the presence of cancer and more interested in finding a way to identify its aggressiveness level
|
# ? Jan 5, 2020 20:17 |
|
Yes, that's the major issue. And the only way you can get the data that you'd need to train a classifier is to identify potentially harmful cancers and not treat them, which flies in the face of how medicine is practiced.
|
# ? Jan 5, 2020 20:23 |