Thanks for the excellent post, Jacob. I think you might be placing too much emphasis on learning algorithms as opposed to knowledge representations, though. It seems very likely to me that at least one theoretical breakthrough in knowledge representation will be required to make significant progress (for one argument along these lines, see Pearl 2018). Even if it turns out that the brain implements backpropagation, such a representational breakthrough will still be a bottleneck. In biological terms, I’m thinking of knowledge representations as analogous to the innate aspects of cognition impressed upon us by evolution, and learning algorithms as what an individual human uses to learn from their own experience.
Here are two examples suggesting that the former are more important than the latter. The first is the “poverty of stimulus” argument in linguistics: children simply don’t hear enough language to infer its grammar from first principles. This suggests that ingrained grammatical instincts are doing most of the work in narrowing down what the sentences they hear mean. Even if we knew that children were doing backpropagation whenever they heard a new sentence, that wouldn’t tell us much about how their grammatical knowledge works, because you can do backpropagation on lots of different things. (You know more psycholinguistics than I do, though, so let me know if I’m misrepresenting anything.)
Second example: Hinton argues in this talk that CNNs don’t build representations of three-dimensional objects from two-dimensional pictures the way the human brain does; that’s why he invented capsule networks, which (he claims) do use such representations. Both capsules and CNNs use backpropagation, but the capsule architecture is meant to provide an extra “secret sauce”. Seeing whether capsules end up working well on vision tasks will be quite interesting, because vision is better-understood and easier than abstract thought (for example, it’s easy to specify theoretically how to translate between any two visual perspectives: it’s just a matrix multiplication, as in the sketch below).
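To make that concrete, here’s a minimal numpy sketch (my own illustration, not anything from Hinton’s talk; the specific poses are arbitrary) of how translating between two viewpoints is a single 4×4 matrix applied to points in homogeneous coordinates:

```python
import numpy as np

def pose_matrix(rotation, translation):
    """Build a 4x4 homogeneous pose matrix from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

# Two hypothetical camera poses (world -> camera coordinates).
pose_a = pose_matrix(np.eye(3), np.array([0.0, 0.0, -5.0]))
theta = np.pi / 6  # camera B is rotated 30 degrees about the y-axis and shifted sideways
rot_b = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                  [0.0,           1.0, 0.0],
                  [-np.sin(theta), 0.0, np.cos(theta)]])
pose_b = pose_matrix(rot_b, np.array([1.0, 0.0, -5.0]))

# The change of viewpoint is itself one matrix: coordinates in view A -> coordinates in view B.
a_to_b = pose_b @ np.linalg.inv(pose_a)

point_in_a = np.array([0.5, 0.2, 3.0, 1.0])  # a 3D point in view A, homogeneous coordinates
point_in_b = a_to_b @ point_in_a
```

The point is just that, once you have an explicit 3D representation, changing perspective is linear; the hard part is getting from pixels to that representation in the first place.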
Lastly, as a previous commenter pointed out, it’s not backpropagation but rather gradient descent which seems like the important factor. More specifically, recent research suggests that stochastic gradient descent (SGD) leads to particularly good outcomes, for interesting theoretical reasons (see Zhang 2017 and this blog post by Huszár). Since the brain learns online, if it’s doing gradient descent then it’s doing a variant of SGD. I discuss why SGD works well in more detail in the first section of this blog post.
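For concreteness, here’s a minimal sketch of what I mean by online SGD: the weights are updated on each example as it arrives, rather than on a gradient averaged over the whole dataset. (The toy regression problem, learning rate, and function names are just placeholders for illustration.)

```python
import numpy as np

def sgd_online(w, stream, grad_fn, lr=0.01):
    """Online SGD: update the weights on each (x, y) example as it arrives (batch size 1)."""
    for x, y in stream:
        w = w - lr * grad_fn(w, x, y)
    return w

# Toy example: least-squares linear regression, one example at a time.
def squared_error_grad(w, x, y):
    # Gradient of (w @ x - y)^2 with respect to w.
    return 2.0 * (w @ x - y) * x

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
stream = [(x, true_w @ x + 0.01 * rng.normal()) for x in rng.normal(size=(200, 2))]

w = sgd_online(np.zeros(2), stream, squared_error_grad)
```

The noise in these one-example updates is exactly the property that the theoretical work above credits with finding solutions that generalise well.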