Thanks for writing this up! I love this topic and I think everyone should talk about it more!
On cortical uniformity:
My take (largely pro-cortical-uniformity) is in the first part of this post. I never did find better or more recent sources than those two book chapters, but have gradually grown a bit more confident in what I wrote for various more roundabout reasons. See also my more recent post here.
On the similarity of neocortical algorithms to modern ML:
I am pretty far on the side of “neocortical algorithms are different from today’s most popular ANNs”, i.e. I think that both are “general” but I reached that conclusion independently for each. If I had to pick one difference, I would say it’s that neocortical algorithms use analysis-by-synthesis—i.e., searching through a space of generative models for one that matches the data—and, relatedly, planning by probabilistic inference. This type of algorithm is closely related to probabilistic programming and PGMs—see, for example, Dileep George’s work. In today’s popular ANNs, this kind of analysis-by-synthesis and planning is either entirely absent or arguably present as a kind of add-on, but it’s not a core principle of the algorithm. This is obviously not the only difference between neocortical algorithms and mainstream ANNs. Some are really obvious: the neocortex doesn’t use backprop! More controversially, I don’t think the neocortex even uses real-valued variables in its models, as opposed to booleans—well, I would want to put some caveats on that, but I believe something in that general vicinity.
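To illustrate what I mean by analysis-by-synthesis, here is a minimal Python sketch with entirely made-up models and numbers (the model names and probabilities are purely illustrative): recognition is just a search over candidate generative models, scored by how well each one explains the incoming data.

```python
# Toy illustration of "analysis-by-synthesis": perception as a search
# through a space of generative models for one that explains the data.
# All model names and probabilities here are made up for illustration.

# Each generative model assigns a probability to possible observations.
generative_models = {
    "dog":  {"bark": 0.7, "meow": 0.01, "silence": 0.29},
    "cat":  {"bark": 0.01, "meow": 0.6, "silence": 0.39},
    "rock": {"bark": 0.001, "meow": 0.001, "silence": 0.998},
}

priors = {"dog": 0.3, "cat": 0.3, "rock": 0.4}

def recognize(observations):
    """Score each candidate model by prior times likelihood of the data,
    then return the posterior over models (Bayes' rule: unnormalized
    scores divided by their sum)."""
    scores = {}
    for name, model in generative_models.items():
        likelihood = 1.0
        for obs in observations:
            likelihood *= model[obs]
        scores[name] = priors[name] * likelihood
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

print(recognize(["meow", "silence", "meow"]))  # posterior favors "cat"
```

Real analysis-by-synthesis obviously involves searching an enormous compositional model space rather than scoring three fixed candidates, but the inner loop has this flavor: propose a generative model, check it against the data.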
So basically, I think the algorithms most similar to the neocortex are a bit of a backwater within mainstream ML research, with essentially no SOTA results on popular benchmarks … which makes it a bit awkward for me to argue that this is the corner from which we will get AGI. Oh well, that’s what I believe anyway!
On predictive coding:
Depending on context, I’ll say I’m either an enthusiastic proponent or a strong critic of predictive coding. Really, I have a particular version of it I like, described here. I guess I disagree with Friston, Clark, etc. most strongly in that they argue that predictive coding is a helpful way to think about the operation of the whole brain, whereas I only find it helpful when discussing the neocortex in particular. Again, see here for my take on the rest of the brain. My other primary disagreement is that I don’t see “minimizing prediction error” as a foundational principle, but rather as an incidental consequence of properly-functioning neocortical algorithms under certain conditions. (Specifically, it follows from the fact that the neocortex will discard generative models that get repeatedly falsified.)
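Here is a toy sketch of that last claim, under totally made-up assumptions (fixed per-model accuracies, a three-strikes pruning rule): nothing in it explicitly minimizes prediction error, but the population’s average error still falls over time, just because repeatedly-falsified models get discarded.

```python
import random

# Toy sketch: no model in this population ever "tries" to minimize
# prediction error. Each model just predicts with some fixed accuracy,
# and models that are repeatedly falsified get discarded. Average
# prediction error still falls over time, as a side effect of pruning.
# The accuracies and thresholds are arbitrary illustration values.

random.seed(0)
models = [{"accuracy": random.uniform(0.3, 0.95), "strikes": 0}
          for _ in range(200)]

for step in range(500):
    for m in models:
        correct = random.random() < m["accuracy"]
        # A wrong prediction is a "strike"; a correct one clears strikes.
        m["strikes"] = 0 if correct else m["strikes"] + 1
    # Discard any model falsified three times in a row.
    models = [m for m in models if m["strikes"] < 3]
    if step % 100 == 0:
        mean_err = 1 - sum(m["accuracy"] for m in models) / len(models)
        print(f"step {step}: {len(models)} models, mean error {mean_err:.2f}")
```

So an outside observer could describe the system as “minimizing prediction error”, even though that description plays no role in the mechanism itself.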
I think there is a lot of evidence for the neocortex having a zoo of generative models that can be efficiently searched through and glued together, not only for low-level perception but also for high-level stuff. I guess the evidence I think about is mostly introspective though. For example, this book review about therapy has (in my biased opinion) an obvious and direct correspondence with how I think the neocortex processes generative models.
That doesn’t seem obvious to me. Could you point to some evidence, or flesh out your model for how data influences neural connections?

Hmm, well, I should say that my impression is that there’s a frustrating lack of consensus on practically everything in systems neuroscience, but “the brain doesn’t do backpropagation” seems about as close to consensus as anything. This Yoshua Bengio paper has a quick summary of the reasons:
The following difficulties can be raised regarding the biological plausibility of back-propagation: (1) the back-propagation computation (coming down from the output layer to lower hidden layers) is purely linear, whereas biological neurons interleave linear and non-linear operations, (2) if the feedback paths known to exist in the brain (with their own synapses and maybe their own neurons) were used to propagate credit assignment by backprop, they would need precise knowledge of the derivatives of the non-linearities at the operating point used in the corresponding feedforward computation on the feedforward path, (3) similarly, these feedback paths would have to use exact symmetric weights (with the same connectivity, transposed) of the feedforward connections, (4) real neurons communicate by (possibly stochastic) binary values (spikes), not by clean continuous values, (5) the computation would have to be precisely clocked to alternate between feedforward and back-propagation phases (since the latter needs the former’s results), and (6) it is not clear where the output targets would come from.
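To make objection (3), the symmetric-weights point (often called the “weight transport problem”), more concrete, here is a minimal numpy sketch of backprop through one hidden layer. Notice that the backward pass literally reuses the transposed forward weight matrix, which is exactly what a biological feedback pathway would somehow have to mirror. The network and numbers are arbitrary.

```python
import numpy as np

# Minimal two-layer network, forward and backward pass written by hand.
# The point: the backward pass reuses the transposed forward weights
# (W2.T below), so a separate biological feedback pathway would need an
# exact, mirrored copy of the feedforward synapses.

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))          # input
W1 = rng.normal(size=(5, 4))       # forward weights, layer 1
W2 = rng.normal(size=(3, 5))       # forward weights, layer 2
y_target = rng.normal(size=(3,))

# Forward pass (note objection (2): backprop also needs the derivative
# of the nonlinearity at exactly this operating point).
h_pre = W1 @ x
h = np.maximum(h_pre, 0.0)         # ReLU
y = W2 @ h

# Backward pass.
dy = y - y_target                  # dLoss/dy for squared error
dh = W2.T @ dy                     # <-- weight transport: needs W2.T
dh_pre = dh * (h_pre > 0)          # needs the ReLU derivative at h_pre
grad_W2 = np.outer(dy, h)
grad_W1 = np.outer(dh_pre, x)
print(grad_W1.shape, grad_W2.shape)  # (5, 4) (3, 5)
```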
(UPDATE 1 YEAR LATER: after reading more Randall O’Reilly, I am now pretty convinced that error-driven learning is one aspect of neocortex learning, and I’m open-minded to the possibility that the errors can propagate up at least one or maybe two layers of hierarchy. Beyond that, I dunno, but brain hierarchies don’t go too much deeper than that anyway, I think.)
Then you ask the obvious followup question: “if not backprop, then what?” Well, this is unknown and controversial; the Yoshua Bengio paper above offers its own answer, which I am disinclined to believe (but want to think about more). Of course, there is more than one right answer; indeed, my general attitude is that if someone tells me about a biologically-plausible learning mechanism, it’s probably in use somewhere in the brain, even if it’s only playing a very minor and obscure role in regulating heart rhythms or whatever, just because that’s the way evolution tends to work.
But anyway, I expect that the lion’s share of learning in the neocortex comes from just a few mechanisms. My favorite example is probably high-order sequence memory learning. There’s a really good story for that:
At the lowest level—biochemistry—we have Why Neurons Have Thousands of Synapses, a specific and biologically-plausible mechanism for the creation and deactivation of synapses.
At the middle level—algorithms—we have papers like this and this and this, where Dileep George takes pretty much that exact algorithm (which he calls a “cloned hidden Markov model”), abstracted away from the biological implementation details, and shows that it displays all sorts of nice behavior in practice (a toy sketch in this spirit appears after this list).
At the highest level—behavior—we have observable human behaviors, like the fact that we can hear a snippet of a song, and immediately know how that snippet continues, but still have trouble remembering the song title. And no matter how well we know a song, we cannot easily sing the notes in reverse order. Both of these are exactly as expected from the properties of this sequence memory algorithm.
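Here is that toy sketch, my own illustrative construction rather than code from any of those papers: each observation is stored together with its recent context, like a clone, so forward recall from a snippet is trivial, while reverse recall has no stored structure to lean on.

```python
# Toy "high-order sequence memory" in the spirit of the cloned-HMM idea
# (my own illustrative construction, not Dileep George's actual code).
# Each observation is stored keyed by its recent context, so the same
# note can lead to different successors in different songs, and recall
# only runs in the learned (forward) direction.

transitions = {}  # (context of recent observations) -> next observation

def learn(sequence, order=2):
    """Store forward transitions keyed by the last `order` observations."""
    for i in range(order, len(sequence)):
        context = tuple(sequence[i - order:i])
        transitions[context] = sequence[i]

def continue_from(snippet, steps=5, order=2):
    """Hear a snippet, then unroll the learned continuation forward."""
    out = list(snippet)
    for _ in range(steps):
        context = tuple(out[-order:])
        if context not in transitions:
            break
        out.append(transitions[context])
    return out

# Two "songs" sharing notes C, D, E, G: the context disambiguates them.
learn(["C", "D", "E", "C", "G", "A"])
learn(["C", "E", "G", "E", "D", "C"])

print(continue_from(["C", "D"]))  # ['C', 'D', 'E', 'C', 'G', 'A']
print(continue_from(["C", "E"]))  # ['C', 'E', 'G', 'E', 'D', 'C']
# There is no backward table: singing a song in reverse would require
# search, matching the introspective observations above.
```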
This sequence memory thing obviously isn’t the whole story of what the neocortex does, but it fits together so well, I feel like it has to be one of the ingredients. :-)