Thanks for the great back-and-forth! Did you guys see the first author’s comment? What are the main updates you’ve had re this debate now that it’s been a couple years?
I have not thought about these issues too much in the intervening time. Re-reading the discussion, it sounds plausible to me that the evidence is compatible with roughly brain-sized NNs being roughly as data-efficient as humans. Daniel claims:
If we assume for humans it’s something like 1 second on average (because our brains are evaluating-and-updating weights etc. on about that timescale) then we have a mere 10^9 data points, which is something like 4 OOMs less than the scaling laws would predict. If instead we think it’s longer, then the gap in data-efficiency grows.
I think the human observation-reaction loop is closer to ten times that fast (roughly 0.1 seconds per update), which gives about 10^10 data points and shrinks the gap to 3 OOMs. That is still a big gap, but it could potentially be explained by architectural differences or other factors, thus preserving a possibility like “human learning is more-or-less gradient descent”. Without articulating the various hypotheses in more detail, this doesn’t seem like strong evidence in any direction.
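To make the arithmetic explicit, here is a rough sketch of the two estimates. The 10^9 seconds and the 4 OOM gap come from Daniel's quote; the 10^13 scaling-law figure is only what those two numbers imply together (it is not stated directly), and the function and constant names are made up for illustration.

```python
import math

# From the quoted estimate: roughly 1e9 seconds of experience,
# and a 4 OOM shortfall when there is one data point per second.
HUMAN_SECONDS = 1e9
GAP_AT_ONE_PER_SECOND_OOM = 4

# Assumption: the scaling-law prediction implied by the two numbers above.
IMPLIED_PREDICTED_POINTS = HUMAN_SECONDS * 10 ** GAP_AT_ONE_PER_SECOND_OOM


def human_data_points(updates_per_second: float) -> float:
    """Data points a human sees, given an observation-reaction rate."""
    return HUMAN_SECONDS * updates_per_second


def gap_in_ooms(updates_per_second: float) -> float:
    """Orders of magnitude between the implied prediction and human data."""
    return math.log10(IMPLIED_PREDICTED_POINTS / human_data_points(updates_per_second))


if __name__ == "__main__":
    for rate in (1, 10):  # Daniel's estimate, then the ten-times-faster loop
        print(f"{rate:>2} updates/s: {human_data_points(rate):.0e} points, "
              f"gap of about {gap_in_ooms(rate):.1f} OOMs")
```

Each factor of ten in the assumed update rate moves the gap by exactly one OOM, so the disagreement over 1 second versus 0.1 seconds is worth one OOM and no more.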
Not before now. I think the comment had a relatively high probability in my world, where we still have a poor idea of what algorithm the brain is running, and a low probability in Daniel’s world, where evidence is zooming in on predictive coding as the correct hypothesis. Some quotes which I think support my hypothesis better than Daniel’s:
If we (speculatively) associate alpha/beta waves with iterations in predictive coding,
This illustrates how we haven’t pinned down the mechanical parts of algorithms. What this means is that speculation about the algorithm of the brain isn’t yet causally grounded—it’s not as if we’ve been looking at what’s going on and can build up a firm abstract picture of the algorithm from there, the way you might successfully infer rules of traffic by watching a bunch of cars. Instead, we have a bunch of different kinds of information at different resolutions, which we are still trying to stitch together into a coherent picture.
While it’s often claimed that predictive coding is biologically plausible and the best explanation for cortical function, this isn’t really all that clear cut. Firstly, predictive coding itself actually has a bunch of implausibilities. Predictive coding suffers from the same weight transport problem as backprop, and secondly it requires that the prediction and prediction error neurons are 1-1 (i.e. one prediction error neuron for every prediction neuron) which is way too precise connectivity to actually happen in the brain. I’ve been working on ways to adapt predictive coding around these problems as in this paper (https://arxiv.org/pdf/2010.01047.pdf), but this work is currently very preliminary and it’s unclear if the remedies proposed here will scale to larger architectures.
This directly addresses the question of how clear-cut things are right now, while also pointing to many concrete problems the predictive coding hypothesis faces. The comment continues on that subject for several more paragraphs.
The brain being able to do backprop does not mean that the brain is just doing gradient descent like we do to train ANNs. It is still very possible (in my opinion likely) that the brain could be using a more powerful algorithm for inference and learning—just one that has backprop as a subroutine. Personally (and speculatively) I think it’s likely that the brain performs some highly parallelized advanced MCMC algorithm like Hamiltonian MCMC where each neuron or small group of neurons represents a single ‘particle’ following its own MCMC path. This approach naturally uses the stochastic nature of neural computation to its advantage, and allows neural populations to represent the full posterior distribution rather than just a point prediction as in ANNs.
This paragraph supports my picture that hypotheses about what the brain is doing are still largely being pulled from ML, which speaks against the hypothesis of a growing consensus about what the brain is doing, and also illustrates the lack of direct looking-at-the-brain-and-reporting-what-we-see.
On the other hand, it seems quite plausible that this particular person is especially enthusiastic about analogizing ML algorithms and the brain, since that is what they work on; in which case, this might not be so much evidence about the state of neuroscience as a whole. Some neuroscientist could come in and tell us why all of this stuff is bunk, or perhaps why Predictive Coding is right and all of the other ideas are wrong, or perhaps why the MCMC thing is right and everything else is wrong, etc etc.
But I take it that Daniel isn’t trying to claim that there is a consensus in the field of neuroscience; rather, he’s probably trying to claim that the actual evidence is piling up in favor of predictive coding. I don’t know. Maybe it is. But this particular domain expert doesn’t seem to think so, based on the SSC comment.