Agreed. I didn’t mean to imply that you thought otherwise.
“just” a memory system + learning algorithm—with a dismissive tone of voice on the “just”: [...]
I apologize for how that came across. I had no intention of being dismissive. When I respond to a post or comment, I try to frame what I say for a typical reader as much as for the original poster. In this case, I had a sense that a typical reader could get the wrong impression about how the neocortex does what it does if the only sorts of memory systems and learning algorithms that came to mind were things like a blank computer drive and stochastic gradient descent on a feed-forward neural network.
You are absolutely right that the neocortex is equipped to learn from scratch, starting out generating garbage and gradually learning to make sense of the world/body/other-brain-regions/etc., which can legitimately be described as a memory system + learning algorithm. I just wanted anyone reading to appreciate that, at least in biological brains, there is no clean separation between learning algorithm and memory, and that the neocortex’s role as a hierarchical, dynamic, generative simulator is precisely what makes learning from scratch so efficient, since it only has to correlate its intrinsic dynamics with the statistically similar dynamics of learned experience.
I’m sure that there are many more ways of implementing learning-from-scratch, maybe some much better ways in fact, and I realize that the exact implementation is probably not relevant to the arguments you plan to make in this sequence. I just feel that a basic understanding of what a real learning-from-scratch system looks like could help drive intuitions of what is possible.
Generative models can be learned from scratch [...]
Indeed, but of course including their own particular structural priors.
Dynamics is not unrelated to neural architecture: [...]
Well, what is a recurrent neural network after all but an arbitrarily deep feed-forward neural network with shared weights across layers? My comment on cortical waves was just to point out a clever way that the brain learns to organize its cortical maps and primes them to expect causality to operate (mostly) locally in space and time. For example, orientation columns in V1 may be adjacent to each other because similarly oriented edges (from moving objects) were consistently presented to the same part of the visual field close in time, such that traveling waves of excitation would teach pyramidal cell A to learn orientation A at time A and then teach pyramidal cell B to learn orientation B at time B.
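To make the RNN point concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption on my part (the tanh update rule, the sizes, and the helper names `step`, `rnn_forward`, and `unrolled_forward`); it is not meant as a model of cortex. It just shows that running a simple recurrent cell for T steps gives the same result as a T-layer feed-forward stack in which every layer reuses the same weights and receives the input for its own time step.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def step(W_h, W_x, h, x):
    """One update: h_new = tanh(W_h @ h + W_x @ x). Assumed toy cell."""
    return [math.tanh(a + b) for a, b in zip(matvec(W_h, h), matvec(W_x, x))]

def rnn_forward(W_h, W_x, xs, h0):
    """Recurrent view: one set of weights applied repeatedly over time."""
    h = h0
    for x in xs:
        h = step(W_h, W_x, h, x)
    return h

def unrolled_forward(layers, xs, h0):
    """Feed-forward view: one layer per time step, each with its own weights."""
    h = h0
    for (W_h, W_x), x in zip(layers, xs):
        h = step(W_h, W_x, h, x)
    return h

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
n_hidden, n_in, T = 3, 2, 5  # hypothetical sizes
W_h = rand_matrix(n_hidden, n_hidden)
W_x = rand_matrix(n_hidden, n_in)
xs = [[random.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(T)]
h0 = [0.0] * n_hidden

recurrent_out = rnn_forward(W_h, W_x, xs, h0)

# Weight sharing: every layer of the deep network points at the same matrices.
shared_layers = [(W_h, W_x)] * T
feedforward_out = unrolled_forward(shared_layers, xs, h0)

# Identical operations in identical order, so the floats agree exactly.
outputs_match = recurrent_out == feedforward_out
```

The only thing separating the two views is the tying of weights across depth; giving each layer its own independent matrices would turn the stack into an ordinary deep feed-forward network.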
Lottery ticket hypothesis: [...]
“Lottery tickets” (i.e., subnetworks with random initializations that just so happen to give them the right inductive bias to generalize well from the training data for a particular task) probably occur in the brain as much as in current deep learning architectures. However, the issue in DL is that the rest of the network often fails to contribute much to test performance beyond what the lottery ticket subnetwork was able to achieve, as though there were a chasm in model space that the other subnetworks were unable to cross to reach a solution.
Evolution seems to have found a way around this problem, at least by the time the neocortex came along, in the sense that the brain seems adept at giving every subnetwork a useful job.
Again, I think the nature of the neocortex as a generative simulator is what makes this feasible with sparse training data. The structure and dynamics of the neocortex have enough in common statistically with the structure and dynamics of the natural world that it is easy for it to align with experience. In contrast, the structure and (lack of) dynamics of current DL systems make them more brittle when trying to make sense of the natural world.
All that being said, I realize that my comments might be taking things down a rabbit hole. But I appreciate your feedback and look forward to seeing your perspective fleshed out more in the rest of this sequence.