People came up with convnets because fully-connected, randomly initialized networks were not great at image classification and we needed some inductive bias in the form of a locality constraint to learn in a reasonable time. That’s the point I wanted to make.
I’m pretty confused here. To me, that doesn’t seem to support your point, which suggests that one of us is confused, or else I don’t understand your point.
Specifically: If I switch from a fully-connected DNN to a ConvNet, I’m switching from one learning-from-scratch algorithm to a different learning-from-scratch algorithm.
I feel like your perspective is that {inductive biases, non-learning-from-scratch} are a pair that go inexorably together, and you are strongly in favor of both, and I am strongly opposed to both. But that’s not right: they don’t inexorably go together. The ConvNet example proves it.
I am in favor of learning-from-scratch, and I am also in favor of specific designed inductive biases, and I don’t think those two things are in opposition to each other.
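To make that concrete, here's a minimal sketch of what I mean (assuming PyTorch; the layer sizes and the random data are purely illustrative). Both models below start from random weights and are trained by exactly the same procedure, so both are learning from scratch; the only difference is the locality / weight-sharing bias built into the ConvNet's architecture, not anything in the initial weights.

```python
# Minimal sketch (PyTorch): two learning-from-scratch models that differ
# only in their architectural inductive bias, not in how they are trained.
import torch
import torch.nn as nn

# Fully-connected model: no locality or weight-sharing assumptions.
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# ConvNet: locality + translation weight-sharing are hardcoded in the
# architecture, but the weights themselves are still randomly initialized.
convnet = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 10),
)

def train_step(model, images, labels):
    """Identical learning rule for both models: plain SGD on cross-entropy."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss = nn.functional.cross_entropy(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Same data, same learning rule -- only the architecture differs.
images = torch.randn(32, 1, 28, 28)
labels = torch.randint(0, 10, (32,))
print(train_step(mlp, images, labels), train_step(convnet, images, labels))
```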
Yes, it seems conceivable to me that a learning-from-scratch program ends up in a (functionally) very similar state to the brain.
I think you’re misunderstanding me. Random chunks of matter do not learn language, but the neocortex does. There’s a reason for that—aspects of the neocortex are designed by evolution to do certain computations that result in the useful functionality of learning language (as an example). There is a reason that these particular computations, unlike the computations performed by random chunks of matter, are able to learn language. And this reason can be described in purely computational terms—“such-and-such process performs a kind of search over this particular space, and meanwhile this other process breaks down the syntactic tree using such-and-such algorithm…”, I dunno, whatever. The point is, this kind of explanation does not talk about subplates and synapses, it talks about principles of algorithms and computations.
Whatever that explanation is, it’s a thing that we can turn into a design spec for our own algorithms, which, powered by the same engineering principles, will do the same computations, with the same results.
In particular, our code will be just as data-efficient as the neocortex is, and it will make the same types of mistakes in the same types of situations as the neocortex does, etc. etc.
when you record activity from neurons in the cortex of an animal that had zero visual experience prior to the experiment (lid-suture), they are still orientation-selective
is that true even if there haven’t been any retinal waves?
Yeah, the feeling’s mutual 😅 But the discussion is also very rewarding for me, thank you for engaging!
I am in favor of learning-from-scratch, and I am also in favor of specific designed inductive biases, and I don’t think those two things are in opposition to each other.
A couple of thoughts:
Yes, I agree that the inductive bias (/genetically hardcoded information) can live in different components: the learning rule, the network architecture, or the initialization of the weights. So learning-from-scratch is logically compatible with inductive biases—we can just put all the inductive bias into the learning rule and the architecture and none in the weights.
But the hardcoded info can enter the weights very rapidly from the architecture and the learning rule (e.g., the first step of the learning rule could simply set all the weights to the values appropriate for an adult brain; or, more realistically, a fully-connected network can effectively become a ConvNet by setting most of its connections to zero, as in the sketch below). Therefore I don’t see what it buys you to assume that the weights are free of inductive bias.
There is also a case to be made that in the actual biological brain the weights are not initialized randomly; see e.g. this work on clonally related neurons.
Something that is not widely appreciated outside of neuroscience: “Learning” in the brain is as much a structural process as it is a “changing weights” process. This is particularly true during development, but it holds into adulthood as well: activity-dependent learning rules do not only adjust the weights of connections; they can also prune bad connections and add new ones. The brain simultaneously produces activity, which induces plasticity, which changes the circuit, which in turn produces slightly different activity.
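On the point about recovering a ConvNet from a fully-connected network, here is a rough numpy sketch (a toy 1-D case with made-up sizes): the convolution is just a fully-connected weight matrix in which most entries are clamped to zero and the remaining ones are tied, so the architectural "hardcoded info" can equally well be read off as a particular setting of the weights.

```python
# Rough numpy sketch: a 1-D convolution written as a fully-connected layer
# whose weight matrix is mostly zeros (locality) with tied entries (sharing).
import numpy as np

rng = np.random.default_rng(0)
n_in, kernel_size = 10, 3           # toy sizes, purely illustrative
n_out = n_in - kernel_size + 1      # "valid" convolution output length
kernel = rng.normal(size=kernel_size)

# Build the equivalent fully-connected weight matrix.
W = np.zeros((n_out, n_in))
for i in range(n_out):
    W[i, i:i + kernel_size] = kernel   # same kernel, shifted; zeros elsewhere

x = rng.normal(size=n_in)
fc_output = W @ x                      # fully-connected view
# np.convolve flips its second argument, so flip the kernel back to get
# the cross-correlation that the weight matrix above computes.
conv_output = np.convolve(x, kernel[::-1], mode="valid")
assert np.allclose(fc_output, conv_output)
```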
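And a toy illustration of the structural-plasticity point (the update rule here is entirely made up, just to convey the flavor): an activity-dependent step that adjusts weights, prunes weak connections, and grows new ones, so the circuit and the weights change together.

```python
# Toy sketch (made-up rule): "learning" that is structural as well as
# weight-based. Activity drives a Hebbian-ish weight update, weak
# connections get pruned, and a few new connections are grown.
import numpy as np

rng = np.random.default_rng(0)
n = 50
weights = rng.normal(scale=0.1, size=(n, n))
mask = rng.random((n, n)) < 0.2            # which connections currently exist

for step in range(100):
    pre = rng.random(n)                    # stand-in for presynaptic activity
    post = (weights * mask) @ pre          # activity produced by the circuit

    # 1) Plasticity adjusts the weights of existing connections (Hebbian-ish).
    weights += 0.01 * np.outer(post, pre) * mask
    weights *= 0.999                       # mild decay to keep things bounded

    # 2) ...but it also changes the structure: prune weak connections...
    mask &= np.abs(weights) > 0.001

    # 3) ...and grow a few new ones at random (stand-in for synaptogenesis).
    new = rng.random((n, n)) < 0.001
    weights[new & ~mask] = rng.normal(scale=0.1, size=(new & ~mask).sum())
    mask |= new

print("existing connections:", mask.sum())
```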
The point is, this kind of explanation does not talk about subplates and synapses, it talks about principles of algorithms and computations.
That sounds a lot more like cognitive science than neuroscience! This is completely fine (I did my undergrad in CogSci), but I think it requires a different set of arguments from the ones you are providing in your post. If you want to make a CogSci case for learning from scratch, then your argument has to be a lot more constructive (i.e., literally walk us through the steps of how your proposed system can learn all of, or a lot of, what humans can learn). Either you look at what is actually there in the brain (subplate, synapses, …), describe how these things interact, and (correctly) infer that this is sufficient to produce a mind (the neuroscience strategy); or you propose an abstract system, demonstrate that it can do the same things as the mind, and then demonstrate that the components of the abstract system can be identified with the biological brain (the CogSci strategy). I think you’re skipping step two of the CogSci strategy.
Whatever that explanation is, it’s a thing that we can turn into a design spec for our own algorithms, which, powered by the same engineering principles, will do the same computations, with the same results.
I’m on board with that. I anticipate that the design spec will contain (the equivalent of) a ton of hardcoded genetic stuff also for the “learning subsystem”/cortex. From a CogSci perspective, I’m willing to assume that this genetic stuff could be in the learning rule and the architecture, not in the initial weights. From a neuroscience perspective, I’m not convinced that’s the case.
is that true even if there haven’t been any retinal waves?
Blocking retinal waves messes up the cortex pretty substantially (same as if the animal were born without eyes). There is also the beta-2 knockout mouse, which has retinal waves, but with weaker spatiotemporal correlations. As a consequence, beta-2 mice fail to track moving gratings and have disrupted receptive fields.