Hey thanks for explaining this—makes sense to me and I think we are mostly in agreement. Using the proxy signal as a supervised learning target to recognize the learned target pattern in IT is a straightforward way to implement the matching, but probably not quite complete in practice. I suspect you also need to combine that with some strong priors to correctly carve out the target concept.
Consider the equivalent example of trying to train a highly accurate cat image detector given a dataset containing say 20% cats combined with a crappy low complexity proxy cat detector to provide the labels. Can you really bootstrap improve discriminative models in that way with non-trivial proxy label noise? I suspect that the key to making this work is using the powerful generative model of the cortex as a regularizer, so you train it to recognize images the proxy detector labels as cats that are also close to the generative model’s data manifold. If you then reoptimize (in evolutionary time) the proxy detector to leverage that I think it makes the problem much more tractable. The generative model allows you to make the learned model far more selective around the actual data manifold to increase robustness. In very simple vague terms the model would then be learning the combination of high proxy probability combined with low distance to the data manifold of examples from the critical training set.
Later if you then test OoD on vague non-cats (dogs, stuffed animals) not encountered in training that would confuse the simple proxy the learned model can reject those—even though it never saw them during critical training—simply because they are far from the generative manifold, and the learned model is ‘shrunk’ to fit that manifold.
I do agree the amygdala does seem like a good fit for the location of the learned symbol circuit, although at that point it raises the question of why not also just have the proxy in the amygdala? If the amygdala has the required inputs from LGN and/or V1 it would be my guess that it could also just colocate the innate proxy circuit. (I haven’t looked in the lit to see if those connections exist)
Also 6 seems required for the system to work as well in adulthood as it typically does, and yet also explain the out of distribution failures for imprinting etc. (Once the IT representation is learned you want to use that exclusively, as it should be strictly superior to the proxy circuit. This seems a little weird at first, but the)
The hope is that this same mechanism which seems well suited for handling imprinting also works for grounding sexual attraction (as an elaboration of imprinting) and then more complex concepts like representations of other’s emotions from facial expression, vocal tone, etc proxies, and then combining that with empathic simulation to ground a model of other’s values/utility for social game theory, altruism, etc.
The hope is that this same mechanism which seems well suited for handling imprinting also works for grounding sexual attraction (as an elaboration of imprinting) and then more complex concepts like representations of other’s emotions from facial expression, vocal tone, etc proxies, and then combining that with empathic simulation to ground a model of other’s values/utility for social game theory, altruism, etc.
Yes, that is my hope too! And the main thing I’m working on most days is trying to flesh out the details.
I do agree the amygdala does seem like a good fit for the location of the learned symbol circuit, although at that point it raises the question of why not also just have the proxy in the amygdala? If the amygdala has the required inputs from LGN and/or V1 it would be my guess that it could also just colocate the innate proxy circuit. (I haven’t looked in the lit to see if those connections exist)
For example, I claim that all the vision-related inputs to the amygdala have at some point passed through at least one locally-random filter stage (cf. “pattern separation” in neuro literature or “compressed sensing” in DSP literature). That’s perfectly fine if the amygdala is just going to use those inputs as feedstock for an SL model. SL models don’t need to know a priori which input neuron is representing which object-level pattern, because it’s going to learn the connections, so if there’s some randomness involved, it’s fine. But the randomness would be a very big problem if the amygdala needs to use those input signals to calculate a ground-truth proxy.
As another example, a ground-truth proxy requires zero adjustable parameters (because how would you adjust them?), whereas a learning algorithm does well with as many adjustable parameters as possible, more or less.
So I see these as very different algorithmic tasks—so different that I would expect them to wind up in different parts of the brain, just on general principles.
The amygdala is a hodgepodge grouping of nuclei, some of which are “really” (embryologically & evolutionarily) part of the cortex, and the rest of which are “really” part of the striatum (ref). So if we’re going to say that the cortex and striatum are dedicated to running within-lifetime learning algorithms (which I do say), then we should expect the amygdala to be in that same category too.
By contrast, SC is in the brainstem, and if you go far enough back, SC is supposedly a cousin of the part of the pre-vertebrate (e.g. amphioxus) nervous system that implements a simple “escape circuit” by triggering swimming when it detects a shadow—in other words, a part of the brain that triggers an innate reaction based on a “hardcoded” type of pattern in visual input. So it would make sense to say that the SC is still more-or-less doing those same types of calculations.
Hey thanks for explaining this—makes sense to me and I think we are mostly in agreement. Using the proxy signal as a supervised learning target to recognize the learned target pattern in IT is a straightforward way to implement the matching, but probably not quite complete in practice. I suspect you also need to combine that with some strong priors to correctly carve out the target concept.
Consider the equivalent example of trying to train a highly accurate cat image detector given a dataset containing say 20% cats combined with a crappy low complexity proxy cat detector to provide the labels. Can you really bootstrap improve discriminative models in that way with non-trivial proxy label noise? I suspect that the key to making this work is using the powerful generative model of the cortex as a regularizer, so you train it to recognize images the proxy detector labels as cats that are also close to the generative model’s data manifold. If you then reoptimize (in evolutionary time) the proxy detector to leverage that I think it makes the problem much more tractable. The generative model allows you to make the learned model far more selective around the actual data manifold to increase robustness. In very simple vague terms the model would then be learning the combination of high proxy probability combined with low distance to the data manifold of examples from the critical training set.
Later if you then test OoD on vague non-cats (dogs, stuffed animals) not encountered in training that would confuse the simple proxy the learned model can reject those—even though it never saw them during critical training—simply because they are far from the generative manifold, and the learned model is ‘shrunk’ to fit that manifold.
I do agree the amygdala does seem like a good fit for the location of the learned symbol circuit, although at that point it raises the question of why not also just have the proxy in the amygdala? If the amygdala has the required inputs from LGN and/or V1 it would be my guess that it could also just colocate the innate proxy circuit. (I haven’t looked in the lit to see if those connections exist)
Also 6 seems required for the system to work as well in adulthood as it typically does, and yet also explain the out of distribution failures for imprinting etc. (Once the IT representation is learned you want to use that exclusively, as it should be strictly superior to the proxy circuit. This seems a little weird at first, but the)
The hope is that this same mechanism which seems well suited for handling imprinting also works for grounding sexual attraction (as an elaboration of imprinting) and then more complex concepts like representations of other’s emotions from facial expression, vocal tone, etc proxies, and then combining that with empathic simulation to ground a model of other’s values/utility for social game theory, altruism, etc.
Yes, that is my hope too! And the main thing I’m working on most days is trying to flesh out the details.
For example, I claim that all the vision-related inputs to the amygdala have at some point passed through at least one locally-random filter stage (cf. “pattern separation” in neuro literature or “compressed sensing” in DSP literature). That’s perfectly fine if the amygdala is just going to use those inputs as feedstock for an SL model. SL models don’t need to know a priori which input neuron is representing which object-level pattern, because it’s going to learn the connections, so if there’s some randomness involved, it’s fine. But the randomness would be a very big problem if the amygdala needs to use those input signals to calculate a ground-truth proxy.
As another example, a ground-truth proxy requires zero adjustable parameters (because how would you adjust them?), whereas a learning algorithm does well with as many adjustable parameters as possible, more or less.
So I see these as very different algorithmic tasks—so different that I would expect them to wind up in different parts of the brain, just on general principles.
The amygdala is a hodgepodge grouping of nuclei, some of which are “really” (embryologically & evolutionarily) part of the cortex, and the rest of which are “really” part of the striatum (ref). So if we’re going to say that the cortex and striatum are dedicated to running within-lifetime learning algorithms (which I do say), then we should expect the amygdala to be in that same category too.
By contrast, SC is in the brainstem, and if you go far enough back, SC is supposedly a cousin of the part of the pre-vertebrate (e.g. amphioxus) nervous system that implements a simple “escape circuit” by triggering swimming when it detects a shadow—in other words, a part of the brain that triggers an innate reaction based on a “hardcoded” type of pattern in visual input. So it would make sense to say that the SC is still more-or-less doing those same types of calculations.