The evidence seems pretty clear that the SC controls unconscious saccades/gazes. Given that background it makes perfect sense the SC is also a good location for simple crude innate detectors which bias saccades towards important targets: especially for infants, because human infants are barely functionally conscious at birth and so in the beginning the SC may have complete control. But gradually the higher ‘conscious loops’ involving BG, PFC and various other modules begin to take more control through the FEF (although not always of course).
That all seems compatible with your evidence—I also remember reading that there are central pattern generators which actually start training the visual cortex in the womb on simple face like patterns. But I believe in a general rule of three for the brain: whenever you find evidence that the brain is doing something two different ways that both seem functionally correct, it’s probably using both of those methods and some third one you haven’t thought of yet. And SC having innate circuits to bias saccades seems likely.
But that same evidence doesn’t show the SC is much involved in higher level conscious decisions about whether to stare at a painting or listen to a song for a few minutes vs eating ice cream. That is all reward-shaped high level planning involving the various known dopaminergic decision pathways combined with serotonergic (and other) pathways that feed into general info-value.
Likewise, I wasn’t sure before, but my current impression is that you do NOT claim that info-value is a grand unified theory that completely explains what we are looking at at any given moment.
I do claim it is the likely grand unified theory that most completely explains conscious decisions about info consumption choices in adults—and the evidence from which I sampled earlier is fairly extensive IMHO; whereas innate SC circuits explain infant gazes (in humans; SC probably has a larger role in older, smaller-brained vertebrates). Moreover, if we generalize from SC to other similar subcortical structures, I do agree those together mostly control the infant for the first few years—as the higher level loops which conscious thinking depends on all require significant training.
Also—as I mentioned earlier I agree that fear of heights is probably innate, simple food taste biases are innate, obviously sexual attraction has some innate bootstrapping, etc, so I’m open to the idea there is some landscape biasing in theory; but clearly the SC is unlikely to be involved in food taste shaping, and I don’t think you have shown much convincing evidence it is involved in visual taste shaping. But clearly there must be some innate visual shaping for at least sexual attraction—so evidence that the SC drives that would also be good evidence it drives some landscape biasing, for example. But it seems like those reward shapers would need to be looking primarily at higher visual cortex rather than V1. So evidence that the SC’s inputs shift from V1 in infants up to higher visual cortex in adulthood would also be convincing, as that seems somewhat necessary for it to be involved in reward prediction of higher-level learned visual patterns.
I’m also curious about this part:
And if it’s not the cortex, the SC is the only other possibility.
And generally if we replace the reward shaping/bias of “savanna landscape” with “sexually attractive humanoid” then I’m more on board with the concept of something that is highly likely an innate circuit somewhere. (I don’t even buy the evolutionary argument for a savanna landscape bias—humans spread out to many ecological niches including coastal zones which are nothing like the savanna.)
Here are some of my complaints about info-value as a grand unified theory by itself (i.e., in the absence of innate biases towards certain types of information over other types):
There are endless fractal-like depths of complexity in rocks, and there are endless fractal-like depths of complexity in ants, and there are endless fractal-like depths of complexity in birdsong, and there are endless fractal-like depths of complexity in the shape of trees, etc. So “follow the gradient where you’re learning new things” by itself seems wildly under-constrained. You cite video-game ML papers, but in this respect, video-games (especially 1980s video-games) are not analogous to the real world. You can easily saturate the novelty in Pac-Man, and then the only way to get more novelty is to “progress” in the game in roughly the way that the game-designers intended. Pac-Man does not have 10,000 elaborate branching fascinating “side-quests” that don’t advance your score, right? If it did, I claim those ML papers would not have worked. But the real world does have “side quests” like that. You can get a lifetime supply of novelty by just closing your eyes and thinking about patterns in prime numbers, etc. Yet children reliably learn relevant things like language and culture much more than irrelevant things like higher-order patterns in pebble shape and coloration. Therefore, I am skeptical of any curiosity / novelty drive completely divorced from “hardcoded” drives that induce disproportionate curiosity / interest in certain specific types of things (e.g. human speech sounds) over other things (e.g. the detailed coloration of pebbles).
(I think these hardcoded drives, like all hardcoded drives, are based on relatively simple heuristics, as opposed to being exquisitely aimed at specific complex concepts like “hunting”. I think “some simple auditory calculation that disproportionately triggers on human speech sounds” is a very plausible example.)
If your response is “There is an objective content-neutral metric in which human speech sounds are more interesting than the detailed coloration of pebbles”, then I’m skeptical, especially if the metric looks like a kind of “greedy algorithm” that does not rely on the benefit of hindsight. In other words, once you’ve invested years into learning to decode human speech sounds, then it’s clear that they are surprisingly information-rich. But before making that investment, I think that human speech sounds wouldn’t stand out compared to the coloration of pebbles or the shapes of trees or the behavior of ants or whatever. Or at least they wouldn’t stand out so much that it would explain human children’s attention to them.
We need to explain the fact that (I claim) different people seem very interested in different things, and these interests are heritable, e.g. interest-in-people versus interest-in-machines. Hardcoded drives in “what is interesting” would explain that, and I’m not sure what else would.
This is unlikely to convince you, but there is a thing called “specific language impairment” where (according to my understanding) certain otherwise-intelligent kids are unusually inattentive to language, and wind up learning language much slower than their peers (although they often catch up eventually). I’m familiar with this because I think one of my kids has a mild case. If he’s playing, and someone talks to him, he rarely orients to it, just as I rarely orient to bird sounds if I’m in the middle of an activity. Speech just doesn’t draw his attention much! And both his tendency to converse and ability to articulate clearly are way below age level. I claim that’s not a coincidence—learning follows attention. Anyway, a nice theory of this centers around an innate human-speech-sound detector being less active than usual. (Conversely, long story, but I think one aspect of autism is kinda the opposite of that—overwhelming hypersensitivity to certain stimuli often including speech sounds and eye contact, which then leads to avoidance behavior.)
There’s some evidence that a certain gene variant (ASPM) helps people learn tonal languages like Chinese, and the obvious-to-me mechanism is tweaking the innate human-speech-sound heuristics. That’s probably unlikely to convince you, because there are other possible mechanisms too, and ASPM is expressed all over the brain and you can’t ethically do experiments to figure out what part of the brain is mediating this.
whenever you find evidence that the brain is doing something two different ways that both seem functionally correct, it’s probably using both of those methods and some third one you haven’t thought of yet
I strongly disagree with the idea that SC and cortex are doing similar things. See discussion here. I think the cortex + striatum is fundamentally incapable of having an innate snake detector, because the cortex + striatum is fundamentally implementing a learning algorithm. Given a ground-truth loss function for the presence / absence of snakes, the cortex + striatum can do an excellent job learning to detect snakes in particular. But without such a loss function, they can’t. (Well, they can detect snakes without a special loss function, but only as “just another learned latent variable”. This latent variable couldn’t get tied to any special innate reaction, in the absence of trial-and-error experience.)
Anyway, I claim that SC is playing the role of implementing the snake-heuristic calculations that underlie that loss function. (Among other things.)
But that same evidence doesn’t show the SC is much involved in higher level conscious decisions about whether to stare at a painting or listen to a song for a few minutes vs eating ice cream. That is all reward-shaped high level planning involving the various known dopaminergic decision pathways combined with serotonergic (and other) pathways that feed into general info-value.
SC projects to VTA/SNc, which is related to whether we assign things positive/negative valence, find them pleasant/unpleasant, etc. It’s not the only contribution, but I claim it’s one contribution.
clearly the SC is unlikely to be involved in food taste shaping, and I don’t think you have shown much convincing evidence it is involved in visual taste shaping.
I think the relevant unit is “brainstem and hypothalamus”, of which the SC is one part, the part that seems like it has the right inputs and multi-layer architecture to do things like calculate heuristics on the visual FOV. Food taste shaping is a different part of the brainstem, namely the gustatory nucleus of the medulla.
But it seems like those reward shapers would need to be looking primarily at higher visual cortex rather than V1.
I’m surprised that you wrote this. I thought you were on board with the idea that we should think of visual cortex as loosely (or even tightly) analogous to deep learning? Let’s train a 12-layer randomly-initialized ConvNet, and look at the vector of activations from layer 10, and decide on that basis whether you’re looking at a person, in the absence of any ground truth. It’s impossible, right? The ConvNet was randomly initialized, you can’t get any object-level information from the fact that neuron X in layer 10 has positive or negative activation, because it’s not a priori determined what role neuron X is going to wind up playing in the trained model.
We need ground truth somehow, and my claim is that SC provides it. So my mainline expectation is that SC gets visual information in a way that bypasses the cortex altogether. This is at least partly true (retina→LGN→SC pathway). SC does get inputs from visual cortex, as it turns out, which had me confused for a while but I’m OK with it now. That’s a long story, but I still think the cortical input is unrelated to how SC detects human faces and snakes and whatnot.
There are endless fractal-like depths of complexity in rocks, and there are endless fractal-like depths of complexity in ants, and there are endless fractal-like depths of complexity in birdsong, and there are endless fractal-like depths of complexity in the shape of trees, etc. So “follow the gradient where you’re learning new things” by itself seems wildly under-constrained.
There are no ‘endless fractal-like depths of complexity’ in the retinal images of rocks or ants or trees, which is what is actually relevant here. For any model, a flat uniform-color wall has near-zero compressible complexity, as does a picture of noise (max entropy, but not learnable). Real-world images have learnable complexity which crucially varies based on both the image and the model’s current knowledge. But it’s never “endless”: generally it’s going to be on the order of, or less than, the image entropy the cortex gets from the retina, which is comparable to compression with modern codecs.
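As a toy illustration of the difference between raw entropy and compressible (learnable) complexity, here is a quick sketch using zlib as a crude stand-in for a model’s compression ability; the three “images” are synthetic and all sizes are illustrative assumptions:

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
n = 64 * 64  # a 64x64 8-bit "retinal image", flattened to bytes

wall = bytes(n)                                             # flat uniform color
noise = rng.integers(0, 256, n, dtype=np.uint8).tobytes()   # pure pixel noise
row = rng.integers(0, 256, 64, dtype=np.uint8).tobytes()
textured = row * 64                                         # genuine structure: a repeating texture

def compressed_bits(img: bytes) -> int:
    # compressed size as a crude proxy for complexity relative to the "model" (zlib)
    return 8 * len(zlib.compress(img, 9))
```

The wall compresses to almost nothing, the noise barely compresses at all, and the structured image lands in between: learnable complexity is bounded by, and usually far below, the raw image entropy.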
You cite video-game ML papers,
Actually, in this thread I cited neurosci papers: first, that curiosity/info-value is a reward processed like hunger[1]; and a review article from 2020[2], which is an update of a similar 2015 paper[3].
So “follow the gradient where you’re learning new things” by itself seems wildly under-constrained.
Sure—but curiosity/info-gain obviously isn’t all of reward, so the various other components can also steer behavior paths towards fitness-relevant directions, which can then indirectly bias the trajectory of the curiosity-driven learning, as it’s always relative to the model’s current knowledge and thus the experience trajectory.
Therefore, I am skeptical of any curiosity / novelty drive completely divorced from “hardcoded” drives that induce disproportionate curiosity / interest in certain specific types of things (e.g. human speech sounds) over other things (e.g. the detailed coloration of pebbles).
Recall that the infant is mostly driven by subcortical structures and innate patterns for the early years, and during all this time it is absolutely bombarded with human speech as the primary consistent complex audio signal. There may be some attentional bias towards human speech, but it may not be necessary, as there aren’t many other audio streams that could come close to competing. Birdsong is both less interesting and far less pervasive/frequent in most children’s audio experience. Also, ‘visually interesting pebbles’ don’t seem that different from other early children’s toys: it seems children would find them interesting (although their shape is typically boring).
I strongly disagree with the idea that SC and cortex are doing similar things.
I didn’t say they did—I said I’m aware of two proposals for an innate learning bias for faces: CPGs pretraining the viscortex in the womb, and innate attentional bias circuits in the SC. These are effectively doing a similar thing.
We need ground truth somehow, and my claim is that SC provides it. So my mainline expectation is that SC gets visual information in a way that bypasses the cortex altogether.
For attentional bias/shaping the SC likely can only support very simple pattern biases close to a linear readout. So a simple bias to attend to faces seems possible, but I was actually talking about sexual attraction when I said:
But clearly there must be some innate visual shaping for at least sexual attraction—so evidence that the SC drives that would also be good evidence it drives some landscape biasing, for example. But it seems like those reward shapers would need to be looking primarily at higher visual cortex rather than V1.
For sexual attraction the patterns are just too complex, so they are represented in IT or similar higher visual cortex. Any innate circuit that references human body shape images and computes their sexual attraction must get that input from higher viscortex—which then leads to the whole symbol grounding problem—as you point out, and I naturally agree.
But regardless of the specific solution to the symbol grounding problem, a consequence of that solution is that the putative brain region computing attraction value of images of humanoid shapes would need to compute that primarily from higher viscortex/IT input.
I think your model may be something like: the SC computes an attentional bias which encodes all the innate sexiness geometry, and thus guides us to spend more time saccading at sexy images rather than others, and possibly also outputs some reward info for this.
But that could not work as stated, simply because the sexiness concept is very complex and requires a deepnet to compute (symmetry, fat content, various feature ratios, etc).
Also this must be true, because otherwise we wouldn’t see the failures of sexual imprinting in birds that we do in fact observe.
So how can the genome best specify a complex concept innately using the fewest bits? Just indexing neurons in the learned cortex directly would be bit-minimal, but as you point out that isn’t robust.
However the topographic organization of cortex can help, as it naturally clusters neurons semantically.
Another way to ‘locate’ specific learned neurons more robustly is through proxy matching, where you have dumb simple humanoid shape detectors and symmetry detectors etc encoding a simple sexiness visual concept—which could potentially be in the SC. But then during some critical window the firing patterns of those proxy circuits are used to locate the matching visual concept in visual cortex and connect to that. In other words, you can use the simple innate proxy circuit to indirectly locate a cluster of neurons in cortex, simply based on firing pattern correlation.
This allows the genome to link to a highly complex concept by specifying a low-complexity proxy match for that concept during the organism’s earlier, low-complexity ‘larval’ stage.
Proxy matching implies that after critical-period training, whichever neurons represent innate sexiness must then shift to get their input from higher viscortex: IT rather than just V1, and certainly not LGN.
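A minimal numpy sketch of the firing-pattern-correlation step described above; the planted “concept cluster”, error rates, and all other numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_stim = 200, 1000

# Ground truth: whether the target concept is present in each stimulus.
concept = rng.random(n_stim) < 0.3

# Simulated cortical activity: background noise everywhere, plus a planted
# "concept cluster" (units 40-49) that fires extra when the concept is present.
activity = rng.normal(size=(n_units, n_stim))
activity[40:50] += 2.0 * concept

# Crude innate proxy detector: agrees with the concept, with 15% errors.
proxy = np.logical_xor(concept, rng.random(n_stim) < 0.15).astype(float)

# Proxy matching: during the critical window, locate the cortical units whose
# firing correlates most strongly with the proxy signal.
corr = np.array([np.corrcoef(u, proxy)[0, 1] for u in activity])
matched = np.argsort(corr)[-10:]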
Another related possibility is that the SC is just used to create the initial bootstrapping signal, and then some other brain region actually establishes the connection to innate downstream dependencies of sexiness and learned sexiness—so separating out the sexiness proxy from the used sexiness concept.
Anyway my point was more that innate sexual attraction must be encoded somewhere, and any evidence that the SC is crucially involved with that is evidence it is crucially involved with other innate visual bias/shaping.
Thanks again for taking the time to chat, I am finding this super helpful in understanding where you’re coming from.
Your description of proxy matching is a close match to what I’m thinking. (Sorry if I’ve been describing it poorly!)
I think I got confused because I’m mapping it to neuroanatomy differently than you. I think SC is the “proxy” part, but the “matching” part is somewhere else, not SC. For example, it might look something like this:
1. SC calculates a proxy, based on LGN→SC inputs. The output of this calculation is a signal, which I’ll call “Fits proxy?”
2. The “Fits proxy?” signal then gets sent (indirectly) to a certain part of the amygdala, where it’s used as a “ground truth” for supervised learning.
3. This part of the amygdala builds a trained model. The input to the trained model is (mostly-high-level) visual information, especially from IT. The output of the trained model is some signal, which I’ll call “Fits model?”
4. The SC’s “Fits proxy?” signal and the amygdala’s “Fits model?” signal both go down to the hypothalamus and brainstem, possibly hitting the very same neurons that trigger specific innate reactions.
5. Optionally, the trained model in the amygdala could stop updating itself after a critical period early in life.
6. Also optionally, as the animal gets older, the “Fits proxy?” signal could have less and less influence on those innate reaction neurons in the hypothalamus & brainstem, while the “Fits model?” signal would have more and more influence.
(This is one example; there are a bunch of other variations on this theme, including ones where you replace “part of the amygdala” with other parts of the forebrain like nucleus accumbens shell or lateral septum, and also where the proxy is coming from other places besides SC.)
(This is a generalization of “calculating correlations”. If the amygdala trained model is only one “layer” in the deep learning sense, then it would be just calculating linear correlations between IT signals and the proxy, I think. My best guess is that the amygdala is learning a two-layer feedforward model (more or less), so a bit more complicated than linear correlations, although low confidence on that.)
Again, since the trained model is in the amygdala, not SC, there’s no need to “shift” the SC’s inputs to IT. That’s why I was confused by what you wrote. :)
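As a sanity check that noisy proxy supervision can work at all, here is a minimal sketch of the simplest (one-layer, linear-correlation) version of that trained model, on toy synthetic “IT” features; the dimensions, noise rate, and linear decodability of the concept are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "IT" features: 50 per stimulus, with the true concept linearly
# decodable from them (the true decoder is hidden from the model).
n, d = 2000, 50
X = rng.normal(size=(n, d))
concept = (X @ rng.normal(size=d) > 0).astype(float)

# The SC's "Fits proxy?" signal: tracks the concept, with 10% of labels flipped.
fits_proxy = np.logical_xor(concept, rng.random(n) < 0.10).astype(float)

# One-layer amygdala model: a linear readout fit against the noisy proxy
# labels, which amounts to computing linear correlations between IT and proxy.
w, *_ = np.linalg.lstsq(X, fits_proxy - 0.5, rcond=None)
fits_model = (X @ w > 0).astype(float)

# The learned "Fits model?" signal tracks the *true* concept much better than chance.
accuracy = float((fits_model == concept).mean())
```

The point is just that the label noise averages out: the learned readout ends up closer to the true concept boundary than the proxy that supervised it.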
Hey thanks for explaining this—makes sense to me and I think we are mostly in agreement. Using the proxy signal as a supervised learning target to recognize the learned target pattern in IT is a straightforward way to implement the matching, but probably not quite complete in practice. I suspect you also need to combine that with some strong priors to correctly carve out the target concept.
Consider the equivalent example of trying to train a highly accurate cat image detector given a dataset containing, say, 20% cats, combined with a crappy low-complexity proxy cat detector to provide the labels. Can you really bootstrap an improved discriminative model that way, with non-trivial proxy label noise? I suspect the key to making this work is using the powerful generative model of the cortex as a regularizer: you train the model to recognize images that the proxy detector labels as cats and that are also close to the generative model’s data manifold. If you then reoptimize (in evolutionary time) the proxy detector to leverage that, I think it makes the problem much more tractable. The generative model lets you make the learned model far more selective around the actual data manifold, increasing robustness. In very simple, vague terms, the model would then be learning the combination of high proxy probability and low distance to the data manifold of examples from the critical training set.
Later, if you test OoD on vague non-cats (dogs, stuffed animals) that were not encountered in training and that would confuse the simple proxy, the learned model can reject them—even though it never saw them during critical training—simply because they are far from the generative manifold, and the learned model is ‘shrunk’ to fit that manifold.
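A toy sketch of the “proxy fires AND close to the manifold” rule, using a fitted Gaussian as a stand-in generative model; the feature space, thresholds, and the linear proxy are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10

# Critical-period training set: "cat" feature vectors clustered around a prototype.
prototype = np.ones(d)
cats = prototype + 0.5 * rng.normal(size=(500, d))

# Crude stand-in for the cortex's generative model: a Gaussian fit to the data.
mean = cats.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(cats.T) + 1e-6 * np.eye(d))

def manifold_dist(x):
    diff = x - mean
    return float(diff @ cov_inv @ diff)  # squared Mahalanobis distance

def proxy_fires(x):
    # the crappy low-complexity proxy: a single linear threshold,
    # which also fires on off-manifold look-alikes
    return bool(x @ prototype > 0.3 * d)

def accept(x, thresh=40.0):
    # learned detector: proxy fires AND the example is near the data manifold
    return proxy_fires(x) and manifold_dist(x) < thresh

cat = prototype + 0.5 * rng.normal(size=d)  # in-distribution example
lookalike = 4.0 * prototype                 # fools the proxy, but far off-manifold
```

The look-alike triggers the proxy yet is rejected by the manifold term, even though nothing like it appeared during training, which is the OoD-rejection behavior described above.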
I do agree the amygdala does seem like a good fit for the location of the learned symbol circuit, although at that point it raises the question of why not also just have the proxy in the amygdala? If the amygdala has the required inputs from LGN and/or V1 it would be my guess that it could also just colocate the innate proxy circuit. (I haven’t looked in the lit to see if those connections exist)
Also, item 6 seems required for the system to work as well in adulthood as it typically does while also explaining the out-of-distribution failures for imprinting etc. (Once the IT representation is learned you want to use it exclusively, as it should be strictly superior to the proxy circuit. This seems a little weird at first.)
The hope is that this same mechanism which seems well suited for handling imprinting also works for grounding sexual attraction (as an elaboration of imprinting) and then more complex concepts like representations of other’s emotions from facial expression, vocal tone, etc proxies, and then combining that with empathic simulation to ground a model of other’s values/utility for social game theory, altruism, etc.
The hope is that this same mechanism which seems well suited for handling imprinting also works for grounding sexual attraction (as an elaboration of imprinting) and then more complex concepts like representations of other’s emotions from facial expression, vocal tone, etc proxies, and then combining that with empathic simulation to ground a model of other’s values/utility for social game theory, altruism, etc.
Yes, that is my hope too! And the main thing I’m working on most days is trying to flesh out the details.
I do agree the amygdala does seem like a good fit for the location of the learned symbol circuit, although at that point it raises the question of why not also just have the proxy in the amygdala? If the amygdala has the required inputs from LGN and/or V1 it would be my guess that it could also just colocate the innate proxy circuit. (I haven’t looked in the lit to see if those connections exist)
For example, I claim that all the vision-related inputs to the amygdala have at some point passed through at least one locally-random filter stage (cf. “pattern separation” in neuro literature or “compressed sensing” in DSP literature). That’s perfectly fine if the amygdala is just going to use those inputs as feedstock for an SL model. SL models don’t need to know a priori which input neuron is representing which object-level pattern, because it’s going to learn the connections, so if there’s some randomness involved, it’s fine. But the randomness would be a very big problem if the amygdala needs to use those input signals to calculate a ground-truth proxy.
As another example, a ground-truth proxy requires zero adjustable parameters (because how would you adjust them?), whereas a learning algorithm does well with as many adjustable parameters as possible, more or less.
So I see these as very different algorithmic tasks—so different that I would expect them to wind up in different parts of the brain, just on general principles.
The amygdala is a hodgepodge grouping of nuclei, some of which are “really” (embryologically & evolutionarily) part of the cortex, and the rest of which are “really” part of the striatum (ref). So if we’re going to say that the cortex and striatum are dedicated to running within-lifetime learning algorithms (which I do say), then we should expect the amygdala to be in that same category too.
By contrast, SC is in the brainstem, and if you go far enough back, SC is supposedly a cousin of the part of the pre-vertebrate (e.g. amphioxus) nervous system that implements a simple “escape circuit” by triggering swimming when it detects a shadow—in other words, a part of the brain that triggers an innate reaction based on a “hardcoded” type of pattern in visual input. So it would make sense to say that the SC is still more-or-less doing those same types of calculations.
Anyway, I claim that SC is playing the role of implementing the snake-heuristic calculations that underlie that loss function. (Among other things.)
SC projects to VTA/SNc, which relates to valence—whether we find things positive/negative, pleasant/unpleasant, etc. It’s not the only contribution, but I claim it’s one contribution.
I think the relevant unit is “brainstem and hypothalamus”, of which the SC is one part, the part that seems like it has the right inputs and multi-layer architecture to do things like calculate heuristics on the visual FOV. Food taste shaping is a different part of the brainstem, namely the gustatory nucleus of the medulla.
I’m surprised that you wrote this. I thought you were on board with the idea that we should think of visual cortex as loosely (or even tightly) analogous to deep learning? Let’s train a 12-layer randomly-initialized ConvNet, and look at the vector of activations from layer 10, and decide on that basis whether you’re looking at a person, in the absence of any ground truth. It’s impossible, right? The ConvNet was randomly initialized, you can’t get any object-level information from the fact that neuron X in layer 10 has positive or negative activation, because it’s not a priori determined what role neuron X is going to wind up playing in the trained model.
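To make that concrete, here's a toy numpy sketch (my own illustration, with all sizes invented): push the same input through two independently random-initialized feature stacks, and the resulting activation vectors share no meaning, so a fixed readout like "is neuron 10 active?" can't carry object-level information.

```python
import numpy as np

def random_feature_net(rng, depth=3, dim=64):
    # A stack of random linear "layers" with ReLU, standing in for an untrained ConvNet.
    return [rng.normal(0, dim ** -0.5, size=(dim, dim)) for _ in range(depth)]

def activations(net, x):
    for W in net:
        x = np.maximum(W @ x, 0.0)  # ReLU
    return x

x = np.random.default_rng(42).normal(size=64)  # the same "image" for both nets

a1 = activations(random_feature_net(np.random.default_rng(0)), x)
a2 = activations(random_feature_net(np.random.default_rng(1)), x)

# The two initializations assign unrelated roles to any given neuron, so a fixed
# downstream readout of "neuron 10" means different things in each net.
print(a1[10], a2[10])
```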
We need ground truth somehow, and my claim is that SC provides it. So my mainline expectation is that SC gets visual information in a way that bypasses the cortex altogether. This is at least partly true (retina→LGN→SC pathway). SC does get inputs from visual cortex, as it turns out, which had me confused for a while but I’m OK with it now. That’s a long story, but I still think the cortical input is unrelated to how SC detects human faces and snakes and whatnot.
There are no ‘endless fractal-like depths of complexity’ in the retinal images of rocks or ants or trees, which is what is actually relevant here. For any model, a flat uniformly colored wall has near-zero compressible complexity, as does a picture of noise (max entropy, but not learnable). Real-world images have learnable complexity which crucially varies based on both the image and the model’s current knowledge. But it’s never “endless”: generally it’s going to be on the order of, or less than, the image entropy the cortex gets from the retina, which is comparable to compression with modern codecs.
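As a toy illustration of that point (setup entirely invented): a flat image compresses to almost nothing, pure noise barely compresses at all, and a structured image sits in between; that compressible regularity is what a model can actually learn.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

flat = np.zeros(4096, dtype=np.uint8)                    # uniform colored wall
noise = rng.integers(0, 256, size=4096, dtype=np.uint8)  # pixel noise (max entropy)
structured = np.tile(rng.integers(0, 256, size=64, dtype=np.uint8), 64)  # repeating texture

ratios = {}
for name, img in [("flat", flat), ("noise", noise), ("structured", structured)]:
    ratios[name] = len(zlib.compress(img.tobytes(), 9)) / img.size
    print(f"{name:10s} compresses to {ratios[name]:.1%} of original")
```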
Actually, in this thread I cited neurosci papers: first that curiosity/info-value is a reward processed like hunger[1], and then a review article from 2020[2] which is an update of a similar 2015 paper[3].
Sure—but curiosity/info-gain obviously isn’t all of reward, so the various other components can also steer behavior towards fitness-relevant directions, which can then indirectly bias the trajectory of curiosity-driven learning, since curiosity is always relative to the model’s current knowledge and thus the experience trajectory.
Recall that the infant is mostly driven by subcortical structures and innate patterns for the early years, and during all this time it is absolutely bombarded with human speech as the primary consistent complex audio signal. There may be some attentional bias towards human speech, but it may not be necessary, as there aren’t many other audio streams that could come close to competing. Birdsong is both less interesting and far less pervasive/frequent in most children’s audio experience. Also, ‘visually interesting pebbles’ don’t seem that different from other early children’s toys: it seems children would find them interesting (although their shape is typically boring).
I didn’t say they did—I said I’m aware of two proposals for an innate learning bias for faces: CPGs pretraining the viscortex in the womb, and innate attentional bias circuits in the SC. These are effectively doing a similar thing.
For attentional bias/shaping the SC likely can only support very simple pattern biases close to a linear readout. So a simple bias to attend to faces seems possible, but I was actually talking about sexual attraction when I said:
For sexual attraction the patterns are just too complex, so they are represented in IT or similar higher visual cortex. Any innate circuit that references human body shape images and computes their sexual attraction must get that input from higher viscortex—which then leads to the whole symbol grounding problem—as you point out, and I naturally agree.
But regardless of the specific solution to the symbol grounding problem, a consequence of that solution is that the putative brain region computing attraction value of images of humanoid shapes would need to compute that primarily from higher viscortex/IT input.
I think your model may be something like this: the SC computes an attentional bias which encodes all the innate sexiness geometry and thus guides us to spend more time saccading to sexy images rather than others, and possibly also outputs some reward info for this.
But that could not work as stated, simply because the sexiness concept is very complex and requires a deepnet to compute (symmetry, fat content, various feature ratios, etc).
Also this must be true, because otherwise we wouldn’t see the failures of sexual imprinting in birds that we do in fact observe.
So how can the genome best specify a complex concept innately using the least number of bits? Just indexing neurons in the learned cortex directly would be bit-minimal, but as you point out that isn’t robust.
However the topographic organization of cortex can help, as it naturally clusters neurons semantically.
Another way to ‘locate’ specific learned neurons more robustly is through proxy matching, where you have dumb simple humanoid shape detectors and symmetry detectors etc encoding a simple sexiness visual concept—which could potentially be in the SC. But then during some critical window the firing patterns of those proxy circuits are used to locate the matching visual concept in visual cortex and connect to that. In other words, you can use the simple innate proxy circuit to indirectly locate a cluster of neurons in cortex, simply based on firing pattern correlation.
This allows the genome to link to a high-complexity concept by specifying a low-complexity proxy match for that concept in its earlier low-complexity larval stage.
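A minimal sketch of the proxy-matching step as I’m imagining it (all numbers invented for illustration): correlate the crude proxy signal with each learned unit’s firing over a critical window, and wire to the best match.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_units = 500, 40

# Learned cortical units: suppose unit 7 happened to end up encoding the target
# concept during learning, while the other units encode unrelated features.
concept = rng.random(n_samples) < 0.3
activations = rng.normal(size=(n_samples, n_units))
activations[:, 7] += 2.0 * concept  # unit 7 fires more when the concept is present

# Crude innate proxy detector: agrees with the concept only ~80% of the time.
proxy = np.where(rng.random(n_samples) < 0.8, concept,
                 rng.random(n_samples) < 0.3).astype(float)

# Proxy matching: during the critical window, find the learned unit whose firing
# correlates best with the proxy signal, and wire downstream circuits to it.
corrs = [np.corrcoef(activations[:, i], proxy)[0, 1] for i in range(n_units)]
matched_unit = int(np.argmax(np.abs(corrs)))
print(matched_unit)  # → 7: the cheap proxy has located the learned concept unit
```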
Proxy matching implies that after critical-period training, whatever neurons represent innate sexiness must then shift to get their input from higher viscortex: IT rather than just V1, and certainly not LGN.
Another related possibility is that the SC is just used to create the initial bootstrapping signal, and then some other brain region actually establishes the connection to innate downstream dependencies of sexiness and learned sexiness—so separating out the sexiness proxy from the used sexiness concept.
Anyway my point was more that innate sexual attraction must be encoded somewhere, and any evidence that the SC is crucially involved with that is evidence it is crucially involved with other innate visual bias/shaping.
[1] Shared striatal activity in decisions to satisfy curiosity and hunger at the risk of electric shocks
[2] Systems neuroscience of curiosity
[3] The psychology and neuroscience of curiosity
Thanks again for taking the time to chat, I am finding this super helpful in understanding where you’re coming from.
Your description of proxy matching is a close match to what I’m thinking. (Sorry if I’ve been describing it poorly!)
I think I got confused because I’m mapping it to neuroanatomy differently than you. I think SC is the “proxy” part, but the “matching” part is somewhere else, not SC. For example, it might look something like this:
1. SC calculates a proxy, based on LGN→SC inputs. The output of this calculation is a signal, which I’ll call “Fits proxy?”
2. The “Fits proxy?” signal then gets sent (indirectly) to a certain part of the amygdala, where it’s used as a “ground truth” for supervised learning.
3. This part of the amygdala builds a trained model. The input to the trained model is (mostly-high-level) visual information, especially from IT. The output of the trained model is some signal, which I’ll call “Fits model?”
4. The SC’s “Fits proxy?” signal and the amygdala’s “Fits model?” signal both go down to the hypothalamus and brainstem, possibly hitting the very same neurons that trigger specific innate reactions.
5. Optionally, the trained model in the amygdala could stop updating itself after a critical period early in life.
6. Also optionally, as the animal gets older, the “Fits proxy?” signal could have less and less influence on those innate reaction neurons in the hypothalamus & brainstem, while the “Fits model?” signal would have more and more influence.
(This is one example; there are a bunch of other variations on this theme, including ones where you replace “part of the amygdala” with other parts of the forebrain like nucleus accumbens shell or lateral septum, and also where the proxy is coming from other places besides SC.)
(This is a generalization of “calculating correlations”. If the amygdala trained model is only one “layer” in the deep learning sense, then it would be just calculating linear correlations between IT signals and the proxy, I think. My best guess is that the amygdala is learning a two-layer feedforward model (more or less), so a bit more complicated than linear correlations, although low confidence on that.)
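If it helps, here’s a toy version of that proxy-as-ground-truth training stage (everything here is invented for illustration, and the generic two-layer net is just standing in, with no claim about real amygdala circuitry):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "IT" features, a hidden true concept, and noisy proxy labels for it.
n, d, h = 1000, 20, 8
it_features = rng.normal(size=(n, d))
true_target = it_features[:, 0] + 0.5 * it_features[:, 1] > 0
proxy = np.where(rng.random(n) < 0.85, true_target, ~true_target).astype(float)

# Two-layer feedforward model (tanh hidden layer, logistic output),
# trained by plain batch gradient descent on the proxy "ground truth".
W1 = rng.normal(0, 0.3, size=(d, h)); b1 = np.zeros(h)
w2 = rng.normal(0, 0.3, size=h);      b2 = 0.0

def forward(X):
    hid = np.tanh(X @ W1 + b1)
    return hid, 1.0 / (1.0 + np.exp(-(hid @ w2 + b2)))

lr = 0.5
for _ in range(300):
    hid, p = forward(it_features)
    err = (p - proxy) / n                      # gradient of cross-entropy wrt logit
    w2 -= lr * (hid.T @ err); b2 -= lr * err.sum()
    dh = np.outer(err, w2) * (1.0 - hid ** 2)
    W1 -= lr * (it_features.T @ dh); b1 -= lr * dh.sum(axis=0)

_, p = forward(it_features)
acc = ((p > 0.5) == true_target).mean()
print(f"agreement with the underlying concept: {acc:.0%}")
```

Note that because the proxy’s errors are unsystematic here, the trained model can end up agreeing with the underlying concept more often than the proxy itself does.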
Again, since the trained model is in the amygdala, not SC, there’s no need to “shift” the SC’s inputs to IT. That’s why I was confused by what you wrote. :)
Hey thanks for explaining this—makes sense to me and I think we are mostly in agreement. Using the proxy signal as a supervised learning target to recognize the learned target pattern in IT is a straightforward way to implement the matching, but probably not quite complete in practice. I suspect you also need to combine that with some strong priors to correctly carve out the target concept.
Consider the equivalent example of trying to train a highly accurate cat image detector given a dataset containing say 20% cats, combined with a crappy low-complexity proxy cat detector to provide the labels. Can you really bootstrap-improve discriminative models that way with non-trivial proxy label noise? I suspect the key to making this work is using the powerful generative model of the cortex as a regularizer: you train the model to recognize images that the proxy detector labels as cats and that are also close to the generative model’s data manifold. If you then reoptimize (in evolutionary time) the proxy detector to leverage that, I think it makes the problem much more tractable. The generative model allows you to make the learned model far more selective around the actual data manifold to increase robustness. In very simple, vague terms, the model would then be learning the combination of high proxy probability and low distance to the data manifold of examples from the critical training set.
Later, if you then test OoD on vague non-cats (dogs, stuffed animals) not encountered in training that would confuse the simple proxy, the learned model can reject them—even though it never saw them during critical training—simply because they are far from the generative manifold, and the learned model is ‘shrunk’ to fit that manifold.
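Here’s a crude sketch of the kind of thing I mean (a PCA subspace standing in for the generative model, and everything else invented for illustration): accept only samples where the crappy proxy fires and the sample sits near the learned data manifold, which rejects off-manifold OoD inputs the proxy alone would be fooled by.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Cats" live on a low-dimensional manifold inside a 20-D feature space.
d, k = 20, 3
basis = np.linalg.qr(rng.normal(size=(d, k)))[0]   # orthonormal manifold directions
cats = rng.normal(size=(300, k)) @ basis.T         # in-distribution training data
mean = cats.mean(0)

# "Generative model": here just the PCA subspace of the training data.
U = np.linalg.svd(cats - mean, full_matrices=False)[2][:k].T

def manifold_distance(x):
    # Reconstruction error: how far a sample sits from the learned data manifold.
    c = x - mean
    return np.linalg.norm(c - U @ (U.T @ c))

def crappy_proxy(x):
    # Stand-in for the low-complexity innate detector: one noisy linear feature.
    return x @ basis[:, 0] + rng.normal(0, 0.5) > 0

def detector(x, max_dist=1.0):
    # Accept only if the proxy fires AND the sample lies near the data manifold.
    return bool(crappy_proxy(x) and manifold_distance(x) < max_dist)

cat = basis @ np.array([3.0, 0.5, -0.5])   # a new in-distribution "cat"
ood = rng.normal(size=d) * 2.0             # a "stuffed animal": may fool the proxy,
                                           # but it is far from the manifold
print(detector(cat), detector(ood))
```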
I do agree the amygdala does seem like a good fit for the location of the learned symbol circuit, although at that point it raises the question of why not also just put the proxy in the amygdala? If the amygdala has the required inputs from LGN and/or V1, my guess would be that it could also just colocate the innate proxy circuit. (I haven’t looked in the lit to see if those connections exist.)
Also, point 6 seems required for the system to work as well in adulthood as it typically does, and yet also explain the out-of-distribution failures for imprinting etc. (Once the IT representation is learned you want to use it exclusively, as it should be strictly superior to the proxy circuit. This seems a little weird at first.)
The hope is that this same mechanism, which seems well suited for handling imprinting, also works for grounding sexual attraction (as an elaboration of imprinting), then more complex concepts like representations of others’ emotions from facial-expression and vocal-tone proxies, and then combining that with empathic simulation to ground a model of others’ values/utility for social game theory, altruism, etc.
Yes, that is my hope too! And the main thing I’m working on most days is trying to flesh out the details.
For example, I claim that all the vision-related inputs to the amygdala have at some point passed through at least one locally-random filter stage (cf. “pattern separation” in neuro literature or “compressed sensing” in DSP literature). That’s perfectly fine if the amygdala is just going to use those inputs as feedstock for an SL model. SL models don’t need to know a priori which input neuron is representing which object-level pattern, because it’s going to learn the connections, so if there’s some randomness involved, it’s fine. But the randomness would be a very big problem if the amygdala needs to use those input signals to calculate a ground-truth proxy.
As another example, a ground-truth proxy requires zero adjustable parameters (because how would you adjust them?), whereas a learning algorithm does well with as many adjustable parameters as possible, more or less.
So I see these as very different algorithmic tasks—so different that I would expect them to wind up in different parts of the brain, just on general principles.
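A toy sketch of that asymmetry (invented setup): after a locally-random filter stage, a trained readout of the signal still works fine, while a hardcoded "just read coordinate 0" proxy is scrambled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Upstream code: a "snake present" bit carried cleanly by coordinate 0.
n, d = 400, 30
X = rng.normal(size=(n, d))
y = X[:, 0] > 0

R = rng.normal(size=(d, d)) / np.sqrt(d)   # locally-random filter stage
Z = X @ R                                  # what the downstream region actually receives

# A learning algorithm is fine: fit a least-squares readout on the filtered code.
w = np.linalg.lstsq(Z, y.astype(float) - 0.5, rcond=None)[0]
learned_acc = ((Z @ w > 0) == y).mean()

# A hardcoded proxy ("read coordinate 0 directly") is ruined by the filter.
hardcoded_acc = ((Z[:, 0] > 0) == y).mean()
print(f"learned readout: {learned_acc:.0%}, hardcoded readout: {hardcoded_acc:.0%}")
```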
The amygdala is a hodgepodge grouping of nuclei, some of which are “really” (embryologically & evolutionarily) part of the cortex, and the rest of which are “really” part of the striatum (ref). So if we’re going to say that the cortex and striatum are dedicated to running within-lifetime learning algorithms (which I do say), then we should expect the amygdala to be in that same category too.
By contrast, SC is in the brainstem, and if you go far enough back, SC is supposedly a cousin of the part of the pre-vertebrate (e.g. amphioxus) nervous system that implements a simple “escape circuit” by triggering swimming when it detects a shadow—in other words, a part of the brain that triggers an innate reaction based on a “hardcoded” type of pattern in visual input. So it would make sense to say that the SC is still more-or-less doing those same types of calculations.