The way I’m currently thinking about it is: everywhere except the frontal lobe, the task is something like
Find a generative model that accurately predicts Input Signal X based on Contextual Information Y.
But it’s different X & Y for different parts of the cortex, and there can even be cascades where one region needs to predict the residual prediction error from another region (ref). And there’s also a top-down attention mechanism such that not all prediction errors are equally bad.
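As a minimal sketch of that setup (entirely illustrative, with made-up data and the simplest possible “regions” as linear predictors, not a claim about the actual circuitry): one region predicts input signal X from context Y, and a second region is trained to predict the first region’s residual prediction errors.

```python
# Toy sketch (illustration only): two "regions", each learning to predict its
# input X from context Y; the second region is trained on the first's residuals.
import numpy as np

rng = np.random.default_rng(0)

# Fake data: context Y (n x d) and input signal X (n,) with some structure.
n, d = 1000, 5
Y = rng.normal(size=(n, d))
X = Y @ rng.normal(size=d) + 0.3 * np.sin(Y[:, 0]) + 0.1 * rng.normal(size=n)

# "Region 1": linear predictor of X given Y (least squares).
w1, *_ = np.linalg.lstsq(Y, X, rcond=None)
pred1 = Y @ w1
resid1 = X - pred1                     # prediction error from region 1

# "Region 2": predicts region 1's residual error from a richer context
# (here: Y plus a nonlinear feature). This is the "cascade" idea.
Y2 = np.column_stack([Y, np.sin(Y[:, 0])])
w2, *_ = np.linalg.lstsq(Y2, resid1, rcond=None)
pred2 = Y2 @ w2

combined = pred1 + pred2               # overall prediction of X
print("region-1 error:", np.mean(resid1 ** 2))
print("after residual stage:", np.mean((X - combined) ** 2))
```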
The frontal lobe is a bit different in that it’s choosing what action to take or what thought to think (at least in part). That’s not purely a prediction task, because it has more than one right answer. I mean, you can predict that you’ll go left, then go left, and that’s a correct prediction. Or you can predict that you’ll go right, then go right, and that’s a correct prediction too! So it’s not just predictions; we need reinforcement learning / rewards too. In those cases, the task is presumably “Find a generative model that is making correct predictions AND leading to high rewards.”

But I don’t think that’s really something the neocortex is doing, per se. I think that part is handled by the basal ganglia (BG), which sends outputs to the frontal lobe. The BG looks at what the neocortex is doing, calculates a value function (using TD learning, storing its information in the striatum), and then (loosely speaking) reaches up into the neocortex and fiddles with it, trying to suppress the patterns of activity that it thinks would lead to lower rewards and amplify the patterns of activity that it thinks would lead to higher rewards.
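To make the TD-learning part concrete, here is a cartoon sketch (all names, states, and rewards are made up for illustration; this is a toy framing, not the actual BG algorithm): a tabular value function is learned with TD(0), and candidate “patterns of activity” are then amplified or suppressed according to their predicted value.

```python
# Cartoon of the BG-as-critic idea (toy framing, hypothetical names).
# A tabular value function is learned with TD(0); candidate "patterns of
# activity" are then amplified or suppressed according to predicted value.
import random

gamma, alpha = 0.9, 0.1
V = {s: 0.0 for s in ["left", "right"]}        # value function ("striatum")

def reward(s):
    # Pretend going right is usually a bit more rewarding.
    return random.gauss(1.0 if s == "right" else 0.5, 0.1)

# TD(0) learning from experienced one-step episodes.
for _ in range(2000):
    s = random.choice(["left", "right"])       # whatever the cortex happened to do
    r = reward(s)
    V[s] += alpha * (r + gamma * 0.0 - V[s])   # terminal next state => value 0

def bg_modulation(candidate_patterns):
    """Return a gain per candidate pattern: >1 amplify, <1 suppress."""
    best = max(V[p] for p in candidate_patterns)
    return {p: (1.2 if V[p] >= best else 0.8) for p in candidate_patterns}

print(V)
print(bg_modulation(["left", "right"]))
```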
See my post Predictive coding = RL + SL + Bayes + MPC for an old first-cut attempt to think through this stuff. Meanwhile I’ve been reading all about the striatum and RL; more posts forthcoming, I hope.
Happy for any thoughts on that. :-)
Thanks for your reply!
A few points where clarification would help, if you don’t mind (feel free to skip some):
What are the capabilities of the “generative model”? In general, the term seems to be used in various ways (see the sketch after this list of questions), e.g.:
Sampling from the learned distribution (analogous to GPT-3 at temp=1)
Evaluating the probability of a given point
Producing the predicted most likely point (analogous to GPT-3 at temp=0)
Is what we’re predicting the input at the next time step? (Sometimes “predict” can be used to mean filling in missing information, but that doesn’t seem to make sense in this context.) Also, I’m not sure what I mean by “time step” here.
The “input signal” here is coming from whatever is wired into the cortex, right? Does it work to think of this as a vector in ℝⁿ?
Is the contextual information just whatever is the current input, plus whatever signals are still bouncing around?
Also, the capability described may be a bit too broad, since there are some predictions that the cortex seems to be bad at. Consider predicting the sum of two 8-digit integers. Digital computers compute that easily, so it’s fundamentally an easy problem, but for humans to do it requires effort. Yet for some other predictions, the cortex easily outperforms today’s digital computers. What characterizes the prediction problems that the cortex does well?
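For reference, the three usages listed above can be cashed out on a toy model. The sketch below (a 1-D Gaussian mixture, purely illustrative and not from the discussion) implements each of the three queries:

```python
# The three usages of "generative model", on a toy 1-D Gaussian mixture.
# Purely illustrative -- not a claim about what the cortex implements.
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.7, 0.3])         # mixture weights
means   = np.array([0.0, 5.0])
sigma   = 1.0

def sample():                          # 1) sample from the learned distribution
    k = rng.choice(2, p=weights)       #    (like GPT-3 at temperature 1)
    return rng.normal(means[k], sigma)

def density(x):                        # 2) evaluate the probability of a point
    comps = np.exp(-0.5 * ((x - means) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(weights @ comps)

def most_likely():                     # 3) return the single most likely point
    xs = np.linspace(-5, 10, 2001)     #    (like GPT-3 at temperature 0)
    return xs[np.argmax([density(x) for x in xs])]

print(sample(), density(0.0), most_likely())
```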
Think of a generative model as something like “This thing I’m looking at is a red bouncy ball”. Just looking at it you can guess pretty well how much it would weigh if you lifted it, how it would feel if you rubbed it, how it would smell if you smelled it, and how it would bounce if you threw it. Lots of ways to query these models! Powerful stuff!
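One toy way to picture “lots of ways to query”: a single inferred cause (“red bouncy ball”) fans out into predictions across many sensory channels. All the names and numbers below are made up for illustration.

```python
# Toy sketch: one inferred cause ("red bouncy ball") generates predictions
# across many sensory channels. The numbers are made up for illustration.
OBJECT_MODELS = {
    "red bouncy ball": {
        "weight_grams": 40,
        "surface_feel": "smooth rubber",
        "smell": "faint rubber smell",
        "bounce_height_fraction": 0.8,   # fraction of drop height returned
    },
    "red apple": {
        "weight_grams": 180,
        "surface_feel": "smooth, slightly waxy",
        "smell": "sweet",
        "bounce_height_fraction": 0.1,
    },
}

def predict(cause, query):
    """Query the model for any sensory channel, conditioned on the cause."""
    return OBJECT_MODELS[cause][query]

print(predict("red bouncy ball", "bounce_height_fraction"))  # 0.8
print(predict("red bouncy ball", "smell"))                   # 'faint rubber smell'
```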
“some predictions that the cortex seems to be bad at”
If a model is trained to minimize a loss function L, that doesn’t mean that, after training, it winds up with a very low value of L in every possible case. Right? I’m confused about why you’re confused. :-P
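A toy numerical illustration of that point (made-up data, nothing to do with the cortex specifically): a model fit by least squares can have near-zero average loss on its training distribution and still have a large loss on a rare input that falls outside it.

```python
# Toy illustration: low *average* loss after training does not imply low loss
# on every input. Fit a line to data concentrated in [-1, 1], then evaluate
# it on a rare out-of-range point where the true function jumps.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=500)          # training inputs: almost all "easy"
y = np.where(np.abs(x) < 1.5, x, 5.0)     # true function: linear on this range...

# Least-squares fit of y ~ a*x + b on the training data.
A = np.column_stack([x, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b = coef

train_loss = np.mean((a * x + b - y) ** 2)
hard_x, hard_y = 3.0, 5.0                 # ...but jumps for rare inputs like this
hard_loss = (a * hard_x + b - hard_y) ** 2

print("average loss on training data:", train_loss)   # tiny
print("loss on the rare hard case:", hard_loss)        # large
```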
Thanks!