7. “Changing your predictions to match the world” and “Changing the world to match your predictions” are (at least partly) two different systems / algorithms in the brain. So lumping them together is counterproductive
The title of this section contradicts the ensuing text. The title says that recognition and action selection (plus planning, if the system is sufficiently advanced) are “at least partly” two different algorithms. Well, yes, they could be, and almost always are, implemented separately in one way or another. But we should also look at the emergent algorithm that comes out of coupling these two algorithms (they are physically lumped because they sit within a single system, such as a brain, which you pragmatically should model as a unified whole).
So, the claim that considering these two algorithms as a whole is “counterproductive” doesn’t make sense to me, just as it wouldn’t make sense to say that in a GAN you should consider only the two DNNs separately rather than the dynamics of the coupled architecture. You should also look at the algorithms separately, of course, at the “gears level” (or, call it the mechanistic-interpretability level), but that doesn’t make the unified perspective counterproductive. They are both productive.
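To make the GAN analogy concrete, here is a minimal toy sketch of my own (plain numpy, with a made-up 1-D setup, not anything from your post): the generator and the discriminator are two separate parameter sets with separate update rules, i.e. genuinely “two different algorithms” at the gears level, yet each one’s loss is defined only through the other, so the behaviour we actually care about exists only at the level of the coupled system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generator: x_fake = g0 + g1 * z, z ~ N(0, 1). Its own parameters, its own update.
g0, g1 = 0.0, 1.0
# Discriminator: D(x) = sigmoid(w * x + b). Its own parameters, its own update.
w, b = 0.1, 0.0
lr = 0.05

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

for step in range(5000):
    z = rng.normal(size=64)
    x_real = rng.normal(loc=3.0, scale=1.0, size=64)   # "real" data: N(3, 1)
    x_fake = g0 + g1 * z

    # Discriminator step: its loss depends on what the generator currently produces.
    s_real = sigmoid(w * x_real + b)
    s_fake = sigmoid(w * x_fake + b)
    dw = np.mean(-(1.0 - s_real) * x_real + s_fake * x_fake)
    db = np.mean(-(1.0 - s_real) + s_fake)
    w, b = w - lr * dw, b - lr * db

    # Generator step: its loss is defined *only* through the discriminator.
    s_fake = sigmoid(w * (g0 + g1 * z) + b)
    dg0 = np.mean(-(1.0 - s_fake) * w)          # non-saturating loss: -log D(x_fake)
    dg1 = np.mean(-(1.0 - s_fake) * w * z)
    g0, g1 = g0 - lr * dg0, g1 - lr * dg1

# The interesting object is the coupled dynamics, not either learner on its own.
print(f"learned generator: x = {g0:.2f} + {g1:.2f} * z   (real data: N(3, 1))")
```

This is a crude toy and the fit will be rough, but that is beside the point: each network is a separate “gear”, while the thing worth analysing is the two-player dynamics.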
You write:
Since they’re (at least partly) two different algorithms, unifying them is a way of moving away from a “gears-level” understanding of how the brain works. They shouldn’t be the same thing in your mental model, if they’re not the same thing in the brain.
Then, in the text of the section, you also say something stronger than that these two algorithms are “at least partly” separate: you say that they “can’t be” the same algorithm:
Yes they sound related. Yes you can write one equation that unifies them. But they can’t be the same algorithm, for the following reason:
“Changing your predictions to match the world” is a (self-) supervised learning problem. When a prediction fails, there’s a ground truth about what you should have predicted instead. More technically, you get a full error gradient “for free” with each query, at least in principle. Both ML algorithms and brains use those sensory prediction errors to update internal models, in a way that relies on the rich high-dimensional error information that arrives immediately-after-the-fact.
“Changing the world to match your predictions” is a reinforcement learning (RL) problem. No matter what action you take, there is no ground truth about what action you counterfactually should have taken. So you can’t use a supervised learning algorithm. You need a different algorithm.
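As an aside, the distinction you are drawing here is easy to see in code. Below is a toy sketch of my own (plain numpy, made-up 1-D problems, not from the post): the supervised learner receives a ground-truth target and hence a full per-query error gradient, while the policy learner only ever receives a scalar reward for the action it actually took and must fall back on a score-function (REINFORCE-style) estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Changing your predictions to match the world": (self-)supervised learning.
# The world supplies the ground truth y, so every query yields a full error gradient.
w = 0.0
for _ in range(1000):
    x = rng.normal()
    y = 2.0 * x + rng.normal(scale=0.1)    # ground-truth sensory outcome
    grad = (w * x - y) * x                 # exact gradient of 0.5 * (w*x - y)^2
    w -= 0.05 * grad

# "Changing the world to match your predictions": reinforcement learning.
# No ground-truth action is ever revealed -- only a scalar reward for the action taken.
theta = 0.0                                # mean of a Gaussian action policy
for _ in range(5000):
    a = rng.normal(loc=theta, scale=1.0)   # sample an action
    reward = -(a - 2.0) ** 2               # the world says "how good", never "what instead"
    theta += 0.005 * reward * (a - theta)  # REINFORCE: noisy score-function estimate

print(f"supervised weight -> {w:.2f} (true 2.0); policy mean -> {theta:.2f} (optimum 2.0)")
```

I don’t dispute that the two learning signals have this different structure; my objection, developed below, is to the conclusion that the two processes should therefore not be considered together.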
I think you are missing the perspective that justifies considering perception and action within a single algorithm, namely the pragmatics of perception. In “Being You” (2021), Anil Seth writes:
Action is inseparable from perception. Perception and action are so tightly coupled that they determine and define each other. Every action alters perception by changing the incoming sensory data, and every perception is the way it is in order to help guide action. There is simply no point to perception in the absence of action. We perceive the world around us in order to act effectively within it, to achieve our goals and – in the long run – to promote our prospects of survival. We don’t perceive the world as it is, we perceive it as it is useful for us to do so.
It may even be that action comes first. Instead of picturing the brain as reaching perceptual best guesses in order to then guide behaviour, we can think of brains as fundamentally in the business of generating actions, and continually calibrating these actions using sensory signals, so as to best achieve the organism’s goals. This view casts the brain as an intrinsically dynamic, active system, continually probing its environment and examining the consequences.
In predictive processing, action and perception are two sides of the same coin. Both are underpinned by the minimisation of sensory prediction errors. Until now, I’ve described this minimisation process in terms of updating perceptual predictions, but this is not the only possibility. Prediction errors can also be quenched by performing actions in order to change the sensory data, so that the new sensory data matches an existing prediction.
The pragmatics of perception don’t allow considering self-supervised learning in complete isolation from the action-selection algorithm.
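To illustrate the quoted picture numerically, here is a toy sketch of my own (plain numpy; the set-point, precisions, and learning rates are all invented for illustration, and this is not code from your post or from Seth’s book): a single precision-weighted prediction error is quenched through two routes at once, by updating the prediction (perception) and by pushing the world state toward the prediction (action).

```python
import numpy as np

rng = np.random.default_rng(0)

x = 5.0                # hidden world state (say, room temperature)
mu = 20.0              # the agent's prediction of what it will sense
prior = 20.0           # a strong prior expectation ("I expect to sense 20")
pi_s, pi_p = 1.0, 5.0  # precisions of sensory evidence vs. prior
lr_percept, lr_act = 0.05, 0.05

for t in range(2000):
    s = x + rng.normal(scale=0.1)        # noisy sensation of the world
    err_s = pi_s * (s - mu)              # precision-weighted sensory prediction error
    err_p = pi_p * (mu - prior)          # precision-weighted prior error

    mu += lr_percept * (err_s - err_p)   # perception: change the prediction to match the world
    x -= lr_act * err_s                  # action: change the world to match the prediction

print(f"world state x = {x:.2f}, belief mu = {mu:.2f}")
# The same error signal is quenched partly by updating mu and partly by moving x;
# with a tight prior, most of the work ends up being done by action.
```

The point of the sketch is simply that, once action is in the loop, the self-supervised (perceptual) update and the action update are driven by the same quantity and cannot be evaluated in isolation from one another.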
As I also mentioned in another comment, moving up the abstraction stack is useful as well as moving down.