The human brain has a model-based reinforcement learning (RL) system that it uses for within-lifetime learning. I guess that previous sentence is somewhat controversial, but it really shouldn’t be:
- The brain has a model—If I go to the toy store, I expect to be able to buy a ball.
- The model is updated by self-supervised learning (i.e., predicting imminent sensory inputs and editing the model in response to prediction errors)—If I expect the ball to bounce, and then I see the ball hit the ground without bouncing, then next time I see that ball heading towards the ground, I won’t expect it to bounce.
- The model informs decision-making—If I want a bouncy ball, I won’t buy that ball; instead I’ll buy a different ball.
- There’s reinforcement learning—If I drop the ball on my foot just to see what will happen, and it really hurts, then I probably won’t do it again, and relatedly I will think of doing so as a bad idea.
…And that’s all I mean by “the brain has a model-based RL system”.
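To make those four ingredients concrete, here’s a minimal toy sketch in Python. To be clear, this is purely illustrative: the names (`WorldModel`, `Agent`, `choose_action`, etc.), the lookup-table model, and the value-update rule are simplifications I made up for this post, not a claim about how the brain (or any particular algorithm in the literature) actually implements these pieces.

```python
from collections import defaultdict

class WorldModel:
    """Toy lookup-table world model: predicts the outcome of (situation, action)."""
    def __init__(self):
        self.predictions = {}  # (situation, action) -> predicted outcome

    def predict(self, situation, action):
        return self.predictions.get((situation, action), "unknown")

    def update(self, situation, action, observed_outcome):
        # Self-supervised learning: if the prediction was wrong, edit the model
        # so that next time we expect what actually happened.
        if self.predict(situation, action) != observed_outcome:
            self.predictions[(situation, action)] = observed_outcome

class Agent:
    def __init__(self):
        self.model = WorldModel()
        self.value = defaultdict(float)  # outcome -> learned "goodness" (RL)

    def choose_action(self, situation, candidate_actions):
        # Model-informed decision-making: imagine each action's predicted
        # outcome, and pick the action whose imagined outcome looks best.
        def imagined_value(action):
            return self.value[self.model.predict(situation, action)]
        return max(candidate_actions, key=imagined_value)

    def learn(self, situation, action, observed_outcome, reward):
        self.model.update(situation, action, observed_outcome)
        # Reinforcement learning: nudge the learned value of this outcome
        # toward the reward actually received.
        self.value[observed_outcome] += 0.5 * (reward - self.value[observed_outcome])

# Tiny usage example mirroring the ball story above (purely illustrative):
agent = Agent()
agent.learn("holding ball", "drop on foot", "foot hurts", reward=-1.0)
agent.learn("holding ball", "bounce it", "fun bounce", reward=+1.0)
print(agent.choose_action("holding ball", ["drop on foot", "bounce it"]))
# -> "bounce it": the model predicts each outcome, and the RL-learned values
#    make the painful option look like a bad idea.
```

The point of the sketch is just the division of labor: prediction errors edit the model, the model supplies imagined outcomes for decision-making, and rewards and punishments adjust how good those outcomes seem.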
I emphatically do not mean that, if you read a “model-based RL” paper on arXiv last week, then I think the brain works exactly like that paper. On the contrary, “model-based RL” is a big tent comprising many different algorithms, once you get into the details. And indeed, I don’t think “model-based RL as implemented in the brain” is exactly the same as any model-based RL algorithm on arXiv.
Here’s an elaboration of what I mean: