What happens? Your behavior would change in response, but I claim it would change very gradually.
That holds for model-free learning. With model-based learning, your behavior changes instantly: you can now project forward into the future, examine the utilities, discover that scenarios with social approval have zero utility, and direct all your actions towards whatever is still rewarding. (In psychological experiments, the speed of adaptation is in fact considered one way to distinguish a mouse using model-based RL from one using model-free RL: when the reward at the end of the maze changes, does the mouse need to hit the end of the maze several times before any decisions start changing, or can it rapidly switch plans, implying a model of the environment separate from what it has learned about rewards?)
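To make the maze comparison concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything from the experiments: a four-state corridor, a small fixed "quit" reward available at every step, tabular Q-learning with a greedy policy as the model-free learner, and a backward-induction planner as the model-based one. The planner is also assumed to be told the new reward directly. Both agents start fully trained on the old reward, and then the reward at the end of the corridor drops to zero.

```python
# Toy corridor: states 0..N_STATES-1. ADVANCE moves one step toward the end;
# advancing from the last state ends the episode with the goal reward.
# QUIT ends the episode immediately with a small fixed reward.
# All names and constants here are hypothetical, chosen for illustration.
ADVANCE, QUIT = 0, 1
N_STATES = 4
GAMMA = 0.9        # discount factor
ALPHA = 0.5        # Q-learning step size
QUIT_REWARD = 0.2


def plan(goal_reward):
    """Model-based: exact Q-values by backward induction over a known model."""
    q = [[0.0, QUIT_REWARD] for _ in range(N_STATES)]
    for s in reversed(range(N_STATES)):
        if s == N_STATES - 1:
            q[s][ADVANCE] = goal_reward
        else:
            q[s][ADVANCE] = GAMMA * max(q[s + 1])
    return q


def greedy(q, s):
    """Pick the action with the higher Q-value in state s."""
    return ADVANCE if q[s][ADVANCE] > q[s][QUIT] else QUIT


def run_episode(q, goal_reward):
    """Model-free: one greedy episode with in-place Q-learning updates."""
    s = 0
    while True:
        a = greedy(q, s)
        if a == QUIT:
            target, done = QUIT_REWARD, True
        elif s == N_STATES - 1:
            target, done = goal_reward, True
        else:
            target, done = GAMMA * max(q[s + 1]), False
        q[s][a] += ALPHA * (target - q[s][a])
        if done:
            return
        s += 1


def episodes_until_switch(q, goal_reward, max_episodes=1000):
    """Count episodes of experience before the first decision changes to QUIT."""
    for episode in range(max_episodes):
        if greedy(q, 0) == QUIT:
            return episode
        run_episode(q, goal_reward)
    return max_episodes


# Both agents start out fully trained on the old reward (goal worth 1.0).
trained = plan(goal_reward=1.0)

# The goal reward drops to 0. The planner re-derives its values at once...
model_based_first_action = greedy(plan(goal_reward=0.0), 0)

# ...while the model-free learner has to re-experience the maze.
model_free_episodes = episodes_until_switch(
    [row[:] for row in trained], goal_reward=0.0
)

print("model-based switches immediately:", model_based_first_action == QUIT)
print("model-free episodes before switching:", model_free_episodes)
```

In this sketch the model-free learner keeps walking to the end of the corridor for several episodes, because the bad news only propagates back one state at a time, while the planner's first decision changes with no new experience at all.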
Thanks for your comment! I don’t exactly agree with it, mostly because I think “model-based” and “model-free” are big tents that include lots of different things (to make a long story short). But it’s a moot point anyway because after writing this I came to believe that the brain is in fact using an algorithm that’s spiritually similar to what I was talking about in this post.