Yeah, I haven’t read any of these references, but I’ll elaborate on why I’m currently very skeptical that “model-free” vs “model-based” is a fundamental difference.
I’ll start with an example unrelated to motivation, to take it one step at a time.
Imagine that, every few hours, your whole field of vision turns bright blue for a couple seconds, then turns yellow, then goes back to normal. You have no idea why. But pretty soon, every time your field of vision turns blue, you’ll start expecting it to then turn yellow within a couple seconds. This expectation is completely divorced from everything else you know, since you have no idea why it’s happening, and indeed all your understanding of the world says that this shouldn’t be happening.
Now maybe there’s a temptation here to say that the expectation of yellow is model-free pattern recognition, and to contrast it with model-based pattern recognition, which would be something like expecting a chess master to beat a beginner, which is a pattern that you can only grasp using your rich contextual knowledge of the world.
But I would not draw that contrast. I would say that the kind of pattern recognition that makes us expect to see yellow after blue just from direct experience without understanding why, is exactly the same kind of pattern recognition that originally built up our entire world-model from scratch, and which continues to modify it throughout our lives.
For example, to a 1-year-old, the fact that the word sequence “1 2 3 4...” is usually followed by “5” is just an arbitrary pattern, a memorized sequence of sounds. But over time we learn other patterns, like seeing two things while someone says “two”, and we build connections between all these different patterns, and wind up with a rich web of memorized patterns that comprises our entire world-model.
Different bits of knowledge can be more or less integrated into this web. “I see yellow after blue, and I have no idea why” would be an extreme example—an island of knowledge isolated from everything else we know. But it’s a spectrum. For example, take everyone on Earth who knows the phrase “E=mc²”. There’s a continuum, from people who treat it as a memorized sequence of meaningless sounds in the same category as “yabba dabba doo”, to people who know that the E stands for energy but nothing else, to physics students who kinda get it, all the way to professional physicists who find E=mc² to be perfectly obvious and inevitable and then try to explain it on Quora because I guess I had nothing better to do on New Years Day 2014… :-)
So, I think model-based and model-free is not a fundamental distinction. But I do think that with different ways of acquiring knowledge, there are systematic trends in prediction strength, with first-hand experience leading to much stronger predictions than less-direct inferences. If I have repeated direct experience of my whole field of vision filling with yellow after blue, that will develop into a very very strong (confident) prediction. After enough times seeing blue-then-yellow, if I see blue-then-green I might literally jump out of my seat and scream!!

By contrast, the kind of expectation that we arrive at indirectly via our world model tends to be a weaker prediction. If I see a chess master lose to a beginner, I’ll be surprised, but I won’t jump out of my seat and scream. Of course that’s appropriate: I only predicted the chess master would win via a long chain of uncertain probabilistic inferences, like “the master was trying to win”, “nobody cheated”, “the master was sober”, “chess is not the kind of game where you can win just by getting lucky”, etc. So it’s appropriate for me to be predicting the win with less confidence.

As yet a third example, let’s say a professional chess commentator is watching the same match, in the context of a proper tournament. The commentator actually might jump out of her chair and scream when the master loses! For her, the sight of masters crushing beginners is something that she has repeatedly and directly experienced. Thus her prediction is much stronger than mine. (I’m not really into chess.)
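To make the prediction-strength point concrete, here’s a toy numerical sketch (my own illustration; all the probabilities are made up): a direct-experience prediction behaves like a posterior concentrated by many consistent observations, while a world-model prediction behaves like a product over a chain of uncertain assumptions, so it can never be stronger than its weakest links.

```python
from math import prod

# Direct experience: Beta(1, 1) prior updated by 50 blue-then-yellow
# observations and 0 counterexamples.
successes, failures = 50, 0
alpha, beta = 1 + successes, 1 + failures
p_direct = alpha / (alpha + beta)  # posterior mean ≈ 0.98

# Indirect inference: the "master beats beginner" prediction holds only
# if every link in a chain of uncertain assumptions holds (numbers invented).
chain = {
    "the master is trying to win": 0.95,
    "nobody cheated": 0.98,
    "the master is sober": 0.95,
    "chess isn't winnable by luck": 0.90,
    "the skill gap decides the game": 0.90,
}
p_indirect = prod(chain.values())  # ≈ 0.72

print(f"direct-experience prediction: {p_direct:.2f}")
print(f"chained-inference prediction: {p_indirect:.2f}")
```

The blue-then-yellow surprise (seeing green instead) corresponds to an event the observer assigned ~2% probability; the chess upset corresponds to one assigned ~28%, which is why only the first warrants jumping out of your seat.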
All this is about perception, not motivation. Now, back to motivation. I think we are motivated to do things in proportion to our prediction of the associated reward.
I think we learn to predict reward in much the same way that we learn to predict anything else. So it’s the same idea. Some reward predictions will be from direct experience, and not necessarily well-integrated with the rest of our world-model: “Don’t know why, but it feels good when I do X”. It’s tempting to call these “model-free”. Other reward predictions will be more indirect, mediated by our understanding of how some plan will unfold. The latter will tend to be weaker reward predictions in general (as is appropriate, since they rely on a longer chain of uncertain inferences), and hence they tend to be less motivating. It’s tempting to call these “model-based”. But I don’t think it’s a fundamental or sharp distinction. Even if you say “it feels good when I do X”, we have to use our world-model to construct the category X and classify things as X or not-X. Conversely, if you make a plan expecting good results, you implicitly have some abstract category of “plans of this type”, and you do have previous direct experience of rewards coming from the objects in this abstract category.
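For reference, here is roughly how the reinforcement-learning literature formalizes the two terms, in a toy sketch of my own (not taken from any particular paper): a “model-free” learner caches values directly from experienced outcomes, while a “model-based” learner plans over a model of the environment. On a tiny deterministic chain, the two end up making the same value predictions, which fits the “same underlying thing” intuition.

```python
# Toy deterministic chain: state 0 -> 1 -> 2 (terminal), reward 1 for
# leaving state 1. GAMMA discounts future reward.
GAMMA = 0.9
TRANSITIONS = {0: 1, 1: 2}   # next state for the only action
REWARDS = {0: 0.0, 1: 1.0}   # reward received on leaving each state

# --- "Model-free": cache values directly from experienced outcomes ---
V_free = {0: 0.0, 1: 0.0, 2: 0.0}
lr = 0.1
for _ in range(2000):
    s = 0
    while s != 2:
        s2, r = TRANSITIONS[s], REWARDS[s]
        V_free[s] += lr * (r + GAMMA * V_free[s2] - V_free[s])  # TD(0) update
        s = s2

# --- "Model-based": plan (value-iterate) over the transition/reward model ---
V_model = {0: 0.0, 1: 0.0, 2: 0.0}
for _ in range(100):
    for s in (0, 1):
        V_model[s] = REWARDS[s] + GAMMA * V_model[TRANSITIONS[s]]

print(V_free)   # ≈ {0: 0.9, 1: 1.0, 2: 0.0}
print(V_model)  # ≈ {0: 0.9, 1: 1.0, 2: 0.0}
```

The model here is handed to the planner for brevity; in a real model-based agent it would itself be learned from the same stream of experience, which is part of why the boundary blurs.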
Again, this is just my current take without having read the literature :-D
(Update 6 months later: I have read more of the relevant literature since writing this, but basically stand by what I said here.)
This reminds me of my discussion with johnswentworth, where I was the one arguing that model-free vs. model-based is a sliding scale. :)
So yes, it seems reasonable to me that these might be best understood as extreme ends of a spectrum… which was part of the reason why I copied that excerpt, as it included the concluding sentence of “‘Model-based’ and ‘model-free’ modes of valuation and response, if this is correct, simply name extremes along a single continuum and may appear in many mixtures and combinations determined by the task at hand” at the end. :)
I am not the party that used the terms, but to me “blue then yellow” reads as a very simple model, and as model-based thinking.
The part saying “we have to use our world-model to construct the category X and classify things as X or not-X” reads to me as if you do not think that model-free thinking is possible.
You can be in a situation where something in it elicits you to respond in a way Y without you being aware of what condition makes that experience fall within the triggering reference class. Now, if you know you have such a reaction, you can experimentally carry out an inductive investigation by carefully varying the environment and checking whether you react or not. Then you might reverse-engineer the reflex and end up with a model of how the reflex works.
The question of the ineffability of neural networks might be relevant. If a neural network makes a mistake and tries to avoid that mistake in the future, a lot of weights are adjusted, none of which is easily expressible as taking a different action in some discrete situation. Even a simple model like “blue” seems to point to a set of criteria by which you could rule whether a novel experience falls within the purview of the model or not. But if you have an ill-defined or fuzzily defined “this kind of situation”, that is a completely different thing.
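That point about weight adjustments can be made concrete with a minimal sketch (my own illustration, using a linear model and invented numbers): a single error signal nudges every weight a little, and no individual weight change corresponds to anything like “do a different action in situation S”.

```python
# Linear "network" y = w1*x1 + w2*x2, squared error, one SGD step.
w = [0.5, -0.3]
x = [1.0, 2.0]
target = 1.0
lr = 0.1

y = w[0] * x[0] + w[1] * x[1]  # prediction ≈ -0.1, far from the target
err = y - target               # error ≈ -1.1

# Gradient step: every weight moves, in proportion to its input.
w = [wi - lr * err * xi for wi, xi in zip(w, x)]

print([round(wi, 2) for wi in w])  # [0.61, -0.08] -- both weights nudged
```

Neither of the two small nudges is a legible rule edit on its own; the “avoidance of the mistake” only exists as the joint effect of all of them, which is the ineffability being described.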