The thought of having clean, healthy teeth is attractive.
We make the decision by weighing these against each other, I think. You are categorizing the former as a craving and the latter as motivation-that-is-not-craving (right?).
I mostly used examples of aversion in the post, but to be clear, both desire and aversion can be forms of craving. As I noted in another comment, basically any goal can be either craving-based, non-craving-based, or (typically) a mixture of both.
After reading this and lukeprog’s post you referenced, I’m still not convinced that there is fundamentally more than one motivational system—although I don’t have high confidence and still want to chase down the references. [...] I’m just suggesting that maybe craving vs. motivations-that-are-not-craving is a difference of degree, not kind.
Possible; subjectively they feel like differences in kind, but of course subjective experience is not strong evidence for how something is implemented neurally. Large enough quantitative differences can produce effects that feel like qualitative differences.
I wonder about the connection to the referenced motivational systems; based on a superficial description (e.g., the excerpt below from Surfing Uncertainty), it kinda sounds like the model-free motivational system in neuroscience could be craving, and the model-based system non-craving. (Or maybe not, since it’s suggested that model-free would involve more bottom-up influences, which sounds contrary to craving; I’m confused by that.) The discussion of how the brain learns which system to use in which situation would be compatible with the model where one can gradually unlearn craving using various methods (I’ll get to that in a later post). But I would need to look into this more.
To see how this might work in practice, it helps to start with some examples from a different (but in fact quite closely related) literature. This is the extensive literature concerning choice and decision-making. Within that literature, it is common to distinguish between ‘model-based’ and ‘model-free’ approaches (see, e.g., Dayan, 2012; Dayan & Daw, 2008; Wolpert, Doya, & Kawato, 2003). Model-based strategies rely, as the name suggests, on a model of the domain that includes information about how various states (worldly situations) are connected, thus allowing a kind of principled estimation (given some cost function) of the value of a putative action. Such approaches involve the acquisition and the (computationally challenging) deployment of fairly rich bodies of information concerning the structure of the task-domain. Model-free strategies, by contrast, are said to ‘learn action values directly, by trial and error, without building an explicit model of the environment, and thus retain no explicit estimate of the probabilities that govern state transitions’ (Gläscher et al., 2010, p. 585). Such approaches implement pre-computed ‘policies’ that associate actions directly with rewards, and that typically exploit simple cues and regularities while nonetheless delivering fluent, often rapid, response.
Model-free learning has been associated with a ‘habitual’ system for the automatic control of choice and action, whose neural underpinnings include the midbrain dopamine system and its projections to the striatum, while model-based learning has been more closely associated with the action of cortical (parietal and frontal) regions (see Gläscher et al., 2010). Learning in these systems has been thought to be driven by different forms of prediction error signal—affectively salient ‘reward prediction error’ (see, e.g., Hollerman & Schultz, 1998; Montague et al., 1996; Schultz, 1999; Schultz et al., 1997) for the model-free case, and more affectively neutral ‘state prediction error’ (e.g., in ventromedial prefrontal cortex) for the model-based case. These relatively crude distinctions are, however, now giving way to a much more integrated story (see, e.g., Daw et al., 2011; Gershman & Daw, 2012) as we shall see.
How should we conceive the relations between PP and such ‘model-free’ learning? One interesting possibility is that an onboard process of reliability estimation might select strategies according to context. If we suppose that there exist multiple, competing neural resources capable of addressing some current problem, there needs to be some mechanism that arbitrates between them. With this in mind, Daw et al. (2005) describe a broadly Bayesian ‘principle of arbitration’ whereby estimations of the relative uncertainty associated with distinct ‘neural controllers’ (e.g., ‘model-based’ versus ‘model-free’ controllers) allows the most accurate controller, in the current circumstances, to determine action and choice. Within the PP framework this would be implemented using the familiar mechanisms of precision estimation and precision-weighting. Each resource would compute a course of action, but only the most reliable resource (the one associated with the least uncertainty when deployed in the current context) would get to determine high-precision prediction errors of the kind needed to drive action and choice. In other words, a kind of meta-model (one rich in precision expectations) would be used to determine and deploy whatever resource is best in the current situation, toggling between them when the need arises.
Such a story is, however, almost certainly over-simplistic. Granted, the ‘model-based / model-free’ distinction is intuitive and resonates with old (but increasingly discredited) dichotomies between habit and reason, and between emotion and analytic evaluation. But it seems likely that the image of parallel, functionally independent, neural sub-systems will not stand the test of time. For example, a recent fMRI study (Daw, Gershman, et al., 2011) suggests that rather than thinking in terms of distinct (functionally isolated) model-based and model-free learning systems, we may need to posit a single ‘more integrated computational architecture’ (p. 1204) in which the different brain areas most commonly associated with model-based and model-free learning (pre-frontal cortex and dorsolateral striatum, respectively) each trade in both model-free and model-based modes of evaluations and do so ‘in proportions matching those that determine choice behavior’ (p. 1209). One way to think about this, from within the PP perspective, is by associating ‘model-free’ responses with processing dominated (‘bottom-up’) by the sensory flow, while ‘model-based’ responses are those that involve greater and more widespread kinds of ‘top-down’ influence. The context-dependent balancing between these two sources of information, achieved by adjusting the precision-weighting of prediction error, then allows for whatever admixtures of strategy task and circumstances dictate. Support for this notion of a more integrated inner economy was provided by a decision task (Daw, Gershman et al., 2011) in which experimenters were able to distinguish between apparently model-based and apparently model-free influences on subsequent choice and action. This is possible because model-free response is inherently backwards-looking, associating specific actions with previously encountered rewards.
Animals exhibiting only model-free responses are, in that sense, condemned to repeat the past, releasing previously reinforced actions when circumstances dictate. A model-based system, by contrast, is able to evaluate potential actions using (as the name suggests) some kind of inner surrogate of the external arena in which actions are to be performed and choices made—such systems may, for example, deploy mental simulations to determine whether or not one action is to be preferred over another. Animals that deploy a model-based system are thus able, in the terms of Seligman et al. (2013), to ‘navigate into the future’ rather than remaining ‘driven by the past’.
Most animals, it now seems clear, are capable of both forms of response and combine dense enabling webs of habit with sporadic bursts of genuine prospection. According to the standard picture, recall, there exist distinct neural valuation systems and distinct forms of prediction error signal supporting each type of learning and response. Using a sequential choice task, Daw et al. were able to create conditions under which the computations of one or other of these neural valuation systems should dissociate from behaviour, revealing the presence of independent computations (in different, previously identified, brain areas) of value by a model-free and a model-based system. Instead they found neural correlates of apparently model-free and apparently model-based responses in both areas. Strikingly, this means that even striatally computed ‘reward prediction errors’ do not simply reflect learning using a truly model-free system. Instead, recorded activity in the striatum ‘reflected a mixture of model-free and model-based evaluations’ (Daw et al., 2011, p. 1209) and ‘even the signal most associated with model-free RL [reinforcement learning], the striatal RPE [reward prediction error], reflects both types of valuation, combined in a way that matches their observed contributions to choice behavior’ (Daw et al., 2011, p. 1210). Top-down information, Daw et al. (2011) suggest, might here control the way different strategies are combined in differing contexts for action and choice. Greater integration between model-based and model-free valuations might also, they speculate, flow from the action of some kind of hybrid learning routine in which a model-based resource may train and tune the responses of a (quicker, in context more efficient) model-free resource.
At a more general level, such results add to a growing literature (for a review, see Gershman & Daw, 2012) that suggests the need for a deep reworking of the standard decision-theoretic model. Where that model posits distinct representations of utility and probability, associated with the activity of more-or-less independent neural sub-systems, we may actually confront a more deeply integrated architecture in which ‘perception, action, and utility are ensnared in a tangled skein [involving] a richer ensemble of dynamical interactions between perceptual and motivational systems’ (Gershman & Daw, 2012, p. 308). The larger picture scouted in this section here makes good functional sense, allowing ‘model-free’ modes to use model-based schemes to teach them how to respond. Within the PP framework, this results in a hierarchical embedding of the (shallow) ‘model-free’ responses in a (deeper) model-based economy. This has many advantages, since model-based schemes are (chapter 5 above) profoundly context-sensitive, whereas model-free or habitual schemes—once in place—are fixed, bound to the details of previous contexts of successful action. By delicately combining the two modes within an overarching economy, adaptive agents may identify the appropriate contexts in which to deploy the model-free (‘habitual’) schemes. ‘Model-based’ and ‘model-free’ modes of valuation and response, if this is correct, simply name extremes along a single continuum and may appear in many mixtures and combinations determined by the task at hand.
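To make the excerpt’s distinctions concrete, here is a toy sketch (entirely my own illustration, not from the book or the cited papers; the environment, numbers, and the precision-weighted combination rule are all assumptions, loosely inspired by the hybrid account described above):

```python
import random

random.seed(0)  # make the trial-and-error phase reproducible

# Toy world: from state 0, action "a" leads to a rewarding state, "b" doesn't.
TRANSITIONS = {(0, "a"): 1, (0, "b"): 2}
REWARDS = {1: 1.0, 2: 0.0}

# --- 'Model-free': cache action values directly via reward prediction errors ---
q = {(0, "a"): 0.0, (0, "b"): 0.0}
ALPHA = 0.5  # learning rate
for _ in range(50):
    a = random.choice(["a", "b"])         # explore by trial and error
    reward = REWARDS[TRANSITIONS[(0, a)]]
    rpe = reward - q[(0, a)]              # reward prediction error
    q[(0, a)] += ALPHA * rpe              # no explicit world-model retained

# --- 'Model-based': evaluate an action by simulating it on an explicit model ---
def model_based_value(state, action):
    next_state = TRANSITIONS[(state, action)]  # inner surrogate of the world
    return REWARDS[next_state]

# --- Arbitration: precision-weighted mixture of the two valuations ---
def combined_value(state, action, var_mb, var_mf):
    """Weight each controller by its reliability (precision = 1/variance).

    A sketch of the 'hybrid' idea: both systems contribute in proportion
    to their current reliability, rather than winner-take-all.
    """
    p_mb, p_mf = 1.0 / var_mb, 1.0 / var_mf
    w = p_mb / (p_mb + p_mf)              # weight on the model-based estimate
    return w * model_based_value(state, action) + (1.0 - w) * q[(state, action)]

# Early in a novel task the model-based estimate is more reliable (low variance);
# after over-training, the habit system's variance shrinks and it dominates.
v_early = combined_value(0, "a", var_mb=0.1, var_mf=0.4)
v_late = combined_value(0, "a", var_mb=0.1, var_mf=0.01)
```

On this sketch, sliding the two variances smoothly shifts control between the cached (‘habitual’) values and the simulated ones, which is one way to read the claim that the two modes “name extremes along a single continuum”.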
Yeah, I haven’t read any of these references, but I’ll elaborate on why I’m currently very skeptical that “model-free” vs “model-based” is a fundamental difference.
I’ll start with an example unrelated to motivation, to take it one step at a time.
Imagine that, every few hours, your whole field of vision turns bright blue for a couple seconds, then turns yellow, then goes back to normal. You have no idea why. But pretty soon, every time your field of vision turns blue, you’ll start expecting it to then turn yellow within a couple seconds. This expectation is completely divorced from everything else you know, since you have no idea why it’s happening, and indeed all your understanding of the world says that this shouldn’t be happening.
Now maybe there’s a temptation here to say that the expectation of yellow is model-free pattern recognition, and to contrast it with model-based pattern recognition, which would be something like expecting a chess master to beat a beginner, which is a pattern that you can only grasp using your rich contextual knowledge of the world.
But I would not draw that contrast. I would say that the kind of pattern recognition that makes us expect to see yellow after blue just from direct experience without understanding why, is exactly the same kind of pattern recognition that originally built up our entire world-model from scratch, and which continues to modify it throughout our lives.
For example, to a 1-year-old, the fact that the phrase “1 2 3 4...” is usually followed by “5” is just an arbitrary pattern, a memorized sequence of sounds. But over time we learn other patterns, like seeing two things while someone says “two”, and we build connections between all these different patterns, and wind up with a rich web of memorized patterns that comprises our entire world-model.
Different bits of knowledge can be more or less integrated into this web. “I see yellow after blue, and I have no idea why” would be an extreme example—an island of knowledge isolated from everything else we know. But it’s a spectrum. For example, take everyone on Earth who knows the phrase “E=mc²”. There’s a continuum, from people who treat it as a memorized sequence of meaningless sounds in the same category as “yabba dabba doo”, to people who know that the E stands for energy but nothing else, to physics students who kinda get it, all the way to professional physicists who find E=mc² to be perfectly obvious and inevitable and then try to explain it on Quora because I guess I had nothing better to do on New Year’s Day 2014… :-)
So, I think model-based and model-free is not a fundamental distinction. But I do think that with different ways of acquiring knowledge, there are systematic trends in prediction strength, with first-hand experience leading to much stronger predictions than less-direct inferences. If I have repeated direct experience of my whole field of vision filling with yellow after blue, that will develop into a very very strong (confident) prediction. After enough times seeing blue-then-yellow, if I see blue-then-green I might literally jump out of my seat and scream!! By contrast, the kind of expectation that we arrive at indirectly via our world model tends to be a weaker prediction. If I see a chess master lose to a beginner, I’ll be surprised, but I won’t jump out of my seat and scream. Of course that’s appropriate: I only predicted the chess master would win via a long chain of uncertain probabilistic inferences, like “the master was trying to win”, “nobody cheated”, “the master was sober”, “chess is not the kind of game where you can win just by getting lucky”, etc. So it’s appropriate for me to be predicting the win with less confidence. As yet a third example, let’s say a professional chess commentator is watching the same match, in the context of a proper tournament. The commentator actually might jump out of her chair and scream when the master loses! For her, the sight of masters crushing beginners is something that she has repeatedly and directly experienced. Thus her prediction is much stronger than mine. (I’m not really into chess.)
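The asymmetry described above can be put in made-up numbers: a prediction reached through a chain of uncertain inferences can be at most as strong as the product of its links, whereas massed direct experience behaves like a single, very strong link. (The probabilities below are purely illustrative, and the links are treated as independent and jointly sufficient for simplicity.)

```python
# Confidence in "the master wins" reached indirectly, via a chain of
# individually plausible but uncertain assumptions (numbers made up):
links = {
    "the master was trying to win": 0.98,
    "nobody cheated": 0.99,
    "the master was sober": 0.97,
    "chess is not winnable by pure luck": 0.99,
}
p_indirect = 1.0
for p in links.values():
    p_indirect *= p  # every uncertain link weakens the overall prediction

# The commentator's confidence, built from massed direct experience of
# masters crushing beginners, acts like one very strong link:
p_direct = 0.999

print(round(p_indirect, 3))  # → 0.932, noticeably below the direct estimate
```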
All this is about perception, not motivation. Now, back to motivation. I think we are motivated to do things proportionally to our prediction of the associated reward.
I think we learn to predict reward in a similar way that we learn to predict anything else. So it’s the same idea. Some reward predictions will be from direct experience, and not necessarily well-integrated with the rest of our world-model: “Don’t know why, but it feels good when I do X”. It’s tempting to call these “model-free”. Other reward predictions will be more indirect, mediated by our understanding of how some plan will unfold. The latter will tend to be weaker reward predictions in general (as is appropriate since they rely on a longer chain of uncertain inferences), and hence they tend to be less motivating. It’s tempting to call these “model-based”. But I don’t think it’s a fundamental or sharp distinction. Even if you say “it feels good when I do X”, we have to use our world-model to construct the category X and classify things as X or not-X. Conversely, if you make a plan expecting good results, you implicitly have some abstract category of “plans of this type” and you do have previous direct experience of rewards coming from the objects in this abstract category.
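A minimal sketch of this picture (my own toy formalization, not anything from the literature): treat motivation as predicted reward scaled by confidence in that prediction, with plan-mediated predictions inheriting extra uncertainty from each inferential step.

```python
def motivation(predicted_reward, confidence):
    """Toy assumption: motivation = predicted reward x prediction confidence."""
    return predicted_reward * confidence

# "Don't know why, but it feels good when I do X": a reward prediction
# built from repeated direct experience -- high confidence, tempting to
# label 'model-free'.
direct = motivation(predicted_reward=1.0, confidence=0.95)

# The same eventual reward anticipated via a multi-step plan: each step
# ("the plan will unfold as imagined", "I'll actually follow through", ...)
# multiplies in its own uncertainty -- tempting to label 'model-based'.
plan_confidence = 1.0
for step_confidence in [0.9, 0.85, 0.9]:  # illustrative numbers
    plan_confidence *= step_confidence
indirect = motivation(predicted_reward=1.0, confidence=plan_confidence)

# Same formula in both cases: the difference is one of degree (how many
# uncertain links feed the prediction), not of kind.
```

On this sketch, the “model-free” and “model-based” labels just pick out the two ends of how much inferential machinery sits between experience and the reward prediction.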
Again, this is just my current take without having read the literature :-D
(Update 6 months later: I have read more of the relevant literature since writing this, but basically stand by what I said here.)
This reminds me of my discussion with johnswentworth, where I was the one arguing that model-free vs. model-based is a sliding scale. :)
So yes, it seems reasonable to me that these might be best understood as extreme ends of a spectrum… which was part of the reason why I copied that excerpt: it concludes with the sentence “‘Model-based’ and ‘model-free’ modes of valuation and response, if this is correct, simply name extremes along a single continuum and may appear in many mixtures and combinations determined by the task at hand.” :)
I am not the party that used the terms, but to me “blue then yellow” reads as a very simple model, and hence as model-based thinking.
The part where you say “we have to use our world-model to construct the category X and classify things as X or not-X” reads to me as though you think model-free thinking is not possible at all.
You can be in a situation where something elicits a response Y without your being aware of what condition makes that experience fall within the triggering reference class. If you know you have such a reaction, you can investigate it inductively by experiment: carefully vary the environment and check whether you react or not. You might then reverse-engineer the reflex and end up with a model of how it works.
The ineffability of neural networks might be relevant here. When a neural network makes a mistake and adjusts to avoid repeating it, a lot of weights change, and none of that is easily expressible as “do a different action in this discrete situation”. Even a simple model like “blue” seems to pick out a set of criteria by which you could judge whether a novel experience falls within the model’s purview. But an ill-defined or fuzzy “this kind of situation” is a completely different thing.