Do Humans Want Things?
Summary: Recent posts like The Neuroscience of Desire and To what degree do we have goals? have explored the question of whether humans have desires (or ‘goals’). If we don’t have desires, how can we tell an AI what kind of world we ‘want’? Recent work in economics and neuroscience has clarified the nature of this problem.
We begin, as is so often the case on Less Wrong, with Kahneman & Tversky.
In 1981, K&T found that human choice was not always guided by the objective value of possible outcomes, but by the way those outcomes were ‘framed’.[1] For example, in one study, K&T told subjects the following story:
Imagine that the U.S. is preparing for the outbreak of an unusual Asian disease, which is expected to kill 600 people. Two alternative programs to combat the disease have been proposed.
Half the participants were given the following choice:
If Program A is adopted, 200 people will be saved. If Program B is adopted, there is a 1⁄3 probability that 600 people will be saved and a 2⁄3 probability that no people will be saved.
The second half of participants were given a different choice:
If Program C is adopted 400 people will die. If Program D is adopted there is a 1⁄3 probability that nobody will die, and a 2⁄3 probability that 600 people will die.
Each of these choice sets is identical, except that one is framed with language about people being saved, and the other is framed with language about people dying.
In the first group, 72% of subjects chose Program A. In the second group, only 22% of people chose the numerically identical option: Program C.
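The claim that the two choice sets are numerically identical can be checked with a little arithmetic. The sketch below is only an illustration: the `programs` table and the `expected_survivors` helper are names I made up, and each program is restated as a list of (probability, survivors) pairs out of the 600 people at risk.

```python
from fractions import Fraction

TOTAL = 600  # people expected to die if nothing is done

# Each program is a list of (probability, number of survivors).
# The "die" framing is converted to survivors as TOTAL - deaths.
programs = {
    "A": [(Fraction(1), 200)],                          # "200 people will be saved"
    "B": [(Fraction(1, 3), 600), (Fraction(2, 3), 0)],  # gamble, framed as saving
    "C": [(Fraction(1), TOTAL - 400)],                  # "400 people will die"
    "D": [(Fraction(1, 3), TOTAL - 0),                  # gamble, framed as dying
          (Fraction(2, 3), TOTAL - 600)],
}

def expected_survivors(outcomes):
    """Probability-weighted number of survivors for one program."""
    return sum(p * n for p, n in outcomes)

for name, outcomes in programs.items():
    print(name, outcomes, "expected survivors:", expected_survivors(outcomes))
```

Once both framings are written in the same units, A and C are the same outcome, B and D are the same gamble, and all four have the same expected number of survivors. Only the wording differs.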
K&T explained the difference by noting that in option A we consider the happy thought of saving 200 people, but in option C we confront the dreadful thought of 400 deaths. Our choice seems to depend not only on the objective properties of the options before us, but also on the reference point used to frame the options.
But if this is how human desire works, we are left with a worrying problem about how to translate human desires into the goals of an AI. Surely we don’t want an AI to realize one state of affairs over another based merely on how the options are framed!
Before we begin to solve this problem, though, let’s look at a similar result from neurobiology.
Reference-Dependence in Neurobiology
A different kind of reference-dependence has been discovered in the way that neurons encode value.
Imagine sitting in a windowless room with Mark, who is wearing blue jeans and a green t-shirt. Your perception of Mark results from about 10^17 photons/second with a mean wavelength of 450 nanometers coming from every square centimeter of Mark’s blue jeans, and about 10^17 photons/second with a mean wavelength of 550 nanometers coming from every square centimeter of his green shirt.
Now, you and Mark step outside, and are briefly blinded by the sun. A minute later you sit on a park bench. Mark looks the same as before: blue jeans, green shirt. But now, in the bright sun, your identical subjective perceptual experience of Mark results from about 10^23 450-nm photons/second/cm^2 coming from his blue jeans, and about 10^23 550-nm photons/second/cm^2 coming off his green shirt.
A six-order-of-magnitude shift in the objective reality of the stimulus has resulted in no change in your subjective experience of Mark.[2]
How did this happen?
What changed was the illuminant, the sun. But for Earth-bound mammals, changes in an object millions of miles away are not very important. What matters for our survival and reproduction is information about the objects immediately around us. So our brains subtract away the changing effects of the sun as we move in and out of direct sunlight.
This ‘illuminant subtraction’ process occurs during the first step of visual processing, during transduction. The rods and cones of the retina compute an average of local light intensity, which is used as a reference point.[3] Changes of light intensity from this reference point are what the rods and cones communicate to the rest of the nervous system.
Thus: information about the objective intensity of incoming light is irretrievably lost at the transducer. Light intensity is stored in the brain only in a reference-dependent way.
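A toy model may make the information loss concrete. Everything here is an assumption for illustration, not photoreceptor biophysics: the `encode` function, the three hypothetical patch brightnesses, and the choice to take the reference as the average in log units (so that ‘subtracting’ the reference amounts to dividing out the illuminant).

```python
import math

def encode(intensities):
    """Toy transducer: report each patch relative to the local average.

    Works in log10 units, so the output depends only on ratios between
    patches, not on the absolute photon counts.
    """
    logs = [math.log10(x) for x in intensities]
    reference = sum(logs) / len(logs)  # local reference point
    return [v - reference for v in logs]

# Hypothetical relative brightness of three patches in the scene.
patches = [0.5, 1.0, 2.0]

# Same scene under the two illuminant levels from the example above
# (photons/second/cm^2).
indoors = [p * 1e17 for p in patches]
sunlight = [p * 1e23 for p in patches]

signal_indoors = encode(indoors)
signal_sunlight = encode(sunlight)

# The downstream signal is the same in both cases (up to rounding error).
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(signal_indoors, signal_sunlight)
)
print(signal_indoors)
print(signal_sunlight)
print("signals match:", same)
```

Two inputs that differ by six orders of magnitude produce the same output, so nothing downstream of `encode` can tell them apart. In this model the absolute intensity is not merely ignored; it cannot be recovered from the signal.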
The same is true of our other senses. Sound intensity can differ between a quiet room and a rock concert by as much as 10 orders of magnitude,[4] and our ears respond by shifting the reference point and encoding sound intensity relative to that reference point.[5] A rose may smell sweet in my bedroom, but its scent will be hidden in a field of roses.[6] The somatosensory system appears to operate on the same principle. You feel your clothes when you first put them on, but the nerve endings in your skin stop reporting their existence except where your clothes are shifting across your skin or their pressure on your skin is changing.[7] And the same is true for taste. How salty something tastes, for example, depends on the amount of sodium in your blood and in surrounding tissue in your mouth.[8]
I wrote before about how neurons encode value. But now it seems that, as neuroscientist Paul Glimcher puts it:
All sensory encoding is reference dependent: nowhere in the nervous system are the objective values of consumable rewards encoded.[9]
Thus we smack headlong into another constraint for our theories about human values and their extrapolation. Human brains can’t (directly) encode value for the objective intensities of stimuli because that information is lost at the transducer.
It’s beginning to seem that our folk theories about humans ‘wanting’ things in the world were naive.
Do Humans Want Things?
It has traditionally been thought that humans desire (or value) states of affairs:
A desire for tea is a desire for a certain state of affairs one has in mind: that one drink some tea. A desire for a new pair of skates is likewise a desire for another state of affairs: that one own a new pair of skates. And so on.[10]
Intuitively, when we think about what we want, it seems that we want certain states of affairs to obtain. We want to be smarter. We want there to be world peace. We want to live forever while having fun.
But as far as we can tell, our behavior is often not determined by our wanting a particular state of affairs, but by how our options are framed.
Moreover, neurons in the parietal and orbitofrontal cortices encode value in a reference-dependent way; that is, they do not encode value for objective states of affairs.[11] So in what sense do humans ‘want’ objective states of affairs?
(Compare: In what sense does the blue-minimizing robot ‘want’ anything?)
In a later post, I’ll explain in greater detail how brains do (and don’t) encode value for states of affairs. In the meantime, you might want to try to figure out on your own in what sense the brain might want things.
Notes
1 Tversky & Kahneman (1981).
2 This example, and the outline of this post, is taken from Glimcher (2010), ch. 12.
3 Burns & Baylor (2001).
4 Baccus (2006); Robinson & McAlpine (2009).
5 Squire et al. (2008), ch. 26.
6 Mountcastle (2005); Squire et al. (2008), pp. 565-567.
7 Squire et al. (2008), ch. 25.
8 Squire et al. (2008), pp. 555-556.
9 Glimcher (2010), p. 278. Moreover, objective properties of the real world are not even linearly related to our subjective experience. The intensity of our perception of the world grows as a power law, the exact rate of which depends on the kind of stimulus (Stevens 1951, 1970, 1975). For example, we’ve found that:
Perceived warmth of a patch of skin = (temp. of that skin)^0.7
And, another example:
Perceived intensity of an electrical shock = (electrical current)^3.5
10 Schroeder (2009).
11 It’s less certain how values are encoded in the medial prefrontal cortex and in the temporal cortex, but Paul Glimcher predicts (in personal communication with me from June 2011) that this will also be a largely reference-dependent process.
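The power law in note 9 can be made concrete with a short sketch. The function names and the scaling constant `k` are my own additions (the note gives only the exponents), and the stimulus units are arbitrary.

```python
def perceived(stimulus, exponent, k=1.0):
    """Stevens-style power law: perceived magnitude = k * stimulus ** exponent.

    k is an assumed scaling constant; it cancels out of the ratios below.
    """
    return k * stimulus ** exponent

WARMTH_EXPONENT = 0.7  # exponent for skin warmth, from note 9
SHOCK_EXPONENT = 3.5   # exponent for electrical shock, from note 9

def doubling_factor(exponent):
    """How much the perceived magnitude grows when the stimulus doubles."""
    return perceived(2.0, exponent) / perceived(1.0, exponent)

print("warmth:", round(doubling_factor(WARMTH_EXPONENT), 2))
print("shock: ", round(doubling_factor(SHOCK_EXPONENT), 2))
```

With an exponent below 1, doubling the stimulus less than doubles the perceived magnitude. With an exponent of 3.5, doubling the current makes the shock feel more than ten times as intense. Neither relationship is linear.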
References
Baccus (2006). From a whisper to a roar: Adaptation to the mean and variance of naturalistic sounds. Neuron, 51: 682-684.
Burns & Baylor (2001). Activation, deactivation, and adaptation in vertebrate photoreceptor cells. Annual Review of Neuroscience, 24: 779-805.
Glimcher (2010). Foundations of Neuroeconomic Analysis. Oxford University Press.
Mountcastle (2005). The Sensory Hand: Neural Mechanisms of Somatic Sensation. Harvard University Press.
Robinson & McAlpine (2009). Gain control mechanisms in the auditory pathway. Current Opinion in Neurobiology, 19: 402-407.
Schroeder (2009). Desire. Stanford Encyclopedia of Philosophy.
Squire, Berg, Bloom, du Lac, & Ghosh, eds. (2008). Fundamental Neuroscience, Third Edition. Academic Press.
Stevens (1951). Handbook of Experimental Psychology, 1st edition. John Wiley & Sons.
Stevens (1970). Neural events and the psychophysical law. Science, 170: 1043-1050.
Stevens (1975). Psychophysics: Introduction to its Perceptual, Neural and Social Prospects. Wiley.
Tversky & Kahneman (1981). The framing of decisions and the psychology of choice. Science, 211: 453-458.
I’m not certain your examples of reference-dependent encoding of sense-data really demonstrate or have much to do with a lack of objective goals. (Of course, the framing effect example demonstrates this plenty well. :P ) As you point out, this is largely just adjusting for irrelevant background, like whether the sun is out, when what we care about has nothing to do with that. This is just throwing away the information at an early stage, rather than later after having explicitly determined that it’s irrelevant to our goals.
I agree that the framing effect is more important than the reference-dependence of sense-data encoding. However, the loss of sense-data is not always just “adjusting for irrelevant background”, and is not always throwing away something we would later have decided is “irrelevant to our goals.”
When I first read the post, I thought you were going to say something along the lines of:
“Evolution has optimized us to strip away the irrelevant features when it comes to vision, since it’s been vital for our survival. But evolution hasn’t done that for things like abstract value, since there’s been no selection pressure for that. It’s bad that our judgments in cases like the K&T examples don’t work more like vision, but that’s how it goes”.
Indeed, saying “let’s make the problem worse” and then bringing up vision feels a bit weird. After all, vision seems like a case where our brain does things exactly right: it ignores the “framing effects” caused by changed lighting conditions and leaves invariant the things that actually matter.
I wrote a response here.
An illuminating (no pun intended) example of when the adjustment to the ambient level of sense-data affects what people think they want would be nice. Without it the whole section seems to detract from your point.
I wrote a response here.
But I’m not raising a puzzle about how people think they want things even when they are behavioristic machines. I’m raising a puzzle about how we can be said to actually want things even when they are behavioristic machines that, for example, exhibit framing effects and can’t use neurons to encode value for the objective intensities of stimuli.
Suppose you have a neurological disorder that will be cured by a 140-volt electrical shock. If your brain can’t encode value for propositions or simulated states of affairs or anything like that, but only for stimuli, then this reference point business I described means that your brain doesn’t have the option of encoding value for a 140-volt electrical shock, because it never receives that kind of information in the first place. The transducer discards information about the objective intensity of the stimuli before the signal reaches the brain.
As Kaj says, this is a smart solution to lots of problems, but it does mean that the brain cannot encode value for the objective intensity of stimuli… at least given what I’ve explained in this post so far. (Model-based representations of value will be described later.)
Does that make sense?
If the neurological problem is located in the brain, then the brain does record information about the objective intensity of the stimuli, by being cured or not cured.
I’m confused about what the purpose of this example is. There are easier ways to show why not encoding values for propositions is problematic.
Sure, but what I’m saying is that this doesn’t happen in a way that allows your neurons to encode value for a 140-volt electrical shock. Perhaps you’ve already accepted this and find it obvious, but others (e.g. economists) do not. This kind of information about how the brain works constrains our models of human behavior, just like the stochasticity of neuron firing does.
But I’m not trying to show why encoding values for propositions is problematic. I’m trying to say that the brain does not encode values for objective intensities of stimuli.
Given that one could use propositions about objective intensities of stimuli (as you do now, to point out that what’s encoded in this particular simple way is not it), the thesis is still unclear.
Sure, but that depends on a different mechanism we don’t know much about, then. What I’m saying is that “Whaddyaknow, we discovered a mechanism that actually encodes value for stimuli with neuron firing rates! Ah, but it can’t encode value for the objective intensities of stimuli, because the brain doesn’t have that information. So that constrains our theories about the motivation of human behavior.”
The brain has (some measure of reference/access to) that information, just not in that particular form. And if it has (reference to) that information, it’s not possible to conclude that motivation doesn’t refer to it. It just doesn’t refer to it through exclusively the form of representation that doesn’t have the information, but then it would be very surprising if motivation compartmentalized so.
Right. I guess I’m struggling for a concise way to say what I’m trying to say, and hoping you’ll interpret me correctly based on the long paragraphs I’ve written explaining what I mean by these shorter sentences. Maybe something like:
“Whaddyaknow, we discovered a mechanism that actually encodes value for stimuli with neuron firing rates! Ah, but this particular mechanism can’t encode value for the objective intensities of stimuli, because this mechanism discards that information at the transducer. So that constrains our theories about the motivation of human behavior.”
Also, this doesn’t sound right. Why is that behavioral pattern “value”? Maybe it should be edited out of the system, like pain, or reversed, or modified in some complicated way.
Doesn’t really help. The problem is that (normative) motivation is the whole thing, particular (unusual) decisions can be formed by any component, so it’s unclear how to rule out stuff on the basis of properties of particular better-understood components.
Behavior is easier to analyze, you can see which factors contribute how much, and in this sense you can say that particular classes of behavior are determined mostly by this here mechanism that doesn’t have certain data, and so behavior is independent from that data. But such conclusions won’t generalize to normative motivation, because prevailing patterns of behavior might be suboptimal, and it’s possible to improve them (by exercising the less-prevalent modes of behavior that are less understood), making them depend on things that they presently don’t depend on.
What do you mean by ‘normative motivation’?
Considerations that should motivate you. What do you mean by “motivation”?
Uh oh. How did ‘should’ sneak its way into our discussion? I’m just talking about positive accounts of human motivation.
Until data give us a clearer picture of what we’re talking about, ‘motivation’ is whatever drives (apparently) goal-seeking behavior.
I guess the objection I have is to calling the behavioral summary “motivation”, a term that has normative connotations (similarly, “value”, “desire”, “wants”, etc.). Asking “Do we really want X?” (as in, does a positive account of some notion of “wanting” say that we “want” X, to the best of our scientific knowledge) sounds too similar to asking “Should we pursue X?” or even “Can we pursue X?”, but is a largely unrelated question with similarly unrelated answers.
I’m using these terms the way they are standardly used in the literature. If you object to the common usage, perhaps you could just read my articles with the assumption that I’m using these words the way neuroscientists and psychologists do, and then state your concerns about the standard language in the comments? I can’t rewrite my articles for each reader who has their own peculiar language preferences...
The real question is, do you agree with my characterization of the intended meaning of these intentionality-scented words (as used in particularly this article, say) as being mostly unrelated to normativity, that is to FAI-grade machine ethics? It is unclear to me if you agree or not. If there is some connection, what is it? It is also unclear to me how confusing or clear this question appears to other readers.
(On the other hand, who or what bears the blame for my (or others’) peculiar confusions is uninteresting.)
I don’t recall bringing up the issue of blame. All I’m saying is that I don’t have time to write a separate version of each post to accommodate each person’s language preferences, so I’m usually going to use the standard language used by the researchers in the field I’m discussing.
Words like ‘motivation’, ‘value’, ‘desire’, ‘want’ don’t have normative connotations in my head when I’m discussing them in the context of descriptivist neuroscience. The connotations in your brain may vary. I’m trying to discuss merely descriptive issues; I intend to start using descriptive facts to solve normative problems later. For now, I want to focus on getting a correct descriptive understanding of the system that causes humans to do what they do before applying that knowledge to normative questions about what humans should do or what a Friendly AI should do.
Does that make sense?
Yes, it clarifies your intended meaning for the words, and resolves my confusion (for the second time; better watch for the confusing influence of those connotations in the future).
(I’m still deeply skeptical that descriptive understanding can help with FAI, but this is mostly unrelated to this post and others, which are good LW material when not confused for discussion of normativity.)
How would you (descriptively, “from the outside”) explain the fact that you didn’t provide information that would resolve my confusion (that you provided now), and instead pointed out that the reason for your actions lies in a tradition (conventional usage), and that I should engage that tradition directly? It seems like you were moved by considerations of assignment of blame (or responsibility), specifically you directed my attention to the process responsible for the problem. (I don’t expect you thought of this so explicitly, but still something caused your response to go the way it went.)
I don’t think blame works the way you seem to.
Blame is condemnation useful in shaping the future. It’s not latent in who had the best opportunity to avoid a problem, or the last clear chance to avoid a problem, or who began a problem, etc.
Responsibility is something political beings invent to relate agents to causation.
When people talk about causation they’re not necessarily playing that game.
Hmmm. I’m sorry you took it that way. I’m starting to get the sense that perhaps you see more connotations of normativity and judgment in general, and I try to see the world through the lens of a descriptivist project by default except for those rare occasions when I’m going to take a dangerous leap into the confusing lands of normativity.
I didn’t know which information would resolve your confusion until after I stumbled upon it. The point about common usage was merely meant to explain why I’m using the terms the way I am.
(Strictly speaking, it’s not necessary to know something in order to be motivated by it. If a fact is considered relevant, but isn’t known, that creates instrumental motivation for finding out what it is! And even if you can’t learn something, you might want to establish a certain dependence of the outcome on that fact, no matter what the fact is.)
Ah, understood.
Okay. I understand that it’s a fact that the brain doesn’t encode values for objective intensities of sensory stimuli. My puzzlement comes from when you say
I don’t see the fact as an additional problem for a theory of human values. But there’s no point in arguing about this, as I think we’d both agree that any theory of human values would have to accommodate the fact.
Hmmm. Maybe a clearer way to say it is just that this neurobiological finding further constrains our theories. I’ll change the wording in the OP, thanks.
Personally, I’d be pretty happy with an FAI that just let me appreciate what I already have more effectively. I know that I’m living in an awesome science fiction world relative to how things were in the 60s, not to mention how things were in the 1600s, but even though I remind myself of this on a regular basis I’m still not as ecstatically happy as I think I ought to be.
This morning I accidentally locked myself out of my bedroom after getting up in the middle of the night to go to the restroom. I waited several hours, trying to get some sleep on the carpet, before the manager for the apartment complex I live in came and unlocked my door for me. Lying down in my bed again, I realized more than I ever have in my life how awesome beds are, and this realization caused me to giggle with delight.
I thought you preferred sleeping on piles of coins?
Being part of medieval recreation society helps with that. Apart from the fun of dressing up and fighting other people with swords for a few days on camp… you then get to go home to antibiotics, clean water and washing machines. :)
This is wrong:
1. Human brains throw away some of that information at the transducer. (What an odd way of putting it, by the way: the whole point is that it isn’t the brain that’s doing the throwing away.) That means that some objective properties aren’t directly available to us. It doesn’t mean that no objective properties are available to us. And, in any case, …
2. The fact that something isn’t directly available to our senses is no reason why we can’t value it (excuse me, I mean no reason why our brains can’t encode value for it). I can perfectly easily care whether I have $2 or $2000000 in the bank even though my nervous system isn’t wired to my bank’s computers. I can perfectly easily care whether my wife loves me even though my assessment of whether (and how much, how consistently, etc.) she does is a matter of subtle inferences. Why on earth should we only be able to value things we can perceive directly?
3. Further to #2, you could certainly argue that there’s a particular kind of valuing, a particularly immediate and instinctual sort, that can’t be applied to the absolute values of things we perceive only relatively. Maybe so (though I’m not convinced; it’s possible, e.g., to feel visceral terror at the prospect of losing your job or having a slow degenerative disease or something, even though those aren’t things we’re wired up to perceive directly) but so what?
Agreed, fixed.
Well, it throws away information about objective stimuli intensity in such a way that the objective stimuli intensity cannot be recovered. Obviously it doesn’t throw away all information, but merely information needed to encode value for objective stimuli intensities.
Exactly; hence the puzzle. If our brains aren’t encoding value for X, how can we be said to value X? This is something we’ll explore in future posts.
I’ve had the most pleasant evening trying to find research discounting your claims and instead having my beliefs whipped around by evidence (though I still don’t understand how a neuron can be said to encode in a purely reference independent manner, as with pain receptors’ sensing thresholds for mechanical, thermal, and chemical changes or absolute pitch recognition).
One of the few sources my motivated cognition discovered was the work of Padoa-Schioppa, who found, for instance,
Which of course seems another reference independent encoding, though there is just about as much evidence the other way on the subject of the OFC, such as Elliott (2008).
Which Padoa-Schioppa paper is that?
The passage was from Range-adapting representation of economic value in the orbitofrontal cortex. You might also be interested in The orbitofrontal cortex and beyond: from affect to decision-making (Rolls, Grabenhorst 2008), which presents a high-level summary of research on the topic, with dozens of citations of consistent and continuous stimulus representations in OFC for a few species and primary reinforcers.
Psychophysics also provides examples of absolute value encodings over external stimuli such as the thresholds of pain, the absolute threshold of hearing, and absolute pitch.
Related: What is Evidence?. There needs to be some way that makes the facts about the world control your actions (beliefs, application of moral considerations), but it can be arbitrarily complicated, so long as some dependence is retained.
My hypothesis is that something qualifies as an agent’s goal to the extent that the agent tries to take actions that argmax the dependence of that goal on those actions.
In this case, where information about the relevant states of the world gets obfuscated in various ways before it reaches the decision, the agent looks at how its action affects those states of the world, but it doesn’t look at how its action affects the framing, or the way information is distorted. The objective in making a decision is to affect the states of the world, not to affect the way information about the states of the world is delivered.
This then is the distinction between the facts in the world the agent cares about and the way in which information about those facts travels to agent’s decisions: agent’s actions depend on the dependence of the former on its actions, but not on the dependence of the latter on its actions.
I’d really like to see a K&T-style study that wasn’t about death. We already know mortality salience has a priming effect. But is it correct to generalize it to other kinds of framing?
Check out most of behavioral economics. (I recommend Dan Gilbert on TED, not linked to avoid trivial chances to waste time.)
Yeah, I don’t watch TED anymore. Any other specific suggestions?
I can’t give another suggestion unless you tell me what’s undesirable about watching TED. There’s a transcript on the site, but he uses graphics copiously, so I’m curious how useful it is. Less Wrong says it is too long to post as a comment.
I don’t like watching videos of lectures. I thought perhaps you had more references on behavioral economics; if you don’t, no big deal.
Typo alert: “obritofrontal”.
Fixed, thanks.
Maybe we don’t want specific things in a consistent way. Generalizing from that to “we don’t want things” seems premature. Maybe we just want the taste of tea and are willing to adapt that desire to whatever sort of tea cup we see moment by moment. I would say those recent posts have explored the degree to which we have goals, not effectively opened the question of whether or not we have them.
At the very least goals which we write down and tell others about have some impetus in our lives, through social reinforcement, commitment effects, and the capability to enlist others and build structures which cause those goals to be expressed more powerfully.
Though it seems our brains might not be very good at having explicit goals in useful ways.
Is this that later post: https://www.lesswrong.com/posts/fa5o2tg9EfJE77jEQ/the-human-s-hidden-utility-function-maybe ?
This is very interesting; but perhaps we can rescue values by a transformation. As I’ve argued before, happiness is roughly the first derivative of utility. You are saying that our other values are, similarly, based on changes between two measurements rather than just a single measurement. Sounds much the same—and finding the right transformation may make values tractable.
One real problem is if there is sufficient circularity in the set of values and measurements to cause ambiguity. That is, if you have a set of equations describing the values resulting from a particular trajectory in measurements, we hope it has at most one solution.
Some human values are directly about our own sensory experiences (sunshine feels good, loud noises are unpleasant) and some values are about the state of the world (such as the sensory experiences of other people). That our bodies throw away information at the transducer is not a problem for a theory of the values that are about our own sensory experiences.
The big problem for a theory of those values that are about the state of the world is that our brains might not consistently associate objective values to symbolic representations of the state of the world.
I wrote here: