Here’s a comment that I posted in a discussion on Eliezer’s FB wall a few days back but didn’t receive much of a response there; maybe it’ll prompt more discussion here:
--
So this reminds me, I’ve been thinking for a while that VNM utility might be a hopelessly flawed framework for thinking about human value, but I’ve had difficulties putting this intuition in words. I’m also pretty unfamiliar with the existing literature around VNM utility, so maybe there is already a standard answer to the problem that I’ve been thinking about. If so, I’d appreciate a pointer to it. But the theory described in the linked paper seems (based on a quick skim) like it’s roughly in the same direction as my thoughts, so maybe there’s something to them.
Here’s my stab at describing what I’ve been thinking: VNM utility implicitly assumes an agent with “self-contained” preferences, one that is trying to maximize the satisfaction of those preferences. By self-contained, I mean that the preferences are not a function of the environment, though they can and do take inputs from the environment. So an agent could certainly have a preference that made him, e.g., want to acquire more money if he had less than $5000, and made him indifferent to money if he had more than that. But this preference would be conceptualized as something internal to the agent, and essentially unchanging.
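To make the distinction concrete, here’s a minimal sketch (my own toy example, not anything from the VNM literature) of what a “self-contained” preference looks like: the rule mapping inputs to utility is fixed inside the agent, even though its inputs come from the environment.

```python
# A toy "self-contained" preference over money: the mapping from inputs to
# utility is fixed inside the agent, even though the *inputs* (current wealth)
# come from the environment. All numbers are illustrative only.

def self_contained_utility(wealth: float) -> float:
    """Utility over money: rises up to $5000, flat afterwards.

    The function takes an input from the environment (current wealth),
    but the rule itself never changes; that is the 'self-contained' part.
    """
    return min(wealth, 5000.0)

print(self_contained_utility(3000.0))  # 3000.0 -> still wants more money
print(self_contained_utility(8000.0))  # 5000.0 -> indifferent to further gains
```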
That doesn’t seem to be how human preferences actually work. For example, suppose that John Doe is currently indifferent between studying at college A or college B, so he flips a coin to choose. Unbeknownst to him, if he goes to college A he’ll end up doing things together with guy A until they fall in love and get monogamously married; if he goes to college B he’ll end up doing things with gal B until they fall in love and get monogamously married. It doesn’t seem sensible to ask which choice better satisfies his romantic preferences as they are at the time of the coin flip. Rather, the preference for either person develops as a result of their shared life-histories, and both are equally good in terms of his intrinsic preferences toward a partner (though of course one of them could be better or worse at helping John achieve some other set of preferences).
More generally, rather than having stable goal-oriented preferences, it feels like we acquire different goals as a result of being in different environments: these goals may persist for an extended time, or be entirely transient and vanish as soon as we’ve left the environment.
As another example, my preference for “what do I want to do with my life” feels like it has changed at least three times today alone: I started the morning with a fiction-writing inspiration that had carried over from the previous day, so I wished that I could spend my life being a fiction writer; then I read some e-mails on a mailing list devoted to educational games and was reminded of how neat such a career might be; and now this post made me think of how interesting and valuable all the FAI philosophy stuff is, and right now I feel like I’d want to just do that. I don’t think that I have any stable preference with regard to this question: rather, I could be happy in any career path as long as there were enough influences in my environment that continued to push me towards that career.
It’s as Brian Tomasik wrote at http://reducing-suffering.blogspot.fi/2010/04/salience-and-motivation.html :

There are a few basic life activities (eating, sleeping, etc.) that cannot be ignored and have to be maintained to some degree in order to function. Beyond these, however, it’s remarkable how much variation is possible in what people care about and spend their time thinking about. Merely reflecting upon my own life, I can see how vastly the kinds of things I find interesting and important have changed. Some topics that used to matter so much to me are now essentially irrelevant except as whimsical amusements, while others that I had never even considered are now my top priorities.

The scary thing is just how easily and imperceptibly these sorts of shifts can happen. I’ve been amazed to observe how much small, seemingly trivial cues build up to have an enormous impact on the direction of one’s concerns. The types of conversations I overhear, blog entries and papers and emails I read, people I interact with, and visual cues I see in my environment tend basically to determine what I think about during the day and, over the long run, what I spend my time and efforts doing. One can maintain a stated claim that “X is what I find overridingly important,” but as a practical matter, it’s nearly impossible to avoid the subtle influences of minor day-to-day cues that can distract from such ideals.
If this is the case, then it feels like trying to maximize preference satisfaction is an incoherent idea in the first place. If I’m put in environment A, I will have one set of goals; if I’m put in environment B, I will have another set of goals. There might not be any way of constructing a coherent utility function that would let us compare the utility we obtain from being put in environment A versus environment B, since our goals and preferences can be completely path- and environment-dependent. Extrapolated meta-preferences don’t seem to solve this either, because there seems to be no reason to assume that they’d be any more stable or self-contained.
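As a toy illustration of what I mean by path-dependence (a hypothetical sketch of my own, with made-up goals and weights), the “utility function” an agent ends up with is itself a product of which environments it passed through, so there is no single fixed function against which to score both paths:

```python
# Hypothetical illustration of path-dependent preferences: the "utility
# function" the agent ends up with depends on which environment history it
# went through, so there is no single fixed function to evaluate both paths.

from typing import Callable, List

def preferences_after(history: List[str]) -> Callable[[str], float]:
    """Return the utility function this agent would have after a life history.

    The goals and weights below are made up purely for illustration.
    """
    weights = {"writing": 1.0, "games": 1.0, "FAI research": 1.0}
    for env in history:
        if env in weights:
            weights[env] += 1.0   # exposure strengthens the corresponding goal
    return lambda option: weights.get(option, 0.0)

u_after_A = preferences_after(["writing", "writing"])
u_after_B = preferences_after(["FAI research"])

# Each resulting function ranks the *same* option differently, and neither is
# "the" agent's utility function at the time the choice between A and B is made.
print(u_after_A("writing"), u_after_B("writing"))   # 3.0 1.0
```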
I don’t know what we could use in place of VNM utility, though. At the least, it feels like the alternative formalism should include the agent’s environment/life history in determining its preferences.
I also have lots of objections to using VNM utility to model human preferences. (A comment on your example: if you conceive of an agent as accruing value and making decisions over time, to meaningfully apply the VNM framework you need to think of their preferences as being over world-histories, not over world-states, and of their actions as being plans for the rest of time rather than point actions.) I might write a post about this if there’s enough interest.
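For what it’s worth, here is a rough type-level sketch (my own framing of the point above, not a standard formalization) of what “preferences over world-histories” and “actions as plans” mean:

```python
# A rough type-level sketch of preferences over world-histories: utility
# attaches to whole trajectories, and an "action" is a plan mapping the
# history so far to the next move. All details below are toy inventions.

from typing import Callable, List

WorldState = str
WorldHistory = List[WorldState]        # an entire trajectory, not a snapshot
Plan = Callable[[WorldHistory], str]   # chooses the next act given the history so far

def history_utility(history: WorldHistory) -> float:
    """Toy utility over a whole life history (the scoring rule is made up)."""
    return sum(1.0 for state in history if "married" in state)

# Under this framing, VNM lotteries range over world-histories, so a single
# snapshot state never has to carry all the information the preferences use.
print(history_utility(["college A", "dating guy A", "married to A"]))  # 1.0
```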
I’ve always thought of it as preferences over world-histories, and I don’t see any problem with that. I’d be interested in the post if it covers a problem with that formulation.
I would be very interested in that.
Robin Hanson writes about rank linear utility. This formalism asserts that we value options by their rank in a list of options available at any one time, making it impossible to construct a coherent classical utility function.
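To spell out why rank-based valuation clashes with a fixed classical utility function, here is a hypothetical toy sketch (the options and the ordering are made up): the same option gets a different value depending on what else is currently on the menu, which no single context-free utility assignment can reproduce.

```python
# Toy sketch of rank-based valuation: an option's value is its rank within the
# menu currently on offer, so the "utility" of the same option changes when the
# menu changes (menu dependence). Options and ordering are invented examples.

from typing import Callable, List

def rank_value(option: str, menu: List[str], better_than: Callable[[str, str], bool]) -> int:
    """Value = number of menu options this one beats (rank from the bottom)."""
    return sum(1 for other in menu if better_than(option, other))

# Toy ordering: longer strings are "better" (purely for illustration).
better = lambda a, b: len(a) > len(b)

print(rank_value("cake", ["cake", "pie"], better))                  # 1
print(rank_value("cake", ["cake", "pie", "tea", "figs"], better))   # 2 (same option, different value)
```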
Yeah, that was my first link in the comment. :-) Still good that you summarized it, though, since not everyone’s going to click on the link.
Oops, I frankly did not see the link. The one time I thought I could contribute …
Well, like I said, it was probably a good thing to post and briefly summarize anyway. If you missed the link, others probably did too.
I don’t think of things like “what I want to do with my life” as terminal preferences—just instrumental preferences that depend on the niche you find yourself in. Terminal stuff is more likely to be simple, human-universal stuff (think Maslow’s hierarchy of needs).
I think you’ll probably find Kevin Simler’s essays on personality interesting, and he does a good job explaining and exploring this idea.
http://www.meltingasphalt.com/personality-the-body-in-society/
http://www.meltingasphalt.com/personality-an-ecosystems-perspective/
http://www.meltingasphalt.com/personality-beyond-social-and-beyond-human/
Thanks, those are good essays. :-)
What I think is happening is that we’re allowed to think of humans as having VNM utility functions (see also my discussion with Stuart Armstrong), but the utility function is not constant over time (since we’re not introspective, recursively self-modifying AIs that can keep their utility functions stable).
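A minimal sketch of that interpretation (my own toy code, with invented details): at any single moment the agent looks like an ordinary expected-utility maximizer, but which utility function is in force drifts with whatever the environment has recently made salient, so no single function describes the agent across its whole life.

```python
# Toy sketch of a time-varying utility function: at each moment the agent has a
# perfectly ordinary utility function, but which one is in force depends on the
# cues the environment has recently supplied (all details here are invented).

from typing import Callable, List, Optional

def utility_given(recent_cues: List[str]) -> Callable[[str], float]:
    """Return the utility function currently in force, given recent cues."""
    salient: Optional[str] = recent_cues[-1] if recent_cues else None
    return lambda option: 1.0 if option == salient else 0.0

u_morning = utility_given(["fiction writing"])
u_evening = utility_given(["fiction writing", "FAI philosophy"])
print(u_morning("fiction writing"), u_evening("fiction writing"))  # 1.0 0.0
```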