Ah, excellent. This post comes at a great time. A few weeks ago, I talked with someone who remarked that although decision theory speaks in terms of preferences and information being separate, trying to apply that to humans is fitting the data to the theory. He was of the opinion that humans don’t really have preferences in the decision-theoretic sense of the word. Pondering that claim, I came to the conclusion that he’s right, and have increasingly started to suspect that CEV-like plans to figure out the “ultimate” preferences of people are somewhat misguided. Our preferences are probably hopelessly path-, situation- and information-dependent. Which is not to say that CEV would be entirely pointless: even if the vast majority of our “preferences” would never converge, there might be some that did. And of course, CEV would still be worth trying, just to make sure I’m not horribly mistaken on this.
The ease with which I accepted the claim “humans don’t have preferences” makes me suspect that I’ve had a subconscious intuition to that effect for a long time, which was probably partially responsible for an unresolved disagreement between me and Vladimir Nesov earlier.
I’ll be curious to hear what you have to say.
This is off-topic but since you mentioned it and since I don’t think it warrants a new post, here are my latest thoughts on CEV (a convergence of some of my recent comments originally posted as a response to a post by Michael Anissimov):
Consider the difference between a hunter-gatherer, who cares about his hunting success and about becoming the new clan chief, and a member of LessWrong who wants to determine if a “sufficiently large randomized Conway board could turn out to converge to a barren ‘all off’ state.”
The utility of success in hunting down animals, or in proving abstract conjectures about cellular automata, is largely determined by factors such as your education, culture and environmental circumstances. The same hunter-gatherer who cared about killing a lot of animals, to get the best ladies in his clan, might under different circumstances have turned out to be a vegetarian mathematician caring solely about his understanding of the nature of reality. Both sets of values are to some extent mutually exclusive or at least disjoint. Yet both sets of values are what the person wants, given the circumstances. Change the circumstances dramatically and you change the person’s values.
You might conclude that what the hunter-gatherer really wants is to solve abstract mathematical problems and that he just doesn’t know it. But there is no set of values that a person “really” wants. Humans are largely defined by the circumstances they reside in. If you already knew a movie, you wouldn’t watch it. Being able to get your meat from the supermarket changes the value of hunting.
If “we knew more, thought faster, were more the people we wished we were, and had grown up closer together” then we would stop desiring what we had learnt, wish to think even faster, become different people yet again, and get bored of, and outgrow, the people similar to us.
A singleton will inevitably change everything by causing a feedback loop between the singleton and human values. The singleton won’t extrapolate human volition but implement an artificial set of values as a result of abstract high-order contemplations about rational conduct. Many of our values and goals, the things we want, are culturally induced or the result of our ignorance. Reduce our ignorance and you change our values. One trivial example is our intellectual curiosity. If we don’t need to figure out what we want on our own, our curiosity is impaired.
Knowledge changes and introduces terminal goals. The toolkit that is called ‘rationality’, the rules and heuristics developed to help us achieve our terminal goals, is also altering and deleting them. A stone age hunter-gatherer seems to possess very different values than I do. If he learns about rationality and metaethics, his values will be altered considerably. Rationality was meant to help him achieve his goals, e.g. become a better hunter. Rationality was designed to tell him what he ought to do (instrumental goals) to achieve what he wants to do (terminal goals). Yet what actually happens is that he is told that he will learn what he ought to want. If an agent becomes more knowledgeable and smarter, this does not leave its goal-reward-system intact unless it is specifically designed to be stable. An agent who originally wanted to become a better hunter and feed his tribe would end up wanting to eliminate poverty in Obscureistan. The question is: how much of this new “wanting” is the result of using rationality to achieve terminal goals, and how much is a side-effect of using rationality? How much is left of the original values, versus the values induced by a feedback loop between the toolkit and its user?
Take, for example, an agent facing the Prisoner’s Dilemma. Such an agent might originally tend to cooperate and only after learning about game theory decide to defect and gain a greater payoff. Was it rational for the agent to learn about game theory, in the sense that it helped the agent to achieve its goal, or in the sense that it deleted one of its goals in exchange for a more “valuable” goal?
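To make the Prisoner’s Dilemma point concrete, here is a minimal sketch in Python. The payoff numbers and the names `PAYOFF`, `best_reply` and `is_dominant` are my own illustrative assumptions, not anything from a particular source; any payoffs with the usual Prisoner’s Dilemma ordering would do.

```python
# Hypothetical payoffs (think negated prison years, so higher is better).
# Keys are (my_move, other_move); values are MY payoff.
# "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): -1,
    ("C", "D"): -3,
    ("D", "C"): 0,
    ("D", "D"): -2,
}

def best_reply(other_move):
    """Return the move that maximizes my payoff against a fixed other move."""
    return max("CD", key=lambda my_move: PAYOFF[(my_move, other_move)])

def is_dominant(move):
    """True if `move` is a best reply no matter what the other player does."""
    return all(best_reply(other) == move for other in "CD")

if __name__ == "__main__":
    print("Best reply to C:", best_reply("C"))
    print("Best reply to D:", best_reply("D"))
    print("Defect dominant?", is_dominant("D"))
    print("Both cooperate beats both defect?",
          PAYOFF[("C", "C")] > PAYOFF[("D", "D")])
```

Under these assumed numbers, defecting is the better reply to either move, which is the “greater payoff” the game-theory-educated agent is chasing, yet two such agents each end up worse off than two naive cooperators. That is the sense in which it is unclear whether the new knowledge served the agent’s original goal or replaced it.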
It seems to me that becoming more knowledgeable and smarter is gradually altering our utility functions. But what is it that we are approaching if the extrapolation of our volition becomes a purpose in and of itself? A living treaty will distort or alter what we really value by installing a new cognitive toolkit designed to achieve an equilibrium between us and other agents with the same toolkit.
Would a singleton be a tool that we can use to get what we want, or would the tool use us to do what it does? Would we be modeled, or would it create models? Would we be extrapolating our volition, or rather following our extrapolations?
Is becoming the best hunter really one of the primitive man’s terminal values? I would say his terminal values are more things like “achieving a feeling of happiness, contentment, and pride in one’s self and one’s relatives”. The other things you mention are just effective instrumental goals.
I mostly agree with this.
I think that the idea of desires converging if “we knew more, thought faster, were more the people we wished we were, and had grown up closer together” relies on assumptions of relatively little self-modification. Once we get uploads and the capability for drastic self-modification, all kinds of people and subcultures will want to use it. Given the chance and enough time, we might out-speciate the beetle (to borrow Anders Sandberg’s phrase), filling pretty much every corner of posthuman mindspace. There’ll be minds so strange that we won’t even recognize them as humans, and we’ll hardly have convergent preferences with them.
Of course, that’s assuming that no AI or mind with a first-mover advantage simply takes over and outcompetes everyone else. Evolutionary pressures might prune the initial diversity a lot, too—if you’re so alien that you can’t even communicate with ordinary humans, you may have difficulties paying the rent for your server farm.
At the end of this, I’m going to try to argue that something like CEV is still justified. Before I started thinking it through I was hoping that taking an eliminativist view of preferences to its conclusion would help tie up the loopholes in CEV, and so far it hasn’t done that for me, but it hasn’t made it any harder either.
CEV has worse problems than worries about convergence. The big one is that it’s such a difficult thing to implement that any AI capable of doing so has already crossed the threshold of extremely dangerous transhuman capability, and there’s no real solution to how to regulate its behavior while it’s in the process of working on the extrapolation. It could very well turn the planet into computronium before it gets a satisfactory implementation, by which point it doesn’t much matter what result it arrives at.
Presumably it matters if it then turns the planet back?
Even if you’re the type who thinks a Star Trek transporter is a transportation device rather than a murder+clone system, there’s no reason to think the AI would have detailed enough records to re-create everyone. Collecting that level of information would be even harder than getting enough to extrapolate CEV.
So I suppose it might matter to the humanity it re-creates, assuming it bothers. But we’d all still be dead, which is a decidedly suboptimal result.
Well, a neverending utopia fit to the exact specifications of humanity’s CEV is still pretty darn good, all things considered.
Related: I recommend to those who think that CEV is insufficiently meta that they read CFAI, and try to go increasingly meta from there instead. Expanding themes from CFAI to make them more timeless is also recommended; CFAI is inherently more timeless than CEV—that’s semi-personal jargon but perhaps the gist is sufficiently hinted at. Note that unlike metaness, timelessness is often just a difference of perspective or emphasis. I assert that CEV is a bastardized popularization of the more interesting themes originally presented in CFAI, and should not be taken very seriously. CFAI shouldn’t either—most of it is useless—but it at least highlights some good intuitions. Edit: I do not mean to recommend proposing solutions or proposing not-solutions, I recommend the meta-level strategy of understanding and developing intuitions and perspectives.