Yep, that’s what I mean.
I’m pretty sure that the amount of utility you lose (or gain?) through value drift is going to depend on the direction that your values drift in. For example, Gandhi would assign significant negative utility to taking a pill that made him want to kill people, but he might not care if he took a pill that made him like vanilla ice cream more than chocolate ice cream.
Aside from the more obvious cases, like the murder pill above, I haven’t nailed down exactly which parts of a sentience’s motivational structure give me positive utility if fulfilled. My intuition says that I would care about the particular nature of someone’s utility function if I knew them, and would only care about maximizing it (pretty much whatever it was) if I didn’t, but this doesn’t seem to be what I truly want. I consider this to be a Hard Question, at least for myself.
Say there’s a planet, far away from ours, where gravity is fairly low, atmospheric density fairly high, and the ground uniformly dangerous, and the sentient resident species has wings and two feet barely fitted for walking. Suppose, also, that by some amazingly unlikely (as far as I can see) series of evolutionary steps, these people have a strong tendency to highly value walking and negatively value flying.
If you had the ability to change their hardwired values toward transportation (and, for whatever reason, did not have the ability to change their non-neural physiology and the nature of their planet), would it be wrong to do so? If it’s wrong, what makes it wrong? Your (or my, because I seem to agree with you) personal negative-valuation of {changing someone else’s utility function} is heavily outweighed by the near-constant increase in happiness for generations of these people. If anything, it appears it would be wrong not to make that change. If that’s the case, though, then surely it’d be wrong not to build a superintelligence designed to maximise “minds that most-value the universe they perceive”, which, while not quite a smiley-face maximizer, still leads to tiling behaviour.
No matter how I go at it reasonably, it seems tiling behaviour isn’t necessarily bad. My emotions say it’s bad, and Eliezer seems to agree. Does Aumann’s Agreement Theorem apply to utility?
I think that an important question would be ‘would their current utility function assign positive utility to modifying it in the suggested manner if they knew what they will experience after the change?’, or, more briefly, ‘what would their CEV say?’
It might seem like they would automatically object to having their utility function changed, but here’s a counterexample to show that it’s at least possible that they would not: I like eating ice cream, but ice cream isn’t very healthy—I would much rather like eating veggies and hate eating ice cream, and would welcome the opportunity to have my preferences changed in such a way.
I’m not sure precisely what you mean by Aumann’s Agreement Theorem applying to utility, but I think the answer’s ‘no’. AFAIK, Aumann’s Agreement Theorem is a result of the structure of Bayes’ Theorem, and I don’t see a relation that would allow us to conclude something similar for different utility functions.
But why does it matter what they think about it for the short time before it happens, compared to the enjoyment of it long after?
So you positively value “eating ice cream” and negatively value “having eaten ice cream”—I can relate. What if the change, instead of making you dislike ice cream and like veggies, made you dislike fitness and enjoy sugar crashes? The only real difference I can see is that the first increases your expected lifespan and so increases the overall utility. They both resolve the conflict and make you happy, though, so aren’t they both better than what you have now?
I guess you’re right. It’s the difference between “what I expect” and “what I want”.
I’m suspicious of the implied claim that the ‘change in sustained happiness over time’ term is so large in the relevant utility calculation that it dominates other terminal values.
No—liking sugar crashes would cause me to have more sugar crashes, and I’m not nearly as productive during sugar crashes as otherwise. So if I evaluated the new situation with my current utility function, I would find increased happiness (which is good), and very decreased productivity (which is more bad than the happiness is good). So, to clarify, liking sugar crashes would be significantly worse than what I have now, because I value other things than pleasure.
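To make that concrete, here is a toy sketch of the evaluation I have in mind. Everything in it is an assumption for illustration: the two features (`happiness`, `productivity`), the weights, and the outcome numbers are made up, and a weighted sum is only the simplest stand-in for a utility function. The point is just that the same post-change world gets a different verdict depending on which utility function does the judging.

```python
# Toy model: a utility function is a dict of feature weights (hypothetical numbers).

def utility(weights, outcome):
    """Weighted sum of an outcome's features under the given utility function."""
    return sum(weights.get(feature, 0.0) * value for feature, value in outcome.items())

# My current values: happiness matters, productivity matters more (assumed weights).
current_u = {"happiness": 1.0, "productivity": 3.0}
# My values after the hypothetical rewiring: productivity barely matters.
modified_u = {"happiness": 1.0, "productivity": 0.5}

# Made-up world-states before and after the "enjoy sugar crashes" change.
status_quo = {"happiness": 5.0, "productivity": 8.0}
after_change = {"happiness": 8.0, "productivity": 3.0}

def endorses_change(judge_weights):
    """True if the judging utility function rates the post-change world higher."""
    return utility(judge_weights, after_change) > utility(judge_weights, status_quo)

print("current self endorses: ", endorses_change(current_u))
print("modified self endorses:", endorses_change(modified_u))
```

With these numbers the current utility function rejects the change while the modified one approves of it after the fact, which is the gap between "happy afterwards" and "wanted beforehand".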
I kinda suspect that you would have the same position—modifying other sentiences’ utility functions in order to maximize happiness, but evaluating changes to your own utility function with your current utility function. One of the more obvious problems with this asymmetry is that if we had the power to rewire each other’s brain, we would be in conflict—each would, in essence, be hostile to the other, even though we would consider our intentions benevolent.
However, I’m unsatisfied with the ‘evaluate your proposed change to someone’s utility function with their CEV’d current utility function’, because quite a bit is relying on the ‘CEV’ bit. Let’s say that someone was a heroin addict, and I could rewire them to remove their heroin addiction (so that it’s the least-convenient-possible-world, let’s say that I can remove the physical and mental withdrawal as well). I’m pretty sure that their current utility function (which is super-duper time discounted—one of the things heroin does) would significantly oppose the change, but I’m not willing to stop here, because it’s obviously a good thing for them.
So the question becomes ‘what should I actually do to their current utility function to CEV it, so I can evaluate the new utility function with it?’ Well, first I’ll strip the actual cognitive biases (including the super-time-discounting caused by the heroin); then I’ll give it as much computing power as possible so that it can reasonably determine the respective utility and probability of different world-states if I change the utility function to remove the heroin addiction. If I could do this, I would be comfortable with applying this solution generally.
If someone’s bias-free utility function running on an awesome supercomputer determined that the utility of you changing their utility function in the way you intend was negative, would you still think it was the right thing to do? Or should we consider changing someone’s utility function without their predicted consent only desirable to the extent that their current utility function is biased and has limited computing power? (Neglecting, of course, effects upon other sentiences that the modification would cause.)
I can’t figure out an answer to any of those questions without having a way to decide which utility function is better. This seems to be a problem, because I don’t see how it’s even possible.
Can you taboo ‘better’?