If I had some reason (say an impending mental reconfiguration to change my values) to expect my utility function to change soon and stay relatively constant for a comparatively long time after that, what does “maximizing my utility function now” look like? If I were about to be conditioned to highly value eating babies, should I start a clone farm to make my future selves most happy, or should I kill myself in accordance with my current function’s negative valuation of that action?
That depends: how much do you (currently) value the happiness of your future self versus the life-experience of the expected number of babies you’re going to kill? If possible, it would probably be optimal to take measures that would both make your future self happy and not-kill babies, but if not, the above question should help you make your decision.
Well, the situation I was referencing assumed baby-eating without the actual sentience at any point of the babies, but that’s not relevant to the actual situation. You’re saying that my expected future utility functions, in the end, are just more values in my current function?
I can accept that.
The problem now is that I can’t tell what those values are. It seems there’s a number N large enough that if N people were to be reconfigured to heavily value a situation, and the situation were then implemented, I’d accept the reconfiguration. This was counterintuitive at first, and out of habit it still feels like it should be, but it makes a surprising amount of sense.
Yep, that’s what I mean.
I’m pretty sure that the amount of utility you lose (or gain?) through value drift is going to depend on the direction that your values drift in. For example, Gandhi would assign significant negative utility to taking a pill that made him want to kill people, but he might not care if he took a pill that made him like vanilla ice cream more than chocolate ice cream.
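The asymmetry here can be made concrete with a toy model (all weights and quantities are invented for illustration): the proposed change is evaluated by the *current* utility function, applied to the world the change would produce.

```python
# Toy model: evaluate proposed value changes with the CURRENT utility
# function, applied to the expected post-change world.
# All numbers are invented for illustration only.

def current_utility(world):
    # Gandhi's current values: murders are strongly negative;
    # the vanilla-vs-chocolate preference is nearly neutral.
    return -1000 * world["murders"] + 1 * world["vanilla_eaten"]

# Expected world if Gandhi takes the murder pill:
murder_pill_world = {"murders": 10, "vanilla_eaten": 0}
# Expected world if he takes the vanilla-preference pill:
vanilla_pill_world = {"murders": 0, "vanilla_eaten": 50}
# Status quo:
status_quo = {"murders": 0, "vanilla_eaten": 0}

print(current_utility(murder_pill_world))   # -10000: refuse the pill
print(current_utility(vanilla_pill_world))  # 50: sure, why not
print(current_utility(status_quo))          # 0
```

The direction of the drift is the whole story: both pills change his values, but only one of them produces a world his current function hates.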
Aside from the more obvious cases, like the murder pill above, I haven’t nailed down exactly which parts of a sentience’s motivational structure give me positive utility if fulfilled. My intuition says that I would care about the particular nature of someone’s utility function if I knew them, and would only care about maximizing it (pretty much whatever it was) if I didn’t, but this doesn’t seem to be what I truly want. I consider this to be a Hard Question, at least for myself.
Say there’s a planet, far away from ours, where gravity is fairly low, atmospheric density fairly high, and the ground uniformly dangerous, and the sentient resident species has wings and two feet barely fitted for walking. Suppose, also, that by some amazingly unlikely (as far as I can see) series of evolutionary steps, these people have a strong tendency to highly value walking and negatively value flying.
If you had the ability to change their hardwired values toward transportation (and, for whatever reason, did not have the ability to change their non-neural physiology and the nature of their planet), would it be wrong to do so? If it’s wrong, what makes it wrong? Your (or my, because I seem to agree with you) personal negative-valuation of {changing someone else’s utility function} is heavily outweighed by the near-constant increase in happiness for generations of these people. If anything, it appears it would be wrong not to make that change. If that’s the case, though, then surely it’d be wrong not to build a superintelligence designed to maximise “minds that most-value the universe they perceive”, which, while not quite a smiley-face maximizer, still leads to tiling behaviour.
No matter how I go at it reasonably, it seems tiling behaviour isn’t necessarily bad. My emotions say it’s bad, and Eliezer seems to agree. Does Aumann’s Agreement Theorem apply to utility?
I think that an important question would be ‘would their current utility function assign positive utility to modifying it in the suggested manner if they knew what they would experience after the change?’, or, more briefly, ‘what would their CEV say?’
It might seem like they would automatically object to having their utility function changed, but here’s a counterexample to show that it’s at least possible that they would not: I like eating ice cream, but ice cream isn’t very healthy—I would much rather like eating veggies and hate eating ice cream, and would welcome the opportunity to have my preferences changed in such a way.
I’m not very sure what precisely you mean by Aumann’s Agreement Theorem applying to utility, but I think the answer’s ‘no’—AFAIK, Aumann’s Agreement Theorem is a result of the structure of Bayes’ Theorem, and I don’t see a relation which would allow us to conclude something similar for different utility functions.
But why does it matter what they think about it for the short time before it happens, compared to the enjoyment of it long after?
So you positively value “eating ice cream” and negatively value “having eaten ice cream”—I can relate. What if the change, instead of making you dislike ice cream and like veggies, made you dislike fitness and enjoy sugar crashes? The only real difference I can see is that the first increases your expected lifespan and so increases the overall utility. They both resolve the conflict and make you happy, though, so aren’t they both better than what you have now?
I guess you’re right. It’s the difference between “what I expect” and “what I want”.
I’m suspicious of the implied claim that the ‘change in sustained happiness over time’ term is so large in the relevant utility calculation that it dominates other terminal values.
No—liking sugar crashes would cause me to have more sugar crashes, and I’m not nearly as productive during sugar crashes as otherwise. So if I evaluated the new situation with my current utility function, I would find increased happiness (which is good), and very decreased productivity (which is more bad than the happiness is good). So, to clarify, liking sugar crashes would be significantly worse than what I have now, because I value other things than pleasure.
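This weighing of happiness against productivity under the current function can be sketched as a toy calculation (the weights and quantities are invented for illustration; the point is only that productivity carries more weight than pleasure):

```python
# Toy evaluation of "liking sugar crashes" under the CURRENT utility
# function, which weighs productivity more heavily than pleasure.
# All weights and quantities are invented for illustration only.

PLEASURE_WEIGHT = 1.0
PRODUCTIVITY_WEIGHT = 3.0

def current_utility(pleasure, productivity):
    return PLEASURE_WEIGHT * pleasure + PRODUCTIVITY_WEIGHT * productivity

now = current_utility(pleasure=5, productivity=8)          # 29.0
crash_lover = current_utility(pleasure=9, productivity=3)  # 18.0

# Happiness went up, but the productivity loss dominates:
print(now > crash_lover)  # True: the change is net-negative
```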
I kinda suspect that you would have the same position—modifying other sentiences’ utility functions in order to maximize happiness, but evaluating changes to your own utility function with your current utility function. One of the more obvious problems with this asymmetry is that if we had the power to rewire each other’s brain, we would be in conflict—each would, in essence, be hostile to the other, even though we would consider our intentions benevolent.
However, I’m unsatisfied with the ‘evaluate your proposed change to someone’s utility function with their CEV’d current utility function’, because quite a bit is relying on the ‘CEV’ bit. Let’s say that someone was a heroin addict, and I could rewire them to remove their heroin addiction (so that it’s the least-convenient-possible-world, let’s say that I can remove the physical and mental withdrawal as well). I’m pretty sure that their current utility function (which is super-duper time discounted—one of the things heroin does) would significantly oppose the change, but I’m not willing to stop here, because it’s obviously a good thing for them.
So the question becomes ‘what should I actually do to their current utility function to CEV it, so that I can evaluate the new utility function with it?’ Well, first I’ll strip the actual cognitive biases (including the super-time-discounting caused by the heroin); then I’ll give it as much computing power as possible, so that it can reasonably determine the respective utility and probability of different world-states if I change the utility function to remove the heroin addiction. If I could do this, I would be comfortable with applying this solution generally.
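The ‘strip the bias, then re-evaluate’ step can be sketched as a toy model, treating the heroin-induced bias as an extreme per-day discount factor and the ‘CEV-ing’ as simply restoring a normal one (all payoffs and discount rates here are invented for illustration):

```python
# Toy sketch of 'strip the bias, then re-evaluate'.
# The heroin-induced bias is modeled as an extreme time-discount factor;
# de-biasing the function here just means restoring a normal discount.
# All payoffs and discount rates are invented for illustration only.

def discounted_utility(daily_payoffs, discount):
    return sum(u * discount**t for t, u in enumerate(daily_payoffs))

# Rewiring (with withdrawal removed) still costs today's high (-10),
# but every later day is better without the addiction (+2 each).
change = [-10] + [2] * 30
no_change = [10] + [-2] * 30  # keep the high today, pay for it later

addict_discount = 0.1   # heroin-level time discounting
normal_discount = 0.95  # the stripped, de-biased function

# The addicted function opposes the change...
print(discounted_utility(change, addict_discount) <
      discounted_utility(no_change, addict_discount))   # True
# ...while the de-biased one endorses it.
print(discounted_utility(change, normal_discount) >
      discounted_utility(no_change, normal_discount))   # True
```

Nothing about the person’s terminal values changed between the two evaluations; only the bias did, which is why the sign flip feels like progress rather than overwriting them.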
If someone’s bias-free utility function running on an awesome supercomputer determined that the utility of you changing their utility function in the way you intend was negative, would you still think it was the right thing to do? Or should we consider changing someone’s utility function without their predicted consent only desirable to the extent that their current utility function is biased and has limited computing power? (Neglecting, of course, effects upon other sentiences that the modification would cause.)
I can’t figure out an answer to any of those questions without having a way to decide which utility function is better. This seems to be a problem, because I don’t see how it’s even possible.
Can you taboo ‘better’?
Depends on a few things: Can you make the clones anencephalic, so you become neutral in respect to them? If you kill yourself, will someone else be conditioned in your place?
Well, I’m not sure making the clones anencephalic would make eating them truly neutral. I’d have to examine that more.
The linked situation proposes that the babies are in no way conscious and that all humans are conditioned, such that killing myself will actually result in a fewer number of people happily eating babies.