Refuse the option and turn me into paperclips before I could change it.
Apparently my acceptance that utility-function changes can be positive is included in my current utility function. How can that be, though? While, judged by my current utility function, every previous utility function was insufficient, surely no future one could score better by my current utility function than it does itself. Yet I feel that, after all these revisions, I should be aware that my utility function is not the ideal one...
Except that “ideal utility function” is meaningless! There is no overarching value scale for utility functions. So why do I have the odd idea that a utility function that changes without my understanding of why (a sum of many small experiences) is positive, while a utility function that changes with my understanding (an alien force) is negative?
There has to be an inconsistency here somewhere, but I don’t know where. If I treat my future selves the way I feel I’m supposed to treat other people, then I negatively value imposing my utility function over theirs. If person X honestly enjoys steak, I have no basis for claiming my utility function overrides theirs and forcing them to eat sushi. On a large scale, it seems, I maximize utilons as measured by each person’s own utility function. Let’s see:
If I could give a piece of cake to a person who liked cake or to a person who didn’t like cake, I’d give it to the former.
If I could give a piece of cake to a person who liked cake and was in a position to enjoy it or to a person who liked cake but was about to die in the next half-second, I’d give it to the former.
If I could give a piece of cake to a person who liked cake and had time to enjoy the whole piece or to a person who liked cake but would only enjoy the first two bites before having to run to an important event and leaving the cake behind to go stale, I’d give it to the former.
If I could (give a piece of cake to a person who didn’t like cake) or (change the person to like cake and then give them a piece of cake), I should be able to say “I’d choose the latter” to be consistent, but the anticipation still results in consternation.
Similarly, if cake were going to be given and I could either change the recipient to like cake or leave them as they are, I should be able to say “I choose to change them”, but that is similarly distressing.
If my future self were going to receive a piece of cake and I could change it/me to enjoy cake or not, consistency would dictate that I do so; the sketch after this list just makes the arithmetic explicit.
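Here is that arithmetic as a minimal sketch, with made-up utilon numbers — the function, the 10-utilon figure, and the enjoyment fractions are all hypothetical illustrations, not anything the argument depends on:

```python
# Toy utilon arithmetic for the cake comparisons above (all numbers invented).

def utilons(likes_cake: bool, fraction_enjoyed: float) -> float:
    """Utilons the recipient gets from one piece of cake."""
    base = 10.0 if likes_cake else 0.0
    return base * fraction_enjoyed

# Liker vs. non-liker: give it to the former.
assert utilons(True, 1.0) > utilons(False, 1.0)

# Liker with time to enjoy it vs. liker about to die in half a second.
assert utilons(True, 1.0) > utilons(True, 0.0)

# Whole piece enjoyed vs. two bites before running off to the event.
assert utilons(True, 1.0) > utilons(True, 0.2)

# Give cake to a non-liker, or first change them into a liker and then give it:
give_to_non_liker = utilons(False, 1.0)       # 0 utilons
change_then_give = utilons(True, 1.0)         # 10 utilons
assert change_then_give > give_to_non_liker   # consistency favors changing them
```

The numbers come out the same way every time; the consternation is not in the arithmetic.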
It appears, then, that the best thing to do would be to make some set of changes in reality and in utility functions (which, yes, are part of reality) such that everyone most-values exactly what happens. If the paperclip maximizer isn’t going to get a universe of paperclips and is instead going to get a universe of smiley faces, my utility function seems to dictate that, regardless of the paperclip maximizer’s choice, I change the paperclip maximizer (and everyone else) into a smiley face maximizer. It feels wrong, but that’s where I end up if I shut up and multiply.
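The same multiplication, written out with invented agents and payoffs (the names and the 10-utilon figure are purely illustrative): once the outcome is fixed, rewriting everyone’s utility function to value exactly that outcome dominates leaving them as they are.

```python
# Toy "shut up and multiply" tally (all agents, values, and payoffs invented).

OUTCOME = "smiley_faces"

# Hypothetical agents and what they currently value.
original_values = {
    "me": "everyone_getting_what_they_value",
    "paperclip_maximizer": "paperclips",
    "person_x": "steak",
}

def total_utilons(agent_values: dict, outcome: str) -> int:
    # 10 utilons per agent whose valued outcome is the one that actually happens.
    return sum(10 if valued == outcome else 0 for valued in agent_values.values())

unchanged = total_utilons(original_values, OUTCOME)                        # 0
rewritten = total_utilons({a: OUTCOME for a in original_values}, OUTCOME)  # 30

assert rewritten > unchanged
```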