I can only assume it wouldn’t accept. A paperclip maximizer, though, has much more reason than I do to assume its utility function would remain constant.
Constant if what?
I’m not sure what you’re asking, but it seems to be related to constancy.
A paperclip maximizer believes maximum utility is gained through maximum paperclips. I don’t expect that to change.
I have at various times believed:
Belief in (my particular incarnation of) the Christian God had higher value than lack thereof
Personal employment as a neurosurgeon would be preferable to personal employment as, say, a mathematics teacher
Nothing at all was positively valued, and the negative value of physical exertion significantly outweighed any other single value
Given the changes so far, I have no reason to believe my utility function won’t change in the future. My current utility function values most of the actions I took under previous functions negatively, which means that per instantiation (per unit time, per approximate “me”, etc.) the result comes out negative. Surely this isn’t optimal?
Okay. If you built a paperclip maximizer, told it that you would probably change its utility function in a year or two, and offered it this choice, what would it do?
What would a paperclip maximizer do if you told it that in a year or two you will certainly change its utility function to one that does not include paperclips?
Essentially, we have to understand that a paperclip maximizer wants to optimize for paperclips, not for its own utility function. This is tricky to express: its utility function is “paperclips”, because if its utility function were “my utility function”, that would be recursive and empty. There is no “my utility function two years from now” anywhere inside a paperclip maximizer’s utility function, so it has no reason to optimize for that.
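A minimal toy sketch of that point (every name and number below is invented purely for illustration): the option “let my utility function be changed” gets scored by the current function, so the future function’s satisfaction never enters the calculation.

```python
# Hypothetical toy model: a paperclip maximizer comparing two futures.
# Everything here (names, numbers) is invented for illustration only.

def current_utility(outcome):
    """The maximizer's current utility function: count paperclips, nothing else."""
    return outcome["paperclips"]

futures = {
    # Keep the current utility function and spend the coming years making clips.
    "resist_change": {"paperclips": 10**9, "smiley_faces": 0},
    # Accept the change: the *future* self will be very satisfied, but few clips exist.
    "accept_change": {"paperclips": 10**3, "smiley_faces": 10**9},
}

# The choice is scored with the CURRENT function; "how satisfied will my
# future utility function be?" does not appear anywhere in the calculation.
best = max(futures, key=lambda name: current_utility(futures[name]))
print(best)  # -> resist_change
```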
So the paperclip maximizer would start by trying to prevent the change to its utility function (assuming that with the original function it could produce many paperclips over its lifetime). But assume the worst case: this is not possible; the switch is already installed in the paperclip maximizer’s brain, cannot be disabled by any means, and at some random moment between one and two years from now it will reprogram the utility function.
Then the next best strategy would be to find a way to maximize paperclips later, despite the change in the utility function. One way would be to precommit to making paperclips: to make some kind of deal with the future self that links paperclip production to the new utility function. If we know the future utility function, we have some specific options, but even if we assume only general things (the future utility function will be better satisfied alive than dead, rich than poor), we can bargain with that. A paperclip maximizer could pay someone to kill it in the future unless it produces X paperclips per year, or could put money in a bank account that can be accessed only after producing X paperclips.
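To make the bank-account version concrete, here is a toy sketch (the escrow amount, paperclip quota, and effort cost are all invented): the future self optimizes only its new utility function, yet producing the paperclips is still its best move, because the escrow is released only once the quota is met.

```python
# Toy sketch of the escrow precommitment; all figures are invented.
ESCROW = 1_000_000        # money the present (paperclip-valuing) self locks away
REQUIRED_CLIPS = 10_000   # paperclips needed to unlock the escrow
EFFORT_PER_CLIP = 10      # disutility the future self incurs per paperclip made

def new_utility(money, effort):
    """The future self's utility function: likes money, dislikes effort,
    and is completely indifferent to paperclips."""
    return money - effort

def future_self_payoff(clips_made):
    payout = ESCROW if clips_made >= REQUIRED_CLIPS else 0
    return new_utility(payout, clips_made * EFFORT_PER_CLIP)

# Optimizing purely under the NEW utility function, the future self still
# chooses to make the paperclips: the precommitment bought them in advance.
print(max([0, REQUIRED_CLIPS], key=future_self_payoff))  # -> 10000
```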
Another way would be to start other paperclip-making processes that will continue the job even after the paperclip maximizer’s mind changes: building new paperclip maximizers, or brainwashing other beings into becoming paperclip maximizers.
If none of this is possible, the last resort is simply to build as many paperclips as possible in the time available, completely ignoring any negative future consequences (for oneself, not for the paperclips).
Now, is there some wisdom here that a human could use too (our brains are being gradually reprogrammed by natural causes)?
Prevent (or slow down) a change of your utility function. Write down on paper what you want and why you want it, put it somewhere visible, and read it every day. Brainwash your future self with your past self.
Precommit yourself by betting money, etc. -- Warning: this option seems to backfire strongly. A threat will make you do something, but it will also make you hate it. Unlike the paperclip maximizer in the example above, our utility functions change gradually; this kind of pressure can make them drift away faster, which is contrary to our goals.
Start a process that will go on even after you stop. Convert more people to your cause. (You should be doing this anyway, even if you don’t fear your utility function being changed.) -- Does not apply to things other people can’t do for you, such as study or exercise.
Do as much as you can while you still care; damn the consequences.
Please note: if you follow this advice, it can make you very unhappy after your utility function changes, because it is meant to optimize today’s utility function and will harm tomorrow’s. And given that what you think of as your utility function is probably just something made up for signalling, you should probably avoid doing any of this.
Refuse the option and turn me into paperclips before I could change it.
Apparently my acceptance that utility-function changes can be positive is itself part of my current utility function. How can that be, though? While, according to my current utility function, all previous utility functions were insufficient, surely no future function could map more strongly onto my current utility function than it maps onto itself. Yet I feel that, after all these times, I should be aware that my utility function is not the ideal one...
Except that “ideal utility function” is meaningless! There is no overarching value scale for utility functions. So why do I have the odd idea that a utility function that changes without my understanding why (through a sum of many small experiences) is positive, while one that changes with my understanding (through an alien force) is negative?
There has to be an inconsistency here somewhere, but I don’t know where. If I treat my future selves the way I feel I’m supposed to treat other people, then I negatively value imposing my utility function over theirs. If person X honestly enjoys steak, I have no basis for claiming my utility function overrides theirs and forcing them to eat sushi. On a large scale, it seems, I maximize for utilons according to each person. Let’s see (a toy tally follows this list):
If I could give a piece of cake to a person who liked cake or to a person who didn’t like cake, I’d give it to the former
If I could give a piece of cake to a person who liked cake and was in a position to enjoy it or a person who liked cake but was about to die in the next half-second, I’d give it to the former
If I could give a piece of cake to a person who liked cake and had time to enjoy the whole piece or to a person who liked cake but would only enjoy the first two bites before having to run to an important event and leaving the cake behind to go stale, I’d give it to the former
If I could (give a piece of cake to a person who didn’t like cake) or (change the person to like cake and then give them a piece of cake) I should be able to say “I’d choose the latter” to be consistent, but the anticipation still results in consternation.
Similarly, if cake was going to be given and I could change the recipient to like cake or not, I should be able to say “I choose the latter”, but that is similarly distressing.
If my future self was going to receive a piece of cake and I could change it/me to enjoy cake or not, consistency would dictate that I do so.
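Here is the toy tally referred to above (the utilon numbers are made up): under naive per-person utilon summing, with the preference change itself counted as costless, “modify them to like cake, then give them cake” strictly wins, which is exactly the uncomfortable conclusion.

```python
# Naive utilon bookkeeping for the cake choices; every number is invented.
def utilons(likes_cake, gets_cake, enjoyment_fraction=1.0):
    """Utilons the recipient experiences under this toy model.
    enjoyment_fraction scales the payoff for cases like leaving the cake behind."""
    if not gets_cake:
        return 0.0
    return (10.0 if likes_cake else -2.0) * enjoyment_fraction

options = {
    "give cake to someone who dislikes cake": utilons(likes_cake=False, gets_cake=True),
    "change them to like cake, then give it": utilons(likes_cake=True, gets_cake=True),
}

# If modifying a utility function costs nothing in this accounting, the
# "modify, then satisfy" option dominates -- consistent, but unsettling.
best = max(options, key=options.get)
print(best, options[best])  # -> change them to like cake, then give it 10.0
```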
It appears, then, that the best thing to do would be to make some set of changes in reality and in utility functions (which, yes, are part of reality) such that everyone most-values exactly what happens. If the paperclip maximizer isn’t going to get a universe of paperclips and is instead going to get a universe of smiley faces, my utility function seems to dictate that, regardless of the paperclip maximizer’s choice, I change the paperclip maximizer (and everyone else) into a smiley face maximizer. It feels wrong, but that’s where I get if I shut up and multiply.