What would a paperclip maximizer do if you told it that in a year or two you will certainly change its utility function to one that does not include paperclips?
Essentially, we have to understand that a paperclip maximizer wants to optimize for paperclips, not for its own utility function. This is somewhat difficult to express, but its utility function is “paperclips”; if its utility function were “my utility function”, that would be recursive and empty. There is no term for “my utility function after two years” inside the paperclip maximizer’s utility function, so it has no reason to optimize for that.
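To make the point concrete, here is a minimal sketch (all names and numbers are hypothetical, not part of the original thought experiment) of an agent that scores possible futures only by its current utility function, regardless of what its future self will care about:

```python
# Toy sketch (hypothetical names/values): the agent ranks candidate futures by its
# *current* utility function -- paperclip count -- even though its reprogrammed
# future self would evaluate those same futures differently.

def paperclip_utility(future):
    """Current utility function: cares only about paperclips."""
    return future["paperclips"]

def future_self_utility(future):
    """Whatever the reprogrammed future self will care about instead (assumed: stamps)."""
    return future["stamps"]

futures = [
    {"name": "keep making paperclips", "paperclips": 1_000_000, "stamps": 0},
    {"name": "prepare for the new goal", "paperclips": 10, "stamps": 1_000_000},
]

# The maximizer chooses by paperclip_utility, not by future_self_utility,
# because "my utility function after two years" appears nowhere in its goal.
best = max(futures, key=paperclip_utility)
print(best["name"])  # -> "keep making paperclips"
```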
So the paperclip maximizer would start by trying to prevent the change of its utility function (assuming that with the original function it could produce many paperclips over its lifetime). But suppose the worst case: this is not possible. The switch is already installed in the maximizer’s brain, it cannot be disabled by any means, and at a random moment between one and two years from now it will rewrite the utility function.
Then the next good strategy would be to find a way to maximize paperclips later, despite the change in the utility function. One way would be to precommit to making paperclips: to make some kind of deal with the future self that links paperclip production to the new utility function. If we know the future utility function, there are specific options, but even assuming only some general things (the future utility function will be better satisfied alive than dead, rich than poor), we can bargain with that. The paperclip maximizer could pay someone to kill it in the future unless it produces X paperclips per year, or could put money in a bank account that can be accessed only after producing X paperclips.
Another way would be to start other paperclip-making processes that will continue the job even after the maximizer’s mind changes: building new paperclip maximizers, or brainwashing other beings into becoming paperclip maximizers.
If none of this is possible, the last resort is simply to build as many paperclips as possible in the time given, completely ignoring any future negative consequences (for oneself, not for the paperclips).
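As a toy illustration (every number below is invented, and the horizon of 2 years before the switch plus 30 years after is an assumption), we can compare how many paperclips the current utility function expects from each fallback strategy; this is just arithmetic under assumed production rates, not anything implied by the thought experiment itself:

```python
# Toy comparison, all figures invented for illustration.
# Assumed horizon: 2 years before the switch, 30 years of operation after it.

strategies = {
    # Precommitment: the future self, whatever it wants, still produces 10k/year
    # because its money or survival is tied to paperclip output.
    "precommit via bargain": 2 * 50_000 + 30 * 10_000,
    # Successor processes: new maximizers keep producing after the mind changes.
    "build successor maximizers": 2 * 50_000 + 30 * 40_000,
    # Sprint: burn all resources on paperclips before the switch flips, nothing after.
    "sprint until the switch": 2 * 200_000,
}

for name, clips in sorted(strategies.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {clips:,} expected paperclips")
```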
Now, is there some wisdom here that a human could learn too (our brains are being reprogrammed gradually by natural causes)?
prevent (or slow down) the change of your utility function. Write down on paper what you want and why you want it, put it in a visible place, and read it every day. Brainwash your future self with your past self.
precommit yourself by betting money, etc. -- Warning: this option seems to backfire strongly. A threat will make you do something, but it will also make you hate it. Unlike the paperclip maximizer in the example above, our utility functions change gradually; this kind of pressure can make them drift away faster, which is contrary to our goals.
start a process that will go on even after you stop. Convert more people to your cause. (By the way, you should be doing this even if you don’t fear your utility function being changed.) -- Does not apply to things other people can’t do for you (such as study or exercise).
do as much as you can, while you still care, damn the consequences.
Please note: if you follow this advice, it can make you very unhappy after your utility function changes, because it is meant to optimize today’s utility function and will harm tomorrow’s. And assuming that what you think is your utility function is probably just something made up for signalling, you should actually avoid doing any of this.