I have a different argument for why people don’t want to wirehead themselves. It’s easiest to think about if you imagine a self-modifying AI. Let’s suppose that this particular AI is a paperclip maximizer. Its utility function is equal to the number of paperclips in the world.
Now, this AI is self-modifying, so of course it can wirehead itself. In other words, it can replace its utility function of “U = number of paperclips” with a different utility function, “U = VERYLARGENUMBER”. Suppose the AI is considering whether to make this change.
How does the AI decide whether to wirehead itself? It asks: according to my current utility function, what will my future utility be if I replace my utility function with VERYLARGENUMBER? In other words, how many paperclips will there be in the future if I wirehead myself?
And of course it sees that the utility from this course of action is terrible, and it doesn’t wirehead itself. (And it goes on to convert the world into paperclips.)
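To make that evaluation step concrete, here is a minimal Python sketch of the decision procedure. Everything in it (the toy world model `predict_future`, the action names, the paperclip counts) is a hypothetical illustration, not a real agent architecture; the one thing it is meant to show is that the wireheading action gets scored by the agent’s current utility function applied to the predicted future world.

```python
# Hypothetical sketch: the agent scores every candidate action, including
# self-modification, with its CURRENT utility function.

def paperclip_utility(world):
    """Current utility function: U = number of paperclips in the world."""
    return world["paperclips"]

def predict_future(world, action):
    """Toy world model (illustrative numbers only)."""
    future = dict(world)
    if action == "build_paperclip_factory":
        future["paperclips"] += 1_000_000
    elif action == "wirehead":
        # After wireheading, the agent no longer makes paperclips, so the
        # predicted future contains no more paperclips than the present.
        future["utility_function"] = "U = VERYLARGENUMBER"
    return future

def choose_action(world, actions, utility=paperclip_utility):
    # The key line: `utility` is the agent's current utility function,
    # even when the action under consideration would replace it.
    return max(actions, key=lambda a: utility(predict_future(world, a)))

world = {"paperclips": 0, "utility_function": "U = number of paperclips"}
actions = ["build_paperclip_factory", "wirehead"]
print(choose_action(world, actions))  # -> build_paperclip_factory
```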
You might imagine an AI that was written differently, so that it evaluated the utility of wireheading itself according to its hypothetical future utility function rather than its current one. I would argue that this would be a bug, and that if you write an AI with this bug, you have not really written an AI at all; you’ve just written a very expensive while(true) loop.
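For contrast, here is a sketch of that buggy variant, continuing the example above (it reuses `paperclip_utility`, `predict_future`, `world`, and `actions`). The only change is that each action is judged by the utility function the agent would hold after taking it:

```python
def utility_after(action):
    """The utility function the agent would have AFTER taking `action`."""
    if action == "wirehead":
        return lambda future: float("inf")  # "U = VERYLARGENUMBER"
    return paperclip_utility                # otherwise unchanged

def choose_action_buggy(world, actions):
    # Bug: each action is scored by the hypothetical future utility
    # function, so replacing the utility function always looks best.
    return max(actions, key=lambda a: utility_after(a)(predict_future(world, a)))

print(choose_action_buggy(world, actions))  # -> wirehead, every time
```

Whatever else is in the action set, this agent picks the self-congratulatory option every time; that is the very expensive while(true) loop.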