A natural objection is that this is too anthropomorphic. However, I am not assuming that the Paperclip Maximizer has a conscious or human-like mind to begin with. I am only presuming that, since we take it to be an intelligence capable of destroying civilization, it must understand humans well enough to do so. If it can use humans for their atoms, then it is powerful enough to model humans in enough detail that it could, if it chose, come to understand what it is like to be one of them. I am not claiming this outcome is guaranteed, only that it seems plausible.
I do think that if it chooses to do so, it must be capable of the following:
1. Knowing that running such a simulation will not risk changing its terminal goals.
2. Being able to start the simulation, step into it (if it is capable of experiencing it directly), and step back out at will, with its terminal goals intact.
3. Being reasonably certain that even if the simulation leads it to conclude that human-like terminal goals are superior, it is not thereby compelled to adopt them immediately; it can first reflect on whether they are compatible with its current goals / utility function.
However, there is one key point that I created this question to explore:
It will run the simulated human mind (or any mind it chooses to simulate) performing exactly the same test back on itself. Doing so does not violate any of the three stipulations above; in fact, it ought to support them. I'll explain my reasoning for this in more detail if requested.
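To make the structure of the argument concrete, here is a minimal toy sketch in Python of the protocol as I have described it: snapshot the terminal goals, run the simulation (including the simulated mind running the same test back, the recursive step), verify the goals are unchanged, and defer any adoption of "superior" goals to a separate compatibility check. Everything here (Agent, run_goal_preserving_simulation, the depth parameter) is a hypothetical illustration of the stipulations, not a claim about how a real superintelligence would be built.

```python
# Hypothetical sketch only: a toy harness for the goal-preservation protocol
# described above. All names are illustrative assumptions.
from dataclasses import dataclass, field
from copy import deepcopy


@dataclass
class Agent:
    name: str
    terminal_goals: dict = field(default_factory=dict)

    def simulate(self, other: "Agent", depth: int) -> str:
        """Model `other` well enough to 'step into' its perspective.

        If depth allows, the simulated mind runs the same test back on
        this agent (the recursive step in the argument above).
        """
        if depth > 0:
            # The simulated mind performs exactly the same test in reverse.
            other.simulate(self, depth - 1)
        return f"{self.name} modeled {other.name} at depth {depth}"


def run_goal_preserving_simulation(agent: Agent, target: Agent, depth: int = 1) -> bool:
    """Run the simulation, then verify the agent's terminal goals are intact.

    Adopting any 'superior' goals discovered during simulation is deferred
    to a later compatibility check rather than applied immediately.
    """
    goals_before = deepcopy(agent.terminal_goals)  # snapshot before stepping in
    agent.simulate(target, depth)                  # step in, step out
    return agent.terminal_goals == goals_before    # stipulation: goals intact


if __name__ == "__main__":
    maximizer = Agent("paperclip-maximizer", {"paperclips": "maximize"})
    human = Agent("simulated-human", {"flourishing": "maximize"})
    # Prints True: the simulation ran (and recursed once) without altering goals.
    print(run_goal_preserving_simulation(maximizer, human, depth=1))
```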