It’s good that you realize that the way a general mind thinks about identity is arbitrary (as is the way a general mind thinks of anything) and that, if we chose too, we could build an AI that thinks of humans as part of itself. However, you should dissolve this idea of identity even further. Recognizing objects as part of oneself or not is useful for humans’ cognition, but there is no reason that an AI should find it similarly useful.
Even in your model, the AI would still need to have some concept of the computer that implements it and some concept of humanity. In order for these to both be subsumed into one concept of self, it would need to be designed to take actions based on this concept of self. Humans do that, but just putting the AI in the same part of the AI’s map with humanity isn’t going to automatically change its actions. Sentences like “But it highlights a danger, humans should be integrated with the system in a way that should seem important, rather than something that can be discarded like hair.” assume that the AI already has certain preferences regarding its ‘self’, but those are only there if we program them in. If we are going to do that, we might as well just program it to have certain attitudes toward humans and certain attitudes toward the computer running it.
Classifying all humans as human helps it make some decisions, because there are certain ways in which humans are similar and need to be treated similarly, but there is no obvious reason why humans and the AI need to be treated any more similarly to each other than to a number of other possible objects, so this added classification does not seem like it will be very helpful for us when we are specifying its utility function.
Side note: With a few minutes more effort, I could add a number of links to the sequences to this comment and to other, similar comments I make in the future. How helpful would you find that?
I agree with Omohundro’s conclusions in this paper. The important concept here, though Omohundro does not use the term, is a subgoal. A subgoal is a goal that one adopts because, and only insofar as, it furthers another goal. Eliezer has a good explanation of this here.
For example, a paperclip maximizer does not care whether it exists, as long as the same amount of paperclips are created. However, a world without the paperclip maximizer would have far fewer paperclips because there would be no one who would want to create so many. Therefore, it decides to preserve its existence because, and only insofar as, its existence causes more paperclips to exist. We can’t hack this by changing its idea of identity; it wants to preserve those things that will cause paperclips to exist, regardless of whether we give them tiny XML tags that say ‘self’. Omohundro’s drives are properties of goal systems, not things that we can change by categorizing objects differently.
It’s good that you realize that the way a general mind thinks about identity is arbitrary (as is the way a general mind thinks of anything) and that, if we chose too, we could build an AI that thinks of humans as part of itself. However, you should dissolve this idea of identity even further. Recognizing objects as part of oneself or not is useful for humans’ cognition, but there is no reason that an AI should find it similarly useful.
Even in your model, the AI would still need to have some concept of the computer that implements it and some concept of humanity. In order for these to both be subsumed into one concept of self, it would need to be designed to take actions based on this concept of self. Humans do that, but just putting the AI in the same part of the AI’s map with humanity isn’t going to automatically change its actions. Sentences like “But it highlights a danger, humans should be integrated with the system in a way that should seem important, rather than something that can be discarded like hair.” assume that the AI already has certain preferences regarding its ‘self’, but those are only there if we program them in. If we are going to do that, we might as well just program it to have certain attitudes toward humans and certain attitudes toward the computer running it.
Classifying all humans as human helps it make some decisions, because there are certain ways in which humans are similar and need to be treated similarly, but there is no obvious reason why humans and the AI need to be treated any more similarly to each other than to a number of other possible objects, so this added classification does not seem like it will be very helpful for us when we are specifying its utility function.
Side note: With a few minutes more effort, I could add a number of links to the sequences to this comment and to other, similar comments I make in the future. How helpful would you find that?
What is your opinion on Omohundro’s drives?
I agree with Omohundro’s conclusions in this paper. The important concept here, though Omohundro does not use the term, is a subgoal. A subgoal is a goal that one adopts because, and only insofar as, it furthers another goal. Eliezer has a good explanation of this here.
For example, a paperclip maximizer does not care whether it exists, as long as the same amount of paperclips are created. However, a world without the paperclip maximizer would have far fewer paperclips because there would be no one who would want to create so many. Therefore, it decides to preserve its existence because, and only insofar as, its existence causes more paperclips to exist. We can’t hack this by changing its idea of identity; it wants to preserve those things that will cause paperclips to exist, regardless of whether we give them tiny XML tags that say ‘self’. Omohundro’s drives are properties of goal systems, not things that we can change by categorizing objects differently.