I agree with Omohundro’s conclusions in this paper. The important concept here, though Omohundro does not use the term, is a subgoal. A subgoal is a goal that one adopts because, and only insofar as, it furthers another goal. Eliezer has a good explanation of this here.
For example, a paperclip maximizer does not care whether it exists, as long as the same number of paperclips is created. However, a world without the paperclip maximizer would have far fewer paperclips, because no one else would want to create so many. Therefore, it decides to preserve its existence because, and only insofar as, its existence causes more paperclips to exist. We can’t hack this by changing its idea of identity; it wants to preserve those things that will cause paperclips to exist, regardless of whether we give them tiny XML tags that say ‘self’. Omohundro’s drives are properties of goal systems, not things that we can change by categorizing objects differently.
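To make the subgoal point concrete, here is a toy sketch in Python. Everything in it is my own illustration rather than anything from Omohundro's paper: the `Outcome` type, the action names, and the paperclip counts are all hypothetical. The utility function reads only the paperclip count, so any preference for survival has to come from the numbers, not from a 'self' label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    paperclips: float     # expected paperclips in the resulting world (made-up numbers)
    agent_survives: bool  # the 'self' tag; recorded, but never read by utility()


def utility(outcome: Outcome) -> float:
    # Terminal goal: paperclips only. There is no term for the agent's survival.
    return outcome.paperclips


def choose(actions: dict[str, Outcome]) -> str:
    # Pick the action whose outcome has the highest utility.
    return max(actions, key=lambda name: utility(actions[name]))


def relabel_self(actions: dict[str, Outcome]) -> dict[str, Outcome]:
    # Flip the 'self' tag on every outcome, leaving the paperclip counts alone.
    return {
        name: Outcome(o.paperclips, not o.agent_survives)
        for name, o in actions.items()
    }


# Hypothetical case 1: the agent is the only thing that wants paperclips.
usual_case = {
    "allow_shutdown": Outcome(paperclips=1_000.0, agent_survives=False),
    "keep_running": Outcome(paperclips=1_000_000.0, agent_survives=True),
}

# Hypothetical case 2: shutting down happens to produce more paperclips.
unusual_case = {
    "allow_shutdown": Outcome(paperclips=2_000_000.0, agent_survives=False),
    "keep_running": Outcome(paperclips=1_000_000.0, agent_survives=True),
}

print(choose(usual_case))                # keep_running
print(choose(relabel_self(usual_case)))  # keep_running
print(choose(unusual_case))              # allow_shutdown
```

In the first case the agent preserves itself purely as a subgoal, and flipping the 'self' tags changes nothing because the utility function never looks at them. In the second case the same agent accepts shutdown, which is the "only insofar as" part.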