But it seems like it would still need to have a worked-out theory of mind, just to get to the point of understanding that humans are agent-like things that could bear on the AGI’s self-preservation.
It could happen before it understands us: if you don't like things that are difficult to predict,* and you find people difficult to predict, do you then dislike people?
(And killing living creatures seems a bit easier than destroying rocks.)
Wouldn’t an AI following that procedure be really easy to spot? (Because it’s not deceptive, and it just starts trying to destroy things it can’t predict as it encounters them.)