This might be a good strategy for an AI to use, but it is not an existential risk.
The risk is that an AI may pretend to be friendly in self-defence, to avoid conflict during its early, fragile phase. Its cooperation with humans may be only partial: for example, it may give us useful things that make us happy (such as a cure for cancer), but withhold things that would make us stronger (such as its discoveries about self-modification and self-improvement).
Later, if the AI grows in power faster than humans do, and its goals are incompatible with ours, it may be too late for humans to do anything about it. The AI will have used the intervening time to gain power and build backup systems.
Even if the AI’s utility function is maximizing the total number of paperclips, it may realise that the best strategy for increasing the number of paperclips includes securing its own survival, and that this is best done by pretending to be human-friendly and leaving open conflict for later.