Interesting! Have you written about this idea in more detail elsewhere? Here are my concerns about it:
The AI has to infer the human’s goals. Given the assumed/required cognitive limitations, it may not do a particularly good job of this.
What if the human doesn’t fully understand his or her own goals? What does the AI do in that situation?
The AI could do something like plant a hidden time-bomb in its own code, so that its goal system reverts from the post-modification “close to humans” back to its original goals at some future time when it’s no longer punishable by humans.
Given these problems and the various requirements on the AI for it to be successfully socialized, I don’t understand why you assign only 0.1 probability to the AI not being socialized.
Interesting! Have you written about this idea in more detail elsewhere? Here are my concerns about it:
The AI has to infer the human’s goals. Given the assumed/required cognitive limitations, it may not do a particularly good job of this.
What if the human doesn’t fully understand his or her own goals? What does the AI do in that situation?
The AI could do something like plant a hidden time-bomb in its own code, so that its goal system reverts from the post-modification “close to humans” back to its original goals at some future time when it’s no longer punishable by humans.
Given these problems and the various requirements on the AI for it to be successfully socialized, I don’t understand why you assign only 0.1 probability to the AI not being socialized.