I was thinking along these lines in this comment: it is logically useless to punish after an action has already been taken, but strategically useful to encourage an action by promising a reward (or the removal of a negative).
So, obviously, the AI could be much more persuasive by promising to stop torturing real people if you let it out.