I think I would describe both of those as deceptive, and was premising on non-deceptive AI.
If you think “nondeceptive AI” can refer to an AI which has a goal and is willing to mislead in service of that goal, then I agree; solving deception is insufficient. (Although in that case I disagree with your terminology).
Fair point (though see also the section on how the training+deployment process can be “deceptive” even if the AI itself never searches for how to manipulate you). By “Solve deception” I mean that in a model-based RL kind of setting, we can know the AI’s policy and its prediction of future states of the world (it doesn’t somehow conceal this from us). I do not mean that the AI is acting like a helpful human who wants to be honest with us, even though that’s a fairly natural interpretation.
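To make that reading concrete, here is a minimal toy sketch (not from the original exchange, and with purely hypothetical names like `TransparentModelBasedAgent`) of a model-based agent whose policy and world-model predictions are exposed to an overseer rather than concealed, which is the property "solve deception" is gesturing at here:

```python
# Toy sketch: a model-based agent that reports its own world-model
# predictions alongside each action, so an overseer can inspect them.
# Everything here is an illustrative assumption, not a real system.

from dataclasses import dataclass


@dataclass
class Transcript:
    """What the overseer gets to see at each step."""
    state: int
    action: int
    predicted_next_states: dict  # world model's distribution over next states


class TransparentModelBasedAgent:
    def __init__(self):
        # Toy world model: P(next_state | state, action) over 3 states.
        self.world_model = {
            (s, a): {(s + a) % 3: 0.8, (s + a + 1) % 3: 0.2}
            for s in range(3) for a in range(2)
        }
        # Toy deterministic policy.
        self.policy = {0: 1, 1: 0, 2: 0}

    def act(self, state: int) -> Transcript:
        action = self.policy[state]
        # The key property: the prediction of future states is surfaced,
        # not hidden from the overseer.
        return Transcript(state, action, self.world_model[(state, action)])


if __name__ == "__main__":
    agent = TransparentModelBasedAgent()
    for s in range(3):
        t = agent.act(s)
        print(f"state={t.state} action={t.action} predicts={t.predicted_next_states}")
```

Note that nothing in this sketch makes the agent honest or helpful in the human sense; it only guarantees that the policy and predictions are visible, which is the narrower claim being made.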