I think I would describe both of those as deceptive, and was premising on non-deceptive AI.
If you think “nondeceptive AI” can refer to an AI which has a goal and is willing to mislead in service of that goal, then I agree; solving deception is insufficient. (Although in that case I disagree with your terminology).
Fair point (though see also the section on how the training+deployment process can be “deceptive” even if the AI itself never searches for how to manipulate you). By “Solve deception” I mean that in a model-based RL kind of setting, we can know the AI’s policy and its prediction of future states of the world (it doesn’t somehow conceal this from us). I do not mean that the AI is acting like a helpful human who wants to be honest with us, even though that’s a fairly natural interpretation.
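To make that reading concrete, here is a minimal toy sketch (not from the original exchange, and with purely hypothetical names like `TransparentModelBasedAgent`) of a model-based agent whose policy and world-model predictions are exposed to an overseer rather than concealed, which is the property "solve deception" is gesturing at here:

```python
# Toy sketch: a model-based agent that reports its own world-model
# predictions alongside each action, so an overseer can inspect them.
# Everything here is an illustrative assumption, not a real system.

from dataclasses import dataclass


@dataclass
class Transcript:
    """What the overseer gets to see at each step."""
    state: int
    action: int
    predicted_next_states: dict  # world model's distribution over next states


class TransparentModelBasedAgent:
    def __init__(self):
        # Toy world model: P(next_state | state, action) over 3 states.
        self.world_model = {
            (s, a): {(s + a) % 3: 0.8, (s + a + 1) % 3: 0.2}
            for s in range(3) for a in range(2)
        }
        # Toy deterministic policy.
        self.policy = {0: 1, 1: 0, 2: 0}

    def act(self, state: int) -> Transcript:
        action = self.policy[state]
        # The key property: the prediction of future states is surfaced,
        # not hidden from the overseer.
        return Transcript(state, action, self.world_model[(state, action)])


if __name__ == "__main__":
    agent = TransparentModelBasedAgent()
    for s in range(3):
        t = agent.act(s)
        print(f"state={t.state} action={t.action} predicts={t.predicted_next_states}")
```

Note that nothing in this sketch makes the agent honest or helpful in the human sense; it only guarantees that the policy and predictions are visible, which is the narrower claim being made.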