I think it’s very important to be clear that you’re not conditioning on something incoherent here.
In particular, [an AI that never misleads the user about anything (whether intentionally or otherwise)] is incoherent: any statement an AI can make will update some of your expectations toward being more correct, and others away from being correct. (It’s important here that when a statement is made you don’t learn [statement], but rather [x made statement]; only the former can be empty.)
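To make the incoherence concrete, here's a minimal toy model (the worlds, numbers, and proposition are all invented for illustration):

```python
# Toy illustration: a perfectly *true* statement can still move some
# of a listener's beliefs away from the truth.

# Three possible worlds; the listener starts with a uniform prior.
# The actual world is w1. Proposition Q holds in w1 and w3.
prior = {"w1": 1/3, "w2": 1/3, "w3": 1/3}
Q = {"w1": True, "w2": False, "w3": True}

def prob_Q(belief):
    return sum(p for w, p in belief.items() if Q[w])

# The AI truthfully asserts "not w3", and the listener conditions on it:
# zero out w3, renormalize.
posterior = {w: p for w, p in prior.items() if w != "w3"}
total = sum(posterior.values())
posterior = {w: p / total for w, p in posterior.items()}

print(prob_Q(prior))      # 2/3: belief in Q before the statement
print(prob_Q(posterior))  # 1/2: belief in Q after the statement
# Q is actually true (we are in w1), so the true statement moved P(Q)
# from 2/3 to 1/2, i.e. *away* from the truth, even as P(w1) moved
# from 1/3 to 1/2, i.e. *toward* it.
```

This is only one instance, but it's the mechanism behind the claim above: updates ripple out to propositions the statement wasn't about, and there's no reason for all of them to land closer to the truth.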
I say non-misleading-to-you things only to the extent that I understand your capabilities and what you value, and apply that understanding in forming my statements.
[Don’t ever be misleading] cannot be satisfied. [Don’t ever be misleading in ways that we consider important] requires understanding human values and optimizing answers for non-misleadingness given those values. NB this is not [answer as a human would], nor [give an answer that a human would approve of].
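Here's a sketch of what that optimization looks like in the toy model above (the importance weights and the misleadingness score are my own invention, purely to make the shape of the requirement concrete):

```python
# Sketch: "don't be misleading in ways we consider important" as an
# optimization over candidate true statements. Same three-world toy
# model as above; the importance weights are a hypothetical stand-in
# for a model of what the listener values.

prior = {"w1": 1/3, "w2": 1/3, "w3": 1/3}
Q = {"w1": True, "w2": False, "w3": True}  # a proposition the listener cares about
truth = {"Q": 1.0, "w1": 1.0}              # we are in fact in w1, so Q holds
importance = {"Q": 1.0, "w1": 0.2}         # hypothetical: how much each question matters

def posterior_after(consistent_worlds):
    """Listener's belief after conditioning on a statement (a set of worlds)."""
    post = {w: p for w, p in prior.items() if w in consistent_worlds}
    total = sum(post.values())
    return {w: p / total for w, p in post.items()}

def misleadingness(belief):
    """Importance-weighted distance between post-statement beliefs and the truth."""
    beliefs = {"Q": sum(p for w, p in belief.items() if Q[w]),
               "w1": belief.get("w1", 0.0)}
    return sum(importance[k] * abs(beliefs[k] - truth[k]) for k in truth)

# Two true statements, each represented by the set of worlds it leaves open.
# Neither pins down the world, so neither posterior is right about everything.
candidates = {"not w3": {"w1", "w2"},   # score 0.6: gets Q badly wrong
              "not w2": {"w1", "w3"}}   # score 0.1: wrong only about w1 vs w3

best = min(candidates, key=lambda s: misleadingness(posterior_after(candidates[s])))
print(best)  # -> "not w2": the least misleading statement *given those values*
```

Note that neither candidate scores zero: every available statement leaves the listener wrong about something, which is exactly why the unweighted requirement is unsatisfiable and the weighted one needs a model of what the listener values.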
With a fuzzy notion of deception, it’s too easy to do a selective, post-hoc classification and say “Ah well, that would be deception” for any outcome we don’t like. But the outcomes we like are also misleading—just in ways we didn’t happen to notice and care about. This smuggles in a requirement that’s closer in character to alignment than to non-deception.
Conversely, non-fuzzy notions of deception don’t tend to cover all the failure modes (e.g. this is nice, but avoiding deception-in-this-sense doesn’t guarantee much).