Perhaps “do what we say” is more like “know when the outside view says you’ve converged to the wrong value function, so we’re probably right and you should listen to us”.
It’s somewhat more subtle than that. The ideal (and maybe impossible) corrigible AI should protect us even if we accidentally give the AI the wrong process for figuring out what to value. It should protect us even if the AI becomes omniscient.
If the AI knows vastly more than we do, there’s no sense in which we are providing extra evidence or an information-carrying “outside view”. We are instead just registering a sort of complaint and hoping we’ve programmed the AI to listen.
I’m still not convinced that this sort of corrigibility is in any way distinct from just adding some extra complications to the process we give the AI for figuring out what to value.
The outside view I had in mind wasn’t with respect to its knowledge, but to empirical data on how often its exact value-learning algorithm converges to the correct set of preferences for agents-like-us. That feels different.
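To make that concrete, here is a minimal, purely illustrative sketch (not something spelled out in this exchange, and all numbers are assumed): treat a human objection as evidence about whether the value-learning algorithm converged correctly, with the algorithm’s empirical track record as the prior.

```python
# Purely illustrative sketch: every probability below is assumed, not measured.
# The point is that the AI defers not because humans know more, but because the
# empirical track record of its own value-learning algorithm, combined with a
# human objection, makes "I converged to the wrong value function" the likely
# explanation.

def p_correct_given_objection(base_rate_correct: float,
                              p_objection_if_correct: float,
                              p_objection_if_incorrect: float) -> float:
    """P(converged correctly | humans object), by Bayes' rule."""
    correct_and_object = base_rate_correct * p_objection_if_correct
    incorrect_and_object = (1 - base_rate_correct) * p_objection_if_incorrect
    return correct_and_object / (correct_and_object + incorrect_and_object)

# Assumed track record: the algorithm converges to the right preferences for
# agents-like-us 80% of the time, and humans object far more often when it
# hasn't than when it has.
posterior = p_correct_given_objection(base_rate_correct=0.80,
                                      p_objection_if_correct=0.05,
                                      p_objection_if_incorrect=0.70)
print(f"P(correct | objection) = {posterior:.2f}")  # ~0.22, so defer to the humans
```

On numbers like these, the objection alone drags the posterior well below one half, which is the sense in which the outside view, rather than any extra human knowledge, is doing the work.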
I assume you’re talking about the particular “do what we say” subsystem described in the second last paragraph? If so, that seems plausibly right.