Vladimir_Nesov comments on Formalizing Value Extrapolation

Vladimir_Nesov 26 Apr 2012 22:09 UTC
4 points
I’m not talking about Paul’s proposal in particular, but about eventually-Friendly AIs in general. Their defining feature is that they have the correct Friendly goal, given by a complicated definition that leaves a lot of logical uncertainty about the goal until it’s eventually made more explicit. So we might explore the neighborhood of normal FAIs by increasing the initial logical uncertainty about their goal, so that they become more and more prone to pursuing generic instrumental gains at first, at the expense of what they eventually realize to be their values.
- Wei Dai 26 Apr 2012 22:21 UTC
  4 points
  Oh, please reinterpret my comment as replying to this comment of yours. (That one is specifically talking about Paul’s proposal, right?)
  - Vladimir_Nesov 26 Apr 2012 22:33 UTC
    5 points
    Well, yes, but I interpreted the problem of an impossibly complicated value definition (which does seem to be a problem with Paul’s specific proposal, even if we assume that it theoretically converges to an FAI) as the eFAI* never coming out of its destructive phase, and hence possibly just eating the universe without producing anything of value. So “destroy the world” is, in a sense, the sole manifestation of the problem with a hypothetical implementation of that proposal...
    
    [* eFAI = eventually-Friendly AI, let’s coin this term]