Interesting. In that case, would you say an AI that provably implements CEV’s replacement is, for that reason, provably Friendly? That is, AIs implementing CEV’s replacement form an analytical subset of Friendly AIs? What is the current replacement for CEV anyway? Having some technical material would be even better. If it’s open to the public, then I’d like to understand how EY proposes to install a general framework similar to CEV at the “initial dynamic” stage that can predictably generate a provably Friendly AI without explicitly modeling the target of its Friendliness.
There isn’t really one as far as I know; “The Value Learning Problem” discusses some of the questions involved, but seems mostly to be at the stage of defining the problem rather than trying to answer it. (This seems appropriate to me; trying to answer the problem at this point seems premature.)
Thanks. That makes sense to me.
I think that’s MIRI’s usage of the term “Friendly.”
He’s not proposing a mechanism as far as I know. That’s another open problem.
See MIRI’s research for details.