My impression is that in full generality it is unsolvable, but something like starting with an imprecise model of approval / a utility function learned via ambitious value learning, and restricting explanations/questions/manipulation by that, may work.
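To make the idea a bit more concrete, here is a minimal toy sketch of "restricting by a learned approval model": candidate explanations/questions are filtered by a rough, imprecise approval scorer rather than by the full human value function. All names here (`approval_model`, `filter_by_approval`, the threshold) are hypothetical illustrations, not an existing API or a worked-out proposal.

```python
from typing import Callable, List


def filter_by_approval(
    candidates: List[str],
    approval_model: Callable[[str], float],  # imprecise scorer learned via value learning (assumed)
    threshold: float = 0.0,
) -> List[str]:
    """Keep only explanations/questions the learned approval model rates above a threshold.

    The point is that the filter does not need full human values to be defined;
    a rough approval signal only has to be good enough to rule out manipulative outputs.
    """
    return [c for c in candidates if approval_model(c) >= threshold]


if __name__ == "__main__":
    # Stand-in approval model for illustration: penalize obviously manipulative candidates.
    def toy_approval(text: str) -> float:
        return -1.0 if "manipulate" in text else 1.0

    candidates = [
        "Here is the evidence for and against option A.",
        "I will manipulate you into picking option A.",
    ]
    print(filter_by_approval(candidates, toy_approval))
```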
Yep. As is so often the case, I think these things are not fully value-agnostic, but they don't need full human values to be defined either.