[Question] Is CIRL a promising agenda?

Richard Ngo writes:

Since Stuart Russell's proposed alignment solution in Human Compatible is the most publicly-prominent alignment agenda, I should be more explicit about my belief that it almost entirely fails to address the core problems I expect on realistic pathways to AGI.
Specifying an update rule which converges to a desirable goal is just a reframing of the problem of specifying a desirable goal, with the “uncertainty” part as a red herring. https://arbital.com/p/updated_deference/… In other words, Russell gives a wrong-way reduction.
Howie writes:

My impression is that ~everyone I know in the alignment community is very pessimistic about SR's agenda. Does it sound right that your view is basically a consensus? (There's prob some selection bias in who I know).
Richard responds:
I think it’s fair to say that this is a pretty widespread opinion. Partly it’s because Stuart is much more skeptical of deep learning (and even machine learning more generally!) than almost any other alignment researcher, and so he’s working in a different paradigm.
Is Richard correct, and if so, why? (I would also like a clearer explanation of why Richard is skeptical of Stuart's agenda. I agree that the reframing doesn't completely solve the problem, but I don't understand why it can't be a useful piece; see the sketch below for the kind of reframing I have in mind.)
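For concreteness, here is a minimal, purely illustrative sketch of the kind of reward-uncertainty update a CIRL-style assistant performs: a Bayesian posterior over candidate reward functions, updated from observed human behavior under an assumed human model. This is my own toy construction, not code from Human Compatible or CHAI, and the reward hypothesis class, the prior, and the Boltzmann-rationality assumption are all stand-ins I chose for the example.

```python
import numpy as np

# Toy CIRL-flavoured update: a posterior over candidate reward functions,
# updated from observed human actions. The hypothesis class, the prior,
# and the Boltzmann human model below are all assumptions made up for
# illustration; none of them come from Russell's proposal directly.

candidate_rewards = {
    "paperclips": np.array([1.0, 0.0]),    # reward for actions [a0, a1]
    "human_values": np.array([0.0, 1.0]),
    "indifferent": np.array([0.5, 0.5]),
}

# Uniform prior over which reward function the human "really" has.
prior = {name: 1.0 / len(candidate_rewards) for name in candidate_rewards}

def likelihood(action: int, reward: np.ndarray, beta: float = 2.0) -> float:
    """P(human takes `action` | reward), assuming Boltzmann rationality.
    The choice of this observation model is itself a specification choice."""
    logits = beta * reward
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float(probs[action])

def bayes_update(posterior: dict, action: int) -> dict:
    """One Bayesian update of the reward posterior on an observed action."""
    unnorm = {name: p * likelihood(action, candidate_rewards[name])
              for name, p in posterior.items()}
    z = sum(unnorm.values())
    return {name: v / z for name, v in unnorm.items()}

posterior = dict(prior)
for observed_action in [1, 1, 0, 1]:    # made-up human demonstrations
    posterior = bayes_update(posterior, observed_action)

print(posterior)
# Which reward function this concentrates on is determined entirely by the
# prior, the hypothesis class, and the human model, i.e. by what we specified.
```

The sketch makes the shape of the objection visible: "maintain uncertainty over the reward and update on human behavior" only converges somewhere desirable if the hypothesis class, prior, and observation model are already roughly right, which, if I understand correctly, is what Richard means by calling the uncertainty part a red herring. My question is why that relocation of the problem can't still be a useful piece.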