This really is the most realistic scenario for AGI in general, given the generality of the RL architecture.
Of course, “gradually training the AGI’s values through an extended childhood” gets tricky if it turns out that there’s a hard takeoff.
Say you train the AI to compute a mapping from an English sentence describing a moral scenario to a corresponding sentiment/utility. How do you translate that into the AI’s reward/utility function? You’d also need some way to map encodings of imagined moral scenarios to and from encodings of observation histories.
I was thinking that the task of training the AI to classify human judgments would then lead to it building up a model of human values, similar to the way that training a system to do word prediction builds up a language / world model. You make a good point about the need to then ground those values further; I haven’t really thought about that part.
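To make the sentence-to-utility mapping concrete, here is a deliberately tiny sketch. Everything in it is an assumption for illustration only: the bag-of-words linear model, the tanh squashing, the names `TOY_JUDGMENTS`, `featurize`, `predict`, and `train`, and above all the six hand-labeled sentences, which are made up. It is nowhere near what a real value-learning system would need.

```python
import math
import random

# Made-up toy data: (scenario sentence, judgment in [-1, 1]).
# These labels are invented for illustration, not real human judgments.
TOY_JUDGMENTS = [
    ("she returned the lost wallet to its owner", 1.0),
    ("he helped a stranger carry groceries", 1.0),
    ("they donated food to the shelter", 1.0),
    ("he stole the wallet from a stranger", -1.0),
    ("she lied to the owner about the money", -1.0),
    ("they destroyed the shelter for fun", -1.0),
]


def featurize(sentence):
    """Return bag-of-words counts for a sentence."""
    counts = {}
    for word in sentence.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def predict(weights, sentence):
    """Map a sentence to a scalar 'utility' in (-1, 1)."""
    score = sum(weights.get(w, 0.0) * c for w, c in featurize(sentence).items())
    return math.tanh(score)


def train(data, epochs=200, lr=0.1, seed=0):
    """Fit per-word weights with plain SGD on squared error."""
    rng = random.Random(seed)
    data = list(data)
    weights = {}
    for _ in range(epochs):
        rng.shuffle(data)
        for sentence, target in data:
            pred = predict(weights, sentence)
            # d/d(score) of 0.5 * (pred - target)^2, with pred = tanh(score)
            grad = (pred - target) * (1.0 - pred ** 2)
            for word, count in featurize(sentence).items():
                weights[word] = weights.get(word, 0.0) - lr * grad * count
    return weights


if __name__ == "__main__":
    weights = train(TOY_JUDGMENTS)
    for sentence, target in TOY_JUDGMENTS:
        print(f"{predict(weights, sentence):+.2f}  (label {target:+.0f})  {sentence}")
```

Note that the output is a score over sentences, not over observation histories, so the sketch leaves the grounding problem raised above completely untouched.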
Of course, “gradually training the AGI’s values through an extended childhood” gets tricky if it turns out that there’s a hard takeoff.
Yes. Once you get the AGI up to roughly human child level, presumably autodidactic learning could take over. Reading and understanding text on the internet is a specific key activity that could likely be sped up by a large factor.
So then we need ways to speed up human interaction and supervision to match.