I believe this is the idea of “motivational scaffolding” described in Superintelligence. Make an AI that just learns a model of the world, including what words mean. Then you can describe its utility function in terms of that model—without having to define exactly what the words and concepts mean.
This is much easier said than done. It’s “easy” to train an AI to learn a model of the world, but how exactly do you use that model to make a utility function?
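To make the gap concrete, here is a minimal toy sketch of the scaffolding idea, under heavy assumptions: it pretends the AI already has a learned concept space, a text encoder into that space, and a world model that predicts outcomes in it. All of the names (`embed_text`, `predict_state`, the 8-dimensional concept space) are hypothetical illustrations, not a real system's API.

```python
import numpy as np

CONCEPT_DIM = 8  # hypothetical size of the model's learned concept space
rng = np.random.default_rng(0)

def embed_text(phrase: str) -> np.ndarray:
    """Hypothetical learned mapping from a phrase to the model's concept space."""
    # Hash-seeded vector so the toy is self-contained; a real system would use
    # representations learned from data, not this stand-in.
    seed = abs(hash(phrase)) % (2**32)
    return np.random.default_rng(seed).normal(size=CONCEPT_DIM)

def predict_state(action: str) -> np.ndarray:
    """Hypothetical world model: predicts the concept-space state an action leads to."""
    return embed_text(action) + rng.normal(scale=0.1, size=CONCEPT_DIM)

def utility(state: np.ndarray, goal_phrase: str) -> float:
    """Scaffolded utility: score a predicted state by how closely it matches the
    model's own representation of a natural-language goal description."""
    goal = embed_text(goal_phrase)
    return float(state @ goal / (np.linalg.norm(state) * np.linalg.norm(goal)))

# Choose the action whose predicted outcome the model itself rates closest to the goal.
goal = "humans remain happy and in control"
actions = ["cooperate with operators", "seize the off-switch"]
best = max(actions, key=lambda a: utility(predict_state(a), goal))
print(best)
```

The toy "works" only because the scoring rule was written by hand; the hard part it glosses over is exactly the open question above, since nothing guarantees that the model's internal concept for a phrase like "humans remain happy" matches what we actually mean by it.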