If you limit the domain of your utility function to a sensory channel, you have already lost; you are forced to choose between a utility function that is wrong and a utility function with a second induction system hidden inside it. That failure is unrecoverable.
However, I see no reason for Solomonoff-inspired agents to be structured that way. If the utility function’s domain is a world-model instead, then the agent can find itself in that world-model and the self-modeling problem vanishes immediately, leaving only the hard but philosophically valid problem of defining the utility function we want.
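To make the contrast concrete (the notation here is my own loose sketch, not a formula from any particular paper): write $\mathcal{Q}$ for the class of environment programs the agent considers, $\xi(q \mid h)$ for the posterior weight it gives program $q$ after interaction history $h$, and $u$ for the utility function. A sensory-domain utility scores the percepts $q$ would emit under an action sequence $a$,
$$V_{\text{percept}}(a) \;=\; \sum_{q \in \mathcal{Q}} \xi(q \mid h)\, u\big(o^{q,a}_{1:m}\big),$$
while a world-model-domain utility scores the modeled world itself,
$$V_{\text{model}}(a) \;=\; \sum_{q \in \mathcal{Q}} \xi(q \mid h)\, u\big(s^{q,a}_{1:m}\big),$$
where $s^{q,a}_{1:m}$ is the trajectory of $q$’s internal state rather than its outputs. Only the argument of $u$ changes, but in the second form $u$ can refer to features of the modeled world, including the agent’s own instantiation inside $q$, that never show up on the sensory channel.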
Relevant paper: Model-based Utility Functions.
There’s also the problem of actually building such a thing.
edit: I should add that the problem of building this particular thing is above and beyond the already difficult problem of building any AGI, let alone a friendly one: how do you make a thing’s utility function correspond to the world and not to its perceptions, when perceptions are all it has immediately available?
This is exactly how my formalism works.
Alex Mennen has described a version of AIXI with a utility function defined over the environment.
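Schematically (and glossing over the details of Mennen’s actual construction), the change relative to standard AIXI is to replace the summed reward signal with a utility evaluated on the environment program itself. Standard AIXI chooses
$$a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m} \big[r_t + \cdots + r_m\big] \sum_{q \,:\, q(a_{1:m}) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)},$$
whereas the environment-utility variant looks more like
$$a_t \;=\; \arg\max_{a_t} \sum_{o_t} \cdots \max_{a_m} \sum_{o_m} \sum_{q \,:\, q(a_{1:m}) \,=\, o_{1:m}} 2^{-\ell(q)}\; u(q, a_{1:m}),$$
so the agent is optimizing over which environments it is probably interacting with and what happens inside them, rather than over the reward stream those environments feed it.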
Predicting the input to a sensory channel is comparatively straightforward. I’m not even sure where you would begin creating a program that models the universe in such a way that it can find a copy of itself inside that model. And then, on top of that, a utility function that can assign a sensible utility to the state of an arbitrary Turing machine?
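For comparison, the “straightforward” direction really is just Solomonoff induction over the percept stream, stated loosely as
$$M(x_{t+1} \mid x_{1:t}) \;=\; \frac{\sum_{p \,:\, p \text{ outputs a string beginning with } x_{1:t} x_{t+1}} 2^{-\ell(p)}}{\sum_{p \,:\, p \text{ outputs a string beginning with } x_{1:t}} 2^{-\ell(p)}}.$$
Nothing in that expression requires identifying any particular substructure of the programs $p$. The extra step the model-based proposal demands is exactly what this formula never needs: locating the agent, and the things we care about, inside each $p$, and defining a utility over that.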