I can see this being a problem, though. What is to stop us from doing the following? Let U_t(x) be my true utility function and let U_p(y) be my so-called *practical* utility function. Furthermore, let U_p(y) = x, so that U_t(x) = U_t(U_p(y)). If we agree that changing the taste function doesn't alter the utility function, then changing U_p(y) shouldn't alter my utility function either, yet U_p(y) is all my utility function is based on!
In the apple/chocolate/banana case, I prefer worlds in which I have a subjective feeling of good taste, so taking the pill doesn't change my preferences or utility function. In your U_p(y) construction, by contrast, I care directly about y, so if you change U_p(y), that will change my preferences/utility function. It is not the case that whenever I can define my utility function in terms of some other function, I can change that other function and everything is still fine; it depends on the case.
In particular, with my current preferences/utility function, in the apple/chocolate/banana case, I would say "I would prefer having a banana and the ability to find bananas tasty to having chocolate right now". I wouldn't say the corresponding statement for U_p(y).
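To make the contrast concrete, here is a minimal sketch (the specific names and values are illustrative, not anything from the post): in the taste-pill case I care about the *output* of the inner function (the subjective experience), while in the U_p(y) construction the inner function is what produces the ranking over the y's I care about.

```python
# Case 1: taste pill. True utility is over subjective experiences; the taste
# function maps foods to experiences. Swapping the taste function does not
# change which experiences I prefer.
def utility_over_experiences(experience: str) -> float:
    return {"tasty": 1.0, "bland": 0.0}[experience]

taste_before_pill = {"chocolate": "tasty", "banana": "bland"}
taste_after_pill = {"chocolate": "bland", "banana": "tasty"}

# Under either taste function, the ranking over experiences (the thing I
# actually care about) is identical.
assert utility_over_experiences("tasty") > utility_over_experiences("bland")

# Case 2: U_t(U_p(y)). U_t just passes through whatever value U_p assigns, so
# the induced ranking over the y's I directly care about comes entirely from
# U_p; swapping U_p flips that ranking.
def U_t(x: float) -> float:
    return x

U_p_before = {"world_A": 1.0, "world_B": 0.0}
U_p_after = {"world_A": 0.0, "world_B": 1.0}

prefers_A_before = U_t(U_p_before["world_A"]) > U_t(U_p_before["world_B"])
prefers_A_after = U_t(U_p_after["world_A"]) > U_t(U_p_after["world_B"])
assert prefers_A_before != prefers_A_after  # the preference over y has flipped
```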
Also, a general note: when you aren't dealing with probability, "having a utility function" means "having transitive (and complete) preferences over all world-histories" (or world-states, if you don't care about actions or paths). In that case it's better to stick to thinking about preferences, which are easier to work with. (For example, this comment is probably best understood from a preferences perspective, not a utility-function perspective.)
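A small sketch of that equivalence for finitely many world-histories (the histories named here are just placeholders): a complete transitive ranking and a utility function carry the same information.

```python
from itertools import combinations

# Made-up world-histories, ranked best to worst.
ranking = [
    "banana_and_bananas_taste_good",
    "chocolate_right_now",
    "banana_and_bananas_taste_bad",
]

# Read off a utility function from the ranking: higher number = more preferred.
utility = {h: len(ranking) - 1 - i for i, h in enumerate(ranking)}

def prefers(a: str, b: str) -> bool:
    return utility[a] > utility[b]

# The induced preferences are complete and transitive over these histories,
# which is all "having a utility function" amounts to in this setting.
for a, b, c in combinations(ranking, 3):
    if prefers(a, b) and prefers(b, c):
        assert prefers(a, c)
```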
Also, *Self-Modification of Policy and Utility Function in Rational Agents* looks into related issues, and my take on *Towards Interactive Inverse Reinforcement Learning* is that the problem it points at is similar in flavor to the ideas in this post.