Great post!

But I’m not quite sure how I feel about the reformulation in terms of semimeasures instead of deterministic programs. Part of my motivation for the environment-specific utility was to avoid privileging observations over unobservable events in the environment in terms of what the agent can care about. So I would only consider the formulation in terms of semimeasures to be satisfactory if the semimeasures are specific enough that the correct semimeasure plus the observation sequence is enough information to determine everything that’s happening in the environment.
I really dislike the discounting approach, because it doesn’t respect the given utility function and makes the agent miss out on potentially infinite amounts of utility.
If we’re going to allow infinite episodic utilities, we’ll need some way of comparing how big different nonconvergent series are. At that point, the utility calculation will not look like an infinite sum in the conventional sense. In a sense, I agree that discounting is inelegant because it treats different time steps differently, but on the other hand, I think asymmetry between different time steps is somewhat inevitable. For instance, presumably we want the agent to value getting a reward of 1 on even steps and 0 on odd steps more highly than getting a reward of 1 on steps divisible by three and 0 on non-multiples of three. But those reward sequences can be placed in one-to-one correspondence (map the n-th rewarded step of one to the n-th rewarded step of the other), so a time-symmetric sum cannot tell them apart. This fact causes me to grudgingly give up on time-symmetric utility functions.
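To make the example concrete, here is a minimal sketch, with an assumed geometric discount factor of 0.9 (an arbitrary choice, not specified anywhere above), showing that discounting does separate the two reward patterns even though their undiscounted partial sums both diverge:

```python
# Minimal sketch: both reward patterns have divergent undiscounted sums, but a
# geometric discount (gamma = 0.9 assumed here) values the even-step pattern higher.

def reward_even(t):
    return 1 if t % 2 == 0 else 0      # reward 1 on even steps, 0 on odd steps

def reward_thirds(t):
    return 1 if t % 3 == 0 else 0      # reward 1 on multiples of three, 0 otherwise

def discounted_sum(reward, gamma=0.9, horizon=10_000):
    """Finite truncation of sum_{t >= 1} gamma**t * reward(t)."""
    return sum(gamma ** t * reward(t) for t in range(1, horizon + 1))

print(discounted_sum(reward_even))    # ~4.26  (= gamma^2 / (1 - gamma^2))
print(discounted_sum(reward_thirds))  # ~2.69  (= gamma^3 / (1 - gamma^3))
```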
one has to be careful to only use the computable subset of the set of all infinite strings ao_{1:∞}
So I would only consider the formulation in terms of semimeasures to be satisfactory if the semimeasures are specific enough that the correct semimeasure plus the observation sequence is enough information to determine everything that’s happening in the environment.
Can you give an example of a situation in which that would not be the case? I think the semimeasure AIXI and the deterministic-program AIXI are pretty much equivalent; am I overlooking something here?
If we’re going to allow infinite episodic utilities, we’ll need some way of comparing how big different nonconvergent series are.
I think we need that even without infinite episodic utilities. I still think there might be possibilities involving surreal numbers, but I haven’t found the time yet to develop this idea further.
Why?
Because otherwise we definitely end up with an unenumerable utility function, and every approximation will be blind to the differences between infinitely many futures with infinitely large utility differences, I think. The set of all binary strings of infinite length is uncountable; how would we feed that into an enumerable/computable function? Your approach avoids that via the use of policies p and q that are by definition computable.
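For reference, a brief sketch of the standard counting argument behind this point (my summary, not part of the original exchange):

```latex
% Counting sketch (standard facts, not specific to this discussion).
The set $\{0,1\}^{\infty}$ of infinite binary strings is uncountable: for any
enumeration $s_1, s_2, \dots$, the diagonal string $d$ defined by
\[
  d_n \;=\; 1 - (s_n)_n
\]
differs from every $s_n$ in position $n$, so no enumeration is exhaustive.
By contrast, computable policies are countable, since each is a finite program
and finite programs can be listed by length and then lexicographically.  A
utility function whose argument is a policy therefore has a countable domain,
whereas one whose argument is an arbitrary string $ao_{1:\infty}$ cannot even
be handed its input in finite time.
```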
Can you give an example of a situation in which that would not be the case? I think the semimeasure AIXI and the deterministic-program AIXI are pretty much equivalent; am I overlooking something here?
The nice thing about using programs is that a program determines not only what your observation will be, but also the entire program state at each time step. That way you can care about, for instance, the head of the Turing machine printing 0s to a region of the tape that you can’t see (making assumptions about how the UTM is implemented). I’m not sure how semimeasures are usually talked about in this context. If a semimeasure is something like a deterministic program plus a noisy observation channel, then there’s no problem; but if it doesn’t tell you what the program state history is, or doesn’t even mention a program state, then a utility function defined over semimeasures doesn’t give you a way to care about the program state history (that is, about events in the environment).
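A toy illustration of the distinction, with made-up environments standing in for environment programs (nothing here is from the original posts): two environments that produce identical observations but differ in hidden state can only be told apart by a utility function that looks at the state history.

```python
# Toy illustration: identical observations, different hidden state.  A utility over
# observation sequences is blind to the difference; a utility over the full
# program-state history is not.

def env_clean(t):
    return {"obs": t % 2, "hidden_tape": 0}   # never writes to the unseen region

def env_scribbler(t):
    return {"obs": t % 2, "hidden_tape": 1}   # writes 1s where the agent can't look

def run(env, steps=10):
    return [env(t) for t in range(steps)]

def u_observations(history):
    # depends only on what the agent sees
    return sum(step["obs"] for step in history)

def u_environment(history):
    # also cares about the unobserved part of the state
    return sum(step["obs"] - step["hidden_tape"] for step in history)

clean, scribbler = run(env_clean), run(env_scribbler)
print(u_observations(clean) == u_observations(scribbler))  # True: indistinguishable
print(u_environment(clean) == u_environment(scribbler))    # False: distinguishable
```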
I think we need that even without infinite episodic utilities.
I don’t understand. If all the series we care about converge, then why would we need to be able to compare nonconvergent series?
I still think there might be possibilities involving surreal numbers, but I haven’t found the time yet to develop this idea further.
That might end up being fairly limited. Academian points out that if you define a “weak preference” of X over Y as a preference such that X is preferred over Y but there exist outcomes Z and W such that for all probabilities p > 0, pZ + (1-p)Y is preferred over pW + (1-p)X, and a “strong preference” as a preference that is not a weak preference, then strong preferences are Archimedean by construction. So by the VNM utility theorem, a real-valued utility function describes your strong preferences even if you omit the Archimedean axiom (i.e. u(X) > u(Y) means a strong preference for X over Y, and u(X) = u(Y) means either indifference or a weak preference one way or the other). Exact ties between utilities of different outcomes should be rare, and resolving them correctly is infinitely less important than resolving strong preferences correctly. The problem with this that I just thought of is that conceivably there could be no strong preferences (i.e. for any preference, there is some other preference that is infinitely stronger). I suspect that this would cause something to go horribly wrong, but I don’t have a proof of that.
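A small worked instance of these definitions (the lexicographic example is mine, not Academian’s):

```latex
% Worked example: lexicographic preferences on pairs,
% $(x_1, x_2) \succ (y_1, y_2)$ iff $x_1 > y_1$, or $x_1 = y_1$ and $x_2 > y_2$.
The preference for $X = (0,1)$ over $Y = (0,0)$ is \emph{weak}: take $Z = (1,0)$
and $W = (-1,0)$; then for every $p > 0$,
\[
  pZ + (1-p)Y \;=\; (p,\,0) \;\succ\; (-p,\,1-p) \;=\; pW + (1-p)X ,
\]
since $p > -p$ in the dominant coordinate.  The preference for $X = (1,0)$ over
$Y = (0,0)$ is \emph{strong}: for any $Z$ and $W$, the first coordinate of
$pW + (1-p)X$ tends to $1$ as $p \to 0$ while that of $pZ + (1-p)Y$ tends to $0$,
so the required mixture preference fails for small $p$.  The real-valued utility
$u(x_1, x_2) = x_1$ then represents exactly the strong preferences, as the
construction predicts.
```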
Because otherwise we definitely end up with an unenumerable utility function, and every approximation will be blind to the differences between infinitely many futures with infinitely large utility differences, I think. The set of all binary strings of infinite length is uncountable; how would we feed that into an enumerable/computable function? Your approach avoids that via the use of policies p and q that are by definition computable.
I’m not sure if this fits the definition of enumerable, but the utility function could accept a (possibly uncomputable) stream of actions. The utility function could be set up such that for any desired degree of precision, it only has to look at a finite number of elements. (I think that’s equivalent to the continuity assumption I made in my original post.)
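A minimal sketch of that continuity property, under the assumption (mine, not from the post) that the utility is a weighted sum with geometrically shrinking weights and bounded per-step terms:

```python
# Minimal sketch; the bounded form U = sum_{t>=1} 2**-t * r_t with r_t in [0,1]
# is an assumption.  For a requested precision eps, the tail beyond
# N = ceil(log2(1/eps)) terms is at most 2**-N <= eps, so only finitely many
# elements of the (possibly uncomputable) stream are ever read.

import math
from itertools import islice

def utility_to_precision(stream, eps):
    """Approximate U(stream) within eps by reading finitely many elements."""
    n = max(1, math.ceil(math.log2(1.0 / eps)))
    return sum(2.0 ** -(t + 1) * r for t, r in enumerate(islice(stream, n)))

def example_stream():
    """Stand-in for an arbitrary reward stream the utility function is handed."""
    t = 0
    while True:
        yield 1.0 if t % 2 == 0 else 0.0
        t += 1

print(utility_to_precision(example_stream(), 1e-6))  # ~0.666666 (true value 2/3)
```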