AlexMennen comments on Save the princess: A tale of AIXI and utility functions

AlexMennen 7 Feb 2013 0:51 UTC
0 points

Can you make an example of a situation in which that would not be the case? I think the semimeasure AIXI and deterministic programs AIXI are pretty much equivalent, am I overlooking something here?

The nice thing about using programs is that a program not only determines what your observation will be, but also the entire program state at each time. That way you can care about, for instance, the head of the Turing machine printing 0s to a region of the tape that you can’t see (making assumptions about how the UTM is implemented). I’m not sure how semimeasures are usually talked about in this context; if it’s something like a deterministic program plus a noisy observation channel, then there’s no problem, but if a semimeasure doesn’t tell you what the program state history is, or doesn’t even mention a program state, then a utility function defined over semimeasures doesn’t give you a way to care about the program state history (aka events in the environment).

I think we need that even without infinite episodic utilities.

I don’t understand. If all the series we care about converge, then why would we need to be able to compare convergent series?

I still think there might be possibilities involving surreal numbers, but I haven’t found the time yet to develop this idea further.

That might end up being fairly limited. Academian points out that if you define a “weak preference” of X over Y as a preference such that X is preferred over Y but there exist outcomes Z and W such that for all probabilities p>0, pZ + (1-p)Y is preferred over pW + (1-p)X, and a “strong preference” as a preference that is not a weak preference, then strong preferences are archimedian by construction, so by the VNM utility theorem, a real-valued utility function describes your strong preferences even if you omit the archimedian axiom (i.e. u(X) > u(Y) means a strong preference for X over Y, and u(X) = u(Y) means either indifference or a weak preference one way or the other). Exact ties between utilities of different outcomes should be rare, and resolving them correctly is infinitely less important than resolving strong preferences correctly. The problem with this that I just thought of is that conceivably there could be no strong preferences (i.e. for any preference, there is some other preference that is infinitely stronger). I suspect that this would cause something to go horribly wrong, but I don’t have a proof of that.

Because otherwise we definitely end up with an unenumerable utility function and every approximation will be blind between infinitely many futures with infinitely large utility differences, I think. The set of all binary strings of infinite length is uncountable and how would we feed that into an enumerable/computable function? Your approach avoids that via the use of policies p and q that are by definition computable.

I’m not sure if this fits the definition of enumerable, but the utility function could accept a (possibly uncomputable) stream of actions. The utility function could be set up such that for any desired degree of precision, it only has to look at a finite number of elements. (I think that’s equivalent to the continuity assumption I made in my original post.)