So I would only consider the formulation in terms of semimeasures to be satisfactory if the semimeasures are specific enough that the correct semimeasure plus the observation sequence is enough information to determine everything that’s happening in the environment.
Can you give an example of a situation in which that would not be the case? I think the semimeasure formulation of AIXI and the deterministic-program formulation are pretty much equivalent. Am I overlooking something here?
If we’re going to allow infinite episodic utilities, we’ll need some way of comparing how big different nonconvergent series are.
I think we need that even without infinite episodic utilities. I still think there might be possibilities involving surreal numbers, but I haven’t found the time yet to develop this idea further.
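To make "comparing nonconvergent series" concrete, here is a minimal sketch of one standard option, the overtaking criterion: series u beats series v if the running sum of u(k) - v(k) eventually stays positive. This is only an illustration, not the surreal-number idea mentioned above. The names `partial_sum_differences` and `overtakes_up_to` are hypothetical, and the finite `horizon` check is an assumption that stands in for the true "for all sufficiently large n" condition, which cannot be verified in finite time.

```python
from itertools import islice
from typing import Callable, Iterator


def partial_sum_differences(
    u: Callable[[int], float], v: Callable[[int], float]
) -> Iterator[float]:
    """Yield the running sums of u(k) - v(k) for k = 1, 2, 3, ..."""
    total = 0.0
    k = 1
    while True:
        total += u(k) - v(k)
        yield total
        k += 1


def overtakes_up_to(
    u: Callable[[int], float],
    v: Callable[[int], float],
    horizon: int,
    settle: int = 1,
) -> bool:
    """Finite-horizon stand-in for the overtaking criterion.

    Returns True if the running difference is strictly positive at every
    step from `settle` through `horizon`. This is evidence, not proof,
    that u overtakes v.
    """
    diffs = list(islice(partial_sum_differences(u, v), horizon))
    return all(d > 0 for d in diffs[settle - 1:])


# Both series diverge to infinity, but the second is ahead at every step.
reward_a = lambda k: 1.0                           # utility 1 per episode
reward_b = lambda k: 2.0                           # utility 2 per episode
# An oscillating series that neither overtakes reward_a nor is overtaken by it.
reward_c = lambda k: 2.0 if k % 2 == 1 else 0.0

b_beats_a = overtakes_up_to(reward_b, reward_a, horizon=100)
a_beats_b = overtakes_up_to(reward_a, reward_b, horizon=100)
c_beats_a = overtakes_up_to(reward_c, reward_a, horizon=100)
a_beats_c = overtakes_up_to(reward_a, reward_c, horizon=100)
```

The last pair shows the weakness of this ordering: it is only partial, so some pairs of futures stay incomparable, which is part of why a richer number system might be attractive.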
Why?
Because otherwise we definitely end up with a non-enumerable utility function, and every approximation will be unable to distinguish between infinitely many futures with infinitely large utility differences, I think. The set of all infinite binary strings is uncountable, so how would we feed that into an enumerable/computable function? Your approach avoids that by using policies p and q that are computable by definition.
Super hard to say without further specification of the approximation method used for the physical implementation.