Thanks for the comments; this is an interesting line of reasoning :-)
An AIXI that takes no actions is just a Solomonoff inductor, and this might give you some intuition for why, if you embed AIXI-without-actions into a UTM with side effects, you won’t end up with anything approaching good behavior. On each turn, it will run all of its environment hypotheses and update its distribution to be consistent with the observation, and then it will do nothing. It won’t be able to “learn” how to manage the “side effects”; AIXI is simply not an algorithm that attempts to do any such thing.
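To make the “update, then do nothing” picture concrete, here’s a minimal sketch. This is not real AIXI or Solomonoff induction: the hypothesis class, prior, and example data are made-up stand-ins for the semimeasure mixture; the structure of the loop is the point.

```python
# A toy Bayesian inductor standing in for "AIXI-without-actions".
# Everything here (the hypothesis class, the prior) is a hypothetical stand-in.

def update(weights, hypotheses, history, observation):
    """Reweight each hypothesis by the probability it assigns to the new observation."""
    new_weights = {
        name: weights[name] * h(history, observation)
        for name, h in hypotheses.items()
    }
    total = sum(new_weights.values())
    return {name: w / total for name, w in new_weights.items()} if total else new_weights

def run_inductor(hypotheses, prior, observations):
    """Each turn: condition on the observation, then output nothing at all."""
    weights, history = dict(prior), []
    for obs in observations:
        weights = update(weights, hypotheses, tuple(history), obs)
        history.append(obs)
        # No action is ever chosen, so whatever side effects the host UTM's
        # computation has on the environment are never steered toward anything.
    return weights

# Two hypothetical environment hypotheses over coin flips:
hypotheses = {
    "always_heads": lambda hist, obs: 1.0 if obs == "H" else 0.0,
    "fair_coin":    lambda hist, obs: 0.5,
}
prior = {"always_heads": 0.5, "fair_coin": 0.5}
print(run_inductor(hypotheses, prior, ["H", "H", "T"]))  # posterior after three observations
```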
You’re correct, though, that examining a setup where a UTM has side effects (and picking an algorithm to run on that UTM) is indeed a way to approach the “naturalized” problems. In fact, this idea is very similar to Orseau and Ring’s space-time embedded intelligence formalism. The big question here (which we must answer in order to be able to talk about which algorithms “perform well”) is what distribution over environments an agent will be rated against.
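To state the scoring question schematically (my own notation, only loosely in the spirit of Orseau and Ring, not taken from their paper): rate a program $\pi$ by its weighted value across a class of environments, where each environment physically contains the machine that $\pi$ is written onto,

$$V(\pi) \;=\; \sum_{\rho \,\in\, \mathcal{E}} w(\rho)\, V_\rho(\pi),$$

where $\mathcal{E}$ is the environment class, $w$ is a prior weight over it, and $V_\rho(\pi)$ is the expected utility achieved when $\pi$ is installed on the embedded machine in $\rho$. The difficulty is hiding in the choice of $\mathcal{E}$ and $w$ (and in who is supposed to be doing this scoring).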
> In this context counterfactuals are simulations of the next turns of M resulting from the possible side effects of its current UTM.
I’m not exactly sure how you would formalize this. Say you have a machine M implemented on a UTM that has side effects on the environment. M is doing internal predictions but has no outputs. There’s something you could do here, which is to predict what would happen from running M (given your uncertainty about how the side effects work), but that’s not a counterfactual, that’s a prediction. Constructing a counterfactual would require considering different possible computations that M could execute. (There are easy ways to cash out this sort of counterfactual using CDT or EDT, but as far as I can tell, you run into the usual logical counterfactual problems if you try to construct these sorts of counterfactuals using UDT.)
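As a toy illustration of the prediction/counterfactual distinction, suppose (hypothetically) that your world model exposes an interface world_step(state, program) giving the next world state when the side-effecting UTM spends the turn running program. The names below (world_step, M, M_prime) are all stand-ins, and the “surgery” is exactly the easy CDT-flavored cash-out, not a solution to the logical counterfactual problem:

```python
def rollout(world_step, state, program, horizon):
    """Simulate `horizon` turns of the world while the UTM runs `program`."""
    for _ in range(horizon):
        state = world_step(state, program)  # side effects of running `program`
    return state

def prediction(world_step, state, M, horizon):
    # "What will happen?": run the model with M as it actually is.
    return rollout(world_step, state, M, horizon)

def cdt_style_counterfactual(world_step, state, M_prime, horizon):
    # "What would happen if the machine ran something else?": swap in a
    # different computation M_prime and re-run the model. The unsolved part is
    # doing this sensibly when M_prime's output is logically entangled with
    # the rest of the world, which is where UDT-style logical counterfactuals
    # get stuck.
    return rollout(world_step, state, M_prime, horizon)
```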
> This reduces the naturalized induction problem to the tiling/consistent reflection problem; the agent must choose which agent it wants to be in the next turn(s) through side effects that can change its future implementation.
Yeah, once you figure out which distribution over environments to score against and how to formalize your counterfactuals, the problem reduces to “pick the action with the best future side effects”, which throws you directly up against the Vingean reflection problem in any environment where your capabilities include building something smarter than you :-)
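Schematically (again, my notation rather than any settled proposal): given an environment prior $w$ and some counterfactual operator $\mathrm{cf}$, “pick the action with the best future side effects” is roughly

$$a^{*} \;=\; \operatorname*{arg\,max}_{a}\ \sum_{\rho \,\in\, \mathcal{E}} w(\rho \mid \text{history})\ \mathbb{E}\!\left[\,U \;\middle|\; \mathrm{cf}(\text{the machine does } a),\ \rho\,\right],$$

and the Vingean trouble is that evaluating that expectation can require reasoning about successor machines you can’t simply simulate.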