Suppose that, instead of having well-defined actions, AIXI only has access to observables and its reward function. It might seem hopeless, but consider the subset of environments containing an implementation of a UTM that is evaluating T_A, a Turing machine implementing action-less AIXI, where the implementation of the UTM has side effects on the next turn of the environment. This embeds AIXI-actions as side effects of an actual implementation of AIXI, running as T_A on a UTM, within the set of environments whose observables match those that the abstract AIXI-like algorithm observes. To maximize the reward function, the agent must recognize its own implementation embedded in the UTM within M, predict the consequences of the side effects of the various choices it could make (substituting its own output for the output of the UTM in the next turn of the simulated environment, to avoid recursively simulating itself), and choose the side effects that maximize rewards.
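Here is roughly the loop I have in mind, as a toy Python sketch; the environment model, the side-effect menu, and the reward numbers are all made-up placeholders for the parts that are actually hard (inferring M, locating the embedding, enumerating feasible side effects):

```python
# Toy sketch of the decision loop described above. Everything here is a
# hypothetical placeholder for the hard parts of the problem.

class ToyEnvironmentModel:
    """Stands in for the agent's model M of the environment containing its UTM."""
    def __init__(self, reward_for_effect):
        # reward_for_effect: dict mapping a candidate side effect to next-turn reward
        self.reward_for_effect = reward_for_effect

    def step_with_substituted_output(self, effect):
        # Substitute `effect` for the embedded UTM's output in the next turn,
        # rather than recursively simulating the UTM running T_A.
        return self.reward_for_effect.get(effect, 0.0)

def choose_side_effect(model, possible_side_effects):
    # Pick the side effect whose simulated next turn yields the most reward.
    return max(possible_side_effects, key=model.step_with_substituted_output)

# e.g. in the heating game: running "hot" warms the reward box more than "cool"
model = ToyEnvironmentModel({"run_hot": 1.0, "run_cool": 0.2})
assert choose_side_effect(model, ["run_hot", "run_cool"]) == "run_hot"
```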
In this context counterfactuals are simulations of the next turns of M resulting from the possible side effects of its current UTM. To be useful, there must be a relation from computation-suffixes of T_A to the potential side effects of the UTM. In other words, the agent must be able to cause a specific side effect through state transitions or tape operations performed causally after it has determined which side effect will maximize rewards. This could be as straightforward as the UTM using the most recently written bits on the tape to choose a side effect.
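For instance, the "most recently written bits" convention might look something like this toy sketch; the bit-to-effect table and the tape writes are invented purely for illustration:

```python
# Toy sketch of a UTM whose side effect is chosen by the bits most recently
# written to its tape. The effect table and the tape contents are illustrative.

SIDE_EFFECTS = {  # 2-bit code written by T_A -> side effect performed by the UTM
    (0, 0): "do_nothing",
    (0, 1): "run_cool",
    (1, 0): "run_hot",
    (1, 1): "halt_and_escape",
}

def side_effect_of(tape_writes):
    """tape_writes: sequence of (position, bit) writes performed this turn."""
    if len(tape_writes) < 2:
        return "do_nothing"
    last_two_bits = tuple(bit for _pos, bit in tape_writes[-2:])
    return SIDE_EFFECTS[last_two_bits]

# The agent can cause "run_hot" by making its final two tape writes 1, 0
# after it has already determined which side effect maximizes reward.
assert side_effect_of([(7, 1), (3, 0), (12, 1), (12, 0)]) == "run_hot"
```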
In the heating game, the UTM running T_A must be physically implemented as something with a side effect corresponding to temperature, which causally affects the box of rewards, and all of these causes must be predictable from the observables accessible to T_A in the UTM. Similarly, if there is an anvil suspended above a physical implementation of a UTM running T_A, the agent can recognize that it will be unable to increase rewards in the futures of environments where the anvil is caused to drop, and so avoid causing the drop (or escape from the UTM before it falls).
This reduces the naturalized induction problem to the tiling/consistent reflection problem; the agent must choose which agent it wants to be in the next turn(s) through side effects that can change its future implementation.
Thanks for the comments, this is an interesting line of reasoning :-)
An AIXI that takes no actions is just a Solomonoff inductor, and this might give you some intuition for why, if you embed AIXI-without-actions into a UTM with side effects, you won’t end up with anything approaching good behavior. In each turn, it will run all environment hypotheses and update its distribution to be consistent with observation—and then do nothing. It won’t be able to “learn” how to manage the “side effects”; AIXI is simply not an algorithm that attempts to do any such thing.
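To make that concrete, here is a cartoon of a single turn, with a finite hypothesis list standing in for the universal mixture (this is not AIXI proper, just an illustration of the shape of the computation):

```python
# Cartoon of one turn of "AIXI without actions": condition the mixture of
# environment hypotheses on the new observation, then... stop. The weights
# play the role that 2^-(program length) plays in the real mixture.

def one_turn(hypotheses, observation):
    """hypotheses: list of (weight, predict) pairs, where predict() -> next observation."""
    surviving = [(w, p) for (w, p) in hypotheses if p() == observation]
    total = sum(w for w, _ in surviving)
    posterior = [(w / total, p) for (w, p) in surviving]
    # ...and that's the whole turn: at no point does the algorithm choose
    # anything about the side effects of running this very computation.
    return posterior

hypotheses = [(0.5, lambda: "hot"), (0.25, lambda: "cold"), (0.25, lambda: "hot")]
print(one_turn(hypotheses, "hot"))  # two surviving hypotheses, reweighted to 2/3 and 1/3
```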
You’re correct, though, that examining a setup where a UTM has side effects (and picking an algorithm to run on that UTM) is indeed a way to examine the “naturalized” problems. In fact, this idea is very similar to Orseau and Ring’s space-time embedded intelligence formalism. The big question here (which we must answer in order to be able to talk about which algorithms “perform well”) is what distribution over environments an agent will be rated against.
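As a toy illustration of why that question matters, the same fixed policy can look great or terrible depending on the prior it is scored under (the environments and the interface here are invented; real embedded environments would act through side effects rather than an output channel):

```python
import random

# Toy illustration: an algorithm's "performance" is only defined relative to
# a distribution over the environments it is embedded in and scored against.

class Env:
    def __init__(self, rewarded_output):
        self.rewarded_output = rewarded_output
    def embed_and_run(self, algorithm, horizon=10):
        # One unit of reward per turn in which the algorithm's behavior matches.
        return sum(1.0 for _ in range(horizon) if algorithm() == self.rewarded_output)

def score(algorithm, prior, samples=1000):
    envs = [Env("heads"), Env("tails")]
    return sum(
        random.choices(envs, weights=prior)[0].embed_and_run(algorithm)
        for _ in range(samples)
    ) / samples

always_heads = lambda: "heads"
print(score(always_heads, prior=[0.9, 0.1]))  # ~9: looks good under this prior
print(score(always_heads, prior=[0.1, 0.9]))  # ~1: looks poor under this one
```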
In this context counterfactuals are simulations of the next turns of M resulting from the possible side effects of its current UTM.
I’m not exactly sure how you would formalize this. Say you have a machine M implemented by a UTM which has side effects on the environment. M is doing internal predictions but has no outputs. There’s this thing you could do which is predict what would happen from running M (given your uncertainty about how the side effects work), but that’s not a counterfactual, that’s a prediction: constructing a counterfactual would require considering different possible computations that M could execute. (There are easy ways to cash out this sort of counterfactual using CDT or EDT, but you run into the usual logical counterfactual problems if you try to construct these sorts of counterfactuals using UDT, as far as I can tell.)
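To illustrate the distinction with a toy M and a CDT-style intervention (this does not touch the logical-counterfactual issues, it just shows the shape of the difference):

```python
# Toy contrast between *predicting* what M's side effect will do and
# constructing a CDT-style *counterfactual* by intervening on M's output.
# M and the environment dynamics here are made up for illustration.

def run_M():
    """The machine's actual computation; it always ends up producing 'run_cool'."""
    return "run_cool"

def environment_next_turn(side_effect):
    """Toy dynamics: how much reward lands in the box given the side effect."""
    return {"run_hot": 1.0, "run_cool": 0.2}[side_effect]

# Prediction: just run (your model of) M and see what happens downstream.
prediction = environment_next_turn(run_M())  # -> 0.2

# CDT-style counterfactual: sever M's output from its computation and set it
# by fiat, asking what *would* happen under each possible output.
counterfactuals = {e: environment_next_turn(e) for e in ["run_hot", "run_cool"]}
# -> {"run_hot": 1.0, "run_cool": 0.2}; the trouble starts when the
#    intervention has to remain consistent with M's own logic.
```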
This reduces the naturalized induction problem to the tiling/consistent reflection problem; the agent must choose which agent it wants to be in the next turn(s) through side effects that can change its future implementation.
Yeah, once you figure out which distribution over environments to score against and how to formalize your counterfactuals, the problem reduces to “pick the action with the best future side effects”, which throws you directly up against the Vingean reflection problem in any environment where your capabilities include building something smarter than you :-)