If approximate AIXI A is part of the world, its state of knowledge is also part of the world, so it influences A’s output and A’s reward. Consider programs B that know the world program W (the observation/reward channel; W takes no arguments and includes A as a part) and the action program A (which also takes no arguments and already takes W into account). A program B takes A’s hypothetical output and returns the value of W (reward and input) inferred under the logical assumption that A’s output equals that hypothetical output. In other words, B acts as the counterfactual inference module of a UDT agent with agent-program A and world-program W.
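To make the structure concrete, here is a minimal Python sketch (my own illustration, not anything given in the comment): make_B, infer_W_given_A_output, Percept and udt_style_choice are all hypothetical names, and the logical-inference step is left as a supplied callable, since it cannot actually be computed in general.

```python
from typing import Callable, List, Tuple

Action = str
Percept = Tuple[str, float]  # (observation, reward) produced by the world program W


def make_B(infer_W_given_A_output: Callable[[Action], Percept]) -> Callable[[Action], Percept]:
    """B maps a hypothetical output of A to the value of W (observation, reward)
    inferred under the logical assumption that A's output equals that hypothetical
    output. The inference routine itself is only a supplied placeholder here."""
    def B(hypothetical_output: Action) -> Percept:
        return infer_W_given_A_output(hypothetical_output)
    return B


def udt_style_choice(B: Callable[[Action], Percept], actions: List[Action]) -> Action:
    """A UDT-style use of B: pick the hypothetical output whose inferred reward is highest."""
    return max(actions, key=lambda a: B(a)[1])
```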
What happens if A comes to believe B is probable? It starts acting as a UDT agent: its actual input is correctly predicted by B, so B retains its believability, and A’s output is whichever of the hypothetical outputs B suggests. Maybe A can then be taught to become UDT by rewarding those of its decisions that coincide with UDT’s decisions.
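One way to picture that teaching step is a reward schedule that pays out exactly when A’s decision matches the UDT decision. A rough sketch under that assumption; training_step, agent_act and udt_act are illustrative placeholders, not anything specified above.

```python
def training_step(agent_act, udt_act, situation):
    """Reward the approximate-AIXI agent exactly when its action coincides with
    the action a UDT agent would choose in the same situation."""
    action = agent_act(situation)
    reward = 1.0 if action == udt_act(situation) else 0.0
    return action, reward
```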
The first paragraph is certainly the idea I was trying to communicate.
Perhaps A could be taught to behave like a UDT agent, in the sense that such training could teach it to be any algorithm, but it does not seem like this can ever lead it to believe in the model B?
It can’t be taught to be any algorithm, since it has to maximize reward; that part is fixed. My point was mostly unrelated to yours (though inspired by it): the set of correct program-models, i.e. those that won’t be falsified, of the kind that listen to hypothetical outputs like Model 1 (I’m stipulating that what you discuss in the post doesn’t happen or doesn’t matter) includes UDT-like agents that reason about the logical dependence of reward on their decisions, not just explicit dependence schemes (see the “against explicit dependence” section of this post) that suppose some kind of Cartesian magic.
That is, there probably is an AIXI configuration (a state of knowledge, or an initial training observation/reward prefix) that turns AIXI into a competent agent that gets all sorts of decision problems right: it can reason about itself and about cooperation with other copies, care about counterfactuals, and so on. That’s an unexpected result, quite different from what I previously believed (though it doesn’t assert that this happens on its own, and I can’t really tell whether it does).