I don’t know if I understand this—do you mean that A fails to “recognize itself in a mirror” for a new and surprising reason?
As near as I can tell, you say that A would not recognize itself in what you call “the shortest explanation for A’s observations” even though this does in fact describe how A works. And since we never programmed A to want to model itself, it lacks the self-awareness to realize that changing its actions can change the outcome. (It can’t even use a theory like TDT to show that it should act as if it controls the reductionist image of itself, because that would require having an explicit self-model.)
I wouldn’t normally express this by calling Model 2 simpler than Model 1, as you did in the OP—the parent comment suggests to me that A mislabels Model 1 as Model 2, so maybe I still don’t get what you mean.
AIXI learns a function f : outputs → inputs, modeling the environment’s response to AIXI’s outputs.
Let y be the output of A. Then we have one function f1(y) which uses y to help model the world, and another function f2(y) which ignores y and essentially recomputes it from the environment. These two models make identical predictions when applied to the actual sequence of outputs of the algorithm, but make different predictions about the counterfactuals that are essential to determining the agent’s behavior. If you are using f1, as AIXI intends to, then you do a sane thing if you try to rely on causal control. If you are using f2, as AIXI probably actually would, then you have no causal control over reality, and so you go catatonic if you rely on causal control.
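To make the distinction concrete, here is a minimal sketch in Python under toy assumptions of my own (the `agent_policy` function and the arithmetic “environment” are hypothetical illustrations, not part of AIXI): both models reproduce the actual trajectory, but only f1 lets the supplied output y move the prediction.

```python
# Toy sketch of the f1 / f2 distinction. The policy and environment here
# are invented stand-ins for illustration; this is not AIXI itself.

def agent_policy(observation):
    """The agent A: a fixed deterministic map from observation to output y."""
    return observation % 2  # toy rule: act on the parity of the observation

def f1(y, observation):
    """Model 1: treats the agent's output y as a free input (a causal handle).
    The predicted next input changes when a different y is plugged in."""
    return observation + y

def f2(y, observation):
    """Model 2: ignores the supplied y and recomputes the agent's output
    from the environment side, so y has no causal influence here."""
    recomputed_y = agent_policy(observation)
    return observation + recomputed_y

# On the actual trajectory the two models are indistinguishable:
obs = 7
actual_y = agent_policy(obs)
assert f1(actual_y, obs) == f2(actual_y, obs)

# But they disagree on counterfactuals, which is what drives behavior:
counterfactual_y = 1 - actual_y
print(f1(counterfactual_y, obs))  # prediction tracks y  -> causal control
print(f2(counterfactual_y, obs))  # prediction unchanged -> no causal control
```

An agent planning with f2 sees every counterfactual action leading to the same predicted input, which is the sense in which it has “no causal control over reality.”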
I’ll try to make this a little clearer.