Both of these models give perfect predictions, but Model 2 is substantially simpler (around 10,000 bits simpler, and specifying A's control over W2's values in 10,000 bits seems quite optimistic). Therefore A will put much more probability mass on Model 2 than on Model 1. In fact, Model 2 and its close variants probably receive almost all of the probability mass.
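As a sanity check on the arithmetic: under a Solomonoff-style prior, a program of length L gets weight proportional to 2^-L, so a 10,000-bit gap produces an astronomically lopsided mass ratio. This is a toy illustration of that one fact, not of AIXI itself:

```python
from fractions import Fraction

def prior_mass_ratio(extra_bits: int) -> Fraction:
    """Ratio of prior mass between a program that is `extra_bits` longer
    and the shorter program, under a 2**-length prior."""
    return Fraction(1, 2 ** extra_bits)

# A model 10,000 bits longer gets 2**-10000 of the shorter model's mass;
# the denominator of that ratio alone has over 3,000 decimal digits.
ratio = prior_mass_ratio(10_000)
```

So "almost all of the probability mass" is, if anything, an understatement: the longer model is suppressed by a factor larger than anything physically countable.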
AIXI does not calculate simplicity using human heuristics. It’s quite hard to justify claims that one model is “substantially simpler”, or around “10000 bits simpler” using whatever unspecified, non-algorithmic reasoning you used to derive that gross approximation.
AIXI uses infinite computational power to search the space of all computable programs for the subset that perfectly predicts its observations to date. Then, for each such program, it considers each action it can take at the current moment and any reward it receives as a direct result. The entire process then recurses, like a minimax search algorithm, eventually returning the maximum score based on the summed future reward it receives in the particular future predicted by that particular predictor program. It then sums ALL of the potential actions over ALL of the potential futures, weighted by two to the negative length of the predictor program. For AIXI, complexity is simply program length, nothing more, nothing less. It has nothing to do with human-language distinctions such as yours between Model 2 and Model 1.
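The recursion described here can be sketched as a toy expectimax over a tiny finite hypothesis class (a hypothetical stand-in for "all computable programs"; the function names and two example programs are invented for illustration):

```python
def consistent(program, history):
    # A program is "consistent" if it reproduces every observation in the
    # history, given the preceding interaction and the action taken.
    return all(program(history[:i], act)[0] == obs
               for i, (act, obs) in enumerate(history))

def aixi_value(history, horizon, hypotheses, actions=(0, 1)):
    """Toy expectimax. Each hypothesis is a (program, length) pair, where
    program(history, action) -> (observation, reward). Returns the best
    achievable length-weighted average of summed future reward."""
    if horizon == 0:
        return 0.0
    best = float("-inf")
    for act in actions:
        total = norm = 0.0
        for program, length in hypotheses:
            if not consistent(program, history):
                continue                # only perfect predictors participate
            weight = 2.0 ** -length     # shorter programs count for more
            obs, reward = program(history, act)
            future = aixi_value(history + [(act, obs)], horizon - 1,
                                hypotheses, actions)
            total += weight * (reward + future)
            norm += weight
        if norm > 0:
            best = max(best, total / norm)
    return best

# Two toy predictor programs: one echoes the action back as both the
# observation and the reward, one always returns zeros.
echo = lambda hist, act: (act, act)
zeros = lambda hist, act: (0, 0)
```

With an empty history both programs are live, so acting is worth the weighted average over both predicted futures; once an observation contradicts `zeros`, it drops out of the mixture and `echo` alone drives the value.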
If AIXI were practical, and humans could understand the resulting predictor programs, the programs it explores are best understood as the simplest complete universal theories of everything. AIXI would solve physics, in the grand sense. It would create programs that predict the entire universe. It would simulate the universe from the big bang to now, and it would do that an infinite number of times, until it found the universes that exactly coincide with its observations. The supposed '10,000 bit' difference between your W1 and W2 is no difference at all to such a thing. It specifies complexity at a far more basic and profound level.
You are also forgetting that the core of AIXI is a move/countermove search system. After its first move (predictably random), it would quickly converge on the subset of programs that perfectly predict all previous observations AND the consequences of its first action. The complexity of models that ignore that action is completely irrelevant if they are inaccurate. (AIXI exploits infinite computational power; it only considers perfect predictors.)
I don’t quite understand; it seems like you either misunderstand badly or are being extremely uncharitable.
I smuggled in by hypothesis the claim that specifying the part of the model "A's outputs are applied as voltages on W2" takes 10,000 bits. If you believe the world is governed by simple physical law, this seems extremely generous. This is a statement about a program specifying a certain model of the world, not about English-language descriptions. I don't know why the 10,000-bit difference is 'supposed.'
> The complexity of models that ignore that action is completely irrelevant if they are inaccurate. (AIXI exploits infinite computational power; it only considers perfect predictors.)
> I smuggled in by hypothesis the claim that specifying the part of the model "A's outputs are applied as voltages on W2" takes 10,000 bits.
AIXI-like algorithms don't need to explicitly model any particular feature of the world. It doesn't matter if describing the wire takes a googolplex of bits; that is completely unrelated to the bit-complexity cost of AIXI's predictor programs.
> Model 1 and Model 2 are both perfect predictors.
Both cannot simultaneously be perfect predictors, as they completely disagree: Model 1 is correct when the output wire W2 is intact; Model 2 is correct only when the output wire W2 is cut or removed.
AIXI will always be trying to send actions down the W2 wire and it will quickly realize whether the wire is intact or not, converging on 1-type models or 2-type models, not both. And it will converge on the correct model type.
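The convergence claim can be illustrated with a hypothetical two-model toy: once an action's consequence is observed, any model that mispredicted it is filtered out of the mixture for good. The model names and their behaviors below are invented stand-ins for the 1-type and 2-type models:

```python
def surviving(hypotheses, history):
    # Keep only models that perfectly reproduce every (action, observation)
    # pair seen so far; one wrong prediction and a model is gone for good.
    return [(prog, bits) for prog, bits in hypotheses
            if all(prog(history[:i], act)[0] == obs
                   for i, (act, obs) in enumerate(history))]

# "Wire intact" echoes the action back on W2; "wire cut" always reads zero.
wire_intact = lambda hist, act: (act, 0)
wire_cut = lambda hist, act: (0, 0)

models = [(wire_intact, 12), (wire_cut, 8)]
before = surviving(models, [])        # both still viable before acting
after = surviving(models, [(1, 1)])   # sent 1 down W2, observed 1 back
```

After a single informative action, only the intact-wire model survives, regardless of which model was shorter; simplicity only arbitrates among models that remain perfect predictors.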