I smuggled in by hypothesis the claim that specifying the part of the model “A’s outputs are applied as voltages on W2” takes 10,000 bits.
AIXI-like algorithms don’t need to explicitly model any particular feature of the world. It doesn’t matter if the wire uses a googolplex of bits; that is completely unrelated to the bit-complexity cost of AIXI’s estimator programs.
Model 1 and Model 2 are both perfect predictors.
The two models cannot both be perfect predictors, because they completely disagree. Model 1 is correct only when the output wire W2 is intact; Model 2 is correct only when W2 is cut or removed.
AIXI will always be trying to send actions down the W2 wire, so it will quickly learn whether the wire is intact and converge on either type-1 models or type-2 models, not both. And it will converge on the correct model type.
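To make the elimination step concrete, here is a toy sketch. It is not AIXI, just a two-hypothesis mixture over deterministic predictors. Everything in it is an assumption for illustration: the function names, the choice that a cut wire always yields observation 0, and the prior lengths (the 10,000 bits is the figure from the quoted claim, the 100 bits is arbitrary).

```python
import math

# Hypothetical deterministic predictors of the next observation given the action.
def model_1_wire_intact(action):
    return action          # assumption: an intact W2 makes the action observable

def model_2_wire_cut(action):
    return 0               # assumption: a cut W2 means the action has no effect

MODELS = {"model_1": model_1_wire_intact, "model_2": model_2_wire_cut}

# Illustrative description lengths in bits; the prior weight is 2**(-bits).
PRIOR_BITS = {"model_1": 10_000, "model_2": 100}

def log2_posterior(actions, observations):
    """Unnormalized log2 posterior weight of each model.

    A deterministic model that mispredicts even once gets weight zero
    (log2 weight -inf), no matter how short its description is.
    """
    log_w = {}
    for name, predict in MODELS.items():
        consistent = all(predict(a) == o for a, o in zip(actions, observations))
        log_w[name] = -PRIOR_BITS[name] if consistent else -math.inf
    return log_w

def surviving(actions, observations):
    """Names of the models that are still consistent with the history."""
    weights = log2_posterior(actions, observations)
    return [name for name, w in weights.items() if w > -math.inf]

actions = [1, 0, 1, 1]
print(surviving(actions, actions))        # wire intact: observations echo the actions
print(surviving(actions, [0, 0, 0, 0]))   # wire cut: observations ignore the actions
```

In this toy setting a single action on which the two models disagree is enough to eliminate one of them, whatever its prior length. If the agent only ever sent actions on which they agree (all zeros here), both would survive, which is why it matters that AIXI keeps sending actions down W2.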