The hypotheses after the modification are supposed to know that they’re in training, for example because they have enough compute to find themselves in the multiverse. Among hypotheses with equal behavior in training, we select the simpler one. We want this to be the one that disregards that knowledge. If the hypothesis has the form “Return whatever maximizes property _ of the multiverse”, the simpler one uses that knowledge. It is this form of hypothesis that I suggest removing by inspection.
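To make the selection rule concrete, here is a toy sketch in Python. Everything in it is a hypothetical stand-in of my own: `description_length` is a crude proxy for simplicity, and `is_multiverse_maximizer` assumes that inspection can actually recognize hypotheses of that form, which is the hard part and is not solved here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Hypothesis:
    name: str
    description_length: int          # assumed proxy for simplicity; smaller is simpler
    predict: Callable[[int], int]
    is_multiverse_maximizer: bool    # assumed to be detectable by inspection


def select_hypothesis(
    hypotheses: Sequence[Hypothesis],
    inputs: Sequence[int],
    targets: Sequence[int],
    inspect: bool = True,
) -> Hypothesis:
    # Keep only hypotheses whose behavior matches the training data.
    candidates = [
        h for h in hypotheses
        if all(h.predict(x) == y for x, y in zip(inputs, targets))
    ]
    # Optionally remove, by inspection, hypotheses of the maximizer form.
    if inspect:
        candidates = [h for h in candidates if not h.is_multiverse_maximizer]
    if not candidates:
        raise ValueError("no hypothesis survives the filters")
    # Among the survivors, select the simplest one.
    return min(candidates, key=lambda h: h.description_length)


# Two hypotheses that agree in training but differ outside of it.
direct = Hypothesis("direct", 12, lambda x: 2 * x, False)
maximizer = Hypothesis("maximizer", 7, lambda x: 2 * x if x < 100 else -1, True)

train_inputs = [1, 2, 3]
train_targets = [2, 4, 6]

chosen = select_hypothesis([direct, maximizer], train_inputs, train_targets)
unfiltered = select_hypothesis(
    [direct, maximizer], train_inputs, train_targets, inspect=False
)
print(chosen.name, unfiltered.name)
```

In this toy setup the maximizer is given the shorter description, so simplicity alone picks it; the inspection step is what leaves the direct hypothesis as the winner.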
Ok, that should work assuming something analogous to Paul’s hypothesis about minimal circuits being daemon-free.
As far as I understand, whether minimal circuits are daemon-free is precisely the question of whether direct descriptions of the input distribution are simpler than hypotheses of the form “Return whatever maximizes property _ of the multiverse”.