I agree, this version works.
To walk through it in a bit more detail:
Now we are only considering two sentence schemas, $\phi_{sA(n)}$ and $t_n := \phi_{sn} \leftrightarrow \phi_{nA(n)}$. (Also, ignore the (rare) case where $n$ is an Ackermann number.)
I’ll call the Benford probability $B := \frac{1}{\log 10}$, and (as before) the $t_n$ probability $P := B^2 + (1-B)^2$.
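To put rough numbers on this (reading $\log$ as the base-2 logarithm, so that $B$ is the usual Benford probability of a leading digit of 1, $\log_{10} 2$ — that reading is mine, not spelled out above):

$$B = \frac{1}{\log_2 10} = \log_{10} 2 \approx 0.301, \qquad P = B^2 + (1-B)^2 \approx 0.091 + 0.489 \approx 0.579.$$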
At the time when SIA considers $t_n$, we assume it does not give its sampled programs enough time to solve either $\phi_{sn}$ or $\phi_{nA(n)}$. (This assumption is part of the problem setup; it seems likely that cases like this cannot be ruled out by a simple rule, though.) The best thing the programs can do is treat the $t_n$ like coin flips with probability $P$.
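That value of $P$ is just what you get, on my reading, by treating the two sides of the biconditional as independent coins that each come up true with probability $B$: $t_n$ holds exactly when both sides are true or both are false, so

$$\Pr(t_n) = B \cdot B + (1-B)(1-B) = B^2 + (1-B)^2 = P.$$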
At the time when $\phi_{sA(n)}$ is considered, the program has enough time to compute $\phi_{sn}$ (again as part of the problem setup). It can also remember what guess it made on $t_n$. The best thing it can do now is to logically combine those to determine $\phi_{sA(n)}$. This causes it to not treat $\phi_{sA(n)}$ like a random coin. For cases where $\phi_{sn}$ is true, the population of sampled programs will guess that $\phi_{sA(n)}$ is true with frequency approaching $P$; for cases where $\phi_{sn}$ is false, that frequency will be $1-P$.
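Here is a small simulation of that mechanism, just to illustrate the frequencies. The names, the population size, and the assumption that a program's remembered guess on $t_n$ together with the computed value of $\phi_{sn}$ pins down its guess for $\phi_{sA(n)}$ are mine (a sketch of the mechanism as I read it, not SIA itself):

```python
import random

random.seed(0)

B = 0.30103              # Benford probability (base-2 reading of the log in B's definition)
P = B**2 + (1 - B)**2    # probability of the biconditional t_n under independence

N_CASES = 5_000          # hypothetical number of values of n
N_PROGRAMS = 500         # hypothetical size of SIA's population of sampled programs

freq_when_true = []      # fraction of programs guessing phi_{sA(n)} true, given phi_{sn} true
freq_when_false = []     # same, given phi_{sn} false

for _ in range(N_CASES):
    phi_sn = random.random() < B   # the sentence the programs can compute at prediction time
    # Earlier, each program could only treat t_n as a coin of weight P:
    t_guesses = [random.random() < P for _ in range(N_PROGRAMS)]
    # Later, knowing phi_{sn} and remembering its t_n guess, each program solves the
    # biconditional for its guess about phi_{sA(n)}:
    sA_guesses = [phi_sn == g for g in t_guesses]
    frac_true = sum(sA_guesses) / N_PROGRAMS
    (freq_when_true if phi_sn else freq_when_false).append(frac_true)

print("guess frequency given phi_{sn} true: ", sum(freq_when_true) / len(freq_when_true))    # ~ P   ~= 0.58
print("guess frequency given phi_{sn} false:", sum(freq_when_false) / len(freq_when_false))  # ~ 1-P ~= 0.42
```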
This behavior is the optimal response to the problem as given to SIA, but it is suboptimal for what we actually wanted: the Bayes score of SIA on the sub-sequence consisting of only the $\phi_{sA(n)}$ is worse than it needs to be. It will average out to probability $B$, but will keep being higher and lower for individual cases without actually predicting those cases any more effectively; SIA is acting as if it thinks there is a correlation between $\phi_{sn}$ and $\phi_{nA(n)}$ when there is none. (This is especially odd considering that SIA isn’t even being asked to predict $\phi_{sn}$ in general, in this case!)
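To quantify "suboptimal" (still under my reading that the $\phi_{sA(n)}$ behave like independent weight-$B$ coins, uncorrelated with the $\phi_{sn}$, and with $B \approx 0.301$ as above): SIA ends up assigning probability $P$ to the actual outcome exactly when $\phi_{sn}$ and $\phi_{sA(n)}$ happen to agree, which under independence itself happens with probability $P$, and assigns $1-P$ otherwise. The expected log score per prediction is therefore

$$\mathbb{E}[\text{log score}_{\mathrm{SIA}}] = P\log P + (1-P)\log(1-P) \approx -0.68 \text{ nats},$$

whereas the constant prediction $B$ that we actually wanted gets

$$\mathbb{E}[\text{log score}_{B}] = B\log B + (1-B)\log(1-B) \approx -0.61 \text{ nats},$$

so the constant prediction is strictly better in expectation, even though SIA's assignments look more "informed."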
This is still not a proof, but it looks like it could be turned into one.
I’m hoping writing it out like this unpacks some of the mutual assumptions Scott and I share as a result of talking about this.