I believe your concern comes from the fact that at the time the program has to assign a probability to $\phi_{s_{A(n)}}$, it has not only deduced the truth of $\phi_{s_n}$, but has also, earlier, guessed at the truth value of $\phi_{s_n}$. When it makes that earlier guess, it loses some probability mass, but it can lose that mass in a way that is correlated with the answer it gave to $t_n$. This way, it can still give the correct probability on $\phi_{s_{A(n)}}$.
Here is my fix: instead of $L$, consider the case where we are trying to guess only sentences of the form $\phi_{s_{A(n)}}$ and $t_n$ for some $n$; that is, we modify $L$ to reject any sentence not of that form. Both of these subsequences are indistinguishable from coin flips with fixed probabilities. In this case, SIA will not get the correct probabilities on both subsequences, because it has an incentive to make its answers to $\phi_{s_{A(n)}}$ match its answers to $t_n$ (or not match, when $\phi_{s_n}$ is false), and any program that does not make them match will be trumped by one that does.
This does not mean that we have this property when we consider all of $L$, but the code in no way depends on $E$, and I see no reason to think that it would work for $L$ but not for the modified $L$.
I agree, this version works.
To walk through it in a bit more detail:
Now we are only considering two sentence schemas, $\phi_{s_{A(n)}}$ and $t_n := \phi_{s_n} \leftrightarrow \phi_{s_{A(n)}}$. (Also, ignore the (rare) case where $n$ is itself an Ackermann number.)
I'll call the Benford probability $B := \frac{1}{\log 10}$, and (as before) the $t_n$ probability $P := B^2 + (1-B)^2$.
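For concreteness (and reading the $\log$ here as the natural logarithm, which is my assumption about the intended base): $B \approx 0.434$, and $P$ is just the probability that two independent coins of bias $B$ agree, which is why it is the right fixed probability for the biconditional $t_n$:
$$P = B^2 + (1-B)^2 \approx 0.19 + 0.32 \approx 0.51.$$
The exact numbers do not matter below; what matters is that each subsequence, taken on its own, looks like a coin with a fixed bias.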
At the time when SIA considers $t_n$, we assume it does not give its sampled programs enough time to solve either $\phi_{s_n}$ or $\phi_{s_{A(n)}}$. (This assumption is part of the problem setup; it seems likely that cases like this cannot be ruled out by a simple rule, though.) The best the programs can do is treat the $t_n$ like coin flips with probability $P$.
At the time when $\phi_{s_{A(n)}}$ is considered, the program has enough time to compute $\phi_{s_n}$ (again, as part of the problem setup). It can also remember what guess it made on $t_n$. The best thing it can do now is to logically combine those two to determine $\phi_{s_{A(n)}}$, which means it does not treat $\phi_{s_{A(n)}}$ like a random coin. For cases where $\phi_{s_n}$ is true, the population of sampled programs will guess $\phi_{s_{A(n)}} = \text{true}$ with frequency approaching $P$; for cases where $\phi_{s_n}$ is false, the frequency will be $1 - P$.
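To spell the combination step out (this just restates the logic above, writing $g_n$ for the program's earlier guess on $t_n$): such a program outputs $\phi_{s_{A(n)}} = (\phi_{s_n} \leftrightarrow g_n)$, so the fraction of sampled programs answering true is
$$\Pr\left[\text{guess } \phi_{s_{A(n)}} = \text{true}\right] =
\begin{cases}
\Pr[g_n = \text{true}] = P & \text{if } \phi_{s_n} \text{ is true,} \\
\Pr[g_n = \text{false}] = 1 - P & \text{if } \phi_{s_n} \text{ is false.}
\end{cases}$$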
This behavior is the optimal response to the problem as given to SIA, but it is suboptimal for what we actually wanted: the Bayes score of SIA on the subsequence consisting of only the $\phi_{s_{A(n)}}$ is suboptimal. It will average out to probability $B$, but it will continue to be higher and lower for individual cases without actually predicting those cases more effectively; SIA is acting as if there were a correlation between $\phi_{s_n}$ and $\phi_{s_{A(n)}}$ when there is none. (This is especially odd considering that SIA isn't even being asked to predict $\phi_{s_n}$ in general, in this case!)
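As a sanity check on that claim, here is a minimal Monte Carlo sketch (my own illustration, not part of the original setup): it models $\phi_{s_n}$ and $\phi_{s_{A(n)}}$ as independent coins of bias $B$ and compares the Bayes (log) score, on the $\phi_{s_{A(n)}}$ subsequence only, of a predictor that always outputs $B$ against the SIA-style predictor that outputs $P$ or $1-P$ depending on the computed $\phi_{s_n}$.

```python
import math
import random

# Illustration only (not the actual SIA algorithm): model phi_{s_n} and
# phi_{s_{A(n)}} as independent coins that come up true with probability B.
# B below assumes "log" means the natural logarithm; the qualitative point
# does not depend on its exact value.
B = 1 / math.log(10)           # Benford probability under that reading
P = B ** 2 + (1 - B) ** 2      # probability that the biconditional t_n is true

def log_score(prob_true: float, outcome: bool) -> float:
    """Bayes (log) score for assigning probability prob_true to an outcome."""
    return math.log(prob_true if outcome else 1.0 - prob_true)

random.seed(0)
trials = 200_000
score_flat = 0.0      # predictor that always says B for phi_{s_{A(n)}}
score_combined = 0.0  # SIA-style predictor: P if phi_{s_n} is true, else 1 - P

for _ in range(trials):
    phi_sn = random.random() < B    # truth value of phi_{s_n}
    phi_sAn = random.random() < B   # truth value of phi_{s_{A(n)}}, independent
    score_flat += log_score(B, phi_sAn)
    score_combined += log_score(P if phi_sn else 1.0 - P, phi_sAn)

print(f"B = {B:.3f}, P = {P:.3f}")
print(f"average log score, always-B predictor: {score_flat / trials:.4f}")
print(f"average log score, combined predictor: {score_combined / trials:.4f}")
# The combined predictor swings above and below B without tracking the actual
# phi_{s_{A(n)}}, so it gets a (slightly) worse Bayes score on this subsequence,
# even though combining is the right move for the problem as posed to SIA.
```

Under this model the combined predictor's expected score per sentence works out to $P \log P + (1-P)\log(1-P)$, versus $B \log B + (1-B)\log(1-B)$ for the flat predictor, and the former is strictly worse whenever $B \notin \{0, \tfrac{1}{2}, 1\}$, so the gap is small here but systematic.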
This is still not a proof, but it looks like it could be turned into one.
I’m hoping writing it out like this unpacks some of the mutual assumptions Scott and I share as a result of talking about this.