This is a followup to the concrete failure of the Solomonoff Induction inspired approach to inductive learning with a delay in feedback. This second failure is not one that we have an algorithm to deal with yet.
Let f:N→{0,1} be a function such that (for large n) f(n) can be computed in time 2n, but there is no algorithm that can predict f(n) in time less than 2n consistently better than a coin.
Let E(n)=f(k), where k is the greatest square number less than or equal to n.
Now, consider an algorithm which tries to inductively learn to assign probabilities to E(n) in polynomial time. Ideally, such an algorithm will learn to converge to probability 1/2.
What I mean by this is an algorithm which considers Turing machines which run in a given (say O(n2)) amount of time, and somehow has them compete with each other, and use the results of this competition to output probabilities of algorithms that did well.
This is a fuzzy category, but examples of such algorithms include the Solomonoff Induction Inspired Approach, the Benford Learner, as well as several other algorithms that have been discussed at MIRI, but have not made it to the forum yet.
I claim that all algorithms we have considered so far fail on this problem. The reason that they all fail, is that there is a single algorithm, G, which runs quickly, and does unboundedly better that the constant 1/2 algorithm.
G(n) looks at k, the greatest square less than or equal to n, and outputs 2⁄3 if k is of the form (3↑m3)2, and 1⁄2 otherwise.
When comparing G to the constant 1⁄2 algorithm, you only need to consider their scores on the places where they make different predictions. Note that when k is of the form (3↑m3)2, G will output 2/3 on a large number of predictions in a row, all of which will turn out to be the same bit. If that bit is shown to be a 1, there are enough predictions to dwarf any mistakes G has made by saying 2/3 in the past, causing G to appear to do much better than the constant 1⁄2 algorithm (until it is later shown that it output a 2⁄3 on a later bit which turns out to be a 0).
G does not consistently do better than the constant 1⁄2, but it does do arbitrarily better infinitely often, which stops the constant 1⁄2 from taking over the inductive algorithms.
Note that this is not an impossibility result. I am just pointing out a concrete way that the algorithms we have now are failing.
I suspect that the way we get past this problem will take advantage of the fact that the constant 1⁄2 is special in that no other algorithm can do consistently better than it. i.e., there is no algorithm M such that the difference in Bayes score between M and the constant 1⁄2 converges to infinity. (Note that it is important to say that it convenes to infinity, not just that it is unbounded. We already have an algorithm (which has not been posted yet) which can find hypotheses for which no other algorithm can cause the difference in bases score to be unbounded. The problem that such hypotheses often do not exist.)
G does not have this property, because an algorithm which does what G says, except it says 1⁄2 when the input is of the form (3↑m3)2 will do consistently better than G, since the algorithm makes the same prediction except for an infinite sequence of INDEPENDENT fair coins where G says 2⁄3.
This new failure, can also be expressed in the more complicated environment of assigning probabilities to logical sentences. We say that an algorithm passes “The Entangled Benford Test” if the limit as n goes to infinity of the probability assigned to “The first digit of 3↑⌊√n⌋3 is a 1,” at time 2^n is log10(2).
Is there any algorithm which solves overcomes this failure, and satisfies the entangled Benford test?
Second Failure of Inductive Learning with a Delay in Feedback
This is a followup to the concrete failure of the Solomonoff Induction inspired approach to inductive learning with a delay in feedback. This second failure is not one that we have an algorithm to deal with yet.
Let f:N→{0,1} be a function such that (for large n) f(n) can be computed in time 2n, but there is no algorithm that can predict f(n) in time less than 2n consistently better than a coin.
Let E(n)=f(k), where k is the greatest square number less than or equal to n.
Now, consider an algorithm which tries to inductively learn to assign probabilities to E(n) in polynomial time. Ideally, such an algorithm will learn to converge to probability 1/2.
What I mean by this is an algorithm which considers Turing machines which run in a given (say O(n2)) amount of time, and somehow has them compete with each other, and use the results of this competition to output probabilities of algorithms that did well.
This is a fuzzy category, but examples of such algorithms include the Solomonoff Induction Inspired Approach, the Benford Learner, as well as several other algorithms that have been discussed at MIRI, but have not made it to the forum yet.
I claim that all algorithms we have considered so far fail on this problem. The reason that they all fail, is that there is a single algorithm, G, which runs quickly, and does unboundedly better that the constant 1/2 algorithm.
G(n) looks at k, the greatest square less than or equal to n, and outputs 2⁄3 if k is of the form (3↑m3)2, and 1⁄2 otherwise.
When comparing G to the constant 1⁄2 algorithm, you only need to consider their scores on the places where they make different predictions. Note that when k is of the form (3↑m3)2, G will output 2/3 on a large number of predictions in a row, all of which will turn out to be the same bit. If that bit is shown to be a 1, there are enough predictions to dwarf any mistakes G has made by saying 2/3 in the past, causing G to appear to do much better than the constant 1⁄2 algorithm (until it is later shown that it output a 2⁄3 on a later bit which turns out to be a 0).
G does not consistently do better than the constant 1⁄2, but it does do arbitrarily better infinitely often, which stops the constant 1⁄2 from taking over the inductive algorithms.
Note that this is not an impossibility result. I am just pointing out a concrete way that the algorithms we have now are failing.
I suspect that the way we get past this problem will take advantage of the fact that the constant 1⁄2 is special in that no other algorithm can do consistently better than it. i.e., there is no algorithm M such that the difference in Bayes score between M and the constant 1⁄2 converges to infinity. (Note that it is important to say that it convenes to infinity, not just that it is unbounded. We already have an algorithm (which has not been posted yet) which can find hypotheses for which no other algorithm can cause the difference in bases score to be unbounded. The problem that such hypotheses often do not exist.)
G does not have this property, because an algorithm which does what G says, except it says 1⁄2 when the input is of the form (3↑m3)2 will do consistently better than G, since the algorithm makes the same prediction except for an infinite sequence of INDEPENDENT fair coins where G says 2⁄3.
This new failure, can also be expressed in the more complicated environment of assigning probabilities to logical sentences. We say that an algorithm passes “The Entangled Benford Test” if the limit as n goes to infinity of the probability assigned to “The first digit of 3↑⌊√n⌋3 is a 1,” at time 2^n is log10(2).
Is there any algorithm which solves overcomes this failure, and satisfies the entangled Benford test?