I feel this is a bit of an artifact of the problem statement though—I feel a more “realistic” scenario is that we are given a block box which which is either an oracle, or is correct for programs <N and returns arbitrary answers (sometimes correct, sometimes not) for longer programs.
If false oracles are wrong 50% of the time for length > N, you can run the test program X times, and have 1/2^X odds of it being a false oracle if it matches each time. Obviously, if the test program fails, you now know N is less than the test program’s length.
So, if you run the test program 4 times, odds are 1⁄16 that the oracle is fake, and 15⁄16 that it is real. For any “realistic” scenario, there’s a probability of even a regular computer program failing due to hardware glitches, so you’ll eventually approach the same confidence you’d have in a standard desktop PC calculating 2+2 :)
(This solution does somewhat generalize to odds other than 50%: once you find ANY test program that fails, you can repeat that program numerous times to estimate the oracle’s actual failure rate.)
The idea that the oracle flips a coin to determine if the test program of length >N halts is also not terribly “realistic”.
Instead, how about the following: the black box is either an oracle, or uses some sort of clever but not foolproof heuristic to guess whether or not a program halts, such that all programs for which the heuristic fails have length >N. One example of this is a Turing machine that runs a given program for BB(N) steps to see if it halts.
Then finding a test program that fails would actually be fiendishly difficult: you basically have to come up with a better heuristic than whatever the black box uses, so that you get a program which you know to halt, but the black box will be wrong about.
I feel this is a bit of an artifact of the problem statement though—I feel a more “realistic” scenario is that we are given a block box which which is either an oracle, or is correct for programs <N and returns arbitrary answers (sometimes correct, sometimes not) for longer programs.
If false oracles are wrong 50% of the time for length > N, you can run the test program X times, and have 1/2^X odds of it being a false oracle if it matches each time. Obviously, if the test program fails, you now know N is less than the test program’s length.
So, if you run the test program 4 times, odds are 1⁄16 that the oracle is fake, and 15⁄16 that it is real. For any “realistic” scenario, there’s a probability of even a regular computer program failing due to hardware glitches, so you’ll eventually approach the same confidence you’d have in a standard desktop PC calculating 2+2 :)
(This solution does somewhat generalize to odds other than 50%: once you find ANY test program that fails, you can repeat that program numerous times to estimate the oracle’s actual failure rate.)
The idea that the oracle flips a coin to determine if the test program of length >N halts is also not terribly “realistic”.
Instead, how about the following: the black box is either an oracle, or uses some sort of clever but not foolproof heuristic to guess whether or not a program halts, such that all programs for which the heuristic fails have length >N. One example of this is a Turing machine that runs a given program for BB(N) steps to see if it halts.
Then finding a test program that fails would actually be fiendishly difficult: you basically have to come up with a better heuristic than whatever the black box uses, so that you get a program which you know to halt, but the black box will be wrong about.