The idea that the oracle flips a coin to determine if the test program of length >N halts is also not terribly “realistic”.
Instead, how about the following: the black box is either an oracle, or uses some sort of clever but not foolproof heuristic to guess whether or not a program halts, such that all programs for which the heuristic fails have length >N. One example of this is a Turing machine that runs a given program for BB(N) steps to see if it halts.
Then finding a test program that fails would actually be fiendishly difficult: you basically have to come up with a better heuristic than whatever the black box uses, so that you get a program which you know to halt, but the black box will be wrong about.
The idea that the oracle flips a coin to determine if the test program of length >N halts is also not terribly “realistic”.
Instead, how about the following: the black box is either an oracle, or uses some sort of clever but not foolproof heuristic to guess whether or not a program halts, such that all programs for which the heuristic fails have length >N. One example of this is a Turing machine that runs a given program for BB(N) steps to see if it halts.
Then finding a test program that fails would actually be fiendishly difficult: you basically have to come up with a better heuristic than whatever the black box uses, so that you get a program which you know to halt, but the black box will be wrong about.