“Suppose your opponent has thrown paper N (or X%) times and won every time they did. Is that evidence for, or evidence against, the proposition that they will play paper in the next trial? (or does the direction of evidence vary with N or X?)”
All of this is irrelevant.
So I will admit I am frustrated here. I don’t think your analogy is even close to equivalent, and I think you are approaching this in the wrong way.
So let’s say you were an adviser to one of the players, telling him what to throw. Every time you told him to throw rock over the last million games, he lost, yet every time you told him to throw scissors he won. Meanwhile, you have thought very hard about this problem, and all of your theorizing keeps telling you that your player should play rock (and it has said so for quite a while now).
At what point is this evidence that you are reasoning incorrectly about the problem, and that you should really just tell the player to play scissors? Would you actually keep telling him to throw rock if you lost $1 every time the player you advised lost?
Now if this advising situation had been a game that you played with your strategy and I had separately played with my strategy, who would have won?
Suppose my strategy made an equal (or near-equal) number of suggestions for each option over the last 1M trials, while the opponent played paper every time. My current strategy suggests that rock is the best play on the next game. The opponent’s move is defined not to depend on my prior moves (because otherwise things get too complicated for a brief analysis).
There are two major competing posterior strategies at this point: “Scissors for the first 1M trials, then rock” and “Scissors forever.” My prior probability for “Scissors for the first N trials, then rock” cannot exceed my prior for “Scissors forever” for infinitely many N, so there is some number of trials after which any legal prior distribution favors “Scissors forever”, provided it has lost only a finite number of times.
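That dynamic is easy to see numerically. Here is a toy sketch of my own (the specific prior values are illustrative assumptions, not anything from the discussion): spread some prior mass over hypotheses of the form “opponent plays paper for the first k rounds, then switches,” plus a lump on “paper forever.” Each observed paper falsifies one more switch-at-k hypothesis, so the posterior on “paper forever” can only grow.

```python
# Toy illustration: a prior over "opponent plays paper for the first k
# rounds, then switches" (k = 0..N-1) plus "opponent plays paper forever".
# After observing paper every round, only hypotheses with k >= rounds seen
# survive, so posterior mass on "paper forever" grows monotonically.
N = 20                        # number of switch-at-k hypotheses
prior_switch = [0.9 / N] * N  # 90% of the mass spread over switch hypotheses
prior_forever = 0.1           # 10% on "paper forever"

def posterior_forever(rounds_of_paper):
    """P(paper forever | opponent played paper this many times in a row)."""
    # A switch-at-k hypothesis is consistent only while k >= rounds seen.
    surviving = sum(p for k, p in enumerate(prior_switch) if k >= rounds_of_paper)
    return prior_forever / (prior_forever + surviving)

for rounds in (0, 5, 10, 15, 20):
    print(rounds, round(posterior_forever(rounds), 3))
```

Once all N switch hypotheses are falsified, “paper forever” holds all the remaining mass, which is the “some number of trials after which” point above.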
At this point I’m going to try to save face by pointing out that for each N there is a legal set of prior probabilities under which the optimum strategy suggests each option an equal number of times. They would have to be arranged such that “My opponent will play paper X times, then something else” is more likely than “My opponent will play paper X times, then play paper again” for 2⁄3 of the X from 0 to N. Given that “My opponent will always play paper” is a subset of the latter, that each time I am wrong I must eliminate a region of probability space larger than it from consideration, and that I have been wrong 700k times, I obviously must have assigned less than ~1e-6 initial probability to “my opponent will play paper 1M+1 times in a row”, but more than that to each of ~700k supersets of the form “my opponent will play paper X times in a row, then change” for X less than 1M. While that is a legal set of priors, I think it would clearly be unreasonable in practice to fail to adapt to a strict-paper strategy within 10 trials.
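The arithmetic behind that ~1e-6 figure can be checked directly: if each of the ~700k wrong guesses eliminated a hypothesis carrying more prior mass than “always paper”, and total probability cannot exceed 1, the bound follows.

```python
# Sanity check on the bound above: 700k eliminated hypotheses, each
# assigned more prior probability than "my opponent will always play
# paper", must together sum to at most 1.  Hence:
wrong_guesses = 700_000
bound = 1 / wrong_guesses
print(f"P(always paper) < {bound:.2e}")  # a bit over 1e-6
```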
Strangely, many of the strategies that are effective against humans at best-of-seven seem to be ineffective against rational agents over the long term. Would it be interesting to have an RPS best-of competition between programs, with and without access to their opponent’s source code, or even just between LW readers willing to play high-stakes RPS?
Unweighted random wins 1⁄3 of the time, and nobody can do better than that against unweighted random. The rules would have to account for that.
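A quick exact check of that claim (my own sketch, not from the thread): against a uniformly random opponent, every move wins, ties, and loses with probability 1⁄3 each, so no strategy gains an edge.

```python
# Against a uniform random RPS player, any fixed move wins with
# probability exactly 1/3 -- so no strategy can beat unweighted random
# in expectation.  Exact rational arithmetic avoids float noise.
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def expected_win_rate(strategy_move):
    """P(win) for a fixed move against a uniform random opponent."""
    return sum(Fraction(1, 3) for opp in MOVES if BEATS[strategy_move] == opp)

for move in MOVES:
    assert expected_win_rate(move) == Fraction(1, 3)
print("every move wins exactly 1/3 of the time vs. uniform random")
```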
A long time ago I saw a website that would play RPS against you using a genetic algorithm; it had something like an 80% win rate against casual human players.
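The site’s actual algorithm isn’t available, but the flavor of such exploiters is easy to sketch. Here is the simplest member of that family, a hypothetical frequency-counting bot (a real GA-based player would evolve far richer pattern matchers, so this is only an illustration of the idea):

```python
# Minimal human-exploiting RPS bot: track the opponent's move
# frequencies and play the counter to their most common move.
from collections import Counter

COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

class FrequencyBot:
    def __init__(self):
        self.history = Counter()

    def play(self):
        if not self.history:
            return "rock"                      # arbitrary opening move
        likely = self.history.most_common(1)[0][0]
        return COUNTER[likely]                 # beat their favorite move

    def observe(self, opponent_move):
        self.history[opponent_move] += 1

bot = FrequencyBot()
for _ in range(5):                             # opponent always plays paper
    bot.play()
    bot.observe("paper")
print(bot.play())                              # → scissors, which beats paper
```

Humans are biased enough (favoring rock, repeating winners) that even counters this crude do well, which is consistent with the site’s reported win rate against casual players; a rational agent, of course, just plays uniform random and neutralizes it.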
Cool, sounds like we are converging.
I would be interested in seeing an RPS competition between programs; sounds interesting.