Suppose my strategy made an equal (or nearly equal) number of suggestions for each option over the last 1M trials, while the opponent played paper every time, and that my current strategy suggests rock as the best move for the next game. The opponent’s move is stipulated not to depend on my prior moves (because otherwise things get too complicated for a brief analysis).
There are two major competing posterior strategies at this point: “Scissors for the first 1M trials, then rock” and “Scissors forever.” My prior probability for “Scissors for the first N trials, then rock” cannot be higher than my prior probability for “Scissors forever” for infinitely many N, so there is some number of trials after which any legal prior probability distribution favors “Scissors forever,” if it loses only a finite number of times.
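To spell out why only finitely many N can win that comparison (a sketch of the counting argument, with H_N standing for the opponent-hypothesis under which “Scissors for the first N trials, then rock” is the best reply): the H_N are mutually exclusive, so their prior probabilities sum to at most 1, and therefore at most 1/ε of them can exceed whatever fixed prior ε the always-paper hypothesis receives:

$$\sum_{N=0}^{\infty} P(H_N) \le 1 \quad\Longrightarrow\quad \#\{\,N : P(H_N) > \varepsilon\,\} \le \frac{1}{\varepsilon}, \qquad \varepsilon = P(\text{opponent plays paper forever}) > 0.$$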
At this point I’m going to try to save face by pointing out that for each N, there is a legal set of prior probabilities under which the optimum strategy suggests each option an equal number of times. They would have to be arranged such that “My opponent will play paper X times and then something else” is more likely than “My opponent will play paper X times and then play paper again” for 2⁄3 of the X from 0 to N. Given that “My opponent will always play paper” is a subset of the latter, that each time I am wrong I must eliminate a region of probability space larger than it from consideration, and that I have been wrong 700k times, I must have assigned less than ~1e-6 initial probability to the set of hypotheses in which my opponent plays paper 1M+1 times in a row, but more than that to each of ~700k hypotheses of the form “my opponent will play paper X times in a row and then change” with X less than 1M. While that is a legal set of priors, I think it would clearly be unreasonable in practice to fail to adapt to a strategy of strict paper within 10 rounds.
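For concreteness, here is the arithmetic behind the ~1e-6 figure as a tiny Python sketch (the 700k count is just the round figure from above, and the hypothesis bookkeeping is illustrative rather than exact):

```python
# Each of the ~700k rounds where the strategy suggested something other than
# scissors and lost rules out one hypothesis of the form
# "opponent plays paper X times, then changes".  By assumption each of those
# hypotheses had a higher prior than "opponent always plays paper", and they
# are mutually exclusive, so their priors sum to at most 1.
wrong_rounds = 700_000

# wrong_rounds * P(always paper) < sum of eliminated priors <= 1
max_prior_always_paper = 1 / wrong_rounds
print(f"P(opponent always plays paper) < {max_prior_always_paper:.1e}")  # ~1.4e-06
```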
Strangely, many of the strategies that are effective against humans in a best-of-seven seem to be ineffective against rational agents over the long run. Would it be interesting to have an RPS best-of competition between programs, with and without access to their opponent’s source code, or even just between LW readers who are willing to play high-stakes RPS?
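In case anyone wants to try the program-vs-program version, here is a minimal sketch of what a match harness might look like (the bot interface and the two example bots are made up for illustration; an actual competition would need its own rules for scoring, source-code access, and so on):

```python
import random

BEATS = {"R": "S", "P": "R", "S": "P"}  # key beats value

def uniform_bot(my_hist, opp_hist):
    # Ignores history entirely; unexploitable but also can't exploit anyone.
    return random.choice("RPS")

def counter_last_bot(my_hist, opp_hist):
    # Assumes the opponent will repeat its last move and plays the counter.
    if not opp_hist:
        return random.choice("RPS")
    return {"S": "R", "R": "P", "P": "S"}[opp_hist[-1]]  # move that beats the last move

def play_match(bot_a, bot_b, rounds=1000):
    hist_a, hist_b, score = [], [], 0  # score > 0 means bot_a is ahead
    for _ in range(rounds):
        a = bot_a(hist_a, hist_b)
        b = bot_b(hist_b, hist_a)
        if BEATS[a] == b:
            score += 1
        elif BEATS[b] == a:
            score -= 1
        hist_a.append(a)
        hist_b.append(b)
    return score

print(play_match(counter_last_bot, uniform_bot))  # expected to hover around 0
```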
Unweighted random wins 1⁄3 of the time; nobody can do better than that versus unweighted random. The rules would have to account for that.
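A quick expected-value check of that claim (standard +1/0/−1 scoring per round, nothing specific to this thread): whatever move you choose, and however it depends on the history, a uniformly random opponent wins, ties, and loses against it with probability 1⁄3 each, so

$$E[\text{score per round}] = \tfrac{1}{3}(+1) + \tfrac{1}{3}(0) + \tfrac{1}{3}(-1) = 0,$$

and no strategy can win more than 1⁄3 of rounds against unweighted random in expectation.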
A long time ago I saw a website that would play RPS against you using a genetic algorithm; it had something like an 80% win rate against casual human players.
Cool, sounds like we are converging.
I would be interested in seeing an RPS competition between programs.