Here is another way to think about this problem.
Imagine if instead of Omega you were on a futuristic game show. As you go onto the show, you enter a future-science brain scanner that scans your brain. After scanning, the game show hosts secretly put the money into the various boxes behind stage.
You now get up on stage and choose whether to one-box or two-box.
Keep in mind that before you got up on the show, 100 other contestants played the game that day. All of the two-boxers ended up with less money than the one-boxers. As an avid watcher of the show, you clearly remember that in every previous broadcast (one a day for ten years) the one-boxers did better than the two-boxers.
Can you honestly tell me that the superior move here is two-boxing? Where does the evidence point? If one strategy clearly and consistently produces inferior results compared to another strategy, that should be all we need to discard it as inferior.
I disagree. Just because Rock lost every time it was played doesn’t mean that it’s inferior to Paper or Scissors, to use a trivial example.
I disagree.
If rock always lost when people used it, that would be evidence against using rock.
Just as, if you flip a coin 1,000,000 times and keep getting heads, that is strong evidence that the coin won’t be coming up tails anytime soon.
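The coin intuition can be made concrete with a quick Bayesian update. This is a minimal sketch assuming a uniform Beta(1, 1) prior over the coin’s heads probability; that prior choice is my assumption, not anything stated in the thread:

```python
# Uniform Beta(1, 1) prior over the coin's heads probability (assumed).
# After observing h heads and t tails, the posterior is Beta(1 + h, 1 + t),
# whose mean is (1 + h) / (2 + h + t).
def posterior_mean_heads(h, t):
    return (1 + h) / (2 + h + t)

print(posterior_mean_heads(0, 0))          # 0.5 before any evidence
print(posterior_mean_heads(1_000_000, 0))  # essentially 1 after 1M straight heads
```

After a million straight heads, the expected heads probability is within a millionth of 1, which is the sense in which the evidence says tails isn’t coming anytime soon.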
Playing your double: Evidence that your opponent will not use rock is evidence that you should not use paper. If you don’t use rock, and don’t use paper, then you must use scissors and tie with your opponent who followed the same reasoning.
Updating on evidence that rock doesn’t win when it is used means rock wins.
EDIT: consider what you would believe if you tried to call a coin a large number of times and were always right. Then consider what you would believe if you were always wrong.
“Rock lost every time it was played ”
“rock doesn’t win when it is used means rock wins.”
One of these things is not like the other.
Those aren’t both things that I said.
For rock to lose consistently means that somebody isn’t updating properly, or is using a failing strategy, or is sacrificing rock as part of a winning strategy.
For example, if I tell my opponent “I’m going to play only paper”, and I do, rock will always lose when played. That strategy can still win over several moves, if I am not transparent; all I have to do is correctly predict that my opponent will predict that the current round is the one in which I change my strategy.
If they believe (through expressed preferences, assuming that they independently try to win each round) that rock will lose against me, rock will win against them.
Don’t edit your post and then say you didn’t say what you said. I literally just copy-pasted what you wrote and added quotes around it.
“I literally just … edit your post … and then say … you said … what … you didn’t say.”
I can play the selective quotation game too. It doesn’t make it valid.
What I originally wrote was “Just because Rock lost every time it was played doesn’t mean that it’s inferior to Paper or Scissors”
What you misquoted was the statement “Updating on evidence that rock doesn’t win when it is used means rock wins.” (emphasis on added context)
That’s standard behavior in the simple simultaneous strategy games; figure out what your opponent’s move is and play the maneuver which counters it. If you are transparent enough that I can correctly determine that you will play the maneuver that would have won the most prior rounds, I can beat you in the long run. The correct update to seeing RPS is to update the model of your opponent’s strategy, and base the winning percentages off what you believe your opponent’s strategy is.
That’s why I can win with “I always throw rock”, stated plainly at the start. Most people (if they did the reasoning) would have a very low prior that I was telling the truth, and the first round ties. The next round I typically win, with a comment of “I see what you did there”.
What are your priors that my actual strategy, given that I had said I would always throw rock and threw rock the first time, would fall into either category: “Throw rock for N rounds and then change” or “Throw rock until it loses N times (in a row) and then change”? (Keep in mind conservation of probability: the probabilities across all N and all possible strategies must sum to 1.)
If you don’t ascribe a significant chance of me telling the truth, there is some N at which you stop throwing paper, even while it is working. The fact that throwing scissors would have lost you every prior match is not strong evidence that it will lose the next one.
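The “conservation of probability” constraint above can be pictured with a toy prior. This is a sketch only; the specific numbers (the honesty probability and the geometric decay) are my assumptions:

```python
# Hypothetical prior over my announced "always rock" strategy and its rivals.
p_truth = 0.05        # assumed chance I am honestly playing rock forever
decay = 0.5           # assumed geometric decay over switch points N

remaining = 1.0 - p_truth
# P("rock for N rounds, then change") = remaining * (1 - decay) * decay**(N - 1)
def prior_N(n):
    return remaining * (1 - decay) * decay ** (n - 1)

# "Always rock" plus the whole switching family must sum to 1:
# mass given to one hypothesis is necessarily taken from the others.
total = p_truth + sum(prior_N(n) for n in range(1, 200))
print(round(total, 9))  # close to 1 (the tail beyond N=199 is negligible)
```

Any shape of prior works, but it must normalize: you cannot assign a significant probability to every switch point N at once.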
“I can play the selective quotation game too. It doesn’t make it valid.”
Except I didn’t break things up with ellipses to make things up like you just did. Nice false equivocation.
Either rock always wins or it doesn’t. I was pointing out the lack of consistency in what you said.
If you are proposing that rock does actually win, then that is completely different from what I set up in my scenario. A more accurate representation would be if paper were ALWAYS thrown by your opponents.
Then you come along and say “no, rock will actually win, guys! Look at my theory that says so” before you get up and predictably lose. Just like everyone before you.
Your quotations were of sentence fragments that did not preserve meaning. There was exaggeration for emphasis but no false equivalence.
I don’t think this is even close to accurate.
His post was a blatant misrepresentation, a joke of an example.
My post took the exact words posted in order, showing a direct contradiction in his scenario. He then edited the quote that I had and removed it.
Beforehand it said that Rock always lost. After his edit that line was entirely removed, and then he said that I misquoted him. Sure, of course it looks like much more of a misquote after an edit. But I think that is highly deceptive, so I said so.
Beforehand he said that Rock always lost, and then said that Rock didn’t actually lose. If his second statement was correct, then his first statement would be trivially false.
Let’s dig further.
Original line: “Just because Rock lost every time it was played doesn’t mean that it’s inferior to Paper or Scissors”
My quote: “Rock lost every time it was played”
Showing that he was talking about a scenario where Rock lost every time it was played. I highlighted the relevant part. The part about determining inferiority is irrelevant to the scenario.
Second Original Quote: “Updating on evidence that rock doesn’t win when it is used means rock wins.”
Second My Quote: “rock doesn’t win when it is used means rock wins.”
He is outlining a situation in which he thinks that Rock does win, even though the scenario contradicts that.
Comparing: “I literally just … edit your post … and then say … you said … what … you didn’t say.”
And saying it is equivalent is ludicrous.
Suppose your opponent has thrown paper N (or X%) times and won every time they did. Is that evidence for, or evidence against, the proposition that they will play paper in the next trial? (or does the direction of evidence vary with N or X?)
“Suppose your opponent has thrown paper N (or X%) times and won every time they did. Is that evidence for, or evidence against, the proposition that they will play paper in the next trial? (or does the direction of evidence vary with N or X?)”
All of this is irrelevant.
So I will admit I am frustrated here. I don’t think that your analogy is even close to equivalent.
I think you are thinking about this in the wrong way.
So let’s say you were an adviser advising one of the players on what to choose. Every time you told him to throw Rock over the last million games, he lost. Yet every time you told him to throw Scissors he won. Now, you have thought very hard about this problem, and all of your theorizing keeps telling you that your player should play Rock (the theorycrafting has told you this for quite a while now).
At what point is this evidence that you are reasoning incorrectly about the problem, and really you should just tell the player to play scissors? Would you actually continue to tell him to throw Rock if you were losing $1 every time the player you advised lost?
Now if this advising situation had been a game that you played with your strategy and I had separately played with my strategy, who would have won?
Suppose my strategy made an equal (enough) number of suggestions for each option over the last 1M trials, while the opponent played paper every time. My current strategy suggests that playing rock on the next game is the best move. The opponent’s move is defined not to be dependent on my prior moves (because otherwise things get too complicated for brief analysis).
There are two major competing posterior strategies at this point: “Scissors for the first 1M trials, then rock” and “Scissors for the first 1M trials”. It is not possible for my prior probability for “Scissors for the first N, then rock” to be higher than my probability for “Scissors forever” for an infinite number of N, so there is some number of trials after which any legal prior probability distribution favors “Scissors forever”, if it loses only a finite number of times.
At this point I’m going to try to save face by pointing out that for each N, there is a legal set of prior probabilities for the optimum strategy to suggest each option an equal number of times. They would have to be arranged such that “My opponent will play paper X times then something else” is more likely than “My opponent will play paper X times then play paper again” for 2⁄3 of X from 0 to N. Given that “My opponent will always play paper” is a superset of the latter, and each time I am wrong I must eliminate a probability space larger than it from consideration, and that I have been wrong 700k times, I obviously must have assigned less than ~1e-6 initial probability to all estimates that my opponent will play paper 1M+1 times in a row, but higher than that to ~700k cases of supersets of “my opponent will play paper X times in a row then change” where X is less than 1M. While that is a legal set of priors, I think it would be clearly unreasonable in practice to fail to adapt to a strategy of strict paper within 10.
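The way repeated paper crushes the switching hypotheses can be sketched numerically. The two hypothesis families and the specific prior numbers below are my illustrative assumptions:

```python
# Illustrative Bayes update over two hypothesis families about the opponent:
#   H_forever:   "plays paper every round"
#   H_switch(X): "plays paper for X rounds, then changes"
p_forever = 0.01   # assumed prior mass on "paper forever"
decay = 0.99       # assumed geometric decay of the switch-point prior

def posterior_forever(observed_paper_rounds):
    # Every H_switch(X) with X < observed rounds is refuted (likelihood 0);
    # the surviving switch hypotheses keep (1 - p_forever) * decay**rounds
    # of the original prior mass.
    surviving_switch_mass = (1 - p_forever) * decay ** observed_paper_rounds
    return p_forever / (p_forever + surviving_switch_mass)

# Long before a million rounds of paper, "paper forever" dominates.
for rounds in (0, 100, 1000, 2000):
    print(rounds, round(posterior_forever(rounds), 4))
```

Even starting from a 1% prior on “paper forever”, a few thousand rounds of uninterrupted paper push its posterior above 99%, which is the point of the argument: any legal prior eventually concedes to the strategy that keeps winning.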
Strangely, many of the strategies which are effective against humans for best-of-seven seem to be ineffective against rational agents for long-term performance. Would it be interesting to have an RPS best-of competition between programs with and without access to their opponent’s source code, or even just between LW readers who are willing to play high-stakes RPS?
Cool, sounds like we are converging.
I would be interested in seeing an RPS competition between programs.
Unweighted random wins 1⁄3 of the time; nobody can do better than that versus unweighted random. The rules would have to account for that.
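The 1⁄3 figure is easy to verify by simulation. A sketch, with function names and parameters of my own choosing:

```python
import random

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def win_rate(strategy, rounds=300_000, seed=0):
    """Win rate of `strategy` (a function of the round index) vs uniform random."""
    rng = random.Random(seed)
    wins = sum(BEATS[strategy(i)] == rng.choice(MOVES) for i in range(rounds))
    return wins / rounds

# Any deterministic strategy -- always rock, cycling, whatever -- wins
# about 1/3 of the time against unweighted random.
print(win_rate(lambda i: "rock"))
print(win_rate(lambda i: MOVES[i % 3]))
```

Since unweighted random ignores its opponent entirely, every strategy wins, loses, and ties with probability 1⁄3 against it, which is why a competition’s scoring rules would need to do something about pure-random entries.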
A long time ago I saw a website that would play RPS against you using a genetic algorithm; it had something like an 80% win rate against casual human players.