DeepNash was evaluated against top human players over the course of two weeks in the beginning of April 2022, resulting in 50 ranked matches. Of these matches, 42 (i.e. 84%) were won by DeepNash.
Given the game has imperfect information, it’s not clear you should expect to be able to win much more than that. (I haven’t played much Stratego but I would have guessed that a reasonably strong player going for high-variance strategies could beat God 10-20% of the time.)
Hmm, so is this one of those games where a novice can beat an expert a significant fraction of the time, because of the imperfect information? Is there a theoretical upper limit for percent wins for the perfect player vs best human player?
I am a Stratego player, and the answer is no, not really. In fact, DeepNash won 30/30 (100%) against Probe, which won the Computer Stratego World Championship three times in the past.
But I think Paul is not wrong. While Stratego is mostly skill not luck (it’s not like you are drawing cards and you need good cards, there is zero randomness, just hidden information), there is a bit of rock-paper-scissors involved. Novices can’t beat experts, but I do think experts can beat God.
My main point was that you quoted 42% when the win rate was 84%.
Even if there’s no cap on winrate, I don’t think you should necessarily expect to “self-improve to beat the best human players every time.” Even in a game of perfect information I think there are 2+ orders of magnitude of scale (or equivalent algorithmic progress) where you will beat human players 60-99% of the time.
So I think it could make sense to be surprised “Isn’t Stratego easy enough that AI should be crushing humans?” but it would not make sense to say “Given that AI is able to beat humans at Stratego, why is it not able to crush them every time?”
(Note that humans could potentially do better if they knew they were playing against a much stronger opponent and trying to play for a lucky win.)
It doesn’t have to be that a novice has a chance against an expert in order for there to be declining returns to further expertise. As an example, rock-scissors-paper-nothing (rock beats scissors and nothing, scissors beats paper and nothing, paper beats rock and nothing) has the “expert” strategy of “randomize, but never choose ‘nothing’”, which beats the incredible novice who chooses “nothing” most of the time. Further, there is expertise in noticing patterns among your opponents, while obscuring the patterns that such prediction brings to your plays. But very good AI can probably do better than 50% against human experts, without getting anywhere near 100%.
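To make the rock-scissors-paper-nothing example concrete, here is a minimal sketch that computes exact win/draw/loss odds between two mixed strategies. The specific numbers are my assumptions, not from the comment: the "expert" randomizes uniformly over rock, paper and scissors, and the "novice" picks "nothing" 70% of the time. The function name `outcome_probs` is just illustrative.

```python
from fractions import Fraction

# Which moves each move beats, as described in the comment above.
BEATS = {
    "rock": {"scissors", "nothing"},
    "scissors": {"paper", "nothing"},
    "paper": {"rock", "nothing"},
    "nothing": set(),
}


def outcome_probs(p, q):
    """Return (win, draw, loss) probabilities for mixed strategy p against q.

    p and q map move names to probabilities that sum to 1.
    """
    win = draw = loss = Fraction(0)
    for a, pa in p.items():
        for b, qb in q.items():
            if b in BEATS[a]:
                win += pa * qb
            elif a in BEATS[b]:
                loss += pa * qb
            else:
                draw += pa * qb
    return win, draw, loss


# Assumed "expert": uniform over the three real moves, never "nothing".
expert = {m: Fraction(1, 3) for m in ("rock", "paper", "scissors")}

# Assumed "novice": "nothing" 70% of the time, the rest split evenly.
novice = {
    "nothing": Fraction(7, 10),
    "rock": Fraction(1, 10),
    "paper": Fraction(1, 10),
    "scissors": Fraction(1, 10),
}

print("expert vs novice:", outcome_probs(expert, novice))
print("expert vs expert:", outcome_probs(expert, expert))
```

Under these assumed numbers the expert wins 4/5 of rounds against the novice, but only 1/3 (with 1/3 draws) against another expert. Once nobody plays "nothing", pure randomization gives no further edge, and any extra wins have to come from reading the opponent's patterns.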
84% for Stratego is higher than I’d have predicted.
From where did you take that quote? I find in the paper:
DeepNash was evaluated against top human players over the course of two weeks in the beginning of April 2022, resulting in 50 ranked matches. Of these matches, 42 (i.e. 84%) were won by DeepNash. In the Classic Stratego challenge ranking 2022 this corresponds to a rating of 1799, which resulted in a 3rd place for DeepNash of all ranked Gravon Stratego players.
I expect that you mistakenly added the % after 42.
I am confused why it stopped at the human level:
instead of self-improving to beat even the best human player every time.
The quote is:
I quoted an article mentioning it without checking the source, oops:
https://www.marktechpost.com/2022/07/09/deepmind-ai-researchers-introduce-deepnash-an-autonomous-agent-trained-with-model-free-multiagent-reinforcement-learning-that-learns-to-play-the-game-of-stratego-at-expert-level/