Their main reported result is a bit weird: allegedly players aren’t more likely to make suboptimal moves in the online tournament, but when they do, their suboptimal moves are somewhat worse.
Looks like a ceiling effect: a large fraction of turns just have easy or obvious moves for a player, which everyone is more than capable of solving near-perfectly (they hit the ceiling), so bad conditions don’t much affect blunders (because the conditions aren’t bad enough to pull an appreciable number of moves/players down below the ceiling into making huge blunders), but the bad conditions do still affect the hard moves, and increase the errors in those.
(Imagine drawing a difficulty curve with a vertical line at the minimum skill necessary to compete in these events. Everything to the left of the line is an ‘easy’ move which all players solve, while everything to the right is a ‘hard’ move where players are increasingly likely to make increasingly expensive mistakes. Bad conditions move the curve diagonally up-right: the vertical line stays where it is, since the players don’t change, and the number of moves which flip from ‘easy’ to ‘hard’ changes by a relatively small percentage, as only a few moves cross the line, but all the moves to the right of it become harder and the mistakes increasingly expensive.)
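To make that concrete, here is a toy simulation of the ceiling-effect story; every distribution and number below is made up for illustration (none come from the paper), with the skill threshold set so that roughly half of moves are ‘easy’, matching the ~50% best-move rate mentioned below:

```python
# Toy simulation of the ceiling-effect argument. Every distribution and
# parameter here is invented for illustration; nothing is taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
n_moves = 100_000

# Move difficulty: heavy right tail, so many moves are easy and a few are very hard.
difficulty = rng.lognormal(mean=0.0, sigma=1.0, size=n_moves)

# The 'vertical line': moves below the skill threshold are played perfectly;
# errors on harder moves grow with how far the move exceeds the threshold.
skill = np.quantile(difficulty, 0.5)

def play(skill_threshold, error_scale):
    excess = np.maximum(difficulty - skill_threshold, 0.0)
    return error_scale * excess

# Bad conditions: nudge the threshold down a little and scale up mistakes on hard moves.
for label, threshold, scale in [("good conditions (offline)", skill, 1.0),
                                ("bad conditions (online)", skill - 0.05, 1.3)]:
    err = play(threshold, scale)
    made_error = err > 0
    print(f"{label:26s} error rate = {made_error.mean():.3f}, "
          f"mean error size = {err[made_error].mean():.2f}")
```

In this toy setup the error rate budges by only a couple of percentage points while the average size of an error grows by roughly a third, which is the same qualitative pattern as the paper’s result.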
“Assessing Human Error Against a Benchmark of Perfection”, Anderson et al 2016, indicates that human GMs match the chess engine’s predicted best move about half the time. This suggests that a lot of moves are ‘solved’, in the sense that either the move is very obvious (the opening book, forced moves) or the baseline of competency at GM level easily handles them, leaving only the other half of hard moves as the critical moves which contribute to victory or defeat. Table A.1 seems to imply that ~55% of moves are classified as errors (15k/27k), which seems similar.
The paper does attempt to adjust for this with a complexity metric, although I suspect this doesn’t work perfectly, as it seems to be a linear adjustment using the number of nodes the engine needed to calculate the optimal move.
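As a rough illustration of that worry (on entirely synthetic data, with a guessed functional form; this is not the paper’s actual specification), a control that is linear in raw node counts can under-adjust when node counts span orders of magnitude and difficulty scales more like log(nodes):

```python
# Synthetic illustration: if difficulty grows with log(nodes), a regression
# control that is linear in raw node counts explains far less of the variance.
# All data and parameters here are invented; this is not the paper's model.
import numpy as np

rng = np.random.default_rng(1)
nodes = rng.lognormal(mean=12.0, sigma=2.0, size=5_000)    # node counts spanning orders of magnitude
error = 0.1 * np.log(nodes) + rng.normal(0.0, 0.3, 5_000)  # assume error grows with log(nodes)

def r_squared(x, y):
    """R^2 of an ordinary least-squares fit of y on [1, x]."""
    X = np.column_stack([np.ones_like(x), x])
    _, residuals, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1.0 - residuals[0] / ((y - y.mean()) ** 2).sum()

print("R^2, linear in nodes:     ", round(r_squared(nodes, error), 3))
print("R^2, linear in log(nodes):", round(r_squared(np.log(nodes), error), 3))
```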
I have a concern that the paper is comparing tournament play (offline) to match play with 4 games per match (online). In match play, especially with so few games, a player who is behind needs to force the game, while the player in the lead can play more conservatively. Tournaments have their own incentives, but overall I would expect short match play to cause bigger deviations from engine-optimal play, as losing players try to force a win in naturally drawing positions.
The calculated effect size is >200 Elo points, which suggests to me that something is amiss.
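For context on how large that is, under the standard Elo model the expected score implied by a rating gap is:

```python
# Expected score under the standard Elo model: a player rated `diff` points
# higher than their opponent is expected to score 1 / (1 + 10**(-diff/400)).
def expected_score(diff: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

print(expected_score(200))  # ~0.76, i.e. scoring roughly 3 points out of every 4 games
```

A drop of that size would mean the same players suddenly scoring against their usual peers like opponents they would normally beat about three games out of four.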
Here’s a link to the paper whose abstract is quoted there.