Yes, I’m not so sure either about the stockfish-pawns point.
In Michael Redmond’s AlphaGo vs AlphaGo series on YouTube, he often finds the winning AI carelessly loses points in the endgame. It might have a lead of 1.5 or 2.5 points, 20 moves before the game ends; but by the time the game ends, has played enough suboptimal moves to make itself win by 0.5 - the smallest possible margin.
It never causes itself to lose with these lazy moves; only reduces its margin of victory. Redmond theorizes, and I agree, that this is because the objective is to win, not maximize point differential, and at such a late stage of the game, its victory is certain regardless.
This is still a little strange—the suboptimal moves do not sacrifice points to reduce variance, so it’s not like it’s raising p(win). But it just doesn’t care either way; a win is a win.
There are Go AI that are trained with the objective of maximizing point difference. I am told they are quite vicious, in a way that AlphaGo isn’t. But the most famous Go AI in our timeline turned out to be the more chill variant.
Yes, I’m not so sure either about the stockfish-pawns point.
In Michael Redmond’s AlphaGo vs AlphaGo series on YouTube, he often finds the winning AI carelessly loses points in the endgame. It might have a lead of 1.5 or 2.5 points, 20 moves before the game ends; but by the time the game ends, has played enough suboptimal moves to make itself win by 0.5 - the smallest possible margin.
It never causes itself to lose with these lazy moves; only reduces its margin of victory. Redmond theorizes, and I agree, that this is because the objective is to win, not maximize point differential, and at such a late stage of the game, its victory is certain regardless.
This is still a little strange—the suboptimal moves do not sacrifice points to reduce variance, so it’s not like it’s raising p(win). But it just doesn’t care either way; a win is a win.
There are Go AI that are trained with the objective of maximizing point difference. I am told they are quite vicious, in a way that AlphaGo isn’t. But the most famous Go AI in our timeline turned out to be the more chill variant.