Suppose your Go bot has memorized a huge list of “if you are in such-and-such a situation, move here”-type rules.
One interesting feature of AlphaGo was that it generally did not play what a Go professional would consider optimal play in the endgame. A Go professional doesn’t play moves that obviously lose points late in the game. AlphaGo, on the other hand, played many moves that lost points, likely because it judged that they didn’t change the likelihood of winning, given that it was ahead by enough points.

A good Go player has a collection of endgame patterns memorized that are optimal for maximizing points. When choosing between two moves that are both judged to win the game with probability 0.9999999, AlphaGo’s failure to choose the move that maximizes points suggests that it does not use patterns about optimal moves in local situations to make its judgments.

AlphaGo follows patterns about the optimal move in similar situations much less than human Go players do. It plays the game more globally instead of focusing on local positions.
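To make the point concrete, here is a hypothetical toy sketch (not AlphaGo’s actual architecture): a bot that selects moves purely by estimated win probability has no way to distinguish two moves tied at 0.9999999, even when one gives up points. The move names and numbers are made up.

```python
# Hypothetical sketch: move selection by win probability alone.
from dataclasses import dataclass


@dataclass
class Move:
    name: str
    win_prob: float   # estimated probability of winning after this move
    point_delta: int  # local points gained (+) or given up (-); never consulted

def pick_move(moves):
    # The only criterion is win probability; point_delta plays no role.
    return max(moves, key=lambda m: m.win_prob)

moves = [
    Move("slack move",           win_prob=0.9999999, point_delta=-3),
    Move("optimal endgame move", win_prob=0.9999999, point_delta=+2),
]

# Both moves tie on the only criterion used, so whichever the search
# happens to rank first gets played -- here, the point-losing one.
print(pick_move(moves).name)  # -> slack move
```

A human endgame player, who also compares points, would never be indifferent between these two moves.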
When choosing between two moves that are both judged to win the game with probability 0.9999999, AlphaGo’s failure to choose the move that maximizes points suggests that it does not use patterns about optimal moves in local situations to make its judgments.
I nitpick/object to your use of “optimal moves” here. The move that maximizes points is NOT the optimal move; the optimal move is the move that maximizes win probability. In a situation where you are many points ahead, plausibly the way to maximize win probability is not to try to get more points, but rather to anticipate and defend against weird, high-variance strategies your opponent might try. This behaviour is consistent with local-position-based play that also considers “points ahead” as part of the situation.
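The distinction can be sketched with made-up numbers: when you hold a large lead, expected points and win probability can favour different moves, so “optimal” depends on which one you maximize.

```python
# Toy illustration (invented numbers): expected margin and win probability
# can rank the same two moves differently when you are already ahead.

def win_prob(outcomes):
    # outcomes: list of (probability, final_margin_in_points) pairs
    return sum(p for p, margin in outcomes if margin > 0)

def expected_margin(outcomes):
    return sum(p * margin for p, margin in outcomes)

# Greedy move: grabs extra points but opens a small chance of catastrophe.
greedy = [(0.95, +20), (0.05, -5)]
# Safe move: gives up a few points but locks in the win.
safe = [(1.00, +12)]

assert expected_margin(greedy) > expected_margin(safe)  # 18.75 > 12
assert win_prob(safe) > win_prob(greedy)                # 1.00 > 0.95
```

The point maximizer prefers the greedy move; the win-probability maximizer prefers the safe one.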
I think human Go players consider points ahead as part of the situation and still don’t play a move that provides no benefit but costs points in the endgame.
We are not talking about situations where there is any benefit to be gained from the behavior; the moves in question occurred in situations that can be fully read out.
There are situations in Go where you don’t start a fight you expect to win with 95% probability, because you are already ahead on the board and the 5% might make you lose. But that is very far from the AlphaGo moves I was talking about.
When it’s ahead, AlphaGo plays moves that are bad according to any pattern of what’s good in Go.
I feel like it’s pretty relevant that AlphaGo is the worst superhuman Go bot, and I don’t think better bots have this behaviour.
Last I heard, Leela Zero still tended to play slack moves in highly unbalanced late-game situations.