Interesting that AlphaGo plays strongly atypical or totally won positions ‘poorly’ and therefore isn’t a reliable advice-giver for human players. Chess engines have similar limitations, though of a different character. First, they have no sense of move-selection difficulty: strong human players learn to avoid positions where finding a good move is harder than normal. The second point is related: in winning positions (say, over +3.50 or under −3.50), the human’s move-selection goal shifts towards maximizing winning chances by eliminating counterplay. E.g., in a queen ending two pawns up, it’s usually better to exchange queens than to win a third pawn with queens remaining. Not according to engines, though: if a move drops the eval from +7.50 to +5.00, they’ll call it a blunder.
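To make that last point concrete, here is a toy illustration. The logistic mapping and its scale constant below are my own crude assumption (not anything a real engine uses), but they show why a large eval drop in an already-won position changes the practical winning chances very little:

```python
import math

def approx_win_prob(eval_pawns: float, scale: float = 1.5) -> float:
    """Very rough logistic mapping from an engine eval (in pawns) to an
    estimated winning chance; the scale constant is an arbitrary choice."""
    return 1.0 / (1.0 + math.exp(-eval_pawns / scale))

# An engine flags a drop from +7.50 to +5.00 as a blunder, but in terms of
# this (crudely estimated) winning chance almost nothing has changed:
print(round(approx_win_prob(7.5), 3))  # ~0.993
print(round(approx_win_prob(5.0), 3))  # ~0.966
```

The simplifying move a human prefers may well "lose" more eval than that while making the win trivial to convert.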
I imagine these kinds of human-divergent evaluation oddities materialize for any complex task, and more complex = more divergent.
Yeah, that matches my experience with chess engines. Thanks for the comment.
It’s probably worth mentioning that people have trained models that are more “human-like” than AlphaGo by modifying various parts of the training process. One improvement on this front was changing the reward function so that, while almost all of the variance in reward comes from whether you win or lose, the number of points you win by still has a small influence on the reward you get, like so:
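(I won’t reproduce the paper’s exact utility function here, but the shape of the idea is roughly the following sketch; the constants and the tanh squashing are my own illustrative choices, not the paper’s.)

```python
import math

def shaped_reward(won: bool, score_margin: float,
                  margin_weight: float = 0.1, scale: float = 20.0) -> float:
    """Hypothetical shaped reward: the game result dominates, the score
    margin contributes only a small, bounded bonus."""
    # +1 / -1 for winning or losing carries almost all of the signal.
    win_loss = 1.0 if won else -1.0
    # A saturating bonus for the margin, so chasing a huge margin can
    # never outweigh actually winning the game.
    margin_bonus = margin_weight * math.tanh(score_margin / scale)
    return win_loss + margin_bonus
```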
This is obviously tricky, because it could push the model to take unreasonable risks to win by more points at the cost of a greater probability of losing the game outright, but the authors of the paper found that this particular utility function works well in practice. On top of this, they also took measures to widen the training distribution: forcing the model to play handicap games where one side gets a few extra moves, making it play against weaker or stronger versions of itself, et cetera.
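As a hand-wavy sketch of what widening the self-play distribution can look like in practice (everything here, including the probabilities and ranges, is made up for illustration rather than taken from the paper):

```python
import random

def sample_selfplay_config():
    """Hypothetical sampler for self-play game setups that occasionally
    produces lopsided or mismatched games instead of pure mirror matches."""
    config = {"handicap_stones": 0, "opponent": "current"}
    r = random.random()
    if r < 0.1:
        # Occasionally give one side a few free opening moves (a handicap
        # game), so the model also trains on very unbalanced positions.
        config["handicap_stones"] = random.randint(2, 5)
    elif r < 0.2:
        # Occasionally play against an older (weaker) or newer candidate
        # checkpoint instead of the current network.
        config["opponent"] = random.choice(["older_checkpoint",
                                            "candidate_checkpoint"])
    return config
```

The point of setups like this is that the model keeps seeing positions it would never reach against a copy of itself, which is exactly the regime where AlphaGo-style models give unreliable advice.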
All of these serve to mitigate the problems that would be caused by distributional shift, and in this case I think they were moderately successful. I can confirm from having used their model myself that it indeed makes much more “human-like” recommendations, and is very useful for humans wanting to analyze their games, unlike pure replications of AlphaGo Zero such as Leela Zero.