Yeah, that matches my experience with chess engines. Thanks for the comment.
It’s probably worth mentioning that people have trained models that are more “human-like” than AlphaGo by tweaking various parts of the training process. One improvement they made on this front is that they changed the reward function so that, while almost all of the variance in reward still comes from whether you win or lose, how many points you win by also has a small influence on the reward you get.
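Something in the spirit of the following sketch (the tanh squashing, the scale, and the weight here are illustrative assumptions on my part, not the exact formula from the paper):

```python
import math

def shaped_reward(won: bool, score_margin: float,
                  margin_weight: float = 0.1, margin_scale: float = 20.0) -> float:
    """Reward dominated by win/loss, with a small, bounded bonus for the margin.

    won          -- True if the game was won
    score_margin -- final score difference in points (positive when winning)
    margin_weight, margin_scale -- illustrative constants, not from the paper
    """
    win_loss = 1.0 if won else -1.0  # the +/-1 outcome carries almost all of the signal
    # tanh keeps the margin bonus bounded, so it can never outweigh the win/loss term
    margin_bonus = margin_weight * math.tanh(score_margin / margin_scale)
    return win_loss + margin_bonus
```

The key property is that the margin term stays small relative to the win/loss term, so the margin of victory only acts as a tiebreaker between moves with similar win probability.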
This is obviously tricky, because it could push the model to take unreasonable risks to win by a larger margin, accepting a greater probability of losing the game outright; but the authors of the paper found that this particular utility function works well in practice. On top of this, they also took measures to widen the training distribution: forcing the model to play handicap games where one side gets a few extra moves, having it play against weaker or stronger versions of itself, et cetera.
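For concreteness, here is a minimal sketch of what that kind of self-play diversification could look like; the probabilities, handicap range, and checkpoint-selection scheme are all assumptions for illustration, not the paper’s actual setup:

```python
import random

def sample_selfplay_game_config(checkpoints: list) -> dict:
    """Pick a (possibly off-distribution) configuration for one self-play game.

    checkpoints -- IDs of saved model versions, oldest first.
    All constants below are illustrative rather than taken from any paper.
    """
    config = {
        "handicap_stones": 0,
        "black_model": checkpoints[-1],   # the latest model plays itself by default
        "white_model": checkpoints[-1],
    }

    # Occasionally give one side a head start (handicap game).
    if random.random() < 0.1:
        config["handicap_stones"] = random.randint(1, 5)

    # Occasionally pit the latest model against an older, weaker checkpoint,
    # so it also sees positions that arise against imperfect opponents.
    if len(checkpoints) > 1 and random.random() < 0.2:
        config["white_model"] = random.choice(checkpoints[:-1])

    return config
```

The point of both tweaks is just to expose the network to positions (lopsided boards, uneven strength) that would essentially never arise in pure latest-versus-latest self-play.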
All of these serve to mitigate the problems that would be caused by distributional shift, and in this case I think they were moderately successful. I can confirm from having used their model myself that it indeed makes much more “human-like” recommendations, and is very useful for humans wanting to analyze their games, unlike pure replications of AlphaGo Zero such as Leela Zero.