Two interesting questions arise:
1. Could AlphaZero beat the best human-computer team?
2. Would a human-AZ team systematically beat AZ alone?
I think the answer to the first question is positive, but unfortunately I couldn’t make much sense of the available raw data on freestyle chess, so my opinion is based on the Marginal Revolution blog post. A negative answer to the second question might make optimists about human-AI cooperation, like Kasparov, less optimistic.
I believe the answer to your second question is probably technically “yes”: if there is any dimension along which AZ mispredicts relative to a human, then some ensemble classifier that weights AZ’s move choices together with human move choices will perform better than AZ alone. And because Go has so many possible states, and so many moves at each state, humans would have to be much, much worse at play overall before we could conclude that they were worse along every dimension.
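The ensemble argument can be made concrete with a minimal sketch. Everything here is hypothetical: `az_probs` and `human_probs` stand in for each player’s preference distribution over the legal moves in a position, and `w_human` is a blending weight one would have to tune.

```python
# Minimal sketch of the ensemble idea: blend two move-probability
# distributions and play the argmax of the mixture.
def ensemble_move(az_probs, human_probs, w_human=0.1):
    """Return the move maximizing a weighted mix of the two policies."""
    moves = az_probs.keys() & human_probs.keys()
    return max(moves,
               key=lambda m: (1 - w_human) * az_probs[m]
                             + w_human * human_probs[m])

# With w_human = 0 this reduces to AZ alone, so a tuned w_human can
# only match or beat AZ alone if the human signal carries any
# complementary information.
print(ensemble_move({"a": 0.6, "b": 0.4},
                    {"a": 0.2, "b": 0.8},
                    w_human=0.5))  # prints "b"
```

The point of the `w_human = 0` remark is the technical-“yes”: the pure-AZ policy is a special case of the ensemble, so the best ensemble is at least as good, and strictly better whenever humans beat AZ along some dimension.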
However, I’d bet the answer is practically “no”. If AlphaZero versus the top humans is now an Elo difference of 1200, the standard Elo model gives a predicted human victory rate of about 1 in 1000. We’d never be able to play enough games to gather enough systematic data to identify a dimension along which humans chose better moves. And even if we did, it’s entirely possible that the best response would be “give AlphaZero more training time on those cases”, not “give AlphaZero a human partner in those cases”. And even if we did decide to give AlphaZero a human partner, how often would the partner’s input end up overriding the move AlphaZero alone would have chosen? Would the human even be able to keep paying attention, uselessly, game after game, just so they could be ready to contribute a winning move in game 100?
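The 1-in-1000 figure follows directly from the standard logistic Elo win-expectancy formula; a quick check:

```python
# Expected score for the weaker player under the standard Elo model:
# E = 1 / (1 + 10^(D/400)), where D is the rating gap against them.
def expected_score(elo_diff):
    """Expected score of a player rated elo_diff points below their opponent."""
    return 1.0 / (1.0 + 10.0 ** (elo_diff / 400.0))

# A 1200-point gap: 10^(1200/400) = 1000, so E = 1/1001.
print(round(1 / expected_score(1200)))  # prints 1001, i.e. roughly one win per thousand games
```

(Strictly, Elo expectancy counts draws as half a win, but at this gap the order of magnitude is the same either way.)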