None of RL’s successes, even huge ones like AlphaGo (which beat the world champion at Go) and its successors, came easily. For one thing, training was very unstable and very sensitive to slight mistakes. The networks had to be designed with inductive biases specifically tuned to each problem.
And the end result was that nothing generalized: every problem required rethinking the approach from scratch, and an AI that mastered one task wouldn’t necessarily learn another one any faster.
I had the distinct impression that AlphaZero (the version of AlphaGo stripped of the game-specific tweaks and human training data) could be left alone for an afternoon with the rules of almost any game in the same class as go: chess, shogi, checkers, noughts-and-crosses, Connect Four, Othello, etc., and teach itself up to superhuman performance.
In the case of chess, that involved rediscovering something like 400 years of human chess theory and becoming the strongest player in history, stronger than all previous hand-crafted chess engines.
In the case of go, I am told that it not only rediscovered the whole 2,000-year history of go theory but added previously undiscovered strategies. “Like getting a textbook from the future” is a quote I have heard.
That strikes me as neither slow nor ungeneral.
And there was enough information in the AlphaZero paper for it to be replicated and improved on by the LeelaChessZero open-source project, so I don’t think there can have been that many special tweaks needed?
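That would fit with how little game-specific surface area the method has: roughly speaking, the algorithm only touches a game through its rules, so porting it to a new game in this class mostly means implementing a small rules interface. Here is a minimal sketch of what that interface might look like (the names are illustrative, not DeepMind’s actual API; in practice the board and move encodings fed to the network are also per-game):

```python
from abc import ABC, abstractmethod

class Game(ABC):
    """The only game-specific code an AlphaZero-style loop needs:
    the rules themselves. Search, network, and self-play stay generic."""

    @abstractmethod
    def initial_state(self):
        """Return the starting position."""

    @abstractmethod
    def legal_moves(self, state):
        """Return the list of legal moves in this position."""

    @abstractmethod
    def next_state(self, state, move):
        """Return the position after playing `move`."""

    @abstractmethod
    def outcome(self, state):
        """Return +1 / 0 / -1 for a first-player win / draw / loss,
        or None if the game is still in progress."""
```

Swapping chess for Connect Four would then mean writing a new `Game` subclass while the training loop stays fixed.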
Admittedly, AlphaZero’s success relied on its ability to generate essentially unlimited amounts of very high-quality data through self-play, so this is a domain where synthetic data was very successful.
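To make “generates its own data” concrete, here is a heavily simplified sketch of the self-play loop that produces training examples, assuming the hypothetical `Game` interface above and strictly alternating turns. In the real system, the move probabilities come from MCTS visit counts guided by the current network, at vastly larger scale:

```python
import random

def self_play_game(game, policy):
    """Play one game against itself and return (state, target_policy,
    target_value) training examples, i.e. the synthetic data."""
    state = game.initial_state()
    history = []
    while game.outcome(state) is None:
        moves = game.legal_moves(state)
        # Stand-in for AlphaZero's MCTS: any function mapping a
        # position to a probability for each legal move.
        move_probs = policy(state, moves)
        history.append((state, move_probs))
        move = random.choices(moves, weights=[move_probs[m] for m in moves])[0]
        state = game.next_state(state, move)
    z = game.outcome(state)  # +1 / 0 / -1 from the first player's view
    # Label every recorded position with the final result, flipping the
    # sign so each label is from the perspective of the player to move.
    return [(s, p, z if i % 2 == 0 else -z) for i, (s, p) in enumerate(history)]
```

Millions of such games yield a stream of fresh, perfectly labelled positions, which is exactly the “very large amounts of very high-quality data” in question.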
So a weaker version of the post’s claim is: you need either a lot of high-quality data or a lot of compute, and there’s little way around it.