[LINK] Deep Learning Machine Teaches Itself Chess in 72 Hours
Matthew Lai has created an artificial intelligence machine called Giraffe that has taught itself to play chess by evaluating positions much more like humans do, and in an entirely different way from conventional chess engines.
Straight out of the box, the new machine plays at the same level as the best conventional chess engines, many of which have been fine-tuned over many years. On a human level, it is equivalent to FIDE International Master status, placing it within the top 2.2 percent of tournament chess players.
The technology behind Lai’s new machine is a neural network. [...] His network consists of four layers that together examine each position on the board in three different ways.
The first looks at the global state of the game, such as the number and type of pieces on each side, which side is to move, castling rights and so on. The second looks at piece-centric features such as the location of each piece on each side, while the third maps the squares that each piece attacks and defends.
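As a rough illustration (this is a sketch, not the paper's actual architecture; the feature dimensions and layer sizes below are placeholders), a network with that kind of three-way feature split might look like this in PyTorch:

```python
import torch
import torch.nn as nn

# A minimal sketch of the three-way feature split described above.
# Layer sizes here are placeholders, not the paper's actual numbers.
class EvalNet(nn.Module):
    def __init__(self, n_global=20, n_piece=200, n_square=128):
        super().__init__()
        # Each feature group gets its own first hidden layer...
        self.global_fc = nn.Linear(n_global, 24)
        self.piece_fc = nn.Linear(n_piece, 256)
        self.square_fc = nn.Linear(n_square, 128)
        # ...and the groups are merged in the later layers.
        self.hidden = nn.Linear(24 + 256 + 128, 64)
        self.out = nn.Linear(64, 1)

    def forward(self, g, p, s):
        h = torch.cat([torch.relu(self.global_fc(g)),
                       torch.relu(self.piece_fc(p)),
                       torch.relu(self.square_fc(s))], dim=-1)
        # A single scalar score for the position, in (-1, 1).
        return torch.tanh(self.out(torch.relu(self.hidden(h))))
```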
[...]
Lai generated his dataset by randomly choosing five million positions from a database of computer chess games. He then created greater variety by applying a random legal move to each position before using it for training. In total he generated 175 million positions in this way.
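For illustration, that perturbation step is easy to express with the python-chess library (the database lookup is replaced here by a FEN string argument):

```python
import random
import chess  # python-chess

def perturb(fen: str) -> str:
    """Apply one random legal move to a position, to add variety."""
    board = chess.Board(fen)
    moves = list(board.legal_moves)
    if moves:
        board.push(random.choice(moves))
    return board.fen()

# e.g. perturb(chess.STARTING_FEN)
```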
[...]
One disadvantage of Giraffe is that neural networks are much slower than other types of data processing. Lai says Giraffe takes about 10 times longer than a conventional chess engine to search the same number of positions.
But even with this disadvantage, it is competitive. “Giraffe is able to play at the level of an FIDE International Master on a modern mainstream PC,” says Lai. By comparison, the top engines play at super-Grandmaster level.
[...]
Ref: arxiv.org/abs/1509.01549: Giraffe: Using Deep Reinforcement Learning to Play Chess
That’s a pretty hard contradiction right there. The latter quote is probably the correct one. Modern chess engines beat any human player these days, even running on relatively modest hardware. That’s assuming full-length games. At blitz games computers are much better still compared to humans, because humans are far more error-prone.
So if this neural net is playing at master level it’s still much, much weaker than the best computers. From master to grandmaster is a big leap, from grandmaster to world top is another big leap, and the best computers are above even that.
Still interesting of course.
Yes, I noticed this too. The paper itself compares Giraffe (Elo ~2400) to 8 other chess engines (page 25 of the PDF), and decides that [...]
For comparison, there is a frequently updated list of chess engines and their (approximate) Elo ratings, which would put Giraffe at around a shared 165th place. It seems that it is the reporting, rather than the paper, that is making the exaggeration.
I think this is pretty cool and interesting, but I feel compelled to point out that all is not as it seems:
It’s worth noting, though, that only the evaluation function is a neural network. The search, while no longer iterative deepening, is still recursive. Also, the evaluation function is not a pure neural network: it includes a static exchange evaluation.
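For anyone unfamiliar, static exchange evaluation estimates the material outcome of a capture sequence on one square without any search. A simplified standalone sketch of the classic swap algorithm (the sorted attacker-value lists are inputs I'm assuming for illustration; this is not Giraffe's actual code):

```python
def see(captured_value, attacker_values, defender_values):
    """Static exchange evaluation via the swap algorithm.
    attacker_values / defender_values: values of each side's pieces
    attacking the target square, sorted ascending so the least
    valuable piece captures first. Returns the expected material
    gain (in centipawns) for the side making the first capture."""
    if not attacker_values:
        return 0
    queues = [sorted(attacker_values), sorted(defender_values)]
    gain = [captured_value]         # speculative gain after each capture
    on_square = queues[0].pop(0)    # first capturer now occupies the square
    side = 1                        # the defender replies next
    while queues[side]:
        capturer = queues[side].pop(0)
        gain.append(on_square - gain[-1])
        on_square = capturer
        side ^= 1
    # Back up the swap list: either side may stop capturing when
    # continuing would lose material.
    for d in range(len(gain) - 1, 0, -1):
        gain[d - 1] = -max(-gain[d - 1], gain[d])
    return gain[0]

# e.g. see(100, [900], [100]) == -800: QxP defended by a pawn loses 800.
```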
It’s also worth noting that doubling the amount of computing time usually increases a chess engine’s rating by about 60 Elo points. International masters usually have a rating below 2500. Though this is sketchy, the top chess engines are rated at around 3300. Thus, you could make a top-notch engine approximately 10,000 times slower and achieve the same performance.
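Spelling out that arithmetic (using the rough 2500 and 3300 figures above):

```python
# An ~800 Elo gap, at ~60 Elo gained per doubling of thinking time.
doublings = (3300 - 2500) / 60   # about 13.3 doublings
slowdown = 2 ** doublings        # about 10,000x
print(f"{doublings:.1f} doublings -> {slowdown:,.0f}x slower")
```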
Now, that 3300 figure is probably fairly inaccurate. Also, it’s quite possible that if the developer tweaked their recursive search algorithm, they could improve it. Thus the 10,000 figure I came to above is probably fairly inaccurate as well. Regardless, it is not clear to me that the neural network itself is proving terribly useful.
Although it’s not better than existing solutions, it’s a cool example of how good results can be achieved in a relatively automatic way—by contrast, the evaluation functions of the best chess engines have been carefully engineered and fine-tuned over many years, at least sometimes with assistance from people who are themselves master-level chess players. On the other hand this neural network approach took a relatively short time and could have been applied by someone with little chess skill.
edit: Reading the actual paper, it does sound like a certain amount of iteration, and expertise on the author’s part, was still required.
edit2: BTW, the paper is very clear and well written. I’d recommend giving it a read if you’re interested in the subject matter.
Thank you for recommending to read the paper, I don’t think I would have otherwise and I greatly enjoyed reading it!
But how much of its performance comes from the neural network learning some non-trivial evaluation function and how much comes from brute-forcing the game tree on a modern computer?
If the neural network was replaced by a trivial heuristic, say, material balance, how would the engine perform?
In the paper they start with just material balance—then via the learning process, their score on the evaluation test goes from “worse than all hand-written chess engines” to “better than all except the very best one” (and the best one, while more hand-crafted, also uses some ML/statistical tuning of numeric params, and has had a lot more effort put into it).
The reason why the NN solution currently doesn’t do as well in real games is because it’s slower to evaluate and therefore can’t brute-force as far.
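For concreteness, the trivial baseline being discussed is about this much code (a sketch using python-chess, with the conventional centipawn values):

```python
import chess  # python-chess

# Conventional centipawn values; kings are never captured, so omitted.
PIECE_VALUES = {chess.PAWN: 100, chess.KNIGHT: 300, chess.BISHOP: 300,
                chess.ROOK: 500, chess.QUEEN: 900}

def material_balance(board: chess.Board) -> int:
    """Material difference in centipawns from the side to move's view."""
    score = sum(value * (len(board.pieces(ptype, chess.WHITE))
                         - len(board.pieces(ptype, chess.BLACK)))
                for ptype, value in PIECE_VALUES.items())
    return score if board.turn == chess.WHITE else -score
```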
This is pitched as unsupervised from scratch, but on page 18, it talks about training to mimic another engine’s board evaluation function.
I don’t follow how the various networks and training regimes relate to each other. If the supervised training only produced the features listed on pages 18-19, that doesn’t seem like a big deal, because those are very basic chess concepts that require very little expertise. (It did jump out at me that the last item, the lowest-valued piece among the attackers of a square, is much more complicated than any of the other features.) And the point is to make a much better evaluation function and trade that off against brute force. But I’m not sure what’s going on.
The main training isn’t entirely from scratch either, but makes use of random positions from expert games, implicitly labeled as balanced. It would be interesting to know how important that was, and what would happen if the system bootstrapped purely, generating typical positions by playing against itself.
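The mimicry stage mentioned above is, in essence, plain supervised regression: push the network's score toward the teacher engine's score for the same position. A minimal sketch, where net, positions, teacher_scores (e.g. Stockfish evaluations) and the optimizer are all my own placeholders rather than the paper's setup:

```python
import torch.nn.functional as F

def mimic_step(net, positions, teacher_scores, optimizer):
    """One supervised step: regress the network's evaluation toward
    a teacher engine's scores for the same batch of positions."""
    pred = net(positions).squeeze(-1)
    loss = F.mse_loss(pred, teacher_scores)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```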
It looks to me as if they did the following:
Design the features:
- Manually try various combinations of features. For each candidate feature-set, attempt to learn Stockfish’s evaluation function.

Having chosen features, learn the weights:
- Initialize weights via some kind of bootstrapping process using a manually-designed (but deliberately rather stupid) evaluator.
- Optimize weights by unsupervised TD-leaf learning, using a large database of positions (from computer-computer games) as starting points for self-play.
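For what it’s worth, the TD-leaf step in that last item has a fairly compact core: evaluate the principal-variation leaf at each move of a self-play game, and nudge each evaluation toward the λ-discounted sum of subsequent evaluation changes. A rough PyTorch sketch, with evaluate, pv_leaves, the optimizer and the λ value all assumed rather than taken from the paper:

```python
def td_leaf_update(evaluate, pv_leaves, optimizer, lam=0.7):
    """One TD-Leaf(lambda) update over a single self-play game.
    evaluate: differentiable net scoring a position; pv_leaves: the
    principal-variation leaf positions found by search at successive
    moves. (Sign flips for the alternating side to move are omitted
    for brevity; lam=0.7 is a typical choice, not necessarily the
    paper's.)"""
    values = [evaluate(leaf) for leaf in pv_leaves]
    # Temporal differences between successive PV-leaf evaluations.
    deltas = [(values[t + 1] - values[t]).detach()
              for t in range(len(values) - 1)]
    loss = 0.0
    for t in range(len(deltas)):
        # Credit position t with the lambda-discounted future errors.
        error = sum(lam ** (j - t) * deltas[j]
                    for j in range(t, len(deltas)))
        loss = loss - values[t] * error  # gradient ascent on value*error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```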
That sounds correct, but in the first step, I think what they are optimizing is not (just) the features but the representations of those features. The natural-language descriptions of the features are very simple ones that require no expertise, but some representations and ways of wiring them together are more conducive to learning, and to building the higher-level features that Stockfish uses. But, again, it doesn’t sound like they are applying much optimization power at this step.
Also, one thing that I think is just omitted is whether the network in the first step is the same network in the later steps, or whether the later steps introduce more layers to exploit the same features.
Features and representations: agreed. (I wasn’t trying to be precise.)
I assumed the same network in the first step as later, but agree that it isn’t made explicit in the paper.
I am confused by section 5 in the paper, about probabilistic generation of the search tree—the paper states: [...]
But the creators of the most popular engines literally spend hours a day trying to increase the rating of their engine, and 26 rating points is massive. Is this probabilistic search simply that unknown and that good? Or does the trick lie in the “stronger than a naive implementation of depth-limited search”, and is there some reason why we expect depth-limited search to have sophisticated implementations, but do not expect this for probabilistic search?
Something like that I think. The paper suggests that optimizations applied to depth-based search techniques in more sophisticated engines are already effectively like an approximation of probability-based search.
Shouldn’t the probabilistic search in that case already be comparable in performance with non-naive depth-based search, if most of the sophistication in the latter just serves to approximate the former? Since the probabilistic search seems relatively simple, the argument above seems insufficient to explain why probabilistic search is not used more widely, right?
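To make the contrast concrete, here is roughly what probability-limited search means, as I understand it: instead of cutting off at a fixed depth, a node stops being expanded once the estimated probability of reaching it drops below a threshold. A naive negamax sketch using python-chess, with evaluate and move_probability as hypothetical stand-ins (the latter being some model of how likely each move is to be played):

```python
import chess  # python-chess

def prob_search(board, evaluate, move_probability,
                prob=1.0, threshold=1e-4):
    """Negamax that expands a node only while the estimated probability
    of reaching it stays above a threshold, instead of using a fixed
    depth. evaluate scores a position from the side to move's view."""
    moves = list(board.legal_moves)
    if prob < threshold or not moves:
        return evaluate(board)
    best = -float("inf")
    for move in moves:
        child_prob = prob * move_probability(board, move)
        board.push(move)
        best = max(best, -prob_search(board, evaluate, move_probability,
                                      child_prob, threshold))
        board.pop()
    return best
```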
This and other recent work with deep learning raises a substantial question: how much of intelligence is simply raw processing power? Many of the ideas and algorithms used in these recent advances have been around for a while, but they can now take advantage of much more processing power than was available 10 or 20 years ago. This suggests that we may already have most of the ingredients for intelligence, but can’t implement them, or can’t recognize them, due to processing limitations.