These might not be the most important or central questions, but given my background, they’re where my brain keeps going...
I need to read up more on the techniques used, but they are clearly very general and basic. Given that, which other games or problems would we expect this system to master, and which would it not?
Go is a strange mix of a super hard problem and a strangely easy problem.
A game takes around two hundred moves on a 19x19 board, where large portions of the board can constitute not-obviously-wrong moves at any given time, often with sharp discontinuities in value where a slight adjustment flips groups from alive to dead or vice versa. Thus pure full-width search, considering all possibilities, is a non-starter; you need some way of figuring out which moves to consider at all. Doing anything with raw calculation alone doesn’t work.
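To put rough numbers on that (my own back-of-the-envelope, not figures from the paper):

```python
# Back-of-the-envelope: why full-width search is hopeless in Go.
# Rough assumptions: ~250 legal moves per turn, ~150 moves per game.
BRANCHING = 250
GAME_LENGTH = 150

full_tree = BRANCHING ** GAME_LENGTH
print(f"full game tree: ~10^{len(str(full_tree)) - 1} leaves")   # ~10^359

# Even a modest 6-ply full-width lookahead is already out of reach:
print(f"depth-6 lookahead: {BRANCHING ** 6:.2e} positions")      # ~2.4e14
```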
On the other hand, Go has some things that make it a good target for this algorithm and for ML in general. Go is well-defined and bounded. Go has a limited number of possible moves—yes it’s very large, but you can enumerate all of them and see what your ML function thinks of each for at least 1 ply. Go is perfect-information and deterministic. Go has a well-defined victory condition and no in-between goals (which AlphaGo exploits quite well). Go has no hidden state—if you know the board and the number of captured stones, you don’t care how you got there (ko aside), which goes together with perfect information: you never need a prior over hidden information.
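Here is a minimal sketch (entirely my own, not anything from DeepMind) of the game interface those properties buy you. The point is that the visible position alone is a sufficient input to a policy/value network: no history, no beliefs about hidden cards, no chance nodes.

```python
# Minimal sketch (hypothetical, not DeepMind's API) of the interface that
# a deterministic, perfect-information, no-hidden-state game exposes.
from dataclasses import dataclass

@dataclass(frozen=True)
class Position:
    board: tuple      # everything both players can see
    captures: tuple   # captured-stone counts (relevant for scoring)
    to_move: int      # +1 or -1

def legal_moves(pos: Position) -> list:
    """Finite and fully enumerable: you can score every candidate at 1 ply."""
    ...

def play(pos: Position, move) -> Position:
    """Deterministic: the same move from the same position always yields the
    same successor.  No dice, no hidden draws."""
    ...

def terminal_value(pos: Position):
    """A well-defined win/loss at the end, and only at the end: the single
    training signal self-play needs."""
    ...
```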
So Go has a lot of nice properties that let you get away with such a simple architecture. What would happen with other games?
This should work on Chess with almost no modifications, so I’m wondering why they haven’t done that, and whether we’ll see it in a week. Would it crush the best computer chess programs out there? Wouldn’t shock me if it did. Another thing I’d want to see is whether it’s better in the middle/end game versus the opening, and by how much. I’d assume that AlphaChess would be basically perfect in the endgame and super strong in the middle game, whereas the opening is where we’d find out if past knowledge is actually worth a damn at all. Note that even if AlphaChess crushes everything after three days, we might learn interesting things by looking at it when it has trained for less time than that and isn’t yet crushing existing programs (even if that’s a pretty short window, you can find it by continuously testing it against Stockfish or whatever between training cycles; see the sketch below). Similarly, I bet we’d get some interesting results from seeing what partially trained versions of AlphaGo Zero do well versus poorly in Go.
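By “continuously testing between cycles” I mean something like the following sketch; the callables it takes (train_one_cycle, play_match, save_checkpoint) are hypothetical placeholders for whatever the real training setup provides, not real APIs.

```python
# Hypothetical sketch: track a self-play learner's strength as it trains,
# to catch the window where it is competitive but not yet crushing.
# The callables passed in are placeholders, not any real library's API.

def evaluate_during_training(model, train_one_cycle, play_match,
                             save_checkpoint, n_cycles=100, games_per_eval=50):
    history = []
    for cycle in range(n_cycles):
        model = train_one_cycle(model)          # one self-play + update cycle
        score = play_match(model, games=games_per_eval)  # vs. a fixed baseline
        history.append((cycle, score))          # score in [0, 1], draws = 0.5
        if 0.4 < score < 0.6:
            # Roughly even with the baseline: the interesting checkpoint
            # to dissect (openings vs. endgames, and so on).
            save_checkpoint(model, f"near_parity_cycle_{cycle}")
    return history
```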
One of the nice properties of such games is that playing as if you were playing against an expert is basically correct, whereas in games with hidden information that is no longer a safe assumption. Would that prevent the system from training against itself without getting exploited by strategies it had trained itself not to use? Could you get it to play Hold ’em this way without getting exploited?
In some ways I’m super impressed by, and scared of, DeepMind smashing Go this badly, while in other ways it seems cool but not scary.
I wouldn’t be so sure it’ll work well on chess without modifications. In go, the board is fairly large relative to many important configurations that occur on it. This is a good fit for convolutional NNs, which I believe AlphaGo uses. In chess, the board is tiny. In go, even a crude evaluation of a position is subtle and probably expensive, and much of the time strategic considerations outweigh tactical ones. In chess, a simple material count is really cheap to maintain and tells a substantial fraction of the truth about who’s ahead, and tactics commonly dominate. This makes expensive “leaf” evaluation with an NN much more attractive in go than in chess. In go, the raw branching factor is rather large (hundreds of legal moves) but it’s “easy” (ha!) to prune it down a lot and be reasonably sure you aren’t missing important moves. In chess, the raw branching factor isn’t so large but almost any legal move could turn out to be best. For all these reasons, fairly naive searching + fairly simple evaluation is much more effective in chess, and fancy deep-NN evaluation (and move selection) seems less likely to make a good tradeoff. I’m sure AlphaGo-like chess programs are possible and would play reasonably well, but I’d be surprised if they could avoid just getting outsearched by something like Stockfish on “equal” hardware.
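To make the “material count is really cheap” point concrete (a toy example of my own, not code from any real engine):

```python
# Toy illustration of how cheap a first-cut chess evaluation is.
# Standard piece values; real engines refine these heavily and update the
# count incrementally as moves are made, so the per-move cost is near zero.

PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9, 'K': 0}

def material_balance(pieces):
    """pieces: iterable of piece letters, uppercase = White, lowercase = Black.
    Positive means White is ahead on material."""
    score = 0
    for piece in pieces:
        value = PIECE_VALUES.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    return score

# White has a rook where Black has a bishop ("up the exchange"): prints 2.
print(material_balance("KQRRBNPPPP" + "kqrbbnpppp"))
```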
(I am not a very strong player of either game, nor have I implemented a program that plays either game well. I could well be wrong. But I think the above is the conventional wisdom, and it seems plausible to me.)
I do think Go is more tactical / sensitive to detailed changes than all that, and also that it’s not as easy as it looks to narrow the range of plausible moves, especially when there’s nothing tactical going on, so this working in Go makes me optimistic about Chess. I agree it’s not a slam dunk, but I’d certainly like to see it tried. One frustrating thing is that negative results aren’t reported, so there might be lots of things that interestingly didn’t work and we’d never hear about them.
Giraffe is the most successful attempt I know of to build an NN-based chess program. It played pretty well, but much weaker (on standard PC hardware) than more conventional fast-searching rivals. Of course it’s entirely possible that the DeepMind team could do better. Giraffe’s NN is fairly simple and shallow; I forget what AlphaGo’s looks like, but I think it’s much bigger. (That doesn’t mean a bigger one would be better for chess; as I indicated above, I suspect the optimal size of an NN for playing chess may be zero.)
https://arxiv.org/pdf/1712.01815.pdf
Shows AlphaZero training for less than a day and then beating Stockfish at Chess. It also answers the question of why they hadn’t applied this to Chess: they now have (and to Shogi as well).
Yup. Of course it’s not playing on “equal” hardware—their TPUs pack a big punch—but for sure this latest work is impressive. (They’re beating Stockfish while searching something like 1000x fewer positions per second.)
Let’s see. AlphaZero was using four TPUs, each of which does about 180 teraflops. Stockfish was using 64 threads; suppose each of those gets about 2 GIPS. Naively equating flops to CPU operations, that says AZ had ~5000x more processing power. My hazy memory is that the last time I looked, several years ago, a 2x increase in speed bought you about 50 Elo rating points; I bet there are diminishing returns, so let’s suppose the figure is 20 points per 2x speed increase at the Stockfish/AZ level. With all these simplistic assumptions, throwing AZ’s processing power at Stockfish by more conventional means might buy you about 240 Elo points, corresponding to a win probability (counting a draw as half a win) of about 0.8. Which is a little better than AZ is reported as getting against Stockfish.
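Here’s that arithmetic worked through; every input is one of the rough assumptions above, not a measured figure.

```python
# Checking the back-of-the-envelope above.  All inputs are rough
# assumptions from the text, not measured figures.
import math

az_flops = 4 * 180e12      # four TPUs at ~180 teraflops each
sf_ops = 64 * 2e9          # 64 threads at ~2 GIPS each
ratio = az_flops / sf_ops  # naively equating flops to CPU operations
print(f"hardware ratio: ~{ratio:.0f}x")        # ~5625x, i.e. "~5000x"

elo_per_doubling = 20      # assumed, diminishing returns baked in
elo_gain = elo_per_doubling * math.log2(ratio)
print(f"Elo gain: ~{elo_gain:.0f} points")     # ~249, i.e. "about 240"

# Standard Elo expected-score formula, a draw counting as half a win:
expected_score = 1 / (1 + 10 ** (-elo_gain / 400))
print(f"expected score: ~{expected_score:.2f}")  # ~0.81, i.e. "about 0.8"
```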
Tentative conclusion: AZ is an impressive piece of work, especially as it learns entirely from self-play, but it’s not clear that it’s doing any better against Stockfish than the same amount of hardware would if dedicated to running Stockfish faster; so the latest evidence is still (just about) consistent with the hypothesis that the optimal size of a neural network for computer chess is zero.