Donald Hobson comments on Challenge: know everything that the best go bot knows about go

Donald Hobson May 11, 2021, 6:10 PM
LW: 8 AF: 3
AF
I think that it isn’t clear what constitutes “fully understanding” an algorithm.
Say you pick something fairly simple, like a floating point squareroot algorithm. What does it take to fully understand that.
You have to know what a squareroot is. Do you have to understand the maths behind Newton raphson iteration if the algorithm uses that? All the mathematical derivations, or just taking it as a mathematical fact that it works. Do you have to understand all the proofs about convergence rates. Or can you just go “yeah, 5 iterations seems to be enough in practice”. Do you have to understand how floating point numbers are stored in memory? Including all the special cases like NaN which your algorithm hopefully won’t be given? Do you have to keep track of how the starting guess is made, how the rounding is done. Do you have to be able to calculate the exact floating point value the algorithm would give, taking into account all the rounding errors. Answering in binary or decimal?
Is brute force minmax search easy to understand. You might be able to easily implement the algorithm, but you still don’t know which moves it will make. In general, for any algorithm that takes a lot of compute, humans won’t be able to work out what it will do without very slowly imitating a computer. There are some algorithms we can prove theorems about. But it isn’t clear which theorems we need to prove to get “full understanding”
Another obstacle to full understanding is memory. Suppose your go bot has memorized a huge list of “if you are in such and such situation move here” type rules. You can understand how gradient descent would generate good rules in the abstract. You have inspected a few rules in detail. But there are far too many rules for a human to consider them all. And the rules depend on a choice of random seed.
Corollaries of success (non-exhaustive):
- You should be able to answer questions like “what will this bot do if someone plays mimic go against it” without actually literally checking that during play. More generally, you should know how the bot will respond to novel counter strategies
There is not in general a way to compute what an algorithm does without running it. Some algorithms are going about the problem in a deliberately slow way. However if we assume that the go algorithm has no massive known efficiency gains. (Ie no algorithm that computes the same answer using a millionth of the compute) And that the algorithm is far too compute hungry for humans doing it manually. Then it follows that humans won’t be able to work out exactly what the algorithm will do.
You should be able to write a computer program anew that plays go just like that go bot, without copying over all the numbers.
Being able to understand the algorithm well enough to program it for the first time, not just blindly reciting code. An ambiguous but achievable goal.
Suppose a bunch of people coded another Alpha go like system. The random seed is different. The layer widths are different. The learning rate is slightly different. Its trained with different batch size, for a different amount of iterations on a different database of stored games. It plays about as well. In many situations it makes a different move. The only way to get a go bot that plays exactly like alpha go is to copy everything including the random seed. This might have been picked based on lucky numbers or birthdays. You can’t rederive from first principles what was never derived from first principles. You can only copy numbers across, or pick your own lucky numbers. Numbers like batch size aren’t quite as pick your own, there are unreasonably small and large values, but there is still quite a lot of wiggle room.
- ChristianKl May 12, 2021, 8:29 AM
  2 points
  Parent
  Suppose your go bot has memorized a huge list of “if you are in such and such situation move here” type rules.
  One interesting feature of alpha go was that it was generally not playing what a go professional would see as optimal play in the end game. A go professional doesn’t play moves that obviously lose points in the late game. On the other hand alpha go played many moves that lost points, likely because it judged them not to change anything about the likelihood of winning the game given that it was ahead by enough points.
  A good go player has a bunch of end game patterns memorized that are optimal at maximizing points. When choosing between two moves that are both judged to win the game with 0.9999999 alpha go not choosing the move that maximizes points suggest that it does not use patterns about what optimal moves are in certain local situations to make it’s judgements.
  Alpha Go is following patterns about what’s the optimal move in similar situations much less then human go players do. It’s playing the game more globally instead of focusing on local positions.
  - Daniel Kokotajlo May 12, 2021, 9:31 AM
    7 points
    Parent
    When choosing between two moves that are both judged to win the game with 0.9999999 alpha go not choosing the move that maximizes points suggest that it does not use patterns about what optimal moves are in certain local situations to make it’s judgements.
    I nitpick/object to your use of “optimal moves” here. The move that maximizes points is NOT the optimal move; the optimal move is the move that maximizes win probability. In a situation where you are many points ahead, plausibly the way to maximize win probability is not to try to get more points, but rather to try to anticipate and defend against weird crazy high-variance strategies your opponent might try.
  - Donald Hobson May 12, 2021, 10:13 AM
    3 points
    Parent
    This behaviour is consistent with local position based play that also considers “points ahead” as part of the situation.
    - ChristianKl May 12, 2021, 10:40 AM
      2 points
      Parent
      I think human go players consider points ahead as part of the situation and still don’t just play a move that provides no benefit but costs points in the endgame.
      We are not talking about situation where there’s any benefit to be gained from the behavior as that behavior happened in situations that can be fully read out.
      There are situations in Go where you don’t start a fight that you expect to win with 95% because you already ahead on the board and the 5% might make you lose but that’s very far from the moves of AlphaGo that I was talking about.
      AlphaGo plays moves that are bad according to any pattern of what’s good in Go when it’s ahead.
      - DanielFilan May 14, 2021, 6:23 PM
        2 points
        Parent
        I feel like it’s pretty relevant that AlphaGo is the worst super-human go bot, and I don’t think better bots have this behaviour.
        gjm May 14, 2021, 9:36 PM
        4 points
        Parent
        Last I heard, Leela Zero still tended to play slack moves in highly unbalanced late-game situations.
- DanielFilan May 11, 2021, 7:04 PM
  LW: 2 AF: 1
  AF Parent
  
  I think that it isn’t clear what constitutes “fully understanding” an algorithm.
  
  That seems right.
  
  Another obstacle to full understanding is memory. Suppose your go bot has memorized a huge list of “if you are in such and such situation move here” type rules.
  
  I think there’s reason to believe that SGD doesn’t do exactly this (nets that memorize random data have different learning curves than normal nets iirc?), and better reason to think it’s possible to train a top go bot that doesn’t do this.
  
  There is not in general a way to compute what an algorithm does without running it.
  
  Yes, but luckily you don’t have to do this for all algorithms, just the best go bot. Also as mentioned, I think you probably get to use a computer program for help, as long as you’ve written that computer program.
  - Donald Hobson May 12, 2021, 10:08 PM
    3 points
    Parent
    I’m thinking of humans having some fast special purpose inbuilt pattern recognition, which is nondeterministic and an introspective black box, and a slow general purpose processor. Humans can mentally follow the steps of any algorithm, slowly.
    Thus if a human can quickly predict the results of program X, then either there is a program Y based on however the human is thinking that does the same thing as X and takes only a handful of basic algorithmic operations. Or the human is using their pattern matching special purpose hardware. This hardware is nondeterministic, not introspectively accessible, and not really shaped to predict go bots.
    Either way, it also bears pointing out that if the human can predict the move a go bot would make, the human is at least as good as the machine.
    So you are going to need a computer program for “help” if you want to predict the exact moves. At this stage, you can ask if you really understand how the code works. And aren’t just repeating it by route.
  - DanielFilan May 11, 2021, 7:25 PM
    LW: 2 AF: 1
    AF Parent
    I’d also be happy with an inexact description of what the bot will do in response to specified strategies that captured all the relevant details.