Amazing match. Well worth staying up to 2 AM to watch.
Several things I thought were interesting:
The commentator (on the DeepMind channel) calling out several of AlphaGo’s moves as conservative. Essentially, it would play an additional stone to settle or augment some group that he wouldn’t necessarily have spent a move on. What I’m curious about is how much this reflects an attempt by AlphaGo to conserve computational resources. “I think move A is a 12 point swing, and move B is a 10 point swing, but move B narrows the search tree for future moves in a way that I think will net me at least 2 more points.” (It wouldn’t be verbalized like that, since it’s not thinking verbally, but you can get this effect naturally from the tree search and position evaluator.)
Both players took a long time to play “obvious” moves. (Typically, by this I mean something like a response to a forced move.) 이 sometimes didn’t—there were a handful of moves he played immediately after AlphaGo’s move—but I was still surprised by the amount of thought that went into some of the moves. This may be typical for tournament play—I haven’t watched any live before this.
AlphaGo’s willingness to play aggressively and get involved in big fights with 이, and then not lose. I’m not sure that all the fights developed to AlphaGo’s advantage, but evidently enough of them did by enough.
I somewhat regret 이 not playing the game out to the end; it would have been nice to know the actual score. (I’m sure estimates will be available soon, if not already.)
What I’m curious about is how much this reflects an attempt by AlphaGo to conserve computational resources.
If I understand correctly, at least according to the Nature paper, it doesn’t explicitly optimize for this. Game-playing software is often perceived as playing “conservatively”; this is a general property of minimax search, and in the limit the Nash equilibrium consists of maximally conservative strategies.
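To make the minimax point concrete, here is a minimal toy sketch (the trees and point values are invented, and this is not how AlphaGo actually scores moves): minimax ranks moves by their worst-case outcome, so a quiet move that guarantees a decent result beats a sharp move with a higher upside but a working refutation.

```python
# Toy minimax: moves are ranked by worst case, which reads as
# "conservative". Point values below are invented for illustration.

def minimax(node, maximizing):
    """node is a leaf score (number) or a list of child subtrees."""
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Hypothetical point swings; the opponent replies next, so the
# opponent minimizes over each move's children.
move_a = [12, -3]  # sharp: a 12-point swing if unanswered, but refutable
move_b = [10, 8]   # settling move: smaller upside, nothing bad afterwards

print("move A guarantees:", minimax(move_a, maximizing=False))  # -3
print("move B guarantees:", minimax(move_b, maximizing=False))  # 8
# Minimax plays B: the "conservative" choice dominates on worst case.
```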
but I was still surprised by the amount of thought that went into some of the moves.
Maybe these obvious moves weren’t so obvious at that level.
I don’t know about that level, but I can think of at least one circumstance where I think far longer than would be expected over a forced move. If I’ve worked out the forced sequence in my head and determined that the opponent doesn’t gain anything by it, but they play it anyway, I start thinking “Danger, Danger, they’ve seen something I haven’t and I’d better re-evaluate.”
Most of the time it’s nothing and they just decided to play out the position earlier than I would have. But every so often I discover a flaw in the “forced” defense and have to start scrabbling for an alternative.
This is very true in Go. If you are both playing out a sequence of moves without hesitation, each anticipating a payoff, one of you is wrong (kind of; it’s hard to put into words). It is always worth making doubly sure that it isn’t you.
Maybe these obvious moves weren’t so obvious at that level.
Sure. And I’m pretty low as amateurs go; what I found surprising was that there were ~6 moves where I thought “obviously play X,” and 이 immediately played X in half of them and spent two minutes before playing X in the other half. It wasn’t clear to me whether 이 was precomputing something he would need later, or was worried about something I wasn’t, and so on.
Most of the time I was thinking something like “well, I would play Y, but I’m pretty unconfident that’s the right move,” and then 이 or AlphaGo would play something that was retrospectively superior to Y; or I was thinking something like “I have only the vaguest sense of what to do in this situation.” So I guess I’m pretty well calibrated, even if my skill isn’t that great.
The commentator (on the DeepMind channel) calling out several of AlphaGo’s moves as conservative. Essentially, it would play an additional stone to settle or augment some group that he wouldn’t necessarily have spent a move on. What I’m curious about is how much this reflects an attempt by AlphaGo to conserve computational resources. “I think move A is a 12 point swing, and move B is a 10 point swing, but move B narrows the search tree for future moves in a way that I think will net me at least 2 more points.”
If the search tree is narrowed, it is narrowed for both players, so why would it be a gain?
If the search tree is narrowed, it is narrowed for both players, so why would it be a gain?
There may be an asymmetry between successful modes of attack and successful modes of defense: if there’s a narrow thread that white can win through, and a thick thread that black can threaten through, then white wins computationally by closing off that tree. (A toy count after this comment makes the asymmetry concrete.)
But thanks for asking: I was somewhat confused because I was thinking about AI vs. human games, but the AI is trained mostly on human vs. human and AI vs. AI games, neither of which will have the AI vs. human feature. Well, except for bots playing on KGS.
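Here is a toy count of that computational asymmetry (the branching numbers are invented): to verify a forced win you need only one of your own moves per node, but you must answer every opponent reply, so the opponent’s branching factor dominates the cost, and a move that closes off opponent options shrinks the tree multiplicatively.

```python
# Verifying a forced win: our side contributes one move per turn,
# the defender's side contributes every legal reply. Branching
# factors below are made up for illustration.

def verification_leaves(our_moves_deep, opp_branching):
    """Leaf count of a proof tree: 1 option for us, b for the opponent."""
    return opp_branching ** our_moves_deep

depth = 4  # four of our moves deep (eight plies)
print(verification_leaves(depth, 8))  # 4096 lines to check
print(verification_leaves(depth, 4))  # 256 if a move halves their options
# Halving the opponent's live replies cuts the check 16-fold, while our
# own side of the tree stays a single thread.
```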
But thanks for asking: I was somewhat confused because I was thinking about AI vs. human games, but the AI is trained mostly on human vs. human and AI vs. AI games, neither of which will have the AI vs. human feature. Well, except for bots playing on KGS.
We learned later that Fan Hui started working with DeepMind on AlphaGo after their match, and played a bunch of games against it as it improved. So it did have a number of AI vs. human training games.
I saw a blog comment from someone claiming to be (IIRC) an amateur 3-4 dan, i.e., good enough to estimate this sort of thing pretty well, reckoning probably 3.5 or 4.5 points in white’s favour. That would be after the komi of 7.5 points given to white as compensation for moving second, or so I assume from the half-points in those figures. So that would correspond to black being ahead by 3-4 points before komi.
I somewhat regret 이 not playing the game out to the end; it would have been nice to know the actual score. (I’m sure estimates will be available soon, if not already.)
That wouldn’t have given you the actual score, as AlphaGo wasn’t trying to maximize the score in the endgame.
Which specific moves do you mean?
I would have to rewatch the game, since the easily available record doesn’t have the time it took them to make each move.
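On AlphaGo not maximizing the score: if I’m reading the Nature paper right, the value network and rollouts estimate the probability of winning, not the margin, so in a won position it will happily trade points for certainty. A minimal sketch of the difference (move names and numbers invented):

```python
# Two invented endgame candidates, scored by margin and by win probability.
candidates = {
    "push for more territory": {"margin": 6.0, "win_prob": 0.93},
    "solid simplifying move":  {"margin": 1.5, "win_prob": 0.99},
}

by_margin   = max(candidates, key=lambda m: candidates[m]["margin"])
by_win_prob = max(candidates, key=lambda m: candidates[m]["win_prob"])

print("score maximizer plays:", by_margin)       # push for more territory
print("win-prob maximizer plays:", by_win_prob)  # solid simplifying move
# A win-probability maximizer gives back points whenever that buys
# certainty, so the final margin understates how far ahead it was.
```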
“I think move A is a 12 point swing, and move B is a 10 point swing, but move B narrows the search tree for future moves in a way that I think will net me at least 2 more points.”
No. 2 points is a lot at that level. If the commentator thought a move cost 2 points, he wouldn’t call it conservative; he would call it an error.
Not playing out every move is more about keeping aji open and not wasting possible ko threats. Unfortunately I don’t know how to translate aji into English.
No. 2 points is a lot at that level. If the commentator thought a move cost 2 points, he wouldn’t call it conservative; he would call it an error.
I think B actually results in more points overall, which is why it would play it; my curiosity is what fraction is due to direct effects vs. indirect effects.
For example, one could imagine the board-position evaluation function differing under different time controls. If you’re playing a blitz game where both players have 10 seconds per move, some positions might shift from mildly favoring black to strongly favoring black, because white needs to do a bunch of thinking to navigate the game tree successfully.
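A minimal sketch of that budget dependence (the game tree is invented and has nothing to do with AlphaGo’s real evaluator): the same position is scored under different depth budgets, and a winning line that pays off three plies down is invisible to a blitz-sized search.

```python
# Depth-limited minimax: at the cutoff we guess "even" (0).
def search(node, depth, maximizing):
    if isinstance(node, (int, float)):
        return node          # terminal position: exact score
    if depth == 0:
        return 0.0           # out of budget: heuristic guess
    values = [search(child, depth - 1, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# White to move. The first candidate starts a deep sequence worth +8
# whatever black answers; the second is a quiet move worth +1 on the spot.
position = [
    [[9, -2], [8, -1]],  # deep line: black picks a reply, white finishes
    1,                   # quiet move with an immediately visible value
]

for budget in (1, 2, 3):
    print(f"depth {budget}: eval = {search(position, budget, True)}")
# depth 1: eval = 1   (deep line unreadable; position looks barely better)
# depth 2: eval = 1
# depth 3: eval = 8   (the winning sequence is now within the budget)
```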
It’s not a blitz game, and there’s plenty of time to think through the moves.
Just for the record, at my prime I used to play Go at around 2 kyu.
I understand aji as potential for future moves that is currently not very usable but may become usable after the board configuration has evolved.
It goes in that direction, but the moves don’t have to be used directly in order to constrain movements elsewhere on the board.
When playing around with Fold.it there was a similar scenario. It’s often possible to run a script to reach a higher local maximum. However, that made the fold more “rigid”. The experienced folders only ran the script to search out the local maxima at the end, once they had manually done everything that could be done. In my usage of Go vocabulary, running the script to optimize locally beforehand would also be a case of aji-keshi.
Aji is for me a phenomenological primitive that I learned while playing Go and that I can use outside of Go, but which doesn’t have an existing English or German word.
The way I think about aji is something fragile on a ledge: sure, it’s safe now, but as things shift around, it may suddenly become unsafe.