Thanks for the summary! Your first bullet point was my motivation for doing this. I think it’s important to test out interpretability ideas in more challenging domains.
We didn’t really do much interpretability in this paper; it’s more meta-interpretability in a sense (i.e. studying whether interpretability should in principle be possible). I’d say section 4 is worth a look, especially section 4.5, which covers fundamental and practical challenges to probing. Section 7 has some NMF analysis, and we open-sourced the NMF factors, which you might find interesting.
I enjoyed the whole paper! It’s just that “read sections 1 through 8” doesn’t reduce the length much, and sections 5-6 have some nice short results that can be read on their own :-)
Zac says “Yes, over the course of training AlphaZero learns many concepts (and develops behaviours) which have clear correspondence with human concepts.”
What’s the evidence for this? If AlphaZero worked by learning concepts in a sort of step-wise manner, then we should expect jumps in performance on certain types of puzzles, right? I would guess that a beginning human would exhibit jumps from learning concepts like “control the center” or “castle early, not later”. For instance, the principle “control the center”, once followed, has implications for how to place knights etc. which greatly affect win probability. Is the claim that they found such jumps? (Eyeing the results, nothing really stands out in the plots.)
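To make the kind of jump I mean concrete, here’s a rough sketch of the test I’d want to see, with entirely made-up checkpoint and puzzle-category data (none of this comes from the paper):

```python
import numpy as np

# Hypothetical data: accuracy[c, t] = fraction of puzzles of type c solved at checkpoint t.
checkpoints = np.arange(0, 1_000_000, 50_000)               # made-up training steps
concepts = ["control_the_center", "castle_early"]           # made-up puzzle categories
accuracy = np.random.rand(len(concepts), len(checkpoints))  # stand-in for real measurements

def largest_jump(curve):
    """Return the biggest single-step increase in a per-checkpoint accuracy curve."""
    deltas = np.diff(curve)
    i = int(np.argmax(deltas))
    return deltas[i], i

for name, curve in zip(concepts, accuracy):
    jump, idx = largest_jump(curve)
    print(f"{name}: largest jump of {jump:.2f} between checkpoints {idx} and {idx + 1}")
```

If concepts were being acquired step-wise, I’d expect a few of these curves to show a sharp jump rather than a smooth climb.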
Or is the claim that the NMF somehow proves that AlphaZero works off concepts? To me that seems suspect, since NMF appears to be looking at the weight matrices at a very crude level.
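For what it’s worth, here’s roughly the kind of factorisation I picture NMF doing, which is why it feels crude to me. This is a toy sketch on random data, and whatever matrix the paper actually factorises (weights, activations, etc.) may well differ:

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy stand-in for whatever non-negative matrix gets factorised
# (e.g. positions x hidden units); random numbers, not the paper's data.
X = np.random.rand(512, 256)

model = NMF(n_components=8, init="nndsvd", random_state=0, max_iter=500)
W = model.fit_transform(X)   # per-row loadings on 8 factors
H = model.components_        # each factor is a non-negative pattern over columns

# X is only approximated by W @ H; the "factors" are whatever low-rank
# non-negative structure happens to minimise the reconstruction error.
print(W.shape, H.shape)
```

Whatever comes out of that seems a long way from demonstrating that the network is reasoning with human-like concepts.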
I ask these questions partly because I went to a meetup talk (not recorded, sadly) where a researcher from MIT showed a Go problem that AlphaGo can’t solve but which even beginner Go players can, which shows that AlphaGo actually doesn’t understand things the same way humans do. Hopefully they will publish their work soon so I can show you.