I get why the MCTS is important, but what about the training? It seems to me that if we stop training AlphaGo (Zero) and I play a game against it, it’s goal-directed even though we have stopped training it.
Yeah, I agree that even without the training it would be goal-directed; that comes from the MCTS.
Note, though, that if we stop training and also stop using MCTS, and you then play a game against it, it will beat you, and yet I would say that it is not goal-directed.
Yes, as long as you keep doing the MCTS + training. The value/policy networks by themselves are not goal-directed.