Isn’t A also grounded in reality, in that the recursion eventually bottoms out with no A left to consult?
This is true when getting training data, but I think there is a difference between A (or HCH) and AlphaGo Zero when doing simulation / amplification. Someone wins a simulated game of Go even if both players are making bad moves (or even random moves), which gives you a signal that A doesn’t have access to.
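
To make the Go point concrete, here is a minimal sketch of what I mean by the rules handing you a signal for free. It uses tic-tac-toe as a stand-in for Go (my assumption, chosen only to keep the code short; unlike Go it allows draws), and the names `random_playout` and `outcome_counts` are hypothetical helpers, not anything from AlphaGo Zero. Both players move uniformly at random, yet every playout still ends in an outcome decided by the rules rather than by either player's judgment.

```python
import random

# All eight winning lines on a 3x3 board indexed 0..8.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return 'X' or 'O' if that player has completed a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def random_playout(rng):
    """Play uniformly random legal moves until the rules end the game."""
    board = [None] * 9
    player = "X"
    while winner(board) is None and None in board:
        empty = [i for i, cell in enumerate(board) if cell is None]
        board[rng.choice(empty)] = player
        player = "O" if player == "X" else "X"
    # The outcome comes from the rules alone, not from move quality.
    return winner(board) or "draw"


def outcome_counts(n_games, seed=0):
    """Tally the outcomes of n_games random playouts."""
    rng = random.Random(seed)
    counts = {"X": 0, "O": 0, "draw": 0}
    for _ in range(n_games):
        counts[random_playout(rng)] += 1
    return counts


if __name__ == "__main__":
    print(outcome_counts(1000))
```

The amplification side has no analogue of `winner`: when A's sub-answers are bad, nothing external scores the combined answer, which is the asymmetry I'm pointing at.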