Nicholas’s summary, which I’m copying over on his behalf:
This post argues that while it is impressive that AlphaStar can build up concepts complex enough to win at StarCraft, it is not actually developing reactive strategies. Rather than scouting what the opponent is doing and developing a new strategy based on that, AlphaStar just executes one of a predetermined set of strategies. This is because AlphaStar does not use causal reasoning, and that keeps it from beating any of the top players.
Nicholas’s opinion:
While I haven’t watched enough of the games to have a strong opinion on whether AlphaStar is empirically reacting to its opponents’ strategies, I agree with Paul Christiano’s comment that, in principle, causal reasoning is just one type of computation that should be learnable.
This discussion also highlights the need for interpretability tools for deep RL so that we can have more informed discussions on exactly how and why strategies are decided on.