Yes, if it’s as simple as ‘spam clicks from imitation learning are too hard to wash out via self-play given the weak APM limits’, it should be relatively easy to fix. Add a very tiny penalty for each click to incentivize efficiency, or preprocess the replay dataset: if a ‘spam click’ does nothing useful, it should be possible to replay through all the games, track which clicks actually result in a gameplay difference, and filter out the clicks that are either idempotent (eg multiple clicks on the same spot) or cancelled out (eg a click ordering a unit to one place which is replaced by a click ordering it somewhere else before the unit has moved more than epsilon distance).
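As a minimal sketch of that preprocessing pass, assuming replays have already been flattened into per-unit move clicks that record the unit’s position at click time (the `Click` record, `EPS` threshold, and field names are all hypothetical, not any real SC2/pysc2 API):

```python
from dataclasses import dataclass
from math import hypot

EPS = 0.5  # hypothetical "barely moved" threshold, in game-map units


@dataclass
class Click:
    """One move order extracted from a replay (hypothetical schema)."""
    frame: int
    unit_id: int
    target: tuple[float, float]    # ordered destination
    unit_pos: tuple[float, float]  # where the unit was when the click was issued


def dist(a: tuple[float, float], b: tuple[float, float]) -> float:
    return hypot(a[0] - b[0], a[1] - b[1])


def filter_spam(clicks: list[Click]) -> list[Click]:
    """Drop clicks that changed nothing: repeats of the same order, and
    orders that were replaced before the unit moved more than EPS."""
    kept: list[Click] = []
    last: dict[int, int] = {}  # unit_id -> index of that unit's latest kept click
    for c in sorted(clicks, key=lambda c: c.frame):
        i = last.get(c.unit_id)
        if i is not None:
            prev = kept[i]
            if dist(c.target, prev.target) < EPS:
                continue     # idempotent: re-issuing the same destination
            if dist(c.unit_pos, prev.unit_pos) < EPS:
                kept[i] = c  # prev was cancelled before taking effect
                continue
        kept.append(c)
        last[c.unit_id] = len(kept) - 1
    return kept


if __name__ == "__main__":
    spammy = [
        Click(0, 7, (10.0, 10.0), (0.0, 0.0)),
        Click(2, 7, (10.0, 10.0), (0.1, 0.1)),  # spam: re-orders to the same spot
        Click(4, 7, (50.0, 50.0), (0.3, 0.3)),  # cancels the first order before it took effect
    ]
    print(len(filter_spam(spammy)))  # -> 1: only the final, effective order survives
```

The per-click penalty is even simpler by comparison: subtract a small constant from the reward for every action the agent issues, so spam clicks cost something and efficient play is preferred.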