ESRogs comments on “AlphaStar: Mastering the Real-Time Strategy Game StarCraft II”, DeepMind [won 10 of 11 games against human pros]

ESRogs 26 Jan 2019 21:11 UTC
8 points
Interesting analysis here:
I will try to make a convincing argument for the following:
1. AlphaStar played with superhuman speed and precision.
2. DeepMind claimed to have restricted the AI from performing actions that would be physically impossible for a human. They have not succeeded in this and most likely are aware of it.
3. The reason why AlphaStar is performing at superhuman speeds is most likely due to its inability to unlearn the human players' tendency to spam click. I suspect DeepMind wanted to restrict it to a more human-like performance but are simply not able to. It's going to take us some time to work our way to this point but it is the whole reason why I'm writing this so I ask you to have patience.
- Vanessa Kosoy 26 Jan 2019 21:59 UTC
  5 points
  Number 3 is an interesting claim, but I would assume that, if this is true and DeepMind are aware of this, they would just find a way to erase the spam clicks from the human play database.
  - gwern 27 Jan 2019 1:14 UTC
    9 points
    Yes, if it’s as simple as ‘spam clicks from imitation learning are too hard to wash out via self-play given the weak APM limits’, it should be relatively easy to fix. Add a very tiny penalty for each click to incentivize efficiency, or preprocess the replay dataset—if a ‘spam click’ does nothing useful, it seems like it should be possible to replay through all the games, track what clicks actually result in a game-play difference and what clicks are either idempotent (eg multiple clicks in the same spot) or cancel out (eg a click to go one place which is replaced by a click to go another place before the unit has moved more than epsilon distance), and filter out the spam clicks.
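
The two fixes suggested in the comment above can be sketched roughly as follows. This is only an illustration under strong assumptions, not DeepMind's pipeline or the StarCraft II replay format: the `Click` record, its fields, the `epsilon` and `same_spot` thresholds, and the `penalty` value are all hypothetical, and only plain move orders are modelled. A click is treated as idempotent if it repeats the target of the last kept order for the same unit, and as cancelled if a later order with a different target arrives before the unit has moved more than `epsilon`.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Click:
    """Hypothetical move-order record extracted from a replay."""
    t: float    # game time of the click
    unit: int   # id of the unit receiving the order
    tx: float   # target x of the order
    ty: float   # target y of the order
    ux: float   # unit x position when the click was issued
    uy: float   # unit y position when the click was issued


def filter_spam_clicks(clicks, epsilon=0.5, same_spot=0.1):
    """Return the clicks that plausibly changed the game, in time order."""
    kept = {}  # unit id -> orders kept so far for that unit, in time order
    for c in sorted(clicks, key=lambda c: c.t):
        stack = kept.setdefault(c.unit, [])
        # Cancelled orders: drop earlier orders that were replaced by a
        # different target before the unit moved more than epsilon.
        while stack:
            prev = stack[-1]
            retargeted = hypot(c.tx - prev.tx, c.ty - prev.ty) > same_spot
            barely_moved = hypot(c.ux - prev.ux, c.uy - prev.uy) <= epsilon
            if retargeted and barely_moved:
                stack.pop()
            else:
                break
        # Idempotent orders: a repeat of the order the unit already has.
        if stack and hypot(c.tx - stack[-1].tx, c.ty - stack[-1].ty) <= same_spot:
            continue
        stack.append(c)
    return sorted((c for s in kept.values() for c in s), key=lambda c: c.t)


def shaped_reward(reward, n_clicks, penalty=1e-4):
    """Alternative fix: subtract a very small cost per click (value assumed)."""
    return reward - penalty * n_clicks
```

For example, given two identical clicks on the same spot followed by a quick retarget, only the retarget survives; an order issued after the unit has travelled a long way is kept as a genuine change of plan. A real filter would also have to account for non-move actions, camera movement, and game-state changes between clicks, which this sketch ignores.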