I will try to make a convincing argument for the following:
1. AlphaStar played with superhuman speed and precision.
2. DeepMind claimed to have restricted the AI from performing actions that would be physically impossible for a human. They have not succeeded in this and are most likely aware of it.
3. The reason AlphaStar performs at superhuman speeds is most likely its inability to unlearn human players’ tendency to spam click. I suspect DeepMind wanted to restrict it to more human-like performance but were simply not able to. It’s going to take us some time to work our way to this point, but it is the whole reason I’m writing this, so I ask you to have patience.
Number 3 is an interesting claim, but I would assume that, if this is true and DeepMind are aware of this, they would just find a way to erase the spam clicks from the human play database.
Yes, if it’s as simple as ‘spam clicks from imitation learning are too hard to wash out via self-play given the weak APM limits’, it should be relatively easy to fix. Add a very tiny penalty for each click to incentivize efficiency, or preprocess the replay dataset. If a ‘spam click’ does nothing useful, it should be possible to replay through all the games, track which clicks actually result in a gameplay difference and which are either idempotent (e.g. multiple clicks in the same spot) or cancel out (e.g. a click to go one place which is replaced by a click to go another place before the unit has moved more than epsilon distance), and filter out the spam clicks.
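To make the preprocessing idea concrete, here is a rough sketch of what such a filter might look like. Everything in it is hypothetical: the `Click` record, the `filter_spam_clicks` function, and the `speed`, `eps` and `tol` parameters are made up for illustration and are not DeepMind's pipeline or any real replay API. It also assumes only move commands, a single unit speed, and that repeating a move order to the same spot really is a no-op, which may not hold exactly in the actual game.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Click:
    """Hypothetical move order: time (s), unit id, target position."""
    t: float
    unit: int
    x: float
    y: float


def filter_spam_clicks(clicks, speed, eps=0.5, tol=1e-6):
    """Drop clicks that are cancelled or idempotent (assumed semantics).

    speed: assumed maximum unit speed, in map units per second.
    eps:   a click counts as cancelled if the unit could have moved
           at most eps before a different order replaced it.
    tol:   two targets closer than tol count as the same spot.
    """
    # Group clicks per unit, in time order.
    by_unit = {}
    for c in sorted(clicks, key=lambda c: c.t):
        by_unit.setdefault(c.unit, []).append(c)

    kept = []
    for seq in by_unit.values():
        # Pass 1: drop clicks replaced by a different target before the
        # unit could have travelled more than eps.
        survivors = []
        for i, c in enumerate(seq):
            nxt = seq[i + 1] if i + 1 < len(seq) else None
            if nxt is not None:
                different = hypot(nxt.x - c.x, nxt.y - c.y) > tol
                if different and speed * (nxt.t - c.t) <= eps:
                    continue  # cancelled out by the next click
            survivors.append(c)

        # Pass 2: drop repeats of the most recently kept target.
        prev = None
        for c in survivors:
            if prev is not None and hypot(c.x - prev.x, c.y - prev.y) <= tol:
                continue  # idempotent repeat
            kept.append(c)
            prev = c

    return sorted(kept, key=lambda c: c.t)


demo = [
    Click(0.0, 1, 10, 10),
    Click(0.1, 1, 10, 10),   # repeat of the same spot
    Click(0.2, 1, 10, 10),   # repeat of the same spot
    Click(1.0, 2, 5, 5),
    Click(5.0, 1, 20, 20),   # replaced 0.05 s later
    Click(5.05, 1, 30, 30),
]
cleaned = filter_spam_clicks(demo, speed=2.0)
```

Using `speed * dt` as an upper bound on how far the unit could have moved is a conservative stand-in for actually re-simulating the game, so this would probably miss some spam rather than delete meaningful clicks.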
Interesting analysis here: