Number 3 is an interesting claim, but I would assume that, if this is true and DeepMind are aware of this, they would just find a way to erase the spam clicks from the human play database.
Yes, if it’s as simple as ‘spam clicks from imitation learning are too hard to wash out via self-play given the weak APM limits’, it should be relatively easy to fix. Add a very tiny penalty per click to incentivize efficiency, or preprocess the replay dataset: if a ‘spam click’ does nothing useful, it should be possible to replay through all the games, track which clicks actually make a gameplay difference, flag the ones that are either idempotent (eg multiple clicks on the same spot) or that cancel out (eg an order to move one place which is replaced by an order to move somewhere else before the unit has traveled more than epsilon distance), and filter the spam clicks out of the dataset.
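To make the preprocessing idea concrete, here is a minimal sketch of the two filtering rules (idempotent repeats and quickly-superseded orders) over a toy click log. Everything here is hypothetical: the `Click` record, the `speed` and `eps` parameters, and the use of `speed * elapsed` as a proxy for how far the unit could have moved are all assumptions, since a real implementation would replay the actual game engine to track unit positions.

```python
from dataclasses import dataclass

@dataclass
class Click:
    t: float    # game time of the click (hypothetical units)
    unit: int   # id of the unit receiving the order
    x: float    # ordered destination
    y: float

def filter_spam(clicks, speed=1.0, eps=0.5):
    """Drop clicks that are idempotent (same target as the unit's previous
    surviving order) or that are superseded by a new order before the unit
    could plausibly have moved more than eps distance."""
    kept = []
    last = {}  # unit id -> index into kept of that unit's latest order
    for c in clicks:
        i = last.get(c.unit)
        if i is not None:
            prev = kept[i]
            if prev.x == c.x and prev.y == c.y:
                continue  # idempotent repeat: drop the new click
            if speed * (c.t - prev.t) < eps:
                kept[i] = c  # previous order cancelled before taking effect
                continue     # -> replace it in place, i.e. it was spam
        last[c.unit] = len(kept)
        kept.append(c)
    return kept

clicks = [
    Click(0.0, 1, 5.0, 5.0),
    Click(0.1, 1, 5.0, 5.0),  # same spot: idempotent spam
    Click(0.2, 1, 9.0, 9.0),  # issued before the unit moved eps: cancels the first
    Click(5.0, 1, 2.0, 2.0),  # long after: a genuine new order
]
print(filter_spam(clicks))  # only the (9,9) and (2,2) orders survive
```

A real pipeline would need the engine to resolve each unit's actual position at click time, but the bookkeeping shape (one pass, one "latest order" slot per unit) would be the same.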