[Link] Did AlphaStar just click faster?
This is a linkpost for: https://medium.com/@aleksipietikinen/an-analysis-on-how-deepminds-starcraft-2-ai-s-superhuman-speed-could-be-a-band-aid-fix-for-the-1702fb8344d6.
tl;dr: AlphaStar clicked at a rate of 1000+ Actions Per Minute (APM) for five-second periods, and at a rate of 1500+ APM for fractions of a second. The fastest human players can’t sustain anything above 500 APM for more than a second or two. Did AlphaStar just spam click its way to victory?
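The burst-vs.-sustained distinction here comes down to the measurement window. A minimal sketch of a sliding-window peak-APM calculation (illustrative only; the timestamps and window size are my own assumptions, not DeepMind's actual methodology):

```python
from collections import deque

def peak_apm(action_times, window=5.0):
    """Peak actions-per-minute over any sliding window of `window` seconds.

    `action_times` is a sorted list of action timestamps in seconds.
    """
    q = deque()
    best = 0
    for t in action_times:
        q.append(t)
        # Drop actions that fall outside the trailing window.
        while q[0] <= t - window:
            q.popleft()
        best = max(best, len(q))
    # Scale the per-window count up to a per-minute rate.
    return best * (60.0 / window)

# A burst of 100 actions packed into five seconds reads as 1200 APM,
# even if the whole-game average is only a few hundred.
burst = [i * 0.05 for i in range(100)]
print(peak_apm(burst))  # 1200.0
```

The point is that a five-second window makes a short burst look enormous, while averaging over a full game hides it entirely.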
Well, it definitely may have had an advantage that embodied humans can’t have. “Does perfect Stalker micro really count as intelligence?”, we wail. But you have to remember that previous StarCraft bots playing with completely unrestricted APM weren’t even close to competitive level. I think the evidence is pretty strong that AlphaStar (at least the version without attention, which just perceived the whole map) could beat humans under whatever symmetric APM cap you want.
Love this bit.
This does not seem at all clear to me. Weren’t all the strategies using micro super-effectively? And apparently making other human-detectable mistakes? Seems possible that AlphaStar would win anyway without the micro, but not at all certain.
It was using micro effectively, but the crazy 1200+ APM fight was pretty unusual. If you look at most of its fights (e.g. https://youtu.be/H3MCb4W7-kM?t=2854 , APM number appears intermittently in the center-bottom ), with 6-10 units, it’s using about the same APM as the human—the micro advantage for 98% of the game isn’t because it’s clicking faster, its clicks are just better.
There were a bunch of mistakes in the first matches shown, but when they trained for twice as long it seemed like those mistakes mostly went away, and its macro play seemed within the range of skilled humans (if you’re willing to suspect that overbuilding probes might be good).
I don’t know anything about StarCraft, but the impression I got was that a few seconds of superhuman clicking in high leverage situations can mean a lot.
Agreed that this is a big improvement on previous StarCraft AIs no matter its clicking speed, but this seems like reason to doubt that AI has surpassed human strategy in StarCraft.
I think Charlie might be suggesting that AlphaStar would be superior to humans, even with only human or sub-human APM, because the precision of those actions would still be superhuman, even if the total number was slightly subhuman:
This wouldn’t necessarily mean that AlphaStar is better at strategy.
Inhuman burst APM allows AlphaStar to win stalemates far better than a human can with crazy micro, and it also uses strange, normally illogical strategies that capitalize on that ability (such as some Zerglings defeating a tank with superior micro). The AI no doubt has learned strategy and has a high level of intelligence in the game, but I don’t think AlphaStar would be nearly as competitive if it had a perfect simulation of human APM, to the point where it may not be safe to say that AlphaStar has outplayed any pro players so much as out-tech’d them. To parrot ESRogs, it’s not evident to me that AlphaStar could beat humans under any symmetric APM cap.
I think the most interesting part of the piece is the bit at the end where the author analyzes a misleading graph. Now that I understand the graph, it seems like strong evidence towards malicious misrepresentation or willful ignorance on the part of some subset (possibly quite small) of the AlphaStar team.
I think the article might benefit from comparisons to OpenAI’s Dota demonstration. I don’t remember anyone complaining about superhuman micro in that case. Did that team do something to combat superhuman APM, or is StarCraft just more vulnerable than Dota to superhuman-APM tactics?
Also discussed in https://www.lesswrong.com/posts/f3iXyQurcpwJfZTE9/alphastar-mastering-the-real-time-strategy-game-starcraft-ii
I think the right way to frame this is that AlphaStar has done the StarCraft 2 equivalent of mastering blitz chess. Blitz chess is easier for computers than slow chess because humans don’t cope well with time pressure. StarCraft 2 imposes fairly extreme time pressure by default; humans need to make multiple actions per second, which means they’re choosing between multi-action templates rather than optimizing each action separately. An interesting challenge for AI would be the StarCraft equivalent of slow chess: StarCraft played at 1/10th speed.
There’s a “simple” fix for this. Make the computer play with the same user interface the human does: a screen, keyboard, and mouse. I don’t know if robots can control a computer mouse with superhuman precision or not, but it certainly would keep it from making moves all over the map at the same time...
My takeaway is that we should actually only be counting EPM in these matches, rather than APM, and counting most or all of AlphaStar’s clicks as effective.
Above and beyond its APM, it was doing things humans physically can’t do with the camera in order to micro perfectly on three effective screens at once, with effectively infinite mouse speed and no input lag.
Basically, point #2 in the article dominates as the explanation for how it won.
Interesting article. It argues that the AI learned spam clicking from human replays, then needed its APM cap raised to prevent spam clicking from eating up all of its APM budget and inhibiting learning. Therefore, it was permitted to use inhumanly high burst APM, and with all its clicks potentially effective actions instead of spam, its effective actions per minute (EPM, actions not counting spam clicks) are going to outclass human pros to the point of breaking the game and rendering actual strategy redundant.
Except that if it’s spamming, those clicks aren’t effective actions, and if those clicks are effective actions, it’s not spamming. To the extent Alphastar spams, its superhuman APM is misleading, and the match is fairer than it might otherwise appear. To the extent that it’s using high burst EPM instead, that can potentially turn the game into a micro match rather than the strategy match that people are more interested in. But that isn’t a question of spam clicking.
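The APM-versus-EPM distinction above can be made concrete with a toy spam filter. This is only a sketch under an assumed heuristic (repeats of the same action within a short interval count as spam); neither the article nor DeepMind specifies how spam clicks were actually classified:

```python
def effective_apm(actions, spam_window=0.3):
    """Rough EPM estimate: drop repeats of the same action within
    `spam_window` seconds, then compute the per-minute rate.

    `actions` is a list of (timestamp_seconds, action_id) pairs,
    sorted by timestamp. The spam heuristic is an illustrative
    assumption, not the article's (or DeepMind's) definition.
    """
    last_counted = {}  # action_id -> timestamp of last counted instance
    effective = 0
    for t, a in actions:
        if a not in last_counted or t - last_counted[a] > spam_window:
            effective += 1
            last_counted[a] = t
    duration_min = max((actions[-1][0] - actions[0][0]) / 60.0, 1e-9)
    return effective / duration_min

# Three rapid repeated "move" clicks collapse to one effective action:
acts = [(0.0, "move"), (0.1, "move"), (0.2, "move"),
        (30.0, "attack"), (60.0, "move")]
print(effective_apm(acts))  # 3.0
```

Under any such heuristic the two readings of the same raw APM number come apart: spam inflates APM without raising EPM, while genuine burst micro raises both.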
Of course, if it started spam clicking, needed the APM cap raised, converted its spam into actual high EPM, and DeepMind didn’t lower the cap afterwards, then the article’s objection holds true. But that didn’t sound like what it was arguing (though perhaps I misunderstood it). Indeed, it seems to argue the reverse: that spam clicking was so ingrained that the AI never broke the habit.