They are now explaining this in the Reddit AMA: “We are capping APM. Blizzard in game APM applies some multipliers to some actions, that’s why you are seeing a higher number. https://github.com/deepmind/pysc2/blob/master/docs/environment.md#apm-calculation”
“We consulted with TLO and Blizzard about APMs, and also added a hard limit to APMs. In particular, we set a maximum of 600 APMs over 5 second periods, 400 over 15 second periods, 320 over 30 second periods, and 300 over 60 second period. If the agent issues more actions in such periods, we drop / ignore the actions. ”
“Our network has about 70M parameters.”
AMA: https://www.reddit.com/r/MachineLearning/comments/ajgzoc/we_are_oriol_vinyals_and_david_silver_from/eexs0pd/?context=3
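For concreteness, the multi-window cap quoted above amounts to a sliding-window rate limiter: 600 APM over 5 seconds is 50 actions, 400 over 15 seconds is 100, 320 over 30 seconds is 160, and 300 over 60 seconds is 300. Here is a minimal sketch of that scheme (my own illustration based on the AMA description, not DeepMind’s actual code):

```python
from collections import deque

# Actions-per-minute limits converted to actions-per-window:
# 600 APM / 5 s -> 50 actions, 400 / 15 s -> 100, 320 / 30 s -> 160, 300 / 60 s -> 300.
LIMITS = [(5, 50), (15, 100), (30, 160), (60, 300)]  # (window seconds, max actions)

class ApmCapper:
    def __init__(self, limits=LIMITS):
        self.limits = limits
        self.history = deque()  # timestamps of accepted actions

    def try_action(self, t):
        """Accept the action at time t only if every window limit allows it;
        otherwise drop it, mirroring the 'we drop / ignore the actions' rule."""
        horizon = max(w for w, _ in self.limits)
        while self.history and t - self.history[0] >= horizon:
            self.history.popleft()  # forget actions outside the largest window
        for window, max_actions in self.limits:
            if sum(1 for ts in self.history if t - ts < window) >= max_actions:
                return False  # over the cap for this window: drop the action
        self.history.append(t)
        return True

capper = ApmCapper()
# A 1-second burst of 60 attempted actions: only the first 50 get through,
# because 600 APM over 5 seconds allows at most 50 actions in any 5-second span.
accepted = sum(capper.try_action(i / 60) for i in range(60))
print(accepted)  # 50
```

Note that under this kind of cap, short bursts well above 300 APM are perfectly legal as long as each window’s budget is respected, which is exactly the property the discussion below turns on.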
That explanation doesn’t really absolve them very much in my mind. If you read people’s responses to that comment on Reddit, it seems clear that AlphaStar still largely won by being faster and more accurate at crucial moments (a human couldn’t duplicate the strategies AlphaStar used because they can’t perform 50 accurate actions in 5 seconds), and the APM comparison graph was trying to make people think the opposite. See this one for example:

Statistics aside, it was clear from the gamers’, presenters’, and audience’s shocked reaction to the Stalker micro, all saying that no human player in the world could do what AlphaStar was doing. Using just-beside-the-point statistics is obfuscation and an avoidance of acknowledging this.

AlphaStar wasn’t outsmarting the humans—it’s not like TLO and MaNa slapped their foreheads and said, “I wish I’d thought of microing Stalkers that fast! Genius!”
Is there a reason to assume that DeepMind is being intentionally deceptive rather than making a well-intentioned mistake? After all, they claim that they consulted with TLO on the APM rate, which is something they seem unlikely to lie about, since it would be easy for TLO to dispute if it were untrue. So presumably TLO felt that the caps they instituted were fair, and it would have been reasonable for DeepMind to trust a top player’s judgment on that, none of them (AFAIK) being top players themselves. And even with the existing caps, people felt that the version which played TLO would likely have lost to MaNa, and the camera-limited version did actually lose to him. So it seems reasonable for someone to think that a good enough player would still beat versions of AlphaStar even with the existing limits, and thus that they didn’t give it an unfair advantage.
On the other hand, while I can think of plenty of reasons why DeepMind might have honestly thought that this setup was fair, I can’t think of any good reason for them to decide to be intentionally deceptive in this way. They’ve been open about their agent currently only playing Protoss vs. Protoss on a single map and about earlier versions seeing the whole map, and five of their games were played against a non-Protoss player. If they honestly felt that they only won because of the higher APM rate, then I don’t see why they wouldn’t just admit that the same way they’re forthcoming about all of the system’s other limitations.
I think it’s quite possible that when they instituted the cap, they thought it was fair. However, from the actual gameplay it should be obvious to anyone who is even somewhat familiar with StarCraft II (e.g., many members of the AlphaStar team) that AlphaStar had a large advantage in “micro”, which in part came from the APM cap still allowing superhumanly fast and accurate actions at crucial times. It’s also possible that the blog post and misleading APM comparison graph were written by someone who did not realize this, but then those who did realize it should have objected and had it changed once they noticed.
Possibly you’re not objecting to what I’m suggesting above but to the usage of “intentionally deceptive” to describe it. If so, I think you may have a point in that there may not have been any single human who had an intention to be deceptive (e.g., maybe those at DM who realized that the graph would be misleading lacked the power or incentives to make changes to it), in which case perhaps it doesn’t make sense to attribute deceptive intention to the organization. Maybe “knowingly misleading” would be a better phrase there to describe the ethical failure (assuming you agree that the episode most likely does constitute a kind of ethical failure)?
ETA: I note that many people on places like Hacker News and Reddit have pointed out the misleading nature of the APM comparison graph, with virtually no one defending DM on that point, so anyone at DM who is following those discussions should have realized it by now, but no one has come out and either admitted a mistake or offered a reasonable explanation. Multiple people have also concluded that it was intentional deception (which does seem like a strong possibility to me too, since I think there’s a pretty high prior that the person who wrote the blog post would not be so unfamiliar with StarCraft II). Another piece of evidence I noticed is that during the YouTube video one of the lead researchers discussed the APM comparison with the host, and also mentioned at some point that he used to play StarCraft II. One Redditor describes it as: “It’s not just the graphs, but during the conversation with Artosis the researcher was manipulating him.”
It’s not so obvious to me that someone who realizes that AlphaStar is superior at “micro” should have objected to those graphs.
Think about it like this—you’re on the DeepMind team, developing AlphaStar, and the whole point is to make it superhuman at StarCraft. So there’s going to be some part of the game that it’s superhuman at, and to some extent this will be “unfair” to humans. The team decided to try not to let AlphaStar have “physical” advantages, but I don’t see any indication that they explicitly decided that it should not be better at “micro” or unit control in general, and should only win on “strategy”.
Also, separating “micro” from “strategy” is probably not that simple for a model-free RL system like this. So I think they made a very reasonable decision to focus on a relatively easy-to-measure APM metric. When the resulting system doesn’t play exactly as humans do, or in a way that would be easy for humans to replicate, to me it doesn’t seem so-obvious-that-you’re-being-deceptive-if-you-don’t-notice-it that this is “unfair” and that you should go back to the drawing board with your handicapping system.
It seems to me that which ways for AlphaStar to be superhuman are “fair” or “unfair” is to some extent a matter of taste, and there will be many cases that are ambiguous. To give a non-“micro” example—suppose AlphaStar is able to keep better track than a human can of exactly how many units its opponent has (and at what hit-point levels) throughout the game, and this allows it to make slightly more fine-grained decisions about which units it should produce. This might allow it to win a game in a way that’s not replicable by humans. It didn’t find a new strategy—it just executed better. Is that fair or unfair? It feels maybe less unfair than just being super good at micro, but exactly where the dividing line is between “interesting” and “uninteresting” ways of winning seems not super clear.
Of course, now that a much broader group of StarCraft players has seen these games, and a consensus has emerged that this super-micro does not really seem fair, it would be weird if DeepMind did not take that into account for its next release. I will be quite surprised if they don’t adjust their setup to reduce the micro advantage going forward.
This is not the complaint that people (including me) have. Instead the complaint is that, given it’s clear that AlphaStar won mostly through micro, that graph highlighted statistics (i.e., average APM over the whole game, including humans spamming keys to keep their fingers warm) that would be irrelevant to SC2 experts for judging whether or not AlphaStar did win through micro, but would reliably mislead non-experts into thinking “no” on that question. Both of these effects should have been easy to foresee.
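To make the statistical complaint concrete with made-up numbers (purely illustrative, not measured from the actual games): a whole-game average can look human-typical even when short bursts are far beyond human ability.

```python
# Hypothetical per-second action counts for a 60-second stretch of play:
# a modest baseline rate plus one 5-second burst of superhuman micro.
baseline = [3] * 55   # 3 actions/sec ~= 180 APM, within human range
burst = [10] * 5      # 10 actions/sec = 600 APM during a key fight
actions = baseline + burst

avg_apm = sum(actions) / len(actions) * 60  # whole-game average, scaled to APM
# best 5-second window, scaled to APM (12 five-second windows per minute)
peak_5s_apm = max(sum(actions[i:i + 5]) for i in range(len(actions) - 4)) * 12

print(avg_apm)      # 215.0 -- looks unremarkable on an averaged graph
print(peak_5s_apm)  # 600   -- the burst that actually decides the fight
```

An averaged-APM graph reports only the first number; the expert complaint is precisely about the second.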
They’ve received a bunch of favorable publicity as a result.
It seems very unlikely to me that they would have gotten any less publicity if they’d reported the APM restrictions any differently. (After all, they didn’t get any less publicity for reporting the system’s other limitations either, like it only being able to play Protoss v. Protoss on a single map, or 10/11 of the agents having whole-camera vision.)
They might well have gotten less publicity due to emphasizing those facts as much as they did.
To me, DeepMind is simply trying to paint themselves in the best light. I’m not particularly surprised by the behavior; I would expect it from a for-profit company looking to get PR. Nor am I particularly upset about it; I don’t see any outright lying going on, merely an attempt to frame the facts in the best possible way for them.
Ok, I read it too after my comment above… And I thought that when a future evil superintelligence starts shooting at people in the streets, the same commenters will say: “No, it is not a superintelligence, it is just good at the tactical use of guns, and it just knows where humans are located and never misses, but its strategy is awful.” Or, in other words:
Weak strategy + perfect skills = dangerous AI