I know more about StarCraft than I do about AI, so I could be off base, but here’s my best attempt at an explanation:
As a human, you can understand that a factory gets in the way of a unit, and if you lift it, it will no longer be in the way. The AI doesn’t understand this. The AI learns by playing through scenarios millions of times and learning that on average, in scenarios like this one, it gets an advantage when it performs this action. The AI has a much easier time learning something like “I should make a marine” (which it perceives as a single action) than “I should place my buildings such that all my units can get out of my base”, which requires making a series of correct choices about where to place buildings when the conceivable space of building placement has thousands of options.
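To make that concrete, here's a toy sketch in Python of what "learn which actions correlate with winning" looks like. This is nothing like AlphaStar's actual training code; the situation keys, the `update`/`choose` functions, and the epsilon value are all made up for illustration:

```python
# Toy sketch, not AlphaStar's real training code: the agent just keeps a
# running average of "how did games go when I took this action in this kind
# of situation?" and leans toward whatever correlated with winning. There is
# no model anywhere of *why* the action helps.
from collections import defaultdict
import random

value = defaultdict(float)   # average outcome per (situation, action)
count = defaultdict(int)

def update(situation, action, outcome):
    """outcome: +1 for a win, -1 for a loss."""
    key = (situation, action)
    count[key] += 1
    value[key] += (outcome - value[key]) / count[key]  # incremental mean

def choose(situation, actions, epsilon=0.1):
    """Mostly pick the action with the best average outcome so far."""
    if random.random() < epsilon:
        return random.choice(actions)  # occasional blind exploration
    return max(actions, key=lambda a: value[(situation, a)])

# "Raise the depots when attacked" ends up with a high average value because
# it co-occurs with winning in replays and self-play, even in states where
# raising them no longer keeps anyone out.
```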
You could see this more broadly in the Terran AI, where it knows the general concept of putting buildings in front of its base (which it probably learned via imitation learning from watching human games), but it doesn't actually understand why it should be doing that, so it does a bad job. For example, in this game, you can see that the AI has learned:
1. I should build supply depots in front of my base.
2. If I get attacked, I should raise the supply depots.
But it doesn’t actually understand the reasoning behind these two things, which is that raising the supply depots is supposed to prevent the enemy units from running into your base. So this results in a comical situation where the AI doesn’t actually have a proper wall, allowing the enemy units to run in, and then it raises the supply depots after they’ve already run in. In short, it learns what actions are correlated with winning games, but it doesn’t know why, so it doesn’t always use these actions in the right ways.
Why is this AI still able to beat strong players? I think the main reason is that it's so good at making the right units at the right times without missing a beat. Unlike humans, it never forgets to build units or gets distracted. Because it's so good at execution, it can afford to do dumb stuff like accidentally trapping its own units. I suspect that if you gave a pro player the chance to play against AlphaStar 100 times in a row, they would eventually figure out a way to trick the AI into making game-losing mistakes over and over. (Pro player TLO said that he practiced against AlphaStar many times while it was in development, but he didn't say much about how the games went.)
Exactly. It seems like you need something beyond present-day imitation learning and deep reinforcement learning to efficiently learn strategies whose individual components don't benefit you on their own, but which have a major effect when assembled together correctly.
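A back-of-the-envelope way to see why those all-or-nothing strategies are so hard to pick up from trials alone (toy Python; the probabilities are invented purely for illustration, not measured from anything):

```python
# If a proper wall needs k building placements to all be "right", and blind
# exploration gets each one right with probability p, then the wall only pays
# off (and so only gets reinforced) in the games where all k line up at once.
def chance_full_wall(p: float, k: int) -> float:
    return p ** k

for k in (1, 3, 5):
    print(f"{k} placement(s): ~{chance_full_wall(0.2, k):.5f} of games reinforce the wall")
```

A single-action habit like "make a marine" gets credit in almost every game, while the full wall only gets credit on the vanishingly rare games where every placement happens to be right, so there's almost no signal pushing the policy toward it.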
(That said, don't underestimate gradient descent with huge numbers of trials: the genetic version did evolve a complicated eye in such a way that every step was a fitness improvement; but the final design has a literal blind spot that could have been avoided if it had been engineered differently.)
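For the eye story, here's the abstract version as a toy greedy climber (the fitness landscape is invented and stands in for nothing biological): it only ever takes improving steps, which is exactly why it settles on the lesser peak and stays there.

```python
def fitness(x: float) -> float:
    # Invented landscape: a modest peak near x=1 and a much better one near x=4.
    return max(1.0 - (x - 1.0) ** 2, 3.0 - (x - 4.0) ** 2)

x = 0.0                          # start on the slope of the lesser peak
for _ in range(1000):
    for step in (+0.01, -0.01):
        if fitness(x + step) > fitness(x):
            x += step            # every accepted step is a strict improvement
            break
    else:
        break                    # no improving step left: stuck

print(round(x, 2), round(fitness(x), 4))   # ~1.0 and ~1.0; never finds fitness 3.0 at x=4
```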
Genetic algorithms also eventually evolved causal-reasoning agents: us. That's why it feels weird to me that we're once again relying on gradient descent to develop AI; it seems backwards.