The best I can do after thinking about it for a bit is compute every possible combination of units under 200 supply, multiply that by the possible positions of those units in space, multiply that by the possible combinations of buildings on the map and their potential locations in space, multiply that by the possible combinations of upgrades, multiply that by the amounts of resources remaining in all available mineral/vespene sources … I can already spot a few oversimplifications in what I just wrote, and I can think of even more things that need to be accounted for. The shields/hitpoints/energy of every unit. Combinatorially gigantic.
Just the number of potential positions of a single unit on the map is already huge.
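To put a very loose lower bound on "combinatorially gigantic", here's a quick back-of-envelope sketch in Python. Every number in it is a made-up round figure (the grid size and unit-type count are guesses, not actual game values), and it ignores almost everything listed above (buildings, upgrades, resources, hitpoints):

```python
# Back-of-envelope lower bound on the StarCraft II state space.
# All constants here are illustrative assumptions, not real game values.
from math import comb

MAP_CELLS = 150 * 150  # assume a coarse ~150x150 placement grid
UNIT_TYPES = 50        # rough guess at unit types for one race
MAX_UNITS = 200        # supply cap as a crude stand-in for unit count

# Positions alone: placing MAX_UNITS distinguishable units anywhere on the
# grid gives MAP_CELLS ** MAX_UNITS configurations -- before unit types,
# shields, energy, buildings, or upgrades enter the picture.
positions = MAP_CELLS ** MAX_UNITS
print(f"position configurations alone: ~10^{len(str(positions)) - 1}")

# Army compositions: multisets of at most MAX_UNITS units drawn from
# UNIT_TYPES types, counted via stars and bars.
compositions = comb(UNIT_TYPES + MAX_UNITS, MAX_UNITS)
print(f"possible army compositions: ~10^{len(str(compositions)) - 1}")
```

Under those toy assumptions, the positions of a full 200-supply army alone come out around 10^870, which is the point: even a wildly simplified slice of the state dwarfs the ~10^170 states of Go.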
But AlphaStar doesn’t really explore much of this space. It finds out pretty quickly that there’s really no reason to explore the parts of the space that involve placing random buildings in weird map locations. It explores and optimizes around the parts of the state space that look reasonably close to human play, because that was its starting point, and it’s not going to find superior strategies randomly, not without a lot of optimization in isolation.
That’s one thing I would love to see, actually. A version of the code trained purely on self-play, without a basis in human replays. Does it ever discover proxy plays or other esoteric cheese without a starting point provided in the human replays?
I expect that will be the next step; that’s how they approached the later versions of AlphaGo too.