The best I can do after thinking about it for a bit is compute every possible combination of units under 200 supply, multiply that by the possible positions of those units in space, multiply that by the possible combinations of buildings on the map and their potential locations in space, multiply that by the possible combinations of upgrades, multiply that by the amounts of resources remaining in all available mineral/vespene sources … I can already spot a few oversimplifications in what I just wrote, and I can think of even more things that need to be accounted for. The shields/hitpoints/energy of every unit. Combinatorially gigantic.
Just the number of potential positions of a single unit on the map is already huge.
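To put a very loose lower bound on "combinatorially gigantic", here's a quick back-of-envelope sketch in Python. Every number in it is a made-up round figure (the grid size and unit-type count are guesses, not actual game values), and it ignores almost everything listed above (buildings, upgrades, resources, hitpoints):

```python
# Back-of-envelope lower bound on the StarCraft II state space.
# All constants here are illustrative assumptions, not real game values.
from math import comb

MAP_CELLS = 150 * 150  # assume a coarse ~150x150 placement grid
UNIT_TYPES = 50        # rough guess at unit types for one race
MAX_UNITS = 200        # supply cap as a crude stand-in for unit count

# Positions alone: placing MAX_UNITS distinguishable units anywhere on the
# grid gives MAP_CELLS ** MAX_UNITS configurations -- before unit types,
# shields, energy, buildings, or upgrades enter the picture.
positions = MAP_CELLS ** MAX_UNITS
print(f"position configurations alone: ~10^{len(str(positions)) - 1}")

# Army compositions: multisets of at most MAX_UNITS units drawn from
# UNIT_TYPES types, counted via stars and bars.
compositions = comb(UNIT_TYPES + MAX_UNITS, MAX_UNITS)
print(f"possible army compositions: ~10^{len(str(compositions)) - 1}")
```

Under those toy assumptions, the positions of a full 200-supply army alone come out around 10^870, which is the point: even a wildly simplified slice of the state dwarfs the ~10^170 states of Go.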
But AlphaStar doesn’t really explore much of this space. It finds out pretty quickly that there’s really no reason to explore the parts of the space that involve placing random buildings in weird map locations. It explores and optimizes around the parts of the state space that look reasonably close to human play, because that was its starting point, and it’s not going to find superior strategies randomly, not without a lot of optimization in isolation.
That’s one thing I would love to see, actually. A version of the code trained purely on self-play, without a basis in human replays. Does it ever discover proxy plays or other esoteric cheese without a starting point provided in the human replays?
I expect that will be the next step; that’s how they approached the later versions of AlphaGo too.