My overall impression looking at this is still more or less summed up by what Francois Chollet said a bit ago.
Any problem can be treated as a pattern recognition problem if your training data covers a sufficiently dense sampling of the problem space. What’s interesting is what happens when your training data is a sparse sampling of the space—to extrapolate, you will need intelligence.
Whether an AI that plays StarCraft, DotA, or Overwatch succeeds or fails against top players, we’d have learned nothing from the outcome. Wins—congrats, you’ve trained on enough data. Fails—go back, train on 10x more games, add some bells & whistles to your setup, succeed.
Some of the stuff Deepmind talks about a lot—so, for instance, the AlphaLeague—seems like a clever technique simply designed to ensure that you have a sufficiently dense sampling of the space, which would normally not occur in a game with unstable equilibria. And this seems to me more like “clever technique applicable to domain where we can generate infinite data through self-play” than “stepping stone on way to AGI.”
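(For concreteness, here is a toy sketch of the league idea as I understand it; this is my own illustration, not DeepMind's code, and the names and numbers are made up: keep a population of frozen past agents and keep sampling opponents from it, so the learner keeps re-encountering strategies that plain two-player self-play would drift away from.)

```python
# Toy sketch of league-style self-play (my illustration, not DeepMind's code):
# keep a growing population of frozen past agents and sample opponents from it,
# so the learner keeps seeing strategies it would otherwise "forget" as the
# current equilibrium shifts.
import copy
import random

class LeagueTrainer:
    def __init__(self, initial_agent, snapshot_every=100):
        self.learner = initial_agent                      # agent being trained
        self.league = [copy.deepcopy(initial_agent)]      # frozen historical opponents
        self.snapshot_every = snapshot_every

    def sample_opponent(self):
        # Mix current self-play with matches against older snapshots;
        # the mixture is what keeps the sampling of strategy space dense.
        if random.random() < 0.5:
            return self.learner
        return random.choice(self.league)

    def train(self, steps, play_and_update):
        for step in range(1, steps + 1):
            opponent = self.sample_opponent()
            play_and_update(self.learner, opponent)       # one game plus a gradient update
            if step % self.snapshot_every == 0:
                self.league.append(copy.deepcopy(self.learner))
```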
That being said, I haven’t yet read through all the papers in the blog post, and I’d be curious which of them people think might be / definitely are potential steps towards actually engineering intelligence.
Before now, it wasn’t immediately obvious that SC2 is a game that can be played superhumanly well without anything that looks like long-term planning or counterfactual reasoning. The way humans play it relies on a combination of past experience, narrow skills, and “what-if” mental simulation of the opponent. Building a superhuman SC2 agent out of nothing more than LSTM units indicates that you can completely do away with planning, even when the action space is very large, even when the state space is VERY large, even when the possibilities are combinatorially enormous. Yes, humans can get good at SC2 with much less than 200 years of time played (although those humans are usually studying the replays of other masters to bootstrap their understanding), but I think it’s worthwhile to focus on the inverse of this observation: that a sophisticated problem domain which looks like it ought to require planning and model-based counterfactual reasoning actually requires no such thing. What other problem domains seem like they ought to require planning and counterfactual reasoning, but can probably be conquered with nothing more advanced than a deep LSTM network?
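To make “nothing more than LSTM units” concrete, this is roughly the shape of policy I mean; a minimal illustrative sketch with made-up sizes and names (the real AlphaStar network is far more elaborate), whose point is that there is no search, rollout, or explicit opponent model anywhere in it:

```python
# Minimal sketch of a recurrent policy with no planning component
# (illustrative only: sizes and names are made up, and the real AlphaStar
# network is far more elaborate than this).
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim=512, hidden_dim=256, num_actions=100):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)   # encode the current observation
        self.lstm = nn.LSTM(hidden_dim, hidden_dim)     # memory over the game so far
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, obs_seq, state=None):
        # obs_seq: (time, batch, obs_dim). No search, no rollouts, no explicit
        # opponent model: observation history in, action logits out.
        x = torch.relu(self.encoder(obs_seq))
        x, state = self.lstm(x, state)
        return self.action_head(x), state
```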
(I haven’t seen anyone bother to compute an estimate of the size of the state-space of SC2 relative to, for example, Go or Chess, and I’m not sure if there’s even a coherent way to go about it.)
Now that is a metric I would be interested to see. It feels like the answer is obviously that there is a coherent way to go about it, otherwise the same techniques could not have been used to explore both spaces.
I wonder if they could just have AlphaStar count states as it goes.
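Mechanically that seems easy to bolt on; something like the toy sketch below (my own illustration, not anything from the paper), though it only counts states the agent actually visits rather than the size of the space, and at SC2 scale you’d want a probabilistic counter such as HyperLogLog instead of an exact set:

```python
# Toy sketch of "count states as it goes": hash every observation the agent
# actually sees and keep the set of distinct hashes. This measures visited
# states only, not the size of the space, and at SC2 scale the exact set
# would have to be replaced by a probabilistic counter.
import hashlib

class StateCounter:
    def __init__(self):
        self.seen = set()

    def observe(self, observation_bytes: bytes):
        self.seen.add(hashlib.sha1(observation_bytes).digest())

    def count(self) -> int:
        return len(self.seen)
```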
The best I can do after thinking about it for a bit is compute every possible combination of units under 200 supply, multiply that by the possible positions of those units in space, multiply that by the possible combinations of buildings on the map and their potential locations in space, multiply that by the possible combinations of upgrades, multiply that by the amount of resources in all available mineral/vespene sources … I can already spot a few oversimplifications in what I just wrote, and I can think of even more things that need to be accounted for. The shields/hitpoints/energy of every unit. Combinatorially gigantic.
Just the number of potential positions of a single unit on the map is already huge.
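To show how quickly that multiplication blows up, here is a crude back-of-the-envelope sketch. Every number in it is a made-up placeholder and the factors are not independent, so it badly overcounts, but the exponents stack up regardless:

```python
# Crude upper bound in the spirit of the multiplication described above.
# Every number is a made-up placeholder and the factors are not independent,
# so this badly overcounts; the point is only how fast the exponents stack up.
from math import comb, log10

map_cells = 200 * 200     # coarse grid of possible unit positions (placeholder)
unit_types = 50           # distinct unit types (placeholder)
max_units = 200           # pretend "200 supply" means exactly 200 units

# Ways to pick which units exist (a multiset of types), times where each one stands.
unit_type_combos = comb(max_units + unit_types - 1, unit_types - 1)
unit_positions = map_cells ** max_units

states = unit_type_combos * unit_positions
print(f"~10^{log10(states):.0f} states from unit composition and position alone")
```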
But AlphaStar doesn’t really explore much of this space. It finds out pretty quickly that there’s really no reason to explore the parts of the space that involve putting random buildings in weird map locations. It explores and optimizes around the parts of the state space that look reasonably close to human play, because that was its starting point, and it’s not going to find superior strategies randomly, not without a lot of optimization in isolation.
That’s one thing I would love to see, actually. A version of the code trained purely on self-play, without a basis in human replays. Does it ever discover proxy plays or other esoteric cheese without a starting point provided in the human replays?
I expect that will be the next step; it was how they approached the successive versions of AlphaGo, too.
This assumes that human intelligence arises from something other than training on a very large dataset of books, movies, chats with parents, etc.
It’s worth noting that NLP took a big leap in 2018 through simple unsupervised/predictive training on large text corpuses to build text embeddings which encode a lot of semantic knowledge about the world.
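The objective behind those embeddings is about as simple as it gets; here is a minimal illustrative sketch with toy sizes (real 2018-era models like ELMo, GPT, and BERT differ in architecture and objective details), just predicting the next token and keeping the learned embeddings as a by-product:

```python
# Minimal sketch of predictive (next-token) training that yields embeddings
# as a by-product; toy sizes, and real 2018-era models differ substantially.
import torch
import torch.nn as nn

class TinyLanguageModel(nn.Module):
    def __init__(self, vocab_size=10000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)      # these embeddings are the payoff
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, time). Predict token t+1 from tokens up to t.
        hidden, _ = self.lstm(self.embed(tokens))
        return self.out(hidden)

def training_step(model, optimizer, tokens):
    logits = model(tokens[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```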