Thanks for this excellent write-up!
I don’t have relevant expertise in either AI or SC2, but I was wondering whether precision might still be a bigger mechanical advantage than the write-up notes. Even if humans can (say) max out at 150 ‘combat’ actions per minute, they might misclick, fail to pick out the right unit in a busy, fast battle to focus fire or trigger abilities, and so on. The AI presumably won’t have this problem. So even with similar EAPM (and after subdividing out ‘non-combat’ EAPM, which need not be so accurate), AlphaStar may still have a considerable mechanical advantage.
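As a toy illustration of how much even a small precision gap could matter (my own made-up model and numbers, not anything from the write-up): two otherwise identical armies trade damage, but one side wastes a fraction of its attacks on misclicks and wrong targets.

```python
# Toy Lanchester-style battle: both sides field identical units, but side A
# wastes a fraction of its attacks (misclicks, wrong targets, overkill).
# All constants are invented for illustration.
def fight(units_a, units_b, waste_a, waste_b, hp=100.0, dps=10.0):
    hp_a, hp_b = units_a * hp, units_b * hp
    while hp_a > 0 and hp_b > 0:
        alive_a, alive_b = hp_a / hp, hp_b / hp   # fractional surviving units
        hp_b -= alive_a * dps * (1 - waste_a)
        hp_a -= alive_b * dps * (1 - waste_b)
    return max(hp_a, 0) / hp, max(hp_b, 0) / hp   # surviving units per side

# Equal 20-unit armies; side A wastes 10% of its actions, side B wastes none.
print(fight(20, 20, waste_a=0.10, waste_b=0.0))
```

In this toy model the 10% precision handicap doesn’t cost you 10% of your army; it costs you the whole battle while the error-free side keeps several units alive, which is roughly the compounding effect I’m worried about.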
I’d also be interested in how important, beyond some (high) baseline, ‘decision making’ is at the highest levels of SC2 play. One worry I have is that, although decision-making is important (build orders, scouting, etc.), what decides many (most?) pro games is who can more effectively micro in the key battles, or who can best juggle all the macro/econ tasks (I’d guess some considerations in favour are that APM is very important, and that a lot of the units in SC2 are implicitly balanced by ‘human’ unit-control limitations). If so, unlike chess and Go, there may not be some deep strategic insight AlphaStar can uncover to give it the edge, and ‘beating humans fairly’ is essentially an exercise in getting the AI to fall within the band of ‘reasonably human’ play while still subtly exploiting enough of the ‘microable’ advantages to prevail.
If so, unlike chess and Go, there may not be some deep strategic insight AlphaStar can uncover to give it the edge
I think that’s where the central issue lies with games like StarCraft or Dota: their strategy space is perhaps not as rich and complex as we initially expected. That might be a good reason to update towards believing that the real world is less exploitable (i.e. technonormality?) as well. I don’t know.
However, I think it would be a mistake to write off these RTS games as “solved” in the AI community the same way chess and Go are, and move on to other problem domains. AlphaStar and OpenAI Five require hundreds of years of in-game experience to reach the level of top human professionals, and I don’t think that’s an “efficiency” problem at all.
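To put “hundreds of years” in perspective, here is a rough back-of-the-envelope calculation using DeepMind’s reported figure of roughly 200 years of game experience per agent over about 14 days of league training (the arithmetic is mine and only approximate):

```python
# Approximate scale of AlphaStar's training (reported figures, rounded).
years_of_experience = 200     # in-game experience per agent
wall_clock_days = 14          # approximate length of the league training run
game_days = years_of_experience * 365

# Equivalent number of real-time game instances running in parallel, 24/7.
print(game_days / wall_clock_days)   # ~5,200 -- vastly more practice than any human pro gets
```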
Additionally, in both cases there is implicit domain knowledge integrated into the training process. In the case of AlphaStar, the AI was first trained on human game data and, as the post mentions, competing agents are subdivided into strategy spaces defined by human experts:
Hundreds of versions of the AI play against each other, and the ones that perform best are selected to play against human players. Each one has its own set of units that it is incentivized to use via reinforcement learning, so that they each play with different strategies.
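A toy sketch of how such a per-agent unit incentive might be expressed as a shaped reward (this is my own illustration, not DeepMind’s actual implementation; all names and weights are hypothetical):

```python
# Hypothetical reward shaping: each agent in the population gets a small bonus
# for building the units it has been assigned to favor.
def shaped_reward(base_reward, units_produced, incentivized_units, bonus_weight=0.1):
    """units_produced: dict mapping unit name -> count built this game."""
    total = sum(units_produced.values())
    if total == 0:
        return base_reward
    favored = sum(n for unit, n in units_produced.items() if unit in incentivized_units)
    return base_reward + bonus_weight * (favored / total)

# Example: an agent nudged toward air-heavy Protoss play.
print(shaped_reward(1.0, {"Phoenix": 12, "Oracle": 3, "Zealot": 5}, {"Phoenix", "Oracle"}))
```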
In the case of OpenAI Five, the AI is still constrained to a small pool of heroes, the item choices are hard-coded by human experts, and it would never have discovered relatively straightforward strategies (defeating Roshan to receive a power-up, if you’re familiar with the game) were it not for the programmers incentivizing it during the training process. It also received the same skepticism in the gaming community (in fact, I’d say the mechanical advantage of OpenAI Five was even more blatant than AlphaStar’s).
This is not to belittle the achievements of the researchers; it’s just that I believe these games still provide fantastic testing grounds for future AI research, including paradigms outside deep reinforcement learning. In Dota, for example, one could change the game mode to single draft to force the AI out of a narrow strategy space that might have been optimal in the normal game.
In fact, I believe (~75% confidence) that the combinatorial space of heroes in a single-draft Dota game (and the corresponding space of optimal strategies) is so large that, without a paradigm shift at least as significant as the deep learning revolution, RL agents will never beat top professional humans within two orders of magnitude of the compute used by current research projects.
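For a rough sense of scale (hero counts here are approximate: Dota 2 had on the order of 115 heroes at the time, and OpenAI Five’s restricted pool was around 17):

```python
# Order-of-magnitude count of distinct 5v5 hero matchups (draft order and
# single-draft's per-player constraints ignored; this is only a rough bound).
from math import comb

full_pool = 115        # approximate total hero count
restricted_pool = 17   # roughly OpenAI Five's hero pool

full = comb(full_pool, 5) * comb(full_pool - 5, 5)
restricted = comb(restricted_pool, 5) * comb(restricted_pool - 5, 5)
print(f"full hero pool:  {full:.2e} matchups")
print(f"restricted pool: {restricted:.2e} matchups")
```

Even this crude count puts the full pool about ten orders of magnitude above the restricted one, before considering items, lanes, or timing.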
I’m not as familiar with StarCraft II, but I’m sure there are simple constraints one can put on the game to make it rich in strategy space for AIs as well.
I wonder if you could get around this problem by giving it a game interface more similar to the one humans use. Like, give it actual screen images instead of lists of objects, and have it move a mouse cursor using something equivalent to the dynamics of an arm, where the mouse has momentum and the AI has to apply forces to it. It might still have precision advantages, with enough training, but I bet it would level the playing field a bit.
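A minimal sketch of what that “mouse with momentum” constraint might look like (the dynamics and constants here are all made up for illustration):

```python
# Toy cursor with inertia: the agent outputs a bounded 2D force each tick,
# and the cursor integrates it with damping, so it cannot teleport to a target.
import numpy as np

class MomentumCursor:
    def __init__(self, damping=0.85, max_force=2.0, dt=1 / 60):
        self.pos = np.zeros(2)
        self.vel = np.zeros(2)
        self.damping, self.max_force, self.dt = damping, max_force, dt

    def step(self, force):
        force = np.clip(force, -self.max_force, self.max_force)
        self.vel = self.damping * self.vel + force * self.dt
        self.pos = self.pos + self.vel * self.dt
        return self.pos  # the agent only gets to "click" wherever this ends up

# To stop on a target, the agent must decelerate in advance, much like a hand.
cursor = MomentumCursor()
for _ in range(30):
    cursor.step(np.array([2.0, 0.0]))   # push right for half a second
print(cursor.pos)
```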
I don’t think this would be a worthwhile endeavor, because we already know that deep reinforcement learning can deal with these sorts of interface constraints, as shown by DeepMind’s older work. I would expect the agent’s behavior to converge towards that of the current AI, but requiring more compute.
I think the question is about making the compute requirements comparable. One of the criticisms of early AI work was that doing simple math on abstract things can seem very powerful if the abstractions are provided for you, whereas real humans have to extract the essential abstractions from the messy world. Consider a soldier robot that has to classify a humanoid as friend or foe as part of a decision about whether to shoot at it. That is a real subtask that handing it a magic “label” would unfairly circumvent. In nature, even imperfect camouflage can be valuable, and even if the animal is correctly identified as prey, delaying the detection event or making the hunter hesitate can be valuable.
Also, a game like QWOP is surprisingly difficult for humans, and giving a computer “just control over legs” would make the whole game trivial.
A lot of StarCraft technique also mirrors the game’s restrictions. Part of the point of control groups is to bypass screen-zoom limitations. In Supreme Commander, for example, some of those particular limitations do not exist, because you can zoom out to see the whole map at once and because directing attention to different parts of the battlefield has been made more convenient (or at least different: there are new problems, such as “dots fighting dots” making it hard to see micro considerations).
Maybe you’re right… My sense is that it would converge toward the behavior of the current AI, but slower, especially for movements that require a lot of accuracy. There might be a simpler way to add that constraint without wasting compute, though.