maximkazhenkov comments on The unexpected difficulty of comparing AlphaStar to humans

maximkazhenkov Sep 21, 2019, 7:59 PM
5 points
“You could argue that it’s not showcasing the skills we’re interested in, as it doesn’t need to put the same emphasis on long-term planning and outsmarting its opponent that evenly matched human players do. But that will also be the case if you put me against someone who’s never played the game.”
Interesting point. Would it be fair to say that, in a tournament match, a human pro player is behaving much more like a reinforcement learning agent than a general intelligence using System 2? In other words, the human player is also just executing reflexes he has gained through experience, and not coming up with ingenious novel strategies in the middle of a game.
I guess it was unreasonable from the beginning to complain about the lack of inductive reasoning and game-theoretic thinking in AlphaStar, since DeepMind is an RL company, and RL agents just don’t do that sort of stuff. But I think it’s fair to say that AlphaStar’s victory was much less satisfying than AlphaZero’s. It is not only unable to generalize across multiple RTS games, but also unable to explore the strategy space of a single game (hence the incentives to use certain units during training). I think we all expected to see perfect game sense and situation-dependent strategy choice, but instead, apparently, blink stalkers are the one build to rule them all.
- MathiasKB Sep 23, 2019, 6:25 PM
  6 points
  I think that’s a very fair way to put it, yes. One way this becomes very apparent is that you can have a conversation with a StarCraft player while he’s playing. It will be clear, however, that the player is not giving you his full attention at particularly demanding moments.
  Novel strategies are thought up in between games and refined through dozens of practice games. In the end you have a mental decision tree of how to respond to most situations that could arise. I haven’t played much chess, but I imagine this is how people approach chess openings as well.
  I considered using System 1 and System 2 analogies, but because of certain reservations I have about the dichotomy, I opted not to. Basically, I don’t think you can cleanly divide human intelligence into those two categories.
  Ask a StarCraft player why they made a certain maneuver and, for the most part, they will be able to tell you why, despite never having thought the reason out loud until you asked. There is some deep strategic thinking being done at the instinctual level. This intelligence is just as real as System 2 intelligence and should not be dismissed as mere reflexes.
  My central critique is essentially of the view that StarCraft ‘mechanics’ are unintelligent. Every small maneuver has a (most often implicit) reason for being made. StarCraft players are limited far less by their physical capabilities than by their ability to think fast enough. If we are interested in something other than what it looks like when someone can think at much higher speeds than humans, we should be picking a game other than StarCraft.
  - JenniferRM Dec 5, 2019, 8:45 PM
    29 points
    I think the abstract question of how to cognitively manage a “large action space” and “fog of war” is central here.
    In some sense StarCraft could be seen as turn-based, with each turn lasting for 1 microsecond, but this framing makes the action space of a beginning-to-end game *enormous*. Maybe not so enormous that a bigger data center couldn’t fix it? In some sense, brute force can eventually solve ANY problem tractable to a known “vaguely O(N*log(N))” algorithm.
    BUT facing “a limit that forces meta-cognition” is a key part of “the reason to apply AI to an RTS next, as opposed to a turn-based game.”
    If DeepMind solves it with “merely a bigger data center”, then there is a sense in which DeepMind may not yet have found the kinds of algorithms that deal with “nebulosity” as an explicit part of the action space (algorithms that numerous people, including me, expect to be widely useful in many domains).
    (Tangent: The Portia spider is relevant here because it seems that its whole schtick is that it scans with its (limited, but far seeing) eyes, builds up a model of the world via an accumulation of glances, re-uses (limited) neurons to slowly imagine a route through that space, and then follows the route to sneak up on other (similarly limited, but less “meta-cognitive”?) spiders which are its prey.)
    No matter how fast something can think or react, SOME game could hypothetically be invented that forces a finitely speedy mind to need action space compression and (maybe) even compression of compression choices. Also, the physical world itself appears to contain huge computational depths.
    In some sense then, the “idea of an AI getting good *at an RTS*” is an attempt (which might have failed or might be poorly motivated) to point at issues related to cognitive compression and meta-cognition. There is an implied research strategy aimed at learning to use a pragmatically finite mind to productively work on a pragmatically infinite challenge.
    The hunch is that maybe object level compression choices should always have the capacity to suggest not just a move IN THE GAME of doing certain things, but also a move IN THE MIND to re-parse the action space, compress it differently, and hope to bring a different (and more appropriate) set of “reflexes” to bear.
    The idea of a game with “fog of war” helps support this research vision. Some actions are pointless within the game but essential to ensuring that the game is “being understood correctly”, and game designers adding fog of war to a video game could be seen as attempting to represent this possibly universally inevitable cognitive limitation in a concretely-ludic symbolic form.
    If an AI is trained by programmers “to learn to play an RTS” but that AI doesn’t seem to be learning lessons about meta-cognition or clock/calendar management, then it feels a little bit like the AI is not learning what we hoped it was supposed to learn from “an RTS”.
    This is why these points made by maximkazhenkov in a neighboring comment are central:
    “The agents on [the public game] ladder don’t scout much and can’t react accordingly. They don’t tech switch midgame and some of them get utterly confused in ways a human wouldn’t.”
    I think this is conceptually linked (through the idea of having strategic access to the compression strategy currently employed) to this thing you said:
    “...you can have a conversation with a StarCraft player while he’s playing. It will be clear the player is not paying you his full attention at particularly demanding moments, however… I considered using system 1 and 2 analogies, but because of certain reservations I have with the dichotomy… [that said] there is some deep strategic thinking being done at the instinctual level. This intelligence is just as real as system 2 intelligence and should not be dismissed as being merely reflexes.”
    In the story about metacognition, verbal powers seem to come up over and over.
    I think a lot of people who think hard about this understand that “mere reflexes” are not mere (especially when deeply linked to a reasoning engine that has theories about reflexes).
    Also, I think that human meta-cognitive processes might reveal themselves to some degree in the apparent fact that a verbal summary can be generated by a human *in parallel without disrupting the “reflexes” very much*… then sometimes there is a pause in the verbalization while a player concentrates on <something>, and then the verbalization resumes (possibly with a summary of the ‘strategic meaning’ of the actions that just occurred).
    Arguably, to close the loop and make the system more like the general intelligence of a human, part of what should be happening is that any reasoning engine bolted onto the (constrained) reflex engine could be queried by ML programmers for advice about what kinds of “practice” or “training” need to be attempted next.
    The idea is that by *constraining* the “reflex engine” (to be INadequate for directly mastering the game) we might be forced to develop a reasoning engine for understanding the reflex engine and squeezing the most performance out of it in the face of constraints on what is known and how much time there is to correlate and integrate what is known.
    A decent “reflexive reasoning engine” (ie a reasoning engine focused on reflexive engines) might be able to nudge the reflex engine (every 1-30 seconds or so?) to do things that allow the reflex engine to scout brand new maps or change tech trees or do whatever else “seems meta-cognitively important”.
    A good reasoning engine might be able to DESIGN new maps that would stress test a specific reflex repertoire that it thinks it is currently bad at.
    A *great* reasoning engine might be able to predict in the first 30 seconds of a game that it is facing a “stronger player” (with a more relevant reflex engine for this game) such that it will probably lose the game for lack of “the right pre-computed way of thinking about the game”.
    A really FANTASTIC reflexive reasoning engine might even be able to notice a weaker opponent and then play a “teaching game” that shows that opponent a technique (a locally coherent part of the action space that is only sometimes relevant) that the opponent doesn’t understand yet, in a way that might cause the opponent’s own reflexive reasoning engine to understand its own weakness and be correctly motivated to practice a way to fix that weakness.
    (Tangent: To return to the earlier tangent about the Portia spider, it preyed on other spiders with similar limits. One of the fears here is that all this metacognition, when it occurs in nature, is often deployed in service of competition, either with other members of the same species or to catch prey. Giving these powers to software entities that ALREADY have better thinking hardware than humans in many ways… well… it certainly gives ME pause. Interesting to think about… but scary to imagine being deployed in the midst of WW3.)
    It sounds, Mathias, like you understand a lot of the centrality and depth of “trained reflexes” intuitively from familiarity with BOTH StarCraft and ML, and part of what I’m doing here is probably just restating large areas of agreement in a new way. Hopefully I am also pointing to other things that are relevant and unknown to some readers :-)
    “If what we really care about is proving that it can do long-term thinking and planning in a game with a large action space and imperfect information, why choose StarCraft? Why not select something like Frozen Synapse, where the only way to win is to fundamentally understand these concepts?”
    Personally, I did not know that Frozen Synapse existed before I read your comment here. I suspect a lot of people didn’t… and also I suspect that part of using StarCraft was simply for its PR value as a beloved RTS classic with a thriving pro scene and deep emotional engagement by many people.
    I’m going to go explore Frozen Synapse now. Thank you for calling my attention to it!