Richard Korzekwa comments on AlphaStar: Impressive for RL progress, not for AGI progress

Richard Korzekwa 14 Nov 2019 14:47 UTC
9 points
0
The replay for the match in that video is AlphaStarMid_042_TvT.SC2Replay, so it’s from the middle of training.

Here is the relevant screen capture: https://i.imgur.com/POFhzfj.png
- paulfchristiano 15 Nov 2019 3:33 UTC
  6 points
  0
  Thanks! That’s only marginally less surprising than the final RL policy, and I suspect the final RL policy will make the same kind of mistake. Seems like the OP’s example was legit and I overestimated the RL agent.
  - Richard Korzekwa 15 Nov 2019 14:14 UTC
    1 point
    0
    I’m not sure how surprised to be about the middle of training versus the final RL policy. Are you saying that this sort of mistake should be unlearned quickly in RL?
    - paulfchristiano 15 Nov 2019 17:12 UTC
      2 points
      0
      I don’t have a big difference in my model of mid vs. final. They have very similar MMR, the difference between them is pretty small in the scheme of things (e.g. probably smaller than the impact of doubling model size), and my picture isn’t refined enough to appreciate those differences. For any particular dumb mistake, I’d be surprised if the line between not making it and making it was in that particular doubling.