Thanks! That’s only marginally less surprising than the final RL policy, and I suspect the final RL policy will make the same kind of mistake. Seems like the OP’s example was legit and I overestimated the RL agent.
I’m not sure how surprised to be about middle of training, versus final RL policy. Are you saying that this sort of mistake should be learned quickly in RL?
I don’t have a big difference in my model of mid-training vs. final: they have very similar MMR, and the gap between them is pretty small in the scheme of things (e.g. probably smaller than the impact of doubling model size), so my picture isn’t refined enough to appreciate those differences. For any particular dumb mistake, I’d be surprised if the line between making it and not making it fell in that particular doubling.