Pattern comments on [AN #91]: Concepts, implementations, problems, and a benchmark for impact measurement

Pattern 20 Mar 2020 17:56 UTC
LW: 2 AF: 1
AF
The thing about Montezuma’s revenge and similar hard exploration tasks is that there’s only one trajectory you need to learn; and if you forget any part of it you fail drastically; I would by default expect this to be better than adversarial dynamics / populations at ensuring that the agent doesn’t forget things.
But is it easier to remember things if there’s more than one way to do them?
- Rohin Shah 21 Mar 2020 21:21 UTC
  LW: 2 AF: 2
  AF Parent
  Unclear, seems like it could go either way.
  If you aren’t forced to learn all the ways of doing the task, then you should expect the neural net to learn only one of the ways. So maybe it’s that the adversarial nature of OpenAI Five forced it to learn all the ways, and it was then paradoxically easier to remember all of the ways than just one of the ways.
  - Pattern 22 Mar 2020 6:57 UTC
    2 points
    Parent
    Intuitively, if you forget how to do something one way, but you remember how to do it other ways, then that could make figuring out the other way again easier, thought I don’t have a reason to suspect that would be the case for NNs/etc, and might depend on the specifics of the task.