The thing about Montezuma’s revenge and similar hard exploration tasks is that there’s only one trajectory you need to learn; and if you forget any part of it you fail drastically; I would by default expect this to be better than adversarial dynamics / populations at ensuring that the agent doesn’t forget things.
But is it easier to remember things if there’s more than one way to do them?
If you aren’t forced to learn all the ways of doing the task, then you should expect the neural net to learn only one of the ways. So maybe it’s that the adversarial nature of OpenAI Five forced it to learn all the ways, and it was then paradoxically easier to remember all of the ways than just one of the ways.
Intuitively, if you forget how to do something one way, but you remember how to do it other ways, then that could make figuring out the other way again easier, thought I don’t have a reason to suspect that would be the case for NNs/etc, and might depend on the specifics of the task.
But is it easier to remember things if there’s more than one way to do them?
Unclear, seems like it could go either way.
If you aren’t forced to learn all the ways of doing the task, then you should expect the neural net to learn only one of the ways. So maybe it’s that the adversarial nature of OpenAI Five forced it to learn all the ways, and it was then paradoxically easier to remember all of the ways than just one of the ways.
Intuitively, if you forget how to do something one way, but you remember how to do it other ways, then that could make figuring out the other way again easier, thought I don’t have a reason to suspect that would be the case for NNs/etc, and might depend on the specifics of the task.