I may be missing your point, but isn’t the fact that the Memento agent works on Montezuma’s Revenge evidence that learning is not generalizing across “sections” in Montezuma’s Revenge?
I was indicating that I hadn’t found the answer I sought (but I included those quotes because they seemed interesting, if unrelated).
Automatic / programmatic. See Section 4, whose first sentence is “To generalize this observation, we first propose a simple algorithm for selecting states associated with plateaus of the last agent.”
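For concreteness, here is a rough sketch of what “selecting states associated with plateaus of the last agent” could look like. The moving-average plateau test, the `window`/`tol` parameters, and the assumption that a restorable emulator state is saved per episode are my own guesses for illustration, not the paper’s actual procedure.

```python
import numpy as np

def select_plateau_state(episode_scores, episode_states, window=100, tol=1e-3):
    """Guess at a plateau detector: return the saved game state from the
    point where the agent's smoothed score stops improving.

    episode_scores[i] -- return achieved in episode i (assumed input)
    episode_states[i] -- restorable emulator state saved at episode i (assumed input)
    """
    scores = np.asarray(episode_scores, dtype=float)
    smoothed = np.convolve(scores, np.ones(window) / window, mode="valid")
    best_so_far = np.maximum.accumulate(smoothed)
    # First index after which the running-best score never improves by more than tol:
    # the agent has effectively reached its plateau by this point.
    for i in range(len(smoothed)):
        if best_so_far[-1] - best_so_far[i] < tol:
            return episode_states[i + window - 1]
    return episode_states[-1]
```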
Thanks for highlighting that. The reason I was interested is that I was thinking of the neural networks as being deployed to complete tasks rather than playing the entire game by themselves.
I ended up concluding that the game was being divided into ‘parts’ or epochs, each with its own agent deployed in sequence. The observation that “this method makes things easy as long as there’s no interference” is interesting when compared to multi-agent learning: the agents are on the same team, but cooperation doesn’t seem to be easy under these circumstances (or at least not an efficient strategy, given the computational constraints). It reminded me of my questions about those approaches, such as: Does freezing one agent for a round so it’s predictable, then training the other one (or having it play with a human), improve things? How can ‘learning to cooperate better’ be balanced with ‘continuing to be able to cooperate/coordinate with the other player’?
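To make the ‘freeze one, train the other’ question concrete, here is a minimal sketch of how such an alternating loop could look. The `Agent` stub, the Gym-style environment interface, and the round structure are hypothetical stand-ins I made up for illustration, not anything from the paper.

```python
class Agent:
    """Hypothetical two-player agent stub; act/update are placeholders."""

    def __init__(self, name):
        self.name = name
        self.frozen = False

    def act(self, obs):
        ...  # policy forward pass would go here

    def update(self, transition):
        if not self.frozen:
            ...  # gradient step; skipped entirely while frozen


def train_round(learner, partner, env, steps):
    """One round: the partner's policy is frozen, only the learner updates."""
    partner.frozen, learner.frozen = True, False
    obs = env.reset()
    for _ in range(steps):
        joint_action = (learner.act(obs), partner.act(obs))
        next_obs, reward, done, _ = env.step(joint_action)  # Gym-style env assumed
        learner.update((obs, joint_action, reward, next_obs))
        obs = env.reset() if done else next_obs


def alternate_training(agent_a, agent_b, env, rounds=10, steps_per_round=10_000):
    # Each agent gets a round of learning against a fixed, predictable partner.
    for _ in range(rounds):
        train_round(agent_a, partner=agent_b, env=env, steps=steps_per_round)
        train_round(agent_b, partner=agent_a, env=env, steps=steps_per_round)
```

One open question this sketch doesn’t answer is the balance I asked about: more rounds against a frozen partner make learning stable, but the frozen partner also stops adapting, so ‘cooperating better’ and ‘staying coordinated’ can pull in opposite directions.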