What really impressed me were the generalized strategies the agent applied across multiple situations/goals. E.g., “randomly move things around until something works” sounds simple, but learning to
apply that strategy contextually, to the appropriate objects,
fall back on it only in scenarios where you don’t have a better idea of what to do, and
stop immediately when you find something that works
is fairly difficult for deep RL agents to learn. I think of this work as giving the RL agents a toolbox of strategies that can be flexibly applied to different scenarios.
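To make that concrete, here is a minimal, hypothetical sketch of the “randomly move things around until something works, then stop” loop. Nothing here is from the paper; the toy slot-swapping puzzle and the names (`goal_satisfied`, `random_move`, etc.) are stand-ins I'm assuming for illustration.

```python
import random

def goal_satisfied(slots):
    # Hypothetical goal: the "cube" ends up on the slot named "platform".
    return slots.get("platform") == "cube"

def random_move(slots):
    # Pick two slots and swap their contents: the crude "move things around" step.
    a, b = random.sample(list(slots), 2)
    slots[a], slots[b] = slots[b], slots[a]

def trial_and_error(slots, max_steps=1000):
    for step in range(max_steps):
        if goal_satisfied(slots):
            return step          # stop immediately once something works
        random_move(slots)
    return None                  # gave up; the strategy didn't pay off here

slots = {"floor": "cube", "ramp": "sphere", "platform": "pyramid"}
print(trial_and_error(slots))
```

The hard part for a learned agent isn't the loop itself, it's the two conditions around it: knowing when this is the best available strategy, and recognizing success quickly enough to stop.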
I suspect that finetuning XLand-trained agents in other physical environments will give good results, because the XLand agents already know how to use relatively advanced strategies. Learning to apply the XLand strategies to a new physical environment will probably be easier than starting from scratch in that environment.
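As a rough illustration of what that head start could look like in practice, here is a hypothetical transfer setup: initialize the new-environment policy from the pretrained weights and fine-tune with a small learning rate, rather than starting from a random initialization. The architecture, dimensions, and hyperparameters below are my assumptions, not anything released with XLand.

```python
import torch
import torch.nn as nn

def make_policy(obs_dim=128, act_dim=16):
    # Stand-in policy network; the real agents are far larger and recurrent.
    return nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, act_dim))

pretrained = make_policy()                           # stands in for the XLand-trained agent
finetuned = make_policy()                            # policy for the new physical environment
finetuned.load_state_dict(pretrained.state_dict())   # start from the transferred weights

# Fine-tune with a smaller learning rate than from-scratch training, so the
# transferred strategies are adapted to the new environment rather than overwritten.
optimizer = torch.optim.Adam(finetuned.parameters(), lr=1e-5)
```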