What really impressed me were the generalized strategies the agent applied across multiple situations/goals. E.g., “randomly move things around until something works” sounds simple, but learning to
apply that strategy contextually, to the appropriate objects,
fall back on it only in scenarios where you don’t have a better idea of what to do, and
stop immediately when you find something that works
is fairly difficult for deep RL agents to learn. I think of this work as giving the RL agents a toolbox of strategies that can be flexibly applied to different scenarios.
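To make that concrete, here is a minimal, hypothetical sketch of the “randomly move things around until something works, then stop” loop. Nothing here is from the paper; the toy slot-swapping puzzle and the names (`goal_satisfied`, `random_move`, etc.) are stand-ins I'm assuming for illustration.

```python
import random

def goal_satisfied(slots):
    # Hypothetical goal: the "cube" ends up on the slot named "platform".
    return slots.get("platform") == "cube"

def random_move(slots):
    # Pick two slots and swap their contents: the crude "move things around" step.
    a, b = random.sample(list(slots), 2)
    slots[a], slots[b] = slots[b], slots[a]

def trial_and_error(slots, max_steps=1000):
    for step in range(max_steps):
        if goal_satisfied(slots):
            return step          # stop immediately once something works
        random_move(slots)
    return None                  # gave up; the strategy didn't pay off here

slots = {"floor": "cube", "ramp": "sphere", "platform": "pyramid"}
print(trial_and_error(slots))
```

The hard part for a learned agent isn't the loop itself, it's the two conditions around it: knowing when this is the best available strategy, and recognizing success quickly enough to stop.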
I suspect that finetuning XLand-trained agents in other physical environments will give good results, because the XLand agents already know how to use relatively advanced strategies. Learning to apply the XLand strategies to a new physical environment will probably be easier than starting from scratch in that environment.
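As a rough illustration of what that head start could look like in practice, here is a hypothetical transfer setup: initialize the new-environment policy from the pretrained weights and fine-tune with a small learning rate, rather than starting from a random initialization. The architecture, dimensions, and hyperparameters below are my assumptions, not anything released with XLand.

```python
import torch
import torch.nn as nn

def make_policy(obs_dim=128, act_dim=16):
    # Stand-in policy network; the real agents are far larger and recurrent.
    return nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, act_dim))

pretrained = make_policy()                           # stands in for the XLand-trained agent
finetuned = make_policy()                            # policy for the new physical environment
finetuned.load_state_dict(pretrained.state_dict())   # start from the transferred weights

# Fine-tune with a smaller learning rate than from-scratch training, so the
# transferred strategies are adapted to the new environment rather than overwritten.
optimizer = torch.optim.Adam(finetuned.parameters(), lr=1e-5)
```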