Very impressive results! I’m particularly glad to see text descriptions of the agents’ goals included in their inputs. It’s a step forward in training agents that flexibly follow human instructions.
However, it currently looks like the agents are just using the text instructions as a source of information about how to acquire reward from their explicit reward functions, so this approach won’t produce corrigible agents. Hopefully, we can combine XLand with something like the cooperative inverse reinforcement learning (CIRL) paradigm.
E.g., we could add a CIRL agent to the XLand environments whose objective is to assist the standard RL agents. Then we’d have:
- An RL agent
  - whose inputs are the text description of its goal and its RGB vision + other sensors
  - that gets direct reward signals
- A CIRL agent
  - whose inputs are the text description of the RL agent’s goals and the CIRL agent’s own RGB vision + other sensors
  - that has to infer the RL agent’s true reward from the RL agent’s behavior
Then, apply XLand’s open-ended training, where each RL agent has a variable number of CIRL agents assigned as assistants. Hopefully, we’ll get a CIRL agent that can receive instructions via text and watch the behavior of the agent it’s assisting to further refine its beliefs about its current objective.
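To make the proposed interface concrete, here is a minimal sketch of the observation/reward structure described above. All names and fields (`RLAgentObs`, `AssistantObs`, etc.) are my own invention for illustration, not from the XLand paper or any CIRL implementation:

```python
# Hypothetical observation/reward interface for the proposed two-agent setup.
from dataclasses import dataclass, field
from typing import Dict, List

import numpy as np


@dataclass
class RLAgentObs:
    goal_text: str                   # text description of its own goal
    rgb: np.ndarray                  # its RGB vision
    sensors: Dict[str, np.ndarray]   # other sensors


@dataclass
class AssistantObs:
    principal_goal_text: str         # the RL agent's goal text
    rgb: np.ndarray                  # the CIRL agent's own RGB vision
    sensors: Dict[str, np.ndarray]   # the CIRL agent's other sensors
    principal_trajectory: List[np.ndarray] = field(default_factory=list)
    # Note: no reward field -- the CIRL agent must infer the RL agent's
    # true reward from the trajectory above.


@dataclass
class RLAgentStep:
    obs: RLAgentObs
    reward: float                    # direct reward signal, RL agent only
```

Under open-ended training, each generated task would then produce one `RLAgentStep` stream plus a variable number of `AssistantObs` streams, one per assigned assistant.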
The summary says they use text, and a search for “text” in the paper turns up this on page 32:
“In these past works, the goal usually consists of the position of the agent or a target observation to reach, however some previous work uses text goals (Colas et al., 2020) for the agent similarly to this work.”
So I thought they provided goals as text. I’ll be disappointed if they don’t. Hopefully, future work will do so (and potentially use pretrained LMs to process the goal texts).
What’s the practical difference between “text” and one-hots of said “text”? One-hots are the standard way to input text into models; it is only recently that we have expected models to learn their preferred encoding of raw text (cf. transformers). By taking a small shortcut, the authors of this paper get to show off their agent work without loss of generality: one could still give one-hot instructions to an agent that is learning to act in real life.
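A toy illustration of that point: over a fixed instruction vocabulary, one-hot vectors and the raw strings carry the same information. The vocabulary and goal string below are made up for the example, not taken from the paper:

```python
import numpy as np

# Made-up instruction vocabulary for illustration only.
GOAL_VOCAB = ["hold", "purple", "sphere", "near", "yellow", "cube"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(GOAL_VOCAB)}


def one_hot_goal(goal_text: str) -> np.ndarray:
    """Encode a goal string as a sequence of one-hot vectors."""
    tokens = goal_text.lower().split()
    encoding = np.zeros((len(tokens), len(GOAL_VOCAB)), dtype=np.float32)
    for t, tok in enumerate(tokens):
        encoding[t, TOKEN_TO_ID[tok]] = 1.0
    return encoding


# "hold purple sphere near yellow cube" -> a (6, 6) one-hot matrix.
print(one_hot_goal("hold purple sphere near yellow cube"))

# Future work could instead embed the raw string with a frozen pretrained LM
# and feed that vector to the policy; the agent-side code would not need to
# change, only the goal encoder.
```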