“A supreme counterexample is the Decision Transformer, which can be used to run processes which achieve SOTA for offline reinforcement learning despite being trained on random trajectories.”
This is not true. The Decision Transformer paper doesn’t run any full-scale experiments on random data; it only gives a toy example with random data.
We actually ran experiments with Decision Transformer on random data from the D4RL offline RL suite. Specifically, we considered random data from the MuJoCo Gym tasks. We found that when it only has access to random data, Decision Transformer only achieves 4% of the performance that it can achieve when it has access to expert data. (See the D4RL Gym results in our Table 1, and compare “DT” on “random” to “medium-expert”.)
You also claim that GPT-like models achieve “SOTA performance in domains traditionally dominated by RL, like games.” You cite the paper “Multi-Game Decision Transformers” for this claim.
But, in Multi-Game Decision Transformers, reinforcement learning (specifically, a Q-learning variant called BCQ) trained on a single Atari game beats Decision Transformer trained on many Atari games. This is shown in Figure 1 of that paper. The authors of the paper don’t even claim that Decision Transformer beats RL. Instead, they write: “We are not striving for mastery or efficiency that game-specific agents can offer, as we believe we are still in early stages of this research agenda. Rather, we investigate whether the same trends observed in language and vision hold for large-scale generalist reinforcement learning agents.”
It may be that Decision Transformers are on a path to matching RL, but it’s important to know that this hasn’t yet happened. I’m also not aware of any work establishing scaling laws in RL.
Thanks for the correction. I’ll read the paper more closely and correct the post.