Basically agree: I think a model trained by maximum likelihood on offline data is less goal-directed than one trained by an iterative process that reinforces its own samples (i.e., online RL), but it's still somewhat goal-directed. To do a good job at maximum likelihood, it needs to simulate a goal-directed agent. On the other hand, it's mostly concerned with covering all possibilities, so the goal-directed reasoning isn't emphasized. With multiple iterations of reinforcing its own samples, though, the model can improve quality (and hence goal-directedness) at the expense of coverage/diversity.
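To make the contrast concrete, here's a toy sketch of the two regimes (entirely hypothetical: a single-step "policy" over tokens and a made-up reward, not any actual LM training setup). Maximum likelihood fits the spread of the offline data, while repeatedly reinforcing the model's own high-reward samples sharpens the distribution, trading coverage/diversity for quality:

```python
import torch

vocab = 8
logits = torch.zeros(vocab, requires_grad=True)   # toy one-step "policy" over tokens
opt = torch.optim.Adam([logits], lr=0.1)

# Offline data: spread over many tokens, standing in for a diverse demonstrator.
offline_data = torch.randint(0, vocab, (1024,))

def mle_step(batch):
    # Maximum likelihood: cover everything the data does.
    loss = -torch.distributions.Categorical(logits=logits).log_prob(batch).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def reinforce_step(reward_fn, n=64):
    # Online RL: sample from the current model and upweight its own
    # high-reward samples (REINFORCE with a mean baseline).
    dist = torch.distributions.Categorical(logits=logits)
    samples = dist.sample((n,))
    advantage = reward_fn(samples) - reward_fn(samples).mean()
    loss = -(advantage * dist.log_prob(samples)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

reward_fn = lambda s: (s == 3).float()   # made-up goal: emit token 3

for _ in range(200):
    mle_step(offline_data)               # entropy stays high: coverage
entropy_after_mle = torch.distributions.Categorical(logits=logits).entropy()

for _ in range(200):
    reinforce_step(reward_fn)            # entropy collapses: quality over diversity
entropy_after_rl = torch.distributions.Categorical(logits=logits).entropy()
print(entropy_after_mle.item(), entropy_after_rl.item())
```

In this toy version the entropy after the MLE phase is high (the model keeps covering all modes of the data), and it drops sharply once the model starts iterating on its own samples, which is the quality-for-diversity trade described above.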