I think the app is quite intuitive and useful if you have some base understanding of mechanistic interpretability. It would be great to also have something similar for TransformerLens.
In future directions, you write: “Decision Transformers are dissimilar to language models due to the presence of the RTG token which acts as a strong steering tool in its own right.” In which sense is the RTG not just another token in the input? We know that current language models learn to play chess and other games just from training on text. To extend this to BabyAI games, are you planning simply to translate the games, with RTG, state, and action rendered as text tokens, and include them in a larger text dataset? The text tokens could be human-understandable, or you could reuse tokens that are rarely used.
Thanks Simon, I’m glad you found the app intuitive :)
The RTG is just another token in the input, except that it has an especially strong relationship with the training distribution. It’s heavily predictive in a way other tokens aren’t because it’s derived from a labelled trajectory (it’s the reward remaining in the trajectory after that step).
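To make this concrete, here is a minimal sketch of how returns-to-go are typically computed in the Decision Transformer setup (undiscounted suffix sums of the per-step rewards; the function name is my own):

```python
def returns_to_go(rewards):
    """For each timestep t, sum the rewards from t to the end of the trajectory.

    This is the value the RTG token encodes: the reward still to come,
    which is why it is so strongly predictive of the rest of the trajectory.
    """
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]


# A sparse-reward trajectory (reward only at the final step) gives a
# constant RTG equal to the episode return:
print(returns_to_go([0.0, 0.0, 1.0]))  # [1.0, 1.0, 1.0]
```

At inference time the RTG is not computed from observed rewards but set by the user as a target return, which is what makes it a steering tool.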
For BabyAI, the idea would be to prepend to the trajectory an instruction made up of a limited vocabulary (see the BabyAI paper for their vocab). I would be pretty partial to throwing out the RTG and training a behavioral clone for a BabyAI model; it seems likely this would be easier to train. Since the goal of these models is to be useful for gaining understanding, I’d like to avoid reusing tokens, as that might complicate analysis later on.