Hi there, I'm the first author! Thanks for this very nice write-up. Regarding "mechanistically understand the inner workings of optimized Transformers that learn in-context": it's definitely fair to say that we do this only for self-attention-only Transformers! I also try to be careful and (hopefully consistently) claim this only for the simple problems we studied… I'm working on a v2 that includes language experiments, and I'm also trying to find a way to verify the hypotheses in pretrained models. Thanks again!
You're welcome; I'm glad you found the write-up useful.
Thank you for the good work.