Algon comments on Paper: Transformers learn in-context by gradient descent

Algon 16 Dec 2022 12:06 UTC
If true, it feels like stories of how models with attention learn to be deceptive are simpler than I thought they were.
EDIT: A somewhat enlightening Twitter thread by the authors.