If true, it feels like stories of how models with attention learn to be deceptive are simpler than I thought they were.
EDIT: A somewhat enlightening Twitter thread by the authors.