Douglas Summers-Stay comments on interpreting GPT: the logit lens

Douglas Summers-Stay 2 Sep 2020 16:08 UTC
Could you try a prompt that tells it to end a sentence with a particular word, and see how that word casts its influence back over the sentence? I know this works with GPT-3, but I don't really understand how it could.
- nostalgebraist 2 Sep 2020 16:22 UTC
  Interesting topic! I’m not confident this lens would reveal much about it (vs. attention maps or something), but it’s worth a try.
  I’d encourage you to try this yourself with the Colab notebook, since you presumably have more experience writing this kind of prompt than I do.