[Question] Transformer Mech Interp: Any visualizations?

Joyee Chen, Jan 18, 2023, 4:32 AM

3 points

After getting to the part of one of Neel Nanda’s interp demos that covers the Logit Lens and Layer Attribution, I'm having trouble visualizing them the way I could for simpler concepts (e.g. residual streams, which I drew out as my primary method of comprehension). Does anybody have good resources for illustrations? (I know Nanda had a great colorful run-through, but it was only at a high level and for encoder-decoder models, so it doesn't generalize.)
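For concreteness, here is a minimal sketch of what I understand the two ideas to compute, so it's clear what I'd like to see drawn. It is a toy under loud assumptions: the residual stream contributions and unembedding matrix are random numbers, the final layer norm is ignored, and every name (`components`, `W_U`, `CORRECT`, `WRONG`) is hypothetical rather than taken from the demo. The logit lens reads off the logit difference from the residual stream accumulated up to each layer; layer attribution reads off what each layer's own output adds to that difference.

```python
import random

random.seed(0)
D_MODEL, N_VOCAB, N_LAYERS = 8, 5, 4
CORRECT, WRONG = 2, 3  # hypothetical token ids for the right and wrong answer


def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# components[0] stands in for the embedding; components[i + 1] is what
# layer i writes into the residual stream (random toy values, not a real model).
components = [rand_vec(D_MODEL) for _ in range(N_LAYERS + 1)]

# Toy unembedding matrix, indexed W_U[d][v] (assumption: no layer norm before it).
W_U = [rand_vec(N_VOCAB) for _ in range(D_MODEL)]

# Direction in residual space that raises logit(CORRECT) - logit(WRONG).
diff_dir = [W_U[d][CORRECT] - W_U[d][WRONG] for d in range(D_MODEL)]

# Layer attribution: project each component's own output onto that direction.
attribution = [dot(c, diff_dir) for c in components]

# Logit lens: project the residual stream accumulated so far onto it.
lens = []
resid = [0.0] * D_MODEL
for c in components:
    resid = [r + x for r, x in zip(resid, c)]
    lens.append(dot(resid, diff_dir))

# Crude text "plot": one row per component, bar length tracks |attribution|.
labels = ["embed"] + [f"layer {i}" for i in range(N_LAYERS)]
for name, l, a in zip(labels, lens, attribution):
    bar = "#" * round(abs(a) * 2)
    print(f"{name:8s} lens={l:+7.2f} attr={a:+7.2f} {bar}")
```

In this linear toy the logit-lens curve is just the running sum of the per-layer attributions, which is the relationship I'd hope a good illustration would make visible.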

AI Interpretability (ML & AI)

No answers.

No comments.

  