Transformers (Tag)
Last edit: Feb 24, 2022, 11:01 AM by Vivek Hebbar
Striking Implications for Learning Theory, Interpretability — and Safety? · RogerDearnaley · Jan 5, 2024, 8:46 AM · 37 points · 4 comments · 2 min read · LW link

How LLMs are and are not myopic · janus · Jul 25, 2023, 2:19 AM · 134 points · 16 comments · 8 min read · LW link

Google’s PaLM-E: An Embodied Multimodal Language Model · SandXbox · Mar 7, 2023, 4:11 AM · 87 points · 7 comments · 1 min read · LW link (palm-e.github.io)

AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them · Roman Leventov · Dec 27, 2023, 2:51 PM · 33 points · 9 comments · 4 min read · LW link

Modern Transformers are AGI, and Human-Level · abramdemski · Mar 26, 2024, 5:46 PM · 219 points · 87 comments · 5 min read · LW link

[Question] If I ask an LLM to think step by step, how big are the steps? · ryan_b · Sep 13, 2024, 8:30 PM · 7 points · 1 comment · 1 min read · LW link

How Do Induction Heads Actually Work in Transformers With Finite Capacity? · Fabien Roger · Mar 23, 2023, 9:09 AM · 27 points · 0 comments · 5 min read · LW link

Residual stream norms grow exponentially over the forward pass · StefanHex and TurnTrout · May 7, 2023, 12:46 AM · 77 points · 24 comments · 11 min read · LW link

How fast can we perform a forward pass? · jsteinhardt · Jun 10, 2022, 11:30 PM · 53 points · 9 comments · 15 min read · LW link (bounded-regret.ghost.io)

Concrete Steps to Get Started in Transformer Mechanistic Interpretability · Neel Nanda · Dec 25, 2022, 10:21 PM · 57 points · 7 comments · 12 min read · LW link (www.neelnanda.io)

Tracr: Compiled Transformers as a Laboratory for Interpretability | DeepMind · DragonGod · Jan 13, 2023, 4:53 PM · 62 points · 12 comments · 1 min read · LW link (arxiv.org)

[Question] Barcoding LLM Training Data Subsets. Anyone trying this for interpretability? · right..enough? · Apr 13, 2024, 3:09 AM · 7 points · 0 comments · 7 min read · LW link

Transformers Represent Belief State Geometry in their Residual Stream · Adam Shai · Apr 16, 2024, 9:16 PM · 413 points · 100 comments · 12 min read · LW link

An interesting mathematical model of how LLMs work · Bill Benzon · Apr 30, 2024, 11:01 AM · 5 points · 0 comments · 1 min read · LW link

If language is for communication, what does that imply about LLMs? · Bill Benzon · May 12, 2024, 2:55 AM · 10 points · 0 comments · 1 min read · LW link

Exploring Llama-3-8B MLP Neurons · ntt123 · Jun 9, 2024, 2:19 PM · 10 points · 0 comments · 4 min read · LW link (neuralblog.github.io)

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream · Diego Caples and rrenaud · Sep 6, 2024, 5:55 PM · 70 points · 7 comments · 4 min read · LW link

Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability · ntt123 · Jun 17, 2024, 11:46 AM · 5 points · 4 comments · 6 min read · LW link (neuralblog.github.io)

Week One of Studying Transformers Architecture · JustisMills · Jun 20, 2024, 3:47 AM · 3 points · 0 comments · 15 min read · LW link (justismills.substack.com)

How Big a Deal are MatMul-Free Transformers? · JustisMills · Jun 27, 2024, 10:28 PM · 19 points · 6 comments · 5 min read · LW link (justismills.substack.com)

Addendum: More Efficient FFNs via Attention · Robert_AIZI · Feb 6, 2023, 6:55 PM · 10 points · 2 comments · 5 min read · LW link (aizi.substack.com)

Characterizing stable regions in the residual stream of LLMs · Jett Janiak, jacek, Chatrik, Giorgi Giglemiani, nlpet and StefanHex · Sep 26, 2024, 1:44 PM · 42 points · 4 comments · 1 min read · LW link (arxiv.org)

Transformers Explained (Again) · RohanS · Oct 22, 2024, 4:06 AM · 4 points · 0 comments · 18 min read · LW link

Analyzing how SAE features evolve across a forward pass · bensenberner, danibalcells, Michael Oesterle, Ediz Ucar and StefanHex · Nov 7, 2024, 10:07 PM · 47 points · 0 comments · 1 min read · LW link (arxiv.org)

[Question] Are Mixture-of-Experts Transformers More Interpretable Than Dense Transformers? · simeon_c · Dec 31, 2022, 11:34 AM · 8 points · 5 comments · 1 min read · LW link

Monet: Mixture of Monosemantic Experts for Transformers Explained · CalebMaresca · Jan 25, 2025, 7:37 PM · 20 points · 2 comments · 11 min read · LW link

So, just why do GPTs have to operate by continuing an existing string? · Bill Benzon · Mar 24, 2023, 12:08 PM · −4 points · 0 comments · 3 min read · LW link

We Need To Know About Continual Learning · michael_mjd · Apr 22, 2023, 5:08 PM · 29 points · 14 comments · 4 min read · LW link

The Method of Loci: With some brief remarks, including transformers and evaluating AIs · Bill Benzon · Dec 2, 2023, 2:36 PM · 6 points · 0 comments · 3 min read · LW link

Has anyone experimented with Dodrio, a tool for exploring transformer models through interactive visualization? · Bill Benzon · Dec 11, 2023, 8:34 PM · 4 points · 0 comments · 1 min read · LW link

An Analogy for Understanding Transformers · CallumMcDougall · May 13, 2023, 12:20 PM · 89 points · 6 comments · 9 min read · LW link

Transformer Architecture Choice for Resisting Prompt Injection and Jail-Breaking Attacks · RogerDearnaley · May 21, 2023, 8:29 AM · 9 points · 1 comment · 4 min read · LW link

Neuroevolution, Social Intelligence, and Logic · vinnik.dmitry07 · May 31, 2023, 5:54 PM · 1 point · 0 comments · 10 min read · LW link

[Question] Killing Recurrent Memory Over Self Attention? · Del Nobolo · Jun 6, 2023, 11:02 PM · 3 points · 0 comments · 1 min read · LW link

GPT-2′s positional embedding matrix is a helix · AdamYedidia · Jul 21, 2023, 4:16 AM · 44 points · 21 comments · 4 min read · LW link

The positional embedding matrix and previous-token heads: how do they actually work? · AdamYedidia · Aug 10, 2023, 1:58 AM · 26 points · 4 comments · 13 min read · LW link

Google DeepMind’s RT-2 · SandXbox · Aug 11, 2023, 11:26 AM · 9 points · 1 comment · 1 min read · LW link (robotics-transformer2.github.io)

World, mind, and learnability: A note on the metaphysical structure of the cosmos [& LLMs] · Bill Benzon · Sep 5, 2023, 12:19 PM · 4 points · 1 comment · 5 min read · LW link

New Tool: the Residual Stream Viewer · AdamYedidia · Oct 1, 2023, 12:49 AM · 32 points · 7 comments · 4 min read · LW link (tinyurl.com)

Exploring the Residual Stream of Transformers for Mechanistic Interpretability — Explained · Zeping Yu · Dec 26, 2023, 12:36 AM · 7 points · 1 comment · 11 min read · LW link

Research agenda—Building a multi-modal chess-language model · p.b. · Apr 7, 2022, 12:25 PM · 8 points · 2 comments · 2 min read · LW link

No Really, Attention is ALL You Need—Attention can do feedforward networks · Robert_AIZI · Jan 31, 2023, 6:48 PM · 29 points · 7 comments · 6 min read · LW link (aizi.substack.com)

Searching for Modularity in Large Language Models · NickyP and Stephen Fowler · Sep 8, 2022, 2:25 AM · 44 points · 3 comments · 14 min read · LW link

Brief Notes on Transformers · Adam Jermyn · Sep 26, 2022, 2:46 PM · 48 points · 3 comments · 2 min read · LW link

Finding Backward Chaining Circuits in Transformers Trained on Tree Search · abhayesian, Jannik Brinkmann and Victor Levoso · May 28, 2024, 5:29 AM · 50 points · 1 comment · 9 min read · LW link (arxiv.org)

Attention SAEs Scale to GPT-2 Small · Connor Kissane, robertzk, Arthur Conmy and Neel Nanda · Feb 3, 2024, 6:50 AM · 78 points · 4 comments · 8 min read · LW link

Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search · Arjun Panickssery · Feb 12, 2024, 12:56 AM · 57 points · 13 comments · 3 min read · LW link

Visualizing small Attention-only Transformers · WCargo · Nov 19, 2024, 9:37 AM · 4 points · 0 comments · 8 min read · LW link

Deconfusing In-Context Learning · Arjun Panickssery · Feb 25, 2024, 9:48 AM · 37 points · 1 comment · 2 min read · LW link

Building a transformer from scratch—AI safety up-skilling challenge · Marius Hobbhahn · Oct 12, 2022, 3:40 PM · 42 points · 1 comment · 5 min read · LW link

Decompiling Tracr Transformers—An interpretability experiment · Hannes Thurnherr · Mar 27, 2024, 9:49 AM · 4 points · 0 comments · 14 min read · LW link

Understanding mesa-optimization using toy models · tilmanr, rusheb, Guillaume Corlouer, Dan Valentine, afspies, mivanitskiy and Can · May 7, 2023, 5:00 PM · 43 points · 2 comments · 10 min read · LW link