Transformers (Tag)
Last edit: Feb 24, 2022, 11:01 AM by Vivek Hebbar
Striking Implications for Learning Theory, Interpretability — and Safety? · RogerDearnaley · Jan 5, 2024, 8:46 AM · 37 points · 4 comments · 2 min read · LW link

How LLMs are and are not myopic · janus · Jul 25, 2023, 2:19 AM · 134 points · 16 comments · 8 min read · LW link

Google’s PaLM-E: An Embodied Multimodal Language Model · SandXbox · Mar 7, 2023, 4:11 AM · 87 points · 7 comments · 1 min read · LW link (palm-e.github.io)

AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them · Roman Leventov · Dec 27, 2023, 2:51 PM · 33 points · 9 comments · 4 min read · LW link

Modern Transformers are AGI, and Human-Level · abramdemski · Mar 26, 2024, 5:46 PM · 219 points · 87 comments · 5 min read · LW link

[Question] If I ask an LLM to think step by step, how big are the steps? · ryan_b · Sep 13, 2024, 8:30 PM · 7 points · 1 comment · 1 min read · LW link

How Do Induction Heads Actually Work in Transformers With Finite Capacity? · Fabien Roger · Mar 23, 2023, 9:09 AM · 27 points · 0 comments · 5 min read · LW link

Residual stream norms grow exponentially over the forward pass · StefanHex and TurnTrout · May 7, 2023, 12:46 AM · 77 points · 24 comments · 11 min read · LW link

How fast can we perform a forward pass? · jsteinhardt · Jun 10, 2022, 11:30 PM · 53 points · 9 comments · 15 min read · LW link (bounded-regret.ghost.io)

Concrete Steps to Get Started in Transformer Mechanistic Interpretability · Neel Nanda · Dec 25, 2022, 10:21 PM · 57 points · 7 comments · 12 min read · LW link (www.neelnanda.io)

Tracr: Compiled Transformers as a Laboratory for Interpretability | DeepMind · DragonGod · Jan 13, 2023, 4:53 PM · 62 points · 12 comments · 1 min read · LW link (arxiv.org)

[Question] Barcoding LLM Training Data Subsets. Anyone trying this for interpretability? · right..enough? · Apr 13, 2024, 3:09 AM · 7 points · 0 comments · 7 min read · LW link

Transformers Represent Belief State Geometry in their Residual Stream · Adam Shai · Apr 16, 2024, 9:16 PM · 413 points · 100 comments · 12 min read · LW link

An interesting mathematical model of how LLMs work · Bill Benzon · Apr 30, 2024, 11:01 AM · 5 points · 0 comments · 1 min read · LW link

If language is for communication, what does that imply about LLMs? · Bill Benzon · May 12, 2024, 2:55 AM · 10 points · 0 comments · 1 min read · LW link

Exploring Llama-3-8B MLP Neurons · ntt123 · Jun 9, 2024, 2:19 PM · 10 points · 0 comments · 4 min read · LW link (neuralblog.github.io)

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream · Diego Caples and rrenaud · Sep 6, 2024, 5:55 PM · 70 points · 7 comments · 4 min read · LW link

Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability · ntt123 · Jun 17, 2024, 11:46 AM · 5 points · 4 comments · 6 min read · LW link (neuralblog.github.io)

Week One of Studying Transformers Architecture · JustisMills · Jun 20, 2024, 3:47 AM · 3 points · 0 comments · 15 min read · LW link (justismills.substack.com)

How Big a Deal are MatMul-Free Transformers? · JustisMills · Jun 27, 2024, 10:28 PM · 19 points · 6 comments · 5 min read · LW link (justismills.substack.com)

Addendum: More Efficient FFNs via Attention · Robert_AIZI · Feb 6, 2023, 6:55 PM · 10 points · 2 comments · 5 min read · LW link (aizi.substack.com)

Characterizing stable regions in the residual stream of LLMs · Jett Janiak, jacek, Chatrik, Giorgi Giglemiani, nlpet and StefanHex · Sep 26, 2024, 1:44 PM · 42 points · 4 comments · 1 min read · LW link (arxiv.org)

Transformers Explained (Again) · RohanS · Oct 22, 2024, 4:06 AM · 4 points · 0 comments · 18 min read · LW link

Analyzing how SAE features evolve across a forward pass · bensenberner, danibalcells, Michael Oesterle, Ediz Ucar and StefanHex · Nov 7, 2024, 10:07 PM · 47 points · 0 comments · 1 min read · LW link (arxiv.org)

[Question] Are Mixture-of-Experts Transformers More Interpretable Than Dense Transformers? · simeon_c · Dec 31, 2022, 11:34 AM · 8 points · 5 comments · 1 min read · LW link

Monet: Mixture of Monosemantic Experts for Transformers Explained · CalebMaresca · Jan 25, 2025, 7:37 PM · 20 points · 2 comments · 11 min read · LW link

So, just why do GPTs have to operate by continuing an existing string? · Bill Benzon · Mar 24, 2023, 12:08 PM · −4 points · 0 comments · 3 min read · LW link

We Need To Know About Continual Learning · michael_mjd · Apr 22, 2023, 5:08 PM · 29 points · 14 comments · 4 min read · LW link

The Method of Loci: With some brief remarks, including transformers and evaluating AIs · Bill Benzon · Dec 2, 2023, 2:36 PM · 6 points · 0 comments · 3 min read · LW link

Has anyone experimented with Dodrio, a tool for exploring transformer models through interactive visualization? · Bill Benzon · Dec 11, 2023, 8:34 PM · 4 points · 0 comments · 1 min read · LW link

An Analogy for Understanding Transformers · CallumMcDougall · May 13, 2023, 12:20 PM · 89 points · 6 comments · 9 min read · LW link

Transformer Architecture Choice for Resisting Prompt Injection and Jail-Breaking Attacks · RogerDearnaley · May 21, 2023, 8:29 AM · 9 points · 1 comment · 4 min read · LW link

Neuroevolution, Social Intelligence, and Logic · vinnik.dmitry07 · May 31, 2023, 5:54 PM · 1 point · 0 comments · 10 min read · LW link

[Question] Killing Recurrent Memory Over Self Attention? · Del Nobolo · Jun 6, 2023, 11:02 PM · 3 points · 0 comments · 1 min read · LW link

GPT-2′s positional embedding matrix is a helix · AdamYedidia · Jul 21, 2023, 4:16 AM · 44 points · 21 comments · 4 min read · LW link

The positional embedding matrix and previous-token heads: how do they actually work? · AdamYedidia · Aug 10, 2023, 1:58 AM · 26 points · 4 comments · 13 min read · LW link

Google DeepMind’s RT-2 · SandXbox · Aug 11, 2023, 11:26 AM · 9 points · 1 comment · 1 min read · LW link (robotics-transformer2.github.io)

World, mind, and learnability: A note on the metaphysical structure of the cosmos [& LLMs] · Bill Benzon · Sep 5, 2023, 12:19 PM · 4 points · 1 comment · 5 min read · LW link

New Tool: the Residual Stream Viewer · AdamYedidia · Oct 1, 2023, 12:49 AM · 32 points · 7 comments · 4 min read · LW link (tinyurl.com)

Exploring the Residual Stream of Transformers for Mechanistic Interpretability — Explained · Zeping Yu · Dec 26, 2023, 12:36 AM · 7 points · 1 comment · 11 min read · LW link

Research agenda—Building a multi-modal chess-language model · p.b. · Apr 7, 2022, 12:25 PM · 8 points · 2 comments · 2 min read · LW link

No Really, Attention is ALL You Need—Attention can do feedforward networks · Robert_AIZI · Jan 31, 2023, 6:48 PM · 29 points · 7 comments · 6 min read · LW link (aizi.substack.com)

Searching for Modularity in Large Language Models · NickyP and Stephen Fowler · Sep 8, 2022, 2:25 AM · 44 points · 3 comments · 14 min read · LW link

Brief Notes on Transformers · Adam Jermyn · Sep 26, 2022, 2:46 PM · 48 points · 3 comments · 2 min read · LW link

Finding Backward Chaining Circuits in Transformers Trained on Tree Search · abhayesian, Jannik Brinkmann and Victor Levoso · May 28, 2024, 5:29 AM · 50 points · 1 comment · 9 min read · LW link (arxiv.org)

Attention SAEs Scale to GPT-2 Small · Connor Kissane, robertzk, Arthur Conmy and Neel Nanda · Feb 3, 2024, 6:50 AM · 78 points · 4 comments · 8 min read · LW link

Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search · Arjun Panickssery · Feb 12, 2024, 12:56 AM · 57 points · 13 comments · 3 min read · LW link

Visualizing small Attention-only Transformers · WCargo · Nov 19, 2024, 9:37 AM · 4 points · 0 comments · 8 min read · LW link

Deconfusing In-Context Learning · Arjun Panickssery · Feb 25, 2024, 9:48 AM · 37 points · 1 comment · 2 min read · LW link

Building a transformer from scratch—AI safety up-skilling challenge · Marius Hobbhahn · Oct 12, 2022, 3:40 PM · 42 points · 1 comment · 5 min read · LW link

Decompiling Tracr Transformers—An interpretability experiment · Hannes Thurnherr · Mar 27, 2024, 9:49 AM · 4 points · 0 comments · 14 min read · LW link

Understanding mesa-optimization using toy models · tilmanr, rusheb, Guillaume Corlouer, Dan Valentine, afspies, mivanitskiy and Can · May 7, 2023, 5:00 PM · 43 points · 2 comments · 10 min read · LW link