Arthur Conmy
Karma: 1,651
Interpretability
Views my own
[Paper] All’s Fair In Love And Love: Copy Suppression in GPT-2 Small
CallumMcDougall, Arthur Conmy, Cody Rushing, Tom McGrath and Neel Nanda
Oct 13, 2023, 6:32 PM · 82 points · 4 comments · 8 min read
Three ways interpretability could be impactful
Arthur Conmy
Sep 18, 2023, 1:02 AM · 47 points · 8 comments · 4 min read
Mechanistically interpreting time in GPT-2 small
rgould, Elizabeth Ho and Arthur Conmy
Apr 16, 2023, 5:57 PM · 68 points · 6 comments · 21 min read
RLHF does not appear to differentially cause mode-collapse
Arthur Conmy and beren
Mar 20, 2023, 3:39 PM · 95 points · 9 comments · 3 min read
OpenAI introduce ChatGPT API at 1/10th the previous $/token (openai.com)
Arthur Conmy
Mar 1, 2023, 8:48 PM · 28 points · 4 comments · 1 min read
Arthur Conmy’s Shortform
Arthur Conmy
Nov 1, 2022, 9:35 PM · 2 points · 1 comment
Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small (arxiv.org)
RowanWang, Alexandre Variengien, Arthur Conmy, Buck and jsteinhardt
Oct 28, 2022, 11:55 PM · 101 points · 9 comments · 9 min read · 2 reviews