Matthew Rahtz
Karma: 55
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Neel Nanda, Tom Lieberum, Matthew Rahtz, János Kramár, Geoffrey Irving, Rohin Shah and Vlad Mikulik
20 Jul 2023 10:50 UTC, 44 points, 3 comments, 2 min read
LW link (arxiv.org)
Specification gaming: the flip side of AI ingenuity
Vika, Vlad Mikulik, Matthew Rahtz, tom4everitt, Zac Kenton and janleike
6 May 2020 23:51 UTC, 66 points, 9 comments, 6 min read
LW link