Andy Arditi
Karma: 416
https://andyrdt.com
Unlearning via RMU is mostly shallow
Andy Arditi and bilalchughtai · 23 Jul 2024 16:07 UTC · 50 points · 3 comments · 6 min read
Refusal in LLMs is mediated by a single direction
Andy Arditi, Oscar Obeso, Aaquib111, wesg and Neel Nanda · 27 Apr 2024 11:13 UTC · 228 points · 93 comments · 10 min read
Refusal mechanisms: initial experiments with Llama-2-7b-chat
Andy Arditi and Oscar Obeso · 8 Dec 2023 17:08 UTC · 81 points · 7 comments · 7 min read