Oscar Obeso
Karma: 292
Math undergrad at ETH Zurich.
More info: oscarbalcells.com
Refusal in LLMs is mediated by a single direction
Andy Arditi, Oscar Obeso, Aaquib111, wesg and Neel Nanda
27 Apr 2024 11:13 UTC
228 points, 93 comments, 10 min read
Refusal mechanisms: initial experiments with Llama-2-7b-chat
Andy Arditi and Oscar Obeso
8 Dec 2023 17:08 UTC
81 points, 7 comments, 7 min read