Archive
Sequences
About
Search
Log In
Questions
Events
Shortform
Alignment Forum
AF Comments
Home
Featured
All
Tags
Recent
Comments
RSS
dsbowen
Karma:
25
Research Scientist at FAR AI.
All
Posts
Comments
New
Top
Old
Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
ChengCheng
,
Brendan Murphy
,
Adrià Garriga-alonso
,
Yashvardhan Sharma
,
dsbowen
,
smallsilo
,
Yawen Duan
,
ChrisCundy
,
Hannah Betts
,
AdamGleave
and
Kellin Pelrine
7 Feb 2025 3:57 UTC
29
points
0
comments
10
min read
LW
link
Back to top