Kellin Pelrine
Karma: 129
GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning
ChengCheng, Brendan Murphy, AdamGleave and Kellin Pelrine
1 Nov 2024 0:10 UTC, 18 points, 0 comments, 6 min read, LW link (far.ai)
Even Superhuman Go AIs Have Surprising Failure Modes
AdamGleave, EuanMcLean, Tony Wang, Kellin Pelrine, Tom Tseng, Yawen Duan, Joseph Miller and MichaelDennis
20 Jul 2023 17:31 UTC, 129 points, 22 comments, 10 min read, LW link (far.ai)