Akbir Khan

Karma: 215

Automated Researchers Can Subtly Sandbag

Mar 26, 2025, 7:13 PM
41 points
0 comments · 4 min read · LW link
(alignment.anthropic.com)

Auditing language models for hidden objectives

Mar 13, 2025, 7:18 PM
137 points
13 comments · 13 min read · LW link

Debating with More Persuasive LLMs Leads to More Truthful Answers

Feb 7, 2024, 9:28 PM
89 points
14 comments · 9 min read · LW link
(arxiv.org)

Why multi-agent safety is important

Akbir Khan · Jun 14, 2022, 9:23 AM
10 points
2 comments · 10 min read · LW link

Why we need prosocial agents

Akbir Khan · Nov 2, 2021, 3:19 PM
7 points
0 comments · 2 min read · LW link