RSS

Sam Bowman

Karma: 942

https://​​cims.nyu.edu/​​~sbowman/​​

Sim­ple probes can catch sleeper agents

23 Apr 2024 21:10 UTC
119 points
17 comments1 min readLW link
(www.anthropic.com)

LLM Eval­u­a­tors Rec­og­nize and Fa­vor Their Own Generations

17 Apr 2024 21:09 UTC
44 points
1 comment3 min readLW link
(tiny.cc)

De­bat­ing with More Per­sua­sive LLMs Leads to More Truth­ful Answers

7 Feb 2024 21:28 UTC
87 points
14 comments9 min readLW link
(arxiv.org)

Mea­sur­ing and Im­prov­ing the Faith­ful­ness of Model-Gen­er­ated Rea­son­ing

18 Jul 2023 16:36 UTC
109 points
13 comments6 min readLW link

Pre­train­ing Lan­guage Models with Hu­man Preferences

21 Feb 2023 17:57 UTC
133 points
18 comments11 min readLW link

In­verse Scal­ing Prize: Se­cond Round Winners

24 Jan 2023 20:12 UTC
58 points
17 comments15 min readLW link

AI Safety and Neigh­bor­ing Com­mu­ni­ties: A Quick-Start Guide, as of Sum­mer 2022

Sam Bowman1 Sep 2022 19:15 UTC
76 points
2 comments7 min readLW link

Sur­vey of NLP Re­searchers: NLP is con­tribut­ing to AGI progress; ma­jor catas­tro­phe plausible

Sam Bowman31 Aug 2022 1:39 UTC
92 points
6 comments2 min readLW link

Ar­tifi­cial Sand­wich­ing: When can we test scal­able al­ign­ment pro­to­cols with­out hu­mans?

Sam Bowman13 Jul 2022 21:14 UTC
42 points
6 comments5 min readLW link

An­nounc­ing the In­verse Scal­ing Prize ($250k Prize Pool)

27 Jun 2022 15:58 UTC
169 points
14 comments7 min readLW link

Jobs: Help scale up LM al­ign­ment re­search at NYU

Sam Bowman9 May 2022 14:12 UTC
60 points
1 comment1 min readLW link

A Small Nega­tive Re­sult on Debate

Sam Bowman12 Apr 2022 18:19 UTC
42 points
11 comments1 min readLW link

NLP Po­si­tion Paper: When Com­bat­ting Hype, Pro­ceed with Caution

Sam Bowman15 Oct 2021 20:57 UTC
46 points
14 comments1 min readLW link