Sam Bowman

Karma: 942

https://cims.nyu.edu/~sbowman/

Simple probes can catch sleeper agents

Monte M, Carson Denison, Zac Hatfield-Dodds, David Duvenaud, Sam Bowman, Ethan Perez and evhub

23 Apr 2024 21:10 UTC

119 points

17 comments, 1 min read, LW link

(www.anthropic.com)

LLM Evaluators Recognize and Favor Their Own Generations

Arjun Panickssery, Sam Bowman and Shi Feng

17 Apr 2024 21:09 UTC

44 points

1 comment, 3 min read, LW link

(tiny.cc)

Debating with More Persuasive LLMs Leads to More Truthful Answers

Akbir Khan, John Hughes, Dan Valentine, Sam Bowman and Ethan Perez

7 Feb 2024 21:28 UTC

87 points

14 comments, 9 min read, LW link

(arxiv.org)

Measuring and Improving the Faithfulness of Model-Generated Reasoning

Ansh Radhakrishnan, tamera, karinanguyen, Sam Bowman and Ethan Perez

18 Jul 2023 16:36 UTC

109 points

13 comments, 6 min read, LW link

Pretraining Language Models with Human Preferences

Tomek Korbak, Sam Bowman and Ethan Perez

21 Feb 2023 17:57 UTC

133 points

18 comments, 11 min read, LW link

Inverse Scaling Prize: Second Round Winners

Ian McKenzie, Sam Bowman and Ethan Perez

24 Jan 2023 20:12 UTC

58 points

17 comments, 15 min read, LW link

AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022

Sam Bowman

1 Sep 2022 19:15 UTC

76 points

2 comments, 7 min read, LW link

Survey of NLP Researchers: NLP is contributing to AGI progress; major catastrophe plausible

Sam Bowman

31 Aug 2022 1:39 UTC

92 points

6 comments, 2 min read, LW link

Artificial Sandwiching: When can we test scalable alignment protocols without humans?

Sam Bowman

13 Jul 2022 21:14 UTC

42 points

6 comments, 5 min read, LW link

Announcing the Inverse Scaling Prize ($250k Prize Pool)

Ethan Perez, Ian McKenzie and Sam Bowman

27 Jun 2022 15:58 UTC

169 points

14 comments, 7 min read, LW link

Jobs: Help scale up LM alignment research at NYU

Sam Bowman

9 May 2022 14:12 UTC

60 points

1 comment, 1 min read, LW link

A Small Negative Result on Debate

Sam Bowman

12 Apr 2022 18:19 UTC

42 points

11 comments, 1 min read, LW link

NLP Position Paper: When Combatting Hype, Proceed with Caution

Sam Bowman

15 Oct 2021 20:57 UTC

46 points

14 comments, 1 min read, LW link