Sam Bowman

Karma: 938

https://cims.nyu.edu/~sbowman/

Simple probes can catch sleeper agents

23 Apr 2024 21:10 UTC
117 points
15 comments · 1 min read
(www.anthropic.com)

LLM Evaluators Recognize and Favor Their Own Generations

17 Apr 2024 21:09 UTC
43 points
1 comment · 3 min read
(tiny.cc)

Debating with More Persuasive LLMs Leads to More Truthful Answers

7 Feb 2024 21:28 UTC
87 points
14 comments · 9 min read
(arxiv.org)

Measuring and Improving the Faithfulness of Model-Generated Reasoning

18 Jul 2023 16:36 UTC
109 points
13 comments · 6 min read

Pretraining Language Models with Human Preferences

21 Feb 2023 17:57 UTC
133 points
18 comments · 11 min read

Inverse Scaling Prize: Second Round Winners

24 Jan 2023 20:12 UTC
58 points
17 comments · 15 min read