RSS

HarrietW

Karma: 39

Co­op­er­a­tion and Align­ment in Del­e­ga­tion Games: You Need Both!

3 Aug 2024 10:16 UTC
7 points
0 comments14 min readLW link
(www.oliversourbut.net)

Tall Tales at Differ­ent Scales: Eval­u­at­ing Scal­ing Trends For De­cep­tion In Lan­guage Models

8 Nov 2023 11:37 UTC
49 points
0 comments18 min readLW link