LAThomson

Karma: 90

4th-year undergrad Computer Science and Philosophy student at Oxford, and part-time (hopefully full-time in future!) AI Safety researcher :)

Towards shutdownable agents via stochastic choice

Jul 8, 2024, 10:14 AM
59 points
11 comments · 23 min read · LW link
(arxiv.org)

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

Nov 8, 2023, 11:37 AM
49 points
0 comments · 18 min read · LW link