
HoldenKarnofsky

Karma: 7,075

Sabotage Evaluations for Frontier Models

18 Oct 2024 22:33 UTC
93 points
46 comments, 6 min read
(assets.anthropic.com)

Case studies on social-welfare-based standards in various industries

HoldenKarnofsky, 20 Jun 2024 13:33 UTC
42 points
0 comments, 1 min read

Good job opportunities for helping with the most important century

HoldenKarnofsky, 18 Jan 2024 17:30 UTC
36 points
0 comments, 4 min read
(www.cold-takes.com)

We’re Not Ready: thoughts on “pausing” and responsible scaling policies

HoldenKarnofsky, 27 Oct 2023 15:19 UTC
200 points
33 comments, 8 min read

3 levels of threat obfuscation

HoldenKarnofsky, 2 Aug 2023 14:58 UTC
69 points
14 comments, 7 min read