HoldenKarnofsky

Karma: 7,075

Sabotage Evaluations for Frontier Models

18 Oct 2024 22:33 UTC
93 points
55 comments · 6 min read · LW link
(assets.anthropic.com)

Case studies on social-welfare-based standards in various industries

HoldenKarnofsky · 20 Jun 2024 13:33 UTC
42 points
0 comments · 1 min read · LW link

Good job opportunities for helping with the most important century

HoldenKarnofsky · 18 Jan 2024 17:30 UTC
36 points
0 comments · 4 min read · LW link
(www.cold-takes.com)

We’re Not Ready: thoughts on “pausing” and responsible scaling policies

HoldenKarnofsky · 27 Oct 2023 15:19 UTC
200 points
33 comments · 8 min read · LW link

3 levels of threat obfuscation

HoldenKarnofsky · 2 Aug 2023 14:58 UTC
69 points
14 comments · 7 min read · LW link

A Playbook for AI Risk Reduction (focused on misaligned AI)

HoldenKarnofsky · 6 Jun 2023 18:05 UTC
90 points
41 comments · 14 min read · LW link

Seeking (Paid) Case Studies on Standards

HoldenKarnofsky · 26 May 2023 17:58 UTC
69 points
9 comments · 11 min read · LW link

Success without dignity: a nearcasting story of avoiding catastrophe by luck

HoldenKarnofsky · 14 Mar 2023 19:23 UTC
76 points
17 comments · 15 min read · LW link

Discussion with Nate Soares on a key alignment difficulty

HoldenKarnofsky · 13 Mar 2023 21:20 UTC
256 points
42 comments · 22 min read · LW link

What does Bing Chat tell us about AI risk?

HoldenKarnofsky · 28 Feb 2023 17:40 UTC
80 points
21 comments · 2 min read · LW link
(www.cold-takes.com)

How major governments can help with the most important century

HoldenKarnofsky · 24 Feb 2023 18:20 UTC
29 points
0 comments · 4 min read · LW link
(www.cold-takes.com)

What AI companies can do today to help with the most important century

HoldenKarnofsky · 20 Feb 2023 17:00 UTC
38 points
3 comments · 9 min read · LW link
(www.cold-takes.com)

Jobs that can help with the most im­por­tant century

HoldenKarnofsky · 10 Feb 2023 18:20 UTC
24 points
0 comments · 19 min read · LW link
(www.cold-takes.com)

Spreading messages to help with the most important century

HoldenKarnofsky · 25 Jan 2023 18:20 UTC
75 points
4 comments · 18 min read · LW link
(www.cold-takes.com)

How we could stumble into AI catastrophe

HoldenKarnofsky · 13 Jan 2023 16:20 UTC
71 points
18 comments · 18 min read · LW link
(www.cold-takes.com)

Transformative AI issues (not just misalignment): an overview

HoldenKarnofsky · 5 Jan 2023 20:20 UTC
34 points
6 comments · 18 min read · LW link
(www.cold-takes.com)

Racing through a minefield: the AI deployment problem

HoldenKarnofsky · 22 Dec 2022 16:10 UTC
38 points
2 comments · 13 min read · LW link
(www.cold-takes.com)

High-level hopes for AI alignment

HoldenKarnofsky · 15 Dec 2022 18:00 UTC
58 points
3 comments · 19 min read · LW link
(www.cold-takes.com)

AI Safety Seems Hard to Measure

HoldenKarnofsky · 8 Dec 2022 19:50 UTC
71 points
6 comments · 14 min read · LW link
(www.cold-takes.com)

Why Would AI “Aim” To Defeat Humanity?

HoldenKarnofsky · 29 Nov 2022 19:30 UTC
69 points
10 comments · 33 min read · LW link
(www.cold-takes.com)