HoldenKarnofsky

Karma: 7,105

Sabotage Evaluations for Frontier Models

Oct 18, 2024, 10:33 PM
94 points
56 comments · 6 min read · LW link
(assets.anthropic.com)

Case studies on social-welfare-based standards in various industries

HoldenKarnofsky · Jun 20, 2024, 1:33 PM
42 points
0 comments · 1 min read · LW link

Good job opportunities for helping with the most important century

HoldenKarnofsky · Jan 18, 2024, 5:30 PM
36 points
0 comments · 4 min read · LW link
(www.cold-takes.com)

We’re Not Ready: thoughts on “pausing” and responsible scaling policies

HoldenKarnofsky · Oct 27, 2023, 3:19 PM
200 points
33 comments · 8 min read · LW link

3 levels of threat obfuscation

HoldenKarnofsky · Aug 2, 2023, 2:58 PM
69 points
14 comments · 7 min read · LW link

A Playbook for AI Risk Reduction (focused on misaligned AI)

HoldenKarnofsky · Jun 6, 2023, 6:05 PM
90 points
42 comments · 14 min read · LW link · 1 review

Seeking (Paid) Case Studies on Standards

HoldenKarnofsky · May 26, 2023, 5:58 PM
69 points
9 comments · 11 min read · LW link

Success without dignity: a nearcasting story of avoiding catastrophe by luck

HoldenKarnofsky · Mar 14, 2023, 7:23 PM
76 points
17 comments · 15 min read · LW link

Discussion with Nate Soares on a key alignment difficulty

HoldenKarnofsky · Mar 13, 2023, 9:20 PM
265 points
43 comments · 22 min read · LW link · 1 review

What does Bing Chat tell us about AI risk?

HoldenKarnofsky · Feb 28, 2023, 5:40 PM
80 points
21 comments · 2 min read · LW link
(www.cold-takes.com)

How major governments can help with the most important century

HoldenKarnofsky · Feb 24, 2023, 6:20 PM
29 points
0 comments · 4 min read · LW link
(www.cold-takes.com)

What AI companies can do today to help with the most important century

HoldenKarnofsky · Feb 20, 2023, 5:00 PM
38 points
3 comments · 9 min read · LW link
(www.cold-takes.com)

Jobs that can help with the most im­por­tant century

HoldenKarnofsky · Feb 10, 2023, 6:20 PM
24 points
0 comments · 19 min read · LW link
(www.cold-takes.com)

Spreading messages to help with the most important century

HoldenKarnofsky · Jan 25, 2023, 6:20 PM
75 points
4 comments · 18 min read · LW link
(www.cold-takes.com)

How we could stumble into AI catastrophe

HoldenKarnofsky · Jan 13, 2023, 4:20 PM
71 points
18 comments · 18 min read · LW link
(www.cold-takes.com)

Transformative AI issues (not just misalignment): an overview

HoldenKarnofsky · Jan 5, 2023, 8:20 PM
34 points
6 comments · 18 min read · LW link
(www.cold-takes.com)

Racing through a minefield: the AI deployment problem

HoldenKarnofsky · Dec 22, 2022, 4:10 PM
38 points
2 comments · 13 min read · LW link
(www.cold-takes.com)

High-level hopes for AI alignment

HoldenKarnofsky · Dec 15, 2022, 6:00 PM
58 points
3 comments · 19 min read · LW link
(www.cold-takes.com)

AI Safety Seems Hard to Measure

HoldenKarnofsky · Dec 8, 2022, 7:50 PM
71 points
6 comments · 14 min read · LW link
(www.cold-takes.com)

Why Would AI “Aim” To Defeat Humanity?

HoldenKarnofsky · Nov 29, 2022, 7:30 PM
69 points
10 comments · 33 min read · LW link
(www.cold-takes.com)