
AI Risk

Last edit: Nov 2, 2022, 8:27 PM by brook

AI Risk is the analysis of the risks associated with building powerful AI systems.

Related: AI, Orthogonality thesis, Complexity of value, Goodhart’s law, Paperclip maximiser

Superintelligence FAQ

Scott AlexanderSep 20, 2016, 7:00 PM
137 points
39 comments27 min readLW link

AGI Ruin: A List of Lethalities

Eliezer YudkowskyJun 5, 2022, 10:05 PM
929 points
708 comments30 min readLW link3 reviews

What failure looks like

paulfchristianoMar 17, 2019, 8:18 PM
431 points
55 comments8 min readLW link2 reviews

Specification gaming examples in AI

VikaApr 3, 2018, 12:30 PM
47 points
9 comments1 min readLW link2 reviews

An artificially structured argument for expecting AGI ruin

Rob BensingerMay 7, 2023, 9:52 PM
91 points
26 comments19 min readLW link

Where I agree and disagree with Eliezer

paulfchristianoJun 19, 2022, 7:15 PM
898 points
223 comments18 min readLW link2 reviews

Discussion with Eliezer Yudkowsky on AGI interventions

Nov 11, 2021, 3:01 AM
328 points
253 comments34 min readLW link1 review

“Corrigibility at some small length” by dath ilan

Christopher KingApr 5, 2023, 1:47 AM
32 points
3 comments9 min readLW link
(www.glowfic.com)

Intuitions about goal-directed behavior

Rohin ShahDec 1, 2018, 4:25 AM
54 points
15 comments6 min readLW link

Epistemological Framing for AI Alignment Research

adamShimiMar 8, 2021, 10:05 PM
58 points
7 comments9 min readLW link

Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI

Maris SalaMay 22, 2023, 2:31 PM
155 points
5 comments3 min readLW link
(www.conjecture.dev)

On how various plans miss the hard bits of the alignment challenge

So8resJul 12, 2022, 2:49 AM
313 points
89 comments29 min readLW link3 reviews

AGI in sight: our look at the game board

Feb 18, 2023, 10:17 PM
227 points
135 comments6 min readLW link
(andreamiotti.substack.com)

The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda

Dec 18, 2023, 8:35 PM
174 points
22 comments12 min readLW link1 review

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaleyJan 5, 2024, 8:46 AM
37 points
4 comments2 min readLW link

What can the principal-agent literature tell us about AI risk?

apcFeb 8, 2020, 9:28 PM
104 points
29 comments16 min readLW link

Open Problems in AI X-Risk [PAIS #5]

Jun 10, 2022, 2:08 AM
61 points
6 comments36 min readLW link

Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue)

Jacy Reese AnthisNov 22, 2022, 4:50 PM
93 points
64 comments1 min readLW link
(www.science.org)

Bing chat is the AI fire alarm

RatiosFeb 17, 2023, 6:51 AM
115 points
63 comments3 min readLW link

MIRI announces new “Death With Dignity” strategy

Eliezer YudkowskyApr 2, 2022, 12:43 AM
354 points
545 comments18 min readLW link1 review

Robin Hanson’s latest AI risk position statement

LironMar 3, 2023, 2:25 PM
55 points
18 comments1 min readLW link
(www.overcomingbias.com)

Don’t Share Information Exfohazardous on Others’ AI-Risk Models

Thane RuthenisDec 19, 2023, 8:09 PM
68 points
11 comments1 min readLW link

Stampy’s AI Safety Info—New Distillations #1 [March 2023]

markovApr 7, 2023, 11:06 AM
42 points
0 comments2 min readLW link
(aisafety.info)

A Path out of Insufficient Views

UnrealSep 24, 2024, 8:00 PM
40 points
55 comments9 min readLW link

A Gym Gridworld Environment for the Treacherous Turn

Michaël TrazziJul 28, 2018, 9:27 PM
74 points
9 comments3 min readLW link
(github.com)

[Question] Will OpenAI’s work unintentionally increase existential risks related to AI?

adamShimiAug 11, 2020, 6:16 PM
53 points
55 comments1 min readLW link

Another (outer) alignment failure story

paulfchristianoApr 7, 2021, 8:12 PM
247 points
38 comments12 min readLW link1 review

Interpreting the Learning of Deceit

RogerDearnaleyDec 18, 2023, 8:12 AM
30 points
14 comments9 min readLW link

Developmental Stages of GPTs

orthonormalJul 26, 2020, 10:03 PM
140 points
72 comments7 min readLW link1 review

A transcript of the TED talk by Eliezer Yudkowsky

Mikhail SaminJul 12, 2023, 12:12 PM
105 points
13 comments4 min readLW link

AI will change the world, but won’t take it over by playing “3-dimensional chess”.

Nov 22, 2022, 6:57 PM
134 points
97 comments24 min readLW link

Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell?

Karl von WendtJun 25, 2023, 4:59 PM
106 points
53 comments7 min readLW link

Truthful LMs as a warm-up for aligned AGI

Jacob_HiltonJan 17, 2022, 4:49 PM
65 points
14 comments13 min readLW link

Are minimal circuits deceptive?

evhubSep 7, 2019, 6:11 PM
78 points
11 comments8 min readLW link

Soft takeoff can still lead to decisive strategic advantage

Daniel KokotajloAug 23, 2019, 4:39 PM
122 points
47 comments8 min readLW link4 reviews

Alexander and Yudkowsky on AGI goals

Jan 24, 2023, 9:09 PM
178 points
53 comments26 min readLW link1 review

The Hidden Complexity of Wishes

Eliezer YudkowskyNov 24, 2007, 12:12 AM
178 points
199 comments8 min readLW link

AI Could Defeat All Of Us Combined

HoldenKarnofskyJun 9, 2022, 3:50 PM
170 points
42 comments17 min readLW link
(www.cold-takes.com)

Being at peace with Doom

Johannes C. MayerApr 9, 2023, 2:53 PM
23 points
14 comments4 min readLW link1 review

Should we postpone AGI until we reach safety?

otto.bartenNov 18, 2020, 3:43 PM
27 points
36 comments3 min readLW link

On Solving Problems Before They Appear: The Weird Epistemologies of Alignment

adamShimiOct 11, 2021, 8:20 AM
110 points
10 comments15 min readLW link

Counterarguments to the basic AI x-risk case

KatjaGraceOct 14, 2022, 1:00 PM
371 points
124 comments34 min readLW link1 review
(aiimpacts.org)

“Endgame safety” for AGI

Steven ByrnesJan 24, 2023, 2:15 PM
85 points
10 comments6 min readLW link

My Objections to “We’re All Gonna Die with Eliezer Yudkowsky”

Quintin PopeMar 21, 2023, 12:06 AM
358 points
232 comments39 min readLW link1 review

DL towards the unaligned Recursive Self-Optimization attractor

jacob_cannellDec 18, 2021, 2:15 AM
32 points
22 comments4 min readLW link

Architects of Our Own Demise: We Should Stop Developing AI Carelessly

RokoOct 26, 2023, 12:36 AM
170 points
75 comments3 min readLW link

Don’t accelerate problems you’re trying to solve

Feb 15, 2023, 6:11 PM
100 points
27 comments4 min readLW link

Announcing Apollo Research

May 30, 2023, 4:17 PM
217 points
11 comments8 min readLW link

Devil’s Advocate: Adverse Selection Against Conscientiousness

lionhearted (Sebastian Marshall)May 28, 2023, 5:53 PM
10 points
2 comments1 min readLW link

How good is humanity at coordination?

BuckJul 21, 2020, 8:01 PM
82 points
44 comments3 min readLW link

[Question] How likely are scenarios where AGI ends up overtly or de facto torturing us? How likely are scenarios where AGI prevents us from committing suicide or dying?

JohnGreerMar 28, 2023, 6:00 PM
11 points
4 comments1 min readLW link

A challenge for AGI organizations, and a challenge for readers

Dec 1, 2022, 11:11 PM
302 points
33 comments2 min readLW link

An Appeal to AI Superintelligence: Reasons to Preserve Humanity

James_MillerMar 18, 2023, 4:22 PM
41 points
73 comments12 min readLW link

Intent alignment should not be the goal for AGI x-risk reduction

John NayOct 26, 2022, 1:24 AM
1 point
10 comments3 min readLW link

Request to AGI organizations: Share your views on pausing AI progress

Apr 11, 2023, 5:30 PM
141 points
11 comments1 min readLW link

A Bear Case: My Predictions Regarding AI Progress

Thane RuthenisMar 5, 2025, 4:41 PM
343 points
150 comments9 min readLW link

Thoughts on refusing harmful requests to large language models

William_SJan 19, 2023, 7:49 PM
32 points
4 comments2 min readLW link

Against a General Factor of Doom

Jeffrey HeningerNov 23, 2022, 4:50 PM
61 points
19 comments4 min readLW link1 review
(aiimpacts.org)

How we could stumble into AI catastrophe

HoldenKarnofskyJan 13, 2023, 4:20 PM
71 points
18 comments18 min readLW link
(www.cold-takes.com)

Are we dropping the ball on Recommendation AIs?

Charbel-RaphaëlOct 23, 2024, 5:48 PM
41 points
17 comments6 min readLW link

The Preference Fulfillment Hypothesis

Kaj_SotalaFeb 26, 2023, 10:55 AM
66 points
62 comments11 min readLW link

AI Safety Info Distillation Fellowship

Feb 17, 2023, 4:16 PM
47 points
3 comments3 min readLW link

World-Model Interpretability Is All We Need

Thane RuthenisJan 14, 2023, 7:37 PM
35 points
22 comments21 min readLW link

The other side of the tidal wave

KatjaGraceNov 3, 2023, 5:40 AM
189 points
86 comments1 min readLW link
(worldspiritsockpuppet.com)

Stuxnet, not Skynet: Humanity’s disempowerment by AI

RokoNov 4, 2023, 10:23 PM
107 points
24 comments6 min readLW link

[Linkpost] Biden-Harris Executive Order on AI

berenOct 30, 2023, 3:20 PM
3 points
0 comments1 min readLW link

[Question] Good taxonomies of all risks (small or large) from AI?

Aryeh EnglanderMar 5, 2024, 6:15 PM
6 points
1 comment1 min readLW link

New voluntary commitments (AI Seoul Summit)

Zach Stein-PerlmanMay 21, 2024, 11:00 AM
81 points
17 comments7 min readLW link
(www.gov.uk)

Clarifying some key hypotheses in AI alignment

Aug 15, 2019, 9:29 PM
79 points
12 comments9 min readLW link

[Question] First and Last Questions for GPT-5*

Mitchell_PorterNov 24, 2023, 5:03 AM
15 points
5 comments1 min readLW link

An AI risk argument that resonates with NYTimes readers

Julian BradshawMar 12, 2023, 11:09 PM
212 points
14 comments1 min readLW link

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher KingMar 15, 2023, 12:29 AM
116 points
22 comments2 min readLW link

AI #2

ZviMar 2, 2023, 2:50 PM
66 points
18 comments55 min readLW link
(thezvi.wordpress.com)

ChatGPT (and now GPT4) is very eas­ily dis­tracted from its rules

dmcsMar 15, 2023, 5:55 PM
180 points
42 comments1 min readLW link

A (EtA: quick) note on ter­minol­ogy: AI Align­ment != AI x-safety

David Scott Krueger (formerly: capybaralet)Feb 8, 2023, 10:33 PM
46 points
20 comments1 min readLW link

AI Ne­o­re­al­ism: a threat model & suc­cess crite­rion for ex­is­ten­tial safety

davidadDec 15, 2022, 1:42 PM
67 points
1 comment3 min readLW link

My thoughts on OpenAI’s al­ign­ment plan

AkashDec 30, 2022, 7:33 PM
55 points
3 comments20 min readLW link

[Question] Why are we sure that AI will “want” some­thing?

ShmiSep 16, 2022, 8:35 PM
31 points
57 comments1 min readLW link

Em­pow­er­ment is (al­most) All We Need

jacob_cannellOct 23, 2022, 9:48 PM
61 points
44 comments17 min readLW link

What if we ap­proach AI safety like a tech­ni­cal en­g­ineer­ing safety problem

zeshenAug 20, 2022, 10:29 AM
36 points
4 comments7 min readLW link

Com­par­ing Four Ap­proaches to In­ner Alignment

Lucas TeixeiraJul 29, 2022, 9:06 PM
38 points
1 comment9 min readLW link

No One-Size-Fit-All Epistemic Strategy

adamShimiAug 20, 2022, 12:56 PM
24 points
2 comments2 min readLW link

AI Safety Seems Hard to Measure

HoldenKarnofskyDec 8, 2022, 7:50 PM
71 points
6 comments14 min readLW link
(www.cold-takes.com)

The Wizard of Oz Prob­lem: How in­cen­tives and nar­ra­tives can skew our per­cep­tion of AI developments

AkashMar 20, 2023, 8:44 PM
16 points
3 comments6 min readLW link

Pod­cast: Tam­era Lan­ham on AI risk, threat mod­els, al­ign­ment pro­pos­als, ex­ter­nal­ized rea­son­ing over­sight, and work­ing at Anthropic

AkashDec 20, 2022, 9:39 PM
18 points
2 comments11 min readLW link

Episte­molog­i­cal Vigilance for Alignment

adamShimiJun 6, 2022, 12:27 AM
66 points
11 comments10 min readLW link

Con­fused why a “ca­pa­bil­ities re­search is good for al­ign­ment progress” po­si­tion isn’t dis­cussed more

Kaj_SotalaJun 2, 2022, 9:41 PM
129 points
27 comments4 min readLW link

Another plau­si­ble sce­nario of AI risk: AI builds mil­i­tary in­fras­truc­ture while col­lab­o­rat­ing with hu­mans, defects later.

avturchinJun 10, 2022, 5:24 PM
10 points
2 comments1 min readLW link

Com­plex Sys­tems for AI Safety [Prag­matic AI Safety #3]

May 24, 2022, 12:00 AM
58 points
3 comments21 min readLW link

AXRP Epi­sode 13 - First Prin­ci­ples of AGI Safety with Richard Ngo

DanielFilanMar 31, 2022, 5:20 AM
25 points
1 comment48 min readLW link

Ro­bust­ness to Scal­ing Down: More Im­por­tant Than I Thought

adamShimiJul 23, 2022, 11:40 AM
38 points
5 comments3 min readLW link

Over­sight Misses 100% of Thoughts The AI Does Not Think

johnswentworthAug 12, 2022, 4:30 PM
110 points
49 comments1 min readLW link

Shapes of Mind and Plu­ral­ism in Alignment

adamShimiAug 13, 2022, 10:01 AM
33 points
2 comments2 min readLW link

How Josiah be­came an AI safety researcher

Neil CrawfordSep 6, 2022, 5:17 PM
4 points
0 comments1 min readLW link

Learn­ing so­cietal val­ues from law as part of an AGI al­ign­ment strategy

John NayOct 21, 2022, 2:03 AM
5 points
18 comments54 min readLW link

AI Safety Micro­grant Round

Chris_LeongNov 14, 2022, 4:25 AM
22 points
1 comment1 min readLW link

Poster Ses­sion on AI Safety

Neil CrawfordNov 12, 2022, 3:50 AM
7 points
8 comments1 min readLW link

Rac­ing through a minefield: the AI de­ploy­ment problem

HoldenKarnofskyDec 22, 2022, 4:10 PM
38 points
2 comments13 min readLW link
(www.cold-takes.com)

ea.do­mains—Do­mains Free to a Good Home

plexJan 12, 2023, 1:32 PM
24 points
0 comments1 min readLW link

Many AI gov­er­nance pro­pos­als have a trade­off be­tween use­ful­ness and feasibility

Feb 3, 2023, 6:49 PM
22 points
2 comments2 min readLW link

Con­tra “Strong Co­her­ence”

DragonGodMar 4, 2023, 8:05 PM
39 points
24 comments1 min readLW link

Con­tra Han­son on AI Risk

LironMar 4, 2023, 8:02 AM
36 points
23 comments8 min readLW link

“Care­fully Boot­strapped Align­ment” is or­ga­ni­za­tion­ally hard

RaemonMar 17, 2023, 6:00 PM
262 points
23 comments11 min readLW link1 review

Quote quiz: “drift­ing into de­pen­dence”

jasoncrawfordApr 27, 2023, 3:13 PM
7 points
6 comments1 min readLW link
(rootsofprogress.org)

But ex­actly how com­plex and frag­ile?

KatjaGraceNov 3, 2019, 6:20 PM
87 points
32 comments3 min readLW link1 review
(meteuphoric.com)

“Hu­man­ity vs. AGI” Will Never Look Like “Hu­man­ity vs. AGI” to Humanity

Thane RuthenisDec 16, 2023, 8:08 PM
190 points
34 comments5 min readLW link

MIRI’s April 2024 Newsletter

HarlanApr 12, 2024, 11:38 PM
95 points
0 comments3 min readLW link
(intelligence.org)

Staged release

Zach Stein-PerlmanApr 17, 2024, 4:00 PM
11 points
4 comments2 min readLW link

A short­com­ing of con­crete demon­stra­tions as AGI risk advocacy

Steven ByrnesDec 11, 2024, 4:48 PM
103 points
27 comments2 min readLW link

Will work­ing here ad­vance AGI? Help us not de­stroy the world!

Yonatan CaleMay 29, 2022, 11:42 AM
30 points
47 comments1 min readLW link

AI safety tax dynamics

owencbOct 23, 2024, 12:18 PM
22 points
0 comments6 min readLW link
(strangecities.substack.com)

RA Bounty: Look­ing for feed­back on screen­play about AI Risk

WriterOct 26, 2023, 1:23 PM
30 points
6 comments1 min readLW link

A Rocket–In­ter­pretabil­ity Analogy

plexOct 21, 2024, 1:55 PM
152 points
31 comments1 min readLW link

Ban­kless Pod­cast: 159 - We’re All Gonna Die with Eliezer Yudkowsky

bayesedFeb 20, 2023, 4:42 PM
83 points
54 comments1 min readLW link
(www.youtube.com)

Chad Jones pa­per mod­el­ing AI and x-risk vs. growth

jasoncrawfordApr 26, 2023, 8:07 PM
39 points
7 comments2 min readLW link
(web.stanford.edu)

The first fu­ture and the best future

KatjaGraceApr 25, 2024, 6:40 AM
106 points
12 comments1 min readLW link
(worldspiritsockpuppet.com)

Ta­boo P(doom)

NathanBarnardFeb 3, 2023, 10:37 AM
14 points
10 comments1 min readLW link

Re­duc­ing LLM de­cep­tion at scale with self-other over­lap fine-tuning

Mar 13, 2025, 7:09 PM
151 points
40 comments5 min readLW link

How dan­ger­ous is hu­man-level AI?

Alex_AltairJun 10, 2022, 5:38 PM
21 points
4 comments8 min readLW link

Challenges with Break­ing into MIRI-Style Research

Chris_LeongJan 17, 2022, 9:23 AM
75 points
16 comments3 min readLW link

My Overview of the AI Align­ment Land­scape: Threat Models

Neel NandaDec 25, 2021, 11:07 PM
53 points
3 comments28 min readLW link

Thoughts on AGI safety from the top

jylin04Feb 2, 2022, 8:06 PM
36 points
3 comments32 min readLW link

[Question] Does the Struc­ture of an al­gorithm mat­ter for AI Risk and/​or con­scious­ness?

Logan ZoellnerDec 3, 2021, 6:31 PM
7 points
4 comments1 min readLW link

Drug ad­dicts and de­cep­tively al­igned agents—a com­par­a­tive analysis

JanNov 5, 2021, 9:42 PM
42 points
2 comments12 min readLW link
(universalprior.substack.com)

Fram­ing ap­proaches to al­ign­ment and the hard prob­lem of AI cognition

ryan_greenblattDec 15, 2021, 7:06 PM
16 points
15 comments27 min readLW link

Paradigm-build­ing from first prin­ci­ples: Effec­tive al­tru­ism, AGI, and alignment

Cameron BergFeb 8, 2022, 4:12 PM
29 points
5 comments14 min readLW link

On A List of Lethalities

ZviJun 13, 2022, 12:30 PM
165 points
50 comments54 min readLW link1 review
(thezvi.wordpress.com)

Why the tech­nolog­i­cal sin­gu­lar­ity by AGI may never happen

hippkeSep 3, 2021, 2:19 PM
5 points
14 comments1 min readLW link

En­vi­ron­men­tal Struc­ture Can Cause In­stru­men­tal Convergence

TurnTroutJun 22, 2021, 10:26 PM
71 points
43 comments16 min readLW link
(arxiv.org)

[Question] Con­di­tional on the first AGI be­ing al­igned cor­rectly, is a good out­come even still likely?

iamthouthouartiSep 6, 2021, 5:30 PM
2 points
1 comment1 min readLW link

Re­view of “Fun with +12 OOMs of Com­pute”

Mar 28, 2021, 2:55 PM
65 points
21 comments8 min readLW link1 review

AI Safety is Drop­ping the Ball on Clown Attacks

trevorOct 22, 2023, 8:09 PM
73 points
82 comments34 min readLW link

April drafts

AI ImpactsApr 1, 2021, 6:10 PM
49 points
2 comments1 min readLW link
(aiimpacts.org)

Bayeswatch 7: Wildfire

lsusrSep 8, 2021, 5:35 AM
51 points
6 comments3 min readLW link

Ap­ply to lead a pro­ject dur­ing the next vir­tual AI Safety Camp

Sep 13, 2023, 1:29 PM
19 points
0 comments5 min readLW link
(aisafety.camp)

In­for­ma­tion war­fare his­tor­i­cally re­volved around hu­man conduits

trevorAug 28, 2023, 6:54 PM
37 points
7 comments3 min readLW link

The In­ner Align­ment Problem

Jun 4, 2019, 1:20 AM
104 points
17 comments13 min readLW link

Clar­ify­ing “What failure looks like”

Sam ClarkeSep 20, 2020, 8:40 PM
97 points
14 comments17 min readLW link

Re­laxed ad­ver­sar­ial train­ing for in­ner alignment

evhubSep 10, 2019, 11:03 PM
69 points
27 comments27 min readLW link

Con­di­tions for Mesa-Optimization

Jun 1, 2019, 8:52 PM
84 points
48 comments12 min readLW link

[Book Re­view] “The Align­ment Prob­lem” by Brian Christian

lsusrSep 20, 2021, 6:36 AM
72 points
16 comments6 min readLW link

Con­ti­nu­ity Assumptions

Jan_KulveitJun 13, 2022, 9:31 PM
44 points
13 comments4 min readLW link

Dou­glas Hofs­tadter changes his mind on Deep Learn­ing & AI risk (June 2023)?

gwernJul 3, 2023, 12:48 AM
426 points
54 comments7 min readLW link
(www.youtube.com)

I Think Eliezer Should Go on Glenn Beck

Lao MeinJun 30, 2023, 3:12 AM
29 points
21 comments1 min readLW link

Catas­trophic Risks from AI #5: Rogue AIs

Jun 27, 2023, 10:06 PM
15 points
0 comments22 min readLW link
(arxiv.org)

An un­al­igned benchmark

paulfchristianoNov 17, 2018, 3:51 PM
31 points
0 comments9 min readLW link

Ten Levels of AI Align­ment Difficulty

Sammy MartinJul 3, 2023, 8:20 PM
130 points
24 comments12 min readLW link1 review

Ar­tifi­cial In­tel­li­gence: A Modern Ap­proach (4th edi­tion) on the Align­ment Problem

Zack_M_DavisSep 17, 2020, 2:23 AM
72 points
12 comments5 min readLW link
(aima.cs.berkeley.edu)

Thoughts on shar­ing in­for­ma­tion about lan­guage model capabilities

paulfchristianoJul 31, 2023, 4:04 PM
210 points
44 comments11 min readLW link1 review

Risks from Learned Op­ti­miza­tion: Introduction

May 31, 2019, 11:44 PM
187 points
42 comments12 min readLW link3 reviews

De­cep­tive Alignment

Jun 5, 2019, 8:16 PM
118 points
20 comments17 min readLW link

Risks from AI Overview: Summary

Aug 18, 2023, 1:21 AM
25 points
1 comment13 min readLW link
(www.safe.ai)

Paper: On mea­sur­ing situ­a­tional aware­ness in LLMs

Sep 4, 2023, 12:54 PM
109 points
16 comments5 min readLW link
(arxiv.org)

How difficult is AI Align­ment?

Sammy MartinSep 13, 2024, 3:47 PM
44 points
6 comments23 min readLW link

Microdooms averted by work­ing on AI Safety

Nikola JurkovicSep 17, 2023, 9:46 PM
34 points
3 comments3 min readLW link
(forum.effectivealtruism.org)

Thoughts on Robin Han­son’s AI Im­pacts interview

Steven ByrnesNov 24, 2019, 1:40 AM
25 points
3 comments7 min readLW link

AISN #24: Kiss­inger Urges US-China Co­op­er­a­tion on AI, China’s New AI Law, US Ex­port Con­trols, In­ter­na­tional In­sti­tu­tions, and Open Source AI

Oct 18, 2023, 5:06 PM
14 points
0 comments6 min readLW link
(newsletter.safe.ai)

[Question] Sugges­tions of posts on the AF to review

adamShimiFeb 16, 2021, 12:40 PM
56 points
20 comments1 min readLW link

Be­hav­ioral Suffi­cient Statis­tics for Goal-Directedness

adamShimiMar 11, 2021, 3:01 PM
21 points
12 comments9 min readLW link

Orthog­o­nal­ity is expensive

berenApr 3, 2023, 10:20 AM
43 points
9 comments3 min readLW link

Rogue AGI Em­bod­ies Valuable In­tel­lec­tual Property

Jun 3, 2021, 8:37 PM
71 points
9 comments3 min readLW link

Ap­proaches to gra­di­ent hacking

adamShimiAug 14, 2021, 3:16 PM
16 points
8 comments8 min readLW link

[Question] What are good al­ign­ment con­fer­ence pa­pers?

adamShimiAug 28, 2021, 1:35 PM
12 points
2 comments1 min readLW link

In­ter­view with Skynet

lsusrSep 30, 2021, 2:20 AM
49 points
1 comment2 min readLW link

Epistemic Strate­gies of Safety-Ca­pa­bil­ities Tradeoffs

adamShimiOct 22, 2021, 8:22 AM
5 points
0 comments6 min readLW link

Us­ing Brain-Com­puter In­ter­faces to get more data for AI alignment

RobboNov 7, 2021, 12:00 AM
43 points
10 comments7 min readLW link

Ap­pli­ca­tions for AI Safety Camp 2022 Now Open!

adamShimiNov 17, 2021, 9:42 PM
47 points
3 comments1 min readLW link

Sum­mary of the Acausal At­tack Is­sue for AIXI

DiffractorDec 13, 2021, 8:16 AM
12 points
6 comments4 min readLW link

My Overview of the AI Align­ment Land­scape: A Bird’s Eye View

Neel NandaDec 15, 2021, 11:44 PM
127 points
9 comments15 min readLW link

More Is Differ­ent for AI

jsteinhardtJan 4, 2022, 7:30 PM
140 points
24 comments3 min readLW link1 review
(bounded-regret.ghost.io)

Reli­a­bil­ity, Se­cu­rity, and AI risk: Notes from in­fosec text­book chap­ter 1

AkashApr 7, 2023, 3:47 PM
34 points
1 comment4 min readLW link

Paradigm-build­ing: Introduction

Cameron BergFeb 8, 2022, 12:06 AM
28 points
0 comments2 min readLW link

[Question] Would (my­opic) gen­eral pub­lic good pro­duc­ers sig­nifi­cantly ac­cel­er­ate the de­vel­op­ment of AGI?

mako yassMar 2, 2022, 11:47 PM
25 points
10 comments1 min readLW link

AMA Con­jec­ture, A New Align­ment Startup

adamShimiApr 9, 2022, 9:43 AM
47 points
42 comments1 min readLW link

Why I’m Wor­ried About AI

peterbarnettMay 23, 2022, 9:13 PM
22 points
2 comments12 min readLW link

Distil­led—AGI Safety from First Principles

Harrison GMay 29, 2022, 12:57 AM
11 points
1 comment14 min readLW link

I’m try­ing out “as­ter­oid mind­set”

Alex_AltairJun 3, 2022, 1:35 PM
90 points
5 comments4 min readLW link

A Quick Guide to Con­fronting Doom

RubyApr 13, 2022, 7:30 PM
240 points
33 comments2 min readLW link

[Question] Has there been any work on at­tempt­ing to use Pas­cal’s Mug­ging to make an AGI be­have?

Chris_LeongJun 15, 2022, 8:33 AM
7 points
17 comments1 min readLW link

Con­fu­sion about neu­ro­science/​cog­ni­tive sci­ence as a dan­ger for AI Alignment

Samuel NellessenJun 22, 2022, 5:59 PM
2 points
1 comment3 min readLW link
(snellessen.com)

Con­fu­sions in My Model of AI Risk

peterbarnettJul 7, 2022, 1:05 AM
22 points
9 comments5 min readLW link

Neu­ro­science and Alignment

Garrett BakerMar 18, 2024, 9:09 PM
40 points
25 comments2 min readLW link

Top les­son from GPT: we will prob­a­bly de­stroy hu­man­ity “for the lulz” as soon as we are able.

ShmiApr 16, 2023, 8:27 PM
63 points
28 comments1 min readLW link

The al­ign­ment prob­lem from a deep learn­ing perspective

Richard_NgoAug 10, 2022, 10:46 PM
107 points
15 comments27 min readLW link1 review

A Com­mon-Sense Case For Mu­tu­ally-Misal­igned AGIs Ally­ing Against Humans

Thane RuthenisDec 17, 2023, 8:28 PM
29 points
7 comments11 min readLW link

Cortés, AI Risk, and the Dy­nam­ics of Com­pet­ing Conquerors

James_MillerJan 2, 2024, 4:37 PM
14 points
2 comments3 min readLW link

Refine’s Third Blog Post Day/​Week

adamShimiSep 17, 2022, 5:03 PM
18 points
0 comments1 min readLW link

A con­ver­sa­tion about Katja’s coun­ter­ar­gu­ments to AI risk

Oct 18, 2022, 6:40 PM
43 points
9 comments33 min readLW link

Mechanism De­sign for AI Safety—Read­ing Group Curriculum

Rubi J. HudsonOct 25, 2022, 3:54 AM
15 points
3 comments1 min readLW link

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [~monthly thread]

Robert MilesNov 1, 2022, 11:23 PM
68 points
105 comments2 min readLW link

Talk­ing pub­li­cly about AI risk

Jan_KulveitApr 21, 2023, 11:28 AM
180 points
9 comments6 min readLW link

Pod­cast: Shoshan­nah Tekofsky on skil­ling up in AI safety, vis­it­ing Berkeley, and de­vel­op­ing novel re­search ideas

AkashNov 25, 2022, 8:47 PM
37 points
2 comments9 min readLW link

Catas­trophic Risks from AI #4: Or­ga­ni­za­tional Risks

Jun 26, 2023, 7:36 PM
23 points
0 comments21 min readLW link
(arxiv.org)

A Nar­row Path: a plan to deal with AI ex­tinc­tion risk

Oct 7, 2024, 1:02 PM
73 points
12 comments2 min readLW link
(www.narrowpath.co)

Went­worth and Larsen on buy­ing time

Jan 9, 2023, 9:31 PM
74 points
6 comments12 min readLW link

[Linkpost] TIME ar­ti­cle: Deep­Mind’s CEO Helped Take AI Main­stream. Now He’s Urg­ing Caution

AkashJan 21, 2023, 4:51 PM
58 points
2 comments3 min readLW link
(time.com)

Talk to me about your sum­mer/​ca­reer plans

AkashJan 31, 2023, 6:29 PM
31 points
3 comments2 min readLW link

How evals might (or might not) pre­vent catas­trophic risks from AI

AkashFeb 7, 2023, 8:16 PM
45 points
0 comments9 min readLW link

Refram­ing the bur­den of proof: Com­pa­nies should prove that mod­els are safe (rather than ex­pect­ing au­di­tors to prove that mod­els are dan­ger­ous)

AkashApr 25, 2023, 6:49 PM
27 points
11 comments3 min readLW link
(childrenoficarus.substack.com)

4 ways to think about de­moc­ra­tiz­ing AI [GovAI Linkpost]

AkashFeb 13, 2023, 6:06 PM
24 points
4 comments1 min readLW link
(www.governance.ai)

AI #1: Syd­ney and Bing

ZviFeb 21, 2023, 2:00 PM
171 points
45 comments61 min readLW link1 review
(thezvi.wordpress.com)

Full Tran­script: Eliezer Yud­kowsky on the Ban­kless podcast

Feb 23, 2023, 12:34 PM
138 points
89 comments75 min readLW link

In­cen­tives and Selec­tion: A Miss­ing Frame From AI Threat Dis­cus­sions?

DragonGodFeb 26, 2023, 1:18 AM
11 points
16 comments2 min readLW link

Evil au­to­com­plete: Ex­is­ten­tial Risk and Next-To­ken Predictors

YitzFeb 28, 2023, 8:47 AM
9 points
3 comments5 min readLW link

How Would an Utopia-Max­i­mizer Look Like?

Thane RuthenisDec 20, 2023, 8:01 PM
32 points
23 comments10 min readLW link

We’re Not Ready: thoughts on “paus­ing” and re­spon­si­ble scal­ing policies

HoldenKarnofskyOct 27, 2023, 3:19 PM
200 points
33 comments8 min readLW link

[Linkpost] Scott Alexan­der re­acts to OpenAI’s lat­est post

AkashMar 11, 2023, 10:24 PM
27 points
0 comments5 min readLW link
(astralcodexten.substack.com)

Plan for mediocre al­ign­ment of brain-like [model-based RL] AGI

Steven ByrnesMar 13, 2023, 2:11 PM
67 points
25 comments12 min readLW link

What would a com­pute mon­i­tor­ing plan look like? [Linkpost]

AkashMar 26, 2023, 7:33 PM
158 points
10 comments4 min readLW link
(arxiv.org)

AISN #25: White House Ex­ec­u­tive Order on AI, UK AI Safety Sum­mit, and Progress on Vol­un­tary Eval­u­a­tions of AI Risks

Dan HOct 31, 2023, 7:34 PM
35 points
1 comment6 min readLW link
(newsletter.safe.ai)

[Question] Should peo­ple build pro­duc­ti­za­tions of open source AI mod­els?

lcNov 2, 2023, 1:26 AM
23 points
0 comments1 min readLW link

“Tak­ing AI Risk Se­ri­ously” (thoughts by Critch)

RaemonJan 29, 2018, 9:27 AM
110 points
68 comments13 min readLW link

Cri­tiquing “What failure looks like”

Grue_SlinkyDec 27, 2019, 11:59 PM
35 points
6 comments3 min readLW link

The AI Revolu­tion in Biology

Roman LeventovMay 26, 2024, 9:30 AM
13 points
0 comments1 min readLW link
(www.cognitiverevolution.ai)

Will GPT-5 be able to self-im­prove?

Nathan Helm-BurgerApr 29, 2023, 5:34 PM
18 points
22 comments3 min readLW link

“Why can’t you just turn it off?”

RokoNov 19, 2023, 2:46 PM
48 points
25 comments1 min readLW link

Catas­trophic Risks from AI #2: Mal­i­cious Use

Jun 22, 2023, 5:10 PM
38 points
1 comment17 min readLW link
(arxiv.org)

Mis­tral Large 2 (123B) ex­hibits al­ign­ment faking

Mar 27, 2025, 3:39 PM
77 points
4 comments13 min readLW link

A plea for solu­tion­ism on AI safety

jasoncrawfordJun 9, 2023, 4:29 PM
72 points
6 comments6 min readLW link
(rootsofprogress.org)

Min­i­mum Vi­able Exterminator

Richard HorvathMay 29, 2023, 4:32 PM
14 points
5 comments5 min readLW link

The Con­trol Prob­lem: Un­solved or Un­solv­able?

RemmeltJun 2, 2023, 3:42 PM
55 points
46 comments14 min readLW link

Catas­trophic Risks from AI #1: Introduction

Jun 22, 2023, 5:09 PM
40 points
1 comment5 min readLW link
(arxiv.org)

My re­search agenda in agent foundations

Alex_AltairJun 28, 2023, 6:00 PM
72 points
9 comments11 min readLW link

Hands-On Ex­pe­rience Is Not Magic

Thane RuthenisMay 27, 2023, 4:57 PM
21 points
14 comments5 min readLW link

Ques­tions about Con­je­cure’s CoEm proposal

Mar 9, 2023, 7:32 PM
51 points
4 comments2 min readLW link

Against ubiquitous al­ign­ment taxes

berenMar 6, 2023, 7:50 PM
57 points
10 comments2 min readLW link

Why and When In­ter­pretabil­ity Work is Dangerous

Nicholas / Heather KrossMay 28, 2023, 12:27 AM
20 points
9 comments8 min readLW link
(www.thinkingmuchbetter.com)

AI Align­ment 2018-19 Review

Rohin ShahJan 28, 2020, 2:19 AM
126 points
6 comments35 min readLW link

Gra­di­ent Des­cent on the Hu­man Brain

Apr 1, 2024, 10:39 PM
59 points
5 comments2 min readLW link

Sen­sor Ex­po­sure can Com­pro­mise the Hu­man Brain in the 2020s

trevorOct 26, 2023, 3:31 AM
17 points
6 comments10 min readLW link

Towards the Oper­a­tional­iza­tion of Philos­o­phy & Wisdom

Thane RuthenisOct 28, 2024, 7:45 PM
20 points
2 comments33 min readLW link
(aiimpacts.org)

The Main Sources of AI Risk?

Mar 21, 2019, 6:28 PM
126 points
26 comments2 min readLW link

(4 min read) An in­tu­itive ex­pla­na­tion of the AI in­fluence situation

trevorJan 13, 2024, 5:34 PM
12 points
26 comments4 min readLW link

Eight Short Stud­ies On Excuses

Scott AlexanderApr 20, 2010, 11:01 PM
852 points
253 comments10 min readLW link

My mo­ti­va­tion and the­ory of change for work­ing in AI healthtech

Andrew_CritchOct 12, 2024, 12:36 AM
176 points
37 comments14 min readLW link

AI com­pa­nies aren’t re­ally us­ing ex­ter­nal evaluators

Zach Stein-PerlmanMay 24, 2024, 4:01 PM
242 points
15 comments4 min readLW link

Mak­ing al­ign­ment a law of the universe

jugginsFeb 25, 2025, 10:44 AM
0 points
3 comments15 min readLW link

Brain­storm­ing ad­di­tional AI risk re­duc­tion ideas

John_MaxwellJun 14, 2012, 7:55 AM
19 points
37 comments1 min readLW link

List your AI X-Risk cruxes!

Aryeh EnglanderApr 28, 2024, 6:26 PM
42 points
7 comments2 min readLW link

A guide to Iter­ated Am­plifi­ca­tion & Debate

Rafael HarthNov 15, 2020, 5:14 PM
75 points
12 comments15 min readLW link

AI Safety Newslet­ter #7: Dis­in­for­ma­tion, Gover­nance Recom­men­da­tions for AI labs, and Se­nate Hear­ings on AI

Dan HMay 23, 2023, 9:47 PM
25 points
0 comments6 min readLW link
(newsletter.safe.ai)

Ac­ti­va­tion ad­di­tions in a small resi­d­ual network

Garrett BakerMay 22, 2023, 8:28 PM
22 points
4 comments3 min readLW link

Why don’t sin­gu­lar­i­tar­i­ans bet on the cre­ation of AGI by buy­ing stocks?

John_MaxwellMar 11, 2020, 4:27 PM
43 points
21 comments4 min readLW link

Ac­ti­va­tion ad­di­tions in a sim­ple MNIST network

Garrett BakerMay 18, 2023, 2:49 AM
26 points
0 comments2 min readLW link

The Paris AI Anti-Safety Summit

ZviFeb 12, 2025, 2:00 PM
129 points
21 comments21 min readLW link
(thezvi.wordpress.com)

Difficul­ties in mak­ing pow­er­ful al­igned AI

DanielFilanMay 14, 2023, 8:50 PM
41 points
1 comment10 min readLW link
(danielfilan.com)

The prob­lem/​solu­tion ma­trix: Calcu­lat­ing the prob­a­bil­ity of AI safety “on the back of an en­velope”

John_MaxwellOct 20, 2019, 8:03 AM
22 points
4 comments2 min readLW link

The case for re­mov­ing al­ign­ment and ML re­search from the train­ing dataset

berenMay 30, 2023, 8:54 PM
48 points
8 comments5 min readLW link

Win­ners of AI Align­ment Awards Re­search Contest

Jul 13, 2023, 4:14 PM
115 points
4 comments12 min readLW link
(alignmentawards.com)

AI Safety Newslet­ter #5: Ge­offrey Hin­ton speaks out on AI risk, the White House meets with AI labs, and Tro­jan at­tacks on lan­guage models

Dan HMay 9, 2023, 3:26 PM
28 points
1 comment4 min readLW link
(newsletter.safe.ai)

Re­sponse to Oren Etz­ioni’s “How to know if ar­tifi­cial in­tel­li­gence is about to de­stroy civ­i­liza­tion”

Daniel KokotajloFeb 27, 2020, 6:10 PM
27 points
5 comments8 min readLW link

A Case for the Least For­giv­ing Take On Alignment

Thane RuthenisMay 2, 2023, 9:34 PM
100 points
85 comments22 min readLW link

We need (a lot) more rogue agent honeypots

OzyrusMar 23, 2025, 10:24 PM
36 points
11 comments4 min readLW link

[Question] How much do per­sonal bi­ases in risk as­sess­ment af­fect as­sess­ment of AI risks?

Gordon Seidoh WorleyMay 3, 2023, 6:12 AM
10 points
8 comments1 min readLW link

Will Ar­tifi­cial Su­per­in­tel­li­gence Kill Us?

James_MillerMay 23, 2023, 4:27 PM
33 points
2 comments22 min readLW link

My AI Model Delta Com­pared To Yudkowsky

johnswentworthJun 10, 2024, 4:12 PM
280 points
103 comments4 min readLW link

Grad­ual Disem­pow­er­ment, Shell Games and Flinches

Jan_KulveitFeb 2, 2025, 2:47 PM
124 points
36 comments6 min readLW link

Re­ply to Holden on ‘Tool AI’

Eliezer YudkowskyJun 12, 2012, 6:00 PM
152 points
356 comments17 min readLW link

What Failure Looks Like: Distill­ing the Discussion

Ben PaceJul 29, 2020, 9:49 PM
82 points
14 comments7 min readLW link

An­nounce­ment: AI al­ign­ment prize win­ners and next round

cousin_itJan 15, 2018, 2:33 PM
81 points
68 comments2 min readLW link

Uber Self-Driv­ing Crash

jefftkNov 7, 2019, 3:00 PM
109 points
1 comment2 min readLW link
(www.jefftk.com)

Stan­ford En­cy­clo­pe­dia of Philos­o­phy on AI ethics and superintelligence

Kaj_SotalaMay 2, 2020, 7:35 AM
43 points
19 comments7 min readLW link
(plato.stanford.edu)

Catas­tro­phe through Chaos

Marius HobbhahnJan 31, 2025, 2:19 PM
182 points
17 comments12 min readLW link

Grad­ual Disem­pow­er­ment: Sys­temic Ex­is­ten­tial Risks from In­cre­men­tal AI Development

Jan 30, 2025, 5:03 PM
159 points
52 comments2 min readLW link
(gradual-disempowerment.ai)

Recom­mender Align­ment for Lock-In Risk

alamertonMar 24, 2025, 12:56 PM
2 points
0 comments7 min readLW link

Sys­tems that can­not be un­safe can­not be safe

DavidmanheimMay 2, 2023, 8:53 AM
62 points
27 comments2 min readLW link

Pre­dictable up­dat­ing about AI risk

Joe CarlsmithMay 8, 2023, 9:53 PM
292 points
25 comments36 min readLW link1 review

Safety isn’t safety with­out a so­cial model (or: dis­pel­ling the myth of per se tech­ni­cal safety)

Andrew_CritchJun 14, 2024, 12:16 AM
357 points
38 comments4 min readLW link

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [May 2023]

steven0461May 8, 2023, 10:30 PM
33 points
44 comments2 min readLW link

AISafety.com – Re­sources for AI Safety

May 17, 2024, 3:57 PM
82 points
3 comments1 min readLW link

Wor­ri­some mi­s­un­der­stand­ing of the core is­sues with AI transition

Roman LeventovJan 18, 2024, 10:05 AM
5 points
2 comments4 min readLW link

AISN #35: Lob­by­ing on AI Reg­u­la­tion Plus, New Models from OpenAI and Google, and Le­gal Regimes for Train­ing on Copy­righted Data

May 16, 2024, 2:29 PM
2 points
3 comments6 min readLW link
(newsletter.safe.ai)

Let’s build a fire alarm for AGI

chaosmageMay 15, 2023, 9:16 AM
−1 points
0 comments2 min readLW link

AI Safety Newslet­ter #6: Ex­am­ples of AI safety progress, Yoshua Ben­gio pro­poses a ban on AI agents, and les­sons from nu­clear arms control

Dan HMay 16, 2023, 3:14 PM
31 points
0 comments6 min readLW link
(newsletter.safe.ai)

[Question] Why not con­strain wet­labs in­stead of AI?

Lone PineMar 21, 2023, 6:02 PM
15 points
10 comments1 min readLW link

The Sorry State of AI X-Risk Ad­vo­cacy, and Thoughts on Do­ing Better

Thane RuthenisFeb 21, 2025, 8:15 PM
148 points
52 comments6 min readLW link

Why AI Safety is Hard

Simon MöllerMar 22, 2023, 10:44 AM
1 point
0 comments6 min readLW link

Think­ing soberly about the con­text and con­se­quences of Friendly AI

Mitchell_PorterOct 16, 2012, 4:33 AM
21 points
39 comments1 min readLW link

Three Sto­ries for How AGI Comes Be­fore FAI

John_MaxwellSep 17, 2019, 11:26 PM
27 points
5 comments6 min readLW link

AGI Safety Liter­a­ture Re­view (Ever­itt, Lea & Hut­ter 2018)

Kaj_SotalaMay 4, 2018, 8:56 AM
14 points
1 comment1 min readLW link
(arxiv.org)

The Fu­sion Power Gen­er­a­tor Scenario

johnswentworthAug 8, 2020, 6:31 PM
155 points
30 comments3 min readLW link

The Over­ton Win­dow widens: Ex­am­ples of AI risk in the media

AkashMar 23, 2023, 5:10 PM
107 points
24 comments6 min readLW link

Work on Se­cu­rity In­stead of Friendli­ness?

Wei DaiJul 21, 2012, 6:28 PM
71 points
107 comments2 min readLW link

Re­quest: stop ad­vanc­ing AI capabilities

So8resMay 26, 2023, 5:42 PM
154 points
24 comments1 min readLW link

De­bate on In­stru­men­tal Con­ver­gence be­tween LeCun, Rus­sell, Ben­gio, Zador, and More

Ben PaceOct 4, 2019, 4:08 AM
221 points
61 comments15 min readLW link2 reviews

Against “ar­gu­ment from over­hang risk”

RobertMMay 16, 2024, 4:44 AM
30 points
11 comments5 min readLW link

In­vi­ta­tion to lead a pro­ject at AI Safety Camp (Vir­tual Edi­tion, 2025)

Aug 23, 2024, 2:18 PM
17 points
2 comments4 min readLW link

A moral back­lash against AI will prob­a­bly slow down AGI development

geoffreymillerJun 7, 2023, 8:39 PM
51 points
10 comments14 min readLW link

[AN #80]: Why AI risk might be solved with­out ad­di­tional in­ter­ven­tion from longtermists

Rohin ShahJan 2, 2020, 6:20 PM
36 points
95 comments10 min readLW link
(mailchi.mp)

The Short­est Path Between Scylla and Charybdis

Thane RuthenisDec 18, 2023, 8:08 PM
50 points
8 comments5 min readLW link

AI Safety “Suc­cess Sto­ries”

Wei DaiSep 7, 2019, 2:54 AM
126 points
27 comments4 min readLW link1 review

Catas­trophic Risks from AI #3: AI Race

Jun 23, 2023, 7:21 PM
18 points
9 comments29 min readLW link
(arxiv.org)

Paus­ing AI is Pos­i­tive Ex­pected Value

LironMar 10, 2024, 5:10 PM
9 points
2 comments3 min readLW link
(twitter.com)

Catas­trophic Risks from AI #6: Dis­cus­sion and FAQ

Jun 27, 2023, 11:23 PM
24 points
1 comment13 min readLW link
(arxiv.org)

Levels of safety for AI and other technologies

jasoncrawfordJun 28, 2023, 6:35 PM
16 points
0 comments2 min readLW link
(rootsofprogress.org)

De­tach­ment vs at­tach­ment [AI risk and men­tal health]

Neil Jan 15, 2024, 12:41 AM
14 points
4 comments3 min readLW link

Four lenses on AI risks

jasoncrawfordMar 28, 2023, 9:52 PM
23 points
5 comments3 min readLW link
(rootsofprogress.org)

OpenAI: Fallout

ZviMay 28, 2024, 1:20 PM
204 points
25 comments36 min readLW link
(thezvi.wordpress.com)

“If we go ex­tinct due to mis­al­igned AI, at least na­ture will con­tinue, right? … right?”

plexMay 18, 2024, 2:09 PM
47 points
23 comments2 min readLW link
(aisafety.info)

If you wish to make an ap­ple pie, you must first be­come dic­ta­tor of the universe

jasoncrawfordJul 5, 2023, 6:14 PM
27 points
9 comments13 min readLW link
(rootsofprogress.org)

Kessler’s Se­cond Syndrome

Jesse HooglandJan 26, 2025, 7:04 AM
69 points
2 comments3 min readLW link

A shift in ar­gu­ments for AI risk

Richard_NgoMay 28, 2019, 1:47 PM
32 points
7 comments1 min readLW link
(fragile-credences.github.io)

AI Doom Is Not (Only) Disjunctive

NickGabsMar 30, 2023, 1:42 AM
12 points
0 comments5 min readLW link

Adam Smith Meets AI Doomers

James_MillerJan 31, 2024, 3:53 PM
34 points
10 comments5 min readLW link

[Question] Is this a Pivotal Weak Act? Creat­ing bac­te­ria that de­com­pose metal

doomyeserSep 11, 2024, 6:07 PM
9 points
9 comments3 min readLW link

An overview of 11 pro­pos­als for build­ing safe ad­vanced AI

evhubMay 29, 2020, 8:38 PM
220 points
36 comments38 min readLW link2 reviews

The Mask Comes Off: A Trio of Tales

ZviFeb 14, 2025, 3:30 PM
81 points
1 comment13 min readLW link
(thezvi.wordpress.com)

Risks from Learned Op­ti­miza­tion: Con­clu­sion and Re­lated Work

Jun 7, 2019, 7:53 PM
82 points
5 comments6 min readLW link

Brainrot

Jesse HooglandJan 26, 2025, 5:35 AM
42 points
0 comments3 min readLW link

AI risk hub in Sin­ga­pore?

Daniel KokotajloOct 29, 2020, 11:45 AM
58 points
18 comments4 min readLW link

Wizards and prophets of AI [draft for com­ment]

jasoncrawfordMar 31, 2023, 8:22 PM
16 points
11 comments6 min readLW link

Con­sider the hum­ble rock (or: why the dumb thing kills you)

pleiotrothJul 4, 2024, 1:54 PM
62 points
11 comments4 min readLW link

The Ris­ing Sea

Jesse HooglandJan 25, 2025, 8:48 PM
90 points
2 comments2 min readLW link

[Question] Mea­sure of com­plex­ity al­lowed by the laws of the uni­verse and rel­a­tive the­ory?

dr_sSep 7, 2023, 12:21 PM
8 points
22 comments1 min readLW link

The ba­sic rea­sons I ex­pect AGI ruin

Rob BensingerApr 18, 2023, 3:37 AM
189 points
73 comments14 min readLW link

Pro­jects I would like to see (pos­si­bly at AI Safety Camp)

Linda LinseforsSep 27, 2023, 9:27 PM
22 points
12 comments4 min readLW link

AI al­ign­ment as a trans­la­tion problem

Roman LeventovFeb 5, 2024, 2:14 PM
22 points
2 comments3 min readLW link

AISN #23: New OpenAI Models, News from An­thropic, and Rep­re­sen­ta­tion Engineering

Dan HOct 4, 2023, 5:37 PM
15 points
2 comments5 min readLW link
(newsletter.safe.ai)

An­nounc­ing Hu­man-al­igned AI Sum­mer School

May 22, 2024, 8:55 AM
50 points
0 comments1 min readLW link
(humanaligned.ai)

The AI Safety Game (UPDATED)

Daniel KokotajloDec 5, 2020, 10:27 AM
45 points
10 comments3 min readLW link

My guess at Con­jec­ture’s vi­sion: trig­ger­ing a nar­ra­tive bifurcation

Alexandre VariengienFeb 6, 2024, 7:10 PM
75 points
12 comments16 min readLW link

[Question] What does it look like for AI to sig­nifi­cantly im­prove hu­man co­or­di­na­tion, be­fore su­per­in­tel­li­gence?

Bird ConceptJan 15, 2024, 7:22 PM
22 points
2 comments1 min readLW link

My dis­agree­ments with “AGI ruin: A List of Lethal­ities”

Noosphere89Sep 15, 2024, 5:22 PM
36 points
46 comments18 min readLW link

Google’s Eth­i­cal AI team and AI Safety

magfrumpFeb 20, 2021, 9:42 AM
12 points
16 comments7 min readLW link

Disen­tan­gling ar­gu­ments for the im­por­tance of AI safety

Richard_NgoJan 21, 2019, 12:41 PM
133 points
23 comments8 min readLW link

Most Peo­ple Don’t Real­ize We Have No Idea How Our AIs Work

Thane RuthenisDec 21, 2023, 8:02 PM
159 points
42 comments1 min readLW link

25 Min Talk on Me­taEth­i­cal.AI with Ques­tions from Stu­art Armstrong

June KuApr 29, 2021, 3:38 PM
21 points
7 comments1 min readLW link

Less Real­is­tic Tales of Doom

Mark XuMay 6, 2021, 11:01 PM
113 points
13 comments4 min readLW link

Alex Turner’s Re­search, Com­pre­hen­sive In­for­ma­tion Gathering

adamShimiJun 23, 2021, 9:44 AM
15 points
3 comments3 min readLW link

Sam Alt­man and Ezra Klein on the AI Revolution

Zack_M_DavisJun 27, 2021, 4:53 AM
38 points
17 comments1 min readLW link
(www.nytimes.com)

Com­plex Sys­tems are Hard to Control

jsteinhardtApr 4, 2023, 12:00 AM
42 points
5 comments10 min readLW link
(bounded-regret.ghost.io)

New sur­vey: 46% of Amer­i­cans are con­cerned about ex­tinc­tion from AI; 69% sup­port a six-month pause in AI development

AkashApr 5, 2023, 1:26 AM
46 points
9 comments1 min readLW link
(today.yougov.com)

Gaia Net­work: a prac­ti­cal, in­cre­men­tal path­way to Open Agency Architecture

Dec 20, 2023, 5:11 PM
22 points
8 comments16 min readLW link

Is progress in ML-as­sisted the­o­rem-prov­ing benefi­cial?

mako yassSep 28, 2021, 1:54 AM
11 points
3 comments1 min readLW link

Epistemic Strate­gies of Selec­tion Theorems

adamShimiOct 18, 2021, 8:57 AM
33 points
1 comment12 min readLW link

Cur­rent AIs Provide Nearly No Data Rele­vant to AGI Alignment

Thane RuthenisDec 15, 2023, 8:16 PM
129 points
157 comments8 min readLW link1 review

Us­ing blin­ders to help you see things for what they are

Adam ZernerNov 11, 2021, 7:07 AM
13 points
2 comments2 min readLW link

My cur­rent un­cer­tain­ties re­gard­ing AI, al­ign­ment, and the end of the world

dominicqNov 14, 2021, 2:08 PM
2 points
3 comments2 min readLW link

Ngo and Yud­kowsky on al­ign­ment difficulty

Nov 15, 2021, 8:31 PM
254 points
151 comments99 min readLW link1 review

AI Safety Memes Wiki

Jul 24, 2024, 6:53 PM
34 points
1 comment1 min readLW link
(aisafety.info)

Some ab­stract, non-tech­ni­cal rea­sons to be non-max­i­mally-pes­simistic about AI alignment

Rob BensingerDec 12, 2021, 2:08 AM
70 points
35 comments7 min readLW link

[Talk tran­script] What “struc­ture” is and why it matters

Alex_AltairJul 25, 2024, 3:49 PM
23 points
0 comments5 min readLW link
(www.youtube.com)

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [April 2023]

steven0461Apr 8, 2023, 4:21 AM
57 points
88 comments2 min readLW link

AI Fire Alarm Scenarios

PeterMcCluskeyDec 28, 2021, 2:20 AM
10 points
1 comment6 min readLW link
(www.bayesianinvestor.com)

[Question] Did AI pi­o­neers not worry much about AI risks?

lisperatiFeb 9, 2020, 7:58 PM
42 points
9 comments1 min readLW link

Why Billion­aires Will Not Sur­vive an AGI Ex­tinc­tion Event

funnyfrancoMar 15, 2025, 6:08 AM
7 points
0 comments14 min readLW link

n=3 AI Risk Quick Math and Reasoning

lionhearted (Sebastian Marshall)Apr 7, 2023, 8:27 PM
6 points
3 comments4 min readLW link

All images from the WaitButWhy se­quence on AI

trevorApr 8, 2023, 7:36 AM
73 points
5 comments2 min readLW link

The Plan − 2023 Version

johnswentworthDec 29, 2023, 11:34 PM
151 points
40 comments31 min readLW link1 review

How I Formed My Own Views About AI Safety

Neel NandaFeb 27, 2022, 6:50 PM
65 points
6 comments13 min readLW link
(www.neelnanda.io)

“warn­ing about ai doom” is also “an­nounc­ing ca­pa­bil­ities progress to noobs”

the gears to ascensionApr 8, 2023, 11:42 PM
23 points
5 comments3 min readLW link

It Looks Like You’re Try­ing To Take Over The World

gwernMar 9, 2022, 4:35 PM
407 points
120 comments1 min readLW link1 review
(www.gwern.net)

[RETRACTED] It’s time for EA lead­er­ship to pull the short-timelines fire alarm.

Not RelevantApr 8, 2022, 4:07 PM
115 points
166 comments4 min readLW link

AI Safety Newslet­ter #1 [CAIS Linkpost]

Apr 10, 2023, 8:18 PM
45 points
0 comments4 min readLW link
(newsletter.safe.ai)

Twit­ter thread on AI takeover scenarios

Richard_NgoJul 31, 2024, 12:24 AM
37 points
0 comments2 min readLW link
(x.com)

Up­grad­ing the AI Safety Community

Dec 16, 2023, 3:34 PM
42 points
9 comments42 min readLW link

The­ory of Change for AI Safety Camp

Linda LinseforsJan 22, 2025, 10:07 PM
35 points
3 comments7 min readLW link

Where’s the foom?

Fergus FettesApr 11, 2023, 3:50 PM
34 points
27 comments2 min readLW link

Perform Tractable Re­search While Avoid­ing Ca­pa­bil­ities Ex­ter­nal­ities [Prag­matic AI Safety #4]

May 30, 2022, 8:25 PM
51 points
3 comments25 min readLW link

Re­sponse to Aschen­bren­ner’s “Si­tu­a­tional Aware­ness”

Rob BensingerJun 6, 2024, 10:57 PM
194 points
27 comments3 min readLW link

[Link] Sarah Con­stantin: “Why I am Not An AI Doomer”

lbThingrbApr 12, 2023, 1:52 AM
61 points
13 comments1 min readLW link
(sarahconstantin.substack.com)

Some dis­junc­tive rea­sons for ur­gency on AI risk

Wei DaiFeb 15, 2019, 8:43 PM
36 points
24 comments1 min readLW link

Why I don’t be­lieve in doom

mukashiJun 7, 2022, 11:49 PM
6 points
30 comments4 min readLW link

Paper: Tell, Don’t Show- Declar­a­tive facts in­fluence how LLMs generalize

Dec 19, 2023, 7:14 PM
45 points
4 comments6 min readLW link
(arxiv.org)

Align­ment Risk Doesn’t Re­quire Superintelligence

JustisMillsJun 15, 2022, 3:12 AM
35 points
4 comments2 min readLW link

Are we there yet?

theflowerpotJun 20, 2022, 11:19 AM
2 points
2 comments1 min readLW link

Fi­nan­cial Times: We must slow down the race to God-like AI

trevorApr 13, 2023, 7:55 PM
113 points
17 comments16 min readLW link
(www.ft.com)

AI #76: Six Shorts Sto­ries About OpenAI

ZviAug 8, 2024, 1:50 PM
53 points
10 comments48 min readLW link
(thezvi.wordpress.com)

[Linkpost] Ex­is­ten­tial Risk Anal­y­sis in Em­piri­cal Re­search Papers

Dan HJul 2, 2022, 12:09 AM
40 points
0 comments1 min readLW link
(arxiv.org)

In­tro­duc­ing AI Lab Watch

Zach Stein-PerlmanApr 30, 2024, 5:00 PM
224 points
30 comments1 min readLW link
(ailabwatch.org)

The Align­ment Problem

lsusrJul 11, 2022, 3:03 AM
46 points
18 comments3 min readLW link

We don’t want to post again “This might be the last AI Safety Camp”

Jan 21, 2025, 12:03 PM
36 points
17 comments1 min readLW link
(manifund.org)

AI Takeover Sce­nario with Scaled LLMs

simeon_cApr 16, 2023, 11:28 PM
42 points
15 comments8 min readLW link

Sur­vey: What (de)mo­ti­vates you about AI risk?

Daniel_FriedrichAug 3, 2022, 7:17 PM
1 point
0 comments1 min readLW link
(forms.gle)

Non-Ad­ver­sar­ial Good­hart and AI Risks

DavidmanheimMar 27, 2018, 1:39 AM
22 points
11 comments6 min readLW link

Deep­Mind al­ign­ment team opinions on AGI ruin arguments

VikaAug 12, 2022, 9:06 PM
395 points
37 comments14 min readLW link1 review

Against most, but not all, AI risk analogies

Matthew BarnettJan 14, 2024, 3:36 AM
63 points
41 comments7 min readLW link

Thoughts on ‘List of Lethal­ities’

Alex Lawsen Aug 17, 2022, 6:33 PM
27 points
0 comments10 min readLW link

The Dis­solu­tion of AI Safety

RokoDec 12, 2024, 10:34 AM
8 points
44 comments1 min readLW link
(www.transhumanaxiology.com)

Beyond Hyperanthropomorphism

PointlessOneAug 21, 2022, 5:55 PM
3 points
17 comments1 min readLW link
(studio.ribbonfarm.com)

Assess­ment of AI safety agen­das: think about the down­side risk

Roman LeventovDec 19, 2023, 9:00 AM
13 points
1 comment1 min readLW link

All AGI safety ques­tions wel­come (es­pe­cially ba­sic ones) [Sept 2022]

plexSep 8, 2022, 11:56 AM
22 points
48 comments3 min readLW link

Do not delete your mis­al­igned AGI.

mako yassMar 24, 2024, 9:37 PM
62 points
13 comments3 min readLW link

An­nounc­ing AISIC 2022 - the AI Safety Is­rael Con­fer­ence, Oc­to­ber 19-20

DavidmanheimSep 21, 2022, 7:32 PM
13 points
0 comments1 min readLW link

Re­sults from the lan­guage model hackathon

Esben KranOct 10, 2022, 8:29 AM
22 points
1 comment4 min readLW link

A case for AI al­ign­ment be­ing difficult

jessicataDec 31, 2023, 7:55 PM
105 points
58 comments15 min readLW link1 review
(unstableontology.com)

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [~monthly thread]

Jan 26, 2023, 9:01 PM
39 points
81 comments2 min readLW link

Deep­Mind and Google Brain are merg­ing [Linkpost]

AkashApr 20, 2023, 6:47 PM
55 points
5 comments1 min readLW link
(www.deepmind.com)

Some con­cep­tual high­lights from “Disjunc­tive Sce­nar­ios of Catas­trophic AI Risk”

Kaj_SotalaFeb 12, 2018, 12:30 PM
45 points
4 comments6 min readLW link
(kajsotala.fi)

Six AI Risk/​Strat­egy Ideas

Wei DaiAug 27, 2019, 12:40 AM
73 points
17 comments4 min readLW link1 review

Drexler on AI Risk

PeterMcCluskeyFeb 1, 2019, 5:11 AM
35 points
10 comments9 min readLW link
(www.bayesianinvestor.com)

The strat­egy-steal­ing assumption

paulfchristianoSep 16, 2019, 3:23 PM
86 points
53 comments12 min readLW link3 reviews

What I Learned Run­ning Refine

adamShimiNov 24, 2022, 2:49 PM
108 points
5 comments4 min readLW link

[Question] Why don’t quan­tiliz­ers also cut off the up­per end of the dis­tri­bu­tion?

Alex_AltairMay 15, 2023, 1:40 AM
25 points
2 comments1 min readLW link

Slow mo­tion videos as AI risk in­tu­ition pumps

Andrew_CritchJun 14, 2022, 7:31 PM
241 points
41 comments2 min readLW link1 review

[Question] What do you mean with ‘al­ign­ment is solv­able in prin­ci­ple’?

RemmeltJan 17, 2025, 3:03 PM
3 points
9 comments1 min readLW link

Ex­perts’ AI timelines are longer than you have been told?

Vasco GriloJan 16, 2025, 6:03 PM
10 points
4 comments3 min readLW link
(bayes.net)

AI Align­ment Meme Viruses

RationalDinoJan 15, 2025, 3:55 PM
4 points
0 comments2 min readLW link

LLMs seem (rel­a­tively) safe

JustisMillsApr 25, 2024, 10:13 PM
53 points
24 comments7 min readLW link
(justismills.substack.com)

A prob­lem shared by many differ­ent al­ign­ment targets

ThomasCederborgJan 15, 2025, 2:22 PM
12 points
16 comments36 min readLW link

False Pos­i­tives in En­tity-Level Hal­lu­ci­na­tion De­tec­tion: A Tech­ni­cal Challenge

MaxKamacheeJan 14, 2025, 7:22 PM
1 point
0 comments2 min readLW link

Thoughts on the In-Con­text Schem­ing AI Experiment

ExCephJan 9, 2025, 2:19 AM
3 points
0 comments4 min readLW link

The “Every­one Can’t Be Wrong” Prior causes AI risk de­nial but helped pre­his­toric people

Knight LeeJan 9, 2025, 5:54 AM
1 point
0 comments2 min readLW link

Mo­ral­ity as Co­op­er­a­tion Part III: Failure Modes

DeLesley HutchinsDec 5, 2024, 9:39 AM
4 points
0 comments20 min readLW link

Logic vs in­tu­ition ⇔ al­gorithm vs ML

pchvykovJan 4, 2025, 9:06 AM
5 points
0 comments7 min readLW link

The In­tel­li­gence Curse

lukedragoJan 3, 2025, 7:07 PM
122 points
26 comments18 min readLW link
(lukedrago.substack.com)

Align­ment Is Not All You Need

Adam JonesJan 2, 2025, 5:50 PM
43 points
10 comments6 min readLW link
(adamjones.me)

I Recom­mend More Train­ing Rationales

Gianluca CalcagniDec 31, 2024, 2:06 PM
2 points
0 comments6 min readLW link

Take­off speeds pre­sen­ta­tion at Anthropic

Tom DavidsonJun 4, 2024, 10:46 PM
92 points
0 comments25 min readLW link

Hu­man study on AI spear phish­ing campaigns

Jan 3, 2025, 3:11 PM
79 points
8 comments5 min readLW link

Thoughts on “AI is easy to con­trol” by Pope & Belrose

Steven ByrnesDec 1, 2023, 5:30 PM
197 points
62 comments14 min readLW link1 review

How to solve the mi­suse prob­lem as­sum­ing that in 10 years the de­fault sce­nario is that AGI agents are ca­pa­ble of syn­thetiz­ing pathogens

jeremttiNov 27, 2024, 9:17 PM
6 points
0 comments9 min readLW link

AISC 2024 - Pro­ject Summaries

NickyPNov 27, 2023, 10:32 PM
48 points
3 comments18 min readLW link

Hope to live or fear to die?

Knight LeeNov 27, 2024, 10:42 AM
3 points
0 comments1 min readLW link

I cre­ated an Asi Align­ment Tier List

TimeGoatApr 21, 2024, 6:44 PM
−6 points
0 comments1 min readLW link

Two Tales of AI Takeover: My Doubts

Violet HourMar 5, 2024, 3:51 PM
30 points
8 comments29 min readLW link

Mo­ral­ity as Co­op­er­a­tion Part II: The­ory and Experiment

DeLesley HutchinsDec 5, 2024, 9:04 AM
2 points
0 comments17 min readLW link

Tak­ing Away the Guns First: The Fun­da­men­tal Flaw in AI Development

s-iceNov 26, 2024, 10:11 PM
1 point
0 comments17 min readLW link

A Guide to Fore­cast­ing AI Science Ca­pa­bil­ities

Eleni AngelouApr 29, 2023, 11:24 PM
6 points
1 comment4 min readLW link

Why Re­cur­sive Self-Im­prove­ment Might Not Be the Ex­is­ten­tial Risk We Fear

Nassim_ANov 24, 2024, 5:17 PM
1 point
0 comments9 min readLW link

A bet­ter “State­ment on AI Risk?”

Knight LeeNov 25, 2024, 4:50 AM
5 points
6 comments3 min readLW link

On ex­clud­ing dan­ger­ous in­for­ma­tion from training

ShayBenMosheNov 17, 2023, 11:14 AM
23 points
5 comments3 min readLW link

[Question] Have we seen any “ReLU in­stead of sig­moid-type im­prove­ments” recently

KvmanThinkingNov 23, 2024, 3:51 AM
2 points
4 comments1 min readLW link

Truth Ter­mi­nal: A re­con­struc­tion of events

Nov 17, 2024, 11:51 PM
3 points
1 comment7 min readLW link

An­nounc­ing At­las Computing

miyazonoApr 11, 2024, 3:56 PM
44 points
4 comments4 min readLW link

Propos­ing the Con­di­tional AI Safety Treaty (linkpost TIME)

otto.bartenNov 15, 2024, 1:59 PM
10 points
8 comments3 min readLW link
(time.com)

[Question] Can sin­gu­lar­ity emerge from trans­form­ers?

MPApr 8, 2024, 2:26 PM
−3 points
1 comment1 min readLW link

[Question] What’s the pro­to­col for if a novice has ML ideas that are un­likely to work, but might im­prove ca­pa­bil­ities if they do work?

droctaJan 9, 2024, 10:51 PM
6 points
2 comments2 min readLW link

What should AI safety be try­ing to achieve?

EuanMcLeanMay 23, 2024, 11:17 AM
17 points
1 comment13 min readLW link

Scal­able And Trans­fer­able Black-Box Jailbreaks For Lan­guage Models Via Per­sona Modulation

Nov 7, 2023, 5:59 PM
38 points
2 comments2 min readLW link
(arxiv.org)

Con­fronting the le­gion of doom.

Spiritus DeiNov 13, 2024, 5:03 PM
−20 points
3 comments5 min readLW link

Sym­biotic self-al­ign­ment of AIs.

Spiritus DeiNov 7, 2023, 5:18 PM
1 point
0 comments3 min readLW link

The So­cial Align­ment Problem

irvingApr 28, 2023, 2:16 PM
99 points
13 comments8 min readLW link

Do you want a first-prin­ci­pled pre­pared­ness guide to pre­pare your­self and loved ones for po­ten­tial catas­tro­phes?

Ulrik HornNov 14, 2023, 12:13 PM
16 points
5 comments15 min readLW link

[Question] Real­is­tic near-fu­ture sce­nar­ios of AI doom un­der­stand­able for non-techy peo­ple?

RomanSApr 28, 2023, 2:45 PM
4 points
4 comments1 min readLW link

[Question] AI Safety orgs- what’s your biggest bot­tle­neck right now?

Kabir KumarNov 16, 2023, 2:02 AM
1 point
0 comments1 min readLW link

We Should Talk About This More. Epistemic World Col­lapse as Im­mi­nent Safety Risk of Gen­er­a­tive AI.

Joerg WeissNov 16, 2023, 6:46 PM
11 points
2 comments29 min readLW link

Killswitch

JunioNov 18, 2023, 10:53 PM
2 points
0 comments3 min readLW link

Ilya: The AI sci­en­tist shap­ing the world

David VargaNov 20, 2023, 1:09 PM
11 points
0 comments4 min readLW link

AI as Su­per-Demagogue

RationalDinoNov 5, 2023, 9:21 PM
11 points
12 comments9 min readLW link

Thoughts af­ter the Wolfram and Yud­kowsky discussion

TahpNov 14, 2024, 1:43 AM
25 points
13 comments6 min readLW link

The two para­graph ar­gu­ment for AI risk

CronoDASNov 25, 2023, 2:01 AM
19 points
8 comments1 min readLW link

Re­think Pri­ori­ties: Seek­ing Ex­pres­sions of In­ter­est for Spe­cial Pro­jects Next Year

kierangreigNov 29, 2023, 1:59 PM
4 points
0 comments5 min readLW link

Sup­port me in a Week-Long Pick­et­ing Cam­paign Near OpenAI’s HQ: Seek­ing Sup­port and Ideas from the LessWrong Community

PercyApr 30, 2023, 5:48 PM
−21 points
15 comments1 min readLW link

What AI safety re­searchers can learn from Ma­hatma Gandhi

Lysandre TerrisseNov 8, 2024, 7:49 PM
−6 points
0 comments3 min readLW link

[Question] fake al­ign­ment solu­tions????

KvmanThinkingDec 11, 2024, 3:31 AM
1 point
6 comments1 min readLW link

The benefits and risks of op­ti­mism (about AI safety)

Karl von WendtDec 3, 2023, 12:45 PM
−7 points
6 comments5 min readLW link

A call for a quan­ti­ta­tive re­port card for AI bioter­ror­ism threat models

JunoDec 4, 2023, 6:35 AM
12 points
0 comments10 min readLW link

Sur­vey of 2,778 AI au­thors: six parts in pictures

KatjaGraceJan 6, 2024, 4:43 AM
80 points
1 comment2 min readLW link

[Question] Ac­cu­racy of ar­gu­ments that are seen as ridicu­lous and in­tu­itively false but don’t have good counter-arguments

Christopher KingApr 29, 2023, 11:58 PM
30 points
39 comments1 min readLW link

Co­op­er­a­tion and Align­ment in Del­e­ga­tion Games: You Need Both!

Aug 3, 2024, 10:16 AM
8 points
0 comments14 min readLW link
(www.oliversourbut.net)

The 6D effect: When com­pa­nies take risks, one email can be very pow­er­ful.

scasperNov 4, 2023, 8:08 PM
277 points
42 comments3 min readLW link

Call for sub­mis­sions: Choice of Fu­tures sur­vey questions

c.troutApr 30, 2023, 6:59 AM
4 points
0 comments2 min readLW link
(airtable.com)

Tether­ware #2: What ev­ery hu­man should know about our most likely AI future

Jáchym FibírFeb 28, 2025, 11:12 AM
3 points
0 comments11 min readLW link
(tetherware.substack.com)

Say­ing the quiet part out loud: trad­ing off x-risk for per­sonal immortality

disturbanceNov 2, 2023, 5:43 PM
84 points
89 comments5 min readLW link

Nav­i­gat­ing pub­lic AI x-risk hype while pur­su­ing tech­ni­cal solutions

Dan BraunFeb 19, 2023, 12:22 PM
18 points
0 comments2 min readLW link

Why em­piri­cists should be­lieve in AI risk

Knight LeeDec 11, 2024, 3:51 AM
5 points
0 comments1 min readLW link

Fo­cus on ex­is­ten­tial risk is a dis­trac­tion from the real is­sues. A false fallacy

Nik SamoylovOct 30, 2023, 11:42 PM
−19 points
11 comments2 min readLW link

AI as a pow­er­ful meme, via CGP Grey

TheManxLoinerOct 30, 2024, 6:31 PM
46 points
8 comments4 min readLW link

[Question] How does the ever-in­creas­ing use of AI in the mil­i­tary for the di­rect pur­pose of mur­der­ing peo­ple af­fect your p(doom)?

JustausernameApr 6, 2024, 6:31 AM
19 points
16 comments1 min readLW link

Eval­u­at­ing the fea­si­bil­ity of SI’s plan

JoshuaFoxJan 10, 2013, 8:17 AM
39 points
187 comments4 min readLW link

Limit­ing fac­tors to pre­dict AI take-off speed

Alfonso Pérez EscuderoMay 31, 2023, 11:19 PM
1 point
0 comments6 min readLW link

A Dou­ble-Fea­ture on The Extropians

Maxwell TabarrokJun 3, 2023, 6:27 PM
59 points
4 comments1 min readLW link

In­trin­sic vs. Ex­trin­sic Alignment

Alfonso Pérez EscuderoJun 1, 2023, 1:06 AM
1 point
1 comment3 min readLW link

How will they feed us

meijer1973Jun 1, 2023, 8:49 AM
4 points
3 comments5 min readLW link

Un­pre­dictabil­ity and the In­creas­ing Difficulty of AI Align­ment for In­creas­ingly In­tel­li­gent AI

Max_He-HoMay 31, 2023, 10:25 PM
5 points
2 comments20 min readLW link

The un­spo­ken but ridicu­lous as­sump­tion of AI doom: the hid­den doom assumption

Christopher KingJun 1, 2023, 5:01 PM
−9 points
1 comment3 min readLW link

Open Source LLMs Can Now Ac­tively Lie

Josh LevyJun 1, 2023, 10:03 PM
6 points
0 comments3 min readLW link

Q&A with ex­perts on risks from AI #1

XiXiDuJan 8, 2012, 11:46 AM
45 points
67 comments9 min readLW link

Algo trad­ing is a cen­tral ex­am­ple of AI risk

Vanessa KosoyJul 28, 2018, 8:31 PM
27 points
5 comments1 min readLW link

Will the world’s elites nav­i­gate the cre­ation of AI just fine?

lukeprogMay 31, 2013, 6:49 PM
36 points
266 comments2 min readLW link

Pro­posal: labs should pre­com­mit to paus­ing if an AI ar­gues for it­self to be improved

NickGabsJun 2, 2023, 10:31 PM
3 points
3 comments4 min readLW link

[FICTION] Prometheus Ris­ing: The Emer­gence of an AI Consciousness

Super AGIJun 10, 2023, 4:41 AM
−14 points
0 comments9 min readLW link

An­drew Ng wants to have a con­ver­sa­tion about ex­tinc­tion risk from AI

Leon LangJun 5, 2023, 10:29 PM
32 points
2 comments1 min readLW link
(twitter.com)

The (lo­cal) unit of in­tel­li­gence is FLOPs

boazbarakJun 5, 2023, 6:23 PM
42 points
7 comments5 min readLW link

Non-loss of con­trol AGI-re­lated catas­tro­phes are out of con­trol too

Jun 12, 2023, 12:01 PM
2 points
3 comments24 min readLW link

A Play­book for AI Risk Re­duc­tion (fo­cused on mis­al­igned AI)

HoldenKarnofskyJun 6, 2023, 6:05 PM
90 points
42 comments14 min readLW link1 review

Us­ing Con­sen­sus Mechanisms as an ap­proach to Alignment

PrometheusJun 10, 2023, 11:38 PM
11 points
2 comments6 min readLW link

Agen­tic Mess (A Failure Story)

Jun 6, 2023, 1:09 PM
46 points
5 comments13 min readLW link

Man­i­fold Pre­dicted the AI Ex­tinc­tion State­ment and CAIS Wanted it Deleted

David CheeJun 12, 2023, 3:54 PM
71 points
15 comments12 min readLW link

Cur­rent AI harms are also sci-fi

Christopher KingJun 8, 2023, 5:49 PM
26 points
3 comments1 min readLW link

[FICTION] Unboxing Elysium: An AI's Escape

Super AGIJun 10, 2023, 4:41 AM
−16 points
4 comments14 min readLW link

Why AI may not save the World

Alberto ZannoniJun 9, 2023, 5:42 PM
0 points
0 comments4 min readLW link
(a16z.com)

[Question] AI Rights: In your view, what would be re­quired for an AGI to gain rights and pro­tec­tions from the var­i­ous Govern­ments of the World?

Super AGIJun 9, 2023, 1:24 AM
10 points
26 comments1 min readLW link

In­tro­duc­tion to Towards Causal Foun­da­tions of Safe AGI

Jun 12, 2023, 5:55 PM
67 points
6 comments4 min readLW link

Let’s talk about “Con­ver­gent Ra­tion­al­ity”

David Scott Krueger (formerly: capybaralet)Jun 12, 2019, 9:53 PM
44 points
33 comments6 min readLW link

AISC team re­port: Soft-op­ti­miza­tion, Bayes and Goodhart

Jun 27, 2023, 6:05 AM
38 points
2 comments15 min readLW link

Break­ing Or­a­cles: su­per­ra­tional­ity and acausal trade

Stuart_ArmstrongNov 25, 2019, 10:40 AM
25 points
15 comments1 min readLW link

Aligned Ob­jec­tives Prize Competition

PrometheusJun 15, 2023, 12:42 PM
8 points
0 comments2 min readLW link
(app.impactmarkets.io)

Light­ning Post: Things peo­ple in AI Safety should stop talk­ing about

PrometheusJun 20, 2023, 3:00 PM
23 points
6 comments2 min readLW link

Q&A with Stan Fran­klin on risks from AI

XiXiDuJun 11, 2011, 3:22 PM
36 points
10 comments2 min readLW link

A Friendly Face (Another Failure Story)

Jun 20, 2023, 10:31 AM
65 points
21 comments16 min readLW link

Muehlhauser-Go­ertzel Dialogue, Part 1

lukeprogMar 16, 2012, 5:12 PM
42 points
161 comments33 min readLW link

Ex­plor­ing Last-Re­sort Mea­sures for AI Align­ment: Hu­man­ity’s Ex­tinc­tion Switch

0xPetraJun 23, 2023, 5:01 PM
7 points
0 comments2 min readLW link

[LINK] NYT Ar­ti­cle about Ex­is­ten­tial Risk from AI

[deleted]Jan 28, 2013, 10:37 AM
38 points
23 comments1 min readLW link

Refram­ing the Prob­lem of AI Progress

Wei DaiApr 12, 2012, 7:31 PM
32 points
47 comments1 min readLW link

New AI risks re­search in­sti­tute at Oxford University

lukeprogNov 16, 2011, 6:52 PM
36 points
10 comments1 min readLW link

Memes and Ra­tional Decisions

inferentialJan 9, 2015, 6:42 AM
35 points
18 comments10 min readLW link

I just watched Don't Look Up.

ATheCoderJun 23, 2023, 9:22 PM
0 points
5 comments2 min readLW link

AI-Plans.com—a con­tributable compendium

IknownothingJun 25, 2023, 2:40 PM
39 points
7 comments4 min readLW link
(ai-plans.com)

Cheat sheet of AI X-risk

momom2Jun 29, 2023, 4:28 AM
19 points
1 comment7 min readLW link

Challenge pro­posal: small­est pos­si­ble self-hard­en­ing back­door for RLHF

Christopher KingJun 29, 2023, 4:56 PM
7 points
0 comments2 min readLW link

Biosafety Reg­u­la­tions (BMBL) and their rele­vance for AI

Štěpán LosJun 29, 2023, 7:22 PM
4 points
0 comments4 min readLW link

Me­taphors for AI, and why I don’t like them

boazbarakJun 28, 2023, 10:47 PM
38 points
18 comments12 min readLW link

AGI & War

CalecuteJun 29, 2023, 10:20 PM
9 points
1 comment1 min readLW link

AI In­ci­dent Shar­ing—Best prac­tices from other fields and a com­pre­hen­sive list of ex­ist­ing platforms

Štěpán LosJun 28, 2023, 5:21 PM
20 points
0 comments4 min readLW link

AI Safety with­out Align­ment: How hu­mans can WIN against AI

vicchainJun 29, 2023, 5:53 PM
1 point
1 comment2 min readLW link

Levels of AI Self-Im­prove­ment

avturchinApr 29, 2018, 11:45 AM
11 points
1 comment39 min readLW link

Op­ti­mis­ing So­ciety to Con­strain Risk of War from an Ar­tifi­cial Su­per­in­tel­li­gence

JohnCDraperApr 30, 2020, 10:47 AM
4 points
1 comment51 min readLW link

Quan­ti­ta­tive cruxes in Alignment

Martín SotoJul 2, 2023, 8:38 PM
19 points
0 comments23 min readLW link

Sources of ev­i­dence in Alignment

Martín SotoJul 2, 2023, 8:38 PM
20 points
0 comments11 min readLW link

Some Thoughts on Sin­gu­lar­ity Strategies

Wei DaiJul 13, 2011, 2:41 AM
45 points
30 comments3 min readLW link

An AGI kill switch with defined se­cu­rity properties

PeterpiperJul 5, 2023, 5:40 PM
−5 points
6 comments1 min readLW link

How I Learned To Stop Wor­ry­ing And Love The Shoggoth

Peter MerelJul 12, 2023, 5:47 PM
9 points
15 comments5 min readLW link

Do you feel that AGI Align­ment could be achieved in a Type 0 civ­i­liza­tion?

Super AGIJul 6, 2023, 4:52 AM
−2 points
1 comment1 min readLW link

An Overview of AI risks—the Flyer

Jul 17, 2023, 12:03 PM
20 points
0 comments1 min readLW link
(docs.google.com)

ACI#4: Seed AI is the new Per­pet­ual Mo­tion Machine

Akira PyinyaJul 8, 2023, 1:17 AM
−1 points
0 comments6 min readLW link

An­nounc­ing AI Align­ment work­shop at the ALIFE 2023 conference

rorygreigJul 8, 2023, 1:52 PM
16 points
0 comments1 min readLW link
(humanvaluesandartificialagency.com)

“Refram­ing Su­per­in­tel­li­gence” + LLMs + 4 years

Eric DrexlerJul 10, 2023, 1:42 PM
117 points
9 comments12 min readLW link

A trick for Safer GPT-N

RaziedAug 23, 2020, 12:39 AM
7 points
1 comment2 min readLW link

Gear­ing Up for Long Timelines in a Hard World

DalcyJul 14, 2023, 6:11 AM
15 points
0 comments4 min readLW link

against “AI risk”

Wei DaiApr 11, 2012, 10:46 PM
35 points
91 comments1 min readLW link

“Smarter than us” is out!

Stuart_ArmstrongFeb 25, 2014, 3:50 PM
41 points
57 comments1 min readLW link

Analysing: Danger­ous mes­sages from fu­ture UFAI via Oracles

Stuart_ArmstrongNov 22, 2019, 2:17 PM
22 points
16 comments4 min readLW link

[Question] What crite­rion would you use to se­lect com­pa­nies likely to cause AI doom?

momom2Jul 13, 2023, 8:31 PM
8 points
4 comments1 min readLW link

Why was the AI Align­ment com­mu­nity so un­pre­pared for this mo­ment?

Ras1513Jul 15, 2023, 12:26 AM
121 points
65 comments2 min readLW link

Sim­ple al­ign­ment plan that maybe works

IknownothingJul 18, 2023, 10:48 PM
4 points
8 comments1 min readLW link

Ru­n­away Op­ti­miz­ers in Mind Space

silentbobJul 16, 2023, 2:26 PM
16 points
0 comments12 min readLW link

Quick Thoughts on Lan­guage Models

RohanSJul 18, 2023, 8:38 PM
6 points
0 comments4 min readLW link

[Question] Why don’t we con­sider large forms of so­cial or­ga­ni­za­tion (eco­nomic and poli­ti­cal forms, in par­tic­u­lar) to qual­ify as AGI and Trans­for­ma­tive AI?

T LJul 16, 2023, 6:54 PM
1 point
0 comments2 min readLW link

Proof of pos­te­ri­or­ity: a defense against AI-gen­er­ated misinformation

jchanJul 17, 2023, 12:04 PM
33 points
3 comments5 min readLW link

Q&A with Abram Dem­ski on risks from AI

XiXiDuJan 17, 2012, 9:43 AM
33 points
71 comments9 min readLW link

Q&A with ex­perts on risks from AI #2

XiXiDuJan 9, 2012, 7:40 PM
22 points
29 comments7 min readLW link

AI Safety Dis­cus­sion Day

Linda LinseforsSep 15, 2020, 2:40 PM
20 points
0 comments1 min readLW link

[Cross­post] An AI Pause Is Hu­man­ity’s Best Bet For Prevent­ing Ex­tinc­tion (TIME)

otto.bartenJul 24, 2023, 10:07 AM
12 points
0 comments7 min readLW link
(time.com)

A long re­ply to Ben Garfinkel on Scru­ti­niz­ing Clas­sic AI Risk Arguments

Søren ElverlinSep 27, 2020, 5:51 PM
17 points
6 comments1 min readLW link

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [July 2023]

smallsiloJul 20, 2023, 8:20 PM
38 points
40 comments2 min readLW link
(forum.effectivealtruism.org)

Sup­ple­men­tary Align­ment In­sights Through a Highly Con­trol­led Shut­down Incentive

JustausernameJul 23, 2023, 4:08 PM
4 points
1 comment3 min readLW link

Au­tonomous Align­ment Over­sight Frame­work (AAOF)

JustausernameJul 25, 2023, 10:25 AM
−9 points
0 comments4 min readLW link

A re­sponse to the Richards et al.’s “The Illu­sion of AI’s Ex­is­ten­tial Risk”

Harrison FellJul 26, 2023, 5:34 PM
1 point
0 comments10 min readLW link

[Question] Have you ever con­sid­ered tak­ing the ‘Tur­ing Test’ your­self?

Super AGIJul 27, 2023, 3:48 AM
2 points
6 comments1 min readLW link

Eval­u­at­ing Su­per­hu­man Models with Con­sis­tency Checks

Aug 1, 2023, 7:51 AM
21 points
2 comments9 min readLW link
(arxiv.org)

3 lev­els of threat obfuscation

HoldenKarnofskyAug 2, 2023, 2:58 PM
69 points
14 comments7 min readLW link

4 types of AGI se­lec­tion, and how to con­strain them

RemmeltAug 8, 2023, 10:02 AM
−4 points
3 comments3 min readLW link

Embed­ding Eth­i­cal Pri­ors into AI Sys­tems: A Bayesian Approach

JustausernameAug 3, 2023, 3:31 PM
−5 points
3 comments21 min readLW link

Ilya Sutskever’s thoughts on AI safety (July 2023): a tran­script with my comments

mishkaAug 10, 2023, 7:07 PM
21 points
3 comments5 min readLW link

Some al­ign­ment ideas

SelonNeriasAug 10, 2023, 5:51 PM
1 point
0 comments11 min readLW link

Seek­ing In­put to AI Safety Book for non-tech­ni­cal audience

Darren McKeeAug 10, 2023, 5:58 PM
10 points
4 comments1 min readLW link

On­line AI Safety Dis­cus­sion Day

Linda LinseforsOct 8, 2020, 12:11 PM
5 points
0 comments1 min readLW link

Ex­is­ten­tially rele­vant thought ex­per­i­ment: To kill or not to kill, a sniper, a man and a but­ton.

AlexFromSafeTransitionAug 14, 2023, 10:53 AM
−18 points
6 comments4 min readLW link

We Should Pre­pare for a Larger Rep­re­sen­ta­tion of Academia in AI Safety

Leon LangAug 13, 2023, 6:03 PM
90 points
14 comments5 min readLW link

De­com­pos­ing in­de­pen­dent gen­er­al­iza­tions in neu­ral net­works via Hes­sian analysis

Aug 14, 2023, 5:04 PM
83 points
4 comments1 min readLW link

AISN #19: US-China Com­pe­ti­tion on AI Chips, Mea­sur­ing Lan­guage Agent Devel­op­ments, Eco­nomic Anal­y­sis of Lan­guage Model Pro­pa­ganda, and White House AI Cy­ber Challenge

Dan HAug 15, 2023, 4:10 PM
21 points
0 comments5 min readLW link
(newsletter.safe.ai)

Ideas for im­prov­ing epistemics in AI safety outreach

micAug 21, 2023, 7:55 PM
64 points
6 comments3 min readLW link

Mesa-Op­ti­miza­tion: Ex­plain it like I’m 10 Edition

brookAug 26, 2023, 11:04 PM
20 points
1 comment6 min readLW link

En­hanc­ing Cor­rigi­bil­ity in AI Sys­tems through Ro­bust Feed­back Loops

JustausernameAug 24, 2023, 3:53 AM
1 point
0 comments6 min readLW link

Reflec­tions on “Mak­ing the Atomic Bomb”

boazbarakAug 17, 2023, 2:48 AM
51 points
7 comments8 min readLW link

AI Reg­u­la­tion May Be More Im­por­tant Than AI Align­ment For Ex­is­ten­tial Safety

otto.bartenAug 24, 2023, 11:41 AM
65 points
39 comments5 min readLW link

When dis­cussing AI doom bar­ri­ers pro­pose spe­cific plau­si­ble scenarios

anithiteAug 18, 2023, 4:06 AM
5 points
0 comments3 min readLW link

Mili­tary AI as a Con­ver­gent Goal of Self-Im­prov­ing AI

avturchinNov 13, 2017, 12:17 PM
5 points
3 comments1 min readLW link

2 un­usual rea­sons for why we can avoid be­ing turned into paperclips

Artem PanushAug 19, 2023, 10:28 AM
1 point
0 comments4 min readLW link

[Question] Clar­ify­ing how mis­al­ign­ment can arise from scal­ing LLMs

UtilAug 19, 2023, 2:16 PM
3 points
1 comment1 min readLW link

Will AI kill ev­ery­one? Here’s what the god­fathers of AI have to say [RA video]

WriterAug 19, 2023, 5:29 PM
58 points
8 comments1 min readLW link
(youtu.be)

Re­port on Fron­tier Model Training

YafahEdelmanAug 30, 2023, 8:02 PM
122 points
21 comments21 min readLW link
(docs.google.com)

Ram­ble on STUFF: in­tel­li­gence, simu­la­tion, AI, doom, de­fault mode, the usual

Bill BenzonAug 26, 2023, 3:49 PM
5 points
0 comments4 min readLW link

The Game of Dominance

Karl von WendtAug 27, 2023, 11:04 AM
24 points
15 comments6 min readLW link

A Let­ter to the Edi­tor of MIT Tech­nol­ogy Review

JeffsAug 30, 2023, 4:59 PM
0 points
0 comments2 min readLW link

The Epistemic Author­ity of Deep Learn­ing Pioneers

Dylan BowmanAug 29, 2023, 6:14 PM
8 points
2 comments3 min readLW link

AISN #20: LLM Pro­lifer­a­tion, AI De­cep­tion, and Con­tin­u­ing Drivers of AI Capabilities

Dan HAug 29, 2023, 3:07 PM
12 points
0 comments8 min readLW link
(newsletter.safe.ai)

Neu­ral pro­gram syn­the­sis is a dan­ger­ous technology

syllogismJan 12, 2018, 4:19 PM
10 points
6 comments2 min readLW link

New, Brief Pop­u­lar-Level In­tro­duc­tion to AI Risks and Superintelligence

LyleNJan 23, 2015, 3:43 PM
33 points
3 comments1 min readLW link

Video es­say: How Will We Know When AI is Con­scious?

JanProSep 6, 2023, 6:10 PM
11 points
7 comments1 min readLW link
(www.youtube.com)

AISN #21: Google Deep­Mind’s GPT-4 Com­peti­tor, Mili­tary In­vest­ments in Au­tonomous Drones, The UK AI Safety Sum­mit, and Case Stud­ies in AI Policy

Dan HSep 5, 2023, 3:03 PM
15 points
0 comments5 min readLW link
(newsletter.safe.ai)

[Question] What is to be done? (About the profit mo­tive)

Connor BarberSep 8, 2023, 7:27 PM
1 point
21 comments1 min readLW link

How teams went about their re­search at AI Safety Camp edi­tion 8

Sep 9, 2023, 4:34 PM
28 points
0 comments13 min readLW link

FAI Re­search Con­straints and AGI Side Effects

JustinShovelainJun 3, 2015, 7:25 PM
27 points
59 comments7 min readLW link

A con­ver­sa­tion with Pi, a con­ver­sa­tional AI.

Spiritus DeiSep 15, 2023, 11:13 PM
1 point
0 comments1 min readLW link

Jimmy Ap­ples, source of the ru­mor that OpenAI has achieved AGI in­ter­nally, is a cred­ible in­sider.

JorterderSep 28, 2023, 1:20 AM
−6 points
2 comments1 min readLW link
(twitter.com)

MMLU’s Mo­ral Sce­nar­ios Bench­mark Doesn’t Mea­sure What You Think it Measures

corey morrisSep 27, 2023, 5:54 PM
18 points
2 comments4 min readLW link
(medium.com)

Tech­ni­cal AI Safety Re­search Land­scape [Slides]

Magdalena WacheSep 18, 2023, 1:56 PM
41 points
0 comments4 min readLW link

Euro­pean Master’s Pro­grams in Ma­chine Learn­ing, Ar­tifi­cial In­tel­li­gence, and re­lated fields

Master Programs ML/AINov 14, 2020, 3:51 PM
34 points
6 comments1 min readLW link

[Question] Whom Do You Trust?

JackOfAllTradesFeb 26, 2024, 7:38 PM
1 point
0 comments1 min readLW link

[Linkpost] Mark Zucker­berg con­fronted about Meta’s Llama 2 AI’s abil­ity to give users de­tailed guidance on mak­ing an­thrax—Busi­ness Insider

micSep 26, 2023, 12:05 PM
18 points
11 comments2 min readLW link
(www.businessinsider.com)

The mind-killer

Paul CrowleyMay 2, 2009, 4:49 PM
29 points
160 comments2 min readLW link

“Di­a­mon­doid bac­te­ria” nanobots: deadly threat or dead-end? A nan­otech in­ves­ti­ga­tion

titotalSep 29, 2023, 2:01 PM
160 points
79 comments1 min readLW link
(titotal.substack.com)

I de­signed an AI safety course (for a philos­o­phy de­part­ment)

Eleni AngelouSep 23, 2023, 10:03 PM
37 points
15 comments2 min readLW link

Tak­ing fea­tures out of su­per­po­si­tion with sparse au­toen­coders more quickly with in­formed initialization

Pierre PeignéSep 23, 2023, 4:21 PM
30 points
8 comments5 min readLW link

Linkpost: Are Emer­gent Abil­ities in Large Lan­guage Models just In-Con­text Learn­ing?

Erich_GrunewaldOct 8, 2023, 12:14 PM
12 points
7 comments2 min readLW link
(arxiv.org)

Paper: Iden­ti­fy­ing the Risks of LM Agents with an LM-Emu­lated Sand­box—Univer­sity of Toronto 2023 - Bench­mark con­sist­ing of 36 high-stakes tools and 144 test cases!

Singularian2501Oct 9, 2023, 12:00 AM
6 points
0 comments1 min readLW link

Ideation and Tra­jec­tory Model­ling in Lan­guage Models

NickyPOct 5, 2023, 7:21 PM
16 points
2 comments10 min readLW link

The Gra­di­ent – The Ar­tifi­cial­ity of Alignment

micOct 8, 2023, 4:06 AM
12 points
1 comment5 min readLW link
(thegradient.pub)

[Question] Should I do it?

MrLightNov 19, 2020, 1:08 AM
−3 points
16 comments2 min readLW link

Be­come a PIBBSS Re­search Affiliate

Oct 10, 2023, 7:41 AM
24 points
6 comments6 min readLW link

Ra­tion­al­is­ing hu­mans: an­other mug­ging, but not Pas­cal’s

Stuart_ArmstrongNov 14, 2017, 3:46 PM
7 points
1 comment3 min readLW link

Ma­chine learn­ing could be fun­da­men­tally unexplainable

George3d6Dec 16, 2020, 1:32 PM
26 points
15 comments15 min readLW link
(cerebralab.com)

LoRA Fine-tun­ing Effi­ciently Un­does Safety Train­ing from Llama 2-Chat 70B

Oct 12, 2023, 7:58 PM
151 points
29 comments14 min readLW link

Back to the Past to the Future

PrometheusOct 18, 2023, 4:51 PM
5 points
0 comments1 min readLW link

[Question] What do you make of AGI:un­al­igned::space­ships:not enough food?

Ronny FernandezFeb 22, 2020, 2:14 PM
4 points
3 comments1 min readLW link

Tax­on­omy of AI-risk counterarguments

Odd anonOct 16, 2023, 12:12 AM
64 points
13 comments8 min readLW link

Risk Map of AI Systems

Dec 15, 2020, 9:16 AM
28 points
3 comments8 min readLW link

AISU 2021

Linda LinseforsJan 30, 2021, 5:40 PM
28 points
2 comments1 min readLW link

Non­per­son Predicates

Eliezer YudkowskyDec 27, 2008, 1:47 AM
68 points
177 comments6 min readLW link

For­mal Solu­tion to the In­ner Align­ment Problem

michaelcohenFeb 18, 2021, 2:51 PM
49 points
123 comments2 min readLW link

[Question] What are the biggest cur­rent im­pacts of AI?

Sam ClarkeMar 7, 2021, 9:44 PM
15 points
5 comments1 min readLW link

[Question] Is a Self-Iter­at­ing AGI Vuln­er­a­ble to Thomp­son-style Tro­jans?

sxaeMar 25, 2021, 2:46 PM
15 points
6 comments3 min readLW link

AI or­a­cles on blockchain

CaravaggioApr 6, 2021, 8:13 PM
5 points
0 comments3 min readLW link

What if AGI is near?

Wulky WilkinsenApr 14, 2021, 12:05 AM
11 points
5 comments1 min readLW link

[Question] Is there any­thing that can stop AGI de­vel­op­ment in the near term?

Wulky WilkinsenApr 22, 2021, 8:37 PM
5 points
5 comments1 min readLW link

[Question] [time­boxed ex­er­cise] write me your model of AI hu­man-ex­is­ten­tial safety and the al­ign­ment prob­lems in 15 minutes

QuinnMay 4, 2021, 7:10 PM
6 points
2 comments1 min readLW link

AI Safety Re­search Pro­ject Ideas

Owain_EvansMay 21, 2021, 1:39 PM
58 points
2 comments3 min readLW link

Sur­vey on AI ex­is­ten­tial risk scenarios

Jun 8, 2021, 5:12 PM
65 points
11 comments7 min readLW link

[Question] What are some claims or opinions about multi-multi del­e­ga­tion you’ve seen in the meme­plex that you think de­serve scrutiny?

QuinnJun 27, 2021, 5:44 PM
17 points
6 comments2 min readLW link

Mauhn Re­leases AI Safety Documentation

Berg SeverensJul 3, 2021, 9:23 PM
4 points
0 comments1 min readLW link

A gen­tle apoc­a­lypse

pchvykovAug 16, 2021, 5:03 AM
3 points
5 comments3 min readLW link

Could you have stopped Ch­er­nobyl?

Carlos RamirezAug 27, 2021, 1:48 AM
29 points
17 comments8 min readLW link

The Gover­nance Prob­lem and the “Pretty Good” X-Risk

Zach Stein-PerlmanAug 29, 2021, 6:00 PM
5 points
2 comments11 min readLW link

Dist­in­guish­ing AI takeover scenarios

Sep 8, 2021, 4:19 PM
74 points
11 comments14 min readLW link

The al­ign­ment prob­lem in differ­ent ca­pa­bil­ity regimes

BuckSep 9, 2021, 7:46 PM
88 points
12 comments5 min readLW link

How truth­ful is GPT-3? A bench­mark for lan­guage models

Owain_EvansSep 16, 2021, 10:09 AM
58 points
24 comments6 min readLW link

In­ves­ti­gat­ing AI Takeover Scenarios

Sammy MartinSep 17, 2021, 6:47 PM
31 points
1 comment27 min readLW link

AI take­off story: a con­tinu­a­tion of progress by other means

Edouard HarrisSep 27, 2021, 3:55 PM
76 points
13 comments10 min readLW link

A brief re­view of the rea­sons multi-ob­jec­tive RL could be im­por­tant in AI Safety Research

Sep 29, 2021, 5:09 PM
30 points
7 comments10 min readLW link

The Dark Side of Cog­ni­tion Hypothesis

Cameron BergOct 3, 2021, 8:10 PM
19 points
1 comment16 min readLW link

Truth­ful AI: Devel­op­ing and gov­ern­ing AI that does not lie

Oct 18, 2021, 6:37 PM
82 points
9 comments10 min readLW link

AMA on Truth­ful AI: Owen Cot­ton-Bar­ratt, Owain Evans & co-authors

Owain_EvansOct 22, 2021, 4:23 PM
31 points
15 comments1 min readLW link

Truth­ful and hon­est AI

Oct 29, 2021, 7:28 AM
42 points
1 comment13 min readLW link

What is the most evil AI that we could build, to­day?

ThomasJNov 1, 2021, 7:58 PM
−2 points
14 comments1 min readLW link

What are red flags for Neu­ral Net­work suffer­ing?

Marius HobbhahnNov 8, 2021, 12:51 PM
29 points
15 comments12 min readLW link

Hard­code the AGI to need our ap­proval in­definitely?

MichaelStJulesNov 11, 2021, 7:04 AM
2 points
2 comments1 min readLW link

What would we do if al­ign­ment were fu­tile?

Grant DemareeNov 14, 2021, 8:09 AM
75 points
39 comments3 min readLW link

Two Stupid AI Align­ment Ideas

aphyerNov 16, 2021, 4:13 PM
27 points
3 comments4 min readLW link

Su­per in­tel­li­gent AIs that don’t re­quire alignment

Yair HalberstadtNov 16, 2021, 7:55 PM
10 points
2 comments6 min readLW link

AI Tracker: mon­i­tor­ing cur­rent and near-fu­ture risks from su­per­scale models

Nov 23, 2021, 7:16 PM
67 points
13 comments3 min readLW link
(aitracker.org)

HIRING: In­form and shape a new pro­ject on AI safety at Part­ner­ship on AI

Madhulika SrikumarNov 24, 2021, 8:27 AM
6 points
0 comments1 min readLW link

How to mea­sure FLOP/​s for Neu­ral Net­works em­piri­cally?

Marius HobbhahnNov 29, 2021, 3:18 PM
16 points
5 comments7 min readLW link

Model­ing Failure Modes of High-Level Ma­chine Intelligence

Dec 6, 2021, 1:54 PM
54 points
1 comment12 min readLW link

HIRING: In­form and shape a new pro­ject on AI safety at Part­ner­ship on AI

madhu_likaDec 7, 2021, 7:37 PM
1 point
0 comments1 min readLW link

Univer­sal­ity and the “Filter”

maggiehayesDec 16, 2021, 12:47 AM
10 points
2 comments11 min readLW link

Re­views of “Is power-seek­ing AI an ex­is­ten­tial risk?”

Joe CarlsmithDec 16, 2021, 8:48 PM
80 points
20 comments1 min readLW link

2+2: On­tolog­i­cal Framework

LyrialtusFeb 1, 2022, 1:07 AM
−15 points
2 comments12 min readLW link

[un­ti­tled post]

superads91Feb 6, 2022, 8:39 PM
−5 points
8 comments1 min readLW link

How harm­ful are im­prove­ments in AI? + Poll

Feb 15, 2022, 6:16 PM
15 points
4 comments8 min readLW link

Pre­serv­ing and con­tin­u­ing al­ign­ment re­search through a se­vere global catastrophe

A_donorMar 6, 2022, 6:43 PM
44 points
11 comments5 min readLW link

Ask AI com­pa­nies about what they are do­ing for AI safety?

micMar 9, 2022, 3:14 PM
51 points
0 comments2 min readLW link

Is There a Valley of Bad Civ­i­liza­tional Ad­e­quacy?

lbThingrbMar 11, 2022, 7:49 PM
13 points
1 comment2 min readLW link

[Question] Danger(s) of the­o­rem-prov­ing AI?

YitzMar 16, 2022, 2:47 AM
8 points
8 comments1 min readLW link

We Are Con­jec­ture, A New Align­ment Re­search Startup

Connor LeahyApr 8, 2022, 11:40 AM
197 points
25 comments4 min readLW link

Is tech­ni­cal AI al­ign­ment re­search a net pos­i­tive?

cranberry_bearApr 12, 2022, 1:07 PM
6 points
2 comments2 min readLW link

[Question] Can some­one ex­plain to me why MIRI is so pes­simistic of our chances of sur­vival?

iamthouthouartiApr 14, 2022, 8:28 PM
10 points
7 comments1 min readLW link

[Question] Con­vince me that hu­man­ity *isn’t* doomed by AGI

YitzApr 15, 2022, 5:26 PM
61 points
50 comments1 min readLW link

Reflec­tions on My Own Miss­ing Mood

Lone PineApr 21, 2022, 4:19 PM
52 points
25 comments5 min readLW link

Code Gen­er­a­tion as an AI risk setting

Not RelevantApr 17, 2022, 10:27 PM
91 points
16 comments2 min readLW link

[Question] What is be­ing im­proved in re­cur­sive self im­prove­ment?

Lone PineApr 25, 2022, 6:30 PM
7 points
6 comments1 min readLW link

AI Alter­na­tive Fu­tures: Sce­nario Map­ping Ar­tifi­cial In­tel­li­gence Risk—Re­quest for Par­ti­ci­pa­tion (*Closed*)

KakiliApr 27, 2022, 10:07 PM
10 points
2 comments8 min readLW link

Video and Tran­script of Pre­sen­ta­tion on Ex­is­ten­tial Risk from Power-Seek­ing AI

Joe CarlsmithMay 8, 2022, 3:50 AM
20 points
1 comment29 min readLW link

In­ter­pretabil­ity’s Align­ment-Solv­ing Po­ten­tial: Anal­y­sis of 7 Scenarios

Evan R. MurphyMay 12, 2022, 8:01 PM
58 points
0 comments59 min readLW link

Agency As a Nat­u­ral Abstraction

Thane RuthenisMay 13, 2022, 6:02 PM
55 points
9 comments13 min readLW link

[Link post] Promis­ing Paths to Align­ment—Con­nor Leahy | Talk

frances_lorenzMay 14, 2022, 4:01 PM
34 points
0 comments1 min readLW link

Deep­Mind’s gen­er­al­ist AI, Gato: A non-tech­ni­cal explainer

May 16, 2022, 9:21 PM
63 points
6 comments6 min readLW link

Ac­tion­able-guidance and roadmap recom­men­da­tions for the NIST AI Risk Man­age­ment Framework

May 17, 2022, 3:26 PM
26 points
0 comments3 min readLW link

Why I’m Op­ti­mistic About Near-Term AI Risk

harsimonyMay 15, 2022, 11:05 PM
57 points
27 comments1 min readLW link

Pivotal acts us­ing an un­al­igned AGI?

Simon FischerAug 21, 2022, 5:13 PM
28 points
3 comments7 min readLW link

Re­shap­ing the AI Industry

Thane RuthenisMay 29, 2022, 10:54 PM
147 points
35 comments21 min readLW link

Ex­plain­ing in­ner al­ign­ment to myself

Jeremy GillenMay 24, 2022, 11:10 PM
9 points
2 comments10 min readLW link

A Story of AI Risk: In­struc­tGPT-N

peterbarnettMay 26, 2022, 11:22 PM
24 points
0 comments8 min readLW link

We will be around in 30 years

mukashiJun 7, 2022, 3:47 AM
12 points
205 comments2 min readLW link

Re­search Ques­tions from Stained Glass Windows

StefanHexJun 8, 2022, 12:38 PM
4 points
0 comments2 min readLW link

Towards Gears-Level Un­der­stand­ing of Agency

Thane RuthenisJun 16, 2022, 10:00 PM
25 points
4 comments18 min readLW link

A plau­si­ble story about AI risk.

DeLesley HutchinsJun 10, 2022, 2:08 AM
16 points
2 comments4 min readLW link

Sum­mary of “AGI Ruin: A List of Lethal­ities”

Stephen McAleeseJun 10, 2022, 10:35 PM
45 points
2 comments8 min readLW link

Poorly-Aimed Death Rays

Thane RuthenisJun 11, 2022, 6:29 PM
48 points
5 comments4 min readLW link

Con­tra EY: Can AGI de­stroy us with­out trial & er­ror?

Nikita SokolskyJun 13, 2022, 6:26 PM
137 points
72 comments15 min readLW link

A Modest Pivotal Act

anonymousaisafetyJun 13, 2022, 7:24 PM
−16 points
1 comment5 min readLW link

[Question] AI mis­al­ign­ment risk from GPT-like sys­tems?

fiso64Jun 19, 2022, 5:35 PM
10 points
8 comments1 min readLW link

Causal con­fu­sion as an ar­gu­ment against the scal­ing hypothesis

Jun 20, 2022, 10:54 AM
86 points
30 comments15 min readLW link

[LQ] Some Thoughts on Mes­sag­ing Around AI Risk

DragonGodJun 25, 2022, 1:53 PM
5 points
3 comments6 min readLW link

All AGI safety ques­tions wel­come (es­pe­cially ba­sic ones) [July 2022]

Jul 16, 2022, 12:57 PM
84 points
132 comments3 min readLW link

Refram­ing the AI Risk

Thane RuthenisJul 1, 2022, 6:44 PM
26 points
7 comments6 min readLW link

Fol­low along with Columbia EA’s Ad­vanced AI Safety Fel­low­ship!

RohanSJul 2, 2022, 5:45 PM
3 points
0 comments2 min readLW link
(forum.effectivealtruism.org)

Can we achieve AGI Align­ment by bal­anc­ing mul­ti­ple hu­man ob­jec­tives?

Ben SmithJul 3, 2022, 2:51 AM
11 points
1 comment4 min readLW link

New US Se­nate Bill on X-Risk Miti­ga­tion [Linkpost]

Evan R. MurphyJul 4, 2022, 1:25 AM
35 points
12 comments1 min readLW link
(www.hsgac.senate.gov)

My Most Likely Rea­son to Die Young is AI X-Risk

AISafetyIsNotLongtermistJul 4, 2022, 5:08 PM
61 points
24 comments4 min readLW link
(forum.effectivealtruism.org)

Please help us com­mu­ni­cate AI xrisk. It could save the world.

otto.bartenJul 4, 2022, 9:47 PM
4 points
7 comments2 min readLW link

When is it ap­pro­pri­ate to use statis­ti­cal mod­els and prob­a­bil­ities for de­ci­sion mak­ing ?

Younes KamelJul 5, 2022, 12:34 PM
10 points
7 comments4 min readLW link
(youneskamel.substack.com)

Ac­cept­abil­ity Ver­ifi­ca­tion: A Re­search Agenda

Jul 12, 2022, 8:11 PM
50 points
0 comments1 min readLW link
(docs.google.com)

Goal Align­ment Is Ro­bust To the Sharp Left Turn

Thane RuthenisJul 13, 2022, 8:23 PM
43 points
16 comments4 min readLW link

Con­di­tion­ing Gen­er­a­tive Models for Alignment

JozdienJul 18, 2022, 7:11 AM
59 points
8 comments20 min readLW link

A Cri­tique of AI Align­ment Pessimism

ExCephJul 19, 2022, 2:28 AM
9 points
1 comment9 min readLW link

What En­vi­ron­ment Prop­er­ties Select Agents For World-Model­ing?

Thane RuthenisJul 23, 2022, 7:27 PM
25 points
1 comment12 min readLW link

Align­ment be­ing im­pos­si­ble might be bet­ter than it be­ing re­ally difficult

Martín SotoJul 25, 2022, 11:57 PM
13 points
2 comments2 min readLW link

AGI ruin sce­nar­ios are likely (and dis­junc­tive)

So8resJul 27, 2022, 3:21 AM
175 points
38 comments6 min readLW link

[Question] How likely do you think worse-than-ex­tinc­tion type fates to be?

span1Aug 1, 2022, 4:08 AM
3 points
3 comments1 min readLW link

Three pillars for avoid­ing AGI catas­tro­phe: Tech­ni­cal al­ign­ment, de­ploy­ment de­ci­sions, and coordination

LintzAAug 3, 2022, 11:15 PM
24 points
0 comments11 min readLW link

Con­ver­gence Towards World-Models: A Gears-Level Model

Thane RuthenisAug 4, 2022, 11:31 PM
38 points
1 comment13 min readLW link

Com­plex­ity No Bar to AI (Or, why Com­pu­ta­tional Com­plex­ity mat­ters less than you think for real life prob­lems)

Noosphere89Aug 7, 2022, 7:55 PM
17 points
14 comments3 min readLW link
(www.gwern.net)

How To Go From In­ter­pretabil­ity To Align­ment: Just Re­tar­get The Search

johnswentworthAug 10, 2022, 4:08 PM
209 points
34 comments3 min readLW link1 review

Anti-squat­ted AI x-risk do­mains index

plexAug 12, 2022, 12:01 PM
56 points
6 comments1 min readLW link

In­fant AI Scenario

Nathan1123Aug 12, 2022, 9:20 PM
1 point
0 comments3 min readLW link

The Dumbest Pos­si­ble Gets There First

ArtaxerxesAug 13, 2022, 10:20 AM
44 points
7 comments2 min readLW link

In­ter­pretabil­ity Tools Are an At­tack Channel

Thane RuthenisAug 17, 2022, 6:47 PM
42 points
14 comments1 min readLW link

Align­ment’s phlo­gis­ton

Eleni AngelouAug 18, 2022, 10:27 PM
10 points
2 comments2 min readLW link

Bench­mark­ing Pro­pos­als on Risk Scenarios

Paul BricmanAug 20, 2022, 10:01 AM
25 points
2 comments14 min readLW link

What’s the Least Im­pres­sive Thing GPT-4 Won’t be Able to Do

AlgonAug 20, 2022, 7:48 PM
80 points
125 comments1 min readLW link

My Plan to Build Aligned Superintelligence

apollonianbluesAug 21, 2022, 1:16 PM
18 points
7 comments8 min readLW link

The Align­ment Prob­lem Needs More Pos­i­tive Fiction

NetcentricaAug 21, 2022, 10:01 PM
6 points
3 comments5 min readLW link

It Looks Like You’re Try­ing To Take Over The Narrative

George3d6Aug 24, 2022, 1:36 PM
3 points
20 comments9 min readLW link
(www.epistem.ink)

AI Risk in Terms of Un­sta­ble Nu­clear Software

Thane RuthenisAug 26, 2022, 6:49 PM
30 points
1 comment6 min readLW link

Tak­ing the pa­ram­e­ters which seem to mat­ter and ro­tat­ing them un­til they don’t

Garrett BakerAug 26, 2022, 6:26 PM
120 points
48 comments1 min readLW link

An­nual AGI Bench­mark­ing Event

Lawrence PhillipsAug 27, 2022, 12:06 AM
24 points
3 comments2 min readLW link
(www.metaculus.com)

Help Un­der­stand­ing Prefer­ences And Evil

NetcentricaAug 27, 2022, 3:42 AM
6 points
7 comments2 min readLW link

[Question] What would you ex­pect a mas­sive mul­ti­modal on­line fed­er­ated learner to be ca­pa­ble of?

Aryeh EnglanderAug 27, 2022, 5:31 PM
13 points
4 comments1 min readLW link

Are Gen­er­a­tive World Models a Mesa-Op­ti­miza­tion Risk?

Thane RuthenisAug 29, 2022, 6:37 PM
14 points
2 comments3 min readLW link

Three sce­nar­ios of pseudo-al­ign­ment

Eleni AngelouSep 3, 2022, 12:47 PM
9 points
0 comments3 min readLW link

Wor­lds Where Iter­a­tive De­sign Fails

johnswentworthAug 30, 2022, 8:48 PM
208 points
30 comments10 min readLW link1 review

Align­ment is hard. Com­mu­ni­cat­ing that, might be harder

Eleni AngelouSep 1, 2022, 4:57 PM
7 points
8 comments3 min readLW link

Agency en­g­ineer­ing: is AI-al­ign­ment “to hu­man in­tent” enough?

catubcSep 2, 2022, 6:14 PM
9 points
10 comments6 min readLW link

Sticky goals: a con­crete ex­per­i­ment for un­der­stand­ing de­cep­tive alignment

evhubSep 2, 2022, 9:57 PM
39 points
13 comments3 min readLW link

A Game About AI Align­ment (& Meta-Ethics): What Are the Must Haves?

JonathanErhardtSep 5, 2022, 7:55 AM
18 points
15 comments2 min readLW link

Com­mu­nity Build­ing for Grad­u­ate Stu­dents: A Tar­geted Approach

Neil CrawfordSep 6, 2022, 5:17 PM
6 points
0 comments4 min readLW link

It’s (not) how you use it

Eleni AngelouSep 7, 2022, 5:15 PM
8 points
1 comment2 min readLW link

Over­sight Leagues: The Train­ing Game as a Feature

Paul BricmanSep 9, 2022, 10:08 AM
20 points
6 comments10 min readLW link

Ide­olog­i­cal In­fer­ence Eng­ines: Mak­ing Deon­tol­ogy Differ­en­tiable*

Paul BricmanSep 12, 2022, 12:00 PM
6 points
0 comments14 min readLW link

AI Risk In­tro 1: Ad­vanced AI Might Be Very Bad

Sep 11, 2022, 10:57 AM
46 points
13 comments30 min readLW link

AI Safety field-build­ing pro­jects I’d like to see

AkashSep 11, 2022, 11:43 PM
46 points
8 comments6 min readLW link

[Linkpost] A sur­vey on over 300 works about in­ter­pretabil­ity in deep networks

scasperSep 12, 2022, 7:07 PM
97 points
7 comments2 min readLW link
(arxiv.org)

Rep­re­sen­ta­tional Tethers: Ty­ing AI La­tents To Hu­man Ones

Paul BricmanSep 16, 2022, 2:45 PM
30 points
0 comments16 min readLW link

Risk aver­sion and GPT-3

casualphysicsenjoyerSep 13, 2022, 8:50 PM
1 point
0 comments1 min readLW link

[Question] Would a Misal­igned SSI Really Kill Us All?

DragonGodSep 14, 2022, 12:15 PM
6 points
7 comments6 min readLW link

Emily Brontë on: Psy­chol­ogy Re­quired for Se­ri­ous™ AGI Safety Research

robertzkSep 14, 2022, 2:47 PM
2 points
0 comments1 min readLW link

Pre­cise P(doom) isn’t very im­por­tant for pri­ori­ti­za­tion or strategy

harsimonySep 14, 2022, 5:19 PM
14 points
6 comments1 min readLW link

Re­spond­ing to ‘Beyond Hyper­an­thro­po­mor­phism’

ukc10014Sep 14, 2022, 8:37 PM
9 points
0 comments16 min readLW link

Capability and Agency as Cornerstones of AI risk — My current model

wilmSep 15, 2022, 8:25 AM
10 points
4 comments12 min readLW link

Un­der­stand­ing Con­jec­ture: Notes from Con­nor Leahy interview

AkashSep 15, 2022, 6:37 PM
107 points
23 comments15 min readLW link

[Question] Updates on FLI’s Value Alignment Map?

T431Sep 17, 2022, 10:27 PM
17 points
4 comments1 min readLW link

Sum­maries: Align­ment Fun­da­men­tals Curriculum

Leon LangSep 18, 2022, 1:08 PM
44 points
3 comments1 min readLW link
(docs.google.com)

Lev­er­ag­ing Le­gal In­for­mat­ics to Align AI

John NaySep 18, 2022, 8:39 PM
11 points
0 comments3 min readLW link
(forum.effectivealtruism.org)

How to Train Your AGI Dragon

Eris DiscordiaSep 21, 2022, 10:28 PM
−1 points
3 comments5 min readLW link

AI Risk In­tro 2: Solv­ing The Problem

Sep 22, 2022, 1:55 PM
22 points
0 comments27 min readLW link

In­ter­lude: But Who Op­ti­mizes The Op­ti­mizer?

Paul BricmanSep 23, 2022, 3:30 PM
15 points
0 comments10 min readLW link

[Question] Why Do AI re­searchers Rate the Prob­a­bil­ity of Doom So Low?

AorouSep 24, 2022, 2:33 AM
7 points
6 comments3 min readLW link

On Generality

Eris DiscordiaSep 26, 2022, 4:06 AM
2 points
0 comments5 min readLW link

Oren’s Field Guide of Bad AGI Outcomes

Eris DiscordiaSep 26, 2022, 4:06 AM
0 points
0 comments1 min readLW link

(Struc­tural) Sta­bil­ity of Cou­pled Optimizers

Paul BricmanSep 30, 2022, 11:28 AM
25 points
0 comments10 min readLW link

Distri­bu­tion Shifts and The Im­por­tance of AI Safety

Leon LangSep 29, 2022, 10:38 PM
17 points
2 comments12 min readLW link

Eli’s re­view of “Is power-seek­ing AI an ex­is­ten­tial risk?”

eliflandSep 30, 2022, 12:21 PM
67 points
0 comments3 min readLW link
(docs.google.com)

[Question] Any fur­ther work on AI Safety Suc­cess Sto­ries?

KriegerOct 2, 2022, 9:53 AM
8 points
6 comments1 min readLW link

An­nounc­ing the AI Safety Nudge Com­pe­ti­tion to Help Beat Procrastination

Marc CarauleanuOct 1, 2022, 1:49 AM
10 points
0 comments1 min readLW link

Gen­er­a­tive, Epi­sodic Ob­jec­tives for Safe AI

Michael GlassOct 5, 2022, 11:18 PM
11 points
3 comments8 min readLW link

[Linkpost] “Blueprint for an AI Bill of Rights”—Office of Science and Tech­nol­ogy Policy, USA (2022)

T431Oct 5, 2022, 4:42 PM
9 points
4 comments2 min readLW link
(www.whitehouse.gov)

What does it mean for an AGI to be ‘safe’?

So8resOct 7, 2022, 4:13 AM
74 points
29 comments3 min readLW link

Pos­si­ble miracles

Oct 9, 2022, 6:17 PM
64 points
34 comments8 min readLW link

In­stru­men­tal con­ver­gence in sin­gle-agent systems

Oct 12, 2022, 12:24 PM
33 points
4 comments8 min readLW link
(www.gladstone.ai)

Cat­a­logu­ing Pri­ors in The­ory and Practice

Paul BricmanOct 13, 2022, 12:36 PM
13 points
8 comments7 min readLW link

Misal­ign­ment-by-de­fault in multi-agent systems

Oct 13, 2022, 3:38 PM
21 points
8 comments20 min readLW link
(www.gladstone.ai)

In­stru­men­tal con­ver­gence: scale and phys­i­cal interactions

Oct 14, 2022, 3:50 PM
22 points
0 comments17 min readLW link
(www.gladstone.ai)

Power-Seek­ing AI and Ex­is­ten­tial Risk

Antonio FrancaOct 11, 2022, 10:50 PM
6 points
0 comments9 min readLW link

Nice­ness is unnatural

So8resOct 13, 2022, 1:30 AM
130 points
20 comments8 min readLW link1 review

Greed Is the Root of This Evil

Thane RuthenisOct 13, 2022, 8:40 PM
21 points
7 comments8 min readLW link

Prov­ably Hon­est—A First Step

Srijanak DeNov 5, 2022, 7:18 PM
10 points
2 comments8 min readLW link

[Question] How easy is it to su­per­vise pro­cesses vs out­comes?

Noosphere89Oct 18, 2022, 5:48 PM
3 points
0 comments1 min readLW link

Re­sponse to Katja Grace’s AI x-risk counterarguments

Oct 19, 2022, 1:17 AM
77 points
18 comments15 min readLW link

POWER­play: An open-source toolchain to study AI power-seeking

Edouard HarrisOct 24, 2022, 8:03 PM
29 points
0 comments1 min readLW link
(github.com)

Wor­ld­view iPeo­ple—Fu­ture Fund’s AI Wor­ld­view Prize

Toni MUENDELOct 28, 2022, 1:53 AM
−22 points
4 comments9 min readLW link

AI Re­searchers On AI Risk

Scott AlexanderMay 22, 2015, 11:16 AM
19 points
0 comments16 min readLW link

AI as a Civ­i­liza­tional Risk Part 2/​6: Be­hav­ioral Modification

PashaKamyshevOct 30, 2022, 4:57 PM
9 points
0 comments10 min readLW link

AI as a Civ­i­liza­tional Risk Part 3/​6: Anti-econ­omy and Sig­nal Pollution

PashaKamyshevOct 31, 2022, 5:03 PM
7 points
4 comments14 min readLW link

AI as a Civ­i­liza­tional Risk Part 4/​6: Bioweapons and Philos­o­phy of Modification

PashaKamyshevNov 1, 2022, 8:50 PM
7 points
1 comment8 min readLW link

AI as a Civ­i­liza­tional Risk Part 5/​6: Re­la­tion­ship be­tween C-risk and X-risk

PashaKamyshevNov 3, 2022, 2:19 AM
2 points
0 comments7 min readLW link

AI as a Civ­i­liza­tional Risk Part 6/​6: What can be done

PashaKamyshevNov 3, 2022, 7:48 PM
2 points
4 comments4 min readLW link

Am I se­cretly ex­cited for AI get­ting weird?

porbyOct 29, 2022, 10:16 PM
116 points
4 comments4 min readLW link

My (naive) take on Risks from Learned Optimization

Artyom KarpovOct 31, 2022, 10:59 AM
7 points
0 comments5 min readLW link

Clar­ify­ing AI X-risk

Nov 1, 2022, 11:03 AM
127 points
24 comments4 min readLW link1 review

Threat Model Liter­a­ture Review

Nov 1, 2022, 11:03 AM
78 points
4 comments25 min readLW link

Why do we post our AI safety plans on the In­ter­net?

Peter S. ParkNov 3, 2022, 4:02 PM
4 points
4 comments11 min readLW link

My sum­mary of “Prag­matic AI Safety”

Eleni AngelouNov 5, 2022, 12:54 PM
3 points
0 comments5 min readLW link

4 Key As­sump­tions in AI Safety

PrometheusNov 7, 2022, 10:50 AM
20 points
5 comments7 min readLW link

Loss of con­trol of AI is not a likely source of AI x-risk

squekNov 7, 2022, 6:44 PM
−6 points
0 comments5 min readLW link

AI Safety Un­con­fer­ence NeurIPS 2022

OrpheusNov 7, 2022, 3:39 PM
25 points
0 comments1 min readLW link
(aisafetyevents.org)

Value For­ma­tion: An Over­ar­ch­ing Model

Thane RuthenisNov 15, 2022, 5:16 PM
34 points
20 comments34 min readLW link

Is AI Gain-of-Func­tion re­search a thing?

MadHatterNov 12, 2022, 2:33 AM
9 points
2 comments2 min readLW link

The limited up­side of interpretability

Peter S. ParkNov 15, 2022, 6:46 PM
13 points
11 comments1 min readLW link

Con­jec­ture: a ret­ro­spec­tive af­ter 8 months of work

Nov 23, 2022, 5:10 PM
180 points
9 comments8 min readLW link

Con­jec­ture Se­cond Hiring Round

Nov 23, 2022, 5:11 PM
92 points
0 comments1 min readLW link

Cor­rigi­bil­ity Via Thought-Pro­cess Deference

Thane RuthenisNov 24, 2022, 5:06 PM
17 points
5 comments9 min readLW link

Dis­cussing how to al­ign Trans­for­ma­tive AI if it’s de­vel­oped very soon

eliflandNov 28, 2022, 4:17 PM
37 points
2 comments28 min readLW link

[Question] Is there any policy for a fair treat­ment of AIs whose friendli­ness is in doubt?

nahojNov 18, 2022, 7:01 PM
15 points
10 comments1 min readLW link

[Question] Will the first AGI agent have been de­signed as an agent (in ad­di­tion to an AGI)?

nahojDec 3, 2022, 8:32 PM
1 point
8 comments1 min readLW link

AI can ex­ploit safety plans posted on the Internet

Peter S. ParkDec 4, 2022, 12:17 PM
−15 points
4 comments1 min readLW link

Race to the Top: Bench­marks for AI Safety

Isabella DuanDec 4, 2022, 6:48 PM
29 points
6 comments1 min readLW link

[Question] Who are some promi­nent rea­son­able peo­ple who are con­fi­dent that AI won’t kill ev­ery­one?

Optimization ProcessDec 5, 2022, 9:12 AM
72 points
54 comments1 min readLW link

Aligned Be­hav­ior is not Ev­i­dence of Align­ment Past a Cer­tain Level of Intelligence

Ronny FernandezDec 5, 2022, 3:19 PM
19 points
5 comments7 min readLW link

Fore­sight for AGI Safety Strat­egy: Miti­gat­ing Risks and Iden­ti­fy­ing Golden Opportunities

jacquesthibsDec 5, 2022, 4:09 PM
28 points
6 comments8 min readLW link

AI Safety in a Vuln­er­a­ble World: Re­quest­ing Feed­back on Pre­limi­nary Thoughts

Jordan ArelDec 6, 2022, 10:35 PM
4 points
2 comments3 min readLW link

Fear miti­gated the nu­clear threat, can it do the same to AGI risks?

Igor IvanovDec 9, 2022, 10:04 AM
6 points
8 comments5 min readLW link

Reflec­tions on the PIBBSS Fel­low­ship 2022

Dec 11, 2022, 9:53 PM
32 points
0 comments18 min readLW link

[Question] Best in­tro­duc­tory overviews of AGI safety?

JakubKDec 13, 2022, 7:01 PM
21 points
9 comments2 min readLW link
(forum.effectivealtruism.org)

Com­pu­ta­tional sig­na­tures of psychopathy

Cameron BergDec 19, 2022, 5:01 PM
30 points
3 comments20 min readLW link

Why I think that teach­ing philos­o­phy is high impact

Eleni AngelouDec 19, 2022, 3:11 AM
5 points
0 comments2 min readLW link

[Question] Will re­search in AI risk jinx it? Con­se­quences of train­ing AI on AI risk arguments

Yann DuboisDec 19, 2022, 10:42 PM
5 points
6 comments1 min readLW link

AGI Timelines in Gover­nance: Differ­ent Strate­gies for Differ­ent Timeframes

Dec 19, 2022, 9:31 PM
65 points
28 comments10 min readLW link

New AI risk in­tro from Vox [link post]

JakubKDec 21, 2022, 6:00 AM
5 points
1 comment2 min readLW link
(www.vox.com)

[Question] Or­a­cle AGI—How can it es­cape, other than se­cu­rity is­sues? (Steganog­ra­phy?)

RationalSieveDec 25, 2022, 8:14 PM
3 points
6 comments1 min readLW link

Ac­cu­rate Models of AI Risk Are Hyper­ex­is­ten­tial Exfohazards

Thane RuthenisDec 25, 2022, 4:50 PM
33 points
38 comments9 min readLW link

Safety of Self-Assem­bled Neu­ro­mor­phic Hardware

CanDec 26, 2022, 6:51 PM
16 points
2 comments10 min readLW link
(forum.effectivealtruism.org)

In­tro­duc­tion: Bias in Eval­u­at­ing AGI X-Risks

Dec 27, 2022, 10:27 AM
1 point
0 comments3 min readLW link

Mere ex­po­sure effect: Bias in Eval­u­at­ing AGI X-Risks

Dec 27, 2022, 2:05 PM
0 points
2 comments1 min readLW link

Band­wagon effect: Bias in Eval­u­at­ing AGI X-Risks

Dec 28, 2022, 7:54 AM
−1 points
0 comments1 min readLW link

In Defense of Wrap­per-Minds

Thane RuthenisDec 28, 2022, 6:28 PM
24 points
38 comments3 min readLW link

Friendly and Un­friendly AGI are Indistinguishable

ErgoEchoDec 29, 2022, 10:13 PM
−4 points
4 comments4 min readLW link
(neologos.co)

In­ter­nal In­ter­faces Are a High-Pri­or­ity In­ter­pretabil­ity Target

Thane RuthenisDec 29, 2022, 5:49 PM
26 points
6 comments7 min readLW link

CFP for Re­bel­lion and Di­sobe­di­ence in AI workshop

Ram RachumDec 29, 2022, 4:08 PM
15 points
0 comments1 min readLW link

Re­ac­tive de­val­u­a­tion: Bias in Eval­u­at­ing AGI X-Risks

Dec 30, 2022, 9:02 AM
−15 points
9 comments1 min readLW link

Curse of knowl­edge and Naive re­al­ism: Bias in Eval­u­at­ing AGI X-Risks

Dec 31, 2022, 1:33 PM
−7 points
1 comment1 min readLW link
(www.lesswrong.com)

[Question] Are Mix­ture-of-Ex­perts Trans­form­ers More In­ter­pretable Than Dense Trans­form­ers?

simeon_cDec 31, 2022, 11:34 AM
8 points
5 comments1 min readLW link

Challenge to the no­tion that any­thing is (maybe) pos­si­ble with AGI

Jan 1, 2023, 3:57 AM
−27 points
4 comments1 min readLW link
(mflb.com)

Sum­mary of 80k’s AI prob­lem profile

JakubKJan 1, 2023, 7:30 AM
7 points
0 comments5 min readLW link
(forum.effectivealtruism.org)

Belief Bias: Bias in Eval­u­at­ing AGI X-Risks

Jan 2, 2023, 8:59 AM
−10 points
1 comment1 min readLW link

Sta­tus quo bias; Sys­tem jus­tifi­ca­tion: Bias in Eval­u­at­ing AGI X-Risks

Jan 3, 2023, 2:50 AM
−11 points
0 comments1 min readLW link

Causal rep­re­sen­ta­tion learn­ing as a tech­nique to pre­vent goal misgeneralization

PabloAMCJan 4, 2023, 12:07 AM
21 points
0 comments8 min readLW link

Illu­sion of truth effect and Am­bi­guity effect: Bias in Eval­u­at­ing AGI X-Risks

RemmeltJan 5, 2023, 4:05 AM
−13 points
2 comments1 min readLW link

AI Safety Camp, Vir­tual Edi­tion 2023

Linda LinseforsJan 6, 2023, 11:09 AM
40 points
10 comments3 min readLW link
(aisafety.camp)

AI Safety Camp: Ma­chine Learn­ing for Scien­tific Dis­cov­ery

Eleni AngelouJan 6, 2023, 3:21 AM
3 points
0 comments1 min readLW link

An­chor­ing fo­cal­ism and the Iden­ti­fi­able vic­tim effect: Bias in Eval­u­at­ing AGI X-Risks

RemmeltJan 7, 2023, 9:59 AM
1 point
2 comments1 min readLW link

Big list of AI safety videos

JakubKJan 9, 2023, 6:12 AM
11 points
2 comments1 min readLW link
(docs.google.com)

The Align­ment Prob­lem from a Deep Learn­ing Per­spec­tive (ma­jor rewrite)

Jan 10, 2023, 4:06 PM
84 points
8 comments39 min readLW link
(arxiv.org)

[Question] Could Si­mu­lat­ing an AGI Tak­ing Over the World Ac­tu­ally Lead to a LLM Tak­ing Over the World?

simeon_cJan 13, 2023, 6:33 AM
15 points
1 comment1 min readLW link

Reflec­tions on Trust­ing Trust & AI

Itay YonaJan 16, 2023, 6:36 AM
10 points
1 comment3 min readLW link
(mentaleap.ai)

OpenAI’s Align­ment Plan is not S.M.A.R.T.

Søren ElverlinJan 18, 2023, 6:39 AM
9 points
19 comments4 min readLW link

Gra­di­ent Filtering

Jan 18, 2023, 8:09 PM
55 points
16 comments13 min readLW link

6-para­graph AI risk in­tro for MAISI

JakubKJan 19, 2023, 9:22 AM
11 points
0 comments2 min readLW link
(www.maisi.club)

List of tech­ni­cal AI safety ex­er­cises and projects

JakubKJan 19, 2023, 9:35 AM
41 points
5 comments1 min readLW link
(docs.google.com)

NYT: Google will “re­cal­ibrate” the risk of re­leas­ing AI due to com­pe­ti­tion with OpenAI

Michael HuangJan 22, 2023, 8:38 AM
47 points
2 comments1 min readLW link
(www.nytimes.com)

Next steps af­ter AGISF at UMich

JakubKJan 25, 2023, 8:57 PM
10 points
0 comments5 min readLW link
(docs.google.com)

What is the ground re­al­ity of coun­tries tak­ing steps to re­cal­ibrate AI de­vel­op­ment to­wards Align­ment first?

NebuchJan 29, 2023, 1:26 PM
8 points
6 comments3 min readLW link

In­ter­views with 97 AI Re­searchers: Quan­ti­ta­tive Analysis

Feb 2, 2023, 1:01 AM
23 points
0 comments7 min readLW link

[Question] What qual­ities does an AGI need to have to re­al­ize the risk of false vac­uum, with­out hard­cod­ing physics the­o­ries into it?

RationalSieveFeb 3, 2023, 4:00 PM
1 point
4 comments1 min readLW link

Monthly Doom Ar­gu­ment Threads? Doom Ar­gu­ment Wiki?

LVSNFeb 4, 2023, 4:59 PM
3 points
0 comments1 min readLW link

Se­cond call: CFP for Re­bel­lion and Di­sobe­di­ence in AI workshop

Ram RachumFeb 5, 2023, 12:18 PM
2 points
0 comments2 min readLW link

Early situ­a­tional aware­ness and its im­pli­ca­tions, a story

Jacob PfauFeb 6, 2023, 8:45 PM
29 points
6 comments3 min readLW link

Is this a weak pivotal act: cre­at­ing nanobots that eat evil AGIs (but noth­ing else)?

Christopher KingFeb 10, 2023, 7:26 PM
0 points
3 comments1 min readLW link

The Im­por­tance of AI Align­ment, ex­plained in 5 points

Daniel_EthFeb 11, 2023, 2:56 AM
33 points
2 comments1 min readLW link

Near-Term Risks of an Obe­di­ent Ar­tifi­cial Intelligence

ymeskhoutFeb 18, 2023, 6:30 PM
20 points
1 comment6 min readLW link

Should we cry “wolf”?

TapataktFeb 18, 2023, 11:24 AM
24 points
5 comments1 min readLW link

The pub­lic sup­ports reg­u­lat­ing AI for safety

Zach Stein-PerlmanFeb 17, 2023, 4:10 AM
114 points
9 comments1 min readLW link
(aiimpacts.org)

AGI doesn’t need un­der­stand­ing, in­ten­tion, or con­scious­ness in or­der to kill us, only intelligence

James BlahaFeb 20, 2023, 12:55 AM
10 points
2 comments18 min readLW link

Bing find­ing ways to by­pass Microsoft’s filters with­out be­ing asked. Is it re­pro­ducible?

Christopher KingFeb 20, 2023, 3:11 PM
27 points
15 comments1 min readLW link

De­cep­tive Align­ment is <1% Likely by Default

DavidWFeb 21, 2023, 3:09 PM
89 points
31 comments14 min readLW link1 review

An­nounc­ing aisafety.training

JJ HepburnJan 21, 2023, 1:01 AM
61 points
4 comments1 min readLW link

Is there an ML agent that abandons its utility function out-of-distribution without losing capabilities?

Christopher KingFeb 22, 2023, 4:49 PM
1 point
7 comments1 min readLW link

Au­to­mated Sand­wich­ing & Quan­tify­ing Hu­man-LLM Co­op­er­a­tion: ScaleOver­sight hackathon results

Feb 23, 2023, 10:48 AM
8 points
0 comments6 min readLW link

Re­search pro­posal: Lev­er­ag­ing Jun­gian archetypes to cre­ate val­ues-based models

MiguelDevMar 5, 2023, 5:39 PM
5 points
2 comments2 min readLW link

Ret­ro­spec­tive on the 2022 Con­jec­ture AI Discussions

Andrea_MiottiFeb 24, 2023, 10:41 PM
90 points
5 comments2 min readLW link

Chris­ti­ano (ARC) and GA (Con­jec­ture) Dis­cuss Align­ment Cruxes

Feb 24, 2023, 11:03 PM
61 points
7 comments47 min readLW link

[Question] Pink Shog­goths: What does al­ign­ment look like in prac­tice?

Yuli_BanFeb 25, 2023, 12:23 PM
25 points
13 comments11 min readLW link

[Question] Would more model evals teams be good?

Ryan KiddFeb 25, 2023, 10:01 PM
20 points
4 comments1 min readLW link

Cu­ri­os­ity as a Solu­tion to AGI Alignment

Harsha G.Feb 26, 2023, 11:36 PM
7 points
7 comments3 min readLW link

Ta­boo “hu­man-level in­tel­li­gence”

SherrinfordFeb 26, 2023, 8:42 PM
12 points
7 comments1 min readLW link

The idea of an “al­igned su­per­in­tel­li­gence” seems misguided

ssadlerFeb 27, 2023, 11:19 AM
6 points
7 comments3 min readLW link
(ssadler.substack.com)

Tran­script: Yud­kowsky on Ban­kless fol­low-up Q&A

vonkFeb 28, 2023, 3:46 AM
54 points
40 comments22 min readLW link

The bur­den of knowing

arisAlexisFeb 28, 2023, 6:40 PM
5 points
0 comments2 min readLW link

Call for Cruxes by Rhyme, a Longter­mist His­tory Consultancy

LaraMar 1, 2023, 6:39 PM
1 point
0 comments3 min readLW link
(forum.effectivealtruism.org)

Reflec­tion Mechanisms as an Align­ment Tar­get—At­ti­tudes on “near-term” AI

Mar 2, 2023, 4:29 AM
21 points
0 comments8 min readLW link

Con­scious­ness is ir­rele­vant—in­stead solve al­ign­ment by ask­ing this question

Oliver SiegelMar 4, 2023, 10:06 PM
−10 points
6 comments1 min readLW link

Why kill ev­ery­one?

arisAlexisMar 5, 2023, 11:53 AM
7 points
5 comments2 min readLW link

Is it time to talk about AI dooms­day prep­ping yet?

bokovMar 5, 2023, 9:17 PM
0 points
8 comments1 min readLW link

Who Aligns the Align­ment Re­searchers?

Ben SmithMar 5, 2023, 11:22 PM
48 points
0 comments11 min readLW link

Cap Model Size for AI Safety

research_prime_spaceMar 6, 2023, 1:11 AM
0 points
4 comments1 min readLW link

In­tro­duc­ing AI Align­ment Inc., a Cal­ifor­nia pub­lic benefit cor­po­ra­tion...

TherapistAIMar 7, 2023, 6:47 PM
1 point
4 comments1 min readLW link

[Question] What‘s in your list of un­solved prob­lems in AI al­ign­ment?

jacquesthibsMar 7, 2023, 6:58 PM
60 points
9 comments1 min readLW link

Pod­cast Tran­script: Daniela and Dario Amodei on Anthropic

rememberMar 7, 2023, 4:47 PM
46 points
2 comments79 min readLW link
(futureoflife.org)

Why Un­con­trol­lable AI Looks More Likely Than Ever

Mar 8, 2023, 3:41 PM
18 points
0 comments4 min readLW link
(time.com)

Speed run­ning ev­ery­one through the bad al­ign­ment bingo. $5k bounty for a LW con­ver­sa­tional agent

ArthurBMar 9, 2023, 9:26 AM
140 points
33 comments2 min readLW link

An­thropic: Core Views on AI Safety: When, Why, What, and How

jonmenasterMar 9, 2023, 5:34 PM
17 points
1 comment22 min readLW link
(www.anthropic.com)

Value drift threat models

Garrett BakerMay 12, 2023, 11:03 PM
27 points
4 comments5 min readLW link

Every­thing’s nor­mal un­til it’s not

Eleni AngelouMar 10, 2023, 2:02 AM
7 points
0 comments3 min readLW link

Is AI Safety drop­ping the ball on pri­vacy?

markovSep 13, 2023, 1:07 PM
50 points
17 comments7 min readLW link

The hu­man­ity’s biggest mistake

RomanSMar 10, 2023, 4:30 PM
0 points
1 comment2 min readLW link

On tak­ing AI risk se­ri­ously

Eleni AngelouMar 13, 2023, 5:50 AM
6 points
0 comments1 min readLW link
(www.nytimes.com)

We don’t trade with ants

KatjaGraceJan 10, 2023, 11:50 PM
270 points
109 comments7 min readLW link1 review
(worldspiritsockpuppet.com)

Linkpost: A tale of 2.5 or­thog­o­nal­ity theses

DavidWMar 13, 2023, 2:19 PM
9 points
3 comments1 min readLW link
(forum.effectivealtruism.org)

Linkpost: A Con­tra AI FOOM Read­ing List

DavidWMar 13, 2023, 2:45 PM
25 points
4 comments1 min readLW link
(magnusvinding.com)

Linkpost: ‘Dis­solv­ing’ AI Risk – Pa­ram­e­ter Uncer­tainty in AI Fu­ture Forecasting

DavidWMar 13, 2023, 4:52 PM
6 points
0 comments1 min readLW link
(forum.effectivealtruism.org)

Could Roko’s basilisk acausally bar­gain with a pa­per­clip max­i­mizer?

Christopher KingMar 13, 2023, 6:21 PM
1 point
8 comments1 min readLW link

A bet­ter anal­ogy and ex­am­ple for teach­ing AI takeover: the ML Inferno

Christopher KingMar 14, 2023, 7:14 PM
18 points
0 comments5 min readLW link

New eco­nomic sys­tem for AI era

ksme shoMar 17, 2023, 5:42 PM
−1 points
1 comment5 min readLW link

Sur­vey on in­ter­me­di­ate goals in AI governance

Mar 17, 2023, 1:12 PM
25 points
3 comments1 min readLW link

(re­tired ar­ti­cle) AGI With In­ter­net Ac­cess: Why we won’t stuff the ge­nie back in its bot­tle.

Max TKMar 18, 2023, 3:43 AM
5 points
10 comments4 min readLW link

An Ap­peal to AI Su­per­in­tel­li­gence: Rea­sons Not to Pre­serve (most of) Humanity

Alex BeymanMar 22, 2023, 4:09 AM
−14 points
6 comments19 min readLW link

Hu­man­ity’s Lack of Unity Will Lead to AGI Catastrophe

MiguelDevMar 19, 2023, 7:18 PM
3 points
2 comments4 min readLW link

[Question] Wouldn’t an in­tel­li­gent agent keep us al­ive and help us al­ign it­self to our val­ues in or­der to pre­vent risk ? by Risk I mean ex­per­i­men­ta­tion by try­ing to al­ign po­ten­tially smarter repli­cas?

Terrence RotoufleMar 21, 2023, 5:44 PM
−3 points
1 comment2 min readLW link

[Question] What does pul­ling the fire alarm look like?

nemMar 20, 2023, 9:45 PM
2 points
0 comments1 min readLW link

Im­mor­tal­ity or death by AGI

ImmortalityOrDeathByAGISep 21, 2023, 11:59 PM
47 points
30 comments4 min readLW link
(forum.effectivealtruism.org)

Biose­cu­rity and AI: Risks and Opportunities

Steve NewmanFeb 27, 2024, 6:45 PM
11 points
1 comment7 min readLW link
(www.safe.ai)

Is Align­ment enough?

gchuAug 16, 2024, 11:46 AM
1 point
0 comments1 min readLW link

The con­ver­gent dy­namic we missed

RemmeltDec 12, 2023, 11:19 PM
2 points
2 comments1 min readLW link

Cor­po­rate Gover­nance for Fron­tier AI Labs: A Re­search Agenda

Matthew WeardenFeb 28, 2024, 11:29 AM
4 points
0 comments16 min readLW link
(matthewwearden.co.uk)

AI Safety pro­posal—In­fluenc­ing the su­per­in­tel­li­gence explosion

MorganMay 22, 2024, 11:31 PM
0 points
2 comments7 min readLW link

Post se­ries on “Li­a­bil­ity Law for re­duc­ing Ex­is­ten­tial Risk from AI”

Nora_AmmannFeb 29, 2024, 4:39 AM
42 points
1 comment1 min readLW link
(forum.effectivealtruism.org)

In­cre­men­tal AI Risks from Proxy-Simulations

kmenouDec 19, 2023, 6:56 PM
2 points
0 comments1 min readLW link
(individual.utoronto.ca)

Mo­ral­ity as Co­op­er­a­tion Part I: Humans

DeLesley HutchinsDec 5, 2024, 8:16 AM
5 points
0 comments19 min readLW link

On the fu­ture of lan­guage models

owencbDec 20, 2023, 4:58 PM
105 points
17 comments1 min readLW link

What should AI safety be try­ing to achieve?

EuanMcLeanMay 23, 2024, 11:17 AM
17 points
1 comment13 min readLW link

Mo­ral­ity as Co­op­er­a­tion Part II: The­ory and Experiment

DeLesley HutchinsDec 5, 2024, 9:04 AM
2 points
0 comments17 min readLW link

Mo­ral­ity as Co­op­er­a­tion Part III: Failure Modes

DeLesley HutchinsDec 5, 2024, 9:39 AM
4 points
0 comments20 min readLW link

Big Pic­ture AI Safety: Introduction

EuanMcLeanMay 23, 2024, 11:15 AM
46 points
7 comments5 min readLW link

“De­stroy hu­man­ity” as an im­me­di­ate subgoal

Seth AhrenbachDec 22, 2023, 6:52 PM
3 points
13 comments3 min readLW link

AI safety ad­vo­cates should con­sider pro­vid­ing gen­tle push­back fol­low­ing the events at OpenAI

civilsocietyDec 22, 2023, 6:55 PM
16 points
5 comments3 min readLW link

What will the first hu­man-level AI look like, and how might things go wrong?

EuanMcLeanMay 23, 2024, 11:17 AM
20 points
2 comments15 min readLW link

What mis­takes has the AI safety move­ment made?

EuanMcLeanMay 23, 2024, 11:19 AM
63 points
29 comments12 min readLW link

More Thoughts on the Hu­man-AGI War

Seth AhrenbachDec 27, 2023, 1:03 AM
−3 points
4 comments7 min readLW link

Reflect­ing on the tran­shu­man­ist re­but­tal to AI ex­is­ten­tial risk and cri­tique of our de­bate method­olo­gies and mi­suse of statistics

catgirlsruletheworldAug 20, 2024, 1:59 AM
−5 points
0 comments4 min readLW link

Some­thing Is Lost When AI Makes Art

utilistrutilAug 18, 2024, 10:53 PM
16 points
1 comment11 min readLW link

Plan­ning to build a cryp­to­graphic box with perfect secrecy

Lysandre TerrisseDec 31, 2023, 9:31 AM
40 points
6 comments11 min readLW link

In­ves­ti­gat­ing Alter­na­tive Fu­tures: Hu­man and Su­per­in­tel­li­gence In­ter­ac­tion Scenarios

Hiroshi YamakawaJan 3, 2024, 11:46 PM
1 point
0 comments17 min readLW link

Does AI care about re­al­ity or just its own per­cep­tion?

RedFishBlueFishJan 5, 2024, 4:05 AM
−6 points
8 comments1 min readLW link

Towards AI Safety In­fras­truc­ture: Talk & Outline

Paul BricmanJan 7, 2024, 9:31 AM
11 points
0 comments2 min readLW link
(www.youtube.com)

AI de­mands un­prece­dented reliability

JonoJan 9, 2024, 4:30 PM
22 points
5 comments2 min readLW link

Sur­vey of 2,778 AI au­thors: six parts in pictures

KatjaGraceJan 6, 2024, 4:43 AM
80 points
1 comment2 min readLW link

[Question] What’s the pro­to­col for if a novice has ML ideas that are un­likely to work, but might im­prove ca­pa­bil­ities if they do work?

droctaJan 9, 2024, 10:51 PM
6 points
2 comments2 min readLW link

Two Tales of AI Takeover: My Doubts

Violet HourMar 5, 2024, 3:51 PM
30 points
8 comments29 min readLW link

Disprov­ing and par­tially fix­ing a fully ho­mo­mor­phic en­cryp­tion scheme with perfect secrecy

Lysandre TerrisseMay 26, 2024, 2:56 PM
16 points
1 comment18 min readLW link

The Un­der­re­ac­tion to OpenAI

SherrinfordJan 18, 2024, 10:08 PM
21 points
0 comments6 min readLW link

Brain­storm­ing: Slow Takeoff

David PiepgrassJan 23, 2024, 6:58 AM
3 points
0 comments51 min readLW link

OpenAI Credit Ac­count (2510$)

Emirhan BULUTJan 21, 2024, 2:32 AM
1 point
0 comments1 min readLW link

An­nounc­ing Con­ver­gence Anal­y­sis: An In­sti­tute for AI Sce­nario & Gover­nance Research

Mar 7, 2024, 9:37 PM
23 points
1 comment4 min readLW link

RAND re­port finds no effect of cur­rent LLMs on vi­a­bil­ity of bioter­ror­ism attacks

StellaAthenaJan 25, 2024, 7:17 PM
94 points
14 comments1 min readLW link
(www.rand.org)

What Failure Looks Like is not an ex­is­ten­tial risk (and al­ign­ment is not the solu­tion)

otto.bartenFeb 2, 2024, 6:59 PM
13 points
12 comments9 min readLW link

An­nounc­ing the Lon­don Ini­ti­a­tive for Safe AI (LISA)

Feb 2, 2024, 11:17 PM
98 points
0 comments9 min readLW link

Why I think it’s net harm­ful to do tech­ni­cal safety re­search at AGI labs

RemmeltFeb 7, 2024, 4:17 AM
26 points
24 comments1 min readLW link

Sce­nario plan­ning for AI x-risk

Corin KatzkeFeb 10, 2024, 12:14 AM
24 points
12 comments14 min readLW link
(forum.effectivealtruism.org)

Carl Shul­man On Dwarkesh Pod­cast June 2023

MoonickerFeb 11, 2024, 9:02 PM
18 points
0 comments159 min readLW link

Tort Law Can Play an Im­por­tant Role in Miti­gat­ing AI Risk

Gabriel WeilFeb 12, 2024, 5:17 PM
39 points
9 comments5 min readLW link

The Astro­nom­i­cal Sacri­fice Dilemma

Matthew McRedmondMar 11, 2024, 7:58 PM
15 points
3 comments4 min readLW link

AI In­ci­dent Re­port­ing: A Reg­u­la­tory Review

Mar 11, 2024, 9:03 PM
16 points
0 comments6 min readLW link

Dar­wi­nian Traps and Ex­is­ten­tial Risks

KristianRonnAug 25, 2024, 10:37 PM
85 points
14 comments10 min readLW link

Thoughts on the Fea­si­bil­ity of Pro­saic AGI Align­ment?

iamthouthouartiAug 21, 2020, 11:25 PM
8 points
10 comments1 min readLW link

The Shut­down Prob­lem: In­com­plete Prefer­ences as a Solution

EJTFeb 23, 2024, 4:01 PM
52 points
33 comments42 min readLW link

Con­trol­ling AGI Risk

TeaSeaMar 15, 2024, 4:56 AM
6 points
8 comments4 min readLW link

AI-gen­er­ated opi­oids could be a catas­trophic risk

ejk64Mar 20, 2024, 5:48 PM
0 points
2 comments3 min readLW link

Static vs Dy­namic Alignment

Gracie GreenMar 21, 2024, 5:44 PM
5 points
0 comments29 min readLW link

Timelines to Trans­for­ma­tive AI: an investigation

Zershaaneh QureshiMar 26, 2024, 6:28 PM
20 points
2 comments50 min readLW link

Open-ended ethics of phe­nom­ena (a desider­ata with uni­ver­sal moral­ity)

Ryo Nov 8, 2023, 8:10 PM
1 point
0 comments8 min readLW link

Ar­tifi­cial In­tel­li­gence and Liv­ing Wisdom

TMFOWMar 29, 2024, 7:41 AM
−6 points
1 comment17 min readLW link
(tmfow.substack.com)

[Question] Will OpenAI also re­quire a “Su­per Red Team Agent” for its “Su­per­al­ign­ment” Pro­ject?

Super AGIMar 30, 2024, 5:25 AM
2 points
2 comments1 min readLW link

Death with Awesomeness

osmarksApr 1, 2024, 8:24 PM
3 points
2 comments2 min readLW link

Thou­sands of mal­i­cious ac­tors on the fu­ture of AI misuse

Apr 1, 2024, 10:08 AM
37 points
0 comments1 min readLW link

A Semiotic Cri­tique of the Orthog­o­nal­ity Thesis

Nicolas VillarrealJun 4, 2024, 6:52 PM
3 points
10 comments15 min readLW link

$250K in Prizes: SafeBench Com­pe­ti­tion An­nounce­ment

ozhangApr 3, 2024, 10:07 PM
26 points
0 comments1 min readLW link

[Question] How does the ever-in­creas­ing use of AI in the mil­i­tary for the di­rect pur­pose of mur­der­ing peo­ple af­fect your p(doom)?

JustausernameApr 6, 2024, 6:31 AM
19 points
16 comments1 min readLW link

Why em­piri­cists should be­lieve in AI risk

Knight LeeDec 11, 2024, 3:51 AM
5 points
0 comments1 min readLW link

[Question] fake al­ign­ment solu­tions????

KvmanThinkingDec 11, 2024, 3:31 AM
1 point
6 comments1 min readLW link

[Question] Can sin­gu­lar­ity emerge from trans­form­ers?

MPApr 8, 2024, 2:26 PM
−3 points
1 comment1 min readLW link

An­nounc­ing At­las Computing

miyazonoApr 11, 2024, 3:56 PM
44 points
4 comments4 min readLW link

I cre­ated an Asi Align­ment Tier List

TimeGoatApr 21, 2024, 6:44 PM
−6 points
0 comments1 min readLW link

Take­off speeds pre­sen­ta­tion at Anthropic

Tom DavidsonJun 4, 2024, 10:46 PM
92 points
0 comments25 min readLW link

LLMs seem (rel­a­tively) safe

JustisMillsApr 25, 2024, 10:13 PM
53 points
24 comments7 min readLW link
(justismills.substack.com)

The po­ten­tial Ai risk

The White DeathMay 7, 2024, 8:31 PM
1 point
0 comments1 min readLW link

Is There a Power Play Over­hang?

crispweedMay 8, 2024, 5:39 PM
3 points
0 comments1 min readLW link
(upcoder.com)

An­nounc­ing the AI Safety Sum­mit Talks with Yoshua Bengio

otto.bartenMay 14, 2024, 12:52 PM
9 points
1 comment1 min readLW link

MIT Fu­tureTech are hiring for an Oper­a­tions and Pro­ject Man­age­ment role.

peterslatteryMay 17, 2024, 11:21 PM
2 points
0 comments3 min readLW link

AI 2030 – AI Policy Roadmap

LTMMay 17, 2024, 11:29 PM
8 points
0 comments1 min readLW link

Giv­ing away your predictions

Sep 8, 2024, 1:11 PM
1 point
0 comments1 min readLW link

La­bor Par­ti­ci­pa­tion is a High-Pri­or­ity AI Align­ment Risk

alexJun 17, 2024, 6:09 PM
6 points
0 comments17 min readLW link

Propos­ing the Post-Sin­gu­lar­ity Sym­biotic Researches

Hiroshi YamakawaJun 20, 2024, 4:05 AM
7 points
1 comment12 min readLW link

Con­nect­ing the Dots: LLMs can In­fer & Ver­bal­ize La­tent Struc­ture from Train­ing Data

Jun 21, 2024, 3:54 PM
163 points
13 comments8 min readLW link
(arxiv.org)

Grey Goo Re­quires AI

harsimonyJan 15, 2021, 4:45 AM
8 points
11 comments4 min readLW link
(harsimony.wordpress.com)

A nec­es­sary Mem­brane for­mal­ism feature

ThomasCederborgSep 10, 2024, 9:33 PM
20 points
6 comments11 min readLW link

Ted Kaczy­in­ski proves in­stru­men­tal con­ver­gence?

xXAlphaSigmaXxJun 28, 2024, 3:50 AM
0 points
0 comments1 min readLW link

Anal­y­sis of key AI analogies

Kevin KohlerJun 29, 2024, 10:55 AM
10 points
2 comments15 min readLW link

Re­view of METR’s pub­lic eval­u­a­tion protocol

Jun 30, 2024, 10:03 PM
10 points
0 comments5 min readLW link

Towards shut­down­able agents via stochas­tic choice

Jul 8, 2024, 10:14 AM
59 points
11 comments23 min readLW link
(arxiv.org)

[Question] Self-cen­sor­ing on AI x-risk dis­cus­sions?

DecaeneusJul 1, 2024, 6:24 PM
17 points
2 comments1 min readLW link

I read ev­ery ma­jor AI lab’s safety plan so you don’t have to

sarahhwDec 16, 2024, 6:51 PM
20 points
0 comments12 min readLW link
(longerramblings.substack.com)

In­tel­li­gence–Agency Equiv­alence ≈ Mass–En­ergy Equiv­alence: On Static Na­ture of In­tel­li­gence & Phys­i­cal­iza­tion of Ethics

ankFeb 22, 2025, 12:12 AM
1 point
0 comments6 min readLW link

The Com­pendium, A full ar­gu­ment about ex­tinc­tion risk from AGI

Oct 31, 2024, 12:01 PM
194 points
52 comments2 min readLW link
(www.thecompendium.ai)

AIS Hun­gary is hiring a part-time Tech­ni­cal Lead! (Dead­line: Dec 31st)

gergogasparDec 17, 2024, 2:12 PM
1 point
0 comments2 min readLW link

Yoshua Ben­gio: Rea­son­ing through ar­gu­ments against tak­ing AI safety seriously

Judd RosenblattJul 11, 2024, 11:53 PM
70 points
3 comments1 min readLW link
(yoshuabengio.org)

Align­ment: “Do what I would have wanted you to do”

Oleg TrottJul 12, 2024, 4:47 PM
11 points
48 comments1 min readLW link

A Solu­tion for AGI/​ASI Safety

Weibing WangDec 18, 2024, 7:44 PM
50 points
29 comments1 min readLW link

The AI al­ign­ment prob­lem in so­cio-tech­ni­cal sys­tems from a com­pu­ta­tional per­spec­tive: A Top-Down-Top view and outlook

zhaoweizhangJul 15, 2024, 6:56 PM
3 points
0 comments9 min readLW link

AI Apoca­lypse and the Buddha

pchvykovFeb 22, 2025, 4:33 PM
−17 points
6 comments9 min readLW link

Re­cur­sion in AI is scary. But let’s talk solu­tions.

Oleg TrottJul 16, 2024, 8:34 PM
3 points
10 comments2 min readLW link

[Question] Would a scope-in­sen­si­tive AGI be less likely to in­ca­pac­i­tate hu­man­ity?

Jim BuhlerJul 21, 2024, 2:15 PM
2 points
3 comments1 min readLW link

The $100B plan with “70% risk of kil­ling us all” w Stephen Fry [video]

Oleg TrottJul 21, 2024, 8:06 PM
36 points
8 comments1 min readLW link
(www.youtube.com)

How rea­son­able is tak­ing ex­tinc­tion risk?

FVeldeJul 23, 2024, 6:05 PM
2 points
4 comments4 min readLW link

Knowl­edge Base 8: The truth as an at­trac­tor in the in­for­ma­tion space

iwisApr 25, 2024, 3:28 PM
−8 points
0 comments2 min readLW link

AI ex­is­ten­tial risk prob­a­bil­ities are too un­re­li­able to in­form policy

Oleg TrottJul 28, 2024, 12:59 AM
18 points
5 comments1 min readLW link
(www.aisnakeoil.com)

[Question] Has Eliezer pub­li­cly and satis­fac­to­rily re­sponded to at­tempted re­but­tals of the anal­ogy to evolu­tion?

kalerJul 28, 2024, 12:23 PM
10 points
14 comments1 min readLW link

Self-Other Over­lap: A Ne­glected Ap­proach to AI Alignment

Jul 30, 2024, 4:22 PM
215 points
49 comments12 min readLW link

The need for multi-agent experiments

Martín SotoAug 1, 2024, 5:14 PM
43 points
3 comments9 min readLW link

Eth­i­cal De­cep­tion: Should AI Ever Lie?

Jason ReidAug 2, 2024, 5:53 PM
5 points
2 comments7 min readLW link

LLMs stifle cre­ativity, elimi­nate op­por­tu­ni­ties for serendipi­tous dis­cov­ery and dis­rupt in­ter­gen­er­a­tional trans­fer of wisdom

GhdzAug 5, 2024, 6:27 PM
6 points
2 comments7 min readLW link

[Question] Is an AI re­li­gion jus­tified?

p4rziv4lAug 6, 2024, 3:42 PM
−35 points
11 comments1 min readLW link

Case Story: Lack of Con­sumer Pro­tec­tion Pro­ce­dures AI Ma­nipu­la­tion and the Threat of Fund Con­cen­tra­tion in Crypto Seek­ing As­sis­tance to Fund a Civil Case to Estab­lish Facts and Pro­tect Vuln­er­a­ble Con­sumers from Da­m­age Caused by Au­to­mated Sys­tems

Petr 'Margot' AndreevAug 8, 2024, 5:55 AM
−9 points
0 comments9 min readLW link

[Question] Does VETLM solve AI su­per­al­ign­ment?

Oleg TrottAug 8, 2024, 6:22 PM
−1 points
10 comments1 min readLW link

The Con­scious­ness Co­nun­drum: Why We Can’t Dis­miss Ma­chine Sentience

SystematicApproachAug 13, 2024, 6:01 PM
−21 points
1 comment3 min readLW link

Cri­tique of ‘Many Peo­ple Fear A.I. They Shouldn’t’ by David Brooks.

Axel AhlqvistAug 15, 2024, 6:38 PM
12 points
8 comments3 min readLW link

An Alter­nate His­tory of the Fu­ture, 2025-2040

Mr BeastlyFeb 24, 2025, 5:53 AM
3 points
3 comments10 min readLW link

[Paper] Hid­den in Plain Text: Emer­gence and Miti­ga­tion of Stegano­graphic Col­lu­sion in LLMs

Sep 25, 2024, 2:52 PM
36 points
2 comments4 min readLW link
(arxiv.org)

The Over­looked Ne­ces­sity of Com­plete Se­man­tic Rep­re­sen­ta­tion in AI Safety and Alignment

williamsaeAug 15, 2024, 7:42 PM
−1 points
0 comments3 min readLW link

AGI’s Op­pos­ing Force

SimonBaarsAug 16, 2024, 4:18 AM
9 points
2 comments1 min readLW link

An “Ob­ser­va­tory” For a Shy Su­per AI?

SherrinfordSep 27, 2024, 9:22 PM
5 points
0 comments1 min readLW link
(robreid.substack.com)

Knowl­edge Base 1: Could it in­crease in­tel­li­gence and make it safer?

iwisSep 30, 2024, 4:00 PM
−4 points
0 comments4 min readLW link

Does nat­u­ral se­lec­tion fa­vor AIs over hu­mans?

cdkgOct 3, 2024, 6:47 PM
20 points
1 comment1 min readLW link
(link.springer.com)

[Question] Seek­ing AI Align­ment Tu­tor/​Ad­vi­sor: $100–150/​hr

MrThinkOct 5, 2024, 9:28 PM
26 points
3 comments2 min readLW link

Limits of safe and al­igned AI

ShivamOct 8, 2024, 9:30 PM
2 points
0 comments4 min readLW link

Pal­isade is hiring: Exec As­sis­tant, Con­tent Lead, Ops Lead, and Policy Lead

Charlie Rogers-SmithOct 9, 2024, 12:04 AM
11 points
0 comments4 min readLW link

Ge­offrey Hin­ton on the Past, Pre­sent, and Fu­ture of AI

Stephen McAleeseOct 12, 2024, 4:41 PM
22 points
5 comments18 min readLW link

Fac­tor­ing P(doom) into a bayesian network

Joseph GardiOct 17, 2024, 5:55 PM
1 point
0 comments1 min readLW link

Lenses of Control

WillPetilloOct 22, 2024, 7:51 AM
14 points
0 comments9 min readLW link

Miles Brundage re­signed from OpenAI, and his AGI readi­ness team was disbanded

garrisonOct 23, 2024, 11:40 PM
118 points
1 comment7 min readLW link
(garrisonlovely.substack.com)

Dario Amodei’s “Machines of Lov­ing Grace” sound in­cred­ibly dan­ger­ous, for Humans

Super AGIOct 27, 2024, 5:05 AM
8 points
1 comment1 min readLW link

AI as a pow­er­ful meme, via CGP Grey

TheManxLoinerOct 30, 2024, 6:31 PM
46 points
8 comments4 min readLW link

Nav­i­gat­ing pub­lic AI x-risk hype while pur­su­ing tech­ni­cal solutions

Dan BraunFeb 19, 2023, 12:22 PM
18 points
0 comments2 min readLW link

Tether­ware #2: What ev­ery hu­man should know about our most likely AI future

Jáchym FibírFeb 28, 2025, 11:12 AM
3 points
0 comments11 min readLW link
(tetherware.substack.com)

What AI safety re­searchers can learn from Ma­hatma Gandhi

Lysandre TerrisseNov 8, 2024, 7:49 PM
−6 points
0 comments3 min readLW link

Thoughts af­ter the Wolfram and Yud­kowsky discussion

TahpNov 14, 2024, 1:43 AM
25 points
13 comments6 min readLW link

Con­fronting the le­gion of doom.

Spiritus DeiNov 13, 2024, 5:03 PM
−20 points
3 comments5 min readLW link

Propos­ing the Con­di­tional AI Safety Treaty (linkpost TIME)

otto.bartenNov 15, 2024, 1:59 PM
10 points
8 comments3 min readLW link
(time.com)

Truth Ter­mi­nal: A re­con­struc­tion of events

Nov 17, 2024, 11:51 PM
3 points
1 comment7 min readLW link

[Question] Have we seen any “ReLU in­stead of sig­moid-type im­prove­ments” recently

KvmanThinkingNov 23, 2024, 3:51 AM
2 points
4 comments1 min readLW link

A bet­ter “State­ment on AI Risk?”

Knight LeeNov 25, 2024, 4:50 AM
5 points
6 comments3 min readLW link

Why Re­cur­sive Self-Im­prove­ment Might Not Be the Ex­is­ten­tial Risk We Fear

Nassim_ANov 24, 2024, 5:17 PM
1 point
0 comments9 min readLW link

Tak­ing Away the Guns First: The Fun­da­men­tal Flaw in AI Development

s-iceNov 26, 2024, 10:11 PM
1 point
0 comments17 min readLW link

Hope to live or fear to die?

Knight LeeNov 27, 2024, 10:42 AM
3 points
0 comments1 min readLW link

How to solve the mi­suse prob­lem as­sum­ing that in 10 years the de­fault sce­nario is that AGI agents are ca­pa­ble of syn­thetiz­ing pathogens

jeremttiNov 27, 2024, 9:17 PM
6 points
0 comments9 min readLW link

Hu­man study on AI spear phish­ing campaigns

Jan 3, 2025, 3:11 PM
79 points
8 comments5 min readLW link

I Recom­mend More Train­ing Rationales

Gianluca CalcagniDec 31, 2024, 2:06 PM
2 points
0 comments6 min readLW link

Align­ment Is Not All You Need

Adam JonesJan 2, 2025, 5:50 PM
43 points
10 comments6 min readLW link
(adamjones.me)

The In­tel­li­gence Curse

lukedragoJan 3, 2025, 7:07 PM
122 points
26 comments18 min readLW link
(lukedrago.substack.com)

Logic vs in­tu­ition ⇔ al­gorithm vs ML

pchvykovJan 4, 2025, 9:06 AM
5 points
0 comments7 min readLW link

The “Every­one Can’t Be Wrong” Prior causes AI risk de­nial but helped pre­his­toric people

Knight LeeJan 9, 2025, 5:54 AM
1 point
0 comments2 min readLW link

Thoughts on the In-Con­text Schem­ing AI Experiment

ExCephJan 9, 2025, 2:19 AM
3 points
0 comments4 min readLW link

False Pos­i­tives in En­tity-Level Hal­lu­ci­na­tion De­tec­tion: A Tech­ni­cal Challenge

MaxKamacheeJan 14, 2025, 7:22 PM
1 point
0 comments2 min readLW link

A prob­lem shared by many differ­ent al­ign­ment targets

ThomasCederborgJan 15, 2025, 2:22 PM
12 points
16 comments36 min readLW link

AI Align­ment Meme Viruses

RationalDinoJan 15, 2025, 3:55 PM
4 points
0 comments2 min readLW link

Ex­perts’ AI timelines are longer than you have been told?

Vasco GriloJan 16, 2025, 6:03 PM
10 points
4 comments3 min readLW link
(bayes.net)

[Question] What do you mean with ‘al­ign­ment is solv­able in prin­ci­ple’?

RemmeltJan 17, 2025, 3:03 PM
3 points
9 comments1 min readLW link

AI: How We Got Here—A Neu­ro­science Perspective

Mordechai RorvigJan 19, 2025, 11:51 PM
3 points
0 comments2 min readLW link
(www.kickstarter.com)

SIGMI Cer­tifi­ca­tion Criteria

a littoral wizardJan 20, 2025, 2:41 AM
6 points
0 comments1 min readLW link

The Hu­man Align­ment Prob­lem for AIs

rifeJan 22, 2025, 4:06 AM
10 points
5 comments3 min readLW link

Six Thoughts on AI Safety

boazbarakJan 24, 2025, 10:20 PM
91 points
55 comments15 min readLW link

Safe Search is off: root causes of AI catas­trophic risks

Jemal YoungJan 31, 2025, 6:22 PM
2 points
0 comments3 min readLW link

A god in a box

predict-wooJan 29, 2025, 12:55 AM
1 point
0 comments7 min readLW link

A cri­tique of Soares “4 back­ground claims”

YanLyutnevJan 27, 2025, 8:27 PM
−7 points
0 comments14 min readLW link

How To Prevent a Dystopia

ankJan 29, 2025, 2:16 PM
−3 points
4 comments1 min readLW link

[Question] Su­per­in­tel­li­gence Strat­egy: A Prag­matic Path to… Doom?

Mr BeastlyMar 19, 2025, 10:30 PM
6 points
0 comments3 min readLW link

AI ac­cel­er­a­tion, Deep­Seek, moral philosophy

Josh HFeb 2, 2025, 12:08 AM
2 points
0 comments12 min readLW link

Align­ment Can Re­duce Perfor­mance on Sim­ple Eth­i­cal Questions

Daan HenselmansFeb 3, 2025, 7:35 PM
15 points
7 comments6 min readLW link

The Univer­sal Lock­pick Hy­poth­e­sis: A Struc­tural Vuln­er­a­bil­ity of All Life

Cole HolinMar 23, 2025, 7:27 PM
1 point
0 comments2 min readLW link

Pop­ulec­tomy.ai

YonatanKMar 24, 2025, 10:06 PM
7 points
2 comments2 min readLW link

Ra­tional Utopia & Nar­row Way There: Mul­tiver­sal AI Align­ment, Non-Agen­tic Static Place AI, New Ethics… (V. 4)

ankFeb 11, 2025, 3:21 AM
13 points
8 comments35 min readLW link

How AI Takeover Might Hap­pen in 2 Years

joshcFeb 7, 2025, 5:10 PM
397 points
132 comments29 min readLW link
(x.com)

Scale Was All We Needed, At First

Gabe MFeb 14, 2024, 1:49 AM
295 points
34 comments8 min readLW link
(aiacumen.substack.com)

Ca­pa­bil­ities De­nial: The Danger of Un­der­es­ti­mat­ing AI

Christopher KingMar 21, 2023, 1:24 AM
6 points
5 comments3 min readLW link

Ex­plor­ing the Pre­cau­tion­ary Prin­ci­ple in AI Devel­op­ment: His­tor­i­cal Analo­gies and Les­sons Learned

Christopher KingMar 21, 2023, 3:53 AM
−1 points
2 comments9 min readLW link

[Question] Em­ployer con­sid­er­ing part­ner­ing with ma­jor AI labs. What to do?

GraduallyMoreAgitatedMar 21, 2023, 5:43 PM
37 points
7 comments2 min readLW link

Key Ques­tions for Digi­tal Minds

Jacy Reese AnthisMar 22, 2023, 5:13 PM
22 points
0 comments7 min readLW link
(www.sentienceinstitute.org)

Why We MUST Create an AGI that Disem­pow­ers Hu­man­ity. For Real.

twkaiserMar 22, 2023, 11:01 PM
−17 points
1 comment4 min readLW link

ChatGPT’s “fuzzy al­ign­ment” isn’t ev­i­dence of AGI al­ign­ment: the ba­nana test

Michael TontchevMar 23, 2023, 7:12 AM
23 points
6 comments4 min readLW link

Limit in­tel­li­gent weapons

Lucas PfeiferMar 23, 2023, 5:54 PM
−11 points
36 comments1 min readLW link

GPT-4 al­ign­ing with aca­sual de­ci­sion the­ory when in­structed to play games, but in­cludes a CDT ex­pla­na­tion that’s in­cor­rect if they differ

Christopher KingMar 23, 2023, 4:16 PM
7 points
4 comments8 min readLW link

Grind­ing slimes in the dun­geon of AI al­ign­ment research

Max HMar 24, 2023, 4:51 AM
10 points
2 comments4 min readLW link

Does GPT-4 ex­hibit agency when sum­ma­riz­ing ar­ti­cles?

Christopher KingMar 24, 2023, 3:49 PM
16 points
2 comments5 min readLW link

More ex­per­i­ments in GPT-4 agency: writ­ing memos

Christopher KingMar 24, 2023, 5:51 PM
5 points
2 comments10 min readLW link

ChatGPT Plu­g­ins—The Begin­ning of the End

Bary LevyMar 25, 2023, 11:45 AM
15 points
4 comments1 min readLW link

[Question] How Poli­tics in­ter­acts with AI ?

qbolecMar 26, 2023, 9:53 AM
−11 points
4 comments1 min readLW link

What can we learn from Lex Frid­man’s in­ter­view with Sam Alt­man?

Karl von WendtMar 27, 2023, 6:27 AM
56 points
22 comments9 min readLW link

Half-baked al­ign­ment idea

ozbMar 28, 2023, 5:47 PM
6 points
27 comments1 min readLW link

Adapt­ing to Change: Over­com­ing Chronos­ta­sis in AI Lan­guage Models

RationalMindsetMar 28, 2023, 2:32 PM
−1 points
0 comments6 min readLW link

I had a chat with GPT-4 on the fu­ture of AI and AI safety

Kristian FreedMar 28, 2023, 5:47 PM
1 point
0 comments8 min readLW link

“Un­in­ten­tional AI safety re­search”: Why not sys­tem­at­i­cally mine AI tech­ni­cal re­search for safety pur­poses?

Jemal YoungMar 29, 2023, 3:56 PM
27 points
3 comments6 min readLW link

I made AI Risk Propaganda

monkymindMar 29, 2023, 2:26 PM
−3 points
0 comments1 min readLW link

“Sorcerer’s Ap­pren­tice” from Fan­ta­sia as an anal­ogy for alignment

awgMar 29, 2023, 6:21 PM
9 points
4 comments1 min readLW link
(video.disney.com)

Paus­ing AI Devel­op­ments Isn’t Enough. We Need to Shut it All Down by Eliezer Yudkowsky

jacquesthibsMar 29, 2023, 11:16 PM
291 points
297 comments3 min readLW link
(time.com)

Align­ment—Path to AI as ally, not slave nor foe

ozbMar 30, 2023, 2:54 PM
10 points
3 comments2 min readLW link

Wi­den­ing Over­ton Win­dow—Open Thread

PrometheusMar 31, 2023, 10:03 AM
23 points
8 comments1 min readLW link

Why Yud­kowsky Is Wrong And What He Does Can Be More Dangerous

idontagreewiththatJun 6, 2023, 5:59 PM
−38 points
4 comments3 min readLW link

GPT-4 busted? Clear self-in­ter­est when sum­ma­riz­ing ar­ti­cles about it­self vs when ar­ti­cle talks about Claude, LLaMA, or DALL·E 2

Christopher KingMar 31, 2023, 5:05 PM
6 points
4 comments4 min readLW link

Imag­ine a world where Microsoft em­ploy­ees used Bing

Christopher KingMar 31, 2023, 6:36 PM
6 points
2 comments2 min readLW link

Is AGI suici­dal­ity the golden ray of hope?

Alex KirkoApr 4, 2023, 11:29 PM
−18 points
4 comments1 min readLW link

AI com­mu­nity build­ing: EliezerKart

Christopher KingApr 1, 2023, 3:25 PM
45 points
0 comments2 min readLW link

Pes­simism about AI Safety

Apr 2, 2023, 7:43 AM
4 points
1 comment25 min readLW link

AI Safety via Luck

JozdienApr 1, 2023, 8:13 PM
81 points
7 comments11 min readLW link

The AI gov­er­nance gaps in de­vel­op­ing countries

ntranJun 17, 2023, 2:50 AM
20 points
1 comment14 min readLW link

How to safely use an optimizer

Simon FischerMar 28, 2024, 4:11 PM
47 points
21 comments7 min readLW link

Ex­plor­ing non-an­thro­pocen­tric as­pects of AI ex­is­ten­tial safety

mishkaApr 3, 2023, 6:07 PM
8 points
0 comments3 min readLW link

Steer­ing systems

Max HApr 4, 2023, 12:56 AM
50 points
1 comment15 min readLW link

ICA Simulacra

OzyrusApr 5, 2023, 6:41 AM
26 points
2 comments7 min readLW link

Against sac­ri­fic­ing AI trans­parency for gen­er­al­ity gains

Ape in the coatMay 7, 2023, 6:52 AM
4 points
0 comments2 min readLW link

Su­per­in­tel­li­gence will out­smart us or it isn’t superintelligence

Neil Apr 3, 2023, 3:01 PM
−4 points
4 comments1 min readLW link

Do we have a plan for the “first crit­i­cal try” prob­lem?

Christopher KingApr 3, 2023, 4:27 PM
−3 points
14 comments1 min readLW link

Towards em­pa­thy in RL agents and be­yond: In­sights from cog­ni­tive sci­ence for AI Align­ment

Marc CarauleanuApr 3, 2023, 7:59 PM
15 points
6 comments1 min readLW link
(clipchamp.com)

[Question] Does it be­come eas­ier, or harder, for the world to co­or­di­nate around not build­ing AGI as time goes on?

Eli TyreJul 29, 2019, 10:59 PM
86 points
31 comments3 min readLW link2 reviews

Strate­gies to Prevent AI Annihilation

lastchanceformankindApr 4, 2023, 8:59 AM
−2 points
0 comments4 min readLW link

AGI de­ploy­ment as an act of aggression

dr_sApr 5, 2023, 6:39 AM
28 points
30 comments13 min readLW link

[Question] Daisy-chain­ing ep­silon-step verifiers

DecaeneusApr 6, 2023, 2:07 AM
2 points
1 comment1 min readLW link

One Does Not Sim­ply Re­place the Hu­mans

JerkyTreatsApr 6, 2023, 8:56 PM
9 points
3 comments4 min readLW link
(www.lesswrong.com)

OpenAI: Our ap­proach to AI safety

Jacob G-WApr 5, 2023, 8:26 PM
1 point
1 comment1 min readLW link
(openai.com)

Willi­ams-Beuren Syn­drome: Frendly Mutations

TakkApr 5, 2023, 8:59 PM
−1 points
1 comment1 min readLW link

Yoshua Ben­gio: “Slow­ing down de­vel­op­ment of AI sys­tems pass­ing the Tur­ing test”

Roman LeventovApr 6, 2023, 3:31 AM
49 points
2 comments5 min readLW link
(yoshuabengio.org)

A decade of lurk­ing, a month of posting

Max HApr 9, 2023, 12:21 AM
70 points
4 comments5 min readLW link

Align­ment of Au­toGPT agents

OzyrusApr 12, 2023, 12:54 PM
14 points
1 comment4 min readLW link

Why I’m not wor­ried about im­mi­nent doom

kwiat.devApr 10, 2023, 3:31 PM
7 points
2 comments4 min readLW link

Mea­sur­ing ar­tifi­cial in­tel­li­gence on hu­man bench­marks is naive

AnomalousApr 11, 2023, 11:34 AM
11 points
4 comments1 min readLW link
(forum.effectivealtruism.org)

In fa­vor of ac­cel­er­at­ing prob­lems you’re try­ing to solve

Christopher KingApr 11, 2023, 6:15 PM
2 points
2 comments4 min readLW link

AI Risk US Pres­i­den­tial Candidate

Simon BerensApr 11, 2023, 7:31 PM
5 points
3 comments1 min readLW link

Open-source LLMs may prove Bostrom’s vuln­er­a­ble world hypothesis

Roope AhvenharjuApr 15, 2023, 7:16 PM
1 point
1 comment1 min readLW link

Ar­tifi­cial In­tel­li­gence as exit strat­egy from the age of acute ex­is­ten­tial risk

Arturo MaciasApr 12, 2023, 2:48 PM
−7 points
15 comments7 min readLW link

AGI goal space is big, but nar­row­ing might not be as hard as it seems.

Jacy Reese AnthisApr 12, 2023, 7:03 PM
15 points
0 comments3 min readLW link

Pol­lut­ing the agen­tic commons

hamandcheeseApr 13, 2023, 5:42 PM
7 points
4 comments2 min readLW link
(www.secondbest.ca)

The Virus—Short Story

Michael SoareverixApr 13, 2023, 6:18 PM
4 points
0 comments4 min readLW link

On the pos­si­bil­ity of im­pos­si­bil­ity of AGI Long-Term Safety

Roman YenMay 13, 2023, 6:38 PM
8 points
3 comments9 min readLW link

Spec­u­la­tion on map­ping the moral land­scape for fu­ture Ai Alignment

Sven Heinz (Welwordion)Apr 16, 2023, 1:43 PM
1 point
0 comments1 min readLW link

On ur­gency, pri­or­ity and col­lec­tive re­ac­tion to AI-Risks: Part I

DenreikApr 16, 2023, 7:14 PM
−10 points
15 comments5 min readLW link

AGI Clinics: A Safe Haven for Hu­man­ity’s First En­coun­ters with Superintelligence

portr.Apr 17, 2023, 1:52 AM
−5 points
1 comment1 min readLW link

Defin­ing Boundaries on Out­comes

TakkJun 7, 2023, 5:41 PM
1 point
0 comments1 min readLW link

No, re­ally, it pre­dicts next to­kens.

simonApr 18, 2023, 3:47 AM
58 points
55 comments3 min readLW link

Pre­dic­tion: any un­con­trol­lable AI will turn earth into a gi­ant computer

Karl von WendtApr 17, 2023, 12:30 PM
11 points
8 comments3 min readLW link

What is your timelines for ADI (ar­tifi­cial dis­em­pow­er­ing in­tel­li­gence)?

Christopher KingApr 17, 2023, 5:01 PM
3 points
3 comments2 min readLW link

Green goo is plausible

anithiteApr 18, 2023, 12:04 AM
62 points
31 comments4 min readLW link1 review

World and Mind in Ar­tifi­cial In­tel­li­gence: ar­gu­ments against the AI pause

Arturo MaciasApr 18, 2023, 2:40 PM
1 point
0 comments1 min readLW link
(forum.effectivealtruism.org)

AI Safety Newslet­ter #2: ChaosGPT, Nat­u­ral Selec­tion, and AI Safety in the Media

Apr 18, 2023, 6:44 PM
30 points
0 comments4 min readLW link
(newsletter.safe.ai)

I Believe I Know Why AI Models Hallucinate

Richard AragonApr 19, 2023, 9:07 PM
−10 points
6 comments7 min readLW link
(turingssolutions.com)

[Cross­post] Or­ga­niz­ing a de­bate with ex­perts and MPs to raise AI xrisk aware­ness: a pos­si­ble blueprint

otto.bartenApr 19, 2023, 11:45 AM
8 points
0 comments4 min readLW link
(forum.effectivealtruism.org)

How to ex­press this sys­tem for eth­i­cally al­igned AGI as a Math­e­mat­i­cal for­mula?

Oliver SiegelApr 19, 2023, 8:13 PM
−1 points
0 comments1 min readLW link

[Question] Is there any liter­a­ture on us­ing so­cial­iza­tion for AI al­ign­ment?

Nathan1123Apr 19, 2023, 10:16 PM
10 points
9 comments2 min readLW link

Sta­bil­ity AI re­leases StableLM, an open-source ChatGPT counterpart

OzyrusApr 20, 2023, 6:04 AM
11 points
3 comments1 min readLW link
(github.com)

Ideas for stud­ies on AGI risk

dr_sApr 20, 2023, 6:17 PM
5 points
1 comment11 min readLW link

Pro­posal: Us­ing Monte Carlo tree search in­stead of RLHF for al­ign­ment research

Christopher KingApr 20, 2023, 7:57 PM
2 points
7 comments3 min readLW link

Notes on “the hot mess the­ory of AI mis­al­ign­ment”

JakubKApr 21, 2023, 10:07 AM
16 points
0 comments5 min readLW link
(sohl-dickstein.github.io)

The Se­cu­rity Mind­set, S-Risk and Pub­lish­ing Pro­saic Align­ment Research

lukemarksApr 22, 2023, 2:36 PM
39 points
7 comments5 min readLW link

A great talk for AI noobs (ac­cord­ing to an AI noob)

dovApr 23, 2023, 5:34 AM
10 points
1 comment1 min readLW link
(forum.effectivealtruism.org)

Paths to failure

Apr 25, 2023, 8:03 AM
29 points
1 comment8 min readLW link

A con­cise sum-up of the ba­sic ar­gu­ment for AI doom

Mergimio H. DoefevmilApr 24, 2023, 5:37 PM
11 points
6 comments2 min readLW link

A re­sponse to Con­jec­ture’s CoEm proposal

Kristian FreedApr 24, 2023, 5:23 PM
7 points
0 comments4 min readLW link

A Pro­posal for AI Align­ment: Us­ing Directly Op­pos­ing Models

Arne BApr 27, 2023, 6:05 PM
0 points
5 comments3 min readLW link

Mak­ing Nanobots isn’t a one-shot pro­cess, even for an ar­tifi­cial superintelligance

dankradApr 25, 2023, 12:39 AM
20 points
13 comments6 min readLW link

My Assess­ment of the Chi­nese AI Safety Community

Lao MeinApr 25, 2023, 4:21 AM
250 points
94 comments3 min readLW link

Briefly how I’ve up­dated since ChatGPT

rimeApr 25, 2023, 2:47 PM
48 points
2 comments2 min readLW link

Free­dom Is All We Need

Leo GlisicApr 27, 2023, 12:09 AM
−1 points
8 comments10 min readLW link

Hal­loween Problem

Saint BlasphemerOct 24, 2023, 4:46 PM
−10 points
1 comment1 min readLW link

An­nounc­ing #AISum­mitTalks fea­tur­ing Pro­fes­sor Stu­art Rus­sell and many others

otto.bartenOct 24, 2023, 10:11 AM
17 points
1 comment1 min readLW link

[Question] What if AGI had its own uni­verse to maybe wreck?

msealeOct 26, 2023, 5:49 PM
−1 points
2 comments1 min readLW link

Re­spon­si­ble Scal­ing Poli­cies Are Risk Man­age­ment Done Wrong

simeon_cOct 25, 2023, 11:46 PM
123 points
35 comments22 min readLW link1 review
(www.navigatingrisks.ai)

[Thought Ex­per­i­ment] To­mor­row’s Echo—The fu­ture of syn­thetic com­pan­ion­ship.

Vimal NaranOct 26, 2023, 5:54 PM
−7 points
2 comments2 min readLW link

[un­ti­tled post]

NeuralSystem_e5e1Apr 27, 2023, 5:37 PM
3 points
0 comments1 min readLW link

Re­sponse to “Co­or­di­nated paus­ing: An eval­u­a­tion-based co­or­di­na­tion scheme for fron­tier AI de­vel­op­ers”

Matthew WeardenOct 30, 2023, 5:27 PM
5 points
2 comments6 min readLW link
(matthewwearden.co.uk)

Char­bel-Raphaël and Lu­cius dis­cuss interpretability

Oct 30, 2023, 5:50 AM
111 points
7 comments21 min readLW link

An In­ter­na­tional Man­hat­tan Pro­ject for Ar­tifi­cial Intelligence

Glenn ClaytonApr 27, 2023, 5:34 PM
−11 points
2 comments5 min readLW link

Fo­cus on ex­is­ten­tial risk is a dis­trac­tion from the real is­sues. A false fallacy

Nik SamoylovOct 30, 2023, 11:42 PM
−19 points
11 comments2 min readLW link

Say­ing the quiet part out loud: trad­ing off x-risk for per­sonal immortality

disturbanceNov 2, 2023, 5:43 PM
84 points
89 comments5 min readLW link

The 6D effect: When com­pa­nies take risks, one email can be very pow­er­ful.

scasperNov 4, 2023, 8:08 PM
277 points
42 comments3 min readLW link

AI as Su­per-Demagogue

RationalDinoNov 5, 2023, 9:21 PM
11 points
12 comments9 min readLW link

Sym­biotic self-al­ign­ment of AIs.

Spiritus DeiNov 7, 2023, 5:18 PM
1 point
0 comments3 min readLW link

Scal­able And Trans­fer­able Black-Box Jailbreaks For Lan­guage Models Via Per­sona Modulation

Nov 7, 2023, 5:59 PM
38 points
2 comments2 min readLW link
(arxiv.org)

The So­cial Align­ment Problem

irvingApr 28, 2023, 2:16 PM
99 points
13 comments8 min readLW link

Do you want a first-prin­ci­pled pre­pared­ness guide to pre­pare your­self and loved ones for po­ten­tial catas­tro­phes?

Ulrik HornNov 14, 2023, 12:13 PM
16 points
5 comments15 min readLW link

[Question] Real­is­tic near-fu­ture sce­nar­ios of AI doom un­der­stand­able for non-techy peo­ple?

RomanSApr 28, 2023, 2:45 PM
4 points
4 comments1 min readLW link

[Question] AI Safety orgs- what’s your biggest bot­tle­neck right now?

Kabir KumarNov 16, 2023, 2:02 AM
1 point
0 comments1 min readLW link

We Should Talk About This More. Epistemic World Col­lapse as Im­mi­nent Safety Risk of Gen­er­a­tive AI.

Joerg WeissNov 16, 2023, 6:46 PM
11 points
2 comments29 min readLW link

On ex­clud­ing dan­ger­ous in­for­ma­tion from training

ShayBenMosheNov 17, 2023, 11:14 AM
23 points
5 comments3 min readLW link

Killswitch

JunioNov 18, 2023, 10:53 PM
2 points
0 comments3 min readLW link

Ilya: The AI sci­en­tist shap­ing the world

David VargaNov 20, 2023, 1:09 PM
11 points
0 comments4 min readLW link

A Guide to Fore­cast­ing AI Science Ca­pa­bil­ities

Eleni AngelouApr 29, 2023, 11:24 PM
6 points
1 comment4 min readLW link

The two para­graph ar­gu­ment for AI risk

CronoDASNov 25, 2023, 2:01 AM
19 points
8 comments1 min readLW link

AISC 2024 - Pro­ject Summaries

NickyPNov 27, 2023, 10:32 PM
48 points
3 comments18 min readLW link

Re­think Pri­ori­ties: Seek­ing Ex­pres­sions of In­ter­est for Spe­cial Pro­jects Next Year

kierangreigNov 29, 2023, 1:59 PM
4 points
0 comments5 min readLW link

Sup­port me in a Week-Long Pick­et­ing Cam­paign Near OpenAI’s HQ: Seek­ing Sup­port and Ideas from the LessWrong Community

PercyApr 30, 2023, 5:48 PM
−21 points
15 comments1 min readLW link

Thoughts on “AI is easy to con­trol” by Pope & Belrose

Steven ByrnesDec 1, 2023, 5:30 PM
197 points
62 comments14 min readLW link1 review

The benefits and risks of op­ti­mism (about AI safety)

Karl von WendtDec 3, 2023, 12:45 PM
−7 points
6 comments5 min readLW link

A call for a quan­ti­ta­tive re­port card for AI bioter­ror­ism threat models

JunoDec 4, 2023, 6:35 AM
12 points
0 comments10 min readLW link

[Question] Ac­cu­racy of ar­gu­ments that are seen as ridicu­lous and in­tu­itively false but don’t have good counter-arguments

Christopher KingApr 29, 2023, 11:58 PM
30 points
39 comments1 min readLW link

Co­op­er­a­tion and Align­ment in Del­e­ga­tion Games: You Need Both!

Aug 3, 2024, 10:16 AM
8 points
0 comments14 min readLW link
(www.oliversourbut.net)

Call for sub­mis­sions: Choice of Fu­tures sur­vey questions

c.troutApr 30, 2023, 6:59 AM
4 points
0 comments2 min readLW link
(airtable.com)

Ac­cess to AI: a hu­man right?

dmteaJul 25, 2020, 9:38 AM
5 points
3 comments2 min readLW link

Agen­tic Lan­guage Model Memes

FactorialCodeAug 1, 2020, 6:03 PM
16 points
1 comment2 min readLW link

Con­ver­sa­tion with Paul Christiano

abergalSep 11, 2019, 11:20 PM
44 points
6 comments30 min readLW link
(aiimpacts.org)

Tran­scrip­tion of Eliezer’s Jan­uary 2010 video Q&A

curiousepicNov 14, 2011, 5:02 PM
112 points
9 comments56 min readLW link

Re­sponses to Catas­trophic AGI Risk: A Survey

lukeprogJul 8, 2013, 2:33 PM
17 points
8 comments1 min readLW link

How can I re­duce ex­is­ten­tial risk from AI?

lukeprogNov 13, 2012, 9:56 PM
63 points
92 comments8 min readLW link

Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?”

David Scott Krueger (formerly: capybaralet)Feb 6, 2019, 7:09 PM
25 points
17 comments1 min readLW link

Refram­ing mis­al­igned AGI’s: well-in­ten­tioned non-neu­rotyp­i­cal assistants

zhukeepaApr 1, 2018, 1:22 AM
46 points
14 comments2 min readLW link

When is un­al­igned AI morally valuable?

paulfchristianoMay 25, 2018, 1:57 AM
81 points
53 comments10 min readLW link

In­tro­duc­ing the AI Align­ment Fo­rum (FAQ)

Oct 29, 2018, 9:07 PM
87 points
8 comments6 min readLW link

Swim­ming Up­stream: A Case Study in In­stru­men­tal Rationality

TurnTroutJun 3, 2018, 3:16 AM
76 points
7 comments8 min readLW link

Cur­rent AI Safety Roles for Soft­ware Engineers

ozziegooenNov 9, 2018, 8:57 PM
70 points
9 comments4 min readLW link

[Question] Why is so much dis­cus­sion hap­pen­ing in pri­vate Google Docs?

Wei DaiJan 12, 2019, 2:19 AM
101 points
22 comments1 min readLW link

Prob­lems in AI Align­ment that philoso­phers could po­ten­tially con­tribute to

Wei DaiAug 17, 2019, 5:38 PM
79 points
14 comments2 min readLW link

Two Ne­glected Prob­lems in Hu­man-AI Safety

Wei DaiDec 16, 2018, 10:13 PM
102 points
25 comments2 min readLW link

An­nounce­ment: AI al­ign­ment prize round 4 winners

cousin_itJan 20, 2019, 2:46 PM
74 points
41 comments1 min readLW link

Soon: a weekly AI Safety pre­req­ui­sites mod­ule on LessWrong

nullApr 30, 2018, 1:23 PM
35 points
10 comments1 min readLW link

And the AI would have got away with it too, if...

Stuart_ArmstrongMay 22, 2019, 9:35 PM
75 points
7 comments1 min readLW link

2017 AI Safety Liter­a­ture Re­view and Char­ity Com­par­i­son

LarksDec 24, 2017, 6:52 PM
41 points
5 comments23 min readLW link

Should ethi­cists be in­side or out­side a pro­fes­sion?

Eliezer YudkowskyDec 12, 2018, 1:40 AM
97 points
7 comments9 min readLW link

I Vouch For MIRI

ZviDec 17, 2017, 5:50 PM
39 points
9 comments5 min readLW link
(thezvi.wordpress.com)

Be­ware of black boxes in AI al­ign­ment research

cousin_itJan 18, 2018, 3:07 PM
39 points
10 comments1 min readLW link

AI Align­ment Prize: Round 2 due March 31, 2018

ZviMar 12, 2018, 12:10 PM
28 points
2 comments3 min readLW link
(thezvi.wordpress.com)

Three AI Safety Re­lated Ideas

Wei DaiDec 13, 2018, 9:32 PM
69 points
38 comments2 min readLW link

A rant against robots

Lê Nguyên HoangJan 14, 2020, 10:03 PM
65 points
7 comments5 min readLW link

Op­por­tu­ni­ties for in­di­vi­d­ual donors in AI safety

Alex FlintMar 31, 2018, 6:37 PM
30 points
3 comments11 min readLW link

Course recom­men­da­tions for Friendli­ness researchers

LouieJan 9, 2013, 2:33 PM
96 points
112 comments10 min readLW link

AI Safety Re­search Camp—Pro­ject Proposal

David_KristofferssonFeb 2, 2018, 4:25 AM
29 points
11 comments8 min readLW link

AI Sum­mer Fel­lows Program

colmMar 21, 2018, 3:32 PM
21 points
0 comments1 min readLW link

The ge­nie knows, but doesn’t care

Rob BensingerSep 6, 2013, 6:42 AM
121 points
495 comments8 min readLW link

[Question] Does agency nec­es­sar­ily im­ply self-preser­va­tion in­stinct?

Mislav JurićMay 1, 2023, 4:06 PM
5 points
8 comments1 min readLW link

Align­ment Newslet­ter #13: 07/​02/​18

Rohin ShahJul 2, 2018, 4:10 PM
70 points
12 comments8 min readLW link
(mailchi.mp)

An In­creas­ingly Ma­nipu­la­tive Newsfeed

Michaël TrazziJul 1, 2019, 3:26 PM
63 points
16 comments5 min readLW link

The sim­ple pic­ture on AI safety

Alex FlintMay 27, 2018, 7:43 PM
31 points
10 comments2 min readLW link

Shah (Deep­Mind) and Leahy (Con­jec­ture) Dis­cuss Align­ment Cruxes

May 1, 2023, 4:47 PM
96 points
10 comments30 min readLW link

Elon Musk donates $10M to the Fu­ture of Life In­sti­tute to keep AI benefi­cial

Paul CrowleyJan 15, 2015, 4:33 PM
78 points
52 comments1 min readLW link

Strate­gic im­pli­ca­tions of AIs’ abil­ity to co­or­di­nate at low cost, for ex­am­ple by merging

Wei DaiApr 25, 2019, 5:08 AM
69 points
46 comments2 min readLW link1 review

Model­ing AGI Safety Frame­works with Causal In­fluence Diagrams

Ramana KumarJun 21, 2019, 12:50 PM
43 points
6 comments1 min readLW link
(arxiv.org)

Henry Kiss­inger: AI Could Mean the End of Hu­man History

ESRogsMay 15, 2018, 8:11 PM
17 points
12 comments1 min readLW link
(www.theatlantic.com)

Toy model of the AI con­trol prob­lem: an­i­mated version

Stuart_ArmstrongOct 10, 2017, 11:06 AM
23 points
8 comments1 min readLW link

A Vi­su­al­iza­tion of Nick Bostrom’s Superintelligence

[deleted]Jul 23, 2014, 12:24 AM
62 points
28 comments3 min readLW link

AI Align­ment Re­search Overview (by Ja­cob Stein­hardt)

Ben PaceNov 6, 2019, 7:24 PM
44 points
0 comments7 min readLW link
(docs.google.com)

A gen­eral model of safety-ori­ented AI development

Wei DaiJun 11, 2018, 9:00 PM
65 points
8 comments1 min readLW link

Coun­ter­fac­tual Or­a­cles = on­line su­per­vised learn­ing with ran­dom se­lec­tion of train­ing episodes

Wei DaiSep 10, 2019, 8:29 AM
52 points
26 comments3 min readLW link

AI Safety Newslet­ter #4: AI and Cy­ber­se­cu­rity, Per­sua­sive AIs, Weaponiza­tion, and Ge­offrey Hin­ton talks AI risks

May 2, 2023, 6:41 PM
32 points
0 comments5 min readLW link
(newsletter.safe.ai)

Avert­ing Catas­tro­phe: De­ci­sion The­ory for COVID-19, Cli­mate Change, and Po­ten­tial Disasters of All Kinds

JakubKMay 2, 2023, 10:50 PM
10 points
0 comments1 min readLW link

Siren wor­lds and the per­ils of over-op­ti­mised search

Stuart_ArmstrongApr 7, 2014, 11:00 AM
83 points
418 comments7 min readLW link

Reg­u­late or Com­pete? The China Fac­tor in U.S. AI Policy (NAIR #2)

charles_mMay 5, 2023, 5:43 PM
2 points
1 comment7 min readLW link
(navigatingairisks.substack.com)

But What If We Ac­tu­ally Want To Max­i­mize Paper­clips?

snerxMay 25, 2023, 7:13 AM
−17 points
6 comments7 min readLW link

Top 9+2 myths about AI risk

Stuart_ArmstrongJun 29, 2015, 8:41 PM
68 points
44 comments2 min readLW link

For­mal­iz­ing the “AI x-risk is un­likely be­cause it is ridicu­lous” argument

Christopher KingMay 3, 2023, 6:56 PM
48 points
17 comments3 min readLW link

Ro­hin Shah on rea­sons for AI optimism

abergalOct 31, 2019, 12:10 PM
40 points
58 comments1 min readLW link
(aiimpacts.org)

Plau­si­bly, al­most ev­ery pow­er­ful al­gorithm would be manipulative

Stuart_ArmstrongFeb 6, 2020, 11:50 AM
38 points
25 comments3 min readLW link

We don’t need AGI for an amaz­ing future

Karl von WendtMay 4, 2023, 12:10 PM
18 points
32 comments5 min readLW link

[Question] Why not use ac­tive SETI to pre­vent AI Doom?

RomanSMay 5, 2023, 2:41 PM
13 points
13 comments1 min readLW link