AI Risk

Last edit: 2 Nov 2022 20:27 UTC by brook

AI Risk is the analysis of risks associated with building powerful AI systems.

Related: AI, Orthogonality thesis, Complexity of value, Goodhart’s law, Paperclip maximiser

Su­per­in­tel­li­gence FAQ

Scott Alexander20 Sep 2016 19:00 UTC
134 points
38 comments27 min readLW link

AGI Ruin: A List of Lethalities

Eliezer Yudkowsky5 Jun 2022 22:05 UTC
916 points
703 comments30 min readLW link3 reviews

What failure looks like

paulfchristiano17 Mar 2019 20:18 UTC
416 points
54 comments8 min readLW link2 reviews

Speci­fi­ca­tion gam­ing ex­am­ples in AI

Vika3 Apr 2018 12:30 UTC
46 points
9 comments1 min readLW link2 reviews

An ar­tifi­cially struc­tured ar­gu­ment for ex­pect­ing AGI ruin

Rob Bensinger7 May 2023 21:52 UTC
91 points
26 comments19 min readLW link

Where I agree and dis­agree with Eliezer

paulfchristiano19 Jun 2022 19:15 UTC
889 points
220 comments18 min readLW link2 reviews

Dis­cus­sion with Eliezer Yud­kowsky on AGI interventions

11 Nov 2021 3:01 UTC
328 points
251 comments34 min readLW link1 review

In­tu­itions about goal-di­rected behavior

Rohin Shah1 Dec 2018 4:25 UTC
54 points
15 comments6 min readLW link

“Cor­rigi­bil­ity at some small length” by dath ilan

Christopher King5 Apr 2023 1:47 UTC
32 points
3 comments9 min readLW link
(www.glowfic.com)

AGI in sight: our look at the game board

18 Feb 2023 22:17 UTC
226 points
135 comments6 min readLW link
(andreamiotti.substack.com)

Episte­molog­i­cal Fram­ing for AI Align­ment Research

adamShimi8 Mar 2021 22:05 UTC
58 points
7 comments9 min readLW link

On how var­i­ous plans miss the hard bits of the al­ign­ment challenge

So8res12 Jul 2022 2:49 UTC
305 points
88 comments29 min readLW link3 reviews

Con­jec­ture in­ter­nal sur­vey: AGI timelines and prob­a­bil­ity of hu­man ex­tinc­tion from ad­vanced AI

Maris Sala22 May 2023 14:31 UTC
155 points
5 comments3 min readLW link
(www.conjecture.dev)

Strik­ing Im­pli­ca­tions for Learn­ing The­ory, In­ter­pretabil­ity — and Safety?

RogerDearnaley5 Jan 2024 8:46 UTC
37 points
4 comments2 min readLW link

The ‘Ne­glected Ap­proaches’ Ap­proach: AE Stu­dio’s Align­ment Agenda

18 Dec 2023 20:35 UTC
168 points
21 comments12 min readLW link

What can the prin­ci­pal-agent liter­a­ture tell us about AI risk?

apc8 Feb 2020 21:28 UTC
104 points
29 comments16 min readLW link

Open Prob­lems in AI X-Risk [PAIS #5]

10 Jun 2022 2:08 UTC
61 points
6 comments36 min readLW link

[Question] Will OpenAI’s work un­in­ten­tion­ally in­crease ex­is­ten­tial risks re­lated to AI?

adamShimi11 Aug 2020 18:16 UTC
53 points
55 comments1 min readLW link

Meta AI an­nounces Cicero: Hu­man-Level Di­plo­macy play (with di­alogue)

Jacy Reese Anthis22 Nov 2022 16:50 UTC
93 points
64 comments1 min readLW link
(www.science.org)

Devel­op­men­tal Stages of GPTs

orthonormal26 Jul 2020 22:03 UTC
140 points
72 comments7 min readLW link1 review

Stampy’s AI Safety Info—New Distil­la­tions #1 [March 2023]

markov7 Apr 2023 11:06 UTC
42 points
0 comments2 min readLW link
(aisafety.info)

MIRI an­nounces new “Death With Dig­nity” strategy

Eliezer Yudkowsky2 Apr 2022 0:43 UTC
344 points
545 comments18 min readLW link1 review

A tran­script of the TED talk by Eliezer Yudkowsky

Mikhail Samin12 Jul 2023 12:12 UTC
105 points
13 comments4 min readLW link

Bing chat is the AI fire alarm

Ratios17 Feb 2023 6:51 UTC
115 points
63 comments3 min readLW link

Robin Han­son’s lat­est AI risk po­si­tion statement

Liron3 Mar 2023 14:25 UTC
55 points
18 comments1 min readLW link
(www.overcomingbias.com)

AI will change the world, but won’t take it over by play­ing “3-di­men­sional chess”.

22 Nov 2022 18:57 UTC
133 points
97 comments24 min readLW link

A Gym Grid­world En­vi­ron­ment for the Treach­er­ous Turn

Michaël Trazzi28 Jul 2018 21:27 UTC
74 points
9 comments3 min readLW link
(github.com)

Another (outer) al­ign­ment failure story

paulfchristiano7 Apr 2021 20:12 UTC
244 points
38 comments12 min readLW link1 review

Don’t Share In­for­ma­tion Exfo­haz­ardous on Others’ AI-Risk Models

Thane Ruthenis19 Dec 2023 20:09 UTC
67 points
11 comments1 min readLW link

A Path out of In­suffi­cient Views

Unreal24 Sep 2024 20:00 UTC
55 points
46 comments9 min readLW link

In­ter­pret­ing the Learn­ing of Deceit

RogerDearnaley18 Dec 2023 8:12 UTC
30 points
14 comments9 min readLW link

AI Could Defeat All Of Us Combined

HoldenKarnofsky9 Jun 2022 15:50 UTC
170 points
42 comments17 min readLW link
(www.cold-takes.com)

[Question] How likely are sce­nar­ios where AGI ends up overtly or de facto tor­tur­ing us? How likely are sce­nar­ios where AGI pre­vents us from com­mit­ting suicide or dy­ing?

JohnGreer28 Mar 2023 18:00 UTC
11 points
4 comments1 min readLW link

Coun­ter­ar­gu­ments to the ba­sic AI x-risk case

KatjaGrace14 Oct 2022 13:00 UTC
370 points
124 comments34 min readLW link1 review
(aiimpacts.org)

Soft take­off can still lead to de­ci­sive strate­gic advantage

Daniel Kokotajlo23 Aug 2019 16:39 UTC
122 points
47 comments8 min readLW link4 reviews

How good is hu­man­ity at co­or­di­na­tion?

Buck21 Jul 2020 20:01 UTC
82 points
44 comments3 min readLW link

My Ob­jec­tions to “We’re All Gonna Die with Eliezer Yud­kowsky”

Quintin Pope21 Mar 2023 0:06 UTC
357 points
230 comments39 min readLW link

Don’t ac­cel­er­ate prob­lems you’re try­ing to solve

15 Feb 2023 18:11 UTC
100 points
27 comments4 min readLW link

Re­quest to AGI or­ga­ni­za­tions: Share your views on paus­ing AI progress

11 Apr 2023 17:30 UTC
141 points
11 comments1 min readLW link

Should we post­pone AGI un­til we reach safety?

otto.barten18 Nov 2020 15:43 UTC
27 points
36 comments3 min readLW link

DL to­wards the un­al­igned Re­cur­sive Self-Op­ti­miza­tion attractor

jacob_cannell18 Dec 2021 2:15 UTC
32 points
22 comments4 min readLW link

Be­ing at peace with Doom

Johannes C. Mayer9 Apr 2023 14:53 UTC
23 points
14 comments4 min readLW link1 review

An Ap­peal to AI Su­per­in­tel­li­gence: Rea­sons to Pre­serve Humanity

James_Miller18 Mar 2023 16:22 UTC
37 points
73 comments12 min readLW link

The Hid­den Com­plex­ity of Wishes

Eliezer Yudkowsky24 Nov 2007 0:12 UTC
176 points
196 comments8 min readLW link

Alexan­der and Yud­kowsky on AGI goals

24 Jan 2023 21:09 UTC
177 points
53 comments26 min readLW link1 review

“Endgame safety” for AGI

Steven Byrnes24 Jan 2023 14:15 UTC
84 points
10 comments6 min readLW link

On Solv­ing Prob­lems Be­fore They Ap­pear: The Weird Episte­molo­gies of Alignment

adamShimi11 Oct 2021 8:20 UTC
110 points
10 comments15 min readLW link

Devil’s Ad­vo­cate: Ad­verse Selec­tion Against Con­scien­tious­ness

lionhearted (Sebastian Marshall)28 May 2023 17:53 UTC
10 points
2 comments1 min readLW link

An­nounc­ing Apollo Research

30 May 2023 16:17 UTC
217 points
11 comments8 min readLW link

Did Ben­gio and Teg­mark lose a de­bate about AI x-risk against LeCun and Mitchell?

Karl von Wendt25 Jun 2023 16:59 UTC
106 points
53 comments7 min readLW link

Ar­chi­tects of Our Own Demise: We Should Stop Devel­op­ing AI Carelessly

Roko26 Oct 2023 0:36 UTC
176 points
75 comments3 min readLW link

A challenge for AGI or­ga­ni­za­tions, and a challenge for readers

1 Dec 2022 23:11 UTC
301 points
33 comments2 min readLW link

In­tent al­ign­ment should not be the goal for AGI x-risk reduction

John Nay26 Oct 2022 1:24 UTC
1 point
10 comments3 min readLW link

Truth­ful LMs as a warm-up for al­igned AGI

Jacob_Hilton17 Jan 2022 16:49 UTC
65 points
14 comments13 min readLW link

Are min­i­mal cir­cuits de­cep­tive?

evhub7 Sep 2019 18:11 UTC
78 points
11 comments8 min readLW link

A Nar­row Path: a plan to deal with AI ex­tinc­tion risk

7 Oct 2024 13:02 UTC
73 points
11 comments2 min readLW link
(www.narrowpath.co)

Stuxnet, not Skynet: Hu­man­ity’s dis­em­pow­er­ment by AI

Roko4 Nov 2023 22:23 UTC
107 points
24 comments6 min readLW link

Towards the Oper­a­tional­iza­tion of Philos­o­phy & Wisdom

Thane Ruthenis28 Oct 2024 19:45 UTC
20 points
2 comments33 min readLW link
(aiimpacts.org)

“Why can’t you just turn it off?”

Roko19 Nov 2023 14:46 UTC
48 points
25 comments1 min readLW link

The other side of the tidal wave

KatjaGrace3 Nov 2023 5:40 UTC
186 points
86 comments1 min readLW link
(worldspiritsockpuppet.com)

Ban­kless Pod­cast: 159 - We’re All Gonna Die with Eliezer Yudkowsky

bayesed20 Feb 2023 16:42 UTC
83 points
54 comments1 min readLW link
(www.youtube.com)

Assess­ment of AI safety agen­das: think about the down­side risk

Roman Leventov19 Dec 2023 9:00 UTC
13 points
1 comment1 min readLW link

“Hu­man­ity vs. AGI” Will Never Look Like “Hu­man­ity vs. AGI” to Humanity

Thane Ruthenis16 Dec 2023 20:08 UTC
189 points
34 comments5 min readLW link

Cri­tiquing “What failure looks like”

Grue_Slinky27 Dec 2019 23:59 UTC
35 points
6 comments3 min readLW link

[Question] Did AI pi­o­neers not worry much about AI risks?

lisperati9 Feb 2020 19:58 UTC
42 points
9 comments1 min readLW link

Some dis­junc­tive rea­sons for ur­gency on AI risk

Wei Dai15 Feb 2019 20:43 UTC
36 points
24 comments1 min readLW link

Clar­ify­ing some key hy­pothe­ses in AI alignment

15 Aug 2019 21:29 UTC
79 points
12 comments9 min readLW link

AI safety tax dynamics

owencb23 Oct 2024 12:18 UTC
22 points
0 comments6 min readLW link
(strangecities.substack.com)

Do not delete your mis­al­igned AGI.

mako yass24 Mar 2024 21:37 UTC
62 points
13 comments3 min readLW link

An­nounce­ment: AI al­ign­ment prize win­ners and next round

cousin_it15 Jan 2018 14:33 UTC
81 points
68 comments2 min readLW link

Why AI Safety is Hard

Simon Möller22 Mar 2023 10:44 UTC
3 points
0 comments6 min readLW link

Re­ply to Holden on ‘Tool AI’

Eliezer Yudkowsky12 Jun 2012 18:00 UTC
152 points
356 comments17 min readLW link

The first fu­ture and the best future

KatjaGrace25 Apr 2024 6:40 UTC
106 points
12 comments1 min readLW link
(worldspiritsockpuppet.com)

“Care­fully Boot­strapped Align­ment” is or­ga­ni­za­tion­ally hard

Raemon17 Mar 2023 18:00 UTC
262 points
22 comments11 min readLW link

The Over­ton Win­dow widens: Ex­am­ples of AI risk in the media

Akash23 Mar 2023 17:10 UTC
107 points
24 comments6 min readLW link

[AN #80]: Why AI risk might be solved with­out ad­di­tional in­ter­ven­tion from longtermists

Rohin Shah2 Jan 2020 18:20 UTC
36 points
95 comments10 min readLW link
(mailchi.mp)

Against a Gen­eral Fac­tor of Doom

Jeffrey Heninger23 Nov 2022 16:50 UTC
61 points
19 comments4 min readLW link1 review
(aiimpacts.org)

In­cen­tives and Selec­tion: A Miss­ing Frame From AI Threat Dis­cus­sions?

DragonGod26 Feb 2023 1:18 UTC
11 points
16 comments2 min readLW link

World-Model In­ter­pretabil­ity Is All We Need

Thane Ruthenis14 Jan 2023 19:37 UTC
35 points
22 comments21 min readLW link

Pod­cast: Shoshan­nah Tekofsky on skil­ling up in AI safety, vis­it­ing Berkeley, and de­vel­op­ing novel re­search ideas

Akash25 Nov 2022 20:47 UTC
37 points
2 comments9 min readLW link

ea.do­mains—Do­mains Free to a Good Home

plex12 Jan 2023 13:32 UTC
24 points
0 comments1 min readLW link

The Prefer­ence Fulfill­ment Hypothesis

Kaj_Sotala26 Feb 2023 10:55 UTC
66 points
62 comments11 min readLW link

De­bate on In­stru­men­tal Con­ver­gence be­tween LeCun, Rus­sell, Ben­gio, Zador, and More

Ben Pace4 Oct 2019 4:08 UTC
221 points
61 comments15 min readLW link2 reviews

AISN #25: White House Ex­ec­u­tive Order on AI, UK AI Safety Sum­mit, and Progress on Vol­un­tary Eval­u­a­tions of AI Risks

31 Oct 2023 19:34 UTC
35 points
1 comment6 min readLW link
(newsletter.safe.ai)

How Josiah be­came an AI safety researcher

Neil Crawford6 Sep 2022 17:17 UTC
4 points
0 comments1 min readLW link

A con­ver­sa­tion about Katja’s coun­ter­ar­gu­ments to AI risk

18 Oct 2022 18:40 UTC
43 points
9 comments33 min readLW link

Over­sight Misses 100% of Thoughts The AI Does Not Think

johnswentworth12 Aug 2022 16:30 UTC
103 points
49 comments1 min readLW link

Con­fu­sions in My Model of AI Risk

peterbarnett7 Jul 2022 1:05 UTC
22 points
9 comments5 min readLW link

Com­par­ing Four Ap­proaches to In­ner Alignment

Lucas Teixeira29 Jul 2022 21:06 UTC
38 points
1 comment9 min readLW link

Re­sults from the lan­guage model hackathon

Esben Kran10 Oct 2022 8:29 UTC
22 points
1 comment4 min readLW link

AI Safety Micro­grant Round

Chris_Leong14 Nov 2022 4:25 UTC
22 points
1 comment1 min readLW link

What I Learned Run­ning Refine

adamShimi24 Nov 2022 14:49 UTC
108 points
5 comments4 min readLW link

Went­worth and Larsen on buy­ing time

9 Jan 2023 21:31 UTC
73 points
6 comments12 min readLW link

Thoughts on re­fus­ing harm­ful re­quests to large lan­guage models

William_S19 Jan 2023 19:49 UTC
32 points
4 comments2 min readLW link

Many AI gov­er­nance pro­pos­als have a trade­off be­tween use­ful­ness and feasibility

3 Feb 2023 18:49 UTC
22 points
2 comments2 min readLW link

AI Safety Info Distil­la­tion Fellowship

17 Feb 2023 16:16 UTC
47 points
3 comments3 min readLW link

Full Tran­script: Eliezer Yud­kowsky on the Ban­kless podcast

23 Feb 2023 12:34 UTC
138 points
89 comments75 min readLW link

Con­tra Han­son on AI Risk

Liron4 Mar 2023 8:02 UTC
36 points
23 comments8 min readLW link

Plan for mediocre al­ign­ment of brain-like [model-based RL] AGI

Steven Byrnes13 Mar 2023 14:11 UTC
67 points
25 comments12 min readLW link

The Wizard of Oz Prob­lem: How in­cen­tives and nar­ra­tives can skew our per­cep­tion of AI developments

Akash20 Mar 2023 20:44 UTC
16 points
3 comments6 min readLW link

Will GPT-5 be able to self-im­prove?

Nathan Helm-Burger29 Apr 2023 17:34 UTC
18 points
22 comments3 min readLW link

(4 min read) An in­tu­itive ex­pla­na­tion of the AI in­fluence situation

trevor13 Jan 2024 17:34 UTC
12 points
26 comments4 min readLW link

Stan­ford En­cy­clo­pe­dia of Philos­o­phy on AI ethics and superintelligence

Kaj_Sotala2 May 2020 7:35 UTC
43 points
19 comments7 min readLW link
(plato.stanford.edu)

What Failure Looks Like: Distill­ing the Discussion

Ben Pace29 Jul 2020 21:49 UTC
82 points
14 comments7 min readLW link

The strat­egy-steal­ing assumption

paulfchristiano16 Sep 2019 15:23 UTC
86 points
53 comments12 min readLW link3 reviews

AI Safety “Suc­cess Sto­ries”

Wei Dai7 Sep 2019 2:54 UTC
125 points
27 comments4 min readLW link1 review

Drexler on AI Risk

PeterMcCluskey1 Feb 2019 5:11 UTC
35 points
10 comments9 min readLW link
(www.bayesianinvestor.com)

Six AI Risk/​Strat­egy Ideas

Wei Dai27 Aug 2019 0:40 UTC
69 points
17 comments4 min readLW link1 review

“Tak­ing AI Risk Se­ri­ously” (thoughts by Critch)

Raemon29 Jan 2018 9:27 UTC
110 points
68 comments13 min readLW link

Learn­ing so­cietal val­ues from law as part of an AGI al­ign­ment strategy

John Nay21 Oct 2022 2:03 UTC
5 points
18 comments54 min readLW link

Staged release

Zach Stein-Perlman17 Apr 2024 16:00 UTC
11 points
4 comments2 min readLW link

Gra­di­ent Des­cent on the Hu­man Brain

1 Apr 2024 22:39 UTC
57 points
5 comments2 min readLW link

[Question] Good tax­onomies of all risks (small or large) from AI?

Aryeh Englander5 Mar 2024 18:15 UTC
6 points
1 comment1 min readLW link

A Rocket–In­ter­pretabil­ity Analogy

plex21 Oct 2024 13:55 UTC
149 points
31 comments1 min readLW link

Are we drop­ping the ball on Recom­men­da­tion AIs?

Charbel-Raphaël23 Oct 2024 17:48 UTC
40 points
17 comments6 min readLW link

[Linkpost] Scott Alexan­der re­acts to OpenAI’s lat­est post

Akash11 Mar 2023 22:24 UTC
27 points
0 comments5 min readLW link
(astralcodexten.substack.com)

Evil au­to­com­plete: Ex­is­ten­tial Risk and Next-To­ken Predictors

Yitz28 Feb 2023 8:47 UTC
9 points
3 comments5 min readLW link

Are we there yet?

theflowerpot20 Jun 2022 11:19 UTC
2 points
2 comments1 min readLW link

A (EtA: quick) note on ter­minol­ogy: AI Align­ment != AI x-safety

David Scott Krueger (formerly: capybaralet)8 Feb 2023 22:33 UTC
46 points
20 comments1 min readLW link

AI com­pa­nies aren’t re­ally us­ing ex­ter­nal evaluators

Zach Stein-Perlman24 May 2024 16:01 UTC
240 points
15 comments4 min readLW link

List your AI X-Risk cruxes!

Aryeh Englander28 Apr 2024 18:26 UTC
40 points
7 comments2 min readLW link

How dan­ger­ous is hu­man-level AI?

Alex_Altair10 Jun 2022 17:38 UTC
21 points
4 comments8 min readLW link

Con­fu­sion about neu­ro­science/​cog­ni­tive sci­ence as a dan­ger for AI Alignment

Samuel Nellessen22 Jun 2022 17:59 UTC
2 points
1 comment3 min readLW link
(snellessen.com)

I’m try­ing out “as­ter­oid mind­set”

Alex_Altair3 Jun 2022 13:35 UTC
90 points
5 comments4 min readLW link

Com­plex Sys­tems for AI Safety [Prag­matic AI Safety #3]

24 May 2022 0:00 UTC
58 points
3 comments21 min readLW link

Episte­molog­i­cal Vigilance for Alignment

adamShimi6 Jun 2022 0:27 UTC
66 points
11 comments10 min readLW link

Em­pow­er­ment is (al­most) All We Need

jacob_cannell23 Oct 2022 21:48 UTC
61 points
44 comments17 min readLW link

[Question] Would (my­opic) gen­eral pub­lic good pro­duc­ers sig­nifi­cantly ac­cel­er­ate the de­vel­op­ment of AGI?

mako yass2 Mar 2022 23:47 UTC
25 points
10 comments1 min readLW link

More Is Differ­ent for AI

jsteinhardt4 Jan 2022 19:30 UTC
140 points
24 comments3 min readLW link1 review
(bounded-regret.ghost.io)

AXRP Epi­sode 13 - First Prin­ci­ples of AGI Safety with Richard Ngo

DanielFilan31 Mar 2022 5:20 UTC
24 points
1 comment48 min readLW link

Fram­ing ap­proaches to al­ign­ment and the hard prob­lem of AI cognition

ryan_greenblatt15 Dec 2021 19:06 UTC
16 points
15 comments27 min readLW link

My cur­rent un­cer­tain­ties re­gard­ing AI, al­ign­ment, and the end of the world

dominicq14 Nov 2021 14:08 UTC
2 points
3 comments2 min readLW link

Some ab­stract, non-tech­ni­cal rea­sons to be non-max­i­mally-pes­simistic about AI alignment

Rob Bensinger12 Dec 2021 2:08 UTC
70 points
35 comments7 min readLW link

It Looks Like You’re Try­ing To Take Over The World

gwern9 Mar 2022 16:35 UTC
407 points
120 comments1 min readLW link1 review
(www.gwern.net)

Epistemic Strate­gies of Selec­tion Theorems

adamShimi18 Oct 2021 8:57 UTC
33 points
1 comment12 min readLW link

[Question] Con­di­tional on the first AGI be­ing al­igned cor­rectly, is a good out­come even still likely?

iamthouthouarti6 Sep 2021 17:30 UTC
2 points
1 comment1 min readLW link

Epistemic Strate­gies of Safety-Ca­pa­bil­ities Tradeoffs

adamShimi22 Oct 2021 8:22 UTC
5 points
0 comments6 min readLW link

En­vi­ron­men­tal Struc­ture Can Cause In­stru­men­tal Convergence

TurnTrout22 Jun 2021 22:26 UTC
71 points
43 comments16 min readLW link
(arxiv.org)

Less Real­is­tic Tales of Doom

Mark Xu6 May 2021 23:01 UTC
113 points
13 comments4 min readLW link

Alex Turner’s Re­search, Com­pre­hen­sive In­for­ma­tion Gathering

adamShimi23 Jun 2021 9:44 UTC
15 points
3 comments3 min readLW link

Drug ad­dicts and de­cep­tively al­igned agents—a com­par­a­tive analysis

Jan5 Nov 2021 21:42 UTC
42 points
2 comments12 min readLW link
(universalprior.substack.com)

[RETRACTED] It’s time for EA lead­er­ship to pull the short-timelines fire alarm.

Not Relevant8 Apr 2022 16:07 UTC
109 points
163 comments4 min readLW link

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [~monthly thread]

26 Jan 2023 21:01 UTC
39 points
81 comments2 min readLW link

Google’s Eth­i­cal AI team and AI Safety

magfrump20 Feb 2021 9:42 UTC
12 points
16 comments7 min readLW link

The AI Safety Game (UPDATED)

Daniel Kokotajlo5 Dec 2020 10:27 UTC
45 points
10 comments3 min readLW link

Thoughts on Robin Han­son’s AI Im­pacts interview

Steven Byrnes24 Nov 2019 1:40 UTC
25 points
3 comments7 min readLW link

AISN #24: Kiss­inger Urges US-China Co­op­er­a­tion on AI, China’s New AI Law, US Ex­port Con­trols, In­ter­na­tional In­sti­tu­tions, and Open Source AI

18 Oct 2023 17:06 UTC
14 points
0 comments6 min readLW link
(newsletter.safe.ai)

Be­hav­ioral Suffi­cient Statis­tics for Goal-Directedness

adamShimi11 Mar 2021 15:01 UTC
21 points
12 comments9 min readLW link

The Short­est Path Between Scylla and Charybdis

Thane Ruthenis18 Dec 2023 20:08 UTC
50 points
8 comments5 min readLW link

OpenAI: Fallout

Zvi28 May 2024 13:20 UTC
204 points
25 comments36 min readLW link
(thezvi.wordpress.com)

[Question] Sugges­tions of posts on the AF to review

adamShimi16 Feb 2021 12:40 UTC
56 points
20 comments1 min readLW link

The ba­sic rea­sons I ex­pect AGI ruin

Rob Bensinger18 Apr 2023 3:37 UTC
186 points
73 comments14 min readLW link

25 Min Talk on Me­taEth­i­cal.AI with Ques­tions from Stu­art Armstrong

June Ku29 Apr 2021 15:38 UTC
21 points
7 comments1 min readLW link

Rogue AGI Em­bod­ies Valuable In­tel­lec­tual Property

3 Jun 2021 20:37 UTC
71 points
9 comments3 min readLW link

How difficult is AI Align­ment?

Sammy Martin13 Sep 2024 15:47 UTC
43 points
6 comments23 min readLW link

Ap­proaches to gra­di­ent hacking

adamShimi14 Aug 2021 15:16 UTC
16 points
8 comments8 min readLW link

Why the tech­nolog­i­cal sin­gu­lar­ity by AGI may never happen

hippke3 Sep 2021 14:19 UTC
5 points
14 comments1 min readLW link

[Book Re­view] “The Align­ment Prob­lem” by Brian Christian

lsusr20 Sep 2021 6:36 UTC
70 points
16 comments6 min readLW link

Is progress in ML-as­sisted the­o­rem-prov­ing benefi­cial?

mako yass28 Sep 2021 1:54 UTC
11 points
3 comments1 min readLW link

Us­ing Brain-Com­puter In­ter­faces to get more data for AI alignment

Robbo7 Nov 2021 0:00 UTC
43 points
10 comments7 min readLW link

My guess at Con­jec­ture’s vi­sion: trig­ger­ing a nar­ra­tive bifurcation

Alexandre Variengien6 Feb 2024 19:10 UTC
75 points
12 comments16 min readLW link

Ngo and Yud­kowsky on al­ign­ment difficulty

15 Nov 2021 20:31 UTC
253 points
151 comments99 min readLW link1 review

[Question] Does the Struc­ture of an al­gorithm mat­ter for AI Risk and/​or con­scious­ness?

Logan Zoellner3 Dec 2021 18:31 UTC
7 points
4 comments1 min readLW link

My Overview of the AI Align­ment Land­scape: A Bird’s Eye View

Neel Nanda15 Dec 2021 23:44 UTC
127 points
9 comments15 min readLW link

AI Fire Alarm Scenarios

PeterMcCluskey28 Dec 2021 2:20 UTC
10 points
1 comment6 min readLW link
(www.bayesianinvestor.com)

Thoughts on AGI safety from the top

jylin042 Feb 2022 20:06 UTC
36 points
3 comments32 min readLW link

How I Formed My Own Views About AI Safety

Neel Nanda27 Feb 2022 18:50 UTC
64 points
6 comments13 min readLW link
(www.neelnanda.io)

AMA Con­jec­ture, A New Align­ment Startup

adamShimi9 Apr 2022 9:43 UTC
47 points
42 comments1 min readLW link

Why I’m Wor­ried About AI

peterbarnett23 May 2022 21:13 UTC
22 points
2 comments12 min readLW link

Perform Tractable Re­search While Avoid­ing Ca­pa­bil­ities Ex­ter­nal­ities [Prag­matic AI Safety #4]

30 May 2022 20:25 UTC
51 points
3 comments25 min readLW link

Con­fused why a “ca­pa­bil­ities re­search is good for al­ign­ment progress” po­si­tion isn’t dis­cussed more

Kaj_Sotala2 Jun 2022 21:41 UTC
129 points
27 comments4 min readLW link

A Quick Guide to Con­fronting Doom

Ruby13 Apr 2022 19:30 UTC
240 points
33 comments2 min readLW link

Another plau­si­ble sce­nario of AI risk: AI builds mil­i­tary in­fras­truc­ture while col­lab­o­rat­ing with hu­mans, defects later.

avturchin10 Jun 2022 17:24 UTC
10 points
2 comments1 min readLW link

Slow mo­tion videos as AI risk in­tu­ition pumps

Andrew_Critch14 Jun 2022 19:31 UTC
238 points
41 comments2 min readLW link1 review

[Question] Has there been any work on at­tempt­ing to use Pas­cal’s Mug­ging to make an AGI be­have?

Chris_Leong15 Jun 2022 8:33 UTC
7 points
17 comments1 min readLW link

The Plan − 2023 Version

johnswentworth29 Dec 2023 23:34 UTC
146 points
40 comments31 min readLW link

Ro­bust­ness to Scal­ing Down: More Im­por­tant Than I Thought

adamShimi23 Jul 2022 11:40 UTC
38 points
5 comments3 min readLW link

Sur­vey: What (de)mo­ti­vates you about AI risk?

Daniel_Friedrich3 Aug 2022 19:17 UTC
1 point
0 comments1 min readLW link
(forms.gle)

Thoughts on ‘List of Lethal­ities’

Alex Lawsen 17 Aug 2022 18:33 UTC
27 points
0 comments10 min readLW link

What if we ap­proach AI safety like a tech­ni­cal en­g­ineer­ing safety problem

zeshen20 Aug 2022 10:29 UTC
36 points
4 comments7 min readLW link

Beyond Hyperanthropomorphism

PointlessOne21 Aug 2022 17:55 UTC
3 points
17 comments1 min readLW link
(studio.ribbonfarm.com)

An­nounc­ing AISIC 2022 - the AI Safety Is­rael Con­fer­ence, Oc­to­ber 19-20

Davidmanheim21 Sep 2022 19:32 UTC
13 points
0 comments1 min readLW link

Paper: Tell, Don’t Show- Declar­a­tive facts in­fluence how LLMs generalize

19 Dec 2023 19:14 UTC
45 points
4 comments6 min readLW link
(arxiv.org)

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [~monthly thread]

Robert Miles1 Nov 2022 23:23 UTC
68 points
105 comments2 min readLW link

Poster Ses­sion on AI Safety

Neil Crawford12 Nov 2022 3:50 UTC
7 points
6 comments1 min readLW link

RA Bounty: Look­ing for feed­back on screen­play about AI Risk

Writer26 Oct 2023 13:23 UTC
30 points
6 comments1 min readLW link

AI Ne­o­re­al­ism: a threat model & suc­cess crite­rion for ex­is­ten­tial safety

davidad15 Dec 2022 13:42 UTC
67 points
1 comment3 min readLW link

Rac­ing through a minefield: the AI de­ploy­ment problem

HoldenKarnofsky22 Dec 2022 16:10 UTC
38 points
2 comments13 min readLW link
(www.cold-takes.com)

My thoughts on OpenAI’s al­ign­ment plan

Akash30 Dec 2022 19:33 UTC
55 points
3 comments20 min readLW link

How we could stum­ble into AI catastrophe

HoldenKarnofsky13 Jan 2023 16:20 UTC
71 points
18 comments18 min readLW link
(www.cold-takes.com)

[Question] Should peo­ple build pro­duc­ti­za­tions of open source AI mod­els?

lc2 Nov 2023 1:26 UTC
23 points
0 comments1 min readLW link

Cortés, AI Risk, and the Dy­nam­ics of Com­pet­ing Conquerors

James_Miller2 Jan 2024 16:37 UTC
14 points
2 comments3 min readLW link

Ta­boo P(doom)

NathanBarnard3 Feb 2023 10:37 UTC
14 points
10 comments1 min readLW link

How evals might (or might not) pre­vent catas­trophic risks from AI

Akash7 Feb 2023 20:16 UTC
45 points
0 comments9 min readLW link

Many im­por­tant tech­nolo­gies start out as sci­ence fic­tion be­fore be­com­ing real

trevor10 Feb 2023 9:36 UTC
28 points
2 comments2 min readLW link

4 ways to think about de­moc­ra­tiz­ing AI [GovAI Linkpost]

Akash13 Feb 2023 18:06 UTC
24 points
4 comments1 min readLW link
(www.governance.ai)

AI #1: Syd­ney and Bing

Zvi21 Feb 2023 14:00 UTC
171 points
45 comments61 min readLW link1 review
(thezvi.wordpress.com)

Con­tra “Strong Co­her­ence”

DragonGod4 Mar 2023 20:05 UTC
39 points
24 comments1 min readLW link

My mo­ti­va­tion and the­ory of change for work­ing in AI healthtech

Andrew_Critch12 Oct 2024 0:36 UTC
169 points
37 comments14 min readLW link

Ques­tions about Con­je­cure’s CoEm proposal

9 Mar 2023 19:32 UTC
51 points
4 comments2 min readLW link

An AI risk ar­gu­ment that res­onates with NYTimes readers

Julian Bradshaw12 Mar 2023 23:09 UTC
211 points
14 comments1 min readLW link

ChatGPT (and now GPT4) is very eas­ily dis­tracted from its rules

dmcs15 Mar 2023 17:55 UTC
180 points
42 comments1 min readLW link

[Question] First and Last Ques­tions for GPT-5*

Mitchell_Porter24 Nov 2023 5:03 UTC
15 points
5 comments1 min readLW link

Pro­jects I would like to see (pos­si­bly at AI Safety Camp)

Linda Linsefors27 Sep 2023 21:27 UTC
22 points
12 comments4 min readLW link

But ex­actly how com­plex and frag­ile?

KatjaGrace3 Nov 2019 18:20 UTC
87 points
32 comments3 min readLW link1 review
(meteuphoric.com)

In­for­ma­tion war­fare his­tor­i­cally re­volved around hu­man conduits

trevor28 Aug 2023 18:54 UTC
37 points
7 comments3 min readLW link

The AI Revolu­tion in Biology

Roman Leventov26 May 2024 9:30 UTC
13 points
0 comments1 min readLW link
(www.cognitiverevolution.ai)

[Question] Why not con­strain wet­labs in­stead of AI?

Lone Pine21 Mar 2023 18:02 UTC
15 points
10 comments1 min readLW link

AGI Safety Liter­a­ture Re­view (Ever­itt, Lea & Hut­ter 2018)

Kaj_Sotala4 May 2018 8:56 UTC
14 points
1 comment1 min readLW link
(arxiv.org)

How Would an Utopia-Max­i­mizer Look Like?

Thane Ruthenis20 Dec 2023 20:01 UTC
31 points
23 comments10 min readLW link

MIRI’s April 2024 Newsletter

Harlan12 Apr 2024 23:38 UTC
95 points
0 comments3 min readLW link
(intelligence.org)

The Main Sources of AI Risk?

21 Mar 2019 18:28 UTC
121 points
26 comments2 min readLW link

Think­ing soberly about the con­text and con­se­quences of Friendly AI

Mitchell_Porter16 Oct 2012 4:33 UTC
21 points
39 comments1 min readLW link

New vol­un­tary com­mit­ments (AI Seoul Sum­mit)

Zach Stein-Perlman21 May 2024 11:00 UTC
81 points
17 comments7 min readLW link
(www.gov.uk)

Non-Ad­ver­sar­ial Good­hart and AI Risks

Davidmanheim27 Mar 2018 1:39 UTC
22 points
11 comments6 min readLW link

Disen­tan­gling ar­gu­ments for the im­por­tance of AI safety

Richard_Ngo21 Jan 2019 12:41 UTC
133 points
23 comments8 min readLW link

A shift in ar­gu­ments for AI risk

Richard_Ngo28 May 2019 13:47 UTC
32 points
7 comments1 min readLW link
(fragile-credences.github.io)

AI risk hub in Sin­ga­pore?

Daniel Kokotajlo29 Oct 2020 11:45 UTC
58 points
18 comments4 min readLW link

[Question] Mea­sure of com­plex­ity al­lowed by the laws of the uni­verse and rel­a­tive the­ory?

dr_s7 Sep 2023 12:21 UTC
8 points
22 comments1 min readLW link

Some con­cep­tual high­lights from “Disjunc­tive Sce­nar­ios of Catas­trophic AI Risk”

Kaj_Sotala12 Feb 2018 12:30 UTC
45 points
4 comments6 min readLW link
(kajsotala.fi)

AISN #23: New OpenAI Models, News from An­thropic, and Rep­re­sen­ta­tion Engineering

4 Oct 2023 17:37 UTC
15 points
2 comments5 min readLW link
(newsletter.safe.ai)

Re­view of “Fun with +12 OOMs of Com­pute”

28 Mar 2021 14:55 UTC
65 points
21 comments8 min readLW link1 review

Risks from Learned Op­ti­miza­tion: Introduction

31 May 2019 23:44 UTC
185 points
42 comments12 min readLW link3 reviews

Uber Self-Driv­ing Crash

jefftk7 Nov 2019 15:00 UTC
109 points
1 comment2 min readLW link
(www.jefftk.com)

Re­laxed ad­ver­sar­ial train­ing for in­ner alignment

evhub10 Sep 2019 23:03 UTC
69 points
27 comments27 min readLW link

Risks from Learned Op­ti­miza­tion: Con­clu­sion and Re­lated Work

7 Jun 2019 19:53 UTC
82 points
5 comments6 min readLW link

Ar­tifi­cial In­tel­li­gence: A Modern Ap­proach (4th edi­tion) on the Align­ment Problem

Zack_M_Davis17 Sep 2020 2:23 UTC
72 points
12 comments5 min readLW link
(aima.cs.berkeley.edu)

Win­ners of AI Align­ment Awards Re­search Contest

13 Jul 2023 16:14 UTC
115 points
4 comments12 min readLW link
(alignmentawards.com)

A short­com­ing of con­crete demon­stra­tions as AGI risk advocacy

Steven Byrnes11 Dec 2024 16:48 UTC
103 points
27 comments2 min readLW link

Clar­ify­ing “What failure looks like”

Sam Clarke20 Sep 2020 20:40 UTC
97 points
14 comments17 min readLW link

De­cep­tive Alignment

5 Jun 2019 20:16 UTC
118 points
20 comments17 min readLW link

Ten Levels of AI Align­ment Difficulty

Sammy Martin3 Jul 2023 20:20 UTC
121 points
14 comments12 min readLW link

ARC tests to see if GPT-4 can es­cape hu­man con­trol; GPT-4 failed to do so

Christopher King15 Mar 2023 0:29 UTC
116 points
22 comments2 min readLW link

Dou­glas Hofs­tadter changes his mind on Deep Learn­ing & AI risk (June 2023)?

gwern3 Jul 2023 0:48 UTC
425 points
54 comments7 min readLW link
(www.youtube.com)

Against ubiquitous al­ign­ment taxes

beren6 Mar 2023 19:50 UTC
56 points
10 comments2 min readLW link

AI #2

Zvi2 Mar 2023 14:50 UTC
66 points
18 comments55 min readLW link
(thezvi.wordpress.com)

Levels of safety for AI and other technologies

jasoncrawford28 Jun 2023 18:35 UTC
16 points
0 comments2 min readLW link
(rootsofprogress.org)

My re­search agenda in agent foundations

Alex_Altair28 Jun 2023 18:00 UTC
71 points
9 comments11 min readLW link

I Think Eliezer Should Go on Glenn Beck

Lao Mein30 Jun 2023 3:12 UTC
29 points
21 comments1 min readLW link

A case for AI al­ign­ment be­ing difficult

jessicata31 Dec 2023 19:55 UTC
105 points
58 comments15 min readLW link1 review
(unstableontology.com)

Talk to me about your sum­mer/​ca­reer plans

Akash31 Jan 2023 18:29 UTC
31 points
3 comments2 min readLW link

The In­ner Align­ment Problem

4 Jun 2019 1:20 UTC
103 points
17 comments13 min readLW link

[Linkpost] TIME ar­ti­cle: Deep­Mind’s CEO Helped Take AI Main­stream. Now He’s Urg­ing Caution

Akash21 Jan 2023 16:51 UTC
58 points
2 comments3 min readLW link
(time.com)

April drafts

AI Impacts1 Apr 2021 18:10 UTC
49 points
2 comments1 min readLW link
(aiimpacts.org)

A plea for solu­tion­ism on AI safety

jasoncrawford9 Jun 2023 16:29 UTC
72 points
6 comments6 min readLW link
(rootsofprogress.org)

Min­i­mum Vi­able Exterminator

Richard Horvath29 May 2023 16:32 UTC
14 points
5 comments5 min readLW link

Catas­trophic Risks from AI #1: Introduction

22 Jun 2023 17:09 UTC
40 points
1 comment5 min readLW link
(arxiv.org)

Why and When In­ter­pretabil­ity Work is Dangerous

Nicholas / Heather Kross28 May 2023 0:27 UTC
20 points
9 comments8 min readLW link
(www.thinkingmuchbetter.com)

Work on Se­cu­rity In­stead of Friendli­ness?

Wei Dai21 Jul 2012 18:28 UTC
71 points
107 comments2 min readLW link

Hands-On Ex­pe­rience Is Not Magic

Thane Ruthenis27 May 2023 16:57 UTC
21 points
14 comments5 min readLW link

Catas­trophic Risks from AI #2: Mal­i­cious Use

22 Jun 2023 17:10 UTC
38 points
1 comment17 min readLW link
(arxiv.org)

The Fu­sion Power Gen­er­a­tor Scenario

johnswentworth8 Aug 2020 18:31 UTC
142 points
30 comments3 min readLW link

Three Sto­ries for How AGI Comes Be­fore FAI

John_Maxwell17 Sep 2019 23:26 UTC
27 points
5 comments6 min readLW link

AI Align­ment 2018-19 Review

Rohin Shah28 Jan 2020 2:19 UTC
126 points
6 comments35 min readLW link

Why don’t sin­gu­lar­i­tar­i­ans bet on the cre­ation of AGI by buy­ing stocks?

John_Maxwell11 Mar 2020 16:27 UTC
43 points
21 comments4 min readLW link

Wor­ri­some mi­s­un­der­stand­ing of the core is­sues with AI transition

Roman Leventov18 Jan 2024 10:05 UTC
5 points
2 comments4 min readLW link

AISN #35: Lob­by­ing on AI Reg­u­la­tion Plus, New Models from OpenAI and Google, and Le­gal Regimes for Train­ing on Copy­righted Data

16 May 2024 14:29 UTC
2 points
3 comments6 min readLW link
(newsletter.safe.ai)

The prob­lem/​solu­tion ma­trix: Calcu­lat­ing the prob­a­bil­ity of AI safety “on the back of an en­velope”

John_Maxwell20 Oct 2019 8:03 UTC
22 points
4 comments2 min readLW link

A guide to Iter­ated Am­plifi­ca­tion & Debate

Rafael Harth15 Nov 2020 17:14 UTC
75 points
12 comments15 min readLW link

Catas­trophic Risks from AI #3: AI Race

23 Jun 2023 19:21 UTC
18 points
9 comments29 min readLW link
(arxiv.org)

AISafety.com – Re­sources for AI Safety

17 May 2024 15:57 UTC
81 points
3 comments1 min readLW link

AI Safety Newslet­ter #7: Dis­in­for­ma­tion, Gover­nance Recom­men­da­tions for AI labs, and Se­nate Hear­ings on AI

23 May 2023 21:47 UTC
25 points
0 comments6 min readLW link
(newsletter.safe.ai)

AI Doom Is Not (Only) Disjunctive

NickGabs30 Mar 2023 1:42 UTC
12 points
0 comments5 min readLW link

Brain­storm­ing ad­di­tional AI risk re­duc­tion ideas

John_Maxwell14 Jun 2012 7:55 UTC
19 points
37 comments1 min readLW link

My AI Model Delta Com­pared To Yudkowsky

johnswentworth10 Jun 2024 16:12 UTC
277 points
102 comments4 min readLW link

Wizards and prophets of AI [draft for com­ment]

jasoncrawford31 Mar 2023 20:22 UTC
16 points
11 comments6 min readLW link

Against “ar­gu­ment from over­hang risk”

RobertM16 May 2024 4:44 UTC
30 points
11 comments5 min readLW link

Re­quest: stop ad­vanc­ing AI capabilities

So8res26 May 2023 17:42 UTC
153 points
24 comments1 min readLW link

AI Safety Newslet­ter #6: Ex­am­ples of AI safety progress, Yoshua Ben­gio pro­poses a ban on AI agents, and les­sons from nu­clear arms control

16 May 2023 15:14 UTC
31 points
0 comments6 min readLW link
(newsletter.safe.ai)

Safety isn’t safety with­out a so­cial model (or: dis­pel­ling the myth of per se tech­ni­cal safety)

Andrew_Critch14 Jun 2024 0:16 UTC
346 points
38 comments4 min readLW link

The case for re­mov­ing al­ign­ment and ML re­search from the train­ing dataset

beren30 May 2023 20:54 UTC
48 points
8 comments5 min readLW link

The Con­trol Prob­lem: Un­solved or Un­solv­able?

Remmelt2 Jun 2023 15:42 UTC
54 points
46 comments14 min readLW link

A moral back­lash against AI will prob­a­bly slow down AGI development

geoffreymiller7 Jun 2023 20:39 UTC
51 points
10 comments14 min readLW link

Difficul­ties in mak­ing pow­er­ful al­igned AI

DanielFilan14 May 2023 20:50 UTC
41 points
1 comment10 min readLW link
(danielfilan.com)

De­tach­ment vs at­tach­ment [AI risk and men­tal health]

Neil 15 Jan 2024 0:41 UTC
14 points
4 comments3 min readLW link

Catas­trophic Risks from AI #6: Dis­cus­sion and FAQ

27 Jun 2023 23:23 UTC
24 points
1 comment13 min readLW link
(arxiv.org)

Catas­trophic Risks from AI #5: Rogue AIs

27 Jun 2023 22:06 UTC
15 points
0 comments22 min readLW link
(arxiv.org)

An un­al­igned benchmark

paulfchristiano17 Nov 2018 15:51 UTC
31 points
0 comments9 min readLW link

Orthog­o­nal­ity is expensive

beren3 Apr 2023 10:20 UTC
43 points
9 comments3 min readLW link

Let’s build a fire alarm for AGI

chaosmage15 May 2023 9:16 UTC
−1 points
0 comments2 min readLW link

If you wish to make an ap­ple pie, you must first be­come dic­ta­tor of the universe

jasoncrawford5 Jul 2023 18:14 UTC
27 points
9 comments13 min readLW link
(rootsofprogress.org)

Eight Short Stud­ies On Excuses

Scott Alexander20 Apr 2010 23:01 UTC
836 points
253 comments10 min readLW link

Ac­ti­va­tion ad­di­tions in a sim­ple MNIST network

Garrett Baker18 May 2023 2:49 UTC
26 points
0 comments2 min readLW link

Com­plex Sys­tems are Hard to Control

jsteinhardt4 Apr 2023 0:00 UTC
42 points
5 comments10 min readLW link
(bounded-regret.ghost.io)

New sur­vey: 46% of Amer­i­cans are con­cerned about ex­tinc­tion from AI; 69% sup­port a six-month pause in AI development

Akash5 Apr 2023 1:26 UTC
46 points
9 comments1 min readLW link
(today.yougov.com)

Thoughts on shar­ing in­for­ma­tion about lan­guage model capabilities

paulfchristiano31 Jul 2023 16:04 UTC
208 points
44 comments11 min readLW link1 review

Will Ar­tifi­cial Su­per­in­tel­li­gence Kill Us?

James_Miller23 May 2023 16:27 UTC
33 points
2 comments22 min readLW link

An overview of 11 pro­pos­als for build­ing safe ad­vanced AI

evhub29 May 2020 20:38 UTC
213 points
36 comments38 min readLW link2 reviews

In­vi­ta­tion to lead a pro­ject at AI Safety Camp (Vir­tual Edi­tion, 2025)

23 Aug 2024 14:18 UTC
17 points
2 comments4 min readLW link

Con­di­tions for Mesa-Optimization

1 Jun 2019 20:52 UTC
84 points
48 comments12 min readLW link

Risks from AI Overview: Summary

18 Aug 2023 1:21 UTC
25 points
1 comment13 min readLW link
(www.safe.ai)

Paper: On mea­sur­ing situ­a­tional aware­ness in LLMs

4 Sep 2023 12:54 UTC
108 points
16 comments5 min readLW link
(arxiv.org)

“If we go ex­tinct due to mis­al­igned AI, at least na­ture will con­tinue, right? … right?”

plex18 May 2024 14:09 UTC
47 points
23 comments2 min readLW link
(aisafety.info)

Ap­ply to lead a pro­ject dur­ing the next vir­tual AI Safety Camp

13 Sep 2023 13:29 UTC
19 points
0 comments5 min readLW link
(aisafety.camp)

Microdooms averted by work­ing on AI Safety

Nikola Jurkovic17 Sep 2023 21:46 UTC
31 points
2 comments3 min readLW link
(forum.effectivealtruism.org)

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [April 2023]

steven04618 Apr 2023 4:21 UTC
57 points
88 comments2 min readLW link

Paus­ing AI is Pos­i­tive Ex­pected Value

Liron10 Mar 2024 17:10 UTC
8 points
2 comments3 min readLW link
(twitter.com)

Four lenses on AI risks

jasoncrawford28 Mar 2023 21:52 UTC
23 points
5 comments3 min readLW link
(rootsofprogress.org)

Reli­a­bil­ity, Se­cu­rity, and AI risk: Notes from in­fosec text­book chap­ter 1

Akash7 Apr 2023 15:47 UTC
34 points
1 comment4 min readLW link

n=3 AI Risk Quick Math and Reasoning

lionhearted (Sebastian Marshall)7 Apr 2023 20:27 UTC
6 points
3 comments4 min readLW link

All images from the WaitButWhy se­quence on AI

trevor8 Apr 2023 7:36 UTC
73 points
5 comments2 min readLW link

AI Safety is Drop­ping the Ball on Clown Attacks

trevor22 Oct 2023 20:09 UTC
65 points
78 comments34 min readLW link

[Question] What does it look like for AI to sig­nifi­cantly im­prove hu­man co­or­di­na­tion, be­fore su­per­in­tel­li­gence?

jacobjacob15 Jan 2024 19:22 UTC
22 points
2 comments1 min readLW link

[Question] Why don’t quan­tiliz­ers also cut off the up­per end of the dis­tri­bu­tion?

Alex_Altair15 May 2023 1:40 UTC
25 points
2 comments1 min readLW link

“warn­ing about ai doom” is also “an­nounc­ing ca­pa­bil­ities progress to noobs”

the gears to ascension8 Apr 2023 23:42 UTC
23 points
5 comments3 min readLW link

[Question] Is this a Pivotal Weak Act? Creat­ing bac­te­ria that de­com­pose metal

doomyeser11 Sep 2024 18:07 UTC
9 points
9 comments3 min readLW link

AI Safety Newslet­ter #1 [CAIS Linkpost]

10 Apr 2023 20:18 UTC
45 points
0 comments4 min readLW link
(newsletter.safe.ai)

Con­sider the hum­ble rock (or: why the dumb thing kills you)

pleiotroth4 Jul 2024 13:54 UTC
60 points
11 comments4 min readLW link

Adam Smith Meets AI Doomers

James_Miller31 Jan 2024 15:53 UTC
34 points
10 comments5 min readLW link

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [May 2023]

steven04618 May 2023 22:30 UTC
33 points
44 comments2 min readLW link

Where’s the foom?

Fergus Fettes11 Apr 2023 15:50 UTC
34 points
27 comments2 min readLW link

Cur­rent AIs Provide Nearly No Data Rele­vant to AGI Alignment

Thane Ruthenis15 Dec 2023 20:16 UTC
124 points
157 comments8 min readLW link1 review

Sam Alt­man and Ezra Klein on the AI Revolution

Zack_M_Davis27 Jun 2021 4:53 UTC
38 points
17 comments1 min readLW link
(www.nytimes.com)

[Link] Sarah Con­stantin: “Why I am Not An AI Doomer”

lbThingrb12 Apr 2023 1:52 UTC
61 points
13 comments1 min readLW link
(sarahconstantin.substack.com)

[Question] What are good al­ign­ment con­fer­ence pa­pers?

adamShimi28 Aug 2021 13:35 UTC
12 points
2 comments1 min readLW link

AI al­ign­ment as a trans­la­tion problem

Roman Leventov5 Feb 2024 14:14 UTC
22 points
2 comments3 min readLW link

Bayeswatch 7: Wildfire

lsusr8 Sep 2021 5:35 UTC
50 points
6 comments3 min readLW link

[Question] How much do per­sonal bi­ases in risk as­sess­ment af­fect as­sess­ment of AI risks?

Gordon Seidoh Worley3 May 2023 6:12 UTC
10 points
8 comments1 min readLW link

In­ter­view with Skynet

lsusr30 Sep 2021 2:20 UTC
49 points
1 comment2 min readLW link

Fi­nan­cial Times: We must slow down the race to God-like AI

trevor13 Apr 2023 19:55 UTC
113 points
17 comments16 min readLW link
(www.ft.com)

My dis­agree­ments with “AGI ruin: A List of Lethal­ities”

Noosphere8915 Sep 2024 17:22 UTC
36 points
46 comments18 min readLW link

Re­sponse to Aschen­bren­ner’s “Si­tu­a­tional Aware­ness”

Rob Bensinger6 Jun 2024 22:57 UTC
194 points
27 comments3 min readLW link

Us­ing blin­ders to help you see things for what they are

Adam Zerner11 Nov 2021 7:07 UTC
13 points
2 comments2 min readLW link

An­nounc­ing Hu­man-al­igned AI Sum­mer School

22 May 2024 8:55 UTC
50 points
0 comments1 min readLW link
(humanaligned.ai)

Top les­son from GPT: we will prob­a­bly de­stroy hu­man­ity “for the lulz” as soon as we are able.

Shmi16 Apr 2023 20:27 UTC
63 points
28 comments1 min readLW link

AI Takeover Sce­nario with Scaled LLMs

simeon_c16 Apr 2023 23:28 UTC
42 points
15 comments8 min readLW link

Ap­pli­ca­tions for AI Safety Camp 2022 Now Open!

adamShimi17 Nov 2021 21:42 UTC
47 points
3 comments1 min readLW link

Most Peo­ple Don’t Real­ize We Have No Idea How Our AIs Work

Thane Ruthenis21 Dec 2023 20:02 UTC
158 points
42 comments1 min readLW link

Sum­mary of the Acausal At­tack Is­sue for AIXI

Diffractor13 Dec 2021 8:16 UTC
12 points
6 comments4 min readLW link

My Overview of the AI Align­ment Land­scape: Threat Models

Neel Nanda25 Dec 2021 23:07 UTC
53 points
3 comments28 min readLW link

Challenges with Break­ing into MIRI-Style Research

Chris_Leong17 Jan 2022 9:23 UTC
75 points
15 comments3 min readLW link

Paradigm-build­ing from first prin­ci­ples: Effec­tive al­tru­ism, AGI, and alignment

Cameron Berg8 Feb 2022 16:12 UTC
29 points
5 comments14 min readLW link

Paradigm-build­ing: Introduction

Cameron Berg8 Feb 2022 0:06 UTC
28 points
0 comments2 min readLW link

AI Safety Memes Wiki

24 Jul 2024 18:53 UTC
33 points
1 comment1 min readLW link
(aisafety.info)

Gaia Net­work: a prac­ti­cal, in­cre­men­tal path­way to Open Agency Architecture

20 Dec 2023 17:11 UTC
22 points
8 comments16 min readLW link

[Talk tran­script] What “struc­ture” is and why it matters

Alex_Altair25 Jul 2024 15:49 UTC
23 points
0 comments5 min readLW link
(www.youtube.com)

Will work­ing here ad­vance AGI? Help us not de­stroy the world!

Yonatan Cale29 May 2022 11:42 UTC
30 points
47 comments1 min readLW link

Distil­led—AGI Safety from First Principles

Harrison G29 May 2022 0:57 UTC
11 points
1 comment14 min readLW link

Deep­Mind and Google Brain are merg­ing [Linkpost]

Akash20 Apr 2023 18:47 UTC
55 points
5 comments1 min readLW link
(www.deepmind.com)

AI Safety Newslet­ter #5: Ge­offrey Hin­ton speaks out on AI risk, the White House meets with AI labs, and Tro­jan at­tacks on lan­guage models

9 May 2023 15:26 UTC
28 points
1 comment4 min readLW link
(newsletter.safe.ai)

Why I don’t be­lieve in doom

mukashi7 Jun 2022 23:49 UTC
6 points
30 comments4 min readLW link

Talk­ing pub­li­cly about AI risk

Jan_Kulveit21 Apr 2023 11:28 UTC
180 points
9 comments6 min readLW link

In­tro­duc­ing AI Lab Watch

Zach Stein-Perlman30 Apr 2024 17:00 UTC
224 points
30 comments1 min readLW link
(ailabwatch.org)

On A List of Lethalities

Zvi13 Jun 2022 12:30 UTC
165 points
50 comments54 min readLW link1 review
(thezvi.wordpress.com)

Con­ti­nu­ity Assumptions

Jan_Kulveit13 Jun 2022 21:31 UTC
37 points
13 comments4 min readLW link

Twit­ter thread on AI takeover scenarios

Richard_Ngo31 Jul 2024 0:24 UTC
37 points
0 comments2 min readLW link
(x.com)

Align­ment Risk Doesn’t Re­quire Superintelligence

JustisMills15 Jun 2022 3:12 UTC
35 points
4 comments2 min readLW link

Sys­tems that can­not be un­safe can­not be safe

Davidmanheim2 May 2023 8:53 UTC
62 points
27 comments2 min readLW link

[Linkpost] Ex­is­ten­tial Risk Anal­y­sis in Em­piri­cal Re­search Papers

Dan H2 Jul 2022 0:09 UTC
40 points
0 comments1 min readLW link
(arxiv.org)

The Align­ment Problem

lsusr11 Jul 2022 3:03 UTC
46 points
18 comments3 min readLW link

What would a com­pute mon­i­tor­ing plan look like? [Linkpost]

Akash26 Mar 2023 19:33 UTC
158 points
10 comments4 min readLW link
(arxiv.org)

The Dis­solu­tion of AI Safety

Roko12 Dec 2024 10:34 UTC
8 points
44 comments1 min readLW link
(www.transhumanaxiology.com)

The al­ign­ment prob­lem from a deep learn­ing perspective

Richard_Ngo10 Aug 2022 22:46 UTC
107 points
15 comments27 min readLW link1 review

Deep­Mind al­ign­ment team opinions on AGI ruin arguments

Vika12 Aug 2022 21:06 UTC
395 points
37 comments14 min readLW link1 review

Shapes of Mind and Plu­ral­ism in Alignment

adamShimi13 Aug 2022 10:01 UTC
33 points
2 comments2 min readLW link

Up­grad­ing the AI Safety Community

16 Dec 2023 15:34 UTC
42 points
9 comments42 min readLW link

Refram­ing the bur­den of proof: Com­pa­nies should prove that mod­els are safe (rather than ex­pect­ing au­di­tors to prove that mod­els are dan­ger­ous)

Akash25 Apr 2023 18:49 UTC
27 points
11 comments3 min readLW link
(childrenoficarus.substack.com)

No One-Size-Fit-All Epistemic Strategy

adamShimi20 Aug 2022 12:56 UTC
24 points
2 comments2 min readLW link

Chad Jones pa­per mod­el­ing AI and x-risk vs. growth

jasoncrawford26 Apr 2023 20:07 UTC
39 points
7 comments2 min readLW link
(web.stanford.edu)

AI #76: Six Shorts Sto­ries About OpenAI

Zvi8 Aug 2024 13:50 UTC
53 points
10 comments48 min readLW link
(thezvi.wordpress.com)

All AGI safety ques­tions wel­come (es­pe­cially ba­sic ones) [Sept 2022]

plex8 Sep 2022 11:56 UTC
22 points
48 comments3 min readLW link

[Question] Why are we sure that AI will “want” some­thing?

Shmi16 Sep 2022 20:35 UTC
31 points
57 comments1 min readLW link

Refine’s Third Blog Post Day/​Week

adamShimi17 Sep 2022 17:03 UTC
18 points
0 comments1 min readLW link

Against most, but not all, AI risk analogies

Matthew Barnett14 Jan 2024 3:36 UTC
63 points
41 comments7 min readLW link

Sen­sor Ex­po­sure can Com­pro­mise the Hu­man Brain in the 2020s

trevor26 Oct 2023 3:31 UTC
17 points
6 comments10 min readLW link

Neu­ro­science and Alignment

Garrett Baker18 Mar 2024 21:09 UTC
40 points
25 comments2 min readLW link

A Case for the Least For­giv­ing Take On Alignment

Thane Ruthenis2 May 2023 21:34 UTC
100 points
84 comments22 min readLW link

Re­sponse to Oren Etz­ioni’s “How to know if ar­tifi­cial in­tel­li­gence is about to de­stroy civ­i­liza­tion”

Daniel Kokotajlo27 Feb 2020 18:10 UTC
27 points
5 comments8 min readLW link

Ac­ti­va­tion ad­di­tions in a small resi­d­ual network

Garrett Baker22 May 2023 20:28 UTC
22 points
4 comments3 min readLW link

We’re Not Ready: thoughts on “paus­ing” and re­spon­si­ble scal­ing policies

HoldenKarnofsky27 Oct 2023 15:19 UTC
200 points
33 comments8 min readLW link

A Com­mon-Sense Case For Mu­tu­ally-Misal­igned AGIs Ally­ing Against Humans

Thane Ruthenis17 Dec 2023 20:28 UTC
29 points
7 comments11 min readLW link

Catas­trophic Risks from AI #4: Or­ga­ni­za­tional Risks

26 Jun 2023 19:36 UTC
23 points
0 comments21 min readLW link
(arxiv.org)

Mechanism De­sign for AI Safety—Read­ing Group Curriculum

Rubi J. Hudson25 Oct 2022 3:54 UTC
15 points
3 comments1 min readLW link

AI Safety Seems Hard to Measure

HoldenKarnofsky8 Dec 2022 19:50 UTC
71 points
6 comments14 min readLW link
(www.cold-takes.com)

[Linkpost] Bi­den-Har­ris Ex­ec­u­tive Order on AI

beren30 Oct 2023 15:20 UTC
3 points
0 comments1 min readLW link

Pod­cast: Tam­era Lan­ham on AI risk, threat mod­els, al­ign­ment pro­pos­als, ex­ter­nal­ized rea­son­ing over­sight, and work­ing at Anthropic

Akash20 Dec 2022 21:39 UTC
18 points
2 comments11 min readLW link

Quote quiz: “drift­ing into de­pen­dence”

jasoncrawford27 Apr 2023 15:13 UTC
7 points
6 comments1 min readLW link
(rootsofprogress.org)

ChatGPT Plu­g­ins—The Begin­ning of the End

Bary Levy25 Mar 2023 11:45 UTC
15 points
4 comments1 min readLW link

More ex­per­i­ments in GPT-4 agency: writ­ing memos

Christopher King24 Mar 2023 17:51 UTC
5 points
2 comments10 min readLW link

Does GPT-4 ex­hibit agency when sum­ma­riz­ing ar­ti­cles?

Christopher King24 Mar 2023 15:49 UTC
16 points
2 comments5 min readLW link

LLMs seem (rel­a­tively) safe

JustisMills25 Apr 2024 22:13 UTC
53 points
24 comments7 min readLW link
(justismills.substack.com)

Grind­ing slimes in the dun­geon of AI al­ign­ment research

Max H24 Mar 2023 4:51 UTC
10 points
2 comments4 min readLW link

Course recom­men­da­tions for Friendli­ness researchers

Louie9 Jan 2013 14:33 UTC
96 points
112 comments10 min readLW link

A rant against robots

Lê Nguyên Hoang14 Jan 2020 22:03 UTC
65 points
7 comments5 min readLW link

Three AI Safety Re­lated Ideas

Wei Dai13 Dec 2018 21:32 UTC
69 points
38 comments2 min readLW link

AI Align­ment Prize: Round 2 due March 31, 2018

Zvi12 Mar 2018 12:10 UTC
28 points
2 comments3 min readLW link
(thezvi.wordpress.com)

GPT-4 al­ign­ing with aca­sual de­ci­sion the­ory when in­structed to play games, but in­cludes a CDT ex­pla­na­tion that’s in­cor­rect if they differ

Christopher King23 Mar 2023 16:16 UTC
7 points
4 comments8 min readLW link

Mo­ral­ity as Co­op­er­a­tion Part III: Failure Modes

DeLesley Hutchins5 Dec 2024 9:39 UTC
4 points
0 comments20 min readLW link

Limit in­tel­li­gent weapons

Lucas Pfeifer23 Mar 2023 17:54 UTC
−11 points
36 comments1 min readLW link

ChatGPT’s “fuzzy al­ign­ment” isn’t ev­i­dence of AGI al­ign­ment: the ba­nana test

Michael Tontchev23 Mar 2023 7:12 UTC
23 points
6 comments4 min readLW link

2017 AI Safety Liter­a­ture Re­view and Char­ity Com­par­i­son

Larks24 Dec 2017 18:52 UTC
41 points
5 comments23 min readLW link

Soon: a weekly AI Safety pre­req­ui­sites mod­ule on LessWrong

null30 Apr 2018 13:23 UTC
35 points
10 comments1 min readLW link

Why We MUST Create an AGI that Disem­pow­ers Hu­man­ity. For Real.

twkaiser22 Mar 2023 23:01 UTC
−17 points
1 comment4 min readLW link

Take­off speeds pre­sen­ta­tion at Anthropic

Tom Davidson4 Jun 2024 22:46 UTC
92 points
0 comments25 min readLW link

[Question] Why is so much dis­cus­sion hap­pen­ing in pri­vate Google Docs?

Wei Dai12 Jan 2019 2:19 UTC
101 points
22 comments1 min readLW link

Key Ques­tions for Digi­tal Minds

Jacy Reese Anthis22 Mar 2023 17:13 UTC
22 points
0 comments7 min readLW link
(www.sentienceinstitute.org)

Swim­ming Up­stream: A Case Study in In­stru­men­tal Rationality

TurnTrout3 Jun 2018 3:16 UTC
76 points
7 comments8 min readLW link

When is un­al­igned AI morally valuable?

paulfchristiano25 May 2018 1:57 UTC
81 points
53 comments10 min readLW link

[Question] Em­ployer con­sid­er­ing part­ner­ing with ma­jor AI labs. What to do?

GraduallyMoreAgitated21 Mar 2023 17:43 UTC
37 points
7 comments2 min readLW link

I cre­ated an Asi Align­ment Tier List

TimeGoat21 Apr 2024 18:44 UTC
−6 points
0 comments1 min readLW link

Two Tales of AI Takeover: My Doubts

Violet Hour5 Mar 2024 15:51 UTC
30 points
8 comments29 min readLW link

Mo­ral­ity as Co­op­er­a­tion Part II: The­ory and Experiment

DeLesley Hutchins5 Dec 2024 9:04 UTC
2 points
0 comments17 min readLW link

Ex­plor­ing the Pre­cau­tion­ary Prin­ci­ple in AI Devel­op­ment: His­tor­i­cal Analo­gies and Les­sons Learned

Christopher King21 Mar 2023 3:53 UTC
−1 points
2 comments9 min readLW link

Ca­pa­bil­ities De­nial: The Danger of Un­der­es­ti­mat­ing AI

Christopher King21 Mar 2023 1:24 UTC
6 points
5 comments3 min readLW link

How can I re­duce ex­is­ten­tial risk from AI?

lukeprog13 Nov 2012 21:56 UTC
63 points
92 comments8 min readLW link

How to solve the mi­suse prob­lem as­sum­ing that in 10 years the de­fault sce­nario is that AGI agents are ca­pa­ble of syn­thetiz­ing pathogens

jeremtti27 Nov 2024 21:17 UTC
6 points
0 comments9 min readLW link

Ap­ply to the Pivotal Re­search Fel­low­ship (AI Safety & Biose­cu­rity)

10 Apr 2024 12:08 UTC
18 points
0 comments1 min readLW link

Hope to live or fear to die?

Knight Lee27 Nov 2024 10:42 UTC
3 points
0 comments1 min readLW link

Tak­ing Away the Guns First: The Fun­da­men­tal Flaw in AI Development

s-ice26 Nov 2024 22:11 UTC
1 point
0 comments17 min readLW link

An­nounc­ing At­las Computing

miyazono11 Apr 2024 15:56 UTC
44 points
4 comments4 min readLW link

[Question] What’s the pro­to­col for if a novice has ML ideas that are un­likely to work, but might im­prove ca­pa­bil­ities if they do work?

drocta9 Jan 2024 22:51 UTC
6 points
2 comments2 min readLW link

What should AI safety be try­ing to achieve?

EuanMcLean23 May 2024 11:17 UTC
16 points
0 comments13 min readLW link

Why Re­cur­sive Self-Im­prove­ment Might Not Be the Ex­is­ten­tial Risk We Fear

Nassim_A24 Nov 2024 17:17 UTC
1 point
0 comments9 min readLW link

A bet­ter “State­ment on AI Risk?”

Knight Lee25 Nov 2024 4:50 UTC
4 points
4 comments3 min readLW link

[Question] Can sin­gu­lar­ity emerge from trans­form­ers?

MP8 Apr 2024 14:26 UTC
−3 points
1 comment1 min readLW link

[Question] Have we seen any “ReLU in­stead of sig­moid-type im­prove­ments” recently

KvmanThinking23 Nov 2024 3:51 UTC
2 points
4 comments1 min readLW link

Truth Ter­mi­nal: A re­con­struc­tion of events

17 Nov 2024 23:51 UTC
1 point
1 comment7 min readLW link

Sur­vey of 2,778 AI au­thors: six parts in pictures

KatjaGrace6 Jan 2024 4:43 UTC
80 points
1 comment2 min readLW link

[Question] What (if any­thing) made your p(doom) go down in 2024?

Satron16 Nov 2024 16:46 UTC
4 points
6 comments1 min readLW link

[Question] fake al­ign­ment solu­tions????

KvmanThinking11 Dec 2024 3:31 UTC
1 point
6 comments1 min readLW link

Propos­ing the Con­di­tional AI Safety Treaty (linkpost TIME)

otto.barten15 Nov 2024 13:59 UTC
10 points
8 comments3 min readLW link
(time.com)

Con­fronting the le­gion of doom.

Spiritus Dei13 Nov 2024 17:03 UTC
−20 points
3 comments5 min readLW link

Why em­piri­cists should be­lieve in AI risk

Knight Lee11 Dec 2024 3:51 UTC
5 points
0 comments1 min readLW link

AI de­mands un­prece­dented reliability

Jono9 Jan 2024 16:30 UTC
22 points
5 comments2 min readLW link

On the fu­ture of lan­guage models

owencb20 Dec 2023 16:58 UTC
105 points
17 comments1 min readLW link

Thoughts af­ter the Wolfram and Yud­kowsky discussion

Tahp14 Nov 2024 1:43 UTC
25 points
13 comments6 min readLW link

[Question] How does the ever-in­creas­ing use of AI in the mil­i­tary for the di­rect pur­pose of mur­der­ing peo­ple af­fect your p(doom)?

Justausername6 Apr 2024 6:31 UTC
19 points
16 comments1 min readLW link

What AI safety re­searchers can learn from Ma­hatma Gandhi

Lysandre Terrisse8 Nov 2024 19:49 UTC
−6 points
0 comments3 min readLW link

Call for sub­mis­sions: Choice of Fu­tures sur­vey questions

c.trout30 Apr 2023 6:59 UTC
4 points
0 comments2 min readLW link
(airtable.com)

Ac­cess to AI: a hu­man right?

dmtea25 Jul 2020 9:38 UTC
5 points
3 comments2 min readLW link

Nav­i­gat­ing pub­lic AI x-risk hype while pur­su­ing tech­ni­cal solutions

Dan Braun19 Feb 2023 12:22 UTC
18 points
0 comments2 min readLW link

$250K in Prizes: SafeBench Com­pe­ti­tion An­nounce­ment

ozhang3 Apr 2024 22:07 UTC
26 points
0 comments1 min readLW link

Agen­tic Lan­guage Model Memes

FactorialCode1 Aug 2020 18:03 UTC
16 points
1 comment2 min readLW link

Con­ver­sa­tion with Paul Christiano

abergal11 Sep 2019 23:20 UTC
44 points
6 comments30 min readLW link
(aiimpacts.org)

Tran­scrip­tion of Eliezer’s Jan­uary 2010 video Q&A

curiousepic14 Nov 2011 17:02 UTC
112 points
9 comments56 min readLW link

Co­op­er­a­tion and Align­ment in Del­e­ga­tion Games: You Need Both!

3 Aug 2024 10:16 UTC
8 points
0 comments14 min readLW link
(www.oliversourbut.net)

Re­sponses to Catas­trophic AGI Risk: A Survey

lukeprog8 Jul 2013 14:33 UTC
17 points
8 comments1 min readLW link

[Question] Ac­cu­racy of ar­gu­ments that are seen as ridicu­lous and in­tu­itively false but don’t have good counter-arguments

Christopher King29 Apr 2023 23:58 UTC
30 points
39 comments1 min readLW link

Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?”

David Scott Krueger (formerly: capybaralet)6 Feb 2019 19:09 UTC
25 points
17 comments1 min readLW link

AI as a pow­er­ful meme, via CGP Grey

TheManxLoiner30 Oct 2024 18:31 UTC
46 points
8 comments4 min readLW link

A call for a quan­ti­ta­tive re­port card for AI bioter­ror­ism threat models

Juno4 Dec 2023 6:35 UTC
12 points
0 comments10 min readLW link

The benefits and risks of op­ti­mism (about AI safety)

Karl von Wendt3 Dec 2023 12:45 UTC
−7 points
6 comments5 min readLW link

Refram­ing mis­al­igned AGI’s: well-in­ten­tioned non-neu­rotyp­i­cal assistants

zhukeepa1 Apr 2018 1:22 UTC
46 points
14 comments2 min readLW link

Dario Amodei’s “Machines of Lov­ing Grace” sound in­cred­ibly dan­ger­ous, for Humans

Super AGI27 Oct 2024 5:05 UTC
8 points
1 comment1 min readLW link

In­tro­duc­ing the AI Align­ment Fo­rum (FAQ)

29 Oct 2018 21:07 UTC
87 points
8 comments6 min readLW link

Thoughts on “AI is easy to con­trol” by Pope & Belrose

Steven Byrnes1 Dec 2023 17:30 UTC
197 points
64 comments14 min readLW link1 review

Cur­rent AI Safety Roles for Soft­ware Engineers

ozziegooen9 Nov 2018 20:57 UTC
70 points
9 comments4 min readLW link

Sup­port me in a Week-Long Pick­et­ing Cam­paign Near OpenAI’s HQ: Seek­ing Sup­port and Ideas from the LessWrong Community

Percy30 Apr 2023 17:48 UTC
−21 points
15 comments1 min readLW link

Prob­lems in AI Align­ment that philoso­phers could po­ten­tially con­tribute to

Wei Dai17 Aug 2019 17:38 UTC
78 points
14 comments2 min readLW link

Two Ne­glected Prob­lems in Hu­man-AI Safety

Wei Dai16 Dec 2018 22:13 UTC
102 points
25 comments2 min readLW link

An­nounce­ment: AI al­ign­ment prize round 4 winners

cousin_it20 Jan 2019 14:46 UTC
74 points
41 comments1 min readLW link

Miles Brundage re­signed from OpenAI, and his AGI readi­ness team was disbanded

garrison23 Oct 2024 23:40 UTC
118 points
1 comment7 min readLW link
(garrisonlovely.substack.com)

And the AI would have got away with it too, if...

Stuart_Armstrong22 May 2019 21:35 UTC
75 points
7 comments1 min readLW link

A Semiotic Cri­tique of the Orthog­o­nal­ity Thesis

Nicolas Villarreal4 Jun 2024 18:52 UTC
3 points
10 comments15 min readLW link

Should ethi­cists be in­side or out­side a pro­fes­sion?

Eliezer Yudkowsky12 Dec 2018 1:40 UTC
97 points
7 comments9 min readLW link

Towards AI Safety In­fras­truc­ture: Talk & Outline

Paul Bricman7 Jan 2024 9:31 UTC
11 points
0 comments2 min readLW link
(www.youtube.com)

Re­think Pri­ori­ties: Seek­ing Ex­pres­sions of In­ter­est for Spe­cial Pro­jects Next Year

kierangreig29 Nov 2023 13:59 UTC
4 points
0 comments5 min readLW link

I Vouch For MIRI

Zvi17 Dec 2017 17:50 UTC
39 points
9 comments5 min readLW link
(thezvi.wordpress.com)

Be­ware of black boxes in AI al­ign­ment research

cousin_it18 Jan 2018 15:07 UTC
39 points
10 comments1 min readLW link

AISC 2024 - Pro­ject Summaries

NickyP27 Nov 2023 22:32 UTC
48 points
3 comments18 min readLW link

Lenses of Control

WillPetillo22 Oct 2024 7:51 UTC
14 points
0 comments9 min readLW link

Op­por­tu­ni­ties for in­di­vi­d­ual donors in AI safety

Alex Flint31 Mar 2018 18:37 UTC
30 points
3 comments11 min readLW link

Thou­sands of mal­i­cious ac­tors on the fu­ture of AI misuse

1 Apr 2024 10:08 UTC
37 points
0 comments1 min readLW link

The two para­graph ar­gu­ment for AI risk

CronoDAS25 Nov 2023 2:01 UTC
19 points
8 comments1 min readLW link

Q&A with Stan Fran­klin on risks from AI

XiXiDu11 Jun 2011 15:22 UTC
36 points
10 comments2 min readLW link

A Friendly Face (Another Failure Story)

20 Jun 2023 10:31 UTC
65 points
21 comments16 min readLW link

Muehlhauser-Go­ertzel Dialogue, Part 1

lukeprog16 Mar 2012 17:12 UTC
42 points
161 comments33 min readLW link

Ex­plor­ing Last-Re­sort Mea­sures for AI Align­ment: Hu­man­ity’s Ex­tinc­tion Switch

0xPetra23 Jun 2023 17:01 UTC
7 points
0 comments2 min readLW link

[LINK] NYT Ar­ti­cle about Ex­is­ten­tial Risk from AI

[deleted]28 Jan 2013 10:37 UTC
38 points
23 comments1 min readLW link

Refram­ing the Prob­lem of AI Progress

Wei Dai12 Apr 2012 19:31 UTC
32 points
47 comments1 min readLW link

New AI risks re­search in­sti­tute at Oxford University

lukeprog16 Nov 2011 18:52 UTC
36 points
10 comments1 min readLW link

Memes and Ra­tional Decisions

inferential9 Jan 2015 6:42 UTC
35 points
18 comments10 min readLW link

I just watched don’t look up.

ATheCoder23 Jun 2023 21:22 UTC
0 points
5 comments2 min readLW link

AI-Plans.com—a con­tributable compendium

Iknownothing25 Jun 2023 14:40 UTC
39 points
7 comments4 min readLW link
(ai-plans.com)

Cheat sheet of AI X-risk

momom229 Jun 2023 4:28 UTC
19 points
1 comment7 min readLW link

Challenge pro­posal: small­est pos­si­ble self-hard­en­ing back­door for RLHF

Christopher King29 Jun 2023 16:56 UTC
7 points
0 comments2 min readLW link

Biosafety Reg­u­la­tions (BMBL) and their rele­vance for AI

Štěpán Los29 Jun 2023 19:22 UTC
4 points
0 comments4 min readLW link

Me­taphors for AI, and why I don’t like them

boazbarak28 Jun 2023 22:47 UTC
33 points
18 comments12 min readLW link

AGI & War

Calecute29 Jun 2023 22:20 UTC
9 points
1 comment1 min readLW link

AI In­ci­dent Shar­ing—Best prac­tices from other fields and a com­pre­hen­sive list of ex­ist­ing platforms

Štěpán Los28 Jun 2023 17:21 UTC
20 points
0 comments4 min readLW link

AI Safety with­out Align­ment: How hu­mans can WIN against AI

vicchain29 Jun 2023 17:53 UTC
1 point
1 comment2 min readLW link

Levels of AI Self-Im­prove­ment

avturchin29 Apr 2018 11:45 UTC
11 points
1 comment39 min readLW link

Op­ti­mis­ing So­ciety to Con­strain Risk of War from an Ar­tifi­cial Su­per­in­tel­li­gence

JohnCDraper30 Apr 2020 10:47 UTC
4 points
1 comment51 min readLW link

Quan­ti­ta­tive cruxes in Alignment

Martín Soto2 Jul 2023 20:38 UTC
19 points
0 comments23 min readLW link

Sources of ev­i­dence in Alignment

Martín Soto2 Jul 2023 20:38 UTC
20 points
0 comments11 min readLW link

Some Thoughts on Sin­gu­lar­ity Strategies

Wei Dai13 Jul 2011 2:41 UTC
45 points
30 comments3 min readLW link

An AGI kill switch with defined se­cu­rity properties

Peterpiper5 Jul 2023 17:40 UTC
−5 points
6 comments1 min readLW link

How I Learned To Stop Wor­ry­ing And Love The Shoggoth

Peter Merel12 Jul 2023 17:47 UTC
9 points
15 comments5 min readLW link

OpenAI Launches Su­per­al­ign­ment Taskforce

Zvi11 Jul 2023 13:00 UTC
149 points
40 comments49 min readLW link
(thezvi.wordpress.com)

Do you feel that AGI Align­ment could be achieved in a Type 0 civ­i­liza­tion?

Super AGI6 Jul 2023 4:52 UTC
−2 points
1 comment1 min readLW link

An Overview of AI risks—the Flyer

17 Jul 2023 12:03 UTC
20 points
0 comments1 min readLW link
(docs.google.com)

ACI#4: Seed AI is the new Per­pet­ual Mo­tion Machine

Akira Pyinya8 Jul 2023 1:17 UTC
−7 points
0 comments6 min readLW link

An­nounc­ing AI Align­ment work­shop at the ALIFE 2023 conference

rorygreig8 Jul 2023 13:52 UTC
16 points
0 comments1 min readLW link
(humanvaluesandartificialagency.com)

“Refram­ing Su­per­in­tel­li­gence” + LLMs + 4 years

Eric Drexler10 Jul 2023 13:42 UTC
117 points
9 comments12 min readLW link

A trick for Safer GPT-N

Razied23 Aug 2020 0:39 UTC
7 points
1 comment2 min readLW link

Gear­ing Up for Long Timelines in a Hard World

Dalcy14 Jul 2023 6:11 UTC
15 points
0 comments4 min readLW link

against “AI risk”

Wei Dai11 Apr 2012 22:46 UTC
35 points
91 comments1 min readLW link

“Smarter than us” is out!

Stuart_Armstrong25 Feb 2014 15:50 UTC
41 points
57 comments1 min readLW link

Analysing: Danger­ous mes­sages from fu­ture UFAI via Oracles

Stuart_Armstrong22 Nov 2019 14:17 UTC
22 points
16 comments4 min readLW link

[Question] What crite­rion would you use to se­lect com­pa­nies likely to cause AI doom?

momom213 Jul 2023 20:31 UTC
8 points
4 comments1 min readLW link

Why was the AI Align­ment com­mu­nity so un­pre­pared for this mo­ment?

Ras151315 Jul 2023 0:26 UTC
121 points
65 comments2 min readLW link

Sim­ple al­ign­ment plan that maybe works

Iknownothing18 Jul 2023 22:48 UTC
4 points
8 comments1 min readLW link

Ru­n­away Op­ti­miz­ers in Mind Space

silentbob16 Jul 2023 14:26 UTC
16 points
0 comments12 min readLW link

Quick Thoughts on Lan­guage Models

RohanS18 Jul 2023 20:38 UTC
6 points
0 comments4 min readLW link

[Question] Why don’t we con­sider large forms of so­cial or­ga­ni­za­tion (eco­nomic and poli­ti­cal forms, in par­tic­u­lar) to qual­ify as AGI and Trans­for­ma­tive AI?

T L16 Jul 2023 18:54 UTC
1 point
0 comments2 min readLW link

Proof of pos­te­ri­or­ity: a defense against AI-gen­er­ated misinformation

jchan17 Jul 2023 12:04 UTC
33 points
3 comments5 min readLW link

Q&A with Abram Dem­ski on risks from AI

XiXiDu17 Jan 2012 9:43 UTC
33 points
71 comments9 min readLW link

Q&A with ex­perts on risks from AI #2

XiXiDu9 Jan 2012 19:40 UTC
22 points
29 comments7 min readLW link

AI Safety Dis­cus­sion Day

Linda Linsefors15 Sep 2020 14:40 UTC
20 points
0 comments1 min readLW link

[Cross­post] An AI Pause Is Hu­man­ity’s Best Bet For Prevent­ing Ex­tinc­tion (TIME)

otto.barten24 Jul 2023 10:07 UTC
12 points
0 comments7 min readLW link
(time.com)

A long re­ply to Ben Garfinkel on Scru­ti­niz­ing Clas­sic AI Risk Arguments

Søren Elverlin27 Sep 2020 17:51 UTC
17 points
6 comments1 min readLW link

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [July 2023]

smallsilo20 Jul 2023 20:20 UTC
38 points
40 comments2 min readLW link
(forum.effectivealtruism.org)

Sup­ple­men­tary Align­ment In­sights Through a Highly Con­trol­led Shut­down Incentive

Justausername23 Jul 2023 16:08 UTC
4 points
1 comment3 min readLW link

Au­tonomous Align­ment Over­sight Frame­work (AAOF)

Justausername25 Jul 2023 10:25 UTC
−9 points
0 comments4 min readLW link

A re­sponse to the Richards et al.’s “The Illu­sion of AI’s Ex­is­ten­tial Risk”

Harrison Fell26 Jul 2023 17:34 UTC
1 point
0 comments10 min readLW link

[Question] Have you ever con­sid­ered tak­ing the ‘Tur­ing Test’ your­self?

Super AGI27 Jul 2023 3:48 UTC
2 points
6 comments1 min readLW link

Eval­u­at­ing Su­per­hu­man Models with Con­sis­tency Checks

1 Aug 2023 7:51 UTC
21 points
2 comments9 min readLW link
(arxiv.org)

3 lev­els of threat obfuscation

HoldenKarnofsky2 Aug 2023 14:58 UTC
69 points
14 comments7 min readLW link

4 types of AGI se­lec­tion, and how to con­strain them

Remmelt8 Aug 2023 10:02 UTC
−4 points
3 comments3 min readLW link

Embed­ding Eth­i­cal Pri­ors into AI Sys­tems: A Bayesian Approach

Justausername3 Aug 2023 15:31 UTC
−5 points
3 comments21 min readLW link

Ilya Sutskever’s thoughts on AI safety (July 2023): a tran­script with my comments

mishka10 Aug 2023 19:07 UTC
21 points
3 comments5 min readLW link

Some al­ign­ment ideas

SelonNerias10 Aug 2023 17:51 UTC
1 point
0 comments11 min readLW link

Seek­ing In­put to AI Safety Book for non-tech­ni­cal audience

Darren McKee10 Aug 2023 17:58 UTC
10 points
4 comments1 min readLW link

On­line AI Safety Dis­cus­sion Day

Linda Linsefors8 Oct 2020 12:11 UTC
5 points
0 comments1 min readLW link

Ex­is­ten­tially rele­vant thought ex­per­i­ment: To kill or not to kill, a sniper, a man and a but­ton.

AlexFromSafeTransition14 Aug 2023 10:53 UTC
−18 points
6 comments4 min readLW link

We Should Pre­pare for a Larger Rep­re­sen­ta­tion of Academia in AI Safety

Leon Lang13 Aug 2023 18:03 UTC
90 points
13 comments5 min readLW link

De­com­pos­ing in­de­pen­dent gen­er­al­iza­tions in neu­ral net­works via Hes­sian analysis

14 Aug 2023 17:04 UTC
83 points
4 comments1 min readLW link

AISN #19: US-China Com­pe­ti­tion on AI Chips, Mea­sur­ing Lan­guage Agent Devel­op­ments, Eco­nomic Anal­y­sis of Lan­guage Model Pro­pa­ganda, and White House AI Cy­ber Challenge

15 Aug 2023 16:10 UTC
21 points
0 comments5 min readLW link
(newsletter.safe.ai)

Ideas for im­prov­ing epistemics in AI safety outreach

mic21 Aug 2023 19:55 UTC
64 points
6 comments3 min readLW link

Mesa-Op­ti­miza­tion: Ex­plain it like I’m 10 Edition

brook26 Aug 2023 23:04 UTC
20 points
1 comment6 min readLW link

En­hanc­ing Cor­rigi­bil­ity in AI Sys­tems through Ro­bust Feed­back Loops

Justausername24 Aug 2023 3:53 UTC
1 point
0 comments6 min readLW link

Reflec­tions on “Mak­ing the Atomic Bomb”

boazbarak17 Aug 2023 2:48 UTC
51 points
7 comments8 min readLW link

AI Reg­u­la­tion May Be More Im­por­tant Than AI Align­ment For Ex­is­ten­tial Safety

otto.barten24 Aug 2023 11:41 UTC
65 points
39 comments5 min readLW link

When dis­cussing AI doom bar­ri­ers pro­pose spe­cific plau­si­ble scenarios

anithite18 Aug 2023 4:06 UTC
5 points
0 comments3 min readLW link

Mili­tary AI as a Con­ver­gent Goal of Self-Im­prov­ing AI

avturchin13 Nov 2017 12:17 UTC
5 points
3 comments1 min readLW link

2 un­usual rea­sons for why we can avoid be­ing turned into paperclips

Artem Panush19 Aug 2023 10:28 UTC
1 point
0 comments4 min readLW link

[Question] Clar­ify­ing how mis­al­ign­ment can arise from scal­ing LLMs

Util19 Aug 2023 14:16 UTC
3 points
1 comment1 min readLW link

Will AI kill ev­ery­one? Here’s what the god­fathers of AI have to say [RA video]

Writer19 Aug 2023 17:29 UTC
58 points
8 comments1 min readLW link
(youtu.be)

Re­port on Fron­tier Model Training

YafahEdelman30 Aug 2023 20:02 UTC
122 points
21 comments21 min readLW link
(docs.google.com)

Ram­ble on STUFF: in­tel­li­gence, simu­la­tion, AI, doom, de­fault mode, the usual

Bill Benzon26 Aug 2023 15:49 UTC
5 points
0 comments4 min readLW link

The Game of Dominance

Karl von Wendt27 Aug 2023 11:04 UTC
24 points
15 comments6 min readLW link

A Let­ter to the Edi­tor of MIT Tech­nol­ogy Review

Jeffs30 Aug 2023 16:59 UTC
0 points
0 comments2 min readLW link

The Epistemic Author­ity of Deep Learn­ing Pioneers

Dylan Bowman29 Aug 2023 18:14 UTC
8 points
2 comments3 min readLW link

AISN #20: LLM Pro­lifer­a­tion, AI De­cep­tion, and Con­tin­u­ing Drivers of AI Capabilities

29 Aug 2023 15:07 UTC
12 points
0 comments8 min readLW link
(newsletter.safe.ai)

Neu­ral pro­gram syn­the­sis is a dan­ger­ous technology

syllogism12 Jan 2018 16:19 UTC
10 points
6 comments2 min readLW link

New, Brief Pop­u­lar-Level In­tro­duc­tion to AI Risks and Superintelligence

LyleN23 Jan 2015 15:43 UTC
33 points
3 comments1 min readLW link

Video es­say: How Will We Know When AI is Con­scious?

JanPro6 Sep 2023 18:10 UTC
11 points
7 comments1 min readLW link
(www.youtube.com)

AISN #21: Google Deep­Mind’s GPT-4 Com­peti­tor, Mili­tary In­vest­ments in Au­tonomous Drones, The UK AI Safety Sum­mit, and Case Stud­ies in AI Policy

5 Sep 2023 15:03 UTC
15 points
0 comments5 min readLW link
(newsletter.safe.ai)

[Question] What is to be done? (About the profit mo­tive)

Connor Barber8 Sep 2023 19:27 UTC
1 point
21 comments1 min readLW link

How teams went about their re­search at AI Safety Camp edi­tion 8

9 Sep 2023 16:34 UTC
28 points
0 comments13 min readLW link

FAI Re­search Con­straints and AGI Side Effects

JustinShovelain3 Jun 2015 19:25 UTC
27 points
59 comments7 min readLW link

A con­ver­sa­tion with Pi, a con­ver­sa­tional AI.

Spiritus Dei15 Sep 2023 23:13 UTC
1 point
0 comments1 min readLW link

Jimmy Ap­ples, source of the ru­mor that OpenAI has achieved AGI in­ter­nally, is a cred­ible in­sider.

Jorterder28 Sep 2023 1:20 UTC
−6 points
2 comments1 min readLW link
(twitter.com)

MMLU’s Mo­ral Sce­nar­ios Bench­mark Doesn’t Mea­sure What You Think it Measures

corey morris27 Sep 2023 17:54 UTC
18 points
2 comments4 min readLW link
(medium.com)

Tech­ni­cal AI Safety Re­search Land­scape [Slides]

Magdalena Wache18 Sep 2023 13:56 UTC
41 points
0 comments4 min readLW link

Euro­pean Master’s Pro­grams in Ma­chine Learn­ing, Ar­tifi­cial In­tel­li­gence, and re­lated fields

Master Programs ML/AI14 Nov 2020 15:51 UTC
34 points
6 comments1 min readLW link

Im­mor­tal­ity or death by AGI

ImmortalityOrDeathByAGI21 Sep 2023 23:59 UTC
47 points
30 comments4 min readLW link
(forum.effectivealtruism.org)

[Linkpost] Mark Zucker­berg con­fronted about Meta’s Llama 2 AI’s abil­ity to give users de­tailed guidance on mak­ing an­thrax—Busi­ness Insider

mic26 Sep 2023 12:05 UTC
18 points
11 comments2 min readLW link
(www.businessinsider.com)

The mind-killer

Paul Crowley2 May 2009 16:49 UTC
29 points
160 comments2 min readLW link

“Di­a­mon­doid bac­te­ria” nanobots: deadly threat or dead-end? A nan­otech in­ves­ti­ga­tion

titotal29 Sep 2023 14:01 UTC
154 points
79 comments1 min readLW link
(titotal.substack.com)

I de­signed an AI safety course (for a philos­o­phy de­part­ment)

Eleni Angelou23 Sep 2023 22:03 UTC
37 points
15 comments2 min readLW link

Tak­ing fea­tures out of su­per­po­si­tion with sparse au­toen­coders more quickly with in­formed initialization

Pierre Peigné23 Sep 2023 16:21 UTC
30 points
8 comments5 min readLW link

Linkpost: Are Emer­gent Abil­ities in Large Lan­guage Models just In-Con­text Learn­ing?

Erich_Grunewald8 Oct 2023 12:14 UTC
12 points
7 comments2 min readLW link
(arxiv.org)

Paper: Iden­ti­fy­ing the Risks of LM Agents with an LM-Emu­lated Sand­box—Univer­sity of Toronto 2023 - Bench­mark con­sist­ing of 36 high-stakes tools and 144 test cases!

Singularian25019 Oct 2023 0:00 UTC
6 points
0 comments1 min readLW link

Ideation and Tra­jec­tory Model­ling in Lan­guage Models

NickyP5 Oct 2023 19:21 UTC
16 points
2 comments10 min readLW link

The Gra­di­ent – The Ar­tifi­cial­ity of Alignment

mic8 Oct 2023 4:06 UTC
12 points
1 comment5 min readLW link
(thegradient.pub)

[Question] Should I do it?

MrLight19 Nov 2020 1:08 UTC
−3 points
16 comments2 min readLW link

Be­come a PIBBSS Re­search Affiliate

10 Oct 2023 7:41 UTC
24 points
6 comments6 min readLW link

Ra­tion­al­is­ing hu­mans: an­other mug­ging, but not Pas­cal’s

Stuart_Armstrong14 Nov 2017 15:46 UTC
7 points
1 comment3 min readLW link

Ma­chine learn­ing could be fun­da­men­tally unexplainable

George3d616 Dec 2020 13:32 UTC
26 points
15 comments15 min readLW link
(cerebralab.com)

LoRA Fine-tun­ing Effi­ciently Un­does Safety Train­ing from Llama 2-Chat 70B

12 Oct 2023 19:58 UTC
151 points
29 comments14 min readLW link

Back to the Past to the Future

Prometheus18 Oct 2023 16:51 UTC
5 points
0 comments1 min readLW link

[Question] What do you make of AGI:un­al­igned::space­ships:not enough food?

Ronny Fernandez22 Feb 2020 14:14 UTC
4 points
3 comments1 min readLW link

Tax­on­omy of AI-risk counterarguments

Odd anon16 Oct 2023 0:12 UTC
62 points
13 comments8 min readLW link

Risk Map of AI Systems

15 Dec 2020 9:16 UTC
28 points
3 comments8 min readLW link

AISU 2021

Linda Linsefors30 Jan 2021 17:40 UTC
28 points
2 comments1 min readLW link

Non­per­son Predicates

Eliezer Yudkowsky27 Dec 2008 1:47 UTC
66 points
177 comments6 min readLW link

For­mal Solu­tion to the In­ner Align­ment Problem

michaelcohen18 Feb 2021 14:51 UTC
49 points
123 comments2 min readLW link

[Question] What are the biggest cur­rent im­pacts of AI?

Sam Clarke7 Mar 2021 21:44 UTC
15 points
5 comments1 min readLW link

[Question] Is a Self-Iter­at­ing AGI Vuln­er­a­ble to Thomp­son-style Tro­jans?

sxae25 Mar 2021 14:46 UTC
15 points
6 comments3 min readLW link

AI or­a­cles on blockchain

Caravaggio6 Apr 2021 20:13 UTC
5 points
0 comments3 min readLW link

What if AGI is near?

Wulky Wilkinsen14 Apr 2021 0:05 UTC
11 points
5 comments1 min readLW link

[Question] Is there any­thing that can stop AGI de­vel­op­ment in the near term?

Wulky Wilkinsen22 Apr 2021 20:37 UTC
5 points
5 comments1 min readLW link

[Question] [time­boxed ex­er­cise] write me your model of AI hu­man-ex­is­ten­tial safety and the al­ign­ment prob­lems in 15 minutes

Quinn4 May 2021 19:10 UTC
6 points
2 comments1 min readLW link

AI Safety Re­search Pro­ject Ideas

Owain_Evans21 May 2021 13:39 UTC
58 points
2 comments3 min readLW link

Sur­vey on AI ex­is­ten­tial risk scenarios

8 Jun 2021 17:12 UTC
65 points
11 comments7 min readLW link

[Question] What are some claims or opinions about multi-multi del­e­ga­tion you’ve seen in the meme­plex that you think de­serve scrutiny?

Quinn27 Jun 2021 17:44 UTC
17 points
6 comments2 min readLW link

Mauhn Re­leases AI Safety Documentation

Berg Severens3 Jul 2021 21:23 UTC
4 points
0 comments1 min readLW link

A gen­tle apoc­a­lypse

pchvykov16 Aug 2021 5:03 UTC
3 points
5 comments3 min readLW link

Could you have stopped Ch­er­nobyl?

Carlos Ramirez27 Aug 2021 1:48 UTC
29 points
17 comments8 min readLW link

The Gover­nance Prob­lem and the “Pretty Good” X-Risk

Zach Stein-Perlman29 Aug 2021 18:00 UTC
5 points
2 comments11 min readLW link

Dist­in­guish­ing AI takeover scenarios

8 Sep 2021 16:19 UTC
74 points
11 comments14 min readLW link

The al­ign­ment prob­lem in differ­ent ca­pa­bil­ity regimes

Buck9 Sep 2021 19:46 UTC
88 points
12 comments5 min readLW link

How truth­ful is GPT-3? A bench­mark for lan­guage models

Owain_Evans16 Sep 2021 10:09 UTC
58 points
24 comments6 min readLW link

In­ves­ti­gat­ing AI Takeover Scenarios

Sammy Martin17 Sep 2021 18:47 UTC
27 points
1 comment27 min readLW link

AI take­off story: a con­tinu­a­tion of progress by other means

Edouard Harris27 Sep 2021 15:55 UTC
76 points
13 comments10 min readLW link

A brief re­view of the rea­sons multi-ob­jec­tive RL could be im­por­tant in AI Safety Research

Ben Smith29 Sep 2021 17:09 UTC
30 points
7 comments10 min readLW link

The Dark Side of Cog­ni­tion Hypothesis

Cameron Berg3 Oct 2021 20:10 UTC
19 points
1 comment16 min readLW link

Truth­ful AI: Devel­op­ing and gov­ern­ing AI that does not lie

18 Oct 2021 18:37 UTC
82 points
9 comments10 min readLW link

AMA on Truth­ful AI: Owen Cot­ton-Bar­ratt, Owain Evans & co-authors

Owain_Evans22 Oct 2021 16:23 UTC
31 points
15 comments1 min readLW link

Truth­ful and hon­est AI

29 Oct 2021 7:28 UTC
42 points
1 comment13 min readLW link

What is the most evil AI that we could build, to­day?

ThomasJ1 Nov 2021 19:58 UTC
−2 points
14 comments1 min readLW link

What are red flags for Neu­ral Net­work suffer­ing?

Marius Hobbhahn8 Nov 2021 12:51 UTC
29 points
15 comments12 min readLW link

Hard­code the AGI to need our ap­proval in­definitely?

MichaelStJules11 Nov 2021 7:04 UTC
2 points
2 comments1 min readLW link

What would we do if al­ign­ment were fu­tile?

Grant Demaree14 Nov 2021 8:09 UTC
75 points
39 comments3 min readLW link

Two Stupid AI Align­ment Ideas

aphyer16 Nov 2021 16:13 UTC
27 points
3 comments4 min readLW link

Su­per in­tel­li­gent AIs that don’t re­quire alignment

Yair Halberstadt16 Nov 2021 19:55 UTC
10 points
2 comments6 min readLW link

AI Tracker: mon­i­tor­ing cur­rent and near-fu­ture risks from su­per­scale models

23 Nov 2021 19:16 UTC
67 points
13 comments3 min readLW link
(aitracker.org)

HIRING: In­form and shape a new pro­ject on AI safety at Part­ner­ship on AI

Madhulika Srikumar24 Nov 2021 8:27 UTC
6 points
0 comments1 min readLW link

How to mea­sure FLOP/​s for Neu­ral Net­works em­piri­cally?

Marius Hobbhahn29 Nov 2021 15:18 UTC
16 points
5 comments7 min readLW link

Model­ing Failure Modes of High-Level Ma­chine Intelligence

6 Dec 2021 13:54 UTC
54 points
1 comment12 min readLW link

HIRING: In­form and shape a new pro­ject on AI safety at Part­ner­ship on AI

madhu_lika7 Dec 2021 19:37 UTC
1 point
0 comments1 min readLW link

Univer­sal­ity and the “Filter”

maggiehayes16 Dec 2021 0:47 UTC
10 points
2 comments11 min readLW link

Re­views of “Is power-seek­ing AI an ex­is­ten­tial risk?”

Joe Carlsmith16 Dec 2021 20:48 UTC
80 points
20 comments1 min readLW link

2+2: On­tolog­i­cal Framework

Lyrialtus1 Feb 2022 1:07 UTC
−15 points
2 comments12 min readLW link

[un­ti­tled post]

superads916 Feb 2022 20:39 UTC
−5 points
8 comments1 min readLW link

How harm­ful are im­prove­ments in AI? + Poll

15 Feb 2022 18:16 UTC
15 points
4 comments8 min readLW link

Pre­serv­ing and con­tin­u­ing al­ign­ment re­search through a se­vere global catastrophe

A_donor6 Mar 2022 18:43 UTC
41 points
11 comments5 min readLW link

Ask AI com­pa­nies about what they are do­ing for AI safety?

mic9 Mar 2022 15:14 UTC
51 points
0 comments2 min readLW link

Is There a Valley of Bad Civ­i­liza­tional Ad­e­quacy?

lbThingrb11 Mar 2022 19:49 UTC
13 points
1 comment2 min readLW link

[Question] Danger(s) of the­o­rem-prov­ing AI?

Yitz16 Mar 2022 2:47 UTC
8 points
8 comments1 min readLW link

We Are Con­jec­ture, A New Align­ment Re­search Startup

Connor Leahy8 Apr 2022 11:40 UTC
197 points
25 comments4 min readLW link

Is tech­ni­cal AI al­ign­ment re­search a net pos­i­tive?

cranberry_bear12 Apr 2022 13:07 UTC
6 points
2 comments2 min readLW link

[Question] Can some­one ex­plain to me why MIRI is so pes­simistic of our chances of sur­vival?

iamthouthouarti14 Apr 2022 20:28 UTC
10 points
7 comments1 min readLW link

[Question] Con­vince me that hu­man­ity *isn’t* doomed by AGI

Yitz15 Apr 2022 17:26 UTC
61 points
50 comments1 min readLW link

Reflec­tions on My Own Miss­ing Mood

Lone Pine21 Apr 2022 16:19 UTC
52 points
25 comments5 min readLW link

Code Gen­er­a­tion as an AI risk setting

Not Relevant17 Apr 2022 22:27 UTC
91 points
16 comments2 min readLW link

[Question] What is be­ing im­proved in re­cur­sive self im­prove­ment?

Lone Pine25 Apr 2022 18:30 UTC
7 points
6 comments1 min readLW link

AI Alter­na­tive Fu­tures: Sce­nario Map­ping Ar­tifi­cial In­tel­li­gence Risk—Re­quest for Par­ti­ci­pa­tion (*Closed*)

Kakili27 Apr 2022 22:07 UTC
10 points
2 comments8 min readLW link

Video and Tran­script of Pre­sen­ta­tion on Ex­is­ten­tial Risk from Power-Seek­ing AI

Joe Carlsmith8 May 2022 3:50 UTC
20 points
1 comment29 min readLW link

In­ter­pretabil­ity’s Align­ment-Solv­ing Po­ten­tial: Anal­y­sis of 7 Scenarios

Evan R. Murphy12 May 2022 20:01 UTC
58 points
0 comments59 min readLW link

Agency As a Nat­u­ral Abstraction

Thane Ruthenis13 May 2022 18:02 UTC
55 points
9 comments13 min readLW link

[Link post] Promis­ing Paths to Align­ment—Con­nor Leahy | Talk

frances_lorenz14 May 2022 16:01 UTC
34 points
0 comments1 min readLW link

Deep­Mind’s gen­er­al­ist AI, Gato: A non-tech­ni­cal explainer

16 May 2022 21:21 UTC
63 points
6 comments6 min readLW link

Ac­tion­able-guidance and roadmap recom­men­da­tions for the NIST AI Risk Man­age­ment Framework

17 May 2022 15:26 UTC
26 points
0 comments3 min readLW link

Why I’m Op­ti­mistic About Near-Term AI Risk

harsimony15 May 2022 23:05 UTC
57 points
27 comments1 min readLW link

Pivotal acts us­ing an un­al­igned AGI?

Simon Fischer21 Aug 2022 17:13 UTC
28 points
3 comments7 min readLW link

Re­shap­ing the AI Industry

Thane Ruthenis29 May 2022 22:54 UTC
147 points
35 comments21 min readLW link

Ex­plain­ing in­ner al­ign­ment to myself

Jeremy Gillen24 May 2022 23:10 UTC
9 points
2 comments10 min readLW link

A Story of AI Risk: In­struc­tGPT-N

peterbarnett26 May 2022 23:22 UTC
24 points
0 comments8 min readLW link

We will be around in 30 years

mukashi7 Jun 2022 3:47 UTC
12 points
205 comments2 min readLW link

Re­search Ques­tions from Stained Glass Windows

StefanHex8 Jun 2022 12:38 UTC
4 points
0 comments2 min readLW link

Towards Gears-Level Un­der­stand­ing of Agency

Thane Ruthenis16 Jun 2022 22:00 UTC
25 points
4 comments18 min readLW link

A plau­si­ble story about AI risk.

DeLesley Hutchins10 Jun 2022 2:08 UTC
16 points
2 comments4 min readLW link

Sum­mary of “AGI Ruin: A List of Lethal­ities”

Stephen McAleese10 Jun 2022 22:35 UTC
44 points
2 comments8 min readLW link

Poorly-Aimed Death Rays

Thane Ruthenis11 Jun 2022 18:29 UTC
48 points
5 comments4 min readLW link

Con­tra EY: Can AGI de­stroy us with­out trial & er­ror?

Nikita Sokolsky13 Jun 2022 18:26 UTC
136 points
72 comments15 min readLW link

A Modest Pivotal Act

anonymousaisafety13 Jun 2022 19:24 UTC
−16 points
1 comment5 min readLW link

[Question] AI mis­al­ign­ment risk from GPT-like sys­tems?

fiso6419 Jun 2022 17:35 UTC
10 points
8 comments1 min readLW link

Causal con­fu­sion as an ar­gu­ment against the scal­ing hypothesis

20 Jun 2022 10:54 UTC
86 points
30 comments15 min readLW link

[LQ] Some Thoughts on Mes­sag­ing Around AI Risk

DragonGod25 Jun 2022 13:53 UTC
5 points
3 comments6 min readLW link

All AGI safety ques­tions wel­come (es­pe­cially ba­sic ones) [July 2022]

16 Jul 2022 12:57 UTC
84 points
132 comments3 min readLW link

Refram­ing the AI Risk

Thane Ruthenis1 Jul 2022 18:44 UTC
26 points
7 comments6 min readLW link

Fol­low along with Columbia EA’s Ad­vanced AI Safety Fel­low­ship!

RohanS2 Jul 2022 17:45 UTC
3 points
0 comments2 min readLW link
(forum.effectivealtruism.org)

Can we achieve AGI Align­ment by bal­anc­ing mul­ti­ple hu­man ob­jec­tives?

Ben Smith3 Jul 2022 2:51 UTC
11 points
1 comment4 min readLW link

New US Se­nate Bill on X-Risk Miti­ga­tion [Linkpost]

Evan R. Murphy4 Jul 2022 1:25 UTC
35 points
12 comments1 min readLW link
(www.hsgac.senate.gov)

My Most Likely Rea­son to Die Young is AI X-Risk

AISafetyIsNotLongtermist4 Jul 2022 17:08 UTC
61 points
24 comments4 min readLW link
(forum.effectivealtruism.org)

Please help us com­mu­ni­cate AI xrisk. It could save the world.

otto.barten4 Jul 2022 21:47 UTC
4 points
7 comments2 min readLW link

When is it ap­pro­pri­ate to use statis­ti­cal mod­els and prob­a­bil­ities for de­ci­sion mak­ing ?

Younes Kamel5 Jul 2022 12:34 UTC
10 points
7 comments4 min readLW link
(youneskamel.substack.com)

Ac­cept­abil­ity Ver­ifi­ca­tion: A Re­search Agenda

12 Jul 2022 20:11 UTC
50 points
0 comments1 min readLW link
(docs.google.com)

Goal Align­ment Is Ro­bust To the Sharp Left Turn

Thane Ruthenis13 Jul 2022 20:23 UTC
43 points
16 comments4 min readLW link

Con­di­tion­ing Gen­er­a­tive Models for Alignment

Jozdien18 Jul 2022 7:11 UTC
59 points
8 comments20 min readLW link

A Cri­tique of AI Align­ment Pessimism

ExCeph19 Jul 2022 2:28 UTC
9 points
1 comment9 min readLW link

What En­vi­ron­ment Prop­er­ties Select Agents For World-Model­ing?

Thane Ruthenis23 Jul 2022 19:27 UTC
25 points
1 comment12 min readLW link

Align­ment be­ing im­pos­si­ble might be bet­ter than it be­ing re­ally difficult

Martín Soto25 Jul 2022 23:57 UTC
13 points
2 comments2 min readLW link

AGI ruin sce­nar­ios are likely (and dis­junc­tive)

So8res27 Jul 2022 3:21 UTC
175 points
38 comments6 min readLW link

[Question] How likely do you think worse-than-ex­tinc­tion type fates to be?

span11 Aug 2022 4:08 UTC
3 points
3 comments1 min readLW link

Three pillars for avoid­ing AGI catas­tro­phe: Tech­ni­cal al­ign­ment, de­ploy­ment de­ci­sions, and coordination

Alex Lintz3 Aug 2022 23:15 UTC
22 points
0 comments11 min readLW link

Con­ver­gence Towards World-Models: A Gears-Level Model

Thane Ruthenis4 Aug 2022 23:31 UTC
38 points
1 comment13 min readLW link

Com­plex­ity No Bar to AI (Or, why Com­pu­ta­tional Com­plex­ity mat­ters less than you think for real life prob­lems)

Noosphere897 Aug 2022 19:55 UTC
17 points
14 comments3 min readLW link
(www.gwern.net)

How To Go From In­ter­pretabil­ity To Align­ment: Just Re­tar­get The Search

johnswentworth10 Aug 2022 16:08 UTC
204 points
34 comments3 min readLW link1 review

Anti-squat­ted AI x-risk do­mains index

plex12 Aug 2022 12:01 UTC
56 points
6 comments1 min readLW link

In­fant AI Scenario

Nathan112312 Aug 2022 21:20 UTC
1 point
0 comments3 min readLW link

The Dumbest Pos­si­ble Gets There First

Artaxerxes13 Aug 2022 10:20 UTC
44 points
7 comments2 min readLW link

In­ter­pretabil­ity Tools Are an At­tack Channel

Thane Ruthenis17 Aug 2022 18:47 UTC
42 points
14 comments1 min readLW link

Align­ment’s phlo­gis­ton

Eleni Angelou18 Aug 2022 22:27 UTC
10 points
2 comments2 min readLW link

Bench­mark­ing Pro­pos­als on Risk Scenarios

Paul Bricman20 Aug 2022 10:01 UTC
25 points
2 comments14 min readLW link

What’s the Least Im­pres­sive Thing GPT-4 Won’t be Able to Do

Algon20 Aug 2022 19:48 UTC
80 points
125 comments1 min readLW link

My Plan to Build Aligned Superintelligence

apollonianblues21 Aug 2022 13:16 UTC
18 points
7 comments8 min readLW link

The Align­ment Prob­lem Needs More Pos­i­tive Fiction

Netcentrica21 Aug 2022 22:01 UTC
6 points
3 comments5 min readLW link

It Looks Like You’re Try­ing To Take Over The Narrative

George3d624 Aug 2022 13:36 UTC
3 points
20 comments9 min readLW link
(www.epistem.ink)

AI Risk in Terms of Un­sta­ble Nu­clear Software

Thane Ruthenis26 Aug 2022 18:49 UTC
30 points
1 comment6 min readLW link

Tak­ing the pa­ram­e­ters which seem to mat­ter and ro­tat­ing them un­til they don’t

Garrett Baker26 Aug 2022 18:26 UTC
120 points
48 comments1 min readLW link

An­nual AGI Bench­mark­ing Event

Lawrence Phillips27 Aug 2022 0:06 UTC
24 points
3 comments2 min readLW link
(www.metaculus.com)

Help Un­der­stand­ing Prefer­ences And Evil

Netcentrica27 Aug 2022 3:42 UTC
6 points
7 comments2 min readLW link

[Question] What would you ex­pect a mas­sive mul­ti­modal on­line fed­er­ated learner to be ca­pa­ble of?

Aryeh Englander27 Aug 2022 17:31 UTC
13 points
4 comments1 min readLW link

Are Gen­er­a­tive World Models a Mesa-Op­ti­miza­tion Risk?

Thane Ruthenis29 Aug 2022 18:37 UTC
13 points
2 comments3 min readLW link

Three sce­nar­ios of pseudo-al­ign­ment

Eleni Angelou3 Sep 2022 12:47 UTC
9 points
0 comments3 min readLW link

Wor­lds Where Iter­a­tive De­sign Fails

johnswentworth30 Aug 2022 20:48 UTC
207 points
30 comments10 min readLW link1 review

Align­ment is hard. Com­mu­ni­cat­ing that, might be harder

Eleni Angelou1 Sep 2022 16:57 UTC
7 points
8 comments3 min readLW link

Agency en­g­ineer­ing: is AI-al­ign­ment “to hu­man in­tent” enough?

catubc2 Sep 2022 18:14 UTC
9 points
10 comments6 min readLW link

Sticky goals: a con­crete ex­per­i­ment for un­der­stand­ing de­cep­tive alignment

evhub2 Sep 2022 21:57 UTC
39 points
13 comments3 min readLW link

A Game About AI Align­ment (& Meta-Ethics): What Are the Must Haves?

JonathanErhardt5 Sep 2022 7:55 UTC
18 points
15 comments2 min readLW link

Com­mu­nity Build­ing for Grad­u­ate Stu­dents: A Tar­geted Approach

Neil Crawford6 Sep 2022 17:17 UTC
6 points
0 comments4 min readLW link

It’s (not) how you use it

Eleni Angelou7 Sep 2022 17:15 UTC
8 points
1 comment2 min readLW link

Over­sight Leagues: The Train­ing Game as a Feature

Paul Bricman9 Sep 2022 10:08 UTC
20 points
6 comments10 min readLW link

Ide­olog­i­cal In­fer­ence Eng­ines: Mak­ing Deon­tol­ogy Differ­en­tiable*

Paul Bricman12 Sep 2022 12:00 UTC
6 points
0 comments14 min readLW link

AI Risk In­tro 1: Ad­vanced AI Might Be Very Bad

11 Sep 2022 10:57 UTC
46 points
13 comments30 min readLW link

AI Safety field-build­ing pro­jects I’d like to see

Akash11 Sep 2022 23:43 UTC
46 points
8 comments6 min readLW link

[Linkpost] A sur­vey on over 300 works about in­ter­pretabil­ity in deep networks

scasper12 Sep 2022 19:07 UTC
97 points
7 comments2 min readLW link
(arxiv.org)

Rep­re­sen­ta­tional Tethers: Ty­ing AI La­tents To Hu­man Ones

Paul Bricman16 Sep 2022 14:45 UTC
30 points
0 comments16 min readLW link

Risk aver­sion and GPT-3

casualphysicsenjoyer13 Sep 2022 20:50 UTC
1 point
0 comments1 min readLW link

[Question] Would a Misal­igned SSI Really Kill Us All?

DragonGod14 Sep 2022 12:15 UTC
6 points
7 comments6 min readLW link

Emily Brontë on: Psy­chol­ogy Re­quired for Se­ri­ous™ AGI Safety Research

robertzk14 Sep 2022 14:47 UTC
2 points
0 comments1 min readLW link

Pre­cise P(doom) isn’t very im­por­tant for pri­ori­ti­za­tion or strategy

harsimony14 Sep 2022 17:19 UTC
14 points
6 comments1 min readLW link

Re­spond­ing to ‘Beyond Hyper­an­thro­po­mor­phism’

ukc1001414 Sep 2022 20:37 UTC
9 points
0 comments16 min readLW link

Ca­pa­bil­ity and Agency as Corner­stones of AI risk ­— My cur­rent model

wilm15 Sep 2022 8:25 UTC
10 points
4 comments12 min readLW link

Un­der­stand­ing Con­jec­ture: Notes from Con­nor Leahy interview

Akash15 Sep 2022 18:37 UTC
107 points
23 comments15 min readLW link

[Question] Up­dates on FLI’s Value Alig­ment Map?

Fer32dwt34r3dfsz17 Sep 2022 22:27 UTC
17 points
4 comments1 min readLW link

Sum­maries: Align­ment Fun­da­men­tals Curriculum

Leon Lang18 Sep 2022 13:08 UTC
44 points
3 comments1 min readLW link
(docs.google.com)

Lev­er­ag­ing Le­gal In­for­mat­ics to Align AI

John Nay18 Sep 2022 20:39 UTC
11 points
0 comments3 min readLW link
(forum.effectivealtruism.org)

How to Train Your AGI Dragon

Eris Discordia21 Sep 2022 22:28 UTC
−1 points
3 comments5 min readLW link

AI Risk In­tro 2: Solv­ing The Problem

22 Sep 2022 13:55 UTC
22 points
0 comments27 min readLW link

In­ter­lude: But Who Op­ti­mizes The Op­ti­mizer?

Paul Bricman23 Sep 2022 15:30 UTC
15 points
0 comments10 min readLW link

[Question] Why Do AI re­searchers Rate the Prob­a­bil­ity of Doom So Low?

Aorou24 Sep 2022 2:33 UTC
7 points
6 comments3 min readLW link

On Generality

Eris Discordia26 Sep 2022 4:06 UTC
2 points
0 comments5 min readLW link

Oren’s Field Guide of Bad AGI Outcomes

Eris Discordia26 Sep 2022 4:06 UTC
0 points
0 comments1 min readLW link

(Struc­tural) Sta­bil­ity of Cou­pled Optimizers

Paul Bricman30 Sep 2022 11:28 UTC
25 points
0 comments10 min readLW link

Distri­bu­tion Shifts and The Im­por­tance of AI Safety

Leon Lang29 Sep 2022 22:38 UTC
17 points
2 comments12 min readLW link

Eli’s re­view of “Is power-seek­ing AI an ex­is­ten­tial risk?”

elifland30 Sep 2022 12:21 UTC
67 points
0 comments3 min readLW link
(docs.google.com)

[Question] Any fur­ther work on AI Safety Suc­cess Sto­ries?

Krieger2 Oct 2022 9:53 UTC
8 points
6 comments1 min readLW link

An­nounc­ing the AI Safety Nudge Com­pe­ti­tion to Help Beat Procrastination

Marc Carauleanu1 Oct 2022 1:49 UTC
10 points
0 comments1 min readLW link

Gen­er­a­tive, Epi­sodic Ob­jec­tives for Safe AI

Michael Glass5 Oct 2022 23:18 UTC
11 points
3 comments8 min readLW link

[Linkpost] “Blueprint for an AI Bill of Rights”—Office of Science and Tech­nol­ogy Policy, USA (2022)

Fer32dwt34r3dfsz5 Oct 2022 16:42 UTC
9 points
4 comments2 min readLW link
(www.whitehouse.gov)

What does it mean for an AGI to be ‘safe’?

So8res7 Oct 2022 4:13 UTC
74 points
29 comments3 min readLW link

Pos­si­ble miracles

9 Oct 2022 18:17 UTC
64 points
34 comments8 min readLW link

In­stru­men­tal con­ver­gence in sin­gle-agent systems

12 Oct 2022 12:24 UTC
33 points
4 comments8 min readLW link
(www.gladstone.ai)

Cat­a­logu­ing Pri­ors in The­ory and Practice

Paul Bricman13 Oct 2022 12:36 UTC
13 points
8 comments7 min readLW link

Misal­ign­ment-by-de­fault in multi-agent systems

13 Oct 2022 15:38 UTC
21 points
8 comments20 min readLW link
(www.gladstone.ai)

In­stru­men­tal con­ver­gence: scale and phys­i­cal interactions

14 Oct 2022 15:50 UTC
22 points
0 comments17 min readLW link
(www.gladstone.ai)

Power-Seek­ing AI and Ex­is­ten­tial Risk

Antonio Franca11 Oct 2022 22:50 UTC
6 points
0 comments9 min readLW link

Nice­ness is unnatural

So8res13 Oct 2022 1:30 UTC
125 points
20 comments8 min readLW link1 review

Greed Is the Root of This Evil

Thane Ruthenis13 Oct 2022 20:40 UTC
18 points
7 comments8 min readLW link

Prov­ably Hon­est—A First Step

Srijanak De5 Nov 2022 19:18 UTC
10 points
2 comments8 min readLW link

[Question] How easy is it to su­per­vise pro­cesses vs out­comes?

Noosphere8918 Oct 2022 17:48 UTC
3 points
0 comments1 min readLW link

Re­sponse to Katja Grace’s AI x-risk counterarguments

19 Oct 2022 1:17 UTC
77 points
18 comments15 min readLW link

POWER­play: An open-source toolchain to study AI power-seeking

Edouard Harris24 Oct 2022 20:03 UTC
29 points
0 comments1 min readLW link
(github.com)

Wor­ld­view iPeo­ple—Fu­ture Fund’s AI Wor­ld­view Prize

Toni MUENDEL28 Oct 2022 1:53 UTC
−22 points
4 comments9 min readLW link

AI Re­searchers On AI Risk

Scott Alexander22 May 2015 11:16 UTC
19 points
0 comments16 min readLW link

AI as a Civ­i­liza­tional Risk Part 2/​6: Be­hav­ioral Modification

PashaKamyshev30 Oct 2022 16:57 UTC
9 points
0 comments10 min readLW link

AI as a Civ­i­liza­tional Risk Part 3/​6: Anti-econ­omy and Sig­nal Pollution

PashaKamyshev31 Oct 2022 17:03 UTC
7 points
4 comments14 min readLW link

AI as a Civ­i­liza­tional Risk Part 4/​6: Bioweapons and Philos­o­phy of Modification

PashaKamyshev1 Nov 2022 20:50 UTC
7 points
1 comment8 min readLW link

AI as a Civ­i­liza­tional Risk Part 5/​6: Re­la­tion­ship be­tween C-risk and X-risk

PashaKamyshev3 Nov 2022 2:19 UTC
2 points
0 comments7 min readLW link

AI as a Civ­i­liza­tional Risk Part 6/​6: What can be done

PashaKamyshev3 Nov 2022 19:48 UTC
2 points
4 comments4 min readLW link

Am I se­cretly ex­cited for AI get­ting weird?

porby29 Oct 2022 22:16 UTC
116 points
4 comments4 min readLW link

My (naive) take on Risks from Learned Optimization

Artyom Karpov31 Oct 2022 10:59 UTC
7 points
0 comments5 min readLW link

Clar­ify­ing AI X-risk

1 Nov 2022 11:03 UTC
127 points
24 comments4 min readLW link1 review

Threat Model Liter­a­ture Review

1 Nov 2022 11:03 UTC
77 points
4 comments25 min readLW link

Why do we post our AI safety plans on the In­ter­net?

Peter S. Park3 Nov 2022 16:02 UTC
4 points
4 comments11 min readLW link

My sum­mary of “Prag­matic AI Safety”

Eleni Angelou5 Nov 2022 12:54 UTC
3 points
0 comments5 min readLW link

4 Key As­sump­tions in AI Safety

Prometheus7 Nov 2022 10:50 UTC
20 points
5 comments7 min readLW link

Loss of con­trol of AI is not a likely source of AI x-risk

squek7 Nov 2022 18:44 UTC
−6 points
0 comments5 min readLW link

AI Safety Un­con­fer­ence NeurIPS 2022

Orpheus7 Nov 2022 15:39 UTC
25 points
0 comments1 min readLW link
(aisafetyevents.org)

Value For­ma­tion: An Over­ar­ch­ing Model

Thane Ruthenis15 Nov 2022 17:16 UTC
34 points
20 comments34 min readLW link

Is AI Gain-of-Func­tion re­search a thing?

MadHatter12 Nov 2022 2:33 UTC
9 points
2 comments2 min readLW link

The limited up­side of interpretability

Peter S. Park15 Nov 2022 18:46 UTC
13 points
11 comments1 min readLW link

Con­jec­ture: a ret­ro­spec­tive af­ter 8 months of work

23 Nov 2022 17:10 UTC
180 points
9 comments8 min readLW link

Con­jec­ture Se­cond Hiring Round

23 Nov 2022 17:11 UTC
92 points
0 comments1 min readLW link

Cor­rigi­bil­ity Via Thought-Pro­cess Deference

Thane Ruthenis24 Nov 2022 17:06 UTC
17 points
5 comments9 min readLW link

Dis­cussing how to al­ign Trans­for­ma­tive AI if it’s de­vel­oped very soon

elifland28 Nov 2022 16:17 UTC
37 points
2 comments28 min readLW link

[Question] Is there any policy for a fair treat­ment of AIs whose friendli­ness is in doubt?

nahoj18 Nov 2022 19:01 UTC
15 points
10 comments1 min readLW link

[Question] Whom Do You Trust?

JackOfAllTrades26 Feb 2024 19:38 UTC
1 point
0 comments1 min readLW link

AI can ex­ploit safety plans posted on the Internet

Peter S. Park4 Dec 2022 12:17 UTC
−15 points
4 comments1 min readLW link

Race to the Top: Bench­marks for AI Safety

Isabella Duan4 Dec 2022 18:48 UTC
28 points
6 comments1 min readLW link

[Question] Who are some promi­nent rea­son­able peo­ple who are con­fi­dent that AI won’t kill ev­ery­one?

Optimization Process5 Dec 2022 9:12 UTC
72 points
54 comments1 min readLW link

Aligned Be­hav­ior is not Ev­i­dence of Align­ment Past a Cer­tain Level of Intelligence

Ronny Fernandez5 Dec 2022 15:19 UTC
19 points
5 comments7 min readLW link

Fore­sight for AGI Safety Strat­egy: Miti­gat­ing Risks and Iden­ti­fy­ing Golden Opportunities

jacquesthibs5 Dec 2022 16:09 UTC
28 points
6 comments8 min readLW link

AI Safety in a Vuln­er­a­ble World: Re­quest­ing Feed­back on Pre­limi­nary Thoughts

Jordan Arel6 Dec 2022 22:35 UTC
4 points
2 comments3 min readLW link

Fear miti­gated the nu­clear threat, can it do the same to AGI risks?

Igor Ivanov9 Dec 2022 10:04 UTC
6 points
8 comments5 min readLW link

Reflec­tions on the PIBBSS Fel­low­ship 2022

11 Dec 2022 21:53 UTC
32 points
0 comments18 min readLW link

[Question] Best in­tro­duc­tory overviews of AGI safety?

JakubK13 Dec 2022 19:01 UTC
21 points
9 comments2 min readLW link
(forum.effectivealtruism.org)

Com­pu­ta­tional sig­na­tures of psychopathy

Cameron Berg19 Dec 2022 17:01 UTC
29 points
3 comments20 min readLW link

Why I think that teach­ing philos­o­phy is high impact

Eleni Angelou19 Dec 2022 3:11 UTC
5 points
0 comments2 min readLW link

[Question] Will re­search in AI risk jinx it? Con­se­quences of train­ing AI on AI risk arguments

Yann Dubois19 Dec 2022 22:42 UTC
5 points
6 comments1 min readLW link

AGI Timelines in Gover­nance: Differ­ent Strate­gies for Differ­ent Timeframes

19 Dec 2022 21:31 UTC
65 points
28 comments10 min readLW link

New AI risk in­tro from Vox [link post]

JakubK21 Dec 2022 6:00 UTC
5 points
1 comment2 min readLW link
(www.vox.com)

[Question] Or­a­cle AGI—How can it es­cape, other than se­cu­rity is­sues? (Steganog­ra­phy?)

RationalSieve25 Dec 2022 20:14 UTC
3 points
6 comments1 min readLW link

Ac­cu­rate Models of AI Risk Are Hyper­ex­is­ten­tial Exfohazards

Thane Ruthenis25 Dec 2022 16:50 UTC
31 points
38 comments9 min readLW link

Safety of Self-Assem­bled Neu­ro­mor­phic Hardware

Can26 Dec 2022 18:51 UTC
15 points
2 comments10 min readLW link
(forum.effectivealtruism.org)

In­tro­duc­tion: Bias in Eval­u­at­ing AGI X-Risks

27 Dec 2022 10:27 UTC
1 point
0 comments3 min readLW link

Mere ex­po­sure effect: Bias in Eval­u­at­ing AGI X-Risks

27 Dec 2022 14:05 UTC
0 points
2 comments1 min readLW link

Band­wagon effect: Bias in Eval­u­at­ing AGI X-Risks

28 Dec 2022 7:54 UTC
−1 points
0 comments1 min readLW link

In Defense of Wrap­per-Minds

Thane Ruthenis28 Dec 2022 18:28 UTC
24 points
38 comments3 min readLW link

Friendly and Un­friendly AGI are Indistinguishable

ErgoEcho29 Dec 2022 22:13 UTC
−4 points
4 comments4 min readLW link
(neologos.co)

In­ter­nal In­ter­faces Are a High-Pri­or­ity In­ter­pretabil­ity Target

Thane Ruthenis29 Dec 2022 17:49 UTC
26 points
6 comments7 min readLW link

CFP for Re­bel­lion and Di­sobe­di­ence in AI workshop

Ram Rachum29 Dec 2022 16:08 UTC
15 points
0 comments1 min readLW link

Re­ac­tive de­val­u­a­tion: Bias in Eval­u­at­ing AGI X-Risks

30 Dec 2022 9:02 UTC
−15 points
9 comments1 min readLW link

Curse of knowl­edge and Naive re­al­ism: Bias in Eval­u­at­ing AGI X-Risks

31 Dec 2022 13:33 UTC
−7 points
1 comment1 min readLW link
(www.lesswrong.com)

[Question] Are Mix­ture-of-Ex­perts Trans­form­ers More In­ter­pretable Than Dense Trans­form­ers?

simeon_c31 Dec 2022 11:34 UTC
8 points
5 comments1 min readLW link

Challenge to the no­tion that any­thing is (maybe) pos­si­ble with AGI

1 Jan 2023 3:57 UTC
−27 points
4 comments1 min readLW link
(mflb.com)

Sum­mary of 80k’s AI prob­lem profile

JakubK1 Jan 2023 7:30 UTC
7 points
0 comments5 min readLW link
(forum.effectivealtruism.org)

Belief Bias: Bias in Eval­u­at­ing AGI X-Risks

2 Jan 2023 8:59 UTC
−10 points
1 comment1 min readLW link

Sta­tus quo bias; Sys­tem jus­tifi­ca­tion: Bias in Eval­u­at­ing AGI X-Risks

3 Jan 2023 2:50 UTC
−11 points
0 comments1 min readLW link

Causal rep­re­sen­ta­tion learn­ing as a tech­nique to pre­vent goal misgeneralization

PabloAMC4 Jan 2023 0:07 UTC
21 points
0 comments8 min readLW link

Illu­sion of truth effect and Am­bi­guity effect: Bias in Eval­u­at­ing AGI X-Risks

Remmelt5 Jan 2023 4:05 UTC
−13 points
2 comments1 min readLW link

AI Safety Camp, Vir­tual Edi­tion 2023

Linda Linsefors6 Jan 2023 11:09 UTC
40 points
10 comments3 min readLW link
(aisafety.camp)

AI Safety Camp: Ma­chine Learn­ing for Scien­tific Dis­cov­ery

Eleni Angelou6 Jan 2023 3:21 UTC
3 points
0 comments1 min readLW link

An­chor­ing fo­cal­ism and the Iden­ti­fi­able vic­tim effect: Bias in Eval­u­at­ing AGI X-Risks

Remmelt7 Jan 2023 9:59 UTC
1 point
2 comments1 min readLW link

Big list of AI safety videos

JakubK9 Jan 2023 6:12 UTC
11 points
2 comments1 min readLW link
(docs.google.com)

The Align­ment Prob­lem from a Deep Learn­ing Per­spec­tive (ma­jor rewrite)

10 Jan 2023 16:06 UTC
84 points
8 comments39 min readLW link
(arxiv.org)

[Question] Could Si­mu­lat­ing an AGI Tak­ing Over the World Ac­tu­ally Lead to a LLM Tak­ing Over the World?

simeon_c13 Jan 2023 6:33 UTC
15 points
1 comment1 min readLW link

Reflec­tions on Trust­ing Trust & AI

Itay Yona16 Jan 2023 6:36 UTC
10 points
1 comment3 min readLW link
(mentaleap.ai)

OpenAI’s Align­ment Plan is not S.M.A.R.T.

Søren Elverlin18 Jan 2023 6:39 UTC
9 points
19 comments4 min readLW link

Gra­di­ent Filtering

18 Jan 2023 20:09 UTC
55 points
16 comments13 min readLW link

6-para­graph AI risk in­tro for MAISI

JakubK19 Jan 2023 9:22 UTC
11 points
0 comments2 min readLW link
(www.maisi.club)

List of tech­ni­cal AI safety ex­er­cises and projects

JakubK19 Jan 2023 9:35 UTC
41 points
5 comments1 min readLW link
(docs.google.com)

NYT: Google will “re­cal­ibrate” the risk of re­leas­ing AI due to com­pe­ti­tion with OpenAI

Michael Huang22 Jan 2023 8:38 UTC
47 points
2 comments1 min readLW link
(www.nytimes.com)

Next steps af­ter AGISF at UMich

JakubK25 Jan 2023 20:57 UTC
10 points
0 comments5 min readLW link
(docs.google.com)

What is the ground re­al­ity of coun­tries tak­ing steps to re­cal­ibrate AI de­vel­op­ment to­wards Align­ment first?

Nebuch29 Jan 2023 13:26 UTC
8 points
6 comments3 min readLW link

In­ter­views with 97 AI Re­searchers: Quan­ti­ta­tive Analysis

2 Feb 2023 1:01 UTC
23 points
0 comments7 min readLW link

[Question] What qual­ities does an AGI need to have to re­al­ize the risk of false vac­uum, with­out hard­cod­ing physics the­o­ries into it?

RationalSieve3 Feb 2023 16:00 UTC
1 point
4 comments1 min readLW link

Monthly Doom Ar­gu­ment Threads? Doom Ar­gu­ment Wiki?

LVSN4 Feb 2023 16:59 UTC
3 points
0 comments1 min readLW link

Se­cond call: CFP for Re­bel­lion and Di­sobe­di­ence in AI workshop

Ram Rachum5 Feb 2023 12:18 UTC
2 points
0 comments2 min readLW link

Early situ­a­tional aware­ness and its im­pli­ca­tions, a story

Jacob Pfau6 Feb 2023 20:45 UTC
29 points
6 comments3 min readLW link

Is this a weak pivotal act: cre­at­ing nanobots that eat evil AGIs (but noth­ing else)?

Christopher King10 Feb 2023 19:26 UTC
0 points
3 comments1 min readLW link

The Im­por­tance of AI Align­ment, ex­plained in 5 points

Daniel_Eth11 Feb 2023 2:56 UTC
33 points
2 comments1 min readLW link

Near-Term Risks of an Obe­di­ent Ar­tifi­cial Intelligence

ymeskhout18 Feb 2023 18:30 UTC
20 points
1 comment6 min readLW link

Should we cry “wolf”?

Tapatakt18 Feb 2023 11:24 UTC
24 points
5 comments1 min readLW link

The pub­lic sup­ports reg­u­lat­ing AI for safety

Zach Stein-Perlman17 Feb 2023 4:10 UTC
114 points
9 comments1 min readLW link
(aiimpacts.org)

AGI doesn’t need un­der­stand­ing, in­ten­tion, or con­scious­ness in or­der to kill us, only intelligence

James Blaha20 Feb 2023 0:55 UTC
10 points
2 comments18 min readLW link

Bing find­ing ways to by­pass Microsoft’s filters with­out be­ing asked. Is it re­pro­ducible?

Christopher King20 Feb 2023 15:11 UTC
27 points
15 comments1 min readLW link

De­cep­tive Align­ment is <1% Likely by Default

DavidW21 Feb 2023 15:09 UTC
90 points
29 comments14 min readLW link

An­nounc­ing aisafety.training

JJ Hepburn21 Jan 2023 1:01 UTC
61 points
4 comments1 min readLW link

Is there a ML agent that aban­dons it’s util­ity func­tion out-of-dis­tri­bu­tion with­out los­ing ca­pa­bil­ities?

Christopher King22 Feb 2023 16:49 UTC
1 point
7 comments1 min readLW link

Au­to­mated Sand­wich­ing & Quan­tify­ing Hu­man-LLM Co­op­er­a­tion: ScaleOver­sight hackathon results

23 Feb 2023 10:48 UTC
8 points
0 comments6 min readLW link

Re­search pro­posal: Lev­er­ag­ing Jun­gian archetypes to cre­ate val­ues-based models

MiguelDev5 Mar 2023 17:39 UTC
5 points
2 comments2 min readLW link

Ret­ro­spec­tive on the 2022 Con­jec­ture AI Discussions

Andrea_Miotti24 Feb 2023 22:41 UTC
90 points
5 comments2 min readLW link

Chris­ti­ano (ARC) and GA (Con­jec­ture) Dis­cuss Align­ment Cruxes

24 Feb 2023 23:03 UTC
61 points
7 comments47 min readLW link

[Question] Pink Shog­goths: What does al­ign­ment look like in prac­tice?

Yuli_Ban25 Feb 2023 12:23 UTC
25 points
13 comments11 min readLW link

[Question] Would more model evals teams be good?

Ryan Kidd25 Feb 2023 22:01 UTC
20 points
4 comments1 min readLW link

Cu­ri­os­ity as a Solu­tion to AGI Alignment

Harsha G.26 Feb 2023 23:36 UTC
7 points
7 comments3 min readLW link

Ta­boo “hu­man-level in­tel­li­gence”

Sherrinford26 Feb 2023 20:42 UTC
12 points
7 comments1 min readLW link

The idea of an “al­igned su­per­in­tel­li­gence” seems misguided

ssadler27 Feb 2023 11:19 UTC
6 points
7 comments3 min readLW link
(ssadler.substack.com)

Tran­script: Yud­kowsky on Ban­kless fol­low-up Q&A

vonk28 Feb 2023 3:46 UTC
54 points
40 comments22 min readLW link

The bur­den of knowing

arisAlexis28 Feb 2023 18:40 UTC
5 points
0 comments2 min readLW link

Call for Cruxes by Rhyme, a Longter­mist His­tory Consultancy

Lara1 Mar 2023 18:39 UTC
1 point
0 comments3 min readLW link
(forum.effectivealtruism.org)

Reflec­tion Mechanisms as an Align­ment Tar­get—At­ti­tudes on “near-term” AI

2 Mar 2023 4:29 UTC
21 points
0 comments8 min readLW link

Con­scious­ness is ir­rele­vant—in­stead solve al­ign­ment by ask­ing this question

Oliver Siegel4 Mar 2023 22:06 UTC
−10 points
6 comments1 min readLW link

Why kill ev­ery­one?

arisAlexis5 Mar 2023 11:53 UTC
7 points
5 comments2 min readLW link

Is it time to talk about AI dooms­day prep­ping yet?

bokov5 Mar 2023 21:17 UTC
0 points
8 comments1 min readLW link

Who Aligns the Align­ment Re­searchers?

Ben Smith5 Mar 2023 23:22 UTC
48 points
0 comments11 min readLW link

Cap Model Size for AI Safety

research_prime_space6 Mar 2023 1:11 UTC
0 points
4 comments1 min readLW link

In­tro­duc­ing AI Align­ment Inc., a Cal­ifor­nia pub­lic benefit cor­po­ra­tion...

TherapistAI7 Mar 2023 18:47 UTC
1 point
4 comments1 min readLW link

[Question] What‘s in your list of un­solved prob­lems in AI al­ign­ment?

jacquesthibs7 Mar 2023 18:58 UTC
60 points
9 comments1 min readLW link

Pod­cast Tran­script: Daniela and Dario Amodei on Anthropic

remember7 Mar 2023 16:47 UTC
46 points
2 comments79 min readLW link
(futureoflife.org)

Why Un­con­trol­lable AI Looks More Likely Than Ever

8 Mar 2023 15:41 UTC
18 points
0 comments4 min readLW link
(time.com)

Speed run­ning ev­ery­one through the bad al­ign­ment bingo. $5k bounty for a LW con­ver­sa­tional agent

ArthurB9 Mar 2023 9:26 UTC
140 points
33 comments2 min readLW link

An­thropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster9 Mar 2023 17:34 UTC
17 points
1 comment22 min readLW link
(www.anthropic.com)

Value drift threat models

Garrett Baker12 May 2023 23:03 UTC
27 points
4 comments5 min readLW link

Every­thing’s nor­mal un­til it’s not

Eleni Angelou10 Mar 2023 2:02 UTC
7 points
0 comments3 min readLW link

Is AI Safety drop­ping the ball on pri­vacy?

markov13 Sep 2023 13:07 UTC
50 points
17 comments7 min readLW link

The hu­man­ity’s biggest mistake

RomanS10 Mar 2023 16:30 UTC
0 points
1 comment2 min readLW link

On tak­ing AI risk se­ri­ously

Eleni Angelou13 Mar 2023 5:50 UTC
6 points
0 comments1 min readLW link
(www.nytimes.com)

We don’t trade with ants

KatjaGrace10 Jan 2023 23:50 UTC
269 points
109 comments7 min readLW link1 review
(worldspiritsockpuppet.com)

Linkpost: A tale of 2.5 or­thog­o­nal­ity theses

DavidW13 Mar 2023 14:19 UTC
9 points
3 comments1 min readLW link
(forum.effectivealtruism.org)

Linkpost: A Con­tra AI FOOM Read­ing List

DavidW13 Mar 2023 14:45 UTC
25 points
4 comments1 min readLW link
(magnusvinding.com)

Linkpost: ‘Dis­solv­ing’ AI Risk – Pa­ram­e­ter Uncer­tainty in AI Fu­ture Forecasting

DavidW13 Mar 2023 16:52 UTC
6 points
0 comments1 min readLW link
(forum.effectivealtruism.org)

Could Roko’s basilisk acausally bar­gain with a pa­per­clip max­i­mizer?

Christopher King13 Mar 2023 18:21 UTC
1 point
8 comments1 min readLW link

A bet­ter anal­ogy and ex­am­ple for teach­ing AI takeover: the ML Inferno

Christopher King14 Mar 2023 19:14 UTC
18 points
0 comments5 min readLW link

New eco­nomic sys­tem for AI era

ksme sho17 Mar 2023 17:42 UTC
−1 points
1 comment5 min readLW link

Sur­vey on in­ter­me­di­ate goals in AI governance

17 Mar 2023 13:12 UTC
25 points
3 comments1 min readLW link

(re­tired ar­ti­cle) AGI With In­ter­net Ac­cess: Why we won’t stuff the ge­nie back in its bot­tle.

Max TK18 Mar 2023 3:43 UTC
5 points
10 comments4 min readLW link

An Ap­peal to AI Su­per­in­tel­li­gence: Rea­sons Not to Pre­serve (most of) Humanity

Alex Beyman22 Mar 2023 4:09 UTC
−15 points
6 comments19 min readLW link

Hu­man­ity’s Lack of Unity Will Lead to AGI Catastrophe

MiguelDev19 Mar 2023 19:18 UTC
3 points
2 comments4 min readLW link

[Question] Wouldn’t an in­tel­li­gent agent keep us al­ive and help us al­ign it­self to our val­ues in or­der to pre­vent risk ? by Risk I mean ex­per­i­men­ta­tion by try­ing to al­ign po­ten­tially smarter repli­cas?

Terrence Rotoufle21 Mar 2023 17:44 UTC
−3 points
1 comment2 min readLW link

[Question] What does pul­ling the fire alarm look like?

nem20 Mar 2023 21:45 UTC
2 points
0 comments1 min readLW link

[Question] Will the first AGI agent have been de­signed as an agent (in ad­di­tion to an AGI)?

nahoj3 Dec 2022 20:32 UTC
1 point
8 comments1 min readLW link

Biose­cu­rity and AI: Risks and Opportunities

Steve Newman27 Feb 2024 18:45 UTC
11 points
1 comment7 min readLW link
(www.safe.ai)

Is Align­ment enough?

gchu16 Aug 2024 11:46 UTC
1 point
0 comments1 min readLW link

The con­ver­gent dy­namic we missed

Remmelt12 Dec 2023 23:19 UTC
2 points
2 comments1 min readLW link

Cor­po­rate Gover­nance for Fron­tier AI Labs: A Re­search Agenda

Matthew Wearden28 Feb 2024 11:29 UTC
4 points
0 comments16 min readLW link
(matthewwearden.co.uk)

AI Safety pro­posal—In­fluenc­ing the su­per­in­tel­li­gence explosion

Morgan22 May 2024 23:31 UTC
0 points
2 comments7 min readLW link

Post se­ries on “Li­a­bil­ity Law for re­duc­ing Ex­is­ten­tial Risk from AI”

Nora_Ammann29 Feb 2024 4:39 UTC
42 points
1 comment1 min readLW link
(forum.effectivealtruism.org)

In­cre­men­tal AI Risks from Proxy-Simulations

kmenou19 Dec 2023 18:56 UTC
2 points
0 comments1 min readLW link
(individual.utoronto.ca)

Mo­ral­ity as Co­op­er­a­tion Part I: Humans

DeLesley Hutchins5 Dec 2024 8:16 UTC
5 points
0 comments19 min readLW link

On the fu­ture of lan­guage models

owencb20 Dec 2023 16:58 UTC
105 points
17 comments1 min readLW link

What should AI safety be try­ing to achieve?

EuanMcLean23 May 2024 11:17 UTC
16 points
0 comments13 min readLW link

Mo­ral­ity as Co­op­er­a­tion Part II: The­ory and Experiment

DeLesley Hutchins5 Dec 2024 9:04 UTC
2 points
0 comments17 min readLW link

Mo­ral­ity as Co­op­er­a­tion Part III: Failure Modes

DeLesley Hutchins5 Dec 2024 9:39 UTC
4 points
0 comments20 min readLW link

Big Pic­ture AI Safety: Introduction

EuanMcLean23 May 2024 11:15 UTC
46 points
7 comments5 min readLW link

“De­stroy hu­man­ity” as an im­me­di­ate subgoal

Seth Ahrenbach22 Dec 2023 18:52 UTC
3 points
13 comments3 min readLW link

AI safety ad­vo­cates should con­sider pro­vid­ing gen­tle push­back fol­low­ing the events at OpenAI

civilsociety22 Dec 2023 18:55 UTC
16 points
5 comments3 min readLW link

What will the first hu­man-level AI look like, and how might things go wrong?

EuanMcLean23 May 2024 11:17 UTC
20 points
2 comments15 min readLW link

What mis­takes has the AI safety move­ment made?

EuanMcLean23 May 2024 11:19 UTC
63 points
29 comments12 min readLW link

More Thoughts on the Hu­man-AGI War

Seth Ahrenbach27 Dec 2023 1:03 UTC
−3 points
4 comments7 min readLW link

Reflect­ing on the tran­shu­man­ist re­but­tal to AI ex­is­ten­tial risk and cri­tique of our de­bate method­olo­gies and mi­suse of statistics

catgirlsruletheworld20 Aug 2024 1:59 UTC
−5 points
0 comments4 min readLW link

Some­thing Is Lost When AI Makes Art

utilistrutil18 Aug 2024 22:53 UTC
16 points
1 comment11 min readLW link

Plan­ning to build a cryp­to­graphic box with perfect secrecy

Lysandre Terrisse31 Dec 2023 9:31 UTC
40 points
6 comments11 min readLW link

In­ves­ti­gat­ing Alter­na­tive Fu­tures: Hu­man and Su­per­in­tel­li­gence In­ter­ac­tion Scenarios

Hiroshi Yamakawa3 Jan 2024 23:46 UTC
1 point
0 comments17 min readLW link

Does AI care about re­al­ity or just its own per­cep­tion?

RedFishBlueFish5 Jan 2024 4:05 UTC
−6 points
8 comments1 min readLW link

Towards AI Safety In­fras­truc­ture: Talk & Outline

Paul Bricman7 Jan 2024 9:31 UTC
11 points
0 comments2 min readLW link
(www.youtube.com)

AI de­mands un­prece­dented reliability

Jono9 Jan 2024 16:30 UTC
22 points
5 comments2 min readLW link

Sur­vey of 2,778 AI au­thors: six parts in pictures

KatjaGrace6 Jan 2024 4:43 UTC
80 points
1 comment2 min readLW link

[Question] What’s the pro­to­col for if a novice has ML ideas that are un­likely to work, but might im­prove ca­pa­bil­ities if they do work?

drocta9 Jan 2024 22:51 UTC
6 points
2 comments2 min readLW link

Two Tales of AI Takeover: My Doubts

Violet Hour5 Mar 2024 15:51 UTC
30 points
8 comments29 min readLW link

Disprov­ing and par­tially fix­ing a fully ho­mo­mor­phic en­cryp­tion scheme with perfect secrecy

Lysandre Terrisse26 May 2024 14:56 UTC
16 points
1 comment18 min readLW link

The Un­der­re­ac­tion to OpenAI

Sherrinford18 Jan 2024 22:08 UTC
21 points
0 comments6 min readLW link

Brain­storm­ing: Slow Takeoff

David Piepgrass23 Jan 2024 6:58 UTC
2 points
0 comments51 min readLW link

OpenAI Credit Ac­count (2510$)

Emirhan BULUT21 Jan 2024 2:32 UTC
1 point
0 comments1 min readLW link

An­nounc­ing Con­ver­gence Anal­y­sis: An In­sti­tute for AI Sce­nario & Gover­nance Research

7 Mar 2024 21:37 UTC
23 points
1 comment4 min readLW link

RAND re­port finds no effect of cur­rent LLMs on vi­a­bil­ity of bioter­ror­ism attacks

StellaAthena25 Jan 2024 19:17 UTC
94 points
14 comments1 min readLW link
(www.rand.org)

What Failure Looks Like is not an ex­is­ten­tial risk (and al­ign­ment is not the solu­tion)

otto.barten2 Feb 2024 18:59 UTC
13 points
12 comments9 min readLW link

An­nounc­ing the Lon­don Ini­ti­a­tive for Safe AI (LISA)

2 Feb 2024 23:17 UTC
98 points
0 comments9 min readLW link

Why I think it’s net harm­ful to do tech­ni­cal safety re­search at AGI labs

Remmelt7 Feb 2024 4:17 UTC
26 points
24 comments1 min readLW link

Sce­nario plan­ning for AI x-risk

Corin Katzke10 Feb 2024 0:14 UTC
24 points
12 comments14 min readLW link
(forum.effectivealtruism.org)

Carl Shul­man On Dwarkesh Pod­cast June 2023

Moonicker11 Feb 2024 21:02 UTC
18 points
0 comments159 min readLW link

Tort Law Can Play an Im­por­tant Role in Miti­gat­ing AI Risk

Gabriel Weil12 Feb 2024 17:17 UTC
38 points
9 comments5 min readLW link

The Astro­nom­i­cal Sacri­fice Dilemma

Matthew McRedmond11 Mar 2024 19:58 UTC
15 points
3 comments4 min readLW link

AI In­ci­dent Re­port­ing: A Reg­u­la­tory Review

11 Mar 2024 21:03 UTC
16 points
0 comments6 min readLW link

Dar­wi­nian Traps and Ex­is­ten­tial Risks

KristianRonn25 Aug 2024 22:37 UTC
80 points
14 comments10 min readLW link

Thoughts on the Fea­si­bil­ity of Pro­saic AGI Align­ment?

iamthouthouarti21 Aug 2020 23:25 UTC
8 points
10 comments1 min readLW link

The Shut­down Prob­lem: In­com­plete Prefer­ences as a Solution

EJT23 Feb 2024 16:01 UTC
52 points
29 comments42 min readLW link

Con­trol­ling AGI Risk

TeaSea15 Mar 2024 4:56 UTC
6 points
8 comments4 min readLW link

AI-gen­er­ated opi­oids could be a catas­trophic risk

ejk6420 Mar 2024 17:48 UTC
0 points
2 comments3 min readLW link

Static vs Dy­namic Alignment

Gracie Green21 Mar 2024 17:44 UTC
5 points
0 comments29 min readLW link

Timelines to Trans­for­ma­tive AI: an investigation

Zershaaneh Qureshi26 Mar 2024 18:28 UTC
20 points
2 comments50 min readLW link

Open-ended ethics of phe­nom­ena (a desider­ata with uni­ver­sal moral­ity)

Ryo 8 Nov 2023 20:10 UTC
1 point
0 comments8 min readLW link

Ar­tifi­cial In­tel­li­gence and Liv­ing Wisdom

TMFOW29 Mar 2024 7:41 UTC
−6 points
1 comment17 min readLW link
(tmfow.substack.com)

[Question] Will OpenAI also re­quire a “Su­per Red Team Agent” for its “Su­per­al­ign­ment” Pro­ject?

Super AGI30 Mar 2024 5:25 UTC
2 points
2 comments1 min readLW link

Death with Awesomeness

osmarks1 Apr 2024 20:24 UTC
3 points
2 comments2 min readLW link

Thou­sands of mal­i­cious ac­tors on the fu­ture of AI misuse

1 Apr 2024 10:08 UTC
37 points
0 comments1 min readLW link

A Semiotic Cri­tique of the Orthog­o­nal­ity Thesis

Nicolas Villarreal4 Jun 2024 18:52 UTC
3 points
10 comments15 min readLW link

$250K in Prizes: SafeBench Com­pe­ti­tion An­nounce­ment

ozhang3 Apr 2024 22:07 UTC
26 points
0 comments1 min readLW link

[Question] How does the ever-in­creas­ing use of AI in the mil­i­tary for the di­rect pur­pose of mur­der­ing peo­ple af­fect your p(doom)?

Justausername6 Apr 2024 6:31 UTC
19 points
16 comments1 min readLW link

Why em­piri­cists should be­lieve in AI risk

Knight Lee11 Dec 2024 3:51 UTC
5 points
0 comments1 min readLW link

[Question] fake al­ign­ment solu­tions????

KvmanThinking11 Dec 2024 3:31 UTC
1 point
6 comments1 min readLW link

[Question] Can sin­gu­lar­ity emerge from trans­form­ers?

MP8 Apr 2024 14:26 UTC
−3 points
1 comment1 min readLW link

An­nounc­ing At­las Computing

miyazono11 Apr 2024 15:56 UTC
44 points
4 comments4 min readLW link

Ap­ply to the Pivotal Re­search Fel­low­ship (AI Safety & Biose­cu­rity)

10 Apr 2024 12:08 UTC
18 points
0 comments1 min readLW link

I cre­ated an Asi Align­ment Tier List

TimeGoat21 Apr 2024 18:44 UTC
−6 points
0 comments1 min readLW link

Take­off speeds pre­sen­ta­tion at Anthropic

Tom Davidson4 Jun 2024 22:46 UTC
92 points
0 comments25 min readLW link

LLMs seem (rel­a­tively) safe

JustisMills25 Apr 2024 22:13 UTC
53 points
24 comments7 min readLW link
(justismills.substack.com)

The po­ten­tial Ai risk

The White Death7 May 2024 20:31 UTC
1 point
0 comments1 min readLW link

Is There a Power Play Over­hang?

crispweed8 May 2024 17:39 UTC
3 points
0 comments1 min readLW link
(upcoder.com)

An­nounc­ing the AI Safety Sum­mit Talks with Yoshua Bengio

otto.barten14 May 2024 12:52 UTC
9 points
1 comment1 min readLW link

MIT Fu­tureTech are hiring for an Oper­a­tions and Pro­ject Man­age­ment role.

peterslattery17 May 2024 23:21 UTC
2 points
0 comments3 min readLW link

AI 2030 – AI Policy Roadmap

LTM17 May 2024 23:29 UTC
8 points
0 comments1 min readLW link

Giv­ing away your predictions

8 Sep 2024 13:11 UTC
1 point
0 comments1 min readLW link

La­bor Par­ti­ci­pa­tion is a High-Pri­or­ity AI Align­ment Risk

alex17 Jun 2024 18:09 UTC
4 points
0 comments17 min readLW link

Propos­ing the Post-Sin­gu­lar­ity Sym­biotic Researches

Hiroshi Yamakawa20 Jun 2024 4:05 UTC
5 points
0 comments12 min readLW link

Con­nect­ing the Dots: LLMs can In­fer & Ver­bal­ize La­tent Struc­ture from Train­ing Data

21 Jun 2024 15:54 UTC
160 points
13 comments8 min readLW link
(arxiv.org)

Grey Goo Re­quires AI

harsimony15 Jan 2021 4:45 UTC
8 points
11 comments4 min readLW link
(harsimony.wordpress.com)

A nec­es­sary Mem­brane for­mal­ism feature

ThomasCederborg10 Sep 2024 21:33 UTC
20 points
6 comments11 min readLW link

Ted Kaczy­in­ski proves in­stru­men­tal con­ver­gence?

xXAlphaSigmaXx28 Jun 2024 3:50 UTC
0 points
0 comments1 min readLW link

Anal­y­sis of key AI analogies

Kevin Kohler29 Jun 2024 10:55 UTC
10 points
2 comments15 min readLW link

Re­view of METR’s pub­lic eval­u­a­tion protocol

30 Jun 2024 22:03 UTC
10 points
0 comments5 min readLW link

Towards shut­down­able agents via stochas­tic choice

8 Jul 2024 10:14 UTC
59 points
12 comments23 min readLW link
(arxiv.org)

[Question] Self-cen­sor­ing on AI x-risk dis­cus­sions?

Decaeneus1 Jul 2024 18:24 UTC
17 points
2 comments1 min readLW link

I read ev­ery ma­jor AI lab’s safety plan so you don’t have to

sarahhw16 Dec 2024 18:51 UTC
20 points
0 comments12 min readLW link
(longerramblings.substack.com)

The Com­pendium, A full ar­gu­ment about ex­tinc­tion risk from AGI

31 Oct 2024 12:01 UTC
192 points
52 comments2 min readLW link
(www.thecompendium.ai)

AIS Hun­gary is hiring a part-time Tech­ni­cal Lead! (Dead­line: Dec 31st)

gergogaspar17 Dec 2024 14:12 UTC
1 point
0 comments2 min readLW link

Yoshua Ben­gio: Rea­son­ing through ar­gu­ments against tak­ing AI safety seriously

Judd Rosenblatt11 Jul 2024 23:53 UTC
70 points
3 comments1 min readLW link
(yoshuabengio.org)

Align­ment: “Do what I would have wanted you to do”

Oleg Trott12 Jul 2024 16:47 UTC
11 points
48 comments1 min readLW link

A Solu­tion for AGI/​ASI Safety

Weibing Wang18 Dec 2024 19:44 UTC
48 points
21 comments1 min readLW link

The AI al­ign­ment prob­lem in so­cio-tech­ni­cal sys­tems from a com­pu­ta­tional per­spec­tive: A Top-Down-Top view and outlook

zhaoweizhang15 Jul 2024 18:56 UTC
3 points
0 comments9 min readLW link

Re­cur­sion in AI is scary. But let’s talk solu­tions.

Oleg Trott16 Jul 2024 20:34 UTC
3 points
10 comments2 min readLW link

[Question] Would a scope-in­sen­si­tive AGI be less likely to in­ca­pac­i­tate hu­man­ity?

Jim Buhler21 Jul 2024 14:15 UTC
2 points
3 comments1 min readLW link

The $100B plan with “70% risk of kil­ling us all” w Stephen Fry [video]

Oleg Trott21 Jul 2024 20:06 UTC
34 points
8 comments1 min readLW link
(www.youtube.com)

How rea­son­able is tak­ing ex­tinc­tion risk?

FVelde23 Jul 2024 18:05 UTC
2 points
4 comments4 min readLW link

Knowl­edge Base 8: The truth as an at­trac­tor in the in­for­ma­tion space

iwis25 Apr 2024 15:28 UTC
−8 points
0 comments2 min readLW link

AI ex­is­ten­tial risk prob­a­bil­ities are too un­re­li­able to in­form policy

Oleg Trott28 Jul 2024 0:59 UTC
18 points
5 comments1 min readLW link
(www.aisnakeoil.com)

[Question] Has Eliezer pub­li­cly and satis­fac­to­rily re­sponded to at­tempted re­but­tals of the anal­ogy to evolu­tion?

kaler28 Jul 2024 12:23 UTC
10 points
14 comments1 min readLW link

Self-Other Over­lap: A Ne­glected Ap­proach to AI Alignment

30 Jul 2024 16:22 UTC
193 points
43 comments12 min readLW link

The need for multi-agent experiments

Martín Soto1 Aug 2024 17:14 UTC
43 points
3 comments9 min readLW link

Eth­i­cal De­cep­tion: Should AI Ever Lie?

Jason Reid2 Aug 2024 17:53 UTC
5 points
2 comments7 min readLW link

LLMs stifle cre­ativity, elimi­nate op­por­tu­ni­ties for serendipi­tous dis­cov­ery and dis­rupt in­ter­gen­er­a­tional trans­fer of wisdom

Ghdz5 Aug 2024 18:27 UTC
6 points
2 comments7 min readLW link

[Question] Is an AI re­li­gion jus­tified?

p4rziv4l6 Aug 2024 15:42 UTC
−35 points
11 comments1 min readLW link

Case Story: Lack of Con­sumer Pro­tec­tion Pro­ce­dures AI Ma­nipu­la­tion and the Threat of Fund Con­cen­tra­tion in Crypto Seek­ing As­sis­tance to Fund a Civil Case to Estab­lish Facts and Pro­tect Vuln­er­a­ble Con­sumers from Da­m­age Caused by Au­to­mated Sys­tems

Petr Andreev8 Aug 2024 5:55 UTC
−9 points
0 comments9 min readLW link

[Question] Does VETLM solve AI su­per­al­ign­ment?

Oleg Trott8 Aug 2024 18:22 UTC
−1 points
10 comments1 min readLW link

The Con­scious­ness Co­nun­drum: Why We Can’t Dis­miss Ma­chine Sentience

SystematicApproach13 Aug 2024 18:01 UTC
−21 points
1 comment3 min readLW link

Cri­tique of ‘Many Peo­ple Fear A.I. They Shouldn’t’ by David Brooks.

Axel Ahlqvist15 Aug 2024 18:38 UTC
12 points
8 comments3 min readLW link

[Paper] Hid­den in Plain Text: Emer­gence and Miti­ga­tion of Stegano­graphic Col­lu­sion in LLMs

25 Sep 2024 14:52 UTC
31 points
2 comments4 min readLW link
(arxiv.org)

The Over­looked Ne­ces­sity of Com­plete Se­man­tic Rep­re­sen­ta­tion in AI Safety and Alignment

williamsae15 Aug 2024 19:42 UTC
−1 points
0 comments3 min readLW link

AGI’s Op­pos­ing Force

SimonBaars16 Aug 2024 4:18 UTC
9 points
2 comments1 min readLW link

An “Ob­ser­va­tory” For a Shy Su­per AI?

Sherrinford27 Sep 2024 21:22 UTC
5 points
0 comments1 min readLW link
(robreid.substack.com)

Knowl­edge Base 1: Could it in­crease in­tel­li­gence and make it safer?

iwis30 Sep 2024 16:00 UTC
−4 points
0 comments4 min readLW link

Does nat­u­ral se­lec­tion fa­vor AIs over hu­mans?

cdkg3 Oct 2024 18:47 UTC
20 points
1 comment1 min readLW link
(link.springer.com)

[Question] Seek­ing AI Align­ment Tu­tor/​Ad­vi­sor: $100–150/​hr

MrThink5 Oct 2024 21:28 UTC
26 points
3 comments2 min readLW link

Limits of safe and al­igned AI

Shivam8 Oct 2024 21:30 UTC
2 points
0 comments4 min readLW link

Pal­isade is hiring: Exec As­sis­tant, Con­tent Lead, Ops Lead, and Policy Lead

Charlie Rogers-Smith9 Oct 2024 0:04 UTC
11 points
0 comments4 min readLW link

Ge­offrey Hin­ton on the Past, Pre­sent, and Fu­ture of AI

Stephen McAleese12 Oct 2024 16:41 UTC
22 points
5 comments18 min readLW link

Fac­tor­ing P(doom) into a bayesian network

Joseph Gardi17 Oct 2024 17:55 UTC
1 point
0 comments1 min readLW link

Lenses of Control

WillPetillo22 Oct 2024 7:51 UTC
14 points
0 comments9 min readLW link

Miles Brundage re­signed from OpenAI, and his AGI readi­ness team was disbanded

garrison23 Oct 2024 23:40 UTC
118 points
1 comment7 min readLW link
(garrisonlovely.substack.com)

Dario Amodei’s “Machines of Lov­ing Grace” sound in­cred­ibly dan­ger­ous, for Humans

Super AGI27 Oct 2024 5:05 UTC
8 points
1 comment1 min readLW link

AI as a pow­er­ful meme, via CGP Grey

TheManxLoiner30 Oct 2024 18:31 UTC
46 points
8 comments4 min readLW link

Nav­i­gat­ing pub­lic AI x-risk hype while pur­su­ing tech­ni­cal solutions

Dan Braun19 Feb 2023 12:22 UTC
18 points
0 comments2 min readLW link

What AI safety re­searchers can learn from Ma­hatma Gandhi

Lysandre Terrisse8 Nov 2024 19:49 UTC
−6 points
0 comments3 min readLW link

Thoughts af­ter the Wolfram and Yud­kowsky discussion

Tahp14 Nov 2024 1:43 UTC
25 points
13 comments6 min readLW link

Con­fronting the le­gion of doom.

Spiritus Dei13 Nov 2024 17:03 UTC
−20 points
3 comments5 min readLW link

Propos­ing the Con­di­tional AI Safety Treaty (linkpost TIME)

otto.barten15 Nov 2024 13:59 UTC
10 points
8 comments3 min readLW link
(time.com)

[Question] What (if any­thing) made your p(doom) go down in 2024?

Satron16 Nov 2024 16:46 UTC
4 points
6 comments1 min readLW link

Truth Ter­mi­nal: A re­con­struc­tion of events

17 Nov 2024 23:51 UTC
1 point
1 comment7 min readLW link

[Question] Have we seen any “ReLU in­stead of sig­moid-type im­prove­ments” recently

KvmanThinking23 Nov 2024 3:51 UTC
2 points
4 comments1 min readLW link

A bet­ter “State­ment on AI Risk?”

Knight Lee25 Nov 2024 4:50 UTC
4 points
4 comments3 min readLW link

Why Re­cur­sive Self-Im­prove­ment Might Not Be the Ex­is­ten­tial Risk We Fear

Nassim_A24 Nov 2024 17:17 UTC
1 point
0 comments9 min readLW link

Tak­ing Away the Guns First: The Fun­da­men­tal Flaw in AI Development

s-ice26 Nov 2024 22:11 UTC
1 point
0 comments17 min readLW link

Hope to live or fear to die?

Knight Lee27 Nov 2024 10:42 UTC
3 points
0 comments1 min readLW link

How to solve the mi­suse prob­lem as­sum­ing that in 10 years the de­fault sce­nario is that AGI agents are ca­pa­ble of syn­thetiz­ing pathogens

jeremtti27 Nov 2024 21:17 UTC
6 points
0 comments9 min readLW link

Ca­pa­bil­ities De­nial: The Danger of Un­der­es­ti­mat­ing AI

Christopher King21 Mar 2023 1:24 UTC
6 points
5 comments3 min readLW link

Ex­plor­ing the Pre­cau­tion­ary Prin­ci­ple in AI Devel­op­ment: His­tor­i­cal Analo­gies and Les­sons Learned

Christopher King21 Mar 2023 3:53 UTC
−1 points
2 comments9 min readLW link

[Question] Em­ployer con­sid­er­ing part­ner­ing with ma­jor AI labs. What to do?

GraduallyMoreAgitated21 Mar 2023 17:43 UTC
37 points
7 comments2 min readLW link

Key Ques­tions for Digi­tal Minds

Jacy Reese Anthis22 Mar 2023 17:13 UTC
22 points
0 comments7 min readLW link
(www.sentienceinstitute.org)

Why We MUST Create an AGI that Disem­pow­ers Hu­man­ity. For Real.

twkaiser22 Mar 2023 23:01 UTC
−17 points
1 comment4 min readLW link

ChatGPT’s “fuzzy al­ign­ment” isn’t ev­i­dence of AGI al­ign­ment: the ba­nana test

Michael Tontchev23 Mar 2023 7:12 UTC
23 points
6 comments4 min readLW link

Limit in­tel­li­gent weapons

Lucas Pfeifer23 Mar 2023 17:54 UTC
−11 points
36 comments1 min readLW link

GPT-4 al­ign­ing with aca­sual de­ci­sion the­ory when in­structed to play games, but in­cludes a CDT ex­pla­na­tion that’s in­cor­rect if they differ

Christopher King23 Mar 2023 16:16 UTC
7 points
4 comments8 min readLW link

Grind­ing slimes in the dun­geon of AI al­ign­ment research

Max H24 Mar 2023 4:51 UTC
10 points
2 comments4 min readLW link

Does GPT-4 ex­hibit agency when sum­ma­riz­ing ar­ti­cles?

Christopher King24 Mar 2023 15:49 UTC
16 points
2 comments5 min readLW link

More ex­per­i­ments in GPT-4 agency: writ­ing memos

Christopher King24 Mar 2023 17:51 UTC
5 points
2 comments10 min readLW link

ChatGPT Plu­g­ins—The Begin­ning of the End

Bary Levy25 Mar 2023 11:45 UTC
15 points
4 comments1 min readLW link

[Question] How Poli­tics in­ter­acts with AI ?

qbolec26 Mar 2023 9:53 UTC
−18 points
4 comments1 min readLW link

What can we learn from Lex Frid­man’s in­ter­view with Sam Alt­man?

Karl von Wendt27 Mar 2023 6:27 UTC
56 points
22 comments9 min readLW link

Half-baked al­ign­ment idea

ozb28 Mar 2023 17:47 UTC
6 points
27 comments1 min readLW link

Adapt­ing to Change: Over­com­ing Chronos­ta­sis in AI Lan­guage Models

RationalMindset28 Mar 2023 14:32 UTC
−1 points
0 comments6 min readLW link

I had a chat with GPT-4 on the fu­ture of AI and AI safety

Kristian Freed28 Mar 2023 17:47 UTC
1 point
0 comments8 min readLW link

“Un­in­ten­tional AI safety re­search”: Why not sys­tem­at­i­cally mine AI tech­ni­cal re­search for safety pur­poses?

Jemal Young29 Mar 2023 15:56 UTC
27 points
3 comments6 min readLW link

I made AI Risk Propaganda

monkymind29 Mar 2023 14:26 UTC
−3 points
0 comments1 min readLW link

“Sorcerer’s Ap­pren­tice” from Fan­ta­sia as an anal­ogy for alignment

awg29 Mar 2023 18:21 UTC
9 points
4 comments1 min readLW link
(video.disney.com)

Paus­ing AI Devel­op­ments Isn’t Enough. We Need to Shut it All Down by Eliezer Yudkowsky

jacquesthibs29 Mar 2023 23:16 UTC
298 points
297 comments3 min readLW link
(time.com)

Align­ment—Path to AI as ally, not slave nor foe

ozb30 Mar 2023 14:54 UTC
10 points
3 comments2 min readLW link

Wi­den­ing Over­ton Win­dow—Open Thread

Prometheus31 Mar 2023 10:03 UTC
23 points
8 comments1 min readLW link

Why Yud­kowsky Is Wrong And What He Does Can Be More Dangerous

idontagreewiththat6 Jun 2023 17:59 UTC
−38 points
4 comments3 min readLW link

GPT-4 busted? Clear self-in­ter­est when sum­ma­riz­ing ar­ti­cles about it­self vs when ar­ti­cle talks about Claude, LLaMA, or DALL·E 2

Christopher King31 Mar 2023 17:05 UTC
6 points
4 comments4 min readLW link

Imag­ine a world where Microsoft em­ploy­ees used Bing

Christopher King31 Mar 2023 18:36 UTC
6 points
2 comments2 min readLW link

Is AGI suici­dal­ity the golden ray of hope?

Alex Kirko4 Apr 2023 23:29 UTC
−18 points
4 comments1 min readLW link

AI com­mu­nity build­ing: EliezerKart

Christopher King1 Apr 2023 15:25 UTC
45 points
0 comments2 min readLW link

Pes­simism about AI Safety

2 Apr 2023 7:43 UTC
4 points
1 comment25 min readLW link

AI Safety via Luck

Jozdien1 Apr 2023 20:13 UTC
81 points
7 comments11 min readLW link

The AI gov­er­nance gaps in de­vel­op­ing countries

ntran17 Jun 2023 2:50 UTC
20 points
1 comment14 min readLW link

How to safely use an optimizer

Simon Fischer28 Mar 2024 16:11 UTC
47 points
21 comments7 min readLW link

Ex­plor­ing non-an­thro­pocen­tric as­pects of AI ex­is­ten­tial safety

mishka3 Apr 2023 18:07 UTC
8 points
0 comments3 min readLW link

Steer­ing systems

Max H4 Apr 2023 0:56 UTC
50 points
1 comment15 min readLW link

ICA Simulacra

Ozyrus5 Apr 2023 6:41 UTC
26 points
2 comments7 min readLW link

Against sac­ri­fic­ing AI trans­parency for gen­er­al­ity gains

Ape in the coat7 May 2023 6:52 UTC
4 points
0 comments2 min readLW link

Su­per­in­tel­li­gence will out­smart us or it isn’t superintelligence

Neil 3 Apr 2023 15:01 UTC
−4 points
4 comments1 min readLW link

Do we have a plan for the “first crit­i­cal try” prob­lem?

Christopher King3 Apr 2023 16:27 UTC
−3 points
14 comments1 min readLW link

Towards em­pa­thy in RL agents and be­yond: In­sights from cog­ni­tive sci­ence for AI Align­ment

Marc Carauleanu3 Apr 2023 19:59 UTC
15 points
6 comments1 min readLW link
(clipchamp.com)

[Question] Does it be­come eas­ier, or harder, for the world to co­or­di­nate around not build­ing AGI as time goes on?

Eli Tyre29 Jul 2019 22:59 UTC
86 points
31 comments3 min readLW link2 reviews

Strate­gies to Prevent AI Annihilation

lastchanceformankind4 Apr 2023 8:59 UTC
−2 points
0 comments4 min readLW link

AGI de­ploy­ment as an act of aggression

dr_s5 Apr 2023 6:39 UTC
27 points
29 comments13 min readLW link

[Question] Daisy-chain­ing ep­silon-step verifiers

Decaeneus6 Apr 2023 2:07 UTC
2 points
1 comment1 min readLW link

One Does Not Sim­ply Re­place the Hu­mans

JerkyTreats6 Apr 2023 20:56 UTC
9 points
3 comments4 min readLW link
(www.lesswrong.com)

OpenAI: Our ap­proach to AI safety

Jacob G-W5 Apr 2023 20:26 UTC
1 point
1 comment1 min readLW link
(openai.com)

Willi­ams-Beuren Syn­drome: Frendly Mutations

Takk5 Apr 2023 20:59 UTC
−1 points
1 comment1 min readLW link

Yoshua Ben­gio: “Slow­ing down de­vel­op­ment of AI sys­tems pass­ing the Tur­ing test”

Roman Leventov6 Apr 2023 3:31 UTC
49 points
2 comments5 min readLW link
(yoshuabengio.org)

A decade of lurk­ing, a month of posting

Max H9 Apr 2023 0:21 UTC
70 points
4 comments5 min readLW link

Align­ment of Au­toGPT agents

Ozyrus12 Apr 2023 12:54 UTC
14 points
1 comment4 min readLW link

Why I’m not wor­ried about im­mi­nent doom

Ariel Kwiatkowski10 Apr 2023 15:31 UTC
7 points
2 comments4 min readLW link

Mea­sur­ing ar­tifi­cial in­tel­li­gence on hu­man bench­marks is naive

Anomalous11 Apr 2023 11:34 UTC
11 points
4 comments1 min readLW link
(forum.effectivealtruism.org)

In fa­vor of ac­cel­er­at­ing prob­lems you’re try­ing to solve

Christopher King11 Apr 2023 18:15 UTC
2 points
2 comments4 min readLW link

AI Risk US Pres­i­den­tial Candidate

Simon Berens11 Apr 2023 19:31 UTC
5 points
3 comments1 min readLW link

Open-source LLMs may prove Bostrom’s vuln­er­a­ble world hypothesis

Roope Ahvenharju15 Apr 2023 19:16 UTC
1 point
1 comment1 min readLW link

Ar­tifi­cial In­tel­li­gence as exit strat­egy from the age of acute ex­is­ten­tial risk

Arturo Macias12 Apr 2023 14:48 UTC
−7 points
15 comments7 min readLW link

AGI goal space is big, but nar­row­ing might not be as hard as it seems.

Jacy Reese Anthis12 Apr 2023 19:03 UTC
15 points
0 comments3 min readLW link

Pol­lut­ing the agen­tic commons

hamandcheese13 Apr 2023 17:42 UTC
7 points
4 comments2 min readLW link
(www.secondbest.ca)

The Virus—Short Story

Michael Soareverix13 Apr 2023 18:18 UTC
4 points
0 comments4 min readLW link

On the pos­si­bil­ity of im­pos­si­bil­ity of AGI Long-Term Safety

Roman Yen13 May 2023 18:38 UTC
6 points
3 comments9 min readLW link

Spec­u­la­tion on map­ping the moral land­scape for fu­ture Ai Alignment

Sven Heinz (Welwordion)16 Apr 2023 13:43 UTC
1 point
0 comments1 min readLW link

On ur­gency, pri­or­ity and col­lec­tive re­ac­tion to AI-Risks: Part I

Denreik16 Apr 2023 19:14 UTC
−10 points
15 comments5 min readLW link

AGI Clinics: A Safe Haven for Hu­man­ity’s First En­coun­ters with Superintelligence

portr.17 Apr 2023 1:52 UTC
−5 points
1 comment1 min readLW link

Defin­ing Boundaries on Out­comes

Takk7 Jun 2023 17:41 UTC
1 point
0 comments1 min readLW link

No, re­ally, it pre­dicts next to­kens.

simon18 Apr 2023 3:47 UTC
58 points
55 comments3 min readLW link

Pre­dic­tion: any un­con­trol­lable AI will turn earth into a gi­ant computer

Karl von Wendt17 Apr 2023 12:30 UTC
11 points
8 comments3 min readLW link

What is your timelines for ADI (ar­tifi­cial dis­em­pow­er­ing in­tel­li­gence)?

Christopher King17 Apr 2023 17:01 UTC
3 points
3 comments2 min readLW link

Green goo is plausible

anithite18 Apr 2023 0:04 UTC
61 points
31 comments4 min readLW link1 review

World and Mind in Ar­tifi­cial In­tel­li­gence: ar­gu­ments against the AI pause

Arturo Macias18 Apr 2023 14:40 UTC
1 point
0 comments1 min readLW link
(forum.effectivealtruism.org)

AI Safety Newslet­ter #2: ChaosGPT, Nat­u­ral Selec­tion, and AI Safety in the Media

18 Apr 2023 18:44 UTC
30 points
0 comments4 min readLW link
(newsletter.safe.ai)

I Believe I Know Why AI Models Hallucinate

Richard Aragon19 Apr 2023 21:07 UTC
−10 points
6 comments7 min readLW link
(turingssolutions.com)

[Cross­post] Or­ga­niz­ing a de­bate with ex­perts and MPs to raise AI xrisk aware­ness: a pos­si­ble blueprint

otto.barten19 Apr 2023 11:45 UTC
8 points
0 comments4 min readLW link
(forum.effectivealtruism.org)

How to ex­press this sys­tem for eth­i­cally al­igned AGI as a Math­e­mat­i­cal for­mula?

Oliver Siegel19 Apr 2023 20:13 UTC
−1 points
0 comments1 min readLW link

[Question] Is there any liter­a­ture on us­ing so­cial­iza­tion for AI al­ign­ment?

Nathan112319 Apr 2023 22:16 UTC
10 points
9 comments2 min readLW link

Sta­bil­ity AI re­leases StableLM, an open-source ChatGPT counterpart

Ozyrus20 Apr 2023 6:04 UTC
11 points
3 comments1 min readLW link
(github.com)

Ideas for stud­ies on AGI risk

dr_s20 Apr 2023 18:17 UTC
5 points
1 comment11 min readLW link

Pro­posal: Us­ing Monte Carlo tree search in­stead of RLHF for al­ign­ment research

Christopher King20 Apr 2023 19:57 UTC
2 points
7 comments3 min readLW link

Notes on “the hot mess the­ory of AI mis­al­ign­ment”

JakubK21 Apr 2023 10:07 UTC
13 points
0 comments5 min readLW link
(sohl-dickstein.github.io)

The Se­cu­rity Mind­set, S-Risk and Pub­lish­ing Pro­saic Align­ment Research

lukemarks22 Apr 2023 14:36 UTC
39 points
7 comments5 min readLW link

A great talk for AI noobs (ac­cord­ing to an AI noob)

dov23 Apr 2023 5:34 UTC
10 points
1 comment1 min readLW link
(forum.effectivealtruism.org)

Paths to failure

25 Apr 2023 8:03 UTC
29 points
1 comment8 min readLW link

A con­cise sum-up of the ba­sic ar­gu­ment for AI doom

Mergimio H. Doefevmil24 Apr 2023 17:37 UTC
11 points
6 comments2 min readLW link

A re­sponse to Con­jec­ture’s CoEm proposal

Kristian Freed24 Apr 2023 17:23 UTC
7 points
0 comments4 min readLW link

A Pro­posal for AI Align­ment: Us­ing Directly Op­pos­ing Models

Arne B27 Apr 2023 18:05 UTC
0 points
5 comments3 min readLW link

Mak­ing Nanobots isn’t a one-shot pro­cess, even for an ar­tifi­cial superintelligance

dankrad25 Apr 2023 0:39 UTC
20 points
13 comments6 min readLW link

My Assess­ment of the Chi­nese AI Safety Community

Lao Mein25 Apr 2023 4:21 UTC
250 points
94 comments3 min readLW link

Briefly how I’ve up­dated since ChatGPT

rime25 Apr 2023 14:47 UTC
48 points
2 comments2 min readLW link

Free­dom Is All We Need

Leo Glisic27 Apr 2023 0:09 UTC
−1 points
8 comments10 min readLW link

Hal­loween Problem

Saint Blasphemer24 Oct 2023 16:46 UTC
−10 points
1 comment1 min readLW link

An­nounc­ing #AISum­mitTalks fea­tur­ing Pro­fes­sor Stu­art Rus­sell and many others

otto.barten24 Oct 2023 10:11 UTC
17 points
1 comment1 min readLW link

[Question] What if AGI had its own uni­verse to maybe wreck?

mseale26 Oct 2023 17:49 UTC
−1 points
2 comments1 min readLW link

Re­spon­si­ble Scal­ing Poli­cies Are Risk Man­age­ment Done Wrong

simeon_c25 Oct 2023 23:46 UTC
122 points
35 comments22 min readLW link1 review
(www.navigatingrisks.ai)

[Thought Ex­per­i­ment] To­mor­row’s Echo—The fu­ture of syn­thetic com­pan­ion­ship.

Vimal Naran26 Oct 2023 17:54 UTC
−7 points
2 comments2 min readLW link

[un­ti­tled post]

NeuralSystem_e5e127 Apr 2023 17:37 UTC
3 points
0 comments1 min readLW link

Re­sponse to “Co­or­di­nated paus­ing: An eval­u­a­tion-based co­or­di­na­tion scheme for fron­tier AI de­vel­op­ers”

Matthew Wearden30 Oct 2023 17:27 UTC
5 points
2 comments6 min readLW link
(matthewwearden.co.uk)

Char­bel-Raphaël and Lu­cius dis­cuss interpretability

30 Oct 2023 5:50 UTC
110 points
7 comments21 min readLW link

An In­ter­na­tional Man­hat­tan Pro­ject for Ar­tifi­cial Intelligence

Glenn Clayton27 Apr 2023 17:34 UTC
−11 points
2 comments5 min readLW link

Fo­cus on ex­is­ten­tial risk is a dis­trac­tion from the real is­sues. A false fallacy

Nik Samoylov30 Oct 2023 23:42 UTC
−19 points
11 comments2 min readLW link

Say­ing the quiet part out loud: trad­ing off x-risk for per­sonal immortality

disturbance2 Nov 2023 17:43 UTC
83 points
89 comments5 min readLW link

The 6D effect: When com­pa­nies take risks, one email can be very pow­er­ful.

scasper4 Nov 2023 20:08 UTC
275 points
42 comments3 min readLW link

AI as Su­per-Demagogue

RationalDino5 Nov 2023 21:21 UTC
0 points
11 comments9 min readLW link

Sym­biotic self-al­ign­ment of AIs.

Spiritus Dei7 Nov 2023 17:18 UTC
1 point
0 comments3 min readLW link

Scal­able And Trans­fer­able Black-Box Jailbreaks For Lan­guage Models Via Per­sona Modulation

7 Nov 2023 17:59 UTC
36 points
2 comments2 min readLW link
(arxiv.org)

The So­cial Align­ment Problem

irving28 Apr 2023 14:16 UTC
98 points
13 comments8 min readLW link

Do you want a first-prin­ci­pled pre­pared­ness guide to pre­pare your­self and loved ones for po­ten­tial catas­tro­phes?

Ulrik Horn14 Nov 2023 12:13 UTC
16 points
5 comments15 min readLW link

[Question] Real­is­tic near-fu­ture sce­nar­ios of AI doom un­der­stand­able for non-techy peo­ple?

RomanS28 Apr 2023 14:45 UTC
4 points
4 comments1 min readLW link

[Question] AI Safety orgs- what’s your biggest bot­tle­neck right now?

Kabir Kumar16 Nov 2023 2:02 UTC
1 point
0 comments1 min readLW link

We Should Talk About This More. Epistemic World Col­lapse as Im­mi­nent Safety Risk of Gen­er­a­tive AI.

Joerg Weiss16 Nov 2023 18:46 UTC
11 points
2 comments29 min readLW link

On ex­clud­ing dan­ger­ous in­for­ma­tion from training

ShayBenMoshe17 Nov 2023 11:14 UTC
23 points
5 comments3 min readLW link

Killswitch

Junio18 Nov 2023 22:53 UTC
2 points
0 comments3 min readLW link

Ilya: The AI sci­en­tist shap­ing the world

David Varga20 Nov 2023 13:09 UTC
11 points
0 comments4 min readLW link

A Guide to Fore­cast­ing AI Science Ca­pa­bil­ities

Eleni Angelou29 Apr 2023 23:24 UTC
6 points
1 comment4 min readLW link

The two para­graph ar­gu­ment for AI risk

CronoDAS25 Nov 2023 2:01 UTC
19 points
8 comments1 min readLW link

AISC 2024 - Pro­ject Summaries

NickyP27 Nov 2023 22:32 UTC
48 points
3 comments18 min readLW link

Re­think Pri­ori­ties: Seek­ing Ex­pres­sions of In­ter­est for Spe­cial Pro­jects Next Year

kierangreig29 Nov 2023 13:59 UTC
4 points
0 comments5 min readLW link

Sup­port me in a Week-Long Pick­et­ing Cam­paign Near OpenAI’s HQ: Seek­ing Sup­port and Ideas from the LessWrong Community

Percy30 Apr 2023 17:48 UTC
−21 points
15 comments1 min readLW link

Thoughts on “AI is easy to con­trol” by Pope & Belrose

Steven Byrnes1 Dec 2023 17:30 UTC
197 points
64 comments14 min readLW link1 review

The benefits and risks of op­ti­mism (about AI safety)

Karl von Wendt3 Dec 2023 12:45 UTC
−7 points
6 comments5 min readLW link

A call for a quan­ti­ta­tive re­port card for AI bioter­ror­ism threat models

Juno4 Dec 2023 6:35 UTC
12 points
0 comments10 min readLW link

[Question] Ac­cu­racy of ar­gu­ments that are seen as ridicu­lous and in­tu­itively false but don’t have good counter-arguments

Christopher King29 Apr 2023 23:58 UTC
30 points
39 comments1 min readLW link

Co­op­er­a­tion and Align­ment in Del­e­ga­tion Games: You Need Both!

3 Aug 2024 10:16 UTC
8 points
0 comments14 min readLW link
(www.oliversourbut.net)

Call for sub­mis­sions: Choice of Fu­tures sur­vey questions

c.trout30 Apr 2023 6:59 UTC
4 points
0 comments2 min readLW link
(airtable.com)

Ac­cess to AI: a hu­man right?

dmtea25 Jul 2020 9:38 UTC
5 points
3 comments2 min readLW link

Agen­tic Lan­guage Model Memes

FactorialCode1 Aug 2020 18:03 UTC
16 points
1 comment2 min readLW link

Con­ver­sa­tion with Paul Christiano

abergal11 Sep 2019 23:20 UTC
44 points
6 comments30 min readLW link
(aiimpacts.org)

Tran­scrip­tion of Eliezer’s Jan­uary 2010 video Q&A

curiousepic14 Nov 2011 17:02 UTC
112 points
9 comments56 min readLW link

Re­sponses to Catas­trophic AGI Risk: A Survey

lukeprog8 Jul 2013 14:33 UTC
17 points
8 comments1 min readLW link

How can I re­duce ex­is­ten­tial risk from AI?

lukeprog13 Nov 2012 21:56 UTC
63 points
92 comments8 min readLW link

Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?”

David Scott Krueger (formerly: capybaralet)6 Feb 2019 19:09 UTC
25 points
17 comments1 min readLW link

Refram­ing mis­al­igned AGI’s: well-in­ten­tioned non-neu­rotyp­i­cal assistants

zhukeepa1 Apr 2018 1:22 UTC
46 points
14 comments2 min readLW link

When is un­al­igned AI morally valuable?

paulfchristiano25 May 2018 1:57 UTC
81 points
53 comments10 min readLW link

In­tro­duc­ing the AI Align­ment Fo­rum (FAQ)

29 Oct 2018 21:07 UTC
87 points
8 comments6 min readLW link

Swim­ming Up­stream: A Case Study in In­stru­men­tal Rationality

TurnTrout3 Jun 2018 3:16 UTC
76 points
7 comments8 min readLW link

Cur­rent AI Safety Roles for Soft­ware Engineers

ozziegooen9 Nov 2018 20:57 UTC
70 points
9 comments4 min readLW link

[Question] Why is so much dis­cus­sion hap­pen­ing in pri­vate Google Docs?

Wei Dai12 Jan 2019 2:19 UTC
101 points
22 comments1 min readLW link

Prob­lems in AI Align­ment that philoso­phers could po­ten­tially con­tribute to

Wei Dai17 Aug 2019 17:38 UTC
78 points
14 comments2 min readLW link

Two Ne­glected Prob­lems in Hu­man-AI Safety

Wei Dai16 Dec 2018 22:13 UTC
102 points
25 comments2 min readLW link

An­nounce­ment: AI al­ign­ment prize round 4 winners

cousin_it20 Jan 2019 14:46 UTC
74 points
41 comments1 min readLW link

Soon: a weekly AI Safety pre­req­ui­sites mod­ule on LessWrong

null30 Apr 2018 13:23 UTC
35 points
10 comments1 min readLW link

And the AI would have got away with it too, if...

Stuart_Armstrong22 May 2019 21:35 UTC
75 points
7 comments1 min readLW link

2017 AI Safety Liter­a­ture Re­view and Char­ity Com­par­i­son

Larks24 Dec 2017 18:52 UTC
41 points
5 comments23 min readLW link

Should ethi­cists be in­side or out­side a pro­fes­sion?

Eliezer Yudkowsky12 Dec 2018 1:40 UTC
97 points
7 comments9 min readLW link

I Vouch For MIRI

Zvi17 Dec 2017 17:50 UTC
39 points
9 comments5 min readLW link
(thezvi.wordpress.com)

Be­ware of black boxes in AI al­ign­ment research

cousin_it18 Jan 2018 15:07 UTC
39 points
10 comments1 min readLW link

AI Align­ment Prize: Round 2 due March 31, 2018

Zvi12 Mar 2018 12:10 UTC
28 points
2 comments3 min readLW link
(thezvi.wordpress.com)

Three AI Safety Re­lated Ideas

Wei Dai13 Dec 2018 21:32 UTC
69 points
38 comments2 min readLW link

A rant against robots

Lê Nguyên Hoang14 Jan 2020 22:03 UTC
65 points
7 comments5 min readLW link

Op­por­tu­ni­ties for in­di­vi­d­ual donors in AI safety

Alex Flint31 Mar 2018 18:37 UTC
30 points
3 comments11 min readLW link

Course recom­men­da­tions for Friendli­ness researchers

Louie9 Jan 2013 14:33 UTC
96 points
112 comments10 min readLW link

AI Safety Re­search Camp—Pro­ject Proposal

David_Kristoffersson2 Feb 2018 4:25 UTC
29 points
11 comments8 min readLW link

AI Sum­mer Fel­lows Program

colm21 Mar 2018 15:32 UTC
21 points
0 comments1 min readLW link

The ge­nie knows, but doesn’t care

Rob Bensinger6 Sep 2013 6:42 UTC
119 points
495 comments8 min readLW link

[Question] Does agency nec­es­sar­ily im­ply self-preser­va­tion in­stinct?

Mislav Jurić1 May 2023 16:06 UTC
5 points
8 comments1 min readLW link

Align­ment Newslet­ter #13: 07/​02/​18

Rohin Shah2 Jul 2018 16:10 UTC
70 points
12 comments8 min readLW link
(mailchi.mp)

An In­creas­ingly Ma­nipu­la­tive Newsfeed

Michaël Trazzi1 Jul 2019 15:26 UTC
63 points
16 comments5 min readLW link

The sim­ple pic­ture on AI safety

Alex Flint27 May 2018 19:43 UTC
31 points
10 comments2 min readLW link

Shah (Deep­Mind) and Leahy (Con­jec­ture) Dis­cuss Align­ment Cruxes

1 May 2023 16:47 UTC
96 points
10 comments30 min readLW link

Elon Musk donates $10M to the Fu­ture of Life In­sti­tute to keep AI benefi­cial

Paul Crowley15 Jan 2015 16:33 UTC
78 points
52 comments1 min readLW link

Strate­gic im­pli­ca­tions of AIs’ abil­ity to co­or­di­nate at low cost, for ex­am­ple by merging

Wei Dai25 Apr 2019 5:08 UTC
69 points
46 comments2 min readLW link1 review

Model­ing AGI Safety Frame­works with Causal In­fluence Diagrams

Ramana Kumar21 Jun 2019 12:50 UTC
43 points
6 comments1 min readLW link
(arxiv.org)

Henry Kiss­inger: AI Could Mean the End of Hu­man History

ESRogs15 May 2018 20:11 UTC
17 points
12 comments1 min readLW link
(www.theatlantic.com)

Toy model of the AI con­trol prob­lem: an­i­mated version

Stuart_Armstrong10 Oct 2017 11:06 UTC
23 points
8 comments1 min readLW link

A Vi­su­al­iza­tion of Nick Bostrom’s Superintelligence

[deleted]23 Jul 2014 0:24 UTC
62 points
28 comments3 min readLW link

AI Align­ment Re­search Overview (by Ja­cob Stein­hardt)

Ben Pace6 Nov 2019 19:24 UTC
44 points
0 comments7 min readLW link
(docs.google.com)

A gen­eral model of safety-ori­ented AI development

Wei Dai11 Jun 2018 21:00 UTC
65 points
8 comments1 min readLW link

Coun­ter­fac­tual Or­a­cles = on­line su­per­vised learn­ing with ran­dom se­lec­tion of train­ing episodes

Wei Dai10 Sep 2019 8:29 UTC
52 points
26 comments3 min readLW link

AI Safety Newslet­ter #4: AI and Cy­ber­se­cu­rity, Per­sua­sive AIs, Weaponiza­tion, and Ge­offrey Hin­ton talks AI risks

2 May 2023 18:41 UTC
32 points
0 comments5 min readLW link
(newsletter.safe.ai)

Avert­ing Catas­tro­phe: De­ci­sion The­ory for COVID-19, Cli­mate Change, and Po­ten­tial Disasters of All Kinds

JakubK2 May 2023 22:50 UTC
10 points
0 comments1 min readLW link

Siren wor­lds and the per­ils of over-op­ti­mised search

Stuart_Armstrong7 Apr 2014 11:00 UTC
83 points
418 comments7 min readLW link

Reg­u­late or Com­pete? The China Fac­tor in U.S. AI Policy (NAIR #2)

charles_m5 May 2023 17:43 UTC
2 points
1 comment7 min readLW link
(navigatingairisks.substack.com)

But What If We Ac­tu­ally Want To Max­i­mize Paper­clips?

snerx25 May 2023 7:13 UTC
−17 points
6 comments7 min readLW link

Top 9+2 myths about AI risk

Stuart_Armstrong29 Jun 2015 20:41 UTC
68 points
44 comments2 min readLW link

For­mal­iz­ing the “AI x-risk is un­likely be­cause it is ridicu­lous” argument

Christopher King3 May 2023 18:56 UTC
48 points
17 comments3 min readLW link

Ro­hin Shah on rea­sons for AI optimism

abergal31 Oct 2019 12:10 UTC
40 points
58 comments1 min readLW link
(aiimpacts.org)

Plau­si­bly, al­most ev­ery pow­er­ful al­gorithm would be manipulative

Stuart_Armstrong6 Feb 2020 11:50 UTC
38 points
25 comments3 min readLW link

We don’t need AGI for an amaz­ing future

Karl von Wendt4 May 2023 12:10 UTC
18 points
32 comments5 min readLW link

[Question] Why not use ac­tive SETI to pre­vent AI Doom?

RomanS5 May 2023 14:41 UTC
13 points
13 comments1 min readLW link

CHAT Di­plo­macy: LLMs and Na­tional Security

SebastianG 5 May 2023 19:45 UTC
25 points
6 comments7 min readLW link

The Mag­ni­tude of His Own Folly

Eliezer Yudkowsky30 Sep 2008 11:31 UTC
117 points
128 comments6 min readLW link

AI al­ign­ment landscape

paulfchristiano13 Oct 2019 2:10 UTC
40 points
3 comments1 min readLW link
(ai-alignment.com)

Launched: Friend­ship is Optimal

iceman15 Nov 2012 4:57 UTC
77 points
32 comments1 min readLW link

Friend­ship is Op­ti­mal: A My Lit­tle Pony fan­fic about an op­ti­miza­tion process

iceman8 Sep 2012 6:16 UTC
110 points
152 comments1 min readLW link

Is “red” for GPT-4 the same as “red” for you?

Yusuke Hayashi6 May 2023 17:55 UTC
9 points
6 comments2 min readLW link

Oh, Think of the Bananas

Jeffs1 Jun 2023 6:46 UTC
3 points
0 comments2 min readLW link

Do Earths with slower eco­nomic growth have a bet­ter chance at FAI?

Eliezer Yudkowsky12 Jun 2013 19:54 UTC
59 points
175 comments4 min readLW link

TED talk by Eliezer Yud­kowsky: Un­leash­ing the Power of Ar­tifi­cial Intelligence

bayesed7 May 2023 5:45 UTC
49 points
36 comments1 min readLW link
(www.youtube.com)

An­no­tated re­ply to Ben­gio’s “AI Scien­tists: Safe and Use­ful AI?”

Roman Leventov8 May 2023 21:26 UTC
18 points
2 comments7 min readLW link
(yoshuabengio.org)