AI Risk

Last edit: 2 Nov 2022 20:27 UTC by brook

AI Risk is the analysis of risks associated with building powerful AI systems.

Related: AI, Orthogonality thesis, Complexity of value, Goodhart’s law, Paperclip maximiser

Su­per­in­tel­li­gence FAQ

Scott Alexander20 Sep 2016 19:00 UTC
134 points
38 comments27 min readLW link

AGI Ruin: A List of Lethalities

Eliezer Yudkowsky5 Jun 2022 22:05 UTC
916 points
703 comments30 min readLW link3 reviews

What failure looks like

paulfchristiano17 Mar 2019 20:18 UTC
416 points
54 comments8 min readLW link2 reviews

Speci­fi­ca­tion gam­ing ex­am­ples in AI

Vika3 Apr 2018 12:30 UTC
46 points
9 comments1 min readLW link2 reviews

An ar­tifi­cially struc­tured ar­gu­ment for ex­pect­ing AGI ruin

Rob Bensinger7 May 2023 21:52 UTC
91 points
26 comments19 min readLW link

Where I agree and dis­agree with Eliezer

paulfchristiano19 Jun 2022 19:15 UTC
889 points
220 comments18 min readLW link2 reviews

Dis­cus­sion with Eliezer Yud­kowsky on AGI interventions

11 Nov 2021 3:01 UTC
328 points
251 comments34 min readLW link1 review

In­tu­itions about goal-di­rected behavior

Rohin Shah1 Dec 2018 4:25 UTC
54 points
15 comments6 min readLW link

“Cor­rigi­bil­ity at some small length” by dath ilan

Christopher King5 Apr 2023 1:47 UTC
32 points
3 comments9 min readLW link
(www.glowfic.com)

AGI in sight: our look at the game board

18 Feb 2023 22:17 UTC
226 points
135 comments6 min readLW link
(andreamiotti.substack.com)

Episte­molog­i­cal Fram­ing for AI Align­ment Research

adamShimi8 Mar 2021 22:05 UTC
58 points
7 comments9 min readLW link

On how var­i­ous plans miss the hard bits of the al­ign­ment challenge

So8res12 Jul 2022 2:49 UTC
305 points
88 comments29 min readLW link3 reviews

Con­jec­ture in­ter­nal sur­vey: AGI timelines and prob­a­bil­ity of hu­man ex­tinc­tion from ad­vanced AI

Maris Sala22 May 2023 14:31 UTC
155 points
5 comments3 min readLW link
(www.conjecture.dev)

Strik­ing Im­pli­ca­tions for Learn­ing The­ory, In­ter­pretabil­ity — and Safety?

RogerDearnaley5 Jan 2024 8:46 UTC
37 points
4 comments2 min readLW link

The ‘Ne­glected Ap­proaches’ Ap­proach: AE Stu­dio’s Align­ment Agenda

18 Dec 2023 20:35 UTC
168 points
21 comments12 min readLW link

What can the prin­ci­pal-agent liter­a­ture tell us about AI risk?

apc8 Feb 2020 21:28 UTC
104 points
29 comments16 min readLW link

Open Prob­lems in AI X-Risk [PAIS #5]

10 Jun 2022 2:08 UTC
61 points
6 comments36 min readLW link

[Question] Will OpenAI’s work un­in­ten­tion­ally in­crease ex­is­ten­tial risks re­lated to AI?

adamShimi11 Aug 2020 18:16 UTC
53 points
55 comments1 min readLW link

Meta AI an­nounces Cicero: Hu­man-Level Di­plo­macy play (with di­alogue)

Jacy Reese Anthis22 Nov 2022 16:50 UTC
93 points
64 comments1 min readLW link
(www.science.org)

Devel­op­men­tal Stages of GPTs

orthonormal26 Jul 2020 22:03 UTC
140 points
72 comments7 min readLW link1 review

Stampy’s AI Safety Info—New Distil­la­tions #1 [March 2023]

markov7 Apr 2023 11:06 UTC
42 points
0 comments2 min readLW link
(aisafety.info)

MIRI an­nounces new “Death With Dig­nity” strategy

Eliezer Yudkowsky2 Apr 2022 0:43 UTC
344 points
545 comments18 min readLW link1 review

A tran­script of the TED talk by Eliezer Yudkowsky

Mikhail Samin12 Jul 2023 12:12 UTC
105 points
13 comments4 min readLW link

Bing chat is the AI fire alarm

Ratios17 Feb 2023 6:51 UTC
115 points
63 comments3 min readLW link

Robin Han­son’s lat­est AI risk po­si­tion statement

Liron3 Mar 2023 14:25 UTC
55 points
18 comments1 min readLW link
(www.overcomingbias.com)

AI will change the world, but won’t take it over by play­ing “3-di­men­sional chess”.

22 Nov 2022 18:57 UTC
133 points
97 comments24 min readLW link

A Gym Grid­world En­vi­ron­ment for the Treach­er­ous Turn

Michaël Trazzi28 Jul 2018 21:27 UTC
74 points
9 comments3 min readLW link
(github.com)

Another (outer) al­ign­ment failure story

paulfchristiano7 Apr 2021 20:12 UTC
244 points
38 comments12 min readLW link1 review

Don’t Share In­for­ma­tion Exfo­haz­ardous on Others’ AI-Risk Models

Thane Ruthenis19 Dec 2023 20:09 UTC
67 points
11 comments1 min readLW link

A Path out of In­suffi­cient Views

Unreal24 Sep 2024 20:00 UTC
55 points
46 comments9 min readLW link

In­ter­pret­ing the Learn­ing of Deceit

RogerDearnaley18 Dec 2023 8:12 UTC
30 points
14 comments9 min readLW link

AI Could Defeat All Of Us Combined

HoldenKarnofsky9 Jun 2022 15:50 UTC
170 points
42 comments17 min readLW link
(www.cold-takes.com)

[Question] How likely are sce­nar­ios where AGI ends up overtly or de facto tor­tur­ing us? How likely are sce­nar­ios where AGI pre­vents us from com­mit­ting suicide or dy­ing?

JohnGreer28 Mar 2023 18:00 UTC
11 points
4 comments1 min readLW link

Coun­ter­ar­gu­ments to the ba­sic AI x-risk case

KatjaGrace14 Oct 2022 13:00 UTC
370 points
124 comments34 min readLW link1 review
(aiimpacts.org)

Soft take­off can still lead to de­ci­sive strate­gic advantage

Daniel Kokotajlo23 Aug 2019 16:39 UTC
122 points
47 comments8 min readLW link4 reviews

How good is hu­man­ity at co­or­di­na­tion?

Buck21 Jul 2020 20:01 UTC
82 points
44 comments3 min readLW link

My Ob­jec­tions to “We’re All Gonna Die with Eliezer Yud­kowsky”

Quintin Pope21 Mar 2023 0:06 UTC
357 points
230 comments39 min readLW link

Don’t ac­cel­er­ate prob­lems you’re try­ing to solve

15 Feb 2023 18:11 UTC
100 points
27 comments4 min readLW link

Re­quest to AGI or­ga­ni­za­tions: Share your views on paus­ing AI progress

11 Apr 2023 17:30 UTC
141 points
11 comments1 min readLW link

Should we post­pone AGI un­til we reach safety?

otto.barten18 Nov 2020 15:43 UTC
27 points
36 comments3 min readLW link

DL to­wards the un­al­igned Re­cur­sive Self-Op­ti­miza­tion attractor

jacob_cannell18 Dec 2021 2:15 UTC
32 points
22 comments4 min readLW link

Be­ing at peace with Doom

Johannes C. Mayer9 Apr 2023 14:53 UTC
23 points
14 comments4 min readLW link1 review

An Ap­peal to AI Su­per­in­tel­li­gence: Rea­sons to Pre­serve Humanity

James_Miller18 Mar 2023 16:22 UTC
37 points
73 comments12 min readLW link

The Hid­den Com­plex­ity of Wishes

Eliezer Yudkowsky24 Nov 2007 0:12 UTC
176 points
196 comments8 min readLW link

Alexan­der and Yud­kowsky on AGI goals

24 Jan 2023 21:09 UTC
177 points
53 comments26 min readLW link1 review

“Endgame safety” for AGI

Steven Byrnes24 Jan 2023 14:15 UTC
84 points
10 comments6 min readLW link

On Solv­ing Prob­lems Be­fore They Ap­pear: The Weird Episte­molo­gies of Alignment

adamShimi11 Oct 2021 8:20 UTC
110 points
10 comments15 min readLW link

Devil’s Ad­vo­cate: Ad­verse Selec­tion Against Con­scien­tious­ness

lionhearted (Sebastian Marshall)28 May 2023 17:53 UTC
10 points
2 comments1 min readLW link

An­nounc­ing Apollo Research

30 May 2023 16:17 UTC
217 points
11 comments8 min readLW link

Did Ben­gio and Teg­mark lose a de­bate about AI x-risk against LeCun and Mitchell?

Karl von Wendt25 Jun 2023 16:59 UTC
106 points
53 comments7 min readLW link

Ar­chi­tects of Our Own Demise: We Should Stop Devel­op­ing AI Carelessly

Roko26 Oct 2023 0:36 UTC
176 points
75 comments3 min readLW link

A challenge for AGI or­ga­ni­za­tions, and a challenge for readers

1 Dec 2022 23:11 UTC
301 points
33 comments2 min readLW link

In­tent al­ign­ment should not be the goal for AGI x-risk reduction

John Nay26 Oct 2022 1:24 UTC
1 point
10 comments3 min readLW link

Truth­ful LMs as a warm-up for al­igned AGI

Jacob_Hilton17 Jan 2022 16:49 UTC
65 points
14 comments13 min readLW link

Are min­i­mal cir­cuits de­cep­tive?

evhub7 Sep 2019 18:11 UTC
78 points
11 comments8 min readLW link

A Nar­row Path: a plan to deal with AI ex­tinc­tion risk

7 Oct 2024 13:02 UTC
73 points
11 comments2 min readLW link
(www.narrowpath.co)

Stuxnet, not Skynet: Hu­man­ity’s dis­em­pow­er­ment by AI

Roko4 Nov 2023 22:23 UTC
107 points
24 comments6 min readLW link

Towards the Oper­a­tional­iza­tion of Philos­o­phy & Wisdom

Thane Ruthenis28 Oct 2024 19:45 UTC
20 points
2 comments33 min readLW link
(aiimpacts.org)

“Why can’t you just turn it off?”

Roko19 Nov 2023 14:46 UTC
48 points
25 comments1 min readLW link

The other side of the tidal wave

KatjaGrace3 Nov 2023 5:40 UTC
186 points
86 comments1 min readLW link
(worldspiritsockpuppet.com)

Ban­kless Pod­cast: 159 - We’re All Gonna Die with Eliezer Yudkowsky

bayesed20 Feb 2023 16:42 UTC
83 points
54 comments1 min readLW link
(www.youtube.com)

Assess­ment of AI safety agen­das: think about the down­side risk

Roman Leventov19 Dec 2023 9:00 UTC
13 points
1 comment1 min readLW link

“Hu­man­ity vs. AGI” Will Never Look Like “Hu­man­ity vs. AGI” to Humanity

Thane Ruthenis16 Dec 2023 20:08 UTC
189 points
34 comments5 min readLW link

Cri­tiquing “What failure looks like”

Grue_Slinky27 Dec 2019 23:59 UTC
35 points
6 comments3 min readLW link

[Question] Did AI pi­o­neers not worry much about AI risks?

lisperati9 Feb 2020 19:58 UTC
42 points
9 comments1 min readLW link

Some dis­junc­tive rea­sons for ur­gency on AI risk

Wei Dai15 Feb 2019 20:43 UTC
36 points
24 comments1 min readLW link

Clar­ify­ing some key hy­pothe­ses in AI alignment

15 Aug 2019 21:29 UTC
79 points
12 comments9 min readLW link

AI safety tax dynamics

owencb23 Oct 2024 12:18 UTC
22 points
0 comments6 min readLW link
(strangecities.substack.com)

Do not delete your mis­al­igned AGI.

mako yass24 Mar 2024 21:37 UTC
62 points
13 comments3 min readLW link

An­nounce­ment: AI al­ign­ment prize win­ners and next round

cousin_it15 Jan 2018 14:33 UTC
81 points
68 comments2 min readLW link

Why AI Safety is Hard

Simon Möller22 Mar 2023 10:44 UTC
3 points
0 comments6 min readLW link

Re­ply to Holden on ‘Tool AI’

Eliezer Yudkowsky12 Jun 2012 18:00 UTC
152 points
356 comments17 min readLW link

The first fu­ture and the best future

KatjaGrace25 Apr 2024 6:40 UTC
106 points
12 comments1 min readLW link
(worldspiritsockpuppet.com)

“Care­fully Boot­strapped Align­ment” is or­ga­ni­za­tion­ally hard

Raemon17 Mar 2023 18:00 UTC
262 points
22 comments11 min readLW link

The Over­ton Win­dow widens: Ex­am­ples of AI risk in the media

Akash23 Mar 2023 17:10 UTC
107 points
24 comments6 min readLW link

[AN #80]: Why AI risk might be solved with­out ad­di­tional in­ter­ven­tion from longtermists

Rohin Shah2 Jan 2020 18:20 UTC
36 points
95 comments10 min readLW link
(mailchi.mp)

Against a Gen­eral Fac­tor of Doom

Jeffrey Heninger23 Nov 2022 16:50 UTC
61 points
19 comments4 min readLW link1 review
(aiimpacts.org)

In­cen­tives and Selec­tion: A Miss­ing Frame From AI Threat Dis­cus­sions?

DragonGod26 Feb 2023 1:18 UTC
11 points
16 comments2 min readLW link

World-Model In­ter­pretabil­ity Is All We Need

Thane Ruthenis14 Jan 2023 19:37 UTC
35 points
22 comments21 min readLW link

Pod­cast: Shoshan­nah Tekofsky on skil­ling up in AI safety, vis­it­ing Berkeley, and de­vel­op­ing novel re­search ideas

Akash25 Nov 2022 20:47 UTC
37 points
2 comments9 min readLW link

ea.do­mains—Do­mains Free to a Good Home

plex12 Jan 2023 13:32 UTC
24 points
0 comments1 min readLW link

The Prefer­ence Fulfill­ment Hypothesis

Kaj_Sotala26 Feb 2023 10:55 UTC
66 points
62 comments11 min readLW link

De­bate on In­stru­men­tal Con­ver­gence be­tween LeCun, Rus­sell, Ben­gio, Zador, and More

Ben Pace4 Oct 2019 4:08 UTC
221 points
61 comments15 min readLW link2 reviews

AISN #25: White House Ex­ec­u­tive Order on AI, UK AI Safety Sum­mit, and Progress on Vol­un­tary Eval­u­a­tions of AI Risks

31 Oct 2023 19:34 UTC
35 points
1 comment6 min readLW link
(newsletter.safe.ai)

How Josiah be­came an AI safety researcher

Neil Crawford6 Sep 2022 17:17 UTC
4 points
0 comments1 min readLW link

A con­ver­sa­tion about Katja’s coun­ter­ar­gu­ments to AI risk

18 Oct 2022 18:40 UTC
43 points
9 comments33 min readLW link

Over­sight Misses 100% of Thoughts The AI Does Not Think

johnswentworth12 Aug 2022 16:30 UTC
103 points
49 comments1 min readLW link

Con­fu­sions in My Model of AI Risk

peterbarnett7 Jul 2022 1:05 UTC
22 points
9 comments5 min readLW link

Com­par­ing Four Ap­proaches to In­ner Alignment

Lucas Teixeira29 Jul 2022 21:06 UTC
38 points
1 comment9 min readLW link

Re­sults from the lan­guage model hackathon

Esben Kran10 Oct 2022 8:29 UTC
22 points
1 comment4 min readLW link

AI Safety Micro­grant Round

Chris_Leong14 Nov 2022 4:25 UTC
22 points
1 comment1 min readLW link

What I Learned Run­ning Refine

adamShimi24 Nov 2022 14:49 UTC
108 points
5 comments4 min readLW link

Went­worth and Larsen on buy­ing time

9 Jan 2023 21:31 UTC
73 points
6 comments12 min readLW link

Thoughts on re­fus­ing harm­ful re­quests to large lan­guage models

William_S19 Jan 2023 19:49 UTC
32 points
4 comments2 min readLW link

Many AI gov­er­nance pro­pos­als have a trade­off be­tween use­ful­ness and feasibility

3 Feb 2023 18:49 UTC
22 points
2 comments2 min readLW link

AI Safety Info Distil­la­tion Fellowship

17 Feb 2023 16:16 UTC
47 points
3 comments3 min readLW link

Full Tran­script: Eliezer Yud­kowsky on the Ban­kless podcast

23 Feb 2023 12:34 UTC
138 points
89 comments75 min readLW link

Con­tra Han­son on AI Risk

Liron4 Mar 2023 8:02 UTC
36 points
23 comments8 min readLW link

Plan for mediocre al­ign­ment of brain-like [model-based RL] AGI

Steven Byrnes13 Mar 2023 14:11 UTC
67 points
25 comments12 min readLW link

The Wizard of Oz Prob­lem: How in­cen­tives and nar­ra­tives can skew our per­cep­tion of AI developments

Akash20 Mar 2023 20:44 UTC
16 points
3 comments6 min readLW link

Will GPT-5 be able to self-im­prove?

Nathan Helm-Burger29 Apr 2023 17:34 UTC
18 points
22 comments3 min readLW link

(4 min read) An in­tu­itive ex­pla­na­tion of the AI in­fluence situation

trevor13 Jan 2024 17:34 UTC
12 points
26 comments4 min readLW link

Stan­ford En­cy­clo­pe­dia of Philos­o­phy on AI ethics and superintelligence

Kaj_Sotala2 May 2020 7:35 UTC
43 points
19 comments7 min readLW link
(plato.stanford.edu)

What Failure Looks Like: Distill­ing the Discussion

Ben Pace29 Jul 2020 21:49 UTC
82 points
14 comments7 min readLW link

The strat­egy-steal­ing assumption

paulfchristiano16 Sep 2019 15:23 UTC
86 points
53 comments12 min readLW link3 reviews

AI Safety “Suc­cess Sto­ries”

Wei Dai7 Sep 2019 2:54 UTC
125 points
27 comments4 min readLW link1 review

Drexler on AI Risk

PeterMcCluskey1 Feb 2019 5:11 UTC
35 points
10 comments9 min readLW link
(www.bayesianinvestor.com)

Six AI Risk/​Strat­egy Ideas

Wei Dai27 Aug 2019 0:40 UTC
69 points
17 comments4 min readLW link1 review

“Tak­ing AI Risk Se­ri­ously” (thoughts by Critch)

Raemon29 Jan 2018 9:27 UTC
110 points
68 comments13 min readLW link

Learn­ing so­cietal val­ues from law as part of an AGI al­ign­ment strategy

John Nay21 Oct 2022 2:03 UTC
5 points
18 comments54 min readLW link

Staged release

Zach Stein-Perlman17 Apr 2024 16:00 UTC
11 points
4 comments2 min readLW link

Gra­di­ent Des­cent on the Hu­man Brain

1 Apr 2024 22:39 UTC
57 points
5 comments2 min readLW link

[Question] Good tax­onomies of all risks (small or large) from AI?

Aryeh Englander5 Mar 2024 18:15 UTC
6 points
1 comment1 min readLW link

A Rocket–In­ter­pretabil­ity Analogy

plex21 Oct 2024 13:55 UTC
149 points
31 comments1 min readLW link

Are we drop­ping the ball on Recom­men­da­tion AIs?

Charbel-Raphaël23 Oct 2024 17:48 UTC
40 points
17 comments6 min readLW link

[Linkpost] Scott Alexan­der re­acts to OpenAI’s lat­est post

Akash11 Mar 2023 22:24 UTC
27 points
0 comments5 min readLW link
(astralcodexten.substack.com)

Evil au­to­com­plete: Ex­is­ten­tial Risk and Next-To­ken Predictors

Yitz28 Feb 2023 8:47 UTC
9 points
3 comments5 min readLW link

Are we there yet?

theflowerpot20 Jun 2022 11:19 UTC
2 points
2 comments1 min readLW link

A (EtA: quick) note on ter­minol­ogy: AI Align­ment != AI x-safety

David Scott Krueger (formerly: capybaralet)8 Feb 2023 22:33 UTC
46 points
20 comments1 min readLW link

AI com­pa­nies aren’t re­ally us­ing ex­ter­nal evaluators

Zach Stein-Perlman24 May 2024 16:01 UTC
240 points
15 comments4 min readLW link

List your AI X-Risk cruxes!

Aryeh Englander28 Apr 2024 18:26 UTC
40 points
7 comments2 min readLW link

How dan­ger­ous is hu­man-level AI?

Alex_Altair10 Jun 2022 17:38 UTC
21 points
4 comments8 min readLW link

Con­fu­sion about neu­ro­science/​cog­ni­tive sci­ence as a dan­ger for AI Alignment

Samuel Nellessen22 Jun 2022 17:59 UTC
2 points
1 comment3 min readLW link
(snellessen.com)

I’m try­ing out “as­ter­oid mind­set”

Alex_Altair3 Jun 2022 13:35 UTC
90 points
5 comments4 min readLW link

Com­plex Sys­tems for AI Safety [Prag­matic AI Safety #3]

24 May 2022 0:00 UTC
58 points
3 comments21 min readLW link

Episte­molog­i­cal Vigilance for Alignment

adamShimi6 Jun 2022 0:27 UTC
66 points
11 comments10 min readLW link

Em­pow­er­ment is (al­most) All We Need

jacob_cannell23 Oct 2022 21:48 UTC
61 points
44 comments17 min readLW link

[Question] Would (my­opic) gen­eral pub­lic good pro­duc­ers sig­nifi­cantly ac­cel­er­ate the de­vel­op­ment of AGI?

mako yass2 Mar 2022 23:47 UTC
25 points
10 comments1 min readLW link

More Is Differ­ent for AI

jsteinhardt4 Jan 2022 19:30 UTC
140 points
24 comments3 min readLW link1 review
(bounded-regret.ghost.io)

AXRP Epi­sode 13 - First Prin­ci­ples of AGI Safety with Richard Ngo

DanielFilan31 Mar 2022 5:20 UTC
24 points
1 comment48 min readLW link

Fram­ing ap­proaches to al­ign­ment and the hard prob­lem of AI cognition

ryan_greenblatt15 Dec 2021 19:06 UTC
16 points
15 comments27 min readLW link

My cur­rent un­cer­tain­ties re­gard­ing AI, al­ign­ment, and the end of the world

dominicq14 Nov 2021 14:08 UTC
2 points
3 comments2 min readLW link

Some ab­stract, non-tech­ni­cal rea­sons to be non-max­i­mally-pes­simistic about AI alignment

Rob Bensinger12 Dec 2021 2:08 UTC
70 points
35 comments7 min readLW link

It Looks Like You’re Try­ing To Take Over The World

gwern9 Mar 2022 16:35 UTC
407 points
120 comments1 min readLW link1 review
(www.gwern.net)

Epistemic Strate­gies of Selec­tion Theorems

adamShimi18 Oct 2021 8:57 UTC
33 points
1 comment12 min readLW link

[Question] Con­di­tional on the first AGI be­ing al­igned cor­rectly, is a good out­come even still likely?

iamthouthouarti6 Sep 2021 17:30 UTC
2 points
1 comment1 min readLW link

Epistemic Strate­gies of Safety-Ca­pa­bil­ities Tradeoffs

adamShimi22 Oct 2021 8:22 UTC
5 points
0 comments6 min readLW link

En­vi­ron­men­tal Struc­ture Can Cause In­stru­men­tal Convergence

TurnTrout22 Jun 2021 22:26 UTC
71 points
43 comments16 min readLW link
(arxiv.org)

Less Real­is­tic Tales of Doom

Mark Xu6 May 2021 23:01 UTC
113 points
13 comments4 min readLW link

Alex Turner’s Re­search, Com­pre­hen­sive In­for­ma­tion Gathering

adamShimi23 Jun 2021 9:44 UTC
15 points
3 comments3 min readLW link

Drug ad­dicts and de­cep­tively al­igned agents—a com­par­a­tive analysis

Jan5 Nov 2021 21:42 UTC
42 points
2 comments12 min readLW link
(universalprior.substack.com)

[RETRACTED] It’s time for EA lead­er­ship to pull the short-timelines fire alarm.

Not Relevant8 Apr 2022 16:07 UTC
109 points
163 comments4 min readLW link

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [~monthly thread]

26 Jan 2023 21:01 UTC
39 points
81 comments2 min readLW link

Google’s Eth­i­cal AI team and AI Safety

magfrump20 Feb 2021 9:42 UTC
12 points
16 comments7 min readLW link

The AI Safety Game (UPDATED)

Daniel Kokotajlo5 Dec 2020 10:27 UTC
45 points
10 comments3 min readLW link

Thoughts on Robin Han­son’s AI Im­pacts interview

Steven Byrnes24 Nov 2019 1:40 UTC
25 points
3 comments7 min readLW link

AISN #24: Kiss­inger Urges US-China Co­op­er­a­tion on AI, China’s New AI Law, US Ex­port Con­trols, In­ter­na­tional In­sti­tu­tions, and Open Source AI

18 Oct 2023 17:06 UTC
14 points
0 comments6 min readLW link
(newsletter.safe.ai)

Be­hav­ioral Suffi­cient Statis­tics for Goal-Directedness

adamShimi11 Mar 2021 15:01 UTC
21 points
12 comments9 min readLW link

The Short­est Path Between Scylla and Charybdis

Thane Ruthenis18 Dec 2023 20:08 UTC
50 points
8 comments5 min readLW link

OpenAI: Fallout

Zvi28 May 2024 13:20 UTC
204 points
25 comments36 min readLW link
(thezvi.wordpress.com)

[Question] Sugges­tions of posts on the AF to review

adamShimi16 Feb 2021 12:40 UTC
56 points
20 comments1 min readLW link

The ba­sic rea­sons I ex­pect AGI ruin

Rob Bensinger18 Apr 2023 3:37 UTC
186 points
73 comments14 min readLW link

25 Min Talk on Me­taEth­i­cal.AI with Ques­tions from Stu­art Armstrong

June Ku29 Apr 2021 15:38 UTC
21 points
7 comments1 min readLW link

Rogue AGI Em­bod­ies Valuable In­tel­lec­tual Property

3 Jun 2021 20:37 UTC
71 points
9 comments3 min readLW link

How difficult is AI Align­ment?

Sammy Martin13 Sep 2024 15:47 UTC
43 points
6 comments23 min readLW link

Ap­proaches to gra­di­ent hacking

adamShimi14 Aug 2021 15:16 UTC
16 points
8 comments8 min readLW link

Why the tech­nolog­i­cal sin­gu­lar­ity by AGI may never happen

hippke3 Sep 2021 14:19 UTC
5 points
14 comments1 min readLW link

[Book Re­view] “The Align­ment Prob­lem” by Brian Christian

lsusr20 Sep 2021 6:36 UTC
70 points
16 comments6 min readLW link

Is progress in ML-as­sisted the­o­rem-prov­ing benefi­cial?

mako yass28 Sep 2021 1:54 UTC
11 points
3 comments1 min readLW link

Us­ing Brain-Com­puter In­ter­faces to get more data for AI alignment

Robbo7 Nov 2021 0:00 UTC
43 points
10 comments7 min readLW link

My guess at Con­jec­ture’s vi­sion: trig­ger­ing a nar­ra­tive bifurcation

Alexandre Variengien6 Feb 2024 19:10 UTC
75 points
12 comments16 min readLW link

Ngo and Yud­kowsky on al­ign­ment difficulty

15 Nov 2021 20:31 UTC
253 points
151 comments99 min readLW link1 review

[Question] Does the Struc­ture of an al­gorithm mat­ter for AI Risk and/​or con­scious­ness?

Logan Zoellner3 Dec 2021 18:31 UTC
7 points
4 comments1 min readLW link

My Overview of the AI Align­ment Land­scape: A Bird’s Eye View

Neel Nanda15 Dec 2021 23:44 UTC
127 points
9 comments15 min readLW link

AI Fire Alarm Scenarios

PeterMcCluskey28 Dec 2021 2:20 UTC
10 points
1 comment6 min readLW link
(www.bayesianinvestor.com)

Thoughts on AGI safety from the top

jylin042 Feb 2022 20:06 UTC
36 points
3 comments32 min readLW link

How I Formed My Own Views About AI Safety

Neel Nanda27 Feb 2022 18:50 UTC
64 points
6 comments13 min readLW link
(www.neelnanda.io)

AMA Con­jec­ture, A New Align­ment Startup

adamShimi9 Apr 2022 9:43 UTC
47 points
42 comments1 min readLW link

Why I’m Wor­ried About AI

peterbarnett23 May 2022 21:13 UTC
22 points
2 comments12 min readLW link

Perform Tractable Re­search While Avoid­ing Ca­pa­bil­ities Ex­ter­nal­ities [Prag­matic AI Safety #4]

30 May 2022 20:25 UTC
51 points
3 comments25 min readLW link

Con­fused why a “ca­pa­bil­ities re­search is good for al­ign­ment progress” po­si­tion isn’t dis­cussed more

Kaj_Sotala2 Jun 2022 21:41 UTC
129 points
27 comments4 min readLW link

A Quick Guide to Con­fronting Doom

Ruby13 Apr 2022 19:30 UTC
240 points
33 comments2 min readLW link

Another plau­si­ble sce­nario of AI risk: AI builds mil­i­tary in­fras­truc­ture while col­lab­o­rat­ing with hu­mans, defects later.

avturchin10 Jun 2022 17:24 UTC
10 points
2 comments1 min readLW link

Slow mo­tion videos as AI risk in­tu­ition pumps

Andrew_Critch14 Jun 2022 19:31 UTC
238 points
41 comments2 min readLW link1 review

[Question] Has there been any work on at­tempt­ing to use Pas­cal’s Mug­ging to make an AGI be­have?

Chris_Leong15 Jun 2022 8:33 UTC
7 points
17 comments1 min readLW link

The Plan − 2023 Version

johnswentworth29 Dec 2023 23:34 UTC
146 points
40 comments31 min readLW link

Ro­bust­ness to Scal­ing Down: More Im­por­tant Than I Thought

adamShimi23 Jul 2022 11:40 UTC
38 points
5 comments3 min readLW link

Sur­vey: What (de)mo­ti­vates you about AI risk?

Daniel_Friedrich3 Aug 2022 19:17 UTC
1 point
0 comments1 min readLW link
(forms.gle)

Thoughts on ‘List of Lethal­ities’

Alex Lawsen 17 Aug 2022 18:33 UTC
27 points
0 comments10 min readLW link

What if we ap­proach AI safety like a tech­ni­cal en­g­ineer­ing safety problem

zeshen20 Aug 2022 10:29 UTC
36 points
4 comments7 min readLW link

Beyond Hyperanthropomorphism

PointlessOne21 Aug 2022 17:55 UTC
3 points
17 comments1 min readLW link
(studio.ribbonfarm.com)

An­nounc­ing AISIC 2022 - the AI Safety Is­rael Con­fer­ence, Oc­to­ber 19-20

Davidmanheim21 Sep 2022 19:32 UTC
13 points
0 comments1 min readLW link

Paper: Tell, Don’t Show- Declar­a­tive facts in­fluence how LLMs generalize

19 Dec 2023 19:14 UTC
45 points
4 comments6 min readLW link
(arxiv.org)

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [~monthly thread]

Robert Miles1 Nov 2022 23:23 UTC
68 points
105 comments2 min readLW link

Poster Ses­sion on AI Safety

Neil Crawford12 Nov 2022 3:50 UTC
7 points
6 comments1 min readLW link

RA Bounty: Look­ing for feed­back on screen­play about AI Risk

Writer26 Oct 2023 13:23 UTC
30 points
6 comments1 min readLW link

AI Ne­o­re­al­ism: a threat model & suc­cess crite­rion for ex­is­ten­tial safety

davidad15 Dec 2022 13:42 UTC
67 points
1 comment3 min readLW link

Rac­ing through a minefield: the AI de­ploy­ment problem

HoldenKarnofsky22 Dec 2022 16:10 UTC
38 points
2 comments13 min readLW link
(www.cold-takes.com)

My thoughts on OpenAI’s al­ign­ment plan

Akash30 Dec 2022 19:33 UTC
55 points
3 comments20 min readLW link

How we could stum­ble into AI catastrophe

HoldenKarnofsky13 Jan 2023 16:20 UTC
71 points
18 comments18 min readLW link
(www.cold-takes.com)

[Question] Should peo­ple build pro­duc­ti­za­tions of open source AI mod­els?

lc2 Nov 2023 1:26 UTC
23 points
0 comments1 min readLW link

Cortés, AI Risk, and the Dy­nam­ics of Com­pet­ing Conquerors

James_Miller2 Jan 2024 16:37 UTC
14 points
2 comments3 min readLW link

Ta­boo P(doom)

NathanBarnard3 Feb 2023 10:37 UTC
14 points
10 comments1 min readLW link

How evals might (or might not) pre­vent catas­trophic risks from AI

Akash7 Feb 2023 20:16 UTC
45 points
0 comments9 min readLW link

Many im­por­tant tech­nolo­gies start out as sci­ence fic­tion be­fore be­com­ing real

trevor10 Feb 2023 9:36 UTC
28 points
2 comments2 min readLW link

4 ways to think about de­moc­ra­tiz­ing AI [GovAI Linkpost]

Akash13 Feb 2023 18:06 UTC
24 points
4 comments1 min readLW link
(www.governance.ai)

AI #1: Syd­ney and Bing

Zvi21 Feb 2023 14:00 UTC
171 points
45 comments61 min readLW link1 review
(thezvi.wordpress.com)

Con­tra “Strong Co­her­ence”

DragonGod4 Mar 2023 20:05 UTC
39 points
24 comments1 min readLW link

My mo­ti­va­tion and the­ory of change for work­ing in AI healthtech

Andrew_Critch12 Oct 2024 0:36 UTC
169 points
37 comments14 min readLW link

Ques­tions about Con­je­cure’s CoEm proposal

9 Mar 2023 19:32 UTC
51 points
4 comments2 min readLW link

An AI risk ar­gu­ment that res­onates with NYTimes readers

Julian Bradshaw12 Mar 2023 23:09 UTC
211 points
14 comments1 min readLW link

ChatGPT (and now GPT4) is very eas­ily dis­tracted from its rules

dmcs15 Mar 2023 17:55 UTC
180 points
42 comments1 min readLW link

[Question] First and Last Ques­tions for GPT-5*

Mitchell_Porter24 Nov 2023 5:03 UTC
15 points
5 comments1 min readLW link

Pro­jects I would like to see (pos­si­bly at AI Safety Camp)

Linda Linsefors27 Sep 2023 21:27 UTC
22 points
12 comments4 min readLW link

But ex­actly how com­plex and frag­ile?

KatjaGrace3 Nov 2019 18:20 UTC
87 points
32 comments3 min readLW link1 review
(meteuphoric.com)

In­for­ma­tion war­fare his­tor­i­cally re­volved around hu­man conduits

trevor28 Aug 2023 18:54 UTC
37 points
7 comments3 min readLW link

The AI Revolu­tion in Biology

Roman Leventov26 May 2024 9:30 UTC
13 points
0 comments1 min readLW link
(www.cognitiverevolution.ai)

[Question] Why not con­strain wet­labs in­stead of AI?

Lone Pine21 Mar 2023 18:02 UTC
15 points
10 comments1 min readLW link

AGI Safety Liter­a­ture Re­view (Ever­itt, Lea & Hut­ter 2018)

Kaj_Sotala4 May 2018 8:56 UTC
14 points
1 comment1 min readLW link
(arxiv.org)

How Would an Utopia-Max­i­mizer Look Like?

Thane Ruthenis20 Dec 2023 20:01 UTC
31 points
23 comments10 min readLW link

MIRI’s April 2024 Newsletter

Harlan12 Apr 2024 23:38 UTC
95 points
0 comments3 min readLW link
(intelligence.org)

The Main Sources of AI Risk?

21 Mar 2019 18:28 UTC
121 points
26 comments2 min readLW link

Think­ing soberly about the con­text and con­se­quences of Friendly AI

Mitchell_Porter16 Oct 2012 4:33 UTC
21 points
39 comments1 min readLW link

New vol­un­tary com­mit­ments (AI Seoul Sum­mit)

Zach Stein-Perlman21 May 2024 11:00 UTC
81 points
17 comments7 min readLW link
(www.gov.uk)

Non-Ad­ver­sar­ial Good­hart and AI Risks

Davidmanheim27 Mar 2018 1:39 UTC
22 points
11 comments6 min readLW link

Disen­tan­gling ar­gu­ments for the im­por­tance of AI safety

Richard_Ngo21 Jan 2019 12:41 UTC
133 points
23 comments8 min readLW link

A shift in ar­gu­ments for AI risk

Richard_Ngo28 May 2019 13:47 UTC
32 points
7 comments1 min readLW link
(fragile-credences.github.io)

AI risk hub in Sin­ga­pore?

Daniel Kokotajlo29 Oct 2020 11:45 UTC
58 points
18 comments4 min readLW link

[Question] Mea­sure of com­plex­ity al­lowed by the laws of the uni­verse and rel­a­tive the­ory?

dr_s7 Sep 2023 12:21 UTC
8 points
22 comments1 min readLW link

Some con­cep­tual high­lights from “Disjunc­tive Sce­nar­ios of Catas­trophic AI Risk”

Kaj_Sotala12 Feb 2018 12:30 UTC
45 points
4 comments6 min readLW link
(kajsotala.fi)

AISN #23: New OpenAI Models, News from An­thropic, and Rep­re­sen­ta­tion Engineering

4 Oct 2023 17:37 UTC
15 points
2 comments5 min readLW link
(newsletter.safe.ai)

Re­view of “Fun with +12 OOMs of Com­pute”

28 Mar 2021 14:55 UTC
65 points
21 comments8 min readLW link1 review

Risks from Learned Op­ti­miza­tion: Introduction

31 May 2019 23:44 UTC
185 points
42 comments12 min readLW link3 reviews

Uber Self-Driv­ing Crash

jefftk7 Nov 2019 15:00 UTC
109 points
1 comment2 min readLW link
(www.jefftk.com)

Re­laxed ad­ver­sar­ial train­ing for in­ner alignment

evhub10 Sep 2019 23:03 UTC
69 points
27 comments27 min readLW link

Risks from Learned Op­ti­miza­tion: Con­clu­sion and Re­lated Work

7 Jun 2019 19:53 UTC
82 points
5 comments6 min readLW link

Ar­tifi­cial In­tel­li­gence: A Modern Ap­proach (4th edi­tion) on the Align­ment Problem

Zack_M_Davis17 Sep 2020 2:23 UTC
72 points
12 comments5 min readLW link
(aima.cs.berkeley.edu)

Win­ners of AI Align­ment Awards Re­search Contest

13 Jul 2023 16:14 UTC
115 points
4 comments12 min readLW link
(alignmentawards.com)

A short­com­ing of con­crete demon­stra­tions as AGI risk advocacy

Steven Byrnes11 Dec 2024 16:48 UTC
103 points
27 comments2 min readLW link

Clar­ify­ing “What failure looks like”

Sam Clarke20 Sep 2020 20:40 UTC
97 points
14 comments17 min readLW link

De­cep­tive Alignment

5 Jun 2019 20:16 UTC
118 points
20 comments17 min readLW link

Ten Levels of AI Align­ment Difficulty

Sammy Martin3 Jul 2023 20:20 UTC
121 points
14 comments12 min readLW link

ARC tests to see if GPT-4 can es­cape hu­man con­trol; GPT-4 failed to do so

Christopher King15 Mar 2023 0:29 UTC
116 points
22 comments2 min readLW link

Dou­glas Hofs­tadter changes his mind on Deep Learn­ing & AI risk (June 2023)?

gwern3 Jul 2023 0:48 UTC
425 points
54 comments7 min readLW link
(www.youtube.com)

Against ubiquitous al­ign­ment taxes

beren6 Mar 2023 19:50 UTC
56 points
10 comments2 min readLW link

AI #2

Zvi2 Mar 2023 14:50 UTC
66 points
18 comments55 min readLW link
(thezvi.wordpress.com)

Levels of safety for AI and other technologies

jasoncrawford28 Jun 2023 18:35 UTC
16 points
0 comments2 min readLW link
(rootsofprogress.org)

My re­search agenda in agent foundations

Alex_Altair28 Jun 2023 18:00 UTC
71 points
9 comments11 min readLW link

I Think Eliezer Should Go on Glenn Beck

Lao Mein30 Jun 2023 3:12 UTC
29 points
21 comments1 min readLW link

A case for AI al­ign­ment be­ing difficult

jessicata31 Dec 2023 19:55 UTC
105 points
58 comments15 min readLW link1 review
(unstableontology.com)

Talk to me about your sum­mer/​ca­reer plans

Akash31 Jan 2023 18:29 UTC
31 points
3 comments2 min readLW link

The In­ner Align­ment Problem

4 Jun 2019 1:20 UTC
103 points
17 comments13 min readLW link

[Linkpost] TIME ar­ti­cle: Deep­Mind’s CEO Helped Take AI Main­stream. Now He’s Urg­ing Caution

Akash21 Jan 2023 16:51 UTC
58 points
2 comments3 min readLW link
(time.com)

April drafts

AI Impacts1 Apr 2021 18:10 UTC
49 points
2 comments1 min readLW link
(aiimpacts.org)

A plea for solu­tion­ism on AI safety

jasoncrawford9 Jun 2023 16:29 UTC
72 points
6 comments6 min readLW link
(rootsofprogress.org)

Min­i­mum Vi­able Exterminator

Richard Horvath29 May 2023 16:32 UTC
14 points
5 comments5 min readLW link

Catas­trophic Risks from AI #1: Introduction

22 Jun 2023 17:09 UTC
40 points
1 comment5 min readLW link
(arxiv.org)

Why and When In­ter­pretabil­ity Work is Dangerous

Nicholas / Heather Kross28 May 2023 0:27 UTC
20 points
9 comments8 min readLW link
(www.thinkingmuchbetter.com)

Work on Se­cu­rity In­stead of Friendli­ness?

Wei Dai21 Jul 2012 18:28 UTC
71 points
107 comments2 min readLW link

Hands-On Ex­pe­rience Is Not Magic

Thane Ruthenis27 May 2023 16:57 UTC
21 points
14 comments5 min readLW link

Catas­trophic Risks from AI #2: Mal­i­cious Use

22 Jun 2023 17:10 UTC
38 points
1 comment17 min readLW link
(arxiv.org)

The Fu­sion Power Gen­er­a­tor Scenario

johnswentworth8 Aug 2020 18:31 UTC
142 points
30 comments3 min readLW link

Three Sto­ries for How AGI Comes Be­fore FAI

John_Maxwell17 Sep 2019 23:26 UTC
27 points
5 comments6 min readLW link

AI Align­ment 2018-19 Review

Rohin Shah28 Jan 2020 2:19 UTC
126 points
6 comments35 min readLW link

Why don’t sin­gu­lar­i­tar­i­ans bet on the cre­ation of AGI by buy­ing stocks?

John_Maxwell11 Mar 2020 16:27 UTC
43 points
21 comments4 min readLW link

Wor­ri­some mi­s­un­der­stand­ing of the core is­sues with AI transition

Roman Leventov18 Jan 2024 10:05 UTC
5 points
2 comments4 min readLW link

AISN #35: Lob­by­ing on AI Reg­u­la­tion Plus, New Models from OpenAI and Google, and Le­gal Regimes for Train­ing on Copy­righted Data

16 May 2024 14:29 UTC
2 points
3 comments6 min readLW link
(newsletter.safe.ai)

The prob­lem/​solu­tion ma­trix: Calcu­lat­ing the prob­a­bil­ity of AI safety “on the back of an en­velope”

John_Maxwell20 Oct 2019 8:03 UTC
22 points
4 comments2 min readLW link

A guide to Iter­ated Am­plifi­ca­tion & Debate

Rafael Harth15 Nov 2020 17:14 UTC
75 points
12 comments15 min readLW link

Catas­trophic Risks from AI #3: AI Race

23 Jun 2023 19:21 UTC
18 points
9 comments29 min readLW link
(arxiv.org)

AISafety.com – Re­sources for AI Safety

17 May 2024 15:57 UTC
81 points
3 comments1 min readLW link

AI Safety Newslet­ter #7: Dis­in­for­ma­tion, Gover­nance Recom­men­da­tions for AI labs, and Se­nate Hear­ings on AI

23 May 2023 21:47 UTC
25 points
0 comments6 min readLW link
(newsletter.safe.ai)

AI Doom Is Not (Only) Disjunctive

NickGabs30 Mar 2023 1:42 UTC
12 points
0 comments5 min readLW link

Brain­storm­ing ad­di­tional AI risk re­duc­tion ideas

John_Maxwell14 Jun 2012 7:55 UTC
19 points
37 comments1 min readLW link

My AI Model Delta Com­pared To Yudkowsky

johnswentworth10 Jun 2024 16:12 UTC
277 points
102 comments4 min readLW link

Wizards and prophets of AI [draft for com­ment]

jasoncrawford31 Mar 2023 20:22 UTC
16 points
11 comments6 min readLW link

Against “ar­gu­ment from over­hang risk”

RobertM16 May 2024 4:44 UTC
30 points
11 comments5 min readLW link

Re­quest: stop ad­vanc­ing AI capabilities

So8res26 May 2023 17:42 UTC
153 points
24 comments1 min readLW link

AI Safety Newslet­ter #6: Ex­am­ples of AI safety progress, Yoshua Ben­gio pro­poses a ban on AI agents, and les­sons from nu­clear arms control

16 May 2023 15:14 UTC
31 points
0 comments6 min readLW link
(newsletter.safe.ai)

Safety isn’t safety with­out a so­cial model (or: dis­pel­ling the myth of per se tech­ni­cal safety)

Andrew_Critch14 Jun 2024 0:16 UTC
346 points
38 comments4 min readLW link

The case for re­mov­ing al­ign­ment and ML re­search from the train­ing dataset

beren30 May 2023 20:54 UTC
48 points
8 comments5 min readLW link

The Con­trol Prob­lem: Un­solved or Un­solv­able?

Remmelt2 Jun 2023 15:42 UTC
54 points
46 comments14 min readLW link

A moral back­lash against AI will prob­a­bly slow down AGI development

geoffreymiller7 Jun 2023 20:39 UTC
51 points
10 comments14 min readLW link

Difficul­ties in mak­ing pow­er­ful al­igned AI

DanielFilan14 May 2023 20:50 UTC
41 points
1 comment10 min readLW link
(danielfilan.com)

De­tach­ment vs at­tach­ment [AI risk and men­tal health]

Neil 15 Jan 2024 0:41 UTC
14 points
4 comments3 min readLW link

Catas­trophic Risks from AI #6: Dis­cus­sion and FAQ

27 Jun 2023 23:23 UTC
24 points
1 comment13 min readLW link
(arxiv.org)

Catas­trophic Risks from AI #5: Rogue AIs

27 Jun 2023 22:06 UTC
15 points
0 comments22 min readLW link
(arxiv.org)

An un­al­igned benchmark

paulfchristiano17 Nov 2018 15:51 UTC
31 points
0 comments9 min readLW link

Orthog­o­nal­ity is expensive

beren3 Apr 2023 10:20 UTC
43 points
9 comments3 min readLW link

Let’s build a fire alarm for AGI

chaosmage15 May 2023 9:16 UTC
−1 points
0 comments2 min readLW link

If you wish to make an ap­ple pie, you must first be­come dic­ta­tor of the universe

jasoncrawford5 Jul 2023 18:14 UTC
27 points
9 comments13 min readLW link
(rootsofprogress.org)

Eight Short Stud­ies On Excuses

Scott Alexander20 Apr 2010 23:01 UTC
836 points
253 comments10 min readLW link

Ac­ti­va­tion ad­di­tions in a sim­ple MNIST network

Garrett Baker18 May 2023 2:49 UTC
26 points
0 comments2 min readLW link

Com­plex Sys­tems are Hard to Control

jsteinhardt4 Apr 2023 0:00 UTC
42 points
5 comments10 min readLW link
(bounded-regret.ghost.io)

New sur­vey: 46% of Amer­i­cans are con­cerned about ex­tinc­tion from AI; 69% sup­port a six-month pause in AI development

Akash5 Apr 2023 1:26 UTC
46 points
9 comments1 min readLW link
(today.yougov.com)

Thoughts on shar­ing in­for­ma­tion about lan­guage model capabilities

paulfchristiano31 Jul 2023 16:04 UTC
208 points
44 comments11 min readLW link1 review

Will Ar­tifi­cial Su­per­in­tel­li­gence Kill Us?

James_Miller23 May 2023 16:27 UTC
33 points
2 comments22 min readLW link

An overview of 11 pro­pos­als for build­ing safe ad­vanced AI

evhub29 May 2020 20:38 UTC
213 points
36 comments38 min readLW link2 reviews

In­vi­ta­tion to lead a pro­ject at AI Safety Camp (Vir­tual Edi­tion, 2025)

23 Aug 2024 14:18 UTC
17 points
2 comments4 min readLW link

Con­di­tions for Mesa-Optimization

1 Jun 2019 20:52 UTC
84 points
48 comments12 min readLW link

Risks from AI Overview: Summary

18 Aug 2023 1:21 UTC
25 points
1 comment13 min readLW link
(www.safe.ai)

Paper: On mea­sur­ing situ­a­tional aware­ness in LLMs

4 Sep 2023 12:54 UTC
108 points
16 comments5 min readLW link
(arxiv.org)

“If we go ex­tinct due to mis­al­igned AI, at least na­ture will con­tinue, right? … right?”

plex18 May 2024 14:09 UTC
47 points
23 comments2 min readLW link
(aisafety.info)

Ap­ply to lead a pro­ject dur­ing the next vir­tual AI Safety Camp

13 Sep 2023 13:29 UTC
19 points
0 comments5 min readLW link
(aisafety.camp)

Microdooms averted by work­ing on AI Safety

Nikola Jurkovic17 Sep 2023 21:46 UTC
31 points
2 comments3 min readLW link
(forum.effectivealtruism.org)

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [April 2023]

steven04618 Apr 2023 4:21 UTC
57 points
88 comments2 min readLW link

Paus­ing AI is Pos­i­tive Ex­pected Value

Liron10 Mar 2024 17:10 UTC
8 points
2 comments3 min readLW link
(twitter.com)

Four lenses on AI risks

jasoncrawford28 Mar 2023 21:52 UTC
23 points
5 comments3 min readLW link
(rootsofprogress.org)

Reli­a­bil­ity, Se­cu­rity, and AI risk: Notes from in­fosec text­book chap­ter 1

Akash7 Apr 2023 15:47 UTC
34 points
1 comment4 min readLW link

n=3 AI Risk Quick Math and Reasoning

lionhearted (Sebastian Marshall)7 Apr 2023 20:27 UTC
6 points
3 comments4 min readLW link

All images from the WaitButWhy se­quence on AI

trevor8 Apr 2023 7:36 UTC
73 points
5 comments2 min readLW link

AI Safety is Drop­ping the Ball on Clown Attacks

trevor22 Oct 2023 20:09 UTC
65 points
78 comments34 min readLW link

[Question] What does it look like for AI to sig­nifi­cantly im­prove hu­man co­or­di­na­tion, be­fore su­per­in­tel­li­gence?

jacobjacob15 Jan 2024 19:22 UTC
22 points
2 comments1 min readLW link

[Question] Why don’t quan­tiliz­ers also cut off the up­per end of the dis­tri­bu­tion?

Alex_Altair15 May 2023 1:40 UTC
25 points
2 comments1 min readLW link

“warn­ing about ai doom” is also “an­nounc­ing ca­pa­bil­ities progress to noobs”

the gears to ascension8 Apr 2023 23:42 UTC
23 points
5 comments3 min readLW link

[Question] Is this a Pivotal Weak Act? Creat­ing bac­te­ria that de­com­pose metal

doomyeser11 Sep 2024 18:07 UTC
9 points
9 comments3 min readLW link

AI Safety Newslet­ter #1 [CAIS Linkpost]

10 Apr 2023 20:18 UTC
45 points
0 comments4 min readLW link
(newsletter.safe.ai)

Con­sider the hum­ble rock (or: why the dumb thing kills you)

pleiotroth4 Jul 2024 13:54 UTC
60 points
11 comments4 min readLW link

Adam Smith Meets AI Doomers

James_Miller31 Jan 2024 15:53 UTC
34 points
10 comments5 min readLW link

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [May 2023]

steven04618 May 2023 22:30 UTC
33 points
44 comments2 min readLW link

Where’s the foom?

Fergus Fettes11 Apr 2023 15:50 UTC
34 points
27 comments2 min readLW link

Cur­rent AIs Provide Nearly No Data Rele­vant to AGI Alignment

Thane Ruthenis15 Dec 2023 20:16 UTC
124 points
157 comments8 min readLW link1 review

Sam Alt­man and Ezra Klein on the AI Revolution

Zack_M_Davis27 Jun 2021 4:53 UTC
38 points
17 comments1 min readLW link
(www.nytimes.com)

[Link] Sarah Con­stantin: “Why I am Not An AI Doomer”

lbThingrb12 Apr 2023 1:52 UTC
61 points
13 comments1 min readLW link
(sarahconstantin.substack.com)

[Question] What are good al­ign­ment con­fer­ence pa­pers?

adamShimi28 Aug 2021 13:35 UTC
12 points
2 comments1 min readLW link

AI al­ign­ment as a trans­la­tion problem

Roman Leventov5 Feb 2024 14:14 UTC
22 points
2 comments3 min readLW link

Bayeswatch 7: Wildfire

lsusr8 Sep 2021 5:35 UTC
50 points
6 comments3 min readLW link

[Question] How much do per­sonal bi­ases in risk as­sess­ment af­fect as­sess­ment of AI risks?

Gordon Seidoh Worley3 May 2023 6:12 UTC
10 points
8 comments1 min readLW link

In­ter­view with Skynet

lsusr30 Sep 2021 2:20 UTC
49 points
1 comment2 min readLW link

Fi­nan­cial Times: We must slow down the race to God-like AI

trevor13 Apr 2023 19:55 UTC
113 points
17 comments16 min readLW link
(www.ft.com)

My dis­agree­ments with “AGI ruin: A List of Lethal­ities”

Noosphere8915 Sep 2024 17:22 UTC
36 points
46 comments18 min readLW link

Re­sponse to Aschen­bren­ner’s “Si­tu­a­tional Aware­ness”

Rob Bensinger6 Jun 2024 22:57 UTC
194 points
27 comments3 min readLW link

Us­ing blin­ders to help you see things for what they are

Adam Zerner11 Nov 2021 7:07 UTC
13 points
2 comments2 min readLW link

An­nounc­ing Hu­man-al­igned AI Sum­mer School

22 May 2024 8:55 UTC
50 points
0 comments1 min readLW link
(humanaligned.ai)

Top les­son from GPT: we will prob­a­bly de­stroy hu­man­ity “for the lulz” as soon as we are able.

Shmi16 Apr 2023 20:27 UTC
63 points
28 comments1 min readLW link

AI Takeover Sce­nario with Scaled LLMs

simeon_c16 Apr 2023 23:28 UTC
42 points
15 comments8 min readLW link

Ap­pli­ca­tions for AI Safety Camp 2022 Now Open!

adamShimi17 Nov 2021 21:42 UTC
47 points
3 comments1 min readLW link

Most Peo­ple Don’t Real­ize We Have No Idea How Our AIs Work

Thane Ruthenis21 Dec 2023 20:02 UTC
158 points
42 comments1 min readLW link

Sum­mary of the Acausal At­tack Is­sue for AIXI

Diffractor13 Dec 2021 8:16 UTC
12 points
6 comments4 min readLW link

My Overview of the AI Align­ment Land­scape: Threat Models

Neel Nanda25 Dec 2021 23:07 UTC
53 points
3 comments28 min readLW link

Challenges with Break­ing into MIRI-Style Research

Chris_Leong17 Jan 2022 9:23 UTC
75 points
15 comments3 min readLW link

Paradigm-build­ing from first prin­ci­ples: Effec­tive al­tru­ism, AGI, and alignment

Cameron Berg8 Feb 2022 16:12 UTC
29 points
5 comments14 min readLW link

Paradigm-build­ing: Introduction

Cameron Berg8 Feb 2022 0:06 UTC
28 points
0 comments2 min readLW link

AI Safety Memes Wiki

24 Jul 2024 18:53 UTC
33 points
1 comment1 min readLW link
(aisafety.info)

Gaia Net­work: a prac­ti­cal, in­cre­men­tal path­way to Open Agency Architecture

20 Dec 2023 17:11 UTC
22 points
8 comments16 min readLW link

[Talk tran­script] What “struc­ture” is and why it matters

Alex_Altair25 Jul 2024 15:49 UTC
23 points
0 comments5 min readLW link
(www.youtube.com)

Will work­ing here ad­vance AGI? Help us not de­stroy the world!

Yonatan Cale29 May 2022 11:42 UTC
30 points
47 comments1 min readLW link

Distil­led—AGI Safety from First Principles

Harrison G29 May 2022 0:57 UTC
11 points
1 comment14 min readLW link

Deep­Mind and Google Brain are merg­ing [Linkpost]

Akash20 Apr 2023 18:47 UTC
55 points
5 comments1 min readLW link
(www.deepmind.com)

AI Safety Newslet­ter #5: Ge­offrey Hin­ton speaks out on AI risk, the White House meets with AI labs, and Tro­jan at­tacks on lan­guage models

9 May 2023 15:26 UTC
28 points
1 comment4 min readLW link
(newsletter.safe.ai)

Why I don’t be­lieve in doom

mukashi7 Jun 2022 23:49 UTC
6 points
30 comments4 min readLW link

Talk­ing pub­li­cly about AI risk

Jan_Kulveit21 Apr 2023 11:28 UTC
180 points
9 comments6 min readLW link

In­tro­duc­ing AI Lab Watch

Zach Stein-Perlman30 Apr 2024 17:00 UTC
224 points
30 comments1 min readLW link
(ailabwatch.org)

On A List of Lethalities

Zvi13 Jun 2022 12:30 UTC
165 points
50 comments54 min readLW link1 review
(thezvi.wordpress.com)

Con­ti­nu­ity Assumptions

Jan_Kulveit13 Jun 2022 21:31 UTC
37 points
13 comments4 min readLW link

Twit­ter thread on AI takeover scenarios

Richard_Ngo31 Jul 2024 0:24 UTC
37 points
0 comments2 min readLW link
(x.com)

Align­ment Risk Doesn’t Re­quire Superintelligence

JustisMills15 Jun 2022 3:12 UTC
35 points
4 comments2 min readLW link

Sys­tems that can­not be un­safe can­not be safe

Davidmanheim2 May 2023 8:53 UTC
62 points
27 comments2 min readLW link

[Linkpost] Ex­is­ten­tial Risk Anal­y­sis in Em­piri­cal Re­search Papers

Dan H2 Jul 2022 0:09 UTC
40 points
0 comments1 min readLW link
(arxiv.org)

The Align­ment Problem

lsusr11 Jul 2022 3:03 UTC
46 points
18 comments3 min readLW link

What would a com­pute mon­i­tor­ing plan look like? [Linkpost]

Akash26 Mar 2023 19:33 UTC
158 points
10 comments4 min readLW link
(arxiv.org)

The Dis­solu­tion of AI Safety

Roko12 Dec 2024 10:34 UTC
8 points
44 comments1 min readLW link
(www.transhumanaxiology.com)

The al­ign­ment prob­lem from a deep learn­ing perspective

Richard_Ngo10 Aug 2022 22:46 UTC
107 points
15 comments27 min readLW link1 review

Deep­Mind al­ign­ment team opinions on AGI ruin arguments

Vika12 Aug 2022 21:06 UTC
395 points
37 comments14 min readLW link1 review

Shapes of Mind and Plu­ral­ism in Alignment

adamShimi13 Aug 2022 10:01 UTC
33 points
2 comments2 min readLW link

Up­grad­ing the AI Safety Community

16 Dec 2023 15:34 UTC
42 points
9 comments42 min readLW link

Refram­ing the bur­den of proof: Com­pa­nies should prove that mod­els are safe (rather than ex­pect­ing au­di­tors to prove that mod­els are dan­ger­ous)

Akash25 Apr 2023 18:49 UTC
27 points
11 comments3 min readLW link
(childrenoficarus.substack.com)

No One-Size-Fit-All Epistemic Strategy

adamShimi20 Aug 2022 12:56 UTC
24 points
2 comments2 min readLW link

Chad Jones pa­per mod­el­ing AI and x-risk vs. growth

jasoncrawford26 Apr 2023 20:07 UTC
39 points
7 comments2 min readLW link
(web.stanford.edu)

AI #76: Six Shorts Sto­ries About OpenAI

Zvi8 Aug 2024 13:50 UTC
53 points
10 comments48 min readLW link
(thezvi.wordpress.com)

All AGI safety ques­tions wel­come (es­pe­cially ba­sic ones) [Sept 2022]

plex8 Sep 2022 11:56 UTC
22 points
48 comments3 min readLW link

[Question] Why are we sure that AI will “want” some­thing?

Shmi16 Sep 2022 20:35 UTC
31 points
57 comments1 min readLW link

Refine’s Third Blog Post Day/​Week

adamShimi17 Sep 2022 17:03 UTC
18 points
0 comments1 min readLW link

Against most, but not all, AI risk analogies

Matthew Barnett14 Jan 2024 3:36 UTC
63 points
41 comments7 min readLW link

Sen­sor Ex­po­sure can Com­pro­mise the Hu­man Brain in the 2020s

trevor26 Oct 2023 3:31 UTC
17 points
6 comments10 min readLW link

Neu­ro­science and Alignment

Garrett Baker18 Mar 2024 21:09 UTC
40 points
25 comments2 min readLW link

A Case for the Least For­giv­ing Take On Alignment

Thane Ruthenis2 May 2023 21:34 UTC
100 points
84 comments22 min readLW link

Re­sponse to Oren Etz­ioni’s “How to know if ar­tifi­cial in­tel­li­gence is about to de­stroy civ­i­liza­tion”

Daniel Kokotajlo27 Feb 2020 18:10 UTC
27 points
5 comments8 min readLW link

Ac­ti­va­tion ad­di­tions in a small resi­d­ual network

Garrett Baker22 May 2023 20:28 UTC
22 points
4 comments3 min readLW link

We’re Not Ready: thoughts on “paus­ing” and re­spon­si­ble scal­ing policies

HoldenKarnofsky27 Oct 2023 15:19 UTC
200 points
33 comments8 min readLW link

A Com­mon-Sense Case For Mu­tu­ally-Misal­igned AGIs Ally­ing Against Humans

Thane Ruthenis17 Dec 2023 20:28 UTC
29 points
7 comments11 min readLW link

Catas­trophic Risks from AI #4: Or­ga­ni­za­tional Risks

26 Jun 2023 19:36 UTC
23 points
0 comments21 min readLW link
(arxiv.org)

Mechanism De­sign for AI Safety—Read­ing Group Curriculum

Rubi J. Hudson25 Oct 2022 3:54 UTC
15 points
3 comments1 min readLW link

AI Safety Seems Hard to Measure

HoldenKarnofsky8 Dec 2022 19:50 UTC
71 points
6 comments14 min readLW link
(www.cold-takes.com)

[Linkpost] Bi­den-Har­ris Ex­ec­u­tive Order on AI

beren30 Oct 2023 15:20 UTC
3 points
0 comments1 min readLW link

Pod­cast: Tam­era Lan­ham on AI risk, threat mod­els, al­ign­ment pro­pos­als, ex­ter­nal­ized rea­son­ing over­sight, and work­ing at Anthropic

Akash20 Dec 2022 21:39 UTC
18 points
2 comments11 min readLW link

Quote quiz: “drift­ing into de­pen­dence”

jasoncrawford27 Apr 2023 15:13 UTC
7 points
6 comments1 min readLW link
(rootsofprogress.org)

ChatGPT Plu­g­ins—The Begin­ning of the End

Bary Levy25 Mar 2023 11:45 UTC
15 points
4 comments1 min readLW link

More ex­per­i­ments in GPT-4 agency: writ­ing memos

Christopher King24 Mar 2023 17:51 UTC
5 points
2 comments10 min readLW link

Does GPT-4 ex­hibit agency when sum­ma­riz­ing ar­ti­cles?

Christopher King24 Mar 2023 15:49 UTC
16 points
2 comments5 min readLW link

LLMs seem (rel­a­tively) safe

JustisMills25 Apr 2024 22:13 UTC
53 points
24 comments7 min readLW link
(justismills.substack.com)

Grind­ing slimes in the dun­geon of AI al­ign­ment research

Max H24 Mar 2023 4:51 UTC
10 points
2 comments4 min readLW link

Course recom­men­da­tions for Friendli­ness researchers

Louie9 Jan 2013 14:33 UTC
96 points
112 comments10 min readLW link

A rant against robots

Lê Nguyên Hoang14 Jan 2020 22:03 UTC
65 points
7 comments5 min readLW link

Three AI Safety Re­lated Ideas

Wei Dai13 Dec 2018 21:32 UTC
69 points
38 comments2 min readLW link

AI Align­ment Prize: Round 2 due March 31, 2018

Zvi12 Mar 2018 12:10 UTC
28 points
2 comments3 min readLW link
(thezvi.wordpress.com)

GPT-4 al­ign­ing with aca­sual de­ci­sion the­ory when in­structed to play games, but in­cludes a CDT ex­pla­na­tion that’s in­cor­rect if they differ

Christopher King23 Mar 2023 16:16 UTC
7 points
4 comments8 min readLW link

Mo­ral­ity as Co­op­er­a­tion Part III: Failure Modes

DeLesley Hutchins5 Dec 2024 9:39 UTC
4 points
0 comments20 min readLW link

Limit in­tel­li­gent weapons

Lucas Pfeifer23 Mar 2023 17:54 UTC
−11 points
36 comments1 min readLW link

ChatGPT’s “fuzzy al­ign­ment” isn’t ev­i­dence of AGI al­ign­ment: the ba­nana test

Michael Tontchev23 Mar 2023 7:12 UTC
23 points
6 comments4 min readLW link

2017 AI Safety Liter­a­ture Re­view and Char­ity Com­par­i­son

Larks24 Dec 2017 18:52 UTC
41 points
5 comments23 min readLW link

Soon: a weekly AI Safety pre­req­ui­sites mod­ule on LessWrong

null30 Apr 2018 13:23 UTC
35 points
10 comments1 min readLW link

Why We MUST Create an AGI that Disem­pow­ers Hu­man­ity. For Real.

twkaiser22 Mar 2023 23:01 UTC
−17 points
1 comment4 min readLW link

Take­off speeds pre­sen­ta­tion at Anthropic

Tom Davidson4 Jun 2024 22:46 UTC
92 points
0 comments25 min readLW link

[Question] Why is so much dis­cus­sion hap­pen­ing in pri­vate Google Docs?

Wei Dai12 Jan 2019 2:19 UTC
101 points
22 comments1 min readLW link

Key Ques­tions for Digi­tal Minds

Jacy Reese Anthis22 Mar 2023 17:13 UTC
22 points
0 comments7 min readLW link
(www.sentienceinstitute.org)

Swim­ming Up­stream: A Case Study in In­stru­men­tal Rationality

TurnTrout3 Jun 2018 3:16 UTC
76 points
7 comments8 min readLW link

When is un­al­igned AI morally valuable?

paulfchristiano25 May 2018 1:57 UTC
81 points
53 comments10 min readLW link

[Question] Em­ployer con­sid­er­ing part­ner­ing with ma­jor AI labs. What to do?

GraduallyMoreAgitated21 Mar 2023 17:43 UTC
37 points
7 comments2 min readLW link

I cre­ated an Asi Align­ment Tier List

TimeGoat21 Apr 2024 18:44 UTC
−6 points
0 comments1 min readLW link

Two Tales of AI Takeover: My Doubts

Violet Hour5 Mar 2024 15:51 UTC
30 points
8 comments29 min readLW link

Mo­ral­ity as Co­op­er­a­tion Part II: The­ory and Experiment

DeLesley Hutchins5 Dec 2024 9:04 UTC
2 points
0 comments17 min readLW link

Ex­plor­ing the Pre­cau­tion­ary Prin­ci­ple in AI Devel­op­ment: His­tor­i­cal Analo­gies and Les­sons Learned

Christopher King21 Mar 2023 3:53 UTC
−1 points
2 comments9 min readLW link

Ca­pa­bil­ities De­nial: The Danger of Un­der­es­ti­mat­ing AI

Christopher King21 Mar 2023 1:24 UTC
6 points
5 comments3 min readLW link

How can I re­duce ex­is­ten­tial risk from AI?

lukeprog13 Nov 2012 21:56 UTC
63 points
92 comments8 min readLW link

How to solve the mi­suse prob­lem as­sum­ing that in 10 years the de­fault sce­nario is that AGI agents are ca­pa­ble of syn­thetiz­ing pathogens

jeremtti27 Nov 2024 21:17 UTC
6 points
0 comments9 min readLW link

Ap­ply to the Pivotal Re­search Fel­low­ship (AI Safety & Biose­cu­rity)

10 Apr 2024 12:08 UTC
18 points
0 comments1 min readLW link

Hope to live or fear to die?

Knight Lee27 Nov 2024 10:42 UTC
3 points
0 comments1 min readLW link

Tak­ing Away the Guns First: The Fun­da­men­tal Flaw in AI Development

s-ice26 Nov 2024 22:11 UTC
1 point
0 comments17 min readLW link

An­nounc­ing At­las Computing

miyazono11 Apr 2024 15:56 UTC
44 points
4 comments4 min readLW link

[Question] What’s the pro­to­col for if a novice has ML ideas that are un­likely to work, but might im­prove ca­pa­bil­ities if they do work?

drocta9 Jan 2024 22:51 UTC
6 points
2 comments2 min readLW link

What should AI safety be try­ing to achieve?

EuanMcLean23 May 2024 11:17 UTC
16 points
0 comments13 min readLW link

Why Re­cur­sive Self-Im­prove­ment Might Not Be the Ex­is­ten­tial Risk We Fear

Nassim_A24 Nov 2024 17:17 UTC
1 point
0 comments9 min readLW link

A bet­ter “State­ment on AI Risk?”

Knight Lee25 Nov 2024 4:50 UTC
4 points
4 comments3 min readLW link

[Question] Can sin­gu­lar­ity emerge from trans­form­ers?

MP8 Apr 2024 14:26 UTC
−3 points
1 comment1 min readLW link

[Question] Have we seen any “ReLU in­stead of sig­moid-type im­prove­ments” recently

KvmanThinking23 Nov 2024 3:51 UTC
2 points
4 comments1 min readLW link

Truth Ter­mi­nal: A re­con­struc­tion of events

17 Nov 2024 23:51 UTC
1 point
1 comment7 min readLW link

Sur­vey of 2,778 AI au­thors: six parts in pictures

KatjaGrace6 Jan 2024 4:43 UTC
80 points
1 comment2 min readLW link

[Question] What (if any­thing) made your p(doom) go down in 2024?

Satron16 Nov 2024 16:46 UTC
4 points
6 comments1 min readLW link

[Question] fake al­ign­ment solu­tions????

KvmanThinking11 Dec 2024 3:31 UTC
1 point
6 comments1 min readLW link

Propos­ing the Con­di­tional AI Safety Treaty (linkpost TIME)

otto.barten15 Nov 2024 13:59 UTC
10 points
8 comments3 min readLW link
(time.com)

Con­fronting the le­gion of doom.

Spiritus Dei13 Nov 2024 17:03 UTC
−20 points
3 comments5 min readLW link

Why em­piri­cists should be­lieve in AI risk

Knight Lee11 Dec 2024 3:51 UTC
5 points
0 comments1 min readLW link

AI de­mands un­prece­dented reliability

Jono9 Jan 2024 16:30 UTC
22 points
5 comments2 min readLW link

On the fu­ture of lan­guage models

owencb20 Dec 2023 16:58 UTC
105 points
17 comments1 min readLW link

Thoughts af­ter the Wolfram and Yud­kowsky discussion

Tahp14 Nov 2024 1:43 UTC
25 points
13 comments6 min readLW link

[Question] How does the ever-in­creas­ing use of AI in the mil­i­tary for the di­rect pur­pose of mur­der­ing peo­ple af­fect your p(doom)?

Justausername6 Apr 2024 6:31 UTC
19 points
16 comments1 min readLW link

What AI safety re­searchers can learn from Ma­hatma Gandhi

Lysandre Terrisse8 Nov 2024 19:49 UTC
−6 points
0 comments3 min readLW link

Call for sub­mis­sions: Choice of Fu­tures sur­vey questions

c.trout30 Apr 2023 6:59 UTC
4 points
0 comments2 min readLW link
(airtable.com)

Ac­cess to AI: a hu­man right?

dmtea25 Jul 2020 9:38 UTC
5 points
3 comments2 min readLW link

Nav­i­gat­ing pub­lic AI x-risk hype while pur­su­ing tech­ni­cal solutions

Dan Braun19 Feb 2023 12:22 UTC
18 points
0 comments2 min readLW link

$250K in Prizes: SafeBench Com­pe­ti­tion An­nounce­ment

ozhang3 Apr 2024 22:07 UTC
26 points
0 comments1 min readLW link

Agen­tic Lan­guage Model Memes

FactorialCode1 Aug 2020 18:03 UTC
16 points
1 comment2 min readLW link

Con­ver­sa­tion with Paul Christiano

abergal11 Sep 2019 23:20 UTC
44 points
6 comments30 min readLW link
(aiimpacts.org)

Tran­scrip­tion of Eliezer’s Jan­uary 2010 video Q&A

curiousepic14 Nov 2011 17:02 UTC
112 points
9 comments56 min readLW link

Co­op­er­a­tion and Align­ment in Del­e­ga­tion Games: You Need Both!

3 Aug 2024 10:16 UTC
8 points
0 comments14 min readLW link
(www.oliversourbut.net)

Re­sponses to Catas­trophic AGI Risk: A Survey

lukeprog8 Jul 2013 14:33 UTC
17 points
8 comments1 min readLW link

[Question] Ac­cu­racy of ar­gu­ments that are seen as ridicu­lous and in­tu­itively false but don’t have good counter-arguments

Christopher King29 Apr 2023 23:58 UTC
30 points
39 comments1 min readLW link

Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?”

David Scott Krueger (formerly: capybaralet)6 Feb 2019 19:09 UTC
25 points
17 comments1 min readLW link

AI as a pow­er­ful meme, via CGP Grey

TheManxLoiner30 Oct 2024 18:31 UTC
46 points
8 comments4 min readLW link

A call for a quan­ti­ta­tive re­port card for AI bioter­ror­ism threat models

Juno4 Dec 2023 6:35 UTC
12 points
0 comments10 min readLW link

The benefits and risks of op­ti­mism (about AI safety)

Karl von Wendt3 Dec 2023 12:45 UTC
−7 points
6 comments5 min readLW link

Refram­ing mis­al­igned AGI’s: well-in­ten­tioned non-neu­rotyp­i­cal assistants

zhukeepa1 Apr 2018 1:22 UTC
46 points
14 comments2 min readLW link

Dario Amodei’s “Machines of Lov­ing Grace” sound in­cred­ibly dan­ger­ous, for Humans

Super AGI27 Oct 2024 5:05 UTC
8 points
1 comment1 min readLW link

In­tro­duc­ing the AI Align­ment Fo­rum (FAQ)

29 Oct 2018 21:07 UTC
87 points
8 comments6 min readLW link

Thoughts on “AI is easy to con­trol” by Pope & Belrose

Steven Byrnes1 Dec 2023 17:30 UTC
197 points
64 comments14 min readLW link1 review

Cur­rent AI Safety Roles for Soft­ware Engineers

ozziegooen9 Nov 2018 20:57 UTC
70 points
9 comments4 min readLW link

Sup­port me in a Week-Long Pick­et­ing Cam­paign Near OpenAI’s HQ: Seek­ing Sup­port and Ideas from the LessWrong Community

Percy30 Apr 2023 17:48 UTC
−21 points
15 comments1 min readLW link

Prob­lems in AI Align­ment that philoso­phers could po­ten­tially con­tribute to

Wei Dai17 Aug 2019 17:38 UTC
78 points
14 comments2 min readLW link

Two Ne­glected Prob­lems in Hu­man-AI Safety

Wei Dai16 Dec 2018 22:13 UTC
102 points
25 comments2 min readLW link

An­nounce­ment: AI al­ign­ment prize round 4 winners

cousin_it20 Jan 2019 14:46 UTC
74 points
41 comments1 min readLW link

Miles Brundage re­signed from OpenAI, and his AGI readi­ness team was disbanded

garrison23 Oct 2024 23:40 UTC
118 points
1 comment7 min readLW link
(garrisonlovely.substack.com)

And the AI would have got away with it too, if...

Stuart_Armstrong22 May 2019 21:35 UTC
75 points
7 comments1 min readLW link

A Semiotic Cri­tique of the Orthog­o­nal­ity Thesis

Nicolas Villarreal4 Jun 2024 18:52 UTC
3 points
10 comments15 min readLW link

Should ethi­cists be in­side or out­side a pro­fes­sion?

Eliezer Yudkowsky12 Dec 2018 1:40 UTC
97 points
7 comments9 min readLW link

Towards AI Safety In­fras­truc­ture: Talk & Outline

Paul Bricman7 Jan 2024 9:31 UTC
11 points
0 comments2 min readLW link
(www.youtube.com)

Re­think Pri­ori­ties: Seek­ing Ex­pres­sions of In­ter­est for Spe­cial Pro­jects Next Year

kierangreig29 Nov 2023 13:59 UTC
4 points
0 comments5 min readLW link

I Vouch For MIRI

Zvi17 Dec 2017 17:50 UTC
39 points
9 comments5 min readLW link
(thezvi.wordpress.com)

Be­ware of black boxes in AI al­ign­ment research

cousin_it18 Jan 2018 15:07 UTC
39 points
10 comments1 min readLW link

AISC 2024 - Pro­ject Summaries

NickyP27 Nov 2023 22:32 UTC
48 points
3 comments18 min readLW link

Lenses of Control

WillPetillo22 Oct 2024 7:51 UTC
14 points
0 comments9 min readLW link

Op­por­tu­ni­ties for in­di­vi­d­ual donors in AI safety

Alex Flint31 Mar 2018 18:37 UTC
30 points
3 comments11 min readLW link

Thou­sands of mal­i­cious ac­tors on the fu­ture of AI misuse

1 Apr 2024 10:08 UTC
37 points
0 comments1 min readLW link

The two para­graph ar­gu­ment for AI risk

CronoDAS25 Nov 2023 2:01 UTC
19 points
8 comments1 min readLW link

Q&A with Stan Fran­klin on risks from AI

XiXiDu11 Jun 2011 15:22 UTC
36 points
10 comments2 min readLW link

A Friendly Face (Another Failure Story)

20 Jun 2023 10:31 UTC
65 points
21 comments16 min readLW link

Muehlhauser-Go­ertzel Dialogue, Part 1

lukeprog16 Mar 2012 17:12 UTC
42 points
161 comments33 min readLW link

Ex­plor­ing Last-Re­sort Mea­sures for AI Align­ment: Hu­man­ity’s Ex­tinc­tion Switch

0xPetra23 Jun 2023 17:01 UTC
7 points
0 comments2 min readLW link

[LINK] NYT Ar­ti­cle about Ex­is­ten­tial Risk from AI

[deleted]28 Jan 2013 10:37 UTC
38 points
23 comments1 min readLW link

Refram­ing the Prob­lem of AI Progress

Wei Dai12 Apr 2012 19:31 UTC
32 points
47 comments1 min readLW link

New AI risks re­search in­sti­tute at Oxford University

lukeprog16 Nov 2011 18:52 UTC
36 points
10 comments1 min readLW link

Memes and Ra­tional Decisions

inferential9 Jan 2015 6:42 UTC
35 points
18 comments10 min readLW link

I just watched don’t look up.

ATheCoder23 Jun 2023 21:22 UTC
0 points
5 comments2 min readLW link

AI-Plans.com—a con­tributable compendium

Iknownothing25 Jun 2023 14:40 UTC
39 points
7 comments4 min readLW link
(ai-plans.com)

Cheat sheet of AI X-risk

momom229 Jun 2023 4:28 UTC
19 points
1 comment7 min readLW link

Challenge pro­posal: small­est pos­si­ble self-hard­en­ing back­door for RLHF

Christopher King29 Jun 2023 16:56 UTC
7 points
0 comments2 min readLW link

Biosafety Reg­u­la­tions (BMBL) and their rele­vance for AI

Štěpán Los29 Jun 2023 19:22 UTC
4 points
0 comments4 min readLW link

Me­taphors for AI, and why I don’t like them

boazbarak28 Jun 2023 22:47 UTC
33 points
18 comments12 min readLW link

AGI & War

Calecute29 Jun 2023 22:20 UTC
9 points
1 comment1 min readLW link

AI In­ci­dent Shar­ing—Best prac­tices from other fields and a com­pre­hen­sive list of ex­ist­ing platforms

Štěpán Los28 Jun 2023 17:21 UTC
20 points
0 comments4 min readLW link

AI Safety with­out Align­ment: How hu­mans can WIN against AI

vicchain29 Jun 2023 17:53 UTC
1 point
1 comment2 min readLW link

Levels of AI Self-Im­prove­ment

avturchin29 Apr 2018 11:45 UTC
11 points
1 comment39 min readLW link

Op­ti­mis­ing So­ciety to Con­strain Risk of War from an Ar­tifi­cial Su­per­in­tel­li­gence

JohnCDraper30 Apr 2020 10:47 UTC
4 points
1 comment51 min readLW link

Quan­ti­ta­tive cruxes in Alignment

Martín Soto2 Jul 2023 20:38 UTC
19 points
0 comments23 min readLW link

Sources of ev­i­dence in Alignment

Martín Soto2 Jul 2023 20:38 UTC
20 points
0 comments11 min readLW link

Some Thoughts on Sin­gu­lar­ity Strategies

Wei Dai13 Jul 2011 2:41 UTC
45 points
30 comments3 min readLW link

An AGI kill switch with defined se­cu­rity properties

Peterpiper5 Jul 2023 17:40 UTC
−5 points
6 comments1 min readLW link

How I Learned To Stop Wor­ry­ing And Love The Shoggoth

Peter Merel12 Jul 2023 17:47 UTC
9 points
15 comments5 min readLW link

OpenAI Launches Su­per­al­ign­ment Taskforce

Zvi11 Jul 2023 13:00 UTC
149 points
40 comments49 min readLW link
(thezvi.wordpress.com)

Do you feel that AGI Align­ment could be achieved in a Type 0 civ­i­liza­tion?

Super AGI6 Jul 2023 4:52 UTC
−2 points
1 comment1 min readLW link

An Overview of AI risks—the Flyer

17 Jul 2023 12:03 UTC
20 points
0 comments1 min readLW link
(docs.google.com)

ACI#4: Seed AI is the new Per­pet­ual Mo­tion Machine

Akira Pyinya8 Jul 2023 1:17 UTC
−7 points
0 comments6 min readLW link

An­nounc­ing AI Align­ment work­shop at the ALIFE 2023 conference

rorygreig8 Jul 2023 13:52 UTC
16 points
0 comments1 min readLW link
(humanvaluesandartificialagency.com)

“Refram­ing Su­per­in­tel­li­gence” + LLMs + 4 years

Eric Drexler10 Jul 2023 13:42 UTC
117 points
9 comments12 min readLW link

A trick for Safer GPT-N

Razied23 Aug 2020 0:39 UTC
7 points
1 comment2 min readLW link

Gear­ing Up for Long Timelines in a Hard World

Dalcy14 Jul 2023 6:11 UTC
15 points
0 comments4 min readLW link

against “AI risk”

Wei Dai11 Apr 2012 22:46 UTC
35 points
91 comments1 min readLW link

“Smarter than us” is out!

Stuart_Armstrong25 Feb 2014 15:50 UTC
41 points
57 comments1 min readLW link

Analysing: Danger­ous mes­sages from fu­ture UFAI via Oracles

Stuart_Armstrong22 Nov 2019 14:17 UTC
22 points
16 comments4 min readLW link

[Question] What crite­rion would you use to se­lect com­pa­nies likely to cause AI doom?

momom213 Jul 2023 20:31 UTC
8 points
4 comments1 min readLW link

Why was the AI Align­ment com­mu­nity so un­pre­pared for this mo­ment?

Ras151315 Jul 2023 0:26 UTC
121 points
65 comments2 min readLW link

Sim­ple al­ign­ment plan that maybe works

Iknownothing18 Jul 2023 22:48 UTC
4 points
8 comments1 min readLW link

Ru­n­away Op­ti­miz­ers in Mind Space

silentbob16 Jul 2023 14:26 UTC
16 points
0 comments12 min readLW link

Quick Thoughts on Lan­guage Models

RohanS18 Jul 2023 20:38 UTC
6 points
0 comments4 min readLW link

[Question] Why don’t we con­sider large forms of so­cial or­ga­ni­za­tion (eco­nomic and poli­ti­cal forms, in par­tic­u­lar) to qual­ify as AGI and Trans­for­ma­tive AI?

T L16 Jul 2023 18:54 UTC
1 point
0 comments2 min readLW link

Proof of pos­te­ri­or­ity: a defense against AI-gen­er­ated misinformation

jchan17 Jul 2023 12:04 UTC
33 points
3 comments5 min readLW link

Q&A with Abram Dem­ski on risks from AI

XiXiDu17 Jan 2012 9:43 UTC
33 points
71 comments9 min readLW link

Q&A with ex­perts on risks from AI #2

XiXiDu9 Jan 2012 19:40 UTC
22 points
29 comments7 min readLW link

AI Safety Dis­cus­sion Day

Linda Linsefors15 Sep 2020 14:40 UTC
20 points
0 comments1 min readLW link

[Cross­post] An AI Pause Is Hu­man­ity’s Best Bet For Prevent­ing Ex­tinc­tion (TIME)

otto.barten24 Jul 2023 10:07 UTC
12 points
0 comments7 min readLW link
(time.com)

A long re­ply to Ben Garfinkel on Scru­ti­niz­ing Clas­sic AI Risk Arguments

Søren Elverlin27 Sep 2020 17:51 UTC
17 points
6 comments1 min readLW link

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [July 2023]

smallsilo20 Jul 2023 20:20 UTC
38 points
40 comments2 min readLW link
(forum.effectivealtruism.org)

Sup­ple­men­tary Align­ment In­sights Through a Highly Con­trol­led Shut­down Incentive

Justausername23 Jul 2023 16:08 UTC
4 points
1 comment3 min readLW link

Au­tonomous Align­ment Over­sight Frame­work (AAOF)

Justausername25 Jul 2023 10:25 UTC
−9 points
0 comments4 min readLW link

A re­sponse to the Richards et al.’s “The Illu­sion of AI’s Ex­is­ten­tial Risk”

Harrison Fell26 Jul 2023 17:34 UTC
1 point
0 comments10 min readLW link

[Question] Have you ever con­sid­ered tak­ing the ‘Tur­ing Test’ your­self?

Super AGI27 Jul 2023 3:48 UTC
2 points
6 comments1 min readLW link

Eval­u­at­ing Su­per­hu­man Models with Con­sis­tency Checks

1 Aug 2023 7:51 UTC
21 points
2 comments9 min readLW link
(arxiv.org)

3 lev­els of threat obfuscation

HoldenKarnofsky2 Aug 2023 14:58 UTC
69 points
14 comments7 min readLW link

4 types of AGI se­lec­tion, and how to con­strain them

Remmelt8 Aug 2023 10:02 UTC
−4 points
3 comments3 min readLW link

Embed­ding Eth­i­cal Pri­ors into AI Sys­tems: A Bayesian Approach

Justausername3 Aug 2023 15:31 UTC
−5 points
3 comments21 min readLW link

Ilya Sutskever’s thoughts on AI safety (July 2023): a tran­script with my comments

mishka10 Aug 2023 19:07 UTC
21 points
3 comments5 min readLW link

Some al­ign­ment ideas

SelonNerias10 Aug 2023 17:51 UTC
1 point
0 comments11 min readLW link

Seek­ing In­put to AI Safety Book for non-tech­ni­cal audience

Darren McKee10 Aug 2023 17:58 UTC
10 points
4 comments1 min readLW link

On­line AI Safety Dis­cus­sion Day

Linda Linsefors8 Oct 2020 12:11 UTC
5 points
0 comments1 min readLW link

Ex­is­ten­tially rele­vant thought ex­per­i­ment: To kill or not to kill, a sniper, a man and a but­ton.

AlexFromSafeTransition14 Aug 2023 10:53 UTC
−18 points
6 comments4 min readLW link

We Should Pre­pare for a Larger Rep­re­sen­ta­tion of Academia in AI Safety

Leon Lang13 Aug 2023 18:03 UTC
90 points
13 comments5 min readLW link

De­com­pos­ing in­de­pen­dent gen­er­al­iza­tions in neu­ral net­works via Hes­sian analysis

14 Aug 2023 17:04 UTC
83 points
4 comments1 min readLW link

AISN #19: US-China Com­pe­ti­tion on AI Chips, Mea­sur­ing Lan­guage Agent Devel­op­ments, Eco­nomic Anal­y­sis of Lan­guage Model Pro­pa­ganda, and White House AI Cy­ber Challenge

15 Aug 2023 16:10 UTC
21 points
0 comments5 min readLW link
(newsletter.safe.ai)

Ideas for im­prov­ing epistemics in AI safety outreach

mic21 Aug 2023 19:55 UTC
64 points
6 comments3 min readLW link

Mesa-Op­ti­miza­tion: Ex­plain it like I’m 10 Edition

brook26 Aug 2023 23:04 UTC
20 points
1 comment6 min readLW link

En­hanc­ing Cor­rigi­bil­ity in AI Sys­tems through Ro­bust Feed­back Loops

Justausername24 Aug 2023 3:53 UTC
1 point
0 comments6 min readLW link

Reflec­tions on “Mak­ing the Atomic Bomb”

boazbarak17 Aug 2023 2:48 UTC
51 points
7 comments8 min readLW link

AI Reg­u­la­tion May Be More Im­por­tant Than AI Align­ment For Ex­is­ten­tial Safety

otto.barten24 Aug 2023 11:41 UTC
65 points
39 comments5 min readLW link

When dis­cussing AI doom bar­ri­ers pro­pose spe­cific plau­si­ble scenarios

anithite18 Aug 2023 4:06 UTC
5 points
0 comments3 min readLW link

Mili­tary AI as a Con­ver­gent Goal of Self-Im­prov­ing AI

avturchin13 Nov 2017 12:17 UTC
5 points
3 comments1 min readLW link

2 un­usual rea­sons for why we can avoid be­ing turned into paperclips

Artem Panush19 Aug 2023 10:28 UTC
1 point
0 comments4 min readLW link

[Question] Clar­ify­ing how mis­al­ign­ment can arise from scal­ing LLMs

Util19 Aug 2023 14:16 UTC
3 points
1 comment1 min readLW link

Will AI kill ev­ery­one? Here’s what the god­fathers of AI have to say [RA video]

Writer19 Aug 2023 17:29 UTC
58 points
8 comments1 min readLW link
(youtu.be)

Re­port on Fron­tier Model Training

YafahEdelman30 Aug 2023 20:02 UTC
122 points
21 comments21 min readLW link
(docs.google.com)

Ram­ble on STUFF: in­tel­li­gence, simu­la­tion, AI, doom, de­fault mode, the usual

Bill Benzon26 Aug 2023 15:49 UTC
5 points
0 comments4 min readLW link

The Game of Dominance

Karl von Wendt27 Aug 2023 11:04 UTC
24 points
15 comments6 min readLW link

A Let­ter to the Edi­tor of MIT Tech­nol­ogy Review

Jeffs30 Aug 2023 16:59 UTC
0 points
0 comments2 min readLW link

The Epistemic Author­ity of Deep Learn­ing Pioneers

Dylan Bowman29 Aug 2023 18:14 UTC
8 points
2 comments3 min readLW link

AISN #20: LLM Pro­lifer­a­tion, AI De­cep­tion, and Con­tin­u­ing Drivers of AI Capabilities

29 Aug 2023 15:07 UTC
12 points
0 comments8 min readLW link
(newsletter.safe.ai)

Neu­ral pro­gram syn­the­sis is a dan­ger­ous technology

syllogism12 Jan 2018 16:19 UTC
10 points
6 comments2 min readLW link

New, Brief Pop­u­lar-Level In­tro­duc­tion to AI Risks and Superintelligence

LyleN23 Jan 2015 15:43 UTC
33 points
3 comments1 min readLW link

Video es­say: How Will We Know When AI is Con­scious?

JanPro6 Sep 2023 18:10 UTC
11 points
7 comments1 min readLW link
(www.youtube.com)

AISN #21: Google Deep­Mind’s GPT-4 Com­peti­tor, Mili­tary In­vest­ments in Au­tonomous Drones, The UK AI Safety Sum­mit, and Case Stud­ies in AI Policy

5 Sep 2023 15:03 UTC
15 points
0 comments5 min readLW link
(newsletter.safe.ai)

[Question] What is to be done? (About the profit mo­tive)

Connor Barber8 Sep 2023 19:27 UTC
1 point
21 comments1 min readLW link

How teams went about their re­search at AI Safety Camp edi­tion 8

9 Sep 2023 16:34 UTC
28 points
0 comments13 min readLW link

FAI Re­search Con­straints and AGI Side Effects

JustinShovelain3 Jun 2015 19:25 UTC
27 points
59 comments7 min readLW link

A con­ver­sa­tion with Pi, a con­ver­sa­tional AI.

Spiritus Dei15 Sep 2023 23:13 UTC
1 point
0 comments1 min readLW link

Jimmy Ap­ples, source of the ru­mor that OpenAI has achieved AGI in­ter­nally, is a cred­ible in­sider.

Jorterder28 Sep 2023 1:20 UTC
−6 points
2 comments1 min readLW link
(twitter.com)

MMLU’s Mo­ral Sce­nar­ios Bench­mark Doesn’t Mea­sure What You Think it Measures

corey morris27 Sep 2023 17:54 UTC
18 points
2 comments4 min readLW link
(medium.com)

Tech­ni­cal AI Safety Re­search Land­scape [Slides]

Magdalena Wache18 Sep 2023 13:56 UTC
41 points
0 comments4 min readLW link

Euro­pean Master’s Pro­grams in Ma­chine Learn­ing, Ar­tifi­cial In­tel­li­gence, and re­lated fields

Master Programs ML/AI14 Nov 2020 15:51 UTC
34 points
6 comments1 min readLW link

Im­mor­tal­ity or death by AGI

ImmortalityOrDeathByAGI21 Sep 2023 23:59 UTC
47 points
30 comments4 min readLW link
(forum.effectivealtruism.org)

[Linkpost] Mark Zucker­berg con­fronted about Meta’s Llama 2 AI’s abil­ity to give users de­tailed guidance on mak­ing an­thrax—Busi­ness Insider

mic26 Sep 2023 12:05 UTC
18 points
11 comments2 min readLW link
(www.businessinsider.com)

The mind-killer

Paul Crowley2 May 2009 16:49 UTC
29 points
160 comments2 min readLW link

“Di­a­mon­doid bac­te­ria” nanobots: deadly threat or dead-end? A nan­otech in­ves­ti­ga­tion

titotal29 Sep 2023 14:01 UTC
154 points
79 comments1 min readLW link
(titotal.substack.com)

I de­signed an AI safety course (for a philos­o­phy de­part­ment)

Eleni Angelou23 Sep 2023 22:03 UTC
37 points
15 comments2 min readLW link

Tak­ing fea­tures out of su­per­po­si­tion with sparse au­toen­coders more quickly with in­formed initialization

Pierre Peigné23 Sep 2023 16:21 UTC
30 points
8 comments5 min readLW link

Linkpost: Are Emer­gent Abil­ities in Large Lan­guage Models just In-Con­text Learn­ing?

Erich_Grunewald8 Oct 2023 12:14 UTC
12 points
7 comments2 min readLW link
(arxiv.org)

Paper: Iden­ti­fy­ing the Risks of LM Agents with an LM-Emu­lated Sand­box—Univer­sity of Toronto 2023 - Bench­mark con­sist­ing of 36 high-stakes tools and 144 test cases!

Singularian25019 Oct 2023 0:00 UTC
6 points
0 comments1 min readLW link

Ideation and Tra­jec­tory Model­ling in Lan­guage Models

NickyP5 Oct 2023 19:21 UTC
16 points
2 comments10 min readLW link

The Gra­di­ent – The Ar­tifi­cial­ity of Alignment

mic8 Oct 2023 4:06 UTC
12 points
1 comment5 min readLW link
(thegradient.pub)

[Question] Should I do it?

MrLight19 Nov 2020 1:08 UTC
−3 points
16 comments2 min readLW link

Be­come a PIBBSS Re­search Affiliate

10 Oct 2023 7:41 UTC
24 points
6 comments6 min readLW link

Ra­tion­al­is­ing hu­mans: an­other mug­ging, but not Pas­cal’s

Stuart_Armstrong14 Nov 2017 15:46 UTC
7 points
1 comment3 min readLW link

Ma­chine learn­ing could be fun­da­men­tally unexplainable

George3d616 Dec 2020 13:32 UTC
26 points
15 comments15 min readLW link
(cerebralab.com)

LoRA Fine-tun­ing Effi­ciently Un­does Safety Train­ing from Llama 2-Chat 70B

12 Oct 2023 19:58 UTC
151 points
29 comments14 min readLW link

Back to the Past to the Future

Prometheus18 Oct 2023 16:51 UTC
5 points
0 comments1 min readLW link

[Question] What do you make of AGI:un­al­igned::space­ships:not enough food?

Ronny Fernandez22 Feb 2020 14:14 UTC
4 points
3 comments1 min readLW link

Tax­on­omy of AI-risk counterarguments

Odd anon16 Oct 2023 0:12 UTC
62 points
13 comments8 min readLW link

Risk Map of AI Systems

15 Dec 2020 9:16 UTC
28 points
3 comments8 min readLW link

AISU 2021

Linda Linsefors30 Jan 2021 17:40 UTC
28 points
2 comments1 min readLW link

Non­per­son Predicates

Eliezer Yudkowsky27 Dec 2008 1:47 UTC
66 points
177 comments6 min readLW link

For­mal Solu­tion to the In­ner Align­ment Problem

michaelcohen18 Feb 2021 14:51 UTC
49 points
123 comments2 min readLW link

[Question] What are the biggest cur­rent im­pacts of AI?

Sam Clarke7 Mar 2021 21:44 UTC
15 points
5 comments1 min readLW link

[Question] Is a Self-Iter­at­ing AGI Vuln­er­a­ble to Thomp­son-style Tro­jans?

sxae25 Mar 2021 14:46 UTC
15 points
6 comments3 min readLW link

AI or­a­cles on blockchain

Caravaggio6 Apr 2021 20:13 UTC
5 points
0 comments3 min readLW link

What if AGI is near?

Wulky Wilkinsen14 Apr 2021 0:05 UTC
11 points
5 comments1 min readLW link

[Question] Is there any­thing that can stop AGI de­vel­op­ment in the near term?

Wulky Wilkinsen22 Apr 2021 20:37 UTC
5 points
5 comments1 min readLW link

[Question] [time­boxed ex­er­cise] write me your model of AI hu­man-ex­is­ten­tial safety and the al­ign­ment prob­lems in 15 minutes

Quinn4 May 2021 19:10 UTC
6 points
2 comments1 min readLW link

AI Safety Re­search Pro­ject Ideas

Owain_Evans21 May 2021 13:39 UTC
58 points
2 comments3 min readLW link

Sur­vey on AI ex­is­ten­tial risk scenarios

8 Jun 2021 17:12 UTC
65 points
11 comments7 min readLW link

[Question] What are some claims or opinions about multi-multi del­e­ga­tion you’ve seen in the meme­plex that you think de­serve scrutiny?

Quinn27 Jun 2021 17:44 UTC
17 points
6 comments2 min readLW link

Mauhn Re­leases AI Safety Documentation

Berg Severens3 Jul 2021 21:23 UTC
4 points
0 comments1 min readLW link

A gen­tle apoc­a­lypse

pchvykov16 Aug 2021 5:03 UTC
3 points
5 comments3 min readLW link

Could you have stopped Ch­er­nobyl?

Carlos Ramirez27 Aug 2021 1:48 UTC
29 points
17 comments8 min readLW link

The Gover­nance Prob­lem and the “Pretty Good” X-Risk

Zach Stein-Perlman29 Aug 2021 18:00 UTC
5 points
2 comments11 min readLW link

Dist­in­guish­ing AI takeover scenarios

8 Sep 2021 16:19 UTC
74 points
11 comments14 min readLW link

The al­ign­ment prob­lem in differ­ent ca­pa­bil­ity regimes

Buck9 Sep 2021 19:46 UTC
88 points
12 comments5 min readLW link

How truth­ful is GPT-3? A bench­mark for lan­guage models

Owain_Evans16 Sep 2021 10:09 UTC
58 points
24 comments6 min readLW link

In­ves­ti­gat­ing AI Takeover Scenarios

Sammy Martin17 Sep 2021 18:47 UTC
27 points
1 comment27 min readLW link

AI take­off story: a con­tinu­a­tion of progress by other means

Edouard Harris27 Sep 2021 15:55 UTC
76 points
13 comments10 min readLW link

A brief re­view of the rea­sons multi-ob­jec­tive RL could be im­por­tant in AI Safety Research

Ben Smith29 Sep 2021 17:09 UTC
30 points
7 comments10 min readLW link

The Dark Side of Cog­ni­tion Hypothesis

Cameron Berg3 Oct 2021 20:10 UTC
19 points
1 comment16 min readLW link

Truth­ful AI: Devel­op­ing and gov­ern­ing AI that does not lie

18 Oct 2021 18:37 UTC
82 points
9 comments10 min readLW link

AMA on Truth­ful AI: Owen Cot­ton-Bar­ratt, Owain Evans & co-authors

Owain_Evans22 Oct 2021 16:23 UTC
31 points
15 comments1 min readLW link

Truth­ful and hon­est AI

29 Oct 2021 7:28 UTC
42 points
1 comment13 min readLW link

What is the most evil AI that we could build, to­day?

ThomasJ1 Nov 2021 19:58 UTC
−2 points
14 comments1 min readLW link

What are red flags for Neu­ral Net­work suffer­ing?

Marius Hobbhahn8 Nov 2021 12:51 UTC
29 points
15 comments12 min readLW link

Hard­code the AGI to need our ap­proval in­definitely?

MichaelStJules11 Nov 2021 7:04 UTC
2 points
2 comments1 min readLW link

What would we do if al­ign­ment were fu­tile?

Grant Demaree14 Nov 2021 8:09 UTC
75 points
39 comments3 min readLW link

Two Stupid AI Align­ment Ideas

aphyer16 Nov 2021 16:13 UTC
27 points
3 comments4 min readLW link

Su­per in­tel­li­gent AIs that don’t re­quire alignment

Yair Halberstadt16 Nov 2021 19:55 UTC
10 points
2 comments6 min readLW link

AI Tracker: mon­i­tor­ing cur­rent and near-fu­ture risks from su­per­scale models

23 Nov 2021 19:16 UTC
67 points
13 comments3 min readLW link
(aitracker.org)

HIRING: In­form and shape a new pro­ject on AI safety at Part­ner­ship on AI

Madhulika Srikumar24 Nov 2021 8:27 UTC
6 points
0 comments1 min readLW link

How to mea­sure FLOP/​s for Neu­ral Net­works em­piri­cally?

Marius Hobbhahn29 Nov 2021 15:18 UTC
16 points
5 comments7 min readLW link

Model­ing Failure Modes of High-Level Ma­chine Intelligence

6 Dec 2021 13:54 UTC
54 points
1 comment12 min readLW link

HIRING: In­form and shape a new pro­ject on AI safety at Part­ner­ship on AI

madhu_lika7 Dec 2021 19:37 UTC
1 point
0 comments1 min readLW link

Univer­sal­ity and the “Filter”

maggiehayes16 Dec 2021 0:47 UTC
10 points
2 comments11 min readLW link

Re­views of “Is power-seek­ing AI an ex­is­ten­tial risk?”

Joe Carlsmith16 Dec 2021 20:48 UTC
80 points
20 comments1 min readLW link

2+2: On­tolog­i­cal Framework

Lyrialtus1 Feb 2022 1:07 UTC
−15 points
2 comments12 min readLW link

[un­ti­tled post]

superads916 Feb 2022 20:39 UTC
−5 points
8 comments1 min readLW link

How harm­ful are im­prove­ments in AI? + Poll

15 Feb 2022 18:16 UTC
15 points
4 comments8 min readLW link

Pre­serv­ing and con­tin­u­ing al­ign­ment re­search through a se­vere global catastrophe

A_donor6 Mar 2022 18:43 UTC
41 points
11 comments5 min readLW link

Ask AI com­pa­nies about what they are do­ing for AI safety?

mic9 Mar 2022 15:14 UTC
51 points
0 comments2 min readLW link

Is There a Valley of Bad Civ­i­liza­tional Ad­e­quacy?

lbThingrb11 Mar 2022 19:49 UTC
13 points
1 comment2 min readLW link

[Question] Danger(s) of the­o­rem-prov­ing AI?

Yitz16 Mar 2022 2:47 UTC
8 points
8 comments1 min readLW link

We Are Con­jec­ture, A New Align­ment Re­search Startup

Connor Leahy8 Apr 2022 11:40 UTC
197 points
25 comments4 min readLW link

Is tech­ni­cal AI al­ign­ment re­search a net pos­i­tive?

cranberry_bear12 Apr 2022 13:07 UTC
6 points
2 comments2 min readLW link

[Question] Can some­one ex­plain to me why MIRI is so pes­simistic of our chances of sur­vival?

iamthouthouarti14 Apr 2022 20:28 UTC
10 points
7 comments1 min readLW link

[Question] Con­vince me that hu­man­ity *isn’t* doomed by AGI

Yitz15 Apr 2022 17:26 UTC
61 points
50 comments1 min readLW link

Reflec­tions on My Own Miss­ing Mood

Lone Pine21 Apr 2022 16:19 UTC
52 points
25 comments5 min readLW link

Code Gen­er­a­tion as an AI risk setting

Not Relevant17 Apr 2022 22:27 UTC
91 points
16 comments2 min readLW link

[Question] What is be­ing im­proved in re­cur­sive self im­prove­ment?

Lone Pine25 Apr 2022 18:30 UTC
7 points
6 comments1 min readLW link

AI Alter­na­tive Fu­tures: Sce­nario Map­ping Ar­tifi­cial In­tel­li­gence Risk—Re­quest for Par­ti­ci­pa­tion (*Closed*)

Kakili27 Apr 2022 22:07 UTC
10 points
2 comments8 min readLW link

Video and Tran­script of Pre­sen­ta­tion on Ex­is­ten­tial Risk from Power-Seek­ing AI

Joe Carlsmith8 May 2022 3:50 UTC
20 points
1 comment29 min readLW link

In­ter­pretabil­ity’s Align­ment-Solv­ing Po­ten­tial: Anal­y­sis of 7 Scenarios

Evan R. Murphy12 May 2022 20:01 UTC
58 points
0 comments59 min readLW link

Agency As a Nat­u­ral Abstraction

Thane Ruthenis13 May 2022 18:02 UTC
55 points
9 comments13 min readLW link

[Link post] Promis­ing Paths to Align­ment—Con­nor Leahy | Talk

frances_lorenz14 May 2022 16:01 UTC
34 points
0 comments1 min readLW link

Deep­Mind’s gen­er­al­ist AI, Gato: A non-tech­ni­cal explainer

16 May 2022 21:21 UTC
63 points
6 comments6 min readLW link

Ac­tion­able-guidance and roadmap recom­men­da­tions for the NIST AI Risk Man­age­ment Framework

17 May 2022 15:26 UTC
26 points
0 comments3 min readLW link

Why I’m Op­ti­mistic About Near-Term AI Risk

harsimony15 May 2022 23:05 UTC
57 points
27 comments1 min readLW link

Pivotal acts us­ing an un­al­igned AGI?

Simon Fischer21 Aug 2022 17:13 UTC
28 points
3 comments7 min readLW link

Re­shap­ing the AI Industry

Thane Ruthenis29 May 2022 22:54 UTC
147 points
35 comments21 min readLW link

Ex­plain­ing in­ner al­ign­ment to myself

Jeremy Gillen24 May 2022 23:10 UTC
9 points
2 comments10 min readLW link

A Story of AI Risk: In­struc­tGPT-N

peterbarnett26 May 2022 23:22 UTC
24 points
0 comments8 min readLW link

We will be around in 30 years

mukashi7 Jun 2022 3:47 UTC
12 points
205 comments2 min readLW link

Re­search Ques­tions from Stained Glass Windows

StefanHex8 Jun 2022 12:38 UTC
4 points
0 comments2 min readLW link

Towards Gears-Level Un­der­stand­ing of Agency

Thane Ruthenis16 Jun 2022 22:00 UTC
25 points
4 comments18 min readLW link

A plau­si­ble story about AI risk.

DeLesley Hutchins10 Jun 2022 2:08 UTC
16 points
2 comments4 min readLW link

Sum­mary of “AGI Ruin: A List of Lethal­ities”

Stephen McAleese10 Jun 2022 22:35 UTC
44 points
2 comments8 min readLW link

Poorly-Aimed Death Rays

Thane Ruthenis11 Jun 2022 18:29 UTC
48 points
5 comments4 min readLW link

Con­tra EY: Can AGI de­stroy us with­out trial & er­ror?

Nikita Sokolsky13 Jun 2022 18:26 UTC
136 points
72 comments15 min readLW link

A Modest Pivotal Act

anonymousaisafety13 Jun 2022 19:24 UTC
−16 points
1 comment5 min readLW link

[Question] AI mis­al­ign­ment risk from GPT-like sys­tems?

fiso6419 Jun 2022 17:35 UTC
10 points
8 comments1 min readLW link

Causal con­fu­sion as an ar­gu­ment against the scal­ing hypothesis

20 Jun 2022 10:54 UTC
86 points
30 comments15 min readLW link

[LQ] Some Thoughts on Mes­sag­ing Around AI Risk

DragonGod25 Jun 2022 13:53 UTC
5 points
3 comments6 min readLW link

All AGI safety ques­tions wel­come (es­pe­cially ba­sic ones) [July 2022]

16 Jul 2022 12:57 UTC
84 points
132 comments3 min readLW link

Refram­ing the AI Risk

Thane Ruthenis1 Jul 2022 18:44 UTC
26 points
7 comments6 min readLW link

Fol­low along with Columbia EA’s Ad­vanced AI Safety Fel­low­ship!

RohanS2 Jul 2022 17:45 UTC
3 points
0 comments2 min readLW link
(forum.effectivealtruism.org)

Can we achieve AGI Align­ment by bal­anc­ing mul­ti­ple hu­man ob­jec­tives?

Ben Smith3 Jul 2022 2:51 UTC
11 points
1 comment4 min readLW link

New US Se­nate Bill on X-Risk Miti­ga­tion [Linkpost]

Evan R. Murphy4 Jul 2022 1:25 UTC
35 points
12 comments1 min readLW link
(www.hsgac.senate.gov)

My Most Likely Rea­son to Die Young is AI X-Risk

AISafetyIsNotLongtermist4 Jul 2022 17:08 UTC
61 points
24 comments4 min readLW link
(forum.effectivealtruism.org)

Please help us com­mu­ni­cate AI xrisk. It could save the world.

otto.barten4 Jul 2022 21:47 UTC
4 points
7 comments2 min readLW link

When is it ap­pro­pri­ate to use statis­ti­cal mod­els and prob­a­bil­ities for de­ci­sion mak­ing ?

Younes Kamel5 Jul 2022 12:34 UTC
10 points
7 comments4 min readLW link
(youneskamel.substack.com)

Ac­cept­abil­ity Ver­ifi­ca­tion: A Re­search Agenda

12 Jul 2022 20:11 UTC
50 points
0 comments1 min readLW link
(docs.google.com)

Goal Align­ment Is Ro­bust To the Sharp Left Turn

Thane Ruthenis13 Jul 2022 20:23 UTC
43 points
16 comments4 min readLW link

Con­di­tion­ing Gen­er­a­tive Models for Alignment

Jozdien18 Jul 2022 7:11 UTC
59 points
8 comments20 min readLW link

A Cri­tique of AI Align­ment Pessimism

ExCeph19 Jul 2022 2:28 UTC
9 points
1 comment9 min readLW link

What En­vi­ron­ment Prop­er­ties Select Agents For World-Model­ing?

Thane Ruthenis23 Jul 2022 19:27 UTC
25 points
1 comment12 min readLW link

Align­ment be­ing im­pos­si­ble might be bet­ter than it be­ing re­ally difficult

Martín Soto25 Jul 2022 23:57 UTC
13 points
2 comments2 min readLW link

AGI ruin sce­nar­ios are likely (and dis­junc­tive)

So8res27 Jul 2022 3:21 UTC
175 points
38 comments6 min readLW link

[Question] How likely do you think worse-than-ex­tinc­tion type fates to be?

span11 Aug 2022 4:08 UTC
3 points
3 comments1 min readLW link

Three pillars for avoid­ing AGI catas­tro­phe: Tech­ni­cal al­ign­ment, de­ploy­ment de­ci­sions, and coordination

Alex Lintz3 Aug 2022 23:15 UTC
22 points
0 comments11 min readLW link

Con­ver­gence Towards World-Models: A Gears-Level Model

Thane Ruthenis4 Aug 2022 23:31 UTC
38 points
1 comment13 min readLW link

Com­plex­ity No Bar to AI (Or, why Com­pu­ta­tional Com­plex­ity mat­ters less than you think for real life prob­lems)

Noosphere897 Aug 2022 19:55 UTC
17 points
14 comments3 min readLW link
(www.gwern.net)

How To Go From In­ter­pretabil­ity To Align­ment: Just Re­tar­get The Search

johnswentworth10 Aug 2022 16:08 UTC
204 points
34 comments3 min readLW link1 review

Anti-squat­ted AI x-risk do­mains index

plex12 Aug 2022 12:01 UTC
56 points
6 comments1 min readLW link

In­fant AI Scenario

Nathan112312 Aug 2022 21:20 UTC
1 point
0 comments3 min readLW link

The Dumbest Pos­si­ble Gets There First

Artaxerxes13 Aug 2022 10:20 UTC
44 points
7 comments2 min readLW link

In­ter­pretabil­ity Tools Are an At­tack Channel

Thane Ruthenis17 Aug 2022 18:47 UTC
42 points
14 comments1 min readLW link

Align­ment’s phlo­gis­ton

Eleni Angelou18 Aug 2022 22:27 UTC
10 points
2 comments2 min readLW link

Bench­mark­ing Pro­pos­als on Risk Scenarios

Paul Bricman20 Aug 2022 10:01 UTC
25 points
2 comments14 min readLW link

What’s the Least Im­pres­sive Thing GPT-4 Won’t be Able to Do

Algon20 Aug 2022 19:48 UTC
80 points
125 comments1 min readLW link

My Plan to Build Aligned Superintelligence

apollonianblues21 Aug 2022 13:16 UTC
18 points
7 comments8 min readLW link

The Align­ment Prob­lem Needs More Pos­i­tive Fiction

Netcentrica21 Aug 2022 22:01 UTC
6 points
3 comments5 min readLW link

It Looks Like You’re Try­ing To Take Over The Narrative

George3d624 Aug 2022 13:36 UTC
3 points
20 comments9 min readLW link
(www.epistem.ink)

AI Risk in Terms of Un­sta­ble Nu­clear Software

Thane Ruthenis26 Aug 2022 18:49 UTC
30 points
1 comment6 min readLW link

Tak­ing the pa­ram­e­ters which seem to mat­ter and ro­tat­ing them un­til they don’t

Garrett Baker26 Aug 2022 18:26 UTC
120 points
48 comments1 min readLW link

An­nual AGI Bench­mark­ing Event

Lawrence Phillips27 Aug 2022 0:06 UTC
24 points
3 comments2 min readLW link
(www.metaculus.com)

Help Un­der­stand­ing Prefer­ences And Evil

Netcentrica27 Aug 2022 3:42 UTC
6 points
7 comments2 min readLW link

[Question] What would you ex­pect a mas­sive mul­ti­modal on­line fed­er­ated learner to be ca­pa­ble of?

Aryeh Englander27 Aug 2022 17:31 UTC
13 points
4 comments1 min readLW link

Are Gen­er­a­tive World Models a Mesa-Op­ti­miza­tion Risk?

Thane Ruthenis29 Aug 2022 18:37 UTC
13 points
2 comments3 min readLW link

Three sce­nar­ios of pseudo-al­ign­ment

Eleni Angelou3 Sep 2022 12:47 UTC
9 points
0 comments3 min readLW link

Wor­lds Where Iter­a­tive De­sign Fails

johnswentworth30 Aug 2022 20:48 UTC
207 points
30 comments10 min readLW link1 review

Align­ment is hard. Com­mu­ni­cat­ing that, might be harder

Eleni Angelou1 Sep 2022 16:57 UTC
7 points
8 comments3 min readLW link

Agency en­g­ineer­ing: is AI-al­ign­ment “to hu­man in­tent” enough?

catubc2 Sep 2022 18:14 UTC
9 points
10 comments6 min readLW link

Sticky goals: a con­crete ex­per­i­ment for un­der­stand­ing de­cep­tive alignment

evhub2 Sep 2022 21:57 UTC
39 points
13 comments3 min readLW link

A Game About AI Align­ment (& Meta-Ethics): What Are the Must Haves?

JonathanErhardt5 Sep 2022 7:55 UTC
18 points
15 comments2 min readLW link

Com­mu­nity Build­ing for Grad­u­ate Stu­dents: A Tar­geted Approach

Neil Crawford6 Sep 2022 17:17 UTC
6 points
0 comments4 min readLW link

It’s (not) how you use it

Eleni Angelou7 Sep 2022 17:15 UTC
8 points
1 comment2 min readLW link

Over­sight Leagues: The Train­ing Game as a Feature

Paul Bricman9 Sep 2022 10:08 UTC
20 points
6 comments10 min readLW link

Ide­olog­i­cal In­fer­ence Eng­ines: Mak­ing Deon­tol­ogy Differ­en­tiable*

Paul Bricman12 Sep 2022 12:00 UTC
6 points
0 comments14 min readLW link

AI Risk In­tro 1: Ad­vanced AI Might Be Very Bad

11 Sep 2022 10:57 UTC
46 points
13 comments30 min readLW link

AI Safety field-build­ing pro­jects I’d like to see

Akash11 Sep 2022 23:43 UTC
46 points
8 comments6 min readLW link

[Linkpost] A sur­vey on over 300 works about in­ter­pretabil­ity in deep networks

scasper12 Sep 2022 19:07 UTC
97 points
7 comments2 min readLW link
(arxiv.org)

Rep­re­sen­ta­tional Tethers: Ty­ing AI La­tents To Hu­man Ones

Paul Bricman16 Sep 2022 14:45 UTC
30 points
0 comments16 min readLW link

Risk aver­sion and GPT-3

casualphysicsenjoyer13 Sep 2022 20:50 UTC
1 point
0 comments1 min readLW link

[Question] Would a Misal­igned SSI Really Kill Us All?

DragonGod14 Sep 2022 12:15 UTC
6 points
7 comments6 min readLW link

Emily Brontë on: Psy­chol­ogy Re­quired for Se­ri­ous™ AGI Safety Research

robertzk14 Sep 2022 14:47 UTC
2 points
0 comments1 min readLW link

Pre­cise P(doom) isn’t very im­por­tant for pri­ori­ti­za­tion or strategy

harsimony14 Sep 2022 17:19 UTC
14 points
6 comments1 min readLW link

Re­spond­ing to ‘Beyond Hyper­an­thro­po­mor­phism’

ukc1001414 Sep 2022 20:37 UTC
9 points
0 comments16 min readLW link

Ca­pa­bil­ity and Agency as Corner­stones of AI risk ­— My cur­rent model

wilm15 Sep 2022 8:25 UTC
10 points
4 comments12 min readLW link

Un­der­stand­ing Con­jec­ture: Notes from Con­nor Leahy interview

Akash15 Sep 2022 18:37 UTC
107 points
23 comments15 min readLW link

[Question] Up­dates on FLI’s Value Alig­ment Map?

Fer32dwt34r3dfsz17 Sep 2022 22:27 UTC
17 points
4 comments1 min readLW link

Sum­maries: Align­ment Fun­da­men­tals Curriculum

Leon Lang18 Sep 2022 13:08 UTC
44 points
3 comments1 min readLW link
(docs.google.com)

Lev­er­ag­ing Le­gal In­for­mat­ics to Align AI

John Nay18 Sep 2022 20:39 UTC
11 points
0 comments3 min readLW link
(forum.effectivealtruism.org)

How to Train Your AGI Dragon

Eris Discordia21 Sep 2022 22:28 UTC
−1 points
3 comments5 min readLW link

AI Risk In­tro 2: Solv­ing The Problem

22 Sep 2022 13:55 UTC
22 points
0 comments27 min readLW link

In­ter­lude: But Who Op­ti­mizes The Op­ti­mizer?

Paul Bricman23 Sep 2022 15:30 UTC
15 points
0 comments10 min readLW link

[Question] Why Do AI re­searchers Rate the Prob­a­bil­ity of Doom So Low?

Aorou24 Sep 2022 2:33 UTC
7 points
6 comments3 min readLW link

On Generality

Eris Discordia26 Sep 2022 4:06 UTC
2 points
0 comments5 min readLW link

Oren’s Field Guide of Bad AGI Outcomes

Eris Discordia26 Sep 2022 4:06 UTC
0 points
0 comments1 min readLW link

(Struc­tural) Sta­bil­ity of Cou­pled Optimizers

Paul Bricman30 Sep 2022 11:28 UTC
25 points
0 comments10 min readLW link

Distri­bu­tion Shifts and The Im­por­tance of AI Safety

Leon Lang29 Sep 2022 22:38 UTC
17 points
2 comments12 min readLW link

Eli’s re­view of “Is power-seek­ing AI an ex­is­ten­tial risk?”

elifland30 Sep 2022 12:21 UTC
67 points
0 comments3 min readLW link
(docs.google.com)

[Question] Any fur­ther work on AI Safety Suc­cess Sto­ries?

Krieger2 Oct 2022 9:53 UTC
8 points
6 comments1 min readLW link

An­nounc­ing the AI Safety Nudge Com­pe­ti­tion to Help Beat Procrastination

Marc Carauleanu1 Oct 2022 1:49 UTC
10 points
0 comments1 min readLW link

Gen­er­a­tive, Epi­sodic Ob­jec­tives for Safe AI

Michael Glass5 Oct 2022 23:18 UTC
11 points
3 comments8 min readLW link

[Linkpost] “Blueprint for an AI Bill of Rights”—Office of Science and Tech­nol­ogy Policy, USA (2022)

Fer32dwt34r3dfsz5 Oct 2022 16:42 UTC
9 points
4 comments2 min readLW link
(www.whitehouse.gov)

What does it mean for an AGI to be ‘safe’?

So8res7 Oct 2022 4:13 UTC
74 points
29 comments3 min readLW link

Pos­si­ble miracles

9 Oct 2022 18:17 UTC
64 points
34 comments8 min readLW link

In­stru­men­tal con­ver­gence in sin­gle-agent systems

12 Oct 2022 12:24 UTC
33 points
4 comments8 min readLW link
(www.gladstone.ai)

Cat­a­logu­ing Pri­ors in The­ory and Practice

Paul Bricman13 Oct 2022 12:36 UTC
13 points
8 comments7 min readLW link

Misal­ign­ment-by-de­fault in multi-agent systems

13 Oct 2022 15:38 UTC
21 points
8 comments20 min readLW link
(www.gladstone.ai)

In­stru­men­tal con­ver­gence: scale and phys­i­cal interactions

14 Oct 2022 15:50 UTC
22 points
0 comments17 min readLW link
(www.gladstone.ai)

Power-Seek­ing AI and Ex­is­ten­tial Risk

Antonio Franca11 Oct 2022 22:50 UTC
6 points
0 comments9 min readLW link

Nice­ness is unnatural

So8res13 Oct 2022 1:30 UTC
125 points
20 comments8 min readLW link1 review

Greed Is the Root of This Evil

Thane Ruthenis13 Oct 2022 20:40 UTC
18 points
7 comments8 min readLW link

Prov­ably Hon­est—A First Step

Srijanak De5 Nov 2022 19:18 UTC
10 points
2 comments8 min readLW link

[Question] How easy is it to su­per­vise pro­cesses vs out­comes?

Noosphere8918 Oct 2022 17:48 UTC
3 points
0 comments1 min readLW link

Re­sponse to Katja Grace’s AI x-risk counterarguments

19 Oct 2022 1:17 UTC
77 points
18 comments15 min readLW link

POWER­play: An open-source toolchain to study AI power-seeking

Edouard Harris24 Oct 2022 20:03 UTC
29 points
0 comments1 min readLW link
(github.com)

Wor­ld­view iPeo­ple—Fu­ture Fund’s AI Wor­ld­view Prize

Toni MUENDEL28 Oct 2022 1:53 UTC
−22 points
4 comments9 min readLW link

AI Re­searchers On AI Risk

Scott Alexander22 May 2015 11:16 UTC
19 points
0 comments16 min readLW link

AI as a Civ­i­liza­tional Risk Part 2/​6: Be­hav­ioral Modification

PashaKamyshev30 Oct 2022 16:57 UTC
9 points
0 comments10 min readLW link

AI as a Civ­i­liza­tional Risk Part 3/​6: Anti-econ­omy and Sig­nal Pollution

PashaKamyshev31 Oct 2022 17:03 UTC
7 points
4 comments14 min readLW link

AI as a Civ­i­liza­tional Risk Part 4/​6: Bioweapons and Philos­o­phy of Modification

PashaKamyshev1 Nov 2022 20:50 UTC
7 points
1 comment8 min readLW link

AI as a Civ­i­liza­tional Risk Part 5/​6: Re­la­tion­ship be­tween C-risk and X-risk

PashaKamyshev3 Nov 2022 2:19 UTC
2 points
0 comments7 min readLW link

AI as a Civ­i­liza­tional Risk Part 6/​6: What can be done

PashaKamyshev3 Nov 2022 19:48 UTC
2 points
4 comments4 min readLW link

Am I se­cretly ex­cited for AI get­ting weird?

porby29 Oct 2022 22:16 UTC
116 points
4 comments4 min readLW link

My (naive) take on Risks from Learned Optimization

Artyom Karpov31 Oct 2022 10:59 UTC
7 points
0 comments5 min readLW link

Clar­ify­ing AI X-risk

1 Nov 2022 11:03 UTC
127 points
24 comments4 min readLW link1 review

Threat Model Liter­a­ture Review

1 Nov 2022 11:03 UTC
77 points
4 comments25 min readLW link

Why do we post our AI safety plans on the In­ter­net?

Peter S. Park3 Nov 2022 16:02 UTC
4 points
4 comments11 min readLW link

My sum­mary of “Prag­matic AI Safety”

Eleni Angelou5 Nov 2022 12:54 UTC
3 points
0 comments5 min readLW link

4 Key As­sump­tions in AI Safety

Prometheus7 Nov 2022 10:50 UTC
20 points
5 comments7 min readLW link

Loss of con­trol of AI is not a likely source of AI x-risk

squek7 Nov 2022 18:44 UTC
−6 points
0 comments5 min readLW link

AI Safety Un­con­fer­ence NeurIPS 2022

Orpheus7 Nov 2022 15:39 UTC
25 points
0 comments1 min readLW link
(aisafetyevents.org)

Value For­ma­tion: An Over­ar­ch­ing Model

Thane Ruthenis15 Nov 2022 17:16 UTC
34 points
20 comments34 min readLW link

Is AI Gain-of-Func­tion re­search a thing?

MadHatter12 Nov 2022 2:33 UTC
9 points
2 comments2 min readLW link

The limited up­side of interpretability

Peter S. Park15 Nov 2022 18:46 UTC
13 points
11 comments1 min readLW link

Con­jec­ture: a ret­ro­spec­tive af­ter 8 months of work

23 Nov 2022 17:10 UTC
180 points
9 comments8 min readLW link

Con­jec­ture Se­cond Hiring Round

23 Nov 2022 17:11 UTC
92 points
0 comments1 min readLW link

Cor­rigi­bil­ity Via Thought-Pro­cess Deference

Thane Ruthenis24 Nov 2022 17:06 UTC
17 points
5 comments9 min readLW link

Dis­cussing how to al­ign Trans­for­ma­tive AI if it’s de­vel­oped very soon

elifland28 Nov 2022 16:17 UTC
37 points
2 comments28 min readLW link

[Question] Is there any policy for a fair treat­ment of AIs whose friendli­ness is in doubt?

nahoj18 Nov 2022 19:01 UTC
15 points
10 comments1 min readLW link

[Question] Whom Do You Trust?

JackOfAllTrades26 Feb 2024 19:38 UTC
1 point
0 comments1 min readLW link

AI can ex­ploit safety plans posted on the Internet

Peter S. Park4 Dec 2022 12:17 UTC
−15 points
4 comments1 min readLW link

Race to the Top: Bench­marks for AI Safety

Isabella Duan4 Dec 2022 18:48 UTC
28 points
6 comments1 min readLW link

[Question] Who are some promi­nent rea­son­able peo­ple who are con­fi­dent that AI won’t kill ev­ery­one?

Optimization Process5 Dec 2022 9:12 UTC
72 points
54 comments1 min readLW link

Aligned Be­hav­ior is not Ev­i­dence of Align­ment Past a Cer­tain Level of Intelligence

Ronny Fernandez5 Dec 2022 15:19 UTC
19 points
5 comments7 min readLW link

Fore­sight for AGI Safety Strat­egy: Miti­gat­ing Risks and Iden­ti­fy­ing Golden Opportunities

jacquesthibs5 Dec 2022 16:09 UTC
28 points
6 comments8 min readLW link

AI Safety in a Vuln­er­a­ble World: Re­quest­ing Feed­back on Pre­limi­nary Thoughts

Jordan Arel6 Dec 2022 22:35 UTC
4 points
2 comments3 min readLW link

Fear miti­gated the nu­clear threat, can it do the same to AGI risks?

Igor Ivanov9 Dec 2022 10:04 UTC
6 points
8 comments5 min readLW link

Reflec­tions on the PIBBSS Fel­low­ship 2022

11 Dec 2022 21:53 UTC
32 points
0 comments18 min readLW link

[Question] Best in­tro­duc­tory overviews of AGI safety?

JakubK13 Dec 2022 19:01 UTC
21 points
9 comments2 min readLW link
(forum.effectivealtruism.org)

Com­pu­ta­tional sig­na­tures of psychopathy

Cameron Berg19 Dec 2022 17:01 UTC
29 points
3 comments20 min readLW link

Why I think that teach­ing philos­o­phy is high impact

Eleni Angelou19 Dec 2022 3:11 UTC
5 points
0 comments2 min readLW link

[Question] Will re­search in AI risk jinx it? Con­se­quences of train­ing AI on AI risk arguments

Yann Dubois19 Dec 2022 22:42 UTC
5 points
6 comments1 min readLW link

AGI Timelines in Gover­nance: Differ­ent Strate­gies for Differ­ent Timeframes

19 Dec 2022 21:31 UTC
65 points
28 comments10 min readLW link

New AI risk in­tro from Vox [link post]

JakubK21 Dec 2022 6:00 UTC
5 points
1 comment2 min readLW link
(www.vox.com)

[Question] Or­a­cle AGI—How can it es­cape, other than se­cu­rity is­sues? (Steganog­ra­phy?)

RationalSieve25 Dec 2022 20:14 UTC
3 points
6 comments1 min readLW link

Ac­cu­rate Models of AI Risk Are Hyper­ex­is­ten­tial Exfohazards

Thane Ruthenis25 Dec 2022 16:50 UTC
31 points
38 comments9 min readLW link

Safety of Self-Assem­bled Neu­ro­mor­phic Hardware

Can26 Dec 2022 18:51 UTC
15 points
2 comments10 min readLW link
(forum.effectivealtruism.org)

In­tro­duc­tion: Bias in Eval­u­at­ing AGI X-Risks

27 Dec 2022 10:27 UTC
1 point
0 comments3 min readLW link

Mere ex­po­sure effect: Bias in Eval­u­at­ing AGI X-Risks

27 Dec 2022 14:05 UTC
0 points
2 comments1 min readLW link

Band­wagon effect: Bias in Eval­u­at­ing AGI X-Risks

28 Dec 2022 7:54 UTC
−1 points
0 comments1 min readLW link

In Defense of Wrap­per-Minds

Thane Ruthenis28 Dec 2022 18:28 UTC
24 points
38 comments3 min readLW link

Friendly and Un­friendly AGI are Indistinguishable

ErgoEcho29 Dec 2022 22:13 UTC
−4 points
4 comments4 min readLW link
(neologos.co)

In­ter­nal In­ter­faces Are a High-Pri­or­ity In­ter­pretabil­ity Target

Thane Ruthenis29 Dec 2022 17:49 UTC
26 points
6 comments7 min readLW link

CFP for Re­bel­lion and Di­sobe­di­ence in AI workshop

Ram Rachum29 Dec 2022 16:08 UTC
15 points
0 comments1 min readLW link

Re­ac­tive de­val­u­a­tion: Bias in Eval­u­at­ing AGI X-Risks

30 Dec 2022 9:02 UTC
−15 points
9 comments1 min readLW link

Curse of knowl­edge and Naive re­al­ism: Bias in Eval­u­at­ing AGI X-Risks

31 Dec 2022 13:33 UTC
−7 points
1 comment1 min readLW link
(www.lesswrong.com)

[Question] Are Mix­ture-of-Ex­perts Trans­form­ers More In­ter­pretable Than Dense Trans­form­ers?

simeon_c31 Dec 2022 11:34 UTC
8 points
5 comments1 min readLW link

Challenge to the no­tion that any­thing is (maybe) pos­si­ble with AGI

1 Jan 2023 3:57 UTC
−27 points
4 comments1 min readLW link
(mflb.com)

Sum­mary of 80k’s AI prob­lem profile

JakubK1 Jan 2023 7:30 UTC
7 points
0 comments5 min readLW link
(forum.effectivealtruism.org)

Belief Bias: Bias in Eval­u­at­ing AGI X-Risks

2 Jan 2023 8:59 UTC
−10 points
1 comment1 min readLW link

Sta­tus quo bias; Sys­tem jus­tifi­ca­tion: Bias in Eval­u­at­ing AGI X-Risks

3 Jan 2023 2:50 UTC
−11 points
0 comments1 min readLW link

Causal rep­re­sen­ta­tion learn­ing as a tech­nique to pre­vent goal misgeneralization

PabloAMC4 Jan 2023 0:07 UTC
21 points
0 comments8 min readLW link

Illu­sion of truth effect and Am­bi­guity effect: Bias in Eval­u­at­ing AGI X-Risks

Remmelt5 Jan 2023 4:05 UTC
−13 points
2 comments1 min readLW link

AI Safety Camp, Vir­tual Edi­tion 2023

Linda Linsefors6 Jan 2023 11:09 UTC
40 points
10 comments3 min readLW link
(aisafety.camp)

AI Safety Camp: Ma­chine Learn­ing for Scien­tific Dis­cov­ery

Eleni Angelou6 Jan 2023 3:21 UTC
3 points
0 comments1 min readLW link

An­chor­ing fo­cal­ism and the Iden­ti­fi­able vic­tim effect: Bias in Eval­u­at­ing AGI X-Risks

Remmelt7 Jan 2023 9:59 UTC
1 point
2 comments1 min readLW link

Big list of AI safety videos

JakubK9 Jan 2023 6:12 UTC
11 points
2 comments1 min readLW link
(docs.google.com)

The Align­ment Prob­lem from a Deep Learn­ing Per­spec­tive (ma­jor rewrite)

10 Jan 2023 16:06 UTC
84 points
8 comments39 min readLW link
(arxiv.org)

[Question] Could Si­mu­lat­ing an AGI Tak­ing Over the World Ac­tu­ally Lead to a LLM Tak­ing Over the World?

simeon_c13 Jan 2023 6:33 UTC
15 points
1 comment1 min readLW link

Reflec­tions on Trust­ing Trust & AI

Itay Yona16 Jan 2023 6:36 UTC
10 points
1 comment3 min readLW link
(mentaleap.ai)

OpenAI’s Align­ment Plan is not S.M.A.R.T.

Søren Elverlin18 Jan 2023 6:39 UTC
9 points
19 comments4 min readLW link

Gra­di­ent Filtering

18 Jan 2023 20:09 UTC
55 points
16 comments13 min readLW link

6-para­graph AI risk in­tro for MAISI

JakubK19 Jan 2023 9:22 UTC
11 points
0 comments2 min readLW link
(www.maisi.club)

List of tech­ni­cal AI safety ex­er­cises and projects

JakubK19 Jan 2023 9:35 UTC
41 points
5 comments1 min readLW link
(docs.google.com)

NYT: Google will “re­cal­ibrate” the risk of re­leas­ing AI due to com­pe­ti­tion with OpenAI

Michael Huang22 Jan 2023 8:38 UTC
47 points
2 comments1 min readLW link
(www.nytimes.com)

Next steps af­ter AGISF at UMich

JakubK25 Jan 2023 20:57 UTC
10 points
0 comments5 min readLW link
(docs.google.com)

What is the ground re­al­ity of coun­tries tak­ing steps to re­cal­ibrate AI de­vel­op­ment to­wards Align­ment first?

Nebuch29 Jan 2023 13:26 UTC
8 points
6 comments3 min readLW link

In­ter­views with 97 AI Re­searchers: Quan­ti­ta­tive Analysis

2 Feb 2023 1:01 UTC
23 points
0 comments7 min readLW link

[Question] What qual­ities does an AGI need to have to re­al­ize the risk of false vac­uum, with­out hard­cod­ing physics the­o­ries into it?

RationalSieve3 Feb 2023 16:00 UTC
1 point
4 comments1 min readLW link

Monthly Doom Ar­gu­ment Threads? Doom Ar­gu­ment Wiki?

LVSN4 Feb 2023 16:59 UTC
3 points
0 comments1 min readLW link

Se­cond call: CFP for Re­bel­lion and Di­sobe­di­ence in AI workshop

Ram Rachum5 Feb 2023 12:18 UTC
2 points
0 comments2 min readLW link

Early situ­a­tional aware­ness and its im­pli­ca­tions, a story

Jacob Pfau6 Feb 2023 20:45 UTC
29 points
6 comments3 min readLW link

Is this a weak pivotal act: cre­at­ing nanobots that eat evil AGIs (but noth­ing else)?

Christopher King10 Feb 2023 19:26 UTC
0 points
3 comments1 min readLW link

The Im­por­tance of AI Align­ment, ex­plained in 5 points

Daniel_Eth11 Feb 2023 2:56 UTC
33 points
2 comments1 min readLW link

Near-Term Risks of an Obe­di­ent Ar­tifi­cial Intelligence

ymeskhout18 Feb 2023 18:30 UTC
20 points
1 comment6 min readLW link

Should we cry “wolf”?

Tapatakt18 Feb 2023 11:24 UTC
24 points
5 comments1 min readLW link

The pub­lic sup­ports reg­u­lat­ing AI for safety

Zach Stein-Perlman17 Feb 2023 4:10 UTC
114 points
9 comments1 min readLW link
(aiimpacts.org)

AGI doesn’t need un­der­stand­ing, in­ten­tion, or con­scious­ness in or­der to kill us, only intelligence

James Blaha20 Feb 2023 0:55 UTC
10 points
2 comments18 min readLW link

Bing find­ing ways to by­pass Microsoft’s filters with­out be­ing asked. Is it re­pro­ducible?

Christopher King20 Feb 2023 15:11 UTC
27 points
15 comments1 min readLW link

De­cep­tive Align­ment is <1% Likely by Default

DavidW21 Feb 2023 15:09 UTC
90 points
29 comments14 min readLW link

An­nounc­ing aisafety.training

JJ Hepburn21 Jan 2023 1:01 UTC
61 points
4 comments1 min readLW link

Is there a ML agent that aban­dons it’s util­ity func­tion out-of-dis­tri­bu­tion with­out los­ing ca­pa­bil­ities?

Christopher King22 Feb 2023 16:49 UTC
1 point
7 comments1 min readLW link

Au­to­mated Sand­wich­ing & Quan­tify­ing Hu­man-LLM Co­op­er­a­tion: ScaleOver­sight hackathon results

23 Feb 2023 10:48 UTC
8 points
0 comments6 min readLW link

Re­search pro­posal: Lev­er­ag­ing Jun­gian archetypes to cre­ate val­ues-based models

MiguelDev5 Mar 2023 17:39 UTC
5 points
2 comments2 min readLW link

Ret­ro­spec­tive on the 2022 Con­jec­ture AI Discussions

Andrea_Miotti24 Feb 2023 22:41 UTC
90 points
5 comments2 min readLW link

Chris­ti­ano (ARC) and GA (Con­jec­ture) Dis­cuss Align­ment Cruxes

24 Feb 2023 23:03 UTC
61 points
7 comments47 min readLW link

[Question] Pink Shog­goths: What does al­ign­ment look like in prac­tice?

Yuli_Ban25 Feb 2023 12:23 UTC
25 points
13 comments11 min readLW link

[Question] Would more model evals teams be good?

Ryan Kidd25 Feb 2023 22:01 UTC
20 points
4 comments1 min readLW link

Cu­ri­os­ity as a Solu­tion to AGI Alignment

Harsha G.26 Feb 2023 23:36 UTC
7 points
7 comments3 min readLW link

Ta­boo “hu­man-level in­tel­li­gence”

Sherrinford26 Feb 2023 20:42 UTC
12 points
7 comments1 min readLW link

The idea of an “al­igned su­per­in­tel­li­gence” seems misguided

ssadler27 Feb 2023 11:19 UTC
6 points
7 comments3 min readLW link
(ssadler.substack.com)

Tran­script: Yud­kowsky on Ban­kless fol­low-up Q&A

vonk28 Feb 2023 3:46 UTC
54 points
40 comments22 min readLW link

The bur­den of knowing

arisAlexis28 Feb 2023 18:40 UTC
5 points
0 comments2 min readLW link

Call for Cruxes by Rhyme, a Longter­mist His­tory Consultancy

Lara1 Mar 2023 18:39 UTC
1 point
0 comments3 min readLW link
(forum.effectivealtruism.org)

Reflec­tion Mechanisms as an Align­ment Tar­get—At­ti­tudes on “near-term” AI

2 Mar 2023 4:29 UTC
21 points
0 comments8 min readLW link

Con­scious­ness is ir­rele­vant—in­stead solve al­ign­ment by ask­ing this question

Oliver Siegel4 Mar 2023 22:06 UTC
−10 points
6 comments1 min readLW link

Why kill ev­ery­one?

arisAlexis5 Mar 2023 11:53 UTC
7 points
5 comments2 min readLW link

Is it time to talk about AI dooms­day prep­ping yet?

bokov5 Mar 2023 21:17 UTC
0 points
8 comments1 min readLW link

Who Aligns the Align­ment Re­searchers?

Ben Smith5 Mar 2023 23:22 UTC
48 points
0 comments11 min readLW link

Cap Model Size for AI Safety

research_prime_space6 Mar 2023 1:11 UTC
0 points
4 comments1 min readLW link

In­tro­duc­ing AI Align­ment Inc., a Cal­ifor­nia pub­lic benefit cor­po­ra­tion...

TherapistAI7 Mar 2023 18:47 UTC
1 point
4 comments1 min readLW link

[Question] What‘s in your list of un­solved prob­lems in AI al­ign­ment?

jacquesthibs7 Mar 2023 18:58 UTC
60 points
9 comments1 min readLW link

Pod­cast Tran­script: Daniela and Dario Amodei on Anthropic

remember7 Mar 2023 16:47 UTC
46 points
2 comments79 min readLW link
(futureoflife.org)

Why Un­con­trol­lable AI Looks More Likely Than Ever

8 Mar 2023 15:41 UTC
18 points
0 comments4 min readLW link
(time.com)

Speed run­ning ev­ery­one through the bad al­ign­ment bingo. $5k bounty for a LW con­ver­sa­tional agent

ArthurB9 Mar 2023 9:26 UTC
140 points
33 comments2 min readLW link

An­thropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster9 Mar 2023 17:34 UTC
17 points
1 comment22 min readLW link
(www.anthropic.com)

Value drift threat models

Garrett Baker12 May 2023 23:03 UTC
27 points
4 comments5 min readLW link

Every­thing’s nor­mal un­til it’s not

Eleni Angelou10 Mar 2023 2:02 UTC
7 points
0 comments3 min readLW link

Is AI Safety drop­ping the ball on pri­vacy?

markov13 Sep 2023 13:07 UTC
50 points
17 comments7 min readLW link

The hu­man­ity’s biggest mistake

RomanS10 Mar 2023 16:30 UTC
0 points
1 comment2 min readLW link

On tak­ing AI risk se­ri­ously

Eleni Angelou13 Mar 2023 5:50 UTC
6 points
0 comments1 min readLW link
(www.nytimes.com)

We don’t trade with ants

KatjaGrace10 Jan 2023 23:50 UTC
269 points
109 comments7 min readLW link1 review
(worldspiritsockpuppet.com)

Linkpost: A tale of 2.5 or­thog­o­nal­ity theses

DavidW13 Mar 2023 14:19 UTC
9 points
3 comments1 min readLW link
(forum.effectivealtruism.org)

Linkpost: A Con­tra AI FOOM Read­ing List

DavidW13 Mar 2023 14:45 UTC
25 points
4 comments1 min readLW link
(magnusvinding.com)

Linkpost: ‘Dis­solv­ing’ AI Risk – Pa­ram­e­ter Uncer­tainty in AI Fu­ture Forecasting

DavidW13 Mar 2023 16:52 UTC
6 points
0 comments1 min readLW link
(forum.effectivealtruism.org)

Could Roko’s basilisk acausally bar­gain with a pa­per­clip max­i­mizer?

Christopher King13 Mar 2023 18:21 UTC
1 point
8 comments1 min readLW link

A bet­ter anal­ogy and ex­am­ple for teach­ing AI takeover: the ML Inferno

Christopher King14 Mar 2023 19:14 UTC
18 points
0 comments5 min readLW link

New eco­nomic sys­tem for AI era

ksme sho17 Mar 2023 17:42 UTC
−1 points
1 comment5 min readLW link

Sur­vey on in­ter­me­di­ate goals in AI governance

17 Mar 2023 13:12 UTC
25 points
3 comments1 min readLW link

(re­tired ar­ti­cle) AGI With In­ter­net Ac­cess: Why we won’t stuff the ge­nie back in its bot­tle.

Max TK18 Mar 2023 3:43 UTC
5 points
10 comments4 min readLW link

An Ap­peal to AI Su­per­in­tel­li­gence: Rea­sons Not to Pre­serve (most of) Humanity

Alex Beyman22 Mar 2023 4:09 UTC
−15 points
6 comments19 min readLW link

Hu­man­ity’s Lack of Unity Will Lead to AGI Catastrophe

MiguelDev19 Mar 2023 19:18 UTC
3 points
2 comments4 min readLW link

[Question] Wouldn’t an in­tel­li­gent agent keep us al­ive and help us al­ign it­self to our val­ues in or­der to pre­vent risk ? by Risk I mean ex­per­i­men­ta­tion by try­ing to al­ign po­ten­tially smarter repli­cas?

Terrence Rotoufle21 Mar 2023 17:44 UTC
−3 points
1 comment2 min readLW link

[Question] What does pul­ling the fire alarm look like?

nem20 Mar 2023 21:45 UTC
2 points
0 comments1 min readLW link

[Question] Will the first AGI agent have been de­signed as an agent (in ad­di­tion to an AGI)?

nahoj3 Dec 2022 20:32 UTC
1 point
8 comments1 min readLW link

Biose­cu­rity and AI: Risks and Opportunities

Steve Newman27 Feb 2024 18:45 UTC
11 points
1 comment7 min readLW link
(www.safe.ai)

Is Align­ment enough?

gchu16 Aug 2024 11:46 UTC
1 point
0 comments1 min readLW link

The con­ver­gent dy­namic we missed

Remmelt12 Dec 2023 23:19 UTC
2 points
2 comments1 min readLW link

Cor­po­rate Gover­nance for Fron­tier AI Labs: A Re­search Agenda

Matthew Wearden28 Feb 2024 11:29 UTC
4 points
0 comments16 min readLW link
(matthewwearden.co.uk)

AI Safety pro­posal—In­fluenc­ing the su­per­in­tel­li­gence explosion

Morgan22 May 2024 23:31 UTC
0 points
2 comments7 min readLW link

Post se­ries on “Li­a­bil­ity Law for re­duc­ing Ex­is­ten­tial Risk from AI”

Nora_Ammann29 Feb 2024 4:39 UTC
42 points
1 comment1 min readLW link
(forum.effectivealtruism.org)

In­cre­men­tal AI Risks from Proxy-Simulations

kmenou19 Dec 2023 18:56 UTC
2 points
0 comments1 min readLW link
(individual.utoronto.ca)

Mo­ral­ity as Co­op­er­a­tion Part I: Humans

DeLesley Hutchins5 Dec 2024 8:16 UTC
5 points
0 comments19 min readLW link

On the fu­ture of lan­guage models

owencb20 Dec 2023 16:58 UTC
105 points
17 comments1 min readLW link

What should AI safety be try­ing to achieve?

EuanMcLean23 May 2024 11:17 UTC
16 points
0 comments13 min readLW link

Mo­ral­ity as Co­op­er­a­tion Part II: The­ory and Experiment

DeLesley Hutchins5 Dec 2024 9:04 UTC
2 points
0 comments17 min readLW link

Mo­ral­ity as Co­op­er­a­tion Part III: Failure Modes

DeLesley Hutchins5 Dec 2024 9:39 UTC
4 points
0 comments20 min readLW link

Big Pic­ture AI Safety: Introduction

EuanMcLean23 May 2024 11:15 UTC
46 points
7 comments5 min readLW link

“De­stroy hu­man­ity” as an im­me­di­ate subgoal

Seth Ahrenbach22 Dec 2023 18:52 UTC
3 points
13 comments3 min readLW link

AI safety ad­vo­cates should con­sider pro­vid­ing gen­tle push­back fol­low­ing the events at OpenAI

civilsociety22 Dec 2023 18:55 UTC
16 points
5 comments3 min readLW link

What will the first hu­man-level AI look like, and how might things go wrong?

EuanMcLean23 May 2024 11:17 UTC
20 points
2 comments15 min readLW link

What mis­takes has the AI safety move­ment made?

EuanMcLean23 May 2024 11:19 UTC
63 points
29 comments12 min readLW link

More Thoughts on the Hu­man-AGI War

Seth Ahrenbach27 Dec 2023 1:03 UTC
−3 points
4 comments7 min readLW link

Reflect­ing on the tran­shu­man­ist re­but­tal to AI ex­is­ten­tial risk and cri­tique of our de­bate method­olo­gies and mi­suse of statistics

catgirlsruletheworld20 Aug 2024 1:59 UTC
−5 points
0 comments4 min readLW link

Some­thing Is Lost When AI Makes Art

utilistrutil18 Aug 2024 22:53 UTC
16 points
1 comment11 min readLW link

Plan­ning to build a cryp­to­graphic box with perfect secrecy

Lysandre Terrisse31 Dec 2023 9:31 UTC
40 points
6 comments11 min readLW link

In­ves­ti­gat­ing Alter­na­tive Fu­tures: Hu­man and Su­per­in­tel­li­gence In­ter­ac­tion Scenarios

Hiroshi Yamakawa3 Jan 2024 23:46 UTC
1 point
0 comments17 min readLW link

Does AI care about re­al­ity or just its own per­cep­tion?

RedFishBlueFish5 Jan 2024 4:05 UTC
−6 points
8 comments1 min readLW link

Towards AI Safety In­fras­truc­ture: Talk & Outline

Paul Bricman7 Jan 2024 9:31 UTC
11 points
0 comments2 min readLW link
(www.youtube.com)

AI de­mands un­prece­dented reliability

Jono9 Jan 2024 16:30 UTC
22 points
5 comments2 min readLW link

Sur­vey of 2,778 AI au­thors: six parts in pictures

KatjaGrace6 Jan 2024 4:43 UTC
80 points
1 comment2 min readLW link

[Question] What’s the pro­to­col for if a novice has ML ideas that are un­likely to work, but might im­prove ca­pa­bil­ities if they do work?

drocta9 Jan 2024 22:51 UTC
6 points
2 comments2 min readLW link

Two Tales of AI Takeover: My Doubts

Violet Hour5 Mar 2024 15:51 UTC
30 points
8 comments29 min readLW link

Disprov­ing and par­tially fix­ing a fully ho­mo­mor­phic en­cryp­tion scheme with perfect secrecy

Lysandre Terrisse26 May 2024 14:56 UTC
16 points
1 comment18 min readLW link

The Un­der­re­ac­tion to OpenAI

Sherrinford18 Jan 2024 22:08 UTC
21 points
0 comments6 min readLW link

Brain­storm­ing: Slow Takeoff

David Piepgrass23 Jan 2024 6:58 UTC
2 points
0 comments51 min readLW link

OpenAI Credit Ac­count (2510$)

Emirhan BULUT21 Jan 2024 2:32 UTC
1 point
0 comments1 min readLW link

An­nounc­ing Con­ver­gence Anal­y­sis: An In­sti­tute for AI Sce­nario & Gover­nance Research

7 Mar 2024 21:37 UTC
23 points
1 comment4 min readLW link

RAND re­port finds no effect of cur­rent LLMs on vi­a­bil­ity of bioter­ror­ism attacks

StellaAthena25 Jan 2024 19:17 UTC
94 points
14 comments1 min readLW link
(www.rand.org)

What Failure Looks Like is not an ex­is­ten­tial risk (and al­ign­ment is not the solu­tion)

otto.barten2 Feb 2024 18:59 UTC
13 points
12 comments9 min readLW link

An­nounc­ing the Lon­don Ini­ti­a­tive for Safe AI (LISA)

2 Feb 2024 23:17 UTC
98 points
0 comments9 min readLW link

Why I think it’s net harm­ful to do tech­ni­cal safety re­search at AGI labs

Remmelt7 Feb 2024 4:17 UTC
26 points
24 comments1 min readLW link

Sce­nario plan­ning for AI x-risk

Corin Katzke10 Feb 2024 0:14 UTC
24 points
12 comments14 min readLW link
(forum.effectivealtruism.org)

Carl Shul­man On Dwarkesh Pod­cast June 2023

Moonicker11 Feb 2024 21:02 UTC
18 points
0 comments159 min readLW link

Tort Law Can Play an Im­por­tant Role in Miti­gat­ing AI Risk

Gabriel Weil12 Feb 2024 17:17 UTC
38 points
9 comments5 min readLW link

The Astro­nom­i­cal Sacri­fice Dilemma

Matthew McRedmond11 Mar 2024 19:58 UTC
15 points
3 comments4 min readLW link

AI In­ci­dent Re­port­ing: A Reg­u­la­tory Review

11 Mar 2024 21:03 UTC
16 points
0 comments6 min readLW link

Dar­wi­nian Traps and Ex­is­ten­tial Risks

KristianRonn25 Aug 2024 22:37 UTC
80 points
14 comments10 min readLW link

Thoughts on the Fea­si­bil­ity of Pro­saic AGI Align­ment?

iamthouthouarti21 Aug 2020 23:25 UTC
8 points
10 comments1 min readLW link

The Shut­down Prob­lem: In­com­plete Prefer­ences as a Solution

EJT23 Feb 2024 16:01 UTC
52 points
29 comments42 min readLW link

Con­trol­ling AGI Risk

TeaSea15 Mar 2024 4:56 UTC
6 points
8 comments4 min readLW link

AI-gen­er­ated opi­oids could be a catas­trophic risk

ejk6420 Mar 2024 17:48 UTC
0 points
2 comments3 min readLW link

Static vs Dy­namic Alignment

Gracie Green21 Mar 2024 17:44 UTC
5 points
0 comments29 min readLW link

Timelines to Trans­for­ma­tive AI: an investigation

Zershaaneh Qureshi26 Mar 2024 18:28 UTC
20 points
2 comments50 min readLW link

Open-ended ethics of phe­nom­ena (a desider­ata with uni­ver­sal moral­ity)

Ryo 8 Nov 2023 20:10 UTC
1 point
0 comments8 min readLW link

Ar­tifi­cial In­tel­li­gence and Liv­ing Wisdom

TMFOW29 Mar 2024 7:41 UTC
−6 points
1 comment17 min readLW link
(tmfow.substack.com)

[Question] Will OpenAI also re­quire a “Su­per Red Team Agent” for its “Su­per­al­ign­ment” Pro­ject?

Super AGI30 Mar 2024 5:25 UTC
2 points
2 comments1 min readLW link

Death with Awesomeness

osmarks1 Apr 2024 20:24 UTC
3 points
2 comments2 min readLW link

Thou­sands of mal­i­cious ac­tors on the fu­ture of AI misuse

1 Apr 2024 10:08 UTC
37 points
0 comments1 min readLW link

A Semiotic Cri­tique of the Orthog­o­nal­ity Thesis

Nicolas Villarreal4 Jun 2024 18:52 UTC
3 points
10 comments15 min readLW link

$250K in Prizes: SafeBench Com­pe­ti­tion An­nounce­ment

ozhang3 Apr 2024 22:07 UTC
26 points
0 comments1 min readLW link

[Question] How does the ever-in­creas­ing use of AI in the mil­i­tary for the di­rect pur­pose of mur­der­ing peo­ple af­fect your p(doom)?

Justausername6 Apr 2024 6:31 UTC
19 points
16 comments1 min readLW link

Why em­piri­cists should be­lieve in AI risk

Knight Lee11 Dec 2024 3:51 UTC
5 points
0 comments1 min readLW link

[Question] fake al­ign­ment solu­tions????

KvmanThinking11 Dec 2024 3:31 UTC
1 point
6 comments1 min readLW link

[Question] Can sin­gu­lar­ity emerge from trans­form­ers?

MP8 Apr 2024 14:26 UTC
−3 points
1 comment1 min readLW link

An­nounc­ing At­las Computing

miyazono11 Apr 2024 15:56 UTC
44 points
4 comments4 min readLW link

Ap­ply to the Pivotal Re­search Fel­low­ship (AI Safety & Biose­cu­rity)

10 Apr 2024 12:08 UTC
18 points
0 comments1 min readLW link

I cre­ated an Asi Align­ment Tier List

TimeGoat21 Apr 2024 18:44 UTC
−6 points
0 comments1 min readLW link

Take­off speeds pre­sen­ta­tion at Anthropic

Tom Davidson4 Jun 2024 22:46 UTC
92 points
0 comments25 min readLW link

LLMs seem (rel­a­tively) safe

JustisMills25 Apr 2024 22:13 UTC
53 points
24 comments7 min readLW link
(justismills.substack.com)

The po­ten­tial Ai risk

The White Death7 May 2024 20:31 UTC
1 point
0 comments1 min readLW link

Is There a Power Play Over­hang?

crispweed8 May 2024 17:39 UTC
3 points
0 comments1 min readLW link
(upcoder.com)

An­nounc­ing the AI Safety Sum­mit Talks with Yoshua Bengio

otto.barten14 May 2024 12:52 UTC
9 points
1 comment1 min readLW link

MIT Fu­tureTech are hiring for an Oper­a­tions and Pro­ject Man­age­ment role.

peterslattery17 May 2024 23:21 UTC
2 points
0 comments3 min readLW link

AI 2030 – AI Policy Roadmap

LTM17 May 2024 23:29 UTC
8 points
0 comments1 min readLW link

Giv­ing away your predictions

8 Sep 2024 13:11 UTC
1 point
0 comments1 min readLW link

La­bor Par­ti­ci­pa­tion is a High-Pri­or­ity AI Align­ment Risk

alex17 Jun 2024 18:09 UTC
4 points
0 comments17 min readLW link

Propos­ing the Post-Sin­gu­lar­ity Sym­biotic Researches

Hiroshi Yamakawa20 Jun 2024 4:05 UTC
5 points
0 comments12 min readLW link

Con­nect­ing the Dots: LLMs can In­fer & Ver­bal­ize La­tent Struc­ture from Train­ing Data

21 Jun 2024 15:54 UTC
160 points
13 comments8 min readLW link
(arxiv.org)

Grey Goo Re­quires AI

harsimony15 Jan 2021 4:45 UTC
8 points
11 comments4 min readLW link
(harsimony.wordpress.com)

A nec­es­sary Mem­brane for­mal­ism feature

ThomasCederborg10 Sep 2024 21:33 UTC
20 points
6 comments11 min readLW link

Ted Kaczy­in­ski proves in­stru­men­tal con­ver­gence?

xXAlphaSigmaXx28 Jun 2024 3:50 UTC
0 points
0 comments1 min readLW link

Anal­y­sis of key AI analogies

Kevin Kohler29 Jun 2024 10:55 UTC
10 points
2 comments15 min readLW link

Re­view of METR’s pub­lic eval­u­a­tion protocol

30 Jun 2024 22:03 UTC
10 points
0 comments5 min readLW link

Towards shut­down­able agents via stochas­tic choice

8 Jul 2024 10:14 UTC
59 points
12 comments23 min readLW link
(arxiv.org)

[Question] Self-cen­sor­ing on AI x-risk dis­cus­sions?

Decaeneus1 Jul 2024 18:24 UTC
17 points
2 comments1 min readLW link

I read ev­ery ma­jor AI lab’s safety plan so you don’t have to

sarahhw16 Dec 2024 18:51 UTC
20 points
0 comments12 min readLW link
(longerramblings.substack.com)

The Com­pendium, A full ar­gu­ment about ex­tinc­tion risk from AGI

31 Oct 2024 12:01 UTC
192 points
52 comments2 min readLW link
(www.thecompendium.ai)

AIS Hun­gary is hiring a part-time Tech­ni­cal Lead! (Dead­line: Dec 31st)

gergogaspar17 Dec 2024 14:12 UTC
1 point
0 comments2 min readLW link

Yoshua Ben­gio: Rea­son­ing through ar­gu­ments against tak­ing AI safety seriously

Judd Rosenblatt11 Jul 2024 23:53 UTC
70 points
3 comments1 min readLW link
(yoshuabengio.org)

Align­ment: “Do what I would have wanted you to do”

Oleg Trott12 Jul 2024 16:47 UTC
11 points
48 comments1 min readLW link

A Solu­tion for AGI/​ASI Safety

Weibing Wang18 Dec 2024 19:44 UTC
48 points
21 comments1 min readLW link

The AI al­ign­ment prob­lem in so­cio-tech­ni­cal sys­tems from a com­pu­ta­tional per­spec­tive: A Top-Down-Top view and outlook

zhaoweizhang15 Jul 2024 18:56 UTC
3 points
0 comments9 min readLW link

Re­cur­sion in AI is scary. But let’s talk solu­tions.

Oleg Trott16 Jul 2024 20:34 UTC
3 points
10 comments2 min readLW link

[Question] Would a scope-in­sen­si­tive AGI be less likely to in­ca­pac­i­tate hu­man­ity?

Jim Buhler21 Jul 2024 14:15 UTC
2 points
3 comments1 min readLW link

The $100B plan with “70% risk of kil­ling us all” w Stephen Fry [video]

Oleg Trott21 Jul 2024 20:06 UTC
34 points
8 comments1 min readLW link
(www.youtube.com)

How rea­son­able is tak­ing ex­tinc­tion risk?

FVelde23 Jul 2024 18:05 UTC
2 points
4 comments4 min readLW link

Knowl­edge Base 8: The truth as an at­trac­tor in the in­for­ma­tion space

iwis25 Apr 2024 15:28 UTC
−8 points
0 comments2 min readLW link

AI ex­is­ten­tial risk prob­a­bil­ities are too un­re­li­able to in­form policy

Oleg Trott28 Jul 2024 0:59 UTC
18 points
5 comments1 min readLW link
(www.aisnakeoil.com)

[Question] Has Eliezer pub­li­cly and satis­fac­to­rily re­sponded to at­tempted re­but­tals of the anal­ogy to evolu­tion?

kaler28 Jul 2024 12:23 UTC
10 points
14 comments1 min readLW link

Self-Other Over­lap: A Ne­glected Ap­proach to AI Alignment

30 Jul 2024 16:22 UTC
193 points
43 comments12 min readLW link

The need for multi-agent experiments

Martín Soto1 Aug 2024 17:14 UTC
43 points
3 comments9 min readLW link

Eth­i­cal De­cep­tion: Should AI Ever Lie?

Jason Reid2 Aug 2024 17:53 UTC
5 points
2 comments7 min readLW link

LLMs stifle cre­ativity, elimi­nate op­por­tu­ni­ties for serendipi­tous dis­cov­ery and dis­rupt in­ter­gen­er­a­tional trans­fer of wisdom

Ghdz5 Aug 2024 18:27 UTC
6 points
2 comments7 min readLW link

[Question] Is an AI re­li­gion jus­tified?

p4rziv4l6 Aug 2024 15:42 UTC
−35 points
11 comments1 min readLW link

Case Story: Lack of Con­sumer Pro­tec­tion Pro­ce­dures AI Ma­nipu­la­tion and the Threat of Fund Con­cen­tra­tion in Crypto Seek­ing As­sis­tance to Fund a Civil Case to Estab­lish Facts and Pro­tect Vuln­er­a­ble Con­sumers from Da­m­age Caused by Au­to­mated Sys­tems

Petr Andreev8 Aug 2024 5:55 UTC
−9 points
0 comments9 min readLW link

[Question] Does VETLM solve AI su­per­al­ign­ment?

Oleg Trott8 Aug 2024 18:22 UTC
−1 points
10 comments1 min readLW link

The Con­scious­ness Co­nun­drum: Why We Can’t Dis­miss Ma­chine Sentience

SystematicApproach13 Aug 2024 18:01 UTC
−21 points
1 comment3 min readLW link

Cri­tique of ‘Many Peo­ple Fear A.I. They Shouldn’t’ by David Brooks.

Axel Ahlqvist15 Aug 2024 18:38 UTC
12 points
8 comments3 min readLW link

[Paper] Hid­den in Plain Text: Emer­gence and Miti­ga­tion of Stegano­graphic Col­lu­sion in LLMs

25 Sep 2024 14:52 UTC
31 points
2 comments4 min readLW link
(arxiv.org)

The Over­looked Ne­ces­sity of Com­plete Se­man­tic Rep­re­sen­ta­tion in AI Safety and Alignment

williamsae15 Aug 2024 19:42 UTC
−1 points
0 comments3 min readLW link

AGI’s Op­pos­ing Force

SimonBaars16 Aug 2024 4:18 UTC
9 points
2 comments1 min readLW link

An “Ob­ser­va­tory” For a Shy Su­per AI?

Sherrinford27 Sep 2024 21:22 UTC
5 points
0 comments1 min readLW link
(robreid.substack.com)

Knowl­edge Base 1: Could it in­crease in­tel­li­gence and make it safer?

iwis30 Sep 2024 16:00 UTC
−4 points
0 comments4 min readLW link

Does nat­u­ral se­lec­tion fa­vor AIs over hu­mans?

cdkg3 Oct 2024 18:47 UTC
20 points
1 comment1 min readLW link
(link.springer.com)

[Question] Seek­ing AI Align­ment Tu­tor/​Ad­vi­sor: $100–150/​hr

MrThink5 Oct 2024 21:28 UTC
26 points
3 comments2 min readLW link

Limits of safe and al­igned AI

Shivam8 Oct 2024 21:30 UTC
2 points
0 comments4 min readLW link

Pal­isade is hiring: Exec As­sis­tant, Con­tent Lead, Ops Lead, and Policy Lead

Charlie Rogers-Smith9 Oct 2024 0:04 UTC
11 points
0 comments4 min readLW link

Ge­offrey Hin­ton on the Past, Pre­sent, and Fu­ture of AI

Stephen McAleese12 Oct 2024 16:41 UTC
22 points
5 comments18 min readLW link

Fac­tor­ing P(doom) into a bayesian network

Joseph Gardi17 Oct 2024 17:55 UTC
1 point
0 comments1 min readLW link

Lenses of Control

WillPetillo22 Oct 2024 7:51 UTC
14 points
0 comments9 min readLW link

Miles Brundage re­signed from OpenAI, and his AGI readi­ness team was disbanded

garrison23 Oct 2024 23:40 UTC
118 points
1 comment7 min readLW link
(garrisonlovely.substack.com)

Dario Amodei’s “Machines of Lov­ing Grace” sound in­cred­ibly dan­ger­ous, for Humans

Super AGI27 Oct 2024 5:05 UTC
8 points
1 comment1 min readLW link

AI as a pow­er­ful meme, via CGP Grey

TheManxLoiner30 Oct 2024 18:31 UTC
46 points
8 comments4 min readLW link

Nav­i­gat­ing pub­lic AI x-risk hype while pur­su­ing tech­ni­cal solutions

Dan Braun19 Feb 2023 12:22 UTC
18 points
0 comments2 min readLW link

What AI safety re­searchers can learn from Ma­hatma Gandhi

Lysandre Terrisse8 Nov 2024 19:49 UTC
−6 points
0 comments3 min readLW link

Thoughts af­ter the Wolfram and Yud­kowsky discussion

Tahp14 Nov 2024 1:43 UTC
25 points
13 comments6 min readLW link

Con­fronting the le­gion of doom.

Spiritus Dei13 Nov 2024 17:03 UTC
−20 points
3 comments5 min readLW link

Propos­ing the Con­di­tional AI Safety Treaty (linkpost TIME)

otto.barten15 Nov 2024 13:59 UTC
10 points
8 comments3 min readLW link
(time.com)

[Question] What (if any­thing) made your p(doom) go down in 2024?

Satron16 Nov 2024 16:46 UTC
4 points
6 comments1 min readLW link

Truth Ter­mi­nal: A re­con­struc­tion of events

17 Nov 2024 23:51 UTC
1 point
1 comment7 min readLW link

[Question] Have we seen any “ReLU in­stead of sig­moid-type im­prove­ments” recently

KvmanThinking23 Nov 2024 3:51 UTC
2 points
4 comments1 min readLW link

A bet­ter “State­ment on AI Risk?”

Knight Lee25 Nov 2024 4:50 UTC
4 points
4 comments3 min readLW link

Why Re­cur­sive Self-Im­prove­ment Might Not Be the Ex­is­ten­tial Risk We Fear

Nassim_A24 Nov 2024 17:17 UTC
1 point
0 comments9 min readLW link

Tak­ing Away the Guns First: The Fun­da­men­tal Flaw in AI Development

s-ice26 Nov 2024 22:11 UTC
1 point
0 comments17 min readLW link

Hope to live or fear to die?

Knight Lee27 Nov 2024 10:42 UTC
3 points
0 comments1 min readLW link

How to solve the mi­suse prob­lem as­sum­ing that in 10 years the de­fault sce­nario is that AGI agents are ca­pa­ble of syn­thetiz­ing pathogens

jeremtti27 Nov 2024 21:17 UTC
6 points
0 comments9 min readLW link

Ca­pa­bil­ities De­nial: The Danger of Un­der­es­ti­mat­ing AI

Christopher King21 Mar 2023 1:24 UTC
6 points
5 comments3 min readLW link

Ex­plor­ing the Pre­cau­tion­ary Prin­ci­ple in AI Devel­op­ment: His­tor­i­cal Analo­gies and Les­sons Learned

Christopher King21 Mar 2023 3:53 UTC
−1 points
2 comments9 min readLW link

[Question] Em­ployer con­sid­er­ing part­ner­ing with ma­jor AI labs. What to do?

GraduallyMoreAgitated21 Mar 2023 17:43 UTC
37 points
7 comments2 min readLW link

Key Ques­tions for Digi­tal Minds

Jacy Reese Anthis22 Mar 2023 17:13 UTC
22 points
0 comments7 min readLW link
(www.sentienceinstitute.org)

Why We MUST Create an AGI that Disem­pow­ers Hu­man­ity. For Real.

twkaiser22 Mar 2023 23:01 UTC
−17 points
1 comment4 min readLW link

ChatGPT’s “fuzzy al­ign­ment” isn’t ev­i­dence of AGI al­ign­ment: the ba­nana test

Michael Tontchev23 Mar 2023 7:12 UTC
23 points
6 comments4 min readLW link

Limit in­tel­li­gent weapons

Lucas Pfeifer23 Mar 2023 17:54 UTC
−11 points
36 comments1 min readLW link

GPT-4 al­ign­ing with aca­sual de­ci­sion the­ory when in­structed to play games, but in­cludes a CDT ex­pla­na­tion that’s in­cor­rect if they differ

Christopher King23 Mar 2023 16:16 UTC
7 points
4 comments8 min readLW link

Grind­ing slimes in the dun­geon of AI al­ign­ment research

Max H24 Mar 2023 4:51 UTC
10 points
2 comments4 min readLW link

Does GPT-4 ex­hibit agency when sum­ma­riz­ing ar­ti­cles?

Christopher King24 Mar 2023 15:49 UTC
16 points
2 comments5 min readLW link

More ex­per­i­ments in GPT-4 agency: writ­ing memos

Christopher King24 Mar 2023 17:51 UTC
5 points
2 comments10 min readLW link

ChatGPT Plu­g­ins—The Begin­ning of the End

Bary Levy25 Mar 2023 11:45 UTC
15 points
4 comments1 min readLW link

[Question] How Poli­tics in­ter­acts with AI ?

qbolec26 Mar 2023 9:53 UTC
−18 points
4 comments1 min readLW link

What can we learn from Lex Frid­man’s in­ter­view with Sam Alt­man?

Karl von Wendt27 Mar 2023 6:27 UTC
56 points
22 comments9 min readLW link

Half-baked al­ign­ment idea

ozb28 Mar 2023 17:47 UTC
6 points
27 comments1 min readLW link

Adapt­ing to Change: Over­com­ing Chronos­ta­sis in AI Lan­guage Models

RationalMindset28 Mar 2023 14:32 UTC
−1 points
0 comments6 min readLW link

I had a chat with GPT-4 on the fu­ture of AI and AI safety

Kristian Freed28 Mar 2023 17:47 UTC
1 point
0 comments8 min readLW link

“Un­in­ten­tional AI safety re­search”: Why not sys­tem­at­i­cally mine AI tech­ni­cal re­search for safety pur­poses?

Jemal Young29 Mar 2023 15:56 UTC
27 points
3 comments6 min readLW link

I made AI Risk Propaganda

monkymind29 Mar 2023 14:26 UTC
−3 points
0 comments1 min readLW link

“Sorcerer’s Ap­pren­tice” from Fan­ta­sia as an anal­ogy for alignment

awg29 Mar 2023 18:21 UTC
9 points
4 comments1 min readLW link
(video.disney.com)

Paus­ing AI Devel­op­ments Isn’t Enough. We Need to Shut it All Down by Eliezer Yudkowsky

jacquesthibs29 Mar 2023 23:16 UTC
298 points
297 comments3 min readLW link
(time.com)

Align­ment—Path to AI as ally, not slave nor foe

ozb30 Mar 2023 14:54 UTC
10 points
3 comments2 min readLW link

Wi­den­ing Over­ton Win­dow—Open Thread

Prometheus31 Mar 2023 10:03 UTC
23 points
8 comments1 min readLW link

Why Yud­kowsky Is Wrong And What He Does Can Be More Dangerous

idontagreewiththat6 Jun 2023 17:59 UTC
−38 points
4 comments3 min readLW link

GPT-4 busted? Clear self-in­ter­est when sum­ma­riz­ing ar­ti­cles about it­self vs when ar­ti­cle talks about Claude, LLaMA, or DALL·E 2

Christopher King31 Mar 2023 17:05 UTC
6 points
4 comments4 min readLW link

Imag­ine a world where Microsoft em­ploy­ees used Bing

Christopher King31 Mar 2023 18:36 UTC
6 points
2 comments2 min readLW link

Is AGI suici­dal­ity the golden ray of hope?

Alex Kirko4 Apr 2023 23:29 UTC
−18 points
4 comments1 min readLW link

AI com­mu­nity build­ing: EliezerKart

Christopher King1 Apr 2023 15:25 UTC
45 points
0 comments2 min readLW link

Pes­simism about AI Safety

2 Apr 2023 7:43 UTC
4 points
1 comment25 min readLW link

AI Safety via Luck

Jozdien1 Apr 2023 20:13 UTC
81 points
7 comments11 min readLW link

The AI gov­er­nance gaps in de­vel­op­ing countries

ntran17 Jun 2023 2:50 UTC
20 points
1 comment14 min readLW link

How to safely use an optimizer

Simon Fischer28 Mar 2024 16:11 UTC
47 points
21 comments7 min readLW link

Ex­plor­ing non-an­thro­pocen­tric as­pects of AI ex­is­ten­tial safety

mishka3 Apr 2023 18:07 UTC
8 points
0 comments3 min readLW link

Steer­ing systems

Max H4 Apr 2023 0:56 UTC
50 points
1 comment15 min readLW link

ICA Simulacra

Ozyrus5 Apr 2023 6:41 UTC
26 points
2 comments7 min readLW link

Against sac­ri­fic­ing AI trans­parency for gen­er­al­ity gains

Ape in the coat7 May 2023 6:52 UTC
4 points
0 comments2 min readLW link

Su­per­in­tel­li­gence will out­smart us or it isn’t superintelligence

Neil 3 Apr 2023 15:01 UTC
−4 points
4 comments1 min readLW link

Do we have a plan for the “first crit­i­cal try” prob­lem?

Christopher King3 Apr 2023 16:27 UTC
−3 points
14 comments1 min readLW link

Towards em­pa­thy in RL agents and be­yond: In­sights from cog­ni­tive sci­ence for AI Align­ment

Marc Carauleanu3 Apr 2023 19:59 UTC
15 points
6 comments1 min readLW link
(clipchamp.com)

[Question] Does it be­come eas­ier, or harder, for the world to co­or­di­nate around not build­ing AGI as time goes on?

Eli Tyre29 Jul 2019 22:59 UTC
86 points
31 comments3 min readLW link2 reviews

Strate­gies to Prevent AI Annihilation

lastchanceformankind4 Apr 2023 8:59 UTC
−2 points
0 comments4 min readLW link

AGI de­ploy­ment as an act of aggression

dr_s5 Apr 2023 6:39 UTC
27 points
29 comments13 min readLW link

[Question] Daisy-chain­ing ep­silon-step verifiers

Decaeneus6 Apr 2023 2:07 UTC
2 points
1 comment1 min readLW link

One Does Not Sim­ply Re­place the Hu­mans

JerkyTreats6 Apr 2023 20:56 UTC
9 points
3 comments4 min readLW link
(www.lesswrong.com)

OpenAI: Our ap­proach to AI safety

Jacob G-W5 Apr 2023 20:26 UTC
1 point
1 comment1 min readLW link
(openai.com)

Willi­ams-Beuren Syn­drome: Frendly Mutations

Takk5 Apr 2023 20:59 UTC
−1 points
1 comment1 min readLW link

Yoshua Ben­gio: “Slow­ing down de­vel­op­ment of AI sys­tems pass­ing the Tur­ing test”

Roman Leventov6 Apr 2023 3:31 UTC
49 points
2 comments5 min readLW link
(yoshuabengio.org)

A decade of lurk­ing, a month of posting

Max H9 Apr 2023 0:21 UTC
70 points
4 comments5 min readLW link

Align­ment of Au­toGPT agents

Ozyrus12 Apr 2023 12:54 UTC
14 points
1 comment4 min readLW link

Why I’m not wor­ried about im­mi­nent doom

Ariel Kwiatkowski10 Apr 2023 15:31 UTC
7 points
2 comments4 min readLW link

Mea­sur­ing ar­tifi­cial in­tel­li­gence on hu­man bench­marks is naive

Anomalous11 Apr 2023 11:34 UTC
11 points
4 comments1 min readLW link
(forum.effectivealtruism.org)

In fa­vor of ac­cel­er­at­ing prob­lems you’re try­ing to solve

Christopher King11 Apr 2023 18:15 UTC
2 points
2 comments4 min readLW link

AI Risk US Pres­i­den­tial Candidate

Simon Berens11 Apr 2023 19:31 UTC
5 points
3 comments1 min readLW link

Open-source LLMs may prove Bostrom’s vuln­er­a­ble world hypothesis

Roope Ahvenharju15 Apr 2023 19:16 UTC
1 point
1 comment1 min readLW link

Ar­tifi­cial In­tel­li­gence as exit strat­egy from the age of acute ex­is­ten­tial risk

Arturo Macias12 Apr 2023 14:48 UTC
−7 points
15 comments7 min readLW link

AGI goal space is big, but nar­row­ing might not be as hard as it seems.

Jacy Reese Anthis12 Apr 2023 19:03 UTC
15 points
0 comments3 min readLW link

Pol­lut­ing the agen­tic commons

hamandcheese13 Apr 2023 17:42 UTC
7 points
4 comments2 min readLW link
(www.secondbest.ca)

The Virus—Short Story

Michael Soareverix13 Apr 2023 18:18 UTC
4 points
0 comments4 min readLW link

On the pos­si­bil­ity of im­pos­si­bil­ity of AGI Long-Term Safety

Roman Yen13 May 2023 18:38 UTC
6 points
3 comments9 min readLW link

Spec­u­la­tion on map­ping the moral land­scape for fu­ture Ai Alignment

Sven Heinz (Welwordion)16 Apr 2023 13:43 UTC
1 point
0 comments1 min readLW link

On ur­gency, pri­or­ity and col­lec­tive re­ac­tion to AI-Risks: Part I

Denreik16 Apr 2023 19:14 UTC
−10 points
15 comments5 min readLW link

AGI Clinics: A Safe Haven for Hu­man­ity’s First En­coun­ters with Superintelligence

portr.17 Apr 2023 1:52 UTC
−5 points
1 comment1 min readLW link

Defin­ing Boundaries on Out­comes

Takk7 Jun 2023 17:41 UTC
1 point
0 comments1 min readLW link

No, re­ally, it pre­dicts next to­kens.

simon18 Apr 2023 3:47 UTC
58 points
55 comments3 min readLW link

Pre­dic­tion: any un­con­trol­lable AI will turn earth into a gi­ant computer

Karl von Wendt17 Apr 2023 12:30 UTC
11 points
8 comments3 min readLW link

What is your timelines for ADI (ar­tifi­cial dis­em­pow­er­ing in­tel­li­gence)?

Christopher King17 Apr 2023 17:01 UTC
3 points
3 comments2 min readLW link

Green goo is plausible

anithite18 Apr 2023 0:04 UTC
61 points
31 comments4 min readLW link1 review

World and Mind in Ar­tifi­cial In­tel­li­gence: ar­gu­ments against the AI pause

Arturo Macias18 Apr 2023 14:40 UTC
1 point
0 comments1 min readLW link
(forum.effectivealtruism.org)

AI Safety Newslet­ter #2: ChaosGPT, Nat­u­ral Selec­tion, and AI Safety in the Media

18 Apr 2023 18:44 UTC
30 points
0 comments4 min readLW link
(newsletter.safe.ai)

I Believe I Know Why AI Models Hallucinate

Richard Aragon19 Apr 2023 21:07 UTC
−10 points
6 comments7 min readLW link
(turingssolutions.com)

[Cross­post] Or­ga­niz­ing a de­bate with ex­perts and MPs to raise AI xrisk aware­ness: a pos­si­ble blueprint

otto.barten19 Apr 2023 11:45 UTC
8 points
0 comments4 min readLW link
(forum.effectivealtruism.org)

How to ex­press this sys­tem for eth­i­cally al­igned AGI as a Math­e­mat­i­cal for­mula?

Oliver Siegel19 Apr 2023 20:13 UTC
−1 points
0 comments1 min readLW link

[Question] Is there any liter­a­ture on us­ing so­cial­iza­tion for AI al­ign­ment?

Nathan112319 Apr 2023 22:16 UTC
10 points
9 comments2 min readLW link

Sta­bil­ity AI re­leases StableLM, an open-source ChatGPT counterpart

Ozyrus20 Apr 2023 6:04 UTC
11 points
3 comments1 min readLW link
(github.com)

Ideas for stud­ies on AGI risk

dr_s20 Apr 2023 18:17 UTC
5 points
1 comment11 min readLW link

Pro­posal: Us­ing Monte Carlo tree search in­stead of RLHF for al­ign­ment research

Christopher King20 Apr 2023 19:57 UTC
2 points
7 comments3 min readLW link

Notes on “the hot mess the­ory of AI mis­al­ign­ment”

JakubK21 Apr 2023 10:07 UTC
13 points
0 comments5 min readLW link
(sohl-dickstein.github.io)

The Se­cu­rity Mind­set, S-Risk and Pub­lish­ing Pro­saic Align­ment Research

lukemarks22 Apr 2023 14:36 UTC
39 points
7 comments5 min readLW link

A great talk for AI noobs (ac­cord­ing to an AI noob)

dov23 Apr 2023 5:34 UTC
10 points
1 comment1 min readLW link
(forum.effectivealtruism.org)

Paths to failure

25 Apr 2023 8:03 UTC
29 points
1 comment8 min readLW link

A con­cise sum-up of the ba­sic ar­gu­ment for AI doom

Mergimio H. Doefevmil24 Apr 2023 17:37 UTC
11 points
6 comments2 min readLW link

A re­sponse to Con­jec­ture’s CoEm proposal

Kristian Freed24 Apr 2023 17:23 UTC
7 points
0 comments4 min readLW link

A Pro­posal for AI Align­ment: Us­ing Directly Op­pos­ing Models

Arne B27 Apr 2023 18:05 UTC
0 points
5 comments3 min readLW link

Mak­ing Nanobots isn’t a one-shot pro­cess, even for an ar­tifi­cial superintelligance

dankrad25 Apr 2023 0:39 UTC
20 points
13 comments6 min readLW link

My Assess­ment of the Chi­nese AI Safety Community

Lao Mein25 Apr 2023 4:21 UTC
250 points
94 comments3 min readLW link

Briefly how I’ve up­dated since ChatGPT

rime25 Apr 2023 14:47 UTC
48 points
2 comments2 min readLW link

Free­dom Is All We Need

Leo Glisic27 Apr 2023 0:09 UTC
−1 points
8 comments10 min readLW link

Hal­loween Problem

Saint Blasphemer24 Oct 2023 16:46 UTC
−10 points
1 comment1 min readLW link

An­nounc­ing #AISum­mitTalks fea­tur­ing Pro­fes­sor Stu­art Rus­sell and many others

otto.barten24 Oct 2023 10:11 UTC
17 points
1 comment1 min readLW link

[Question] What if AGI had its own uni­verse to maybe wreck?

mseale26 Oct 2023 17:49 UTC
−1 points
2 comments1 min readLW link

Re­spon­si­ble Scal­ing Poli­cies Are Risk Man­age­ment Done Wrong

simeon_c25 Oct 2023 23:46 UTC
122 points
35 comments22 min readLW link1 review
(www.navigatingrisks.ai)

[Thought Ex­per­i­ment] To­mor­row’s Echo—The fu­ture of syn­thetic com­pan­ion­ship.

Vimal Naran26 Oct 2023 17:54 UTC
−7 points
2 comments2 min readLW link

[un­ti­tled post]

NeuralSystem_e5e127 Apr 2023 17:37 UTC
3 points
0 comments1 min readLW link

Re­sponse to “Co­or­di­nated paus­ing: An eval­u­a­tion-based co­or­di­na­tion scheme for fron­tier AI de­vel­op­ers”

Matthew Wearden30 Oct 2023 17:27 UTC
5 points
2 comments6 min readLW link
(matthewwearden.co.uk)

Char­bel-Raphaël and Lu­cius dis­cuss interpretability

30 Oct 2023 5:50 UTC
110 points
7 comments21 min readLW link

An In­ter­na­tional Man­hat­tan Pro­ject for Ar­tifi­cial Intelligence

Glenn Clayton27 Apr 2023 17:34 UTC
−11 points
2 comments5 min readLW link

Fo­cus on ex­is­ten­tial risk is a dis­trac­tion from the real is­sues. A false fallacy

Nik Samoylov30 Oct 2023 23:42 UTC
−19 points
11 comments2 min readLW link

Say­ing the quiet part out loud: trad­ing off x-risk for per­sonal immortality

disturbance2 Nov 2023 17:43 UTC
83 points
89 comments5 min readLW link

The 6D effect: When com­pa­nies take risks, one email can be very pow­er­ful.

scasper4 Nov 2023 20:08 UTC
275 points
42 comments3 min readLW link

AI as Su­per-Demagogue

RationalDino5 Nov 2023 21:21 UTC
0 points
11 comments9 min readLW link

Sym­biotic self-al­ign­ment of AIs.

Spiritus Dei7 Nov 2023 17:18 UTC
1 point
0 comments3 min readLW link

Scal­able And Trans­fer­able Black-Box Jailbreaks For Lan­guage Models Via Per­sona Modulation

7 Nov 2023 17:59 UTC
36 points
2 comments2 min readLW link
(arxiv.org)

The So­cial Align­ment Problem

irving28 Apr 2023 14:16 UTC
98 points
13 comments8 min readLW link

Do you want a first-prin­ci­pled pre­pared­ness guide to pre­pare your­self and loved ones for po­ten­tial catas­tro­phes?

Ulrik Horn14 Nov 2023 12:13 UTC
16 points
5 comments15 min readLW link

[Question] Real­is­tic near-fu­ture sce­nar­ios of AI doom un­der­stand­able for non-techy peo­ple?

RomanS28 Apr 2023 14:45 UTC
4 points
4 comments1 min readLW link

[Question] AI Safety orgs- what’s your biggest bot­tle­neck right now?

Kabir Kumar16 Nov 2023 2:02 UTC
1 point
0 comments1 min readLW link

We Should Talk About This More. Epistemic World Col­lapse as Im­mi­nent Safety Risk of Gen­er­a­tive AI.

Joerg Weiss16 Nov 2023 18:46 UTC
11 points
2 comments29 min readLW link

On ex­clud­ing dan­ger­ous in­for­ma­tion from training

ShayBenMoshe17 Nov 2023 11:14 UTC
23 points
5 comments3 min readLW link

Killswitch

Junio18 Nov 2023 22:53 UTC
2 points
0 comments3 min readLW link

Ilya: The AI sci­en­tist shap­ing the world

David Varga20 Nov 2023 13:09 UTC
11 points
0 comments4 min readLW link

A Guide to Fore­cast­ing AI Science Ca­pa­bil­ities

Eleni Angelou29 Apr 2023 23:24 UTC
6 points
1 comment4 min readLW link

The two para­graph ar­gu­ment for AI risk

CronoDAS25 Nov 2023 2:01 UTC
19 points
8 comments1 min readLW link

AISC 2024 - Pro­ject Summaries

NickyP27 Nov 2023 22:32 UTC
48 points
3 comments18 min readLW link

Re­think Pri­ori­ties: Seek­ing Ex­pres­sions of In­ter­est for Spe­cial Pro­jects Next Year

kierangreig29 Nov 2023 13:59 UTC
4 points
0 comments5 min readLW link

Sup­port me in a Week-Long Pick­et­ing Cam­paign Near OpenAI’s HQ: Seek­ing Sup­port and Ideas from the LessWrong Community

Percy30 Apr 2023 17:48 UTC
−21 points
15 comments1 min readLW link

Thoughts on “AI is easy to con­trol” by Pope & Belrose

Steven Byrnes1 Dec 2023 17:30 UTC
197 points
64 comments14 min readLW link1 review

The benefits and risks of op­ti­mism (about AI safety)

Karl von Wendt3 Dec 2023 12:45 UTC
−7 points
6 comments5 min readLW link

A call for a quan­ti­ta­tive re­port card for AI bioter­ror­ism threat models

Juno4 Dec 2023 6:35 UTC
12 points
0 comments10 min readLW link

[Question] Ac­cu­racy of ar­gu­ments that are seen as ridicu­lous and in­tu­itively false but don’t have good counter-arguments

Christopher King29 Apr 2023 23:58 UTC
30 points
39 comments1 min readLW link

Co­op­er­a­tion and Align­ment in Del­e­ga­tion Games: You Need Both!

3 Aug 2024 10:16 UTC
8 points
0 comments14 min readLW link
(www.oliversourbut.net)

Call for sub­mis­sions: Choice of Fu­tures sur­vey questions

c.trout30 Apr 2023 6:59 UTC
4 points
0 comments2 min readLW link
(airtable.com)

Ac­cess to AI: a hu­man right?

dmtea25 Jul 2020 9:38 UTC
5 points
3 comments2 min readLW link

Agen­tic Lan­guage Model Memes

FactorialCode1 Aug 2020 18:03 UTC
16 points
1 comment2 min readLW link

Con­ver­sa­tion with Paul Christiano

abergal11 Sep 2019 23:20 UTC
44 points
6 comments30 min readLW link
(aiimpacts.org)

Tran­scrip­tion of Eliezer’s Jan­uary 2010 video Q&A

curiousepic14 Nov 2011 17:02 UTC
112 points
9 comments56 min readLW link

Re­sponses to Catas­trophic AGI Risk: A Survey

lukeprog8 Jul 2013 14:33 UTC
17 points
8 comments1 min readLW link

How can I re­duce ex­is­ten­tial risk from AI?

lukeprog13 Nov 2012 21:56 UTC
63 points
92 comments8 min readLW link

Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?”

David Scott Krueger (formerly: capybaralet)6 Feb 2019 19:09 UTC
25 points
17 comments1 min readLW link

Refram­ing mis­al­igned AGI’s: well-in­ten­tioned non-neu­rotyp­i­cal assistants

zhukeepa1 Apr 2018 1:22 UTC
46 points
14 comments2 min readLW link

When is un­al­igned AI morally valuable?

paulfchristiano25 May 2018 1:57 UTC
81 points
53 comments10 min readLW link

In­tro­duc­ing the AI Align­ment Fo­rum (FAQ)

29 Oct 2018 21:07 UTC
87 points
8 comments6 min readLW link

Swim­ming Up­stream: A Case Study in In­stru­men­tal Rationality

TurnTrout3 Jun 2018 3:16 UTC
76 points
7 comments8 min readLW link

Cur­rent AI Safety Roles for Soft­ware Engineers

ozziegooen9 Nov 2018 20:57 UTC
70 points
9 comments4 min readLW link

[Question] Why is so much dis­cus­sion hap­pen­ing in pri­vate Google Docs?

Wei Dai12 Jan 2019 2:19 UTC
101 points
22 comments1 min readLW link

Prob­lems in AI Align­ment that philoso­phers could po­ten­tially con­tribute to

Wei Dai17 Aug 2019 17:38 UTC
78 points
14 comments2 min readLW link

Two Ne­glected Prob­lems in Hu­man-AI Safety

Wei Dai16 Dec 2018 22:13 UTC
102 points
25 comments2 min readLW link

An­nounce­ment: AI al­ign­ment prize round 4 winners

cousin_it20 Jan 2019 14:46 UTC
74 points
41 comments1 min readLW link

Soon: a weekly AI Safety pre­req­ui­sites mod­ule on LessWrong

null30 Apr 2018 13:23 UTC
35 points
10 comments1 min readLW link

And the AI would have got away with it too, if...

Stuart_Armstrong22 May 2019 21:35 UTC
75 points
7 comments1 min readLW link

2017 AI Safety Liter­a­ture Re­view and Char­ity Com­par­i­son

Larks24 Dec 2017 18:52 UTC
41 points
5 comments23 min readLW link

Should ethi­cists be in­side or out­side a pro­fes­sion?

Eliezer Yudkowsky12 Dec 2018 1:40 UTC
97 points
7 comments9 min readLW link

I Vouch For MIRI

Zvi17 Dec 2017 17:50 UTC
39 points
9 comments5 min readLW link
(thezvi.wordpress.com)

Be­ware of black boxes in AI al­ign­ment research

cousin_it18 Jan 2018 15:07 UTC
39 points
10 comments1 min readLW link

AI Align­ment Prize: Round 2 due March 31, 2018

Zvi12 Mar 2018 12:10 UTC
28 points
2 comments3 min readLW link
(thezvi.wordpress.com)

Three AI Safety Re­lated Ideas

Wei Dai13 Dec 2018 21:32 UTC
69 points
38 comments2 min readLW link

A rant against robots

Lê Nguyên Hoang14 Jan 2020 22:03 UTC
65 points
7 comments5 min readLW link

Op­por­tu­ni­ties for in­di­vi­d­ual donors in AI safety

Alex Flint31 Mar 2018 18:37 UTC
30 points
3 comments11 min readLW link

Course recom­men­da­tions for Friendli­ness researchers

Louie9 Jan 2013 14:33 UTC
96 points
112 comments10 min readLW link

AI Safety Re­search Camp—Pro­ject Proposal

David_Kristoffersson2 Feb 2018 4:25 UTC
29 points
11 comments8 min readLW link

AI Sum­mer Fel­lows Program

colm21 Mar 2018 15:32 UTC
21 points
0 comments1 min readLW link

The ge­nie knows, but doesn’t care

Rob Bensinger6 Sep 2013 6:42 UTC
119 points
495 comments8 min readLW link

[Question] Does agency nec­es­sar­ily im­ply self-preser­va­tion in­stinct?

Mislav Jurić1 May 2023 16:06 UTC
5 points
8 comments1 min readLW link

Align­ment Newslet­ter #13: 07/​02/​18

Rohin Shah2 Jul 2018 16:10 UTC
70 points
12 comments8 min readLW link
(mailchi.mp)

An In­creas­ingly Ma­nipu­la­tive Newsfeed

Michaël Trazzi1 Jul 2019 15:26 UTC
63 points
16 comments5 min readLW link

The sim­ple pic­ture on AI safety

Alex Flint27 May 2018 19:43 UTC
31 points
10 comments2 min readLW link

Shah (Deep­Mind) and Leahy (Con­jec­ture) Dis­cuss Align­ment Cruxes

1 May 2023 16:47 UTC
96 points
10 comments30 min readLW link

Elon Musk donates $10M to the Fu­ture of Life In­sti­tute to keep AI benefi­cial

Paul Crowley15 Jan 2015 16:33 UTC
78 points
52 comments1 min readLW link

Strate­gic im­pli­ca­tions of AIs’ abil­ity to co­or­di­nate at low cost, for ex­am­ple by merging

Wei Dai25 Apr 2019 5:08 UTC
69 points
46 comments2 min readLW link1 review

Model­ing AGI Safety Frame­works with Causal In­fluence Diagrams

Ramana Kumar21 Jun 2019 12:50 UTC
43 points
6 comments1 min readLW link
(arxiv.org)

Henry Kiss­inger: AI Could Mean the End of Hu­man History

ESRogs15 May 2018 20:11 UTC
17 points
12 comments1 min readLW link
(www.theatlantic.com)

Toy model of the AI con­trol prob­lem: an­i­mated version

Stuart_Armstrong10 Oct 2017 11:06 UTC
23 points
8 comments1 min readLW link

A Vi­su­al­iza­tion of Nick Bostrom’s Superintelligence

[deleted]23 Jul 2014 0:24 UTC
62 points
28 comments3 min readLW link

AI Align­ment Re­search Overview (by Ja­cob Stein­hardt)

Ben Pace6 Nov 2019 19:24 UTC
44 points
0 comments7 min readLW link
(docs.google.com)

A gen­eral model of safety-ori­ented AI development

Wei Dai11 Jun 2018 21:00 UTC
65 points
8 comments1 min readLW link

Coun­ter­fac­tual Or­a­cles = on­line su­per­vised learn­ing with ran­dom se­lec­tion of train­ing episodes

Wei Dai10 Sep 2019 8:29 UTC
52 points
26 comments3 min readLW link

AI Safety Newslet­ter #4: AI and Cy­ber­se­cu­rity, Per­sua­sive AIs, Weaponiza­tion, and Ge­offrey Hin­ton talks AI risks

2 May 2023 18:41 UTC
32 points
0 comments5 min readLW link
(newsletter.safe.ai)

Avert­ing Catas­tro­phe: De­ci­sion The­ory for COVID-19, Cli­mate Change, and Po­ten­tial Disasters of All Kinds

JakubK2 May 2023 22:50 UTC
10 points
0 comments1 min readLW link

Siren wor­lds and the per­ils of over-op­ti­mised search

Stuart_Armstrong7 Apr 2014 11:00 UTC
83 points
418 comments7 min readLW link

Reg­u­late or Com­pete? The China Fac­tor in U.S. AI Policy (NAIR #2)

charles_m5 May 2023 17:43 UTC
2 points
1 comment7 min readLW link
(navigatingairisks.substack.com)

But What If We Ac­tu­ally Want To Max­i­mize Paper­clips?

snerx25 May 2023 7:13 UTC
−17 points
6 comments7 min readLW link

Top 9+2 myths about AI risk

Stuart_Armstrong29 Jun 2015 20:41 UTC
68 points
44 comments2 min readLW link

For­mal­iz­ing the “AI x-risk is un­likely be­cause it is ridicu­lous” argument

Christopher King3 May 2023 18:56 UTC
48 points
17 comments3 min readLW link

Ro­hin Shah on rea­sons for AI optimism

abergal31 Oct 2019 12:10 UTC
40 points
58 comments1 min readLW link
(aiimpacts.org)

Plau­si­bly, al­most ev­ery pow­er­ful al­gorithm would be manipulative

Stuart_Armstrong6 Feb 2020 11:50 UTC
38 points
25 comments3 min readLW link

We don’t need AGI for an amaz­ing future

Karl von Wendt4 May 2023 12:10 UTC
18 points
32 comments5 min readLW link

[Question] Why not use ac­tive SETI to pre­vent AI Doom?

RomanS5 May 2023 14:41 UTC
13 points
13 comments1 min readLW link

CHAT Di­plo­macy: LLMs and Na­tional Security

SebastianG 5 May 2023 19:45 UTC
25 points
6 comments7 min readLW link

The Mag­ni­tude of His Own Folly

Eliezer Yudkowsky30 Sep 2008 11:31 UTC
117 points
128 comments6 min readLW link

AI al­ign­ment landscape

paulfchristiano13 Oct 2019 2:10 UTC
40 points
3 comments1 min readLW link
(ai-alignment.com)

Launched: Friend­ship is Optimal

iceman15 Nov 2012 4:57 UTC
77 points
32 comments1 min readLW link

Friend­ship is Op­ti­mal: A My Lit­tle Pony fan­fic about an op­ti­miza­tion process

iceman8 Sep 2012 6:16 UTC
110 points
152 comments1 min readLW link

Is “red” for GPT-4 the same as “red” for you?

Yusuke Hayashi6 May 2023 17:55 UTC
9 points
6 comments2 min readLW link

Oh, Think of the Bananas

Jeffs1 Jun 2023 6:46 UTC
3 points
0 comments2 min readLW link

Do Earths with slower eco­nomic growth have a bet­ter chance at FAI?

Eliezer Yudkowsky12 Jun 2013 19:54 UTC
59 points
175 comments4 min readLW link

TED talk by Eliezer Yud­kowsky: Un­leash­ing the Power of Ar­tifi­cial Intelligence

bayesed7 May 2023 5:45 UTC
49 points
36 comments1 min readLW link
(www.youtube.com)

An­no­tated re­ply to Ben­gio’s “AI Scien­tists: Safe and Use­ful AI?”

Roman Leventov8 May 2023 21:26 UTC
18 points
2 comments7 min readLW link
(yoshuabengio.org)