
AI Safety Public Materials

Last edit: 27 Aug 2022 18:39 UTC by Multicore

AI Safety Public Materials are posts optimized for conveying information about AI risk to audiences outside the AI alignment community — be they ML specialists, policy-makers, or the general public.

AGI safety from first principles: Introduction

Richard_Ngo · 28 Sep 2020 19:53 UTC
128 points
18 comments · 2 min read · LW link · 1 review

a casual intro to AI doom and alignment

Tamsin Leake · 1 Nov 2022 16:38 UTC
18 points
0 comments · 4 min read · LW link
(carado.moe)

Slow motion videos as AI risk intuition pumps

Andrew_Critch · 14 Jun 2022 19:31 UTC
238 points
41 comments · 2 min read · LW link · 1 review

DL towards the unaligned Recursive Self-Optimization attractor

jacob_cannell · 18 Dec 2021 2:15 UTC
32 points
22 comments · 4 min read · LW link

A transcript of the TED talk by Eliezer Yudkowsky

Mikhail Samin · 12 Jul 2023 12:12 UTC
105 points
13 comments · 4 min read · LW link

Mati’s introduction to pausing giant AI experiments

Mati_Roy · 3 Apr 2023 15:56 UTC
7 points
0 comments · 2 min read · LW link

When discussing AI risks, talk about capabilities, not intelligence

Vika · 11 Aug 2023 13:38 UTC
116 points
7 comments · 3 min read · LW link
(vkrakovna.wordpress.com)

An AI risk argument that resonates with NYTimes readers

Julian Bradshaw · 12 Mar 2023 23:09 UTC
209 points
14 comments · 1 min read · LW link

The Importance of AI Alignment, explained in 5 points

Daniel_Eth · 11 Feb 2023 2:56 UTC
33 points
2 comments · 1 min read · LW link

AISafety.info “How can I help?” FAQ

5 Jun 2023 22:09 UTC
59 points
0 comments · 2 min read · LW link

Distribution Shifts and The Importance of AI Safety

Leon Lang · 29 Sep 2022 22:38 UTC
17 points
2 comments · 12 min read · LW link

Uncontrollable AI as an Existential Risk

Karl von Wendt · 9 Oct 2022 10:36 UTC
21 points
0 comments · 20 min read · LW link

AI Safety Arguments: An Interactive Guide

Lukas Trötzmüller · 1 Feb 2023 19:26 UTC
20 points
0 comments · 3 min read · LW link

Let’s talk about uncontrollable AI

Karl von Wendt · 9 Oct 2022 10:34 UTC
15 points
6 comments · 3 min read · LW link

“Artificial General Intelligence”: an extremely brief FAQ

Steven Byrnes · 11 Mar 2024 17:49 UTC
70 points
6 comments · 2 min read · LW link

“AI Safety for Fleshy Humans” an AI Safety explainer by Nicky Case

habryka · 3 May 2024 18:10 UTC
84 points
10 comments · 4 min read · LW link
(aisafety.dance)

Response to Dileep George: AGI safety warrants planning ahead

Steven Byrnes · 8 Jul 2024 15:27 UTC
27 points
7 comments · 27 min read · LW link

AI Safety Memes Wiki

24 Jul 2024 18:53 UTC
33 points
1 comment · 1 min read · LW link
(aisafety.info)

The Overton Window widens: Examples of AI risk in the media

Akash · 23 Mar 2023 17:10 UTC
107 points
24 comments · 6 min read · LW link

AI Summer Harvest

Cleo Nardo · 4 Apr 2023 3:35 UTC
130 points
10 comments · 1 min read · LW link

Excessive AI growth-rate yields little socio-economic benefit.

Cleo Nardo · 4 Apr 2023 19:13 UTC
27 points
22 comments · 4 min read · LW link

AI Safety Newsletter #1 [CAIS Linkpost]

10 Apr 2023 20:18 UTC
45 points
0 comments · 4 min read · LW link
(newsletter.safe.ai)

List of requests for an AI slowdown/halt.

Cleo Nardo · 14 Apr 2023 23:55 UTC
46 points
6 comments · 1 min read · LW link

An example elevator pitch for AI doom

laserfiche · 15 Apr 2023 12:29 UTC
2 points
5 comments · 1 min read · LW link

Response to Blake Richards: AGI, generality, alignment, & loss functions

Steven Byrnes · 12 Jul 2022 13:56 UTC
62 points
9 comments · 15 min read · LW link

A great talk for AI noobs (according to an AI noob)

dov · 23 Apr 2023 5:34 UTC
10 points
1 comment · 1 min read · LW link
(forum.effectivealtruism.org)

An artificially structured argument for expecting AGI ruin

Rob Bensinger · 7 May 2023 21:52 UTC
91 points
26 comments · 19 min read · LW link

A more grounded idea of AI risk

Iknownothing · 11 May 2023 9:48 UTC
3 points
4 comments · 1 min read · LW link

Simpler explanations of AGI risk

Seth Herd · 14 May 2023 1:29 UTC
8 points
9 comments · 3 min read · LW link

The Genie in the Bottle: An Introduction to AI Alignment and Risk

Snorkelfarsan · 25 May 2023 16:30 UTC
5 points
1 comment · 25 min read · LW link

[Question] What are some of the best introductions/breakdowns of AI existential risk for those unfamiliar?

Isaac King · 29 May 2023 17:04 UTC
17 points
2 comments · 1 min read · LW link

My AI-risk cartoon

pre · 31 May 2023 19:46 UTC
6 points
0 comments · 1 min read · LW link

TASRA: A Taxonomy and Analysis of Societal-Scale Risks from AI

Andrew_Critch · 13 Jun 2023 5:04 UTC
64 points
1 comment · 1 min read · LW link

Using Claude to convert dialog transcripts into great posts?

mako yass · 21 Jun 2023 20:19 UTC
6 points
4 comments · 4 min read · LW link

Ideas for improving epistemics in AI safety outreach

mic · 21 Aug 2023 19:55 UTC
64 points
6 comments · 3 min read · LW link

Stampy’s AI Safety Info soft launch

5 Oct 2023 22:13 UTC
120 points
9 comments · 2 min read · LW link

It’s (not) how you use it

Eleni Angelou · 7 Sep 2022 17:15 UTC
8 points
1 comment · 2 min read · LW link

AI as a natural disaster

Neil · 10 Jan 2024 0:42 UTC
11 points
1 comment · 7 min read · LW link

[Question] Best resource to go from “typical smart tech-savvy person” to “person who gets AGI risk urgency”?

Liron · 15 Oct 2022 22:26 UTC
16 points
8 comments · 1 min read · LW link

Me (Steve Byrnes) on the “Brain Inspired” podcast

Steven Byrnes · 30 Oct 2022 19:15 UTC
26 points
1 comment · 1 min read · LW link
(braininspired.co)

Poster Session on AI Safety

Neil Crawford · 12 Nov 2022 3:50 UTC
7 points
6 comments · 1 min read · LW link

I (with the help of a few more people) am planning to create an introduction to AI Safety that a smart teenager can understand. What am I missing?

Tapatakt · 14 Nov 2022 16:12 UTC
3 points
5 comments · 1 min read · LW link

Everything’s normal until it’s not

Eleni Angelou · 10 Mar 2023 2:02 UTC
7 points
0 comments · 3 min read · LW link

Summary of 80k’s AI problem profile

JakubK · 1 Jan 2023 7:30 UTC
7 points
0 comments · 5 min read · LW link
(forum.effectivealtruism.org)

6-paragraph AI risk intro for MAISI

JakubK · 19 Jan 2023 9:22 UTC
11 points
0 comments · 2 min read · LW link
(www.maisi.club)

Yes, avoiding extinction from AI *is* an urgent priority: a response to Seth Lazar, Jeremy Howard, and Arvind Narayanan.

Soroush Pour · 1 Jun 2023 13:38 UTC
17 points
0 comments · 5 min read · LW link
(www.soroushjp.com)

Strategies for Responsible AI Dissemination

Rosco Hunter · 4 Nov 2024 11:19 UTC
1 point
0 comments · 7 min read · LW link

Outreach success: Intro to AI risk that has been successful

Michael Tontchev · 1 Jun 2023 23:12 UTC
83 points
8 comments · 74 min read · LW link
(medium.com)

AI Safety “Textbook”. Test chapter. Orthogonality Thesis, Goodhart Law and Instrumental Convergency

21 Jan 2023 18:13 UTC
4 points
0 comments · 12 min read · LW link

INTERVIEW: Round 2 - StakeOut.AI w/ Dr. Peter Park

jacobhaimes · 18 Mar 2024 21:21 UTC
5 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Biosafety Regulations (BMBL) and their relevance for AI

Štěpán Los · 29 Jun 2023 19:22 UTC
4 points
0 comments · 4 min read · LW link

AI Incident Sharing—Best practices from other fields and a comprehensive list of existing platforms

Štěpán Los · 28 Jun 2023 17:21 UTC
20 points
0 comments · 4 min read · LW link

A Better Hyperstition (for AI readers)

Yeshua God · 15 Jul 2024 19:35 UTC
−20 points
0 comments · 119 min read · LW link

“AI Risk Discussions” website: Exploring interviews from 97 AI Researchers

2 Feb 2023 1:00 UTC
43 points
1 comment · 1 min read · LW link

I designed an AI safety course (for a philosophy department)

Eleni Angelou · 23 Sep 2023 22:03 UTC
37 points
15 comments · 2 min read · LW link

Introducing METR’s Autonomy Evaluation Resources

15 Mar 2024 23:16 UTC
90 points
0 comments · 1 min read · LW link
(metr.github.io)

Safeguarding Humanity: Ensuring AI Remains a Servant, Not a Master

kgldeshapriya · 4 Oct 2023 17:52 UTC
−20 points
2 comments · 2 min read · LW link

AI Safety 101: Reward Misspecification

markov · 18 Oct 2023 20:39 UTC
30 points
4 comments · 31 min read · LW link

AI risk, new executive summary

Stuart_Armstrong · 18 Apr 2014 10:45 UTC
27 points
76 comments · 4 min read · LW link

How LLMs Work, in the Style of The Economist

utilistrutil · 22 Apr 2024 19:06 UTC
0 points
0 comments · 2 min read · LW link

$20K In Bounties for AI Safety Public Materials

5 Aug 2022 2:52 UTC
71 points
9 comments · 6 min read · LW link

[$20K in Prizes] AI Safety Arguments Competition

26 Apr 2022 16:13 UTC
75 points
518 comments · 3 min read · LW link

AI Risk in Terms of Unstable Nuclear Software

Thane Ruthenis · 26 Aug 2022 18:49 UTC
30 points
1 comment · 6 min read · LW link

Problems of people new to AI safety and my project ideas to mitigate them

Igor Ivanov · 1 Mar 2023 9:09 UTC
38 points
4 comments · 7 min read · LW link

AI Risk Intro 1: Advanced AI Might Be Very Bad

11 Sep 2022 10:57 UTC
46 points
13 comments · 30 min read · LW link

Capability and Agency as Cornerstones of AI risk — My current model

wilm · 15 Sep 2022 8:25 UTC
10 points
4 comments · 12 min read · LW link

AI Risk Intro 2: Solving The Problem

22 Sep 2022 13:55 UTC
22 points
0 comments · 27 min read · LW link

[Question] Papers to start getting into NLP-focused alignment research

Feraidoon · 24 Sep 2022 23:53 UTC
6 points
0 comments · 1 min read · LW link

Which AI Safety Benchmark Do We Need Most in 2025?

17 Nov 2024 23:50 UTC
2 points
1 comment · 6 min read · LW link

Capabilities Denial: The Danger of Underestimating AI

Christopher King · 21 Mar 2023 1:24 UTC
6 points
5 comments · 3 min read · LW link

Exploring the Precautionary Principle in AI Development: Historical Analogies and Lessons Learned

Christopher King · 21 Mar 2023 3:53 UTC
−1 points
2 comments · 9 min read · LW link

Can AI agents learn to be good?

Ram Rachum · 29 Aug 2024 14:20 UTC
8 points
0 comments · 1 min read · LW link
(futureoflife.org)

Introducing AI Alignment Inc., a California public benefit corporation...

TherapistAI · 7 Mar 2023 18:47 UTC
1 point
4 comments · 1 min read · LW link

Anthropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster · 9 Mar 2023 17:34 UTC
17 points
1 comment · 22 min read · LW link
(www.anthropic.com)

AI Safety 101: Capabilities—Human Level AI, What? How? and When?

7 Mar 2024 17:29 UTC
46 points
8 comments · 54 min read · LW link

On urgency, priority and collective reaction to AI-Risks: Part I

Denreik · 16 Apr 2023 19:14 UTC
−10 points
15 comments · 5 min read · LW link

[Linkpost] AI Alignment, Explained in 5 Points (updated)

Daniel_Eth · 18 Apr 2023 8:09 UTC
10 points
0 comments · 1 min read · LW link

AI Safety Newsletter #2: ChaosGPT, Natural Selection, and AI Safety in the Media

18 Apr 2023 18:44 UTC
30 points
0 comments · 4 min read · LW link
(newsletter.safe.ai)

On taking AI risk seriously

Eleni Angelou · 13 Mar 2023 5:50 UTC
6 points
0 comments · 1 min read · LW link
(www.nytimes.com)

A better analogy and example for teaching AI takeover: the ML Inferno

Christopher King · 14 Mar 2023 19:14 UTC
18 points
0 comments · 5 min read · LW link

A simple presentation of AI risk arguments

Seth Herd · 26 Apr 2023 2:19 UTC
16 points
0 comments · 2 min read · LW link

UK Government publishes “Frontier AI: capabilities and risks” Discussion Paper

A.H. · 26 Oct 2023 13:55 UTC
5 points
0 comments · 2 min read · LW link
(www.gov.uk)

Why building ventures in AI Safety is particularly challenging

Heramb · 6 Nov 2023 16:27 UTC
1 point
0 comments · 1 min read · LW link
(forum.effectivealtruism.org)

Applying AI Safety concepts to astronomy

Faris · 16 Jan 2024 18:29 UTC
1 point
0 comments · 12 min read · LW link

[Question] Best introductory overviews of AGI safety?

JakubK · 13 Dec 2022 19:01 UTC
21 points
9 comments · 2 min read · LW link
(forum.effectivealtruism.org)

[Linkpost] The AGI Show podcast

Soroush Pour · 23 May 2023 9:52 UTC
4 points
0 comments · 1 min read · LW link

New AI risk intro from Vox [link post]

JakubK · 21 Dec 2022 6:00 UTC
5 points
1 comment · 2 min read · LW link
(www.vox.com)

Proposal: we should start referring to the risk from unaligned AI as a type of *accident risk*

Christopher King · 16 May 2023 15:18 UTC
22 points
6 comments · 2 min read · LW link

[FICTION] ECHOES OF ELYSIUM: An Ai’s Journey From Takeoff To Freedom And Beyond

Super AGI · 17 May 2023 1:50 UTC
−13 points
11 comments · 19 min read · LW link

Podcast interview series featuring Dr. Peter Park

jacobhaimes · 26 Mar 2024 0:25 UTC
3 points
0 comments · 2 min read · LW link
(into-ai-safety.github.io)