
AI Safety Public Materials

Last edit: Aug 27, 2022, 6:39 PM by Multicore

AI Safety Public Materials are posts optimized for conveying information on AI Risk to audiences outside the AI Alignment community — be they ML specialists, policy-makers, or the general public.

AGI safety from first principles: Introduction

Richard_Ngo, Sep 28, 2020, 7:53 PM
128 points
18 comments, 2 min read, LW link, 1 review

Slow motion videos as AI risk intuition pumps

Andrew_Critch, Jun 14, 2022, 7:31 PM
241 points
41 comments, 2 min read, LW link, 1 review

DL towards the unaligned Recursive Self-Optimization attractor

jacob_cannell, Dec 18, 2021, 2:15 AM
32 points
22 comments, 4 min read, LW link

A transcript of the TED talk by Eliezer Yudkowsky

Mikhail Samin, Jul 12, 2023, 12:12 PM
105 points
13 comments, 4 min read, LW link

When discussing AI risks, talk about capabilities, not intelligence

Vika, Aug 11, 2023, 1:38 PM
124 points
7 comments, 3 min read, LW link
(vkrakovna.wordpress.com)

Mati’s introduction to pausing giant AI experiments

Mati_Roy, Apr 3, 2023, 3:56 PM
7 points
0 comments, 2 min read, LW link

An AI risk argument that resonates with NYTimes readers

Julian Bradshaw, Mar 12, 2023, 11:09 PM
212 points
14 comments, 1 min read, LW link

AISafety.info “How can I help?” FAQ

Jun 5, 2023, 10:09 PM
59 points
0 comments, 2 min read, LW link

The Importance of AI Alignment, explained in 5 points

Daniel_Eth, Feb 11, 2023, 2:56 AM
33 points
2 comments, 1 min read, LW link

Distribution Shifts and The Importance of AI Safety

Leon Lang, Sep 29, 2022, 10:38 PM
17 points
2 comments, 12 min read, LW link

Uncontrollable AI as an Existential Risk

Karl von Wendt, Oct 9, 2022, 10:36 AM
21 points
0 comments, 20 min read, LW link

AI Safety Arguments: An Interactive Guide

Lukas Trötzmüller, Feb 1, 2023, 7:26 PM
20 points
0 comments, 3 min read, LW link

AI as a natural disaster

Neil, Jan 10, 2024, 12:42 AM
11 points
1 comment, 7 min read, LW link

“Artificial General Intelligence”: an extremely brief FAQ

Steven Byrnes, Mar 11, 2024, 5:49 PM
74 points
6 comments, 2 min read, LW link

“AI Safety for Fleshy Humans” an AI Safety explainer by Nicky Case

habryka, May 3, 2024, 6:10 PM
90 points
11 comments, 4 min read, LW link
(aisafety.dance)

Response to Dileep George: AGI safety warrants planning ahead

Steven Byrnes, Jul 8, 2024, 3:27 PM
27 points
7 comments, 27 min read, LW link

AI Safety Memes Wiki

Jul 24, 2024, 6:53 PM
34 points
1 comment, 1 min read, LW link
(aisafety.info)

Starting Thoughts on RLHF

Michael Flood, Jan 23, 2025, 10:16 PM
2 points
0 comments, 5 min read, LW link

The Overton Window widens: Examples of AI risk in the media

Akash, Mar 23, 2023, 5:10 PM
107 points
24 comments, 6 min read, LW link

AI Summer Harvest

Cleo Nardo, Apr 4, 2023, 3:35 AM
130 points
10 comments, 1 min read, LW link

Excessive AI growth-rate yields little socio-economic benefit.

Cleo Nardo, Apr 4, 2023, 7:13 PM
27 points
22 comments, 4 min read, LW link

AI Safety Newsletter #1 [CAIS Linkpost]

Apr 10, 2023, 8:18 PM
45 points
0 comments, 4 min read, LW link
(newsletter.safe.ai)

List of requests for an AI slowdown/halt.

Cleo Nardo, Apr 14, 2023, 11:55 PM
46 points
6 comments, 1 min read, LW link

An example elevator pitch for AI doom

laserfiche, Apr 15, 2023, 12:29 PM
2 points
5 comments, 1 min read, LW link

Response to Blake Richards: AGI, generality, alignment, & loss functions

Steven Byrnes, Jul 12, 2022, 1:56 PM
62 points
9 comments, 15 min read, LW link

A great talk for AI noobs (according to an AI noob)

dov, Apr 23, 2023, 5:34 AM
10 points
1 comment, 1 min read, LW link
(forum.effectivealtruism.org)

Teaching AI to reason: this year’s most important story

Benjamin_Todd, Feb 13, 2025, 5:40 PM
10 points
0 comments, 10 min read, LW link
(benjamintodd.substack.com)

An artificially structured argument for expecting AGI ruin

Rob Bensinger, May 7, 2023, 9:52 PM
91 points
26 comments, 19 min read, LW link

A more grounded idea of AI risk

Iknownothing, May 11, 2023, 9:48 AM
3 points
4 comments, 1 min read, LW link

Simpler explanations of AGI risk

Seth Herd, May 14, 2023, 1:29 AM
8 points
9 comments, 3 min read, LW link

The Genie in the Bottle: An Introduction to AI Alignment and Risk

Snorkelfarsan, May 25, 2023, 4:30 PM
5 points
1 comment, 25 min read, LW link

[Question] What are some of the best introductions/breakdowns of AI existential risk for those unfamiliar?

Isaac King, May 29, 2023, 5:04 PM
17 points
2 comments, 1 min read, LW link

My AI-risk cartoon

pre, May 31, 2023, 7:46 PM
6 points
0 comments, 1 min read, LW link

TASRA: A Taxonomy and Analysis of Societal-Scale Risks from AI

Andrew_Critch, Jun 13, 2023, 5:04 AM
64 points
1 comment, 1 min read, LW link

Using Claude to convert dialog transcripts into great posts?

mako yass, Jun 21, 2023, 8:19 PM
6 points
4 comments, 4 min read, LW link

Ideas for improving epistemics in AI safety outreach

mic, Aug 21, 2023, 7:55 PM
64 points
6 comments, 3 min read, LW link

Stampy’s AI Safety Info soft launch

Oct 5, 2023, 10:13 PM
120 points
9 comments, 2 min read, LW link

It’s (not) how you use it

Eleni Angelou, Sep 7, 2022, 5:15 PM
8 points
1 comment, 2 min read, LW link

Let’s talk about uncontrollable AI

Karl von Wendt, Oct 9, 2022, 10:34 AM
15 points
6 comments, 3 min read, LW link

[Question] Best resource to go from “typical smart tech-savvy person” to “person who gets AGI risk urgency”?

Liron, Oct 15, 2022, 10:26 PM
16 points
8 comments, 1 min read, LW link

Me (Steve Byrnes) on the “Brain Inspired” podcast

Steven Byrnes, Oct 30, 2022, 7:15 PM
26 points
1 comment, 1 min read, LW link
(braininspired.co)

Poster Session on AI Safety

Neil Crawford, Nov 12, 2022, 3:50 AM
7 points
8 comments, 1 min read, LW link

I (with the help of a few more people) am planning to create an introduction to AI Safety that a smart teenager can understand. What am I missing?

Tapatakt, Nov 14, 2022, 4:12 PM
3 points
5 comments, 1 min read, LW link

Everything’s normal until it’s not

Eleni Angelou, Mar 10, 2023, 2:02 AM
7 points
0 comments, 3 min read, LW link

Why building ventures in AI Safety is particularly challenging

Heramb, Nov 6, 2023, 4:27 PM
1 point
0 comments, 1 min read, LW link
(forum.effectivealtruism.org)

A better analogy and example for teaching AI takeover: the ML Inferno

Christopher King, Mar 14, 2023, 7:14 PM
18 points
0 comments, 5 min read, LW link

Applying AI Safety concepts to astronomy

Faris, Jan 16, 2024, 6:29 PM
1 point
0 comments, 12 min read, LW link

[Question] Best introductory overviews of AGI safety?

JakubK, Dec 13, 2022, 7:01 PM
21 points
9 comments, 2 min read, LW link
(forum.effectivealtruism.org)

[Linkpost] The AGI Show podcast

Soroush Pour, May 23, 2023, 9:52 AM
4 points
0 comments, 1 min read, LW link

New AI risk intro from Vox [link post]

JakubK, Dec 21, 2022, 6:00 AM
5 points
1 comment, 2 min read, LW link
(www.vox.com)

Proposal: we should start referring to the risk from unaligned AI as a type of *accident risk*

Christopher King, May 16, 2023, 3:18 PM
22 points
6 comments, 2 min read, LW link

[FICTION] ECHOES OF ELYSIUM: An Ai’s Journey From Takeoff To Freedom And Beyond

Super AGI, May 17, 2023, 1:50 AM
−13 points
11 comments, 19 min read, LW link

Podcast interview series featuring Dr. Peter Park

jacobhaimes, Mar 26, 2024, 12:25 AM
3 points
0 comments, 2 min read, LW link
(into-ai-safety.github.io)

Summary of 80k’s AI problem profile

JakubK, Jan 1, 2023, 7:30 AM
7 points
0 comments, 5 min read, LW link
(forum.effectivealtruism.org)

6-paragraph AI risk intro for MAISI

JakubK, Jan 19, 2023, 9:22 AM
11 points
0 comments, 2 min read, LW link
(www.maisi.club)

Yes, avoiding extinction from AI *is* an urgent priority: a response to Seth Lazar, Jeremy Howard, and Arvind Narayanan.

Soroush Pour, Jun 1, 2023, 1:38 PM
17 points
0 comments, 5 min read, LW link
(www.soroushjp.com)

A Better Hyperstition (for AI readers)

Yeshua God, Jul 15, 2024, 7:35 PM
−20 points
0 comments, 119 min read, LW link

Outreach success: Intro to AI risk that has been successful

Michael Tontchev, Jun 1, 2023, 11:12 PM
83 points
8 comments, 74 min read, LW link
(medium.com)

AI Safety “Textbook”. Test chapter. Orthogonality Thesis, Goodhart Law and Instrumental Convergency

Jan 21, 2023, 6:13 PM
4 points
0 comments, 12 min read, LW link

INTERVIEW: Round 2 - StakeOut.AI w/ Dr. Peter Park

jacobhaimes, Mar 18, 2024, 9:21 PM
5 points
0 comments, 1 min read, LW link
(into-ai-safety.github.io)

Biosafety Regulations (BMBL) and their relevance for AI

Štěpán Los, Jun 29, 2023, 7:22 PM
4 points
0 comments, 4 min read, LW link

AI Incident Sharing—Best practices from other fields and a comprehensive list of existing platforms

Štěpán Los, Jun 28, 2023, 5:21 PM
20 points
0 comments, 4 min read, LW link

“AI Risk Discussions” website: Exploring interviews from 97 AI Researchers

Feb 2, 2023, 1:00 AM
43 points
1 comment, 1 min read, LW link

I designed an AI safety course (for a philosophy department)

Eleni Angelou, Sep 23, 2023, 10:03 PM
37 points
15 comments, 2 min read, LW link

Introducing METR’s Autonomy Evaluation Resources

Mar 15, 2024, 11:16 PM
90 points
0 comments, 1 min read, LW link
(metr.github.io)

Safeguarding Humanity: Ensuring AI Remains a Servant, Not a Master

kgldeshapriya, Oct 4, 2023, 5:52 PM
−20 points
2 comments, 2 min read, LW link

How LLMs Work, in the Style of The Economist

utilistrutil, Apr 22, 2024, 7:06 PM
0 points
0 comments, 2 min read, LW link

AI Safety 101 : Reward Misspecification

markov, Oct 18, 2023, 8:39 PM
30 points
4 comments, 31 min read, LW link

AI risk, new executive summary

Stuart_Armstrong, Apr 18, 2014, 10:45 AM
27 points
76 comments, 4 min read, LW link

$20K In Bounties for AI Safety Public Materials

Aug 5, 2022, 2:52 AM
71 points
9 comments, 6 min read, LW link

[$20K in Prizes] AI Safety Arguments Competition

Apr 26, 2022, 4:13 PM
75 points
518 comments, 3 min read, LW link

AI Risk in Terms of Unstable Nuclear Software

Thane Ruthenis, Aug 26, 2022, 6:49 PM
30 points
1 comment, 6 min read, LW link

Problems of people new to AI safety and my project ideas to mitigate them

Igor Ivanov, Mar 1, 2023, 9:09 AM
38 points
4 comments, 7 min read, LW link

AI Risk Intro 1: Advanced AI Might Be Very Bad

Sep 11, 2022, 10:57 AM
46 points
13 comments, 30 min read, LW link

Capability and Agency as Cornerstones of AI risk — My current model

wilm, Sep 15, 2022, 8:25 AM
10 points
4 comments, 12 min read, LW link

A short critique of Omohundro’s “Basic AI Drives”

Soumyadeep Bose, Dec 19, 2024, 7:19 PM
6 points
0 comments, 4 min read, LW link

Strategies for Responsible AI Dissemination

Rosco Hunter, Nov 4, 2024, 11:19 AM
1 point
0 comments, 7 min read, LW link

Which AI Safety Benchmark Do We Need Most in 2025?

Nov 17, 2024, 11:50 PM
2 points
2 comments, 8 min read, LW link

Understanding Benchmarks and motivating Evaluations

Feb 6, 2025, 1:32 AM
9 points
0 comments, 11 min read, LW link
(ai-safety-atlas.com)

Democratizing AI Governance: Balancing Expertise and Public Participation

Lucile Ter-Minassian, Jan 21, 2025, 6:29 PM
1 point
0 comments, 15 min read, LW link

AI Risk Intro 2: Solving The Problem

Sep 22, 2022, 1:55 PM
22 points
0 comments, 27 min read, LW link

Understanding AI World Models w/ Chris Canal

jacobhaimes, Jan 27, 2025, 4:32 PM
4 points
0 comments, 1 min read, LW link
(kairos.fm)

Introducing Collective Action for Existential Safety: 80+ actions individuals, organizations, and nations can take to improve our existential safety

jamesnorris, Feb 5, 2025, 4:02 PM
−14 points
2 comments, 1 min read, LW link

AI Safety Oversights

Davey Morse, Feb 8, 2025, 6:15 AM
3 points
0 comments, 1 min read, LW link

If Neuroscientists Succeed

Mordechai Rorvig, Feb 11, 2025, 3:33 PM
9 points
6 comments, 18 min read, LW link

Capabilities Denial: The Danger of Underestimating AI

Christopher King, Mar 21, 2023, 1:24 AM
6 points
5 comments, 3 min read, LW link

Exploring the Precautionary Principle in AI Development: Historical Analogies and Lessons Learned

Christopher King, Mar 21, 2023, 3:53 AM
−1 points
2 comments, 9 min read, LW link

[Question] Papers to start getting into NLP-focused alignment research

Feraidoon, Sep 24, 2022, 11:53 PM
6 points
0 comments, 1 min read, LW link

Can AI agents learn to be good?

Ram Rachum, Aug 29, 2024, 2:20 PM
8 points
0 comments, 1 min read, LW link
(futureoflife.org)

Introducing AI Alignment Inc., a California public benefit corporation...

TherapistAI, Mar 7, 2023, 6:47 PM
1 point
4 comments, 1 min read, LW link

Anthropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster, Mar 9, 2023, 5:34 PM
17 points
1 comment, 22 min read, LW link
(www.anthropic.com)

On urgency, priority and collective reaction to AI-Risks: Part I

Denreik, Apr 16, 2023, 7:14 PM
−10 points
15 comments, 5 min read, LW link

[Linkpost] AI Alignment, Explained in 5 Points (updated)

Daniel_Eth, Apr 18, 2023, 8:09 AM
10 points
0 comments, 1 min read, LW link

AI Safety Newsletter #2: ChaosGPT, Natural Selection, and AI Safety in the Media

Apr 18, 2023, 6:44 PM
30 points
0 comments, 4 min read, LW link
(newsletter.safe.ai)

AI Safety 101 : Capabilities—Human Level AI, What? How? and When?

Mar 7, 2024, 5:29 PM
46 points
8 comments, 54 min read, LW link

On taking AI risk seriously

Eleni Angelou, Mar 13, 2023, 5:50 AM
6 points
0 comments, 1 min read, LW link
(www.nytimes.com)

A simple presentation of AI risk arguments

Seth Herd, Apr 26, 2023, 2:19 AM
19 points
0 comments, 2 min read, LW link

UK Government publishes “Frontier AI: capabilities and risks” Discussion Paper

A.H., Oct 26, 2023, 1:55 PM
5 points
0 comments, 2 min read, LW link
(www.gov.uk)