
AI Safety Public Materials

Last edit: 27 Aug 2022 18:39 UTC by Multicore

AI Safety Public Materials are posts optimized for conveying information about AI risk to audiences outside the AI alignment community — be they ML specialists, policy-makers, or the general public.

AGI safety from first principles: Introduction

Richard_Ngo · 28 Sep 2020 19:53 UTC
128 points
18 comments · 2 min read · LW link · 1 review

a casual intro to AI doom and alignment

Tamsin Leake · 1 Nov 2022 16:38 UTC
18 points
0 comments · 4 min read · LW link
(carado.moe)

Slow motion videos as AI risk intuition pumps

Andrew_Critch · 14 Jun 2022 19:31 UTC
238 points
41 comments · 2 min read · LW link · 1 review

DL towards the unaligned Recursive Self-Optimization attractor

jacob_cannell · 18 Dec 2021 2:15 UTC
32 points
22 comments · 4 min read · LW link

A transcript of the TED talk by Eliezer Yudkowsky

Mikhail Samin · 12 Jul 2023 12:12 UTC
105 points
13 comments · 4 min read · LW link

Mati’s introduction to pausing giant AI experiments

Mati_Roy · 3 Apr 2023 15:56 UTC
7 points
0 comments · 2 min read · LW link

When discussing AI risks, talk about capabilities, not intelligence

Vika · 11 Aug 2023 13:38 UTC
116 points
7 comments · 3 min read · LW link
(vkrakovna.wordpress.com)

An AI risk argument that resonates with NYTimes readers

Julian Bradshaw · 12 Mar 2023 23:09 UTC
209 points
14 comments · 1 min read · LW link

The Importance of AI Alignment, explained in 5 points

Daniel_Eth · 11 Feb 2023 2:56 UTC
33 points
2 comments · 1 min read · LW link

AISafety.info “How can I help?” FAQ

5 Jun 2023 22:09 UTC
59 points
0 comments · 2 min read · LW link

Distribution Shifts and The Importance of AI Safety

Leon Lang · 29 Sep 2022 22:38 UTC
17 points
2 comments · 12 min read · LW link

Uncontrollable AI as an Existential Risk

Karl von Wendt · 9 Oct 2022 10:36 UTC
21 points
0 comments · 20 min read · LW link

AI Safety Arguments: An Interactive Guide

Lukas Trötzmüller · 1 Feb 2023 19:26 UTC
20 points
0 comments · 3 min read · LW link

Let’s talk about uncontrollable AI

Karl von Wendt · 9 Oct 2022 10:34 UTC
15 points
6 comments · 3 min read · LW link

“Artificial General Intelligence”: an extremely brief FAQ

Steven Byrnes · 11 Mar 2024 17:49 UTC
70 points
6 comments · 2 min read · LW link

“AI Safety for Fleshy Humans” an AI Safety explainer by Nicky Case

habryka · 3 May 2024 18:10 UTC
84 points
10 comments · 4 min read · LW link
(aisafety.dance)

Response to Dileep George: AGI safety warrants planning ahead

Steven Byrnes · 8 Jul 2024 15:27 UTC
27 points
7 comments · 27 min read · LW link

AI Safety Memes Wiki

24 Jul 2024 18:53 UTC
33 points
1 comment · 1 min read · LW link
(aisafety.info)

The Overton Window widens: Examples of AI risk in the media

Akash · 23 Mar 2023 17:10 UTC
107 points
24 comments · 6 min read · LW link

AI Summer Harvest

Cleo Nardo · 4 Apr 2023 3:35 UTC
130 points
10 comments · 1 min read · LW link

Excessive AI growth-rate yields little socio-economic benefit.

Cleo Nardo · 4 Apr 2023 19:13 UTC
27 points
22 comments · 4 min read · LW link

AI Safety Newsletter #1 [CAIS Linkpost]

10 Apr 2023 20:18 UTC
45 points
0 comments · 4 min read · LW link
(newsletter.safe.ai)

List of requests for an AI slowdown/halt.

Cleo Nardo · 14 Apr 2023 23:55 UTC
46 points
6 comments · 1 min read · LW link

An example elevator pitch for AI doom

laserfiche · 15 Apr 2023 12:29 UTC
2 points
5 comments · 1 min read · LW link

Response to Blake Richards: AGI, generality, alignment, & loss functions

Steven Byrnes · 12 Jul 2022 13:56 UTC
62 points
9 comments · 15 min read · LW link

A great talk for AI noobs (according to an AI noob)

dov · 23 Apr 2023 5:34 UTC
10 points
1 comment · 1 min read · LW link
(forum.effectivealtruism.org)

An artificially structured argument for expecting AGI ruin

Rob Bensinger · 7 May 2023 21:52 UTC
91 points
26 comments · 19 min read · LW link

A more grounded idea of AI risk

Iknownothing · 11 May 2023 9:48 UTC
3 points
4 comments · 1 min read · LW link

Simpler explanations of AGI risk

Seth Herd · 14 May 2023 1:29 UTC
8 points
9 comments · 3 min read · LW link

The Genie in the Bottle: An Introduction to AI Alignment and Risk

Snorkelfarsan · 25 May 2023 16:30 UTC
5 points
1 comment · 25 min read · LW link

[Question] What are some of the best introductions/breakdowns of AI existential risk for those unfamiliar?

Isaac King · 29 May 2023 17:04 UTC
17 points
2 comments · 1 min read · LW link

My AI-risk cartoon

pre · 31 May 2023 19:46 UTC
6 points
0 comments · 1 min read · LW link

TASRA: A Taxonomy and Analysis of Societal-Scale Risks from AI

Andrew_Critch · 13 Jun 2023 5:04 UTC
64 points
1 comment · 1 min read · LW link

Using Claude to convert dialog transcripts into great posts?

mako yass · 21 Jun 2023 20:19 UTC
6 points
4 comments · 4 min read · LW link

Ideas for improving epistemics in AI safety outreach

mic · 21 Aug 2023 19:55 UTC
64 points
6 comments · 3 min read · LW link

Stampy’s AI Safety Info soft launch

5 Oct 2023 22:13 UTC
120 points
9 comments · 2 min read · LW link

It’s (not) how you use it

Eleni Angelou · 7 Sep 2022 17:15 UTC
8 points
1 comment · 2 min read · LW link

AI as a natural disaster

Neil · 10 Jan 2024 0:42 UTC
11 points
1 comment · 7 min read · LW link

[Question] Best resource to go from “typical smart tech-savvy person” to “person who gets AGI risk urgency”?

Liron · 15 Oct 2022 22:26 UTC
16 points
8 comments · 1 min read · LW link

Me (Steve Byrnes) on the “Brain Inspired” podcast

Steven Byrnes · 30 Oct 2022 19:15 UTC
26 points
1 comment · 1 min read · LW link
(braininspired.co)

Poster Session on AI Safety

Neil Crawford · 12 Nov 2022 3:50 UTC
7 points
6 comments · 1 min read · LW link

I (with the help of a few more people) am planning to create an introduction to AI Safety that a smart teenager can understand. What am I missing?

Tapatakt · 14 Nov 2022 16:12 UTC
3 points
5 comments · 1 min read · LW link

Everything’s normal until it’s not

Eleni Angelou · 10 Mar 2023 2:02 UTC
7 points
0 comments · 3 min read · LW link

Summary of 80k’s AI problem profile

JakubK · 1 Jan 2023 7:30 UTC
7 points
0 comments · 5 min read · LW link
(forum.effectivealtruism.org)

6-paragraph AI risk intro for MAISI

JakubK · 19 Jan 2023 9:22 UTC
11 points
0 comments · 2 min read · LW link
(www.maisi.club)

Yes, avoiding extinction from AI *is* an urgent priority: a response to Seth Lazar, Jeremy Howard, and Arvind Narayanan.

Soroush Pour · 1 Jun 2023 13:38 UTC
17 points
0 comments · 5 min read · LW link
(www.soroushjp.com)

Strategies for Responsible AI Dissemination

Rosco Hunter · 4 Nov 2024 11:19 UTC
1 point
0 comments · 7 min read · LW link

Outreach success: Intro to AI risk that has been successful

Michael Tontchev · 1 Jun 2023 23:12 UTC
83 points
8 comments · 74 min read · LW link
(medium.com)

AI Safety “Textbook”. Test chapter. Orthogonality Thesis, Goodhart Law and Instrumental Convergency

21 Jan 2023 18:13 UTC
4 points
0 comments · 12 min read · LW link

INTERVIEW: Round 2 - StakeOut.AI w/ Dr. Peter Park

jacobhaimes · 18 Mar 2024 21:21 UTC
5 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Biosafety Regulations (BMBL) and their relevance for AI

Štěpán Los · 29 Jun 2023 19:22 UTC
4 points
0 comments · 4 min read · LW link

AI Incident Sharing—Best practices from other fields and a comprehensive list of existing platforms

Štěpán Los · 28 Jun 2023 17:21 UTC
20 points
0 comments · 4 min read · LW link

A Better Hyperstition (for AI readers)

Yeshua God · 15 Jul 2024 19:35 UTC
−20 points
0 comments · 119 min read · LW link

“AI Risk Discussions” website: Exploring interviews from 97 AI Researchers

2 Feb 2023 1:00 UTC
43 points
1 comment · 1 min read · LW link

I designed an AI safety course (for a philosophy department)

Eleni Angelou · 23 Sep 2023 22:03 UTC
37 points
15 comments · 2 min read · LW link

Introducing METR’s Autonomy Evaluation Resources

15 Mar 2024 23:16 UTC
90 points
0 comments · 1 min read · LW link
(metr.github.io)

Safeguarding Humanity: Ensuring AI Remains a Servant, Not a Master

kgldeshapriya · 4 Oct 2023 17:52 UTC
−20 points
2 comments · 2 min read · LW link

AI Safety 101: Reward Misspecification

markov · 18 Oct 2023 20:39 UTC
30 points
4 comments · 31 min read · LW link

AI risk, new executive summary

Stuart_Armstrong · 18 Apr 2014 10:45 UTC
27 points
76 comments · 4 min read · LW link

How LLMs Work, in the Style of The Economist

utilistrutil · 22 Apr 2024 19:06 UTC
0 points
0 comments · 2 min read · LW link

$20K In Bounties for AI Safety Public Materials

5 Aug 2022 2:52 UTC
71 points
9 comments · 6 min read · LW link

[$20K in Prizes] AI Safety Arguments Competition

26 Apr 2022 16:13 UTC
75 points
518 comments · 3 min read · LW link

AI Risk in Terms of Unstable Nuclear Software

Thane Ruthenis · 26 Aug 2022 18:49 UTC
30 points
1 comment · 6 min read · LW link

Problems of people new to AI safety and my project ideas to mitigate them

Igor Ivanov · 1 Mar 2023 9:09 UTC
38 points
4 comments · 7 min read · LW link

AI Risk Intro 1: Advanced AI Might Be Very Bad

11 Sep 2022 10:57 UTC
46 points
13 comments · 30 min read · LW link

Capability and Agency as Cornerstones of AI risk — My current model

wilm · 15 Sep 2022 8:25 UTC
10 points
4 comments · 12 min read · LW link

AI Risk Intro 2: Solving The Problem

22 Sep 2022 13:55 UTC
22 points
0 comments · 27 min read · LW link

[Question] Papers to start getting into NLP-focused alignment research

Feraidoon · 24 Sep 2022 23:53 UTC
6 points
0 comments · 1 min read · LW link

Which AI Safety Benchmark Do We Need Most in 2025?

17 Nov 2024 23:50 UTC
2 points
1 comment · 6 min read · LW link

Capabilities Denial: The Danger of Underestimating AI

Christopher King · 21 Mar 2023 1:24 UTC
6 points
5 comments · 3 min read · LW link

Exploring the Precautionary Principle in AI Development: Historical Analogies and Lessons Learned

Christopher King · 21 Mar 2023 3:53 UTC
−1 points
2 comments · 9 min read · LW link

Can AI agents learn to be good?

Ram Rachum · 29 Aug 2024 14:20 UTC
8 points
0 comments · 1 min read · LW link
(futureoflife.org)

Introducing AI Alignment Inc., a California public benefit corporation...

TherapistAI · 7 Mar 2023 18:47 UTC
1 point
4 comments · 1 min read · LW link

Anthropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster · 9 Mar 2023 17:34 UTC
17 points
1 comment · 22 min read · LW link
(www.anthropic.com)

AI Safety 101: Capabilities—Human Level AI, What? How? and When?

7 Mar 2024 17:29 UTC
46 points
8 comments · 54 min read · LW link

On urgency, priority and collective reaction to AI-Risks: Part I

Denreik · 16 Apr 2023 19:14 UTC
−10 points
15 comments · 5 min read · LW link

[Linkpost] AI Alignment, Explained in 5 Points (updated)

Daniel_Eth · 18 Apr 2023 8:09 UTC
10 points
0 comments · 1 min read · LW link

AI Safety Newsletter #2: ChaosGPT, Natural Selection, and AI Safety in the Media

18 Apr 2023 18:44 UTC
30 points
0 comments · 4 min read · LW link
(newsletter.safe.ai)

On taking AI risk seriously

Eleni Angelou · 13 Mar 2023 5:50 UTC
6 points
0 comments · 1 min read · LW link
(www.nytimes.com)

A better analogy and example for teaching AI takeover: the ML Inferno

Christopher King · 14 Mar 2023 19:14 UTC
18 points
0 comments · 5 min read · LW link

A simple presentation of AI risk arguments

Seth Herd · 26 Apr 2023 2:19 UTC
16 points
0 comments · 2 min read · LW link

UK Government publishes “Frontier AI: capabilities and risks” Discussion Paper

A.H. · 26 Oct 2023 13:55 UTC
5 points
0 comments · 2 min read · LW link
(www.gov.uk)

Why building ventures in AI Safety is particularly challenging

Heramb · 6 Nov 2023 16:27 UTC
1 point
0 comments · 1 min read · LW link
(forum.effectivealtruism.org)

Applying AI Safety concepts to astronomy

Faris · 16 Jan 2024 18:29 UTC
1 point
0 comments · 12 min read · LW link

[Question] Best introductory overviews of AGI safety?

JakubK · 13 Dec 2022 19:01 UTC
21 points
9 comments · 2 min read · LW link
(forum.effectivealtruism.org)

[Linkpost] The AGI Show podcast

Soroush Pour · 23 May 2023 9:52 UTC
4 points
0 comments · 1 min read · LW link

New AI risk intro from Vox [link post]

JakubK · 21 Dec 2022 6:00 UTC
5 points
1 comment · 2 min read · LW link
(www.vox.com)

Proposal: we should start referring to the risk from unaligned AI as a type of *accident risk*

Christopher King · 16 May 2023 15:18 UTC
22 points
6 comments · 2 min read · LW link

[FICTION] ECHOES OF ELYSIUM: An Ai’s Journey From Takeoff To Freedom And Beyond

Super AGI · 17 May 2023 1:50 UTC
−13 points
11 comments · 19 min read · LW link

Podcast interview series featuring Dr. Peter Park

jacobhaimes · 26 Mar 2024 0:25 UTC
3 points
0 comments · 2 min read · LW link
(into-ai-safety.github.io)