AI Alignment Intro Materials

Tag. Last edit: Dec 30, 2024, 10:34 AM by Dakara

AI Alignment Intro Materials are posts that help someone get oriented and skill up. They differ from AI Public Materials in being more “inward facing” than “outward facing”, i.e. they are for people who are already convinced that AI risk is a problem and want to upskill.

Some basic intro resources include:

Stampy’s AI Safety Info (extensive interactive FAQ)
Scott Alexander’s Superintelligence FAQ
The MIRI Intelligence Explosion FAQ
The AGI Safety Fundamentals courses
Superintelligence (book)

The Alignment Problem from a Deep Learning Perspective (major rewrite)

SoerenMind, Richard_Ngo and LawrenceC

Jan 10, 2023, 4:06 PM

84 points

8 comments, 39 min read

(arxiv.org)

Superintelligence FAQ

Scott Alexander

Sep 20, 2016, 7:00 PM

137 points

39 comments, 27 min read

“Corrigibility at some small length” by dath ilan

Christopher King

Apr 5, 2023, 1:47 AM

32 points

3 comments, 9 min read

(www.glowfic.com)

A newcomer’s guide to the technical AI safety field

zeshen

Nov 4, 2022, 2:29 PM

42 points

3 comments, 10 min read

AI Control: Improving Safety Despite Intentional Subversion

Buck, Fabien Roger, ryan_greenblatt and Kshitij Sachan

Dec 13, 2023, 3:51 PM

236 points

24 comments, 10 min read, 4 reviews

Alignment Org Cheat Sheet

Orpheus16 and Thomas Larsen

Sep 20, 2022, 5:36 PM

70 points

8 comments, 4 min read

How to pursue a career in technical AI alignment

Charlie Rogers-Smith

Jun 4, 2022, 9:11 PM

69 points

1 comment, 39 min read

UC Berkeley course on LLMs and ML Safety

Dan H

Jul 9, 2024, 3:40 PM

36 points

1 comment, 1 min read

(rdi.berkeley.edu)

A starter guide for evals

Marius Hobbhahn, Jérémy Scheurer, Mikita Balesni, rusheb and AlexMeinke

Jan 8, 2024, 6:24 PM

53 points

2 comments, 12 min read

(www.apolloresearch.ai)

A short course on AGI safety from the GDM Alignment team

Vika and Rohin Shah

Feb 14, 2025, 3:43 PM

99 points

1 comment, 1 min read

(deepmindsafetyresearch.medium.com)

Talk: AI safety fieldbuilding at MATS

Ryan Kidd

Jun 23, 2024, 11:06 PM

26 points

2 comments, 10 min read

[Question] Where to begin in ML/AI?

Jake the Student

Apr 6, 2023, 8:45 PM

9 points

4 comments, 1 min read

Transcript of a presentation on catastrophic risks from AI

RobertM

May 5, 2023, 1:38 AM

6 points

0 comments, 8 min read

Wikipedia as an introduction to the alignment problem

SoerenMind

May 29, 2023, 6:43 PM

83 points

10 comments, 1 min read

(en.wikipedia.org)

Outreach success: Intro to AI risk that has been successful

Michael Tontchev

Jun 1, 2023, 11:12 PM

83 points

8 comments, 74 min read

(medium.com)

Advice for Entering AI Safety Research

scasper

Jun 2, 2023, 8:46 PM

26 points

2 comments, 5 min read

Introducción al Riesgo Existencial de Inteligencia Artificial (Introduction to Existential Risk from Artificial Intelligence)

david.friva

Jul 15, 2023, 8:37 PM

4 points

2 comments, 4 min read

(youtu.be)

12 career-related questions that may (or may not) be helpful for people interested in alignment research

Orpheus16

Dec 12, 2022, 10:36 PM

20 points

0 comments, 2 min read

My first year in AI alignment

Alex_Altair

Jan 2, 2023, 1:28 AM

61 points

10 comments, 7 min read

List of links for getting into AI safety

zef

Jan 4, 2023, 7:45 PM

6 points

0 comments, 1 min read

5-day Intro to Transformative AI course

Li-Lian Ang

Dec 9, 2024, 7:15 AM

2 points

0 comments, 1 min read

[Linkpost] AI Alignment, Explained in 5 Points (updated)

Daniel_Eth

Apr 18, 2023, 8:09 AM

10 points

0 comments, 1 min read

Into AI Safety Episodes 1 & 2

jacobhaimes

Nov 9, 2023, 4:36 AM

2 points

0 comments, 1 min read

(into-ai-safety.github.io)

Levelling Up in AI Safety Research Engineering

Gabe M

Sep 2, 2022, 4:59 AM

58 points

9 comments, 17 min read

The Genie in the Bottle: An Introduction to AI Alignment and Risk

Snorkelfarsan

May 25, 2023, 4:30 PM

5 points

1 comment, 25 min read

AGI doesn’t need understanding, intention, or consciousness in order to kill us, only intelligence

James Blaha

Feb 20, 2023, 12:55 AM

10 points

2 comments, 18 min read

Interview: Applications w/ Alice Rigg

jacobhaimes

Dec 19, 2023, 7:03 PM

12 points

0 comments, 1 min read

(into-ai-safety.github.io)

AI Safety Fundamentals: An Informal Cohort Starting Soon!

Tiago de Vassal

Jun 4, 2023, 5:15 PM

4 points

0 comments, 1 min read

An Exercise to Build Intuitions on AGI Risk

Lauro Langosco

Jun 7, 2023, 6:35 PM

52 points

3 comments, 8 min read

Podcast interview series featuring Dr. Peter Park

jacobhaimes

Mar 26, 2024, 12:25 AM

3 points

0 comments, 2 min read

(into-ai-safety.github.io)

Hackathon and Staying Up-to-Date in AI

jacobhaimes

Jan 8, 2024, 5:10 PM

11 points

0 comments, 1 min read

(into-ai-safety.github.io)

AIS 101: Task decomposition for scalable oversight

Charbel-Raphaël

Jul 25, 2023, 1:34 PM

27 points

0 comments, 19 min read

(docs.google.com)

Apply to a small iteration of MLAB to be run in Oxford

RP, MariaK and OliverHayman

Aug 27, 2023, 2:21 PM

12 points

0 comments, 1 min read

Documenting Journey Into AI Safety

jacobhaimes

Oct 10, 2023, 6:30 PM

17 points

4 comments, 6 min read

Into AI Safety—Episode 0

jacobhaimes

Oct 22, 2023, 3:30 AM

5 points

1 comment, 1 min read

(into-ai-safety.github.io)

So you want to work on technical AI safety

gw

Jun 24, 2024, 2:29 PM

51 points

3 comments, 14 min read

Into AI Safety: Episode 3

jacobhaimes

Dec 11, 2023, 4:30 PM

6 points

0 comments, 1 min read

(into-ai-safety.github.io)

AI Alignment and the Quest for Artificial Wisdom

Myspy

Jul 12, 2024, 9:34 PM

1 point

0 comments, 13 min read

[Question] Doing Nothing Utility Function

k64

Sep 26, 2024, 10:05 PM

9 points

9 comments, 1 min read

Shallow review of technical AI safety, 2024

technicalities, Stag, Stephen McAleese, jordine and Dr. David Mathers

Dec 29, 2024, 12:01 PM

185 points

34 comments, 41 min read

The Hidden Cost of Our Lies to AI

Nicholas Andresen

Mar 6, 2025, 5:03 AM

138 points

17 comments, 7 min read

(substack.com)

Understanding AI World Models w/ Chris Canal

jacobhaimes

Jan 27, 2025, 4:32 PM

4 points

0 comments, 1 min read

(kairos.fm)

[Question] Best resources to learn philosophy of mind and AI?

Sky Moo

Mar 27, 2023, 6:22 PM

1 point

0 comments, 1 min read
