
AI Alignment Intro Materials

Last edit: Dec 30, 2024, 10:34 AM by Dakara

AI Alignment Intro Materials are posts that help someone get oriented and skill up. They are distinct from AI Public Materials in that they are more “inward facing” than “outward facing”: they are aimed at people who are already convinced that AI risk is a problem and want to upskill.

Some basic intro resources include:

The Alignment Problem from a Deep Learning Perspective (major rewrite)

Jan 10, 2023, 4:06 PM
84 points
8 comments, 39 min read, LW link
(arxiv.org)

Superintelligence FAQ

Scott Alexander, Sep 20, 2016, 7:00 PM
137 points
39 comments, 27 min read, LW link

“Corrigibility at some small length” by dath ilan

Christopher King, Apr 5, 2023, 1:47 AM
32 points
3 comments, 9 min read, LW link
(www.glowfic.com)

A newcomer’s guide to the technical AI safety field

zeshen, Nov 4, 2022, 2:29 PM
42 points
3 comments, 10 min read, LW link

AI Control: Improving Safety Despite Intentional Subversion

Dec 13, 2023, 3:51 PM
236 points
24 comments, 10 min read, LW link, 4 reviews

Alignment Org Cheat Sheet

Sep 20, 2022, 5:36 PM
70 points
8 comments, 4 min read, LW link

How to pursue a career in technical AI alignment

Charlie Rogers-Smith, Jun 4, 2022, 9:11 PM
69 points
1 comment, 39 min read, LW link

UC Berkeley course on LLMs and ML Safety

Dan H, Jul 9, 2024, 3:40 PM
36 points
1 comment, 1 min read, LW link
(rdi.berkeley.edu)

A starter guide for evals

Jan 8, 2024, 6:24 PM
53 points
2 comments, 12 min read, LW link
(www.apolloresearch.ai)

A short course on AGI safety from the GDM Alignment team

Feb 14, 2025, 3:43 PM
99 points
1 comment, 1 min read, LW link
(deepmindsafetyresearch.medium.com)

Talk: AI safety fieldbuilding at MATS

Ryan Kidd, Jun 23, 2024, 11:06 PM
26 points
2 comments, 10 min read, LW link

[Question] Where to begin in ML/AI?

Jake the Student, Apr 6, 2023, 8:45 PM
9 points
4 comments, 1 min read, LW link

Transcript of a presentation on catastrophic risks from AI

RobertM, May 5, 2023, 1:38 AM
6 points
0 comments, 8 min read, LW link

Wikipedia as an introduction to the alignment problem

SoerenMind, May 29, 2023, 6:43 PM
83 points
10 comments, 1 min read, LW link
(en.wikipedia.org)

Outreach success: Intro to AI risk that has been successful

Michael Tontchev, Jun 1, 2023, 11:12 PM
83 points
8 comments, 74 min read, LW link
(medium.com)

Advice for Entering AI Safety Research

scasper, Jun 2, 2023, 8:46 PM
26 points
2 comments, 5 min read, LW link

Introducción al Riesgo Existencial de Inteligencia Artificial (Introduction to Existential Risk from Artificial Intelligence)

david.friva, Jul 15, 2023, 8:37 PM
4 points
2 comments, 4 min read, LW link
(youtu.be)

12 career-related questions that may (or may not) be helpful for people interested in alignment research

Orpheus16, Dec 12, 2022, 10:36 PM
20 points
0 comments, 2 min read, LW link

My first year in AI alignment

Alex_Altair, Jan 2, 2023, 1:28 AM
61 points
10 comments, 7 min read, LW link

List of links for getting into AI safety

zef, Jan 4, 2023, 7:45 PM
6 points
0 comments, 1 min read, LW link

5-day Intro to Transformative AI course

Li-Lian Ang, Dec 9, 2024, 7:15 AM
2 points
0 comments, 1 min read, LW link

[Linkpost] AI Alignment, Explained in 5 Points (updated)

Daniel_Eth, Apr 18, 2023, 8:09 AM
10 points
0 comments, 1 min read, LW link

Into AI Safety Episodes 1 & 2

jacobhaimes, Nov 9, 2023, 4:36 AM
2 points
0 comments, 1 min read, LW link
(into-ai-safety.github.io)

Levelling Up in AI Safety Research Engineering

Gabe M, Sep 2, 2022, 4:59 AM
58 points
9 comments, 17 min read, LW link

The Genie in the Bottle: An Introduction to AI Alignment and Risk

Snorkelfarsan, May 25, 2023, 4:30 PM
5 points
1 comment, 25 min read, LW link

AGI doesn’t need understanding, intention, or consciousness in order to kill us, only intelligence

James Blaha, Feb 20, 2023, 12:55 AM
10 points
2 comments, 18 min read, LW link

Interview: Applications w/ Alice Rigg

jacobhaimes, Dec 19, 2023, 7:03 PM
12 points
0 comments, 1 min read, LW link
(into-ai-safety.github.io)

AI Safety Fundamentals: An Informal Cohort Starting Soon!

Tiago de Vassal, Jun 4, 2023, 5:15 PM
4 points
0 comments, 1 min read, LW link

An Exercise to Build Intuitions on AGI Risk

Lauro Langosco, Jun 7, 2023, 6:35 PM
52 points
3 comments, 8 min read, LW link

Podcast interview series featuring Dr. Peter Park

jacobhaimes, Mar 26, 2024, 12:25 AM
3 points
0 comments, 2 min read, LW link
(into-ai-safety.github.io)

Hackathon and Staying Up-to-Date in AI

jacobhaimes, Jan 8, 2024, 5:10 PM
11 points
0 comments, 1 min read, LW link
(into-ai-safety.github.io)

AIS 101: Task decomposition for scalable oversight

Charbel-Raphaël, Jul 25, 2023, 1:34 PM
27 points
0 comments, 19 min read, LW link
(docs.google.com)

Apply to a small iteration of MLAB to be run in Oxford

Aug 27, 2023, 2:21 PM
12 points
0 comments, 1 min read, LW link

Documenting Journey Into AI Safety

jacobhaimes, Oct 10, 2023, 6:30 PM
17 points
4 comments, 6 min read, LW link

Into AI Safety—Episode 0

jacobhaimes, Oct 22, 2023, 3:30 AM
5 points
1 comment, 1 min read, LW link
(into-ai-safety.github.io)

So you want to work on technical AI safety

gw, Jun 24, 2024, 2:29 PM
51 points
3 comments, 14 min read, LW link

Into AI Safety: Episode 3

jacobhaimes, Dec 11, 2023, 4:30 PM
6 points
0 comments, 1 min read, LW link
(into-ai-safety.github.io)

AI Alignment and the Quest for Artificial Wisdom

Myspy, Jul 12, 2024, 9:34 PM
1 point
0 comments, 13 min read, LW link

[Question] Doing Nothing Utility Function

k64, Sep 26, 2024, 10:05 PM
9 points
9 comments, 1 min read, LW link

Shallow review of technical AI safety, 2024

Dec 29, 2024, 12:01 PM
185 points
34 comments, 41 min read, LW link

The Hidden Cost of Our Lies to AI

Nicholas Andresen, Mar 6, 2025, 5:03 AM
138 points
17 comments, 7 min read, LW link
(substack.com)

Understanding AI World Models w/ Chris Canal

jacobhaimes, Jan 27, 2025, 4:32 PM
4 points
0 comments, 1 min read, LW link
(kairos.fm)

[Question] Best resources to learn philosophy of mind and AI?

Sky Moo, Mar 27, 2023, 6:22 PM
1 point
0 comments, 1 min read, LW link