
AI Alignment Intro Materials

Tag · Last edit: 5 Nov 2023 0:17 UTC by plex

AI Alignment Intro Materials are posts that help someone get oriented and skill up. They differ from AI Public Materials in being more “inward facing” than “outward facing”, i.e. they are for people who are already convinced that AI risk is a problem and want to upskill.

Some basic intro resources include:

The Alignment Problem from a Deep Learning Perspective (major rewrite)

10 Jan 2023 16:06 UTC
84 points
8 comments · 39 min read · LW link
(arxiv.org)

Superintelligence FAQ

Scott Alexander · 20 Sep 2016 19:00 UTC
133 points
38 comments · 27 min read · LW link

“Corrigibility at some small length” by dath ilan

Christopher King · 5 Apr 2023 1:47 UTC
32 points
3 comments · 9 min read · LW link
(www.glowfic.com)

A newcomer’s guide to the technical AI safety field

zeshen · 4 Nov 2022 14:29 UTC
42 points
3 comments · 10 min read · LW link

Alignment Org Cheat Sheet

20 Sep 2022 17:36 UTC
70 points
8 comments · 4 min read · LW link

How to pursue a career in technical AI alignment

Charlie Rogers-Smith · 4 Jun 2022 21:11 UTC
69 points
1 comment · 39 min read · LW link

UC Berkeley course on LLMs and ML Safety

Dan H · 9 Jul 2024 15:40 UTC
36 points
1 comment · 1 min read · LW link
(rdi.berkeley.edu)

Wikipedia as an introduction to the alignment problem

SoerenMind · 29 May 2023 18:43 UTC
83 points
10 comments · 1 min read · LW link
(en.wikipedia.org)

Talk: AI safety fieldbuilding at MATS

Ryan Kidd · 23 Jun 2024 23:06 UTC
26 points
2 comments · 10 min read · LW link

A starter guide for evals

8 Jan 2024 18:24 UTC
50 points
2 comments · 12 min read · LW link
(www.apolloresearch.ai)

Outreach success: Intro to AI risk that has been successful

Michael Tontchev · 1 Jun 2023 23:12 UTC
83 points
8 comments · 74 min read · LW link
(medium.com)

12 career-related questions that may (or may not) be helpful for people interested in alignment research

Akash · 12 Dec 2022 22:36 UTC
20 points
0 comments · 2 min read · LW link

My first year in AI alignment

Alex_Altair · 2 Jan 2023 1:28 UTC
61 points
10 comments · 7 min read · LW link

List of links for getting into AI safety

zef · 4 Jan 2023 19:45 UTC
6 points
0 comments · 1 min read · LW link

Transcript of a presentation on catastrophic risks from AI

RobertM · 5 May 2023 1:38 UTC
6 points
0 comments · 8 min read · LW link

Advice for Entering AI Safety Research

scasper · 2 Jun 2023 20:46 UTC
26 points
2 comments · 5 min read · LW link

[Question] Where to begin in ML/AI?

Jake the Student · 6 Apr 2023 20:45 UTC
9 points
4 comments · 1 min read · LW link

Introducción al Riesgo Existencial de Inteligencia Artificial

david.friva · 15 Jul 2023 20:37 UTC
4 points
2 comments · 4 min read · LW link
(youtu.be)

AIS 101: Task decomposition for scalable oversight

Charbel-Raphaël · 25 Jul 2023 13:34 UTC
27 points
0 comments · 19 min read · LW link
(docs.google.com)

AI Safety 101 : Introduction to Vision Interpretability

28 Jul 2023 17:32 UTC
41 points
0 comments · 1 min read · LW link
(github.com)

Apply to a small iteration of MLAB to be run in Oxford

27 Aug 2023 14:21 UTC
12 points
0 comments · 1 min read · LW link

Documenting Journey Into AI Safety

jacobhaimes · 10 Oct 2023 18:30 UTC
17 points
4 comments · 6 min read · LW link

Into AI Safety - Episode 0

jacobhaimes · 22 Oct 2023 3:30 UTC
5 points
1 comment · 1 min read · LW link
(into-ai-safety.github.io)

Levelling Up in AI Safety Research Engineering

Gabe M · 2 Sep 2022 4:59 UTC
58 points
9 comments · 17 min read · LW link

AGI doesn’t need understanding, intention, or consciousness in order to kill us, only intelligence

James Blaha · 20 Feb 2023 0:55 UTC
10 points
2 comments · 18 min read · LW link

An Exercise to Build Intuitions on AGI Risk

Lauro Langosco · 7 Jun 2023 18:35 UTC
52 points
3 comments · 8 min read · LW link

Interview: Applications w/ Alice Rigg

jacobhaimes · 19 Dec 2023 19:03 UTC
12 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Hackathon and Staying Up-to-Date in AI

jacobhaimes · 8 Jan 2024 17:10 UTC
11 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

5-day Intro to Transformative AI course

Li-Lian Ang · 9 Dec 2024 7:15 UTC
2 points
0 comments · 1 min read · LW link

Podcast interview series featuring Dr. Peter Park

jacobhaimes · 26 Mar 2024 0:25 UTC
3 points
0 comments · 2 min read · LW link
(into-ai-safety.github.io)

So you want to work on technical AI safety

gw · 24 Jun 2024 14:29 UTC
51 points
3 comments · 14 min read · LW link

AI Control: Improving Safety Despite Intentional Subversion

13 Dec 2023 15:51 UTC
227 points
18 comments · 10 min read · LW link · 1 review

AI Alignment and the Quest for Artificial Wisdom

Myspy · 12 Jul 2024 21:34 UTC
1 point
0 comments · 13 min read · LW link

[Question] Doing Nothing Utility Function

k64 · 26 Sep 2024 22:05 UTC
9 points
9 comments · 1 min read · LW link

[Question] Best resources to learn philosophy of mind and AI?

Sky Moo · 27 Mar 2023 18:22 UTC
1 point
0 comments · 1 min read · LW link

[Linkpost] AI Alignment, Explained in 5 Points (updated)

Daniel_Eth · 18 Apr 2023 8:09 UTC
10 points
0 comments · 1 min read · LW link

Into AI Safety Episodes 1 & 2

jacobhaimes · 9 Nov 2023 4:36 UTC
2 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

The Genie in the Bottle: An Introduction to AI Alignment and Risk

Snorkelfarsan · 25 May 2023 16:30 UTC
5 points
1 comment · 25 min read · LW link

AI Safety Fundamentals: An Informal Cohort Starting Soon!

Tiago de Vassal · 4 Jun 2023 17:15 UTC
4 points
0 comments · 1 min read · LW link

Into AI Safety: Episode 3

jacobhaimes · 11 Dec 2023 16:30 UTC
6 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)