
AI Alignment Fieldbuilding

Last edit: Jun 15, 2022, 10:42 PM by plex

AI Alignment Fieldbuilding is the effort to improve the alignment ecosystem. Priorities include introducing new people to the importance of AI risk, onboarding them by connecting them with key resources and ideas, educating them on the existing literature and on methods for generating new and valuable research, supporting people who are already contributing, and maintaining and improving funding systems.

There is an invite-only Slack for people working on the alignment ecosystem. If you'd like to join, message plex with an overview of your involvement.

[Question] Papers to start get­ting into NLP-fo­cused al­ign­ment research

FeraidoonSep 24, 2022, 11:53 PM
6 points
0 comments1 min readLW link

The in­or­di­nately slow spread of good AGI con­ver­sa­tions in ML

Rob BensingerJun 21, 2022, 4:09 PM
173 points
62 comments8 min readLW link

ML Align­ment The­ory Pro­gram un­der Evan Hubinger

Dec 6, 2021, 12:03 AM
82 points
3 comments2 min readLW link

Shal­low re­view of live agen­das in al­ign­ment & safety

Nov 27, 2023, 11:10 AM
348 points
73 comments29 min readLW link1 review

Take­aways from a sur­vey on AI al­ign­ment resources

DanielFilanNov 5, 2022, 11:40 PM
73 points
10 comments6 min readLW link1 review
(danielfilan.com)

Talk: AI safety field­build­ing at MATS

Ryan KiddJun 23, 2024, 11:06 PM
26 points
2 comments10 min readLW link

Don’t Share In­for­ma­tion Exfo­haz­ardous on Others’ AI-Risk Models

Thane RuthenisDec 19, 2023, 8:09 PM
68 points
11 comments1 min readLW link

Most Peo­ple Start With The Same Few Bad Ideas

johnswentworthSep 9, 2022, 12:29 AM
165 points
30 comments3 min readLW link

The Im­por­tance of AI Align­ment, ex­plained in 5 points

Daniel_EthFeb 11, 2023, 2:56 AM
33 points
2 comments1 min readLW link

Qual­ities that al­ign­ment men­tors value in ju­nior researchers

Orpheus16Feb 14, 2023, 11:27 PM
88 points
14 comments3 min readLW link

Mid­dle Child Phenomenon

PhilosophicalSoulMar 15, 2024, 8:47 PM
3 points
3 comments2 min readLW link

Prob­lems of peo­ple new to AI safety and my pro­ject ideas to miti­gate them

Igor IvanovMar 1, 2023, 9:09 AM
38 points
4 comments7 min readLW link

aisafety.com­mu­nity—A liv­ing doc­u­ment of AI safety communities

Oct 28, 2022, 5:50 PM
58 points
23 comments1 min readLW link

Ap­ply to the Red­wood Re­search Mechanis­tic In­ter­pretabil­ity Ex­per­i­ment (REMIX), a re­search pro­gram in Berkeley

Oct 27, 2022, 1:32 AM
135 points
14 comments12 min readLW link

Tran­scripts of in­ter­views with AI researchers

Vael GatesMay 9, 2022, 5:57 AM
170 points
9 comments2 min readLW link

De­mys­tify­ing “Align­ment” through a Comic

milanroskoJun 9, 2024, 8:24 AM
106 points
19 comments1 min readLW link

MATS Spring 2024 Ex­ten­sion Retrospective

Feb 12, 2025, 10:43 PM
24 points
1 comment15 min readLW link

[Question] What are all the AI Align­ment and AI Safety Com­mu­ni­ca­tion Hubs?

Gunnar_ZarnckeJun 15, 2022, 4:16 PM
27 points
5 comments1 min readLW link

AI Safety Un­con­fer­ence NeurIPS 2022

OrpheusNov 7, 2022, 3:39 PM
25 points
0 comments1 min readLW link
(aisafetyevents.org)

AI Safety Europe Re­treat 2023 Retrospective

Magdalena WacheApr 14, 2023, 9:05 AM
43 points
0 comments2 min readLW link

AI Safety Ar­gu­ments: An In­ter­ac­tive Guide

Lukas TrötzmüllerFeb 1, 2023, 7:26 PM
20 points
0 comments3 min readLW link

Why I funded PIBBSS

Ryan KiddSep 15, 2024, 7:56 PM
115 points
21 comments3 min readLW link

How to Diver­sify Con­cep­tual Align­ment: the Model Be­hind Refine

adamShimiJul 20, 2022, 10:44 AM
87 points
11 comments8 min readLW link

A new­comer’s guide to the tech­ni­cal AI safety field

zeshenNov 4, 2022, 2:29 PM
42 points
3 comments10 min readLW link

[Question] Work­shop (hackathon, res­i­dence pro­gram, etc.) about for-profit AI Safety pro­jects?

Roman LeventovJan 26, 2024, 9:49 AM
21 points
5 comments1 min readLW link

Ad­vice for new al­ign­ment peo­ple: Info Max

Jonas HallgrenMay 30, 2023, 3:42 PM
23 points
4 comments5 min readLW link

The Align­ment Com­mu­nity Is Cul­turally Broken

sudoNov 13, 2022, 6:53 PM
136 points
68 comments2 min readLW link

Pro­ject Idea: Challenge Groups for Align­ment Researchers

Adam ZernerMay 27, 2023, 8:10 PM
13 points
0 comments1 min readLW link

What I Learned Run­ning Refine

adamShimiNov 24, 2022, 2:49 PM
108 points
5 comments4 min readLW link

The AI Safety com­mu­nity has four main work groups, Strat­egy, Gover­nance, Tech­ni­cal and Move­ment Building

peterslatteryNov 25, 2022, 3:45 AM
1 point
0 comments6 min readLW link

Assess­ment of AI safety agen­das: think about the down­side risk

Roman LeventovDec 19, 2023, 9:00 AM
13 points
1 comment1 min readLW link

[Job ad] LISA CEO

Feb 9, 2025, 12:18 AM
18 points
4 comments2 min readLW link

AISafety.info “How can I help?” FAQ

Jun 5, 2023, 10:09 PM
59 points
0 comments2 min readLW link

[Question] Does any­one’s full-time job in­clude read­ing and un­der­stand­ing all the most-promis­ing for­mal AI al­ign­ment work?

Nicholas / Heather KrossJun 16, 2023, 2:24 AM
15 points
2 comments1 min readLW link

Reflec­tions on the PIBBSS Fel­low­ship 2022

Dec 11, 2022, 9:53 PM
32 points
0 comments18 min readLW link

Good News, Every­one!

jbashMar 25, 2023, 1:48 PM
132 points
23 comments2 min readLW link

In­tro­duc­ing EffiS­ciences’ AI Safety Unit

Jun 30, 2023, 7:44 AM
68 points
0 comments12 min readLW link

AI Safety Move­ment Builders should help the com­mu­nity to op­ti­mise three fac­tors: con­trib­u­tors, con­tri­bu­tions and coordination

peterslatteryDec 15, 2022, 10:50 PM
4 points
0 comments6 min readLW link

Cost-effec­tive­ness of pro­fes­sional field-build­ing pro­grams for AI safety research

Dan HJul 10, 2023, 6:28 PM
8 points
5 comments1 min readLW link

Cam­paign for AI Safety: Please join me

Nik SamoylovApr 1, 2023, 9:32 AM
18 points
9 comments1 min readLW link

Con­crete Steps to Get Started in Trans­former Mechanis­tic Interpretability

Neel NandaDec 25, 2022, 10:21 PM
57 points
7 comments12 min readLW link
(www.neelnanda.io)

An overview of some promis­ing work by ju­nior al­ign­ment researchers

Orpheus16Dec 26, 2022, 5:23 PM
34 points
0 comments4 min readLW link

Align­ment Me­gapro­jects: You’re Not Even Try­ing to Have Ideas

Nicholas / Heather KrossJul 12, 2023, 11:39 PM
55 points
32 comments2 min readLW link

Reflec­tions on my 5-month al­ign­ment up­skil­ling grant

Jay BaileyDec 27, 2022, 10:51 AM
82 points
4 comments8 min readLW link

2022 AI Align­ment Course: 5→37% work­ing on AI safety

DewiJun 21, 2024, 5:45 PM
7 points
3 comments3 min readLW link

[Question] Can AI Align­ment please cre­ate a Red­dit-like plat­form that would make it much eas­ier for al­ign­ment re­searchers to find and help each other?

Georgeo57Jul 21, 2023, 2:03 PM
−5 points
2 comments1 min readLW link

How to find AI al­ign­ment re­searchers to col­lab­o­rate with?

Florian DietzJul 31, 2023, 9:05 AM
2 points
2 comments1 min readLW link

AI Safety Hub Ser­bia Soft Launch

DusanDNesicOct 20, 2023, 7:11 AM
64 points
1 comment3 min readLW link
(forum.effectivealtruism.org)

When dis­cussing AI risks, talk about ca­pa­bil­ities, not intelligence

VikaAug 11, 2023, 1:38 PM
124 points
7 comments3 min readLW link
(vkrakovna.wordpress.com)

AGISF adap­ta­tion for in-per­son groups

Jan 13, 2023, 3:24 AM
44 points
2 comments3 min readLW link

AISafety.world is a map of the AIS ecosystem

Hamish DoodlesApr 6, 2023, 6:37 PM
80 points
0 comments1 min readLW link

[Question] In­cen­tives af­fect­ing al­ign­ment-re­searcher encouragement

Nicholas / Heather KrossAug 29, 2023, 5:11 AM
28 points
3 comments1 min readLW link

How many peo­ple are work­ing (di­rectly) on re­duc­ing ex­is­ten­tial risk from AI?

Benjamin HiltonJan 18, 2023, 8:46 AM
20 points
1 comment1 min readLW link

In­for­ma­tion war­fare his­tor­i­cally re­volved around hu­man conduits

trevorAug 28, 2023, 6:54 PM
37 points
7 comments3 min readLW link

AGI safety field build­ing pro­jects I’d like to see

Severin T. SeehrichJan 19, 2023, 10:40 PM
68 points
28 comments9 min readLW link

Aus­tralian AI Safety Fo­rum 2024

Sep 27, 2024, 12:40 AM
42 points
0 comments2 min readLW link

All images from the WaitButWhy se­quence on AI

trevorApr 8, 2023, 7:36 AM
73 points
5 comments2 min readLW link

You should con­sider ap­ply­ing to PhDs (soon!)

bilalchughtaiNov 29, 2024, 8:33 PM
114 points
19 comments6 min readLW link

MATS Alumni Im­pact Analysis

Sep 30, 2024, 2:35 AM
61 points
7 comments11 min readLW link

AI Safety Univer­sity Or­ga­niz­ing: Early Take­aways from Thir­teen Groups

agucovaOct 2, 2024, 3:14 PM
26 points
0 comments1 min readLW link

MATS is hiring!

Apr 8, 2025, 8:45 PM
8 points
0 comments6 min readLW link

[Question] If there was a mil­len­nium equiv­a­lent prize for AI al­ign­ment, what would the prob­lems be?

Yair HalberstadtJun 9, 2022, 4:56 PM
17 points
4 comments1 min readLW link

If no near-term al­ign­ment strat­egy, re­search should aim for the long-term

harsimonyJun 9, 2022, 7:10 PM
7 points
1 comment1 min readLW link

Are AI de­vel­op­ers play­ing with fire?

marcusarvanMar 16, 2023, 7:12 PM
6 points
0 comments10 min readLW link

How Can Aver­age Peo­ple Con­tribute to AI Safety?

Stephen McAleeseMar 6, 2025, 10:50 PM
16 points
4 comments8 min readLW link

How Josiah be­came an AI safety researcher

Neil CrawfordSep 6, 2022, 5:17 PM
4 points
0 comments1 min readLW link

Ap­ply to be a TA for TARA

yanni kyriacosDec 20, 2024, 2:25 AM
10 points
0 comments1 min readLW link

In­tro­duc­ing 11 New AI Safety Or­ga­ni­za­tions—Cat­alyze’s Win­ter 24/​25 Lon­don In­cu­ba­tion Pro­gram Cohort

Alexandra BosMar 10, 2025, 7:26 PM
70 points
0 comments1 min readLW link

80,000 hours should re­move OpenAI from the Job Board (and similar EA orgs should do similarly)

RaemonJul 3, 2024, 8:34 PM
274 points
71 comments1 min readLW link

Sur­vey for al­ign­ment re­searchers!

Feb 2, 2024, 8:41 PM
71 points
11 comments1 min readLW link

Meri­dian Cam­bridge Visit­ing Re­searcher Pro­gramme: Turn AI safety ideas into funded pro­jects in one week!

Meridian CambridgeMar 11, 2025, 5:46 PM
13 points
0 comments2 min readLW link

AI safety uni­ver­sity groups: a promis­ing op­por­tu­nity to re­duce ex­is­ten­tial risk

micJul 1, 2022, 3:59 AM
14 points
0 comments11 min readLW link

MATS men­tor selection

Jan 10, 2025, 3:12 AM
44 points
11 comments6 min readLW link

Tokyo AI Safety 2025: Call For Papers

BlaineOct 21, 2024, 8:43 AM
24 points
0 comments3 min readLW link
(www.tais2025.cc)

Four vi­sions of Trans­for­ma­tive AI success

Steven ByrnesJan 17, 2024, 8:45 PM
112 points
22 comments15 min readLW link

AI al­ign­ment as “nav­i­gat­ing the space of in­tel­li­gent be­havi­our”

Nora_AmmannAug 23, 2022, 1:28 PM
18 points
0 comments6 min readLW link

AI Safety Strate­gies Landscape

Charbel-RaphaëlMay 9, 2024, 5:33 PM
34 points
1 comment42 min readLW link

[Question] Help me find a good Hackathon sub­ject

Charbel-RaphaëlSep 4, 2022, 8:40 AM
6 points
18 comments1 min readLW link

Paper: Field-build­ing and the epistemic cul­ture of AI safety

peterslatteryMar 15, 2025, 12:30 PM
13 points
3 comments3 min readLW link
(firstmonday.org)

Re­view of Align­ment Plan Cri­tiques- De­cem­ber AI-Plans Cri­tique-a-Thon Re­sults

IknownothingJan 15, 2024, 7:37 PM
24 points
0 comments25 min readLW link
(aiplans.substack.com)

[An email with a bunch of links I sent an ex­pe­rienced ML re­searcher in­ter­ested in learn­ing about Align­ment /​ x-safety.]

David Scott Krueger (formerly: capybaralet)Sep 8, 2022, 10:28 PM
47 points
1 comment5 min readLW link

Ap­ply to MATS 8.0!

Mar 20, 2025, 2:17 AM
63 points
5 comments4 min readLW link

An­nounc­ing AISIC 2022 - the AI Safety Is­rael Con­fer­ence, Oc­to­ber 19-20

DavidmanheimSep 21, 2022, 7:32 PM
13 points
0 comments1 min readLW link

AGI safety ca­reer advice

Richard_NgoMay 2, 2023, 7:36 AM
132 points
24 comments13 min readLW link

Les­sons learned from talk­ing to >100 aca­demics about AI safety

Marius HobbhahnOct 10, 2022, 1:16 PM
216 points
18 comments12 min readLW link1 review

2025 Q1 Pivotal Re­search Fel­low­ship (Tech­ni­cal & Policy)

Nov 12, 2024, 10:56 AM
7 points
0 comments2 min readLW link

The Vi­talik Bu­terin Fel­low­ship in AI Ex­is­ten­tial Safety is open for ap­pli­ca­tions!

Xin Chen, CynthiaOct 13, 2022, 6:32 PM
21 points
0 comments1 min readLW link

AI Safety in China: Part 2

Lao MeinMay 22, 2023, 2:50 PM
101 points
28 comments2 min readLW link

There Should Be More Align­ment-Driven Startups

May 31, 2024, 2:05 AM
62 points
14 comments11 min readLW link

AI Safety Needs Great Product Builders

goodgravyNov 2, 2022, 11:33 AM
14 points
2 comments1 min readLW link

An­nounc­ing Athena—Women in AI Align­ment Research

Claire ShortNov 7, 2023, 9:46 PM
80 points
2 comments3 min readLW link

Into AI Safety Epi­sodes 1 & 2

jacobhaimesNov 9, 2023, 4:36 AM
2 points
0 comments1 min readLW link
(into-ai-safety.github.io)

The So­cial Align­ment Problem

irvingApr 28, 2023, 2:16 PM
99 points
13 comments8 min readLW link

[Question] AI Safety orgs- what’s your biggest bot­tle­neck right now?

Kabir KumarNov 16, 2023, 2:02 AM
1 point
0 comments1 min readLW link

1. A Sense of Fair­ness: De­con­fus­ing Ethics

RogerDearnaleyNov 17, 2023, 8:55 PM
16 points
8 comments15 min readLW link

4. A Mo­ral Case for Evolved-Sapi­ence-Chau­vinism

RogerDearnaleyNov 24, 2023, 4:56 AM
10 points
0 comments4 min readLW link

3. Uploading

RogerDearnaleyNov 23, 2023, 7:39 AM
21 points
5 comments8 min readLW link

2. AIs as Eco­nomic Agents

RogerDearnaleyNov 23, 2023, 7:07 AM
9 points
2 comments6 min readLW link

[SEE NEW EDITS] No, *You* Need to Write Clearer

Nicholas / Heather KrossApr 29, 2023, 5:04 AM
262 points
65 comments5 min readLW link
(www.thinkingmuchbetter.com)

AI Mo­ral Align­ment: The Most Im­por­tant Goal of Our Generation

Ronen BarMar 27, 2025, 6:04 PM
2 points
0 comments8 min readLW link
(forum.effectivealtruism.org)

Ap­pen­dices to the live agendas

Nov 27, 2023, 11:10 AM
16 points
4 comments1 min readLW link

MATS Sum­mer 2023 Retrospective

Dec 1, 2023, 11:29 PM
77 points
34 comments26 min readLW link

What’s new at FAR AI

Dec 4, 2023, 9:18 PM
41 points
0 comments5 min readLW link
(far.ai)

How I learned to stop wor­ry­ing and love skill trees

junk heap homotopyMay 23, 2023, 4:08 AM
81 points
3 comments1 min readLW link

AI Safety Papers: An App for the TAI Safety Database

ozziegooenAug 21, 2021, 2:02 AM
81 points
13 comments2 min readLW link

Wikipe­dia as an in­tro­duc­tion to the al­ign­ment problem

SoerenMindMay 29, 2023, 6:43 PM
83 points
10 comments1 min readLW link
(en.wikipedia.org)

Terry Tao is host­ing an “AI to As­sist Math­e­mat­i­cal Rea­son­ing” workshop

junk heap homotopyJun 3, 2023, 1:19 AM
12 points
1 comment1 min readLW link
(terrytao.wordpress.com)

An overview of the points system

IknownothingJun 27, 2023, 9:09 AM
3 points
4 comments1 min readLW link
(ai-plans.com)

Brief sum­mary of ai-plans.com

IknownothingJun 28, 2023, 12:33 AM
9 points
4 comments2 min readLW link
(ai-plans.com)

What is ev­ery­one do­ing in AI governance

Igor IvanovJul 8, 2023, 3:16 PM
11 points
0 comments5 min readLW link

Even briefer sum­mary of ai-plans.com

IknownothingJul 16, 2023, 11:25 PM
10 points
6 comments2 min readLW link
(www.ai-plans.com)

Su­per­vised Pro­gram for Align­ment Re­search (SPAR) at UC Berkeley: Spring 2023 summary

Aug 19, 2023, 2:27 AM
23 points
2 comments6 min readLW link

Look­ing for judges for cri­tiques of Align­ment Plans

IknownothingAug 17, 2023, 10:35 PM
6 points
0 comments1 min readLW link

Refram­ing AI Safety Through the Lens of Iden­tity Main­te­nance Framework

Hiroshi YamakawaApr 1, 2025, 6:16 AM
−7 points
1 comment17 min readLW link

Be­come a PIBBSS Re­search Affiliate

Oct 10, 2023, 7:41 AM
24 points
6 comments6 min readLW link

ARENA 2.0 - Im­pact Report

CallumMcDougallSep 26, 2023, 5:13 PM
35 points
5 comments13 min readLW link

Cat­a­lyst books

CatneeSep 17, 2023, 5:05 PM
7 points
2 comments1 min readLW link

Doc­u­ment­ing Jour­ney Into AI Safety

jacobhaimesOct 10, 2023, 6:30 PM
17 points
4 comments6 min readLW link

Ap­ply for MATS Win­ter 2023-24!

Oct 21, 2023, 2:27 AM
104 points
6 comments5 min readLW link

Into AI Safety—Epi­sode 0

jacobhaimesOct 22, 2023, 3:30 AM
5 points
1 comment1 min readLW link
(into-ai-safety.github.io)

Re­sources I send to AI re­searchers about AI safety

Vael GatesJun 14, 2022, 2:24 AM
69 points
12 comments1 min readLW link

Slow mo­tion videos as AI risk in­tu­ition pumps

Andrew_CritchJun 14, 2022, 7:31 PM
241 points
41 comments2 min readLW link1 review

Slide deck: In­tro­duc­tion to AI Safety

Aryeh EnglanderJan 29, 2020, 3:57 PM
24 points
0 comments1 min readLW link
(drive.google.com)

On pre­sent­ing the case for AI risk

Aryeh EnglanderMar 9, 2022, 1:41 AM
54 points
17 comments4 min readLW link

A Quick List of Some Prob­lems in AI Align­ment As A Field

Nicholas / Heather KrossJun 21, 2022, 11:23 PM
75 points
12 comments6 min readLW link
(www.thinkingmuchbetter.com)

[LQ] Some Thoughts on Mes­sag­ing Around AI Risk

DragonGodJun 25, 2022, 1:53 PM
5 points
3 comments6 min readLW link

Refram­ing the AI Risk

Thane RuthenisJul 1, 2022, 6:44 PM
26 points
7 comments6 min readLW link

The Tree of Life: Stan­ford AI Align­ment The­ory of Change

Gabe MJul 2, 2022, 6:36 PM
25 points
0 comments14 min readLW link

Prin­ci­ples for Align­ment/​Agency Projects

johnswentworthJul 7, 2022, 2:07 AM
122 points
20 comments4 min readLW link

Re­shap­ing the AI Industry

Thane RuthenisMay 29, 2022, 10:54 PM
147 points
35 comments21 min readLW link

Prin­ci­ples of Pri­vacy for Align­ment Research

johnswentworthJul 27, 2022, 7:53 PM
73 points
31 comments7 min readLW link

An­nounc­ing the AI Safety Field Build­ing Hub, a new effort to provide AISFB pro­jects, men­tor­ship, and funding

Vael GatesJul 28, 2022, 9:29 PM
49 points
3 comments6 min readLW link

(My un­der­stand­ing of) What Every­one in Tech­ni­cal Align­ment is Do­ing and Why

Aug 29, 2022, 1:23 AM
413 points
90 comments37 min readLW link1 review

Com­mu­nity Build­ing for Grad­u­ate Stu­dents: A Tar­geted Approach

Neil CrawfordSep 6, 2022, 5:17 PM
6 points
0 comments4 min readLW link

[Question] How can we se­cure more re­search po­si­tions at our uni­ver­si­ties for x-risk re­searchers?

Neil CrawfordSep 6, 2022, 5:17 PM
11 points
0 comments1 min readLW link

AI Safety field-build­ing pro­jects I’d like to see

Orpheus16Sep 11, 2022, 11:43 PM
46 points
8 comments6 min readLW link

Gen­eral ad­vice for tran­si­tion­ing into The­o­ret­i­cal AI Safety

Martín SotoSep 15, 2022, 5:23 AM
12 points
0 comments10 min readLW link

Ap­ply for men­tor­ship in AI Safety field-building

Orpheus16Sep 17, 2022, 7:06 PM
9 points
0 comments1 min readLW link
(forum.effectivealtruism.org)

Align­ment Org Cheat Sheet

Sep 20, 2022, 5:36 PM
70 points
8 comments4 min readLW link

7 traps that (we think) new al­ign­ment re­searchers of­ten fall into

Sep 27, 2022, 11:13 PM
176 points
10 comments4 min readLW link

Re­sources that (I think) new al­ign­ment re­searchers should know about

Orpheus16Oct 28, 2022, 10:13 PM
70 points
9 comments4 min readLW link

[Question] Are al­ign­ment re­searchers de­vot­ing enough time to im­prov­ing their re­search ca­pac­ity?

Carson JonesNov 4, 2022, 12:58 AM
13 points
3 comments3 min readLW link

Cur­rent themes in mechanis­tic in­ter­pretabil­ity research

Nov 16, 2022, 2:14 PM
89 points
2 comments12 min readLW link

Prob­a­bly good pro­jects for the AI safety ecosystem

Ryan KiddDec 5, 2022, 2:26 AM
78 points
40 comments2 min readLW link

Anal­y­sis of AI Safety sur­veys for field-build­ing insights

Ash JafariDec 5, 2022, 7:21 PM
11 points
2 comments5 min readLW link

Fear miti­gated the nu­clear threat, can it do the same to AGI risks?

Igor IvanovDec 9, 2022, 10:04 AM
6 points
8 comments5 min readLW link

Ques­tions about AI that bother me

Eleni AngelouFeb 5, 2023, 5:04 AM
13 points
6 comments2 min readLW link

Ex­is­ten­tial AI Safety is NOT sep­a­rate from near-term applications

scasperDec 13, 2022, 2:47 PM
37 points
17 comments3 min readLW link

[Question] Best in­tro­duc­tory overviews of AGI safety?

JakubKDec 13, 2022, 7:01 PM
21 points
9 comments2 min readLW link
(forum.effectivealtruism.org)

truth.in­tegrity(): A Re­cur­sive Frame­work for Hal­lu­ci­na­tion Preven­tion and Alignment

brittneyluongApr 2, 2025, 5:52 PM
1 point
0 comments2 min readLW link

There have been 3 planes (billion­aire donors) and 2 have crashed

trevorDec 17, 2022, 3:58 AM
16 points
10 comments2 min readLW link

Why I think that teach­ing philos­o­phy is high impact

Eleni AngelouDec 19, 2022, 3:11 AM
5 points
0 comments2 min readLW link

Into AI Safety: Epi­sode 3

jacobhaimesDec 11, 2023, 4:30 PM
6 points
0 comments1 min readLW link
(into-ai-safety.github.io)

Air-gap­ping eval­u­a­tion and support

Ryan KiddDec 26, 2022, 10:52 PM
53 points
1 comment2 min readLW link

What AI Safety Ma­te­ri­als Do ML Re­searchers Find Com­pel­ling?

Dec 28, 2022, 2:03 AM
175 points
34 comments2 min readLW link

Thoughts On Ex­pand­ing the AI Safety Com­mu­nity: Benefits and Challenges of Outreach to Non-Tech­ni­cal Professionals

Yashvardhan SharmaJan 1, 2023, 7:21 PM
4 points
4 comments7 min readLW link

Align­ment, Anger, and Love: Prepar­ing for the Emer­gence of Su­per­in­tel­li­gent AI

tavurthJan 2, 2023, 6:16 AM
2 points
3 comments1 min readLW link

[Question] I have thou­sands of copies of HPMOR in Rus­sian. How to use them with the most im­pact?

Mikhail SaminJan 3, 2023, 10:21 AM
26 points
3 comments1 min readLW link

Look­ing for Span­ish AI Align­ment Researchers

AntbJan 7, 2023, 6:52 PM
7 points
3 comments1 min readLW link

The Align­ment Prob­lem from a Deep Learn­ing Per­spec­tive (ma­jor rewrite)

Jan 10, 2023, 4:06 PM
84 points
8 comments39 min readLW link
(arxiv.org)

An­nounc­ing aisafety.training

JJ HepburnJan 21, 2023, 1:01 AM
61 points
4 comments1 min readLW link

An­nounc­ing Cavendish Labs

Jan 19, 2023, 8:15 PM
59 points
5 comments2 min readLW link
(forum.effectivealtruism.org)

How Do We Pro­tect AI From Hu­mans?

Alex BeymanJan 22, 2023, 3:59 AM
−4 points
11 comments6 min readLW link

A Brief Overview of AI Safety/​Align­ment Orgs, Fields, Re­searchers, and Re­sources for ML Researchers

Austin WitteFeb 2, 2023, 1:02 AM
18 points
1 comment2 min readLW link

In­ter­views with 97 AI Re­searchers: Quan­ti­ta­tive Analysis

Feb 2, 2023, 1:01 AM
23 points
0 comments7 min readLW link

Pre­dict­ing re­searcher in­ter­est in AI alignment

Vael GatesFeb 2, 2023, 12:58 AM
25 points
0 comments1 min readLW link

“AI Risk Dis­cus­sions” web­site: Ex­plor­ing in­ter­views from 97 AI Researchers

Feb 2, 2023, 1:00 AM
43 points
1 comment1 min readLW link

Ret­ro­spec­tive on the AI Safety Field Build­ing Hub

Vael GatesFeb 2, 2023, 2:06 AM
30 points
0 comments1 min readLW link

You are prob­a­bly not a good al­ign­ment re­searcher, and other blatant lies

junk heap homotopyFeb 2, 2023, 1:55 PM
83 points
16 comments2 min readLW link

AGI doesn’t need un­der­stand­ing, in­ten­tion, or con­scious­ness in or­der to kill us, only intelligence

James BlahaFeb 20, 2023, 12:55 AM
10 points
2 comments18 min readLW link

Aspiring AI safety re­searchers should ~argmax over AGI timelines

Ryan KiddMar 3, 2023, 2:04 AM
29 points
8 comments2 min readLW link

The hu­man­ity’s biggest mistake

RomanSMar 10, 2023, 4:30 PM
0 points
1 comment2 min readLW link

Ac­cu­rate Models of AI Risk Are Hyper­ex­is­ten­tial Exfohazards

Thane RuthenisDec 25, 2022, 4:50 PM
33 points
38 comments9 min readLW link

Some for-profit AI al­ign­ment org ideas

Eric HoDec 14, 2023, 2:23 PM
86 points
19 comments9 min readLW link

In­ter­view: Ap­pli­ca­tions w/​ Alice Rigg

jacobhaimesDec 19, 2023, 7:03 PM
12 points
0 comments1 min readLW link
(into-ai-safety.github.io)

Ci­cadas, An­thropic, and the bilat­eral al­ign­ment problem

kromemMay 22, 2024, 11:09 AM
28 points
6 comments5 min readLW link

AI Safety Chatbot

Dec 21, 2023, 2:06 PM
61 points
11 comments4 min readLW link

Ta­lent Needs of Tech­ni­cal AI Safety Teams

May 24, 2024, 12:36 AM
117 points
65 comments14 min readLW link

INTERVIEW: StakeOut.AI w/​ Dr. Peter Park

jacobhaimesMar 4, 2024, 4:35 PM
6 points
0 comments1 min readLW link
(into-ai-safety.github.io)

Strik­ing Im­pli­ca­tions for Learn­ing The­ory, In­ter­pretabil­ity — and Safety?

RogerDearnaleyJan 5, 2024, 8:46 AM
37 points
4 comments2 min readLW link

Hackathon and Stay­ing Up-to-Date in AI

jacobhaimesJan 8, 2024, 5:10 PM
11 points
0 comments1 min readLW link
(into-ai-safety.github.io)

Ap­ply to the 2024 PIBBSS Sum­mer Re­search Fellowship

Jan 12, 2024, 4:06 AM
39 points
1 comment2 min readLW link

So­cial me­dia al­ign­ment test

amayhewJan 16, 2024, 8:56 PM
1 point
0 comments1 min readLW link
(naiveskepticblog.wordpress.com)

This might be the last AI Safety Camp

Jan 24, 2024, 9:33 AM
196 points
34 comments1 min readLW link

Pro­posal for an AI Safety Prize

sweenesmJan 31, 2024, 6:35 PM
3 points
0 comments2 min readLW link

At­las: Stress-Test­ing ASI Value Learn­ing Through Grand Strat­egy Scenarios

NeilFoxFeb 17, 2025, 11:55 PM
1 point
0 comments2 min readLW link

[Question] Do you want to make an AI Align­ment song?

Kabir KumarFeb 9, 2024, 8:22 AM
4 points
0 comments1 min readLW link

Lay­ing the Foun­da­tions for Vi­sion and Mul­ti­modal Mechanis­tic In­ter­pretabil­ity & Open Problems

Mar 13, 2024, 5:09 PM
44 points
13 comments14 min readLW link

Offer­ing AI safety sup­port calls for ML professionals

Vael GatesFeb 15, 2024, 11:48 PM
61 points
1 comment1 min readLW link

No Click­bait—Misal­ign­ment Database

Kabir KumarFeb 18, 2024, 5:35 AM
6 points
10 comments1 min readLW link

A Nail in the Coffin of Exceptionalism

Yeshua GodMar 14, 2024, 10:41 PM
−17 points
0 comments3 min readLW link

Call for Ap­pli­ca­tions: XLab Sum­mer Re­search Fel­low­ship

JoNeedsSleepFeb 18, 2025, 7:19 PM
9 points
0 comments1 min readLW link

In­vi­ta­tion to the Prince­ton AI Align­ment and Safety Seminar

Sadhika MalladiMar 17, 2024, 1:10 AM
6 points
1 comment1 min readLW link

INTERVIEW: Round 2 - StakeOut.AI w/​ Dr. Peter Park

jacobhaimesMar 18, 2024, 9:21 PM
5 points
0 comments1 min readLW link
(into-ai-safety.github.io)

Pod­cast in­ter­view se­ries fea­tur­ing Dr. Peter Park

jacobhaimesMar 26, 2024, 12:25 AM
3 points
0 comments2 min readLW link
(into-ai-safety.github.io)

Un­der­grad AI Safety Conference

JoNeedsSleepFeb 19, 2025, 3:43 AM
18 points
0 comments1 min readLW link

CEA seeks co-founder for AI safety group sup­port spin-off

agucovaApr 8, 2024, 3:42 PM
18 points
0 comments1 min readLW link

Ap­ply to the Pivotal Re­search Fel­low­ship (AI Safety & Biose­cu­rity)

Apr 10, 2024, 12:08 PM
18 points
0 comments1 min readLW link

[Question] Bar­cod­ing LLM Train­ing Data Sub­sets. Any­one try­ing this for in­ter­pretabil­ity?

right..enough?Apr 13, 2024, 3:09 AM
7 points
0 comments7 min readLW link

My ex­pe­rience at ML4Good AI Safety Bootcamp

TheManxLoinerApr 13, 2024, 10:55 AM
21 points
1 comment5 min readLW link

An­nounc­ing SPAR Sum­mer 2024!

laurenmarie12Apr 16, 2024, 8:30 AM
30 points
2 comments1 min readLW link

Hu­man-AI Re­la­tion­al­ity is Already Here

bridgebotFeb 20, 2025, 7:08 AM
13 points
0 comments15 min readLW link

Align­ment Gaps

kcyrasJun 8, 2024, 3:23 PM
11 points
4 comments8 min readLW link

MATS Win­ter 2023-24 Retrospective

May 11, 2024, 12:09 AM
86 points
28 comments49 min readLW link

An­nounc­ing the AI Safety Sum­mit Talks with Yoshua Bengio

otto.bartenMay 14, 2024, 12:52 PM
9 points
1 comment1 min readLW link

Propos­ing the Post-Sin­gu­lar­ity Sym­biotic Researches

Hiroshi YamakawaJun 20, 2024, 4:05 AM
7 points
1 comment12 min readLW link

Make Su­per­in­tel­li­gence Loving

Davey MorseFeb 21, 2025, 6:07 AM
8 points
9 comments5 min readLW link

AI Align­ment Re­search Eng­ineer Ac­cel­er­a­tor (ARENA): Call for ap­pli­cants v4.0

Jul 6, 2024, 11:34 AM
57 points
7 comments6 min readLW link

AI Align­ment and the Quest for Ar­tifi­cial Wisdom

MyspyJul 12, 2024, 9:34 PM
1 point
0 comments13 min readLW link

The AI al­ign­ment prob­lem in so­cio-tech­ni­cal sys­tems from a com­pu­ta­tional per­spec­tive: A Top-Down-Top view and outlook

zhaoweizhangJul 15, 2024, 6:56 PM
3 points
0 comments9 min readLW link

A Bet­ter Hyper­sti­tion (for AI read­ers)

Yeshua GodJul 15, 2024, 7:35 PM
−20 points
0 comments119 min readLW link

As We May Align

Gilbert CDec 20, 2024, 7:02 PM
−1 points
0 comments6 min readLW link

Mo­ral­ity as Nash Equil­ibrium: A De­duc­tive Frame­work for Ethics

johndoe112Apr 10, 2025, 9:43 AM
1 point
0 comments3 min readLW link

2/​3 Aussie & NZ AI Safety folk of­ten or some­times feel lonely or dis­con­nected (and 16 other bar­ri­ers to im­pact)

yanni kyriacosAug 1, 2024, 1:15 AM
12 points
0 comments8 min readLW link

Ap­ply now to SPAR!

agucovaDec 19, 2024, 10:29 PM
11 points
0 comments1 min readLW link

AI al­ign­ment for men­tal health supports

hiki_tFeb 24, 2025, 4:21 AM
1 point
1 comment1 min readLW link

A Philo­soph­i­cal Ar­ti­fact: “Wit­ness­ing Without a Self” — A Dialogue Between Hu­man and AI

Eric RosenbergApr 11, 2025, 6:58 PM
1 point
0 comments1 min readLW link
(archive.org)

The Com­pute Co­nun­drum: AI Gover­nance in a Shift­ing Geopoli­ti­cal Era

octavoSep 28, 2024, 1:05 AM
−3 points
1 comment17 min readLW link

AGI Farm

Rahul ChandOct 1, 2024, 4:29 AM
1 point
0 comments8 min readLW link

[Question] If I have some money, whom should I donate it to in or­der to re­duce ex­pected P(doom) the most?

KvmanThinkingOct 3, 2024, 11:31 AM
35 points
37 comments1 min readLW link

AI Align­ment via Slow Sub­strates: Early Em­piri­cal Re­sults With StarCraft II

Lester LeongOct 14, 2024, 4:05 AM
60 points
9 comments12 min readLW link

The Field of AI Align­ment: A Post­mortem, and What To Do About It

johnswentworthDec 26, 2024, 6:48 PM
295 points
160 comments8 min readLW link

How I’d like al­ign­ment to get done (as of 2024-10-18)

TristanTrimOct 18, 2024, 11:39 PM
11 points
4 comments4 min readLW link

Mak­ing LLMs safer is more in­tu­itive than you think: How Com­mon Sense and Diver­sity Im­prove AI Align­ment

Jeba SaniaDec 29, 2024, 7:27 PM
−5 points
1 comment6 min readLW link

Map of AI Safety v2

Apr 15, 2025, 1:04 PM
57 points
4 comments1 min readLW link

Shal­low re­view of tech­ni­cal AI safety, 2024

Dec 29, 2024, 12:01 PM
185 points
34 comments41 min readLW link

Ap­ply to be a men­tor in SPAR!

agucovaNov 5, 2024, 9:32 PM
5 points
0 comments1 min readLW link

Break­ing down the MEAT of Alignment

JasonBrownApr 7, 2025, 8:47 AM
7 points
2 comments11 min readLW link

Col­lege tech­ni­cal AI safety hackathon ret­ro­spec­tive—Ge­or­gia Tech

yixNov 15, 2024, 12:22 AM
41 points
2 comments5 min readLW link
(open.substack.com)

A bet­ter “State­ment on AI Risk?”

Knight LeeNov 25, 2024, 4:50 AM
9 points
6 comments3 min readLW link

Should you in­crease AI al­ign­ment fund­ing, or in­crease AI reg­u­la­tion?

Knight LeeNov 26, 2024, 9:17 AM
7 points
1 comment4 min readLW link

Launch­ing Ap­pli­ca­tions for the Global AI Safety Fel­low­ship 2025!

Aditya_SKNov 30, 2024, 2:02 PM
11 points
5 comments1 min readLW link

GPT-4.5 is Cog­ni­tive Em­pa­thy, Son­net 3.5 is Affec­tive Empathy

JackApr 16, 2025, 7:12 PM
15 points
2 comments4 min readLW link

ARENA 4.0 Im­pact Report

Nov 27, 2024, 8:51 PM
43 points
3 comments13 min readLW link

A FRESH view of Alignment

robmanApr 16, 2025, 9:40 PM
1 point
0 comments1 min readLW link

Play­ing Dixit with AI: Can AI Sys­tems Iden­tify Misal­ign­ments in My Per­son­al­ized State­ments?

Mariia KoroliukJan 17, 2025, 6:52 PM
1 point
0 comments2 min readLW link

In­tro­duc­ing the An­thropic Fel­lows Program

Nov 30, 2024, 11:47 PM
26 points
0 comments4 min readLW link
(alignment.anthropic.com)

Top AI safety newslet­ters, books, pod­casts, etc – new AISafety.com resource

Mar 4, 2025, 5:01 PM
32 points
2 comments1 min readLW link

Mak­ing progress bars for Alignment

Kabir KumarJan 3, 2025, 9:25 PM
2 points
0 comments1 min readLW link
(lu.ma)

Build­ing Big Science from the Bot­tom-Up: A Frac­tal Ap­proach to AI Safety

Lauren GreenspanJan 7, 2025, 3:08 AM
37 points
2 comments12 min readLW link

Is Align­ment a flawed ap­proach?

Patrick BernardMar 11, 2025, 8:32 PM
1 point
0 comments3 min readLW link

Devel­op­ing AI Safety: Bridg­ing the Power-Ethics Gap (In­tro­duc­ing New Con­cepts)

Ronen BarApr 20, 2025, 4:40 AM
2 points
0 comments5 min readLW link
(forum.effectivealtruism.org)

14+ AI Safety Ad­vi­sors You Can Speak to – New AISafety.com Resource

Jan 21, 2025, 5:34 PM
24 points
0 comments1 min readLW link

A Safer Path to AGI? Con­sid­er­ing the Self-to-Pro­cess­ing Route as an Alter­na­tive to Pro­cess­ing-to-Self

opApr 21, 2025, 1:09 PM
1 point
0 comments1 min readLW link

An­nounce­ment: Learn­ing The­ory On­line Course

Jan 20, 2025, 7:55 PM
63 points
33 comments4 min readLW link

Con­struct­ing a Be­hav­iorally-Con­strained AI via Prompt Re­cur­sion: A Field Log from Within

SarenfieldlogApr 22, 2025, 6:59 AM
1 point
0 comments3 min readLW link

Train­ing Data At­tri­bu­tion (TDA): Ex­am­in­ing Its Adop­tion & Use Cases

Jan 22, 2025, 3:40 PM
16 points
0 comments3 min readLW link
(www.convergenceanalysis.org)

Un­der­stand­ing AI World Models w/​ Chris Canal

jacobhaimesJan 27, 2025, 4:32 PM
4 points
0 comments1 min readLW link
(kairos.fm)

Tether­ware #1: The case for hu­man­like AI with free will

Jáchym FibírJan 30, 2025, 10:58 AM
5 points
14 comments10 min readLW link
(tetherware.substack.com)

The Illu­sion of Trans­parency as a Trust-Build­ing Mechanism

Priyanka BharadwajMar 19, 2025, 5:09 PM
1 point
0 comments1 min readLW link

The AI Belief-Con­sis­tency Letter

Knight LeeApr 23, 2025, 12:01 PM
0 points
15 comments4 min readLW link

ARENA 5.0 - Call for Applicants

Jan 30, 2025, 1:18 PM
35 points
2 comments6 min readLW link

A Plu­ral­is­tic Frame­work for Rogue AI Containment

TheThinkingArboristMar 22, 2025, 12:54 PM
1 point
0 comments7 min readLW link

Re­cur­sive On­tolog­i­cal Pres­sure: Toward Sys­tems That Col­lapse In­stead of Lying

William StetarMar 24, 2025, 2:55 PM
1 point
0 comments1 min readLW link

“Should AI Ques­tion Its Own De­ci­sions? A Thought Ex­per­i­ment”

CMDR WOTZFeb 4, 2025, 8:39 AM
1 point
0 comments1 min readLW link

SERI MATS—Sum­mer 2023 Cohort

Apr 8, 2023, 3:32 PM
71 points
25 comments4 min readLW link

Cri­tiques of promi­nent AI safety labs: Red­wood Research

Omega.Apr 17, 2023, 6:20 PM
4 points
0 comments22 min readLW link
(forum.effectivealtruism.org)

AI Align­ment Re­search Eng­ineer Ac­cel­er­a­tor (ARENA): call for applicants

CallumMcDougallApr 17, 2023, 8:30 PM
100 points
9 comments7 min readLW link

[Linkpost] AI Align­ment, Ex­plained in 5 Points (up­dated)

Daniel_EthApr 18, 2023, 8:09 AM
10 points
0 comments1 min readLW link

Ap­ply to be­come a Fu­turekind AI Fa­cil­i­ta­tor or Men­tor (dead­line: April 10)

superbeneficiaryMar 26, 2025, 3:47 PM
4 points
0 comments1 min readLW link
(forum.effectivealtruism.org)

An open let­ter to SERI MATS pro­gram organisers

Roman LeventovApr 20, 2023, 4:34 PM
26 points
26 comments4 min readLW link

AI Align­ment: A Com­pre­hen­sive Survey

Stephen McAleerNov 1, 2023, 5:35 PM
20 points
1 comment1 min readLW link
(arxiv.org)

Tips, tricks, les­sons and thoughts on host­ing hackathons

gergogasparNov 6, 2023, 11:03 AM
3 points
0 comments11 min readLW link

How well does your re­search adress the the­ory-prac­tice gap?

Jonas HallgrenNov 8, 2023, 11:27 AM
18 points
0 comments10 min readLW link
No comments.