
AI Alignment Fieldbuilding

Last edit: Jun 15, 2022, 10:42 PM by plex

AI Alignment Fieldbuilding is the effort to improve the alignment ecosystem. Priorities include introducing new people to the importance of AI risk, onboarding them by connecting them with key resources and ideas, educating them on the existing literature and on methods for generating new and valuable research, supporting people who are already contributing, and maintaining and improving funding systems.

There is an invite-only Slack for people working on the alignment ecosystem. If you'd like to join, message plex with an overview of your involvement.

[Question] Papers to start get­ting into NLP-fo­cused al­ign­ment research

FeraidoonSep 24, 2022, 11:53 PM
6 points
0 comments1 min readLW link

The in­or­di­nately slow spread of good AGI con­ver­sa­tions in ML

Rob BensingerJun 21, 2022, 4:09 PM
173 points
62 comments8 min readLW link

ML Align­ment The­ory Pro­gram un­der Evan Hubinger

Dec 6, 2021, 12:03 AM
82 points
3 comments2 min readLW link

Shal­low re­view of live agen­das in al­ign­ment & safety

Nov 27, 2023, 11:10 AM
348 points
73 comments29 min readLW link1 review

Take­aways from a sur­vey on AI al­ign­ment resources

DanielFilanNov 5, 2022, 11:40 PM
73 points
10 comments6 min readLW link1 review
(danielfilan.com)

Talk: AI safety field­build­ing at MATS

Ryan KiddJun 23, 2024, 11:06 PM
26 points
2 comments10 min readLW link

Don’t Share In­for­ma­tion Exfo­haz­ardous on Others’ AI-Risk Models

Thane RuthenisDec 19, 2023, 8:09 PM
68 points
11 comments1 min readLW link

Most Peo­ple Start With The Same Few Bad Ideas

johnswentworthSep 9, 2022, 12:29 AM
165 points
30 comments3 min readLW link

The Im­por­tance of AI Align­ment, ex­plained in 5 points

Daniel_EthFeb 11, 2023, 2:56 AM
33 points
2 comments1 min readLW link

Qual­ities that al­ign­ment men­tors value in ju­nior researchers

Orpheus16Feb 14, 2023, 11:27 PM
88 points
14 comments3 min readLW link

Mid­dle Child Phenomenon

PhilosophicalSoulMar 15, 2024, 8:47 PM
3 points
3 comments2 min readLW link

Prob­lems of peo­ple new to AI safety and my pro­ject ideas to miti­gate them

Igor IvanovMar 1, 2023, 9:09 AM
38 points
4 comments7 min readLW link

aisafety.com­mu­nity—A liv­ing doc­u­ment of AI safety communities

Oct 28, 2022, 5:50 PM
58 points
23 comments1 min readLW link

Ap­ply to the Red­wood Re­search Mechanis­tic In­ter­pretabil­ity Ex­per­i­ment (REMIX), a re­search pro­gram in Berkeley

Oct 27, 2022, 1:32 AM
135 points
14 comments12 min readLW link

Tran­scripts of in­ter­views with AI researchers

Vael GatesMay 9, 2022, 5:57 AM
170 points
9 comments2 min readLW link

De­mys­tify­ing “Align­ment” through a Comic

milanroskoJun 9, 2024, 8:24 AM
106 points
19 comments1 min readLW link

MATS Spring 2024 Ex­ten­sion Retrospective

Feb 12, 2025, 10:43 PM
24 points
1 comment15 min readLW link

[Question] What are all the AI Align­ment and AI Safety Com­mu­ni­ca­tion Hubs?

Gunnar_ZarnckeJun 15, 2022, 4:16 PM
27 points
5 comments1 min readLW link

AI Safety Un­con­fer­ence NeurIPS 2022

OrpheusNov 7, 2022, 3:39 PM
25 points
0 comments1 min readLW link
(aisafetyevents.org)

AI Safety Europe Re­treat 2023 Retrospective

Magdalena WacheApr 14, 2023, 9:05 AM
43 points
0 comments2 min readLW link

AI Safety Ar­gu­ments: An In­ter­ac­tive Guide

Lukas TrötzmüllerFeb 1, 2023, 7:26 PM
20 points
0 comments3 min readLW link

Why I funded PIBBSS

Ryan KiddSep 15, 2024, 7:56 PM
115 points
21 comments3 min readLW link

How to Diver­sify Con­cep­tual Align­ment: the Model Be­hind Refine

adamShimiJul 20, 2022, 10:44 AM
87 points
11 comments8 min readLW link

A new­comer’s guide to the tech­ni­cal AI safety field

zeshenNov 4, 2022, 2:29 PM
42 points
3 comments10 min readLW link

[Question] Work­shop (hackathon, res­i­dence pro­gram, etc.) about for-profit AI Safety pro­jects?

Roman LeventovJan 26, 2024, 9:49 AM
21 points
5 comments1 min readLW link

Ad­vice for new al­ign­ment peo­ple: Info Max

Jonas HallgrenMay 30, 2023, 3:42 PM
23 points
4 comments5 min readLW link

The Align­ment Com­mu­nity Is Cul­turally Broken

sudoNov 13, 2022, 6:53 PM
136 points
68 comments2 min readLW link

Pro­ject Idea: Challenge Groups for Align­ment Researchers

Adam ZernerMay 27, 2023, 8:10 PM
13 points
0 comments1 min readLW link

What I Learned Run­ning Refine

adamShimiNov 24, 2022, 2:49 PM
108 points
5 comments4 min readLW link

The AI Safety com­mu­nity has four main work groups, Strat­egy, Gover­nance, Tech­ni­cal and Move­ment Building

peterslatteryNov 25, 2022, 3:45 AM
1 point
0 comments6 min readLW link

Assess­ment of AI safety agen­das: think about the down­side risk

Roman LeventovDec 19, 2023, 9:00 AM
13 points
1 comment1 min readLW link

[Job ad] LISA CEO

Feb 9, 2025, 12:18 AM
18 points
4 comments2 min readLW link

AISafety.info “How can I help?” FAQ

Jun 5, 2023, 10:09 PM
59 points
0 comments2 min readLW link

[Question] Does any­one’s full-time job in­clude read­ing and un­der­stand­ing all the most-promis­ing for­mal AI al­ign­ment work?

Nicholas / Heather KrossJun 16, 2023, 2:24 AM
15 points
2 comments1 min readLW link

Reflec­tions on the PIBBSS Fel­low­ship 2022

Dec 11, 2022, 9:53 PM
32 points
0 comments18 min readLW link

Good News, Every­one!

jbashMar 25, 2023, 1:48 PM
132 points
23 comments2 min readLW link

In­tro­duc­ing EffiS­ciences’ AI Safety Unit

Jun 30, 2023, 7:44 AM
68 points
0 comments12 min readLW link

AI Safety Move­ment Builders should help the com­mu­nity to op­ti­mise three fac­tors: con­trib­u­tors, con­tri­bu­tions and coordination

peterslatteryDec 15, 2022, 10:50 PM
4 points
0 comments6 min readLW link

Cost-effec­tive­ness of pro­fes­sional field-build­ing pro­grams for AI safety research

Dan HJul 10, 2023, 6:28 PM
8 points
5 comments1 min readLW link

Cam­paign for AI Safety: Please join me

Nik SamoylovApr 1, 2023, 9:32 AM
18 points
9 comments1 min readLW link

Con­crete Steps to Get Started in Trans­former Mechanis­tic Interpretability

Neel NandaDec 25, 2022, 10:21 PM
57 points
7 comments12 min readLW link
(www.neelnanda.io)

An overview of some promis­ing work by ju­nior al­ign­ment researchers

Orpheus16Dec 26, 2022, 5:23 PM
34 points
0 comments4 min readLW link

Align­ment Me­gapro­jects: You’re Not Even Try­ing to Have Ideas

Nicholas / Heather KrossJul 12, 2023, 11:39 PM
55 points
32 comments2 min readLW link

Reflec­tions on my 5-month al­ign­ment up­skil­ling grant

Jay BaileyDec 27, 2022, 10:51 AM
82 points
4 comments8 min readLW link

2022 AI Align­ment Course: 5→37% work­ing on AI safety

DewiJun 21, 2024, 5:45 PM
7 points
3 comments3 min readLW link

[Question] Can AI Align­ment please cre­ate a Red­dit-like plat­form that would make it much eas­ier for al­ign­ment re­searchers to find and help each other?

Georgeo57Jul 21, 2023, 2:03 PM
−5 points
2 comments1 min readLW link

How to find AI al­ign­ment re­searchers to col­lab­o­rate with?

Florian DietzJul 31, 2023, 9:05 AM
2 points
2 comments1 min readLW link

AI Safety Hub Ser­bia Soft Launch

DusanDNesicOct 20, 2023, 7:11 AM
64 points
1 comment3 min readLW link
(forum.effectivealtruism.org)

When dis­cussing AI risks, talk about ca­pa­bil­ities, not intelligence

VikaAug 11, 2023, 1:38 PM
124 points
7 comments3 min readLW link
(vkrakovna.wordpress.com)

AGISF adap­ta­tion for in-per­son groups

Jan 13, 2023, 3:24 AM
44 points
2 comments3 min readLW link

AISafety.world is a map of the AIS ecosystem

Hamish DoodlesApr 6, 2023, 6:37 PM
80 points
0 comments1 min readLW link

[Question] In­cen­tives af­fect­ing al­ign­ment-re­searcher encouragement

Nicholas / Heather KrossAug 29, 2023, 5:11 AM
28 points
3 comments1 min readLW link

How many peo­ple are work­ing (di­rectly) on re­duc­ing ex­is­ten­tial risk from AI?

Benjamin HiltonJan 18, 2023, 8:46 AM
20 points
1 comment1 min readLW link

In­for­ma­tion war­fare his­tor­i­cally re­volved around hu­man conduits

trevorAug 28, 2023, 6:54 PM
37 points
7 comments3 min readLW link

AGI safety field build­ing pro­jects I’d like to see

Severin T. SeehrichJan 19, 2023, 10:40 PM
68 points
28 comments9 min readLW link

Aus­tralian AI Safety Fo­rum 2024

Sep 27, 2024, 12:40 AM
42 points
0 comments2 min readLW link

All images from the WaitButWhy se­quence on AI

trevorApr 8, 2023, 7:36 AM
73 points
5 comments2 min readLW link

You should con­sider ap­ply­ing to PhDs (soon!)

bilalchughtaiNov 29, 2024, 8:33 PM
114 points
19 comments6 min readLW link

MATS Alumni Im­pact Analysis

Sep 30, 2024, 2:35 AM
61 points
7 comments11 min readLW link

AI Safety Univer­sity Or­ga­niz­ing: Early Take­aways from Thir­teen Groups

agucovaOct 2, 2024, 3:14 PM
26 points
0 comments1 min readLW link

MATS is hiring!

Apr 8, 2025, 8:45 PM
8 points
0 comments6 min readLW link

[Question] If there was a mil­len­nium equiv­a­lent prize for AI al­ign­ment, what would the prob­lems be?

Yair HalberstadtJun 9, 2022, 4:56 PM
17 points
4 comments1 min readLW link

If no near-term al­ign­ment strat­egy, re­search should aim for the long-term

harsimonyJun 9, 2022, 7:10 PM
7 points
1 comment1 min readLW link

Are AI de­vel­op­ers play­ing with fire?

marcusarvanMar 16, 2023, 7:12 PM
6 points
0 comments10 min readLW link

How Can Aver­age Peo­ple Con­tribute to AI Safety?

Stephen McAleeseMar 6, 2025, 10:50 PM
16 points
4 comments8 min readLW link

How Josiah be­came an AI safety researcher

Neil CrawfordSep 6, 2022, 5:17 PM
4 points
0 comments1 min readLW link

Ap­ply to be a TA for TARA

yanni kyriacosDec 20, 2024, 2:25 AM
10 points
0 comments1 min readLW link

In­tro­duc­ing 11 New AI Safety Or­ga­ni­za­tions—Cat­alyze’s Win­ter 24/​25 Lon­don In­cu­ba­tion Pro­gram Cohort

Alexandra BosMar 10, 2025, 7:26 PM
70 points
0 comments1 min readLW link

80,000 hours should re­move OpenAI from the Job Board (and similar EA orgs should do similarly)

RaemonJul 3, 2024, 8:34 PM
274 points
71 comments1 min readLW link

Sur­vey for al­ign­ment re­searchers!

Feb 2, 2024, 8:41 PM
71 points
11 comments1 min readLW link

Meri­dian Cam­bridge Visit­ing Re­searcher Pro­gramme: Turn AI safety ideas into funded pro­jects in one week!

Meridian CambridgeMar 11, 2025, 5:46 PM
13 points
0 comments2 min readLW link

AI safety uni­ver­sity groups: a promis­ing op­por­tu­nity to re­duce ex­is­ten­tial risk

micJul 1, 2022, 3:59 AM
14 points
0 comments11 min readLW link

MATS men­tor selection

Jan 10, 2025, 3:12 AM
44 points
11 comments6 min readLW link

Tokyo AI Safety 2025: Call For Papers

BlaineOct 21, 2024, 8:43 AM
24 points
0 comments3 min readLW link
(www.tais2025.cc)

Four vi­sions of Trans­for­ma­tive AI success

Steven ByrnesJan 17, 2024, 8:45 PM
112 points
22 comments15 min readLW link

AI al­ign­ment as “nav­i­gat­ing the space of in­tel­li­gent be­havi­our”

Nora_AmmannAug 23, 2022, 1:28 PM
18 points
0 comments6 min readLW link

AI Safety Strate­gies Landscape

Charbel-RaphaëlMay 9, 2024, 5:33 PM
34 points
1 comment42 min readLW link

[Question] Help me find a good Hackathon sub­ject

Charbel-RaphaëlSep 4, 2022, 8:40 AM
6 points
18 comments1 min readLW link

Paper: Field-build­ing and the epistemic cul­ture of AI safety

peterslatteryMar 15, 2025, 12:30 PM
13 points
3 comments3 min readLW link
(firstmonday.org)

Re­view of Align­ment Plan Cri­tiques- De­cem­ber AI-Plans Cri­tique-a-Thon Re­sults

IknownothingJan 15, 2024, 7:37 PM
24 points
0 comments25 min readLW link
(aiplans.substack.com)

[An email with a bunch of links I sent an ex­pe­rienced ML re­searcher in­ter­ested in learn­ing about Align­ment /​ x-safety.]

David Scott Krueger (formerly: capybaralet)Sep 8, 2022, 10:28 PM
47 points
1 comment5 min readLW link

Ap­ply to MATS 8.0!

Mar 20, 2025, 2:17 AM
63 points
5 comments4 min readLW link

An­nounc­ing AISIC 2022 - the AI Safety Is­rael Con­fer­ence, Oc­to­ber 19-20

DavidmanheimSep 21, 2022, 7:32 PM
13 points
0 comments1 min readLW link

AGI safety ca­reer advice

Richard_NgoMay 2, 2023, 7:36 AM
132 points
24 comments13 min readLW link

Les­sons learned from talk­ing to >100 aca­demics about AI safety

Marius HobbhahnOct 10, 2022, 1:16 PM
216 points
18 comments12 min readLW link1 review

2025 Q1 Pivotal Re­search Fel­low­ship (Tech­ni­cal & Policy)

Nov 12, 2024, 10:56 AM
7 points
0 comments2 min readLW link

The Vi­talik Bu­terin Fel­low­ship in AI Ex­is­ten­tial Safety is open for ap­pli­ca­tions!

Xin Chen, CynthiaOct 13, 2022, 6:32 PM
21 points
0 comments1 min readLW link

AI Safety in China: Part 2

Lao MeinMay 22, 2023, 2:50 PM
101 points
28 comments2 min readLW link

There Should Be More Align­ment-Driven Startups

May 31, 2024, 2:05 AM
62 points
14 comments11 min readLW link

AI Safety Needs Great Product Builders

goodgravyNov 2, 2022, 11:33 AM
14 points
2 comments1 min readLW link

An­nounc­ing Athena—Women in AI Align­ment Research

Claire ShortNov 7, 2023, 9:46 PM
80 points
2 comments3 min readLW link

Into AI Safety Epi­sodes 1 & 2

jacobhaimesNov 9, 2023, 4:36 AM
2 points
0 comments1 min readLW link
(into-ai-safety.github.io)

The So­cial Align­ment Problem

irvingApr 28, 2023, 2:16 PM
99 points
13 comments8 min readLW link

[Question] AI Safety orgs- what’s your biggest bot­tle­neck right now?

Kabir KumarNov 16, 2023, 2:02 AM
1 point
0 comments1 min readLW link

1. A Sense of Fair­ness: De­con­fus­ing Ethics

RogerDearnaleyNov 17, 2023, 8:55 PM
16 points
8 comments15 min readLW link

4. A Mo­ral Case for Evolved-Sapi­ence-Chau­vinism

RogerDearnaleyNov 24, 2023, 4:56 AM
10 points
0 comments4 min readLW link

3. Uploading

RogerDearnaleyNov 23, 2023, 7:39 AM
21 points
5 comments8 min readLW link

2. AIs as Eco­nomic Agents

RogerDearnaleyNov 23, 2023, 7:07 AM
9 points
2 comments6 min readLW link

[SEE NEW EDITS] No, *You* Need to Write Clearer

Nicholas / Heather KrossApr 29, 2023, 5:04 AM
262 points
65 comments5 min readLW link
(www.thinkingmuchbetter.com)

AI Mo­ral Align­ment: The Most Im­por­tant Goal of Our Generation

Ronen BarMar 27, 2025, 6:04 PM
2 points
0 comments8 min readLW link
(forum.effectivealtruism.org)

Ap­pen­dices to the live agendas

Nov 27, 2023, 11:10 AM
16 points
4 comments1 min readLW link

MATS Sum­mer 2023 Retrospective

Dec 1, 2023, 11:29 PM
77 points
34 comments26 min readLW link

What’s new at FAR AI

Dec 4, 2023, 9:18 PM
41 points
0 comments5 min readLW link
(far.ai)

How I learned to stop wor­ry­ing and love skill trees

junk heap homotopyMay 23, 2023, 4:08 AM
81 points
3 comments1 min readLW link

AI Safety Papers: An App for the TAI Safety Database

ozziegooenAug 21, 2021, 2:02 AM
81 points
13 comments2 min readLW link

Wikipe­dia as an in­tro­duc­tion to the al­ign­ment problem

SoerenMindMay 29, 2023, 6:43 PM
83 points
10 comments1 min readLW link
(en.wikipedia.org)

Terry Tao is host­ing an “AI to As­sist Math­e­mat­i­cal Rea­son­ing” workshop

junk heap homotopyJun 3, 2023, 1:19 AM
12 points
1 comment1 min readLW link
(terrytao.wordpress.com)

An overview of the points system

IknownothingJun 27, 2023, 9:09 AM
3 points
4 comments1 min readLW link
(ai-plans.com)

Brief sum­mary of ai-plans.com

IknownothingJun 28, 2023, 12:33 AM
9 points
4 comments2 min readLW link
(ai-plans.com)

What is ev­ery­one do­ing in AI governance

Igor IvanovJul 8, 2023, 3:16 PM
11 points
0 comments5 min readLW link

Even briefer sum­mary of ai-plans.com

IknownothingJul 16, 2023, 11:25 PM
10 points
6 comments2 min readLW link
(www.ai-plans.com)

Su­per­vised Pro­gram for Align­ment Re­search (SPAR) at UC Berkeley: Spring 2023 summary

Aug 19, 2023, 2:27 AM
23 points
2 comments6 min readLW link

Look­ing for judges for cri­tiques of Align­ment Plans

IknownothingAug 17, 2023, 10:35 PM
6 points
0 comments1 min readLW link

Refram­ing AI Safety Through the Lens of Iden­tity Main­te­nance Framework

Hiroshi YamakawaApr 1, 2025, 6:16 AM
−7 points
1 comment17 min readLW link

Be­come a PIBBSS Re­search Affiliate

Oct 10, 2023, 7:41 AM
24 points
6 comments6 min readLW link

ARENA 2.0 - Im­pact Report

CallumMcDougallSep 26, 2023, 5:13 PM
35 points
5 comments13 min readLW link

Cat­a­lyst books

CatneeSep 17, 2023, 5:05 PM
7 points
2 comments1 min readLW link

Doc­u­ment­ing Jour­ney Into AI Safety

jacobhaimesOct 10, 2023, 6:30 PM
17 points
4 comments6 min readLW link

Ap­ply for MATS Win­ter 2023-24!

Oct 21, 2023, 2:27 AM
104 points
6 comments5 min readLW link

Into AI Safety—Epi­sode 0

jacobhaimesOct 22, 2023, 3:30 AM
5 points
1 comment1 min readLW link
(into-ai-safety.github.io)

Re­sources I send to AI re­searchers about AI safety

Vael GatesJun 14, 2022, 2:24 AM
69 points
12 comments1 min readLW link

Slow mo­tion videos as AI risk in­tu­ition pumps

Andrew_CritchJun 14, 2022, 7:31 PM
241 points
41 comments2 min readLW link1 review

Slide deck: In­tro­duc­tion to AI Safety

Aryeh EnglanderJan 29, 2020, 3:57 PM
24 points
0 comments1 min readLW link
(drive.google.com)

On pre­sent­ing the case for AI risk

Aryeh EnglanderMar 9, 2022, 1:41 AM
54 points
17 comments4 min readLW link

A Quick List of Some Prob­lems in AI Align­ment As A Field

Nicholas / Heather KrossJun 21, 2022, 11:23 PM
75 points
12 comments6 min readLW link
(www.thinkingmuchbetter.com)

[LQ] Some Thoughts on Mes­sag­ing Around AI Risk

DragonGodJun 25, 2022, 1:53 PM
5 points
3 comments6 min readLW link

Refram­ing the AI Risk

Thane RuthenisJul 1, 2022, 6:44 PM
26 points
7 comments6 min readLW link

The Tree of Life: Stan­ford AI Align­ment The­ory of Change

Gabe MJul 2, 2022, 6:36 PM
25 points
0 comments14 min readLW link

Prin­ci­ples for Align­ment/​Agency Projects

johnswentworthJul 7, 2022, 2:07 AM
122 points
20 comments4 min readLW link

Re­shap­ing the AI Industry

Thane RuthenisMay 29, 2022, 10:54 PM
147 points
35 comments21 min readLW link

Prin­ci­ples of Pri­vacy for Align­ment Research

johnswentworthJul 27, 2022, 7:53 PM
73 points
31 comments7 min readLW link

An­nounc­ing the AI Safety Field Build­ing Hub, a new effort to provide AISFB pro­jects, men­tor­ship, and funding

Vael GatesJul 28, 2022, 9:29 PM
49 points
3 comments6 min readLW link

(My un­der­stand­ing of) What Every­one in Tech­ni­cal Align­ment is Do­ing and Why

Aug 29, 2022, 1:23 AM
413 points
90 comments37 min readLW link1 review

Com­mu­nity Build­ing for Grad­u­ate Stu­dents: A Tar­geted Approach

Neil CrawfordSep 6, 2022, 5:17 PM
6 points
0 comments4 min readLW link

[Question] How can we se­cure more re­search po­si­tions at our uni­ver­si­ties for x-risk re­searchers?

Neil CrawfordSep 6, 2022, 5:17 PM
11 points
0 comments1 min readLW link

AI Safety field-build­ing pro­jects I’d like to see

Orpheus16Sep 11, 2022, 11:43 PM
46 points
8 comments6 min readLW link

Gen­eral ad­vice for tran­si­tion­ing into The­o­ret­i­cal AI Safety

Martín SotoSep 15, 2022, 5:23 AM
12 points
0 comments10 min readLW link

Ap­ply for men­tor­ship in AI Safety field-building

Orpheus16Sep 17, 2022, 7:06 PM
9 points
0 comments1 min readLW link
(forum.effectivealtruism.org)

Align­ment Org Cheat Sheet

Sep 20, 2022, 5:36 PM
70 points
8 comments4 min readLW link

7 traps that (we think) new al­ign­ment re­searchers of­ten fall into

Sep 27, 2022, 11:13 PM
176 points
10 comments4 min readLW link

Re­sources that (I think) new al­ign­ment re­searchers should know about

Orpheus16Oct 28, 2022, 10:13 PM
70 points
9 comments4 min readLW link

[Question] Are al­ign­ment re­searchers de­vot­ing enough time to im­prov­ing their re­search ca­pac­ity?

Carson JonesNov 4, 2022, 12:58 AM
13 points
3 comments3 min readLW link

Cur­rent themes in mechanis­tic in­ter­pretabil­ity research

Nov 16, 2022, 2:14 PM
89 points
2 comments12 min readLW link

Prob­a­bly good pro­jects for the AI safety ecosystem

Ryan KiddDec 5, 2022, 2:26 AM
78 points
40 comments2 min readLW link

Anal­y­sis of AI Safety sur­veys for field-build­ing insights

Ash JafariDec 5, 2022, 7:21 PM
11 points
2 comments5 min readLW link

Fear miti­gated the nu­clear threat, can it do the same to AGI risks?

Igor IvanovDec 9, 2022, 10:04 AM
6 points
8 comments5 min readLW link

Ques­tions about AI that bother me

Eleni AngelouFeb 5, 2023, 5:04 AM
13 points
6 comments2 min readLW link

Ex­is­ten­tial AI Safety is NOT sep­a­rate from near-term applications

scasperDec 13, 2022, 2:47 PM
37 points
17 comments3 min readLW link

[Question] Best in­tro­duc­tory overviews of AGI safety?

JakubKDec 13, 2022, 7:01 PM
21 points
9 comments2 min readLW link
(forum.effectivealtruism.org)

truth.in­tegrity(): A Re­cur­sive Frame­work for Hal­lu­ci­na­tion Preven­tion and Alignment

brittneyluongApr 2, 2025, 5:52 PM
1 point
0 comments2 min readLW link

There have been 3 planes (billion­aire donors) and 2 have crashed

trevorDec 17, 2022, 3:58 AM
16 points
10 comments2 min readLW link

Why I think that teach­ing philos­o­phy is high impact

Eleni AngelouDec 19, 2022, 3:11 AM
5 points
0 comments2 min readLW link

Into AI Safety: Epi­sode 3

jacobhaimesDec 11, 2023, 4:30 PM
6 points
0 comments1 min readLW link
(into-ai-safety.github.io)

Air-gap­ping eval­u­a­tion and support

Ryan KiddDec 26, 2022, 10:52 PM
53 points
1 comment2 min readLW link

What AI Safety Ma­te­ri­als Do ML Re­searchers Find Com­pel­ling?

Dec 28, 2022, 2:03 AM
175 points
34 comments2 min readLW link

Thoughts On Ex­pand­ing the AI Safety Com­mu­nity: Benefits and Challenges of Outreach to Non-Tech­ni­cal Professionals

Yashvardhan SharmaJan 1, 2023, 7:21 PM
4 points
4 comments7 min readLW link

Align­ment, Anger, and Love: Prepar­ing for the Emer­gence of Su­per­in­tel­li­gent AI

tavurthJan 2, 2023, 6:16 AM
2 points
3 comments1 min readLW link

[Question] I have thou­sands of copies of HPMOR in Rus­sian. How to use them with the most im­pact?

Mikhail SaminJan 3, 2023, 10:21 AM
26 points
3 comments1 min readLW link

Look­ing for Span­ish AI Align­ment Researchers

AntbJan 7, 2023, 6:52 PM
7 points
3 comments1 min readLW link

The Align­ment Prob­lem from a Deep Learn­ing Per­spec­tive (ma­jor rewrite)

Jan 10, 2023, 4:06 PM
84 points
8 comments39 min readLW link
(arxiv.org)

An­nounc­ing aisafety.training

JJ HepburnJan 21, 2023, 1:01 AM
61 points
4 comments1 min readLW link

An­nounc­ing Cavendish Labs

Jan 19, 2023, 8:15 PM
59 points
5 comments2 min readLW link
(forum.effectivealtruism.org)

How Do We Pro­tect AI From Hu­mans?

Alex BeymanJan 22, 2023, 3:59 AM
−4 points
11 comments6 min readLW link

A Brief Overview of AI Safety/​Align­ment Orgs, Fields, Re­searchers, and Re­sources for ML Researchers

Austin WitteFeb 2, 2023, 1:02 AM
18 points
1 comment2 min readLW link

In­ter­views with 97 AI Re­searchers: Quan­ti­ta­tive Analysis

Feb 2, 2023, 1:01 AM
23 points
0 comments7 min readLW link

Pre­dict­ing re­searcher in­ter­est in AI alignment

Vael GatesFeb 2, 2023, 12:58 AM
25 points
0 comments1 min readLW link

“AI Risk Dis­cus­sions” web­site: Ex­plor­ing in­ter­views from 97 AI Researchers

Feb 2, 2023, 1:00 AM
43 points
1 comment1 min readLW link

Ret­ro­spec­tive on the AI Safety Field Build­ing Hub

Vael GatesFeb 2, 2023, 2:06 AM
30 points
0 comments1 min readLW link

You are prob­a­bly not a good al­ign­ment re­searcher, and other blatant lies

junk heap homotopyFeb 2, 2023, 1:55 PM
83 points
16 comments2 min readLW link

AGI doesn’t need un­der­stand­ing, in­ten­tion, or con­scious­ness in or­der to kill us, only intelligence

James BlahaFeb 20, 2023, 12:55 AM
10 points
2 comments18 min readLW link

Aspiring AI safety re­searchers should ~argmax over AGI timelines

Ryan KiddMar 3, 2023, 2:04 AM
29 points
8 comments2 min readLW link

The hu­man­ity’s biggest mistake

RomanSMar 10, 2023, 4:30 PM
0 points
1 comment2 min readLW link

Ac­cu­rate Models of AI Risk Are Hyper­ex­is­ten­tial Exfohazards

Thane RuthenisDec 25, 2022, 4:50 PM
33 points
38 comments9 min readLW link

Some for-profit AI al­ign­ment org ideas

Eric HoDec 14, 2023, 2:23 PM
86 points
19 comments9 min readLW link

In­ter­view: Ap­pli­ca­tions w/​ Alice Rigg

jacobhaimesDec 19, 2023, 7:03 PM
12 points
0 comments1 min readLW link
(into-ai-safety.github.io)

Ci­cadas, An­thropic, and the bilat­eral al­ign­ment problem

kromemMay 22, 2024, 11:09 AM
28 points
6 comments5 min readLW link

AI Safety Chatbot

Dec 21, 2023, 2:06 PM
61 points
11 comments4 min readLW link

Ta­lent Needs of Tech­ni­cal AI Safety Teams

May 24, 2024, 12:36 AM
117 points
65 comments14 min readLW link

INTERVIEW: StakeOut.AI w/​ Dr. Peter Park

jacobhaimesMar 4, 2024, 4:35 PM
6 points
0 comments1 min readLW link
(into-ai-safety.github.io)

Strik­ing Im­pli­ca­tions for Learn­ing The­ory, In­ter­pretabil­ity — and Safety?

RogerDearnaleyJan 5, 2024, 8:46 AM
37 points
4 comments2 min readLW link

Hackathon and Stay­ing Up-to-Date in AI

jacobhaimesJan 8, 2024, 5:10 PM
11 points
0 comments1 min readLW link
(into-ai-safety.github.io)

Ap­ply to the 2024 PIBBSS Sum­mer Re­search Fellowship

Jan 12, 2024, 4:06 AM
39 points
1 comment2 min readLW link

So­cial me­dia al­ign­ment test

amayhewJan 16, 2024, 8:56 PM
1 point
0 comments1 min readLW link
(naiveskepticblog.wordpress.com)

This might be the last AI Safety Camp

Jan 24, 2024, 9:33 AM
196 points
34 comments1 min readLW link

Pro­posal for an AI Safety Prize

sweenesmJan 31, 2024, 6:35 PM
3 points
0 comments2 min readLW link

At­las: Stress-Test­ing ASI Value Learn­ing Through Grand Strat­egy Scenarios

NeilFoxFeb 17, 2025, 11:55 PM
1 point
0 comments2 min readLW link

[Question] Do you want to make an AI Align­ment song?

Kabir KumarFeb 9, 2024, 8:22 AM
4 points
0 comments1 min readLW link

Lay­ing the Foun­da­tions for Vi­sion and Mul­ti­modal Mechanis­tic In­ter­pretabil­ity & Open Problems

Mar 13, 2024, 5:09 PM
44 points
13 comments14 min readLW link

Offer­ing AI safety sup­port calls for ML professionals

Vael GatesFeb 15, 2024, 11:48 PM
61 points
1 comment1 min readLW link

No Click­bait—Misal­ign­ment Database

Kabir KumarFeb 18, 2024, 5:35 AM
6 points
10 comments1 min readLW link

A Nail in the Coffin of Exceptionalism

Yeshua GodMar 14, 2024, 10:41 PM
−17 points
0 comments3 min readLW link

Call for Ap­pli­ca­tions: XLab Sum­mer Re­search Fel­low­ship

JoNeedsSleepFeb 18, 2025, 7:19 PM
9 points
0 comments1 min readLW link

In­vi­ta­tion to the Prince­ton AI Align­ment and Safety Seminar

Sadhika MalladiMar 17, 2024, 1:10 AM
6 points
1 comment1 min readLW link

INTERVIEW: Round 2 - StakeOut.AI w/​ Dr. Peter Park

jacobhaimesMar 18, 2024, 9:21 PM
5 points
0 comments1 min readLW link
(into-ai-safety.github.io)

Pod­cast in­ter­view se­ries fea­tur­ing Dr. Peter Park

jacobhaimesMar 26, 2024, 12:25 AM
3 points
0 comments2 min readLW link
(into-ai-safety.github.io)

Un­der­grad AI Safety Conference

JoNeedsSleepFeb 19, 2025, 3:43 AM
18 points
0 comments1 min readLW link

CEA seeks co-founder for AI safety group sup­port spin-off

agucovaApr 8, 2024, 3:42 PM
18 points
0 comments1 min readLW link

Ap­ply to the Pivotal Re­search Fel­low­ship (AI Safety & Biose­cu­rity)

Apr 10, 2024, 12:08 PM
18 points
0 comments1 min readLW link

[Question] Bar­cod­ing LLM Train­ing Data Sub­sets. Any­one try­ing this for in­ter­pretabil­ity?

right..enough?Apr 13, 2024, 3:09 AM
7 points
0 comments7 min readLW link

My ex­pe­rience at ML4Good AI Safety Bootcamp

TheManxLoinerApr 13, 2024, 10:55 AM
21 points
1 comment5 min readLW link

An­nounc­ing SPAR Sum­mer 2024!

laurenmarie12Apr 16, 2024, 8:30 AM
30 points
2 comments1 min readLW link

Hu­man-AI Re­la­tion­al­ity is Already Here

bridgebotFeb 20, 2025, 7:08 AM
13 points
0 comments15 min readLW link

Align­ment Gaps

kcyrasJun 8, 2024, 3:23 PM
11 points
4 comments8 min readLW link

MATS Win­ter 2023-24 Retrospective

May 11, 2024, 12:09 AM
86 points
28 comments49 min readLW link

An­nounc­ing the AI Safety Sum­mit Talks with Yoshua Bengio

otto.bartenMay 14, 2024, 12:52 PM
9 points
1 comment1 min readLW link

Propos­ing the Post-Sin­gu­lar­ity Sym­biotic Researches

Hiroshi YamakawaJun 20, 2024, 4:05 AM
7 points
1 comment12 min readLW link

Make Su­per­in­tel­li­gence Loving

Davey MorseFeb 21, 2025, 6:07 AM
8 points
9 comments5 min readLW link

AI Align­ment Re­search Eng­ineer Ac­cel­er­a­tor (ARENA): Call for ap­pli­cants v4.0

Jul 6, 2024, 11:34 AM
57 points
7 comments6 min readLW link

AI Align­ment and the Quest for Ar­tifi­cial Wisdom

MyspyJul 12, 2024, 9:34 PM
1 point
0 comments13 min readLW link

The AI al­ign­ment prob­lem in so­cio-tech­ni­cal sys­tems from a com­pu­ta­tional per­spec­tive: A Top-Down-Top view and outlook

zhaoweizhangJul 15, 2024, 6:56 PM
3 points
0 comments9 min readLW link

A Bet­ter Hyper­sti­tion (for AI read­ers)

Yeshua GodJul 15, 2024, 7:35 PM
−20 points
0 comments119 min readLW link

As We May Align

Gilbert CDec 20, 2024, 7:02 PM
−1 points
0 comments6 min readLW link

Mo­ral­ity as Nash Equil­ibrium: A De­duc­tive Frame­work for Ethics

johndoe112Apr 10, 2025, 9:43 AM
1 point
0 comments3 min readLW link

2/​3 Aussie & NZ AI Safety folk of­ten or some­times feel lonely or dis­con­nected (and 16 other bar­ri­ers to im­pact)

yanni kyriacosAug 1, 2024, 1:15 AM
12 points
0 comments8 min readLW link

Ap­ply now to SPAR!

agucovaDec 19, 2024, 10:29 PM
11 points
0 comments1 min readLW link

AI al­ign­ment for men­tal health supports

hiki_tFeb 24, 2025, 4:21 AM
1 point
1 comment1 min readLW link

A Philo­soph­i­cal Ar­ti­fact: “Wit­ness­ing Without a Self” — A Dialogue Between Hu­man and AI

Eric RosenbergApr 11, 2025, 6:58 PM
1 point
0 comments1 min readLW link
(archive.org)

The Com­pute Co­nun­drum: AI Gover­nance in a Shift­ing Geopoli­ti­cal Era

octavoSep 28, 2024, 1:05 AM
−3 points
1 comment17 min readLW link

AGI Farm

Rahul ChandOct 1, 2024, 4:29 AM
1 point
0 comments8 min readLW link

[Question] If I have some money, whom should I donate it to in or­der to re­duce ex­pected P(doom) the most?

KvmanThinkingOct 3, 2024, 11:31 AM
35 points
37 comments1 min readLW link

AI Align­ment via Slow Sub­strates: Early Em­piri­cal Re­sults With StarCraft II

Lester LeongOct 14, 2024, 4:05 AM
60 points
9 comments12 min readLW link

The Field of AI Align­ment: A Post­mortem, and What To Do About It

johnswentworthDec 26, 2024, 6:48 PM
295 points
160 comments8 min readLW link

How I’d like al­ign­ment to get done (as of 2024-10-18)

TristanTrimOct 18, 2024, 11:39 PM
11 points
4 comments4 min readLW link

Mak­ing LLMs safer is more in­tu­itive than you think: How Com­mon Sense and Diver­sity Im­prove AI Align­ment

Jeba SaniaDec 29, 2024, 7:27 PM
−5 points
1 comment6 min readLW link

Map of AI Safety v2

Apr 15, 2025, 1:04 PM
57 points
4 comments1 min readLW link

Shal­low re­view of tech­ni­cal AI safety, 2024

Dec 29, 2024, 12:01 PM
185 points
34 comments41 min readLW link

Ap­ply to be a men­tor in SPAR!

agucovaNov 5, 2024, 9:32 PM
5 points
0 comments1 min readLW link

Break­ing down the MEAT of Alignment

JasonBrownApr 7, 2025, 8:47 AM
7 points
2 comments11 min readLW link

Col­lege tech­ni­cal AI safety hackathon ret­ro­spec­tive—Ge­or­gia Tech

yixNov 15, 2024, 12:22 AM
41 points
2 comments5 min readLW link
(open.substack.com)

A bet­ter “State­ment on AI Risk?”

Knight LeeNov 25, 2024, 4:50 AM
9 points
6 comments3 min readLW link

Should you in­crease AI al­ign­ment fund­ing, or in­crease AI reg­u­la­tion?

Knight LeeNov 26, 2024, 9:17 AM
7 points
1 comment4 min readLW link

Launch­ing Ap­pli­ca­tions for the Global AI Safety Fel­low­ship 2025!

Aditya_SKNov 30, 2024, 2:02 PM
11 points
5 comments1 min readLW link

GPT-4.5 is Cog­ni­tive Em­pa­thy, Son­net 3.5 is Affec­tive Empathy

JackApr 16, 2025, 7:12 PM
15 points
2 comments4 min readLW link

ARENA 4.0 Im­pact Report

Nov 27, 2024, 8:51 PM
43 points
3 comments13 min readLW link

A FRESH view of Alignment

robmanApr 16, 2025, 9:40 PM
1 point
0 comments1 min readLW link

Play­ing Dixit with AI: Can AI Sys­tems Iden­tify Misal­ign­ments in My Per­son­al­ized State­ments?

Mariia KoroliukJan 17, 2025, 6:52 PM
1 point
0 comments2 min readLW link

In­tro­duc­ing the An­thropic Fel­lows Program

Nov 30, 2024, 11:47 PM
26 points
0 comments4 min readLW link
(alignment.anthropic.com)

Top AI safety newslet­ters, books, pod­casts, etc – new AISafety.com resource

Mar 4, 2025, 5:01 PM
32 points
2 comments1 min readLW link

Mak­ing progress bars for Alignment

Kabir KumarJan 3, 2025, 9:25 PM
2 points
0 comments1 min readLW link
(lu.ma)

Build­ing Big Science from the Bot­tom-Up: A Frac­tal Ap­proach to AI Safety

Lauren GreenspanJan 7, 2025, 3:08 AM
37 points
2 comments12 min readLW link

Is Align­ment a flawed ap­proach?

Patrick BernardMar 11, 2025, 8:32 PM
1 point
0 comments3 min readLW link

Devel­op­ing AI Safety: Bridg­ing the Power-Ethics Gap (In­tro­duc­ing New Con­cepts)

Ronen BarApr 20, 2025, 4:40 AM
2 points
0 comments5 min readLW link
(forum.effectivealtruism.org)

14+ AI Safety Ad­vi­sors You Can Speak to – New AISafety.com Resource

Jan 21, 2025, 5:34 PM
24 points
0 comments1 min readLW link

A Safer Path to AGI? Con­sid­er­ing the Self-to-Pro­cess­ing Route as an Alter­na­tive to Pro­cess­ing-to-Self

opApr 21, 2025, 1:09 PM
1 point
0 comments1 min readLW link

An­nounce­ment: Learn­ing The­ory On­line Course

Jan 20, 2025, 7:55 PM
63 points
33 comments4 min readLW link

Con­struct­ing a Be­hav­iorally-Con­strained AI via Prompt Re­cur­sion: A Field Log from Within

SarenfieldlogApr 22, 2025, 6:59 AM
1 point
0 comments3 min readLW link

Train­ing Data At­tri­bu­tion (TDA): Ex­am­in­ing Its Adop­tion & Use Cases

Jan 22, 2025, 3:40 PM
16 points
0 comments3 min readLW link
(www.convergenceanalysis.org)

Un­der­stand­ing AI World Models w/​ Chris Canal

jacobhaimesJan 27, 2025, 4:32 PM
4 points
0 comments1 min readLW link
(kairos.fm)

Tether­ware #1: The case for hu­man­like AI with free will

Jáchym FibírJan 30, 2025, 10:58 AM
5 points
14 comments10 min readLW link
(tetherware.substack.com)

The Illu­sion of Trans­parency as a Trust-Build­ing Mechanism

Priyanka BharadwajMar 19, 2025, 5:09 PM
1 point
0 comments1 min readLW link

The AI Belief-Con­sis­tency Letter

Knight LeeApr 23, 2025, 12:01 PM
0 points
15 comments4 min readLW link

ARENA 5.0 - Call for Applicants

Jan 30, 2025, 1:18 PM
35 points
2 comments6 min readLW link

A Plu­ral­is­tic Frame­work for Rogue AI Containment

TheThinkingArboristMar 22, 2025, 12:54 PM
1 point
0 comments7 min readLW link

Re­cur­sive On­tolog­i­cal Pres­sure: Toward Sys­tems That Col­lapse In­stead of Lying

William StetarMar 24, 2025, 2:55 PM
1 point
0 comments1 min readLW link

“Should AI Ques­tion Its Own De­ci­sions? A Thought Ex­per­i­ment”

CMDR WOTZFeb 4, 2025, 8:39 AM
1 point
0 comments1 min readLW link

SERI MATS—Sum­mer 2023 Cohort

Apr 8, 2023, 3:32 PM
71 points
25 comments4 min readLW link

Cri­tiques of promi­nent AI safety labs: Red­wood Research

Omega.Apr 17, 2023, 6:20 PM
4 points
0 comments22 min readLW link
(forum.effectivealtruism.org)

AI Align­ment Re­search Eng­ineer Ac­cel­er­a­tor (ARENA): call for applicants

CallumMcDougallApr 17, 2023, 8:30 PM
100 points
9 comments7 min readLW link

[Linkpost] AI Align­ment, Ex­plained in 5 Points (up­dated)

Daniel_EthApr 18, 2023, 8:09 AM
10 points
0 comments1 min readLW link

Ap­ply to be­come a Fu­turekind AI Fa­cil­i­ta­tor or Men­tor (dead­line: April 10)

superbeneficiaryMar 26, 2025, 3:47 PM
4 points
0 comments1 min readLW link
(forum.effectivealtruism.org)

An open let­ter to SERI MATS pro­gram organisers

Roman LeventovApr 20, 2023, 4:34 PM
26 points
26 comments4 min readLW link

AI Align­ment: A Com­pre­hen­sive Survey

Stephen McAleerNov 1, 2023, 5:35 PM
20 points
1 comment1 min readLW link
(arxiv.org)

Tips, tricks, les­sons and thoughts on host­ing hackathons

gergogasparNov 6, 2023, 11:03 AM
3 points
0 comments11 min readLW link

How well does your re­search adress the the­ory-prac­tice gap?

Jonas HallgrenNov 8, 2023, 11:27 AM
18 points
0 comments10 min readLW link
No comments.