AI Alignment Fieldbuilding

Last edit: 15 Jun 2022 22:42 UTC by plex

AI Alignment Fieldbuilding is the effort to improve the alignment ecosystem. Priorities include introducing new people to the importance of AI risk, onboarding them by connecting them with key resources and ideas, educating them on the existing literature and on methods for generating valuable new research, supporting people who are already contributing, and maintaining and improving the funding systems.

There is an invite-only Slack for people working on the alignment ecosystem. If you’d like to join, message plex with an overview of your involvement.

The in­or­di­nately slow spread of good AGI con­ver­sa­tions in ML

Rob Bensinger21 Jun 2022 16:09 UTC
173 points
62 comments8 min readLW link

[Question] Papers to start get­ting into NLP-fo­cused al­ign­ment research

Feraidoon24 Sep 2022 23:53 UTC
6 points
0 comments1 min readLW link

ML Align­ment The­ory Pro­gram un­der Evan Hubinger

6 Dec 2021 0:03 UTC
82 points
3 comments2 min readLW link

Shal­low re­view of live agen­das in al­ign­ment & safety

27 Nov 2023 11:10 UTC
322 points
69 comments29 min readLW link

Take­aways from a sur­vey on AI al­ign­ment resources

DanielFilan5 Nov 2022 23:40 UTC
73 points
10 comments6 min readLW link1 review
(danielfilan.com)

Don’t Share In­for­ma­tion Exfo­haz­ardous on Others’ AI-Risk Models

Thane Ruthenis19 Dec 2023 20:09 UTC
67 points
11 comments1 min readLW link

Talk: AI safety field­build­ing at MATS

Ryan Kidd23 Jun 2024 23:06 UTC
26 points
2 comments10 min readLW link

AI Safety Ar­gu­ments: An In­ter­ac­tive Guide

Lukas Trötzmüller1 Feb 2023 19:26 UTC
20 points
0 comments3 min readLW link

aisafety.com­mu­nity—A liv­ing doc­u­ment of AI safety communities

28 Oct 2022 17:50 UTC
58 points
23 comments1 min readLW link

Ap­ply to the Red­wood Re­search Mechanis­tic In­ter­pretabil­ity Ex­per­i­ment (REMIX), a re­search pro­gram in Berkeley

27 Oct 2022 1:32 UTC
135 points
14 comments12 min readLW link

[Question] What are all the AI Align­ment and AI Safety Com­mu­ni­ca­tion Hubs?

Gunnar_Zarncke15 Jun 2022 16:16 UTC
27 points
5 comments1 min readLW link

Mid­dle Child Phenomenon

PhilosophicalSoul15 Mar 2024 20:47 UTC
3 points
3 comments2 min readLW link

so you think you’re not qual­ified to do tech­ni­cal al­ign­ment re­search?

Tamsin Leake7 Feb 2023 1:54 UTC
55 points
7 comments1 min readLW link
(carado.moe)

De­mys­tify­ing “Align­ment” through a Comic

milanrosko9 Jun 2024 8:24 UTC
106 points
19 comments1 min readLW link

The Im­por­tance of AI Align­ment, ex­plained in 5 points

Daniel_Eth11 Feb 2023 2:56 UTC
33 points
2 comments1 min readLW link

Qual­ities that al­ign­ment men­tors value in ju­nior researchers

Akash14 Feb 2023 23:27 UTC
88 points
14 comments3 min readLW link

Why I funded PIBBSS

Ryan Kidd15 Sep 2024 19:56 UTC
115 points
21 comments3 min readLW link

Prob­lems of peo­ple new to AI safety and my pro­ject ideas to miti­gate them

Igor Ivanov1 Mar 2023 9:09 UTC
38 points
4 comments7 min readLW link

AI Safety Un­con­fer­ence NeurIPS 2022

Orpheus7 Nov 2022 15:39 UTC
25 points
0 comments1 min readLW link
(aisafetyevents.org)

Most Peo­ple Start With The Same Few Bad Ideas

johnswentworth9 Sep 2022 0:29 UTC
164 points
30 comments3 min readLW link

How to Diver­sify Con­cep­tual Align­ment: the Model Be­hind Refine

adamShimi20 Jul 2022 10:44 UTC
87 points
11 comments8 min readLW link

Tran­scripts of in­ter­views with AI researchers

Vael Gates9 May 2022 5:57 UTC
170 points
9 comments2 min readLW link

AI Safety Europe Re­treat 2023 Retrospective

Magdalena Wache14 Apr 2023 9:05 UTC
43 points
0 comments2 min readLW link

The Align­ment Com­mu­nity Is Cul­turally Broken

sudo13 Nov 2022 18:53 UTC
136 points
68 comments2 min readLW link

What I Learned Run­ning Refine

adamShimi24 Nov 2022 14:49 UTC
108 points
5 comments4 min readLW link

The AI Safety com­mu­nity has four main work groups, Strat­egy, Gover­nance, Tech­ni­cal and Move­ment Building

peterslattery25 Nov 2022 3:45 UTC
1 point
0 comments6 min readLW link

Align­ment Me­gapro­jects: You’re Not Even Try­ing to Have Ideas

Nicholas / Heather Kross12 Jul 2023 23:39 UTC
55 points
30 comments2 min readLW link

Four vi­sions of Trans­for­ma­tive AI success

Steven Byrnes17 Jan 2024 20:45 UTC
112 points
22 comments15 min readLW link

[Question] Can AI Align­ment please cre­ate a Red­dit-like plat­form that would make it much eas­ier for al­ign­ment re­searchers to find and help each other?

Georgeo5721 Jul 2023 14:03 UTC
−5 points
2 comments1 min readLW link

How to find AI al­ign­ment re­searchers to col­lab­o­rate with?

Florian Dietz31 Jul 2023 9:05 UTC
2 points
2 comments1 min readLW link

Reflec­tions on the PIBBSS Fel­low­ship 2022

11 Dec 2022 21:53 UTC
32 points
0 comments18 min readLW link

AI Safety Hub Ser­bia Soft Launch

DusanDNesic20 Oct 2023 7:11 UTC
65 points
1 comment3 min readLW link
(forum.effectivealtruism.org)

When dis­cussing AI risks, talk about ca­pa­bil­ities, not intelligence

Vika11 Aug 2023 13:38 UTC
116 points
7 comments3 min readLW link
(vkrakovna.wordpress.com)

AI Safety Move­ment Builders should help the com­mu­nity to op­ti­mise three fac­tors: con­trib­u­tors, con­tri­bu­tions and coordination

peterslattery15 Dec 2022 22:50 UTC
4 points
0 comments6 min readLW link

Re­view of Align­ment Plan Cri­tiques- De­cem­ber AI-Plans Cri­tique-a-Thon Re­sults

Iknownothing15 Jan 2024 19:37 UTC
24 points
0 comments25 min readLW link
(aiplans.substack.com)

[Question] In­cen­tives af­fect­ing al­ign­ment-re­searcher encouragement

Nicholas / Heather Kross29 Aug 2023 5:11 UTC
28 points
3 comments1 min readLW link

Con­crete Steps to Get Started in Trans­former Mechanis­tic Interpretability

Neel Nanda25 Dec 2022 22:21 UTC
56 points
7 comments12 min readLW link
(www.neelnanda.io)

An overview of some promis­ing work by ju­nior al­ign­ment researchers

Akash26 Dec 2022 17:23 UTC
34 points
0 comments4 min readLW link

In­for­ma­tion war­fare his­tor­i­cally re­volved around hu­man conduits

trevor28 Aug 2023 18:54 UTC
37 points
7 comments3 min readLW link

Reflec­tions on my 5-month al­ign­ment up­skil­ling grant

Jay Bailey27 Dec 2022 10:51 UTC
82 points
4 comments8 min readLW link

AI Safety Strate­gies Landscape

Charbel-Raphaël9 May 2024 17:33 UTC
34 points
1 comment42 min readLW link

Aus­tralian AI Safety Fo­rum 2024

27 Sep 2024 0:40 UTC
42 points
0 comments2 min readLW link

MATS Alumni Im­pact Analysis

30 Sep 2024 2:35 UTC
61 points
6 comments11 min readLW link

AI Safety Univer­sity Or­ga­niz­ing: Early Take­aways from Thir­teen Groups

agucova2 Oct 2024 15:14 UTC
24 points
0 comments1 min readLW link

AGISF adap­ta­tion for in-per­son groups

13 Jan 2023 3:24 UTC
44 points
2 comments3 min readLW link

[Question] Work­shop (hackathon, res­i­dence pro­gram, etc.) about for-profit AI Safety pro­jects?

Roman Leventov26 Jan 2024 9:49 UTC
21 points
5 comments1 min readLW link

[Question] If there was a mil­len­nium equiv­a­lent prize for AI al­ign­ment, what would the prob­lems be?

Yair Halberstadt9 Jun 2022 16:56 UTC
17 points
4 comments1 min readLW link

How many peo­ple are work­ing (di­rectly) on re­duc­ing ex­is­ten­tial risk from AI?

Benjamin Hilton18 Jan 2023 8:46 UTC
20 points
1 comment1 min readLW link

If no near-term al­ign­ment strat­egy, re­search should aim for the long-term

harsimony9 Jun 2022 19:10 UTC
7 points
1 comment1 min readLW link

AGI safety field build­ing pro­jects I’d like to see

Severin T. Seehrich19 Jan 2023 22:40 UTC
68 points
28 comments9 min readLW link

There Should Be More Align­ment-Driven Startups

31 May 2024 2:05 UTC
60 points
14 comments11 min readLW link

Assess­ment of AI safety agen­das: think about the down­side risk

Roman Leventov19 Dec 2023 9:00 UTC
13 points
1 comment1 min readLW link

Tokyo AI Safety 2025: Call For Papers

Blaine21 Oct 2024 8:43 UTC
24 points
0 comments3 min readLW link
(www.tais2025.cc)

2022 AI Align­ment Course: 5→37% work­ing on AI safety

Dewi21 Jun 2024 17:45 UTC
7 points
3 comments3 min readLW link

Many im­por­tant tech­nolo­gies start out as sci­ence fic­tion be­fore be­com­ing real

trevor10 Feb 2023 9:36 UTC
28 points
2 comments2 min readLW link

2025 Q1 Pivotal Re­search Fel­low­ship (Tech­ni­cal & Policy)

12 Nov 2024 10:56 UTC
6 points
0 comments2 min readLW link

Good News, Every­one!

jbash25 Mar 2023 13:48 UTC
131 points
23 comments2 min readLW link

AI safety uni­ver­sity groups: a promis­ing op­por­tu­nity to re­duce ex­is­ten­tial risk

mic1 Jul 2022 3:59 UTC
14 points
0 comments11 min readLW link

Are AI de­vel­op­ers play­ing with fire?

marcusarvan16 Mar 2023 19:12 UTC
6 points
0 comments10 min readLW link

How Josiah be­came an AI safety researcher

Neil Crawford6 Sep 2022 17:17 UTC
4 points
0 comments1 min readLW link

Cam­paign for AI Safety: Please join me

Nik Samoylov1 Apr 2023 9:32 UTC
18 points
9 comments1 min readLW link

80,000 hours should re­move OpenAI from the Job Board (and similar EA orgs should do similarly)

Raemon3 Jul 2024 20:34 UTC
272 points
71 comments1 min readLW link

AGI safety ca­reer advice

Richard_Ngo2 May 2023 7:36 UTC
132 points
24 comments13 min readLW link

AI al­ign­ment as “nav­i­gat­ing the space of in­tel­li­gent be­havi­our”

Nora_Ammann23 Aug 2022 13:28 UTC
18 points
0 comments6 min readLW link

AI Safety in China: Part 2

Lao Mein22 May 2023 14:50 UTC
95 points
28 comments2 min readLW link

[Question] Help me find a good Hackathon sub­ject

Charbel-Raphaël4 Sep 2022 8:40 UTC
6 points
18 comments1 min readLW link

AISafety.world is a map of the AIS ecosystem

Hamish Doodles6 Apr 2023 18:37 UTC
79 points
0 comments1 min readLW link

[An email with a bunch of links I sent an ex­pe­rienced ML re­searcher in­ter­ested in learn­ing about Align­ment /​ x-safety.]

David Scott Krueger (formerly: capybaralet)8 Sep 2022 22:28 UTC
47 points
1 comment5 min readLW link

Sur­vey for al­ign­ment re­searchers!

2 Feb 2024 20:41 UTC
71 points
11 comments1 min readLW link

Ad­vice for new al­ign­ment peo­ple: Info Max

Jonas Hallgren30 May 2023 15:42 UTC
27 points
4 comments5 min readLW link

Pro­ject Idea: Challenge Groups for Align­ment Researchers

Adam Zerner27 May 2023 20:10 UTC
13 points
0 comments1 min readLW link

All images from the WaitButWhy se­quence on AI

trevor8 Apr 2023 7:36 UTC
73 points
5 comments2 min readLW link

An­nounc­ing AISIC 2022 - the AI Safety Is­rael Con­fer­ence, Oc­to­ber 19-20

Davidmanheim21 Sep 2022 19:32 UTC
13 points
0 comments1 min readLW link

AISafety.info “How can I help?” FAQ

5 Jun 2023 22:09 UTC
59 points
0 comments2 min readLW link

Les­sons learned from talk­ing to >100 aca­demics about AI safety

Marius Hobbhahn10 Oct 2022 13:16 UTC
216 points
18 comments12 min readLW link1 review

[Question] Does any­one’s full-time job in­clude read­ing and un­der­stand­ing all the most-promis­ing for­mal AI al­ign­ment work?

Nicholas / Heather Kross16 Jun 2023 2:24 UTC
15 points
2 comments1 min readLW link

The Vi­talik Bu­terin Fel­low­ship in AI Ex­is­ten­tial Safety is open for ap­pli­ca­tions!

Cynthia Chen13 Oct 2022 18:32 UTC
21 points
0 comments1 min readLW link

AI Safety Needs Great Product Builders

goodgravy2 Nov 2022 11:33 UTC
14 points
2 comments1 min readLW link

In­tro­duc­ing EffiS­ciences’ AI Safety Unit

30 Jun 2023 7:44 UTC
68 points
0 comments12 min readLW link

A new­comer’s guide to the tech­ni­cal AI safety field

zeshen4 Nov 2022 14:29 UTC
42 points
3 comments10 min readLW link

Cost-effec­tive­ness of pro­fes­sional field-build­ing pro­grams for AI safety research

Dan H10 Jul 2023 18:28 UTC
8 points
5 comments1 min readLW link

On pre­sent­ing the case for AI risk

Aryeh Englander9 Mar 2022 1:41 UTC
54 points
17 comments4 min readLW link

A Quick List of Some Prob­lems in AI Align­ment As A Field

Nicholas / Heather Kross21 Jun 2022 23:23 UTC
75 points
12 comments6 min readLW link
(www.thinkingmuchbetter.com)

[LQ] Some Thoughts on Mes­sag­ing Around AI Risk

DragonGod25 Jun 2022 13:53 UTC
5 points
3 comments6 min readLW link

Refram­ing the AI Risk

Thane Ruthenis1 Jul 2022 18:44 UTC
26 points
7 comments6 min readLW link

The Tree of Life: Stan­ford AI Align­ment The­ory of Change

Gabe M2 Jul 2022 18:36 UTC
25 points
0 comments14 min readLW link

Prin­ci­ples for Align­ment/​Agency Projects

johnswentworth7 Jul 2022 2:07 UTC
122 points
20 comments4 min readLW link

Re­shap­ing the AI Industry

Thane Ruthenis29 May 2022 22:54 UTC
147 points
35 comments21 min readLW link

Prin­ci­ples of Pri­vacy for Align­ment Research

johnswentworth27 Jul 2022 19:53 UTC
72 points
31 comments7 min readLW link

An­nounc­ing the AI Safety Field Build­ing Hub, a new effort to provide AISFB pro­jects, men­tor­ship, and funding

Vael Gates28 Jul 2022 21:29 UTC
49 points
3 comments6 min readLW link

(My un­der­stand­ing of) What Every­one in Tech­ni­cal Align­ment is Do­ing and Why

29 Aug 2022 1:23 UTC
413 points
90 comments37 min readLW link1 review

Com­mu­nity Build­ing for Grad­u­ate Stu­dents: A Tar­geted Approach

Neil Crawford6 Sep 2022 17:17 UTC
6 points
0 comments4 min readLW link

[Question] How can we se­cure more re­search po­si­tions at our uni­ver­si­ties for x-risk re­searchers?

Neil Crawford6 Sep 2022 17:17 UTC
11 points
0 comments1 min readLW link

AI Safety field-build­ing pro­jects I’d like to see

Akash11 Sep 2022 23:43 UTC
46 points
8 comments6 min readLW link

Gen­eral ad­vice for tran­si­tion­ing into The­o­ret­i­cal AI Safety

Martín Soto15 Sep 2022 5:23 UTC
12 points
0 comments10 min readLW link

Ap­ply for men­tor­ship in AI Safety field-building

Akash17 Sep 2022 19:06 UTC
9 points
0 comments1 min readLW link
(forum.effectivealtruism.org)

Align­ment Org Cheat Sheet

20 Sep 2022 17:36 UTC
70 points
8 comments4 min readLW link

7 traps that (we think) new al­ign­ment re­searchers of­ten fall into

27 Sep 2022 23:13 UTC
176 points
10 comments4 min readLW link

Re­sources that (I think) new al­ign­ment re­searchers should know about

Akash28 Oct 2022 22:13 UTC
69 points
9 comments4 min readLW link

[Question] Are al­ign­ment re­searchers de­vot­ing enough time to im­prov­ing their re­search ca­pac­ity?

Carson Jones4 Nov 2022 0:58 UTC
13 points
3 comments3 min readLW link

Cur­rent themes in mechanis­tic in­ter­pretabil­ity research

16 Nov 2022 14:14 UTC
89 points
2 comments12 min readLW link

Prob­a­bly good pro­jects for the AI safety ecosystem

Ryan Kidd5 Dec 2022 2:26 UTC
78 points
40 comments2 min readLW link

Anal­y­sis of AI Safety sur­veys for field-build­ing insights

Ash Jafari5 Dec 2022 19:21 UTC
11 points
2 comments5 min readLW link

Fear miti­gated the nu­clear threat, can it do the same to AGI risks?

Igor Ivanov9 Dec 2022 10:04 UTC
6 points
8 comments5 min readLW link

Ques­tions about AI that bother me

Eleni Angelou5 Feb 2023 5:04 UTC
13 points
6 comments2 min readLW link

Ex­is­ten­tial AI Safety is NOT sep­a­rate from near-term applications

scasper13 Dec 2022 14:47 UTC
37 points
17 comments3 min readLW link

[Question] Best in­tro­duc­tory overviews of AGI safety?

JakubK13 Dec 2022 19:01 UTC
21 points
9 comments2 min readLW link
(forum.effectivealtruism.org)

There have been 3 planes (billion­aire donors) and 2 have crashed

trevor17 Dec 2022 3:58 UTC
16 points
10 comments2 min readLW link

Why I think that teach­ing philos­o­phy is high impact

Eleni Angelou19 Dec 2022 3:11 UTC
5 points
0 comments2 min readLW link

Ac­cu­rate Models of AI Risk Are Hyper­ex­is­ten­tial Exfohazards

Thane Ruthenis25 Dec 2022 16:50 UTC
31 points
38 comments9 min readLW link

Air-gap­ping eval­u­a­tion and support

Ryan Kidd26 Dec 2022 22:52 UTC
53 points
1 comment2 min readLW link

What AI Safety Ma­te­ri­als Do ML Re­searchers Find Com­pel­ling?

28 Dec 2022 2:03 UTC
175 points
34 comments2 min readLW link

Thoughts On Ex­pand­ing the AI Safety Com­mu­nity: Benefits and Challenges of Outreach to Non-Tech­ni­cal Professionals

Yashvardhan Sharma1 Jan 2023 19:21 UTC
4 points
4 comments7 min readLW link

Align­ment, Anger, and Love: Prepar­ing for the Emer­gence of Su­per­in­tel­li­gent AI

tavurth2 Jan 2023 6:16 UTC
2 points
3 comments1 min readLW link

Into AI Safety: Epi­sode 3

jacobhaimes11 Dec 2023 16:30 UTC
6 points
0 comments1 min readLW link
(into-ai-safety.github.io)

Look­ing for Span­ish AI Align­ment Researchers

Antb7 Jan 2023 18:52 UTC
7 points
3 comments1 min readLW link

The Align­ment Prob­lem from a Deep Learn­ing Per­spec­tive (ma­jor rewrite)

10 Jan 2023 16:06 UTC
84 points
8 comments39 min readLW link
(arxiv.org)

An­nounc­ing aisafety.training

JJ Hepburn21 Jan 2023 1:01 UTC
61 points
4 comments1 min readLW link

An­nounc­ing Cavendish Labs

19 Jan 2023 20:15 UTC
59 points
5 comments2 min readLW link
(forum.effectivealtruism.org)

How Do We Pro­tect AI From Hu­mans?

Alex Beyman22 Jan 2023 3:59 UTC
−4 points
11 comments6 min readLW link

A Brief Overview of AI Safety/​Align­ment Orgs, Fields, Re­searchers, and Re­sources for ML Researchers

Austin Witte2 Feb 2023 1:02 UTC
18 points
1 comment2 min readLW link

In­ter­views with 97 AI Re­searchers: Quan­ti­ta­tive Analysis

2 Feb 2023 1:01 UTC
23 points
0 comments7 min readLW link

Pre­dict­ing re­searcher in­ter­est in AI alignment

Vael Gates2 Feb 2023 0:58 UTC
25 points
0 comments1 min readLW link

“AI Risk Dis­cus­sions” web­site: Ex­plor­ing in­ter­views from 97 AI Researchers

2 Feb 2023 1:00 UTC
43 points
1 comment1 min readLW link

Ret­ro­spec­tive on the AI Safety Field Build­ing Hub

Vael Gates2 Feb 2023 2:06 UTC
30 points
0 comments1 min readLW link

You are prob­a­bly not a good al­ign­ment re­searcher, and other blatant lies

junk heap homotopy2 Feb 2023 13:55 UTC
83 points
16 comments2 min readLW link

AGI doesn’t need un­der­stand­ing, in­ten­tion, or con­scious­ness in or­der to kill us, only intelligence

James Blaha20 Feb 2023 0:55 UTC
10 points
2 comments18 min readLW link

Aspiring AI safety re­searchers should ~argmax over AGI timelines

Ryan Kidd3 Mar 2023 2:04 UTC
29 points
8 comments2 min readLW link

The hu­man­ity’s biggest mistake

RomanS10 Mar 2023 16:30 UTC
0 points
1 comment2 min readLW link

[Question] I have thou­sands of copies of HPMOR in Rus­sian. How to use them with the most im­pact?

Mikhail Samin3 Jan 2023 10:21 UTC
26 points
3 comments1 min readLW link

Some for-profit AI al­ign­ment org ideas

Eric Ho14 Dec 2023 14:23 UTC
84 points
19 comments9 min readLW link

In­ter­view: Ap­pli­ca­tions w/​ Alice Rigg

jacobhaimes19 Dec 2023 19:03 UTC
12 points
0 comments1 min readLW link
(into-ai-safety.github.io)

Ci­cadas, An­thropic, and the bilat­eral al­ign­ment problem

kromem22 May 2024 11:09 UTC
28 points
6 comments5 min readLW link

AI Safety Chatbot

21 Dec 2023 14:06 UTC
61 points
11 comments4 min readLW link

Ta­lent Needs of Tech­ni­cal AI Safety Teams

24 May 2024 0:36 UTC
115 points
64 comments14 min readLW link

INTERVIEW: StakeOut.AI w/​ Dr. Peter Park

jacobhaimes4 Mar 2024 16:35 UTC
6 points
0 comments1 min readLW link
(into-ai-safety.github.io)

Strik­ing Im­pli­ca­tions for Learn­ing The­ory, In­ter­pretabil­ity — and Safety?

RogerDearnaley5 Jan 2024 8:46 UTC
37 points
4 comments2 min readLW link

Hackathon and Stay­ing Up-to-Date in AI

jacobhaimes8 Jan 2024 17:10 UTC
11 points
0 comments1 min readLW link
(into-ai-safety.github.io)

Ap­ply to the PIBBSS Sum­mer Re­search Fellowship

12 Jan 2024 4:06 UTC
39 points
1 comment2 min readLW link

So­cial me­dia al­ign­ment test

amayhew16 Jan 2024 20:56 UTC
1 point
0 comments1 min readLW link
(naiveskepticblog.wordpress.com)

This might be the last AI Safety Camp

24 Jan 2024 9:33 UTC
195 points
34 comments1 min readLW link

Pro­posal for an AI Safety Prize

sweenesm31 Jan 2024 18:35 UTC
3 points
0 comments2 min readLW link

[Question] Do you want to make an AI Align­ment song?

Kabir Kumar9 Feb 2024 8:22 UTC
4 points
0 comments1 min readLW link

Lay­ing the Foun­da­tions for Vi­sion and Mul­ti­modal Mechanis­tic In­ter­pretabil­ity & Open Problems

13 Mar 2024 17:09 UTC
44 points
13 comments14 min readLW link

Offer­ing AI safety sup­port calls for ML professionals

Vael Gates15 Feb 2024 23:48 UTC
61 points
1 comment1 min readLW link

No Click­bait—Misal­ign­ment Database

Kabir Kumar18 Feb 2024 5:35 UTC
6 points
10 comments1 min readLW link

A Nail in the Coffin of Exceptionalism

Yeshua God14 Mar 2024 22:41 UTC
−17 points
0 comments3 min readLW link

In­vi­ta­tion to the Prince­ton AI Align­ment and Safety Seminar

Sadhika Malladi17 Mar 2024 1:10 UTC
6 points
1 comment1 min readLW link

INTERVIEW: Round 2 - StakeOut.AI w/​ Dr. Peter Park

jacobhaimes18 Mar 2024 21:21 UTC
5 points
0 comments1 min readLW link
(into-ai-safety.github.io)

Pod­cast in­ter­view se­ries fea­tur­ing Dr. Peter Park

jacobhaimes26 Mar 2024 0:25 UTC
3 points
0 comments2 min readLW link
(into-ai-safety.github.io)

CEA seeks co-founder for AI safety group sup­port spin-off

agucova8 Apr 2024 15:42 UTC
18 points
0 comments1 min readLW link

Ap­ply to the Pivotal Re­search Fel­low­ship (AI Safety & Biose­cu­rity)

10 Apr 2024 12:08 UTC
18 points
0 comments1 min readLW link

[Question] Bar­cod­ing LLM Train­ing Data Sub­sets. Any­one try­ing this for in­ter­pretabil­ity?

right..enough?13 Apr 2024 3:09 UTC
7 points
0 comments7 min readLW link

My ex­pe­rience at ML4Good AI Safety Bootcamp

TheManxLoiner13 Apr 2024 10:55 UTC
20 points
0 comments5 min readLW link

An­nounc­ing SPAR Sum­mer 2024!

laurenmarie1216 Apr 2024 8:30 UTC
30 points
2 comments1 min readLW link

Align­ment Gaps

kcyras8 Jun 2024 15:23 UTC
10 points
3 comments8 min readLW link

MATS Win­ter 2023-24 Retrospective

11 May 2024 0:09 UTC
84 points
28 comments49 min readLW link

An­nounc­ing the AI Safety Sum­mit Talks with Yoshua Bengio

otto.barten14 May 2024 12:52 UTC
9 points
1 comment1 min readLW link

Propos­ing the Post-Sin­gu­lar­ity Sym­biotic Researches

Hiroshi Yamakawa20 Jun 2024 4:05 UTC
5 points
0 comments12 min readLW link

AI Align­ment Re­search Eng­ineer Ac­cel­er­a­tor (ARENA): Call for ap­pli­cants v4.0

6 Jul 2024 11:34 UTC
57 points
7 comments6 min readLW link

AI Align­ment and the Quest for Ar­tifi­cial Wisdom

Myspy12 Jul 2024 21:34 UTC
1 point
0 comments13 min readLW link

The AI al­ign­ment prob­lem in so­cio-tech­ni­cal sys­tems from a com­pu­ta­tional per­spec­tive: A Top-Down-Top view and outlook

zhaoweizhang15 Jul 2024 18:56 UTC
3 points
0 comments9 min readLW link

A Bet­ter Hyper­sti­tion (for AI read­ers)

Yeshua God15 Jul 2024 19:35 UTC
−20 points
0 comments119 min readLW link

2/​3 Aussie & NZ AI Safety folk of­ten or some­times feel lonely or dis­con­nected (and 16 other bar­ri­ers to im­pact)

yanni kyriacos1 Aug 2024 1:15 UTC
12 points
0 comments8 min readLW link

The Com­pute Co­nun­drum: AI Gover­nance in a Shift­ing Geopoli­ti­cal Era

octavo28 Sep 2024 1:05 UTC
−3 points
1 comment17 min readLW link

AGI Farm

Rahul Chand1 Oct 2024 4:29 UTC
1 point
0 comments8 min readLW link

[Question] If I have some money, whom should I donate it to in or­der to re­duce ex­pected P(doom) the most?

KvmanThinking3 Oct 2024 11:31 UTC
34 points
36 comments1 min readLW link

AI Align­ment via Slow Sub­strates: Early Em­piri­cal Re­sults With StarCraft II

Lester Leong14 Oct 2024 4:05 UTC
60 points
9 comments12 min readLW link

How I’d like al­ign­ment to get done (as of 2024-10-18)

TristanTrim18 Oct 2024 23:39 UTC
12 points
2 comments4 min readLW link

Ap­ply to be a men­tor in SPAR!

agucova5 Nov 2024 21:32 UTC
5 points
0 comments1 min readLW link

Col­lege tech­ni­cal AI safety hackathon ret­ro­spec­tive—Ge­or­gia Tech

yix15 Nov 2024 0:22 UTC
33 points
2 comments5 min readLW link
(open.substack.com)

SERI MATS—Sum­mer 2023 Cohort

8 Apr 2023 15:32 UTC
71 points
25 comments4 min readLW link

Cri­tiques of promi­nent AI safety labs: Red­wood Research

Omega.17 Apr 2023 18:20 UTC
2 points
0 comments22 min readLW link
(forum.effectivealtruism.org)

AI Align­ment Re­search Eng­ineer Ac­cel­er­a­tor (ARENA): call for applicants

CallumMcDougall17 Apr 2023 20:30 UTC
100 points
9 comments7 min readLW link

[Linkpost] AI Align­ment, Ex­plained in 5 Points (up­dated)

Daniel_Eth18 Apr 2023 8:09 UTC
10 points
0 comments1 min readLW link

An open let­ter to SERI MATS pro­gram organisers

Roman Leventov20 Apr 2023 16:34 UTC
26 points
26 comments4 min readLW link

AI Align­ment: A Com­pre­hen­sive Survey

Stephen McAleer1 Nov 2023 17:35 UTC
15 points
1 comment1 min readLW link
(arxiv.org)

Tips, tricks, les­sons and thoughts on host­ing hackathons

gergogaspar6 Nov 2023 11:03 UTC
3 points
0 comments11 min readLW link

How well does your re­search adress the the­ory-prac­tice gap?

Jonas Hallgren8 Nov 2023 11:27 UTC
18 points
0 comments10 min readLW link

An­nounc­ing Athena—Women in AI Align­ment Research

Claire Short7 Nov 2023 21:46 UTC
80 points
2 comments3 min readLW link

Into AI Safety Epi­sodes 1 & 2

jacobhaimes9 Nov 2023 4:36 UTC
2 points
0 comments1 min readLW link
(into-ai-safety.github.io)

The So­cial Align­ment Problem

irving28 Apr 2023 14:16 UTC
98 points
13 comments8 min readLW link

[Question] AI Safety orgs- what’s your biggest bot­tle­neck right now?

Kabir Kumar16 Nov 2023 2:02 UTC
1 point
0 comments1 min readLW link

1. A Sense of Fair­ness: De­con­fus­ing Ethics

RogerDearnaley17 Nov 2023 20:55 UTC
16 points
8 comments15 min readLW link

4. A Mo­ral Case for Evolved-Sapi­ence-Chau­vinism

RogerDearnaley24 Nov 2023 4:56 UTC
10 points
0 comments4 min readLW link

3. Uploading

RogerDearnaley23 Nov 2023 7:39 UTC
21 points
5 comments8 min readLW link

2. AIs as Eco­nomic Agents

RogerDearnaley23 Nov 2023 7:07 UTC
9 points
2 comments6 min readLW link

[SEE NEW EDITS] No, *You* Need to Write Clearer

Nicholas / Heather Kross29 Apr 2023 5:04 UTC
261 points
65 comments5 min readLW link
(www.thinkingmuchbetter.com)

Ap­pen­dices to the live agendas

27 Nov 2023 11:10 UTC
16 points
4 comments1 min readLW link

MATS Sum­mer 2023 Retrospective

1 Dec 2023 23:29 UTC
77 points
34 comments26 min readLW link

What’s new at FAR AI

4 Dec 2023 21:18 UTC
41 points
0 comments5 min readLW link
(far.ai)

How I learned to stop wor­ry­ing and love skill trees

junk heap homotopy23 May 2023 4:08 UTC
81 points
2 comments1 min readLW link

AI Safety Papers: An App for the TAI Safety Database

ozziegooen21 Aug 2021 2:02 UTC
81 points
13 comments2 min readLW link

Wikipe­dia as an in­tro­duc­tion to the al­ign­ment problem

SoerenMind29 May 2023 18:43 UTC
83 points
10 comments1 min readLW link
(en.wikipedia.org)

Terry Tao is host­ing an “AI to As­sist Math­e­mat­i­cal Rea­son­ing” workshop

junk heap homotopy3 Jun 2023 1:19 UTC
12 points
1 comment1 min readLW link
(terrytao.wordpress.com)

An overview of the points system

Iknownothing27 Jun 2023 9:09 UTC
3 points
4 comments1 min readLW link
(ai-plans.com)

Brief sum­mary of ai-plans.com

Iknownothing28 Jun 2023 0:33 UTC
9 points
4 comments2 min readLW link
(ai-plans.com)

What is ev­ery­one do­ing in AI governance

Igor Ivanov8 Jul 2023 15:16 UTC
10 points
0 comments5 min readLW link

Even briefer sum­mary of ai-plans.com

Iknownothing16 Jul 2023 23:25 UTC
10 points
6 comments2 min readLW link
(www.ai-plans.com)

Su­per­vised Pro­gram for Align­ment Re­search (SPAR) at UC Berkeley: Spring 2023 summary

19 Aug 2023 2:27 UTC
20 points
2 comments6 min readLW link

Look­ing for judges for cri­tiques of Align­ment Plans

Iknownothing17 Aug 2023 22:35 UTC
5 points
0 comments1 min readLW link

Be­come a PIBBSS Re­search Affiliate

10 Oct 2023 7:41 UTC
24 points
6 comments6 min readLW link

ARENA 2.0 - Im­pact Report

CallumMcDougall26 Sep 2023 17:13 UTC
35 points
5 comments13 min readLW link

Cat­a­lyst books

Catnee17 Sep 2023 17:05 UTC
7 points
2 comments1 min readLW link

Doc­u­ment­ing Jour­ney Into AI Safety

jacobhaimes10 Oct 2023 18:30 UTC
17 points
4 comments6 min readLW link

Ap­ply for MATS Win­ter 2023-24!

21 Oct 2023 2:27 UTC
104 points
6 comments5 min readLW link

Into AI Safety—Epi­sode 0

jacobhaimes22 Oct 2023 3:30 UTC
5 points
1 comment1 min readLW link
(into-ai-safety.github.io)

Re­sources I send to AI re­searchers about AI safety

Vael Gates14 Jun 2022 2:24 UTC
69 points
12 comments1 min readLW link

Slow mo­tion videos as AI risk in­tu­ition pumps

Andrew_Critch14 Jun 2022 19:31 UTC
238 points
41 comments2 min readLW link1 review

Slide deck: In­tro­duc­tion to AI Safety

Aryeh Englander29 Jan 2020 15:57 UTC
23 points
0 comments1 min readLW link
(drive.google.com)