AI Success Models

AI Success Models are proposed paths to an existential win via aligned AI. They are (so far) high-level overviews that don't contain all the details, but they present at least a sketch of what a full solution might look like. They can be contrasted with threat models, which are stories about how AI might lead to major problems.

Solving the whole AGI control problem, version 0.0001

Steven Byrnes · Apr 8, 2021, 3:14 PM
63 points
7 comments · 26 min read · LW link

An overview of 11 proposals for building safe advanced AI

evhub · May 29, 2020, 8:38 PM
214 points
36 comments · 38 min read · LW link · 2 reviews

A positive case for how we might succeed at prosaic AI alignment

evhub · Nov 16, 2021, 1:49 AM
81 points
46 comments · 6 min read · LW link

Conversation with Eliezer: What do you want the system to do?

Akash · Jun 25, 2022, 5:36 PM
113 points
38 comments · 2 min read · LW link

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy · May 12, 2022, 8:01 PM
58 points
0 comments · 59 min read · LW link

[Question] Any further work on AI Safety Success Stories?

Krieger · Oct 2, 2022, 9:53 AM
8 points
6 comments · 1 min read · LW link

How Would an Utopia-Maximizer Look Like?

Thane Ruthenis · Dec 20, 2023, 8:01 PM
31 points
23 comments · 10 min read · LW link

Gaia Network: a practical, incremental pathway to Open Agency Architecture

Dec 20, 2023, 5:11 PM
22 points
8 comments · 16 min read · LW link

Four visions of Transformative AI success

Steven Byrnes · Jan 17, 2024, 8:45 PM
112 points
22 comments · 15 min read · LW link

Worrisome misunderstanding of the core issues with AI transition

Roman Leventov · Jan 18, 2024, 10:05 AM
5 points
2 comments · 4 min read · LW link

Gradient Descent on the Human Brain

Apr 1, 2024, 10:39 PM
58 points
5 comments · 2 min read · LW link

Against blanket arguments against interpretability

Dmitry Vaintrob · Jan 22, 2025, 9:46 AM
50 points
4 comments · 7 min read · LW link

AI Safety “Success Stories”

Wei Dai · Sep 7, 2019, 2:54 AM
125 points
27 comments · 4 min read · LW link · 1 review

Various Alignment Strategies (and how likely they are to work)

Logan Zoellner · May 3, 2022, 4:54 PM
84 points
34 comments · 11 min read · LW link

Conditioning Generative Models for Alignment

Jozdien · Jul 18, 2022, 7:11 AM
59 points
8 comments · 20 min read · LW link

An Open Agency Architecture for Safe Transformative AI

davidad · Dec 20, 2022, 1:04 PM
80 points
22 comments · 4 min read · LW link

Success without dignity: a nearcasting story of avoiding catastrophe by luck

HoldenKarnofsky · Mar 14, 2023, 7:23 PM
76 points
17 comments · 15 min read · LW link

An AI-in-a-box success model

azsantosk · Apr 11, 2022, 10:28 PM
16 points
1 comment · 10 min read · LW link

How Might an Alignment Attractor Look like?

Shmi · Apr 28, 2022, 6:46 AM
47 points
15 comments · 2 min read · LW link

Towards Hodge-podge Alignment

Cleo Nardo · Dec 19, 2022, 8:12 PM
95 points
30 comments · 9 min read · LW link

Introduction to the sequence: Interpretability Research for the Most Important Century

Evan R. Murphy · May 12, 2022, 7:59 PM
16 points
0 comments · 8 min read · LW link

Getting from an unaligned AGI to an aligned AGI?

Tor Økland Barstad · Jun 21, 2022, 12:36 PM
13 points
7 comments · 9 min read · LW link

Making it harder for an AGI to “trick” us, with STVs

Tor Økland Barstad · Jul 9, 2022, 2:42 PM
15 points
5 comments · 22 min read · LW link

Acceptability Verification: A Research Agenda

Jul 12, 2022, 8:11 PM
50 points
0 comments · 1 min read · LW link
(docs.google.com)

Gaia Network: An Illustrated Primer

Jan 18, 2024, 6:23 PM
3 points
2 comments · 15 min read · LW link

AI Safety Endgame Stories

Ivan Vendrov · Sep 28, 2022, 4:58 PM
31 points
11 comments · 11 min read · LW link

The Double Body Paradigm: What Comes After ASI Alignment?

De_Carvalho_Loick · Dec 14, 2024, 6:09 PM
1 point
0 comments · 6 min read · LW link

[Question] What Does AI Alignment Success Look Like?

Shmi · Oct 20, 2022, 12:32 AM
23 points
7 comments · 1 min read · LW link

What success looks like

Jun 28, 2022, 2:38 PM
19 points
4 comments · 1 min read · LW link
(forum.effectivealtruism.org)

Possible miracles

Oct 9, 2022, 6:17 PM
64 points
34 comments · 8 min read · LW link

AI Safety via Luck

Jozdien · Apr 1, 2023, 8:13 PM
81 points
7 comments · 11 min read · LW link

Alignment with argument-networks and assessment-predictions

Tor Økland Barstad · Dec 13, 2022, 2:17 AM
10 points
5 comments · 45 min read · LW link

[Question] If AGI were coming in a year, what should we do?

MichaelStJules · Apr 1, 2022, 12:41 AM
20 points
16 comments · 1 min read · LW link