
AI Success Models

Last edit: 17 Nov 2021 23:17 UTC by plex

AI Success Models are proposed paths to an existential win via aligned AI. So far they are high-level overviews rather than complete solutions, but each presents at least a sketch of what a full solution might look like. They can be contrasted with threat models, which are stories about how AI might lead to major problems.

Solving the whole AGI control problem, version 0.0001

Steven Byrnes · 8 Apr 2021 15:14 UTC
63 points
7 comments · 26 min read · LW link

An overview of 11 proposals for building safe advanced AI

evhub · 29 May 2020 20:38 UTC
212 points
36 comments · 38 min read · LW link · 2 reviews

A positive case for how we might succeed at prosaic AI alignment

evhub · 16 Nov 2021 1:49 UTC
81 points
46 comments · 6 min read · LW link

Conversation with Eliezer: What do you want the system to do?

Akash · 25 Jun 2022 17:36 UTC
113 points
38 comments · 2 min read · LW link

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy · 12 May 2022 20:01 UTC
58 points
0 comments · 59 min read · LW link

[Question] Any further work on AI Safety Success Stories?

Krieger · 2 Oct 2022 9:53 UTC
8 points
6 comments · 1 min read · LW link

a narrative explanation of the QACI alignment plan

Tamsin Leake · 15 Feb 2023 3:28 UTC
56 points
29 comments · 6 min read · LW link
(carado.moe)

How Would an Utopia-Maximizer Look Like?

Thane Ruthenis · 20 Dec 2023 20:01 UTC
31 points
23 comments · 10 min read · LW link

Gaia Network: a practical, incremental pathway to Open Agency Architecture

20 Dec 2023 17:11 UTC
22 points
8 comments · 16 min read · LW link

Four visions of Transformative AI success

Steven Byrnes · 17 Jan 2024 20:45 UTC
112 points
22 comments · 15 min read · LW link

Worrisome misunderstanding of the core issues with AI transition

Roman Leventov · 18 Jan 2024 10:05 UTC
5 points
2 comments · 4 min read · LW link

Gradient Descent on the Human Brain

1 Apr 2024 22:39 UTC
52 points
5 comments · 2 min read · LW link

AI Safety “Success Stories”

Wei Dai · 7 Sep 2019 2:54 UTC
125 points
27 comments · 4 min read · LW link · 1 review

Various Alignment Strategies (and how likely they are to work)

Logan Zoellner · 3 May 2022 16:54 UTC
84 points
34 comments · 11 min read · LW link

Conditioning Generative Models for Alignment

Jozdien · 18 Jul 2022 7:11 UTC
59 points
8 comments · 20 min read · LW link

An Open Agency Architecture for Safe Transformative AI

davidad · 20 Dec 2022 13:04 UTC
79 points
22 comments · 4 min read · LW link

formal alignment: what it is, and some proposals

Tamsin Leake · 29 Jan 2023 11:32 UTC
53 points
3 comments · 1 min read · LW link
(carado.moe)

Success without dignity: a nearcasting story of avoiding catastrophe by luck

HoldenKarnofsky · 14 Mar 2023 19:23 UTC
76 points
17 comments · 15 min read · LW link

How Might an Alignment Attractor Look like?

Shmi · 28 Apr 2022 6:46 UTC
47 points
15 comments · 2 min read · LW link

Towards Hodge-podge Alignment

Cleo Nardo · 19 Dec 2022 20:12 UTC
93 points
30 comments · 9 min read · LW link

AI Safety via Luck

Jozdien · 1 Apr 2023 20:13 UTC
81 points
7 comments · 11 min read · LW link

Introduction to the sequence: Interpretability Research for the Most Important Century

Evan R. Murphy · 12 May 2022 19:59 UTC
16 points
0 comments · 8 min read · LW link

Getting from an unaligned AGI to an aligned AGI?

Tor Økland Barstad · 21 Jun 2022 12:36 UTC
13 points
7 comments · 9 min read · LW link

Making it harder for an AGI to “trick” us, with STVs

Tor Økland Barstad · 9 Jul 2022 14:42 UTC
15 points
5 comments · 22 min read · LW link

Acceptability Verification: A Research Agenda

12 Jul 2022 20:11 UTC
50 points
0 comments · 1 min read · LW link
(docs.google.com)

Gaia Network: An Illustrated Primer

18 Jan 2024 18:23 UTC
3 points
2 comments · 15 min read · LW link

AI Safety Endgame Stories

Ivan Vendrov · 28 Sep 2022 16:58 UTC
31 points
11 comments · 11 min read · LW link

[Question] What Does AI Alignment Success Look Like?

Shmi · 20 Oct 2022 0:32 UTC
23 points
7 comments · 1 min read · LW link

What success looks like

28 Jun 2022 14:38 UTC
19 points
4 comments · 1 min read · LW link
(forum.effectivealtruism.org)

Possible miracles

9 Oct 2022 18:17 UTC
64 points
34 comments · 8 min read · LW link

Alignment with argument-networks and assessment-predictions

Tor Økland Barstad · 13 Dec 2022 2:17 UTC
10 points
5 comments · 45 min read · LW link

[Question] If AGI were coming in a year, what should we do?

MichaelStJules · 1 Apr 2022 0:41 UTC
20 points
16 comments · 1 min read · LW link

An AI-in-a-box success model

azsantosk · 11 Apr 2022 22:28 UTC
16 points
1 comment · 10 min read · LW link