
Iterated Amplification

Last edit: Jul 17, 2020, 6:41 AM by Ben Pace

Iterated Amplification is an approach to AI alignment spearheaded by Paul Christiano. The idea is to build powerful, aligned ML systems by starting from a weak but aligned AI and repeatedly using each AI, amplified by human oversight, to train a slightly more capable AI that remains aligned.

See also: Factored cognition.
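
In outline, each round amplifies the current agent (a human overseer decomposes a task and consults several copies of the agent, then combines their answers) and then distills that slow, amplified system into a fast model that imitates it. The following is a minimal, illustrative sketch of that loop in Python; every name in it (human_overseer, amplify, distill, iterated_amplification) is a hypothetical stand-in for exposition, not an implementation from Christiano's agenda.

    from typing import Callable, List

    # An agent maps a question to an answer.
    Agent = Callable[[str], str]

    def human_overseer(question: str) -> str:
        # Stand-in for slow but trusted human judgment.
        return "human answer to: " + question

    def amplify(agent: Agent, overseer: Agent, n_subquestions: int = 3) -> Agent:
        # Amplification: the overseer breaks a task into subquestions, consults
        # several copies of the current agent, and combines their answers.
        # The composite is more capable than `agent` alone, but much slower.
        def amplified(question: str) -> str:
            subquestions = [f"{question} [subquestion {i}]" for i in range(n_subquestions)]
            subanswers = [agent(q) for q in subquestions]
            return overseer(f"{question}, given sub-answers {subanswers}")
        return amplified

    def distill(slow_agent: Agent, training_questions: List[str]) -> Agent:
        # Distillation: train a fast model to imitate the slow amplified agent.
        # Imitation is a lookup table here; in practice it would be ML training.
        answers = {q: slow_agent(q) for q in training_questions}
        return lambda q: answers.get(q, "best guess for: " + q)

    def iterated_amplification(rounds: int, training_questions: List[str]) -> Agent:
        agent: Agent = human_overseer  # start from a weak but aligned agent
        for _ in range(rounds):
            slow_but_smart = amplify(agent, human_overseer)        # more capable, slower
            agent = distill(slow_but_smart, training_questions)    # fast again, hopefully still aligned
        return agent

The hoped-for property is that amplification adds capability while the human overseer keeps the composite aligned, and distillation then recovers speed without ever training against a new, unaligned objective.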

Iterated Distillation and Amplification

Ajeya Cotra · Nov 30, 2018, 4:47 AM
47 points
14 comments · 6 min read · LW link

Challenges to Christiano’s capability amplification proposal

Eliezer Yudkowsky · May 19, 2018, 6:18 PM
124 points
54 comments · 23 min read · LW link · 1 review

Paul’s research agenda FAQ

zhukeepa · Jul 1, 2018, 6:25 AM
128 points
74 comments · 19 min read · LW link · 1 review

A guide to Iterated Amplification & Debate

Rafael Harth · Nov 15, 2020, 5:14 PM
75 points
12 comments · 15 min read · LW link

HCH and Adversarial Questions

David Udell · Feb 19, 2022, 12:52 AM
15 points
7 comments · 26 min read · LW link

AlphaGo Zero and capability amplification

paulfchristiano · Jan 9, 2019, 12:40 AM
33 points
23 comments · 2 min read · LW link

Debate update: Obfuscated arguments problem

Beth Barnes · Dec 23, 2020, 3:24 AM
135 points
24 comments · 16 min read · LW link

My Understanding of Paul Christiano’s Iterated Amplification AI Safety Research Agenda

Chi Nguyen · Aug 15, 2020, 8:02 PM
120 points
20 comments · 39 min read · LW link

Preface to the sequence on iterated amplification

paulfchristiano · Nov 10, 2018, 1:24 PM
44 points
8 comments · 3 min read · LW link

[Question] Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?

MrThink · Jul 2, 2024, 8:13 PM
4 points
23 comments · 1 min read · LW link

[Question] How does iterated amplification exceed human abilities?

riceissa · May 2, 2020, 11:44 PM
19 points
9 comments · 2 min read · LW link

[Question] Does iterated amplification tackle the inner alignment problem?

JanB · Feb 15, 2020, 12:58 PM
7 points
4 comments · 1 min read · LW link

Prize for probable problems

paulfchristiano · Mar 8, 2018, 4:58 PM
60 points
63 comments · 4 min read · LW link

Approval-directed agents

paulfchristiano · Nov 22, 2018, 9:15 PM
31 points
10 comments · 15 min read · LW link

Approval-directed bootstrapping

paulfchristiano · Nov 25, 2018, 11:18 PM
24 points
0 comments · 1 min read · LW link

Humans Consulting HCH

paulfchristiano · Nov 25, 2018, 11:18 PM
39 points
9 comments · 1 min read · LW link

Corrigibility

paulfchristiano · Nov 27, 2018, 9:50 PM
57 points
8 comments · 6 min read · LW link

Benign model-free RL

paulfchristiano · Dec 2, 2018, 4:10 AM
15 points
1 comment · 7 min read · LW link

Factored Cognition

stuhlmueller · Dec 5, 2018, 1:01 AM
45 points
6 comments · 17 min read · LW link

Supervising strong learners by amplifying weak experts

paulfchristiano · Jan 6, 2019, 7:00 AM
29 points
1 comment · 1 min read · LW link
(arxiv.org)

Problems with Amplification/Distillation

Stuart_Armstrong · Mar 27, 2018, 11:12 AM
29 points
7 comments · 10 min read · LW link

Reinforcement Learning in the Iterated Amplification Framework

William_S · Feb 9, 2019, 12:56 AM
25 points
12 comments · 4 min read · LW link

[Question] Should AutoGPT update us towards researching IDA?

Michaël Trazzi · Apr 12, 2023, 4:41 PM
15 points
5 comments · 1 min read · LW link

A comment on the IDA-AlphaGoZero metaphor; capabilities versus alignment

AlexMennen · Jul 11, 2018, 1:03 AM
40 points
1 comment · 1 min read · LW link

An overview of 11 proposals for building safe advanced AI

evhub · May 29, 2020, 8:38 PM
220 points
36 comments · 38 min read · LW link · 2 reviews

Writeup: Progress on AI Safety via Debate

Feb 5, 2020, 9:04 PM
102 points
18 comments · 33 min read · LW link

My confusions with Paul’s Agenda

Vaniver · Apr 20, 2018, 5:24 PM
37 points
1 comment · 6 min read · LW link

Explanation of Paul’s AI-Alignment agenda by Ajeya Cotra

habryka · Mar 5, 2018, 3:10 AM
20 points
0 comments · 1 min read · LW link
(ai-alignment.com)

Understanding Iterated Distillation and Amplification: Claims and Oversight

William_S · Apr 17, 2018, 10:36 PM
35 points
30 comments · 9 min read · LW link

Synthesizing amplification and debate

evhub · Feb 5, 2020, 10:53 PM
33 points
10 comments · 4 min read · LW link

Garrabrant and Shah on human modeling in AGI

Rob Bensinger · Aug 4, 2021, 4:35 AM
60 points
10 comments · 47 min read · LW link

My Overview of the AI Alignment Landscape: A Bird’s Eye View

Neel Nanda · Dec 15, 2021, 11:44 PM
127 points
9 comments · 15 min read · LW link

Capability amplification

paulfchristiano · Jan 20, 2019, 7:03 AM
24 points
8 comments · 13 min read · LW link

Disagreement with Paul: alignment induction

Stuart_Armstrong · Sep 10, 2018, 1:54 PM
31 points
6 comments · 1 min read · LW link

A proposal for iterated interpretability with known-interpretable narrow AIs

Peter Berggren · Jan 11, 2025, 2:43 PM
6 points
0 comments · 2 min read · LW link

Thoughts on reward engineering

paulfchristiano · Jan 24, 2019, 8:15 PM
30 points
30 comments · 11 min read · LW link

The reward engineering problem

paulfchristiano · Jan 16, 2019, 6:47 PM
26 points
3 comments · 7 min read · LW link

[Question] What are the differences between all the iterative/recursive approaches to AI alignment?

riceissa · Sep 21, 2019, 2:09 AM
33 points
14 comments · 2 min read · LW link

Directions and desiderata for AI alignment

paulfchristiano · Jan 13, 2019, 7:47 AM
48 points
1 comment · 14 min read · LW link

Model splintering: moving from one imperfect model to another

Stuart_Armstrong · Aug 27, 2020, 11:53 AM
79 points
10 comments · 33 min read · LW link

Ought will host a factored cognition “Lab Meeting”

Sep 9, 2022, 11:46 PM
35 points
1 comment · 1 min read · LW link

AIS 101: Task decomposition for scalable oversight

Charbel-Raphaël · Jul 25, 2023, 1:34 PM
27 points
0 comments · 19 min read · LW link
(docs.google.com)

Can you force a neural network to keep generalizing?

Q Home · Sep 12, 2022, 10:14 AM
2 points
10 comments · 5 min read · LW link

Techniques for optimizing worst-case performance

paulfchristiano · Jan 28, 2019, 9:29 PM
23 points
12 comments · 8 min read · LW link

Reliability amplification

paulfchristiano · Jan 31, 2019, 9:12 PM
24 points
3 comments · 7 min read · LW link

Security amplification

paulfchristiano · Feb 6, 2019, 5:28 PM
21 points
2 comments · 13 min read · LW link

Meta-execution

paulfchristiano · Nov 1, 2018, 10:18 PM
20 points
1 comment · 5 min read · LW link

Mapping the Conceptual Territory in AI Existential Safety and Alignment

jbkjr · Feb 12, 2021, 7:55 AM
15 points
0 comments · 27 min read · LW link

Imitative Generalisation (AKA ‘Learning the Prior’)

Beth Barnes · Jan 10, 2021, 12:30 AM
107 points
15 comments · 11 min read · LW link · 1 review

Three AI Safety Related Ideas

Wei Dai · Dec 13, 2018, 9:32 PM
69 points
38 comments · 2 min read · LW link

Thoughts on Iterated Distillation and Amplification

Waddington · May 11, 2021, 9:32 PM
9 points
2 comments · 20 min read · LW link

Notes on OpenAI’s alignment plan

Alex Flint · Dec 8, 2022, 7:13 PM
40 points
5 comments · 7 min read · LW link

[Question] Is iterated amplification really more powerful than imitation?

Chantiel · Aug 2, 2021, 11:20 PM
5 points
0 comments · 2 min read · LW link

Is there an ML agent that abandons its utility function out-of-distribution without losing capabilities?

Christopher King · Feb 22, 2023, 4:49 PM
1 point
7 comments · 1 min read · LW link

Making LLMs safer is more intuitive than you think: How Common Sense and Diversity Improve AI Alignment

Jeba Sania · Dec 29, 2024, 7:27 PM
−5 points
1 comment · 6 min read · LW link

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy · May 12, 2022, 8:01 PM
58 points
0 comments · 59 min read · LW link

Iterated Distillation-Amplification, Gato, and Proto-AGI [Re-Explained]

Gabe M · May 27, 2022, 5:42 AM
22 points
4 comments · 6 min read · LW link

RAISE is launching their MVP

Feb 26, 2019, 11:45 AM
67 points
1 comment · 1 min read · LW link

Amplification Discussion Notes

William_S · Jun 1, 2018, 7:03 PM
17 points
3 comments · 3 min read · LW link

Machine Learning Projects on IDA

Jun 24, 2019, 6:38 PM
49 points
3 comments · 2 min read · LW link

A general model of safety-oriented AI development

Wei Dai · Jun 11, 2018, 9:00 PM
65 points
8 comments · 1 min read · LW link

Surprised by ELK report’s counterexample to Debate, IDA

Evan R. Murphy · Aug 4, 2022, 2:12 AM
18 points
0 comments · 5 min read · LW link

[Question] What’s wrong with these analogies for understanding Informed Oversight and IDA?

Wei Dai · Mar 20, 2019, 9:11 AM
35 points
3 comments · 1 min read · LW link