
Iterated Amplification

Last edit: 17 Jul 2020 6:41 UTC by Ben Pace

Iterated Amplification is an approach to AI alignment spearheaded by Paul Christiano. The aim is to build powerful, aligned ML systems by starting with weak but aligned AIs and recursively using each AI to help build a slightly more capable AI that remains aligned.

See also: Factored cognition.
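
The description above is essentially an amplify-then-distill loop: amplification lets an overseer answer questions with help from copies of the current agent, and distillation trains a faster model to imitate that slower, amplified system. Below is a minimal, illustrative Python sketch of the loop; all names in it (`base_agent`, `decompose`, `amplify`, `distill`, `iterated_amplification`) are hypothetical stand-ins, not code from any of the posts listed on this page.

```python
from typing import Callable, List

# An "agent" here is just a function from questions to answers.
Agent = Callable[[str], str]


def base_agent(question: str) -> str:
    """A weak but (by assumption) aligned starting agent."""
    return f"best-effort answer to: {question}"


def decompose(question: str) -> List[str]:
    """Split a question into easier sub-questions (illustrative stub;
    in the real scheme a human or overseer does this)."""
    return [f"sub-question {i} of: {question}" for i in range(2)]


def amplify(agent: Agent) -> Agent:
    """Amplification: answer a question by delegating sub-questions
    to copies of the current agent and combining their answers."""
    def amplified(question: str) -> str:
        sub_answers = [agent(sub_q) for sub_q in decompose(question)]
        return f"answer to {question!r} assembled from {sub_answers}"
    return amplified


def distill(amplified_agent: Agent) -> Agent:
    """Distillation: train a fast model to imitate the slow amplified
    system. Here we only wrap it; in practice this is an ML training step."""
    def distilled(question: str) -> str:
        return amplified_agent(question)
    return distilled


def iterated_amplification(agent: Agent, rounds: int) -> Agent:
    """Alternate amplification and distillation, yielding a successively
    more capable agent that is intended to stay aligned at every step."""
    for _ in range(rounds):
        agent = distill(amplify(agent))
    return agent


if __name__ == "__main__":
    stronger_agent = iterated_amplification(base_agent, rounds=3)
    print(stronger_agent("How should this research task be divided up?"))
```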

Iterated Distillation and Amplification

Ajeya Cotra, 30 Nov 2018 4:47 UTC
47 points
14 comments, 6 min read, LW link

Challenges to Christiano’s capability amplification proposal

Eliezer Yudkowsky, 19 May 2018 18:18 UTC
124 points
54 comments, 23 min read, LW link, 1 review

Paul’s research agenda FAQ

zhukeepa, 1 Jul 2018 6:25 UTC
128 points
74 comments, 19 min read, LW link, 1 review

A guide to Iterated Amplification & Debate

Rafael Harth, 15 Nov 2020 17:14 UTC
75 points
12 comments, 15 min read, LW link

HCH and Adversarial Questions

David Udell, 19 Feb 2022 0:52 UTC
15 points
7 comments, 26 min read, LW link

AlphaGo Zero and capability amplification

paulfchristiano, 9 Jan 2019 0:40 UTC
33 points
23 comments, 2 min read, LW link

My Understanding of Paul Christiano’s Iterated Amplification AI Safety Research Agenda

Chi Nguyen, 15 Aug 2020 20:02 UTC
120 points
20 comments, 39 min read, LW link

Debate update: Obfuscated arguments problem

Beth Barnes, 23 Dec 2020 3:24 UTC
135 points
24 comments, 16 min read, LW link

[Question] How does iterated amplification exceed human abilities?

riceissa, 2 May 2020 23:44 UTC
19 points
9 comments, 2 min read, LW link

[Question] Does iterated amplification tackle the inner alignment problem?

JanB, 15 Feb 2020 12:58 UTC
7 points
4 comments, 1 min read, LW link

Prize for probable problems

paulfchristiano, 8 Mar 2018 16:58 UTC
60 points
63 comments, 4 min read, LW link

Approval-directed agents

paulfchristiano, 22 Nov 2018 21:15 UTC
31 points
10 comments, 15 min read, LW link

My confusions with Paul’s Agenda

Vaniver, 20 Apr 2018 17:24 UTC
37 points
1 comment, 6 min read, LW link

Approval-directed bootstrapping

paulfchristiano, 25 Nov 2018 23:18 UTC
24 points
0 comments, 1 min read, LW link

Humans Consulting HCH

paulfchristiano, 25 Nov 2018 23:18 UTC
35 points
9 comments, 1 min read, LW link

Corrigibility

paulfchristiano, 27 Nov 2018 21:50 UTC
57 points
8 comments, 6 min read, LW link

Garrabrant and Shah on human modeling in AGI

Rob Bensinger, 4 Aug 2021 4:35 UTC
60 points
10 comments, 47 min read, LW link

Benign model-free RL

paulfchristiano, 2 Dec 2018 4:10 UTC
15 points
1 comment, 7 min read, LW link

My Overview of the AI Alignment Landscape: A Bird’s Eye View

Neel Nanda, 15 Dec 2021 23:44 UTC
127 points
9 comments, 15 min read, LW link

Factored Cognition

stuhlmueller, 5 Dec 2018 1:01 UTC
45 points
6 comments, 17 min read, LW link

Supervising strong learners by amplifying weak experts

paulfchristiano, 6 Jan 2019 7:00 UTC
29 points
1 comment, 1 min read, LW link
(arxiv.org)

Problems with Amplification/Distillation

Stuart_Armstrong, 27 Mar 2018 11:12 UTC
29 points
7 comments, 10 min read, LW link

[Question] Should AutoGPT update us towards researching IDA?

Michaël Trazzi, 12 Apr 2023 16:41 UTC
15 points
5 comments, 1 min read, LW link

A comment on the IDA-AlphaGoZero metaphor; capabilities versus alignment

AlexMennen, 11 Jul 2018 1:03 UTC
40 points
1 comment, 1 min read, LW link

Preface to the sequence on iterated amplification

paulfchristiano, 10 Nov 2018 13:24 UTC
44 points
8 comments, 3 min read, LW link

An overview of 11 proposals for building safe advanced AI

evhub, 29 May 2020 20:38 UTC
212 points
36 comments, 38 min read, LW link, 2 reviews

[Question] Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?

MrThink, 2 Jul 2024 20:13 UTC
4 points
23 comments, 1 min read, LW link

Writeup: Progress on AI Safety via Debate

5 Feb 2020 21:04 UTC
102 points
18 comments, 33 min read, LW link

Synthesizing amplification and debate

evhub, 5 Feb 2020 22:53 UTC
33 points
10 comments, 4 min read, LW link

Understanding Iterated Distillation and Amplification: Claims and Oversight

William_S, 17 Apr 2018 22:36 UTC
35 points
30 comments, 9 min read, LW link

Explanation of Paul’s AI-Alignment agenda by Ajeya Cotra

habryka, 5 Mar 2018 3:10 UTC
20 points
0 comments, 1 min read, LW link
(ai-alignment.com)

Reinforcement Learning in the Iterated Amplification Framework

William_S, 9 Feb 2019 0:56 UTC
25 points
12 comments, 4 min read, LW link

Is there a ML agent that abandons it’s utility function out-of-distribution without losing capabilities?

Christopher King, 22 Feb 2023 16:49 UTC
1 point
7 comments, 1 min read, LW link

Mapping the Conceptual Territory in AI Existential Safety and Alignment

jbkjr, 12 Feb 2021 7:55 UTC
15 points
0 comments, 27 min read, LW link

Capability amplification

paulfchristiano, 20 Jan 2019 7:03 UTC
24 points
8 comments, 13 min read, LW link

RAISE is launching their MVP

null, 26 Feb 2019 11:45 UTC
67 points
1 comment, 1 min read, LW link

Amplification Discussion Notes

William_S, 1 Jun 2018 19:03 UTC
17 points
3 comments, 3 min read, LW link

Machine Learning Projects on IDA

24 Jun 2019 18:38 UTC
49 points
3 comments, 2 min read, LW link

A general model of safety-oriented AI development

Wei Dai, 11 Jun 2018 21:00 UTC
65 points
8 comments, 1 min read, LW link

[Question] What’s wrong with these analogies for understanding Informed Oversight and IDA?

Wei Dai, 20 Mar 2019 9:11 UTC
35 points
3 comments, 1 min read, LW link

Disagreement with Paul: alignment induction

Stuart_Armstrong, 10 Sep 2018 13:54 UTC
31 points
6 comments, 1 min read, LW link

Thoughts on reward engineering

paulfchristiano, 24 Jan 2019 20:15 UTC
30 points
30 comments, 11 min read, LW link

The reward engineering problem

paulfchristiano, 16 Jan 2019 18:47 UTC
26 points
3 comments, 7 min read, LW link

[Question] What are the differences between all the iterative/recursive approaches to AI alignment?

riceissa, 21 Sep 2019 2:09 UTC
33 points
14 comments, 2 min read, LW link

Directions and desiderata for AI alignment

paulfchristiano, 13 Jan 2019 7:47 UTC
48 points
1 comment, 14 min read, LW link

Model splintering: moving from one imperfect model to another

Stuart_Armstrong, 27 Aug 2020 11:53 UTC
79 points
10 comments, 33 min read, LW link

AIS 101: Task decomposition for scalable oversight

Charbel-Raphaël, 25 Jul 2023 13:34 UTC
27 points
0 comments, 19 min read, LW link
(docs.google.com)

Techniques for optimizing worst-case performance

paulfchristiano, 28 Jan 2019 21:29 UTC
23 points
12 comments, 8 min read, LW link

Reliability amplification

paulfchristiano, 31 Jan 2019 21:12 UTC
24 points
3 comments, 7 min read, LW link

Security amplification

paulfchristiano, 6 Feb 2019 17:28 UTC
21 points
2 comments, 13 min read, LW link

Meta-execution

paulfchristiano, 1 Nov 2018 22:18 UTC
20 points
1 comment, 5 min read, LW link

Imitative Generalisation (AKA ‘Learning the Prior’)

Beth Barnes, 10 Jan 2021 0:30 UTC
107 points
15 comments, 11 min read, LW link, 1 review

Three AI Safety Related Ideas

Wei Dai, 13 Dec 2018 21:32 UTC
69 points
38 comments, 2 min read, LW link

Thoughts on Iterated Distillation and Amplification

Waddington, 11 May 2021 21:32 UTC
9 points
2 comments, 20 min read, LW link

[Question] Is iterated amplification really more powerful than imitation?

Chantiel, 2 Aug 2021 23:20 UTC
5 points
0 comments, 2 min read, LW link

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy, 12 May 2022 20:01 UTC
58 points
0 comments, 59 min read, LW link

Iterated Distillation-Amplification, Gato, and Proto-AGI [Re-Explained]

Gabe M, 27 May 2022 5:42 UTC
21 points
4 comments, 6 min read, LW link

Surprised by ELK report’s counterexample to Debate, IDA

Evan R. Murphy, 4 Aug 2022 2:12 UTC
18 points
0 comments, 5 min read, LW link

Ought will host a factored cognition “Lab Meeting”

9 Sep 2022 23:46 UTC
35 points
1 comment, 1 min read, LW link

Can you force a neural network to keep generalizing?

Q Home, 12 Sep 2022 10:14 UTC
2 points
10 comments, 5 min read, LW link

Notes on OpenAI’s alignment plan

Alex Flint, 8 Dec 2022 19:13 UTC
40 points
5 comments, 7 min read, LW link