This is a sequence curated by Paul Christiano on one current approach to alignment: Iterated Amplification.
Iterated Amplification
Preface to the sequence on iterated amplification
0. Problem statement
The first part of this sequence clarifies the problem that iterated amplification is trying to solve, which is both narrower and broader than you might expect.
The Steering Problem
Clarifying “AI Alignment”
An unaligned benchmark
Prosaic AI alignment
1. Basic intuition
The second part of the sequence outlines the basic intuitions that motivate iterated amplification. I think that these intuitions may be more important than the scheme itself, but they are considerably more informal.
Approval-directed agents
Approval-directed bootstrapping
Humans Consulting HCH
Corrigibility
2. The scheme
The core of the sequence is the third section. Benign model-free RL describes iterated amplification as a general outline into which we can substitute arbitrary algorithms for reward learning, amplification, and robustness. The first four posts all describe variants of this idea from different perspectives; if you find that one of those descriptions is clearest for you, I recommend focusing on that one and skimming the others. (A rough code sketch of the overall loop follows the list below.)
Iterated Distillation and Amplification
Benign model-free RL
Factored Cognition
Supervising strong learners by amplifying weak experts
AlphaGo Zero and capability amplification
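To make the shape of the scheme concrete, here is a minimal toy sketch of the amplification/distillation loop in Python. Everything in it (the Human and Agent classes, the arithmetic questions, the lookup-table "training") is a hypothetical illustration chosen for brevity, not code from any of the posts; a real instantiation would substitute actual reward learning, amplification, and robustness techniques for these placeholders.

```python
class Human:
    """Stands in for a human who can break a question into subquestions
    and combine the subanswers. Purely illustrative."""
    def decompose(self, question):
        # Toy decomposition: "sum 1 2 3" -> ["value 1", "value 2", "value 3"].
        return [f"value {x}" for x in question.split()[1:]]

    def combine(self, question, subanswers):
        return sum(subanswers)


class Agent:
    """A fast learned model, stood in for here by a lookup table."""
    def __init__(self):
        self.table = {}

    def answer(self, question):
        if question in self.table:
            return self.table[question]
        if question.startswith("value "):
            return int(question.split()[1])  # base case the agent handles directly
        return 0                             # unknown question: default answer

    def train(self, data):
        # "Distillation" here is just memorization; a real system would train
        # a model by imitation or reward learning.
        self.table.update(data)


def amplify(human, agent, question):
    """The human plus the current agent act as a slower but more capable system."""
    subanswers = [agent.answer(q) for q in human.decompose(question)]
    return human.combine(question, subanswers)


def iterated_amplification(human, agent, questions, iterations=3):
    for _ in range(iterations):
        # Amplification: answer questions with the human/agent composite.
        data = {q: amplify(human, agent, q) for q in questions}
        # Distillation: train the fast agent to reproduce those answers on its own.
        agent.train(data)
    return agent


if __name__ == "__main__":
    distilled = iterated_amplification(Human(), Agent(), ["sum 1 2 3", "sum 10 20"])
    print(distilled.answer("sum 1 2 3"))  # 6, now answered by the distilled agent alone
```

The point of the sketch is only the control flow: each round, the human-plus-agent composite produces behavior the fast agent cannot yet produce on its own, and distillation compresses that behavior back into the fast agent, which then strengthens the next round of amplification.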
3. What needs doing
The fourth part of the sequence describes some of the black boxes in iterated amplification and discusses what we would need to do to fill in those boxes. I think these are some of the most important open questions in AI alignment.
Directions and desiderata for AI alignment
The reward engineering problem
Capability amplification
Learning with catastrophes
4. Possible approaches
The fifth section of the sequence breaks down some of these problems further and describes some possible approaches.