Based on the discussion between Vladimir Slepnev and Paul in this thread, it seems that the statements in this post suggesting that the first stage of IDA will produce nearly-human-level assistants (“we assume that A[0] can acquire nearly human-level capabilities through this process”, “Given an aligned agent H we can use narrow safe learning techniques to train a much faster agent A which behaves as H would have behaved”) are misleading. In the same thread, Paul says that he “will probably correct it”, but as far as I can tell, neither the Medium post nor the version of the post in this sequence (which was published after the discussion) has been corrected.