michaelcohen comments on Iterated Distillation and Amplification

michaelcohen 26 Jan 2021 15:27 UTC
LW: 1 AF: 1
When A[n+1] is supposed to imitate the output of (H, A[n]), I think IDA is safe, because I think imitation is safe. (If A[0] is a rock and (H, A[n]) is a group of one human and two copies of A[n], then A[n] is basically imitating a group of $2^{n} - 1$ humans.) If instead (H, A[n]) is supposed to provide a reward signal that A[n+1] tries to optimize, I think that version of IDA is unsafe, for reasons similar to those Wei Dai expressed in a comment (on a post I now can’t find) taking issue with the inductive step in the original argument. Can we standardize on different names for these two designs? Or is the latter version deprecated?
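
To spell out the count in the parenthetical: if h(n) is the number of humans A[n] effectively imitates, then h(0) = 0 (the rock) and h(n+1) = 1 + 2·h(n), which gives h(n) = 2^n - 1. A minimal sketch that checks this, where the function name `humans_imitated` and the parameter `copies_per_step` are my own labels for illustration, not anything from the IDA write-ups:

```python
def humans_imitated(n: int, copies_per_step: int = 2) -> int:
    """Count the humans A[n] effectively imitates.

    Assumptions (from the comment's toy setup): A[0] is a rock, so it
    stands in for 0 humans, and (H, A[k]) is one human plus
    `copies_per_step` copies of A[k].
    """
    count = 0  # A[0] is a rock
    for _ in range(n):
        # A[k+1] imitates one human plus `copies_per_step` copies of A[k]
        count = 1 + copies_per_step * count
    return count


# With two copies per step, the recurrence matches the closed form 2^n - 1.
for n in range(10):
    assert humans_imitated(n) == 2**n - 1
```

This only covers the bookkeeping for the imitation version; it says nothing about whether the reward-signal version is safe.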