When A[n+1] is supposed to imitate the output of (H, A[n]), I think IDA is safe, because I think imitation is safe. (If A[0] is a rock and (H, A[n]) is a group of one human and two A[n]’s, then A[n] is basically imitating a group of 2n−1 humans). If (H, A[n]) is supposed to provide a reward signal to A[n+1], which A[n+1] tries to optimize, I think this version of IDA is unsafe, for reasons similar to what Wei Dai expressed in a comment (on a post I now can’t find) taking issue with the inductive step in the original argument. Can we standardize different names for these two designs? Unless, is the latter version deprecated?
When A[n+1] is supposed to imitate the output of (H, A[n]), I think IDA is safe, because I think imitation is safe. (If A[0] is a rock and (H, A[n]) is a group of one human and two A[n]’s, then A[n] is basically imitating a group of 2n−1 humans). If (H, A[n]) is supposed to provide a reward signal to A[n+1], which A[n+1] tries to optimize, I think this version of IDA is unsafe, for reasons similar to what Wei Dai expressed in a comment (on a post I now can’t find) taking issue with the inductive step in the original argument. Can we standardize different names for these two designs? Unless, is the latter version deprecated?