This is Eliezer’s description of the core insight behind Paul’s imitative amplification proposal. I find this somewhat compelling, but less so than I used to, since I’ve realized that the line between imitation learning and reinforcement learning is blurrier than I used to think (e.g. see this or this).
I didn’t understand what you meant by the line being blurrier… Is this a comment about what works in practice for imitation learning? Does a similar objection apply if we replace imitation learning with behavioral cloning?