Curated! To ramble a bit on why: I love how this post makes me feel like I have a good sense of what John has been up to, what he’s been thinking about, and why; the insight of asking “how would an AI ensure a child AI is aligned with it?” feels substantive; and the optimism is nice and doesn’t seem entirely foolhardy. Perhaps most significantly, it feels to me like a very big deal if alignment is moving towards something paradigmatic (shared models and assumptions and questions and methods). I had thought that was something we weren’t going to get, but John points out that many people are converging on similar interpretability/abstraction targets, and now that he points it out, that seems true and hopeful. I’m not an alignment researcher myself, so I don’t put too much stock in my assessment, but this update is one of the most hopeful things I’ve read recently.