Peter S. Park comments on Current themes in mechanistic interpretability research

Peter S. Park 17 Nov 2022 11:29 UTC
3 points
Thank you very much for the detailed and insightful post, Lee, Sid, and Beren! I really appreciate it.
In the spirit of full communication, I’m writing to share my recent argument that mechanistic interpretability may not be a reliable safety plan for AGI-scale models.
It would be really helpful to hear your thoughts on it!