Thank you very much for the detailed and insightful post, Lee, Sid, and Beren! I really appreciate it.
In the spirit of full communication, I’m writing to share my recent argument that mechanistic interpretability may not be a reliable safety plan for AGI-scale models.
It would be really helpful to hear your thoughts on it!