I recall a Chris Olah post in which he talks about using AIs as a tool for understanding the world, by letting the AI learn and then using interpretability tools to study the abstractions that the AI uncovers.
I thought he specifically mentioned “using AI as a microscope.”
Is that a real post, or am I misremembering this one?
https://www.lesswrong.com/posts/X2i9dQQK3gETCyqh2/chris-olah-s-views-on-agi-safety