Joshua Clancy comments on Should we publish mechanistic interpretability research?

Joshua Clancy 18 Feb 2024 2:02 UTC
1 point
I have a mechanistic interpretability paper that I am working on and about to publish. It may qualify; it is difficult to say. Currently, I think it would be better for it to be in the open. I think of it as if we were building bigger and bigger engines in cars without having invented the steering wheel (or perhaps windows?). I intend to post it to LessWrong / Alignment Forum. If the author gives me a link to that Google Doc group, I will send it there first. (It is very possible it’s not all that: I might be wrong, humans naturally overestimate their own work, etc.)