Assume you had a tool that basically allows to you explain the entire network, every circuit and mechanism, etc. The tool spits out explanations that are easy to understand and easy to connect to specific parts of the network, e.g. attention head x is doing y. Would you publish this tool to the entire world or keep it private or semi-private for a while?
I think this case is unclear, but also not central because I’m imagining the primary benefit of publishing interp research as being making interp research go faster, and this seems like you’ve basically “solved interp”, so the benefits no longer really apply?
Similarly, if you thought that you should publish capabilities research to accelerate to AGI, and you found out how to build AGI, then whether you should publish is not really relevant anymore.
Just to get some intuitions.
Assume you had a tool that basically allows to you explain the entire network, every circuit and mechanism, etc. The tool spits out explanations that are easy to understand and easy to connect to specific parts of the network, e.g. attention head x is doing y. Would you publish this tool to the entire world or keep it private or semi-private for a while?
I think this case is unclear, but also not central because I’m imagining the primary benefit of publishing interp research as being making interp research go faster, and this seems like you’ve basically “solved interp”, so the benefits no longer really apply?
Similarly, if you thought that you should publish capabilities research to accelerate to AGI, and you found out how to build AGI, then whether you should publish is not really relevant anymore.