But this informs what kind of mechanistic interpretability one should be doing: not just “random” work on the grounds that “all problems in mechanistic interpretability should be solved sooner or later” (no, at least not manually), but choosing experiments that test particular predictions of this-or-that scientific theory of DNNs or Transformers.)
FWIW, my sense is that the projects people do are very often taken from Neel’s concrete open problems and are often meant for skilling up.
In traditional sciences, people often spend years learning to think better and use tools on problems that are “toy” and no serious researcher would work on, because this is how you improve and get to that point. I don’t think people should necessarily take years to get there, but it’s worth considering that upskilling is the primary motivation for much work.
If you think specific researchers aren’t testing, or working towards testing, predictions about systems, then it would be useful to discuss this with them and give specific feedback.
FWIW, my sense is that the projects people do are very often taken from Neel’s concrete open problems and are often meant for skilling up.
In traditional sciences, people often spend years learning to think better and use tools on problems that are “toy” and no serious researcher would work on, because this is how you improve and get to that point. I don’t think people should necessarily take years to get there, but it’s worth considering that upskilling is the primary motivation for much work.
The implication is that problems from Neel’s list are easier than mining interpretability observations that directly test predictions of some of the theories of DNNs or Transformers. I’m not sure whether anyone has tried to look directly at this question (also, the two lists are not disjoint).