Richard_Ngo comments on Against Almost Every Theory of Impact of Interpretability

Richard_Ngo 18 Aug 2023 4:03 UTC
> I agree that people who could do either good interpretability or conceptual work should focus on conceptual work
This seems like a false dichotomy; in general I expect that the best conceptual work will be done in close conjunction with interpretability work or other empirical work.
(In general I think that almost all attempts to do “conceptual” work that doesn’t involve either empirical results or proofs are pretty doomed. I’d be interested in any counterexamples you’ve seen; my main counterexample is threat modeling, which is why I’ve been focusing a lot on that lately.)
EDIT: many downvotes, no counterexamples. Please provide some.
- leogao 18 Aug 2023 4:35 UTC
  I agree that doing conceptual work in conjunction with empirical work is good. I don’t know if I agree that pure conceptual work is completely doomed, but I’m at least sympathetic. However, I think my point still stands: I think someone who can do conceptual+empirical work will probably have more impact doing that than not thinking about the conceptual side and just working really hard on conceptual work.
  1. They may find some other avenue of empirical work that can help with alignment. I think there probably exist empirical avenues substantially more valuable for alignment than making progress on interpretability, and opening those up requires thinking about the conceptual side.
  2. Even if they think hard about it and can’t think of anything better than conceptual+interpretability, it still seems better for an interpretability researcher to have an idea of how their work will fit into the broader picture. Even if they aren’t backchaining, this still seems more useful than just randomly doing something under the heading of interpretability.
  - Richard_Ngo 18 Aug 2023 15:41 UTC
    > However, I think my point still stands: I think someone who can do conceptual+empirical work will probably have more impact doing that than not thinking about the conceptual side and just working really hard on conceptual work.
    (I assume that the last “conceptual” should be “empirical”.)
    I agree that not thinking about the conceptual side is bad. But thinking about the conceptual side is standard in science. Like, top scientists in almost any domain aren’t just thinking about their day-to-day empirical research; they have broader opinions about the field as a whole, more speculative and philosophical ideas, and so on. The difference is whether they treat those ideas as outputs in their own right, or as inputs that feed into some empirical or theoretical output. Most scientists do the latter; when people in alignment talk about “conceptual work”, my impression is that they’re typically thinking about the former.
- Andrew McKnight 6 Nov 2024 22:19 UTC
  Do you think putting extra effort into learning about existing empirical work while doing conceptual work would be sufficient for good conceptual work, or do you think people need to be producing empirical work themselves to really make progress conceptually?
  - Richard_Ngo 6 Nov 2024 23:25 UTC
    The former can be sufficient—e.g. there are good theoretical researchers who have never done empirical work themselves.
    In hindsight I think “close conjunction” was too strong—it’s more about picking up the ontologies and key insights from empirical work, which can be possible without following it very closely.