LawrenceC comments on Analogies between Software Reverse Engineering and Mechanistic Interpretability

LawrenceC 26 Dec 2022 15:57 UTC
LW: 16 AF: 10
The distinction between “newbies get caught up trying to understand every detail, experts think in higher-level abstractions, make educated guesses, and only zoom in on the details that matter” felt super interesting and surprising to me.
I claim that this is 1) an instance of a common pattern that 2) is currently missing a step (the pre-newbie stage).
The general pattern is the following (terminology borrowed from Terry Tao):
1. The pre-rigorous stage: Really new people don’t know how ~anything works in a field, and so use high-level abstractions that aren’t necessarily grounded in reality.
2. The rigorous stage: Intermediate people learn the concrete grounding behind a field, but get bogged down in minutiae.
3. The post-rigorous stage: Experts form correct high-level abstractions informed by their understanding of the grounding, but still use the grounding when the high-level abstractions break down.
I think that many experts mainly notice the 2->3 transition, but not the 1->2 one, and so often dissuade newbies by encouraging them to not work in the rigorous stage. I claim this is really, really bad, and that a solid understanding of the rigorous stage is a good idea for ~everyone doing technical work.
Here are a few examples:
- Terry Tao talks about this in math: Early students start out not knowing what a proof is and have to manipulate high-level, handwavy concepts. More advanced students learn the rigorous foundations behind various fields of math but are encouraged to focus on the formalism as opposed to ‘focusing too much on what such objects actually “mean”’. Eventually, mathematicians are able to revisit their early intuitions and focus on the big picture, converting their intuitions to rigorous arguments when needed.
- The exact same thing is true in almost any discipline with mathematical proofs, e.g. physics or theoretical CS.
- This happens in a very similar way with programming as well.
- In many strategy games (e.g. Chess), you see crazy high-level strategizing at the very lowest and very highest skill levels, while players at the middle levels focus on improving technique.
- I’d also claim that something similar happens in psychology: freshmen undergrads come up with grand theories of human cognition that are pretty detached from reality, many intermediate researchers get bogged down in experimental results, while the good psychology researchers form high-level representations and theories based on their knowledge of experiments (and are able to trivially translate intuitions into empirical claims).
- Itay Yona 27 Dec 2022 22:19 UTC
  5 points
  I strongly agree! When you study towards RE it is critical to understand lots of details about how the machine works, and most people I knew were already familiar with those. They were lacking the skills of using their low-level understanding to actually conduct useful research effectively.
  It is natural to pay much less attention to the 1->2 phase, since there are many more intermediate researchers than complete newbies or experts. It is interesting because intermediate researchers might think they are talking with person 1 when they are actually talking with person 3.
  
  Thanks, you gave me something to think about :)
- Neel Nanda 26 Dec 2022 16:12 UTC
  LW: 2 AF: 1
  I like the analogy! I hadn’t explicitly made the connection, but strongly agree (both that this is an important general phenomenon, and that it specifically applies here). Though I’m pretty unsure how much I/other MI researchers are in 1 vs 3 when we try to reason about systems!
  
  To be clear, I definitely do not want to suggest that people shouldn’t try to rigorously reverse engineer systems a bunch and be super detail-oriented. Linked to your comment in the post.