SAE steering doesn’t seem like it obviously beats other steering techniques in terms of usefulness. I haven’t looked closely into Hyena but my prior is that subquadratic attention papers probably suck unless proven otherwise.
Interpretability is certainly vastly more appealing to lab leadership than weird philosophy, but it’s vastly less appealing than RLHF. And there are many, many ML-flavored directions, only a few of which are any good, so it’s not surprising that most directions don’t get much attention.
Probably as interp gets better it will start to be helpful for capabilities. I’m uncertain whether it will be more or less useful for capabilities than just working on capabilities directly: on the one hand, mechanistic understanding has historically underperformed as a research strategy; on the other, that could change once we have a sufficiently good mechanistic understanding.
Are you talking about ML or in general? What are you deriving this from?
For ML, yes. I’m deriving this from the bitter lesson.