SAE steering doesn’t seem like it obviously beats other steering techniques in terms of usefulness. I haven’t looked closely into Hyena but my prior is that subquadratic attention papers probably suck unless proven otherwise.
Interpretability is certainly vastly more appealing to lab leadership than weird philosophy, but it’s vastly less appealing than RLHF. And there are many, many ML-flavored directions, only a few of which are any good, so it’s not surprising that most directions don’t get much attention.
Probably as interp gets better it will start to be helpful for capabilities. I’m uncertain whether it will be more or less useful for capabilities than just working on capabilities directly: on the one hand, mechanistic understanding has historically underperformed as a research strategy; on the other, that could change once we have a sufficiently good mechanistic understanding.
Are you talking about ML or in general? What are you deriving this from?
For ML, yes. I’m deriving this from the bitter lesson.