I don’t think we really engaged with that question in this post, so the following is fairly speculative. But I think there’s some situations where this would be a superior technique, mostly low resource settings where doing a backwards pass is prohibitive for memory reasons, or with a very tight compute budget. But yeah, this isn’t a load bearing claim for me, I still count it as a partial victory to find a novel technique that’s a bit worse than fine tuning, and think this is significantly better than prior interp work. Seems reasonable to disagree though, and say you need to be better or bust
I don’t think we really engaged with that question in this post, so the following is fairly speculative. But I think there’s some situations where this would be a superior technique, mostly low resource settings where doing a backwards pass is prohibitive for memory reasons, or with a very tight compute budget. But yeah, this isn’t a load bearing claim for me, I still count it as a partial victory to find a novel technique that’s a bit worse than fine tuning, and think this is significantly better than prior interp work. Seems reasonable to disagree though, and say you need to be better or bust