Maybe my original comment was unclear. I was making a claim of “evidently this has improved on whatever they did” and not “there’s no way for them to have done comparably well if they tried.”
I do expect this kind of technique to stack benefits on top of finetuning, making the techniques complementary. That is, if you consider the marginal improvement on some alignment metric on validation data, I expect the “effort required to increase the metric via finetuning” and the “effort required via activation addition” to be correlated but not equal. Thus, I suspect that even after finetuning a model, there will be low- or medium-hanging activation additions which further boost alignment.
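To make the mechanism concrete, here is a toy sketch of what I mean by an activation addition: a fixed steering vector is added into the model’s intermediate activations at inference time, independently of any weight changes from finetuning. The shapes, the NumPy stand-in for real residual-stream activations, and the `coefficient` parameter are all illustrative assumptions, not anyone’s actual implementation.

```python
import numpy as np

def activation_addition(activations, steering_vector, coefficient=1.0):
    """Add a scaled steering vector to the activation at every token position.

    activations: array of shape (num_positions, hidden_dim), standing in for
        a layer's residual-stream activations.
    steering_vector: array of shape (hidden_dim,), e.g. a difference of
        activations between two contrasting prompts (hypothetical).
    """
    return activations + coefficient * steering_vector

# Toy data: 4 token positions, hidden size 8 (both made up for illustration).
rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 8))
steer = rng.normal(size=(8,))

steered = activation_addition(acts, steer, coefficient=4.0)
assert steered.shape == acts.shape
```

The key point for the argument above: this edit is applied at inference time and leaves the weights untouched, so whatever gradient-based finetuning has already done, a steering vector can still shift behavior on top of it.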