Page 4 of this paper compares negative vectors with fine-tuning for reducing toxic text: https://arxiv.org/pdf/2212.04089.pdf#page=4
Table 3 shows that, in some cases, task vectors can improve fine-tuned models.
Insofar as you mean to imply that “negative vectors” are obviously comparable to our technique, I disagree. Those are not activation additions, and I would guess they’re not particularly similar to our approach. These “task vectors” involve subtracting weight vectors, not activation vectors. See also footnote 39 (EDIT: and the related work appendix now talks about this directly).
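To make the weights-versus-activations distinction concrete, here is a toy NumPy sketch. It is not code from either paper: the two-layer network, the names `theta_pre`, `theta_ft`, `forward`, and `hidden`, and the coefficients `lam` and `coeff` are all hypothetical, chosen only to illustrate the two kinds of edit. A negated task vector changes the model's parameters once, before inference. An activation addition leaves the parameters alone and adds a vector to a hidden state during the forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "pretrained" and "fine-tuned" weights for a two-layer net.
theta_pre = {"W1": rng.normal(size=(4, 3)), "W2": rng.normal(size=(2, 4))}
theta_ft = {k: w + 0.1 * rng.normal(size=w.shape) for k, w in theta_pre.items()}


def hidden(theta, x):
    """Hidden activations of the toy network."""
    return np.tanh(theta["W1"] @ x)


def forward(theta, x, steering=None):
    """Forward pass; optionally add a steering vector to the hidden state."""
    h = hidden(theta, x)
    if steering is not None:
        h = h + steering  # activation-space edit, weights untouched
    return theta["W2"] @ h


# Weight-space edit (task-vector negation): tau = theta_ft - theta_pre,
# then theta_new = theta_pre - lam * tau. The parameters themselves change.
lam = 1.0  # assumed scaling coefficient
task_vector = {k: theta_ft[k] - theta_pre[k] for k in theta_pre}
theta_negated = {k: theta_pre[k] - lam * task_vector[k] for k in theta_pre}

# Activation-space edit: build a vector from the difference of hidden states
# on two contrasting inputs and add it at inference time.
coeff = 4.0  # assumed injection coefficient
x_pos, x_neg = rng.normal(size=3), rng.normal(size=3)
steering_vector = coeff * (hidden(theta_pre, x_pos) - hidden(theta_pre, x_neg))

x = rng.normal(size=3)
y_base = forward(theta_pre, x)
y_weight_edit = forward(theta_negated, x)
y_activation_edit = forward(theta_pre, x, steering=steering_vector)
```

In this toy setup the activation edit shifts the output by exactly `theta_pre["W2"] @ steering_vector`, since only the hidden state moved, whereas the weight edit changes how every input is processed. Real models are of course far less linear than this.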