Due to the results from Liu et al. noted in TurnTrout's comment here, I now don't think the action is mostly coming from contrast pairs (in at least some cases).
So activation engineering seems to be more sample-efficient than LoRA finetuning in some cases.[1]
(Though it feels to me like there should be some more principled SGD-style method which captures the juice.)
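To gesture at what such a method might look like, here is a minimal sketch (my own illustration, not from the post or from Liu et al.): instead of deriving a steering vector from contrast pairs, train a single additive vector at one residual-stream layer with SGD, keeping the model frozen. It assumes a HuggingFace-style causal LM; the model name, `layer_idx`, and the hook logic are all stand-ins.

```python
# Hypothetical sketch: SGD-train a single steering vector on a frozen LM.
# All names (model, layer_idx, toy texts) are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.requires_grad_(False)  # freeze the model; only the vector is trained

layer_idx = 6
d_model = model.config.hidden_size
steer = torch.zeros(d_model, requires_grad=True)  # the only trainable params

def add_steer(module, inputs, output):
    # Add the steering vector to the residual stream after this block.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + steer
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

hook = model.transformer.h[layer_idx].register_forward_hook(add_steer)

opt = torch.optim.SGD([steer], lr=1e-2)
texts = ["I love this!", "What a wonderful day."]  # tiny toy dataset
for _ in range(10):
    for t in texts:
        batch = tok(t, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        opt.zero_grad()
        loss.backward()  # gradients flow only into `steer`
        opt.step()

hook.remove()
```

The point of the sketch is the parameter count: a single d_model-sized vector at one layer, versus LoRA's pair of low-rank matrices per adapted weight. If the sample-efficiency gap is real, something this small trained directly with SGD seems like the natural candidate for capturing it.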
Up to methodological error in learning rates, etc.