I’d be curious about how much more costly this attack is on LMs Pretrained with Human Preferences (including when that method is only applied to “a small fraction of pretraining tokens” as in PaLM 2).