I was being foolish: the vectors are averaged across a dataset, but there are still positive vs. negative contrast pairs, so we should see sample efficiency improvements from them (contrast pairs are generally more sample efficient). That said, I’m unsure whether simple techniques like DPO are just as sample efficient when using these contrast pairs.
[Note: I originally made this as an edit to the parent, but this was confusing. So I moved it to a separate comment.]
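To make "averaged across a dataset, but still built from contrast pairs" concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the activations are random stand-ins rather than real model activations, `mean_difference_vector` and `estimation_error` are hypothetical helpers, and the data is constructed so that each positive shares its prompt content with its paired negative, which means pairing helps by construction. It shows one possible mechanism, not evidence about any real model or paper.

```python
import numpy as np

def mean_difference_vector(pos_acts, neg_acts):
    """Average of per-pair differences; row i of each array is treated as one pair."""
    return (pos_acts - neg_acts).mean(axis=0)

rng = np.random.default_rng(0)
n_pairs, d_model = 64, 16  # hypothetical sizes

# Assumed generative story: activation = prompt content (+ concept offset on positives) + noise.
concept_direction = rng.normal(size=d_model)
prompt_content = rng.normal(size=(n_pairs, d_model))
pos_acts = prompt_content + concept_direction + 0.1 * rng.normal(size=(n_pairs, d_model))
neg_acts = prompt_content + 0.1 * rng.normal(size=(n_pairs, d_model))

# Matched: each positive is paired with the negative from the same prompt.
matched = mean_difference_vector(pos_acts, neg_acts)

# Same examples, pairing shuffled: the mean is linear, so the vector is unchanged.
shuffled = mean_difference_vector(pos_acts, neg_acts[rng.permutation(n_pairs)])

# Independent sampling: negatives come from different prompts than the positives.
other_content = rng.normal(size=(n_pairs, d_model))
neg_indep = other_content + 0.1 * rng.normal(size=(n_pairs, d_model))
independent = mean_difference_vector(pos_acts, neg_indep)

def estimation_error(vec):
    """Distance from the (known, synthetic) concept direction."""
    return np.linalg.norm(vec - concept_direction)

print("matched == shuffled:", np.allclose(matched, shuffled))
print("matched error:", estimation_error(matched))
print("independent error:", estimation_error(independent))
```

Two things fall out of this toy setup. First, on a fixed set of examples the averaged vector does not depend on which positive is paired with which negative, since the mean of differences equals the difference of means. Second, any benefit of pairing comes from the positive and negative sets sharing content that cancels; how large that shared component is in real activations is an empirical question.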
I’m now less sure that contrast pairs are important and I’m broadly somewhat confused about what has good sample efficiency and why.
Right. Liu et al. provide evidence against contrast pairs being crucial (“unmatched” means they sample independently from the positive and negative contrast-pair distributions):
And even the unmatched condition would still indicate better sample efficiency than prompting or finetuning: