For what it’s worth, I buy the claim from Gwern that Microsoft trained Sydney pretty poorly, much worse than is achievable with SFT on highly rated data. For example, Sydney shows significant repetition, which you don’t see even on text-davinci-002 or (early 2022) LaMDA, both trained without RLHF.
Yep, I think it’s pretty plausible this is just a data-quality issue, though I find myself somewhat skeptical of this. Maybe worth a bet?
I would be happy to bet that, conditional on them trying to solve this with more supervised training and no RLHF, we are going to see error modes substantially more catastrophic than current ChatGPT's.