Does anyone know how well these instances of mode collapse can be reproduced using text-davinci-003? Are there notable differences in how it manifests for text-davinci-003 vs text-davinci-002? Given that text-davinci-002 was trained with supervised fine-tuning, whereas text-davinci-003 was trained with RLHF (according to the docs), it might be interesting to see whether these techniques have different failure modes.
Some of the experiments are pretty easy to replicate, e.g. checking text-davinci-003’s favorite random number:
Seems much closer to base davinci than to text-davinci-002’s mode collapse.
I tried to replicate some of the other experiments, but it turns out that text-davinci-003 doesn’t answer questions the same way as davinci/text-davinci-002, which probably means the prompts have to be adjusted. For example, on the “roll a d6” test, text-davinci-003 assigns almost no probability to the numbers 1–6 and a lot of probability to things like X and ____. (You can fix this using logit_bias, but I’m not sure we should trust the relative ratios of incredibly unlikely tokens in the first place.)
Both text-davinci-002 and davinci, by contrast, assign much higher probabilities to the numbers than to other options, and text-davinci-002 even assigns more than a 73% chance to the token 6.
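The logit_bias workaround mentioned above can be sketched like this: pin sampling to the six face tokens with a large positive bias, then renormalize the returned logprobs over just those tokens. (The prompt wording and the way the token ids are obtained are assumptions on my part, and, as noted, the resulting ratios between originally-unlikely tokens may not be trustworthy.)

```python
import math

def renormalize(top_logprobs):
    """Renormalize {token: logprob} into a distribution over only those tokens."""
    probs = {tok: math.exp(lp) for tok, lp in top_logprobs.items()}
    total = sum(probs.values())
    return {tok: p / total for tok, p in probs.items()}

# Restricting the completion to the d6 faces (requires openai<1.0 and tiktoken):
# import openai, tiktoken
# enc = tiktoken.encoding_for_model("text-davinci-003")
# dice_bias = {enc.encode(f" {i}")[0]: 100 for i in range(1, 7)}  # +100 pins sampling
# resp = openai.Completion.create(
#     model="text-davinci-003",
#     prompt="You roll a d6. It comes up",   # assumed wording
#     max_tokens=1,
#     logprobs=6,
#     logit_bias=dice_bias,
# )
# top = resp["choices"][0]["logprobs"]["top_logprobs"][0]
# print(renormalize(top))

# Illustrative (made-up) logprobs showing the renormalization step:
example = {" 6": -0.3, " 1": -2.0, " 2": -2.2, " 3": -2.4, " 4": -2.5, " 5": -2.6}
dist = renormalize(example)
```

With logit_bias in place, the six faces are the only realistic completions, so the renormalized distribution is directly comparable across davinci, text-davinci-002, and text-davinci-003.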