Does anyone know how well these instances of mode collapse can be reproduced using text-davinci-003? Are there notable differences in how it manifests for text-davinci-003 vs text-davinci-002? Given that text-davinci-002 was trained with supervised fine-tuning, whereas text-davinci-003 was trained with RLHF (according to the docs), it might be interesting to see whether these techniques have different failure modes.
Some of the experiments are pretty easy to replicate, e.g. checking text-davinci-003’s favorite random number:
Seems much closer to base davinci than to text-davinci-002’s mode collapse.
I tried to replicate some of the other experiments, but it turns out that text-davinci-003 doesn’t answer questions the same way as davinci/text-davinci-002, which probably means the prompts have to be adjusted. For example, on the “roll a d6” test, text-davinci-003 assigns almost no probability to the numbers 1–6 and a lot of probability to things like X and ____. (You can fix this using logit_bias, but I’m not sure we should trust the relative ratios of incredibly unlikely tokens in the first place.)
Both text-davinci-002 and davinci, by contrast, assign much higher probabilities to the numbers than to other options, and text-davinci-002 even assigns more than a 73% chance to the token 6.
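The logit_bias workaround mentioned above can be sketched like this: pin sampling to the six face tokens with a large positive bias, then renormalize the returned logprobs over just those tokens. (The prompt wording and the way the token ids are obtained are assumptions on my part, and, as noted, the resulting ratios between originally-unlikely tokens may not be trustworthy.)

```python
import math

def renormalize(top_logprobs):
    """Renormalize {token: logprob} into a distribution over only those tokens."""
    probs = {tok: math.exp(lp) for tok, lp in top_logprobs.items()}
    total = sum(probs.values())
    return {tok: p / total for tok, p in probs.items()}

# Restricting the completion to the d6 faces (requires openai<1.0 and tiktoken):
# import openai, tiktoken
# enc = tiktoken.encoding_for_model("text-davinci-003")
# dice_bias = {enc.encode(f" {i}")[0]: 100 for i in range(1, 7)}  # +100 pins sampling
# resp = openai.Completion.create(
#     model="text-davinci-003",
#     prompt="You roll a d6. It comes up",   # assumed wording
#     max_tokens=1,
#     logprobs=6,
#     logit_bias=dice_bias,
# )
# top = resp["choices"][0]["logprobs"]["top_logprobs"][0]
# print(renormalize(top))

# Illustrative (made-up) logprobs showing the renormalization step:
example = {" 6": -0.3, " 1": -2.0, " 2": -2.2, " 3": -2.4, " 4": -2.5, " 5": -2.6}
dist = renormalize(example)
```

With logit_bias in place, the six faces are the only realistic completions, so the renormalized distribution is directly comparable across davinci, text-davinci-002, and text-davinci-003.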