janus comments on An Unexpected GPT-3 Decision in a Simple Gamble

janus 25 Sep 2022 17:24 UTC
1 point
0
Is this text-davinci-002 or davinci?
- casualphysicsenjoyer 25 Sep 2022 17:57 UTC
  2 points
  0
  text-davinci-002. I've updated the post with a link to GitHub.
- casualphysicsenjoyer 25 Sep 2022 17:42 UTC
  1 point
  0
  text-davinci-002
  - janus 25 Sep 2022 20:27 UTC
    2 points
    0
    “What’s weird is the level of conviction of choice J, above 90%. I have no idea why this happens.”
    text-davinci-002 is often extremely confident in its “predictions” for no apparent good reason (e.g. when generating “open-ended” text, it can be ~99% confident about the exact phrasing).
    
    This is almost certainly due to the RLHF “Instruct” tuning that text-davinci-002 has been subjected to. To whatever extent the probabilities output by models trained with pure SSL (self-supervised learning) can be given an epistemic interpretation (the model’s credence for the next token in a hypothetical training sample), that interpretation no longer holds for models modified by RLHF.
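
To make "level of conviction" concrete, here is a minimal sketch of how one might turn per-token log-probabilities into a top-token confidence and an entropy. It assumes you already have a mapping from candidate tokens to log-probabilities for a single position (for example, from a completion response that reports top log-probabilities). The function names and the numbers are hypothetical and purely illustrative; they are not measurements from text-davinci-002 or davinci.

```python
import math


def top_token_confidence(top_logprobs):
    """Return (most likely token, its probability) from a token -> logprob mapping."""
    token = max(top_logprobs, key=top_logprobs.get)
    return token, math.exp(top_logprobs[token])


def entropy_bits(top_logprobs):
    """Shannon entropy in bits over the listed tokens, after renormalizing.

    Only the listed candidates are considered, so this ignores any
    probability mass the model puts on tokens missing from the mapping.
    """
    probs = [math.exp(lp) for lp in top_logprobs.values()]
    total = sum(probs)
    return -sum((p / total) * math.log2(p / total) for p in probs if p > 0)


# Made-up distributions for illustration only (assumption, not real model output).
# "K" and "L" are hypothetical alternative answers.
peaked = {"J": math.log(0.92), "K": math.log(0.05), "L": math.log(0.03)}
spread = {"J": math.log(0.40), "K": math.log(0.35), "L": math.log(0.25)}

if __name__ == "__main__":
    for name, dist in (("peaked", peaked), ("spread", spread)):
        token, p = top_token_confidence(dist)
        print(f"{name}: top token {token!r}, p = {p:.2f}, entropy = {entropy_bits(dist):.2f} bits")
```

Comparing these two quantities for the same prompt across a base model and a tuned model is one way to check whether the tuned model's distribution is unusually peaked. Whether the resulting numbers can be read as credences is exactly the question raised above.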