we compare the base model (davinci) with the supervised fine-tuned model (text-davinci-002) and the RLHF model (text-davinci-003)
The base model for text-davinci-002 and text-davinci-003 is code-davinci-002, not davinci. So that would seem to be the better comparison unless I’m missing something.
Thanks! This doesn’t seem to change the observations much, except that there no longer seems to be a case where this model has starkly the lowest entropy, as we found with davinci.
EDIT: I added code-davinci-002 as the main focus of the post, thanks!
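For readers wanting to reproduce the entropy comparison: a minimal sketch of estimating next-token entropy from the per-token logprobs an API returns. This assumes you only have the top-k logprobs (as the OpenAI completions endpoint provides), so renormalizing over those k candidates is an approximation of the full distribution's entropy, not the exact value.

```python
import math

def entropy_from_logprobs(logprobs):
    """Approximate next-token entropy (in nats) from top-k log-probabilities.

    logprobs: list of natural-log probabilities for the top-k candidate
    next tokens. Probabilities are renormalized over the returned top-k,
    which underestimates mass outside the top-k; treat the result as an
    approximation.
    """
    probs = [math.exp(lp) for lp in logprobs]
    total = sum(probs)  # renormalize over the top-k candidates
    return -sum((p / total) * math.log(p / total) for p in probs)
```

For a uniform two-way split (logprobs of ln 0.5 each) this returns ln 2 ≈ 0.693; for a single certain token (logprob 0) it returns 0, so lower values indicate the sharper distributions the post attributes to fine-tuned models.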
And now OpenAI is removing access to code-davinci-002, the GPT-3.5 foundation model: https://twitter.com/deepfates/status/1638212305887567873
The GPT-4 base model will apparently also not be available via the API. So it seems the most powerful publicly available foundation model is now Facebook’s leaked LLaMA.