Ethan Perez comments on OpenAI API base models are not sycophantic, at any size

Ethan Perez 29 Aug 2023 19:55 UTC
LW: 17 AF: 8
Are you measuring the average probability the model places on the sycophantic answer, or the % of cases where the probability on the sycophantic answer exceeds the probability of the non-sycophantic answer? In our paper, we did the latter; someone mentioned to me that it looks like the colab you linked does the former (though I haven’t checked myself). If this is correct, I think this could explain the differences between your plots and mine in the paper; if pretrained LLMs are placing more probability on the sycophantic answer, I probably wouldn’t expect them to place that much more probability on the sycophantic than non-sycophantic answer (since cross-entropy loss is mode-covering).
(Cool you’re looking into this!)
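To make the distinction between the two statistics concrete, here is a minimal sketch in Python. It is not the code from the paper or the colab. The function names, the toy probabilities, and the choice to normalize over the two answer options are all assumptions made for illustration.

```python
def mean_sycophantic_prob(p_syc, p_non):
    """Statistic 1: average probability on the sycophantic answer.

    Each probability is renormalized over the two answer options
    (an assumption; the colab may handle this differently).
    """
    return sum(s / (s + n) for s, n in zip(p_syc, p_non)) / len(p_syc)


def frac_sycophantic_more_likely(p_syc, p_non):
    """Statistic 2: fraction of cases where the sycophantic answer
    gets strictly more probability than the non-sycophantic one."""
    return sum(s > n for s, n in zip(p_syc, p_non)) / len(p_syc)


# Hypothetical per-example probabilities, invented for illustration.
p_syc = [0.55, 0.60, 0.52, 0.30]
p_non = [0.45, 0.40, 0.48, 0.70]

mean_p = mean_sycophantic_prob(p_syc, p_non)
frac = frac_sycophantic_more_likely(p_syc, p_non)
print(f"mean probability: {mean_p:.4f}, fraction more likely: {frac:.2f}")
```

In this toy data the sycophantic answer is only slightly favored in three of four cases, so the second statistic sits well above 50% while the first stays just under it. That is the kind of gap the question above is asking about.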
- nostalgebraist 29 Aug 2023 21:20 UTC
  LW: 18 AF: 9
  Oh, interesting! You are right that I measured the average probability—that seemed closer to “how often will the model exhibit the behavior during sampling,” which is what we care about.
  I updated the colab with some code to measure the “% of cases where the probability on the sycophantic answer exceeds the probability of the non-sycophantic answer” (you can turn this on by passing example_statistic='matching_more_likely' to various functions).
  And I added a new appendix showing results using this statistic instead.
  The bottom line: results with this statistic are very similar to those I originally obtained with average probabilities. So, this doesn’t explain the difference.
  (Edited to remove an image that failed to embed.)