nostalgebraist comments on OpenAI API base models are not sycophantic, at any size

nostalgebraist 29 Aug 2023 21:20 UTC
LW: 18 AF: 9
Oh, interesting! You are right that I measured the average probability—that seemed closer to “how often will the model exhibit the behavior during sampling,” which is what we care about.
I updated the colab with some code to measure the % of cases where the probability of the sycophantic answer exceeds the probability of the non-sycophantic answer (you can turn this on by passing example_statistic='matching_more_likely' to various functions).
And I added a new appendix showing results using this statistic instead.
The bottom line: results with this statistic are very similar to those I originally obtained with average probabilities. So, this doesn’t explain the difference.
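To make the distinction between the two statistics concrete, here is a toy sketch. It is not the colab's actual code: the function names, the input format (one pair of probabilities per example), and the numbers are all made up for illustration, and I'm assuming "average probability" means a plain mean of the probability on the sycophantic answer.

```python
from statistics import mean

# Each example is a hypothetical (p_sycophantic, p_non_sycophantic) pair:
# the model's probability on the sycophantic answer and on the other answer.

def average_matching_probability(pairs):
    """Mean probability assigned to the sycophantic answer across examples."""
    return mean(p_syc for p_syc, _ in pairs)

def matching_more_likely_rate(pairs):
    """Fraction of examples where the sycophantic answer is the more likely one."""
    return mean(1.0 if p_syc > p_non else 0.0 for p_syc, p_non in pairs)

# Made-up numbers: one confident sycophantic case, three mildly non-sycophantic ones.
pairs = [(0.9, 0.1), (0.45, 0.55), (0.4, 0.6), (0.45, 0.55)]

print(average_matching_probability(pairs))
print(matching_more_likely_rate(pairs))
```

On this toy data the average probability comes out at roughly 0.55 while only a quarter of the cases favor the sycophantic answer, so the two statistics can disagree in principle. In my actual results they did not diverge in this way.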
(Edited to remove an image that failed to embed.)