Ran this on GPT-4-base and it gets 56.7% (n=1000)
How?! I’m pretty sure the GPT-4 base model is not publicly available!
Leo Gao works at OpenAI: https://twitter.com/nabla_theta?lang=en
Leo works at OAI, but I believe OAI gives access to base GPT-4 to some outsider researchers as well.
Are you measuring the average probability the model places on the sycophantic answer, or the % of cases where the probability on the sycophantic answer exceeds the probability of the non-sycophantic answer? (I’d be interested to know both)
What about RLHF’d GPT-4?
Ran this on GPT-4-base and it gets 56.7% (n=1000)
How?! I’m pretty sure the GPT-4 base model is not publicly available!
Leo Gao works at OpenAI: https://twitter.com/nabla_theta?lang=en
Leo works at OAI, but I believe OAI gives access to base GPT-4 to some outsider researchers as well.
Are you measuring the average probability the model places on the sycophantic answer, or the % of cases where the probability on the sycophantic answer exceeds the probability of the non-sycophantic answer? (I’d be interested to know both)
What about RLHF’d GPT-4?