Note that the finetuning for Figure 13 trains the model on sycophantic/non-sycophantic multiple-choice question answering and then relies on this generalizing to free response.
It isn’t training more directly on sycophantic responses or performing RL for sycophancy.