It seems potentially important to compare this to GPT-4o. In my experience, when asking GPT-4 for research papers on particular subjects, it seemed to make up non-existent papers (at least I couldn't find them after several minutes of searching the web). I don't have any precise statistics on this.
What I'm trying to express here is that it is surprising that o1 seems to explicitly encourage itself to fake links, not just that it fakes links. I agree that other models often hallucinate plausible references. What I haven't seen before is a chain of thought that encourages this. Furthermore, while it's plausible that you could elicit such a chain of thought from 4o under some circumstances, it seems a priori surprising that such behavior would survive at such a high rate in a model whose chain of thought has specifically been trained via RL to help produce correct answers. This leads me to guess the RL is badly misaligned.