It seems potentially important to compare this to GPT-4o. In my experience, when asking GPT-4 for research papers on particular subjects, it seemed to make up non-existent papers (at least I couldn't find them after several minutes of searching the web). I don't have any precise statistics on this.
What I'm trying to express here is that it is surprising that o1 seems to explicitly encourage itself to fake links, not just that it fakes links. I agree that other models often hallucinate plausible references. What I haven't seen before is a chain of thought that encourages this. Furthermore, while it's plausible that you could elicit such a chain of thought from 4o under some circumstances, it seems a priori surprising that such behavior would survive at such a high rate in a model whose chain of thought has specifically been trained via RL to help produce correct answers. This leads me to guess the RL is badly misaligned.