What I’m trying to express here is that it’s surprising that o1 seems to explicitly encourage itself to fake links, not just that it fakes links. I agree that other models often hallucinate plausible references. What I haven’t seen before is a chain of thought that encourages this. Furthermore, while it’s plausible that you could elicit such a chain of thought from 4o under some circumstances, it seems a priori surprising that such behavior would survive at such a high rate in a model whose chain of thought has specifically been trained via RL to help produce correct answers. This leads me to guess that the RL is badly misaligned.